\datethis
\def\adj{\mathrel{\!\mathrel-\mkern-8mu\mathrel-\mkern-8mu\mathrel-\!}}
\def\addj{\mathrel{\!\mathrel=\mkern-8mu\mathrel=\mkern-8mu\mathrel=\!}}
\def\adddj{\mathrel{\!\mathrel\equiv\mkern-8mu\mathrel\equiv\mkern-8mu\mathrel\equiv\!}}
\def\slug{\hbox{\kern1.5pt\vrule width2.5pt height6pt depth1.5pt\kern1.5pt}}
@*Intro. This program implements and explains the classical HK algorithm
for bipartite matching, due to John~E. Hopcroft and Richard~M. Karp
[``An $n^{5/2}$ algorithm for maximum matchings in bipartite graphs,''
{\sl SIAM Journal on Computing\/ \bf2} (1973), 225--231].
Input on |stdin| is an $m\times n$ matrix $(a_{ij})$ of $0$s and $1$s, one
row per line. In this exposition the rows correspond to ``boys'' and the
columns correspond to ``girls.'' (Of course I do this in order to take advantage
of expressive words in the English language, not to imply any distinction
between male and female.) Boy~$i$ can be matched to girl~$j$
only if $a_{ij}=1$. The object is to match as many pairs as possible.
We assume that $m\le n$, because that makes the algorithm slightly faster.
Given a partial matching (possibly empty), an {\it augmenting path\/}
is a simple path of the form
$$g_0\adj b_0\addj g_1\adj b_1\addj\ \cdots\ \adj b_{k-1}\addj g_k\adj b_k,$$
where $g_0$ is a free (unmatched) girl,
each $g_i$ is matchable to $b_i$, each $b_i$ for $i
#include
unsigned int vbose=show_rounds; /* nonzero bits trigger optional output */
@; /* the main data structures are introduced in context */
main() {
register int b,f,g,i,j,k,l,m,n,p,q,qq,r,t,tt,marks,final_level;
@;
for (r=1,f=n;f>n-m;r++) {
if (vbose&show_rounds) fprintf(stderr,"Beginning round %d...\n",
r);
@;
@;
@;
if (vbose&show_rounds)
fprintf(stderr," ... %d pairs now matched (rank %d).\n",
n-f,final_level);
if (vbose&show_partials) @;
}
fprintf(stderr,"Maximum matching of size %d found after %d round%s.\n",
n-f,f==n-m?r-1:r,(f==n-m?r-1:r)==1?"":"s");
if (vbose&show_answer) @;
}
@*The easy stuff (input).
This part of the program is trivial, but it usually takes longer than the
HK algorithm itself. (Because a $0$--$1$ matrix is not
an efficient representation of a sparse matrix.)
Malformed input is rudely rejected.
@=
@;
if (n==0) {
fprintf(stderr,"There must be at least one girl!\n");
exit(-1);
}
else if (n==maxn) {
fprintf(stderr,"Recompile me: I can't deal with %d or more girls!\n",
maxn);
exit(-2);
}
for (t=0,m=1;;m++) {
if (m>n) {
fprintf(stderr,"There are %d girls, so there can't be more than %d boys!\n",
n,n);
exit(-3);
}
@;
if (!fgets(buf,maxn+1,stdin)) break;
if (buf[n]!='\n') {
fprintf(stderr,"The input for boy %d doesn't specify %d girls!\n",
m+1,n);
exit(-4);
}
}
fprintf(stderr,
"OK, I've read the matrix for %d boys, %d girls, %d potential matchups.\n",
m,n,t);
@;
@ @=
n=0;
if (fgets(buf,maxn+1,stdin))
for (;buf[n]!='\n' && n=
for (j=n;j;j--) {
p=buf[j-1]-'0';
if (p<0 || p>1) {
fprintf(stderr,"Improper character `%c' in the input for boy %d!\n",
buf[j-1],m);
exit(-5);
}
if (p) {
if (++t>=maxt) {
fprintf(stderr,
"Recompile me: I can't handle %d or more potential matchups!\n",
maxt);
exit(-6);
}
tip[t]=m,next[t]=glink[j],glink[j]=t;
}
}
@ Later there will be a similar data structure for some of the boys' partners.
The |next| and |tip| arrays must be able to accommodate $2t+m$ entries:
$t$ for the original graph, $t$ for the edges at round~$0$,
and $m$ for the edges from $\top$.
@=
char buf[maxn+1]; /* buffer for input from |stdin| */
int blink[maxn],glink[maxn]; /* list heads for potential partners */
int next[maxt+maxt+maxn],tip[maxt+maxt+maxn]; /* links and suitable partners */
@*Even easier stuff (the output).
There's a |mate| table, showing the current partner (if any) of every boy.
Also an |imate| table, showing the current partner (if any) of every girl.
@=
int mate[maxn],imate[maxn];
@ To report the matches-so-far, we simply show every boy's mate.
@=
{
for (p=1;p<=m;p++) fprintf(stderr," %d",
mate[p]);
fprintf(stderr,"\n");
}
@*The cool stuff (a dag of SAPs).
With a breadth-first search, we can create a directed acyclic graph
in which the paths from a dummy node called $\top$ to a dummy node
called $\bot$ correspond one-to-one with
the augmenting paths of minimum length. Each of those paths
will contain |final_level| existing matches.
This dag has a representation something like our representation
of the girls' choices, but even sparser: The first arc from boy~$i$
to a suitable girl is in |blink[i]|, with |tip| and |next| as before.
Each girl, however, has exactly one outgoing arc in the dag, namely
her |imate|. An |imate| of~$0$ is a link to $\bot$.
The other dummy node, $\top$, has a list of free boys,
beginning at |dlink|.
An array called |mark| keeps track of the level (plus 1) at which
a boy has entered the dag. All marks must be zero when we begin.
@=
final_level=-1,tt=t;
for (marks=l=i=0,q=f;;l++) {
for (qq=q;i@;
else if (p<=l) continue;
if (vbose&show_dags) fprintf(stderr," %d->%d=>%d\n",
b,g,imate[g]);
/* arc from |b| to |g|, then to her mate */
tip[++tt]=g,next[tt]=blink[b],blink[b]=tt;
}
}
if (q==qq) break; /* nothing new on the queue for the next level */
}
@ @=
int queue[maxn]; /* girls seen during the breadth-first search */
int iqueue[maxn]; /* inverse permutation, for the first |f| entries */
int mark[maxn]; /* where boys appear in the dag */
int marked[maxn]; /* which boys have been marked */
int dlink; /* head of the list of free boys in the dag */
@ Once we know we've reached the final level, we don't allow
any more boys at that level unless they're free. We also
reset |q| to~|qq|, so that the dag will not reach a greater level.
@=
{
if (final_level>=0 && mate[b]) continue;
else if (final_level<0 && mate[b]==0) final_level=l,dlink=0,q=qq;
mark[b]=l+1,marked[marks++]=b,blink[b]=0;
if (mate[b]) queue[q++]=mate[b];
else {
if (vbose&show_dags) fprintf(stderr," top->%d\n",
b); /* arc from $\top$ to the free boy |b| */
tip[++tt]=b,next[tt]=dlink,dlink=tt;
}
}
@ We have no SAPs if and only no free boys were found.
@=
if (final_level<0) break;
@ @=
while (marks) mark[marked[--marks]]=0;
@*Makin' progress.
We've just built the dag of shortest augmenting paths, by starting
from dummy node~$\bot$ at the bottom and proceeding breadth-first
until discovering |final_level| and essentially reaching the dummy node~$\top$.
Now we more or less reverse the process: We start at $\top$ and proceed
{\it depth\/}-first, harvesting a maximal set of
vertex-disjoint augmenting paths as we go.
(Any maximal set will be fine; we needn't bother to look for an especially
large one.)
The dag is gradually dismantled as SAPs are removed, so that
their boys and girls won't be reused. A subtle point arises here
when we look at a girl~|g| who was part of a previous SAP: In that
case her mate will have been changed to a boy whose |mark| is negative.
This is true even if |l=0| and |g| was previously free.
@=
while (dlink) {
b=tip[dlink], dlink=next[dlink];
l=final_level;
enter_level: boy[l]=b;
advance:@+if (blink[b]) {
g=tip[blink[b]],blink[b]=next[blink[b]];
if (imate[g]==0) @;
if (mark[imate[g]]<0) goto advance;
b=imate[g],l--;
goto enter_level;
}
if (++l>final_level) continue;
b=boy[l];
goto advance;
}
@;
@ At this point $|g|=g_0$ and $|b|=|boy[0]|=b_0$ in an augmenting path.
The other boys are |boy[1]|, |boy[2]|, etc.
@=
{
if (l) fprintf(stderr,"I'm confused!\n");
/* a free girl should occur only at level 0 */
@;
while (1) {
if (vbose&show_updates) fprintf(stderr,"%s %d-%d",
l?",":" match",b,g);
mark[b]=-1;
j=mate[b],mate[b]=g,imate[g]=b;
if (j==0) break; /* |b| was free */
g=j,b=boy[++l];
}
if (vbose&show_updates) fprintf(stderr,"\n");
continue;
}
@ @=
int boy[maxn]; /* the boys being explored during the depth-first search */
@ When the matrix was input, we tacitly assumed that |glink[g]| was
initially~$0$ for all~$g$. When each dag was built, we explicitly initialized
each |blink[b]| to~$0$. The |queue| and |iqueue| structures
need to be initialized to nonzero values, so we do that here.
@=
for (g=1;g<=n;g++) queue[g-1]=g,iqueue[g]=g-1;
@ @=
f--; /* |f| is the number of free girls */
j=iqueue[g]; /* where is |g| in |queue|? */
i=queue[f], queue[j]=i, iqueue[i]=j; /* OK to clobber |queue[f]| */
@*Why it works.
The HK algorithm relies on three simple
lemmas about matchings in graphs. Recall that a {\it matching\/}
in~$G$ is a subgraph of~$G$ in which every vertex has degree $\le1$.
Suppose $A$ and $B$ are matchings in the same graph~$G$, and
let $A\oplus B$ be the set of edges that are in one but not both.
This is a subgraph in which every vertex has degree~2 or less.
Consequently every connected component of $A\oplus B$
is either a cycle or a path.
A cycle in $A\oplus B$ must have even length, because its $A$ edges
alternate with its $B$ edges. For example, the shortest possible cycle
is a 4-cycle such as
$$v_0\adj v_1\addj v_2\adj v_3\addj v_4,$$
where `$\,\adj\,$' denotes an edge of~$A$ and
`$\,\addj\,$' denotes an edge of~$B$.
A path in $A\oplus B$ can have either odd or even length. If it involves
$2k+1$ vertices, so that its length is $2k$, it must be ``balanced,''
with $k$ edges from~$A$ and $k$ edges from~$B$. But if it involves
$2k$ vertices, and has length $2k-1$, it must be either an
$A$-path (with $k$ edges from~$A$ and $k-1$ from~$B$) or a
$B$-path (with $k$ edges from~$B$ and $k-1$ from~$A$).
A path in $A\oplus B$ of length $2k-1$ is said to have {\it rank\/}~$k$.
Therefore the $A$-paths of ranks 1, 2, 3, \dots, look like this:
$$v_0\adj v_1,
\qquad
v_0\adj v_1\addj v_2\adj v_3,
\qquad
v_0\adj v_1\addj v_2\adj v_3\addj v_4\adj v_5,
\qquad\ldots\,;$$
and the $B$-paths are similar but with `$\,\adj\,$' and `$\,\addj\,$' reversed.
Suppose $c_A$ components are $A$-paths, $c_B$ components are $B$-paths,
and $c_0$ components are either cycles or balanced paths. Then
$c_A-c_B=\vert A\vert-\vert B\vert$, where $\vert A\vert$ stands for
the number of edges in~$A$. For example, if $\vert A\vert=15$ and
$\vert B\vert=12$, we might have five $A$-paths and two $B$-paths in
$A\oplus B$; or we might have four and one, etc. But in any case
$A\oplus B$ must contain at least three $A$-paths.
Notice that every $A$-path is an augmenting path for~$B$, and every
$B$-path is an augmenting path. Therefore we have
\smallskip\noindent
{\bf Lemma 1.}\enspace
{\sl If $A$ and $B$ are matchings in~$G$ with $\vert A\vert>\vert B\vert$,
then $B$ has at least $\vert A\vert-\vert B\vert$ augmenting paths
that are vertex-disjoint.}
\smallskip\noindent
{\it Proof.}\enspace
The $A$-paths of $A\oplus B$ are vertex-disjoint.\quad\slug
\smallskip
Thus, if $s$ is the size of a maximum matching, and if $B$ is any
matching with $\vert B\vert=r$, there must be $s-r$ disjoint
augmenting paths. We don't know what they are; but we do know
that they exist. That's why the HK algorithm knows that it has
found a maximum matching when it is discovered that there are
no augmenting paths.
@ Now let's consider the effect of an augmentation. Suppose a new
matching~$C$ is created by changing, say,
$$v_0\adj v_1\addj v_2\adj v_3\addj v_4\adj v_5
\qquad\hbox{to}\qquad
v_0\adddj v_1\addj v_2\adddj v_3\addj v_4\adddj v_5$$
and retaining the other edges of~$B$. What are the components
of~$A\oplus C$?
Answer: Every vertex of $\{v_0,v_1,\ldots,v_5\}$ is isolated (has degree~0)
in $A\oplus C$; so one former $A$-component of $A\oplus B$ has essentially
vanished. But every {\it other\/} component of $A\oplus B$ remains
unchanged in $A\oplus C$, except that each `$\,\addj\,$' becomes `$\,\adddj\,$',
because it involves none of $\{v_0,v_1,\ldots,v_5\}$.
(In particular, balanced components remain balanced, $A$-paths remain
$A$-paths, and $B$-paths become $C$-paths.)
Algorithm HK actually works only with
the {\it shortest\/} augmenting paths for~$B$,
namely the $A$-paths of minimum rank~$k$. It finds
$q$ such paths, all disjoint, and uses them to find a larger
matching~$C$ with $\vert C\vert=q+r$. Furthermore, the algorithm
knows that every {\it other\/} augmenting path of rank~$k$
has at least one vertex~$v$ whose mate in~$C$ is different from its
mate (if any) in~$B$. (Those other SAPs appeared in the dag, when the
dag was created, but they were eventually removed.)
\smallskip\noindent
{\bf Lemma 2.}\enspace
{\sl In that scenario, every augmenting path of the matching $C$ has
rank greater than~$k$.}
\smallskip\noindent
{\it Proof.}\enspace
We know that an augmentation obliterates components but makes no
shorter ones. The only question is whether any $A$-paths of rank~$k$
still exist with respect to~$C$. But every such path was wiped out
when one of its vertices was first assigned a new mate.\quad\slug
@ An example will be helpful at this point. Consider the bipartite
graph consisting of $n$ girls $\{x_1,\ldots,x_n\}$ and $n$ boys
$\{y_1,\ldots,y_n\}$, with $x_i$ matchable to $y_j$ if and only if
$i\le j$. This graph has a unique matching, because $x_n$ can
only be matched to $y_n$, and then $x_{n-1}$ can only be matched
to $y_{n-1}$, etc.
If $A$ is that perfect matching and $B$ is the empty matching,
every edge $x_i\adj y_j$ with $i\le j$ is an $A$-path of rank~$0$.
One of the sets of {\it disjoint\/} $A$-paths is
$$x_1\adj y_2,\qquad
x_2\adj y_3,\qquad
\cdots{},\qquad
x_{n-1}\adj y_n;$$
and this set is {\it maximal}, because it leaves out only
$x_n$ and $y_1$ (an unmatchable pair).
If we augment $B$ with all those $A$-paths, we get the matching $C$
whose $n-1$ edges are
$$x_1\adddj y_2,\qquad
x_2\adddj y_3,\qquad
\cdots{},\qquad
x_{n-1}\adddj y_n.$$
And Lemma 2 proves that $C$ has no augmenting path of rank~$0$.
In fact, $C$ has only one augmenting path, namely
$$y_1\adj x_1\adddj y_2\adj
x_2\adddj y_3\adj x_3
\cdots{}\adj
x_{n-1}\adddj y_n\addj x_n;$$
and its rank is $n-1$.
(Suppose we permute the girls of this example in any of $n!$
ways, then independently permute the boys in any of $n!$ ways,
and feed the result to this program as a matrix of $0$s and $1$s.
Surprise: The dag that is constructed in
round~1 will always have its lists ordered in such a way that the
optimum maximum matching is found immediately, with no need for round~2!
Thus it's not as easy to find good test data as we might think,
if we want to exercise the more subtle parts of this algorithm.)
@ That was an extreme example. The SAPs found in round 1 always
have rank~$0$, and the SAPs found in round~2 usually have rank~$1$
(not $n-1$). And we do know, in general, that the only augmenting paths
that remain after round~$r$ will have rank $r$ or more.
Suppose a maximum matching $A$ has $s$ edges, and we've found a matching
$B$ of size~$q$ in round~$r$. If $q<{r\over r+1}s$, Lemmas 1~and~2
tell us that there exist $s-q$ disjoint augmenting paths,
each of rank~$r$ or more. So each of them contains at least $r+1$
different edges of~$A$, a total of at least $(r+1)(s-q)$.
But that's greater than $(r+1)(s-{r\over r+1}s)=s$; a contradiction!
We've proved
\smallskip\noindent
{\bf Lemma 3.}\enspace
{\sl The current matching after round~$r$ has at least ${r\over r+1}s$
as many edges as the final matching.}\quad\slug
\smallskip
Hopcroft and Karp therefore said, ``Aha!
Look at round $r=\lceil\sqrt s-1\rceil$.
It gives us a partial match of at least ${r\over r+1}s\ge s-\sqrt s$ edges.
Therefore we'll finish in at most $\sqrt s$ more rounds.''\quad\slug
@ By the way, our proofs of Lemmas 1 and 2 show that they are true
in {\it any\/} graph, bipartite or not. We used bipartiteness only
because it gave us a way to construct a dag of all SAPs.
Hopcroft and Karp said they were hoping to extend their method
from the bipartite case to the general case, presumably by taking advantage
of the structure found by Jack Edmonds in his pioneering work on blossoms
[``Paths, trees and flowers,'' {\sl Canadian Journal on Mathematics\/ \bf17}
(1965), 449--467].
That dream eventually came true when Harold N. Gabow and Robert E. Tarjan
published ``Faster scaling algorithms for general graph-matching
problems,'' [{\sl Journal of the ACM\/ \bf38} (1991), 815--853],
a paper that emphasized the solution to considerably more general problems.
See Harold N. Gabow, ``The weighted matching approach to maximum
cardinality matching,'' {\sl Fundamenta Informaticae\/ \bf154} (2017),
109--130, for a simplified exposition.
@ An idea for further investigation:
We've seen that at least half of the matches are initiated
already in the first round.
The first round is a lot simpler than subsequent rounds, because it
begins with everybody free.
Therefore it might be a good idea to do that round faster, with a
custom-tuned implementation just to get off to a quicker start.
@*Index.