1. NP-Complete Problems#
1.1. Problems and Intractability#
Not all computing problems are born equal. Some problems are more difficult than others. The difficulty of a problem is part of its description. Following, Garey & Johnson’s Computers and Intractability to describe properly a problem we need:
A general description of all its parameters, and
A statement of what properties the answer, or solution, is required to satisfy.
An instance of a problem is obtained by specifying particular values for all the problem parameters.
Graphs. Since almost all combinatorial problems can be mapped to a graph \(G=(V,E)\), the parameters of the problem are typically the number of nodes \(n=|V|\) and the statement is the desired property satisfied by the nodes, paths, cycles and subgraphs of \(G\) in the solution. It is the fulfillment of this property what makes them tractable or intractable!
1.2. The SAT problem#
Satisfiability. Consider the following problem that is not (apparently) is not related to graphs: Given well-formed formula (wff) in first-order logic, determine whether it is satisfiable.
We have a set of \(n\) Boolean variables \(X=\{x_1,x_2,\ldots,x_n\}\), where \(x_i\in\{0,1\}\;\forall i\).
In the wff, we may have variables \(x_i\) and their negations \(\neg x_i\).
The wff is in Conjunctive Normal Form (CNF) i.e. a collection of disjunctions \(\lor\) joined by \(\land\) connectors.
For instance, the wff
is in CNF: it has 3 clauses with three literals each. Then, determining if it is satisfiable means determining whether there is a Boolean assignment for the variables so that the wff is true (evaluated to \(1\)).
Since the wff is in CNF, all the clauses must be evaluated to \(1\).
Inside each clause, at least one literal must be evaluated to \(1\).
A brute force algorithm should evaluate all the \(2^n\) possible Boolean assignments \(\{0,1\}\) for the \(n\) variables. Is there any way of solving the so called SAT problem in polynomial time (not exponential)? Unfortunately, only if each clause has less than \(3\) literals!
SAT2. In the SAT2 problem, all clauses have 2 literals. Consider, for instance, the wff
where all the clauses have two literals and \(x_4\equiv(x_4\lor x_4)\) so that clauses with one literal end up having two of them. Then,
We map the logic variables in \(X\) to \(2n\) nodes in \(V\): one node for the non-negated variable \(x_i\) and another one for \(\neg x_i\).
Exploiting here the equivalences \((\neg a\lor b)\equiv (a\Rightarrow b)\equiv (\neg b\Rightarrow \neg a)\), we have two edges per clause, except for the ones with \((x_i\lor x_i)\) where we have only one edge.
This result in the digraph (directed graph) \(G=(V,E)\) as the one in Fig. 1.1:
Fig. 1.1 An instance of SAT2 for \(n=5\) variables.#
Then, the SAT2 problem is reformulated as follows (Aspvall, Plass and Tarjan, 1979):
Given a CNF-digraph \(G=(V,E)\), the CNF encoded by it is satisfiable if and only if there is no strongly-connected component \(C_k\) where \(x_i\in C_k\) and \(\neg x_i\in C_k\).
Firstly, two nodes \(i,j\in V\) are strongly connected if there exists a directed path \(\Gamma_{ij}\) connecting them. For instance, in Fig. 1.1, \(i=x_2\) and \(j=x_1\) are strongly connected since we have \(\Gamma_{ij}=\{i=x_2\rightarrow x_3\rightarrow \neg x_6\rightarrow x_1=j\} \). Actually all the nodes in dark orange belong to the same strongly-connected component \(C_8=\{x_1,x_2,x_3, x_4,\neg x_6\}\). In short, we can visit all the nodes in \(C_8\) starting from any of them.
In addition, the connected component \(C_8\) does not include simultaneously an \(x_i\) and its negation \(\neg x_i\). If this also happens in the remaining strongly-connected components, then the wff is satifiable. Actually this is the case since the remaining components of the above graph are
which all are atomic and there is no possibility of containing both a variable and its negation. Therefore, the above wff is satisfiable. Why? Basically because starting at any node is it impossible to reach a contradiction, i.e. to have \(x_i\) and \(\neg x_i\) in the same component. This is so simple that we do not need to make the two-directional proof (necessary and sufficient condition).
Is SAT2 polynomial? Well, the question reduces now to determine whether the strongly connected components of a digraph can be found in polynomial time. Actually, each component can be found by launching paths and annotating they return, if it happens, or declaring an atomic component otherwise. This is the strategy of the Tarjan’s Algorithm which takes \(O(|V| + |E|)\). Herein, we use the Big-O notation
Therefore, SAT2 is polynomial and therefore it is not intractable. We express that, saying \(\text{SAT2}\not\in\text{NP}\).
1.3. The Clique problem#
Although we could reduce SAT2 to a polynomial problem (find strongly-connected components), this is not the case for the general Satisfiability problem where clauses have any \(n>2\) number of literals . Cook and Levin proved that in a theorem described in the second chapter of Garey & Johnson. The theorem is beautiful and it relies on Turing Machines to show that the CNF solving the SAT problem may have an exponential number of terms! However, this proof is out of the scope of this subject.
Herein, however, we are interested in showing why SAT is intractable (we say \(\text{SAT}\in\text{NP}\)) by reducing other problems to it. The first of this prolems is the Clique problem.
1.3.1. From SAT3 to Clique#
The link between SAT and Clique is given by first transforming a SAT problem to a SAT3 problem where all clauses have \(3\) literals. This is done as follows:
Given a clause \((x_1\lor x_2\lor\ldots\lor x_m)\), we transform it into \(m-2\) clauses with \(3\) literals:
where \(\alpha_2,\ldots,\alpha_{n-2}\) are fresh variables which do not modify the satisfiability of the original clause. Why? Note that if \(\alpha_i\) appears in one sub-clause \(i-1\), its negation \(\neg\alpha_i\) appears in the sub-clause \(i\). As a result this variable does not interfere in the satisfiability of the clause.
Now, if we have originally \(c\) clauses, each one with \(n\ge m>3\) literals, then we have \(O(m\cdot c)\) clauses which is linear in the number of terms.
Maximal cliques. The purpose of the above transformation is to build a graph \(G=(V,E)\) where we have \(3\cdot c\) nodes (\(3\) nodes per clause, because each clause has \(3\) literals). The graph \(G\) is non-directed and there is an edge \((i,j)\in E\) if:
\(i\) and \(j\) belong to different clauses: \(c(i)\neq c(j)\).
\(l_i\) and \(l_j\), the respective literals, do not induce a contradiction: \(l_i\neq \neg l_j\).
For example, in Fig. 1.2 we show the nodes for the wff
which does not need any transformations since all the clauses have \(3\) literals yet. The nodes corresponding to the same clause are have the same color. Note that the literal \(x_3\) leads to \(2\) diferent nodes (orange and yellow clauses) and it appears negated in the red clause. Overall, the wff is satisfiable if and only if there is a group of \(c=3\) nodes mutually inter-connected, such as \(\neg x_3\), \(x_4\) and \(x_2\). Such a group of nodes forms a complete subgraph \(K_c\) or clique whose order is \(c\).
Actually, all the cliques of order \(c\) lead to a valid interpretation of the wff: \(\neg x_3=1\), \(x_4=1\) and \(x_2=1\), i.e. \(x_3=0\), \(x_4=1\) and \(x_2=1\), makes the above wff True. Can you find other cliques (groups of \(c\) nodes completely inter-connected) in the graph? Of course they are. For instance: \(\neg x_3\), \(\neg x_1\) and \(\neg x_4\) lead to \(x_3=0\), \(x_1=0\) and \(x_4=0\) which makes the wff True as well.
Fig. 1.2 Finding maximal cliques in SAT3 for \(c=3\) clauses.#
Not-maximal cliques. Given the above graph \(G=(V,E)\), its construction ensures that the maximum size any of its cliques is \(c\) and this occurs, at least for one clique, only when the wff encoded by the graph is satisfiable. For instance, it is well known that a wff is not satisfiable if and only if (iff) the wff is a contradiction.
The most basic contradiction in CNF is
However, this is not in SAT3 form. To do so we proceed as follows:
and
i.e. each atomic clause is replaced by the conjunction of \(4\) clauses and this conjunction is satisfiable iff the atomic clause is satisfiable.
Well, all we have to do is to build now the graph \(G=(V,E)\) with \(c=8\) clauses with \(3\) nodes each (\(24\) nodes overall). The new formula equivalent to \(x\land\neg x\) will be satisfiable if we can find any clique of size \(8\) in this graph. Do we really need to build the graph to verify this point? In other words, can we have a complete subgraph \(K_8\) inside the graph of \(24\) nodes if:
The variables in the same clause cannot be connected.
No variable can be connected with its negation?
In order to build a clique of size \(c=8\) we need \(8\) literals where we cannot include both a literal and its negation. One option is to get the set \(C=\{x^1,x^2,x^3,x^4,\alpha^1_1,\alpha^3_2,\beta^1_1,\beta^3_2\}\) where the super-indexes refer to the clause’s number.
It is clear that we can form a \(4-\)clique with the \(x\)s.
No problem to extend it to a \(6\)-clique by including the \(\beta\)s.
We can go further and form a \(7\)-clique by including \(\alpha^3_2\).
However, to extend the \(7\)-clique we need to include \(\alpha^1_1\) (there is no other possibility). But this cannot be done since \(\alpha^1_1\) and \(x^1\) (yet included) belong to the same clause and there is no edge between them! As a result, the wff is not satisfiable.
An important lesson of this proof is that: cliques are full-connected subsets. If you can create them it means that their members form an equivalence class.
In addition, adding clauses only leads to make things more complicated since the order needed for satisfiability gets larger.
Exercise. Consider now the following clause: \(x\lor \neg x\)
which is a tautology, i.e. it is fully satisfiable (all assignments lead to True).
In this case, the equivalent SAT3 problem has two clauses, i.e. \(c=2\) (see Garey & Johnson, page \(48\)):
\(
(x\lor \neg x)\equiv (x\lor \neg x\lor \alpha_1)\land (x\lor \neg x\lor \neg\alpha_1)
\)
Build the associated graph and determine the maximal clique.
Answer: \(G=(V,E)\) will have \(V=\{x^1,\neg x^1,\alpha_1, x^2,\neg x^2,\neg\alpha_1\}\), i.e. \(n=6\) nodes.
The (undirected) edges are
\(
\begin{align}
E &=\{(x^1,x^2),(x^1,\neg x^2),(x^1,\neg\alpha_1)\}\cup \{(\neg x^1,x^2),(\neg x^1,\neg x^2),(\neg x^1,\neg\alpha_1)\} \cup \{(\alpha_1,x^2),(\alpha_1,\neg x^2)\}\\
\end{align}
\)
All the possible maximal subsets of \(V\) have size \(3\): \(\{x^1,x^2,\alpha_1\}\), \((x^1,\neg x^2,\alpha_1)\) etc. However, it is impossible to form any clique of size \(3\). Why? Because in all of these subsets we must include either two literals of the same clause or two contradictory literals, which does not match the edge set \(E\). As a result, the largest cliques have size \(c=2\) (actually, each edge in an undirected graph is a \(2\)-clique) and the formula is satisfiable as expected (see Fig. 1.3).
Fig. 1.3 Finding trivial maximal cliques in SAT3 for \(c=2\) clauses.#
1.3.2. From Clique to SAT3#
So far, we know that SAT3 can be always reduced to an equivalent (maximal) Clique problem. As we know that \(\text{SAT3}\in\text{NP}\), then \(\text{Clique}\in\text{NP}\). Actually, this is all we have to do to prove whether a problem \(\Pi\in \text{NP}\):
Select a know problem \(\Pi'\in\text{NP}\).
Reduce \(\Pi\) to \(\Pi'\).
Test whether the reduction is polynomial.
In the previous section, we have worked on the if part of the reduction, i.e. a SAT3 can be formulated as a clique problem. Now, we are working in the iff part: any clique problem can be mapped to an equivalent SAT3 one. Let’s go!
In Garey & Johnson, the Clique problem is formulated as follows (page \(47\)):
Given a (non-directed) graph \(G=(V,E)\) and a positive integer \(J\le|V|\), does the graph contains a clique of size of size \(J\) or more, that is a subset \(V'\subseteq V\) such that \(|V'|\ge J\) and every two nodes in \(V'\) are joined by an edge in \(E\)?
Obviuously, the concept of a clique only makes sense in undirected graphs. Moving on, in Garey & Johnson the formulation of this problem results from reducing SAT3 to Vertex Cover (VC), another NP problem, and then this to clique. Herein, we go straight from SAT3 to clique as follows:
Given an integer constant \(c>0\), suppose that \(G=(V,E)\) has \(|V|=3c\) and it has a clique of size \(J\ge c\). Since we have \(3c\) nodes, we denote each group \(C_r\) of \(3\) nodes as \(C_r=\{l_1^r,l_2^r, l^3_r\}\). Without loss of generality (wlog) we may assume that \((l^{r}_i,l^{r}_j)\not\in E\;\forall i,j\).
Then, if a \(J\)-clique \(V'\) exists only a node per group \(C_r\) with \(r=1\ldots c\) can be in it since we need full-connection of range \(J\) and the nodes inside a group are not connected. If we denote this node as \(v_i^r\in V'\), we set \(l_i^r=1\) indicating that the literal associated with this node is True.
Actually \(c\) denotes the number of clauses. But remember that \(J\le c\).
If \(J=c\) then as there is one True literal per clause, then the wff associated with the and of all clauses is satisfiable.
If \(J<c\) then, having a \(J\)-clique means that only \(J\) clauses can be simultaneously true. This it all we can guarantee, so the wff associated to the graph, whatever it is, it not satisfiable!
Note that the size of the size of the maximal clique is critical to the satisfiability. The more clauses we have the larger must be the clique.
In addition, the \(J-\)clique may not be unique it it exist. All the \(J\)-cliques you find correspond to a form of satisfying the wff (what it is called in Logic as an interpretation).
1.5. Hamiltonian cycle#
The following problem is a classic NP problem in AI:
Given a graph \(G=(V,E)\), does \(G\) a Hamiltonian cycle, that is an ordering \(<v_1,v_2,\ldots,v_n>\) of the vertices of \(G\) where \(n=|V|\) such that \((v_n,v_1)\in E\) and \((v_{i},v_{i+1})\in E\), \(\forall i,1\le i<n\)?
This is the well-known Hamiltonian-cycle (HC) problem: we must find a cycle or tour that visits all nodes once. The problem is NP because instead of looking for subsets (subsetness flavor) we look for all the orderings of a set \(<v_1,v_2,\ldots,v_n>\). Given a permutation \(\pi:V\rightarrow V\), the solution has the form \(<v_{\pi(1)},v_{\pi(2)},\ldots,v_{\pi(n)}>\). Remember that there are \(n!\) possible orderings and looking if any of them is a Hamiltonian cycle is not polinomial at all.
However, proving that \(\text{HC}\in \text{NP}\) deserves a reduction to any of the NP problems. Both Independent Sets (IS) and Vertex Cover (VC) are NP, i.e. \(\text{IS}\in \text{NP}\) and \(\text{VC}\in\text{NP}\) because they can be reduced to the Clique problem which is NP.
However, the reduction of HC to the Clique problem is quite complex and difficult to understand. Even in the Garey & Johnson’s text, they reduce HC to the VC. Herein, we even use a simpler and yet more intuitive approach: we reduce HC to the SAT3 problem!
Remember that a reduction is an necessary and sufficient condition i.e. an iff. Let us design a construction where a solution to the HC is encoded in a graph.
We commence by mapping the Hamiltonian Path (HP), instead, to SAT3:
Given a graph \(G=(V,E)\), does \(G\) a Hamiltonian path, that is an ordering \(<v_1,v_2,\ldots,v_n>\) of the vertices of \(G\) where \(n=|V|\) such that \((v_{i},v_{i+1})\in E\), \(\forall i,1\le i<n\)?
Fig. 1.16 Digraph mapping HP to SAT3. Yellow nodes encode the variables (plus one extra variable).#
The core of the construction, shown in Fig. 1.16, is called a variable gadget where:
For each vertex \(v_i\in V\) of \(G\) we have a couple of variables: \(x_i\) and \(x_{i+1}\). Actually, we have \(n+1\) variables where \(n=|V|\). These variables are shown in yellow.
It is a digraph (directed graph), starting from the top vertex \(x_1\) and going down to the bottom vertex \(x_{n+1}\). Then, in the figure we \(n=3\) variables in the ‘yet to map’ SAT3 and \(x_4\) is an ‘extra’ variable.
Note that, descending from \(x_1\) to \(x_4\) we may follow the left path (in blue) and the right path (in red). At each \(x_i\) we are able to change from color (blue-to-red or read-to-blue). As a result, we may have \(2^n\) different paths from \(x_1\) to \(x_4\).
Below each \(x_i\), \(i=1,2,\ldots,n\) we have a bidirectional path graph with nodes \(x_i\)_\(p_1\) to \(x_i\)_\(p_{3\cdot c}\) where where \(c\) is the number of clauses: \(c=3\) in this case.
These path graphs enable to keep going left-to-right, right-to-left or change of sense to reach the following level \(x_{i+1}\). However, if we want to visit all nodes \(x_1\) to \(x_4\) passing through the path graphs once we must choose one of the senses in each path graph.
We associate a truth value to each path. In the figure, ‘True’ paths have all their edges ‘blue’ and ‘False’ paths have all their edges ‘red’. Black edges are interpreted as ‘blue’ (True) if they go left-to-right and ‘red’ (False) otherwise. This construction trick is key to force a Hamiltonian path from \(x_1\) to \(x_4\) visiting all the nodes in the constructed graph.
However, we still must match a ‘True’ hamiltonian path with a SAT solution, if there is one. Otherwise, the Hamiltonian path will be false.
In orther to do that we use \(c\) additional modes \(c_i\) (one per clause). How do we use them?
Well, in Fig. 1.16 we assume the following CNF:
We proceed as following:
Since \(x_1\) is ‘positive’ (True) in \(c_1\), we must make \(x_1\rightarrow x_1\_p_1\), then go to \(c_1\) using a ‘green’ edge clause and then come back to this path through \(c_1\rightarrow x_1\_p_2\). This way, we may proceed towards the right through \(x_1\_p_3\).
Next, we are \(x_1\_p_4\) (the first of a group of \(3\) nodes coding the role of variable \(x_1\) in clause \(c_2\)). Note that, as \(x_1\) is also ‘positive’ in \(c_2\), we go to \(c_2\) and back as before and then return to \(x_1\_p_5\). Next, we repeat the same coding for the role of \(x_1\) in \(c_3\).
At this point, we have yet visited \(c_1\), \(c_2\) and \(c_3\) because \(x_1\) is positive in all of them. We yet know that the CNF is safistifable, but we have still to progress to visiting all the nodes of the digraph once (Hamiltonian path). Can we do it?
Of course. Since \(x_1\) is ‘True’ in all the clauses, we only need to go down \(x_2\) and proceed from left to right again from \(x_2\_p_1,\ldots,x_2\_p_9\) without visiting any clause node. Then go down \(x_3\) and do the same until we end-up in \(x_4\) where we have completed the following Hamiltonian path:
Since, \(x_2\) is ‘True’ in all clauses, we may proceed similarly, thus leading to the following path
However, \(x_3\) is ‘negative’ both in \(c_1\) and \(c_3\), and ‘positive’ in \(c_2\). This fact implies that \(x_3\) is contributing to make the CNF unsatisfiable. In other words, waiting to visit the clause nodes until level \(3\) does not lead to a Hamiltonian path since we have to go both right-to-left and left-to-right there. The clause nodes should be visited before level \(3\).
Therefore, the above construction shows that HP is reducible to SAT3, i.e.: there is a HP in the above graph iff the CNF is satisfiable.
Extending this result to the Hamiltonian Cycle (HC) is straightforward: just create a back link between \(x_4\) and \(x_1\).
Exercise. Given the SAT3 \((x_1\lor x_2\lor \neg x_3 )\land (\neg x_1\lor\neg x_2\lor x_3)\) which is clearly unsatisfiable (a variable and its negation is present in all clauses). We ask the following: a) Build the HC-construction and b) Show that it has no Hamiltonian cycles.
Answer. We show thre HC-construction in Fig. 1.17. Note that we have \(3\cdot 2 = 6\) nodes per path graph, since there are \(c=2\) clauses.
Fig. 1.17 Digraph mapping HP to SAT3. Yellow nodes encode the variables (plus one extra variable).#
Note that the two first path graphs go first to \(c_1\) in left-to-right order, stating that respectively \(x_1\) and \(x_2\) must be ‘True’, but access in the reverse order to \(c_2\), where they have to be ‘False’. With the third path graph, we have the opposite since \(x_3\) must be false in \(c_1\) and true in \(c_2\). As a result, it is impossible to draw a Hamiltonian cycle in the graph of Fig. 1.17.











