Derivations and Further Details¶
This chapter provides background information and further detail on the techniques presented in previous chapters, to help users unfamiliar with them assess whether one or more might pertain to a given problem. Consult the literature and D-Wave for additional information when implementing.
Problem Reformulation¶
SAT and Weighted MAX2SAT¶
[Coo1971] and [Lev1973] identified satisfiability as the first NP-complete problem. The conjunctive normal form of SAT is a decision problem that asks for an assignment of Boolean variables which satisfies a collection of constraints called clauses. A clause is a disjunction of literals, and is satisfied if any literal in the clause is true. A literal is a variable or a negation of a variable. For example, the clause \(x_1 \vee \overline{x}_2\) is satisfied if either \(x_1 = \text{true} = 1\) or \(x_2 = \text{false} = 0\).
In a 2SAT problem, all clauses contain at most 2 literals. Finding a satisfying assignment for 2SAT, or proving that none exists, can be done in polynomial time. However, 3SAT, which allows clauses of up to 3 literals, is NP-hard, and no algorithm is known that solves all instances in polynomial time.
A variant of SAT called MAXSAT (MAXimize the number of satisfied clauses) relaxes the decision problem (is there a satisfying assignment or not?) to an optimization problem and seeks the assignment which minimizes the number of unsatisfied clauses.
Weighted MAXSAT generalizes the problem further: each clause is assigned a positive weight, and the violation of a clause \(c\) having weight \(w_c\) incurs a cost \(w_c\). We seek to maximize the weight of satisfied clauses, or equivalently minimize the weight of unsatisfied clauses. Typically, weights are integral so that weighted MAXSAT is really nothing other than MAXSAT with redundant clauses. (If the weight of a clause is \(w\) then \(w\) replicated clauses are introduced into the MAXSAT problem). A weighted MAXSAT problem is represented as a list of weighted clauses, e.g. \(\{(x_1\vee \overline{x}_2; 3), (x_3;1), (\overline{x}_3 \vee x_2;4)\}\) represents a problem of three variables consisting of 3 clauses having weights 3, 1, and 4 respectively.
If all clauses of a weighted MAXSAT contain at most 2 literals then we have a weighted MAX2SAT problem. Weighted MAX2SAT is equivalent to the QUBO and Ising problems.
How might we represent clauses as terms in a QUBO? A weighted clause \((x_1\vee \overline{x}_2; 3)\) is unsatisfied if \(\overline{x}_1 \wedge x_2\) is true, and this conjunction is equivalent to the product of Boolean variables \(\overline{x}_1 x_2\). If the clause is unsatisfied then a cost of 3 is incurred, thus the penalty representing the clause is \(3 \overline{x}_1 x_2\). The weighted MAX2SAT problem \(\{(x_1\vee \overline{x}_2; 3), (x_3;1), (\overline{x}_3 \vee x_2;4)\}\) is then easily represented as \(3\overline{x}_1 x_2 + \overline{x}_3 + 4 x_3 \overline{x}_2\). This representation is called a posiform. A posiform is a summation of terms where each term is a product of literals multiplied by a positive (usually integral) weight.
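The clause-by-clause expansion into QUBO terms is mechanical: each negated literal becomes \(1-x\), each plain literal stays \(x\), and the product is expanded. A minimal sketch (the signed-integer literal encoding and function names are our own conventions, not from this chapter):

```python
from collections import defaultdict

def clause_penalty(lit1, lit2, weight):
    """QUBO terms penalizing an unsatisfied 2-literal clause.
    Literals are signed integers: +i means x_i, -i means its negation.
    The clause is unsatisfied exactly when both literals are false, so
    the penalty is weight * (product of the negated literals)."""
    def negated(lit):
        # Negation of +i is (1 - x_i); negation of -i is x_i.
        i = abs(lit)
        return (1, -1, i) if lit > 0 else (0, 1, i)  # constant, coeff, var
    c1, a1, i1 = negated(lit1)
    c2, a2, i2 = negated(lit2)
    linear, quadratic = defaultdict(float), defaultdict(float)
    offset = weight * c1 * c2
    linear[i1] += weight * a1 * c2
    linear[i2] += weight * c1 * a2
    quadratic[tuple(sorted((i1, i2)))] += weight * a1 * a2
    return linear, quadratic, offset

def max2sat_to_qubo(clauses):
    """Accumulate penalties for a list of (lit1, lit2, weight) clauses."""
    linear, quadratic, offset = defaultdict(float), defaultdict(float), 0.0
    for l1, l2, w in clauses:
        lin, quad, off = clause_penalty(l1, l2, w)
        for k, v in lin.items():
            linear[k] += v
        for k, v in quad.items():
            quadratic[k] += v
        offset += off
    return dict(linear), dict(quadratic), offset
```

For the clause \((x_1\vee \overline{x}_2; 3)\) this produces \(3(1-x_1)x_2 = 3x_2 - 3x_1x_2\), matching the posiform term \(3\overline{x}_1 x_2\) above.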
The general weighted MAX2SAT problem can be written as the posiform:

\[\min_{\vc{x}} \sum_c w_c \, \overline{\ell}_{c,1} \overline{\ell}_{c,2}\]
where \(\ell_{c,1}\) and \(\ell_{c,2}\) are the two literals of clause \(c\), and \(w_c\) is its weight.
Posiforms and Higher-Order Polynomials¶
Posiforms are a useful representation for pseudo-Boolean functions[2] (functions which map bitstrings to real values). General posiforms may have terms consisting of products of more than 2 literals. For a problem of \(N\) terms, the number of ancillary variables required to reduce the problem to quadratic is polynomial in \(N\). As an example of a posiform model (in this case having only 2 literals per term), the minimum vertex cover of a graph \(G=(V,E)\) is determined by:

\[\min_{\vc{x}} \left\{ \sum_{v\in V} x_v + M \sum_{(v,v')\in E} \overline{x}_v \overline{x}_{v'} \right\}\]

The quadratic penalty term of weight \(M\) ensures that for every edge \((v,v')\) at least one of \(x_v\) or \(x_{v'}\) is 1, so any minimizer covers all edges; minimizing the first term then selects the smallest such set of vertices.
[2]  See [Bor2002]. 
A related problem is maximum independent set, which seeks the largest possible set of vertices no two of which are connected by an edge (a set of non-adjacent vertices is called an independent set). Maximum independent set is given by the posiform objective:

\[\min_{\vc{x}} \left\{ \sum_{v\in V} \overline{x}_v + M \sum_{(v,v')\in E} x_v x_{v'} \right\}\]

The first term minimizes the number of vertices not in the independent set, and the second term penalizes choices which correspond to non-independent sets.
As an example of a problem having higher-order interactions, consider an objective function which counts the number of cliques of size 3 in a graph of \(N\) vertices. The graph might be represented with a set of \(\binom{N}{2}\) Boolean variables \(x_{i,j}\) indicating the presence/absence of each edge \((i,j)\). Given this representation, vertices \(\{1,2,3\}\) form a clique if \(x_{1,2} x_{1,3} x_{2,3} = 1\). The total number of 3-cliques is:

\[\sum_{\{i,j,k\} \in \mathcal{S}} x_{i,j} \, x_{i,k} \, x_{j,k}\]

where \(\mathcal{S}\) is the set of all subsets of size 3 of \(\{1,2,\cdots, N\}\). This is an example of a posiform where each term has degree 3.
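Evaluating this degree-3 posiform directly is straightforward; the sketch below (names are illustrative) counts triangles by enumerating all 3-subsets:

```python
from itertools import combinations

def count_triangles(n, edges):
    """Evaluate the degree-3 posiform: for every 3-subset {i, j, k},
    the product x_{i,j} x_{i,k} x_{j,k} is 1 exactly when the three
    vertices form a clique (triangle)."""
    x = {frozenset(e): 1 for e in edges}  # x_{i,j} = 1 iff edge present
    total = 0
    for i, j, k in combinations(range(n), 3):
        total += (x.get(frozenset((i, j)), 0)
                  * x.get(frozenset((i, k)), 0)
                  * x.get(frozenset((j, k)), 0))
    return total
```

The complete graph on 4 vertices, for example, contains \(\binom{4}{3}=4\) triangles.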
Another higher-order posiform arises in weighted MAX3SAT. If clauses consist of 3 literals then the analog of the general weighted MAX2SAT problem written as the posiform in SAT and Weighted MAX2SAT is:

\[\min_{\vc{x}} \sum_c w_c \, \overline{\ell}_{c,1} \overline{\ell}_{c,2} \overline{\ell}_{c,3}\]
Non-Quadratic (Higher-Order) Polynomials¶
For many problems, the natural model includes interactions between more than pairs of variables. How can such problems be reduced to the pairwise QUBO representation?
One approach is to introduce and minimize over ancillary variables. The constraint \(z\Leftrightarrow x_1\wedge x_2\), which sets Boolean variable \(z\) equal to the logical AND of Boolean variables \(x_1\) and \(x_2\), can be represented through a quadratic penalty function[3]:

\[P(x_1,x_2;z) = x_1 x_2 - 2(x_1+x_2)z + 3z\]
[3]  The same penalty function written in Ising terms of \(\pm 1\) variables is \(P(s_1,s_2;s_z) = 3 + s_1 s_2 - 2(s_1+s_2)s_z - s_1 - s_2 + 2s_z\). 
This penalty function enforces the constraint because it attains the minimal value of 0 at the 4 \((x_1,x_2,z)\) combinations satisfying \(z\Leftrightarrow x_1\wedge x_2\), while all other combinations of values have \(P>0\). We use this result to reduce a triplet interaction to pairwise interactions as follows[4]:

\[x_1 x_2 x_3 = \min_z \bigl\{ z x_3 + M P(x_1,x_2;z) \bigr\}\]
[4]  The penalty weight may be taken to be \(M=1\). 
where \(M>1\) is a penalty weight. The product \(x_1 x_2\) is captured by \(z\) which is multiplied by \(x_3\) giving the desired interaction. The penalty function \(P\) is added to ensure that \(z\) is set correctly given \(x_1\) and \(x_2\).
In the same manner an interaction involving 4 or more variables can be reduced to pairwise by sequentially introducing new variables and reducing the degree of the interaction by 1 at each step.
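Both claims (that the AND penalty vanishes exactly on consistent assignments, and that the substitution reproduces the cubic term) can be checked by enumeration. A minimal sketch using the quadratic AND penalty \(P(x_1,x_2;z) = x_1x_2 - 2(x_1+x_2)z + 3z\) and the penalty weight \(M=1\) permitted by the footnote:

```python
from itertools import product

def P(x1, x2, z):
    """Quadratic penalty enforcing z <=> (x1 AND x2): zero on the four
    consistent assignments, positive on all others."""
    return x1 * x2 - 2 * (x1 + x2) * z + 3 * z

# P is 0 iff z == x1*x2, and positive otherwise.
for x1, x2, z in product((0, 1), repeat=3):
    assert (P(x1, x2, z) == 0) == (z == x1 * x2)
    assert P(x1, x2, z) >= 0

# Degree reduction: x1*x2*x3 equals min over z of z*x3 + M*P(x1, x2, z).
M = 1
for x1, x2, x3 in product((0, 1), repeat=3):
    reduced = min(z * x3 + M * P(x1, x2, z) for z in (0, 1))
    assert reduced == x1 * x2 * x3
```

The same check generalizes to the sequential reductions described above: each step replaces one pair with an ancilla and adds one copy of \(P\).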
As extra ancillary variables limit the problem size addressable by the D-Wave system, it is desirable to minimize the number of introduced variables. The reduction-to-pairwise algorithm of [Bor2002] is easily modified to minimize the number of ancillary variables. Select the pair of variables that is most common amongst all terms with more than pairwise interactions, and replace it with an ancillary variable \(z\) in all those terms. Repeat until no terms of 3 or more variables remain. Note that subsequent iterations may involve products of ancillary variables if these ancillary pairs are the most common amongst the terms.
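The pair-selection heuristic can be sketched as follows. The term representation (frozensets of variable indices mapped to coefficients) and names are illustrative, and the AND penalties tying each ancilla to its pair must still be added separately, as described above:

```python
from collections import Counter
from itertools import combinations

def reduce_to_pairwise(terms, next_var):
    """Repeatedly replace the most common variable pair among all
    higher-than-quadratic terms with a fresh ancillary variable.
    `terms` maps frozensets of variable indices to coefficients."""
    substitutions = []  # (ancilla, (u, v)) records for penalty construction
    while True:
        high = [t for t in terms if len(t) > 2]
        if not high:
            break
        pairs = Counter(p for t in high for p in combinations(sorted(t), 2))
        (u, v), _ = pairs.most_common(1)[0]
        z = next_var
        next_var += 1
        substitutions.append((z, (u, v)))
        new_terms = {}
        for t, coeff in terms.items():
            if len(t) > 2 and u in t and v in t:
                t = (t - {u, v}) | {z}
            new_terms[t] = new_terms.get(t, 0) + coeff
        terms = new_terms
    return terms, substitutions
```

For the two cubic terms \(x_1x_2x_3\) and \(x_1x_2x_4\), the pair \((x_1,x_2)\) is most common, so a single ancilla suffices for both.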
Constraints to Penalties¶
Most applications involve constraints on variables which restrict bitstrings to certain feasible configurations. The QPU cannot natively impose constraints; instead, you introduce energy penalties that penalize infeasible configurations. Penalty functions are also a source of higher-order interactions. This section considers approaches to solving optimization problems with constraints.
A finite-domain CSP consists of a set of variables, a specification of the domain of each variable, and a specification of the constraints over combinations of the allowed values of the variables. A constraint \(C_\alpha(\vc{x}_\alpha)\) defined over a subset of variables \(\vc{x}_\alpha\) defines the set of feasible and infeasible combinations of \(\vc{x}_\alpha\). The constraint \(C_\alpha\) may be viewed as a predicate which evaluates to true on feasible configurations and to false on infeasible configurations. For example, if the domains of variables \(X_1,X_2,X_3\) are all \(\{0,1,2\}\), and the constraint is \(X_1+X_2<X_3\), then the feasible set is \(\{(0,0,1),(0,0,2),(0,1,2),(1,0,2)\}\), and all remaining combinations are infeasible. The variables involved in a constraint define its scope. Typically, constraint satisfaction problems have many constraints of local scope (involving only a small subset of variables), but there may often be global constraints that couple many variables. We indicate the set of constraints as \(\mathcal{C} = \{C_\alpha(\vc{x}_\alpha)\}_{\alpha=1}^{|\mathcal{C}|}\) and we seek a solution \(\vc{x}\) that satisfies all constraints, i.e. \(\bigwedge_{C_\alpha\in\mathcal{C}} C_\alpha(\vc{x}_\alpha)\) is true.
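For small scopes, the feasible set of a constraint can simply be tabulated by brute-force enumeration, which is how the example above is obtained (a sketch; the helper name is ours):

```python
from itertools import product

def feasible_set(domains, predicate):
    """Enumerate the feasible configurations of a finite-domain constraint."""
    return {cfg for cfg in product(*domains) if predicate(*cfg)}

# The example constraint X1 + X2 < X3 with all domains {0, 1, 2}:
feasible = feasible_set([(0, 1, 2)] * 3, lambda x1, x2, x3: x1 + x2 < x3)
```

Running this recovers exactly the four feasible triples listed above.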
CSP Conversion by Penalty Functions¶
The CSP \(\rightarrow\) MIS \(\rightarrow\) QUBO conversion can be expensive, since the resulting conflict graphs can have a great many nodes and edges if each constraint has a large feasible set. Though there are methods to compress the number of needed nodes, an alternative formulation of the CSP is often preferred.
This section follows the same approach as was introduced to reduce higher-order problems to quadratic in Posiforms and Higher-Order Polynomials. For each constraint \(C_\alpha(\vc{x}_\alpha)\) we introduce a quadratic penalty function \(P_\alpha(\vc{x}_\alpha,\vc{a}_\alpha)\). The penalty function is defined so that:

\[\min_{\vc{a}_\alpha} P_\alpha(\vc{x}_\alpha,\vc{a}_\alpha)
\begin{cases}
= o & \text{if } C_\alpha(\vc{x}_\alpha) \text{ is satisfied} \\
\ge o+1 & \text{if } C_\alpha(\vc{x}_\alpha) \text{ is violated}
\end{cases}\]
Thus, the feasible configurations are encoded as global minima of \(P_\alpha\), so \(\argmin_{\vc{x}_\alpha,\vc{a}_\alpha} P_\alpha(\vc{x}_\alpha,\vc{a}_\alpha)\) yields an \(\vc{x}_\alpha\) for which \(C_\alpha(\vc{x}_\alpha)\) is true. The additional \(\vc{a}_\alpha\) variables are necessary to mimic interactions in \(C_\alpha\) that are beyond pairwise. If we have a penalty function for each constraint, then minimizing the objective:

\[P(\vc{x},\vc{a}) = \sum_\alpha P_\alpha(\vc{x}_\alpha,\vc{a}_\alpha)\]
gives a feasible solution \(\vc{x}^\star,\vc{a}^\star\) satisfying all constraints and having objective value \(P(\vc{x}^\star,\vc{a}^\star)=o\) if the original CSP has a feasible solution \(\vc{x}^\star\). If the original CSP is infeasible then \(P(\vc{x}^\star,\vc{a}^\star)\) will be at least \(o+1\).
The utility of this approach depends upon being able to conveniently represent common constraints \(C_\alpha\) as QUBO penalties. There are techniques to automatically construct QUBO penalty functions from specifications of \(C_\alpha\).
Nonlinear Constraints¶
For penalties whose scope includes a small number of variables (e.g. fewer than 15), we may construct penalty functions from a specification of the feasible and infeasible configurations. Let \(F\) represent the set of feasible configurations, and \(\overline{F}\) the set of infeasible configurations. We require \(\min_{\vc{a}} P(\vc{x},\vc{a}) = o\) for \(\vc{x}\in F\) and \(\min_{\vc{a}} P(\vc{x},\vc{a}) \ge o+1\) for \(\vc{x}\in \overline{F}\), for some constant \(o\). Since \(P\) must be quadratic, we write the penalty as:
with
We seek the parameters \(\vc{w} \equiv \{\vc{w}^{x,x},\vc{w}^{x,a},\vc{w}^{a,a}\}\) given the matrices \(\vc{M}\equiv \{\vc{M}^{x,x},\vc{M}^{x,a},\vc{M}^{a,a}\}\). We generalize penalties in this way in order to limit required precision and/or to accommodate connectivity constraints as described by the set of \(\vc{M}\). Thus, we’d like to solve the optimization problem:
where \(k\) ranges over the allowed configurations \(\vc{a}_k\) of \(\vc{a}\), and the \(\vc{c}\) vectors have components indexed by \(j\) given by:
We have also introduced an unknown \(o\) representing the objective value of feasible configurations. This unknown value need not be equal to 0. The notation \(j\in F\) and \(\overline{j}\in \overline{F}\) indicates feasible and infeasible configurations respectively. Minimizing the \(L_1\) norm forces penalty terms to be close to 0, and aids in limiting the required precision. The second constraint can be expressed more simply to give:
The min in the first constraint makes this a difficult optimization problem. One approach to its solution is to express it as a mixed integer program and use a mixed integer solver.
We introduce Boolean indicator variables \(\{\alpha_k(j)\}\) for each \(1 \le j \le |F|\) and for each allowed configuration \(\vc{a}_k\). Define the vector \(\vc{\alpha}(j)\) whose components are \(\alpha_k(j)\) for the different \(k\). The indicator variable \(\alpha_k(j)\) picks out the single \(k\) at which the \(\min_k\) in constraint 1 is attained (if there are multiple \(k\) attaining the minimal value, we pick 1 of these). Thus we have:
Unfortunately, this constraint couples \(\alpha_k(j)\) to \(\vc{w}^{x,a}\) and \(\vc{w}^{a,a}\).[5] To break this coupling and obtain a linear problem we introduce \(\vc{v}_k^{x,a}(j) = \alpha_k(j) \vc{w}^{x,a}\) and \(\vc{v}_k^{a,a}(j) = \alpha_k(j) \vc{w}^{a,a}\). This requirement is enforced with:
The first two of the constraints in each column require that \(\vc{v}_k^{x,a}(j)=0\) and \(\vc{v}_k^{a,a}(j)=0\) if \(\alpha_k(j)=0\). The penalty weight \(M\) should be chosen to be larger than the largest absolute value of the optimal values of \(\vc{w}^{x,a}\) and \(\vc{w}^{a,a}\) so that we do not eliminate any solutions.
The resulting mixed integer program can then be solved with any MIP solver. If there is no feasible solution for a given number of ancillary variables or specified connectivity, then these requirements must be relaxed by introducing additional ancillary variables or additional connectivity.
Even if a penalty can be derived through conversion to a linear equality or inequality, it may still be useful to apply the above penalty-deriving machinery to construct a penalty with fewer ancillary variables. For example, the constraint \(x_1\vee x_2\vee x_3\) (which is useful if we're solving 3SAT) can always be expressed as the inequality \(x_1+x_2+x_3\ge 1\), and converted to the penalty function \((x_1+x_2+x_3 - 1 - a_1 - 2a_2)^2\). The two ancillary variables \(a_1\) and \(a_2\) are necessary to represent the possible slack values 0, 1, and 2. However, the following penalty function

\[1 - x_1 - x_2 - x_3 + x_1 x_2 + x_1 x_3 + x_2 x_3 + a\,(2 - x_1 - x_2 - x_3)\]

represents the same constraint using a single ancillary variable[6].
[5]  CPLEX and other commercial grade optimization software can solve problems with quadratic constraints so it may be used to directly address this formulation. 
[6]  In fact, it is this representation which is commonly used to reduce 3SAT to MAX2SAT. 
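A single-ancilla penalty for \(x_1\vee x_2\vee x_3\) can be verified exhaustively. The form used below is a reconstruction (the standard gadget underlying the 3SAT-to-MAX2SAT reduction, shifted to have minimum 0); treat it as one valid choice rather than necessarily the exact formula intended above:

```python
from itertools import product

def or_penalty(x1, x2, x3, a):
    """A quadratic single-ancilla penalty for the clause x1 OR x2 OR x3
    (one valid construction; verified by enumeration below)."""
    return (1 - x1 - x2 - x3
            + x1 * x2 + x1 * x3 + x2 * x3
            + a * (2 - x1 - x2 - x3))

# Minimizing over the ancilla gives 0 iff the clause is satisfied,
# and 1 if all three variables are 0.
for x1, x2, x3 in product((0, 1), repeat=3):
    best = min(or_penalty(x1, x2, x3, a) for a in (0, 1))
    assert best == (0 if (x1 or x2 or x3) else 1)
```

The same enumeration strategy works for checking any candidate penalty over a small scope.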
Problem Decomposition¶
To solve a problem with more variables than the available number of qubits, break the problem into subproblems, solve the subproblems, and then reconstruct an answer to the original problem from the subproblem solutions. Divide-and-conquer and dynamic programming algorithms have a rich history in computer science for problems of this type and can be adapted to map problems onto the D-Wave QPU.
Conditioning¶
The approaches here rely on conditioning (or assigning) a subset of variables to particular values and solving the problem that remains. If \(A\) indicates a set of variables that are unconditioned, and \(\setminus A\) indicates the variables conditioned to certain values, then an Ising objective \(E(\vc{s})\) can be minimized in two stages:

\[\min_{\vc{s}} E(\vc{s}) = \min_{\vc{s}_{\setminus A}} \mathcal{E}(\vc{s}_{\setminus A})\]

where

\[\mathcal{E}(\vc{s}_{\setminus A}) \equiv \min_{\vc{s}_A} E(\vc{s}_A, \vc{s}_{\setminus A}) = E\bigl(\vc{s}_A^\star(\vc{s}_{\setminus A}), \vc{s}_{\setminus A}\bigr)\]
Here, \(\vc{s}_A^\star(\vc{s}_{\setminus A})=\argmin_{\vc{s}_A} E(\vc{s}_A,\vc{s}_{\setminus A})\) is the optimal setting of \(\vc{s}_A\) for a given \(\vc{s}_{\setminus A}\). Conditioning makes the \(\min_{\vc{s}_A}\) problem smaller and simpler than the original problem. The graph of this conditioned problem over the vertices \(\vc{s}_A\) of \(E(\vc{s}_A,\vc{s}_{\setminus A})\) is formed from the graph of the original problem by removing all conditioned variables and all edges attached to a conditioned variable. Of course, you still must identify the lowest energy setting of the conditioned variables.
The outer loop optimization determining the conditioned variable settings is the \(\min_{\vc{s}_{\setminus A}} \mathcal{E}(\vc{s}_{\setminus A})\) problem. The graph for the \(\mathcal{E}(\vc{s}_{\setminus A})\) problem has connections between variables in \(\vc{s}_{\setminus A}\) which have been induced through the elimination (minimization) of the \(\vc{s}_A\) variables; that is, through the dependence on \(\vc{s}_A^\star(\vc{s}_{\setminus A})\). In summary, conditioning casts the \(\min_{\vc{s}} E(\vc{s})\) problem in terms of the smaller problem \(\min_{\vc{s}_{\setminus A}} \mathcal{E}(\vc{s}_{\setminus A})\), where \(\mathcal{E}(\vc{s}_{\setminus A})\) can be evaluated at any \(\vc{s}_{\setminus A}\) using the D-Wave system. The outer optimization over \(\vc{s}_{\setminus A}\) is carried out using conventional software algorithms.
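Building the conditioned subproblem is a purely mechanical transformation of \((\vc{h},\vc{J})\): couplings to fixed spins fold into the fields of the free spins, and fully fixed terms become a constant offset. A minimal sketch (the dictionary representation is our own choice):

```python
def condition(h, J, fixed):
    """Condition an Ising problem (h, J) on fixed spin values.
    E(s) = sum_i h_i s_i + sum_{i<j} J_ij s_i s_j.  Returns the reduced
    fields and couplings over the free variables, plus the constant
    energy offset contributed by the fixed spins."""
    h_new = {i: hi for i, hi in h.items() if i not in fixed}
    J_new = {}
    offset = sum(h[i] * v for i, v in fixed.items() if i in h)
    for (i, j), Jij in J.items():
        if i in fixed and j in fixed:
            offset += Jij * fixed[i] * fixed[j]
        elif i in fixed:
            h_new[j] = h_new.get(j, 0.0) + Jij * fixed[i]
        elif j in fixed:
            h_new[i] = h_new.get(i, 0.0) + Jij * fixed[j]
        else:
            J_new[(i, j)] = Jij
    return h_new, J_new, offset
```

For example, fixing \(s_3=-1\) in a problem with coupling \(J_{2,3}=2\) adds a field of \(-2\) to spin 2 and removes the edge.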
In certain situations, techniques that infer values for the unconditioned variables based on side constraints or bounding information can further reduce the smaller \(\min_{\vc{s}_{A}}\) subproblem. Different choices for the conditioning sets \(A\) (which may vary during the course of the algorithm) and different inference techniques give different algorithms. Most stochastic local-search algorithms rely entirely on conditioning and use no inference. In contrast, complete search methods like Branch-and-Bound Algorithms rely heavily on inference to prune their search spaces.
Cutset Conditioning¶
Cutset conditioning is a simple decomposition method that chooses a subset of variables \(A\) (which remain fixed throughout the course of the algorithm) so that the resulting subproblem \(\min_{\vc{s}_A} E(\vc{s}_A,\vc{s}_{\setminus A})\) defined in Conditioning decomposes into a set of disjoint subproblems.[7] In this case, the graph of \(E(\vc{s}_A,\vc{s}_{\setminus A})\) breaks into separate components indexed by \(\alpha\) which can be solved independently. Formally:

\[\min_{\vc{s}_A} E(\vc{s}_A,\vc{s}_{\setminus A}) = \sum_\alpha \min_{\vc{s}_{A_\alpha}} E_\alpha(\vc{s}_{A_\alpha},\vc{s}_{\setminus A})\]
[7]  While it is always possible to find a subset \(A\), the subset might be large for highly connected problems. 
where \(\vc{s}_A = \bigcup_\alpha \vc{s}_{A_\alpha}\) and \(\vc{s}_{A_\alpha} \cap \vc{s}_{A_{\alpha'}}=\emptyset\) for \(\alpha \not = \alpha'\). Here, \(\alpha\) labels the disconnected components of \(A\) once all variables have been fixed to \(\vc{s}_{\setminus A}\). With a proper choice of the conditioning variables, the graph is cut into disjoint partitions that are optimized with respect to \(\vc{s}_{A_\alpha}\) independently. The set of conditioned variables that cuts the original problem into pieces is called the cutset. Choose a cutset such that each of the remaining \(\min_{\vc{s}_{A_\alpha}}\) problems is small enough to be solved on the D-Wave system. A small example is illustrated in Figure 46.
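Finding the independent subproblems for a given cutset is a standard connected-components computation on the graph with the cutset (and its incident edges) removed. A stdlib-only sketch:

```python
def components_after_cut(nodes, edges, cutset):
    """Connected components of the problem graph once the cutset
    variables and their incident edges are removed; each component is
    an independent subproblem for fixed cutset values."""
    adj = {v: set() for v in nodes if v not in cutset}
    for u, v in edges:
        if u not in cutset and v not in cutset:
            adj[u].add(v)
            adj[v].add(u)
    seen, comps = set(), []
    for v in adj:
        if v in seen:
            continue
        stack, comp = [v], set()
        while stack:  # iterative depth-first search
            u = stack.pop()
            if u in seen:
                continue
            seen.add(u)
            comp.add(u)
            stack.extend(adj[u] - seen)
        comps.append(comp)
    return comps
```

On a 4-cycle, conditioning two opposite vertices splits the remainder into two singleton subproblems.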
The outer optimization problem that determines optimal values for the cutset variables \(\vc{s}_{\setminus A}\) is carried out with a classical heuristic, perhaps greedy local search, tabu search, or simulated annealing. Alternatively, a complete search method may be used to find the global minimum of \(\mathcal{E}(\vc{s}_{\setminus A})\). Note, however, that the inner optimizations over \(\vc{s}_A\) come with no proof of optimality.
To simplify outer optimization, make the number of conditioned variables as small as possible. Determining the smallest cutset is NP-hard in general, so employ fast heuristics.
An alternative criterion for determining the set \(\vc{s}_{\setminus A}\) of conditioned variables is that the remaining graph be a tree or a collection of trees, because tree-like graphs can be optimized in polynomial time. In this approach, the set \(\vc{s}_{\setminus A}\) is chosen so that the resulting graph (after the conditioned variables and their edges are removed) is either a tree or a collection of disconnected trees (see [Dec1987]). The cutsets needed to form trees are typically larger than those needed to form Chimera-structured subproblems, so there is a significant advantage in solving subproblems on the D-Wave system. An effective way to find a small set of vertex separators is to apply a graph partitioning heuristic like METIS,[8] and then build a cutset of vertex separators by selecting a vertex cover from the edges that span partitions.
[8]  METIS is a set of programs for partitioning graphs and finite element meshes, and for producing fill-reducing orderings for sparse matrices. 
Branch-and-Bound Algorithms¶
Branch-and-bound algorithms progressively condition more and more variables to particular values. For simplicity, assume that spins \(s_i\) are conditioned in index order \(i=1,2,3,\ldots\). At some point in the search, variable \(s_i\) must be conditioned to either \(s_i=-1\) or \(s_i=1\). This possibility defines a split at node \(s_i\). At this node, variables \(s_1,\ldots,s_{i-1}\) have been conditioned to values and the others remain free. Additional splits on free variables define a branching binary tree of possibilities. The leaves of this tree define the \(2^N\) configurations where all variables have been assigned values.
At each node in the traversal of this tree, a branch-and-bound algorithm decides whether to prune the tree at this node, effectively skipping exploration of the subtrees below it. A branch can be pruned if it can be shown that it is not possible for any leaf node below it to contain the global optimum. This pruning is accomplished by maintaining upper and lower bounds on the global minimum of \(E(\vc{s})\) as the tree traversal is carried out. At any point in the search, the upper bound is the smallest value \(E(\vc{s})\) that has been seen at the visited leaves. The lower bound at a node in the search is calculated as a lower bound on \(\min_{\vc{s}_A} E(\vc{s}_A,\vc{s}_{\setminus A})\) where \(\vc{s}_A\) is the set of as-yet-unconditioned variables. If the lower bound at a given node exceeds the current upper bound, then there is no need to search below this node, since no leaf of any of its subtrees can be optimal. The efficacy of this pruning is determined both by the tightness of the lower bound approximation and by the ordering in which variables are conditioned.
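A purely classical version of this scheme can be sketched in a few lines. The lower bound below simply assumes every not-yet-fixed term attains \(-|\text{coefficient}|\); it is loose but valid, and in the hardware-assisted variant described next the recursion would stop early and query the solver instead (all names are illustrative):

```python
import math

def solve_bb(h, J, n):
    """Branch-and-bound minimization of an Ising objective over spins
    s_1..s_n in {-1, +1}, conditioning variables in index order."""
    best = [math.inf, None]  # incumbent energy and assignment

    def energy(assign):
        e = sum(h.get(i, 0.0) * s for i, s in assign.items())
        e += sum(Jij * assign[i] * assign[j] for (i, j), Jij in J.items()
                 if i in assign and j in assign)
        return e

    def lower_bound(assign):
        # Fixed-part energy, plus -|effective field| for each free spin
        # and -|J| for each free-free coupling: a valid lower bound.
        e = energy(assign)
        for i in range(1, n + 1):
            if i in assign:
                continue
            eff = h.get(i, 0.0)
            eff += sum(Jij * assign[b] for (a, b), Jij in J.items()
                       if a == i and b in assign)
            eff += sum(Jij * assign[a] for (a, b), Jij in J.items()
                       if b == i and a in assign)
            e -= abs(eff)
        e -= sum(abs(Jij) for (a, b), Jij in J.items()
                 if a not in assign and b not in assign)
        return e

    def branch(assign, i):
        if i > n:
            e = energy(assign)
            if e < best[0]:
                best[0], best[1] = e, dict(assign)
            return
        if lower_bound(assign) >= best[0]:
            return  # prune: no completion can beat the incumbent
        for s in (-1, 1):
            assign[i] = s
            branch(assign, i + 1)
            del assign[i]

    branch({}, 1)
    return best[0], best[1]
```

Even this crude bound prunes the second subtree on small problems once a good incumbent is found.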
Branch-and-bound methods can benefit from the quantum annealing solver by terminating searches higher in the tree. After sufficient variables are conditioned so that the remaining unconditioned variables can be optimized by the D-Wave system, there is no need to explore deeper; simply call the D-Wave system to estimate the best completion from that node. As the upper bound is minimized through subsequent D-Wave system completions, this may in turn allow for future pruning. Again, since the D-Wave system does not come with a proof of optimality, this algorithm may not return a global minimum.
Another way to incorporate the D-Wave system is to use quantum annealing to provide tight lower-bound functions at any node in the search tree. Lagrangian Relaxation for Finding Lower bounds, discussed below, is one approach.
During the branch-and-bound process, you can also take advantage of cutset decompositions. As the search moves between nodes in the search tree, the graph for the subproblem that must be solved changes. In particular, the vertices in the graph that correspond to the fixed variables at the search tree node are removed along with all edges connected to these vertices. The graph may disconnect at some nodes in the search tree, in which case you solve multiple smaller subproblems. These subproblems and their minimizers may be cached so that, when these same subproblems are generated at other nodes, the results can be accessed. See [Mar2007], which considers other ways to explore the search tree, including dynamic variable orderings and best-first orderings. All of these improvements may also be applied in this setting. See also [Ros2016] on branch-and-bound heuristics in the context of the D-Wave Chimera architecture.
Lagrangian Relaxation for Finding Lower bounds¶
Lagrangian relaxation relies on a different simplification of the graph from that shown in Figure 46 and yields a lower bound on the global minimum objective value. In many cases, the bound can be made quite tight. Consider a node in the graph representing a variable \(s_i\). Divide the node in two and define \(s_i^{(1)}\) and \(s_i^{(2)}\) and add the constraint that \(s_i^{(1)} = s_i^{(2)}\). This leaves the problem unchanged. The original objective \(E(s_i,\vc{s}_{\setminus i})\) is modified to \(E'(s_i^{(1)}, s_i^{(2)},\vc{s}_{\setminus i})\). If you ignore the equality constraints among divided variables, then with sufficiently many divided variables, the modified objective \(E'\) decomposes into smaller independent problems. To ensure that these two objectives give the same value when the \(s_i^{(1)} = s_i^{(2)}\) constraint is satisfied, \(h_{i} = h_{i}^{(1)} + h_{i}^{(2)}\) is required. The advantage of dividing variable \(s_i\) comes when the equality constraint is softened and treated approximately.
Introducing a multiplier \(\lambda_i\) for the equality constraint, the Lagrangian for the constrained problem is

\[L(\vc{s};\lambda_i) = E'\bigl(s_i^{(1)}, s_i^{(2)}, \vc{s}_{\setminus i}\bigr) + \lambda_i \bigl(s_i^{(1)} - s_i^{(2)}\bigr)\]
Minimizing this Lagrangian for any fixed value of \(\lambda_i\) provides a lower bound to \(\min_{\vc{s}} E(\vc{s})\). The dual function is defined as

\[g(\lambda_i) = \min_{\vc{s}} L(\vc{s};\lambda_i)\]
and maximizing the dual with respect to \(\lambda_i\) provides the tightest possible lower bound.
As an example, consider again the small 8-variable problem, shown on the left side in Figure 47. The Lagrangian-relaxed version of the problem obtained by dividing variable \(s_1\) is shown on the right. The constraint \(s_1^{(1)}=s_1^{(2)}\) is treated softly, giving two independent subproblems consisting of variable sets \(\{s_1^{(1)}, s_2, s_3, s_7, s_8\}\) and \(\{s_1^{(2)}, s_4, s_5, s_6\}\). If either subproblem is still too large, it can be decomposed further, either through another variable division or through conditioning.
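Weak duality is easy to illustrate on a toy chain. In the sketch below (our own 3-spin example, not the Figure 47 problem), \(s_1\) is split so the Lagrangian separates into two independent brackets, and scanning \(\lambda\) recovers the primal minimum:

```python
from itertools import product

def dual(lmbda):
    """Dual function for the 3-spin objective E = s1*s2 - s1*s3 + 0.5*s2
    after splitting s1 into two copies.  With the equality constraint
    priced by lambda, each bracket is an independent subproblem."""
    g1 = min(s1a * s2 + 0.5 * s2 + lmbda * s1a
             for s1a, s2 in product((-1, 1), repeat=2))
    g2 = min(-s1b * s3 - lmbda * s1b
             for s1b, s3 in product((-1, 1), repeat=2))
    return g1 + g2

# Primal minimum by brute force over the original (unsplit) spins.
primal = min(s1 * s2 - s1 * s3 + 0.5 * s2
             for s1, s2, s3 in product((-1, 1), repeat=3))

# Weak duality: every lambda yields a lower bound; on this example the
# best lambda is tight.
bounds = [dual(l / 4.0) for l in range(-8, 9)]
assert all(b <= primal for b in bounds)
assert max(bounds) == primal
```

In general the maximum of the dual need not equal the primal minimum (there can be a duality gap); here the scan happens to close it.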
One promising approach is to introduce as many divided variables as needed to generate subproblems small enough to be solved with the current number of qubits. Each subproblem is solved to give a dual function, which is then optimized using a subgradient[9] method to provide the tightest possible lower bounds.
[9]  [Boy2007] provides a concise introduction to subgradient methods, and [Joh2007] an alternative using a smooth approximation to the dual function. 
Because a particular anneal may not produce the global minimum, the lower bounds returned by Lagrangian relaxation may be too high. If so, branchandbound may mistakenly prune branches containing the global minimum. However, viewing this Lagrangianrelaxed branchandbound algorithm as a heuristic, expect good results if the quantum annealing process is effective at locating lowenergy states.
LargeNeighborhood Local Search Algorithms¶
Local search algorithms improve upon a candidate solution, \(\vc{s}^t\), available at iteration \(t\), by searching for better solutions within some local neighborhood of \(\vc{s}^t\). Most commonly, the neighborhood for bitstrings is the Hamming neighborhood consisting of all bitstrings that differ from \(\vc{s}^t\) in a single bit. By adopting the improvement, if any, found within the neighborhood as the next configuration, the search progressively finds better and better solutions. The search terminates when it finds no better solutions in the neighborhood. Such configurations are locally optimal with respect to the neighborhood. If the search gets trapped in a local optimum (where all neighboring solutions are worse), additional heuristics may be applied to enable escape and continued exploration of the search space.
Quantum annealing can be very simply combined with local search to allow the local search algorithm to explore much larger neighborhoods than the 1-bit-flip Hamming neighborhood. As the size of the neighborhood increases, the quality of the local minima improves. Consider a problem of \(N\) variables, and define the neighborhood around configuration \(\vc{s}^t\) as all states within Hamming distance \(d\) (with \(d<N\)) of \(\vc{s}^t\). Searching within this neighborhood for a configuration \(\vc{s}^{t+1}\), having energy lower than \(E(\vc{s}^t)\), is a smaller optimization problem than the original; however, it still cannot be directly carried out by quantum annealing hardware. Instead, choose one of the \(\binom{N}{d}\) subsets of \(d\) variables. Call the selected subset \(A\) and note that \(|A|=d\). Having selected \(A\), determine the best setting for these \(\vc{s}_A\) variables given the fixed context of the conditioned variables \(\vc{s}^{t}_{\setminus A}\). Select \(d\) such that the smaller \(d\)-variable problem can be embedded within the D-Wave QPU. If no improvement is found within the chosen subset, another subset of size \(d\) may be selected.
This algorithm stagnates at a configuration that is a local optimum with respect to all \(d\)-variable spin flips. However, if \(d\) and \(N\) are both large, many size-\(d\) subsets around each configuration \(\vc{s}^t\) exist, so trapping in \(d\)-optimal local minima rarely occurs, but progress may be slow if mainly unpromising neighborhoods are explored. It is possible to infer variable subsets that need not be considered, and heuristics may be defined that generate promising subsets[10].
[10]  [Liu2005] presents promising results for even small neighborhoods of size \(d\le 4\). With the D-Wave QPU, much larger neighborhoods can be considered. 
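A classical mock-up of the scheme, with the inner \(d\)-variable subproblem brute-forced where the QPU call would go (all names are illustrative):

```python
import random
from itertools import product

def energy(h, J, s):
    """Ising energy of a full spin assignment s (dict: index -> +/-1)."""
    return (sum(h.get(i, 0.0) * s[i] for i in s)
            + sum(Jij * s[i] * s[j] for (i, j), Jij in J.items()))

def large_neighborhood_search(h, J, n, d, steps, seed=0):
    """Repeatedly pick a random subset A of d variables and set them to
    their optimal values given the remaining fixed spins.  Here the
    inner subproblem is brute-forced; on hardware it would be solved by
    the annealer after embedding."""
    rng = random.Random(seed)
    s = {i: rng.choice((-1, 1)) for i in range(1, n + 1)}
    for _ in range(steps):
        A = rng.sample(range(1, n + 1), d)
        best_vals, best_e = None, float("inf")
        for vals in product((-1, 1), repeat=d):
            trial = dict(s)
            trial.update(zip(A, vals))
            e = energy(h, J, trial)
            if e < best_e:
                best_vals, best_e = vals, e
        s.update(zip(A, best_vals))
    return s, energy(h, J, s)
```

Each step is monotone (the energy never increases), which is the property that drives the search toward \(d\)-optimal configurations.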
Embedding¶
The D-Wave QPU minimizes the energy of an Ising spin configuration whose pairwise interactions lie on the edges of a Chimera graph \(C_{M,N,L}\). To solve an Ising spin problem with arbitrary pairwise interaction structure, the corresponding graph must be minor-embedded into a Chimera working graph.
An Ising optimization problem is defined on an arbitrary graph containing \(N\) nodes (variables) and \(M\) edges. The working graph in the D-Wave system corresponds to a graph with fixed connectivity (at most 6 edges per node) on which you can mimic arbitrary connectivity by using qubits to represent edges as well as variables.
In Lagrangian relaxation, an optimization variable \(s_i\) is split into two variables, \(s_i^{(1)}\) and \(s_i^{(2)}\), and the constraint \(s_i^{(1)}=s_i^{(2)}\) added. A similar technique can be used to map an optimization variable \(s_i\) onto a set of one or more qubits \(\{q_i^{(1)}, \cdots, q_i^{(k)}\}\). Because all qubits (assumed to be \(\pm 1\) valued) represent the same problem variable, impose the constraint \(q_i^{(j)} = q_i^{(j')}\) for all pairs \((j,j')\) occurring as edges in a spanning tree across the qubits. The equality constraint \(q_i^{(j)} = q_i^{(j')}\) can be encoded as the Ising penalty:

\[-M q_i^{(j)} q_i^{(j')}\]
where \(M>0\) is the weight of the penalty. If \(M\) is sufficiently large, the lowest energy state in the full problem always has \(q_i^{(j)} = q_i^{(j')}\) because that feasible assignment is \(2M\) lower in energy than the infeasible assignment \(q_i^{(j)} \not= q_i^{(j')}\).
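The \(2M\) separation between aligned and broken chain qubits follows directly from the penalty; a two-line check (sketch):

```python
def chain_penalty(qa, qb, M):
    """Ising penalty coupling two chain qubits with ferromagnetic weight M."""
    return -M * qa * qb

M = 1.5
# A broken chain (unequal qubits) costs 2M more than an intact one.
assert chain_penalty(1, -1, M) - chain_penalty(1, 1, M) == 2 * M
```

This gap is what the penalty weight must preserve against the rest of the problem's energy scale.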
In this way, strings of qubits are related to each other to create chains that can connect arbitrary vertices. To create these chain-like connectors, weigh the penalties large enough so that low-energy configurations do not violate the equality constraints. Balancing this, use the smallest possible penalty weight that enforces the constraints, to prevent precision issues and to foster efficient exploration of the search space. An iterative procedure, which incrementally updates weights until the equality constraints are satisfied, is effective.
For example, if qubits \(q_4\) and \(q_{32}\) represent problem variables \(s_1\) and \(s_2\) respectively and an \((s_1,s_2)\) edge in the problem graph exists, then this coupling can be realized by representing problem variable \(s_1\) with the string of qubits \(\{q_4,q_{12},q_8\}\), as shown in Figure 48. Edge weights of strength \(M\) couple qubits along the bold black edges. The string of qubits representing \(s_1\) is now connected to \(s_2\) through the blue edge. Note also that mapping a single optimization variable to a connected set of qubits in the working graph mimics higher degrees of connectivity than are present in the graph. For example, variable \(s_1\) represented by qubits \(\{q_4,q_{12},q_8\}\) can be connected to up to 12 other qubits and thus up to potentially 12 other optimization variables (if each qubit represents a distinct optimization variable). Connectivity is effectively doubled in this case.
Formalizing Minor Embeddings¶
This section considers the general problem of solving an arbitrary problem on the working graph in light of the variable-to-qubit mapping observation above.
A minor of a graph is formed from a sequence of edge contractions on the graph. An edge contraction is the removal of an edge \((v_1,v_2)\) and fusion of the vertices \(v_1\) and \(v_2\), combining them into one. If the sets of neighboring vertices of \(v_1\) and \(v_2\) in the original graph are \(\mathcal{N}_1\) and \(\mathcal{N}_2\) then the neighbors of the fused node are \((\mathcal{N}_1\cup \mathcal{N}_2)\setminus \{v_1,v_2\}\). Given this notation, if \(G\) is a general graph representing an Ising problem, it can be minor embedded into Chimera if it is isomorphic to a graph minor of the Chimera working graph.
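The neighbor rule \((\mathcal{N}_1\cup \mathcal{N}_2)\setminus \{v_1,v_2\}\) can be illustrated with a minimal sketch of edge contraction on an adjacency-set graph (the function and the toy graph are ours, not from the text):

```python
def contract_edge(adj, v1, v2):
    """Contract edge (v1, v2): fuse v2 into v1. The neighbors of the
    fused node are (N1 | N2) - {v1, v2}, per the definition above."""
    fused = (adj[v1] | adj[v2]) - {v1, v2}
    # Drop v2 everywhere; former neighbors of v2 become neighbors of v1.
    adj = {v: (nbrs - {v2}) | ({v1} if v2 in nbrs and v != v1 else set())
           for v, nbrs in adj.items() if v != v2}
    adj[v1] = fused
    return adj

# A 4-cycle: contracting edge (0, 1) yields a triangle on {0, 2, 3},
# so the triangle is a minor of the square.
adj = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
adj = contract_edge(adj, 0, 1)
print(adj)  # triangle: 0-2, 0-3, 2-3
```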
To more formally specify the embedding problem, let \(G=(V,E)\) be a graph for a problem. Fundamentally, there are two requirements on embedding:
 A given vertex \(v \in V\) must be mapped to a connected set of qubits \(q\in\text{Chimera}\).
 All edges \((v_1,v_2)\in E\) must be mapped to at least one edge in the working graph.
To satisfy the first constraint, posit an unknown table, \(\text{EmbedDist}(v,q,d)\), that maps vertex \(v\) to qubit \(q\) and is distance \(d\) from a reference qubit also mapped from \(v\). Additionally, define \(\text{Embed}(v,q) \leftrightarrow \exists d \, \text{EmbedDist}(v,q,d)\). The constraints[11] on these tables are then:
[11]  This constraint specification is executable, and solvers are available that can solve for the unknown tables \(\text{EmbedDist}(v,q,d)\) and \(\text{Embed}(v,q)\) from the constraint specification. 
 \(\forall v,q,d \, \text{EmbedDist}(v,q,d) \; \rightarrow \; \bigl(d=0\bigr) \vee \bigl(\exists q_1 \, \text{Chimera}(q,q_1) \wedge \text{EmbedDist}(v,q_1,d-1)\bigr)\).
 \(\forall v_1,v_2 \, E(v_1,v_2) \; \rightarrow \; \exists q_1,q_2 \, \text{Chimera}(q_1,q_2) \wedge \text{Embed}(v_1,q_1) \wedge \text{Embed}(v_2,q_2)\).
The predicate \(\text{Chimera}(q_1,q_2)\) is true if and only if there is an edge between qubits \(q_1\) and \(q_2\). Similarly, \(E(v_1,v_2)\) is true if and only if there is an edge in the graph between problem variables \(v_1\) and \(v_2\). In addition to these two constraints, there are the following bookkeeping constraints:
 Every \(v\) is mapped: \(\forall v \exists q \, \text{Embed}(v,q)\).
 At most one \(v\) is mapped to a given \(q\): \(\forall v,q \, \text{Embed}(v,q) \rightarrow \neg \exists v_1 \, (v_1\not= v) \wedge \text{Embed}(v_1,q)\). The second bookkeeping constraint can also be specified as \(\forall q \, \text{COUNT}\bigl(v,1,\text{Embed}(v,q)\bigr)\le 1\). [12]
 There is 1 root for each \(v\): \(\forall v \, \text{COUNT}(q,1,\text{EmbedDist}(v,q,0))=1\).
[12]  The aggregate operator \(\text{COUNT}(v,1,\phi(v))\) counts the values in the domain of \(v\) for which predicate \(\phi(v)\) is true. 
This specification does not minimize the total number of qubits used. However, when quantifying over \(d\) (which bounds the size of the connected sets of qubits representing problem variables), specify a maximal value \(D\) so that \(0 \le d \le D\). In this way, the size of the embeddings may be somewhat controlled. For graphs specified through \(E(v_1,v_2)\), however, the problem may be infeasible if \(D\) is too small.
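The embedding and bookkeeping constraints above can be checked procedurally for a candidate mapping. A minimal sketch (not the constraint-solver formulation of the text; the checker, the chain representation as a dict, and the toy hardware graph are our illustrative assumptions):

```python
def is_valid_embedding(emb, problem_edges, hw_edges):
    """Check the three requirements from this section: qubit-disjoint
    chains, connected chains, and every problem edge covered by at
    least one hardware edge."""
    hw = set(map(frozenset, hw_edges))
    # At most one problem variable is mapped to a given qubit.
    used = [q for chain in emb.values() for q in chain]
    if len(used) != len(set(used)):
        return False
    # Each chain must be connected in the hardware graph (BFS from a root).
    for chain in emb.values():
        chain = set(chain)
        seen, frontier = set(), [next(iter(chain))]
        while frontier:
            q = frontier.pop()
            if q in seen:
                continue
            seen.add(q)
            frontier += [p for p in chain if frozenset((p, q)) in hw]
        if seen != chain:
            return False
    # Every problem edge must be carried by at least one hardware edge.
    return all(any(frozenset((p, q)) in hw for p in emb[u] for q in emb[v])
               for u, v in problem_edges)

# Toy hardware: path 0-1-2-3 plus edge 1-3; variable "a" uses chain [0, 1].
hw_edges = [(0, 1), (1, 2), (2, 3), (1, 3)]
emb = {"a": [0, 1], "b": [2], "c": [3]}
print(is_valid_embedding(emb, [("a", "b"), ("a", "c"), ("b", "c")], hw_edges))  # True
```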
Note
These techniques help solve arbitrary Ising problems by transforming the problem to an alternative form that preserves the low-energy states. In particular, qubits are introduced and constraints added that force sets of qubits to take identical values. These transformations may affect the efficacy of quantum annealing.
Embedding Complete Graphs¶
Because a complete graph on \(V\) vertices (all vertices connected to all other vertices) can be used to represent any Ising problem defined on \(V\) variables, consider what is the largest complete graph \(K_V\) that is a minor of an \(M\times N\times L\) graph. Knowing this complete minor allows for immediate embedding of all problems of \(V\) or fewer variables.
The largest complete minor has \(V=1+L\min(M,N)\) vertices.[13]
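As a quick sanity check of this formula (the helper function is ours): a \(2\times2\times4\) graph admits a \(K_9\) minor, and a \(16\times16\times4\) graph a \(K_{65}\) minor.

```python
def largest_complete_minor(M, N, L):
    """Vertices in the largest complete-graph minor of an M x N x L
    Chimera graph: V = 1 + L * min(M, N), per the text above."""
    return 1 + L * min(M, N)

print(largest_complete_minor(2, 2, 4))    # 9
print(largest_complete_minor(16, 16, 4))  # 65
```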
A graph minor for a small \(2\times2\times4\) graph is shown in Figure 49.
A similar result for a \(4\times 4 \times L\) graph is in Figure 50. If Figure 50 were expanded, the top left corner comprising the A and B blocks alone would appear as in Figure 49, for \(L=4\).
Connectivity is addressed at the cost of larger lattices. Note also that problems with specific connectivity requirements may admit problem-specific embeddings that use qubits more effectively than embedding through the complete graph on \(V\) vertices.
[13]  That this is the largest complete minor follows immediately from the treewidth of Chimera being \(L\min(M,N)\). 
Finding Better Embeddings¶
In some cases, the exact form of a problem is malleable. One important example arises in machine learning problems. The true objective in predictive modeling is to minimize the error on future observations so that they are accurately predicted. However, without knowing what points will be observed in the future, there is little choice but to minimize the errors on a training set of observed examples. Minimizing training set error runs the risk of overfitting and additional terms favoring “simpler” models over complex ones are typically added to the objective. Thus, there is latitude in the precise objective. In such cases, the embedding method can be simplified.
Let \(G=(V,E)\) represent the Chimera graph. Embed an Ising objective of \(N\) variables into Chimera (with \(|V|\ge N\)) with a mapping \(\varphi:\{1,\ldots,N\}\mapsto V\), such that \(J_{i,j}\neq 0\Rightarrow (\varphi(i),\varphi(j))\in E\). In other words, each nonzero \(J_{i,j}\) must be assigned to an edge. The simplifying assumption requires that each node in the original graph be mapped to a single node in Chimera.
The mapping \(\varphi\) is encoded with a set of binary variables \(m_{i,q}\), which indicate that problem variable \(i\) is mapped to Chimera node \(q\). A valid mapping requires \(\sum_q m_{i,q} = 1\) for all problem variables \(i\), and \(\sum_i m_{i,q}\le 1\) for all Chimera nodes \(q\). Because the original problem can be altered, fit the problem into Chimera by maximizing the total magnitude of \(J_{i,j}\) mapped to Chimera edges; that is:
\[\max_{\vc{M}} \; \sum_{(q_1,q_2)\in E}\, \sum_{i<j} |J_{i,j}| \bigl( m_{i,q_1} m_{j,q_2} + m_{i,q_2} m_{j,q_1} \bigr),\]
where \(\vc{M}\) is subject to the mapping constraints. This problem is a variant of the NP-hard quadratic assignment problem. In the interest of linear-time embedding, apply a greedy heuristic to approximately maximize the objective.
As a starting point, let \(i_1=\argmax_i \sum_{j<i} J_{j,i} + \sum_{j>i} J_{i,j}\). So \(i_1\) is the row/column index of \(J\) with the highest sum of magnitudes (\(\vc{J}\) is assumed to be upper-triangular). Map \(i_1\) to one of the Chimera vertices of highest degree. For the remaining variables, assume \(\varphi\) is defined on \(\{i_1,\ldots,i_k\}\) such that \(\varphi(i_j)=q_j\) where \(q_j\) is some Chimera qubit. Then assign \(\varphi(i_{k+1})=q_{k+1}\), where \(i_{k+1}\notin\{i_1,\ldots,i_k\}\) and \(q_{k+1}\notin \{q_1,\ldots,q_k\}\) to maximize the sum of all \(J_{i_{k+1},i_j}\) and \(J_{i_j,i_{k+1}}\) over all \(j\in\{1,\ldots,k\}\) for which \((q_j,q_{k+1})\) is a Chimera edge. This fast greedy heuristic performs well.
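A minimal sketch of this greedy heuristic, under our own simplifying assumptions (a generic adjacency-dict hardware graph rather than Chimera specifically, and \(|J|\) magnitudes as the tie-breaking weight):

```python
def greedy_embed(J, n, hw_adj):
    """Greedy one-to-one mapping of n problem variables onto hardware
    nodes, approximately maximizing the |J| weight landing on edges.
    J: dict {(i, j): value} with i < j (upper triangular);
    hw_adj: dict {node: set of neighboring nodes}."""
    absJ = lambda i, j: abs(J.get((min(i, j), max(i, j)), 0.0))
    # Seed: the variable with the largest total |J|, placed on a
    # highest-degree hardware node.
    i1 = max(range(n), key=lambda i: sum(absJ(i, j) for j in range(n)))
    q1 = max(hw_adj, key=lambda q: len(hw_adj[q]))
    phi = {i1: q1}
    while len(phi) < n:
        best = None
        for i in set(range(n)) - set(phi):
            for q in set(hw_adj) - set(phi.values()):
                # Weight recovered on edges to already-placed variables.
                w = sum(absJ(i, j) for j, qj in phi.items() if q in hw_adj[qj])
                if best is None or w > best[0]:
                    best = (w, i, q)
        phi[best[1]] = best[2]
    return phi

# Toy: a triangle of couplings mapped onto a 3-node path 0-1-2.
J = {(0, 1): 2.0, (0, 2): 1.0, (1, 2): 3.0}
hw = {0: {1}, 1: {0, 2}, 2: {1}}
phi = greedy_embed(J, 3, hw)
print(phi)  # a bijection; variable 1 (largest |J| sum) sits on node 1
```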
Solving a Problem on the QPU¶
Spin-Reversal Transform¶
Applying a spin-reversal transform can improve results by reducing the impact of analog errors that may exist on the QPU. A spin-reversal transform does not alter the Ising problem: there is a direct one-to-one mapping between solutions of the original problem and solutions of the transformed problem, and corresponding solutions have identical energies. Rather, the transform reverses the meaning of a collection of individual spins \(s_p\) as follows:
\[s_p \rightarrow -s_p, \qquad h_p \rightarrow -h_p, \qquad J_{p,q} \rightarrow -J_{p,q} \quad \text{for all } q.\]
This mapping ensures that the classical Ising spin energy is unchanged; the transform simply amounts to reinterpreting spin up as spin down, and vice versa, for a particular spin. Any number of spins may be transformed in this way, but the Ising problem remains the same and therefore we expect the same answers. A spin-reversal transform is defined by the subset of spins to which the transform is applied.
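The energy invariance can be verified exhaustively on a toy problem (the helper names and the 3-spin instance are ours, chosen for illustration):

```python
def ising_energy(h, J, s):
    """Classical Ising energy for spins s (dict of +/-1 values)."""
    return sum(h[i] * s[i] for i in h) + sum(J[e] * s[e[0]] * s[e[1]] for e in J)

def spin_reversal(h, J, flips):
    """Negate h_p for flipped spins, and J_{i,j} whenever exactly one
    endpoint is flipped; flipped spins are reinterpreted as -s_p."""
    h2 = {i: (-v if i in flips else v) for i, v in h.items()}
    J2 = {(i, j): (-v if (i in flips) != (j in flips) else v)
          for (i, j), v in J.items()}
    return h2, J2

h = {0: 0.5, 1: -1.0, 2: 0.25}
J = {(0, 1): -1.0, (1, 2): 2.0}
flips = {1}
h2, J2 = spin_reversal(h, J, flips)
# Every state of the original problem matches the energy of its
# reinterpreted counterpart in the transformed problem.
for s0 in (-1, 1):
    for s1 in (-1, 1):
        for s2 in (-1, 1):
            s = {0: s0, 1: s1, 2: s2}
            t = {i: (-v if i in flips else v) for i, v in s.items()}
            assert ising_energy(h, J, s) == ising_energy(h2, J2, t)
print("energies match for all 8 states")
```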
Some sources of ICE depend on the magnitude of the (\(h\), \(J\)) problem parameters. Random spin-reversal transforms can be used to avoid systematic errors arising from multiple requests of the same problem, by "breaking up" these dependencies.
From a single problem defined by a set of parameters \(\{ h_i,J_{i,j} \}\), a sequence of spin-reversal transforms \(T', T'', \ldots\), each applied to a random subset of spins, defines a sequence of equivalent problems (\(h',J'\)), (\(h'',J''\)), and so on. All have the same energy spectrum, but their implementation on the QPU may invoke different ICE effects (some better and some worse than the original).
When studying the behavior of the QPU in solving a problem, include in the analysis a number of spinreversal transforms of the problem. The transform samples randomly over sets of possible systematic errors. The correlations between these error sets, at least as they relate to certain error sources, can be reduced with the following considerations:
 Changing too few spins leaves most errors unchanged, and therefore has little effect.
 Changing too many spins means that most couplers connect spins that are both transformed, thus \(J_{i,j}\) does not change sign. As a result, some systematic errors associated with the couplers are unaffected.
To sample over a wide range of systematic errors, consider the number of couplers and spins involved in a particular transform.
Note
Spin-reversal transforms do not necessarily remove errors due to ICE; rather, they randomize the effects of certain errors that depend on (\(h\), \(J\)). This creates a spectrum of error magnitudes from which the best results can be selected. Some types of systematic error—ICE 1, as defined in Technical Description of the D-Wave Quantum Processing Unit, for example—are not affected by spin-reversal transforms.
Specify the number of spinreversal transforms using the num_spin_reversal_transforms parameter when submitting a problem.
Using Anneal Offset¶
To run a problem with anneal offsets:
1. Retrieve the solver properties for your system; permitted parameters vary by QPU.
If anneal offsets are supported, the returned values include a property that gives the permitted range of offset values, in normalized offset units, for each qubit in the QPU: a negative bound and a positive bound. (Both values are 0 for any inoperable qubits.) Negative values force the qubit to anneal later than the standard trajectory; positive values, earlier.
In the following truncated list, for instance, we see that qubit 0 can be annealed a maximum of 0.2255351369888578 normalized offset units later than the standard trajectory, or a maximum of 0.050118919330857284 normalized offset units earlier.
[[-0.2255351369888578, 0.050118919330857284], [-0.2258618355939089, 0.12045964565008474], ...
2. Embed the problem on the graph. For best results, embed the problem multiple times.
3. Prepare an array of offset values based on the information retrieved from the system, the hardware adjacencies of the embedded problem, and any known dynamics of the problem.
4. Supply the array of offsets using the anneal_offsets parameter, with a length equal to the num_qubits property. The appropriate format depends on the client; see the developer guide for your client for more information.
5. Submit the problem.
6. Compare results of the same problem run with and without anneal offsets, and of the same problem run with different offset values. Adjust the offset values accordingly.
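Preparing the offsets array can be sketched as follows. This is an illustrative helper (our own name and toy range values, not a D-Wave API): it clamps desired per-qubit offsets to each qubit's advertised range and emits zeros for qubits without a requested offset, producing an array whose length matches the number of qubits.

```python
def clamp_offsets(desired, ranges):
    """Build a per-qubit anneal-offsets array, clamping each requested
    offset to that qubit's [negative bound, positive bound] range."""
    out = []
    for q, (lo, hi) in enumerate(ranges):
        want = desired.get(q, 0.0)  # default: standard trajectory
        out.append(min(max(want, lo), hi))
    return out

# Hypothetical ranges for a 3-qubit toy; an inoperable qubit reports [0, 0].
ranges = [[-0.2255, 0.0501], [-0.2258, 0.1204], [0.0, 0.0]]
offsets = clamp_offsets({0: 0.1, 1: -0.05, 2: 0.3}, ranges)
print(offsets)  # [0.0501, -0.05, 0.0]
```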