Derivations and Further Details

This chapter provides background information and further details on the techniques presented in previous chapters, to help users unfamiliar with those techniques assess whether one or more might apply to a given problem. Consult the literature and D-Wave for additional information when implementing.

Problem Reformulation

SAT and Weighted MAX-2-SAT

[Coo1971] and [Lev1973] identified satisfiability as the first NP-complete problem. SAT in conjunctive normal form is a decision problem that asks for an assignment of Boolean variables which satisfies a collection of constraints called clauses. A clause is a disjunction of literals, and is satisfied if any literal in the clause is true. A literal is a variable or a negation of a variable. For example, the clause[1] \(x_1 \vee \overline{x}_2\) is satisfied if either \(x_1 = \text{true} = 1\) or \(x_2 = \text{false} = 0\).

[1]
An overline on a Boolean variable indicates its negation, i.e.
\(\overline{x}=\neg x=1-x\).

In a 2-SAT problem all clauses contain at most 2 literals. Finding a satisfying assignment, or proving that none exists, can be done in polynomial time for 2-SAT. However, 3-SAT, which allows clauses of up to 3 literals, is NP-complete, and no polynomial-time algorithm for it is known.

A variant of SAT called MAX-SAT (MAXimize the number of satisfied clauses) relaxes the decision problem—is there a satisfying assignment or not?—to an optimization problem, and seeks the assignment which minimizes the number of unsatisfied clauses.

Weighted MAX-SAT generalizes the problem further: each clause is assigned a positive weight, and the violation of a clause \(c\) having weight \(w_c\) incurs a cost \(w_c\). We seek to maximize the weight of satisfied clauses, or equivalently minimize the weight of unsatisfied clauses. Typically, weights are integral so that weighted MAX-SAT is really nothing other than MAX-SAT with redundant clauses. (If the weight of a clause is \(w\) then \(w\) replicated clauses are introduced into the MAX-SAT problem). A weighted MAX-SAT problem is represented as a list of weighted clauses, e.g. \(\{(x_1\vee \overline{x}_2; 3), (x_3;1), (\overline{x}_3 \vee x_2;4)\}\) represents a problem of three variables consisting of 3 clauses having weights 3, 1, and 4 respectively.

If all clauses of a weighted MAX-SAT problem contain at most 2 literals then we have a weighted MAX-2-SAT problem. Weighted MAX-2-SAT is equivalent to the QUBO and Ising problems.

How might we represent clauses as terms in a QUBO? A weighted clause \((x_1\vee \overline{x}_2; 3)\) is unsatisfied if \(\overline{x}_1 \wedge x_2\) is true, and this conjunction is equivalent to the product of Boolean variables \(\overline{x}_1 x_2\). If the clause is unsatisfied then a cost of 3 is incurred, thus the penalty representing the clause is \(3 \overline{x}_1 x_2\). The weighted MAX-2-SAT problem \(\{(x_1\vee \overline{x}_2; 3), (x_3;1), (\overline{x}_3 \vee x_2;4)\}\) is then easily represented as \(3\overline{x}_1 x_2 + \overline{x}_3 + 4 x_3 \overline{x}_2\). This representation is called a posiform. A posiform is a summation of terms where each term is a product of literals multiplied by a positive (usually integral) weight.

The general weighted MAX-2-SAT problem can be written as the posiform:

\[\min_{\vc{x}} \sum_c w_c \overline{\ell}_{c,1} \overline{\ell}_{c,2}\]

where \(\ell_{c,1}\) and \(\ell_{c,2}\) are the two literals of clause \(c\), and \(w_c\) is its weight.
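
As a minimal sketch of this conversion (the function name and clause encoding here are ours, not a D-Wave API), the complement of each literal is the affine expression \(1-x\) or \(x\), and the product of the two complements expands directly into QUBO coefficients. A 1-literal clause such as \((x_3;1)\) can be padded by repeating its literal, since \(x \vee x = x\).

    from collections import defaultdict

    def max2sat_to_qubo(clauses):
        """Expand weighted MAX-2-SAT clauses into QUBO coefficients.

        Each clause is ((var, negated), (var, negated), weight).  An
        unsatisfied clause contributes w * lbar1 * lbar2, where the
        complement of a literal is the affine expression c + s*x with
        (c, s) = (1, -1) for a positive literal and (0, 1) for a
        negated one.
        """
        offset = 0.0
        linear = defaultdict(float)
        quadratic = defaultdict(float)
        for (v1, neg1), (v2, neg2), w in clauses:
            c1, s1 = (0, 1) if neg1 else (1, -1)   # complement of literal 1
            c2, s2 = (0, 1) if neg2 else (1, -1)   # complement of literal 2
            offset += w * c1 * c2
            linear[v1] += w * s1 * c2
            linear[v2] += w * c1 * s2
            if v1 == v2:
                linear[v1] += w * s1 * s2          # x*x == x for binary x
            else:
                quadratic[(v1, v2)] += w * s1 * s2
        return offset, dict(linear), dict(quadratic)

    # {(x1 v ~x2; 3), (x3; 1), (~x3 v x2; 4)}; the 1-literal clause is padded.
    clauses = [(("x1", False), ("x2", True), 3),
               (("x3", False), ("x3", False), 1),
               (("x3", True), ("x2", False), 4)]
    print(max2sat_to_qubo(clauses))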

Posiforms and Higher-Order Polynomials

Posiforms are a useful representation for pseudo-Boolean functions[2] (functions which map bit-strings to real values). General posiforms may have terms consisting of products of more than 2 literals; reducing such higher-order terms to quadratic form requires introducing ancillary variables, as described in Non-Quadratic (Higher-Order) Polynomials. As an example of a posiform model (in this case having only 2 literals per term), the minimum vertex cover of a graph \(G=(V,E)\) is determined by:

\[\min_{\vc{x}} \left\{ \sum_{v\in V} x_v + M \sum_{(v,v')\in E} \overline{x}_v \overline{x}_{v'} \right\}\]

which determines the smallest set of vertices covering all edges in \(G\). The quadratic penalty term of weight \(M\) ensures that for every edge \((v,v')\) at least one of \(x_v\) or \(x_{v'}\) equals 1 (equivalently, that at least one endpoint of every edge is in the cover).

[2]See [Bor2002].

A related problem is maximum independent set which seeks the largest possible set of vertices no two of which are connected by an edge (a set of non-connected vertices is called an independent set). Maximum independent set is given by the posiform objective:

\[\min_{\vc{x}} \left\{ \sum_{v\in V} \overline{x}_v + M \sum_{(v,v')\in E} x_v x_{v'} \right\}\]

The first term counts the vertices excluded from the independent set (minimizing it maximizes the set's size), and the second term penalizes choices which do not correspond to independent sets.
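
Expanding \(\overline{x}_v = 1-x_v\) in this objective drops a constant \(|V|\) and leaves a linear bias of \(-1\) on each vertex plus a penalty of \(+M\) on each edge. A minimal sketch of this expansion (helper name ours; networkx is used for the graph):

    import networkx as nx

    def mis_qubo(G, M=2.0):
        """QUBO coefficients for min sum_v (1 - x_v) + M sum_{(u,v) in E} x_u x_v.
        The constant |V| is dropped, leaving a bias of -1 on every vertex
        and a penalty of +M (with M > 1) on every edge."""
        linear = {v: -1.0 for v in G.nodes}
        quadratic = {(u, v): float(M) for u, v in G.edges}
        return linear, quadratic

    G = nx.cycle_graph(5)  # a 5-cycle: a maximum independent set has 2 vertices
    print(mis_qubo(G))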

As an example of a problem having higher-order interactions, consider an objective function which counts the number of cliques of size 3 in a graph of \(N\) vertices. The graph might be represented with a set of \(\binom{N}{2}\) Boolean variables \(x_{i,j}\) indicating the presence/absence of each edge \((i,j)\). Given this representation, vertices \(\{1,2,3\}\) form a clique if \(x_{1,2} x_{1,3} x_{2,3} = 1\). The total number of 3-cliques is:

\[\sum_{(v_1,v_2,v_3)\in \mathcal{S}} x_{v_1,v_2} x_{v_1,v_3} x_{v_2,v_3}\]

where \(\mathcal{S}\) is the set of all subsets of size 3 of \(\{1,2,\cdots, N\}\). This is an example of a posiform where each term has degree 3.

Another higher-order posiform arises in weighted MAX-3-SAT. If clauses consist of 3 literals then the analog of the general weighted MAX-2-SAT problem written as the posiform in SAT and Weighted MAX-2-SAT is:

\[\min_{\vc{x}} \sum_c w_c \overline{\ell}_{c,1} \overline{\ell}_{c,2} \overline{\ell}_{c,3}.\]

Non-Quadratic (Higher-Order) Polynomials

For many problems, the natural model includes interactions between more than pairs of variables. How can such problems be reduced to the pairwise QUBO representation?

One approach is to introduce and minimize over ancillary variables. The constraint \(z\Leftrightarrow x_1\wedge x_2\), which sets Boolean variable \(z\) equal to the logical AND of Boolean variables \(x_1\) and \(x_2\), can be represented through a quadratic penalty function[3]:

\[P(x_1,x_2;z) = x_1 x_2 - 2(x_1+x_2)z + 3z.\]
[3]The same penalty function written in Ising terms of -1/1 variables is \(P(s_1,s_2;s_z) = 3+s_1 s_2 - 2(s_1+s_2)s_z - s_1-s_2 + 2s_z\).

This penalty function enforces the constraint because it attains the minimal value of 0 at the 4 \((x_1,x_2,z)\) combinations satisfying \(z\Leftrightarrow x_1\wedge x_2\), while all other combinations of values have \(P>0\). We use this result to reduce a triplet interaction to pairwise interaction as follows[4]:

\[x_1 x_2 x_3 = \min_z \bigl\{ z x_3 + M P(x_1,x_2;z) \bigr\}\]
[4]The penalty weight may be taken to be \(M=1\).

where \(M>1\) is a penalty weight. The product \(x_1 x_2\) is captured by \(z\) which is multiplied by \(x_3\) giving the desired interaction. The penalty function \(P\) is added to ensure that \(z\) is set correctly given \(x_1\) and \(x_2\).
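
Both claims are easy to verify exhaustively. The following sketch checks that \(P\) vanishes exactly on the consistent \((x_1,x_2,z)\) assignments and that the minimization over \(z\) reproduces the cubic term:

    from itertools import product

    def P(x1, x2, z):
        """Quadratic penalty for the constraint z <=> x1 AND x2."""
        return x1 * x2 - 2 * (x1 + x2) * z + 3 * z

    # P is 0 exactly on the 4 assignments consistent with z = x1 AND x2 ...
    for x1, x2, z in product((0, 1), repeat=3):
        assert P(x1, x2, z) >= 0
        assert (P(x1, x2, z) == 0) == (z == x1 * x2)

    # ... so min_z { z*x3 + M*P(x1,x2;z) } reproduces the cubic term x1*x2*x3.
    M = 2
    for x1, x2, x3 in product((0, 1), repeat=3):
        assert min(z * x3 + M * P(x1, x2, z) for z in (0, 1)) == x1 * x2 * x3
    print("AND penalty and triplet reduction verified")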

In the same manner an interaction involving 4 or more variables can be reduced to pairwise by sequentially introducing new variables and reducing the degree of the interaction by 1 at each step.

As extra ancillary variables limit the problem size addressable by the D-Wave system, it is desirable to minimize the number of introduced variables. The reduction-to-pairwise algorithm of [Bor2002] is easily modified to minimize the number of ancillary variables: select the pair of variables that is most common amongst all terms with more than pairwise interactions, and replace it with an ancillary variable \(z\) in all those terms; repeat until no terms of 3 or more variables remain. Note that subsequent iterations may involve products of ancillary variables if these ancillary pairs are the most common amongst the terms.
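
A minimal sketch of this most-common-pair reduction follows (the polynomial encoding and helper names are ours). Each recorded substitution \(((u,v),z)\) must also contribute the AND penalty \(M P(u,v;z)\) from above to the final quadratic objective:

    from collections import Counter
    from itertools import combinations

    def reduce_to_pairwise(poly):
        """Reduce a pseudo-Boolean polynomial {frozenset(vars): coeff} to
        degree <= 2 (after [Bor2002]).  Repeatedly substitute the variable
        pair most common among higher-order terms with a fresh ancillary
        variable z; each substitution ((u, v), z) returned must also add
        the AND penalty M*P(u, v; z) to the final objective."""
        poly, subs, n_anc = dict(poly), [], 0
        while True:
            high = [t for t in poly if len(t) > 2]
            if not high:
                return poly, subs
            pairs = Counter(p for t in high for p in combinations(sorted(t), 2))
            (u, v), _ = pairs.most_common(1)[0]
            z = f"z{n_anc}"
            n_anc += 1
            subs.append(((u, v), z))
            reduced = {}
            for term, coeff in poly.items():
                if len(term) > 2 and u in term and v in term:
                    term = (term - {u, v}) | {z}
                reduced[term] = reduced.get(term, 0) + coeff
            poly = reduced

    # x1*x2*x3 + 2*x1*x2*x4: the pair (x1, x2) appears in both cubic terms.
    poly = {frozenset(("x1", "x2", "x3")): 1, frozenset(("x1", "x2", "x4")): 2}
    print(reduce_to_pairwise(poly))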

Constraints to Penalties

Most applications involve constraints on variables which restrict bitstrings to certain feasible configurations. The QPU cannot natively impose constraints; instead, you introduce energy penalties that penalize infeasible configurations. Penalty functions are also a source of higher-order interactions. This section considers approaches to solving optimization problems with constraints.

A finite domain CSP consists of a set of variables, a specification of the domain of each variable, and a specification of the constraints over combinations of the allowed values of the variables. A constraint \(C_\alpha(\vc{x}_\alpha)\) defined over a subset of variables \(\vc{x}_\alpha\) defines the set of feasible and infeasible combinations of \(\vc{x}_\alpha\). The constraint \(C_\alpha\) may be viewed as a predicate which evaluates to true on feasible configurations and to false on infeasible configurations. For example, if the domains of variables \(X_1,X_2,X_3\) are all \(\{0,1,2\}\), and the constraint is \(X_1+X_2<X_3\), then the feasible set is \(\{(0,0,1),(0,0,2),(0,1,2),(1,0,2)\}\), and all remaining combinations are infeasible. The variables involved in a constraint define its scope. Typically, constraint satisfaction problems have many constraints of local scope (involving only a small subset of variables), but there may often be global constraints that couple variables. We indicate the set of constraints as \(\mathcal{C} = \{C_\alpha(\vc{x}_\alpha)\}_{\alpha=1}^{|\mathcal{C}|}\) and we seek a solution \(\vc{x}\) that satisfies all constraints, i.e. \(\bigwedge_{C_\alpha\in\mathcal{C}} C_\alpha(\vc{x})\) is true.
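
For instance, the feasible set of this example constraint can be enumerated directly:

    from itertools import product

    domain = (0, 1, 2)
    feasible = [x for x in product(domain, repeat=3) if x[0] + x[1] < x[2]]
    print(feasible)  # [(0, 0, 1), (0, 0, 2), (0, 1, 2), (1, 0, 2)]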

CSP Conversion by Penalty Functions

The CSP \(\rightarrow\) MIS \(\rightarrow\) QUBO conversion can be expensive, since the resulting conflict graphs can have a great many nodes and edges if each constraint has a large feasible set. Though there are methods to compress the number of needed nodes, an alternative formulation of the CSP is often preferred.

This section follows the same approach as was introduced to reduce higher-order problems to quadratic in Posiforms and Higher-Order Polynomials. For each constraint \(C_\alpha(\vc{x}_\alpha)\) we introduce a quadratic penalty function \(P_\alpha(\vc{x}_\alpha,\vc{a}_\alpha)\). The penalty function is defined so that:

\[\begin{split}\min_{\vc{a}_\alpha} P_\alpha(\vc{x}_\alpha,\vc{a}_\alpha) &= o \qquad \text{for all feasible $\vc{x}_\alpha$} \\ \min_{\vc{a}_\alpha} P_\alpha(\vc{x}_\alpha,\vc{a}_\alpha) &\ge o+1 \qquad \text{for all infeasible $\vc{x}_\alpha$}.\end{split}\]

Thus, the feasible configurations are encoded as global minima of \(P_\alpha\), so \(\argmin_{\vc{x}_\alpha,\vc{a}_\alpha} P_\alpha(\vc{x}_\alpha,\vc{a}_\alpha)\) yields an \(\vc{x}\) for which \(C_\alpha(\vc{x}_\alpha)\) is true. The additional \(\vc{a}_\alpha\) variables are necessary to mimic interactions in \(C_\alpha\) that are beyond pairwise. If we have a penalty function for each constraint, then minimizing the objective:

\[\min_{\vc{x},\vc{a}} P(\vc{x},\vc{a}) = \min_{\vc{x},\vc{a}} \left\{\sum_{\alpha=1}^{|\mathcal{C}|} P_\alpha(\vc{x}_\alpha,\vc{a}_\alpha) \right\}\]

gives a feasible solution \(\vc{x}^\star,\vc{a}^\star\) satisfying all constraints and having objective value \(P(\vc{x}^\star,\vc{a}^\star)=o\) if the original CSP has a feasible solution \(\vc{x}^\star\). If the original CSP is infeasible then \(P(\vc{x}^\star,\vc{a}^\star)\) will be at least \(o+1\).

The utility of this approach depends upon being able to conveniently represent common constraints \(C_\alpha\) as QUBO penalties. There are techniques to automatically construct QUBO penalty functions from specifications of \(C_\alpha\).

Nonlinear Constraints

For penalties whose scope includes a small number of variables (e.g., fewer than 15), we may construct penalty functions from a specification of the feasible and infeasible configurations. Let \(F\) represent a set of feasible configurations, and \(\overline{F}\) represent a set of infeasible configurations. We require \(\min_{\vc{a}} P(\vc{x},\vc{a}) = o\) for \(\mathbf{x}\in F\) and \(\min_{\vc{a}} P(\mathbf{x},\vc{a}) \ge o+1\) for \(\mathbf{x}\in \overline{F}\) for some constant \(o\). Since \(P\) must be quadratic, we write the penalty as:

\[\begin{split}P(\vc{x},\vc{a}) = \begin{bmatrix} \vc{x} & \vc{a} \end{bmatrix} \begin{bmatrix} \vc{Q}^{x,x} & \vc{Q}^{x,a} \\ 0 & \vc{Q}^{a,a} \end{bmatrix} \begin{bmatrix} \vc{x} \\ \vc{a} \end{bmatrix}\end{split}\]

with

\[\vc{Q}^{x,x} = \sum_i w_i^{x,x} \vc{M}_i^{x,x}, \qquad \vc{Q}^{x,a} = \sum_i w_i^{x,a} \vc{M}_i^{x,a}, \qquad \vc{Q}^{a,a} = \sum_i w_i^{a,a} \vc{M}_i^{a,a}.\]

We seek the parameters \(\vc{w} \equiv \{\vc{w}^{x,x},\vc{w}^{x,a},\vc{w}^{a,a}\}\) given the matrices \(\vc{M}\equiv \{\vc{M}^{x,x},\vc{M}^{x,a},\vc{M}^{a,a}\}\). We generalize penalties in this way in order to limit required precision and/or to accommodate connectivity constraints as described by the set of \(\vc{M}\). Thus, we’d like to solve the optimization problem:

\[\begin{split}\min_{\vc{w},o} \|\vc{w}\|_1 \\ \text{subject to: } \langle \vc{w}^{x,x}, \vc{c}^{x,x}(j) \rangle + \min_k \left\{ \langle \vc{w}^{x,a}, \vc{c}^{x,a}_k(j) \rangle + \langle \vc{w}^{a,a}, \vc{c}^{a,a}_k\rangle \right\} = o \;\;\; \forall j\in F \\ \langle \vc{w}^{x,x}, \vc{c}^{x,x}(\overline{j}) \rangle + \min_k \left\{ \langle \vc{w}^{x,a}, \vc{c}^{x,a}_k(\overline{j}) \rangle + \langle \vc{w}^{a,a}, \vc{c}^{a,a}_k\rangle \right\} \ge o+1 \;\; \;\forall \overline{j}\in \overline{F}\end{split}\]

where \(k\) ranges over the allowed configurations \(\vc{a}_k\) of \(\vc{a}\), and the \(\vc{c}\) vectors have components indexed by \(j\) given by:

\[\bigl[\vc{c}^{x,x}(j)\bigr]_i = \vc{x}_j^\top \vc{M}^{x,x}_i \vc{x}_j, \qquad \bigl[\vc{c}_k^{x,a}(j) \bigr]_i = \vc{x}_j^\top \vc{M}_i^{x,a} \vc{a}_k, \qquad \bigl[\vc{c}_k^{a,a} \bigr]_i = \vc{a}_k^\top \vc{M}^{a,a}_i \vc{a}_k.\]

We have also introduced an unknown \(o\) representing the objective value of feasible configurations. This unknown value need not be equal to 0. The notation \(j\in F\) and \(\overline{j}\in \overline{F}\) indicates feasible and infeasible configurations respectively. Minimizing the \(L_1\) norm forces penalty terms to be close to 0, and aids in limiting the required precision. The second constraint can be expressed more simply to give:

\[\begin{split}\min_{\vc{w},o} \|\vc{w}\|_1 \\ \text{subject to: } \langle \vc{w}^{x,x}, \vc{c}^{x,x}(j) \rangle + \min_k \left\{ \langle \vc{w}^{x,a}, \vc{c}^{x,a}_k(j) \rangle + \langle \vc{w}^{a,a}, \vc{c}^{a,a}_k\rangle \right\} = o \;\;\; \forall j\in F \\ \langle \vc{w}^{x,x}, \vc{c}^{x,x}(\overline{j}) \rangle + \langle \vc{w}^{x,a}, \vc{c}^{x,a}_k(\overline{j}) \rangle + \langle \vc{w}^{a,a}, \vc{c}^{a,a}_k\rangle \ge o+1 \;\; \;\forall \overline{j}\in \overline{F}, \forall k.\end{split}\]

The min in the first constraint makes this a difficult optimization problem. One approach to its solution is to express it as a mixed integer program and use a mixed integer solver.

We introduce Boolean indicator variables \(\{\alpha_k(j)\}\) for each \(1 \le j \le |F|\) and for each allowed configuration \(\vc{a}_k\). Define the vector \(\vc{\alpha}(j)\) whose components are the \(\alpha_k(j)\) for the different \(k\). The indicator variable \(\alpha_k(j)\) picks out the single \(k\) at which the \(\min_k\) in constraint 1 is attained (if multiple \(k\) attain the minimal value, we pick one of these). Thus we have:

\[\begin{split}\langle \vc{1}, \vc{\alpha}(j)\rangle = 1 \quad \forall j\in F \\ \sum_{k'} \bigl\{ \langle \alpha_{k'}(j) \vc{w}^{x,a}, \vc{c}^{x,a}_{k'}(j) \rangle + \langle \alpha_{k'}(j) \vc{w}^{a,a}, \vc{c}^{a,a}_{k'} \rangle \bigr\} \le \langle \vc{w}^{x,a}, \vc{c}^{x,a}_k(j) \rangle + \langle \vc{w}^{a,a}, \vc{c}^{a,a}_k\rangle \quad \forall k,j.\end{split}\]

Unfortunately, this constraint couples \(\alpha_k(j)\) to \(\vc{w}^{x,a}\) and \(\vc{w}^{a,a}\).[5] To break this coupling and obtain a linear problem, we introduce \(\vc{v}_k^{x,a}(j) = \alpha_k(j) \vc{w}^{x,a}\) and \(\vc{v}_k^{a,a}(j) = \alpha_k(j) \vc{w}^{a,a}\). These requirements are enforced with:

\[\begin{split}\vc{v}_k^{x,a}(j) &\le \alpha_k(j) M \\ \vc{v}_k^{a,a}(j) &\le \alpha_k(j) M \\ -\vc{v}_k^{x,a}(j) &\le \alpha_k(j) M \\ -\vc{v}_k^{a,a}(j) &\le \alpha_k(j) M \\ \sum_k \vc{v}_k^{x,a}(j) &= \vc{w}^{x,a} \\ \sum_k \vc{v}_k^{a,a}(j) &= \vc{w}^{a,a}.\end{split}\]

The first four constraints require that \(\vc{v}_k^{x,a}(j)=0\) and \(\vc{v}_k^{a,a}(j)=0\) whenever \(\alpha_k(j)=0\). The penalty weight \(M\) should be chosen to be larger than the largest absolute value of the optimal values of \(\vc{w}^{x,a}\) and \(\vc{w}^{a,a}\) so that we do not eliminate any solutions.

The resulting mixed integer program can then be solved with any MIP solver. If there is no feasible solution for a given number of ancillary variables or specified connectivity, then these requirements must be relaxed by introducing additional ancillary variables or additional connectivity.

Even if a penalty can be derived through conversion to a linear equality or inequality, it may still be useful to apply the above penalty-deriving machinery to construct a penalty with fewer ancillary variables. For example, the constraint \(x_1\vee x_2\vee x_3\) (which is useful if we’re solving 3-SAT) can always be expressed as the inequality \(x_1+x_2+x_3\ge 1\), and converted to the penalty function \((x_1+x_2+x_3 - 1 -a_1-2a_2)^2\). The two ancillary variables \(a_1\) and \(a_2\) are necessary to represent the possible slack values 0, 1, and 2. However, the following penalty function

\[-3+ (\overline{x}_1+\overline{x}_2+\overline{x}_3)(1+a) + \overline{a} + x_1 x_2 + x_2 x_3 + x_3 x_1\]

represents the same constraint using a single ancillary variable[6].

[5]CPLEX and other commercial grade optimization software can solve problems with quadratic constraints so it may be used to directly address this formulation.
[6]In fact, it is this representation which is commonly used to reduce 3-SAT to MAX-2-SAT.
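
Both properties of this single-ancilla penalty are easily checked by brute force; the following sketch verifies that the minimum over \(a\) is 0 exactly when \(x_1\vee x_2\vee x_3\) holds, and at least 1 otherwise:

    from itertools import product

    def penalty(x1, x2, x3, a):
        """Single-ancilla penalty for the clause x1 v x2 v x3."""
        nx1, nx2, nx3, na = 1 - x1, 1 - x2, 1 - x3, 1 - a
        return (-3 + (nx1 + nx2 + nx3) * (1 + a) + na
                + x1 * x2 + x2 * x3 + x3 * x1)

    for x1, x2, x3 in product((0, 1), repeat=3):
        best = min(penalty(x1, x2, x3, a) for a in (0, 1))
        if x1 or x2 or x3:
            assert best == 0   # satisfied clause: minimum over a is 0
        else:
            assert best >= 1   # violated clause: gap of at least 1
    print("single-ancilla OR penalty verified")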

Problem Decomposition

To solve a problem with more variables than the available number of qubits, break the problem into subproblems, solve the subproblems, and then reconstruct an answer to the original problem from the subproblem solutions. Divide-and-conquer and dynamic programming algorithms have a rich history in computer science for problems of this type and can be adapted to map problems onto the D-Wave QPU.

Conditioning

The approaches here rely on conditioning (or assigning) a subset of variables to particular values and solving the problem that remains. If \(A\) indicates a set of variables that are unconditioned, and \(\setminus A\) indicates the variables conditioned to certain values, then an Ising objective \(E(\vc{s})\) can be minimized in two stages:

\[\min_{\vc{s}} E(\vc{s}) = \min_{\vc{s}_A,\vc{s}_{\setminus A}} E(\vc{s}_A,\vc{s}_{\setminus A}) = \min_{\vc{s}_{\setminus A}} \mathcal{E}(\vc{s}_{\setminus A})\]

where

\[\mathcal{E}(\vc{s}_{\setminus A})=\min_{\vc{s}_A} E(\vc{s}_A,\vc{s}_{\setminus A}) = E\bigl(\vc{s}_A^\star(\vc{s}_{\setminus A}), \vc{s}_{\setminus A} \bigr).\]

Here, \(\vc{s}_A^\star(\vc{s}_{\setminus A})=\argmin_{\vc{s}_A} E(\vc{s}_A,\vc{s}_{\setminus A})\) is the optimal setting of \(\vc{s}_A\) for a given \(\vc{s}_{\setminus A}\). Conditioning makes the \(\min_{\vc{s}_A}\) problem smaller and simpler than the original problem. The graph of this conditioned problem over the vertices \(\vc{s}_A\) of \(E(\vc{s}_A,\vc{s}_{\setminus A})\) is formed from the graph of the original problem by removing all conditioned variables and all edges attached to a conditioned variable. Of course, you still must identify the lowest energy setting of the conditioned variables.

The outer loop optimization determining the conditioned variable settings is the \(\min_{\vc{s}_{\setminus A}} \mathcal{E}(\vc{s}_{\setminus A})\) problem. The graph for the \(\mathcal{E}(\vc{s}_{\setminus A})\) problem has connections between variables in \(\vc{s}_{\setminus A}\) which have been induced through the elimination (minimization) of the \(\vc{s}_A\) variables; that is, through the dependence on \(\vc{s}_A^\star(\vc{s}_{\setminus A})\). In summary, conditioning casts the \(\min_{\vc{s}} E(\vc{s})\) problem in terms of the smaller problem \(\min_{\vc{s}_{\setminus A}} \mathcal{E}(\vc{s}_{\setminus A})\) where \(\mathcal{E}(\vc{s}_{\setminus A})\) can be evaluated at any \(\vc{s}_{\setminus A}\) using the D-Wave system. The outer optimization over \(\vc{s}_{\setminus A}\) is carried out using conventional software algorithms.

In certain situations, techniques that infer values for the unconditioned variables based on side constraints or bounding information can further reduce the smaller \(\min_{\vc{s}_{A}}\) subproblem. Different choices for the conditioning sets \(A\) (which may vary during the course of the algorithm) and different inference techniques give different algorithms. Most stochastic local-search algorithms rely entirely on conditioning and use no inference. In contrast, complete search methods like Branch-and-Bound Algorithms rely heavily on inference to prune their search spaces.

Cutset Conditioning

Cutset conditioning is a simple decomposition method that chooses a subset of variables \(A\) (which remain fixed throughout the course of the algorithm) so that the resulting subproblem \(\min_{\vc{s}_A} E(\vc{s}_A,\vc{s}_{\setminus A})\) defined in Conditioning decomposes into a set of disjoint subproblems.[7] In this case, the graph of \(E(\vc{s}_A,\vc{s}_{\setminus A})\) breaks into separate components indexed by \(\alpha\) which can be solved independently. Formally:

[7]While it is always possible to find a subset \(A\), the subset might be large for highly connected problems.
\[\begin{split}\mathcal{E}(\vc{s}_{\setminus A}) &= \min_{\vc{s}_A} E(\vc{s}_A,\vc{s}_{\setminus A}) \\ &= \min_{\vc{s}_A} \left\{ E_{\setminus A}(\vc{s}_{\setminus A}) + \sum_\alpha E_\alpha(\vc{s}_{\setminus A},\vc{s}_{A_\alpha}) \right\} \\ &= E_{\setminus A}(\vc{s}_{\setminus A}) + \sum_\alpha \min_{\vc{s}_{A_\alpha}} E_\alpha(\vc{s}_{\setminus A},\vc{s}_{A_\alpha}) \\ &= E_{\setminus A}(\vc{s}_{\setminus A}) + \sum_\alpha \mathcal{E}_\alpha(\vc{s}_{\setminus A})\end{split}\]

where \(\vc{s}_A = \bigcup_\alpha \vc{s}_{A_\alpha}\) and \(\vc{s}_{A_\alpha} \cap \vc{s}_{A_{\alpha'}}=\emptyset\) for \(\alpha \not = \alpha'\). Here, \(\alpha\) labels the disconnected components that remain once the conditioned variables have been fixed to \(\vc{s}_{\setminus A}\). With a proper choice of the conditioning variables, the graph is cut into disjoint partitions that are optimized with respect to \(\vc{s}_{A_\alpha}\) independently. The set of conditioned variables that cuts the original problem into pieces is called the cutset. Choose a cutset such that each of the remaining \(\min_{\vc{s}_{A_\alpha}}\) problems is small enough to be solved on the D-Wave system. A small example is illustrated in Figure 46.

Fig. 46 8-variable graph, before (left) and after (right) conditioning. If the graph is conditioned on variable \(s_1\), it decomposes into 3 smaller subproblems. The remaining variable sets \(\vc{s}_A=\{s_2,s_3\}\cup\{s_4,s_5,s_6\}\cup \{s_7,s_8\}\) define three decoupled subproblems.

The outer optimization problem that determines optimal values for the cutset variables \(\vc{s}_{\setminus A}\) is carried out with a classical heuristic, perhaps greedy local search, tabu search, or simulated annealing. Alternatively, a complete search method may be used to find the global minimum of \(\mathcal{E}(\vc{s}_{\setminus A})\). Note, however, that the inner optimizations over \(\vc{s}_A\) come with no proof of optimality.

To simplify outer optimization, make the number of conditioned variables as small as possible. Determining the smallest cutset is NP-hard in general, so employ fast heuristics.
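
The following sketch illustrates the idea on a problem shaped like Figure 46 (the couplings are hypothetical, chosen only so that conditioning on \(s_1\) disconnects the rest): for each value of \(s_1\), the three remaining components are minimized independently, with exhaustive search standing in for the QPU:

    from itertools import product

    # Hypothetical couplings for an 8-variable problem of the shape in
    # Fig. 46: removing s1 disconnects three components.
    J = {(1, 2): 1.0, (1, 4): -1.0, (1, 7): 1.0,   # edges cut by conditioning s1
         (2, 3): -1.0, (4, 5): 1.0, (5, 6): -1.0, (7, 8): 1.0}
    components = [(2, 3), (4, 5, 6), (7, 8)]

    def component_min(comp, fixed):
        """Brute-force minimum over one component given the conditioned
        spins (a stand-in for solving the subproblem on the QPU)."""
        def energy(vals):
            s = {**fixed, **dict(zip(comp, vals))}
            return sum(c * s[i] * s[j] for (i, j), c in J.items()
                       if i in s and j in s and (i in comp or j in comp))
        return min(energy(v) for v in product((-1, 1), repeat=len(comp)))

    best = min((sum(component_min(c, {1: s1}) for c in components), s1)
               for s1 in (-1, 1))
    print("minimum energy and s1:", best)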

Alternatively, determine the set \(\mathbf{s}_{\setminus A}\) of conditioned variables so that the resulting graph—after the conditioned variables and their edges are removed—is a tree or a collection of disconnected trees (see [Dec1987]), because tree-structured problems can be optimized in polynomial time. The cutsets needed to form trees are typically larger than those needed to form Chimera-structured subproblems, so there is a significant advantage in solving the subproblems on the D-Wave system instead. An effective way to find a small set of vertex separators is to apply a graph-partitioning heuristic like METIS,[8] and then build a cutset of vertex separators by selecting a vertex cover from the edges that span partitions.

[8]
METIS is a set of programs for partitioning graphs and finite-element meshes, and for producing fill-reducing orderings for sparse matrices.

Branch-and-Bound Algorithms

Branch-and-bound algorithms progressively condition more and more variables to particular values. For simplicity, assume that spins \(s_i\) are conditioned in index order \(i=1,2,3,\ldots\). At some point in the search, variable \(s_i\) must be conditioned to either \(s_i=-1\) or \(s_i=1\). This possibility defines a split at node \(s_i\). At this node, variables \(s_1,\ldots,s_{i-1}\) have been conditioned to values and the others remain free. Additional splits on free variables define a branching binary tree of possibilities. The leaves of this tree define the \(2^N\) configurations where all variables have been assigned values.

At each node in the traversal of this tree, a branch-and-bound algorithm decides whether to prune the tree at this node, effectively skipping exploration of the subtrees below it. A branch can be pruned if it can be shown that it is not possible for any leaf node below it to contain the global optimum. This pruning is accomplished by maintaining upper and lower bounds on the global minimum of \(E(\vc{s})\) as the tree traversal is carried out. At any point in the search, the upper bound is the smallest value \(E(\vc{s})\) that has been seen at the visited leaves. The lower bound at a node in the search is calculated as a lower bound on \(\min_{\vc{s}_A} E(\vc{s}_A,\vc{s}_{\setminus A})\) where \(\vc{s}_A\) is the set of as-yet-unconditioned variables. If the lower bound at a given node exceeds the current upper bound, then there is no need to search below this node, since no leaf of any of its subtrees can be optimal. The efficacy of this pruning is determined both by the tightness of the lower bound approximation and by the ordering in which variables are conditioned.
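
A minimal sketch of such a search follows (names ours). It uses a simple lower bound: terms whose variables are all conditioned are evaluated exactly, and every term containing a free variable is bounded below by minus the magnitude of its coefficient. Brute-force leaf evaluation stands in where a D-Wave completion could be substituted:

    def ising_branch_and_bound(h, J):
        """Minimize sum_i h_i s_i + sum_{i<j} J_ij s_i s_j over s in {-1,1}^N.
        Bound at a node: fully conditioned terms are evaluated exactly;
        every term containing a free variable contributes -|coefficient|."""
        order = sorted(h)
        best = {"energy": float("inf"), "spins": None}

        def bound(fixed):
            lb = sum(h[i] * fixed[i] if i in fixed else -abs(h[i]) for i in h)
            for (i, j), c in J.items():
                lb += (c * fixed[i] * fixed[j]
                       if i in fixed and j in fixed else -abs(c))
            return lb

        def recurse(depth, fixed):
            if bound(fixed) >= best["energy"]:
                return                     # prune: no leaf below can improve
            if depth == len(order):        # leaf: bound is the exact energy
                best["energy"], best["spins"] = bound(fixed), dict(fixed)
                return
            for val in (-1, 1):
                fixed[order[depth]] = val
                recurse(depth + 1, fixed)
                del fixed[order[depth]]

        recurse(0, {})
        return best["energy"], best["spins"]

    h = {1: 0.5, 2: -1.0, 3: 0.25}
    J = {(1, 2): 1.0, (2, 3): -1.0}
    print(ising_branch_and_bound(h, J))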

Branch-and-bound methods can benefit from the quantum annealing solver by terminating searches higher in the tree. Once enough variables are conditioned that the remaining unconditioned variables can be optimized by the D-Wave system, there is no need to explore deeper—simply call the D-Wave system to estimate the best completion from that node. As the upper bound is lowered by subsequent D-Wave system completions, this may in turn allow for further pruning. Again, since the D-Wave system does not come with a proof of optimality, this algorithm may not return a global minimum.

Another way to incorporate the D-Wave system is to use quantum annealing to provide tight lower-bound functions at any node in the search tree. Lagrangian Relaxation for Finding Lower bounds discussed below is one approach.

During the branch-and-bound process, you can also take advantage of cutset decompositions. As the search moves between nodes in the search tree, the graph for the subproblem that must be solved changes. In particular, the vertices in the graph that correspond to the fixed variables at the search tree node are removed along with all edges connected to these vertices. The graph may disconnect at some nodes in the search tree, in which case you solve multiple smaller subproblems. These subproblems and their minimizers may be cached so that, when these same subproblems are generated at other nodes, the results can be accessed. See [Mar2007], which considers other ways to explore the search tree, including dynamic variable orderings and best-first orderings. All of these improvements may also be applied in this setting. See also [Ros2016] on branch-and-bound heuristics in the context of the D-Wave Chimera architecture.

Lagrangian Relaxation for Finding Lower bounds

Lagrangian relaxation relies on a different simplification of the graph from that shown in Figure 46 and yields a lower bound on the global minimum objective value. In many cases, the bound can be made quite tight. Consider a node in the graph representing a variable \(s_i\). Divide the node in two, defining \(s_i^{(1)}\) and \(s_i^{(2)}\), and add the constraint that \(s_i^{(1)} = s_i^{(2)}\). This leaves the problem unchanged. The original objective \(E(s_i,\vc{s}_{\setminus i})\) is modified to \(E'(s_i^{(1)}, s_i^{(2)},\vc{s}_{\setminus i})\); to ensure that these two objectives give the same value when the \(s_i^{(1)} = s_i^{(2)}\) constraint is satisfied, \(h_{i} = h_{i}^{(1)} + h_{i}^{(2)}\) is required. If you ignore the equality constraints among divided variables, then with sufficiently many divided variables, the modified objective \(E'\) decomposes into smaller independent problems. The advantage of dividing variable \(s_i\) comes when the equality constraint is softened and treated approximately.

Introducing a multiplier \(\lambda_i\) for the equality constraint, the Lagrangian for the constrained problem is

\[L(s_i^{(1)},s_i^{(2)},\vc{s}_{\setminus i}; \lambda_i) = E'(s_i^{(1)},s_i^{(2)},\mathbf{s}_{\setminus i}) + \lambda_i(s_i^{(1) }- s_i^{(2)}).\]

Minimizing this Lagrangian for any fixed value of \(\lambda_i\) provides a lower bound to \(\min_{\vc{s}} E(\vc{s})\). The dual function is defined as

\[g(\lambda_i) = \min_{s_i^{(1)},s_i^{(2)},\vc{s}_{\setminus i}} L(s_i^{(1)},s_i^{(2)},\vc{s}_{\setminus i}; \lambda_i),\]

and maximizing the dual with respect to \(\lambda_i\) provides the tightest possible lower bound.

Fig. 47 8-variable graph, before (left) and after (right) splitting.

As an example, consider again the small 8-variable problem, shown on the left side of Figure 47. The Lagrangian-relaxed version of the problem obtained by dividing variable \(s_1\) is shown on the right. The constraint \(s_1^{(1)}=s_1^{(2)}\) is treated softly, giving two independent subproblems consisting of variable sets \(\{s_1^{(1)}, s_2, s_3, s_7, s_8\}\) and \(\{s_1^{(2)}, s_4, s_5, s_6\}\). If either subproblem is still too large, it can be decomposed further, either through another variable division or through conditioning.

One promising approach is to introduce as many divided variables as needed to generate subproblems small enough to be solved with the current number of qubits. Each subproblem is solved to give a dual function, which is then maximized using a subgradient[9] method to provide the tightest possible lower bounds.

[9][Boy2007] provides a concise introduction to subgradient methods, and [Joh2007] an alternative using a smooth approximation to the dual function.
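
The following sketch runs subgradient ascent on the dual for a toy problem split on \(s_1\) (the two subproblem energies are invented for illustration, and exhaustive search stands in for solving each subproblem):

    from itertools import product

    def brute_min(f, n):
        """Exhaustive minimization over n spins (a stand-in for one
        subproblem solve)."""
        return min(((f(s), s) for s in product((-1, 1), repeat=n)),
                   key=lambda t: t[0])

    # Hypothetical subproblems from dividing s1: copy s1_(1) lives in E1,
    # copy s1_(2) lives in E2 (spin index 0 of each tuple).
    E1 = lambda s: s[0] * s[1] - s[1]
    E2 = lambda s: -s[0] * s[1] - 0.5 * s[1]

    lam, step = 0.0, 0.5
    for it in range(30):
        g1, m1 = brute_min(lambda s: E1(s) + lam * s[0], 2)
        g2, m2 = brute_min(lambda s: E2(s) - lam * s[0], 2)
        g = g1 + g2                   # dual value: a lower bound on the optimum
        subgrad = m1[0] - m2[0]       # subgradient of g at the minimizers
        if subgrad == 0:              # the two copies agree; stop ascending
            break
        lam += step / (1 + it) * subgrad   # diminishing-step ascent on g
    print("lower bound:", g, "at lambda =", lam)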

Because a particular anneal may not produce the global minimum, the lower bounds returned by Lagrangian relaxation may be too high. If so, branch-and-bound may mistakenly prune branches containing the global minimum. However, viewing this Lagrangian-relaxed branch-and-bound algorithm as a heuristic, expect good results if the quantum annealing process is effective at locating low-energy states.

Large-Neighborhood Local Search Algorithms

Local search algorithms improve upon a candidate solution, \(\vc{s}^t\), available at iteration \(t\) by searching for better solutions within some local neighborhood of \(\vc{s}^t\). Most commonly, the neighborhood for bitstrings is the Hamming neighborhood consisting of all bitstrings that differ from \(\vc{s}^t\) in a single bit. By adopting the improvement, if any, found within the neighborhood as the next configuration, the search progressively finds better and better solutions. The search terminates when it finds no better solutions in the neighborhood. Such configurations are locally optimal with respect to the neighborhood. If the search gets trapped in a local optimum (where all neighboring solutions are worse), additional heuristics may be applied to enable escape and continued exploration of the search space.

Quantum annealing can be very simply combined with local search to allow the local search algorithm to explore much larger neighborhoods than the 1-bit-flip Hamming neighborhood. As the size of the neighborhood increases, the quality of the local minima improves. Consider a problem of \(N\) variables, and define the neighborhood around configuration \(\vc{s}^t\) as all states within Hamming distance \(d\) (with \(d<N\)) of \(\vc{s}^t\). Searching within this neighborhood for a configuration \(\vc{s}^{t+1}\), having energy lower than \(E(\vc{s}^t)\), is a smaller optimization problem than the original; however, it still cannot be directly carried out by quantum annealing hardware. Instead, choose one of the \(\binom{N}{d}\) subsets of \(d\) variables. Call the selected subset \(A\) and note that \(|A|=d\). Having selected \(A\), determine the best setting for these \(\vc{s}_A\) variables given the fixed context of the conditioned variables \(\vc{s}^{t}_{\setminus A}\). Select \(d\) such that the smaller \(d\)-variable problem can be embedded within the D-Wave QPU. If no improvement is found within the chosen subset, another subset of size \(d\) may be selected.

This algorithm stagnates at a configuration that is a local optimum with respect to all \(d\)-variable spin flips. However, if \(d\) and \(N\) are both large, many size-\(d\) subsets around each configuration \(\vc{s}^t\) exist, so trapping in \(d\)-optimal local minima rarely occurs, but progress may be slow as mainly unpromising neighborhoods are explored. It is possible to infer variable subsets that need not be considered, and heuristics may be defined that generate promising subsets[10].

[10][Liu2005] presents promising results for even small neighborhoods of size \(d\le 4\). With the D-Wave QPU, much larger neighborhoods can be considered.
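
A sketch of this large-neighborhood search follows (names ours); exhaustive search over the \(2^d\) completions of the freed subset stands in for a QPU call on the conditioned subproblem:

    import random
    from itertools import product

    def energy(s, h, J):
        return (sum(h[i] * s[i] for i in h)
                + sum(c * s[i] * s[j] for (i, j), c in J.items()))

    def large_neighborhood_search(h, J, d=3, iters=200, seed=0):
        """Local search over Hamming-d neighborhoods: repeatedly free a
        random subset A of d spins and set them optimally given the rest."""
        rng = random.Random(seed)
        variables = sorted(h)
        s = {i: rng.choice((-1, 1)) for i in variables}
        for _ in range(iters):
            A = rng.sample(variables, d)
            best = min(product((-1, 1), repeat=d),
                       key=lambda vals: energy({**s, **dict(zip(A, vals))},
                                               h, J))
            s.update(zip(A, best))
        return s, energy(s, h, J)

    h = {i: 0.1 * (-1) ** i for i in range(8)}
    J = {(i, i + 1): -1.0 for i in range(7)}  # a ferromagnetic chain
    print(large_neighborhood_search(h, J))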

Embedding

The D-Wave QPU minimizes the energy of an Ising spin configuration whose pairwise interactions lie on the edges of an \(M\times N\times L\) Chimera graph. To solve an Ising spin problem with arbitrary pairwise interaction structure, the corresponding graph must be minor-embedded into a Chimera working graph.

An Ising optimization problem is defined on an arbitrary graph containing \(N\) nodes (variables) and \(M\) edges. The working graph in the D-Wave system corresponds to a graph with fixed connectivity—at most 6 edges per node—on which you can mimic arbitrary connectivity by using qubits to represent edges as well as variables.

In Lagrangian relaxation, an optimization variable \(s_i\) is split into two variables, \(s_i^{(1)}\) and \(s_i^{(2)}\), and the constraint \(s_i^{(1)}=s_i^{(2)}\) added. A similar technique can be used to map an optimization variable \(s_i\) onto a set of one or more qubits \(\{q_i^{(1)}, \cdots, q_i^{(k)}\}\). Because all qubits (assumed to be \(\pm 1\) valued) represent the same problem variable, impose the constraint \(q_i^{(j)} = q_i^{(j')}\) for all pairs \((j,j')\) occurring as edges in a spanning tree across the qubits. The equality constraint \(q_i^{(j)} = q_i^{(j')}\) can be encoded as the Ising penalty:

\[-M q_i^{(j)} q_i^{(j')}\]

where \(M>0\) is the weight of the penalty. If \(M\) is sufficiently large, the lowest energy state in the full problem always has \(q_i^{(j)} = q_i^{(j')}\) because that feasible assignment is \(2M\) lower in energy than the infeasible assignment \(q_i^{(j)} \not= q_i^{(j')}\).

In this way, strings of qubits are related to each other to create chains that can connect arbitrary vertices. To create these chain-like connectors, make the penalty weights large enough that low-energy configurations do not violate the equality constraints. Balancing this, use the smallest penalty weight that enforces the constraints, to prevent precision issues and to foster efficient exploration of the search space. An iterative procedure, which incrementally updates weights until the equality constraints are satisfied, is effective.

Fig. 48 A qubit chain used for connectivity.

For example, if qubits \(q_4\) and \(q_{32}\) represent problem variables \(s_1\) and \(s_2\) respectively and an \((s_1,s_2)\) edge in the problem graph exists, then this coupling can be realized by representing problem variable \(s_1\) with the string of qubits \(\{q_4,q_{12},q_8\}\), as shown in Figure 48. Edge weights of strength \(-M\) couple qubits along the bold black edges. The string of qubits representing \(s_1\) is now connected to \(s_2\) through the blue edge. Note also that mapping a single optimization variable to a connected set of qubits in the working graph mimics higher degrees of connectivity than are present in the graph. For example, variable \(s_1\) represented by qubits \(\{q_4,q_{12},q_8\}\) can be connected to up to 12 other qubits and thus up to potentially 12 other optimization variables (if each qubit represents a distinct optimization variable). Connectivity is effectively doubled in this case.
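
As a sketch of this chain construction (helper names ours; Ocean tools such as minorminer automate chain selection in practice), the following code spreads each bias \(h_i\) over its chain, couples consecutive chain qubits with \(-M\), and places each problem coupling on an edge between chains:

    def embed_ising(h, J, embedding, chain_strength):
        """Map a source Ising problem onto qubit chains.  embedding maps
        each variable to an ordered list of qubits assumed to form a path
        in the working graph.  Each h_i is spread evenly over its chain,
        and each chain edge receives the ferromagnetic penalty -M q q'."""
        h_q, J_q = {}, {}
        for v, chain in embedding.items():
            for q in chain:
                h_q[q] = h[v] / len(chain)          # spread the bias
            for q1, q2 in zip(chain, chain[1:]):
                J_q[(q1, q2)] = -chain_strength     # equality penalty -M q q'
        for (u, v), c in J.items():
            # Assumes the first qubits of the two chains share a coupler;
            # a real embedder picks an actual available edge.
            J_q[(embedding[u][0], embedding[v][0])] = c
        return h_q, J_q

    # Fig. 48 example: s1 on the chain {q8, q12, q4}, s2 on q32, with the
    # (q8, q32) coupler carrying the problem edge.
    h = {"s1": 0.5, "s2": -0.3}
    J = {("s1", "s2"): 1.0}
    embedding = {"s1": [8, 12, 4], "s2": [32]}
    print(embed_ising(h, J, embedding, chain_strength=2.0))

Here chain_strength plays the role of \(M\); the iterative weight-update procedure mentioned above can be layered on top of this construction.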

Formalizing Minor Embeddings

This section considers the general problem of solving an arbitrary problem on the working graph in light of the variable-to-qubit mapping observation above.

A minor of a graph is formed from a sequence of edge contractions on the graph. An edge contraction is the removal of an edge \((v_1,v_2)\) and fusion of the vertices \(v_1\) and \(v_2\), combining them into one. If the sets of neighboring vertices of \(v_1\) and \(v_2\) in the original graph are \(\mathcal{N}_1\) and \(\mathcal{N}_2\) then the neighbors of the fused node are \((\mathcal{N}_1\cup \mathcal{N}_2)\setminus \{v_1,v_2\}\). Given this notation, if \(G\) is a general graph representing an Ising problem, it can be minor embedded into Chimera if it is isomorphic to a graph minor of the Chimera working graph.

To more formally specify the embedding problem, let \(G=(V,E)\) be a graph for a problem. Fundamentally, there are two requirements on embedding:

  • A given vertex \(v \in V\) must be mapped to a connected set of qubits \(q\in\text{Chimera}\).
  • All edges \((v_1,v_2)\in E\) must be mapped to at least one edge in the working graph.

To satisfy the first constraint, posit an unknown table \(\text{EmbedDist}(v,q,d)\), which holds when vertex \(v\) is mapped to qubit \(q\) at distance \(d\) from a reference qubit also representing \(v\). Additionally, define \(\text{Embed}(v,q) \leftrightarrow \exists d \, \text{EmbedDist}(v,q,d)\). The constraints[11] on these tables are then:

[11]This constraint specification is executable, and solvers are available that can solve for the unknown tables \(\text{EmbedDist}(v,q,d)\) and \(\text{Embed}(v,q)\) from the constraint specification.
  1. \(\forall v,q,d \, \text{EmbedDist}(v,q,d) \; \rightarrow \; \bigl(d=0\bigr) \vee \bigl(\exists v_1,q_1 \, \text{Chimera}(q,q_1) \wedge \text{EmbedDist}(v_1,q_1,d-1)\bigr)\).
  2. \(\forall v_1,v_2 \, E(v_1,v_2) \; \rightarrow \; \exists q_1,q_2 \, \text{Chimera}(q_1,q_2) \wedge \text{Embed}(v_1,q_1) \wedge \text{Embed}(v_2,q_2)\).

The predicate \(\text{Chimera}(q_1,q_2)\) is true if and only if there is an edge between qubits \(q_1\) and \(q_2\). Similarly, \(E(v_1,v_2)\) is true if and only if there is an edge in the graph between problem variables \(v_1\) and \(v_2\). In addition to these two constraints, there are the following bookkeeping constraints:

  1. Every \(v\) is mapped: \(\forall v \exists q \, \text{Embed}(v,q)\).
  2. At most one \(v\) is mapped to a given \(q\): \(\forall v,q \, \text{Embed}(v,q) \rightarrow \neg \exists v_1 \, (v_1\not= v) \wedge \text{Embed}(v_1,q)\). The second bookkeeping constraint can also be specified as \(\forall q \, \text{COUNT}\bigl(v,1,\text{Embed}(v,q)\bigr)\le 1\). [12]
  3. There is 1 root for each \(v\): \(\forall v \, \text{COUNT}(q,1,\text{EmbedDist}(v,q,0))=1\).
[12]The aggregate operator \(\text{COUNT}(v,1,\phi(v))\) counts the values in the domain of \(v\) for which predicate \(\phi(v)\) is true.

This specification does not minimize the total number of qubits used. However, when quantifying over \(d\) (the size of the connected graphs representing problem variables), specify a maximal value \(D\) so that \(0 \le d \le D\). In this way, the size of the embeddings may be somewhat controlled. For graphs specified through \(E(v_1,v_2)\), however, the problem may be infeasible if \(D\) is too small.

Note

These techniques help solve arbitrary Ising problems by transforming the problem to an alternative form that preserves the low-energy states. In particular, qubits are introduced and constraints added that force sets of qubits to take identical values. These transformations may affect the efficacy of quantum annealing.

Embedding Complete Graphs

Because a complete graph on \(V\) vertices (all vertices connected to all other vertices) can be used to represent any Ising problem defined on \(V\) variables, consider what is the largest complete graph \(K_V\) that is a minor of an \(M\times N\times L\) graph. Knowing this complete minor allows for immediate embedding of all problems of \(V\) or fewer variables.

The largest complete minor has \(V=1+L\min(M,N)\) vertices.[13]

A graph minor for a small \(2\times2\times4\) graph is shown in Figure 49.

A similar result for a \(4\times 4 \times L\) graph is in Figure 50. If Figure 50 were expanded, the top left corner comprising the A and B blocks alone would appear as in Figure 49, for \(L=4\).

Fig. 49 Embedding of \(K_9\) into \(2\times 2\times 4\); A and B indicate two different blocks of 4 variables. Equivalently labeled qubits are connected by edge penalties.

Fig. 50 Condensed representation of the complete graph of size \(K_{4L+1}\) on \(4\times 4\times L\). Each block A,B,C,D consists of \(L\) qubits, and the thick black edges ensure that consistent qubit values are obtained in similarly labeled blocks. Each block is connected to all other blocks, and to a common qubit not drawn in the upper triangular part of the lattice.

Connectivity is addressed at the cost of larger lattices. Note also that specific connectivity requirements may have problem-specific embeddings that make more effective use of qubits than using Chimera as the complete graph on \(V\) vertices.

[13]That this is the largest complete minor follows immediately from the tree-width of Chimera being \(L\min(M,N)\).

Finding Better Embeddings

In some cases, the exact form of a problem is malleable. One important example arises in machine learning problems. The true objective in predictive modeling is to minimize the error on future observations so that they are accurately predicted. However, without knowing what points will be observed in the future, there is little choice but to minimize the errors on a training set of observed examples. Minimizing training set error runs the risk of over-fitting and additional terms favoring “simpler” models over complex ones are typically added to the objective. Thus, there is latitude in the precise objective. In such cases, the embedding method can be simplified.

Let \(G=(V,E)\) represent the Chimera graph. Embed an Ising objective of \(N\) variables into Chimera (with \(|V|\ge N\)) with a mapping \(\varphi:\{1,\ldots,N\}\rightarrow V\), such that \(J_{i,j}\neq 0\Rightarrow (\varphi(i),\varphi(j))\in E\). In other words, each nonzero \(J_{i,j}\) must be assigned to an edge. The simplifying assumption requires that each node in the original graph be mapped to a single node in Chimera.

The mapping \(\varphi\) is encoded with a set of binary variables \(m_{i,q}\), which indicate that problem variable \(i\) is mapped to Chimera node \(q\). A valid mapping requires \(\sum_q m_{i,q} = 1\) for all problem variables \(i\), and \(\sum_i m_{i,q}\le 1\) for all Chimera nodes \(q\). Because the original problem can be altered, fit the problem into Chimera by maximizing the total magnitude of \(J_{i,j}\) mapped to Chimera edges; that is:

\[\vc{M}^\star = \argmax_{\vc{M}} \sum_{i>j} \sum_{(q,q')\in E}|J_{i,j}| m_{i,q} m_{j,q'}\]

where \(\vc{M}\) is subject to the mapping constraints. This problem is a variant of the NP-hard quadratic assignment problem. In the interest of linear time embedding, apply a greedy heuristic to approximately maximize the objective.

As a starting point, let \(i_1=\argmax_i \sum_{j<i} |J_{j,i}| + \sum_{j>i} |J_{i,j}|\). So \(i_1\) is the row/column index of \(J\) with the highest sum of magnitudes (\(\vc{J}\) is assumed to be upper-triangular). Map \(i_1\) to one of the Chimera vertices of highest degree. For the remaining variables, assume \(\varphi\) is defined on \(\{i_1,\ldots,i_k\}\) such that \(\varphi(i_j)=q_j\) where \(q_j\) is some Chimera qubit. Then assign \(\varphi(i_{k+1})=q_{k+1}\), where \(i_{k+1}\notin\{i_1,\ldots,i_k\}\) and \(q_{k+1}\notin \{q_1,\ldots,q_k\}\) to maximize the sum of all \(|J_{i_{k+1},i_j}|\) and \(|J_{i_j,i_{k+1}}|\) over all \(j\in\{1,\ldots,k\}\) for which \((q_j,q_{k+1})\) is a Chimera edge. This fast greedy heuristic performs well.
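
A sketch of this greedy procedure follows (helper names ours; it assumes \(\vc{J}\) is stored as a dict keyed by variable pairs, networkx is used for the working graph, and there are at least as many qubits as variables):

    import networkx as nx

    def greedy_embed(J, chimera):
        """Greedy one-to-one mapping of problem variables onto qubits that
        approximately maximizes the |J| magnitude landing on edges.
        Variables with no couplings are ignored."""
        weight = {}
        for (i, j), c in J.items():
            weight[i] = weight.get(i, 0.0) + abs(c)
            weight[j] = weight.get(j, 0.0) + abs(c)
        unmapped = set(weight)
        i1 = max(unmapped, key=weight.get)                   # heaviest variable
        phi = {i1: max(chimera.nodes, key=chimera.degree)}   # highest-degree qubit
        unmapped.discard(i1)
        while unmapped:
            free = set(chimera.nodes) - set(phi.values())

            def gain(i, q):  # |J| captured by mapping variable i to qubit q
                return sum(abs(J.get((i, j), 0.0)) + abs(J.get((j, i), 0.0))
                           for j, qj in phi.items() if chimera.has_edge(q, qj))

            i, q = max(((i, q) for i in unmapped for q in free),
                       key=lambda pair: gain(*pair))
            phi[i] = q
            unmapped.discard(i)
        return phi

    chimera = nx.complete_bipartite_graph(4, 4)  # one Chimera unit cell (K_{4,4})
    J = {(0, 1): -2.0, (1, 2): 1.5, (0, 2): 0.5}
    print(greedy_embed(J, chimera))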

Solving a Problem on the QPU

Spin-Reversal Transform

Applying a spin-reversal transform can improve results by reducing the impact of analog errors that may exist on the QPU. A spin-reversal transform does not alter the Ising problem: there is a one-to-one mapping between solutions of the original problem and solutions of the transformed problem that preserves their energies. Rather, the transform reverses the meanings of a collection of individual spins \(s_p\) as follows:

\begin{equation} \begin{split} s_p & \longrightarrow s'_p = - s_p\\ h_p & \longrightarrow h'_p = - h_p\\ J_{i,j} & \longrightarrow J'_{i,j} = - J_{i,j}\\ & \text{for\ either}\ i=p\ \text{or}\ j=p. \end{split} \end{equation}

This mapping ensures that the classical Ising spin energy is unchanged; the transform simply amounts to reinterpreting spin up as spin down, and vice versa, for a particular spin. Any number of spins may be transformed in this way, but the Ising problem remains the same and therefore we expect the same answers. A spin-reversal transform is defined by the subset of spins to which this transform is applied.

Some sources of ICE depend on the magnitude of the (\(h\), \(J\)) problem parameters. Random spin transformations can be used to avoid systematic errors arising from multiple requests of the same problem, by “breaking up” these dependencies.

From a single problem defined by a set of parameters \(\{ h_i,J_{i,j} \}\), a sequence of spin-reversal transforms \(T', T'' \ldots\), each applied to a random subset of spins, defines a sequence of equivalent problems (\(h',J'\)), (\(h'',J''\)).... All have the same energy spectrum, but their implementation on the QPU may invoke different ICE effects (some better and some worse than the original).

When studying the behavior of the QPU in solving a problem, include in the analysis a number of spin-reversal transforms of the problem. The transform samples randomly over sets of possible systematic errors. The correlations between these error sets, at least as they relate to certain error sources, can be reduced with the following considerations:

  • Changing too few spins leaves most errors unchanged, and therefore has little effect.
  • Changing too many spins means that most couplers connect spins that are both transformed, thus \(J_{i,j}\) does not change sign. As a result, some systematic errors associated with the couplers are unaffected.

To sample over a wide range of systematic errors, consider the number of couplers and spins involved in a particular transform.
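
A spin-reversal transform is straightforward to implement. In the sketch below (names ours), a random subset \(S\) of spins is chosen, \(h_i\) is negated for \(i\in S\), and \(J_{i,j}\) is negated exactly when one endpoint is in \(S\) (if both endpoints are transformed, the two sign flips cancel):

    import random

    def spin_reversal_transform(h, J, flip_fraction=0.5, seed=None):
        """Gauge-transform an Ising problem: choose a random subset S of
        spins, negate h_i for i in S, and negate J_ij exactly when one of
        i, j is in S.  Energies are preserved under the reinterpretation
        s_i -> -s_i for i in S.  Assumes every spin has an entry in h
        (possibly 0)."""
        rng = random.Random(seed)
        S = {i for i in h if rng.random() < flip_fraction}
        h_t = {i: -v if i in S else v for i, v in h.items()}
        J_t = {(i, j): -v if (i in S) ^ (j in S) else v
               for (i, j), v in J.items()}
        return h_t, J_t, S

    h = {0: 0.5, 1: -0.25, 2: 0.0}
    J = {(0, 1): 1.0, (1, 2): -0.75}
    print(spin_reversal_transform(h, J, seed=7))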

Note

Spin-reversal transforms do not necessarily remove errors due to ICE; rather, they randomize the effects of certain errors that depend on (\(h\), \(J\)). This creates a spectrum of error magnitudes from which the best results can be selected. Some types of systematic error—ICE 1, as defined in Technical Description of the D-Wave Quantum Processing Unit for example—are not affected by spin-reversal transforms.

Specify the number of spin-reversal transforms using the num_spin_reversal_transforms parameter when submitting a problem.

Using Anneal Offset

To run a problem with anneal offsets:

  1. Permitted parameters vary by QPU, so first retrieve the solver properties for your system.

    If anneal offsets are supported, the returned values include a property that shows the range of permitted offset values, in normalized offset units, for each qubit in the QPU. (Both values are 0 for any inoperable qubits.) Negative values force the qubit to anneal later than the standard trajectory; positive values, earlier.

    In the following truncated list, for instance, we see that qubit 0 can be annealed a maximum of 0.2255351369888578 normalized offset units later than the standard trajectory, or a maximum of 0.050118919330857284 normalized offset units earlier.

    [[-0.2255351369888578, 0.050118919330857284],
    [-0.2258618355939089, 0.12045964565008474], ...
    
  2. Embed the problem on the graph. For best results, embed the problem multiple times.

  3. Prepare an array of offset values based on the information retrieved from the system, the hardware adjacencies of the embedded problem, and any known dynamics of the problem.

  4. Supply the array of offsets for the qubits in the system using the anneal_offsets parameter with a length equal to the num_qubits property. The appropriate format depends on the client; see the appropriate developer guide for your client for more information.

  5. Submit the problem.

  6. Compare results of the same problem run with and without anneal offsets, and of the same problem run with different offset values. Adjust the offset values accordingly. A minimal sketch of this workflow appears below.
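
The workflow might look like the following sketch, which assumes Ocean's dwave-system package and an offset-capable solver; the problem values and the choice of qubits to offset are placeholders:

    # Minimal sketch assuming Ocean's dwave-system package and an
    # offset-capable solver; the problem (h, J) and the qubits chosen
    # for offsets are placeholders.
    from dwave.system import DWaveSampler

    sampler = DWaveSampler()                                # step 1
    ranges = sampler.properties["anneal_offset_ranges"]     # [min, max] per qubit

    offsets = [0.0] * sampler.properties["num_qubits"]      # steps 3-4
    delayed_qubits = [0, 4]          # hypothetical qubits to anneal later
    for q in delayed_qubits:
        lo, hi = ranges[q]
        offsets[q] = max(lo, -0.05)  # negative offset = later anneal, in range

    h = {0: 0.1}                     # toy problem; assumes qubits 0, 4 and
    J = {(0, 4): -1.0}               # their coupler exist in the working graph

    sampleset = sampler.sample_ising(h, J, anneal_offsets=offsets,  # step 5
                                     num_reads=100)
    print(sampleset.first)           # step 6: compare against runs without offsets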