Operation and Timing¶

This section describes the computation process of D-Wave quantum computers, focusing on system timing. It explains the overall time that is allocated to a quantum machine instruction (QMI), describes how use of the QPU is timed within that period, and gives context for how timing can vary. It also describes the timing-related fields in the Solver API (SAPI).

It explains how your QPU usage time is charged to your account.

This section also describes the programming cycle and the anneal-read cycle of the D-Wave QPU.

Overview of QMI Timing¶

Fig. 99 shows a simplified diagram of the sequence of steps, the dark red set of arrows, to execute a quantum machine instruction (QMI) on a D-Wave system, starting and ending on a user’s client system. Each QMI consists of a single input together with parameters. A QMI is sent across a network to the SAPI server and joins a queue. Each queued QMI is assigned to one of possibly multiple workers, which may run in parallel. A worker prepares the QMI for the quantum processing unit (QPU) and optionally for postprocessing[1], sends the QMI to the QPU queue, receives samples (results) and optionally post-processes them (overlapping in time with QPU execution), and bundles the samples with additional QMI-execution information for return to the client system.

 [1] Postprocessing for D-Wave 2000Q and earlier systems includes optimization and sampling algorithms; on Advantage systems, postprocessing is limited to computing the energies of returned samples. Ocean software provides postprocessing tools that you can use with Advantage systems.

Fig. 99 Overview of execution of a single QMI, starting from a client system, and distinguishing classical (client, CPU) and quantum (QPU) execution.

The total time for a QMI to pass through the D-Wave system is the service time. The execution time for a QMI as observed by a client includes service time and internet latency. The QPU executes one QMI at a time, during which the QPU is unavailable to any other QMI. This execution time is known as the QMI’s QPU access time.

Breakdown of Service Time¶

The service time can be broken into:

• Any time required by the worker before and after QPU access
• Wait time in queues before and after QPU access
• QPU access time
• Postprocessing (PP) time

Service time is defined as the difference between time-in and time-out for each QMI, as shown in the table.

Keyword in SAPI Meaning
time_solved When bundled samples are available

Service time for a single QMI depends on the system load; that is, how many other QMIs are present at a given time. During periods of heavy load, wait time in the two queues may contribute to increased service times. D-Wave has no control over system load under normal operating conditions. Therefore, it is not possible to guarantee that service time targets can be met. Service time measurements described in other D-Wave documents are intended only to give a rough idea of the range of experience that might be found under varying conditions.

Postprocessing Time¶

Postprocessing optimization and sampling algorithms provide local improvements with minimal overhead to solutions obtained from the quantum processing unit (QPU).

Ocean software provides postprocessing tools, and you can optionally run postprocessing online on D-Wave 2000Q and earlier systems.

As shown in Fig. 100, online postprocessing (red) works in parallel with sampling (blue), so that the computation times overlap except for postprocessing the last batch of samples. In this diagram, the time consumed by gathering small batches of samples is marked by vertical blue lines. Within execution of a QMI, gathering the current set of samples takes place concurrently with postprocessing of the previous set (red boxes), which is applied to batches of samples as they are returned by the QPU. As illustrated by Fig. 100, only the time for postprocessing the last set of samples (the rightmost red box) is not overlapped with sampling.

Fig. 100 Relationship of QPU time to postprocessing time, illustrated by one QMI in a sequence (previous, current, next).

Postprocessing overhead is designed not to impose any delay to QPU access for the next QMI, because postprocessing of the last batch takes place concurrently with the next QMI’s programming time.

The system returns two associated timing values, as shown in the table below. Referring to Fig. 100, total_post_processing_time is the sum of all times in the red boxes, while post_processing_overhead_time is the extra time needed (a single red box) to process the last batch. This latter time, together with qpu_access_time, contributes to overall service time.
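As a rough numeric sketch of this accounting (the per-batch times below are hypothetical; the two field names are those described in this section), the total postprocessing time sums over all batches, while only the last batch adds to service time:

```python
# Hypothetical per-batch postprocessing times, in microseconds,
# one entry per batch of samples returned by the QPU.
batch_pp_times = [120.0, 115.0, 130.0, 125.0]

# Sum of all red boxes in Fig. 100.
total_post_processing_time = sum(batch_pp_times)

# Only the last batch is not overlapped with sampling, so it alone
# contributes to service time beyond qpu_access_time.
post_processing_overhead_time = batch_pp_times[-1]

print(total_post_processing_time)      # 490.0
print(post_processing_overhead_time)   # 125.0
```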

Note

Even if no postprocessing is run on a QMI, the returned post_processing_overhead_time value is non-zero. This is because computing the final energies occurs after samples are returned and is accounted as postprocessing overhead.

Keyword in SAPI Meaning
total_post_processing_time Total time for postprocessing
post_processing_overhead_time Extra time needed to process the last batch

For more details about postprocessing and how it is handled in the timing structure, see the Postprocessing section.

“Total Time” Reported in Statistics (for Administrators)¶

One timing parameter, qpu_access_time, provides the raw data for the “Total Time” values reported as system statistics, available to administrators. Reported statistics are the sum of the qpu_access_time values for each QMI matching the users, solvers, and time periods selected in the filter.

Note

Reported statistics are in milliseconds, while SAPI inputs and outputs are in microseconds. One millisecond is 1000 microseconds.

Breakdown of QPU Access Time¶

As illustrated in Fig. 101, the time to execute a single QMI on a QPU, the QPU access time, is broken into two parts: a one-time initialization step to program the QPU (blue) and, typically, multiple sampling times for the actual execution on the QPU (repeated multicolor).

Fig. 101 Detail of QPU access time.

The QPU access time also includes some overhead:

$T = T_p + \Delta + T_s,$

where $T_p$ is the programming time, $T_s$ is the sampling time, and $\Delta$ (reported as qpu_access_overhead_time by SAPI) is an initialization time spent in low-level operations, roughly 1-2 ms for D-Wave 2000Q systems and 10-20 ms for Advantage systems.
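A minimal numeric sketch of this formula (all values below are illustrative, not measured; the field names in the comments are those reported by SAPI in this section):

```python
# Illustrative timing components, in microseconds (not measured data).
T_p = 16000.0    # programming time (qpu_programming_time)
delta = 15000.0  # low-level initialization overhead (qpu_access_overhead_time)
T_s = 800.0      # total sampling time (qpu_sampling_time)

# QPU access time including overhead: T = T_p + delta + T_s
T = T_p + delta + T_s
print(T)  # 31800.0
```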

The time for a single sample is further broken into anneal (the anneal proper; green), readout (read the sample from the QPU; red), and thermalization (wait for the QPU to regain its initial temperature; pink). Possible rounding errors mean that the sum of these times may not match the total sampling time reported.

$T_s / R \approx T_a + T_r + T_d,$

where $R$ is the number of reads, $T_a$ the single-sample annealing time, $T_r$ the single-sample readout time, and $T_d$ the single-sample delay time, which consists of the following optional components[2]:

$\begin{split}T_d = &readout\_thermalization \\ &+ reduce\_intersample\_correlation \\ &+ reinitialize\_state.\end{split}$
 [2] See descriptions of these components under Solver Parameters. The reinitialize_state parameter is used only for reverse annealing.
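A minimal numeric sketch of this per-sample accounting (all values below are illustrative, not measured; only readout_thermalization is set non-zero among the optional delay components):

```python
# Illustrative single-sample times, in microseconds.
T_a = 20.0   # anneal time (qpu_anneal_time_per_sample)
T_r = 150.0  # readout time (qpu_readout_time_per_sample)

# Optional components of the per-sample delay T_d.
readout_thermalization = 20.0
reduce_intersample_correlation = 0.0
reinitialize_state = 0.0  # reverse annealing only
T_d = readout_thermalization + reduce_intersample_correlation + reinitialize_state

# With R reads, T_s / R ~= T_a + T_r + T_d, so total sampling time is roughly:
R = 100
T_s = R * (T_a + T_r + T_d)
print(T_s)  # 19000.0
```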

Programming Cycle¶

When an Ising problem is provided as a set of h and J values,[3] the D-Wave system conveys those values to the DACs located on the QPU. Room-temperature electronics generate the raw signals that are sent via wires into the refrigerator to program the DACs. The DACs then apply static magnetic-control signals locally to the qubits and couplers. This is the programming cycle of the QPU.[4] After the programming cycle, the QPU is allowed to cool for a postprogramming thermalization time of, typically, 1 ms; see the Temperature section for more details about this cooling time.

 [3] Several other instructions to the system are provided by the user: for example, an annealing_time over which the quantum annealing process is to occur. See Solver Properties and Parameters Reference for details.
 [4] In some descriptions, the programming cycle is subdivided into a reset step that erases previous data stored in the DACs, followed by a programming step.

The total time spent programming the QPU, including the postprogramming thermalization time, is reported back as qpu_programming_time.

After the programming cycle, the system switches to the annealing phase during which the QPU is repeatedly annealed and read out. Annealing is performed using the analog lines over a time specified by the user as annealing_time and reported by the QPU as qpu_anneal_time_per_sample. Afterward, the digital readout system of the QPU reads and returns the spin states of the qubits. The system is then allowed to cool for a time returned by the QPU as qpu_delay_time_per_sample—an interval comprising a constant value plus any additional time optionally specified by the user via the readout_thermalization parameter.

The anneal-read cycle is also referred to as a sample. The cycle repeats for some number of samples specified by the user in the num_reads parameter, and returns one solution per sample. The total time to complete the requested number of samples is returned by the QPU as qpu_sampling_time.

Sources of Timing Variation and Error¶

Running a program that uses a D-Wave system across the internet, or even examining QPU timing information, may show variation from run to run from the end-user’s point of view. This section describes some possible sources of such variation.

Nondedicated QPU Use¶

D-Wave systems are typically shared among multiple users, each of whom submits QMIs to solve a problem, with little to no synchronization among users. (A single user may also have multiple client programs submitting unsynchronized QMIs to a D-Wave system.) The QPU must be used by a single QMI at a time, so the D-Wave system software ensures that multiple QMIs flow through the system and use the QPU sequentially. In general, this means that a QMI may get queued for the QPU or some other resource, injecting indeterminacy into the timing of execution.

Note

Contact your D-Wave system administrator or D-Wave Support if you need to ensure a quiet system.

Nondeterminacy of Classical System Timings¶

Even when a system is quiet except for the program to be measured, timings often vary. As illustrated in Fig. 102, running a given code block repeatedly can yield different runtimes on a classical system, even though the instruction execution sequence does not change. Runtime distributions with occasional large outliers, as seen here, are not unusual.

Fig. 102 Histogram of 100 measurements of classical execution time using a wall clock timer, showing that the mean time of 336.5 ms (red line) is higher than 75 percent of the measurements.

Timing variations are routine, caused by noise from the operating system (e.g., scheduling, memory management, and power management) and the runtime environment (e.g., garbage collection, just-in-time compilation, and thread migration).[5] In addition, the internal architecture of the classical portion of the D-Wave system includes multiple hardware nodes and software servers, introducing communication among these servers as another source of variation.

For these reasons, mean reported runtimes can often be higher than median runtimes: for example, in Fig. 102, the mean time of 336.5 ms (vertical red line) is higher than 75 percent of the measured runtimes due to a few extreme outliers (one about 3 times higher and two almost 2 times higher than the median). In this context, the smallest time recorded for a single process is considered the most accurate, because noise from outside sources can only increase elapsed time.[6] Because system activity increases with the number of active QMIs, the most accurate times for a single process are obtained by measuring on an otherwise quiet system.

Note

The 336 ms mean time shown for this particular QMI is not intended to be representative of QMI execution times.

The cost of reading a system timer may impose additional measurement error, since querying the system clock can take microseconds. To reduce the impact of the timing code itself, a given code block may be timed around a loop that executes it many times, with running time calculated as the average time per iteration. Because of system and runtime noise and timer latency, component times measured one way may not add up to total times measured another way.[7] These sources of timer variation or error are present on all computer systems, including the classical portion of D-Wave platforms. Normal timer variation as described here may occasionally yield atypical and imprecise results; wall clock times are also expected to vary with the particular system configuration and with system load.

 [5] A more common practice in computational research is to report an alternative measurement called CPU time, which is intended to filter out operating system noise. However, CPU timers are only accurate to tens of milliseconds, and CPU times are not available for QPU time measurements. For consistency, we use wall clock times throughout.
 [6] Randal E. Bryant and David R. O’Hallaron, Computer Systems: A Programmer’s Perspective (2nd Edition), Pearson, 2010.
 [7] Paulo Eduardo Nogueira, Rivalino Matias, Jr., and Elder Vicente, An Experimental Study on Execution Time Variation in Computer Experiments, ACM Symposium on Applied Computing, 2014.
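The measurement practices described above, averaging over a loop to amortize timer latency and taking the minimum across repetitions as the least noisy estimate, can be sketched as follows (a simple illustration, not a D-Wave API):

```python
import time

def measure(block, iterations=1000, repetitions=5):
    """Time `block` by averaging over a loop, repeated several times.

    Returns the minimum per-iteration time across repetitions, since
    noise from outside sources can only increase elapsed time.
    """
    per_iteration_times = []
    for _ in range(repetitions):
        start = time.perf_counter()
        for _ in range(iterations):
            block()
        elapsed = time.perf_counter() - start
        per_iteration_times.append(elapsed / iterations)
    return min(per_iteration_times)

# Example: time a small arithmetic block.
best = measure(lambda: sum(range(100)))
print(f"best per-iteration time: {best * 1e6:.2f} microseconds")
```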

Internet Latency¶

If you are running your program on a client system geographically remote from the D-Wave system on which you’re executing, you will likely encounter latency and variability from the internet connection itself (see Fig. 99).

Settings of User-Specified Parameters¶

The following user-specified parameters can cause timing to change, but should not affect the variability of timing. For more information on these parameters, see Solver Properties and Parameters Reference.

• anneal_schedule—User-provided anneal schedule. Specifies the points at which to change the default schedule. Each point is a pair of values representing time $t$ in microseconds and normalized anneal fraction $s$. The system connects the points with piecewise-linear ramps to construct the new schedule. If anneal_schedule is specified, $T_a$ (qpu_anneal_time_per_sample) is populated with the total time specified by the piecewise-linear schedule.
• annealing_time—Duration, in microseconds, of quantum annealing time. This value populates $T_a$ (qpu_anneal_time_per_sample).
• num_spin_reversal_transforms—For QMIs with more than one spin-reversal transform, SAPI handles the timing information for all the subQMIs that it sends to the solver as follows: (1) it sums each timing field that does not end with “per_sample”; (2) for the others, it returns the value from the first subQMI. For example, the values for qpu_access_time are summed; those for qpu_delay_time_per_sample are not.
• postprocess—Specifies the type of (classical) postprocessing to be performed on the raw samples from a D-Wave 2000Q QPU. Requesting no postprocessing consumes the least time; either sampling or optimization postprocessing consumes more.
• programming_thermalization—Number of microseconds to wait after programming the QPU to allow it to cool; i.e., postprogramming thermalization time. Values lower than the default accelerate solving at the expense of solution quality. This value contributes to the total $T_p$ (qpu_programming_time).
• readout_thermalization—Number of microseconds to wait after each sample is read from the QPU to allow it to cool to base temperature; i.e., postreadout thermalization time. This optional value contributes to $T_d$ (qpu_delay_time_per_sample).
• reduce_intersample_correlation—Used to reduce sample-to-sample correlations. When true, adds to $T_d$ (qpu_delay_time_per_sample). The amount of time added increases linearly with the length of the anneal schedule.
• reinitialize_state—Used in reverse annealing. When True (the default setting), reinitializes the initial qubit state for every anneal-readout cycle, adding between 100 and 600 microseconds to $T_d$ (qpu_delay_time_per_sample). When False, adds approximately 10 microseconds to $T_d$.[8]
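The subQMI timing aggregation described for num_spin_reversal_transforms above can be sketched as follows (a hypothetical helper, not a SAPI function; the input dicts use field names from this section with made-up values):

```python
def aggregate_timing(subqmi_timings):
    """Combine timing dicts from multiple subQMIs.

    Fields not ending in 'per_sample' are summed; for the rest,
    the value from the first subQMI is kept.
    """
    combined = dict(subqmi_timings[0])
    for timing in subqmi_timings[1:]:
        for field, value in timing.items():
            if not field.endswith("per_sample"):
                combined[field] = combined.get(field, 0) + value
    return combined

# Hypothetical timings from two subQMIs, in microseconds.
first = {"qpu_access_time": 16000.0, "qpu_delay_time_per_sample": 21.0}
second = {"qpu_access_time": 15800.0, "qpu_delay_time_per_sample": 20.5}

print(aggregate_timing([first, second]))
# {'qpu_access_time': 31800.0, 'qpu_delay_time_per_sample': 21.0}
```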

Note

Depending on the parameters chosen for a QMI, QPU access time may be a large or small fraction of service time. E.g., a QMI requesting a single sample with short annealing_time would see programming time as a large fraction of service time and QPU access time as a small fraction.

 [8] Amount of time varies by system.

The D-Wave system limits your ability to submit a long-running QMI to prevent you from inadvertently monopolizing QPU time. This limit varies by system; check the problem_run_duration_range property for your solver.

The limit is calculated according to the following formula:

$$Duration = ((P_1 + P_2) * P_3) + P_4$$

where $P_1$, $P_2$, $P_3$, and $P_4$ are the values specified for the annealing_time, readout_thermalization, num_reads (samples), and programming_thermalization parameters, respectively.

If you attempt to submit a QMI whose execution time would exceed the limit for your system, an error is returned showing the values in microseconds. For example:

ERROR: Upper limit on user-specified timing related parameters exceeded: 12010000 > 3000000


Note that it is possible to specify values that fall within the permitted ranges for each individual parameter, yet together cause the time to execute the QMI to surpass the limit.
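This check can be performed before submitting a QMI. The sketch below implements the formula above; the parameter values are hypothetical, chosen only to reproduce the figures in the error example in this section, and the limit would in practice come from the solver's problem_run_duration_range property:

```python
def qmi_duration(annealing_time, readout_thermalization, num_reads,
                 programming_thermalization):
    # Duration = ((P1 + P2) * P3) + P4, all times in microseconds.
    return ((annealing_time + readout_thermalization) * num_reads
            + programming_thermalization)

limit = 3_000_000  # example upper bound, in microseconds

duration = qmi_duration(annealing_time=1000, readout_thermalization=200,
                        num_reads=10000, programming_thermalization=10000)
print(duration)          # 12010000
print(duration > limit)  # True: this QMI would be rejected
```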

How Solver Usage is Charged¶

D-Wave charges you for time that solvers run your problems, with rates depending on QPU usage. You can see the rate at which your account’s quota is consumed for a particular solver in the solver’s quota_conversion_rate property.

You can see the time you are charged for in the responses returned for your submitted problems. The relevant field in the response is 'qpu_access_time'. The example in the QPU Timing Information from SAPI section shows 'qpu_access_time': 9687 in the returned response, meaning almost 10 milliseconds are being charged.

For example, for a QPU solver with a quota conversion rate of 1, a problem that returns 'qpu_access_time': 1500 deducts 1.5 milliseconds from your account’s quota.
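The deduction described above can be computed directly (the field and property names are from this section; the conversion-rate value is illustrative):

```python
qpu_access_time = 1500     # microseconds, from the problem's response
quota_conversion_rate = 1  # from the solver's quota_conversion_rate property

# Quota consumed, converted from microseconds to milliseconds.
quota_deducted_ms = qpu_access_time * quota_conversion_rate / 1000
print(quota_deducted_ms)  # 1.5
```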

QPU Timing Information from SAPI¶

The table below lists the timing-related fields available in D-Wave’s Ocean SDK. Ocean users access this information through the info field of the dimod sampleset class, as in the example below. Note that the time is given in microseconds with a resolution of 0.01 $\mu s$.

>>> from dwave.system import DWaveSampler, EmbeddingComposite
>>> sampler = EmbeddingComposite(DWaveSampler())
>>> sampleset = sampler.sample_ising({'a': 1}, {('a', 'b'): 1})
>>> print(sampleset.info["timing"])
{'qpu_sampling_time': 80.78,
'qpu_anneal_time_per_sample': 20.0,
'qpu_access_time': 16016.18,
'qpu_programming_time': 15935.4,
'qpu_delay_time_per_sample': 21.02,
'total_post_processing_time': 809.0}

Table 31 Fields that affect qpu_access_time
QMI Time Component SAPI Field Name Meaning Affected by
$T$ qpu_access_time Total time in QPU All parameters listed below
$T_p$ qpu_programming_time Total time to program the QPU[9] programming_thermalization, weakly affected by other problem settings (such as $h$, $J$, anneal_offsets, flux_offsets, and h_gain_schedule)
$\Delta$ qpu_access_overhead_time Time for additional low-level operations
$R$ Number of reads (samples) num_reads
$T_s$ qpu_sampling_time Total time for $R$ samples num_reads, $T_a$, $T_r$, $T_d$
$T_a$ qpu_anneal_time_per_sample Time for one anneal anneal_schedule, annealing_time
$T_r$ qpu_readout_time_per_sample Time for one read Number of qubits read[10]
$T_d$ qpu_delay_time_per_sample Delay between anneals[11] anneal_schedule, readout_thermalization, reduce_intersample_correlation, reinitialize_state (only in case of reverse annealing)
total_post_processing_time Total time for postprocessing Programming time
post_processing_overhead_time Extra time needed to process the last batch total_post_processing_time
 [9] Even if programming_thermalization is 0, $T_p$ is typically between 4 and $40\ \mu s$, depending on the processor, and describes the time spent setting the $h$ and $J$ parameters of the problem as well as other features such as anneal_offsets, flux_offsets, and h_gain_schedule.
 [10] The time to read a sample set from an Advantage-generation QPU depends on the location of the qubits on the processor and the number of qubits in the sample set: a problem represented by a dozen qubits has shorter read times (and so a shorter $T_r$ and total readout time) than a problem represented by several thousand qubits. For the Advantage QPU, this difference can be significant. For example, some small problems may take $25\ \mu s$ per read while a large problem might take $150\ \mu s$ per read.
 [11] The time returned in the qpu_delay_time_per_sample field is equal to a constant plus the user-specified value, readout_thermalization.

Timing Data Returned by dwave-cloud-client¶

Below is a sample skeleton of Python code for accessing timing data returned by dwave-cloud-client. Timing values are returned in the computation object and the timing object; further code could query those objects in more detail. The timing object referenced on line 16 is a Python dictionary containing (key, value) pairs. The keys match keywords discussed in this section.

01 import random
02 import datetime as dt
03 from dwave.cloud import Client

04 # Connect using the default or environment connection information
05 with Client.from_config() as client:

06     # Load the default solver
07     solver = client.get_solver()

08     # Build a random Ising model to exactly fit the graph the solver supports
09     linear = {index: random.choice([-1, 1]) for index in solver.nodes}
10     quad = {key: random.choice([-1, 1]) for key in solver.undirected_edges}

11     # Send the problem for sampling, include solver-specific parameter 'num_reads'
12     computation = solver.sample_ising(linear, quad, num_reads=100)
13     computation.wait()

14     # Print the first sample out of a hundred
15     print(computation.samples[0])
16     timing = computation['timing']

17     # Service time
18     time_format = "%Y-%m-%d %H:%M:%S.%f"