Sources of Timing Variation and Error

Running a D-Wave-using program across the internet or even examining QPU timing information may show variation from run to run from the end-user’s point of view. This section describes some of the possible sources of such variation.

Nondedicated QPU Use

D-Wave systems are typically shared among multiple users, each of whom submits QMIs to solve a problem, with little to no synchronization among users. (A single user may also have multiple client programs submitting unsynchronized QMIs to a D-Wave system.) The QPU must be used by a single QMI at a time, so the D-Wave system software ensures that multiple QMIs flow through the system and use the QPU sequentially. In general, this means that a QMI may get queued for the QPU or some other resource, injecting indeterminacy into the timing of execution.

Note

Contact your D-Wave system administrator or D-Wave Support if you need to ensure a quiet system.

Nondeterminacy of Classical System Timings

Even when a system is quiet except for the program to be measured, timings often vary. As illustrated in Fig. 54, running a given code block repeatedly can yield different runtimes on a classical system, even though the instruction execution sequence does not change. Runtime distributions with occasional large outliers, as seen here, are not unusual.

Histogram showing the results of 100 measurements of service time plotted against the frequency of their occurrence. Along the horizontal axis is service time in microseconds from 300,000 to 900000, marked in intervals of 100,000. Along the vertical axis is frequency from 0 to 30, marked in intervals of 5. The histogram is annotated with a line showing the mean runtime of 336.5 ms, which is higher than 75 percent of the results.

Fig. 54 Histogram of 100 measurements of classical execution time using a wall clock timer, showing that the mean time of 336.5 ms (red line) is higher than 75 percent of the measurements.

Timing variations are routine, caused by noise from the operating system (e.g., scheduling, memory management, and power management) and the runtime environment (e.g., garbage collection, just-in-time compilation, and thread migration). [1] In addition, the internal architecture of the classical portion of the D-Wave system includes multiple hardware nodes and software servers, introducing communication among these servers as another source of variation.

For these reasons, mean reported runtimes can often be higher than median runtimes: for example, in Fig. 54, the mean time of 336.5 ms (vertical red line) is higher than 75 percent of the measured runtimes due to a few extreme outliers (one about 3 times higher and two almost 2 times higher than median). As a result, mean runtimes tend to exceed median runtimes. In this context, the smallest time recorded for a single process is considered the most accurate, because noise from outside sources can only increase elapsed time.[2] Because system activity increases with the number of active QMIs, the most accurate times for a single process are obtained by measuring on an otherwise quiet system.

Note

The 336 ms mean time shown for this particular QMI is not intended to be representative of QMI execution times.

The cost of reading a system timer may impose additional measurement errors, since querying the system clock can take microseconds. To reduce the impact of timing code itself, a given code block may be measured outside a loop that executes it many times, with running time calculated as the average time per iteration. Because of system and runtime noise and timer latency, component times measured one way may not add up to total times measured another way.[3] These sources of timer variation or error are present on all computer systems, including the classical portion of D-Wave platforms. Normal timer variation as described here may occasionally yield atypical and imprecise results; also, one expects wall clock times to vary with the particular system configuration and with system load.

[1]A more common practice in computational research is to report an alternative measurement called CPU time, which is intended to filter out operating system noise. However, CPU timers are only accurate to tens of milliseconds, and CPU times are not available for QPU time measurements. For consistency, we use wall clock times throughout.
[2]Randal E. Bryant and David R. O’Hallaron, Computer Systems: A Programmer’s Perspective (2nd Edition), Pearson, 2010.
[3]Paulo Eduardo Nogueira, Rivalino Matias, Jr., and Elder Vicente, An Experimental Study on Execution Time Variation in Computer Experiments, ACM Symposium on Applied Computing, 2014.

Internet Latency

If you are running your program on a client system geographically remote from the D-Wave system on which you’re executing, you will likely encounter latency and variability from the internet connection itself (see Fig. 52).

Settings of User-Specified Parameters

The following user-specified parameters can cause timing to change, but should not affect the variability of timing. For more information on these parameters, see Solver Properties and Parameters Reference.

  • anneal_schedule—User-provided anneal schedule. Specifies the points at which to change the default schedule. Each point is a pair of values representing time \(t\) in microseconds and normalized anneal fraction \(s\). The system connects the points with piecewise-linear ramps to construct the new schedule. If anneal_schedule is specified, \(T_a\), qpu_anneal_time_per_sample is populated with the total time specified by the piecewise-linear schedule. Cannot be used with the annealing_time parameter.
  • annealing_time—Duration, in microseconds, of quantum annealing time. This value populates \(T_a\), qpu_anneal_time_per_sample. Cannot be used with the anneal_schedule parameter.
  • num_reads—Number of samples to read from the solver per QMI.
  • num_spin_reversal_transforms—For QMIs with more than one spin-reversal transform, SAPI handles the timing information for all the subQMIs that it sends to the solver as follows: (1) It sums each timing field that does not end with “per_sample.” (2) For others, it sends the value from the first subQMI. For example, the values for qpu_access_time are summed; those from qpu_delay_time_per_sample are not.
  • programming_thermalization—Number of microseconds to wait after programming the QPU to allow it to cool; i.e., post-programming thermalization time. Values lower than the default accelerate solving at the expense of solution quality. This value contributes to the total \(T_p\), qpu_programming_time.
  • readout_thermalization—Number of microseconds to wait after each sample is read from the QPU to allow it to cool to base temperature; i.e., post-readout thermalization time. This optional value contributes to \(T_d\), qpu_delay_time_per_sample.
  • reduce_intersample_correlation—Used to reduce sample-to-sample correlations. When true, adds to \(T_d\), qpu_delay_time_per_sample. Amount of time added increases linearly with increasing length of the anneal schedule.
  • reinitialize_state—Used in reverse annealing. When true (the default setting), reinitializes the initial qubit state for every anneal-readout cycle, adding between 100 and 600 microseconds to \(T_d\), qpu_delay_time_per_sample. When false, adds approximately 10 microseconds to \(T_d\).[4]
  • postprocess—Specifies the type of (classical) postprocessing to be performed on the raw samples from the QPU. Requesting no postprocessing consumes the least time; either sampling or optimization postprocessing consumes more.

Note

Depending on the parameters chosen for a QMI, QPU access time may be a large or small fraction of service time. E.g., a QMI requesting a single sample with short annealing_time would see programming time as a large fraction of service time and QPU access time as a small fraction.

Note

The readout_thermalization parameter is deprecated and will eventually be removed from the API. Plan code updates accordingly.

[4]Amount of time varies by system.