Incorporate Jonathan's corrections to numerical results section
@@ -711,7 +711,7 @@ estimates committed after decoding window $\ell$, we have to set
 % Intro: Problem with above procedure

 The sliding-window structure visible in \Cref{fig:windowing_pcm} is
-highly reminiscent of windowed decoding for \ac{sc}-\ac{ldpc} codes.
+reminiscent of windowed decoding for \ac{sc}-\ac{ldpc} codes.
 Switching our viewpoint to the Tanner graph depicted in
 \Cref{fig:messages_decimation_tanner}, however, we can see an important
 difference between \ac{sc}-\ac{ldpc} decoding and the
@@ -719,7 +719,7 @@ sliding-window decoding procedure detailed above.
 While the windowing process is similar, the algorithm above
 reinitializes the decoder to start from a clean state when moving to
 the next window.
-It therefore does not make use of the integral property of
+Therefore, it does not make use of the integral property of
 windowed \ac{sc}-\ac{ldpc} decoding of exploiting the spatially coupled
 structure by passing soft information from earlier to later spatial positions.

@@ -731,9 +731,10 @@ still relevant to the decoding of the next.
 This may somewhat limit the variety of \emph{inner decoders}, i.e.,
 the decoders decoding the individual windows, the warm-start
 initialization can be used with.
-E.g., \ac{bp}+\ac{osd} does not immediately seem suitable, though
-this remains to be investigated.
-We chose to investigate first plain \ac{bp} due to its simplicity and
+For instance, \ac{bp}+\ac{osd} does not immediately seem suitable, as
+it performs a hard decision on the \acp{vn}, though this remains to
+be investigated.
+We chose to investigate first standard \ac{bp} due to its simplicity and
 then \ac{bpgd} because of the availability of recently computed messages.

 % TODO: Include this?
@@ -900,7 +901,8 @@ To see how we realize this in practice, we reiterate the steps of the
 \right) \\[3mm]
 \text{\ac{cn} Update (Min-Sum): }&
 \displaystyle L_{i \leftarrow j} = (-1)^{s_j}\cdot \prod_{i'
-\in \mathcal{N}_\text{C}(j)\setminus \{i\}} \sign \left( L_{i' \rightarrow j}
+\in \mathcal{N}_\text{C}(j)\setminus \{i\}} \sign \left( L_{i'
+\rightarrow j}
 \right) \cdot \min_{i' \in \mathcal{N}_\text{C}(j)\setminus \{i\}} \lvert
 L_{i'\rightarrow j} \rvert \\[3mm]
 \label{eq:vn_update}
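For reference, the min-sum check-node (CN) update shown in the hunk above maps directly onto code. The following Python sketch uses an illustrative nested-list message layout and hypothetical names; it is not the thesis implementation:

```python
import numpy as np

def cn_update_min_sum(L_v2c, checks, syndrome):
    """Min-sum check-node update.

    L_v2c[j][k]  : incoming VN->CN message for the k-th neighbour of check j
    checks[j]    : list of VN indices in N_C(j)
    syndrome[j]  : measured syndrome bit s_j
    Returns L_c2v with the same nested-list layout.
    """
    L_c2v = []
    for j, neigh in enumerate(checks):
        msgs = np.asarray(L_v2c[j], dtype=float)
        signs = np.sign(msgs)
        signs[signs == 0] = 1.0               # treat zero messages as positive
        mags = np.abs(msgs)
        total_sign = (-1.0) ** syndrome[j] * np.prod(signs)
        out = []
        for k in range(len(neigh)):
            # exclude the target edge k from both the sign product and the min
            sign_k = total_sign * signs[k]    # divide out this edge's sign
            mag_k = np.min(np.delete(mags, k))
            out.append(sign_k * mag_k)
        L_c2v.append(out)
    return L_c2v
```

A production decoder would of course operate on the sparse Tanner-graph adjacency rather than dense per-check lists.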
@@ -943,7 +945,7 @@ We can then continue decoding the next window as usual.

 We can further simplify the algorithm.
 Looking carefully at \Cref{eq:vn_update} we notice that when the
-\ac{cn} to \ac{vn} messages $L_{i\leftarrow j}$ have been zero-initialized,
+\ac{cn} to \ac{vn} messages $L_{i\leftarrow j}$ have been initialized to zero,
 the \ac{vn} update degenerates to
 \begin{align*}
 \displaystyle L_{i \rightarrow j} =
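The hunk above is cut off at the equation by the diff context, but in the standard LLR-domain formulation the variable-node (VN) update adds the channel LLR $\lambda_i$ to the incoming CN messages from all other neighbouring checks, so zero-initialized messages reduce every outgoing message to $\lambda_i$. A minimal Python sketch of this, with the same illustrative layout as above:

```python
def vn_update(lam, L_c2v, vars_):
    """Variable-node update: L_{i->j} = lam_i plus all incoming CN messages
    except the one from the target check j.

    lam[i]      : channel LLR of VN i
    L_c2v[i][k] : incoming CN->VN message from the k-th neighbour check of VN i
    vars_[i]    : list of CN indices in N_V(i)
    """
    L_v2c = []
    for i, neigh in enumerate(vars_):
        total = lam[i] + sum(L_c2v[i])
        # With L_c2v zero-initialized, every outgoing message is simply lam[i].
        L_v2c.append([total - L_c2v[i][k] for k in range(len(neigh))])
    return L_v2c
```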
@@ -971,7 +973,7 @@ Note that the decoding procedure performed on the individual windows
 \label{alg:warm_start_bp}
 \begin{algorithmic}[1]
 \State \textbf{Initialize:} $\hat{\bm{e}}^\text{total} \leftarrow \bm{0}$
-\State \textbf{Initialize:} $L_{i\leftarrow j} = 0
+\State \textbf{Initialize:} $L_{i\leftarrow j} = 0,
 ~\forall~ i\in \mathcal{I}, j\in \mathcal{J}$
 \For{$\ell = 0, \ldots, n_\text{win}-1$}
 \For{$\nu = 0, \ldots, n_\text{iter}-1$}
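To make the initialization lines of the warm-start algorithm above concrete, here is a schematic Python sketch of its outer loop. The helper callables and per-window data are hypothetical placeholders; the point is only that the message state is allocated once and never reset between windows, unlike in the cold-start procedure:

```python
from collections import defaultdict

def warm_start_sliding_window(windows, n_iter, bp_iteration, commit):
    """Outer loop of warm-start sliding-window decoding (schematic).

    windows      : per-window data (detector error sub-matrices, syndrome
                   slices, index maps); their construction is not shown here
    bp_iteration : hypothetical helper running one BP iteration on a window,
                   reading and updating the shared message state in place
    commit       : hypothetical helper that hard-decides the VNs owned by the
                   window and updates the running error estimate
    """
    messages = defaultdict(float)   # CN->VN messages, zero until first written
    e_total = {}                    # committed error estimate, VN index -> bit
    for window in windows:
        # Warm start: messages on edges shared with the previous window are
        # simply kept; edges not seen before start from the zero default.
        for _ in range(n_iter):
            bp_iteration(window, messages)
        commit(window, messages, e_total)
    return e_total
```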
@@ -1227,7 +1229,7 @@ model, both of which depend on the code and noise model in question.
 % Software stack: Layer 3

 Even further up, given an already constructed syndrome extraction
-circuit and the resulting \acf{dem}, we must split the detector error
+circuit and the resulting \acf{dem}, we split the detector error
 matrix into separate windows and manage the interplay between the
 inner decoders acting on those individual windows.

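One plausible way to realize this splitting is sketched below in Python/numpy. The round-per-detector bookkeeping, the function name, and the covering convention for the last window are assumptions made for illustration; the actual QUITS and thesis interfaces are not reproduced here:

```python
import numpy as np

def split_into_windows(H, round_of_detector, W, F):
    """Split a detector error matrix H (detectors x error mechanisms) into
    overlapping windows of W syndrome rounds with step size F (schematic).

    round_of_detector[d] gives the syndrome-extraction round of detector d.
    Returns, per window, the detector (row) indices, the column indices of
    error mechanisms touching those rows, and the corresponding sub-matrix.
    """
    rounds = np.asarray(round_of_detector)
    n_rounds = int(rounds.max()) + 1
    windows = []
    start = 0
    while start < n_rounds:
        rows = np.flatnonzero((rounds >= start) & (rounds < start + W))
        cols = np.flatnonzero(H[rows, :].any(axis=0))
        windows.append((rows, cols, H[np.ix_(rows, cols)]))
        if start + W >= n_rounds:
            break
        start += F
    return windows
```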
@@ -1247,10 +1249,8 @@ For the circuit generation, we employed utilities from QUITS
 generation for a number of different \ac{qldpc} codes.
 We initially created a Python implementation, which used QUITS for the window
 splitting and subsequent sliding-window decoding as well.
-The \ac{bp} and \ac{bpgd} decoders were also initially implemented in Python.
-After a preliminary investigation, we opted for a complete
-reimplementation in Rust to achieve higher simulation speeds leveraging
-the compiled nature of the language.
+The \ac{bp} and \ac{bpgd} decoders are implemented in Rust to achieve
+higher simulation speeds leveraging the compiled nature of the language.
 We reimplemented both the window splitting and the decoders.

 % Global experimental setup
@@ -1282,21 +1282,21 @@ generated by simulating at least $200$ logical error events.

 % Local experimental setup

-We began our investigation by using \ac{bp} with no further
+We begin our investigation by using \ac{bp} with no further
 modifications as the inner decoder.
 We chose the min-sum variant of \ac{bp} due to its low computational complexity.

 % [Thread] Get impression for max gain

-We initially wanted to gain an impression for the performance gain we could
+We initially want to gain an impression of the performance gain we could
 expect from a modification to the sliding-window decoding procedure.
-To this end, we began by analyzing the decoding performance of the
+To this end, we begin by analyzing the decoding performance of the
 original process, without our warm-start modification.
 We will call this \emph{cold-start} decoding in the following.
-Because we expected more global decoding to work better (the inner
+Because we expect more global decoding to work better (the inner
 decoder then has access to a larger portion of the long-range
 correlations encoded in the detector error matrix before any commit
-is made) we initially decided to use decoding on the whole detector
+is made) we initially decide to use decoding on the whole detector
 error matrix as a proxy for the attainable decoding performance.

 \begin{figure}[t]
@@ -1400,7 +1400,7 @@ this trend and, as expected, achieves the strongest performance.
 The fact that the $W = 5$ curve is already very close to the
 whole-block decoder indicates that the marginal benefit of enlarging
 the window saturates after a certain point.
-From a practical standpoint, the choice of $W$ thus represents a
+Thus, from a practical standpoint, the choice of $W$ represents a
 trade-off between decoding latency and accuracy: larger windows
 delay the start of decoding by requiring more syndrome extraction
 rounds to be collected upfront, while the diminishing returns above
@@ -1409,7 +1409,7 @@ additional accuracy in return.

 % [Thread] First comparison with warm start

-Next, we additionally generated error rate curves for warm-start
+Next, we additionally simulate error rate curves for warm-start
 sliding-window decoding to assess how much of the gap between
 cold-start and whole-block decoding can be recovered by our modification.
 We chose the same window sizes as before, so that the warm- and
@@ -1537,16 +1537,15 @@ consecutive windows spans $W - F = W - 1$ syndrome rounds, so larger
 $W$ implies that more messages are carried over and a larger fraction
 of the next window starts in a warm state.
 % TODO: Possibly insert explanation for higher gain at lower error rates
-A perhaps surprising observation is that the warm-start curve for
-$W = 5$ actually lies below the whole-block reference across the
+A perhaps surprising observation is that warm-start decoding for
+$W = 5$ outperforms the whole-block reference across the
 entire range of physical error rates, even though warm-start
 sliding-window decoding is, by construction, more local than
 whole-block decoding.
-A possible explanation for this effect is discussed in the following.

 % [Thread] Warm start is better than whole due to more effective iterations

-A possible explanation for this surprising behavior lies in the
+A possible explanation for this behavior lies in the
 number of \ac{bp} iterations effectively spent on the \acp{vn}
 inside the overlap region.
 Each \ac{vn} in such an overlap is processed by multiple consecutive
@@ -1742,15 +1741,15 @@ initialization diminishes, and the curves approach each other.
 The fact that no curve clearly saturates within the swept range is
 itself worth noting.
 We know that \ac{bp} on \ac{qldpc} codes suffers from poor
-convergence due to the short cycles in the underlying Tanner graph,
-so even after several thousand iterations the
-decoder may continue to slowly refine its message estimates rather
-than settle into a stable fixed point.
+convergence due to degeneracy and the short cycles in the underlying
+Tanner graph, so even after several thousand iterations the decoder
+may continue to slowly refine its message estimates rather than
+settle into a stable fixed point.
 This is one of the core motivations for moving from plain \ac{bp} to
 the guided-decimation variant studied in
 \Cref{subsec:Belief Propagation with Guided Decimation}.

-Another thing to note is that setting the per-invocation iteration
+Furthermore, note that setting the per-invocation iteration
 budget of the inner decoder equal to the iteration budget of the
 whole-block decoder is not a fair comparison in terms of total
 computational effort.
@@ -1762,14 +1761,14 @@ sliding-window approach is still at an advantage.

 % [Thread] Exploration of the effect of the step size

-Having examined the effect of the window size $W$, we next turned to
+Having examined the effect of the window size $W$, we next turn to
 the second windowing parameter, the step size $F$.
-We carried out an investigation analogous to the one above:
-we first compared warm- and cold-start decoding across the full range
+We carry out an investigation analogous to the one above:
+we first compare warm- and cold-start decoding across the full range
 of physical error rates at a fixed iteration budget, and then we
-examined the dependence on the iteration budget at a fixed physical
+examine the dependence on the iteration budget at a fixed physical
 error rate.
-The window size was held fixed at $W = 5$ throughout, the value at
+The window size is fixed at $W = 5$ throughout, the value at
 which the warm-start variant produced the strongest performance in the
 previous experiments.

@@ -2032,7 +2031,7 @@ Similarly, assuming the decoder is fast enough to keep up with the
 incoming syndrome measurements corresponding to the \acp{cn} of
 subsequent windows, the time at which decoding is complete depends only
 on the amount of time spent on decoding the very last window.
-A smaller $F$ thus only costs additional total compute and not
+Thus, smaller $F$ only costs additional total compute and not
 additional latency, which is favorable for a warm-start
 sliding-window implementation.
 This is especially favorable for our warm-start modification, as it
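As a back-of-the-envelope illustration of this compute-versus-latency trade-off, assuming the last window is aligned with the final syndrome round (a convention the thesis may or may not use):

```python
import math

def n_windows(T, W, F):
    """Rough number of windows needed to cover T syndrome rounds with window
    size W and step size F (assumes the last window is aligned to the end)."""
    return 1 if T <= W else math.ceil((T - W) / F) + 1

# Halving F roughly doubles the number of inner-decoder invocations (total
# compute), while the latency contribution of the final window is unchanged.
for F in (4, 2, 1):
    print(f"F={F}: {n_windows(T=30, W=5, F=F)} windows")
```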
@@ -2062,8 +2061,8 @@ both schemes process the same windows for the same number of
 iterations and differ only in the initialization of the \ac{bp}
 messages of each new window.
 We also observed that plain \ac{bp} did not saturate even at $4096$
-iterations, which we attribute to the short cycles in the underlying
-Tanner graph.
+iterations, which we attribute to the degeneracy and short cycles in
+the underlying Tanner graph.
 This motivates the next subsection, in which we replace the inner
 \ac{bp} decoder by its guided-decimation variant.

@@ -2261,7 +2260,7 @@ that can occur before every \ac{vn} in the window has been decimated.
 A preliminary investigation showed that \ac{bpgd} only delivers its
 intended performance gain once most \acp{vn} have actually been decimated,
 which motivated this choice.
-The physical error rate was swept from $p = 0.001$ to $p = 0.004$
+The physical error rate is swept from $p = 0.001$ to $p = 0.004$
 in steps of $0.0005$.
 \Cref{fig:bpgd_w} sweeps over the window size with
 $W \in \{3, 4, 5\}$ at fixed step size $F = 1$, and
@@ -2304,7 +2303,7 @@ matrix at the time of decoding, and this benefits both warm- and
 cold-start decoding.
 The dependence on the step size in \Cref{fig:bpgd_f}, however, is the
 opposite of the corresponding dependence under plain \ac{bp}
-(\Cref{fig:bp_f_over_p}): for warm-start, smaller $F$ now hurts
+(\Cref{fig:bp_f_over_p}): for warm-start, smaller $F$ now degrades performance
 rather than helps, even though smaller $F$ implies a larger overlap
 in both cases.

@@ -2319,13 +2318,13 @@ every \ac{vn} in a window, by the time window $\ell$ ends, all
 of its \acp{vn} have already been hard-decided.
 For the \acp{vn} that lie in the overlap region with window $\ell + 1$
 this hard decision is then carried into the next window through the
-warm-start initialization, and the next window thus begins decoding
-with a substantial fraction of its \acp{vn} already frozen, before
+warm-start initialization, and the next window begins decoding
+with a substantial fraction of its \acp{vn} already fixed, before
 its own parity checks have had any chance to influence the
 corresponding bit estimates.
 This identifies one of two competing effects on the warm-start performance.
-The larger the overlap, the more such prematurely frozen \acp{vn} the
-next window inherits, which hurts performance.
+The larger the overlap, the more such prematurely fixed \acp{vn} the
+next window inherits, which degrades performance.
 On the other hand, a larger window still exposes the inner decoder to
 a larger set of constraints, which helps performance.
 The two effects together are consistent with what we observe in
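To see why carrying the channel LLRs across windows propagates these hard decisions, consider one common way of realizing decimation: clamping the channel LLR of the decimated VN (taking positive LLR to favor bit value 0). The thesis implementation may differ in detail; this Python fragment is purely illustrative:

```python
DECIMATION_LLR = 1e9   # effectively infinite: the VN can no longer flip

def decimate(lam, i, value):
    """Freeze VN i to `value` (0 or 1) by clamping its channel LLR."""
    lam[i] = DECIMATION_LLR if value == 0 else -DECIMATION_LLR

# If the warm start hands `lam` to the next window unchanged, every decimated
# VN in the overlap region stays frozen there, regardless of what the new
# window's parity checks indicate.  Carrying over only the BP messages (and
# restoring the original channel LLRs) avoids this premature freezing.
```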
@@ -2346,7 +2345,7 @@ $n_\text{iter}$ should reduce the maximum number of \acp{vn} that can
 be decimated before window $\ell$ commits, and the warm-start
 performance should approach that of warm-start under plain \ac{bp} as
 $n_\text{iter}$ is lowered.
-We therefore now vary $n_\text{iter}$ at fixed window parameters and
+Therefore, we vary $n_\text{iter}$ at fixed window parameters and
 fixed physical error rate.

 \begin{figure}[t]
@@ -2516,9 +2515,9 @@ fixed physical error rate.
 sliding-window decoding as a function of the maximum number of inner
 \ac{bp} iterations $n_\text{iter}$.
 The dashed colored curves correspond to cold-start sliding-window
-decoding and the solid colored curves to warm-start, again carrying
-over both the \ac{bp} messages and the channel \acp{llr} on the
-overlap region.
+decoding and the solid colored curves to warm-start, which again
+retains both the \ac{bp} messages and the decimation information on
+the overlap region.
 The physical error rate is fixed at $p = 0.0025$ and the iteration
 budget is swept over $n_\text{iter} \in \{32, 128, 256, 512, 1024,
 1536, 2048, 2560, 3072, 3584, 4096\}$.
@@ -2533,7 +2532,7 @@ For low iteration budgets, all curves in both panels behave similarly
 to the plain-\ac{bp} curves in
 \Cref{fig:bp_w_over_iter,fig:bp_f_over_iter}.
 The per-round \ac{ler} decreases gradually with $n_\text{iter}$, and
-the warm-start curves lie below their cold-start counterparts at
+the warm-start configurations now outperform their cold-start counterparts at
 matching window parameters.
 As $n_\text{iter}$ continues to grow, however, the cold-start curves
 undergo a sharp drop, after which they lie roughly an order of
@@ -3020,7 +3019,7 @@ and at $F = 1$, respectively.

 These observations match our expectations.
 With only the \ac{bp} messages carried over, the warm-start
-initialization no longer freezes any \acp{vn} in the next window
+initialization no longer freezes any \acp{vn} in the next window.
 The dependence of this benefit on $W$ and $F$ also recovers the
 pattern observed for plain \ac{bp} in
 \Cref{fig:whole_vs_cold_vs_warm,fig:bp_f_over_p}:
@@ -3034,7 +3033,7 @@ sliding-window decoding under \ac{bpgd} by summarizing our findings.
 Warm-starting the inner decoder still provides a consistent
 performance gain when the inner decoder is upgraded from plain
 \ac{bp} to its guided-decimation variant, but only if some care is
-taken in choosing what to carry over.
+taken in choosing what information to carry over.
 Passing the channel \acp{llr} along with the \ac{bp} messages,
 as suggested by naively carrying over the warm-start idea to \ac{bpgd},
 leads to premature hard decisions on \acp{vn} in the overlap region.
@@ -3049,3 +3048,17 @@ requirements are substantially larger than those of plain \ac{bp}:
 the per-round \ac{ler} drops sharply only once the iteration budget
 is on the order of the number of \acp{vn} in each window.

+Future work could include a softer treatment of the decimation state
+in \ac{bpgd}.
+Rather than discarding the decimation information of the previous
+window entirely, as in the message-only warm start used here, one
+could encode the decimation decisions as strong but finite biases on
+the channel \acp{llr} of the next window, allowing the new window's parity
+checks to override them if the syndrome calls for it.
+This would interpolate between the two warm-start variants studied here and
+might combine the benefits of both.
+A related question is whether the decimation schedule itself should
+be aware of the window structure, for instance by deferring
+decimation of \acp{vn} in the overlap region until they have been
+visited by the next window.
+
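The soft-decimation idea proposed in the added paragraph above could look roughly like the following sketch; the bias magnitude and the function name are illustrative choices, not something specified in the thesis:

```python
def soften_decimation(lam_next, decimated, bias=12.0):
    """Re-encode the previous window's decimation decisions as strong but
    finite biases on the next window's channel LLRs, so that the new window's
    parity checks can still override them if the syndrome calls for it.

    decimated : mapping from VN index (in the overlap) to its hard-decided
                value (0 or 1); positive LLR is taken to favor bit value 0
    """
    for i, value in decimated.items():
        lam_next[i] = bias if value == 0 else -bias
    return lam_next
```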