Incorporate Jonathan's corrections to numerical results section

2026-05-04 17:07:41 +02:00
parent 72acea0321
commit 7bf1b2f8d7


@@ -711,7 +711,7 @@ estimates committed after decoding window $\ell$, we have to set
% Intro: Problem with above procedure
The sliding-window structure visible in \Cref{fig:windowing_pcm} is
reminiscent of windowed decoding for \ac{sc}-\ac{ldpc} codes.
Switching our viewpoint to the Tanner graph depicted in
\Cref{fig:messages_decimation_tanner}, however, we can see an important
difference between \ac{sc}-\ac{ldpc} decoding and the
@@ -719,7 +719,7 @@ sliding-window decoding procedure detailed above.
While the windowing process is similar, the algorithm above
reinitializes the decoder to start from a clean state when moving to
the next window.
Therefore, it does not make use of the integral property of
windowed \ac{sc}-\ac{ldpc} decoding of exploiting the spatially coupled
structure by passing soft information from earlier to later spatial positions.
@@ -731,9 +731,10 @@ still relevant to the decoding of the next.
This may somewhat limit the variety of \emph{inner decoders}, i.e.,
the decoders decoding the individual windows, the warm-start
initialization can be used with.
For instance, \ac{bp}+\ac{osd} does not immediately seem suitable, as
it performs a hard decision on the \acp{vn}, though this remains to
be investigated.
We chose to first investigate standard \ac{bp} due to its simplicity and
then \ac{bpgd} because of the availability of recently computed messages.
% TODO: Include this?
@@ -900,7 +901,8 @@ To see how we realize this in practice, we reiterate the steps of the
\right) \\[3mm]
\text{\ac{cn} Update (Min-Sum): }&
\displaystyle L_{i \leftarrow j} = (-1)^{s_j}\cdot \prod_{i'
\in \mathcal{N}_\text{C}(j)\setminus \{i\}} \sign \left( L_{i'
\rightarrow j}
\right) \cdot \min_{i' \in \mathcal{N}_\text{C}(j)\setminus \{i\}} \lvert
L_{i'\rightarrow j} \rvert \\[3mm]
\label{eq:vn_update}
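As a concrete illustration, the min-sum \ac{cn} update above can be written out for a single \ac{cn} as follows (a minimal Python sketch with illustrative names; the actual decoder is implemented in Rust):

```python
def min_sum_cn_update(incoming, s_j):
    """Min-sum check-node update for one CN j.

    incoming: list of VN->CN messages L_{i'->j} from all neighbors of j;
    s_j: syndrome bit of CN j.
    Returns one CN->VN message L_{i<-j} per neighbor, each computed from
    the signs and the minimum magnitude of all *other* incoming messages.
    """
    out = []
    for i in range(len(incoming)):
        others = incoming[:i] + incoming[i + 1:]
        sign = 1.0
        for m in others:
            sign *= 1.0 if m > 0 else (-1.0 if m < 0 else 0.0)
        out.append((-1) ** s_j * sign * min(abs(m) for m in others))
    return out
```

Note the exclusion of the message from neighbor $i$ itself, exactly as in the product and minimum over $\mathcal{N}_\text{C}(j)\setminus \{i\}$ above.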
@@ -943,7 +945,7 @@ We can then continue decoding the next window as usual.
We can further simplify the algorithm.
Looking carefully at \Cref{eq:vn_update} we notice that when the
\ac{cn} to \ac{vn} messages $L_{i\leftarrow j}$ have been initialized to zero,
the \ac{vn} update degenerates to
\begin{align*}
\displaystyle L_{i \rightarrow j} =
@@ -971,7 +973,7 @@ Note that the decoding procedure performed on the individual windows
\label{alg:warm_start_bp}
\begin{algorithmic}[1]
\State \textbf{Initialize:} $\hat{\bm{e}}^\text{total} \leftarrow \bm{0}$
\State \textbf{Initialize:} $L_{i\leftarrow j} = 0,
~\forall~ i\in \mathcal{I}, j\in \mathcal{J}$
\For{$\ell = 0, \ldots, n_\text{win}-1$}
\For{$\nu = 0, \ldots, n_\text{iter}-1$}
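The control flow of \Cref{alg:warm_start_bp} can be sketched as follows (an illustrative Python sketch; `bp_iteration` and `commit` are hypothetical placeholders for the inner decoder pass and the commit step, not the actual Rust implementation):

```python
def warm_start_sliding_window(windows, n_iter, bp_iteration, commit):
    """Sketch of warm-start sliding-window decoding.

    windows: per-window data in decoding order; bp_iteration: one BP
    pass that reads and updates the message store in place; commit:
    extracts the error estimate committed when the window advances.
    All names are illustrative placeholders.
    """
    messages = {}  # CN->VN messages, zero-initialized once, up front
    e_total = []
    for window in windows:
        for _ in range(n_iter):
            bp_iteration(window, messages)  # updates `messages` in place
        e_total.extend(commit(window, messages))
        # Warm start: `messages` is NOT reset between windows; whatever
        # was computed on the overlap region seeds the next window.
    return e_total
```

The only difference from cold-start decoding is the absence of a message reset at the top of the outer loop.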
@@ -1227,7 +1229,7 @@ model, both of which depend on the code and noise model in question.
% Software stack: Layer 3
Even further up, given an already constructed syndrome extraction
circuit and the resulting \acf{dem}, we split the detector error
matrix into separate windows and manage the interplay between the
inner decoders acting on those individual windows.
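The windowing geometry itself is simple to state; the following sketch (illustrative only; syndrome rounds indexed with end-exclusive intervals, and the handling of the final rounds is an assumption) shows how a window size $W$ and step size $F$ tile the syndrome rounds:

```python
def split_into_windows(n_rounds, W, F):
    """Return the (start, end) round interval (end exclusive) covered
    by each decoding window, for window size W and step size F.
    Purely illustrative of the windowing geometry."""
    starts = list(range(0, n_rounds - W + 1, F))
    # Ensure the final rounds are covered by a last window as well.
    if starts and starts[-1] + W < n_rounds:
        starts.append(n_rounds - W)
    return [(s, s + W) for s in starts]
```

For $W = 5$ and $F = 1$, consecutive intervals overlap in $W - F = 4$ rounds, which is the region over which messages can be carried over.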
@@ -1247,10 +1249,8 @@ For the circuit generation, we employed utilities from QUITS
generation for a number of different \ac{qldpc} codes.
We initially created a Python implementation, which used QUITS for the window
splitting and subsequent sliding-window decoding as well.
The \ac{bp} and \ac{bpgd} decoders are implemented in Rust to achieve
higher simulation speeds, leveraging the compiled nature of the language.
We reimplemented both the window splitting and the decoders.
% Global experimental setup
@@ -1282,21 +1282,21 @@ generated by simulating at least $200$ logical error events.
% Local experimental setup
We begin our investigation by using \ac{bp} with no further
modifications as the inner decoder.
We choose the min-sum variant of \ac{bp} due to its low computational complexity.
% [Thread] Get impression for max gain
We initially want to gain an impression of the performance gain we could
expect from a modification to the sliding-window decoding procedure.
To this end, we begin by analyzing the decoding performance of the
original process, without our warm-start modification.
We will call this \emph{cold-start} decoding in the following.
Because we expect more global decoding to work better (the inner
decoder then has access to a larger portion of the long-range
correlations encoded in the detector error matrix before any commit
is made) we initially decide to use decoding on the whole detector
error matrix as a proxy for the attainable decoding performance.
\begin{figure}[t]
@@ -1400,7 +1400,7 @@ this trend and, as expected, achieves the strongest performance.
The fact that the $W = 5$ curve is already very close to the
whole-block decoder indicates that the marginal benefit of enlarging
the window saturates after a certain point.
Thus, from a practical standpoint, the choice of $W$ represents a
trade-off between decoding latency and accuracy: larger windows
delay the start of decoding by requiring more syndrome extraction
rounds to be collected upfront, while the diminishing returns above
@@ -1409,7 +1409,7 @@ additional accuracy in return.
% [Thread] First comparison with warm start
Next, we additionally simulate error rate curves for warm-start
sliding-window decoding to assess how much of the gap between
cold-start and whole-block decoding can be recovered by our modification.
We choose the same window sizes as before, so that the warm- and
@@ -1537,16 +1537,15 @@ consecutive windows spans $W - F = W - 1$ syndrome rounds, so larger
$W$ implies that more messages are carried over and a larger fraction
of the next window starts in a warm state.
% TODO: Possibly insert explanation for higher gain at lower error rates
A perhaps surprising observation is that the warm-start decoder for
$W = 5$ outperforms the whole-block reference across the
entire range of physical error rates, even though warm-start
sliding-window decoding is, by construction, more local than
whole-block decoding.
A possible explanation for this effect is discussed in the following.
% [Thread] Warm start is better than whole due to more effective iterations
A possible explanation for this behavior lies in the
number of \ac{bp} iterations effectively spent on the \acp{vn}
inside the overlap region.
Each \ac{vn} in such an overlap is processed by multiple consecutive
@@ -1742,15 +1741,15 @@ initialization diminishes, and the curves approach each other.
The fact that no curve clearly saturates within the swept range is
itself worth noting.
We know that \ac{bp} on \ac{qldpc} codes suffers from poor
convergence due to degeneracy and the short cycles in the underlying
Tanner graph, so even after several thousand iterations the decoder
may continue to slowly refine its message estimates rather than
settle into a stable fixed point.
This is one of the core motivations for moving from plain \ac{bp} to
the guided-decimation variant studied in
\Cref{subsec:Belief Propagation with Guided Decimation}.
Furthermore, note that setting the per-invocation iteration
budget of the inner decoder equal to the iteration budget of the
whole-block decoder is not a fair comparison in terms of total
computational effort.
@@ -1762,14 +1761,14 @@ sliding-window approach is still at an advantage.
% [Thread] Exploration of the effect of the step size
Having examined the effect of the window size $W$, we next turn to
the second windowing parameter, the step size $F$.
We carry out an investigation analogous to the one above:
we first compare warm- and cold-start decoding across the full range
of physical error rates at a fixed iteration budget, and then we
examine the dependence on the iteration budget at a fixed physical
error rate.
The window size is fixed at $W = 5$ throughout, the value at
which the warm-start variant produced the strongest performance in the
previous experiments.
@@ -2032,7 +2031,7 @@ Similarly, assuming the decoder is fast enough to keep up with the
incoming syndrome measurements corresponding to the \acp{cn} of
subsequent windows, the time at which decoding is complete depends only
on the amount of time spent on decoding the very last window.
Thus, a smaller $F$ only costs additional total compute and not
additional latency.
This is especially favorable for our warm-start modification, as it
@@ -2062,8 +2061,8 @@ both schemes process the same windows for the same number of
iterations and differ only in the initialization of the \ac{bp}
messages of each new window.
We also observed that plain \ac{bp} did not saturate even at $4096$
iterations, which we attribute to the degeneracy and short cycles in
the underlying Tanner graph.
This motivates the next subsection, in which we replace the inner
\ac{bp} decoder by its guided-decimation variant.
@@ -2261,7 +2260,7 @@ that can occur before every \ac{vn} in the window has been decimated.
A preliminary investigation showed that \ac{bpgd} only delivers its
intended performance gain once most \acp{vn} have actually been decimated,
which motivated this choice.
The physical error rate is swept from $p = 0.001$ to $p = 0.004$
in steps of $0.0005$.
\Cref{fig:bpgd_w} sweeps over the window size with
$W \in \{3, 4, 5\}$ at fixed step size $F = 1$, and
@@ -2304,7 +2303,7 @@ matrix at the time of decoding, and this benefits both warm- and
cold-start decoding.
The dependence on the step size in \Cref{fig:bpgd_f}, however, is the
opposite of the corresponding dependence under plain \ac{bp}
(\Cref{fig:bp_f_over_p}): for warm-start, smaller $F$ now degrades performance
rather than improving it, even though smaller $F$ implies a larger overlap
in both cases.
@@ -2319,13 +2318,13 @@ every \ac{vn} in a window, by the time window $\ell$ ends, all
of its \acp{vn} have already been hard-decided.
For the \acp{vn} that lie in the overlap region with window $\ell + 1$
this hard decision is then carried into the next window through the
warm-start initialization, and the next window begins decoding
with a substantial fraction of its \acp{vn} already fixed, before
its own parity checks have had any chance to influence the
corresponding bit estimates.
This identifies one of two competing effects on the warm-start performance.
The larger the overlap, the more such prematurely fixed \acp{vn} the
next window inherits, which degrades performance.
On the other hand, a larger window still exposes the inner decoder to
a larger set of constraints, which helps performance.
The two effects together are consistent with what we observe in
@@ -2346,7 +2345,7 @@ $n_\text{iter}$ should reduce the maximum number of \acp{vn} that can
be decimated before window $\ell$ commits, and the warm-start
performance should approach that of warm-start under plain \ac{bp} as
$n_\text{iter}$ is lowered.
Therefore, we vary $n_\text{iter}$ at fixed window parameters and
fixed physical error rate.
\begin{figure}[t]
@@ -2516,9 +2515,9 @@ fixed physical error rate.
sliding-window decoding as a function of the maximum number of inner
\ac{bp} iterations $n_\text{iter}$.
The dashed colored curves correspond to cold-start sliding-window
decoding and the solid colored curves to warm-start, which again
retains both the \ac{bp} messages and the decimation information on
the overlap region.
The physical error rate is fixed at $p = 0.0025$ and the iteration
budget is swept over $n_\text{iter} \in \{32, 128, 256, 512, 1024,
1536, 2048, 2560, 3072, 3584, 4096\}$.
@@ -2533,7 +2532,7 @@ For low iteration budgets, all curves in both panels behave similarly
to the plain-\ac{bp} curves in
\Cref{fig:bp_w_over_iter,fig:bp_f_over_iter}.
The per-round \ac{ler} decreases gradually with $n_\text{iter}$, and
the warm-start configurations now outperform their cold-start counterparts at
matching window parameters.
As $n_\text{iter}$ continues to grow, however, the cold-start curves
undergo a sharp drop, after which they lie roughly an order of
@@ -3020,7 +3019,7 @@ and at $F = 1$, respectively.
These observations match our expectations.
With only the \ac{bp} messages carried over, the warm-start
initialization no longer freezes any \acp{vn} in the next window.
The dependence of this benefit on $W$ and $F$ also recovers the
pattern observed for plain \ac{bp} in
\Cref{fig:whole_vs_cold_vs_warm,fig:bp_f_over_p}:
@@ -3034,7 +3033,7 @@ sliding-window decoding under \ac{bpgd} by summarizing our findings.
Warm-starting the inner decoder still provides a consistent
performance gain when the inner decoder is upgraded from plain
\ac{bp} to its guided-decimation variant, but only if some care is
taken in choosing what information to carry over.
Passing the channel \acp{llr} along with the \ac{bp} messages,
as suggested by naively carrying over the warm-start idea to \ac{bpgd},
leads to premature hard decisions on \acp{vn} in the overlap region.
@@ -3049,3 +3048,17 @@ requirements are substantially larger than those of plain \ac{bp}:
the per-round \ac{ler} drops sharply only once the iteration budget
is on the order of the number of \acp{vn} in each window.
Future work could include a softer treatment of the decimation state
in \ac{bpgd}.
Rather than discarding the decimation information of the previous
window entirely, as in the message-only warm start used here, one
could encode the decimation decisions as strong but finite biases on
the channel \acp{llr} of the next window, allowing the new window's parity
checks to override them if the syndrome calls for it.
This would interpolate between the two warm-start variants studied here and
might combine the benefits of both.
A related question is whether the decimation schedule itself should
be aware of the window structure, for instance by deferring
decimation of \acp{vn} in the overlap region until they have been
visited by the next window.
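The soft treatment suggested above could be prototyped along the following lines (purely illustrative; the bias magnitude `beta`, the sign convention, and all names are assumptions, not part of the implementation):

```python
def bias_channel_llrs(channel_llrs, decimated, beta=20.0):
    """Turn hard decimation decisions from the previous window into
    strong but finite biases on the next window's channel LLRs.

    channel_llrs: dict vn -> channel LLR of the next window;
    decimated: dict vn -> decided bit (0 or 1) from the previous window.
    A decision for bit b is encoded as an LLR of magnitude beta, with
    the convention LLR > 0 <=> bit 0, so the next window's checks can
    still override it if the syndrome calls for it.
    """
    biased = dict(channel_llrs)
    for vn, bit in decimated.items():
        if vn in biased:  # only VNs in the overlap region carry over
            biased[vn] = beta if bit == 0 else -beta
    return biased
```

Setting `beta` to infinity would recover the hard freezing studied above, while `beta = 0` discards the decimation information entirely, so this sketch indeed interpolates between the two warm-start variants.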