Incorporate Jonathan's corrections to numerical results section

2026-05-04 17:07:41 +02:00
parent 72acea0321
commit 7bf1b2f8d7


@@ -711,7 +711,7 @@ estimates committed after decoding window $\ell$, we have to set
% Intro: Problem with above procedure
The sliding-window structure visible in \Cref{fig:windowing_pcm} is
reminiscent of windowed decoding for \ac{sc}-\ac{ldpc} codes.
Switching our viewpoint to the Tanner graph depicted in
\Cref{fig:messages_decimation_tanner}, however, we can see an important
difference between \ac{sc}-\ac{ldpc} decoding and the
@@ -719,7 +719,7 @@ sliding-window decoding procedure detailed above.
While the windowing process is similar, the algorithm above
reinitializes the decoder to start from a clean state when moving to
the next window.
Therefore, it does not make use of an integral property of
windowed \ac{sc}-\ac{ldpc} decoding: exploiting the spatially coupled
structure by passing soft information from earlier to later spatial positions.
@@ -731,9 +731,10 @@ still relevant to the decoding of the next.
This may somewhat limit the variety of \emph{inner decoders}, i.e.,
the decoders decoding the individual windows, the warm-start
initialization can be used with.
For instance, \ac{bp}+\ac{osd} does not immediately seem suitable, as
it performs a hard decision on the \acp{vn}, though this remains to
be investigated.
We chose to first investigate standard \ac{bp} due to its simplicity and
then \ac{bpgd} because of the availability of recently computed messages.
% TODO: Include this?
@@ -900,7 +901,8 @@ To see how we realize this in practice, we reiterate the steps of the
\right) \\[3mm]
\text{\ac{cn} Update (Min-Sum): }&
\displaystyle L_{i \leftarrow j} = (-1)^{s_j}\cdot \prod_{i'
\in \mathcal{N}_\text{C}(j)\setminus \{i\}} \sign \left( L_{i' \rightarrow j}
\right) \cdot \min_{i' \in \mathcal{N}_\text{C}(j)\setminus \{i\}} \lvert
L_{i'\rightarrow j} \rvert \\[3mm]
\label{eq:vn_update}
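To make the min-sum \ac{cn} update concrete, the following is a minimal Python sketch of a single check node's update (an illustration only, not the Rust implementation described later in this section; the function name and array-based interface are assumptions):

```python
import numpy as np

def min_sum_cn_update(L_v2c, syndrome_bit):
    """Min-sum check-node update: for each incident VN i, combine the
    signs and minimum magnitudes of all *other* incoming VN-to-CN
    messages, flipped by (-1)^{s_j} for the measured syndrome bit."""
    L_v2c = np.asarray(L_v2c, dtype=float)
    signs = np.sign(L_v2c)
    signs[signs == 0] = 1.0            # treat zero messages as positive
    mags = np.abs(L_v2c)
    total_sign = np.prod(signs)
    order = np.argsort(mags)           # two smallest magnitudes suffice
    min1, min2 = mags[order[0]], mags[order[1]]
    out = np.empty_like(mags)
    for i in range(len(mags)):
        extr_sign = total_sign * signs[i]            # product over i' != i
        extr_min = min2 if i == order[0] else min1   # min over i' != i
        out[i] = (-1) ** syndrome_bit * extr_sign * extr_min
    return out
```

Tracking only the two smallest magnitudes is the standard trick that makes the extrinsic minimum computable in one pass.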
@@ -943,7 +945,7 @@ We can then continue decoding the next window as usual.
We can further simplify the algorithm.
Looking carefully at \Cref{eq:vn_update} we notice that when the
\ac{cn} to \ac{vn} messages $L_{i\leftarrow j}$ have been initialized to zero,
the \ac{vn} update degenerates to
\begin{align*}
\displaystyle L_{i \rightarrow j} =
@@ -971,7 +973,7 @@ Note that the decoding procedure performed on the individual windows
\label{alg:warm_start_bp}
\begin{algorithmic}[1]
\State \textbf{Initialize:} $\hat{\bm{e}}^\text{total} \leftarrow \bm{0}$
\State \textbf{Initialize:} $L_{i\leftarrow j} = 0,
~\forall~ i\in \mathcal{I}, j\in \mathcal{J}$
\For{$\ell = 0, \ldots, n_\text{win}-1$}
\For{$\nu = 0, \ldots, n_\text{iter}-1$}
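The structure of the warm-start outer loop can be sketched as follows (a heavily simplified illustration of the carry-over idea, not the actual implementation; `run_bp_window` and all names are hypothetical placeholders for the inner decoder):

```python
def warm_start_sliding_window(windows, n_iter, run_bp_window):
    """Warm-start outer loop: CN-to-VN messages are zero-initialized
    once, then carried over between windows instead of being reset.
    `run_bp_window` stands in for the inner decoder: it runs up to
    `n_iter` BP iterations on one window and returns the updated
    messages plus the committed error estimate for that window."""
    messages = {}    # L_{i<-j}; a missing key is treated as 0
    e_total = []
    for window in windows:               # ell = 0, ..., n_win - 1
        messages, e_hat = run_bp_window(window, messages, n_iter)
        e_total.append(e_hat)            # commit estimate for this window
    return e_total, messages
```

The only difference from the cold-start variant is that `messages` is created once, outside the loop, instead of being re-created per window.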
@@ -1227,7 +1229,7 @@ model, both of which depend on the code and noise model in question.
% Software stack: Layer 3
Even further up, given an already constructed syndrome extraction
circuit and the resulting \acf{dem}, we split the detector error
matrix into separate windows and manage the interplay between the
inner decoders acting on those individual windows.
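The window splitting performed at this layer can be sketched roughly as below (a hypothetical helper under the assumption that each check row carries a syndrome-round label; the actual splitting used here relies on QUITS and the Rust reimplementation and may differ in detail):

```python
import numpy as np

def split_into_windows(H, round_of_cn, W, F):
    """Select the CN rows belonging to W consecutive syndrome rounds
    and the VN columns they touch, advancing by F rounds per window.
    `round_of_cn[j]` is the assumed syndrome round of check row j."""
    n_rounds = int(max(round_of_cn)) + 1
    windows = []
    start = 0
    while start < n_rounds:
        rows = np.flatnonzero((round_of_cn >= start)
                              & (round_of_cn < start + W))
        cols = np.flatnonzero(H[rows].any(axis=0))  # VNs seen by window
        windows.append((rows, cols))
        start += F
    return windows
```

With $F < W$, consecutive windows share $W - F$ rounds of checks, which is exactly the overlap region the warm-start initialization acts on.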
@@ -1247,10 +1249,8 @@ For the circuit generation, we employed utilities from QUITS
generation for a number of different \ac{qldpc} codes.
We initially created a Python implementation, which used QUITS for the window
splitting and subsequent sliding-window decoding as well.
The \ac{bp} and \ac{bpgd} decoders are implemented in Rust to achieve
higher simulation speeds, leveraging the compiled nature of the language.
We reimplemented both the window splitting and the decoders.
% Global experimental setup
@@ -1282,21 +1282,21 @@ generated by simulating at least $200$ logical error events.
% Local experimental setup
We begin our investigation by using \ac{bp} with no further
modifications as the inner decoder.
We choose the min-sum variant of \ac{bp} due to its low computational complexity.
% [Thread] Get impression for max gain
We initially want to gain an impression of the performance gain we can
expect from a modification to the sliding-window decoding procedure.
To this end, we begin by analyzing the decoding performance of the
original process, without our warm-start modification.
We will call this \emph{cold-start} decoding in the following.
Because we expect more global decoding to work better (the inner
decoder then has access to a larger portion of the long-range
correlations encoded in the detector error matrix before any commit
is made), we initially decide to use decoding on the whole detector
error matrix as a proxy for the attainable decoding performance.
\begin{figure}[t]
@@ -1400,7 +1400,7 @@ this trend and, as expected, achieves the strongest performance.
The fact that the $W = 5$ curve is already very close to the
whole-block decoder indicates that the marginal benefit of enlarging
the window saturates after a certain point.
Thus, from a practical standpoint, the choice of $W$ represents a
trade-off between decoding latency and accuracy: larger windows
delay the start of decoding by requiring more syndrome extraction
rounds to be collected upfront, while the diminishing returns above
@@ -1409,7 +1409,7 @@ additional accuracy in return.
% [Thread] First comparison with warm start
Next, we additionally simulate error rate curves for warm-start
sliding-window decoding to assess how much of the gap between
cold-start and whole-block decoding can be recovered by our modification.
We choose the same window sizes as before, so that the warm- and
@@ -1537,16 +1537,15 @@ consecutive windows spans $W - F = W - 1$ syndrome rounds, so larger
$W$ implies that more messages are carried over and a larger fraction
of the next window starts in a warm state.
% TODO: Possibly insert explanation for higher gain at lower error rates
A perhaps surprising observation is that warm-start decoding for
$W = 5$ outperforms the whole-block reference across the
entire range of physical error rates, even though warm-start
sliding-window decoding is, by construction, more local than
whole-block decoding.
A possible explanation for this effect is discussed in the following.
% [Thread] Warm start is better than whole due to more effective iterations
This explanation lies in the
number of \ac{bp} iterations effectively spent on the \acp{vn}
inside the overlap region.
Each \ac{vn} in such an overlap is processed by multiple consecutive
@@ -1742,15 +1741,15 @@ initialization diminishes, and the curves approach each other.
The fact that no curve clearly saturates within the swept range is
itself worth noting.
We know that \ac{bp} on \ac{qldpc} codes suffers from poor
convergence due to degeneracy and the short cycles in the underlying
Tanner graph, so even after several thousand iterations the decoder
may continue to slowly refine its message estimates rather than
settle into a stable fixed point.
This is one of the core motivations for moving from plain \ac{bp} to
the guided-decimation variant studied in
\Cref{subsec:Belief Propagation with Guided Decimation}.
Furthermore, note that setting the per-invocation iteration
budget of the inner decoder equal to the iteration budget of the
whole-block decoder is not a fair comparison in terms of total
computational effort.
@@ -1762,14 +1761,14 @@ sliding-window approach is still at an advantage.
% [Thread] Exploration of the effect of the step size
Having examined the effect of the window size $W$, we next turn to
the second windowing parameter, the step size $F$.
We carry out an investigation analogous to the one above:
we first compare warm- and cold-start decoding across the full range
of physical error rates at a fixed iteration budget, and then we
examine the dependence on the iteration budget at a fixed physical
error rate.
The window size is fixed at $W = 5$ throughout, the value at
which the warm-start variant produced the strongest performance in the
previous experiments.
@@ -2032,7 +2031,7 @@ Similarly, assuming the decoder is fast enough to keep up with the
incoming syndrome measurements corresponding to the \acp{cn} of
subsequent windows, the time at which decoding is complete depends only
on the amount of time spent on decoding the very last window.
Thus, a smaller $F$ only costs additional total compute and not
additional latency, which is favorable for a warm-start
sliding-window implementation.
This is especially favorable for our warm-start modification, as it
@@ -2062,8 +2061,8 @@ both schemes process the same windows for the same number of
iterations and differ only in the initialization of the \ac{bp}
messages of each new window.
We also observed that plain \ac{bp} did not saturate even at $4096$
iterations, which we attribute to the degeneracy and short cycles in
the underlying Tanner graph.
This motivates the next subsection, in which we replace the inner
\ac{bp} decoder by its guided-decimation variant.
@@ -2261,7 +2260,7 @@ that can occur before every \ac{vn} in the window has been decimated.
A preliminary investigation showed that \ac{bpgd} only delivers its
intended performance gain once most \acp{vn} have actually been decimated,
which motivated this choice.
The physical error rate is swept from $p = 0.001$ to $p = 0.004$
in steps of $0.0005$.
\Cref{fig:bpgd_w} sweeps over the window size with
$W \in \{3, 4, 5\}$ at fixed step size $F = 1$, and
@@ -2304,7 +2303,7 @@ matrix at the time of decoding, and this benefits both warm- and
cold-start decoding.
The dependence on the step size in \Cref{fig:bpgd_f}, however, is the
opposite of the corresponding dependence under plain \ac{bp}
(\Cref{fig:bp_f_over_p}): for warm-start, a smaller $F$ now degrades
performance rather than improving it, even though a smaller $F$
implies a larger overlap in both cases.
@@ -2319,13 +2318,13 @@ every \ac{vn} in a window, by the time window $\ell$ ends, all
of its \acp{vn} have already been hard-decided.
For the \acp{vn} that lie in the overlap region with window $\ell + 1$
this hard decision is then carried into the next window through the
warm-start initialization, and the next window begins decoding
with a substantial fraction of its \acp{vn} already fixed, before
its own parity checks have had any chance to influence the
corresponding bit estimates.
This identifies one of two competing effects on the warm-start performance.
The larger the overlap, the more such prematurely fixed \acp{vn} the
next window inherits, which degrades performance.
On the other hand, a larger window still exposes the inner decoder to
a larger set of constraints, which helps performance.
The two effects together are consistent with what we observe in
@@ -2346,7 +2345,7 @@ $n_\text{iter}$ should reduce the maximum number of \acp{vn} that can
be decimated before window $\ell$ commits, and the warm-start
performance should approach that of warm-start under plain \ac{bp} as
$n_\text{iter}$ is lowered.
Therefore, we vary $n_\text{iter}$ at fixed window parameters and
fixed physical error rate.
\begin{figure}[t]
@@ -2516,9 +2515,9 @@ fixed physical error rate.
sliding-window decoding as a function of the maximum number of inner
\ac{bp} iterations $n_\text{iter}$.
The dashed colored curves correspond to cold-start sliding-window
decoding and the solid colored curves to warm-start, which again
retains both the \ac{bp} messages and the decimation information on
the overlap region.
The physical error rate is fixed at $p = 0.0025$ and the iteration
budget is swept over $n_\text{iter} \in \{32, 128, 256, 512, 1024,
1536, 2048, 2560, 3072, 3584, 4096\}$.
@@ -2533,7 +2532,7 @@ For low iteration budgets, all curves in both panels behave similarly
to the plain-\ac{bp} curves in
\Cref{fig:bp_w_over_iter,fig:bp_f_over_iter}.
The per-round \ac{ler} decreases gradually with $n_\text{iter}$, and
the warm-start configurations now outperform their cold-start
counterparts at matching window parameters.
As $n_\text{iter}$ continues to grow, however, the cold-start curves
undergo a sharp drop, after which they lie roughly an order of
@@ -3020,7 +3019,7 @@ and at $F = 1$, respectively.
These observations match our expectations.
With only the \ac{bp} messages carried over, the warm-start
initialization no longer freezes any \acp{vn} in the next window.
The dependence of this benefit on $W$ and $F$ also recovers the
pattern observed for plain \ac{bp} in
\Cref{fig:whole_vs_cold_vs_warm,fig:bp_f_over_p}:
@@ -3034,7 +3033,7 @@ sliding-window decoding under \ac{bpgd} by summarizing our findings.
Warm-starting the inner decoder still provides a consistent
performance gain when the inner decoder is upgraded from plain
\ac{bp} to its guided-decimation variant, but only if some care is
taken in choosing what information to carry over.
Passing the channel \acp{llr} along with the \ac{bp} messages,
as suggested by naively carrying over the warm-start idea to \ac{bpgd},
leads to premature hard decisions on \acp{vn} in the overlap region.
@@ -3049,3 +3048,17 @@ requirements are substantially larger than those of plain \ac{bp}:
the per-round \ac{ler} drops sharply only once the iteration budget
is on the order of the number of \acp{vn} in each window.
Future work could include a softer treatment of the decimation state
in \ac{bpgd}.
Rather than discarding the decimation information of the previous
window entirely, as in the message-only warm start used here, one
could encode the decimation decisions as strong but finite biases on
the channel \acp{llr} of the next window, allowing the new window's parity
checks to override them if the syndrome calls for it.
This would interpolate between the two warm-start variants studied here and
might combine the benefits of both.
A related question is whether the decimation schedule itself should
be aware of the window structure, for instance by deferring
decimation of \acp{vn} in the overlap region until they have been
visited by the next window.
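The softer treatment of the decimation state suggested above could look like the following sketch (a speculative illustration of the proposed future work, not an implemented or validated scheme; all names are hypothetical, and the bias magnitude is an assumed tuning parameter):

```python
import numpy as np

BIAS_LLR = 20.0   # assumed: strong but finite, so new checks can override it

def biased_channel_llrs(channel_llrs, overlap_idx, decimated_bits):
    """Instead of freezing overlap VNs, encode the previous window's
    decimation decisions as strong but finite biases on the next
    window's channel LLRs.  `decimated_bits[k]` is the hard decision
    (0 or 1) made for VN `overlap_idx[k]` in the previous window."""
    llrs = np.array(channel_llrs, dtype=float)
    for i, bit in zip(overlap_idx, decimated_bits):
        # bit == 0 -> bias towards "no error" (positive LLR), bit == 1 -> negative
        llrs[i] = BIAS_LLR * (1 - 2 * bit)
    return llrs
```

With `BIAS_LLR` set to infinity this recovers the LLR-passing warm start, and with it set to zero the message-only warm start, so sweeping it would interpolate between the two variants studied here.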