From 7bf1b2f8d7f30e13d49e788abfb2923b3cc0d39f Mon Sep 17 00:00:00 2001 From: Andreas Tsouchlos Date: Mon, 4 May 2026 17:07:41 +0200 Subject: [PATCH] Incorporate Jonathan's corrections to numerical results section --- src/thesis/chapters/4_decoding_under_dems.tex | 113 ++++++++++-------- 1 file changed, 63 insertions(+), 50 deletions(-) diff --git a/src/thesis/chapters/4_decoding_under_dems.tex b/src/thesis/chapters/4_decoding_under_dems.tex index ff6a07c..00814b2 100644 --- a/src/thesis/chapters/4_decoding_under_dems.tex +++ b/src/thesis/chapters/4_decoding_under_dems.tex @@ -711,7 +711,7 @@ estimates committed after decoding window $\ell$, we have to set % Intro: Problem with above procedure The sliding-window structure visible in \Cref{fig:windowing_pcm} is -highly reminiscent of windowed decoding for \ac{sc}-\ac{ldpc} codes. +reminiscent of windowed decoding for \ac{sc}-\ac{ldpc} codes. Switching our viewpoint to the Tanner graph depicted in \Cref{fig:messages_decimation_tanner}, however, we can see an important difference between \ac{sc}-\ac{ldpc} decoding and the @@ -719,7 +719,7 @@ sliding-window decoding procedure detailed above. While the windowing process is similar, the algorithm above reinitializes the decoder to start from a clean state when moving to the next window. -It therefore does not make use of the integral property of +Therefore, it does not make use of the integral property of windowed \ac{sc}-\ac{ldpc} decoding of exploiting the spatially coupled structure by passing soft information from earlier to later spatial positions. @@ -731,9 +731,10 @@ still relevant to the decoding of the next. This may somewhat limit the variety of \emph{inner decoders}, i.e., the decoders decoding the individual windows, the warm-start initialization can be used with. -E.g., \ac{bp}+\ac{osd} does not immediately seem suitable, though -this remains to be investigated. 
-We chose to investigate first plain \ac{bp} due to its simplicity and +For instance, \ac{bp}+\ac{osd} does not immediately seem suitable, as +it performs a hard decision on the \acp{vn}, though this remains to +be investigated. +We chose to investigate first standard \ac{bp} due to its simplicity and then \ac{bpgd} because of the availability of recently computed messages. % TODO: Include this? @@ -900,7 +901,8 @@ To see how we realize this in practice, we reiterate the steps of the \right) \\[3mm] \text{\ac{cn} Update (Min-Sum): }& \displaystyle L_{i \leftarrow j} = (-1)^{s_j}\cdot \prod_{i' - \in \mathcal{N}_\text{C}(j)\setminus \{i\}} \sign \left( L_{i' \rightarrow j} + \in \mathcal{N}_\text{C}(j)\setminus \{i\}} \sign \left( L_{i' + \rightarrow j} \right) \cdot \min_{i' \in \mathcal{N}_\text{C}(j)\setminus \{i\}} \lvert L_{i'\rightarrow j} \rvert \\[3mm] \label{eq:vn_update} @@ -943,7 +945,7 @@ We can then continue decoding the next window as usual. We can further simplify the algorithm. Looking carefully at \Cref{eq:vn_update} we notice that when the -\ac{cn} to \ac{vn} messages $L_{i\leftarrow j}$ have been zero-initialized, +\ac{cn} to \ac{vn} messages $L_{i\leftarrow j}$ have been initialized to zero, the \ac{vn} update degenerates to \begin{align*} \displaystyle L_{i \rightarrow j} = @@ -971,7 +973,7 @@ Note that the decoding procedure performed on the individual windows \label{alg:warm_start_bp} \begin{algorithmic}[1] \State \textbf{Initialize:} $\hat{\bm{e}}^\text{total} \leftarrow \bm{0}$ - \State \textbf{Initialize:} $L_{i\leftarrow j} = 0 + \State \textbf{Initialize:} $L_{i\leftarrow j} = 0, ~\forall~ i\in \mathcal{I}, j\in \mathcal{J}$ \For{$\ell = 0, \ldots, n_\text{win}-1$} \For{$\nu = 0, \ldots, n_\text{iter}-1$} @@ -1227,7 +1229,7 @@ model, both of which depend on the code and noise model in question. 
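The min-sum updates and the warm-start initialization of \Cref{alg:warm_start_bp} can be sketched compactly in Python. This is an illustrative toy version, not the thesis's Rust implementation; the function name `minsum_bp`, the dense row-list representation of the check matrix, and the edge-keyed message dictionary are choices made for this sketch only.

```python
def minsum_bp(H, syndrome, llr0, n_iter, msg_c2v=None):
    """Min-sum BP for a binary parity-check matrix H (list of 0/1 rows),
    given a syndrome and channel LLRs llr0 (positive LLR favors e_i = 0).

    msg_c2v maps an edge (check j, variable i) to the CN-to-VN message.
    Passing None gives the usual cold start (all-zero messages); passing
    the dict returned by a previous call realizes the warm start, reusing
    the messages on the edges shared with the previous window.
    Returns (hard-decision estimate e_hat, final msg_c2v).
    """
    m, n = len(H), len(H[0])
    edges = [(j, i) for j in range(m) for i in range(n) if H[j][i]]
    msg_c2v = {e: 0.0 for e in edges} if msg_c2v is None else dict(msg_c2v)
    e_hat = [0] * n

    def totals():
        # Channel LLR plus the sum of all incoming CN-to-VN messages.
        return [llr0[i] + sum(msg_c2v[(j, i)] for j in range(m) if H[j][i])
                for i in range(n)]

    for _ in range(n_iter):
        t = totals()
        # VN update: extrinsic sum, i.e. total minus the edge's own message.
        msg_v2c = {(j, i): t[i] - msg_c2v[(j, i)] for (j, i) in edges}
        # CN update (min-sum) with the syndrome sign (-1)^{s_j}.
        for j in range(m):
            nbrs = [i for i in range(n) if H[j][i]]
            for i in nbrs:
                others = [msg_v2c[(j, k)] for k in nbrs if k != i]
                sign = -1 if syndrome[j] else 1
                for v in others:
                    sign = -sign if v < 0 else sign
                msg_c2v[(j, i)] = sign * min(abs(v) for v in others)
        # Hard decision; stop once the estimate reproduces the syndrome.
        e_hat = [1 if v < 0 else 0 for v in totals()]
        if all(sum(H[j][i] * e_hat[i] for i in range(n)) % 2 == syndrome[j]
               for j in range(m)):
            break
    return e_hat, msg_c2v
```

In this representation, the warm start of the next window amounts to nothing more than handing the returned `msg_c2v` dictionary to the next call, so that the messages on the edges inside the overlap region survive the window transition.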
% Software stack: Layer 3
Even further up, given an already constructed syndrome extraction
-circuit and the resulting \acf{dem}, we must split the detector error
+circuit and the resulting \acf{dem}, we split the detector error
matrix into separate windows and manage the interplay between the
inner decoders acting on those individual windows.
@@ -1247,10 +1249,8 @@
For the circuit generation, we employed utilities from QUITS
for a number of different \ac{qldpc} codes.
We initially created a Python implementation, which used QUITS for
the window splitting and subsequent sliding-window decoding as well.
-The \ac{bp} and \ac{bpgd} decoders were also initially implemented in Python.
-After a preliminary investigation, we opted for a complete
-reimplementation in Rust to achieve higher simulation speeds leveraging
-the compiled nature of the language.
+The \ac{bp} and \ac{bpgd} decoders are implemented in Rust to achieve
+higher simulation speeds, leveraging the compiled nature of the language.
We reimplemented both the window splitting and the decoders.

% Global experimental setup
@@ -1282,21 +1282,21 @@
generated by simulating at least $200$ logical error events.


% Local experimental setup
-We began our investigation by using \ac{bp} with no further
+We begin our investigation by using \ac{bp} with no further
modifications as the inner decoder.
We chose the min-sum variant of \ac{bp} due to its low
computational complexity.

% [Thread] Get impression for max gain
-We initially wanted to gain an impression for the performance gain we could
+We initially want to gain an impression of the performance gain we could
expect from a modification to the sliding-window decoding procedure.
-To this end, we began by analyzing the decoding performance of the
+To this end, we begin by analyzing the decoding performance of the
original process, without our warm-start modification.
We will call this \emph{cold-start} decoding in the following.
-Because we expected more global decoding to work better (the inner +Because we expect more global decoding to work better (the inner decoder then has access to a larger portion of the long-range correlations encoded in the detector error matrix before any commit -is made) we initially decided to use decoding on the whole detector +is made) we initially decide to use decoding on the whole detector error matrix as a proxy for the attainable decoding performance. \begin{figure}[t] @@ -1400,7 +1400,7 @@ this trend and, as expected, achieves the strongest performance. The fact that the $W = 5$ curve is already very close to the whole-block decoder indicates that the marginal benefit of enlarging the window saturates after a certain point. -From a practical standpoint, the choice of $W$ thus represents a +Thus, from a practical standpoint, the choice of $W$ represents a trade-off between decoding latency and accuracy: larger windows delay the start of decoding by requiring more syndrome extraction rounds to be collected upfront, while the diminishing returns above @@ -1409,7 +1409,7 @@ additional accuracy in return. % [Thread] First comparison with warm start -Next, we additionally generated error rate curves for warm-start +Next, we additionally simulate error rate curves for warm-start sliding-window decoding to assess how much of the gap between cold-start and whole-block decoding can be recovered by our modification. We chose the same window sizes as before, so that the warm- and @@ -1537,16 +1537,15 @@ consecutive windows spans $W - F = W - 1$ syndrome rounds, so larger $W$ implies that more messages are carried over and a larger fraction of the next window starts in a warm state. 
% TODO: Possibly insert explanation for higher gain at lower error rates
-A perhaps surprising observation is that the warm-start curve for
-$W = 5$ actually lies below the whole-block reference across the
+A perhaps surprising observation is that the warm-start decoder for
+$W = 5$ outperforms the whole-block reference across the
entire range of physical error rates, even though warm-start
sliding-window decoding is, by construction, more local than
whole-block decoding.
-A possible explanation for this effect is discussed in the following.

% [Thread] Warm start is better than whole due to more effective iterations
-A possible explanation for this surprising behavior lies in the
+A possible explanation for this behavior lies in the
number of \ac{bp} iterations effectively spent on the \acp{vn}
inside the overlap region.
Each \ac{vn} in such an overlap is processed by multiple consecutive
@@ -1742,15 +1741,15 @@
initialization diminishes, and the curves approach each other.
The fact that no curve clearly saturates within the swept range is
itself worth noting.
We know that \ac{bp} on \ac{qldpc} codes suffers from poor
-convergence due to the short cycles in the underlying Tanner graph,
-so even after several thousand iterations the
-decoder may continue to slowly refine its message estimates rather
-than settle into a stable fixed point.
+convergence due to degeneracy and the short cycles in the underlying
+Tanner graph, so even after several thousand iterations the decoder
+may continue to slowly refine its message estimates rather than
+settle into a stable fixed point.
This is one of the core motivations for moving from plain \ac{bp}
to the guided-decimation variant studied in
\Cref{subsec:Belief Propagation with Guided Decimation}.
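The overlap argument above can be made concrete with a small counting helper. This is an illustrative sketch with a hypothetical name `window_coverage`; it assumes the windowing convention used in this chapter, in which window $\ell$ covers syndrome rounds $[\ell F, \ell F + W)$.

```python
def window_coverage(n_rounds, W, F):
    """Count how many decoding windows include each syndrome round,
    for window size W and step size F (illustrative helper).
    Window ell covers rounds [ell * F, ell * F + W)."""
    n_win = (n_rounds - W) // F + 1
    cov = [0] * n_rounds
    for ell in range(n_win):
        for r in range(ell * F, min(ell * F + W, n_rounds)):
            cov[r] += 1  # round r is visited by window ell
    return cov
```

For $W = 5$ and $F = 1$, every round in the bulk is visited by five consecutive windows, so a \ac{vn} there can effectively receive up to $5\,n_\text{iter}$ iterations under warm start, which is consistent with the warm-start curve overtaking the whole-block reference at an equal per-window iteration budget.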
-Another thing to note is that setting the per-invocation iteration +Furthermore, note that setting the per-invocation iteration budget of the inner decoder equal to the iteration budget of the whole-block decoder is not a fair comparison in terms of total computational effort. @@ -1762,14 +1761,14 @@ sliding-window approach is still at an advantage. % [Thread] Exploration of the effect of the step size -Having examined the effect of the window size $W$, we next turned to +Having examined the effect of the window size $W$, we next turn to the second windowing parameter, the step size $F$. -We carried out an investigation analogous to the one above: -we first compared warm- and cold-start decoding across the full range +We carry out an investigation analogous to the one above: +we first compare warm- and cold-start decoding across the full range of physical error rates at a fixed iteration budget, and then we -examined the dependence on the iteration budget at a fixed physical +examine the dependence on the iteration budget at a fixed physical error rate. -The window size was held fixed at $W = 5$ throughout, the value at +The window size is fixed at $W = 5$ throughout, the value at which the warm-start variant produced the strongest performance in the previous experiments. @@ -2032,7 +2031,7 @@ Similarly, assuming the decoder is fast enough to keep up with the incoming syndrome measurements corresponding to the \acp{cn} of subsequent windows, the time at which decoding is complete depends only on the amount of time spent on decoding the very last window. -A smaller $F$ thus only costs additional total compute and not +Thus, smaller $F$ only costs additional total compute and not additional latency, which is favorable for a warm-start sliding-window implementation. 
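The mismatch in total effort can be quantified with one line of arithmetic. The sketch below assumes the window-count convention $n_\text{win} = (n_\text{rounds} - W)/F + 1$; the function name is hypothetical.

```python
def total_inner_iterations(n_rounds, W, F, n_iter):
    """Total BP iterations summed over all windows when each inner-decoder
    invocation is given a budget of n_iter (illustrative sketch)."""
    n_win = (n_rounds - W) // F + 1
    return n_win * n_iter
```

With, say, 12 syndrome rounds, $W = 5$, and $F = 1$, the sliding-window decoder spends $8 \times n_\text{iter}$ inner iterations in total against a single $n_\text{iter}$ for the whole-block decoder, although each inner iteration only touches the smaller Tanner graph of one window.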
This is especially favorable for our warm-start modification, as it @@ -2062,8 +2061,8 @@ both schemes process the same windows for the same number of iterations and differ only in the initialization of the \ac{bp} messages of each new window. We also observed that plain \ac{bp} did not saturate even at $4096$ -iterations, which we attribute to the short cycles in the underlying -Tanner graph. +iterations, which we attribute to the degeneracy and short cycles in +the underlying Tanner graph. This motivates the next subsection, in which we replace the inner \ac{bp} decoder by its guided-decimation variant. @@ -2261,7 +2260,7 @@ that can occur before every \ac{vn} in the window has been decimated. A preliminary investigation showed that \ac{bpgd} only delivers its intended performance gain once most \acp{vn} have actually been decimated, which motivated this choice. -The physical error rate was swept from $p = 0.001$ to $p = 0.004$ +The physical error rate is swept from $p = 0.001$ to $p = 0.004$ in steps of $0.0005$. \Cref{fig:bpgd_w} sweeps over the window size with $W \in \{3, 4, 5\}$ at fixed step size $F = 1$, and @@ -2304,7 +2303,7 @@ matrix at the time of decoding, and this benefits both warm- and cold-start decoding. The dependence on the step size in \Cref{fig:bpgd_f}, however, is the opposite of the corresponding dependence under plain \ac{bp} -(\Cref{fig:bp_f_over_p}): for warm-start, smaller $F$ now hurts +(\Cref{fig:bp_f_over_p}): for warm-start, smaller $F$ now degrades performance rather than helps, even though smaller $F$ implies a larger overlap in both cases. @@ -2319,13 +2318,13 @@ every \ac{vn} in a window, by the time window $\ell$ ends, all of its \acp{vn} have already been hard-decided. 
For the \acp{vn} that lie in the overlap region with window $\ell + 1$
this hard decision is then carried into the next window through the
-warm-start initialization, and the next window thus begins decoding
-with a substantial fraction of its \acp{vn} already frozen, before
+warm-start initialization, and the next window begins decoding
+with a substantial fraction of its \acp{vn} already fixed, before
its own parity checks have had any chance to influence the
corresponding bit estimates.
This identifies one of two competing effects on the warm-start
performance.
-The larger the overlap, the more such prematurely frozen \acp{vn} the
-next window inherits, which hurts performance.
+The larger the overlap, the more such prematurely fixed \acp{vn} the
+next window inherits, which degrades performance.
On the other hand, a larger window still exposes the inner decoder
to a larger set of constraints, which helps performance.
The two effects together are consistent with what we observe in
@@ -2346,7 +2345,7 @@
$n_\text{iter}$ should reduce the maximum number of \acp{vn} that
can be decimated before window $\ell$ commits, and the warm-start
performance should approach that of warm-start under plain \ac{bp}
as $n_\text{iter}$ is lowered.
-We therefore now vary $n_\text{iter}$ at fixed window parameters and
+Therefore, we vary $n_\text{iter}$ at fixed window parameters and
fixed physical error rate.

\begin{figure}[t]
@@ -2516,9 +2515,9 @@ fixed physical error rate.
sliding-window decoding as a function of the maximum number of
inner \ac{bp} iterations $n_\text{iter}$.
The dashed colored curves correspond to cold-start sliding-window
-decoding and the solid colored curves to warm-start, again carrying
-over both the \ac{bp} messages and the channel \acp{llr} on the
-overlap region.
+decoding and the solid colored curves to warm-start, which again
+retains both the \ac{bp} messages and the decimation information on
+the overlap region.
The physical error rate is fixed at $p = 0.0025$ and the iteration
budget is swept over $n_\text{iter} \in \{32, 128, 256, 512, 1024,
1536, 2048, 2560, 3072, 3584, 4096\}$.
@@ -2533,7 +2532,7 @@
For low iteration budgets, all curves in both panels behave
similarly to the plain-\ac{bp} curves in
\Cref{fig:bp_w_over_iter,fig:bp_f_over_iter}.
The per-round \ac{ler} decreases gradually with $n_\text{iter}$, and
-the warm-start curves lie below their cold-start counterparts at
+the warm-start configurations outperform their cold-start counterparts at
matching window parameters.
As $n_\text{iter}$ continues to grow, however, the cold-start curves
undergo a sharp drop, after which they lie roughly an order of
@@ -3020,7 +3019,7 @@
and at $F = 1$, respectively.
These observations match our expectations.
With only the \ac{bp} messages carried over, the warm-start
-initialization no longer freezes any \acp{vn} in the next window
+initialization no longer freezes any \acp{vn} in the next window.
The dependence of this benefit on $W$ and $F$ also recovers the
pattern observed for plain \ac{bp} in
\Cref{fig:whole_vs_cold_vs_warm,fig:bp_f_over_p}:
@@ -3034,7 +3033,7 @@
sliding-window decoding under \ac{bpgd} by summarizing our findings.
Warm-starting the inner decoder still provides a consistent
performance gain when the inner decoder is upgraded from plain
\ac{bp} to its guided-decimation variant, but only if some care is
-taken in choosing what to carry over.
+taken in choosing what information to carry over.
Passing the channel \acp{llr} along with the \ac{bp} messages, as
suggested by naively carrying over the warm-start idea to \ac{bpgd},
leads to premature hard decisions on \acp{vn} in the overlap region.
@@ -3049,3 +3048,17 @@
requirements are substantially larger than those of plain \ac{bp}:
the per-round \ac{ler} drops sharply only once the iteration budget
is on the order of the number of \acp{vn} in each window.
+Future work could include a softer treatment of the decimation state +in \ac{bpgd}. +Rather than discarding the decimation information of the previous +window entirely, as in the message-only warm start used here, one +could encode the decimation decisions as strong but finite biases on +the channel \acp{llr} of the next window, allowing the new window's parity +checks to override them if the syndrome calls for it. +This would interpolate between the two warm-start variants studied here and +might combine the benefits of both. +A related question is whether the decimation schedule itself should +be aware of the window structure, for instance by deferring +decimation of \acp{vn} in the overlap region until they have been +visited by the next window. +
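The soft treatment of the decimation state suggested above could, for instance, take the following form. This is only a sketch of the future-work idea: the function name `biased_llrs` and the bias magnitude are arbitrary illustrative choices, not part of the decoders studied in this chapter.

```python
def biased_llrs(llr0, decimated, bias=25.0):
    """Turn the previous window's decimation decisions into strong but
    finite biases on the next window's channel LLRs (sketch).

    llr0      : list of channel LLRs, with positive values favoring e_i = 0
    decimated : dict mapping variable index -> decided bit value (0 or 1)
    """
    out = list(llr0)
    for i, bit in decimated.items():
        # A positive LLR favors e_i = 0, a negative one favors e_i = 1.
        out[i] = bias if bit == 0 else -bias
    return out
```

Because the bias is finite, the parity checks of the next window can still flip a previously decimated \ac{vn} if its syndrome demands it, whereas a hard decision, corresponding to an infinite bias, could never be revised.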