From 7bf1b2f8d7f30e13d49e788abfb2923b3cc0d39f Mon Sep 17 00:00:00 2001 From: Andreas Tsouchlos Date: Mon, 4 May 2026 17:07:41 +0200 Subject: [PATCH] Incorporate Jonathan's corrections to numerical results section --- src/thesis/chapters/4_decoding_under_dems.tex | 113 ++++++++++-------- 1 file changed, 63 insertions(+), 50 deletions(-) diff --git a/src/thesis/chapters/4_decoding_under_dems.tex b/src/thesis/chapters/4_decoding_under_dems.tex index ff6a07c..00814b2 100644 --- a/src/thesis/chapters/4_decoding_under_dems.tex +++ b/src/thesis/chapters/4_decoding_under_dems.tex @@ -711,7 +711,7 @@ estimates committed after decoding window $\ell$, we have to set % Intro: Problem with above procedure The sliding-window structure visible in \Cref{fig:windowing_pcm} is -highly reminiscent of windowed decoding for \ac{sc}-\ac{ldpc} codes. +reminiscent of windowed decoding for \ac{sc}-\ac{ldpc} codes. Switching our viewpoint to the Tanner graph depicted in \Cref{fig:messages_decimation_tanner}, however, we can see an important difference between \ac{sc}-\ac{ldpc} decoding and the @@ -719,7 +719,7 @@ sliding-window decoding procedure detailed above. While the windowing process is similar, the algorithm above reinitializes the decoder to start from a clean state when moving to the next window. -It therefore does not make use of the integral property of +Therefore, it does not make use of the integral property of windowed \ac{sc}-\ac{ldpc} decoding of exploiting the spatially coupled structure by passing soft information from earlier to later spatial positions. @@ -731,9 +731,10 @@ still relevant to the decoding of the next. This may somewhat limit the variety of \emph{inner decoders}, i.e., the decoders decoding the individual windows, the warm-start initialization can be used with. -E.g., \ac{bp}+\ac{osd} does not immediately seem suitable, though -this remains to be investigated. 
-We chose to investigate first plain \ac{bp} due to its simplicity and +For instance, \ac{bp}+\ac{osd} does not immediately seem suitable, as +it performs a hard decision on the \acp{vn}, though this remains to +be investigated. +We chose to investigate first standard \ac{bp} due to its simplicity and then \ac{bpgd} because of the availability of recently computed messages. % TODO: Include this? @@ -900,7 +901,8 @@ To see how we realize this in practice, we reiterate the steps of the \right) \\[3mm] \text{\ac{cn} Update (Min-Sum): }& \displaystyle L_{i \leftarrow j} = (-1)^{s_j}\cdot \prod_{i' - \in \mathcal{N}_\text{C}(j)\setminus \{i\}} \sign \left( L_{i' \rightarrow j} + \in \mathcal{N}_\text{C}(j)\setminus \{i\}} \sign \left( L_{i' + \rightarrow j} \right) \cdot \min_{i' \in \mathcal{N}_\text{C}(j)\setminus \{i\}} \lvert L_{i'\rightarrow j} \rvert \\[3mm] \label{eq:vn_update} @@ -943,7 +945,7 @@ We can then continue decoding the next window as usual. We can further simplify the algorithm. Looking carefully at \Cref{eq:vn_update} we notice that when the -\ac{cn} to \ac{vn} messages $L_{i\leftarrow j}$ have been zero-initialized, +\ac{cn} to \ac{vn} messages $L_{i\leftarrow j}$ have been initialized to zero, the \ac{vn} update degenerates to \begin{align*} \displaystyle L_{i \rightarrow j} = @@ -971,7 +973,7 @@ Note that the decoding procedure performed on the individual windows \label{alg:warm_start_bp} \begin{algorithmic}[1] \State \textbf{Initialize:} $\hat{\bm{e}}^\text{total} \leftarrow \bm{0}$ - \State \textbf{Initialize:} $L_{i\leftarrow j} = 0 + \State \textbf{Initialize:} $L_{i\leftarrow j} = 0, ~\forall~ i\in \mathcal{I}, j\in \mathcal{J}$ \For{$\ell = 0, \ldots, n_\text{win}-1$} \For{$\nu = 0, \ldots, n_\text{iter}-1$} @@ -1227,7 +1229,7 @@ model, both of which depend on the code and noise model in question. 
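The min-sum updates and the warm-start initialization of \Cref{alg:warm_start_bp} can be sketched compactly in Python. This is an illustrative toy version, not the thesis's Rust implementation; the function name `minsum_bp`, the dense row-list representation of the check matrix, and the edge-keyed message dictionary are choices made for this sketch only.

```python
def minsum_bp(H, syndrome, llr0, n_iter, msg_c2v=None):
    """Min-sum BP for a binary parity-check matrix H (list of 0/1 rows),
    given a syndrome and channel LLRs llr0 (positive LLR favors e_i = 0).

    msg_c2v maps an edge (check j, variable i) to the CN-to-VN message.
    Passing None gives the usual cold start (all-zero messages); passing
    the dict returned by a previous call realizes the warm start, reusing
    the messages on the edges shared with the previous window.
    Returns (hard-decision estimate e_hat, final msg_c2v).
    """
    m, n = len(H), len(H[0])
    edges = [(j, i) for j in range(m) for i in range(n) if H[j][i]]
    msg_c2v = {e: 0.0 for e in edges} if msg_c2v is None else dict(msg_c2v)
    e_hat = [0] * n

    def totals():
        # Channel LLR plus the sum of all incoming CN-to-VN messages.
        return [llr0[i] + sum(msg_c2v[(j, i)] for j in range(m) if H[j][i])
                for i in range(n)]

    for _ in range(n_iter):
        t = totals()
        # VN update: extrinsic sum, i.e. total minus the edge's own message.
        msg_v2c = {(j, i): t[i] - msg_c2v[(j, i)] for (j, i) in edges}
        # CN update (min-sum) with the syndrome sign (-1)^{s_j}.
        for j in range(m):
            nbrs = [i for i in range(n) if H[j][i]]
            for i in nbrs:
                others = [msg_v2c[(j, k)] for k in nbrs if k != i]
                sign = -1 if syndrome[j] else 1
                for v in others:
                    sign = -sign if v < 0 else sign
                msg_c2v[(j, i)] = sign * min(abs(v) for v in others)
        # Hard decision; stop once the estimate reproduces the syndrome.
        e_hat = [1 if v < 0 else 0 for v in totals()]
        if all(sum(H[j][i] * e_hat[i] for i in range(n)) % 2 == syndrome[j]
               for j in range(m)):
            break
    return e_hat, msg_c2v
```

In this representation, the warm start of the next window amounts to nothing more than handing the returned `msg_c2v` dictionary to the next call, so that the messages on the edges inside the overlap region survive the window transition.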
% Software stack: Layer 3
Even further up, given an already constructed syndrome extraction
-circuit and the resulting \acf{dem}, we must split the detector error
+circuit and the resulting \acf{dem}, we split the detector error
matrix into separate windows and manage the interplay between the
inner decoders acting on those individual windows.
@@ -1247,10 +1249,8 @@
For the circuit generation, we employed utilities from QUITS
for a number of different \ac{qldpc} codes.
We initially created a Python implementation, which used QUITS for
the window splitting and subsequent sliding-window decoding as well.
-The \ac{bp} and \ac{bpgd} decoders were also initially implemented in Python.
-After a preliminary investigation, we opted for a complete
-reimplementation in Rust to achieve higher simulation speeds leveraging
-the compiled nature of the language.
+The \ac{bp} and \ac{bpgd} decoders are implemented in Rust to achieve
+higher simulation speeds, leveraging the compiled nature of the language.
We reimplemented both the window splitting and the decoders.

% Global experimental setup
@@ -1282,21 +1282,21 @@
generated by simulating at least $200$ logical error events.


% Local experimental setup
-We began our investigation by using \ac{bp} with no further
+We begin our investigation by using \ac{bp} with no further
modifications as the inner decoder.
We chose the min-sum variant of \ac{bp} due to its low
computational complexity.

% [Thread] Get impression for max gain
-We initially wanted to gain an impression for the performance gain we could
+We initially want to gain an impression of the performance gain we could
expect from a modification to the sliding-window decoding procedure.
-To this end, we began by analyzing the decoding performance of the
+To this end, we begin by analyzing the decoding performance of the
original process, without our warm-start modification.
We will call this \emph{cold-start} decoding in the following.
-Because we expected more global decoding to work better (the inner +Because we expect more global decoding to work better (the inner decoder then has access to a larger portion of the long-range correlations encoded in the detector error matrix before any commit -is made) we initially decided to use decoding on the whole detector +is made) we initially decide to use decoding on the whole detector error matrix as a proxy for the attainable decoding performance. \begin{figure}[t] @@ -1400,7 +1400,7 @@ this trend and, as expected, achieves the strongest performance. The fact that the $W = 5$ curve is already very close to the whole-block decoder indicates that the marginal benefit of enlarging the window saturates after a certain point. -From a practical standpoint, the choice of $W$ thus represents a +Thus, from a practical standpoint, the choice of $W$ represents a trade-off between decoding latency and accuracy: larger windows delay the start of decoding by requiring more syndrome extraction rounds to be collected upfront, while the diminishing returns above @@ -1409,7 +1409,7 @@ additional accuracy in return. % [Thread] First comparison with warm start -Next, we additionally generated error rate curves for warm-start +Next, we additionally simulate error rate curves for warm-start sliding-window decoding to assess how much of the gap between cold-start and whole-block decoding can be recovered by our modification. We chose the same window sizes as before, so that the warm- and @@ -1537,16 +1537,15 @@ consecutive windows spans $W - F = W - 1$ syndrome rounds, so larger $W$ implies that more messages are carried over and a larger fraction of the next window starts in a warm state. 
% TODO: Possibly insert explanation for higher gain at lower error rates
-A perhaps surprising observation is that the warm-start curve for
-$W = 5$ actually lies below the whole-block reference across the
+A perhaps surprising observation is that the warm-start decoder for
+$W = 5$ outperforms the whole-block reference across the
entire range of physical error rates, even though warm-start
sliding-window decoding is, by construction, more local than
whole-block decoding.
-A possible explanation for this effect is discussed in the following.

% [Thread] Warm start is better than whole due to more effective iterations
-A possible explanation for this surprising behavior lies in the
+A possible explanation for this behavior lies in the
number of \ac{bp} iterations effectively spent on the \acp{vn}
inside the overlap region.
Each \ac{vn} in such an overlap is processed by multiple consecutive
@@ -1742,15 +1741,15 @@
initialization diminishes, and the curves approach each other.
The fact that no curve clearly saturates within the swept range is
itself worth noting.
We know that \ac{bp} on \ac{qldpc} codes suffers from poor
-convergence due to the short cycles in the underlying Tanner graph,
-so even after several thousand iterations the
-decoder may continue to slowly refine its message estimates rather
-than settle into a stable fixed point.
+convergence due to degeneracy and the short cycles in the underlying
+Tanner graph, so even after several thousand iterations the decoder
+may continue to slowly refine its message estimates rather than
+settle into a stable fixed point.
This is one of the core motivations for moving from plain \ac{bp}
to the guided-decimation variant studied in
\Cref{subsec:Belief Propagation with Guided Decimation}.
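The overlap argument above can be made concrete with a small counting helper. This is an illustrative sketch with a hypothetical name `window_coverage`; it assumes the windowing convention used in this chapter, in which window $\ell$ covers syndrome rounds $[\ell F, \ell F + W)$.

```python
def window_coverage(n_rounds, W, F):
    """Count how many decoding windows include each syndrome round,
    for window size W and step size F (illustrative helper).
    Window ell covers rounds [ell * F, ell * F + W)."""
    n_win = (n_rounds - W) // F + 1
    cov = [0] * n_rounds
    for ell in range(n_win):
        for r in range(ell * F, min(ell * F + W, n_rounds)):
            cov[r] += 1  # round r is visited by window ell
    return cov
```

For $W = 5$ and $F = 1$, every round in the bulk is visited by five consecutive windows, so a \ac{vn} there can effectively receive up to $5\,n_\text{iter}$ iterations under warm start, which is consistent with the warm-start curve overtaking the whole-block reference at an equal per-window iteration budget.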
-Another thing to note is that setting the per-invocation iteration +Furthermore, note that setting the per-invocation iteration budget of the inner decoder equal to the iteration budget of the whole-block decoder is not a fair comparison in terms of total computational effort. @@ -1762,14 +1761,14 @@ sliding-window approach is still at an advantage. % [Thread] Exploration of the effect of the step size -Having examined the effect of the window size $W$, we next turned to +Having examined the effect of the window size $W$, we next turn to the second windowing parameter, the step size $F$. -We carried out an investigation analogous to the one above: -we first compared warm- and cold-start decoding across the full range +We carry out an investigation analogous to the one above: +we first compare warm- and cold-start decoding across the full range of physical error rates at a fixed iteration budget, and then we -examined the dependence on the iteration budget at a fixed physical +examine the dependence on the iteration budget at a fixed physical error rate. -The window size was held fixed at $W = 5$ throughout, the value at +The window size is fixed at $W = 5$ throughout, the value at which the warm-start variant produced the strongest performance in the previous experiments. @@ -2032,7 +2031,7 @@ Similarly, assuming the decoder is fast enough to keep up with the incoming syndrome measurements corresponding to the \acp{cn} of subsequent windows, the time at which decoding is complete depends only on the amount of time spent on decoding the very last window. -A smaller $F$ thus only costs additional total compute and not +Thus, smaller $F$ only costs additional total compute and not additional latency, which is favorable for a warm-start sliding-window implementation. 
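The mismatch in total effort can be quantified with one line of arithmetic. The sketch below assumes the window-count convention $n_\text{win} = (n_\text{rounds} - W)/F + 1$; the function name is hypothetical.

```python
def total_inner_iterations(n_rounds, W, F, n_iter):
    """Total BP iterations summed over all windows when each inner-decoder
    invocation is given a budget of n_iter (illustrative sketch)."""
    n_win = (n_rounds - W) // F + 1
    return n_win * n_iter
```

With, say, 12 syndrome rounds, $W = 5$, and $F = 1$, the sliding-window decoder spends $8 \times n_\text{iter}$ inner iterations in total against a single $n_\text{iter}$ for the whole-block decoder, although each inner iteration only touches the smaller Tanner graph of one window.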
This is especially favorable for our warm-start modification, as it @@ -2062,8 +2061,8 @@ both schemes process the same windows for the same number of iterations and differ only in the initialization of the \ac{bp} messages of each new window. We also observed that plain \ac{bp} did not saturate even at $4096$ -iterations, which we attribute to the short cycles in the underlying -Tanner graph. +iterations, which we attribute to the degeneracy and short cycles in +the underlying Tanner graph. This motivates the next subsection, in which we replace the inner \ac{bp} decoder by its guided-decimation variant. @@ -2261,7 +2260,7 @@ that can occur before every \ac{vn} in the window has been decimated. A preliminary investigation showed that \ac{bpgd} only delivers its intended performance gain once most \acp{vn} have actually been decimated, which motivated this choice. -The physical error rate was swept from $p = 0.001$ to $p = 0.004$ +The physical error rate is swept from $p = 0.001$ to $p = 0.004$ in steps of $0.0005$. \Cref{fig:bpgd_w} sweeps over the window size with $W \in \{3, 4, 5\}$ at fixed step size $F = 1$, and @@ -2304,7 +2303,7 @@ matrix at the time of decoding, and this benefits both warm- and cold-start decoding. The dependence on the step size in \Cref{fig:bpgd_f}, however, is the opposite of the corresponding dependence under plain \ac{bp} -(\Cref{fig:bp_f_over_p}): for warm-start, smaller $F$ now hurts +(\Cref{fig:bp_f_over_p}): for warm-start, smaller $F$ now degrades performance rather than helps, even though smaller $F$ implies a larger overlap in both cases. @@ -2319,13 +2318,13 @@ every \ac{vn} in a window, by the time window $\ell$ ends, all of its \acp{vn} have already been hard-decided. 
For the \acp{vn} that lie in the overlap region with window $\ell + 1$
this hard decision is then carried into the next window through the
-warm-start initialization, and the next window thus begins decoding
-with a substantial fraction of its \acp{vn} already frozen, before
+warm-start initialization, and the next window begins decoding
+with a substantial fraction of its \acp{vn} already fixed, before
its own parity checks have had any chance to influence the
corresponding bit estimates.
This identifies one of two competing effects on the warm-start
performance.
-The larger the overlap, the more such prematurely frozen \acp{vn} the
-next window inherits, which hurts performance.
+The larger the overlap, the more such prematurely fixed \acp{vn} the
+next window inherits, which degrades performance.
On the other hand, a larger window still exposes the inner decoder
to a larger set of constraints, which helps performance.
The two effects together are consistent with what we observe in
@@ -2346,7 +2345,7 @@
$n_\text{iter}$ should reduce the maximum number of \acp{vn} that
can be decimated before window $\ell$ commits, and the warm-start
performance should approach that of warm-start under plain \ac{bp}
as $n_\text{iter}$ is lowered.
-We therefore now vary $n_\text{iter}$ at fixed window parameters and
+Therefore, we vary $n_\text{iter}$ at fixed window parameters and
fixed physical error rate.

\begin{figure}[t]
@@ -2516,9 +2515,9 @@ fixed physical error rate.
sliding-window decoding as a function of the maximum number of
inner \ac{bp} iterations $n_\text{iter}$.
The dashed colored curves correspond to cold-start sliding-window
-decoding and the solid colored curves to warm-start, again carrying
-over both the \ac{bp} messages and the channel \acp{llr} on the
-overlap region.
+decoding and the solid colored curves to warm-start, which again
+retains both the \ac{bp} messages and the decimation information on
+the overlap region.
The physical error rate is fixed at $p = 0.0025$ and the iteration
budget is swept over $n_\text{iter} \in \{32, 128, 256, 512, 1024,
1536, 2048, 2560, 3072, 3584, 4096\}$.
@@ -2533,7 +2532,7 @@
For low iteration budgets, all curves in both panels behave
similarly to the plain-\ac{bp} curves in
\Cref{fig:bp_w_over_iter,fig:bp_f_over_iter}.
The per-round \ac{ler} decreases gradually with $n_\text{iter}$, and
-the warm-start curves lie below their cold-start counterparts at
+the warm-start configurations outperform their cold-start counterparts at
matching window parameters.
As $n_\text{iter}$ continues to grow, however, the cold-start curves
undergo a sharp drop, after which they lie roughly an order of
@@ -3020,7 +3019,7 @@
and at $F = 1$, respectively.
These observations match our expectations.
With only the \ac{bp} messages carried over, the warm-start
-initialization no longer freezes any \acp{vn} in the next window
+initialization no longer freezes any \acp{vn} in the next window.
The dependence of this benefit on $W$ and $F$ also recovers the
pattern observed for plain \ac{bp} in
\Cref{fig:whole_vs_cold_vs_warm,fig:bp_f_over_p}:
@@ -3034,7 +3033,7 @@
sliding-window decoding under \ac{bpgd} by summarizing our findings.
Warm-starting the inner decoder still provides a consistent
performance gain when the inner decoder is upgraded from plain
\ac{bp} to its guided-decimation variant, but only if some care is
-taken in choosing what to carry over.
+taken in choosing what information to carry over.
Passing the channel \acp{llr} along with the \ac{bp} messages, as
suggested by naively carrying over the warm-start idea to \ac{bpgd},
leads to premature hard decisions on \acp{vn} in the overlap region.
@@ -3049,3 +3048,17 @@
requirements are substantially larger than those of plain \ac{bp}:
the per-round \ac{ler} drops sharply only once the iteration budget
is on the order of the number of \acp{vn} in each window.
+Future work could include a softer treatment of the decimation state +in \ac{bpgd}. +Rather than discarding the decimation information of the previous +window entirely, as in the message-only warm start used here, one +could encode the decimation decisions as strong but finite biases on +the channel \acp{llr} of the next window, allowing the new window's parity +checks to override them if the syndrome calls for it. +This would interpolate between the two warm-start variants studied here and +might combine the benefits of both. +A related question is whether the decimation schedule itself should +be aware of the window structure, for instance by deferring +decimation of \acp{vn} in the overlap region until they have been +visited by the next window. +
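The soft treatment of the decimation state suggested above could, for instance, take the following form. This is only a sketch of the future-work idea: the function name `biased_llrs` and the bias magnitude are arbitrary illustrative choices, not part of the decoders studied in this chapter.

```python
def biased_llrs(llr0, decimated, bias=25.0):
    """Turn the previous window's decimation decisions into strong but
    finite biases on the next window's channel LLRs (sketch).

    llr0      : list of channel LLRs, with positive values favoring e_i = 0
    decimated : dict mapping variable index -> decided bit value (0 or 1)
    """
    out = list(llr0)
    for i, bit in decimated.items():
        # A positive LLR favors e_i = 0, a negative one favors e_i = 1.
        out[i] = bias if bit == 0 else -bias
    return out
```

Because the bias is finite, the parity checks of the next window can still flip a previously decimated \ac{vn} if its syndrome demands it, whereas a hard decision, corresponding to an infinite bias, could never be revised.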