Incorporate Jonathan's corrections to numerical results section

2026-05-04 17:07:41 +02:00
parent 72acea0321
commit 7bf1b2f8d7


@@ -711,7 +711,7 @@ estimates committed after decoding window $\ell$, we have to set
% Intro: Problem with above procedure
The sliding-window structure visible in \Cref{fig:windowing_pcm} is
reminiscent of windowed decoding for \ac{sc}-\ac{ldpc} codes.
Switching our viewpoint to the Tanner graph depicted in
\Cref{fig:messages_decimation_tanner}, however, we can see an important
difference between \ac{sc}-\ac{ldpc} decoding and the
@@ -719,7 +719,7 @@ sliding-window decoding procedure detailed above.
While the windowing process is similar, the algorithm above
reinitializes the decoder to start from a clean state when moving to
the next window.
Therefore, it does not make use of an integral property of
windowed \ac{sc}-\ac{ldpc} decoding: exploiting the spatially coupled
structure by passing soft information from earlier to later spatial positions.
@@ -731,9 +731,10 @@ still relevant to the decoding of the next.
This may somewhat limit the variety of \emph{inner decoders}, i.e.,
the decoders decoding the individual windows, the warm-start
initialization can be used with.
For instance, \ac{bp}+\ac{osd} does not immediately seem suitable, as
it performs a hard decision on the \acp{vn}, though this remains to
be investigated.
We chose to first investigate standard \ac{bp} due to its simplicity and
then \ac{bpgd} because of the availability of recently computed messages.
% TODO: Include this?
@@ -900,7 +901,8 @@ To see how we realize this in practice, we reiterate the steps of the
\right) \\[3mm]
\text{\ac{cn} Update (Min-Sum): }&
\displaystyle L_{i \leftarrow j} = (-1)^{s_j}\cdot \prod_{i'
\in \mathcal{N}_\text{C}(j)\setminus \{i\}} \sign \left( L_{i' \rightarrow j}
\right) \cdot \min_{i' \in \mathcal{N}_\text{C}(j)\setminus \{i\}} \lvert
L_{i'\rightarrow j} \rvert \\[3mm]
\label{eq:vn_update}
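To make the min-sum \ac{cn} update concrete, the following is a minimal Python sketch of a single check node's update (an illustration only, not the Rust implementation described later in this section; the function name and array-based interface are assumptions):

```python
import numpy as np

def min_sum_cn_update(L_v2c, syndrome_bit):
    """Min-sum check-node update: for each incident VN i, combine the
    signs and minimum magnitudes of all *other* incoming VN-to-CN
    messages, flipped by (-1)^{s_j} for the measured syndrome bit."""
    L_v2c = np.asarray(L_v2c, dtype=float)
    signs = np.sign(L_v2c)
    signs[signs == 0] = 1.0            # treat zero messages as positive
    mags = np.abs(L_v2c)
    total_sign = np.prod(signs)
    order = np.argsort(mags)           # two smallest magnitudes suffice
    min1, min2 = mags[order[0]], mags[order[1]]
    out = np.empty_like(mags)
    for i in range(len(mags)):
        extr_sign = total_sign * signs[i]            # product over i' != i
        extr_min = min2 if i == order[0] else min1   # min over i' != i
        out[i] = (-1) ** syndrome_bit * extr_sign * extr_min
    return out
```

Tracking only the two smallest magnitudes is the standard trick that makes the extrinsic minimum computable in one pass.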
@@ -943,7 +945,7 @@ We can then continue decoding the next window as usual.
We can further simplify the algorithm.
Looking carefully at \Cref{eq:vn_update} we notice that when the
\ac{cn} to \ac{vn} messages $L_{i\leftarrow j}$ have been initialized to zero,
the \ac{vn} update degenerates to
\begin{align*}
\displaystyle L_{i \rightarrow j} =
@@ -971,7 +973,7 @@ Note that the decoding procedure performed on the individual windows
\label{alg:warm_start_bp}
\begin{algorithmic}[1]
\State \textbf{Initialize:} $\hat{\bm{e}}^\text{total} \leftarrow \bm{0}$
\State \textbf{Initialize:} $L_{i\leftarrow j} = 0,
~\forall~ i\in \mathcal{I}, j\in \mathcal{J}$
\For{$\ell = 0, \ldots, n_\text{win}-1$}
\For{$\nu = 0, \ldots, n_\text{iter}-1$}
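The structure of the warm-start outer loop can be sketched as follows (a heavily simplified illustration of the carry-over idea, not the actual implementation; `run_bp_window` and all names are hypothetical placeholders for the inner decoder):

```python
def warm_start_sliding_window(windows, n_iter, run_bp_window):
    """Warm-start outer loop: CN-to-VN messages are zero-initialized
    once, then carried over between windows instead of being reset.
    `run_bp_window` stands in for the inner decoder: it runs up to
    `n_iter` BP iterations on one window and returns the updated
    messages plus the committed error estimate for that window."""
    messages = {}    # L_{i<-j}; a missing key is treated as 0
    e_total = []
    for window in windows:               # ell = 0, ..., n_win - 1
        messages, e_hat = run_bp_window(window, messages, n_iter)
        e_total.append(e_hat)            # commit estimate for this window
    return e_total, messages
```

The only difference from the cold-start variant is that `messages` is created once, outside the loop, instead of being re-created per window.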
@@ -1227,7 +1229,7 @@ model, both of which depend on the code and noise model in question.
% Software stack: Layer 3
Even further up, given an already constructed syndrome extraction
circuit and the resulting \acf{dem}, we split the detector error
matrix into separate windows and manage the interplay between the
inner decoders acting on those individual windows.
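The window splitting performed at this layer can be sketched roughly as below (a hypothetical helper under the assumption that each check row carries a syndrome-round label; the actual splitting used here relies on QUITS and the Rust reimplementation and may differ in detail):

```python
import numpy as np

def split_into_windows(H, round_of_cn, W, F):
    """Select the CN rows belonging to W consecutive syndrome rounds
    and the VN columns they touch, advancing by F rounds per window.
    `round_of_cn[j]` is the assumed syndrome round of check row j."""
    n_rounds = int(max(round_of_cn)) + 1
    windows = []
    start = 0
    while start < n_rounds:
        rows = np.flatnonzero((round_of_cn >= start)
                              & (round_of_cn < start + W))
        cols = np.flatnonzero(H[rows].any(axis=0))  # VNs seen by window
        windows.append((rows, cols))
        start += F
    return windows
```

With $F < W$, consecutive windows share $W - F$ rounds of checks, which is exactly the overlap region the warm-start initialization acts on.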
@@ -1247,10 +1249,8 @@ For the circuit generation, we employed utilities from QUITS
generation for a number of different \ac{qldpc} codes.
We initially created a Python implementation, which used QUITS for the window
splitting and subsequent sliding-window decoding as well.
The \ac{bp} and \ac{bpgd} decoders are implemented in Rust to achieve
higher simulation speeds, leveraging the compiled nature of the language.
We reimplemented both the window splitting and the decoders.
% Global experimental setup
@@ -1282,21 +1282,21 @@ generated by simulating at least $200$ logical error events.
% Local experimental setup
We begin our investigation by using \ac{bp} with no further
modifications as the inner decoder.
We choose the min-sum variant of \ac{bp} due to its low computational complexity.
% [Thread] Get impression for max gain
We initially want to gain an impression of the performance gain we can
expect from a modification to the sliding-window decoding procedure.
To this end, we begin by analyzing the decoding performance of the
original process, without our warm-start modification.
We will call this \emph{cold-start} decoding in the following.
Because we expect more global decoding to work better (the inner
decoder then has access to a larger portion of the long-range
correlations encoded in the detector error matrix before any commit
is made), we initially decide to use decoding on the whole detector
error matrix as a proxy for the attainable decoding performance.
\begin{figure}[t]
@@ -1400,7 +1400,7 @@ this trend and, as expected, achieves the strongest performance.
The fact that the $W = 5$ curve is already very close to the
whole-block decoder indicates that the marginal benefit of enlarging
the window saturates after a certain point.
Thus, from a practical standpoint, the choice of $W$ represents a
trade-off between decoding latency and accuracy: larger windows
delay the start of decoding by requiring more syndrome extraction
rounds to be collected upfront, while the diminishing returns above
@@ -1409,7 +1409,7 @@ additional accuracy in return.
% [Thread] First comparison with warm start
Next, we additionally simulate error rate curves for warm-start
sliding-window decoding to assess how much of the gap between
cold-start and whole-block decoding can be recovered by our modification.
We choose the same window sizes as before, so that the warm- and
@@ -1537,16 +1537,15 @@ consecutive windows spans $W - F = W - 1$ syndrome rounds, so larger
$W$ implies that more messages are carried over and a larger fraction
of the next window starts in a warm state.
% TODO: Possibly insert explanation for higher gain at lower error rates
A perhaps surprising observation is that warm-start decoding for
$W = 5$ outperforms the whole-block reference across the
entire range of physical error rates, even though warm-start
sliding-window decoding is, by construction, more local than
whole-block decoding.
A possible explanation for this effect is discussed in the following.
% [Thread] Warm start is better than whole due to more effective iterations
This explanation lies in the
number of \ac{bp} iterations effectively spent on the \acp{vn}
inside the overlap region.
Each \ac{vn} in such an overlap is processed by multiple consecutive
@@ -1742,15 +1741,15 @@ initialization diminishes, and the curves approach each other.
The fact that no curve clearly saturates within the swept range is
itself worth noting.
We know that \ac{bp} on \ac{qldpc} codes suffers from poor
convergence due to degeneracy and the short cycles in the underlying
Tanner graph, so even after several thousand iterations the decoder
may continue to slowly refine its message estimates rather than
settle into a stable fixed point.
This is one of the core motivations for moving from plain \ac{bp} to
the guided-decimation variant studied in
\Cref{subsec:Belief Propagation with Guided Decimation}.
Furthermore, note that setting the per-invocation iteration
budget of the inner decoder equal to the iteration budget of the
whole-block decoder is not a fair comparison in terms of total
computational effort.
@@ -1762,14 +1761,14 @@ sliding-window approach is still at an advantage.
% [Thread] Exploration of the effect of the step size
Having examined the effect of the window size $W$, we next turn to
the second windowing parameter, the step size $F$.
We carry out an investigation analogous to the one above:
we first compare warm- and cold-start decoding across the full range
of physical error rates at a fixed iteration budget, and then we
examine the dependence on the iteration budget at a fixed physical
error rate.
The window size is fixed at $W = 5$ throughout, the value at
which the warm-start variant produced the strongest performance in the
previous experiments.
@@ -2032,7 +2031,7 @@ Similarly, assuming the decoder is fast enough to keep up with the
incoming syndrome measurements corresponding to the \acp{cn} of
subsequent windows, the time at which decoding is complete depends only
on the amount of time spent on decoding the very last window.
Thus, a smaller $F$ only costs additional total compute and not
additional latency, which is favorable for a warm-start
sliding-window implementation.
This is especially favorable for our warm-start modification, as it
@@ -2062,8 +2061,8 @@ both schemes process the same windows for the same number of
iterations and differ only in the initialization of the \ac{bp}
messages of each new window.
We also observed that plain \ac{bp} did not saturate even at $4096$
iterations, which we attribute to the degeneracy and short cycles in
the underlying Tanner graph.
This motivates the next subsection, in which we replace the inner
\ac{bp} decoder by its guided-decimation variant.
@@ -2261,7 +2260,7 @@ that can occur before every \ac{vn} in the window has been decimated.
A preliminary investigation showed that \ac{bpgd} only delivers its
intended performance gain once most \acp{vn} have actually been decimated,
which motivated this choice.
The physical error rate is swept from $p = 0.001$ to $p = 0.004$
in steps of $0.0005$.
\Cref{fig:bpgd_w} sweeps over the window size with
$W \in \{3, 4, 5\}$ at fixed step size $F = 1$, and
@@ -2304,7 +2303,7 @@ matrix at the time of decoding, and this benefits both warm- and
cold-start decoding.
The dependence on the step size in \Cref{fig:bpgd_f}, however, is the
opposite of the corresponding dependence under plain \ac{bp}
(\Cref{fig:bp_f_over_p}): for warm-start, a smaller $F$ now degrades
performance rather than improving it, even though a smaller $F$
implies a larger overlap in both cases.
@@ -2319,13 +2318,13 @@ every \ac{vn} in a window, by the time window $\ell$ ends, all
of its \acp{vn} have already been hard-decided.
For the \acp{vn} that lie in the overlap region with window $\ell + 1$
this hard decision is then carried into the next window through the
warm-start initialization, and the next window begins decoding
with a substantial fraction of its \acp{vn} already fixed, before
its own parity checks have had any chance to influence the
corresponding bit estimates.
This identifies one of two competing effects on the warm-start performance.
The larger the overlap, the more such prematurely fixed \acp{vn} the
next window inherits, which degrades performance.
On the other hand, a larger window still exposes the inner decoder to
a larger set of constraints, which helps performance.
The two effects together are consistent with what we observe in
@@ -2346,7 +2345,7 @@ $n_\text{iter}$ should reduce the maximum number of \acp{vn} that can
be decimated before window $\ell$ commits, and the warm-start
performance should approach that of warm-start under plain \ac{bp} as
$n_\text{iter}$ is lowered.
Therefore, we vary $n_\text{iter}$ at fixed window parameters and
fixed physical error rate.
\begin{figure}[t]
@@ -2516,9 +2515,9 @@ fixed physical error rate.
sliding-window decoding as a function of the maximum number of inner
\ac{bp} iterations $n_\text{iter}$.
The dashed colored curves correspond to cold-start sliding-window
decoding and the solid colored curves to warm-start, which again
retains both the \ac{bp} messages and the decimation information on
the overlap region.
The physical error rate is fixed at $p = 0.0025$ and the iteration
budget is swept over $n_\text{iter} \in \{32, 128, 256, 512, 1024,
1536, 2048, 2560, 3072, 3584, 4096\}$.
@@ -2533,7 +2532,7 @@ For low iteration budgets, all curves in both panels behave similarly
to the plain-\ac{bp} curves in
\Cref{fig:bp_w_over_iter,fig:bp_f_over_iter}.
The per-round \ac{ler} decreases gradually with $n_\text{iter}$, and
the warm-start configurations now outperform their cold-start
counterparts at matching window parameters.
As $n_\text{iter}$ continues to grow, however, the cold-start curves
undergo a sharp drop, after which they lie roughly an order of
@@ -3020,7 +3019,7 @@ and at $F = 1$, respectively.
These observations match our expectations.
With only the \ac{bp} messages carried over, the warm-start
initialization no longer freezes any \acp{vn} in the next window.
The dependence of this benefit on $W$ and $F$ also recovers the
pattern observed for plain \ac{bp} in
\Cref{fig:whole_vs_cold_vs_warm,fig:bp_f_over_p}:
@@ -3034,7 +3033,7 @@ sliding-window decoding under \ac{bpgd} by summarizing our findings.
Warm-starting the inner decoder still provides a consistent
performance gain when the inner decoder is upgraded from plain
\ac{bp} to its guided-decimation variant, but only if some care is
taken in choosing what information to carry over.
Passing the channel \acp{llr} along with the \ac{bp} messages,
as suggested by naively carrying over the warm-start idea to \ac{bpgd},
leads to premature hard decisions on \acp{vn} in the overlap region.
@@ -3049,3 +3048,17 @@ requirements are substantially larger than those of plain \ac{bp}:
the per-round \ac{ler} drops sharply only once the iteration budget
is on the order of the number of \acp{vn} in each window.
Future work could include a softer treatment of the decimation state
in \ac{bpgd}.
Rather than discarding the decimation information of the previous
window entirely, as in the message-only warm start used here, one
could encode the decimation decisions as strong but finite biases on
the channel \acp{llr} of the next window, allowing the new window's parity
checks to override them if the syndrome calls for it.
This would interpolate between the two warm-start variants studied here and
might combine the benefits of both.
A related question is whether the decimation schedule itself should
be aware of the window structure, for instance by deferring
decimation of \acp{vn} in the overlap region until they have been
visited by the next window.
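The softer treatment of the decimation state suggested above could look like the following sketch (a speculative illustration of the proposed future work, not an implemented or validated scheme; all names are hypothetical, and the bias magnitude is an assumed tuning parameter):

```python
import numpy as np

BIAS_LLR = 20.0   # assumed: strong but finite, so new checks can override it

def biased_channel_llrs(channel_llrs, overlap_idx, decimated_bits):
    """Instead of freezing overlap VNs, encode the previous window's
    decimation decisions as strong but finite biases on the next
    window's channel LLRs.  `decimated_bits[k]` is the hard decision
    (0 or 1) made for VN `overlap_idx[k]` in the previous window."""
    llrs = np.array(channel_llrs, dtype=float)
    for i, bit in zip(overlap_idx, decimated_bits):
        # bit == 0 -> bias towards "no error" (positive LLR), bit == 1 -> negative
        llrs[i] = BIAS_LLR * (1 - 2 * bit)
    return llrs
```

With `BIAS_LLR` set to infinity this recovers the LLR-passing warm start, and with it set to zero the message-only warm start, so sweeping it would interpolate between the two variants studied here.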