diff --git a/src/thesis/chapters/4_decoding_under_dems.tex b/src/thesis/chapters/4_decoding_under_dems.tex index 98c8401..16d912c 100644 --- a/src/thesis/chapters/4_decoding_under_dems.tex +++ b/src/thesis/chapters/4_decoding_under_dems.tex @@ -13,14 +13,13 @@ under \acp{dem}. We investigate decoding \acf{qldpc} codes under \acp{dem} in particular. We focus on \ac{qldpc} codes, as they have emerged as leading -candidates for practical quantum error correction, offering the -ability to encode more logical qubits per physical qubit than surface -codes while maintaining favorable threshold properties +candidates for practical quantum error correction, offering +comparable thresholds with substantially improved encoding rates \cite[Sec.~1]{bravyi_high-threshold_2024}. Because of this, the decoding algorithms we consider will all be related to \acf{bp} in some way. Our aim is to build a fault-tolerant \ac{qec} system that works well -even under consideration of circuit-level noise. +even in the presence of circuit-level noise. We must overcome two main challenges to achieve this. First, recall the problems related to degeneracy, which is inherent @@ -29,21 +28,21 @@ Because multiple minimum-weight codewords exist, the \ac{bp} algorithm becomes uncertain of the direction to proceed in. Additionally, the commutativity conditions of the stabilizers necessitate the existence of short cycles. -These two aspects together lead to substantial convergence problems -of \ac{bp} for quantum codes, when it is used on it's own. +Together, these two aspects lead to substantial convergence problems +of \ac{bp} for quantum codes, when it is used on its own. Second, the consideration of circuit-level noise introduces many more error locations into the circuit. Using \acp{dem}, we construct a new circuit code and model each of these error locations as a new \acf{vn}. 
-We also perform multiple rounds of syndrome measuremetns, +We also perform multiple rounds of syndrome measurements, exacerbating the problem. This leads to a massively increased computational complexity and latency of the decoding process. In our experiments using the $\llbracket 144,12,12 \rrbracket$ \acf{bb} code with $12$ syndrome measurement rounds, for example, the -number of \acp{vn} was increased from $144$ to $9504$, and the -number of \acfp{cn} was increased from $72$ to $1008$. +number of \acp{vn} grew from $144$ to $9504$, and the +number of \acfp{cn} grew from $72$ to $1008$. -The first problem is not inherent to \acp{dem} or fault-tolerance, -but rather quantum codes in general. +The first problem is not inherent to \acp{dem} or fault-tolerance, +but rather to quantum codes in general. @@ -53,23 +52,25 @@ The most popular approach is combining a few initial iterations of \ac{bp} with a second decoding algorithm, \ac{osd} \cite{roffe_decoding_2020}. Other approaches exist, such as \ac{aed} -\cite{koutsioumpas_automorphism_2025}, were multiple variations of +\cite{koutsioumpas_automorphism_2025}, where multiple variations of the code are decoded simultaneously to increase the chances of convergence. Here, we will focus on the \acf{bpgd} algorithm \cite{yao_belief_2024} we already introduced in \Cref{ch:Fundamentals}, for reasons that will become clear later in the chapter. The second problem is inherent to decoding using \acp{dem}. -This is an area that has been less studied. +This is an area that has received less attention. As we saw in \Cref{sec:Quantum Error Correction}, for \ac{qec}, latency is the main constraint, not raw computational complexity. The main way this is addressed in the literature is \emph{sliding window decoding}, which attempts to divide the overall decoding problem into many smaller ones that can be solved more efficiently. -% TODO: This could potentially be abit more text (e.g., go into +% TODO: This could potentially be a bit more text (e.g., go into % SC-LDPC like structure that serves as the inspiration for the % warm-start decoding. 
Or just go into warm-start decoding) +Our own work will focus mostly on the solution of the second +problem using sliding-window decoding. We will start by briefly reviewing the existing work related to sliding-window decoding, before focusing on one specific realization. @@ -78,7 +79,7 @@ perform numerical simulations to evaluate it. % and reducing latency is the main goal of the existing literature. % This is generally done using windowing approaches; either -% sliding-window based, where the latency is reduced due an earlier +% sliding-window based, where the latency is reduced due to an earlier % start to the decoding process \cite{kuo_fault-tolerant_2024}% % \cite{huang_improved_2023}\cite{huang_increasing_2024}\cite{gong_toward_2024}, % or by decoding multiple windows in parallel @@ -202,21 +203,21 @@ Each of these windows is then decoded separately. related to sliding-window decoding. The papers \cite{huang_improved_2023} and \cite{huang_increasing_2024} are lumped together, as they share the same content; -one is simply preprint published earlier. +one is simply a preprint published earlier. We will only refer to \cite{huang_increasing_2024} in the following. \cite{kang_quits_2025} is somewhat special in that the authors focus -more on the introduction of a new simluator framework they call +more on the introduction of a new simulator framework they call QUITS, rather than the performance of sliding-window decoding itself. \cite{gong_toward_2024} and \cite{kang_quits_2025} have made their software freely available online% \footnote{ - https://github.com/mkangquantum/quits + \url{https://github.com/mkangquantum/quits} }% \footnote{ - https://github.com/gongaa/SlidingWindowDecoder + \url{https://github.com/gongaa/SlidingWindowDecoder} }. A final thing to note is that \cite{dennis_topological_2002} never -explicitly mention sliding windows, they call their scheme +explicitly mentions sliding windows; the authors call their scheme ``overlapping recovery''. 
% Topological vs QLDPC @@ -227,7 +228,7 @@ Most of the work on topological codes has treated surface codes, with the exception of \cite{kuo_fault-tolerant_2024} where toric codes were considered. With regard to \ac{qldpc} codes, in \cite{huang_increasing_2024} -they examine \emph{hypergraph product} (\acs{hgp}) and +the authors examine \emph{hypergraph product} (\acs{hgp}) and \emph{lifted-product} (\acs{lp}) codes. HGP codes are constructed from the product of two classical codes, while LP codes generalize this construction by additionally applying @@ -237,7 +238,7 @@ are additionally considered. Like HGP codes, BPC codes are derived from a product construction, but exploit an additional symmetry to yield fewer physical qubits for the same code parameters. -Finally, in \cite{gong_toward_2024} the authors explore \ac{bb} codes. +Finally, \cite{gong_toward_2024} explores \ac{bb} codes. % Sequential vs parallel @@ -246,14 +247,14 @@ arises of how exactly to realize the decoding. There are two main approaches, with differing mechanisms of reducing the latency. Some papers decode the sliding windows in a parallel fashion. -The benefit in this case is the option to more effectively utilize -classical hardware for decoding. +The benefit in this case +is that classical hardware can be utilized more effectively. Others choose a sequential approach. Here, decoding can start earlier, as there is no need to wait for the syndrome measurements of all windows before beginning with the decoding. With the exception of \cite{dennis_topological_2002}, literature treating topological codes has mostly focused on parallel decoding -while literature treating \ac{qldpc} codes has wholely considered +while literature treating \ac{qldpc} codes has wholly considered sequential decoding. 
% Deep-dive into QLDPC methods @@ -267,20 +268,21 @@ As we noted above, \ac{hgp} and \ac{lp} codes are considered in \ac{hgp}, \ac{lp} and \ac{bpc} codes are considered in \cite{kang_quits_2025}, and \ac{bb} codes are considered in \cite{gong_toward_2024}. The employed noise models also differ; -\cite{huang_increasing_2024} use phenomenological noise, while +\cite{huang_increasing_2024} uses phenomenological noise, while \cite{gong_toward_2024} and \cite{kang_quits_2025} use circuit-level noise. -Finally, \cite{gong_toward_2024} introduce their own variation of +Finally, in \cite{gong_toward_2024} the authors introduce their own variation of \ac{bpgd}, \ac{bp} with \ac{gdg}, while \cite{huang_increasing_2024} and \cite{kang_quits_2025} use \ac{bp} + \ac{osd}. -We would additionally like to note that only in -\cite{gong_toward_2024} and \cite{kang_quits_2025} do the authors +We would additionally like to note that only +\cite{gong_toward_2024} and \cite{kang_quits_2025} explicitly work with the \ac{dem} formalism. \renewcommand{\arraystretch}{1.1} \setlength{\tabcolsep}{12pt} \begin{table}[t] \centering - \caption{Experimental conditions for papers related to \ac{qldpc} codes.} + \caption{Experimental conditions in the literature on + sliding-window decoding for \ac{qldpc} codes.} \vspace*{3mm} \label{table:experimental_conditions} \begin{tabular}{l|ccc} @@ -381,14 +383,14 @@ explicitly work with the \ac{dem} formalism. In this section, we will examine the methodology by which a detector error matrix is divided into overlapping windows. -The algorithm detailed here follows \cite{kang_quits_2025}, whose -work is in turn based on \cite{huang_increasing_2024}. +The algorithm detailed here follows \cite{kang_quits_2025}, which +is in turn based on \cite{huang_increasing_2024}. % Very high-level overview Sliding-window decoding is made possible by the time-like structure of the syndrome extraction circuitry. 
-This is epecially clearly visible under the \ac{dem} formalism, where +This is especially clearly visible under the \ac{dem} formalism, where this manifests as a block-diagonal structure of the detector error matrix $\bm{H}$. Note that this presupposes a choice of detectors as seen in @@ -396,7 +398,7 @@ Note that this presupposes a choice of detectors as seen in This block-diagonal structure introduces some locality in the interdependence between \acp{vn} and \acp{cn}. For each local set of \acp{vn}, there is only a local set of connected \acp{cn}. -We exploit this fact by cutting the matrix into overlapping windows. +We exploit this fact by partitioning the matrix into overlapping windows. \Cref{fig:windowing_pcm} depicts this process using the $\llbracket 72, 6, 6 \rrbracket$ BB code as an example. @@ -404,9 +406,9 @@ We exploit this fact by cutting the matrix into overlapping windows. How the locality is leveraged can be understood by considering the decoding process. -After decoding a window, there is a subset of \acp{cn} that no longer -contribute to decoding, as they are not connected to any \acp{vn} -considered for the subsequent windows. +After decoding a window, there is a subset of \acp{cn} that +no longer contribute to decoding, since none of their +neighboring \acp{vn} appear in subsequent windows. We call the set of \acp{vn} connected to those \acp{cn} the \emph{commit region} and we wish to commit them before moving to the next window, i.e., fix the values we estimate for the corresponding bits. @@ -419,13 +421,13 @@ measurements for the first window are complete. There are two degrees of freedom in how we perform the windowing. The \emph{window size} $W \in \mathbb{N}$ represents the number of -syndrome extraction rounds lumped into one window. -The \emph{step size} $F \in \mathbb{N}$ represents the number of -syndrome extraction rounds passed over before starting the next window. 
+syndrome extraction rounds lumped into one window, while +the \emph{step size} $F \in \mathbb{N}$ represents the number of +syndrome extraction rounds skipped before starting the next window. $W$ controls the size of the windows while $F$ controls the overlap between them. As illustrated in \Cref{fig:windowing_pcm}, $W$ and $F$ control the -window dimensions and locactions by defining the related \acp{cn}, +window dimensions and locations by defining the related \acp{cn}, not the \acp{vn}. This is because while the number of overall \acp{cn} is only affected by the choice of the underlying code and the number of syndrome @@ -462,9 +464,19 @@ and is difficult to predict beforehand. \vspace*{10mm} \caption{ - Visualization of the windowing process on a detector - error matrix generated from the $\llbracket 72, 6, 6 - \rrbracket$ BB code. + Visualization of the windowing process on a detector error + matrix generated from the $\llbracket 72, 6, 6 \rrbracket$ + BB code under circuit-level noise. + The block-diagonal structure reflects the time-like locality + of the syndrome extraction circuit, with each block + corresponding to one syndrome measurement round. + Two consecutive windows are highlighted: the window size $W$ + controls the number of syndrome rounds included in each + window, while the step size $F$ controls how many rounds + separate the start of one window from the next. + The bracketed region indicates the commit + region of the first window, i.e., the \acp{vn} that are committed + before moving to the second window. } \label{fig:windowing_pcm} \end{figure} @@ -476,9 +491,9 @@ We use the variables $n,m \in \mathbb{N}$ to describe the number of \acp{vn} and \acp{cn} respectively. 
We index the \acp{vn} using the variable $i \in \mathcal{I} := [0:n-1]$ and the \acp{cn} using the variable $j \in \mathcal{J} := [ 0 : m-1]$. -Finally, we call $\mathcal{N}_\text{V}(i) = \left\{ i\in \mathcal{I}: -\bm{H}_{j,i} = 1 \right\}$ and $\mathcal{N}_\text{C}(j) := \left\{ j -\in \mathcal{J} : \bm{H}_{j,i} = 1 \right\}$ the neighborhoods of the +Finally, we call $\mathcal{N}_\text{V}(i) := \left\{ j \in \mathcal{J}: +\bm{H}_{j,i} = 1 \right\}$ and $\mathcal{N}_\text{C}(j) := \left\{ i +\in \mathcal{I} : \bm{H}_{j,i} = 1 \right\}$ the neighborhoods of the corresponding nodes. In this case, we take $\bm{H} \in \mathbb{F}_2^{m\times n}$ to be the check matrix of the underlying code, from which the \ac{dem} was generated. @@ -493,7 +508,7 @@ where $n_\text{win} \in \mathbb{N}$ is the number of windows. Because we defined the step size $F$ as the number of syndrome extraction rounds to skip, the first \ac{cn} of window $\ell$ should have index $\ell F m$. -Similarly, because of the way we defined the step size $W$, the +Similarly, because of the way we defined the window size $W$, the number of \acp{cn} should be $Wm$ for all but the last window. The number of \acp{cn} in the last window may differ if there are not enough \acp{cn} left to completely fill it. @@ -511,8 +526,8 @@ We thus define $\mathcal{J}_\text{win}^{(\ell)}$ is the set of all \acp{cn} in the window while $\mathcal{J}_\text{commit}^{(\ell)}$ is the set of \acp{cn} that do not contribute to the next window and whose neighboring -\acp{vn} will thus be comitted. +\acp{vn} will thus be committed. -We can additionally define the set of \acp{vn} that are shared between windows +We can additionally define the set of \acp{cn} that are shared between windows $\ell$ and $\ell + 1$ as $\mathcal{J}_\text{overlap}^{(\ell)} := \mathcal{J}_\text{win}^{(\ell)}\setminus \mathcal{J}_\text{commit}^{(\ell)}$. 
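To make these definitions concrete, the \ac{cn} index sets $\mathcal{J}_\text{win}^{(\ell)}$, $\mathcal{J}_\text{commit}^{(\ell)}$, and $\mathcal{J}_\text{overlap}^{(\ell)}$ can be computed with simple integer arithmetic. The following is a minimal sketch, not the thesis implementation; all names are illustrative, and `m` denotes the number of \acp{cn} per syndrome extraction round:

```python
def window_cn_sets(m, n_cn_total, W, F, n_win):
    """Sketch of the CN index sets for sliding-window decoding.

    Window l spans CN indices [l*F*m, l*F*m + W*m); its commit set
    holds the CNs absent from window l+1, and the overlap set holds
    the CNs shared with window l+1.
    """
    sets = []
    for l in range(n_win):
        start = l * F * m
        win = set(range(start, min(start + W * m, n_cn_total)))
        if l < n_win - 1:
            nxt_start = (l + 1) * F * m
            nxt = set(range(nxt_start, min(nxt_start + W * m, n_cn_total)))
            commit = win - nxt   # J_commit: CNs not appearing in window l+1
        else:
            commit = set(win)    # last window commits all remaining CNs
        overlap = win - commit   # J_overlap: CNs shared with window l+1
        sets.append((win, commit, overlap))
    return sets
```

For instance, with $m = 4$ \acp{cn} per round, $W = 3$, and $F = 1$, window $0$ spans \acp{cn} $0$ to $11$ and commits \acp{cn} $0$ to $3$, consistent with the definitions above.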
@@ -540,7 +555,7 @@ The commit region of window $\ell$ should include all of the \acp{vn} neighboring any of the \acp{cn} in $\mathcal{J}_\text{commit}^{(\ell)}$. Consequently, the maximum index of the \acp{vn} we consider should be $i_\text{max}(\mathcal{J}_\text{commit}^{(\ell)})$. -Additionally, the set of \acp{vn} comitted in the next window should +Additionally, the set of \acp{vn} committed in the next window should start immediately afterwards. We thus define \begin{align*} @@ -634,17 +649,32 @@ and after decoding all windows we will therefore have committed all \acp{vn}. }; \end{tikzpicture} - \caption{Visual representation of notation used for window splitting.} + \caption{ + Visual representation of the index sets used to define a sliding window. + The solid box delimits the rows ($\mathcal{J}_\text{win}^{(\ell)}$) + and columns ($\mathcal{I}_\text{win}^{(\ell)}$) of the detector + error matrix considered when decoding window $\ell$, while the + dashed box shows the analogous region for window $\ell + 1$. + The shaded region marks the submatrix + $\bm{H}_\text{overlap}^{(\ell)}$, whose rows correspond to the + overlap \acp{cn} $\mathcal{J}_\text{overlap}^{(\ell)} = + \mathcal{J}_\text{win}^{(\ell)} \setminus + \mathcal{J}_\text{commit}^{(\ell)}$ shared with the next window, + and whose columns correspond to the committed \acp{vn} + $\mathcal{I}_\text{commit}^{(\ell)}$. + After decoding window $\ell$, this submatrix is used to update + the syndrome of the overlap \acp{cn} based on the committed bit estimates. + } \label{fig:vis_rep} \end{figure} % Syndrome update \Cref{fig:vis_rep} illustrates the meaning of the various sets of nodes. -We can also see a particular point we have to be careful about when +We can also see a subtlety we must handle carefully when moving on to decode the next window. 
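The syndrome update mentioned in the caption of \Cref{fig:vis_rep}, which folds the contribution of the committed bits into the syndrome of the overlap \acp{cn} over $\mathbb{F}_2$, can be sketched as follows. This is a hedged illustration with hypothetical names; plain 0/1 lists stand in for $\bm{H}_\text{overlap}^{(\ell)}$, $\bm{e}_\text{commit}^{(\ell)}$, and the overlap part of $\bm{s}$:

```python
def update_overlap_syndrome(H_overlap, e_commit, s_overlap):
    """GF(2) update: s_overlap <- s_overlap + H_overlap @ e_commit (mod 2).

    H_overlap: rows index the overlap CNs, columns index the
    committed VNs (lists of 0/1 entries).
    """
    return [
        # XOR the parity of the committed bits seen by this CN
        s_j ^ (sum(h & e for h, e in zip(row, e_commit)) & 1)
        for row, s_j in zip(H_overlap, s_overlap)
    ]
```

A committed all-zero estimate leaves the syndrome unchanged, as expected, since no correction then needs to be accounted for in the next window.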
-While the \acp{vn} in $\mathcal{J}_\text{commit}^{(\ell)}$ have no -bearing on the further decoding process, the values commit for the +While the \acp{cn} in $\mathcal{J}_\text{commit}^{(\ell)}$ have no +bearing on the further decoding process, the values committed for the \acp{vn} in $\mathcal{I}_\text{commit}^{(\ell)}$ do. This is the case because these \acp{vn} have neighboring \acp{cn} in the next window. @@ -656,7 +686,7 @@ $\bm{H}_\text{overlap}^{(\ell)} = We have to account for this fact by updating the syndrome $\bm{s}$ based on the committed bit values. Specifically, if $\bm{e}_\text{commit}^{(\ell)}$ describes the error -estimates commited after decoding window $\ell$, we have to set +estimates committed after decoding window $\ell$, we have to set \begin{align*} \bm{s}_{\mathcal{J}_\text{overlap}^{(\ell)}} = \bm{H}_\text{overlap}^{(\ell)} @@ -671,7 +701,7 @@ estimates commited after decoding window $\ell$, we have to set % Intro The sliding-window structure visible in \Cref{fig:windowing_pcm} is -highly reminicent of the way \ac{sc}-\ac{ldpc} codes are decoded. +highly reminiscent of windowed decoding for \ac{sc}-\ac{ldpc} codes. Switching our viewpoint to the Tanner graph depicted in \Cref{fig:windowing_tanner}, however, we can see an important difference between \ac{sc}-\ac{ldpc} decoding and the @@ -690,6 +720,15 @@ we perform a \emph{warm start} by initializing the messages in the overlapping region to the values last held during the decoding of the previous window. +\content{callback to intro: explain why we consider BPGD instead of, +e.g., BP+OSD} +\content{Explain why we expect a warm start to be beneficial} +\content{Mention that our own work ties into the bottom category in +\Cref{fig:literature}} +\content{Explicitly state that $\mathcal{I}_\text{win}^{(\ell)}$ + overlaps with $\mathcal{I}_\text{win}^{(\ell + 1)}$, and that this is +where the warm start applies} + \begin{figure}[t] \centering @@ -793,11 +832,13 @@ previous window. 
\end{figure} %%%%%%%%%%%%%%%% -\subsection{Warm-Start Belief Propagation Decoding} -\label{subsec:Warm-Start Belief Propagation Decoding} +\subsection{Belief Propagation} +\label{subsec:Warm-Start Belief Propagation} % Warm-Start decoding for BP +\content{Explicitly name messages passed ($\{L_{j\leftarrow i} : i \in +\ldots, j \in \ldots\}$)} \content{Pass messages to next window} \content{(?) Explicitly mention initialization using only CN->VN messages + swapping of CN and VN update?} @@ -926,7 +967,7 @@ messages + swapping of CN and VN update?} \end{figure} %%%%%%%%%%%%%%%% -\subsection{Warm-Start Belief Propagation with Guided Decimation Decoding} +\subsection{Belief Propagation with Guided Decimation} \label{subsec:Warm-Start Belief Propagation with Guided Decimation Decoding} % Warm-Start decoding for BPGD @@ -1073,7 +1114,7 @@ messages, pass decimation info} \section{Numerical Results} \label{sec:Numerical Results} -% Intro +% Simulation setup In this section, we perform numerical experiments to evaluate the modification to sliding-window decoding we introduced in @@ -1093,6 +1134,10 @@ of the per-round \ac{ler} as defined in All datapoints have been generated by simulating at least $200$ logical error events. +\content{Mention the number of syndrome extraction rounds} + +% Software stack: Layer 1 + For the practical aspects of implementation, several layers of abstraction must be considered. The lowest layer is the circuit-level simulator. @@ -1101,21 +1146,29 @@ quantum mechanical aspects of the system, including the modeling of noise on gates, idling qubits, and measurements according to the chosen noise model. +% Software stack: Layer 2 + Moving one level of abstraction higher, the syndrome extraction circuit itself must be generated. This entails constructing the full circuit, including the ancilla measurements and the error locations introduced by the chosen noise model, both of which depend on the code and noise model in question. 
+% Software stack: Layer 3 + Even further up, given an already constructed syndrome extraction circuit and the resulting \acf{dem}, we must split the detector error matrix into separate windows and manage the interplay between the inner decoders acting on those individual windows. +% Software stack: Layer 4 + Finally, we require the decoder itself, which operates on a \acf{pcm} and a syndrome, with no dependence on the complexity of the layers below. +% Software stack: Tools + In our implementation, Stim \cite{gidney_stim_2021} served as the circuit-level simulator, chosen for its efficiency and native support for the \ac{dem} formalism. @@ -1134,9 +1187,36 @@ We reimplemented both the window splitting and the decoders themselves. \subsection{Belief Propagation} \label{subsec:Belief Propagation} -% Simulation setup +% Intro -\content{Use min-sum} +We begin our investigation by using \ac{bp} with no further +modifications as the inner decoder. +We chose the min-sum variant of \ac{bp} due to its low computational complexity. + +% Whole decoding as a lower bound on the error rate + +We initially wanted to gain an impression of the performance gain we could +expect from a modification to the sliding-window decoding procedure. +To this end, we began by analyzing the decoding performance of the +original process, without our warm-start modification. +We will call this \emph{cold-start} decoding in the following. +We examined the decoding performance for different window sizes $W$ +and compared it against the performance when decoding the whole +detector error matrix at once, i.e., without windowing. + +\Cref{fig:whole_vs_cold} depicts the results of this analysis. +\red{[Write more about the experimental setup (200 BP iterations, +fixed step size, what else?)]} +\red{[Describe the plot (whole decoding in black, (?) list different +window sizes and colors/markers, what else?)]} +We can see that a larger window results in a lower overall error rate. 
+This seems sensible, because the smaller the window size, the more +locally the decoding is performed. +While this allows us to leverage the time-like structure of the +circuitry more strongly and further reduce the latency, it is +expected to lower the performance, since \red{[find something to say here]}. +Decoding the whole detector error matrix globally with no windowing +provides the best performance. \begin{figure}[t] \centering @@ -1196,8 +1276,27 @@ We reimplemented both the window splitting and the decoders themselves. extraction were performed and the noise model is standard circuit-based depolarizing noise. } + \label{fig:whole_vs_cold} \end{figure} +% Initial results of warm-start decoding + +As a next step, we additionally generated error rate curves using our +warm-start modification. +\red{[Again 200 BP iterations, etc.]} +\Cref{fig:whole_vs_cold_vs_warm} shows the numerical results from +this experiment. +The cold-start results from the previous graph are now plotted with +dashed lines, while the new warm-start results are plotted with solid lines. +We can see that the decoding performance has been improved overall. + +% Unexpected: Warm-start better than whole + +Additionally, we can see some initially unexpected behavior: the warm-start +sliding-window decoding with $W=5$ performs better than decoding +the whole detector error matrix at once, even +though the process is less global. + \begin{figure}[t] \centering \begin{tikzpicture} @@ -1279,6 +1378,7 @@ We reimplemented both the window splitting and the decoders themselves. extraction were performed and the noise model is standard circuit-based depolarizing noise. } + \label{fig:whole_vs_cold_vs_warm} \end{figure} \begin{figure}[t]