% TODO: Make all [H] -> [t]
\chapter{Decoding under Detector Error Models}

In \Cref{ch:Fundamentals} we introduced the fundamentals of classical
error correction, before moving on to quantum information science and
finally combining the two in \acf{qec}.
In \Cref{ch:Fault tolerance} we then turned to fault-tolerance, with
a focus on a specific way of implementing it, called \acfp{dem}.
In this chapter, we move on from the fundamental concepts and examine
how to apply them in practice.
Specifically, we concern ourselves with the practical aspects of decoding
under \acp{dem}.

We investigate decoding \acf{qldpc} codes under \acp{dem} in particular.
We focus on \ac{qldpc} codes, as they have emerged as leading
candidates for practical quantum error correction, offering
comparable thresholds with substantially improved encoding rates
\cite[Sec.~1]{bravyi_high-threshold_2024}.
Because of this, the decoding algorithms we consider will all be
related to \acf{bp} in some way.
Our aim is to build a fault-tolerant \ac{qec} system that works well
even in the presence of circuit-level noise.
We must overcome two main challenges to achieve this.

First, recall the problems related to degeneracy, which is inherent
to quantum codes.
Because multiple minimum-weight codewords exist, the \ac{bp}
algorithm becomes uncertain of the direction to proceed in.
Additionally, the commutativity conditions of the stabilizers
necessitate the existence of short cycles.
Together, these two aspects lead to substantial convergence problems
of \ac{bp} for quantum codes, when it is used on its own.

Second, the consideration of circuit-level noise introduces many more
error locations into the circuit.
Using \acp{dem}, we construct a new circuit code and model each of
these error locations as a new \acf{vn}.
We also perform multiple rounds of syndrome measurements,
exacerbating the problem.
This leads to a massively increased computational complexity and
latency of the decoding process.
In our experiments using the $\llbracket 144,12,12 \rrbracket$
\acf{bb} code with $12$ syndrome measurement rounds, for example, the
number of \acp{vn} grew from $144$ to $9504$, and the
number of \acfp{cn} grew from $72$ to $1008$.

The first problem is not inherent to \acp{dem} or fault-tolerance,
but rather quantum codes in general.
Many different approaches to solving it exist, usually centered
around somehow modifying \ac{bp}.
The most popular approach is combining a few initial
iterations of \ac{bp} with a second decoding algorithm, \ac{osd}
\cite{roffe_decoding_2020}.
Other approaches exist, such as \ac{aed}
\cite{koutsioumpas_automorphism_2025}, where multiple variations of
the code are decoded simultaneously to increase the chances of convergence.
Here, we will focus on the \acf{bpgd} algorithm
\cite{yao_belief_2024} we already introduced in \Cref{ch:Fundamentals},
for reasons that will become clear later in the chapter.

The second problem is inherent to decoding using \acp{dem}.
This is an area that has received less attention.
As we saw in \Cref{sec:Quantum Error Correction}, for \ac{qec},
latency is the main constraint, not raw computational complexity.
The main way this is addressed in the literature is \emph{sliding
window decoding}, which attempts to divide the overall decoding
problem into many smaller ones that can be solved more efficiently.

% TODO: This could potentially be a bit more text (e.g., go into
% SC-LDPC like structure that serves as the inspiration for the
% warm-start decoding. Or just go into warm-start decoding)
Our own work will focus mostly on the the solution of the second
problem using sliding-window decoding.
We will start by briefly reviewing the existing work related to
sliding-window decoding,
before focusing on one specific realization.
We will then introduce a modification to the existing algorithm and
perform numerical simulations to evaluate it.

% and reducing latency is the main goal of the existing literature.
% This is generally done using windowing approaches; either
% sliding-window based, where the latency is reduced due to an earlier
% start to the decoding process \cite{kuo_fault-tolerant_2024}%
% \cite{huang_improved_2023}\cite{huang_increasing_2024}\cite{gong_toward_2024},
% or by decoding multiple windows in parallel
% \cite{skoric_parallel_2023}\cite{tan_scalable_2023}.
% This work is based on the sliding-window method.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Sliding-Window Decoding}
\label{sec:Sliding-Window Decoding}

% Spacetime codes

\ac{qec} codes are often viewed through the lenses of the
\emph{space} and \emph{time} dimensions.
Both directions add redundancy, but they do so in a different way and
guard against different defects.
The space dimension corresponds to the redundancy added through the
code itself, while the time dimension corresponds to the repetition
of the syndrome measurements \cite[Sec.~IV.B]{dennis_topological_2002}.

% Basic idea

The idea of sliding-window decoding is to exploit the time-like
structure by splitting the circuit into overlapping windows along the
time dimension.
Each of these windows is then decoded separately.

%%%%%%%%%%%%%%%%
\subsection{Review of Existing Literature}
\label{subsec:Review of Existing Literature}

\begin{figure}[t]
    \centering

    \tikzset{
        literature/.append style={
            minimum width=6mm,
            minimum height=6mm,
            text width=18mm,
            align=left,
        }
    }
    \tikzset{
        heading/.append style={
            draw=black,
            minimum width=22mm,
            minimum height=6mm,
            align=left,
            rounded corners = 1mm,
        }
    }

    \tikzexternaldisable
    \begin{tikzpicture}[node distance = 0mm and 0mm]
        % tex-fmt: off
        \node[heading, minimum width=15mm, fill=gray!25]                  (code)  {Code};
        \node[heading, below right=2mm and -5mm of code, fill=orange!20]  (top)   {Topological};
        \node[heading, below right=45mm and -5mm of code, fill=orange!20] (qldpc) {QLDPC};

        \node[literature, below right=1mm and -12mm of top] (dennis) {\cite{dennis_topological_2002}};
        \node[literature, below=of dennis]                  (tan)    {\cite{tan_scalable_2023}};
        \node[literature, below=of tan]                     (skoric) {\cite{skoric_parallel_2023}};
        \node[literature, below=of skoric]                  (bombin) {\cite{bombin_modular_2023}};
        \node[literature, below=of bombin]                  (kuo)    {\cite{kuo_fault-tolerant_2024}};

        \node[literature, below right=1mm and -12mm of qldpc] (huang) {\cite{huang_improved_2023},\cite{huang_increasing_2024}};
        \node[literature, below=of huang]                     (gong)  {\cite{gong_toward_2024}};
        \node[literature, below=of gong]                      (kang)  {\cite{kang_quits_2025}};

        \coordinate (code-anchor)  at ($(code.south) + (-2mm,0)$);
        \coordinate (top-anchor)   at ($(top.south) + (-5mm,0)$);
        \coordinate (qldpc-anchor) at ($(qldpc.south) + (-5mm,0)$);

        \draw (code-anchor) |- (top);
        \draw (code-anchor) |- (qldpc);

        \draw (top-anchor) |- (dennis);
        \draw (top-anchor) |- (tan);
        \draw (top-anchor) |- (skoric);
        \draw (top-anchor) |- (bombin);
        \draw (top-anchor) |- (kuo);

        \draw (qldpc-anchor) |- (huang);
        \draw (qldpc-anchor) |- (gong);
        \draw (qldpc-anchor) |- (kang);

        \draw [
            line width=1pt,
            decorate,
            decoration={brace,amplitude=2mm,raise=5mm}
        ]
        (dennis.north east) -- (dennis.south east)
        node[midway,right,xshift=10mm]{Sequential};

        \draw [
            line width=1pt,
            decorate,
            decoration={brace,amplitude=2mm,raise=5mm}
        ]
        (tan.north east) -- (kuo.south east)
        node[midway,right,xshift=10mm]{Parallel};

        \draw [
            line width=1pt,
            decorate,
            decoration={brace,amplitude=2mm,raise=5mm}
        ]
        (huang.north east) -- (kang.south east)
        node[midway,right,xshift=10mm]{Sequential};
        % tex-fmt: on
    \end{tikzpicture}
    \tikzexternalenable

    \caption{Overview of literature on sliding-window decoding.}
    \label{fig:literature}
\end{figure}

% Some general notes

\Cref{fig:literature} gives an overview over the existing body of work
related to sliding-window decoding.
The papers \cite{huang_improved_2023} and \cite{huang_increasing_2024} are
lumped together, as they share the same content;
one is simply a preprint published earlier.
We will only refer to \cite{huang_increasing_2024} in the following.
\cite{kang_quits_2025} is somewhat special in that the authors focus
more on the introduction of a new simulator framework they call
QUITS, rather than the performance of sliding-window decoding itself.
\cite{gong_toward_2024} and \cite{kang_quits_2025} have made their
software freely available online%
\footnote{
    \url{https://github.com/mkangquantum/quits}
}%
\footnote{
    \url{https://github.com/gongaa/SlidingWindowDecoder}
}.
A final thing to note is that \cite{dennis_topological_2002} never
explicitly mentions sliding windows; the authors call their scheme
``overlapping recovery''.

% Topological vs QLDPC

Research has focused on two categories of \ac{qec} codes, topological
and \ac{qldpc} codes.
Most of the work on topological codes has treated surface codes,
with the exception of \cite{kuo_fault-tolerant_2024} where toric
codes were considered.
With regard to \ac{qldpc} codes, in \cite{huang_increasing_2024}
the authors examine \emph{hypergraph product} (\acs{hgp}) and
\emph{lifted-product} (\acs{lp}) codes.
HGP codes are constructed from the product of two classical codes,
while LP codes generalize this construction by additionally applying
a lift to reduce the qubit overhead.
In \cite{kang_quits_2025}, \emph{balanced product codes} (\acs{bpc})
are additionally considered.
Like HGP codes, BPC codes are derived from a product construction,
but exploit an additional symmetry to yield fewer physical qubits for
the same code parameters.
Finally, \cite{gong_toward_2024} explores \ac{bb} codes.

% Sequential vs parallel

After having divided the whole circuit into separate windows, the question
arises of how exactly to realize the decoding.
There are two main approaches, with differing mechanisms of reducing
the latency.
Some papers decode the sliding windows in a parallel fashion.
The benefit in this case is
is that classical hardware can be utilized more effectively.
Others choose a sequential approach.
Here, decoding can start earlier, as there is no need to wait for the
syndrome measurements of all windows before beginning with the decoding.
With the exception of \cite{dennis_topological_2002}, literature
treating topological codes has mostly focused on parallel decoding
while literature treating \ac{qldpc} codes has wholly considered
sequential decoding.

% Deep-dive into QLDPC methods

For this work, the publications treating \ac{qldpc} codes are
especially interesting.
The experimental conditions for these are summarized in
\Cref{table:experimental_conditions}.
As we noted above, \ac{hgp} and \ac{lp} codes are considered in
\cite{huang_increasing_2024},
\ac{hgp}, \ac{lp} and \ac{bpc} codes are considered in \cite{kang_quits_2025},
and \ac{bb} codes are considered in \cite{gong_toward_2024}.
The employed noise models also differ;
\cite{huang_increasing_2024} uses phenomenological noise, while
\cite{gong_toward_2024} and \cite{kang_quits_2025} use circuit-level noise.
Finally, in \cite{gong_toward_2024} the authors introduce their own variation of
\ac{bpgd}, \ac{bp} with \ac{gdg}, while \cite{huang_increasing_2024}
and \cite{kang_quits_2025} use \ac{bp} + \ac{osd}.
We would additionally like to note that only in
\cite{gong_toward_2024} and \cite{kang_quits_2025}
explicitly work with the \ac{dem} formalism.

\renewcommand{\arraystretch}{1.1}
\setlength{\tabcolsep}{12pt}
\begin{table}[t]
    \centering
    \caption{Experimental conditions in the literature on
    sliding-window decoding for \ac{qldpc} codes.}
    \vspace*{3mm}
    \label{table:experimental_conditions}
    \begin{tabular}{l|ccc}
        % tex-fmt: off
        Publication                                                            &    Code                       & Noise Model                     & Decoder \\ \hline
        \hspace{-2.5mm}\cite{huang_improved_2023},\cite{huang_increasing_2024} & \acs{hgp}, \acs{lp}           & Phenomenological noise & \acs{bp} + \acs{osd} \\
        \hspace{-2.5mm}\cite{gong_toward_2024}                                 & \acs{bb}                      & Circuit-level noise             & \acs{bp} + \acs{gdg} \\
        \hspace{-2.5mm}\cite{kang_quits_2025}                                  & \acs{hgp}, \acs{lp}, \acs{bpc}  & Circuit-level noise             & \acs{bp} + \ac{osd}
        % tex-fmt: on
    \end{tabular}
\end{table}

% \red{
%     Existing work
%     \begin{itemize}
%         \item \cite{gong_toward_2024}
%             \begin{itemize}
%                 \item BB codes (QLDPC)
%                 \item Circuit-level noise
%                 \item Sequential
%                 \item Cites $\underbrace{\cite{dennis_topological_2002}
%                         \cite{tan_scalable_2023}
%                     \cite{skoric_parallel_2023}}_\text{Surface code}
%                     \underbrace{\cite{huang_improved_2023}}_\text{QLDPC,Phenomenological}$
%             \end{itemize}
%         \item \cite{huang_improved_2023}
%             \begin{itemize}
%                 \item Hypergraph product codes, Lifted product codes (QLDPC)
%                 \item Phenomenological noise
%                 \item Sequential
%                 \item Cites $\underbrace{\cite{dennis_topological_2002}
%                         [Huang, Brown, 2021]
%                         \cite{skoric_parallel_2023}
%                         \cite{tan_scalable_2023}
%                     \cite{bombin_modular_2023}}_\text{Surface code}$
%             \end{itemize}
%         \item \cite{dennis_topological_2002}
%             \begin{itemize}
%                 \item Surface code (Topological)
%                 \item No idea what noise, don't care either (Gong et
%                     al. say circuit-level noise)
%                 \item ``Overlapping recovery'' -> Sequential
%             \end{itemize}
%         \item \cite{tan_scalable_2023}
%             \begin{itemize}
%                 \item Surface code (Topological)
%                 \item Circuit-level noise
%                 \item Parallel
%                 \item Cites \cite{dennis_topological_2002}
%             \end{itemize}
%         \item \cite{skoric_parallel_2023}
%             \begin{itemize}
%                 \item Surface code (Topological)
%                 \item Circuit-level noise
%                 \item Parallel
%                 \item Cites \cite{dennis_topological_2002}
%             \end{itemize}
%         \item \cite{huang_increasing_2024}
%             \begin{itemize}
%                 \item Same as \cite{huang_improved_2023}
%             \end{itemize}
%         \item \cite{kuo_fault-tolerant_2024}
%             \begin{itemize}
%                 \item Toric codes (Topological)
%                 \item Circuit-level noise
%                 \item Parallel
%                 \item Cites \cite{dennis_topological_2002}
%                     \cite{tan_scalable_2023}
%                     \cite{skoric_parallel_2023} \cite{gong_toward_2024}
%             \end{itemize}
%         \item \cite{bombin_modular_2023}
%             \begin{itemize}
%                 \item Surface codes (Topological)
%                 \item No idea if it's even fault-tolerant
%                 \item Parallel
%                 \item Cites \cite{dennis_topological_2002}
%                     \cite{tan_scalable_2023}
%                     \cite{skoric_parallel_2023} \cite{leverrier_decoding_2022}
%             \end{itemize}
%             % This is not BP and not parallelization over the time dimension
%             % \item \cite{leverrier_decoding_2022}
%             %     \begin{itemize}
%             %         \item Quantum tanner codes (QLDPC)
%             %         \item Parallel
%             %         \item No idea if it's even fault-tolerant
%             %         \item Cites [don't care]
%             %     \end{itemize}
%         \item \cite{kang_quits_2025}
%             \begin{itemize}
%                 \item Cites \cite{huang_increasing_2024} \ldots
%             \end{itemize}
%     \end{itemize}
% }

%%%%%%%%%%%%%%%%
\subsection{Window Splitting and Sequential Sliding-Window Decoding}
\label{subsec:Window Splitting and Sequential Sliding-Window Decoding}

In this section, we will examine the methodology by which a detector
error matrix is divided into overlapping windows.
The algorithm detailed here follows \cite{kang_quits_2025}, which
is in turn based on \cite{huang_increasing_2024}.

% Very high-level overview

Sliding-window decoding is made possible by the time-like structure
of the syndrome extraction circuitry.
This is especially clearly visible under the \ac{dem} formalism, where
this manifests as a block-diagonal structure of the detector
error matrix $\bm{H}$.
Note that this presupposes a choice of detectors as seen in
\Cref{subsec:Detector Error Matrix}.
This block-diagonal structure introduces some locality in the
interdependence between \acp{vn} and \acp{cn}.
For each local set of \acp{vn}, there is only a local set of connected \acp{cn}.
We exploit this fact by partitioning the matrix into overlapping windows.
\Cref{fig:windowing_pcm} depicts this process using the $\llbracket
72, 6, 6 \rrbracket$ BB code as an example.

% High-level overview

How the locality is leveraged can be understood by considering the
decoding process.
After decoding a window, there is a subset of \acp{cn} that
no longer contribute to decoding, since none of their
neighboring \acp{vn} appear in subsequent windows.
We call the set of \acp{vn} connected to those \acp{cn} the
\emph{commit region} and we wish to commit them before moving to the
next window, i.e., fix the values we estimate for the corresponding bits.
As mentioned above, the benefit of this sequential sliding-window
decoding approach
is that the decoding process can begin as soon as the syndrome
measurements for the first window are complete.

% W and F and why we look at rows, not columns

There are two degrees of freedom in how we perform the windowing.
The \emph{window size} $W \in \mathbb{N}$ represents the number of
syndrome extraction rounds lumped into one window, while
the \emph{step size} $F \in \mathbb{N}$ represents the number of
syndrome extraction rounds skipped before starting the next window.
$W$ controls the size of the windows while $F$ controls the overlap
between them.
As illustrated in \Cref{fig:windowing_pcm}, $W$ and $F$ control the
window dimensions and locations by defining the related \acp{cn},
not the \acp{vn}.
This is because while the number of overall \acp{cn} is only affected
by the choice of the underlying code and the number of syndrome
measurement rounds, the number of \acp{vn} depends on the noise model
and is difficult to predict beforehand.

\begin{figure}[t]
    \centering

    \hspace*{-114mm}%
    \begin{tikzpicture}
        \draw[decorate, decoration={brace, amplitude=10pt}, line width=1pt]
        (0,0) -- (3.1,0) node[midway, above=4mm] {Commit region};
    \end{tikzpicture}

    \centering
    \includegraphics[scale=0.75]{res/72_bb_dem.pdf}

    \vspace*{-25.3mm}

    \hspace*{-98mm}%
    \begin{tikzpicture}
        \draw[{Latex}-{Latex}, line width=.7pt] (0,   -0.75mm) -- (0,  5mm);
        \draw[line width=1pt]                   (-1mm,-0.75mm) --
        (3mm,-0.75mm);
        \draw[line width=1pt]                   (-1mm,5mm)     -- (3mm,5mm);
        \node[left] at                          (-2mm,2.125mm) {$\sim W$};

        \draw[{Latex}-{Latex}, line width=.3pt] (6.5cm,1.6mm)  -- (6.5cm,5mm);
        \draw[line width=1pt]                   (6.5cm,4.9mm)  -- (6.5cm,7mm);
        \node[above] at                         (6.5cm,7mm) {$\sim F$};
    \end{tikzpicture}

    \vspace*{10mm}

    \caption{
        Visualization of the windowing process on a detector error
        matrix generated from the $\llbracket 72, 6, 6 \rrbracket$
        BB code under circuit-level noise.
        The block-diagonal structure reflects the time-like locality
        of the syndrome extraction circuit., with each block
        corresponding to one syndrome measurement round.
        Two consecutive windows are highlighted: the window size $W$
        controls the number of syndrome rounds included in each
        window, while the step size $F$ controls how many rounds
        separate the start of one window from the next.
        The bracketed region indicates the commit
        region of the first window, i.e., the \acp{vn} that are committed
        before moving to the second window.
        % Visualization of the windowing process on a detector
        % error matrix generated from the $\llbracket 72, 6, 6
        % \rrbracket$ BB code.
    }
    \label{fig:windowing_pcm}
\end{figure}

% Notation recap

We briefly reintroduce the notation important for the definition of the windows.
We use the variables $n,m \in \mathbb{N}$ to describe the number of
\acp{vn} and \acp{cn} respectively.
We index the \acp{vn} using the variable $i \in \mathcal{I} :=
[0:n-1]$ and the \acp{cn} using the variable $j \in \mathcal{J} := [ 0 : m-1]$.
Finally, we call $\mathcal{N}_\text{V}(i) = \left\{ j\in \mathcal{J}:
\bm{H}_{j,i} = 1 \right\}$ and $\mathcal{N}_\text{C}(j) := \left\{ i
\in \mathcal{I} : \bm{H}_{j,i} = 1 \right\}$ the neighborhoods of the
corresponding nodes.
In this case, we take $\bm{H} \in \mathbb{F}_2^{m\times n}$ to be the
check matrix of the underlying code, from which the \ac{dem} was generated.
We use $m_\text{DEM}, \mathcal{I}_\text{DEM}$, and $\mathcal{J}_\text{DEM}$
to refer to the respective values defined from the detector error matrix.

% How we get the corresponding rows

We begin by describing the sets of \acp{cn} relevant to each window.
For indexing, we use the variable $\ell \in [0:n_\text{win} - 1]$,
where $n_\text{win} \in \mathbb{N}$ is the number of windows.
Because we defined the step size $F$ as the number of syndrome
extraction rounds to skip, the first \ac{cn} of window $\ell$ should have index
$\ell F m$.
Similarly, because of the way we defined the window size $W$, the
number of \acp{cn} should be $Wm$ for all but the last window.
The number of \acp{cn} in the last window may differ if there are
not enough \acp{cn} left to completely fill it.
We thus define
\begin{align*}
    \mathcal{J}_\text{win}^{(\ell)} &:= \left\{ j\in \mathcal{J}_\text{DEM}:~
        \ell F m \le j < \min \left\{m_\text{DEM}, (\ell F + W) m \right\}
    \right\} \\[2mm]
    & \hspace{30mm} \text{and} \\[2mm]
    \mathcal{J}_\text{commit}^{(\ell)} &:= \left\{ j\in \mathcal{J}_\text{DEM}:~
        \ell F m \le j < \min \left\{m_\text{DEM}, (\ell + 1) F m \right\}
    \right\}
    .%
\end{align*}
$\mathcal{J}_\text{win}^{(\ell)}$ is the set of all \acp{cn} in the
window while $\mathcal{J}_\text{commit}^{(\ell)}$ is the set of \acp{cn}
that do not contribute to the next window and whose neighboring
\acp{vn} will thus be committed.
We can additionally define the set of \acp{cn} that are shared between windows
$\ell$ and $\ell + 1$ as $\mathcal{J}_\text{overlap}^{(\ell)} :=
\mathcal{J}_\text{win}^{(\ell)}\setminus \mathcal{J}_\text{commit}^{(\ell)}$.

% How we get the corresponding columns

We can now turn our attention to defining the sets of \acp{vn} relevant
to each window.
We first introduce a helper function $i_\text{max} :
\mathcal{P}(\mathbb{N}) \to \mathbb{N}$, which takes a set of
\ac{cn} indices and returns the largest neighboring \ac{vn} index.
We define
\begin{align*}
    i_\text{max}\left( \mathcal{S} \right) := \max \left\{ i\in
    \mathcal{N}_\text{C}(j) : j\in \mathcal{S} \right\}
    ,
\end{align*}
where we set $i_\text{max} (\emptyset) = -1$ by convention%
\footnote{
    This has the effect of later automatically setting the lower
    bounds for the indices in $\mathcal{I}_\text{commit}^{(\ell)}$
    and $\mathcal{I}_\text{win}^{(\ell)}$ appropriately.
}%
.
The commit region of window $\ell$ should include all of the \acp{vn}
neighboring any of the \acp{cn} in $\mathcal{J}_\text{commit}^{(\ell)}$.
Consequently, the maximum index of the \acp{vn} we consider should be
$i_\text{max}(\mathcal{J}_\text{commit}^{(\ell)})$.
Additionally, the set of \acp{vn} committed in the next window should
start immediately afterwards.
We thus define
\begin{align*}
    \mathcal{I}_\text{commit}^{(\ell)}
    &:= \left\{i \in \mathcal{I}_\text{DEM} :~
        i_\text{max}\left( \mathcal{J}_\text{commit}^{(\ell-1)} \right)
        < i \le
        i_\text{max}\left( \mathcal{J}_\text{commit}^{(\ell)} \right)
    \right\}\\[2mm]
    & \hspace{39mm} \text{and} \\[2mm]
    \mathcal{I}_\text{win}^{(\ell)}
    &:= \left\{i \in \mathcal{I}_\text{DEM} :~
        i_\text{max}\left( \mathcal{J}_\text{commit}^{(\ell-1)} \right)
        < i \le
        i_\text{max}\left( \mathcal{J}_\text{win}^{(\ell)} \right)
    \right\}
    .%
\end{align*}
Again, we set $\mathcal{I}_\text{overlap}^{(\ell)} =
\mathcal{I}_\text{win}^{(\ell)}\setminus \mathcal{I}_\text{commit}^{(\ell)}$.
Note that we have
\begin{align*}
    \bigcup_{\ell=0}^{n_\text{win}-1}
    \mathcal{I}_\text{commit}^{(\ell)} = \mathcal{I}
\end{align*}
and after decoding all windows we will therefore have committed all \acp{vn}.

\begin{figure}[t]
    \centering

    \begin{tikzpicture}
        \def\sx{1.5}
        \def\sy{1.5}

        \coordinate (a00) at (0,0);
        \coordinate (a01) at (0,      3*\sy);
        \coordinate (a11) at (6*\sx,  3*\sy);
        \coordinate (a10) at (6*\sx,  0*\sy);

        \coordinate (b00) at (3.2*\sx, -1*\sy);
        \coordinate (b01) at (3.2*\sx,  2*\sy);
        \coordinate (b11) at (9.2*\sx,  2*\sy);
        \coordinate (b10) at (9.2*\sx, -1*\sy);

        \fill[gray!40] (a00) -- (a00 |- b01) -- (b01) -- (b01 |- a00) -- cycle;

        \draw (a00) -- (a01) -- (a11) -- (a10) -- cycle;
        \draw[densely dashed] (b00) -- (b01) -- (b11) -- (b10) -- cycle;

        \draw [
            decorate,
            decoration={brace,amplitude=3mm,raise=1mm}
        ]
        (a01) -- (a11)
        node[midway,above,yshift=4mm]{$\mathcal{I}_\text{win}^{(\ell)}$};

        \draw [
            decorate,
            decoration={brace,amplitude=3mm,raise=1mm}
        ]
        (a00 -| b00) -- (a00)
        node[midway,below,yshift=-4mm]{$\mathcal{I}_\text{commit}^{(\ell)}$};

        \draw [
            decorate,
            decoration={brace,amplitude=3mm,raise=1mm}
        ]
        (a00) -- (a01)
        node[midway,xshift=-3mm,left]{$\mathcal{J}_\text{win}^{(\ell)}$};

        \draw [
            decorate,
            decoration={brace,amplitude=3mm,raise=1mm}
        ]
        (a11) -- (a11 |- b11)
        node[midway,xshift=3mm,right]{$\mathcal{J}_\text{commit}^{(\ell)}$};

        \draw [
            decorate,
            decoration={brace,amplitude=3mm,raise=1mm}
        ]
        (a11 |- b11) -- (a10)
        node[midway,xshift=3mm,right]{$\mathcal{J}_\text{overlap}^{(\ell)}
            := \mathcal{J}_\text{win}^{(\ell)} \setminus
        \mathcal{J}_\text{commit}^{(\ell)}$};

        \draw [
            decorate,
            decoration={brace,amplitude=3mm,raise=1mm}
        ]
        (a10) -- (a00 -| b00)
        node[midway,yshift=-8.25mm,xshift=-8mm,right]{$\mathcal{I}_\text{overlap}^{(\ell)}
            := \mathcal{I}_\text{win}^{(\ell)} \setminus
        \mathcal{I}_\text{commit}^{(\ell)}$};

        \node[align=center] at ($(a00)!0.5!(b01)$)
        {%
            $\bm{H}_\text{overlap}^{(\ell)}$ \\[3mm]
            $=
            \left(\bm{H}_\text{DEM}\right)_{\mathcal{J}_\text{overlap}^{(\ell)},
            \mathcal{I}_\text{commit}^{(\ell)}}$%
        };
    \end{tikzpicture}

    \caption{
        Visual representation of the index sets used to define a sliding window.
        The solid box delimits the rows ($\mathcal{J}_\text{win}^{(\ell)}$)
        and columns ($\mathcal{I}_\text{win}^{(\ell)}$) of the detector
        error matrix considered when decoding window $\ell$, while the
        dashed box shows the analogous region for window $\ell + 1$.
        The shaded region marks the submatrix
        $\bm{H}_\text{overlap}^{(\ell)}$, whose rows correspond to the
        overlap CNs $\mathcal{J}_\text{overlap}^{(\ell)}$ shared with
        the next window, and whose columns correspond to the
        committed VNs $\mathcal{I}_\text{commit}^{(\ell)}$.
        After decoding window $\ell$, this submatrix is used to update
        the syndrome of the overlap CNs based on the committed bit estimates.
    }
    \label{fig:vis_rep}
\end{figure}

% Syndrome update

\Cref{fig:vis_rep} illustrates the meaning of the various sets of nodes.
We can also see a subtlety we must handle carefully when
moving on to decode the next window.
While the \acp{vn} in $\mathcal{J}_\text{commit}^{(\ell)}$ have no
bearing on the further decoding process, the values commited for the
\acp{vn} in $\mathcal{I}_\text{commit}^{(\ell)}$ do.
This is the case because these \acp{vn} have neighboring \acp{cn} in
the next window.
The part of the detector error matrix $\bm{H}_\text{DEM}$ describing
these connections is
$\bm{H}_\text{overlap}^{(\ell)} =
\left(\bm{H}_\text{DEM}\right)_{\mathcal{J}_\text{overlap}^{(\ell)},
\mathcal{I}_\text{commit}^{(\ell)}}$.
We have to account for this fact by updating the syndrome $\bm{s}$
based on the committed bit values.
Specifically, if $\hat{\bm{e}}_\text{commit}^{(\ell)}$ describes the error
estimates committed after decoding window $\ell$, we have to set
\begin{align*}
    \left(\bm{s}\right)_{\mathcal{J}_\text{overlap}^{(\ell)}} =
    \bm{H}_\text{overlap}^{(\ell)}
    \left( \hat{\bm{e}}_\text{commit}^{(\ell)} \right)^\text{T}
    .%
\end{align*}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Warm-Start Sliding-Window Decoding}
\label{sec:warm_start_bp}

% Intro: Problem with above procedure

The sliding-window structure visible in \Cref{fig:windowing_pcm} is
highly reminiscent of windowed decoding for \ac{sc}-\ac{ldpc} codes.
Switching our viewpoint to the Tanner graph depicted in
\Cref{fig:messages_decimation_tanner}, however, we can see an important
difference between \ac{sc}-\ac{ldpc} decoding and the
sliding-window decoding procedure detailed above.
While the windowing process is similar, the algorithm above
reinitializes the decoder to start from a clean state when moving to
the next window.
It therefore does not make use of the integral property of
windowed \ac{sc}-\ac{ldpc} decoding of exploiting the spatially coupled
structure by passing soft information from earlier to later spatial positions.

% Passing messages requires messages

The act of passing messages from one window to the next requires
there being messages after completing decoding of one window that are
still relevant to the decoding of the next.
This may somewhat limit the variety of \emph{inner decoders}, i.e.,
the decoders decoding the individual windows, the warm-start
initialization can be used with.
E.g., \ac{bp}+\ac{osd} does not immediately seem suitable, though
this remains to be investigated.
We chose to investigate first plain \ac{bp} due to its simplicity and
then \ac{bpgd} because of the availability of recently computed messages.

% TODO: Include this?
% \content{Mention that our own work ties into the bottom category in
% \Cref{fig:literature}}

%%%%%%%%%%%%%%%%
\subsection{Warm Start For Belief Propagation Decoding}
\label{subsec:Warm-Start Belief Propagation}

\begin{figure}[t]
    \centering

    \tikzset{
        VN/.style={
            circle, fill=KITgreen, minimum width=1mm, minimum height=1mm,
        },
        CN/.style={
            rectangle, fill=KITblue, minimum width=1mm, minimum height=1mm,
        },
    }

    \begin{tikzpicture}[node distance = 5mm]
        \node[VN]                  (vn00) {};
        \node[VN, below = of vn00] (vn01) {};
        \node[VN, below = of vn01] (vn02) {};
        \node[VN, below = of vn02] (vn03) {};
        \node[VN, below = of vn03] (vn04) {};

        \coordinate (temp) at ($(vn01)!0.5!(vn02)$);

        \node[CN, left =10mm of temp] (cn00) {};
        \node[CN, below = of cn00] (cn01) {};

        \draw (vn00) -- (cn00);
        \draw (vn01) -- (cn00);
        \draw (vn03) -- (cn00);
        \draw (vn01) -- (cn01);
        \draw (vn02) -- (cn01);
        \draw (vn04) -- (cn01);

        \foreach \i in {1,2,3,4} {
            \pgfmathtruncatemacro{\prev}{\i-1}

            \node[VN, right = 25mm of vn\prev 0] (vn\i0) {};
            \node[VN, below = of vn\i0]          (vn\i1) {};
            \node[VN, below = of vn\i1]          (vn\i2) {};
            \node[VN, below = of vn\i2]          (vn\i3) {};
            \node[VN, below = of vn\i3]          (vn\i4) {};

            \coordinate (temp) at ($(vn\i1)!0.5!(vn\i2)$);

            \node[CN, left = 10mm of temp] (cn\i0) {};
            \node[CN, below = of cn\i0]     (cn\i1) {};

            \draw (vn\i0) -- (cn\i0);
            \draw (vn\i1) -- (cn\i0);
            \draw (vn\i3) -- (cn\i0);
            \draw (vn\i1) -- (cn\i1);
            \draw (vn\i2) -- (cn\i1);
            \draw (vn\i4) -- (cn\i1);
        }

        \foreach \i in {1,2,3,4} {
            \pgfmathtruncatemacro{\prev}{\i-1}

            \draw (vn\prev 3) -- (cn\i 0);
            \draw (vn\prev 4) -- (cn\i 1);
        }

        \node[
            draw, inner sep=5mm,line width=1pt,
            fit=(vn00)(vn04)(cn00)(cn01)(vn20)(vn24)(cn20)(cn21)
        ]
        (box1) {};
        \node[
            draw, dashed, inner sep=5mm, inner ysep=8mm,line width=1pt,
            fit=(vn10)(vn14)(cn10)(cn11)(vn30)(vn34)(cn30)(cn31)
        ]
        (box2) {};

        \draw[KITorange, line width=2pt] (cn10) -- (vn10);
        \draw[KITorange, line width=2pt] (cn10) -- (vn11);
        \draw[KITorange, line width=2pt] (cn10) -- (vn13);
        \draw[KITorange, line width=2pt] (cn11) -- (vn11);
        \draw[KITorange, line width=2pt] (cn11) -- (vn12);
        \draw[KITorange, line width=2pt] (cn11) -- (vn14);

        \draw[KITorange, line width=2pt] (vn13) -- (cn20);
        \draw[KITorange, line width=2pt] (vn14) -- (cn21);

        \draw[KITorange, line width=2pt] (cn20) -- (vn20);
        \draw[KITorange, line width=2pt] (cn20) -- (vn21);
        \draw[KITorange, line width=2pt] (cn20) -- (vn23);
        \draw[KITorange, line width=2pt] (cn21) -- (vn21);
        \draw[KITorange, line width=2pt] (cn21) -- (vn22);
        \draw[KITorange, line width=2pt] (cn21) -- (vn24);

        % Marker for W on the bottom
        \draw[line width=1pt]
        ([yshift=-5mm, line width=1pt]box1.south west) -- ++(0,-4mm)
        coordinate (dim1l);
        \draw[line width=1pt]
        ([yshift=-5mm]box1.south east) -- ++(0,-4mm)
        coordinate (dim1r);
        \draw[{Latex}-{Latex}, line width=1pt]
        ([yshift=1mm]dim1l) -- ([yshift=1mm]dim1r)
        node[midway, below=2pt] {$W$};

        % Marker for F on top
        \draw[line width=1pt]
        ([yshift=3mm]box2.north west) -- ++(0,4mm)
        coordinate (dim3l);
        \draw[line width=1pt]
        ([yshift=3mm]box2.north west -| box1.north west) -- ++(0,4mm)
        coordinate (dim3r);
        \draw[{Latex}-{Latex}, line width=1pt]
        ([yshift=-1mm]dim3l) -- ([yshift=-1mm]dim3r)
        node[midway, above=2pt] {$F$};

        % Arrow on the top right
        \draw[-{Latex}, line width=1pt]
        ([yshift=8mm] box1.north east) -- ++(28mm,0);
    \end{tikzpicture}

    \caption{
        \red{Visualization of the messages used for the
        initialization of the next window under BP decoding.}
        \Acfp{vn} are represented using green circles while \acfp{cn}
        are represented using blue squares.
    }
    \label{fig:messages_tanner}
\end{figure}

% Proposed modification: Overview

We propose a modification to the procedure detailed in
\Cref{subsec:Window Splitting and Sequential Sliding-Window Decoding}:
Instead of zero-initializing the \ac{bp} messages of the next window,
we perform a \emph{warm start} by initializing the messages in the
overlapping region to the values last held during the decoding of the
previous window.

% Practical realization: Problem with naive approach

To see how we realize this in practice, we reiterate the steps of the
\ac{bp} algorithm
\begin{align}
    \label{eq:init}
    \text{Initialization: } & L_{i \rightarrow j} = \tilde{L}_i \\[3mm]
    \text{\ac{cn} Update (SPA): }&
    \displaystyle L_{i \leftarrow j} =
    2\cdot(-1)^{s_j}\cdot\tanh^{-1}
    \!\left(
        \prod_{i' \in \mathcal{N}_\text{C}(j)\setminus\{i\}}
        \tanh\frac{L_{i'\rightarrow j}}{2}
    \right) \\[3mm]
    \text{\ac{cn} Update (Min-Sum): }&
    \displaystyle L_{i \leftarrow j} = (-1)^{s_j}\cdot \prod_{i'
    \in \mathcal{N}(j)\setminus \{i\}} \sign \left( L_{i' \rightarrow j}
    \right) \cdot \min_{i' \in \mathcal{N}(j)\setminus \{i\}} \lvert
    L_{i'\rightarrow j} \rvert \\[3mm]
    \label{eq:vn_update}
    \text{\ac{vn} Update: } & \displaystyle L_{i \rightarrow j} =
    \tilde{L}_i +
    \sum_{j' \in \mathcal{N}_\text{V}(i)\setminus\{j\}}
    L_{i \leftarrow j'}
\end{align}
and turn our attention to \Cref{fig:messages_tanner}.
We consider the right-most boundary of the first window, drawn with a
solid line.
The fact that we partition the overall Tanner graph at this location,
i.e., with the last nodes of the last window being \acp{vn} and the
first nodes of the next window being \acp{cn}, is due to
the windowing construction detailed in
\Cref{subsec:Window Splitting and Sequential Sliding-Window Decoding}.
We consider the edges connecting the last set of \acp{vn}
still in the first window to the next set of \acp{cn}.
These edges are the routes along which information is transferred
to subsequent spatial positions, in the form of the \ac{vn} to \ac{cn}
messages $L_{i\rightarrow j}$.
Note that these edges are not considered during the decoding the first
window, since they leave its bounds.
Consequently, no messages have been computed for these when the
decoding of the first window completes.
This means that simply initializing the edges in the overlap region
with the exising $L_{i\rightarrow j}$ messages and starting the
decoding of the next window with a \ac{cn} update is not enough.

% Practical realization: working approach

We can resolve this issue by initializing the edges using the existing
\ac{cn} to \ac{vn} messages $L_{i\leftarrow j}$ and beginning the
decoding of the next window with a \ac{vn} update instead.
This way, we recompute the existing $L_{i\rightarrow j}$ messages and
additionally compute the messages crossing the window boundary.
We can then continue decoding the next window as usual.

% Practical realization: Simplification of algorithm

We can further simplify the algorithm.
Looking carefully at \Cref{eq:vn_update} we notice that when the
\ac{cn} to \ac{vn} messages $L_{i\leftarrow j}$ have been zero-initialized,
the \ac{vn} update degenerates to
\begin{align*}
    \displaystyle L_{i \rightarrow j} =
    \tilde{L}_i +
    \sum_{j' \in \mathcal{N}_\text{V}(i)\setminus\{j\}}
    L_{i \leftarrow j'} = \tilde{L}_i
    ,%
\end{align*}
i.e., the \ac{vn} update \Cref{eq:vn_update} becomes the same as the
initialization step \Cref{eq:init}.
We conclude that as long as we zero-initialize the
$L_{i\leftarrow j}$ messages, there is no need for a separate
initialization step.
\Cref{alg:warm_start_bp} shows the full warm-start sliding-window
decoding algorithm using \ac{bp} as the inner decoder for the
windows.
Note that the decoding procedure performed on the individual windows
(lines 4-8 in \Cref{alg:warm_start_bp}) is functionally equivalent to
\Cref{alg:syndome_bp} when using the \acf{spa} variant of \ac{bp}.

% tex-fmt: off
\tikzexternaldisable
\begin{algorithm}[t]
    \caption{Sliding-window belief propagation (BP) decoding algorithm with warm start.}
    \label{alg:warm_start_bp}
    \begin{algorithmic}[1]
        \State \textbf{Initialize:} $\hat{\bm{e}}^\text{total} \leftarrow \bm{0}$
        \State \textbf{Initialize:} $L_{i\leftarrow j} = 0
            ~\forall~ i\in \mathcal{I}, j\in \mathcal{J}$
        \For{$\ell = 0, \ldots, n_\text{win}-1$}
            \For{$\nu = 0, \ldots, n_\text{iter}-1$}
                \State Perform \ac{vn} update for window $\ell$
                \State Perform \ac{cn} update for window $\ell$
                \State Compute $\hat{\bm{e}}^{(\ell)}$ and check early
                    termination condition
            \EndFor
            \State $\displaystyle\left(\hat{\bm{e}}^\text{total}\right)_{\mathcal{I}^{(\ell)}_\text{commit}} \leftarrow \hat{\bm{e}}^{(\ell)}_\text{commit}$
            \State $\displaystyle\left(\bm{s}\right)_{\mathcal{J}_\text{overlap}^{(\ell)}}
                \leftarrow \bm{H}_\text{overlap}^{(\ell)}
                \left( \hat{\bm{e}}_\text{commit}^{(\ell)} \right)^\text{T}$
            \If{$\ell < n_\text{win} - 1$}
                \State $L^{(\ell+1)}_{i\leftarrow j} \leftarrow
                L^{(\ell)}_{i\leftarrow j}
                ~\forall~ i \in \mathcal{I}_\text{overlap}^{(\ell)},
                    j \in \mathcal{J}_\text{overlap}^{(\ell)}$
            \EndIf
        \EndFor
        \State \textbf{return} $\hat{\bm{e}}$
    \end{algorithmic}
\end{algorithm}
\tikzexternalenable
% tex-fmt: on

%%%%%%%%%%%%%%%%
\subsection{Warm Start for Belief Propagation with Guided Decimation Decoding}
\label{subsec:Warm-Start Belief Propagation with Guided Decimation Decoding}

% Intro: Recap of BPGD

We now direct our attention at using \ac{bpgd} as an inner decoder.
Recall that for \ac{bpgd}, after a number $T \in \mathbb{N}$ of
iterations we decimate
the most reliable \ac{vn}, meaning we perform a hard decision and
remove it from the following decoding process.

This means that when moving from one window to the next, we now have
more information available: not just the \ac{bp} messages but also the
information about what \acp{vn} were decimated and to what values.
We call this \emph{decimation information} in the following.
We can extend \Cref{alg:warm_start_bp} by additionally passing the
decimation information after initializing the \ac{cn} to \ac{vn} messages.
\Cref{fig:messages_decimation_tanner} visualizes this process.

% TODO: Do this in the fundamentals chapter. Then write a proper
% algorithm for warm-start sliding-window decoding with BPGD as well
%\content{(?) Explicitly mention decimation info = channel llrs?}

\begin{figure}[t]
    \centering

    \tikzset{
        VN/.style={
            circle, fill=KITgreen, minimum width=1mm, minimum height=1mm,
        },
        CN/.style={
            rectangle, fill=KITblue, minimum width=1mm, minimum height=1mm,
        },
    }

    \begin{tikzpicture}[node distance = 5mm]
        \node[VN]                  (vn00) {};
        \node[VN, below = of vn00] (vn01) {};
        \node[VN, below = of vn01] (vn02) {};
        \node[VN, below = of vn02] (vn03) {};
        \node[VN, below = of vn03] (vn04) {};

        \coordinate (temp) at ($(vn01)!0.5!(vn02)$);

        \node[CN, left =10mm of temp] (cn00) {};
        \node[CN, below = of cn00] (cn01) {};

        \draw (vn00) -- (cn00);
        \draw (vn01) -- (cn00);
        \draw (vn03) -- (cn00);
        \draw (vn01) -- (cn01);
        \draw (vn02) -- (cn01);
        \draw (vn04) -- (cn01);

        \foreach \i in {1,2,3,4} {
            \pgfmathtruncatemacro{\prev}{\i-1}

            \node[VN, right = 25mm of vn\prev 0] (vn\i0) {};
            \node[VN, below = of vn\i0]          (vn\i1) {};
            \node[VN, below = of vn\i1]          (vn\i2) {};
            \node[VN, below = of vn\i2]          (vn\i3) {};
            \node[VN, below = of vn\i3]          (vn\i4) {};

            \coordinate (temp) at ($(vn\i1)!0.5!(vn\i2)$);

            \node[CN, left = 10mm of temp] (cn\i0) {};
            \node[CN, below = of cn\i0]     (cn\i1) {};

            \draw (vn\i0) -- (cn\i0);
            \draw (vn\i1) -- (cn\i0);
            \draw (vn\i3) -- (cn\i0);
            \draw (vn\i1) -- (cn\i1);
            \draw (vn\i2) -- (cn\i1);
            \draw (vn\i4) -- (cn\i1);
        }

        \foreach \i in {1,2,3,4} {
            \pgfmathtruncatemacro{\prev}{\i-1}

            \draw (vn\prev 3) -- (cn\i 0);
            \draw (vn\prev 4) -- (cn\i 1);
        }

        \node[
            draw, inner sep=5mm,line width=1pt,
            fit=(vn00)(vn04)(cn00)(cn01)(vn20)(vn24)(cn20)(cn21)
        ]
        (box1) {};
        \node[
            draw, dashed, inner sep=5mm, inner ysep=8mm,line width=1pt,
            fit=(vn10)(vn14)(cn10)(cn11)(vn30)(vn34)(cn30)(cn31)
        ]
        (box2) {};

        \draw[KITorange, line width=2pt] (cn10) -- (vn10);
        \draw[KITorange, line width=2pt] (cn10) -- (vn11);
        \draw[KITorange, line width=2pt] (cn10) -- (vn13);
        \draw[KITorange, line width=2pt] (cn11) -- (vn11);
        \draw[KITorange, line width=2pt] (cn11) -- (vn12);
        \draw[KITorange, line width=2pt] (cn11) -- (vn14);

        \draw[KITorange, line width=2pt] (vn13) -- (cn20);
        \draw[KITorange, line width=2pt] (vn14) -- (cn21);

        \draw[KITorange, line width=2pt] (cn20) -- (vn20);
        \draw[KITorange, line width=2pt] (cn20) -- (vn21);
        \draw[KITorange, line width=2pt] (cn20) -- (vn23);
        \draw[KITorange, line width=2pt] (cn21) -- (vn21);
        \draw[KITorange, line width=2pt] (cn21) -- (vn22);
        \draw[KITorange, line width=2pt] (cn21) -- (vn24);

        \node[VN, draw=KITorange, fill=KITorange] at (vn10) {};
        \node[VN, draw=KITorange, fill=KITorange] at (vn11) {};
        \node[VN, draw=KITorange, fill=KITorange] at (vn12) {};
        \node[VN, draw=KITorange, fill=KITorange] at (vn13) {};
        \node[VN, draw=KITorange, fill=KITorange] at (vn14) {};

        \node[VN, draw=KITorange, fill=KITorange] at (vn20) {};
        \node[VN, draw=KITorange, fill=KITorange] at (vn21) {};
        \node[VN, draw=KITorange, fill=KITorange] at (vn22) {};
        \node[VN, draw=KITorange, fill=KITorange] at (vn23) {};
        \node[VN, draw=KITorange, fill=KITorange] at (vn24) {};

        % Marker for W on the bottom
        \draw[line width=1pt]
        ([yshift=-5mm, line width=1pt]box1.south west) -- ++(0,-4mm)
        coordinate (dim1l);
        \draw[line width=1pt]
        ([yshift=-5mm]box1.south east) -- ++(0,-4mm)
        coordinate (dim1r);
        \draw[{Latex}-{Latex}, line width=1pt]
        ([yshift=1mm]dim1l) -- ([yshift=1mm]dim1r)
        node[midway, below=2pt] {$W$};

        % Marker for F on top
        \draw[line width=1pt]
        ([yshift=3mm]box2.north west) -- ++(0,4mm)
        coordinate (dim3l);
        \draw[line width=1pt]
        ([yshift=3mm]box2.north west -| box1.north west) -- ++(0,4mm)
        coordinate (dim3r);
        \draw[{Latex}-{Latex}, line width=1pt]
        ([yshift=-1mm]dim3l) -- ([yshift=-1mm]dim3r)
        node[midway, above=2pt] {$F$};

        % Arrow on the top right
        \draw[-{Latex}, line width=1pt]
        ([yshift=8mm] box1.north east) -- ++(28mm,0);
    \end{tikzpicture}

    \caption{
        \red{Visualization of the messages and decimation information
            used for the
        initialization of the next window under \ac{bpgd} decoding}.
        \Acfp{vn} are represented using green circles while \acfp{cn}
        are represented using blue squares.
    }
    \label{fig:messages_decimation_tanner}
\end{figure}

% % tex-fmt: off
% \tikzexternaldisable
% \begin{algorithm}[t]
%     \caption{Sliding-window decoding algorithm with warm start for generic inner decoder.}
%     \label{alg:warm_start_general}
%     \begin{algorithmic}[1]
%         \State \textbf{Initialize:} $\hat{\bm{e}}^\text{total} \leftarrow \bm{0}$
%         \State \textbf{Initialize:} $L_{i\leftarrow j} = 0
%             ~\forall~ i\in \mathcal{I}, j\in \mathcal{J}$
%         \For{$\ell = 0, \ldots, n_\text{win}-1$}
%             \State Obtain $\hat{\bm{e}}^{(\ell)}$ from inner decoder
%             \State $\displaystyle\left(\hat{\bm{e}}^\text{total}\right)_{\mathcal{I}^{(\ell)}_\text{commit}} \leftarrow \hat{\bm{e}}^{(\ell)}_\text{commit}$
%             \State $\displaystyle\left(\bm{s}\right)_{\mathcal{J}_\text{overlap}^{(\ell)}}
%                 \leftarrow \bm{H}_\text{overlap}^{(\ell)}
%                 \left( \hat{\bm{e}}_\text{commit}^{(\ell)} \right)^\text{T}$
%             \If{$\ell < n_\text{win} - 1$}
%                 \State $L^{(\ell+1)}_{i\leftarrow j} \leftarrow
%                 L^{(\ell)}_{i\leftarrow j}
%                 ~\forall~ i \in \mathcal{I}_\text{overlap}^{(\ell)},
%                     j \in \mathcal{J}_\text{overlap}^{(\ell)}$
%             \EndIf
%         \EndFor
%         \State \textbf{return} $\hat{\bm{e}}$
%     \end{algorithmic}
% \end{algorithm}
% \tikzexternalenable
% % tex-fmt: on
%
% \content{Make algorithm 4 specific to BPGD?}

\section{Numerical Results}
\label{sec:Numerical Results}

% Intro

In this section, we perform numerical experiments to evaluate the
modification to sliding-window decoding we introduced in
\Cref{sec:warm_start_bp}.
For the practical aspects of implementation, several layers of
abstraction must be considered.

% Software stack: Layer 1

The lowest layer is the circuit-level simulator.
This serves as the backbone of all further simulations, handling the
quantum mechanical aspects of the system, including the modeling of
noise on gates, idling qubits, and measurements according to the
chosen noise model.

% Software stack: Layer 2

Moving one level of abstraction higher, the syndrome extraction
circuit itself must be generated.
This entails constructing the full circuit, including the ancilla
measurements and the error locations introduced by the chosen noise
model, both of which depend on the code and noise model in question.

% Software stack: Layer 3

Even further up, given an already constructed syndrome extraction
circuit and the resulting \acf{dem}, we must split the detector error
matrix into separate windows and manage the interplay between the
inner decoders acting on those individual windows.

% Software stack: Layer 4

Finally, we require the decoder itself, which operates on a
\acf{pcm} and a syndrome, with no dependence on the complexity of the
layers below.

% Software stack: Tools

In our implementation, Stim \cite{gidney_stim_2021} served as the
circuit-level simulator, chosen for its efficiency and native support
for the \ac{dem} formalism.
For the circuit generation, we employed utilities from QUITS
\cite{kang_quits_2025}, which provides syndrome extraction circuitry
generation for a number of different \ac{qldpc} codes.
We initially created a Python implementation, which used QUITS for the window
splitting and subsequent sliding-window decoding as well.
The \ac{bp} and \ac{bpgd} decoders were also initially implemented in Python.
After a preliminary investigation, we opted for a complete
reimplementation in Rust to achieve higher simulation speeds leveraging
the compiled nature of the language.
We reimplemented both the window splitting and the decoders.

% Global experimental setup

We chose to carry out our simulations on \ac{bb} codes, as they have
recently emerged as particularly promising candidates for practical
\ac{qec}, offering high encoding rates and large minimum distances
while admitting short-depth syndrome extraction circuits
\cite[Sec.~1]{bravyi_high-threshold_2024}.
Specifically, we chose the $\llbracket 144, 12, 12 \rrbracket$ BB
code, as it represents a good trade-off between code size and
simulation tractability.
For the generation of the \ac{dem} we set the number of syndrome
extraction rounds to $12$, similarly to \cite{gong_toward_2024}, and
we defined our detectors as in the example in
\Cref{subsec:Detector Error Matrix}.
We employed circuit-lose noise as described in
\Cref{subsec:Choice of Noise Model} as our noise model, specifically standard
ciruit-based depolarizing noise \cite[Sec.~VIII]{fowler_high-threshold_2009},
i.e., all error locations in the circuit get assigned the same
physical error probability.
We report performance in terms of the per-round \ac{ler} as defined
in \Cref{subsec:Per-Round Logical Error Rate} and all datapoints were
generated by simulating at least $200$ logical error events.

%%%%%%%%%%%%%%%%
\subsection{Belief Propagation}
\label{subsec:Belief Propagation}

% Local experimental setup

We began our investigation by using \ac{bp} with no further
modifications as the inner decoder.
We chose the min-sum variant of \ac{bp} due to its low computational complexity.

% [Thread] Get impression for max gain

We initially wanted to gain an impression for the performance gain we could
expect from a modification to the sliding-window decoding procedure.
To this end, we began by analyzing the decoding performance of the
original process, without our warm-start modification.
We will call this \emph{cold-start} decoding in the following.
Because we expected more global decoding to work better (the inner
    decoder then has access to a larger portion of the long-range
    correlations encoded in the detector error matrix before any commit
is made) we initially decided to use decoding on the whole detector
error matrix as a proxy for the attainable decoding performance.

\begin{figure}[t]
    \centering
    \begin{tikzpicture}
        \begin{axis}[
                width=\figwidth,
                height=\figheight,
                ymode=log,
                legend style={
                    cells={anchor=west},
                    cells={align=left},
                },
                enlargelimits=false,
                ymin=1e-5, ymax=2e-1,
                grid=both,
                legend pos = south east,
                xtick={0.001,0.0015,...,0.004},
                xticklabel style={/pgf/number format/fixed},
                xticklabel style={/pgf/number format/precision=4},
                scaled x ticks=false,
                xlabel={Physical error rate},
                ylabel={Per-round-LER},
            ]

            \foreach \W/\col/\mark in
            {3/KITred/triangle*,4/KITblue/diamond*,5/KITorange/square*} {
                \edef\temp{\noexpand
                    \addplot+[mark=\mark, solid, mark
                    options={fill=\col}, \col]
                    table[
                        col sep=comma, x=physical_p,
                        y=LER_per_round,
                    ]
                    {res/sim/WF/WindowingSyndromeMinSumDecoder/max_iter_200/pass_soft_info_False/F_1/W_\W/LERs.csv};
                }
                \temp

                \addlegendentryexpanded{$W = \W$}
            }

            \addplot+[mark=*, solid, mark options={fill=black}, black]
            table[
                col sep=comma, x=physical_p,
                y=LER_per_round,
            ]
            {res/sim/whole/SyndromeMinSumDecoder/max_iter_200/LERs.csv};

            \addlegendentry{Whole}
        \end{axis}
    \end{tikzpicture}

    \caption{
        \red{\lipsum[2]}
    }
    \label{fig:whole_vs_cold}
\end{figure}

% [Experimental parameters] Figure 4.6

\Cref{fig:whole_vs_cold} shows the simulation results for this initial
investigation.
The three colored curves correspond to cold-start sliding-window
decoding with window sizes $W \in \{3, 4, 5\}$, all with the step
size fixed to $F = 1$, while the black curve gives the per-round
\ac{ler} obtained when decoding on the whole detector error matrix at once.
In all cases, the inner \ac{bp} decoder was allowed a maximum of
$200$ iterations, and the physical error rate was swept from
$p = 0.001$ to $p = 0.004$ in steps of $0.0005$.

% [Description] Figure 4.6

Across the entire range of physical error rates, all curves exhibit
the expected monotonic increase in logical error rate with increasing
physical noise.
The $W = 3$ decoder consistently yields the highest LER, performing
roughly an order of magnitude worse than the baseline at low physical
error rates.
Increasing the window size to $W = 4$ substantially closes this gap,
and the $W = 5$ curve nearly coincides with the whole-block decoder
across the full range of physical error rates.

% [Interpretation] Figure 4.6

This behavior is consistent with the intuition behind sliding-window decoding.
The detector error matrix encodes correlations between detection
events that span the full syndrome extraction history, so errors
lying in the commit region of an early window are in general
constrained by check nodes that only become visible in subsequent windows.
Larger windows expose the inner decoder to more of these constraints
before any commit is made, leading to better-informed decisions and a
lower per-round \ac{ler}.
Decoding the whole matrix at once represents the limiting case of
this trend and, as expected, achieves the strongest performance.
The fact that the $W = 5$ curve is already very close to the
whole-block decoder indicates that the marginal benefit of enlarging
the window saturates after a certain point.
From a practical standpoint, the choice of $W$ thus represents a
trade-off between decoding latency and accuracy: larger windows
delay the start of decoding by requiring more syndrome extraction
rounds to be collected upfront, while the diminishing returns above
$W = 4$ suggest that growing the window much further yields little
additional accuracy in return.

% [Thread] First comparison with warm start

Next, we additionally generated error rate curves for warm-start
sliding-window decoding to assess how much of the gap between
cold-start and whole-block decoding can be recovered by our modification.
We chose the same window sizes as before, so that the warm- and
cold-start curves can be compared directly at matching values of $W$.

\begin{figure}[t]
    \centering
    \begin{tikzpicture}
        \begin{axis}[
                width=\figwidth,
                height=\figheight,
                ymode=log,
                legend style={
                    cells={anchor=west},
                    cells={align=left},
                },
                enlargelimits=false,
                ymin=1e-5, ymax=2e-1,
                grid=both,
                legend pos = south east,
                xtick={0.001,0.0015,...,0.004},
                xticklabel style={/pgf/number format/fixed},
                xticklabel style={/pgf/number format/precision=4},
                scaled x ticks=false,
                xlabel={Physical error rate},
                ylabel={Per-round-LER},
                extra description/.code={
                    \node[rotate=90, anchor=south]
                    at ([xshift=10mm]current axis.east)
                    {Warm s. (---), Cold s. (- - -)};
                },
            ]

            \foreach \W/\col/\mark in
            {3/KITred/triangle,4/KITblue/diamond,5/KITorange/square} {
                \edef\temp{\noexpand
                    \addplot+[
                        mark=\mark, densely dashed, mark options={fill=\col},
                        \col, forget plot
                    ]
                    table[
                        col sep=comma, x=physical_p,
                        y=LER_per_round,
                    ]
                    {res/sim/WF/WindowingSyndromeMinSumDecoder/max_iter_200/pass_soft_info_False/F_1/W_\W/LERs.csv};
                }
                \temp
            }

            \foreach \W/\col/\mark in
            {3/KITred/triangle*,4/KITblue/diamond*,5/KITorange/square*} {
                \edef\temp{\noexpand
                    \addplot+[mark=\mark, solid, mark
                    options={fill=\col}, \col]
                    table[
                        col sep=comma, x=physical_p,
                        y=LER_per_round,
                    ]
                    {res/sim/WF/WindowingSyndromeMinSumDecoder/max_iter_200/pass_soft_info_True/F_1/W_\W/LERs.csv};
                }
                \temp

                \addlegendentryexpanded{$W = \W$}
            }

            \addplot+[mark=*, solid, mark options={fill=black}, black]
            table[
                col sep=comma, x=physical_p,
                y=LER_per_round,
            ]
            {res/sim/whole/SyndromeMinSumDecoder/max_iter_200/LERs.csv};

            \addlegendentry{Whole}
        \end{axis}
    \end{tikzpicture}

    \caption{
        \red{\lipsum[2]}
    }
    \label{fig:whole_vs_cold_vs_warm}
\end{figure}

% [Experimental parameters] Figure 4.7

\Cref{fig:whole_vs_cold_vs_warm} extends the previous comparison by
additionally including the warm-start variant of sliding-window decoding.
The dashed colored curves reproduce the cold-start results from
\Cref{fig:whole_vs_cold}, while the solid colored curves show the
corresponding warm-start runs for the same window sizes
$W \in \{3, 4, 5\}$.
The remaining experimental parameters are unchanged:
the step size is fixed to $F = 1$,
the inner \ac{bp} decoder is allowed up to $200$ iterations per
window invocation, the black curve again gives the whole-block
reference, and the physical error rate is swept from $p = 0.001$ to
$p = 0.004$ in steps of $0.0005$.

% [Description] Figure 4.7

For each window size, the warm-start variant consistently outperforms
its cold-start counterpart, with the dashed curves lying above the
corresponding solid curves across the entire range of physical error rates.
The performance gap between the two approaches is most pronounced for
the largest window ($W = 5$) and gradually narrows as the window size decreases.
Additionally, the gap between the cold- and warm-start curves
generally widens as the physical error rate decreases.

% [Interpretation] Figure 4.7

The improvement of warm-start over cold-start decoding matches the
motivation for the modification:
By reusing already existing messages from the previous window in the
overlap region, the next window invocation has additional information
at its disposal about the reliability of the \acp{vn} and \acp{cn}.
The widening of the gap towards larger window sizes is consistent
with this picture, since with $F$ fixed to $1$ the overlap between
consecutive windows spans $W - F = W - 1$ syndrome rounds, so larger
$W$ implies that more messages are carried over and a larger fraction
of the next window starts in a warm state.
% TODO: Possibly insert explanation for higher gain at lowre error rates
A perhaps surprising observation is that the warm-start curve for
$W = 5$ actually lies below the whole-block reference across the
entire range of physical error rates, even though warm-start
sliding-window decoding is, by construction, more local than
whole-block decoding.
A possible explanation for this effect is discussed in the following.

% [Thread] Warm start is better than whole due to more effective iterations

A possible explanation for this surprising behavior lies in the
number of \ac{bp} iterations effectively spent on the \acp{vn}
inside the overlap region.
Each \ac{vn} in such an overlap is processed by multiple consecutive
window invocations, and because every new window resumes from the
messages left over by its predecessor, these invocations effectively
accumulate iterations on the same \acp{vn} rather than restarting
from scratch.
The whole-block decoder, by contrast, performs only a single run of
at most $200$ iterations on the entire detector error matrix, so
each of its \acp{vn} receives at most that many iterations.
It seems this larger effective iteration budget on the overlap
regions can outweigh the loss of globality incurred by windowing.

A natural way to test this hypothesis is to raise the maximum number
of \ac{bp} iterations of the whole-block decoder until its per-round
\ac{ler} saturates.
If the above interpretation is correct, the resulting saturation
level should constitute a lower bound that no windowed scheme,
irrespective of the initialization, can beat, since by construction
whole-block decoding has access to the full set of constraints
available to any window.

\begin{figure}[t]
    \centering
    \begin{tikzpicture}
        \def\spyxmin{32}
        \def\spyxmax{512}
        \def\spyymin{5e-3}
        \def\spyymax{7e-2}

        \newcommand{\plotcurves}{%
            \foreach \W/\col/\mark in
            {3/KITred/triangle,4/KITblue/diamond,5/KITorange/square} {
                \edef\temp{\noexpand
                    \addplot+[mark=\mark, densely dashed,
                    forget plot, \col]
                    table[
                        col sep=comma, x=max_iter,
                        y=LER_per_round,
                    ]
                    {res/sim/max_iter/WindowingSyndromeMinSumDecoder/p_0.0025/pass_soft_info_False/F_1/W_\W/LERs.csv};
                }
                \temp
            }
            \foreach \W/\col/\mark in
            {3/KITred/triangle*,4/KITblue/diamond*,5/KITorange/square*} {
                \edef\temp{\noexpand
                    \addplot+[mark=\mark, solid, mark
                    options={fill=\col}, \col, forget plot]
                    table[
                        col sep=comma, x=max_iter,
                        y=LER_per_round,
                    ]
                    {res/sim/max_iter/WindowingSyndromeMinSumDecoder/p_0.0025/pass_soft_info_True/F_1/W_\W/LERs.csv};
                }
                \temp
            }
            \addplot+[mark=*, solid, mark options={fill=black}, black,
            forget plot]
            table[col sep=comma, x=max_iter, y=LER_per_round]
            {res/sim/max_iter/SyndromeMinSumDecoder/p_0.0025/LERs.csv};
        }

        \begin{axis}[
                name=main,
                width=\figwidth,
                height=\figheight,
                ymode=log,
                enlargelimits=false,
                ymin=1e-3, ymax=1e-1,
                grid=both,
                legend pos=north east,
                xtick={32, 512, 1024, 1536, 2048, 2560, 3072, 3584, 4096},
                xticklabels={$32$,$512$,$1{,}024$,,$2{,}048$,,$3{,}072$,,$4{,}096$},
                xticklabel style={/pgf/number format/fixed},
                scaled x ticks=false,
                xlabel={Number of BP iterations},
                ylabel={Per-round-LER},
                extra description/.code={
                    \node[rotate=90, anchor=south]
                    at ([xshift=10mm]current axis.east)
                    {Warm s. (---), Cold s. (- - -)};
                },
            ]

            \plotcurves

            \addlegendimage{KITred, mark=triangle*}
            \addlegendentry{$W = 3$}
            \addlegendimage{KITblue, mark=diamond*}
            \addlegendentry{$W = 4$}
            \addlegendimage{KITorange, mark=square*}
            \addlegendentry{$W = 5$}
            \addlegendimage{black, mark=*}
            \addlegendentry{Whole}

            \node[draw=black, fit={(axis cs:\spyxmin,\spyymin) (axis
            cs:\spyxmax,\spyymax)}, inner sep=0pt, name=spybox] {};

        \end{axis}

        \begin{axis}[
                name=inset,
                at={(main.north)},
                anchor=south,
                xshift=0mm, yshift=6mm,
                width=6.5cm, height=4.875cm,
                ymode=log,
                enlargelimits=false,
                xmin=\spyxmin, xmax=\spyxmax,
                ymin=\spyymin, ymax=\spyymax,
                xtick={32,128,256, 512},
                yticklabels={\empty},
                xticklabels={\empty},
                grid=both,
                axis background/.style={fill=white},
            ]

            \plotcurves
        \end{axis}

        \draw (spybox.north east) -- (inset.south west);
    \end{tikzpicture}

    \caption{
        \red{\lipsum[2]}
    }
    \label{fig:bp_w_over_iter}
\end{figure}

% [Experimental parameters] Figure 4.8

\Cref{fig:bp_w_over_iter} shows the per-round \ac{ler} as a function
of the maximum number of \ac{bp} iterations granted to the inner decoders.
The dashed colored curves correspond to cold-start sliding-window
decoding for $W \in \{3, 4, 5\}$, the solid colored curves to the
corresponding warm-start sliding-window decoding, and the black curve
to the whole-block reference.
The physical error rate is fixed at $p = 0.0025$, the step size at
$F = 1$, and the iteration budget is swept over
$n_\text{iter} \in \{32, 128, 256, 512, 1024, 2048, 4096\}$.
The enlarged plot magnifies the low-iteration regime
$n_\text{iter} \in [32, 512]$.

% [Description] Figure 4.8

All curves decrease monotonically with the iteration budget, but
contrary to our expectation, none of them appears to fully saturate
within the swept range: even at $n_\text{iter} = 4096$, every curve
still exhibits a noticeable downward slope.
At $n_\text{iter} = 32$, the whole-block curve lies below both the
$W=4$ and $W=5$ sliding-window curves.
At $n_\text{iter} = 128$ the whole-block curve already performs
better than the $W=4$ sliding-window curve
and at $n_\text{iter} = 512$ the whole-block and warm-start $W = 5$
curves also cross.
From this point onwards, the whole-block decoder lies strictly below
all windowed schemes, this difference becoming more pronounced as the
iteration budget grows further.
Within the magnified plot, the gap between the warm-start and
cold-start curves at fixed $W$ is largest for the smallest iteration
counts and shrinks rapidly as $n_\text{iter}$ grows, and at fixed
$n_\text{iter}$ the size of this gap grows with the window size,
mirroring the behavior already observed in \Cref{fig:whole_vs_cold_vs_warm}.

% [Interpretation] Figure 4.8

These observations are largely consistent with the effective-iterations
hypothesis put forward above.
The whole-block decoder eventually overtaking every windowed scheme
matches the prediction made there: with a sufficiently large
iteration budget, the whole-block decoder reaches an error rate
that nonone of the windowed schemes can beat, because of the more global
nature of the considered constraints.
Furthermore, the pronounced advantage of warm- over cold-start decoding at low
numbers of iterations makes sense if we consider the overall trend of the plots.
At low iteration budgets, each additional iteration is worth more
than at high budgets.
As the number of permitted iteration increases, the benefit of
the additional ``free'' iterations gained due to the the warm-start
initialization diminishes, and the curves approach each other.

The fact that no curve clearly saturates within the swept range is
itself worth noting.
We know that \ac{bp} on \ac{qldpc} codes suffers from poor
convergence due to the short cycles in the underlying Tanner graph,
so even after several thousand iterations the
decoder may continue to slowly refine its message estimates rather
than settle into a stable fixed point.
This is one of the core motivations for moving from plain \ac{bp} to
the guided-decimation variant studied in
\Cref{subsec:Belief Propagation with Guided Decimation}.

Another thing to note is that setting the per-invocation iteration
budget of the inner decoder equal to the iteration budget of the
whole-block decoder is not a fair comparison in terms of total
computational effort.
The sliding-window scheme processes each \ac{vn} in an overlap region
multiple times and therefore spends more iterations overall.
In the context of \ac{qec}, however, the relevant figure of merit is
not total compute but decoding latency, and in terms of latency the
sliding-window approach is still at an advantage.

% [Thread] Exploration of the effect of the step size

Having examined the effect of the window size $W$, we next turned to
the second windowing parameter, the step size $F$.
We carried out an investigation analogous to the one above:
we first compared warm- and cold-start decoding across the full range
of physical error rates at a fixed iteration budget, and then we
examined the dependence on the iteration budget at a fixed physical
error rate.
The window size was held fixed at $W = 5$ throughout, the value at
which the warm-start variant produced the strongest performance in the
previous experiments.

\begin{figure}[t]
    \centering
    \begin{subfigure}{0.48\textwidth}
        \centering
        \hspace*{-7mm}
        \begin{tikzpicture}
            \begin{axis}[
                    width=8cm,
                    height=6cm,
                    ymode=log,
                    legend style={
                        cells={anchor=west},
                        cells={align=left},
                    },
                    enlargelimits=false,
                    ymin=1e-5, ymax=2e-1,
                    grid=both,
                    legend pos = south east,
                    xtick={0.001,0.0015,...,0.004},
                    xticklabel style={/pgf/number format/fixed},
                    xticklabel style={/pgf/number format/precision=4},
                    x tick label style={rotate=45, anchor=north east,
                    inner sep=1mm},
                    scaled x ticks=false,
                    xlabel={Physical error rate},
                    ylabel={Per-round-LER},
                    % extra description/.code={
                    %     \node[rotate=90, anchor=south]
                    %     at ([xshift=10mm]current axis.east)
                    %     {Warm s. (---), Cold s. (- - -)};
                    % },
                ]

                \foreach \F/\col/\mark in
                {3/KITred/triangle,2/KITblue/diamond,1/KITorange/square} {
                    \edef\temp{\noexpand
                        \addplot+[
                            mark=\mark, densely dashed, mark
                            options={fill=\col},
                            \col, forget plot
                        ]
                        table[
                            col sep=comma, x=physical_p,
                            y=LER_per_round,
                        ]
                        {res/sim/WF/WindowingSyndromeMinSumDecoder/max_iter_200/pass_soft_info_False/F_\F/W_5/LERs.csv};
                    }
                    \temp
                }

                \foreach \F/\col/\mark in
                {3/KITred/triangle*,2/KITblue/diamond*,1/KITorange/square*} {
                    \edef\temp{\noexpand
                        \addplot+[mark=\mark, solid, mark
                        options={fill=\col}, \col]
                        table[
                            col sep=comma, x=physical_p,
                            y=LER_per_round,
                        ]
                        {res/sim/WF/WindowingSyndromeMinSumDecoder/max_iter_200/pass_soft_info_True/F_\F/W_5/LERs.csv};
                    }
                    \temp

                    \addlegendentryexpanded{$F = \F$}
                }
            \end{axis}
        \end{tikzpicture}

        \caption{\red{Lorem ipsum dolor sit amet, consectetur adipiscing
        elit, sed do eiusmod tempor incididunt}}
        \label{fig:bp_f_over_p}
    \end{subfigure}%
    \hfill%
    \begin{subfigure}{0.48\textwidth}
        \centering
        \hspace*{-27mm}
        \begin{tikzpicture}
            \def\spyxmin{32}
            \def\spyxmax{512}
            \def\spyymin{5e-3}
            \def\spyymax{5e-2}

            \newcommand{\plotcurvesb}{%
                \foreach \F/\col/\mark in
                {3/KITred/triangle,2/KITblue/diamond,1/KITorange/square} {
                    \edef\temp{\noexpand
                        \addplot+[mark=\mark, densely dashed,
                        forget plot, \col]
                        table[
                            col sep=comma, x=max_iter,
                            y=LER_per_round,
                        ]
                        {res/sim/max_iter/WindowingSyndromeMinSumDecoder/p_0.0025/pass_soft_info_False/F_\F/W_5/LERs.csv};
                    }
                    \temp
                }
                \foreach \F/\col/\mark in
                {3/KITred/triangle*,2/KITblue/diamond*,1/KITorange/square*} {
                    \edef\temp{\noexpand
                        \addplot+[mark=\mark, solid, mark
                        options={fill=\col}, \col, forget plot]
                        table[
                            col sep=comma, x=max_iter,
                            y=LER_per_round,
                        ]
                        {res/sim/max_iter/WindowingSyndromeMinSumDecoder/p_0.0025/pass_soft_info_True/F_\F/W_5/LERs.csv};
                    }
                    \temp
                }
            }

            \begin{axis}[
                    name=main,
                    width=8cm,
                    height=6cm,
                    ymode=log,
                    legend style={
                        cells={anchor=west},
                        cells={align=left},
                    },
                    enlargelimits=false,
                    ymin=1e-3, ymax=1e-1,
                    grid=both,
                    legend pos = north east,
                    xtick={32, 512, 1024, 1536, 2048, 2560, 3072, 3584, 4096},
                    xticklabels =
                    {$32$, $512$,$1{,}024$,,$2{,}048$,,$3{,}072$,,$4{,}096$},
                    xticklabel style={/pgf/number format/fixed},
                    xticklabel style={/pgf/number format/precision=4},
                    x tick label style={rotate=45, anchor=north east,
                    inner sep=1mm},
                    scaled x ticks=false,
                    xlabel={Number of BP iterations},
                    extra description/.code={
                        \node[rotate=90, anchor=south]
                        at ([xshift=10mm]current axis.east)
                        {Warm s. (---), Cold s. (- - -)};
                    },
                ]

                \plotcurvesb

                \addlegendimage{KITorange, mark=square*}
                \addlegendentry{$F = 1$}
                \addlegendimage{KITblue, mark=diamond*}
                \addlegendentry{$F = 2$}
                \addlegendimage{KITred, mark=triangle*}
                \addlegendentry{$F = 3$}

                \node[draw=black, fit={(axis cs:\spyxmin,\spyymin) (axis
                cs:\spyxmax,\spyymax)}, inner sep=0pt, name=spybox] {};
            \end{axis}

            \begin{axis}[
                    name=inset,
                    at={(main.north west)},
                    anchor=south,
                    xshift=-6mm, yshift=6mm,
                    width=6.5cm, height=4.875cm,
                    ymode=log,
                    enlargelimits=false,
                    xmin=\spyxmin, xmax=\spyxmax,
                    ymin=\spyymin, ymax=\spyymax,
                    xtick={32,128,256,512},
                    yticklabels={\empty},
                    xticklabels={\empty},
                    grid=both,
                    axis background/.style={fill=white},
                ]

                \plotcurvesb
            \end{axis}

            \draw (spybox.north east) -- (inset.south east);
        \end{tikzpicture}
        \vspace{-3.2mm}

        \caption{\red{Lorem ipsum dolor sit amet, consectetur adipiscing
        elit, sed do eiusmod tempor incididunt}}
        \label{fig:bp_f_over_iter}
    \end{subfigure}

    \caption{
        \red{\lipsum[2]}
    }
    \label{fig:bp_f}
\end{figure}

% [Experimental parameters] Figure 4.9

\Cref{fig:bp_f} summarizes the results of this investigation.
In both panels the dashed colored curves correspond to cold-start
sliding-window decoding for $F \in \{1, 2, 3\}$ and the solid colored
curves to the corresponding warm-start runs.
The window size is fixed to $W = 5$ throughout.
\Cref{fig:bp_f_over_p} sweeps the physical error rate over
$p \in [0.001, 0.004]$ in steps of $0.0005$ at a fixed maximum of
$n_\text{iter} = 200$ \ac{bp} iterations per window invocation,
mirroring the experimental setup of \Cref{fig:whole_vs_cold_vs_warm}.
\Cref{fig:bp_f_over_iter} fixes the physical error rate at
$p = 0.0025$ and sweeps the iteration budget over
$n_\text{iter} \in \{32, 128, 256, 512, 1024, 2048, 4096\}$,
mirroring the setup of \Cref{fig:bp_w_over_iter} and again including
an inset that magnifies the low-iteration regime
$n_\text{iter} \in [32, 512]$.

% [Description] Figure 4.9

In \Cref{fig:bp_f_over_p}, every curve exhibits the expected
monotonic increase of the per-round \ac{ler} with the physical
error rate.
At fixed $F$, the warm-start approach lies below
cold-start across the entire sweep, and at fixed
warm- or cold-start, smaller $F$ produces a lower \ac{ler}.
Both gaps grow as the physical error rate decreases:
the curves at $F = 1$ separate further from those at $F = 2$ and $F = 3$,
and the warm-start curves separate further from the cold-start ones.
In \Cref{fig:bp_f_over_iter}, all six curves again decrease
monotonically with the iteration budget, with no clear saturation
even at $n_\text{iter} = 4096$.
Lower $F$ yields a lower \ac{ler} throughout, and warm-start
consistently outperforms cold-start at matching $F$.
At $n_\text{iter} = 32$, all three cold-start curves coincide at
roughly the same per-round \ac{ler}, while the warm-start curves are
visibly spread out.
Furthermore, the magnified plot confirms that the gap between warm-
and cold-start curves at fixed $F$ shrinks as $n_\text{iter}$ grows,
and that at fixed $n_\text{iter}$ this gap is largest for $F = 1$.

% [Interpretation] Figure 4.9

The observed dependence on the step size mirrors the dependence on
the window size studied earlier and the same explanation applies.
With $W$ held fixed, decreasing $F$ enlarges the overlap between
consecutive windows from $W - F$ to $W - F + 1$ syndrome measurement rounds, so
a smaller step size is beneficial for the same reason that a larger
window size is:
each \ac{vn} in an overlap region participates in more window
invocations, and the warm-start modification effectively accumulates
iterations on it across these invocations.
The widening of the warm/cold gap towards low iteration counts and
low physical error rates similarly mirrors the patterns already
observed in
\Cref{fig:whole_vs_cold_vs_warm,fig:bp_w_over_iter}.

In contrast to the window size $W$, the step size $F$ has no effect
on decoding latency.
The time at which the inner decoder for a given window can begin
decoding is determined solely by when the syndromes for the rounds
covered by that window have been collected, which is independent of
how much the window overlaps with its predecessor.
Similarly, assuming the decoder is fast enough to keep up with the
incoming syndrome measurements corresponding to the \acp{cn} of
subsequent windows, the time at which decoding is complete depends only
on the amount of time spent on decoding the very last window.
A smaller $F$ thus only costs additional total compute and not
additional latency, which is favorable for a warm-start
sliding-window implementation.
This is especially favorable for our warm-start modification, as it
works best where the overlap is largest, i.e., for low values of $F$.

% Conclusion of BP investigation

We conclude our investigation into the performance of warm-start
sliding-window decoding under plain \ac{bp} by summarizing our findings.
The warm-start modification raises the number of \ac{bp} iterations
effectively spent on the \acp{vn} in an overlap region by reusing the
messages from the previous window invocation instead of restarting
from scratch.
This explains why decoding performance improved monotonically with
the size of the overlap, and consequently why both larger window
sizes $W$ and smaller step sizes $F$ yielded lower per-round \acp{ler}.
The warm-start gain over cold-start was most pronounced at low
per-window iteration budgets,
% and at low physical error rates, the
% regimes
the regime in which each additional iteration carries proportionally
more information.
Additionally, we would like to note that the warm-start modification
incurs no computational cost relative to cold-start decoding.
It changes neither the decoding latency nor the total compute, since
both schemes process the same windows for the same number of
iterations and differ only in the initialization of the \ac{bp}
messages of each new window.
We also observed that plain \ac{bp} did not saturate even at $4096$
iterations, which we attribute to the short cycles in the underlying
Tanner graph.
This motivates the next subsection, in which we replace the inner
\ac{bp} decoder by its guided-decimation variant.

%%%%%%%%%%%%%%%%
\subsection{Belief Propagation with Guided Decimation}
\label{subsec:Belief Propagation with Guided Decimation}

% [Thread] Intro to BPGD + Local experimental setup

We now turn to \ac{bpgd} as the inner decoder, in order to address
the convergence issues of plain \ac{bp} on \ac{qec} codes.
For the underlying \ac{bp} step we use the \ac{spa} variant rather
than the min-sum approximation employed in
\Cref{subsec:Belief Propagation}, since this made the implementation
of the guided decimation more straightforward.

\begin{figure}[t]
    \centering
    \hspace*{-6mm}
    \begin{subfigure}{0.5\textwidth}
        \centering
        \begin{tikzpicture}
            \begin{axis}[
                    width=8cm,
                    height=6cm,
                    ymode=log,
                    legend style={
                        cells={anchor=west},
                        cells={align=left},
                    },
                    enlargelimits=false,
                    ymin=1e-5, ymax=2e-1,
                    grid=both,
                    legend pos = south east,
                    xtick={0.001,0.0015,...,0.004},
                    xticklabel style={/pgf/number format/fixed},
                    xticklabel style={/pgf/number format/precision=4},
                    x tick label style={rotate=45, anchor=north east,
                    inner sep=1mm},
                    scaled x ticks=false,
                    xlabel={Physical error rate},
                    ylabel={Per-round-LER},
                    % extra description/.code={
                    %     \node[rotate=90, anchor=south]
                    %     at ()
                    %     {Warm s. (---), Cold s. (- - -)};
                    % },
                ]

                \foreach \W/\col/\mark in
                {3/KITred/triangle,4/KITblue/diamond,5/KITorange/square} {
                    \edef\temp{\noexpand
                        \addplot+[
                            mark=\mark, densely dashed, mark
                            options={fill=\col},
                            \col, forget plot
                        ]
                        table[
                            col sep=comma, x=physical_p,
                            y=LER_per_round,
                        ]
                        {res/sim/WF/WindowingSyndromeSpaGdDecoder/max_iter_5000/pass_soft_info_False/F_1/W_\W/LERs.csv};
                    }
                    \temp
                }

                \foreach \W/\col/\mark in
                {3/KITred/triangle*,4/KITblue/diamond*,5/KITorange/square*} {
                    \edef\temp{\noexpand
                        \addplot+[mark=\mark, solid, mark
                        options={fill=\col}, \col]
                        table[
                            col sep=comma, x=physical_p,
                            y=LER_per_round,
                        ]
                        {res/sim/WF/WindowingSyndromeSpaGdDecoderPassDecimation/max_iter_5000/pass_soft_info_True/F_1/W_\W/LERs.csv};
                    }
                    \temp

                    \addlegendentryexpanded{$W = \W$}
                }
            \end{axis}
        \end{tikzpicture}

        \caption{\red{Lorem ipsum dolor sit amet, consectetur adipiscing
        elit, sed do eiusmod tempor incididunt}}
        \label{fig:bpgd_w}
    \end{subfigure}%
    \hfill%
    \begin{subfigure}{0.5\textwidth}
        \centering
        \begin{tikzpicture}
            \begin{axis}[
                    width=8cm,
                    height=6cm,
                    ymode=log,
                    legend style={
                        cells={anchor=west},
                        cells={align=left},
                    },
                    enlargelimits=false,
                    ymin=1e-5, ymax=2e-1,
                    grid=both,
                    legend pos = south east,
                    xtick={0.001,0.0015,0.002,...,0.004},
                    xticklabel style={/pgf/number format/fixed},
                    xticklabel style={/pgf/number format/precision=4},
                    x tick label style={rotate=45, anchor=north east,
                    inner sep=1mm},
                    scaled x ticks=false,
                    xlabel={Physical error rate},
                    yticklabels={\empty},
                    % ylabel={Per-round-LER},
                    extra description/.code={
                        \node[rotate=90, anchor=south]
                        at ([xshift=10mm]current axis.east)
                        {Warm s. (---), Cold s. (- - -)};
                    },
                ]

                \foreach \F/\col/\mark in
                {3/KITred/triangle,2/KITblue/diamond,1/KITorange/square} {
                    \edef\temp{\noexpand
                        \addplot+[
                            mark=\mark, densely dashed, mark
                            options={fill=\col},
                            \col, forget plot
                        ]
                        table[
                            col sep=comma, x=physical_p,
                            y=LER_per_round,
                        ]
                        {res/sim/WF/WindowingSyndromeSpaGdDecoder/max_iter_5000/pass_soft_info_False/F_\F/W_5/LERs.csv};
                    }
                    \temp
                }

                \foreach \F/\col/\mark in
                {3/KITred/triangle*,2/KITblue/diamond*,1/KITorange/square*} {
                    \edef\temp{\noexpand
                        \addplot+[mark=\mark, solid, mark
                        options={fill=\col}, \col]
                        table[
                            col sep=comma, x=physical_p,
                            y=LER_per_round,
                        ]
                        {res/sim/WF/WindowingSyndromeSpaGdDecoderPassDecimation/max_iter_5000/pass_soft_info_True/F_\F/W_5/LERs.csv};
                    }
                    \temp

                    \addlegendentryexpanded{$F = \F$}
                }
            \end{axis}
        \end{tikzpicture}

        \caption{\red{Lorem ipsum dolor sit amet, consectetur adipiscing
        elit, sed do eiusmod tempor incididunt}}
        \label{fig:bpgd_f}
    \end{subfigure}

    \caption{
        \red{\lipsum[2]}
    }
    \label{fig:bpgd_wf}
\end{figure}

% [Experimental parameters] Figure 4.10

\Cref{fig:bpgd_wf} shows the per-round \ac{ler} of \ac{bpgd}
sliding-window decoding as a function of the physical error rate.
In both panels the dashed curves correspond to cold-start
sliding-window decoding and the solid curves to the
corresponding warm-start decoding, where the warm start carries over both
the \ac{bp} messages and the decimation information of the overlap
region as described in
\Cref{subsec:Warm-Start Belief Propagation with Guided Decimation Decoding}.
The maximum number of inner \ac{bp} iterations was set to
$n_\text{iter} = 5000$.
This value was chosen to be at least as large as the number of
\acp{vn} in any single window, since with one \ac{bp} iteration
between consecutive decimations ($T = 1$ in the notation of
\Cref{alg:bpgd}) this is the maximum number of inner iterations
that can occur before every \ac{vn} in the window has been decimated.
A preliminary investigation showed that \ac{bpgd} only delivers its
intended performance gain once most \acp{vn} have actually been decimated,
which motivated this choice.
The physical error rate was swept from $p = 0.001$ to $p = 0.004$
in steps of $0.0005$.
\Cref{fig:bpgd_w} sweeps over the window size with
$W \in \{3, 4, 5\}$ at fixed step size $F = 1$, and
\Cref{fig:bpgd_f} sweeps over the step size with
$F \in \{1, 2, 3\}$ at fixed window size $W = 5$.

% [Description] Figure 4.10

In both panels, every curve again exhibits the expected monotonic
increase of the per-round \ac{ler} with the physical error rate.
Across both panels and across all parameter choices, the warm-start
curves lie above the corresponding cold-start curves, i.e.,
the warm-start variant performsworse than its cold-start counterpart.
This is the opposite of what we observed for plain \ac{bp}, where
warm-start improved upon cold-start at every parameter setting.
The gap between the warm- and cold-start curves additionally widens
as the physical error rate decreases:
at the lowest sampled rate $p = 0.001$, the per-round \ac{ler} of the
warm-start runs is more than two orders of magnitude above that of
the corresponding cold-start runs.
In \Cref{fig:bpgd_w}, larger window sizes yield lower per-round
\acp{ler} for both warm- and cold-start, and the spacing between the
cold-start curves shrinks as $W$ grows.
In \Cref{fig:bpgd_f}, the cold-start curves follow the previously seen
ordering with $F = 1$ at the bottom and $F = 3$ at the top.
The warm-start curves, however, exhibit the opposite ordering:
$F = 1$ now yields the highest per-round \ac{ler}, $F = 2$ lies below
it, and $F = 3$ is the lowest of the three warm-start curves.

% [Interpretation] Figure 4.10

The fact that warm-start sliding-window decoding now performs worse
than its cold-start counterpart is surprising in light of the results
for plain \ac{bp}, where the warm-start modification was uniformly beneficial.
The dependence on the window size in \Cref{fig:bpgd_w} is, on its own,
consistent with the same explanation that we gave for
\Cref{fig:whole_vs_cold}: larger windows expose the inner decoder to
a larger fraction of the constraints encoded in the detector error
matrix at the time of decoding, and this benefits both warm- and
cold-start decoding.
The dependence on the step size in \Cref{fig:bpgd_f}, however, is the
opposite of the corresponding dependence under plain \ac{bp}
(\Cref{fig:bp_f_over_p}): for warm-start, smaller $F$ now hurts
rather than helps, even though smaller $F$ implies a larger overlap
in both cases.

This inversion provides the clue to what is going wrong.
Recall from
\Cref{subsec:Warm-Start Belief Propagation with Guided Decimation Decoding}
that the warm start for \ac{bpgd} carries over not only the \ac{bp}
messages on the edges of the overlap region but also the decimation
information.
Because we run with an iteration budget large enough to decimate
every \ac{vn} in a window, by the time window $\ell$ ends, all
of its \acp{vn} have already been hard-decided.
For the \acp{vn} that lie in the overlap region with window $\ell + 1$
this hard decision is then carried into the next window through the
warm-start initialization, and the next window thus begins decoding
with a substantial fraction of its \acp{vn} already frozen, before
its own parity checks have had any chance to influence the
corresponding bit estimates.
This identifies one of two competing effects on the warm-start performance.
The larger the overlap, the more such prematurely frozen \acp{vn} the
next window inherits, which hurts performance.
On the other hand, a larger window still exposes the inner decoder to
a larger set of constraints, which helps performance.
The two effects together are consistent with what we observe in
\Cref{fig:bpgd_wf}.
Increasing $W$ at fixed $F$ enlarges both the overlap and the window
itself, and the benefit due to the larger $W$ dominates.
Decreasing $F$ at fixed $W$, by contrast, enlarges only the overlap
without enlarging the window, so the freezing effect is no longer
offset and warm-start performance worsens with smaller $F$.

\begin{figure}[t]
    \centering
    \hspace*{-6mm}
    \begin{subfigure}{0.48\textwidth}
        \centering
        \begin{tikzpicture}
            \begin{axis}[
                    width=8cm,
                    height=6cm,
                    ymode=log,
                    % xmode=log,
                    legend style={
                        cells={anchor=west},
                        cells={align=left},
                    },
                    enlargelimits=false,
                    ymin=1e-3, ymax=1e-1,
                    grid=both,
                    legend pos = south west,
                    xtick={32,512,1024,2048,4096},
                    % xtick={0.001,0.0015,...,0.004},
                    xticklabels =
                    {$32$,$512$,$1{,}024$,,$2{,}048$,,$3{,}072$,,$4{,}096$},
                    xtick={32, 512, 1024, 1536, 2048, 2560, 3072, 3584, 4096},
                    xticklabel style={/pgf/number format/fixed},
                    xticklabel style={/pgf/number format/precision=4},
                    x tick label style={rotate=45, anchor=north east,
                    inner sep=1mm},
                    scaled x ticks=false,
                    xlabel={Number of BP iterations},
                    ylabel={Per-round-LER},
                    % extra description/.code={
                    %     \node[rotate=90, anchor=south]
                    %     at ([xshift=10mm]current axis.east)
                    %     {Warm s. (---), Cold s. (- - -)};
                    % },
                ]

                \foreach \W/\col/\mark in
                {3/KITred/triangle,4/KITblue/diamond,5/KITorange/square} {
                    \edef\temp{\noexpand
                        \addplot+[mark=\mark, densely dashed,
                        forget plot, \col]
                        table[
                            col sep=comma, x=max_iter,
                            y=LER_per_round,
                        ]
                        {res/sim/max_iter/WindowingSyndromeSpaGdDecoderPassDecimation/p_0.0025/pass_soft_info_False/F_1/W_\W/LERs.csv};
                    }
                    \temp
                }

                \foreach \W/\col/\mark in
                {3/KITred/triangle*,4/KITblue/diamond*,5/KITorange/square*} {
                    \edef\temp{\noexpand
                        \addplot+[mark=\mark, solid, mark
                        options={fill=\col}, \col]
                        table[
                            col sep=comma, x=max_iter,
                            y=LER_per_round,
                        ]
                        {res/sim/max_iter/WindowingSyndromeSpaGdDecoderPassDecimation/p_0.0025/pass_soft_info_True/F_1/W_\W/LERs.csv};
                    }
                    \temp

                    \addlegendentryexpanded{$W = \W$}
                }
            \end{axis}
        \end{tikzpicture}

        \caption{\red{Lorem ipsum dolor sit amet, consectetur adipiscing
        elit, sed do eiusmod tempor incididunt}}
    \end{subfigure}%
    \hfill%
    \begin{subfigure}{0.48\textwidth}
        \centering
        \begin{tikzpicture}
            \begin{axis}[
                    width=8cm,
                    height=6cm,
                    ymode=log,
                    % xmode=log,
                    legend style={
                        cells={anchor=west},
                        cells={align=left},
                    },
                    enlargelimits=false,
                    ymin=1e-3, ymax=1e-1,
                    grid=both,
                    legend pos = south west,
                    xtick={32,512,1024,2048,4096},
                    % xtick={0.001,0.0015,...,0.004},
                    xticklabels =
                    {$32$,$512$,$1{,}024$,,$2{,}048$,,$3{,}072$,,$4{,}096$},
                    xtick={32, 512, 1024, 1536, 2048, 2560, 3072, 3584, 4096},
                    xticklabel style={/pgf/number format/fixed},
                    xticklabel style={/pgf/number format/precision=4},
                    x tick label style={rotate=45, anchor=north east,
                    inner sep=1mm},
                    scaled x ticks=false,
                    xlabel={Number of BP iterations},
                    % ylabel={Per-round-LER},
                    yticklabels={\empty},
                    extra description/.code={
                        \node[rotate=90, anchor=south]
                        at ([xshift=10mm]current axis.east)
                        {Warm s. (---), Cold s. (- - -)};
                    },
                ]

                \foreach \F/\col/\mark in
                {3/KITred/triangle,2/KITblue/diamond,1/KITorange/square} {
                    \edef\temp{\noexpand
                        \addplot+[mark=\mark, densely dashed,
                        forget plot, \col]
                        table[
                            col sep=comma, x=max_iter,
                            y=LER_per_round,
                        ]
                        {res/sim/max_iter/WindowingSyndromeSpaGdDecoderPassDecimation/p_0.0025/pass_soft_info_False/F_\F/W_5/LERs.csv};
                    }
                    \temp
                }

                \foreach \F/\col/\mark in
                {3/KITred/triangle*,2/KITblue/diamond*,1/KITorange/square*} {
                    \edef\temp{\noexpand
                        \addplot+[mark=\mark, solid, mark
                        options={fill=\col}, \col]
                        table[
                            col sep=comma, x=max_iter,
                            y=LER_per_round,
                        ]
                        {res/sim/max_iter/WindowingSyndromeSpaGdDecoderPassDecimation/p_0.0025/pass_soft_info_True/F_\F/W_5/LERs.csv};
                    }
                    \temp

                    \addlegendentryexpanded{$F = \F$}
                }
            \end{axis}
        \end{tikzpicture}

        \caption{\red{Lorem ipsum dolor sit amet, consectetur adipiscing
        elit, sed do eiusmod tempor incididunt}}
    \end{subfigure}

    \caption{
        \red{\lipsum[2]}
    }
\end{figure}

\begin{figure}[t]
    \centering
    \hspace*{-6mm}
    \begin{subfigure}{0.5\textwidth}
        \centering
        \begin{tikzpicture}
            \begin{axis}[
                    width=8cm,
                    height=6cm,
                    ymode=log,
                    legend style={
                        cells={anchor=west},
                        cells={align=left},
                    },
                    enlargelimits=false,
                    ymin=1e-5, ymax=2e-1,
                    grid=both,
                    legend pos = south east,
                    xtick={0.001,0.0015,...,0.004},
                    xticklabel style={/pgf/number format/fixed},
                    xticklabel style={/pgf/number format/precision=4},
                    x tick label style={rotate=45, anchor=north east,
                    inner sep=1mm},
                    scaled x ticks=false,
                    xlabel={Physical error rate},
                    ylabel={Per-round-LER},
                    % extra description/.code={
                    %     \node[rotate=90, anchor=south]
                    %     at ([xshift=10mm]current axis.east)
                    %     {Warm s. (---), Cold s. (- - -)};
                    % },
                ]

                \foreach \W/\col/\mark in
                {3/KITred/triangle,4/KITblue/diamond,5/KITorange/square} {
                    \edef\temp{\noexpand
                        \addplot+[
                            mark=\mark, densely dashed, mark
                            options={fill=\col},
                            \col, forget plot
                        ]
                        table[
                            col sep=comma, x=physical_p,
                            y=LER_per_round,
                        ]
                        {res/sim/WF/WindowingSyndromeSpaGdDecoder/max_iter_5000/pass_soft_info_False/F_1/W_\W/LERs.csv};
                    }
                    \temp
                }

                \foreach \W/\col/\mark in
                {3/KITred/triangle*,4/KITblue/diamond*,5/KITorange/square*} {
                    \edef\temp{\noexpand
                        \addplot+[mark=\mark, solid, mark
                        options={fill=\col}, \col]
                        table[
                            col sep=comma, x=physical_p,
                            y=LER_per_round,
                        ]
                        {res/sim/WF/WindowingSyndromeSpaGdDecoder/max_iter_5000/pass_soft_info_True/F_1/W_\W/LERs.csv};
                    }
                    \temp

                    \addlegendentryexpanded{$W = \W$}
                }
            \end{axis}
        \end{tikzpicture}

        \caption{\red{Lorem ipsum dolor sit amet, consectetur adipiscing
        elit, sed do eiusmod tempor incididunt}}
    \end{subfigure}%
    \hfill%
    \begin{subfigure}{0.5\textwidth}
        \centering
        \begin{tikzpicture}
            \begin{axis}[
                    width=8cm,
                    height=6cm,
                    ymode=log,
                    legend style={
                        cells={anchor=west},
                        cells={align=left},
                    },
                    enlargelimits=false,
                    ymin=1e-5, ymax=2e-1,
                    grid=both,
                    legend pos = south east,
                    xtick={0.001,0.0015,0.002,...,0.004},
                    xticklabel style={/pgf/number format/fixed},
                    xticklabel style={/pgf/number format/precision=4},
                    x tick label style={rotate=45, anchor=north east,
                    inner sep=1mm},
                    scaled x ticks=false,
                    xlabel={Physical error rate},
                    yticklabels={\empty},
                    % ylabel={Per-round-LER},
                    extra description/.code={
                        \node[rotate=90, anchor=south]
                        at ([xshift=10mm]current axis.east)
                        {Warm s. (---), Cold s. (- - -)};
                    },
                ]

                \foreach \F/\col/\mark in
                {3/KITred/triangle,2/KITblue/diamond,1/KITorange/square} {
                    \edef\temp{\noexpand
                        \addplot+[
                            mark=\mark, densely dashed, mark
                            options={fill=\col},
                            \col, forget plot
                        ]
                        table[
                            col sep=comma, x=physical_p,
                            y=LER_per_round,
                        ]
                        {res/sim/WF/WindowingSyndromeSpaGdDecoder/max_iter_5000/pass_soft_info_False/F_\F/W_5/LERs.csv};
                    }
                    \temp
                }

                \foreach \F/\col/\mark in
                {3/KITred/triangle*,2/KITblue/diamond*,1/KITorange/square*} {
                    \edef\temp{\noexpand
                        \addplot+[mark=\mark, solid, mark
                        options={fill=\col}, \col]
                        table[
                            col sep=comma, x=physical_p,
                            y=LER_per_round,
                        ]
                        {res/sim/WF/WindowingSyndromeSpaGdDecoder/max_iter_5000/pass_soft_info_True/F_\F/W_5/LERs.csv};
                    }
                    \temp

                    \addlegendentryexpanded{$F = \F$}
                }
            \end{axis}
        \end{tikzpicture}

        \caption{\red{Lorem ipsum dolor sit amet, consectetur adipiscing
        elit, sed do eiusmod tempor incididunt}}
    \end{subfigure}

    \caption{
        \red{\lipsum[2]}
    }
\end{figure}

\begin{figure}[t]
    \centering
    \hspace*{-6mm}
    \begin{subfigure}{0.48\textwidth}
        \centering
        \begin{tikzpicture}
            \begin{axis}[
                    width=8cm,
                    height=6cm,
                    ymode=log,
                    % xmode=log,
                    legend style={
                        cells={anchor=west},
                        cells={align=left},
                    },
                    enlargelimits=false,
                    ymin=1e-3, ymax=1e-1,
                    grid=both,
                    legend pos = north east,
                    xtick={32,512,1024,2048,4096},
                    % xtick={0.001,0.0015,...,0.004},
                    xticklabels =
                    {$32$,$512$,$1{,}024$,,$2{,}048$,,$3{,}072$,,$4{,}096$},
                    xtick={32, 512, 1024, 1536, 2048, 2560, 3072, 3584, 4096},
                    xticklabel style={/pgf/number format/fixed},
                    xticklabel style={/pgf/number format/precision=4},
                    x tick label style={rotate=45, anchor=north east,
                    inner sep=1mm},
                    scaled x ticks=false,
                    xlabel={Number of BP iterations},
                    ylabel={Per-round-LER},
                    % extra description/.code={
                    %     \node[rotate=90, anchor=south]
                    %     at ([xshift=10mm]current axis.east)
                    %     {Warm s. (---), Cold s. (- - -)};
                    % },
                ]

                \foreach \W/\col/\mark in
                {3/KITred/triangle,4/KITblue/diamond,5/KITorange/square} {
                    \edef\temp{\noexpand
                        \addplot+[mark=\mark, densely dashed,
                        forget plot, \col]
                        table[
                            col sep=comma, x=max_iter,
                            y=LER_per_round,
                        ]
                        {res/sim/max_iter/WindowingSyndromeSpaGdDecoder/p_0.0025/pass_soft_info_False/F_1/W_\W/LERs.csv};
                    }
                    \temp
                }

                \foreach \W/\col/\mark in
                {3/KITred/triangle*,4/KITblue/diamond*,5/KITorange/square*} {
                    \edef\temp{\noexpand
                        \addplot+[mark=\mark, solid, mark
                        options={fill=\col}, \col]
                        table[
                            col sep=comma, x=max_iter,
                            y=LER_per_round,
                        ]
                        {res/sim/max_iter/WindowingSyndromeSpaGdDecoder/p_0.0025/pass_soft_info_True/F_1/W_\W/LERs.csv};
                    }
                    \temp

                    \addlegendentryexpanded{$W = \W$}
                }
            \end{axis}
        \end{tikzpicture}

        \caption{\red{Lorem ipsum dolor sit amet, consectetur adipiscing
        elit, sed do eiusmod tempor incididunt}}
    \end{subfigure}%
    \hfill%
    \begin{subfigure}{0.48\textwidth}
        \centering
        \begin{tikzpicture}
            \begin{axis}[
                    width=8cm,
                    height=6cm,
                    ymode=log,
                    % xmode=log,
                    legend style={
                        cells={anchor=west},
                        cells={align=left},
                    },
                    enlargelimits=false,
                    ymin=1e-3, ymax=1e-1,
                    grid=both,
                    legend pos = north east,
                    xtick={32,512,1024,2048,4096},
                    % xtick={0.001,0.0015,...,0.004},
                    xticklabels =
                    {$32$,$512$,$1{,}024$,,$2{,}048$,,$3{,}072$,,$4{,}096$},
                    xtick={32, 512, 1024, 1536, 2048, 2560, 3072, 3584, 4096},
                    xticklabel style={/pgf/number format/fixed},
                    xticklabel style={/pgf/number format/precision=4},
                    x tick label style={rotate=45, anchor=north east,
                    inner sep=1mm},
                    scaled x ticks=false,
                    xlabel={Number of BP iterations},
                    % ylabel={Per-round-LER},
                    yticklabels={\empty},
                    extra description/.code={
                        \node[rotate=90, anchor=south]
                        at ([xshift=10mm]current axis.east)
                        {Warm s. (---), Cold s. (- - -)};
                    },
                ]

                \foreach \F/\col/\mark in
                {3/KITred/triangle,2/KITblue/diamond,1/KITorange/square} {
                    \edef\temp{\noexpand
                        \addplot+[mark=\mark, densely dashed,
                        forget plot, \col]
                        table[
                            col sep=comma, x=max_iter,
                            y=LER_per_round,
                        ]
                        {res/sim/max_iter/WindowingSyndromeSpaGdDecoder/p_0.0025/pass_soft_info_False/F_\F/W_5/LERs.csv};
                    }
                    \temp
                }

                \foreach \F/\col/\mark in
                {3/KITred/triangle*,2/KITblue/diamond*,1/KITorange/square*} {
                    \edef\temp{\noexpand
                        \addplot+[mark=\mark, solid, mark
                        options={fill=\col}, \col]
                        table[
                            col sep=comma, x=max_iter,
                            y=LER_per_round,
                        ]
                        {res/sim/max_iter/WindowingSyndromeSpaGdDecoder/p_0.0025/pass_soft_info_True/F_\F/W_5/LERs.csv};
                    }
                    \temp

                    \addlegendentryexpanded{$F = \F$}
                }
            \end{axis}
        \end{tikzpicture}

        \caption{\red{Lorem ipsum dolor sit amet, consectetur adipiscing
        elit, sed do eiusmod tempor incididunt}}
    \end{subfigure}

    \caption{
        \red{\lipsum[2]}
    }
\end{figure}