% TODO: Make all [H] -> [t] \chapter{Decoding under Detector Error Models} In \Cref{ch:Fundamentals} we introduced the fundamentals of classical error correction, before moving on to quantum information science and finally combining the two in \acf{qec}. In \Cref{ch:Fault tolerance} we then turned to fault-tolerance, with a focus on a specific way of implementing it, called \acfp{dem}. In this chapter, we move on from the fundamental concepts and examine how to apply them in practice. Specifically, we concern ourselves with the practical aspects of decoding under \acp{dem}. We investigate decoding \acf{qldpc} codes under \acp{dem} in particular. We focus on \ac{qldpc} codes, as they have emerged as leading candidates for practical quantum error correction, offering comparable thresholds with substantially improved encoding rates \cite[Sec.~1]{bravyi_high-threshold_2024}. Because of this, the decoding algorithms we consider will all be related to \acf{bp} in some way. Our aim is to build a fault-tolerant \ac{qec} system that works well even in the presence of circuit-level noise. We must overcome two main challenges to achieve this. First, recall the problems related to degeneracy, which is inherent to quantum codes. Because multiple minimum-weight codewords exist, the \ac{bp} algorithm becomes uncertain of the direction to proceed in. Additionally, the commutativity conditions of the stabilizers necessitate the existence of short cycles. Together, these two aspects lead to substantial convergence problems of \ac{bp} for quantum codes, when it is used on its own. Second, the consideration of circuit-level noise introduces many more error locations into the circuit. Using \acp{dem}, we construct a new circuit code and model each of these error locations as a new \acf{vn}. We also perform multiple rounds of syndrome measurements, exacerbating the problem. This leads to a massively increased computational complexity and latency of the decoding process. In our experiments using the $\llbracket 144,12,12 \rrbracket$ \acf{bb} code with $12$ syndrome measurement rounds, for example, the number of \acp{vn} grew from $144$ to $9504$, and the number of \acfp{cn} grew from $72$ to $1008$. The first problem is not inherent to \acp{dem} or fault-tolerance, but rather quantum codes in general. Many different approaches to solving it exist, usually centered around somehow modifying \ac{bp}. The most popular approach is combining a few initial iterations of \ac{bp} with a second decoding algorithm, \ac{osd} \cite{roffe_decoding_2020}. Other approaches exist, such as \ac{aed} \cite{koutsioumpas_automorphism_2025}, where multiple variations of the code are decoded simultaneously to increase the chances of convergence. Here, we will focus on the \acf{bpgd} algorithm \cite{yao_belief_2024} we already introduced in \Cref{ch:Fundamentals}, for reasons that will become clear later in the chapter. The second problem is inherent to decoding using \acp{dem}. This is an area that has received less attention. As we saw in \Cref{sec:Quantum Error Correction}, for \ac{qec}, latency is the main constraint, not raw computational complexity. The main way this is addressed in the literature is \emph{sliding window decoding}, which attempts to divide the overall decoding problem into many smaller ones that can be solved more efficiently. % TODO: This could potentially be a bit more text (e.g., go into % SC-LDPC like structure that serves as the inspiration for the % warm-start decoding. Or just go into warm-start decoding) Our own work will focus mostly on the the solution of the second problem using sliding-window decoding. We will start by briefly reviewing the existing work related to sliding-window decoding, before focusing on one specific realization. We will then introduce a modification to the existing algorithm and perform numerical simulations to evaluate it. % and reducing latency is the main goal of the existing literature. % This is generally done using windowing approaches; either % sliding-window based, where the latency is reduced due to an earlier % start to the decoding process \cite{kuo_fault-tolerant_2024}% % \cite{huang_improved_2023}\cite{huang_increasing_2024}\cite{gong_toward_2024}, % or by decoding multiple windows in parallel % \cite{skoric_parallel_2023}\cite{tan_scalable_2023}. % This work is based on the sliding-window method. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{Sliding-Window Decoding} \label{sec:Sliding-Window Decoding} % Spacetime codes \ac{qec} codes are often viewed through the lenses of the \emph{space} and \emph{time} dimensions. Both directions add redundancy, but they do so in a different way and guard against different defects. The space dimension corresponds to the redundancy added through the code itself, while the time dimension corresponds to the repetition of the syndrome measurements \cite[Sec.~IV.B]{dennis_topological_2002}. % Basic idea The idea of sliding-window decoding is to exploit the time-like structure by splitting the circuit into overlapping windows along the time dimension. Each of these windows is then decoded separately. %%%%%%%%%%%%%%%% \subsection{Review of Existing Literature} \label{subsec:Review of Existing Literature} \begin{figure}[t] \centering \tikzset{ literature/.append style={ minimum width=6mm, minimum height=6mm, text width=18mm, align=left, } } \tikzset{ heading/.append style={ draw=black, minimum width=22mm, minimum height=6mm, align=left, rounded corners = 1mm, } } \tikzexternaldisable \begin{tikzpicture}[node distance = 0mm and 0mm] % tex-fmt: off \node[heading, minimum width=15mm, fill=gray!25] (code) {Code}; \node[heading, below right=2mm and -5mm of code, fill=orange!20] (top) {Topological}; \node[heading, below right=45mm and -5mm of code, fill=orange!20] (qldpc) {QLDPC}; \node[literature, below right=1mm and -12mm of top] (dennis) {\cite{dennis_topological_2002}}; \node[literature, below=of dennis] (tan) {\cite{tan_scalable_2023}}; \node[literature, below=of tan] (skoric) {\cite{skoric_parallel_2023}}; \node[literature, below=of skoric] (bombin) {\cite{bombin_modular_2023}}; \node[literature, below=of bombin] (kuo) {\cite{kuo_fault-tolerant_2024}}; \node[literature, below right=1mm and -12mm of qldpc] (huang) {\cite{huang_improved_2023},\cite{huang_increasing_2024}}; \node[literature, below=of huang] (gong) {\cite{gong_toward_2024}}; \node[literature, below=of gong] (kang) {\cite{kang_quits_2025}}; \coordinate (code-anchor) at ($(code.south) + (-2mm,0)$); \coordinate (top-anchor) at ($(top.south) + (-5mm,0)$); \coordinate (qldpc-anchor) at ($(qldpc.south) + (-5mm,0)$); \draw (code-anchor) |- (top); \draw (code-anchor) |- (qldpc); \draw (top-anchor) |- (dennis); \draw (top-anchor) |- (tan); \draw (top-anchor) |- (skoric); \draw (top-anchor) |- (bombin); \draw (top-anchor) |- (kuo); \draw (qldpc-anchor) |- (huang); \draw (qldpc-anchor) |- (gong); \draw (qldpc-anchor) |- (kang); \draw [ line width=1pt, decorate, decoration={brace,amplitude=2mm,raise=5mm} ] (dennis.north east) -- (dennis.south east) node[midway,right,xshift=10mm]{Sequential}; \draw [ line width=1pt, decorate, decoration={brace,amplitude=2mm,raise=5mm} ] (tan.north east) -- (kuo.south east) node[midway,right,xshift=10mm]{Parallel}; \draw [ line width=1pt, decorate, decoration={brace,amplitude=2mm,raise=5mm} ] (huang.north east) -- (kang.south east) node[midway,right,xshift=10mm]{Sequential}; % tex-fmt: on \end{tikzpicture} \tikzexternalenable \caption{Overview of literature on sliding-window decoding.} \label{fig:literature} \end{figure} % Some general notes \Cref{fig:literature} gives an overview over the existing body of work related to sliding-window decoding. The papers \cite{huang_improved_2023} and \cite{huang_increasing_2024} are lumped together, as they share the same content; one is simply a preprint published earlier. We will only refer to \cite{huang_increasing_2024} in the following. \cite{kang_quits_2025} is somewhat special in that the authors focus more on the introduction of a new simulator framework they call QUITS, rather than the performance of sliding-window decoding itself. \cite{gong_toward_2024} and \cite{kang_quits_2025} have made their software freely available online% \footnote{ \url{https://github.com/mkangquantum/quits} }% \footnote{ \url{https://github.com/gongaa/SlidingWindowDecoder} }. A final thing to note is that \cite{dennis_topological_2002} never explicitly mentions sliding windows; the authors call their scheme ``overlapping recovery''. % Topological vs QLDPC Research has focused on two categories of \ac{qec} codes, topological and \ac{qldpc} codes. Most of the work on topological codes has treated surface codes, with the exception of \cite{kuo_fault-tolerant_2024} where toric codes were considered. With regard to \ac{qldpc} codes, in \cite{huang_increasing_2024} the authors examine \emph{hypergraph product} (\acs{hgp}) and \emph{lifted-product} (\acs{lp}) codes. HGP codes are constructed from the product of two classical codes, while LP codes generalize this construction by additionally applying a lift to reduce the qubit overhead. In \cite{kang_quits_2025}, \emph{balanced product codes} (\acs{bpc}) are additionally considered. Like HGP codes, BPC codes are derived from a product construction, but exploit an additional symmetry to yield fewer physical qubits for the same code parameters. Finally, \cite{gong_toward_2024} explores \ac{bb} codes. % Sequential vs parallel After having divided the whole circuit into separate windows, the question arises of how exactly to realize the decoding. There are two main approaches, with differing mechanisms of reducing the latency. Some papers decode the sliding windows in a parallel fashion. The benefit in this case is is that classical hardware can be utilized more effectively. Others choose a sequential approach. Here, decoding can start earlier, as there is no need to wait for the syndrome measurements of all windows before beginning with the decoding. With the exception of \cite{dennis_topological_2002}, literature treating topological codes has mostly focused on parallel decoding while literature treating \ac{qldpc} codes has wholly considered sequential decoding. % Deep-dive into QLDPC methods For this work, the publications treating \ac{qldpc} codes are especially interesting. The experimental conditions for these are summarized in \Cref{table:experimental_conditions}. As we noted above, \ac{hgp} and \ac{lp} codes are considered in \cite{huang_increasing_2024}, \ac{hgp}, \ac{lp} and \ac{bpc} codes are considered in \cite{kang_quits_2025}, and \ac{bb} codes are considered in \cite{gong_toward_2024}. The employed noise models also differ; \cite{huang_increasing_2024} uses phenomenological noise, while \cite{gong_toward_2024} and \cite{kang_quits_2025} use circuit-level noise. Finally, in \cite{gong_toward_2024} the authors introduce their own variation of \ac{bpgd}, \ac{bp} with \ac{gdg}, while \cite{huang_increasing_2024} and \cite{kang_quits_2025} use \ac{bp} + \ac{osd}. We would additionally like to note that only in \cite{gong_toward_2024} and \cite{kang_quits_2025} explicitly work with the \ac{dem} formalism. \renewcommand{\arraystretch}{1.1} \setlength{\tabcolsep}{12pt} \begin{table}[t] \centering \caption{Experimental conditions in the literature on sliding-window decoding for \ac{qldpc} codes.} \vspace*{3mm} \label{table:experimental_conditions} \begin{tabular}{l|ccc} % tex-fmt: off Publication & Code & Noise Model & Decoder \\ \hline \hspace{-2.5mm}\cite{huang_improved_2023},\cite{huang_increasing_2024} & \acs{hgp}, \acs{lp} & Phenomenological noise & \acs{bp} + \acs{osd} \\ \hspace{-2.5mm}\cite{gong_toward_2024} & \acs{bb} & Circuit-level noise & \acs{bp} + \acs{gdg} \\ \hspace{-2.5mm}\cite{kang_quits_2025} & \acs{hgp}, \acs{lp}, \acs{bpc} & Circuit-level noise & \acs{bp} + \ac{osd} % tex-fmt: on \end{tabular} \end{table} % \red{ % Existing work % \begin{itemize} % \item \cite{gong_toward_2024} % \begin{itemize} % \item BB codes (QLDPC) % \item Circuit-level noise % \item Sequential % \item Cites $\underbrace{\cite{dennis_topological_2002} % \cite{tan_scalable_2023} % \cite{skoric_parallel_2023}}_\text{Surface code} % \underbrace{\cite{huang_improved_2023}}_\text{QLDPC,Phenomenological}$ % \end{itemize} % \item \cite{huang_improved_2023} % \begin{itemize} % \item Hypergraph product codes, Lifted product codes (QLDPC) % \item Phenomenological noise % \item Sequential % \item Cites $\underbrace{\cite{dennis_topological_2002} % [Huang, Brown, 2021] % \cite{skoric_parallel_2023} % \cite{tan_scalable_2023} % \cite{bombin_modular_2023}}_\text{Surface code}$ % \end{itemize} % \item \cite{dennis_topological_2002} % \begin{itemize} % \item Surface code (Topological) % \item No idea what noise, don't care either (Gong et % al. say circuit-level noise) % \item ``Overlapping recovery'' -> Sequential % \end{itemize} % \item \cite{tan_scalable_2023} % \begin{itemize} % \item Surface code (Topological) % \item Circuit-level noise % \item Parallel % \item Cites \cite{dennis_topological_2002} % \end{itemize} % \item \cite{skoric_parallel_2023} % \begin{itemize} % \item Surface code (Topological) % \item Circuit-level noise % \item Parallel % \item Cites \cite{dennis_topological_2002} % \end{itemize} % \item \cite{huang_increasing_2024} % \begin{itemize} % \item Same as \cite{huang_improved_2023} % \end{itemize} % \item \cite{kuo_fault-tolerant_2024} % \begin{itemize} % \item Toric codes (Topological) % \item Circuit-level noise % \item Parallel % \item Cites \cite{dennis_topological_2002} % \cite{tan_scalable_2023} % \cite{skoric_parallel_2023} \cite{gong_toward_2024} % \end{itemize} % \item \cite{bombin_modular_2023} % \begin{itemize} % \item Surface codes (Topological) % \item No idea if it's even fault-tolerant % \item Parallel % \item Cites \cite{dennis_topological_2002} % \cite{tan_scalable_2023} % \cite{skoric_parallel_2023} \cite{leverrier_decoding_2022} % \end{itemize} % % This is not BP and not parallelization over the time dimension % % \item \cite{leverrier_decoding_2022} % % \begin{itemize} % % \item Quantum tanner codes (QLDPC) % % \item Parallel % % \item No idea if it's even fault-tolerant % % \item Cites [don't care] % % \end{itemize} % \item \cite{kang_quits_2025} % \begin{itemize} % \item Cites \cite{huang_increasing_2024} \ldots % \end{itemize} % \end{itemize} % } %%%%%%%%%%%%%%%% \subsection{Window Splitting and Sequential Sliding-Window Decoding} \label{subsec:Window Splitting and Sequential Sliding-Window Decoding} In this section, we will examine the methodology by which a detector error matrix is divided into overlapping windows. The algorithm detailed here follows \cite{kang_quits_2025}, which is in turn based on \cite{huang_increasing_2024}. % Very high-level overview Sliding-window decoding is made possible by the time-like structure of the syndrome extraction circuitry. This is especially clearly visible under the \ac{dem} formalism, where this manifests as a block-diagonal structure of the detector error matrix $\bm{H}$. Note that this presupposes a choice of detectors as seen in \Cref{subsec:Detector Error Matrix}. This block-diagonal structure introduces some locality in the interdependence between \acp{vn} and \acp{cn}. For each local set of \acp{vn}, there is only a local set of connected \acp{cn}. We exploit this fact by partitioning the matrix into overlapping windows. \Cref{fig:windowing_pcm} depicts this process using the $\llbracket 72, 6, 6 \rrbracket$ BB code as an example. % High-level overview How the locality is leveraged can be understood by considering the decoding process. After decoding a window, there is a subset of \acp{cn} that no longer contribute to decoding, since none of their neighboring \acp{vn} appear in subsequent windows. We call the set of \acp{vn} connected to those \acp{cn} the \emph{commit region} and we wish to commit them before moving to the next window, i.e., fix the values we estimate for the corresponding bits. As mentioned above, the benefit of this sequential sliding-window decoding approach is that the decoding process can begin as soon as the syndrome measurements for the first window are complete. % W and F and why we look at rows, not columns There are two degrees of freedom in how we perform the windowing. The \emph{window size} $W \in \mathbb{N}$ represents the number of syndrome extraction rounds lumped into one window, while the \emph{step size} $F \in \mathbb{N}$ represents the number of syndrome extraction rounds skipped before starting the next window. $W$ controls the size of the windows while $F$ controls the overlap between them. As illustrated in \Cref{fig:windowing_pcm}, $W$ and $F$ control the window dimensions and locations by defining the related \acp{cn}, not the \acp{vn}. This is because while the number of overall \acp{cn} is only affected by the choice of the underlying code and the number of syndrome measurement rounds, the number of \acp{vn} depends on the noise model and is difficult to predict beforehand. \begin{figure}[t] \centering \hspace*{-114mm}% \begin{tikzpicture} \draw[decorate, decoration={brace, amplitude=10pt}, line width=1pt] (0,0) -- (3.1,0) node[midway, above=4mm] {Commit region}; \end{tikzpicture} \centering \includegraphics[scale=0.75]{res/72_bb_dem.pdf} \vspace*{-25.3mm} \hspace*{-98mm}% \begin{tikzpicture} \draw[{Latex}-{Latex}, line width=.7pt] (0, -0.75mm) -- (0, 5mm); \draw[line width=1pt] (-1mm,-0.75mm) -- (3mm,-0.75mm); \draw[line width=1pt] (-1mm,5mm) -- (3mm,5mm); \node[left] at (-2mm,2.125mm) {$\sim W$}; \draw[{Latex}-{Latex}, line width=.3pt] (6.5cm,1.6mm) -- (6.5cm,5mm); \draw[line width=1pt] (6.5cm,4.9mm) -- (6.5cm,7mm); \node[above] at (6.5cm,7mm) {$\sim F$}; \end{tikzpicture} \vspace*{10mm} \caption{ Visualization of the windowing process on a detector error matrix generated from the $\llbracket 72, 6, 6 \rrbracket$ BB code under circuit-level noise. The block-diagonal structure reflects the time-like locality of the syndrome extraction circuit., with each block corresponding to one syndrome measurement round. Two consecutive windows are highlighted: the window size $W$ controls the number of syndrome rounds included in each window, while the step size $F$ controls how many rounds separate the start of one window from the next. The bracketed region indicates the commit region of the first window, i.e., the \acp{vn} that are committed before moving to the second window. % Visualization of the windowing process on a detector % error matrix generated from the $\llbracket 72, 6, 6 % \rrbracket$ BB code. } \label{fig:windowing_pcm} \end{figure} % Notation recap We briefly reintroduce the notation important for the definition of the windows. We use the variables $n,m \in \mathbb{N}$ to describe the number of \acp{vn} and \acp{cn} respectively. We index the \acp{vn} using the variable $i \in \mathcal{I} := [0:n-1]$ and the \acp{cn} using the variable $j \in \mathcal{J} := [ 0 : m-1]$. Finally, we call $\mathcal{N}_\text{V}(i) = \left\{ j\in \mathcal{J}: \bm{H}_{j,i} = 1 \right\}$ and $\mathcal{N}_\text{C}(j) := \left\{ i \in \mathcal{I} : \bm{H}_{j,i} = 1 \right\}$ the neighborhoods of the corresponding nodes. In this case, we take $\bm{H} \in \mathbb{F}_2^{m\times n}$ to be the check matrix of the underlying code, from which the \ac{dem} was generated. We use $m_\text{DEM}, \mathcal{I}_\text{DEM}$, and $\mathcal{J}_\text{DEM}$ to refer to the respective values defined from the detector error matrix. % How we get the corresponding rows We begin by describing the sets of \acp{cn} relevant to each window. For indexing, we use the variable $\ell \in [0:n_\text{win} - 1]$, where $n_\text{win} \in \mathbb{N}$ is the number of windows. Because we defined the step size $F$ as the number of syndrome extraction rounds to skip, the first \ac{cn} of window $\ell$ should have index $\ell F m$. Similarly, because of the way we defined the window size $W$, the number of \acp{cn} should be $Wm$ for all but the last window. The number of \acp{cn} in the last window may differ if there are not enough \acp{cn} left to completely fill it. We thus define \begin{align*} \mathcal{J}_\text{win}^{(\ell)} &:= \left\{ j\in \mathcal{J}_\text{DEM}:~ \ell F m \le j < \min \left\{m_\text{DEM}, (\ell F + W) m \right\} \right\} \\[2mm] & \hspace{30mm} \text{and} \\[2mm] \mathcal{J}_\text{commit}^{(\ell)} &:= \left\{ j\in \mathcal{J}_\text{DEM}:~ \ell F m \le j < \min \left\{m_\text{DEM}, (\ell + 1) F m \right\} \right\} .% \end{align*} $\mathcal{J}_\text{win}^{(\ell)}$ is the set of all \acp{cn} in the window while $\mathcal{J}_\text{commit}^{(\ell)}$ is the set of \acp{cn} that do not contribute to the next window and whose neighboring \acp{vn} will thus be committed. We can additionally define the set of \acp{cn} that are shared between windows $\ell$ and $\ell + 1$ as $\mathcal{J}_\text{overlap}^{(\ell)} := \mathcal{J}_\text{win}^{(\ell)}\setminus \mathcal{J}_\text{commit}^{(\ell)}$. % How we get the corresponding columns We can now turn our attention to defining the sets of \acp{vn} relevant to each window. We first introduce a helper function $i_\text{max} : \mathcal{P}(\mathbb{N}) \to \mathbb{N}$, which takes a set of \ac{cn} indices and returns the largest neighboring \ac{vn} index. We define \begin{align*} i_\text{max}\left( \mathcal{S} \right) := \max \left\{ i\in \mathcal{N}_\text{C}(j) : j\in \mathcal{S} \right\} , \end{align*} where we set $i_\text{max} (\emptyset) = -1$ by convention% \footnote{ This has the effect of later automatically setting the lower bounds for the indices in $\mathcal{I}_\text{commit}^{(\ell)}$ and $\mathcal{I}_\text{win}^{(\ell)}$ appropriately. }% . The commit region of window $\ell$ should include all of the \acp{vn} neighboring any of the \acp{cn} in $\mathcal{J}_\text{commit}^{(\ell)}$. Consequently, the maximum index of the \acp{vn} we consider should be $i_\text{max}(\mathcal{J}_\text{commit}^{(\ell)})$. Additionally, the set of \acp{vn} committed in the next window should start immediately afterwards. We thus define \begin{align*} \mathcal{I}_\text{commit}^{(\ell)} &:= \left\{i \in \mathcal{I}_\text{DEM} :~ i_\text{max}\left( \mathcal{J}_\text{commit}^{(\ell-1)} \right) < i \le i_\text{max}\left( \mathcal{J}_\text{commit}^{(\ell)} \right) \right\}\\[2mm] & \hspace{39mm} \text{and} \\[2mm] \mathcal{I}_\text{win}^{(\ell)} &:= \left\{i \in \mathcal{I}_\text{DEM} :~ i_\text{max}\left( \mathcal{J}_\text{commit}^{(\ell-1)} \right) < i \le i_\text{max}\left( \mathcal{J}_\text{win}^{(\ell)} \right) \right\} .% \end{align*} Again, we set $\mathcal{I}_\text{overlap}^{(\ell)} = \mathcal{I}_\text{win}^{(\ell)}\setminus \mathcal{I}_\text{commit}^{(\ell)}$. Note that we have \begin{align*} \bigcup_{\ell=0}^{n_\text{win}-1} \mathcal{I}_\text{commit}^{(\ell)} = \mathcal{I} \end{align*} and after decoding all windows we will therefore have committed all \acp{vn}. \begin{figure}[t] \centering \begin{tikzpicture} \def\sx{1.5} \def\sy{1.5} \coordinate (a00) at (0,0); \coordinate (a01) at (0, 3*\sy); \coordinate (a11) at (6*\sx, 3*\sy); \coordinate (a10) at (6*\sx, 0*\sy); \coordinate (b00) at (3.2*\sx, -1*\sy); \coordinate (b01) at (3.2*\sx, 2*\sy); \coordinate (b11) at (9.2*\sx, 2*\sy); \coordinate (b10) at (9.2*\sx, -1*\sy); \fill[gray!40] (a00) -- (a00 |- b01) -- (b01) -- (b01 |- a00) -- cycle; \draw (a00) -- (a01) -- (a11) -- (a10) -- cycle; \draw[densely dashed] (b00) -- (b01) -- (b11) -- (b10) -- cycle; \draw [ decorate, decoration={brace,amplitude=3mm,raise=1mm} ] (a01) -- (a11) node[midway,above,yshift=4mm]{$\mathcal{I}_\text{win}^{(\ell)}$}; \draw [ decorate, decoration={brace,amplitude=3mm,raise=1mm} ] (a00 -| b00) -- (a00) node[midway,below,yshift=-4mm]{$\mathcal{I}_\text{commit}^{(\ell)}$}; \draw [ decorate, decoration={brace,amplitude=3mm,raise=1mm} ] (a00) -- (a01) node[midway,xshift=-3mm,left]{$\mathcal{J}_\text{win}^{(\ell)}$}; \draw [ decorate, decoration={brace,amplitude=3mm,raise=1mm} ] (a11) -- (a11 |- b11) node[midway,xshift=3mm,right]{$\mathcal{J}_\text{commit}^{(\ell)}$}; \draw [ decorate, decoration={brace,amplitude=3mm,raise=1mm} ] (a11 |- b11) -- (a10) node[midway,xshift=3mm,right]{$\mathcal{J}_\text{overlap}^{(\ell)} := \mathcal{J}_\text{win}^{(\ell)} \setminus \mathcal{J}_\text{commit}^{(\ell)}$}; \draw [ decorate, decoration={brace,amplitude=3mm,raise=1mm} ] (a10) -- (a00 -| b00) node[midway,yshift=-8.25mm,xshift=-8mm,right]{$\mathcal{I}_\text{overlap}^{(\ell)} := \mathcal{I}_\text{win}^{(\ell)} \setminus \mathcal{I}_\text{commit}^{(\ell)}$}; \node[align=center] at ($(a00)!0.5!(b01)$) {% $\bm{H}_\text{overlap}^{(\ell)}$ \\[3mm] $= \left(\bm{H}_\text{DEM}\right)_{\mathcal{J}_\text{overlap}^{(\ell)}, \mathcal{I}_\text{commit}^{(\ell)}}$% }; \end{tikzpicture} \caption{ Visual representation of the index sets used to define a sliding window. The solid box delimits the rows ($\mathcal{J}_\text{win}^{(\ell)}$) and columns ($\mathcal{I}_\text{win}^{(\ell)}$) of the detector error matrix considered when decoding window $\ell$, while the dashed box shows the analogous region for window $\ell + 1$. The shaded region marks the submatrix $\bm{H}_\text{overlap}^{(\ell)}$, whose rows correspond to the overlap CNs $\mathcal{J}_\text{overlap}^{(\ell)}$ shared with the next window, and whose columns correspond to the committed VNs $\mathcal{I}_\text{commit}^{(\ell)}$. After decoding window $\ell$, this submatrix is used to update the syndrome of the overlap CNs based on the committed bit estimates. } \label{fig:vis_rep} \end{figure} % Syndrome update \Cref{fig:vis_rep} illustrates the meaning of the various sets of nodes. We can also see a subtlety we must handle carefully when moving on to decode the next window. While the \acp{vn} in $\mathcal{J}_\text{commit}^{(\ell)}$ have no bearing on the further decoding process, the values commited for the \acp{vn} in $\mathcal{I}_\text{commit}^{(\ell)}$ do. This is the case because these \acp{vn} have neighboring \acp{cn} in the next window. The part of the detector error matrix $\bm{H}_\text{DEM}$ describing these connections is $\bm{H}_\text{overlap}^{(\ell)} = \left(\bm{H}_\text{DEM}\right)_{\mathcal{J}_\text{overlap}^{(\ell)}, \mathcal{I}_\text{commit}^{(\ell)}}$. We have to account for this fact by updating the syndrome $\bm{s}$ based on the committed bit values. Specifically, if $\hat{\bm{e}}_\text{commit}^{(\ell)}$ describes the error estimates committed after decoding window $\ell$, we have to set \begin{align*} \left(\bm{s}\right)_{\mathcal{J}_\text{overlap}^{(\ell)}} = \bm{H}_\text{overlap}^{(\ell)} \left( \hat{\bm{e}}_\text{commit}^{(\ell)} \right)^\text{T} .% \end{align*} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{Warm-Start Sliding-Window Decoding} \label{sec:warm_start_bp} % Intro: Problem with above procedure The sliding-window structure visible in \Cref{fig:windowing_pcm} is highly reminiscent of windowed decoding for \ac{sc}-\ac{ldpc} codes. Switching our viewpoint to the Tanner graph depicted in \Cref{fig:messages_decimation_tanner}, however, we can see an important difference between \ac{sc}-\ac{ldpc} decoding and the sliding-window decoding procedure detailed above. While the windowing process is similar, the algorithm above reinitializes the decoder to start from a clean state when moving to the next window. It therefore does not make use of the integral property of windowed \ac{sc}-\ac{ldpc} decoding of exploiting the spatially coupled structure by passing soft information from earlier to later spatial positions. % Passing messages requires messages The act of passing messages from one window to the next requires there being messages after completing decoding of one window that are still relevant to the decoding of the next. This may somewhat limit the variety of \emph{inner decoders}, i.e., the decoders decoding the individual windows, the warm-start initialization can be used with. E.g., \ac{bp}+\ac{osd} does not immediately seem suitable, though this remains to be investigated. We chose to investigate first plain \ac{bp} due to its simplicity and then \ac{bpgd} because of the availability of recently computed messages. % TODO: Include this? % \content{Mention that our own work ties into the bottom category in % \Cref{fig:literature}} %%%%%%%%%%%%%%%% \subsection{Warm Start For Belief Propagation Decoding} \label{subsec:Warm-Start Belief Propagation} \begin{figure}[t] \centering \tikzset{ VN/.style={ circle, fill=KITgreen, minimum width=1mm, minimum height=1mm, }, CN/.style={ rectangle, fill=KITblue, minimum width=1mm, minimum height=1mm, }, } \begin{tikzpicture}[node distance = 5mm] \node[VN] (vn00) {}; \node[VN, below = of vn00] (vn01) {}; \node[VN, below = of vn01] (vn02) {}; \node[VN, below = of vn02] (vn03) {}; \node[VN, below = of vn03] (vn04) {}; \coordinate (temp) at ($(vn01)!0.5!(vn02)$); \node[CN, left =10mm of temp] (cn00) {}; \node[CN, below = of cn00] (cn01) {}; \draw (vn00) -- (cn00); \draw (vn01) -- (cn00); \draw (vn03) -- (cn00); \draw (vn01) -- (cn01); \draw (vn02) -- (cn01); \draw (vn04) -- (cn01); \foreach \i in {1,2,3,4} { \pgfmathtruncatemacro{\prev}{\i-1} \node[VN, right = 25mm of vn\prev 0] (vn\i0) {}; \node[VN, below = of vn\i0] (vn\i1) {}; \node[VN, below = of vn\i1] (vn\i2) {}; \node[VN, below = of vn\i2] (vn\i3) {}; \node[VN, below = of vn\i3] (vn\i4) {}; \coordinate (temp) at ($(vn\i1)!0.5!(vn\i2)$); \node[CN, left = 10mm of temp] (cn\i0) {}; \node[CN, below = of cn\i0] (cn\i1) {}; \draw (vn\i0) -- (cn\i0); \draw (vn\i1) -- (cn\i0); \draw (vn\i3) -- (cn\i0); \draw (vn\i1) -- (cn\i1); \draw (vn\i2) -- (cn\i1); \draw (vn\i4) -- (cn\i1); } \foreach \i in {1,2,3,4} { \pgfmathtruncatemacro{\prev}{\i-1} \draw (vn\prev 3) -- (cn\i 0); \draw (vn\prev 4) -- (cn\i 1); } \node[ draw, inner sep=5mm,line width=1pt, fit=(vn00)(vn04)(cn00)(cn01)(vn20)(vn24)(cn20)(cn21) ] (box1) {}; \node[ draw, dashed, inner sep=5mm, inner ysep=8mm,line width=1pt, fit=(vn10)(vn14)(cn10)(cn11)(vn30)(vn34)(cn30)(cn31) ] (box2) {}; \draw[KITorange, line width=2pt] (cn10) -- (vn10); \draw[KITorange, line width=2pt] (cn10) -- (vn11); \draw[KITorange, line width=2pt] (cn10) -- (vn13); \draw[KITorange, line width=2pt] (cn11) -- (vn11); \draw[KITorange, line width=2pt] (cn11) -- (vn12); \draw[KITorange, line width=2pt] (cn11) -- (vn14); \draw[KITorange, line width=2pt] (vn13) -- (cn20); \draw[KITorange, line width=2pt] (vn14) -- (cn21); \draw[KITorange, line width=2pt] (cn20) -- (vn20); \draw[KITorange, line width=2pt] (cn20) -- (vn21); \draw[KITorange, line width=2pt] (cn20) -- (vn23); \draw[KITorange, line width=2pt] (cn21) -- (vn21); \draw[KITorange, line width=2pt] (cn21) -- (vn22); \draw[KITorange, line width=2pt] (cn21) -- (vn24); % Marker for W on the bottom \draw[line width=1pt] ([yshift=-5mm, line width=1pt]box1.south west) -- ++(0,-4mm) coordinate (dim1l); \draw[line width=1pt] ([yshift=-5mm]box1.south east) -- ++(0,-4mm) coordinate (dim1r); \draw[{Latex}-{Latex}, line width=1pt] ([yshift=1mm]dim1l) -- ([yshift=1mm]dim1r) node[midway, below=2pt] {$W$}; % Marker for F on top \draw[line width=1pt] ([yshift=3mm]box2.north west) -- ++(0,4mm) coordinate (dim3l); \draw[line width=1pt] ([yshift=3mm]box2.north west -| box1.north west) -- ++(0,4mm) coordinate (dim3r); \draw[{Latex}-{Latex}, line width=1pt] ([yshift=-1mm]dim3l) -- ([yshift=-1mm]dim3r) node[midway, above=2pt] {$F$}; % Arrow on the top right \draw[-{Latex}, line width=1pt] ([yshift=8mm] box1.north east) -- ++(28mm,0); \end{tikzpicture} \caption{ \red{Visualization of the messages used for the initialization of the next window under BP decoding.} \Acfp{vn} are represented using green circles while \acfp{cn} are represented using blue squares. } \label{fig:messages_tanner} \end{figure} % Proposed modification: Overview We propose a modification to the procedure detailed in \Cref{subsec:Window Splitting and Sequential Sliding-Window Decoding}: Instead of zero-initializing the \ac{bp} messages of the next window, we perform a \emph{warm start} by initializing the messages in the overlapping region to the values last held during the decoding of the previous window. % Practical realization: Problem with naive approach To see how we realize this in practice, we reiterate the steps of the \ac{bp} algorithm \begin{align} \label{eq:init} \text{Initialization: } & L_{i \rightarrow j} = \tilde{L}_i \\[3mm] \text{\ac{cn} Update (SPA): }& \displaystyle L_{i \leftarrow j} = 2\cdot(-1)^{s_j}\cdot\tanh^{-1} \!\left( \prod_{i' \in \mathcal{N}_\text{C}(j)\setminus\{i\}} \tanh\frac{L_{i'\rightarrow j}}{2} \right) \\[3mm] \text{\ac{cn} Update (Min-Sum): }& \displaystyle L_{i \leftarrow j} = (-1)^{s_j}\cdot \prod_{i' \in \mathcal{N}(j)\setminus \{i\}} \sign \left( L_{i' \rightarrow j} \right) \cdot \min_{i' \in \mathcal{N}(j)\setminus \{i\}} \lvert L_{i'\rightarrow j} \rvert \\[3mm] \label{eq:vn_update} \text{\ac{vn} Update: } & \displaystyle L_{i \rightarrow j} = \tilde{L}_i + \sum_{j' \in \mathcal{N}_\text{V}(i)\setminus\{j\}} L_{i \leftarrow j'} \end{align} and turn our attention to \Cref{fig:messages_tanner}. We consider the right-most boundary of the first window, drawn with a solid line. The fact that we partition the overall Tanner graph at this location, i.e., with the last nodes of the last window being \acp{vn} and the first nodes of the next window being \acp{cn}, is due to the windowing construction detailed in \Cref{subsec:Window Splitting and Sequential Sliding-Window Decoding}. We consider the edges connecting the last set of \acp{vn} still in the first window to the next set of \acp{cn}. These edges are the routes along which information is transferred to subsequent spatial positions, in the form of the \ac{vn} to \ac{cn} messages $L_{i\rightarrow j}$. Note that these edges are not considered during the decoding the first window, since they leave its bounds. Consequently, no messages have been computed for these when the decoding of the first window completes. This means that simply initializing the edges in the overlap region with the exising $L_{i\rightarrow j}$ messages and starting the decoding of the next window with a \ac{cn} update is not enough. % Practical realization: working approach We can resolve this issue by initializing the edges using the existing \ac{cn} to \ac{vn} messages $L_{i\leftarrow j}$ and beginning the decoding of the next window with a \ac{vn} update instead. This way, we recompute the existing $L_{i\rightarrow j}$ messages and additionally compute the messages crossing the window boundary. We can then continue decoding the next window as usual. % Practical realization: Simplification of algorithm We can further simplify the algorithm. Looking carefully at \Cref{eq:vn_update} we notice that when the \ac{cn} to \ac{vn} messages $L_{i\leftarrow j}$ have been zero-initialized, the \ac{vn} update degenerates to \begin{align*} \displaystyle L_{i \rightarrow j} = \tilde{L}_i + \sum_{j' \in \mathcal{N}_\text{V}(i)\setminus\{j\}} L_{i \leftarrow j'} = \tilde{L}_i ,% \end{align*} i.e., the \ac{vn} update \Cref{eq:vn_update} becomes the same as the initialization step \Cref{eq:init}. We conclude that as long as we zero-initialize the $L_{i\leftarrow j}$ messages, there is no need for a separate initialization step. \Cref{alg:warm_start_bp} shows the full warm-start sliding-window decoding algorithm using \ac{bp} as the inner decoder for the windows. Note that the decoding procedure performed on the individual windows (lines 4-8 in \Cref{alg:warm_start_bp}) is functionally equivalent to \Cref{alg:syndome_bp} when using the \acf{spa} variant of \ac{bp}. % tex-fmt: off \tikzexternaldisable \begin{algorithm}[t] \caption{Sliding-window belief propagation (BP) decoding algorithm with warm start.} \label{alg:warm_start_bp} \begin{algorithmic}[1] \State \textbf{Initialize:} $\hat{\bm{e}}^\text{total} \leftarrow \bm{0}$ \State \textbf{Initialize:} $L_{i\leftarrow j} = 0 ~\forall~ i\in \mathcal{I}, j\in \mathcal{J}$ \For{$\ell = 0, \ldots, n_\text{win}-1$} \For{$\nu = 0, \ldots, n_\text{iter}-1$} \State Perform \ac{vn} update for window $\ell$ \State Perform \ac{cn} update for window $\ell$ \State Compute $\hat{\bm{e}}^{(\ell)}$ and check early termination condition \EndFor \State $\displaystyle\left(\hat{\bm{e}}^\text{total}\right)_{\mathcal{I}^{(\ell)}_\text{commit}} \leftarrow \hat{\bm{e}}^{(\ell)}_\text{commit}$ \State $\displaystyle\left(\bm{s}\right)_{\mathcal{J}_\text{overlap}^{(\ell)}} \leftarrow \bm{H}_\text{overlap}^{(\ell)} \left( \hat{\bm{e}}_\text{commit}^{(\ell)} \right)^\text{T}$ \If{$\ell < n_\text{win} - 1$} \State $L^{(\ell+1)}_{i\leftarrow j} \leftarrow L^{(\ell)}_{i\leftarrow j} ~\forall~ i \in \mathcal{I}_\text{overlap}^{(\ell)}, j \in \mathcal{J}_\text{overlap}^{(\ell)}$ \EndIf \EndFor \State \textbf{return} $\hat{\bm{e}}$ \end{algorithmic} \end{algorithm} \tikzexternalenable % tex-fmt: on %%%%%%%%%%%%%%%% \subsection{Warm Start for Belief Propagation with Guided Decimation Decoding} \label{subsec:Warm-Start Belief Propagation with Guided Decimation Decoding} % Intro: Recap of BPGD We now direct our attention at using \ac{bpgd} as an inner decoder. Recall that for \ac{bpgd}, after a number $T \in \mathbb{N}$ of iterations we decimate the most reliable \ac{vn}, meaning we perform a hard decision and remove it from the following decoding process. This means that when moving from one window to the next, we now have more information available: not just the \ac{bp} messages but also the information about what \acp{vn} were decimated and to what values. We call this \emph{decimation information} in the following. We can extend \Cref{alg:warm_start_bp} by additionally passing the decimation information after initializing the \ac{cn} to \ac{vn} messages. \Cref{fig:messages_decimation_tanner} visualizes this process. % TODO: Do this in the fundamentals chapter. Then write a proper % algorithm for warm-start sliding-window decoding with BPGD as well %\content{(?) Explicitly mention decimation info = channel llrs?} \begin{figure}[t] \centering \tikzset{ VN/.style={ circle, fill=KITgreen, minimum width=1mm, minimum height=1mm, }, CN/.style={ rectangle, fill=KITblue, minimum width=1mm, minimum height=1mm, }, } \begin{tikzpicture}[node distance = 5mm] \node[VN] (vn00) {}; \node[VN, below = of vn00] (vn01) {}; \node[VN, below = of vn01] (vn02) {}; \node[VN, below = of vn02] (vn03) {}; \node[VN, below = of vn03] (vn04) {}; \coordinate (temp) at ($(vn01)!0.5!(vn02)$); \node[CN, left =10mm of temp] (cn00) {}; \node[CN, below = of cn00] (cn01) {}; \draw (vn00) -- (cn00); \draw (vn01) -- (cn00); \draw (vn03) -- (cn00); \draw (vn01) -- (cn01); \draw (vn02) -- (cn01); \draw (vn04) -- (cn01); \foreach \i in {1,2,3,4} { \pgfmathtruncatemacro{\prev}{\i-1} \node[VN, right = 25mm of vn\prev 0] (vn\i0) {}; \node[VN, below = of vn\i0] (vn\i1) {}; \node[VN, below = of vn\i1] (vn\i2) {}; \node[VN, below = of vn\i2] (vn\i3) {}; \node[VN, below = of vn\i3] (vn\i4) {}; \coordinate (temp) at ($(vn\i1)!0.5!(vn\i2)$); \node[CN, left = 10mm of temp] (cn\i0) {}; \node[CN, below = of cn\i0] (cn\i1) {}; \draw (vn\i0) -- (cn\i0); \draw (vn\i1) -- (cn\i0); \draw (vn\i3) -- (cn\i0); \draw (vn\i1) -- (cn\i1); \draw (vn\i2) -- (cn\i1); \draw (vn\i4) -- (cn\i1); } \foreach \i in {1,2,3,4} { \pgfmathtruncatemacro{\prev}{\i-1} \draw (vn\prev 3) -- (cn\i 0); \draw (vn\prev 4) -- (cn\i 1); } \node[ draw, inner sep=5mm,line width=1pt, fit=(vn00)(vn04)(cn00)(cn01)(vn20)(vn24)(cn20)(cn21) ] (box1) {}; \node[ draw, dashed, inner sep=5mm, inner ysep=8mm,line width=1pt, fit=(vn10)(vn14)(cn10)(cn11)(vn30)(vn34)(cn30)(cn31) ] (box2) {}; \draw[KITorange, line width=2pt] (cn10) -- (vn10); \draw[KITorange, line width=2pt] (cn10) -- (vn11); \draw[KITorange, line width=2pt] (cn10) -- (vn13); \draw[KITorange, line width=2pt] (cn11) -- (vn11); \draw[KITorange, line width=2pt] (cn11) -- (vn12); \draw[KITorange, line width=2pt] (cn11) -- (vn14); \draw[KITorange, line width=2pt] (vn13) -- (cn20); \draw[KITorange, line width=2pt] (vn14) -- (cn21); \draw[KITorange, line width=2pt] (cn20) -- (vn20); \draw[KITorange, line width=2pt] (cn20) -- (vn21); \draw[KITorange, line width=2pt] (cn20) -- (vn23); \draw[KITorange, line width=2pt] (cn21) -- (vn21); \draw[KITorange, line width=2pt] (cn21) -- (vn22); \draw[KITorange, line width=2pt] (cn21) -- (vn24); \node[VN, draw=KITorange, fill=KITorange] at (vn10) {}; \node[VN, draw=KITorange, fill=KITorange] at (vn11) {}; \node[VN, draw=KITorange, fill=KITorange] at (vn12) {}; \node[VN, draw=KITorange, fill=KITorange] at (vn13) {}; \node[VN, draw=KITorange, fill=KITorange] at (vn14) {}; \node[VN, draw=KITorange, fill=KITorange] at (vn20) {}; \node[VN, draw=KITorange, fill=KITorange] at (vn21) {}; \node[VN, draw=KITorange, fill=KITorange] at (vn22) {}; \node[VN, draw=KITorange, fill=KITorange] at (vn23) {}; \node[VN, draw=KITorange, fill=KITorange] at (vn24) {}; % Marker for W on the bottom \draw[line width=1pt] ([yshift=-5mm, line width=1pt]box1.south west) -- ++(0,-4mm) coordinate (dim1l); \draw[line width=1pt] ([yshift=-5mm]box1.south east) -- ++(0,-4mm) coordinate (dim1r); \draw[{Latex}-{Latex}, line width=1pt] ([yshift=1mm]dim1l) -- ([yshift=1mm]dim1r) node[midway, below=2pt] {$W$}; % Marker for F on top \draw[line width=1pt] ([yshift=3mm]box2.north west) -- ++(0,4mm) coordinate (dim3l); \draw[line width=1pt] ([yshift=3mm]box2.north west -| box1.north west) -- ++(0,4mm) coordinate (dim3r); \draw[{Latex}-{Latex}, line width=1pt] ([yshift=-1mm]dim3l) -- ([yshift=-1mm]dim3r) node[midway, above=2pt] {$F$}; % Arrow on the top right \draw[-{Latex}, line width=1pt] ([yshift=8mm] box1.north east) -- ++(28mm,0); \end{tikzpicture} \caption{ \red{Visualization of the messages and decimation information used for the initialization of the next window under \ac{bpgd} decoding}. \Acfp{vn} are represented using green circles while \acfp{cn} are represented using blue squares. } \label{fig:messages_decimation_tanner} \end{figure} % % tex-fmt: off % \tikzexternaldisable % \begin{algorithm}[t] % \caption{Sliding-window decoding algorithm with warm start for generic inner decoder.} % \label{alg:warm_start_general} % \begin{algorithmic}[1] % \State \textbf{Initialize:} $\hat{\bm{e}}^\text{total} \leftarrow \bm{0}$ % \State \textbf{Initialize:} $L_{i\leftarrow j} = 0 % ~\forall~ i\in \mathcal{I}, j\in \mathcal{J}$ % \For{$\ell = 0, \ldots, n_\text{win}-1$} % \State Obtain $\hat{\bm{e}}^{(\ell)}$ from inner decoder % \State $\displaystyle\left(\hat{\bm{e}}^\text{total}\right)_{\mathcal{I}^{(\ell)}_\text{commit}} \leftarrow \hat{\bm{e}}^{(\ell)}_\text{commit}$ % \State $\displaystyle\left(\bm{s}\right)_{\mathcal{J}_\text{overlap}^{(\ell)}} % \leftarrow \bm{H}_\text{overlap}^{(\ell)} % \left( \hat{\bm{e}}_\text{commit}^{(\ell)} \right)^\text{T}$ % \If{$\ell < n_\text{win} - 1$} % \State $L^{(\ell+1)}_{i\leftarrow j} \leftarrow % L^{(\ell)}_{i\leftarrow j} % ~\forall~ i \in \mathcal{I}_\text{overlap}^{(\ell)}, % j \in \mathcal{J}_\text{overlap}^{(\ell)}$ % \EndIf % \EndFor % \State \textbf{return} $\hat{\bm{e}}$ % \end{algorithmic} % \end{algorithm} % \tikzexternalenable % % tex-fmt: on % % \content{Make algorithm 4 specific to BPGD?} \section{Numerical Results} \label{sec:Numerical Results} % Intro In this section, we perform numerical experiments to evaluate the modification to sliding-window decoding we introduced in \Cref{sec:warm_start_bp}. For the practical aspects of implementation, several layers of abstraction must be considered. % Software stack: Layer 1 The lowest layer is the circuit-level simulator. This serves as the backbone of all further simulations, handling the quantum mechanical aspects of the system, including the modeling of noise on gates, idling qubits, and measurements according to the chosen noise model. % Software stack: Layer 2 Moving one level of abstraction higher, the syndrome extraction circuit itself must be generated. This entails constructing the full circuit, including the ancilla measurements and the error locations introduced by the chosen noise model, both of which depend on the code and noise model in question. % Software stack: Layer 3 Even further up, given an already constructed syndrome extraction circuit and the resulting \acf{dem}, we must split the detector error matrix into separate windows and manage the interplay between the inner decoders acting on those individual windows. % Software stack: Layer 4 Finally, we require the decoder itself, which operates on a \acf{pcm} and a syndrome, with no dependence on the complexity of the layers below. % Software stack: Tools In our implementation, Stim \cite{gidney_stim_2021} served as the circuit-level simulator, chosen for its efficiency and native support for the \ac{dem} formalism. For the circuit generation, we employed utilities from QUITS \cite{kang_quits_2025}, which provides syndrome extraction circuitry generation for a number of different \ac{qldpc} codes. We initially created a Python implementation, which used QUITS for the window splitting and subsequent sliding-window decoding as well. The \ac{bp} and \ac{bpgd} decoders were also initially implemented in Python. After a preliminary investigation, we opted for a complete reimplementation in Rust to achieve higher simulation speeds leveraging the compiled nature of the language. We reimplemented both the window splitting and the decoders. % Global experimental setup We chose to carry out our simulations on \ac{bb} codes, as they have recently emerged as particularly promising candidates for practical \ac{qec}, offering high encoding rates and large minimum distances while admitting short-depth syndrome extraction circuits \cite[Sec.~1]{bravyi_high-threshold_2024}. Specifically, we chose the $\llbracket 144, 12, 12 \rrbracket$ BB code, as it represents a good trade-off between code size and simulation tractability. For the generation of the \ac{dem} we set the number of syndrome extraction rounds to $12$, similarly to \cite{gong_toward_2024}, and we defined our detectors as in the example in \Cref{subsec:Detector Error Matrix}. We employed circuit-lose noise as described in \Cref{subsec:Choice of Noise Model} as our noise model, specifically standard ciruit-based depolarizing noise \cite[Sec.~VIII]{fowler_high-threshold_2009}, i.e., all error locations in the circuit get assigned the same physical error probability. We report performance in terms of the per-round \ac{ler} as defined in \Cref{subsec:Per-Round Logical Error Rate} and all datapoints were generated by simulating at least $200$ logical error events. %%%%%%%%%%%%%%%% \subsection{Belief Propagation} \label{subsec:Belief Propagation} % Local experimental setup We began our investigation by using \ac{bp} with no further modifications as the inner decoder. We chose the min-sum variant of \ac{bp} due to its low computational complexity. % [Thread] Get impression for max gain We initially wanted to gain an impression for the performance gain we could expect from a modification to the sliding-window decoding procedure. To this end, we began by analyzing the decoding performance of the original process, without our warm-start modification. We will call this \emph{cold-start} decoding in the following. Because we expected more global decoding to work better (the inner decoder then has access to a larger portion of the long-range correlations encoded in the detector error matrix before any commit is made) we initially decided to use decoding on the whole detector error matrix as a proxy for the attainable decoding performance. \begin{figure}[t] \centering \begin{tikzpicture} \begin{axis}[ width=\figwidth, height=\figheight, ymode=log, legend style={ cells={anchor=west}, cells={align=left}, }, enlargelimits=false, ymin=1e-5, ymax=2e-1, grid=both, legend pos = south east, xtick={0.001,0.0015,...,0.004}, xticklabel style={/pgf/number format/fixed}, xticklabel style={/pgf/number format/precision=4}, scaled x ticks=false, xlabel={Physical error rate}, ylabel={Per-round-LER}, ] \foreach \W/\col/\mark in {3/KITred/triangle*,4/KITblue/diamond*,5/KITorange/square*} { \edef\temp{\noexpand \addplot+[mark=\mark, solid, mark options={fill=\col}, \col] table[ col sep=comma, x=physical_p, y=LER_per_round, ] {res/sim/WF/WindowingSyndromeMinSumDecoder/max_iter_200/pass_soft_info_False/F_1/W_\W/LERs.csv}; } \temp \addlegendentryexpanded{$W = \W$} } \addplot+[mark=*, solid, mark options={fill=black}, black] table[ col sep=comma, x=physical_p, y=LER_per_round, ] {res/sim/whole/SyndromeMinSumDecoder/max_iter_200/LERs.csv}; \addlegendentry{Whole} \end{axis} \end{tikzpicture} \caption{ \red{\lipsum[2]} } \label{fig:whole_vs_cold} \end{figure} % [Experimental parameters] Figure 4.6 \Cref{fig:whole_vs_cold} shows the simulation results for this initial investigation. The three colored curves correspond to cold-start sliding-window decoding with window sizes $W \in \{3, 4, 5\}$, all with the step size fixed to $F = 1$, while the black curve gives the per-round \ac{ler} obtained when decoding on the whole detector error matrix at once. In all cases, the inner \ac{bp} decoder was allowed a maximum of $200$ iterations, and the physical error rate was swept from $p = 0.001$ to $p = 0.004$ in steps of $0.0005$. % [Description] Figure 4.6 Across the entire range of physical error rates, all curves exhibit the expected monotonic increase in logical error rate with increasing physical noise. The $W = 3$ decoder consistently yields the highest LER, performing roughly an order of magnitude worse than the baseline at low physical error rates. Increasing the window size to $W = 4$ substantially closes this gap, and the $W = 5$ curve nearly coincides with the whole-block decoder across the full range of physical error rates. % [Interpretation] Figure 4.6 This behavior is consistent with the intuition behind sliding-window decoding. The detector error matrix encodes correlations between detection events that span the full syndrome extraction history, so errors lying in the commit region of an early window are in general constrained by check nodes that only become visible in subsequent windows. Larger windows expose the inner decoder to more of these constraints before any commit is made, leading to better-informed decisions and a lower per-round \ac{ler}. Decoding the whole matrix at once represents the limiting case of this trend and, as expected, achieves the strongest performance. The fact that the $W = 5$ curve is already very close to the whole-block decoder indicates that the marginal benefit of enlarging the window saturates after a certain point. From a practical standpoint, the choice of $W$ thus represents a trade-off between decoding latency and accuracy: larger windows delay the start of decoding by requiring more syndrome extraction rounds to be collected upfront, while the diminishing returns above $W = 4$ suggest that growing the window much further yields little additional accuracy in return. % [Thread] First comparison with warm start Next, we additionally generated error rate curves for warm-start sliding-window decoding to assess how much of the gap between cold-start and whole-block decoding can be recovered by our modification. We chose the same window sizes as before, so that the warm- and cold-start curves can be compared directly at matching values of $W$. \begin{figure}[t] \centering \begin{tikzpicture} \begin{axis}[ width=\figwidth, height=\figheight, ymode=log, legend style={ cells={anchor=west}, cells={align=left}, }, enlargelimits=false, ymin=1e-5, ymax=2e-1, grid=both, legend pos = south east, xtick={0.001,0.0015,...,0.004}, xticklabel style={/pgf/number format/fixed}, xticklabel style={/pgf/number format/precision=4}, scaled x ticks=false, xlabel={Physical error rate}, ylabel={Per-round-LER}, extra description/.code={ \node[rotate=90, anchor=south] at ([xshift=10mm]current axis.east) {Warm s. (---), Cold s. (- - -)}; }, ] \foreach \W/\col/\mark in {3/KITred/triangle,4/KITblue/diamond,5/KITorange/square} { \edef\temp{\noexpand \addplot+[ mark=\mark, densely dashed, mark options={fill=\col}, \col, forget plot ] table[ col sep=comma, x=physical_p, y=LER_per_round, ] {res/sim/WF/WindowingSyndromeMinSumDecoder/max_iter_200/pass_soft_info_False/F_1/W_\W/LERs.csv}; } \temp } \foreach \W/\col/\mark in {3/KITred/triangle*,4/KITblue/diamond*,5/KITorange/square*} { \edef\temp{\noexpand \addplot+[mark=\mark, solid, mark options={fill=\col}, \col] table[ col sep=comma, x=physical_p, y=LER_per_round, ] {res/sim/WF/WindowingSyndromeMinSumDecoder/max_iter_200/pass_soft_info_True/F_1/W_\W/LERs.csv}; } \temp \addlegendentryexpanded{$W = \W$} } \addplot+[mark=*, solid, mark options={fill=black}, black] table[ col sep=comma, x=physical_p, y=LER_per_round, ] {res/sim/whole/SyndromeMinSumDecoder/max_iter_200/LERs.csv}; \addlegendentry{Whole} \end{axis} \end{tikzpicture} \caption{ \red{\lipsum[2]} } \label{fig:whole_vs_cold_vs_warm} \end{figure} % [Experimental parameters] Figure 4.7 \Cref{fig:whole_vs_cold_vs_warm} extends the previous comparison by additionally including the warm-start variant of sliding-window decoding. The dashed colored curves reproduce the cold-start results from \Cref{fig:whole_vs_cold}, while the solid colored curves show the corresponding warm-start runs for the same window sizes $W \in \{3, 4, 5\}$. The remaining experimental parameters are unchanged: the step size is fixed to $F = 1$, the inner \ac{bp} decoder is allowed up to $200$ iterations per window invocation, the black curve again gives the whole-block reference, and the physical error rate is swept from $p = 0.001$ to $p = 0.004$ in steps of $0.0005$. % [Description] Figure 4.7 For each window size, the warm-start variant consistently outperforms its cold-start counterpart, with the dashed curves lying above the corresponding solid curves across the entire range of physical error rates. The performance gap between the two approaches is most pronounced for the largest window ($W = 5$) and gradually narrows as the window size decreases. Additionally, the gap between the cold- and warm-start curves generally widens as the physical error rate decreases. % [Interpretation] Figure 4.7 The improvement of warm-start over cold-start decoding matches the motivation for the modification: By reusing already existing messages from the previous window in the overlap region, the next window invocation has additional information at its disposal about the reliability of the \acp{vn} and \acp{cn}. The widening of the gap towards larger window sizes is consistent with this picture, since with $F$ fixed to $1$ the overlap between consecutive windows spans $W - F = W - 1$ syndrome rounds, so larger $W$ implies that more messages are carried over and a larger fraction of the next window starts in a warm state. % TODO: Possibly insert explanation for higher gain at lowre error rates A perhaps surprising observation is that the warm-start curve for $W = 5$ actually lies below the whole-block reference across the entire range of physical error rates, even though warm-start sliding-window decoding is, by construction, more local than whole-block decoding. A possible explanation for this effect is discussed in the following. % [Thread] Warm start is better than whole due to more effective iterations A possible explanation for this surprising behavior lies in the number of \ac{bp} iterations effectively spent on the \acp{vn} inside the overlap region. Each \ac{vn} in such an overlap is processed by multiple consecutive window invocations, and because every new window resumes from the messages left over by its predecessor, these invocations effectively accumulate iterations on the same \acp{vn} rather than restarting from scratch. The whole-block decoder, by contrast, performs only a single run of at most $200$ iterations on the entire detector error matrix, so each of its \acp{vn} receives at most that many iterations. It seems this larger effective iteration budget on the overlap regions can outweigh the loss of globality incurred by windowing. A natural way to test this hypothesis is to raise the maximum number of \ac{bp} iterations of the whole-block decoder until its per-round \ac{ler} saturates. If the above interpretation is correct, the resulting saturation level should constitute a lower bound that no windowed scheme, irrespective of the initialization, can beat, since by construction whole-block decoding has access to the full set of constraints available to any window. \begin{figure}[t] \centering \begin{tikzpicture} \def\spyxmin{32} \def\spyxmax{512} \def\spyymin{5e-3} \def\spyymax{7e-2} \newcommand{\plotcurves}{% \foreach \W/\col/\mark in {3/KITred/triangle,4/KITblue/diamond,5/KITorange/square} { \edef\temp{\noexpand \addplot+[mark=\mark, densely dashed, forget plot, \col] table[ col sep=comma, x=max_iter, y=LER_per_round, ] {res/sim/max_iter/WindowingSyndromeMinSumDecoder/p_0.0025/pass_soft_info_False/F_1/W_\W/LERs.csv}; } \temp } \foreach \W/\col/\mark in {3/KITred/triangle*,4/KITblue/diamond*,5/KITorange/square*} { \edef\temp{\noexpand \addplot+[mark=\mark, solid, mark options={fill=\col}, \col, forget plot] table[ col sep=comma, x=max_iter, y=LER_per_round, ] {res/sim/max_iter/WindowingSyndromeMinSumDecoder/p_0.0025/pass_soft_info_True/F_1/W_\W/LERs.csv}; } \temp } \addplot+[mark=*, solid, mark options={fill=black}, black, forget plot] table[col sep=comma, x=max_iter, y=LER_per_round] {res/sim/max_iter/SyndromeMinSumDecoder/p_0.0025/LERs.csv}; } \begin{axis}[ name=main, width=\figwidth, height=\figheight, ymode=log, enlargelimits=false, ymin=1e-3, ymax=1e-1, grid=both, legend pos=north east, xtick={32, 512, 1024, 1536, 2048, 2560, 3072, 3584, 4096}, xticklabels={$32$,$512$,$1{,}024$,,$2{,}048$,,$3{,}072$,,$4{,}096$}, xticklabel style={/pgf/number format/fixed}, scaled x ticks=false, xlabel={Number of BP iterations}, ylabel={Per-round-LER}, extra description/.code={ \node[rotate=90, anchor=south] at ([xshift=10mm]current axis.east) {Warm s. (---), Cold s. (- - -)}; }, ] \plotcurves \addlegendimage{KITred, mark=triangle*} \addlegendentry{$W = 3$} \addlegendimage{KITblue, mark=diamond*} \addlegendentry{$W = 4$} \addlegendimage{KITorange, mark=square*} \addlegendentry{$W = 5$} \addlegendimage{black, mark=*} \addlegendentry{Whole} \node[draw=black, fit={(axis cs:\spyxmin,\spyymin) (axis cs:\spyxmax,\spyymax)}, inner sep=0pt, name=spybox] {}; \end{axis} \begin{axis}[ name=inset, at={(main.north)}, anchor=south, xshift=0mm, yshift=6mm, width=6.5cm, height=4.875cm, ymode=log, enlargelimits=false, xmin=\spyxmin, xmax=\spyxmax, ymin=\spyymin, ymax=\spyymax, xtick={32,128,256, 512}, yticklabels={\empty}, xticklabels={\empty}, grid=both, axis background/.style={fill=white}, ] \plotcurves \end{axis} \draw (spybox.north east) -- (inset.south west); \end{tikzpicture} \caption{ \red{\lipsum[2]} } \label{fig:bp_w_over_iter} \end{figure} % [Experimental parameters] Figure 4.8 \Cref{fig:bp_w_over_iter} shows the per-round \ac{ler} as a function of the maximum number of \ac{bp} iterations granted to the inner decoders. The dashed colored curves correspond to cold-start sliding-window decoding for $W \in \{3, 4, 5\}$, the solid colored curves to the corresponding warm-start sliding-window decoding, and the black curve to the whole-block reference. The physical error rate is fixed at $p = 0.0025$, the step size at $F = 1$, and the iteration budget is swept over $n_\text{iter} \in \{32, 128, 256, 512, 1024, 2048, 4096\}$. The enlarged plot magnifies the low-iteration regime $n_\text{iter} \in [32, 512]$. % [Description] Figure 4.8 All curves decrease monotonically with the iteration budget, but contrary to our expectation, none of them appears to fully saturate within the swept range: even at $n_\text{iter} = 4096$, every curve still exhibits a noticeable downward slope. At $n_\text{iter} = 32$, the whole-block curve lies below both the $W=4$ and $W=5$ sliding-window curves. At $n_\text{iter} = 128$ the whole-block curve already performs better than the $W=4$ sliding-window curve and at $n_\text{iter} = 512$ the whole-block and warm-start $W = 5$ curves also cross. From this point onwards, the whole-block decoder lies strictly below all windowed schemes, this difference becoming more pronounced as the iteration budget grows further. Within the magnified plot, the gap between the warm-start and cold-start curves at fixed $W$ is largest for the smallest iteration counts and shrinks rapidly as $n_\text{iter}$ grows, and at fixed $n_\text{iter}$ the size of this gap grows with the window size, mirroring the behavior already observed in \Cref{fig:whole_vs_cold_vs_warm}. % [Interpretation] Figure 4.8 These observations are largely consistent with the effective-iterations hypothesis put forward above. The whole-block decoder eventually overtaking every windowed scheme matches the prediction made there: with a sufficiently large iteration budget, the whole-block decoder reaches an error rate that nonone of the windowed schemes can beat, because of the more global nature of the considered constraints. Furthermore, the pronounced advantage of warm- over cold-start decoding at low numbers of iterations makes sense if we consider the overall trend of the plots. At low iteration budgets, each additional iteration is worth more than at high budgets. As the number of permitted iteration increases, the benefit of the additional ``free'' iterations gained due to the the warm-start initialization diminishes, and the curves approach each other. The fact that no curve clearly saturates within the swept range is itself worth noting. We know that \ac{bp} on \ac{qldpc} codes suffers from poor convergence due to the short cycles in the underlying Tanner graph, so even after several thousand iterations the decoder may continue to slowly refine its message estimates rather than settle into a stable fixed point. This is one of the core motivations for moving from plain \ac{bp} to the guided-decimation variant studied in \Cref{subsec:Belief Propagation with Guided Decimation}. Another thing to note is that setting the per-invocation iteration budget of the inner decoder equal to the iteration budget of the whole-block decoder is not a fair comparison in terms of total computational effort. The sliding-window scheme processes each \ac{vn} in an overlap region multiple times and therefore spends more iterations overall. In the context of \ac{qec}, however, the relevant figure of merit is not total compute but decoding latency, and in terms of latency the sliding-window approach is still at an advantage. % [Thread] Exploration of the effect of the step size Having examined the effect of the window size $W$, we next turned to the second windowing parameter, the step size $F$. We carried out an investigation analogous to the one above: we first compared warm- and cold-start decoding across the full range of physical error rates at a fixed iteration budget, and then we examined the dependence on the iteration budget at a fixed physical error rate. The window size was held fixed at $W = 5$ throughout, the value at which the warm-start variant produced the strongest performance in the previous experiments. \begin{figure}[t] \centering \begin{subfigure}{0.48\textwidth} \centering \hspace*{-7mm} \begin{tikzpicture} \begin{axis}[ width=8cm, height=6cm, ymode=log, legend style={ cells={anchor=west}, cells={align=left}, }, enlargelimits=false, ymin=1e-5, ymax=2e-1, grid=both, legend pos = south east, xtick={0.001,0.0015,...,0.004}, xticklabel style={/pgf/number format/fixed}, xticklabel style={/pgf/number format/precision=4}, x tick label style={rotate=45, anchor=north east, inner sep=1mm}, scaled x ticks=false, xlabel={Physical error rate}, ylabel={Per-round-LER}, % extra description/.code={ % \node[rotate=90, anchor=south] % at ([xshift=10mm]current axis.east) % {Warm s. (---), Cold s. (- - -)}; % }, ] \foreach \F/\col/\mark in {3/KITred/triangle,2/KITblue/diamond,1/KITorange/square} { \edef\temp{\noexpand \addplot+[ mark=\mark, densely dashed, mark options={fill=\col}, \col, forget plot ] table[ col sep=comma, x=physical_p, y=LER_per_round, ] {res/sim/WF/WindowingSyndromeMinSumDecoder/max_iter_200/pass_soft_info_False/F_\F/W_5/LERs.csv}; } \temp } \foreach \F/\col/\mark in {3/KITred/triangle*,2/KITblue/diamond*,1/KITorange/square*} { \edef\temp{\noexpand \addplot+[mark=\mark, solid, mark options={fill=\col}, \col] table[ col sep=comma, x=physical_p, y=LER_per_round, ] {res/sim/WF/WindowingSyndromeMinSumDecoder/max_iter_200/pass_soft_info_True/F_\F/W_5/LERs.csv}; } \temp \addlegendentryexpanded{$F = \F$} } \end{axis} \end{tikzpicture} \caption{\red{Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt}} \label{fig:bp_f_over_p} \end{subfigure}% \hfill% \begin{subfigure}{0.48\textwidth} \centering \hspace*{-27mm} \begin{tikzpicture} \def\spyxmin{32} \def\spyxmax{512} \def\spyymin{5e-3} \def\spyymax{5e-2} \newcommand{\plotcurvesb}{% \foreach \F/\col/\mark in {3/KITred/triangle,2/KITblue/diamond,1/KITorange/square} { \edef\temp{\noexpand \addplot+[mark=\mark, densely dashed, forget plot, \col] table[ col sep=comma, x=max_iter, y=LER_per_round, ] {res/sim/max_iter/WindowingSyndromeMinSumDecoder/p_0.0025/pass_soft_info_False/F_\F/W_5/LERs.csv}; } \temp } \foreach \F/\col/\mark in {3/KITred/triangle*,2/KITblue/diamond*,1/KITorange/square*} { \edef\temp{\noexpand \addplot+[mark=\mark, solid, mark options={fill=\col}, \col, forget plot] table[ col sep=comma, x=max_iter, y=LER_per_round, ] {res/sim/max_iter/WindowingSyndromeMinSumDecoder/p_0.0025/pass_soft_info_True/F_\F/W_5/LERs.csv}; } \temp } } \begin{axis}[ name=main, width=8cm, height=6cm, ymode=log, legend style={ cells={anchor=west}, cells={align=left}, }, enlargelimits=false, ymin=1e-3, ymax=1e-1, grid=both, legend pos = north east, xtick={32, 512, 1024, 1536, 2048, 2560, 3072, 3584, 4096}, xticklabels = {$32$, $512$,$1{,}024$,,$2{,}048$,,$3{,}072$,,$4{,}096$}, xticklabel style={/pgf/number format/fixed}, xticklabel style={/pgf/number format/precision=4}, x tick label style={rotate=45, anchor=north east, inner sep=1mm}, scaled x ticks=false, xlabel={Number of BP iterations}, extra description/.code={ \node[rotate=90, anchor=south] at ([xshift=10mm]current axis.east) {Warm s. (---), Cold s. (- - -)}; }, ] \plotcurvesb \addlegendimage{KITorange, mark=square*} \addlegendentry{$F = 1$} \addlegendimage{KITblue, mark=diamond*} \addlegendentry{$F = 2$} \addlegendimage{KITred, mark=triangle*} \addlegendentry{$F = 3$} \node[draw=black, fit={(axis cs:\spyxmin,\spyymin) (axis cs:\spyxmax,\spyymax)}, inner sep=0pt, name=spybox] {}; \end{axis} \begin{axis}[ name=inset, at={(main.north west)}, anchor=south, xshift=-6mm, yshift=6mm, width=6.5cm, height=4.875cm, ymode=log, enlargelimits=false, xmin=\spyxmin, xmax=\spyxmax, ymin=\spyymin, ymax=\spyymax, xtick={32,128,256,512}, yticklabels={\empty}, xticklabels={\empty}, grid=both, axis background/.style={fill=white}, ] \plotcurvesb \end{axis} \draw (spybox.north east) -- (inset.south east); \end{tikzpicture} \vspace{-3.2mm} \caption{\red{Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt}} \label{fig:bp_f_over_iter} \end{subfigure} \caption{ \red{\lipsum[2]} } \label{fig:bp_f} \end{figure} % [Experimental parameters] Figure 4.9 \Cref{fig:bp_f} summarizes the results of this investigation. In both panels the dashed colored curves correspond to cold-start sliding-window decoding for $F \in \{1, 2, 3\}$ and the solid colored curves to the corresponding warm-start runs. The window size is fixed to $W = 5$ throughout. \Cref{fig:bp_f_over_p} sweeps the physical error rate over $p \in [0.001, 0.004]$ in steps of $0.0005$ at a fixed maximum of $n_\text{iter} = 200$ \ac{bp} iterations per window invocation, mirroring the experimental setup of \Cref{fig:whole_vs_cold_vs_warm}. \Cref{fig:bp_f_over_iter} fixes the physical error rate at $p = 0.0025$ and sweeps the iteration budget over $n_\text{iter} \in \{32, 128, 256, 512, 1024, 2048, 4096\}$, mirroring the setup of \Cref{fig:bp_w_over_iter} and again including an inset that magnifies the low-iteration regime $n_\text{iter} \in [32, 512]$. % [Description] Figure 4.9 In \Cref{fig:bp_f_over_p}, every curve exhibits the expected monotonic increase of the per-round \ac{ler} with the physical error rate. At fixed $F$, the warm-start approach lies below cold-start across the entire sweep, and at fixed warm- or cold-start, smaller $F$ produces a lower \ac{ler}. Both gaps grow as the physical error rate decreases: the curves at $F = 1$ separate further from those at $F = 2$ and $F = 3$, and the warm-start curves separate further from the cold-start ones. In \Cref{fig:bp_f_over_iter}, all six curves again decrease monotonically with the iteration budget, with no clear saturation even at $n_\text{iter} = 4096$. Lower $F$ yields a lower \ac{ler} throughout, and warm-start consistently outperforms cold-start at matching $F$. At $n_\text{iter} = 32$, all three cold-start curves coincide at roughly the same per-round \ac{ler}, while the warm-start curves are visibly spread out. Furthermore, the magnified plot confirms that the gap between warm- and cold-start curves at fixed $F$ shrinks as $n_\text{iter}$ grows, and that at fixed $n_\text{iter}$ this gap is largest for $F = 1$. % [Interpretation] Figure 4.9 The observed dependence on the step size mirrors the dependence on the window size studied earlier and the same explanation applies. With $W$ held fixed, decreasing $F$ enlarges the overlap between consecutive windows from $W - F$ to $W - F + 1$ syndrome measurement rounds, so a smaller step size is beneficial for the same reason that a larger window size is: each \ac{vn} in an overlap region participates in more window invocations, and the warm-start modification effectively accumulates iterations on it across these invocations. The widening of the warm/cold gap towards low iteration counts and low physical error rates similarly mirrors the patterns already observed in \Cref{fig:whole_vs_cold_vs_warm,fig:bp_w_over_iter}. In contrast to the window size $W$, the step size $F$ has no effect on decoding latency. The time at which the inner decoder for a given window can begin decoding is determined solely by when the syndromes for the rounds covered by that window have been collected, which is independent of how much the window overlaps with its predecessor. Similarly, assuming the decoder is fast enough to keep up with the incoming syndrome measurements corresponding to the \acp{cn} of subsequent windows, the time at which decoding is complete depends only on the amount of time spent on decoding the very last window. A smaller $F$ thus only costs additional total compute and not additional latency, which is favorable for a warm-start sliding-window implementation. This is especially favorable for our warm-start modification, as it works best where the overlap is largest, i.e., for low values of $F$. % Conclusion of BP investigation We conclude our investigation into the performance of warm-start sliding-window decoding under plain \ac{bp} by summarizing our findings. The warm-start modification raises the number of \ac{bp} iterations effectively spent on the \acp{vn} in an overlap region by reusing the messages from the previous window invocation instead of restarting from scratch. This explains why decoding performance improved monotonically with the size of the overlap, and consequently why both larger window sizes $W$ and smaller step sizes $F$ yielded lower per-round \acp{ler}. The warm-start gain over cold-start was most pronounced at low per-window iteration budgets, % and at low physical error rates, the % regimes the regime in which each additional iteration carries proportionally more information. Additionally, we would like to note that the warm-start modification incurs no computational cost relative to cold-start decoding. It changes neither the decoding latency nor the total compute, since both schemes process the same windows for the same number of iterations and differ only in the initialization of the \ac{bp} messages of each new window. We also observed that plain \ac{bp} did not saturate even at $4096$ iterations, which we attribute to the short cycles in the underlying Tanner graph. This motivates the next subsection, in which we replace the inner \ac{bp} decoder by its guided-decimation variant. %%%%%%%%%%%%%%%% \subsection{Belief Propagation with Guided Decimation} \label{subsec:Belief Propagation with Guided Decimation} % [Thread] Intro to BPGD + Local experimental setup We now turn to \ac{bpgd} as the inner decoder, in order to address the convergence issues of plain \ac{bp} on \ac{qec} codes. For the underlying \ac{bp} step we use the \ac{spa} variant rather than the min-sum approximation employed in \Cref{subsec:Belief Propagation}, since this made the implementation of the guided decimation more straightforward. \begin{figure}[t] \centering \hspace*{-6mm} \begin{subfigure}{0.5\textwidth} \centering \begin{tikzpicture} \begin{axis}[ width=8cm, height=6cm, ymode=log, legend style={ cells={anchor=west}, cells={align=left}, }, enlargelimits=false, ymin=1e-5, ymax=2e-1, grid=both, legend pos = south east, xtick={0.001,0.0015,...,0.004}, xticklabel style={/pgf/number format/fixed}, xticklabel style={/pgf/number format/precision=4}, x tick label style={rotate=45, anchor=north east, inner sep=1mm}, scaled x ticks=false, xlabel={Physical error rate}, ylabel={Per-round-LER}, % extra description/.code={ % \node[rotate=90, anchor=south] % at () % {Warm s. (---), Cold s. (- - -)}; % }, ] \foreach \W/\col/\mark in {3/KITred/triangle,4/KITblue/diamond,5/KITorange/square} { \edef\temp{\noexpand \addplot+[ mark=\mark, densely dashed, mark options={fill=\col}, \col, forget plot ] table[ col sep=comma, x=physical_p, y=LER_per_round, ] {res/sim/WF/WindowingSyndromeSpaGdDecoder/max_iter_5000/pass_soft_info_False/F_1/W_\W/LERs.csv}; } \temp } \foreach \W/\col/\mark in {3/KITred/triangle*,4/KITblue/diamond*,5/KITorange/square*} { \edef\temp{\noexpand \addplot+[mark=\mark, solid, mark options={fill=\col}, \col] table[ col sep=comma, x=physical_p, y=LER_per_round, ] {res/sim/WF/WindowingSyndromeSpaGdDecoderPassDecimation/max_iter_5000/pass_soft_info_True/F_1/W_\W/LERs.csv}; } \temp \addlegendentryexpanded{$W = \W$} } \end{axis} \end{tikzpicture} \caption{\red{Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt}} \label{fig:bpgd_w} \end{subfigure}% \hfill% \begin{subfigure}{0.5\textwidth} \centering \begin{tikzpicture} \begin{axis}[ width=8cm, height=6cm, ymode=log, legend style={ cells={anchor=west}, cells={align=left}, }, enlargelimits=false, ymin=1e-5, ymax=2e-1, grid=both, legend pos = south east, xtick={0.001,0.0015,0.002,...,0.004}, xticklabel style={/pgf/number format/fixed}, xticklabel style={/pgf/number format/precision=4}, x tick label style={rotate=45, anchor=north east, inner sep=1mm}, scaled x ticks=false, xlabel={Physical error rate}, yticklabels={\empty}, % ylabel={Per-round-LER}, extra description/.code={ \node[rotate=90, anchor=south] at ([xshift=10mm]current axis.east) {Warm s. (---), Cold s. (- - -)}; }, ] \foreach \F/\col/\mark in {3/KITred/triangle,2/KITblue/diamond,1/KITorange/square} { \edef\temp{\noexpand \addplot+[ mark=\mark, densely dashed, mark options={fill=\col}, \col, forget plot ] table[ col sep=comma, x=physical_p, y=LER_per_round, ] {res/sim/WF/WindowingSyndromeSpaGdDecoder/max_iter_5000/pass_soft_info_False/F_\F/W_5/LERs.csv}; } \temp } \foreach \F/\col/\mark in {3/KITred/triangle*,2/KITblue/diamond*,1/KITorange/square*} { \edef\temp{\noexpand \addplot+[mark=\mark, solid, mark options={fill=\col}, \col] table[ col sep=comma, x=physical_p, y=LER_per_round, ] {res/sim/WF/WindowingSyndromeSpaGdDecoderPassDecimation/max_iter_5000/pass_soft_info_True/F_\F/W_5/LERs.csv}; } \temp \addlegendentryexpanded{$F = \F$} } \end{axis} \end{tikzpicture} \caption{\red{Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt}} \label{fig:bpgd_f} \end{subfigure} \caption{ \red{\lipsum[2]} } \label{fig:bpgd_wf} \end{figure} % [Experimental parameters] Figure 4.10 \Cref{fig:bpgd_wf} shows the per-round \ac{ler} of \ac{bpgd} sliding-window decoding as a function of the physical error rate. In both panels the dashed curves correspond to cold-start sliding-window decoding and the solid curves to the corresponding warm-start decoding, where the warm start carries over both the \ac{bp} messages and the decimation information of the overlap region as described in \Cref{subsec:Warm-Start Belief Propagation with Guided Decimation Decoding}. The maximum number of inner \ac{bp} iterations was set to $n_\text{iter} = 5000$. This value was chosen to be at least as large as the number of \acp{vn} in any single window, since with one \ac{bp} iteration between consecutive decimations ($T = 1$ in the notation of \Cref{alg:bpgd}) this is the maximum number of inner iterations that can occur before every \ac{vn} in the window has been decimated. A preliminary investigation showed that \ac{bpgd} only delivers its intended performance gain once most \acp{vn} have actually been decimated, which motivated this choice. The physical error rate was swept from $p = 0.001$ to $p = 0.004$ in steps of $0.0005$. \Cref{fig:bpgd_w} sweeps over the window size with $W \in \{3, 4, 5\}$ at fixed step size $F = 1$, and \Cref{fig:bpgd_f} sweeps over the step size with $F \in \{1, 2, 3\}$ at fixed window size $W = 5$. % [Description] Figure 4.10 In both panels, every curve again exhibits the expected monotonic increase of the per-round \ac{ler} with the physical error rate. Across both panels and across all parameter choices, the warm-start curves lie above the corresponding cold-start curves, i.e., the warm-start variant performsworse than its cold-start counterpart. This is the opposite of what we observed for plain \ac{bp}, where warm-start improved upon cold-start at every parameter setting. The gap between the warm- and cold-start curves additionally widens as the physical error rate decreases: at the lowest sampled rate $p = 0.001$, the per-round \ac{ler} of the warm-start runs is more than two orders of magnitude above that of the corresponding cold-start runs. In \Cref{fig:bpgd_w}, larger window sizes yield lower per-round \acp{ler} for both warm- and cold-start, and the spacing between the cold-start curves shrinks as $W$ grows. In \Cref{fig:bpgd_f}, the cold-start curves follow the previously seen ordering with $F = 1$ at the bottom and $F = 3$ at the top. The warm-start curves, however, exhibit the opposite ordering: $F = 1$ now yields the highest per-round \ac{ler}, $F = 2$ lies below it, and $F = 3$ is the lowest of the three warm-start curves. % [Interpretation] Figure 4.10 The fact that warm-start sliding-window decoding now performs worse than its cold-start counterpart is surprising in light of the results for plain \ac{bp}, where the warm-start modification was uniformly beneficial. The dependence on the window size in \Cref{fig:bpgd_w} is, on its own, consistent with the same explanation that we gave for \Cref{fig:whole_vs_cold}: larger windows expose the inner decoder to a larger fraction of the constraints encoded in the detector error matrix at the time of decoding, and this benefits both warm- and cold-start decoding. The dependence on the step size in \Cref{fig:bpgd_f}, however, is the opposite of the corresponding dependence under plain \ac{bp} (\Cref{fig:bp_f_over_p}): for warm-start, smaller $F$ now hurts rather than helps, even though smaller $F$ implies a larger overlap in both cases. This inversion provides the clue to what is going wrong. Recall from \Cref{subsec:Warm-Start Belief Propagation with Guided Decimation Decoding} that the warm start for \ac{bpgd} carries over not only the \ac{bp} messages on the edges of the overlap region but also the decimation information. Because we run with an iteration budget large enough to decimate every \ac{vn} in a window, by the time window $\ell$ ends, all of its \acp{vn} have already been hard-decided. For the \acp{vn} that lie in the overlap region with window $\ell + 1$ this hard decision is then carried into the next window through the warm-start initialization, and the next window thus begins decoding with a substantial fraction of its \acp{vn} already frozen, before its own parity checks have had any chance to influence the corresponding bit estimates. This identifies one of two competing effects on the warm-start performance. The larger the overlap, the more such prematurely frozen \acp{vn} the next window inherits, which hurts performance. On the other hand, a larger window still exposes the inner decoder to a larger set of constraints, which helps performance. The two effects together are consistent with what we observe in \Cref{fig:bpgd_wf}. Increasing $W$ at fixed $F$ enlarges both the overlap and the window itself, and the benefit due to the larger $W$ dominates. Decreasing $F$ at fixed $W$, by contrast, enlarges only the overlap without enlarging the window, so the freezing effect is no longer offset and warm-start performance worsens with smaller $F$. \begin{figure}[t] \centering \hspace*{-6mm} \begin{subfigure}{0.48\textwidth} \centering \begin{tikzpicture} \begin{axis}[ width=8cm, height=6cm, ymode=log, % xmode=log, legend style={ cells={anchor=west}, cells={align=left}, }, enlargelimits=false, ymin=1e-3, ymax=1e-1, grid=both, legend pos = south west, xtick={32,512,1024,2048,4096}, % xtick={0.001,0.0015,...,0.004}, xticklabels = {$32$,$512$,$1{,}024$,,$2{,}048$,,$3{,}072$,,$4{,}096$}, xtick={32, 512, 1024, 1536, 2048, 2560, 3072, 3584, 4096}, xticklabel style={/pgf/number format/fixed}, xticklabel style={/pgf/number format/precision=4}, x tick label style={rotate=45, anchor=north east, inner sep=1mm}, scaled x ticks=false, xlabel={Number of BP iterations}, ylabel={Per-round-LER}, % extra description/.code={ % \node[rotate=90, anchor=south] % at ([xshift=10mm]current axis.east) % {Warm s. (---), Cold s. (- - -)}; % }, ] \foreach \W/\col/\mark in {3/KITred/triangle,4/KITblue/diamond,5/KITorange/square} { \edef\temp{\noexpand \addplot+[mark=\mark, densely dashed, forget plot, \col] table[ col sep=comma, x=max_iter, y=LER_per_round, ] {res/sim/max_iter/WindowingSyndromeSpaGdDecoderPassDecimation/p_0.0025/pass_soft_info_False/F_1/W_\W/LERs.csv}; } \temp } \foreach \W/\col/\mark in {3/KITred/triangle*,4/KITblue/diamond*,5/KITorange/square*} { \edef\temp{\noexpand \addplot+[mark=\mark, solid, mark options={fill=\col}, \col] table[ col sep=comma, x=max_iter, y=LER_per_round, ] {res/sim/max_iter/WindowingSyndromeSpaGdDecoderPassDecimation/p_0.0025/pass_soft_info_True/F_1/W_\W/LERs.csv}; } \temp \addlegendentryexpanded{$W = \W$} } \end{axis} \end{tikzpicture} \caption{\red{Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt}} \end{subfigure}% \hfill% \begin{subfigure}{0.48\textwidth} \centering \begin{tikzpicture} \begin{axis}[ width=8cm, height=6cm, ymode=log, % xmode=log, legend style={ cells={anchor=west}, cells={align=left}, }, enlargelimits=false, ymin=1e-3, ymax=1e-1, grid=both, legend pos = south west, xtick={32,512,1024,2048,4096}, % xtick={0.001,0.0015,...,0.004}, xticklabels = {$32$,$512$,$1{,}024$,,$2{,}048$,,$3{,}072$,,$4{,}096$}, xtick={32, 512, 1024, 1536, 2048, 2560, 3072, 3584, 4096}, xticklabel style={/pgf/number format/fixed}, xticklabel style={/pgf/number format/precision=4}, x tick label style={rotate=45, anchor=north east, inner sep=1mm}, scaled x ticks=false, xlabel={Number of BP iterations}, % ylabel={Per-round-LER}, yticklabels={\empty}, extra description/.code={ \node[rotate=90, anchor=south] at ([xshift=10mm]current axis.east) {Warm s. (---), Cold s. (- - -)}; }, ] \foreach \F/\col/\mark in {3/KITred/triangle,2/KITblue/diamond,1/KITorange/square} { \edef\temp{\noexpand \addplot+[mark=\mark, densely dashed, forget plot, \col] table[ col sep=comma, x=max_iter, y=LER_per_round, ] {res/sim/max_iter/WindowingSyndromeSpaGdDecoderPassDecimation/p_0.0025/pass_soft_info_False/F_\F/W_5/LERs.csv}; } \temp } \foreach \F/\col/\mark in {3/KITred/triangle*,2/KITblue/diamond*,1/KITorange/square*} { \edef\temp{\noexpand \addplot+[mark=\mark, solid, mark options={fill=\col}, \col] table[ col sep=comma, x=max_iter, y=LER_per_round, ] {res/sim/max_iter/WindowingSyndromeSpaGdDecoderPassDecimation/p_0.0025/pass_soft_info_True/F_\F/W_5/LERs.csv}; } \temp \addlegendentryexpanded{$F = \F$} } \end{axis} \end{tikzpicture} \caption{\red{Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt}} \end{subfigure} \caption{ \red{\lipsum[2]} } \end{figure} \begin{figure}[t] \centering \hspace*{-6mm} \begin{subfigure}{0.5\textwidth} \centering \begin{tikzpicture} \begin{axis}[ width=8cm, height=6cm, ymode=log, legend style={ cells={anchor=west}, cells={align=left}, }, enlargelimits=false, ymin=1e-5, ymax=2e-1, grid=both, legend pos = south east, xtick={0.001,0.0015,...,0.004}, xticklabel style={/pgf/number format/fixed}, xticklabel style={/pgf/number format/precision=4}, x tick label style={rotate=45, anchor=north east, inner sep=1mm}, scaled x ticks=false, xlabel={Physical error rate}, ylabel={Per-round-LER}, % extra description/.code={ % \node[rotate=90, anchor=south] % at ([xshift=10mm]current axis.east) % {Warm s. (---), Cold s. (- - -)}; % }, ] \foreach \W/\col/\mark in {3/KITred/triangle,4/KITblue/diamond,5/KITorange/square} { \edef\temp{\noexpand \addplot+[ mark=\mark, densely dashed, mark options={fill=\col}, \col, forget plot ] table[ col sep=comma, x=physical_p, y=LER_per_round, ] {res/sim/WF/WindowingSyndromeSpaGdDecoder/max_iter_5000/pass_soft_info_False/F_1/W_\W/LERs.csv}; } \temp } \foreach \W/\col/\mark in {3/KITred/triangle*,4/KITblue/diamond*,5/KITorange/square*} { \edef\temp{\noexpand \addplot+[mark=\mark, solid, mark options={fill=\col}, \col] table[ col sep=comma, x=physical_p, y=LER_per_round, ] {res/sim/WF/WindowingSyndromeSpaGdDecoder/max_iter_5000/pass_soft_info_True/F_1/W_\W/LERs.csv}; } \temp \addlegendentryexpanded{$W = \W$} } \end{axis} \end{tikzpicture} \caption{\red{Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt}} \end{subfigure}% \hfill% \begin{subfigure}{0.5\textwidth} \centering \begin{tikzpicture} \begin{axis}[ width=8cm, height=6cm, ymode=log, legend style={ cells={anchor=west}, cells={align=left}, }, enlargelimits=false, ymin=1e-5, ymax=2e-1, grid=both, legend pos = south east, xtick={0.001,0.0015,0.002,...,0.004}, xticklabel style={/pgf/number format/fixed}, xticklabel style={/pgf/number format/precision=4}, x tick label style={rotate=45, anchor=north east, inner sep=1mm}, scaled x ticks=false, xlabel={Physical error rate}, yticklabels={\empty}, % ylabel={Per-round-LER}, extra description/.code={ \node[rotate=90, anchor=south] at ([xshift=10mm]current axis.east) {Warm s. (---), Cold s. (- - -)}; }, ] \foreach \F/\col/\mark in {3/KITred/triangle,2/KITblue/diamond,1/KITorange/square} { \edef\temp{\noexpand \addplot+[ mark=\mark, densely dashed, mark options={fill=\col}, \col, forget plot ] table[ col sep=comma, x=physical_p, y=LER_per_round, ] {res/sim/WF/WindowingSyndromeSpaGdDecoder/max_iter_5000/pass_soft_info_False/F_\F/W_5/LERs.csv}; } \temp } \foreach \F/\col/\mark in {3/KITred/triangle*,2/KITblue/diamond*,1/KITorange/square*} { \edef\temp{\noexpand \addplot+[mark=\mark, solid, mark options={fill=\col}, \col] table[ col sep=comma, x=physical_p, y=LER_per_round, ] {res/sim/WF/WindowingSyndromeSpaGdDecoder/max_iter_5000/pass_soft_info_True/F_\F/W_5/LERs.csv}; } \temp \addlegendentryexpanded{$F = \F$} } \end{axis} \end{tikzpicture} \caption{\red{Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt}} \end{subfigure} \caption{ \red{\lipsum[2]} } \end{figure} \begin{figure}[t] \centering \hspace*{-6mm} \begin{subfigure}{0.48\textwidth} \centering \begin{tikzpicture} \begin{axis}[ width=8cm, height=6cm, ymode=log, % xmode=log, legend style={ cells={anchor=west}, cells={align=left}, }, enlargelimits=false, ymin=1e-3, ymax=1e-1, grid=both, legend pos = north east, xtick={32,512,1024,2048,4096}, % xtick={0.001,0.0015,...,0.004}, xticklabels = {$32$,$512$,$1{,}024$,,$2{,}048$,,$3{,}072$,,$4{,}096$}, xtick={32, 512, 1024, 1536, 2048, 2560, 3072, 3584, 4096}, xticklabel style={/pgf/number format/fixed}, xticklabel style={/pgf/number format/precision=4}, x tick label style={rotate=45, anchor=north east, inner sep=1mm}, scaled x ticks=false, xlabel={Number of BP iterations}, ylabel={Per-round-LER}, % extra description/.code={ % \node[rotate=90, anchor=south] % at ([xshift=10mm]current axis.east) % {Warm s. (---), Cold s. (- - -)}; % }, ] \foreach \W/\col/\mark in {3/KITred/triangle,4/KITblue/diamond,5/KITorange/square} { \edef\temp{\noexpand \addplot+[mark=\mark, densely dashed, forget plot, \col] table[ col sep=comma, x=max_iter, y=LER_per_round, ] {res/sim/max_iter/WindowingSyndromeSpaGdDecoder/p_0.0025/pass_soft_info_False/F_1/W_\W/LERs.csv}; } \temp } \foreach \W/\col/\mark in {3/KITred/triangle*,4/KITblue/diamond*,5/KITorange/square*} { \edef\temp{\noexpand \addplot+[mark=\mark, solid, mark options={fill=\col}, \col] table[ col sep=comma, x=max_iter, y=LER_per_round, ] {res/sim/max_iter/WindowingSyndromeSpaGdDecoder/p_0.0025/pass_soft_info_True/F_1/W_\W/LERs.csv}; } \temp \addlegendentryexpanded{$W = \W$} } \end{axis} \end{tikzpicture} \caption{\red{Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt}} \end{subfigure}% \hfill% \begin{subfigure}{0.48\textwidth} \centering \begin{tikzpicture} \begin{axis}[ width=8cm, height=6cm, ymode=log, % xmode=log, legend style={ cells={anchor=west}, cells={align=left}, }, enlargelimits=false, ymin=1e-3, ymax=1e-1, grid=both, legend pos = north east, xtick={32,512,1024,2048,4096}, % xtick={0.001,0.0015,...,0.004}, xticklabels = {$32$,$512$,$1{,}024$,,$2{,}048$,,$3{,}072$,,$4{,}096$}, xtick={32, 512, 1024, 1536, 2048, 2560, 3072, 3584, 4096}, xticklabel style={/pgf/number format/fixed}, xticklabel style={/pgf/number format/precision=4}, x tick label style={rotate=45, anchor=north east, inner sep=1mm}, scaled x ticks=false, xlabel={Number of BP iterations}, % ylabel={Per-round-LER}, yticklabels={\empty}, extra description/.code={ \node[rotate=90, anchor=south] at ([xshift=10mm]current axis.east) {Warm s. (---), Cold s. (- - -)}; }, ] \foreach \F/\col/\mark in {3/KITred/triangle,2/KITblue/diamond,1/KITorange/square} { \edef\temp{\noexpand \addplot+[mark=\mark, densely dashed, forget plot, \col] table[ col sep=comma, x=max_iter, y=LER_per_round, ] {res/sim/max_iter/WindowingSyndromeSpaGdDecoder/p_0.0025/pass_soft_info_False/F_\F/W_5/LERs.csv}; } \temp } \foreach \F/\col/\mark in {3/KITred/triangle*,2/KITblue/diamond*,1/KITorange/square*} { \edef\temp{\noexpand \addplot+[mark=\mark, solid, mark options={fill=\col}, \col] table[ col sep=comma, x=max_iter, y=LER_per_round, ] {res/sim/max_iter/WindowingSyndromeSpaGdDecoder/p_0.0025/pass_soft_info_True/F_\F/W_5/LERs.csv}; } \temp \addlegendentryexpanded{$F = \F$} } \end{axis} \end{tikzpicture} \caption{\red{Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt}} \end{subfigure} \caption{ \red{\lipsum[2]} } \end{figure}