Include claude corrections for first 5 pages of decoding chapter

This commit is contained in:
2026-05-02 09:01:19 +02:00
parent 606d68e2c1
commit 15190ccf48


@@ -13,14 +13,13 @@ under \acp{dem}.
We investigate decoding \acf{qldpc} codes under \acp{dem} in particular.
We focus on \ac{qldpc} codes, as they have emerged as leading
candidates for practical quantum error correction, offering the
ability to encode more logical qubits per physical qubit than surface
codes while maintaining favorable threshold properties
candidates for practical quantum error correction, offering
comparable thresholds with substantially improved encoding rates
\cite[Sec.~1]{bravyi_high-threshold_2024}.
Because of this, the decoding algorithms we consider will all be
related to \acf{bp} in some way.
Our aim is to build a fault-tolerant \ac{qec} system that works well
even in the presence of circuit-level noise.
We must overcome two main challenges to achieve this.
First, recall the problems related to degeneracy, which is inherent
@@ -29,21 +28,21 @@ Because multiple minimum-weight codewords exist, the \ac{bp}
algorithm becomes uncertain of the direction to proceed in.
Additionally, the commutativity conditions of the stabilizers
necessitate the existence of short cycles.
Together, these two aspects lead to substantial convergence problems
of \ac{bp} for quantum codes, when it is used on its own.
Second, the consideration of circuit-level noise introduces many more
error locations into the circuit.
Using \acp{dem}, we construct a new circuit code and model each of
these error locations as a new \acf{vn}.
We also perform multiple rounds of syndrome measurements,
exacerbating the problem.
This leads to a massively increased computational complexity and
latency of the decoding process.
In our experiments using the $\llbracket 144,12,12 \rrbracket$
\acf{bb} code with $12$ syndrome measurement rounds, for example, the
number of \acp{vn} grew from $144$ to $9504$, and the
number of \acfp{cn} grew from $72$ to $1008$.
The first problem is not inherent to \acp{dem} or fault-tolerance,
but rather quantum codes in general.
@@ -53,23 +52,25 @@ The most popular approach is combining a few initial
iterations of \ac{bp} with a second decoding algorithm, \ac{osd}
\cite{roffe_decoding_2020}.
Other approaches exist, such as \ac{aed}
\cite{koutsioumpas_automorphism_2025}, where multiple variations of
the code are decoded simultaneously to increase the chances of convergence.
Here, we will focus on the \acf{bpgd} algorithm
\cite{yao_belief_2024} we already introduced in \Cref{ch:Fundamentals},
for reasons that will become clear later in the chapter.
The second problem is inherent to decoding using \acp{dem}.
This is an area that has received less attention.
As we saw in \Cref{sec:Quantum Error Correction}, for \ac{qec},
latency is the main constraint, not raw computational complexity.
The main way this is addressed in the literature is \emph{sliding
window decoding}, which attempts to divide the overall decoding
problem into many smaller ones that can be solved more efficiently.
% TODO: This could potentially be a bit more text (e.g., go into
% SC-LDPC like structure that serves as the inspiration for the
% warm-start decoding. Or just go into warm-start decoding)
Our own work will focus mostly on the solution of the second
problem using sliding-window decoding.
We will start by briefly reviewing the existing work related to
sliding-window decoding,
before focusing on one specific realization.
@@ -78,7 +79,7 @@ perform numerical simulations to evaluate it.
% and reducing latency is the main goal of the existing literature.
% This is generally done using windowing approaches; either
% sliding-window based, where the latency is reduced due to an earlier
% start to the decoding process \cite{kuo_fault-tolerant_2024}%
% \cite{huang_improved_2023}\cite{huang_increasing_2024}\cite{gong_toward_2024},
% or by decoding multiple windows in parallel
@@ -202,21 +203,21 @@ Each of these windows is then decoded separately.
related to sliding-window decoding.
The papers \cite{huang_improved_2023} and \cite{huang_increasing_2024} are
lumped together, as they share the same content;
one is simply a preprint published earlier.
We will only refer to \cite{huang_increasing_2024} in the following.
\cite{kang_quits_2025} is somewhat special in that the authors focus
more on the introduction of a new simulator framework they call
QUITS, rather than the performance of sliding-window decoding itself.
\cite{gong_toward_2024} and \cite{kang_quits_2025} have made their
software freely available online%
\footnote{
\url{https://github.com/mkangquantum/quits}
}%
\footnote{
\url{https://github.com/gongaa/SlidingWindowDecoder}
}.
A final thing to note is that \cite{dennis_topological_2002} never
explicitly mentions sliding windows; the authors call their scheme
``overlapping recovery''.
% Topological vs QLDPC
@@ -227,7 +228,7 @@ Most of the work on topological codes has treated surface codes,
with the exception of \cite{kuo_fault-tolerant_2024} where toric
codes were considered.
With regard to \ac{qldpc} codes, in \cite{huang_increasing_2024}
the authors examine \emph{hypergraph product} (\acs{hgp}) and
\emph{lifted-product} (\acs{lp}) codes.
HGP codes are constructed from the product of two classical codes,
while LP codes generalize this construction by additionally applying
@@ -237,7 +238,7 @@ are additionally considered.
Like HGP codes, BPC codes are derived from a product construction,
but exploit an additional symmetry to yield fewer physical qubits for
the same code parameters.
Finally, \cite{gong_toward_2024} explores \ac{bb} codes.
% Sequential vs parallel
@@ -246,14 +247,14 @@ arises of how exactly to realize the decoding.
There are two main approaches, with differing mechanisms of reducing
the latency.
Some papers decode the sliding windows in a parallel fashion.
The benefit in this case is that classical hardware can be
utilized more effectively.
Others choose a sequential approach.
Here, decoding can start earlier, as there is no need to wait for the
syndrome measurements of all windows before beginning with the decoding.
With the exception of \cite{dennis_topological_2002}, literature
treating topological codes has mostly focused on parallel decoding
while literature treating \ac{qldpc} codes has wholly considered
sequential decoding.
% Deep-dive into QLDPC methods
@@ -267,20 +268,21 @@ As we noted above, \ac{hgp} and \ac{lp} codes are considered in
\ac{hgp}, \ac{lp} and \ac{bpc} codes are considered in \cite{kang_quits_2025},
and \ac{bb} codes are considered in \cite{gong_toward_2024}.
The employed noise models also differ;
\cite{huang_increasing_2024} uses phenomenological noise, while
\cite{gong_toward_2024} and \cite{kang_quits_2025} use circuit-level noise.
Finally, in \cite{gong_toward_2024} the authors introduce their own variation of
\ac{bpgd}, \ac{bp} with \ac{gdg}, while \cite{huang_increasing_2024}
and \cite{kang_quits_2025} use \ac{bp} + \ac{osd}.
We would additionally like to note that only
\cite{gong_toward_2024} and \cite{kang_quits_2025}
explicitly work with the \ac{dem} formalism.
\renewcommand{\arraystretch}{1.1}
\setlength{\tabcolsep}{12pt}
\begin{table}[t]
\centering
\caption{Experimental conditions in the literature on
sliding-window decoding for \ac{qldpc} codes.}
\vspace*{3mm}
\label{table:experimental_conditions}
\begin{tabular}{l|ccc}
@@ -381,14 +383,14 @@ explicitly work with the \ac{dem} formalism.
In this section, we will examine the methodology by which a detector
error matrix is divided into overlapping windows.
The algorithm detailed here follows \cite{kang_quits_2025}, which
is in turn based on \cite{huang_increasing_2024}.
% Very high-level overview
Sliding-window decoding is made possible by the time-like structure
of the syndrome extraction circuitry.
This is especially clearly visible under the \ac{dem} formalism, where
this manifests as a block-diagonal structure of the detector
error matrix $\bm{H}$.
Note that this presupposes a choice of detectors as seen in
@@ -396,7 +398,7 @@ Note that this presupposes a choice of detectors as seen in
This block-diagonal structure introduces some locality in the
interdependence between \acp{vn} and \acp{cn}.
For each local set of \acp{vn}, there is only a local set of connected \acp{cn}.
We exploit this fact by partitioning the matrix into overlapping windows.
\Cref{fig:windowing_pcm} depicts this process using the $\llbracket
72, 6, 6 \rrbracket$ BB code as an example.
@@ -404,9 +406,9 @@ We exploit this fact by cutting the matrix into overlapping windows.
How the locality is leveraged can be understood by considering the
decoding process.
After decoding a window, there is a subset of \acp{cn} that
no longer contribute to decoding, since none of their
neighboring \acp{vn} appear in subsequent windows.
We call the set of \acp{vn} connected to those \acp{cn} the
\emph{commit region} and we wish to commit them before moving to the
next window, i.e., fix the values we estimate for the corresponding bits.
@@ -419,13 +421,13 @@ measurements for the first window are complete.
There are two degrees of freedom in how we perform the windowing.
The \emph{window size} $W \in \mathbb{N}$ represents the number of
syndrome extraction rounds lumped into one window, while
the \emph{step size} $F \in \mathbb{N}$ represents the number of
syndrome extraction rounds skipped before starting the next window.
$W$ controls the size of the windows while $F$ controls the overlap
between them.
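As a small illustration of how $W$ and $F$ interact, the following Python sketch (our own, purely illustrative) lists the syndrome extraction rounds covered by each window; the closed-form expression for the number of windows is an assumption made for this example, not taken from the cited works.

```python
import math

# Illustrative sketch: window l covers syndrome extraction rounds
# [l*F, l*F + W). The formula for the number of windows n_win is an
# assumption for this example.
def window_rounds(num_rounds, W, F):
    """Return the (start, stop) round ranges covered by each window."""
    assert W >= F, "consecutive windows should overlap or at least abut"
    if num_rounds <= W:
        n_win = 1
    else:
        n_win = math.ceil((num_rounds - W) / F) + 1
    return [(l * F, min(l * F + W, num_rounds)) for l in range(n_win)]

print(window_rounds(12, 5, 2))  # → [(0, 5), (2, 7), (4, 9), (6, 11), (8, 12)]
```

Note how each window starts $F$ rounds after the previous one, so consecutive windows share $W - F$ rounds.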
As illustrated in \Cref{fig:windowing_pcm}, $W$ and $F$ control the
window dimensions and locations by defining the related \acp{cn},
not the \acp{vn}.
This is because while the number of overall \acp{cn} is only affected
by the choice of the underlying code and the number of syndrome
@@ -462,9 +464,22 @@ and is difficult to predict beforehand.
\vspace*{10mm}
\caption{
Visualization of the windowing process on a detector error
matrix generated from the $\llbracket 72, 6, 6 \rrbracket$
BB code under circuit-level noise.
The block-diagonal structure reflects the time-like locality
of the syndrome extraction circuit, with each block
corresponding to one syndrome measurement round.
Two consecutive windows are highlighted: the window size $W$
controls the number of syndrome rounds included in each
window, while the step size $F$ controls how many rounds
separate the start of one window from the next.
The bracketed region indicates the commit
region of the first window, i.e., the \acp{vn} that are committed
before moving to the second window.
}
\label{fig:windowing_pcm}
\end{figure}
@@ -476,9 +491,9 @@ We use the variables $n,m \in \mathbb{N}$ to describe the number of
\acp{vn} and \acp{cn} respectively.
We index the \acp{vn} using the variable $i \in \mathcal{I} :=
[0:n-1]$ and the \acp{cn} using the variable $j \in \mathcal{J} := [ 0 : m-1]$.
Finally, we call $\mathcal{N}_\text{V}(i) = \left\{ j\in \mathcal{J}:
\bm{H}_{j,i} = 1 \right\}$ and $\mathcal{N}_\text{C}(j) := \left\{ i
\in \mathcal{I} : \bm{H}_{j,i} = 1 \right\}$ the neighborhoods of the
corresponding nodes.
In this case, we take $\bm{H} \in \mathbb{F}_2^{m\times n}$ to be the
check matrix of the underlying code, from which the \ac{dem} was generated.
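The neighborhood sets can be read directly off the rows and columns of $\bm{H}$; a minimal NumPy sketch (ours, with an arbitrary example matrix):

```python
import numpy as np

# N_V(i) = { j : H[j, i] = 1 } and N_C(j) = { i : H[j, i] = 1 },
# read off the columns and rows of the binary matrix H.
H = np.array([[1, 1, 0, 1],
              [0, 1, 1, 0],
              [1, 0, 1, 1]], dtype=np.uint8)

def vn_neighborhood(H, i):
    return set(map(int, np.flatnonzero(H[:, i])))  # CNs connected to VN i

def cn_neighborhood(H, j):
    return set(map(int, np.flatnonzero(H[j, :])))  # VNs connected to CN j

print(vn_neighborhood(H, 1), cn_neighborhood(H, 0))  # → {0, 1} {0, 1, 3}
```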
@@ -493,7 +508,7 @@ where $n_\text{win} \in \mathbb{N}$ is the number of windows.
Because we defined the step size $F$ as the number of syndrome
extraction rounds to skip, the first \ac{cn} of window $\ell$ should have index
$\ell F m$.
Similarly, because of the way we defined the window size $W$, the
number of \acp{cn} should be $Wm$ for all but the last window.
The number of \acp{cn} in the last window may differ if there are
not enough \acp{cn} left to completely fill it.
@@ -511,8 +526,8 @@ We thus define
$\mathcal{J}_\text{win}^{(\ell)}$ is the set of all \acp{cn} in the
window while $\mathcal{J}_\text{commit}^{(\ell)}$ is the set of \acp{cn}
that do not contribute to the next window and whose neighboring
\acp{vn} will thus be committed.
We can additionally define the set of \acp{cn} that are shared between windows
$\ell$ and $\ell + 1$ as $\mathcal{J}_\text{overlap}^{(\ell)} :=
\mathcal{J}_\text{win}^{(\ell)}\setminus \mathcal{J}_\text{commit}^{(\ell)}$.
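These \ac{cn} index sets can be written out programmatically. The sketch below follows the definitions above, with $m$ the number of \acp{cn} per syndrome round; the convention that the final window commits all of its \acp{cn} is our assumption for this illustration.

```python
# Sketch of the CN index sets of window l, following the definitions
# in the text. The handling of the last window (committing all CNs)
# is our assumption for this illustration.
def cn_index_sets(l, m, W, F, m_total, is_last):
    start = l * F * m                      # first CN of window l
    stop = min(start + W * m, m_total)     # at most W rounds of CNs
    J_win = set(range(start, stop))
    if is_last:
        J_commit = set(J_win)              # final window commits everything
    else:
        # CNs of window l absent from window l+1, which starts at (l+1)*F*m
        J_commit = set(range(start, (l + 1) * F * m))
    J_overlap = J_win - J_commit
    return J_win, J_commit, J_overlap

J_win, J_commit, J_overlap = cn_index_sets(0, m=2, W=3, F=1,
                                           m_total=10, is_last=False)
print(sorted(J_commit), sorted(J_overlap))  # → [0, 1] [2, 3, 4, 5]
```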
@@ -540,7 +555,7 @@ The commit region of window $\ell$ should include all of the \acp{vn}
neighboring any of the \acp{cn} in $\mathcal{J}_\text{commit}^{(\ell)}$.
Consequently, the maximum index of the \acp{vn} we consider should be
$i_\text{max}(\mathcal{J}_\text{commit}^{(\ell)})$.
Additionally, the set of \acp{vn} committed in the next window should
start immediately afterwards.
We thus define
\begin{align*}
@@ -634,17 +649,32 @@ and after decoding all windows we will therefore have committed all \acp{vn}.
};
\end{tikzpicture}
\caption{
Visual representation of the index sets used to define a sliding window.
The solid box delimits the rows ($\mathcal{J}_\text{win}^{(\ell)}$)
and columns ($\mathcal{I}_\text{win}^{(\ell)}$) of the detector
error matrix considered when decoding window $\ell$, while the
dashed box shows the analogous region for window $\ell + 1$.
The shaded region marks the submatrix
$\bm{H}_\text{overlap}^{(\ell)}$, whose rows correspond to the
overlap CNs $\mathcal{J}_\text{overlap}^{(\ell)} =
\mathcal{J}_\text{win}^{(\ell)} \setminus
\mathcal{J}_\text{commit}^{(\ell)}$ shared with the next window,
and whose columns correspond to the committed VNs
$\mathcal{I}_\text{commit}^{(\ell)}$.
After decoding window $\ell$, this submatrix is used to update
the syndrome of the overlap CNs based on the committed bit estimates.
}
\label{fig:vis_rep}
\end{figure}
% Syndrome update
\Cref{fig:vis_rep} illustrates the meaning of the various sets of nodes.
We can also see a subtlety we must handle carefully when
moving on to decode the next window.
While the \acp{cn} in $\mathcal{J}_\text{commit}^{(\ell)}$ have no
bearing on the further decoding process, the values committed for the
\acp{vn} in $\mathcal{I}_\text{commit}^{(\ell)}$ do.
This is the case because these \acp{vn} have neighboring \acp{cn} in
the next window.
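In code, this syndrome update amounts to a GF(2) matrix-vector product with the submatrix $\bm{H}_\text{overlap}^{(\ell)}$, XORed into the syndrome bits of the overlap \acp{cn}. A NumPy sketch of our reading of the update rule, with an arbitrary example:

```python
import numpy as np

# Sketch (our reading of the update rule): after committing the error
# estimate e_commit, the syndrome bits of the overlap CNs are flipped
# wherever the committed bits touch them an odd number of times.
def update_syndrome(s, H, J_overlap, I_commit, e_commit):
    s = s.copy()
    rows, cols = sorted(J_overlap), sorted(I_commit)
    H_overlap = H[np.ix_(rows, cols)]
    # GF(2) product H_overlap @ e_commit, XORed into the syndrome
    s[rows] ^= (H_overlap @ e_commit) % 2
    return s

s = np.array([1, 1, 0], dtype=np.uint8)
H = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 1, 1]], dtype=np.uint8)
print(update_syndrome(s, H, J_overlap={1, 2}, I_commit={0},
                      e_commit=np.array([1], dtype=np.uint8)))  # → [1 0 0]
```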
@@ -656,7 +686,7 @@ $\bm{H}_\text{overlap}^{(\ell)} =
We have to account for this fact by updating the syndrome $\bm{s}$
based on the committed bit values.
Specifically, if $\bm{e}_\text{commit}^{(\ell)}$ describes the error
estimates committed after decoding window $\ell$, we have to set
\begin{align*}
\bm{s}_{\mathcal{J}_\text{overlap}^{(\ell)}} =
\bm{H}_\text{overlap}^{(\ell)}
@@ -671,7 +701,7 @@ estimates commited after decoding window $\ell$, we have to set
% Intro
The sliding-window structure visible in \Cref{fig:windowing_pcm} is
highly reminiscent of windowed decoding for \ac{sc}-\ac{ldpc} codes.
Switching our viewpoint to the Tanner graph depicted in
\Cref{fig:windowing_tanner}, however, we can see an important
difference between \ac{sc}-\ac{ldpc} decoding and the
@@ -690,6 +720,15 @@ we perform a \emph{warm start} by initializing the messages in the
overlapping region to the values last held during the decoding of the
previous window.
\content{callback to intro: explain why we consider BPGD instead of,
e.g., BP+OSD}
\content{Explain why we expect a warm start to be beneficial}
\content{Mention that our own work ties into the bottom category in
\Cref{fig:literature}}
\content{Explicitly state that $\mathcal{I}_\text{win}^{(\ell)}$
overlaps with $\mathcal{I}_\text{win}^{(\ell + 1)}$, and that this is
where the warm start applies}
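As a concrete sketch of what the warm start means at the message level (our interpretation; all names are hypothetical): edges shared between consecutive windows keep the message values last held in the previous window, while edges new to the current window are initialized from the channel LLRs as usual.

```python
# Illustrative sketch of warm-start message initialization; function
# and variable names are hypothetical. Edges (j, i) shared with the
# previous window reuse their last message value, all other edges
# fall back to the usual channel-LLR initialization.
def init_window_messages(edges, channel_llr, prev_messages):
    messages = {}
    for (j, i) in edges:
        if (j, i) in prev_messages:
            messages[(j, i)] = prev_messages[(j, i)]  # warm start
        else:
            messages[(j, i)] = channel_llr[i]         # cold edge
    return messages

msgs = init_window_messages(
    edges=[(0, 0), (0, 1), (1, 1)],
    channel_llr={0: 2.3, 1: 2.3},
    prev_messages={(0, 0): -0.7})
print(msgs)  # → {(0, 0): -0.7, (0, 1): 2.3, (1, 1): 2.3}
```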
\begin{figure}[t]
\centering
@@ -793,11 +832,13 @@ previous window.
\end{figure}
%%%%%%%%%%%%%%%%
\subsection{Belief Propagation}
\label{subsec:Warm-Start Belief Propagation}
% Warm-Start decoding for BP
\content{Explicitly name messages passed ($\{L_{j\leftarrow i} : i \in
\ldots, j\in \ldots\}$)}
\content{Pass messages to next window}
\content{(?) Explicitly mention initialization using only CN->VN
messages + swapping of CN and VN update?}
@@ -926,7 +967,7 @@ messages + swapping of CN and VN update?}
\end{figure}
%%%%%%%%%%%%%%%%
\subsection{Belief Propagation with Guided Decimation}
\label{subsec:Warm-Start Belief Propagation with Guided Decimation Decoding}
% Warm-Start decoding for BPGD
@@ -1073,7 +1114,7 @@ messages, pass decimation info}
\section{Numerical Results}
\label{sec:Numerical Results}
% Intro
% Simulation setup
In this section, we perform numerical experiments to evaluate the
modification to sliding-window decoding we introduced in
@@ -1093,6 +1134,10 @@ of the per-round \ac{ler} as defined in
All datapoints have been generated by simulating at least $200$
logical error events.
\content{Mention the number of syndrome extraction rounds}
% Software stack: Layer 1
For the practical aspects of implementation, several layers of
abstraction must be considered.
The lowest layer is the circuit-level simulator.
@@ -1101,21 +1146,29 @@ quantum mechanical aspects of the system, including the modeling of
noise on gates, idling qubits, and measurements according to the
chosen noise model.
% Software stack: Layer 2
Moving one level of abstraction higher, the syndrome extraction
circuit itself must be generated.
This entails constructing the full circuit, including the ancilla
measurements and the error locations introduced by the chosen noise
model, both of which depend on the code and noise model in question.
% Software stack: Layer 3
Even further up, given an already constructed syndrome extraction
circuit and the resulting \acf{dem}, we must split the detector error
matrix into separate windows and manage the interplay between the
inner decoders acting on those individual windows.
% Software stack: Layer 4
Finally, we require the decoder itself, which operates on a
\acf{pcm} and a syndrome, with no dependence on the complexity of the
layers below.
% Software stack: Tools
In our implementation, Stim \cite{gidney_stim_2021} served as the
circuit-level simulator, chosen for its efficiency and native support
for the \ac{dem} formalism.
@@ -1134,9 +1187,36 @@ We reimplemented both the window splitting and the decoders themselves.
\subsection{Belief Propagation}
\label{subsec:Belief Propagation}
% Simulation setup
% Intro
\content{Use min-sum}
We begin our investigation by using \ac{bp} with no further
modifications as the inner decoder.
We chose the min-sum variant of \ac{bp} due to its low computational complexity.
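For reference, the min-sum \ac{cn} update replaces the hyperbolic-tangent rule of sum-product decoding by a minimum over magnitudes; a small sketch of the standard rule (ours, purely illustrative):

```python
import numpy as np

# Min-sum CN update: for each edge, the outgoing magnitude is the
# minimum magnitude of the other incoming messages, and the outgoing
# sign is the product of the other incoming signs.
def minsum_cn_update(incoming):
    incoming = np.asarray(incoming, dtype=float)
    out = np.empty_like(incoming)
    for k in range(incoming.size):
        others = np.delete(incoming, k)
        out[k] = np.prod(np.sign(others)) * np.min(np.abs(others))
    return out

print(minsum_cn_update([2.0, -1.0, 3.0]))  # → [-1.  2. -1.]
```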
% Whole decoding as a lower bound on the error rate
We initially wanted to gain an impression of the improvement we could
expect from a modification to the sliding-window decoding procedure.
To this end, we began by analyzing the decoding performance of the
original process, without our warm-start modification.
We will call this \emph{cold-start} decoding in the following.
We examined the decoding performance for different window sizes $W$
and compared it against the performance when decoding on the whole
detector error matrix at once, i.e., without windowing.
\Cref{fig:whole_vs_cold} depicts the results of this analysis.
\red{[Write more about the experimental setup (200 BP iterations,
fixed step size, what else?)]}
\red{[Describe the plot (whole decoding in black, (?) list different
window sizes and colors/markers, what else?)]}
We can see that a larger window results in a lower overall error rate.
This seems sensible, because the smaller the window size, the more
locally the decoding is performed.
While this allows us to leverage the time-like structure of the
circuitry more strongly and further reduce the latency, it is
expected to lower the performance, since each window is decoded with
less global information and errors spanning window boundaries become
harder to resolve.
Decoding the whole detector error matrix globally with no windowing
provides the best performance.
\begin{figure}[t]
\centering
@@ -1196,8 +1276,27 @@ We reimplemented both the window splitting and the decoders themselves.
extraction were performed and the noise model is
standard circuit-based depolarizing noise.
}
\label{fig:whole_vs_cold}
\end{figure}
% Initial results of warm-start decoding
As a next step, we additionally generated error rate curves using our
warm-start modification.
\red{[Again 200 BP iterations, etc.]}
\Cref{fig:whole_vs_cold_vs_warm} shows the numerical results from
this experiment.
The cold-start results from the previous graph are now plotted with
dashed lines, while the new warm-start results are plotted with solid lines.
We can see that the decoding performance has been improved overall.
% Unexpected: Warm-start better than whole
Additionally, we can see some initially unexpected behavior: the
warm-start sliding-window decoding with $W=5$ performs better than
decoding under consideration of the whole detector error matrix at
once, even though the process is less global.
\begin{figure}[t]
\centering
\begin{tikzpicture}
@@ -1279,6 +1378,7 @@ We reimplemented both the window splitting and the decoders themselves.
extraction were performed and the noise model is
standard circuit-based depolarizing noise.
}
\label{fig:whole_vs_cold_vs_warm}
\end{figure}
\begin{figure}[t]