\chapter{Fundamentals of Classical and Quantum Error Correction}
\label{ch:Fundamentals}
\Ac{qec} is a field of research combining ``classical''
communications engineering and quantum information science.
This chapter provides the relevant theoretical background on both of
these topics and subsequently introduces the fundamentals of \ac{qec}.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Classical Error Correction}
\label{sec:Classical Error Correction}
The core concept underpinning error correcting codes is the
realization that introducing a finite amount of redundancy to
information before transmission can considerably reduce the error rate.
Specifically, Shannon proved in 1948 that for any channel, a block
code can be found that achieves arbitrarily small probability of
error at any communication rate up to the capacity of the channel
when the block length approaches infinity
\cite[Sec.~13]{shannon_mathematical_1948}.
In this section, we explore the concepts of ``classical'' (as in non-quantum)
error correction that are central to this work.
We start by looking at different ways of encoding information,
first considering binary linear block codes in general and then \ac{ldpc} and
\ac{sc}-\ac{ldpc} codes.
Finally, we pivot to the decoding process, specifically the \ac{bp}
algorithm.
\subsection{Binary Linear Block Codes}
%
% Codewords, n, k, rate
%
One particularly important class of coding schemes is that of binary
linear block codes.
The information to be protected takes the form of a sequence of
binary symbols, which is split into separate blocks.
Each block is encoded, transmitted, and decoded separately.
The encoding step introduces redundancy by mapping input messages
$\bm{u} \in \mathbb{F}_2^k$ of length $k \in \mathbb{N}$ (called the
\textit{information length}) onto \textit{codewords} $\bm{x} \in
\mathbb{F}_2^n$ of length $n \in \mathbb{N}$ (called the
\textit{block length}) with $n > k$.
A measure of the amount of introduced redundancy is the \textit{code
rate} $R = k/n$.
We call the set of all codewords $\mathcal{C}$ the \textit{code}
\cite[Sec.~3.1.1]{ryan_channel_2009}.
%
% d_min and the [] Notation
%
During the encoding process, a mapping from $\mathbb{F}_2^k$
onto $\mathcal{C} \subset \mathbb{F}_2^n$ takes place.
The input messages are mapped onto an expanded vector space, where
they are ``further apart'', giving rise to the error correcting
properties of the code.
This notion of the distance between two codewords $\bm{x}_1$ and
$\bm{x}_2$ can be expressed using the \textit{Hamming distance} $d(\bm{x}_1,
\bm{x}_2)$, which is defined as the number of positions in which they differ.
We define the \textit{minimum distance} of a code $\mathcal{C}$ as
%
\begin{align*}
d_\text{min} := \min \left\{ d(\bm{x}_1, \bm{x}_2) : \bm{x}_1,
\bm{x}_2 \in \mathcal{C}, \bm{x}_1 \neq \bm{x}_2 \right\}
.
\end{align*}
%
We can signify that a binary linear block code has information length
$k$, block length $n$ and minimum distance $d_\text{min}$ using the
notation $[n,k,d_\text{min}]$ \cite[Sec.~1.3]{macwilliams_theory_1977}.
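As a concrete illustration of these definitions (not taken from the cited literature), the following Python sketch computes the minimum distance of the $[3,1,3]$ repetition code by pairwise comparison of its codewords:

```python
import itertools

def hamming_distance(x1, x2):
    """Number of positions in which two words differ."""
    return sum(a != b for a, b in zip(x1, x2))

# The [3,1,3] repetition code as a minimal example.
code = [(0, 0, 0), (1, 1, 1)]

d_min = min(hamming_distance(x1, x2)
            for x1, x2 in itertools.combinations(code, 2))
print(d_min)  # 3
```

For linear codes, $d_\text{min}$ also equals the minimum Hamming weight of a nonzero codeword, which avoids the pairwise loop for larger codes.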
%
% Parity checks, H, and the syndrome
%
A particularly elegant way of describing the subspace $\mathcal{C}$ of
$\mathbb{F}_2^n$ that the codewords make up is the notion of
\textit{parity checks}.
Since $\lvert \mathcal{C} \rvert = 2^k$ and $\lvert \mathbb{F}_2^n
\rvert = 2^n$, we could introduce $n-k$ conditions to constrain the
additional degrees of freedom.
These conditions, called parity checks, take the form of equations
over $\mathbb{F}_2^n$, linking the individual positions of each codeword.
We can arrange the coefficients of these equations in a
\textit{parity-check matrix} (\acs{pcm}) $\bm{H} \in
\mathbb{F}_2^{(n-k) \times n}$ and equivalently define the code as
\cite[Sec.~3.1.1]{ryan_channel_2009}
\begin{align*}
\mathcal{C} = \left\{ \bm{x} \in \mathbb{F}_2^n :
\bm{H}\bm{x}^\text{T} = \bm{0} \right\}
.%
\end{align*}
Note that in general we may have linearly dependent parity checks,
prompting us to define the \ac{pcm} as $\bm{H} \in
\mathbb{F}_2^{m\times n}$ with $m \ge n-k$ instead.
The \textit{syndrome} $\bm{s} = \bm{H} \bm{v}^\text{T}$ describes
which parity checks a candidate codeword $\bm{v} \in \mathbb{F}_2^n$ violates.
The representation using the \ac{pcm} has the benefit of providing a
description of the code, the memory complexity of which does not grow
exponentially with $n$, in contrast to keeping track of all codewords directly.
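The following Python sketch illustrates the syndrome computation, using the \ac{pcm} of the $[7,4,3]$ Hamming code as an example:

```python
import numpy as np

# PCM of the [7,4,3] Hamming code used as a running example.
H = np.array([
    [0, 1, 1, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [1, 1, 0, 1, 0, 0, 1],
])

x = np.array([1, 0, 0, 0, 0, 1, 1])  # a codeword: H x^T = 0 over F_2
print(H @ x % 2)                     # [0 0 0], all checks satisfied

v = x.copy()
v[2] ^= 1                            # a single bit flip during transmission
s = H @ v % 2                        # syndrome of the received word
print(s)                             # [1 1 0], checks 0 and 1 violated
```

Note that the syndrome of a single bit flip is exactly the corresponding column of $\bm{H}$, here column $2$.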
%
% The decoding problem
%
\Cref{fig:Diagram of a transmission system} visualizes the
communication process \cite[Sec.~1.1]{ryan_channel_2009}.
An input message $\bm{u}\in \mathbb{F}_2^k$ is mapped onto a codeword $\bm{x}
\in \mathbb{F}_2^n$. This is passed on to a modulator, which
interacts with the physical channel.
A demodulator processes the channel output and forwards the result
$\bm{y}$ to a decoder.
We differentiate between \textit{soft-decision} decoding, where
$\bm{y} \in \mathbb{R}^n$, and \textit{hard-decision} decoding, where
$\bm{y} \in \mathbb{F}_2^n$ \cite[Sec.~1.5.1.3]{ryan_channel_2009}.
Finally, the decoder is responsible for obtaining an estimate
$\hat{\bm{u}} \in \mathbb{F}_2^k$ of the original input message.
This is done by first finding an estimate $\hat{\bm{x}}$ of the sent
codeword and then undoing the encoding.
The decoding problem that we generally attempt to solve thus consists
in finding the best estimate $\hat{\bm{x}}$ given $\bm{y}$.
\begin{figure}[t]
\centering
\tikzset{
box/.style={
rectangle, draw=black, minimum width=17mm, minimum height=8mm,
},
}
\begin{tikzpicture}
[
node distance = 2mm and 7mm,
]
\node (in) {};
\node[box, right=of in] (enc) {Encoder};
\node[box, minimum width=25mm, right=of enc] (mod) {Modulator};
\node[box, below right=of mod] (cha) {Channel};
\node[box, minimum width=25mm, below left=of cha] (dem) {Demodulator};
\node[box, left=of dem] (dec) {Decoder};
\node[left=of dec] (out) {};
\draw[-{latex}] (in) -- (enc) node[midway, above] {$\bm{u}$};
\draw[-{latex}] (enc) -- (mod) node[midway, above] {$\bm{x}$};
\draw[-{latex}] (mod) -| (cha);
\draw[-{latex}] (cha) |- (dem);
\draw[-{latex}] (dem) -- (dec) node[midway, above] {$\bm{y}$};
\draw[-{latex}] (dec) -- (out) node[midway, above] {$\hat{\bm{u}}$};
\end{tikzpicture}
\caption{Overview of a transmission system.}
\label{fig:Diagram of a transmission system}
\end{figure}
%
%
% Hard vs. soft information
%
\subsection{Low-Density Parity-Check Codes}
%
% Core concept
%
Shannon's noisy-channel coding theorem is stated for codes whose block
length approaches infinity. This suggests that as the block length
becomes larger, the performance of the considered codes should
generally improve.
However, the size of the \ac{pcm}, and thus in general the decoding complexity,
of a linear block code grows quadratically with $n$.
This would quickly render decoding intractable as we increase the block length.
We can get around this problem by constructing $\bm{H}$ in such a
manner that the number of nonzero entries grows less than quadratically, e.g.,
only linearly.
This is exactly the motivation behind \ac{ldpc} codes
\cite[Ch.~1]{gallager_low_1960}.
%
% Tanner Graph, VNs and CNs
%
\ac{ldpc} codes belong to a class sometimes referred to as ``modern codes''.
These differ from ``classical codes'' in their decoding algorithms:
Classical codes are usually decoded using one-step hard-decision decoding,
whereas modern codes are suitable for iterative soft-decision
decoding \cite[Preface]{ryan_channel_2009}. The iterative decoding algorithms
in question are generally defined in terms of message passing on the
\textit{Tanner graph} of the code. The Tanner graph is a bipartite
graph that constitutes an alternative representation of the \ac{pcm}.
We define two types of nodes: \acp{vn}, corresponding to codeword
bits, and \acp{cn}, corresponding to individual parity checks.
We then construct the Tanner graph by connecting each \ac{cn} to
the \acp{vn} that make up the corresponding parity check
\cite[Sec.~5.1.2]{ryan_channel_2009}.
\Cref{PCM and Tanner graph of the Hamming code} shows this
construction for the [7,4,3]-Hamming code.
%
\begin{figure}[t]
\centering
\begin{align*}
\bm{H} =
\begin{pmatrix}
0 & 1 & 1 & 1 & 1 & 0 & 0 \\
1 & 0 & 1 & 1 & 0 & 1 & 0 \\
1 & 1 & 0 & 1 & 0 & 0 & 1 \\
\end{pmatrix}
\end{align*}
\vspace*{2mm}
\tikzset{
VN/.style={
circle, fill=KITgreen, minimum width=1mm, minimum height=1mm,
},
CN/.style={
rectangle, fill=KITblue, minimum width=1mm, minimum height=1mm,
},
}
\begin{tikzpicture}
\node[VN, label=above:$x_0$] (vn1) {};
\node[VN, right=12mm of vn1, label=above:$x_1$] (vn2) {};
\node[VN, right=12mm of vn2, label=above:$x_2$] (vn3) {};
\node[VN, right=12mm of vn3, label=above:$x_3$] (vn4) {};
\node[VN, right=12mm of vn4, label=above:$x_4$] (vn5) {};
\node[VN, right=12mm of vn5, label=above:$x_5$] (vn6) {};
\node[VN, right=12mm of vn6, label=above:$x_6$] (vn7) {};
\node[
CN, below=25mm of vn4,
label={below:$x_0 + x_2 + x_3 + x_5 = 0$}
] (cn2) {};
\node[
CN, left=40mm of cn2,
label={below:$x_1 + x_2 + x_3 + x_4 = 0$}
] (cn1) {};
\node[
CN, right=40mm of cn2,
label={below:$x_0 + x_1 + x_3 + x_6 = 0$}
] (cn3) {};
\foreach \n in {2,3,4,5} {
\draw (cn1) -- (vn\n);
}
\foreach \n in {1,3,4,6} {
\draw (cn2) -- (vn\n);
}
\foreach \n in {1,2,4,7} {
\draw (cn3) -- (vn\n);
}
\end{tikzpicture}
\caption{The \ac{pcm} and corresponding Tanner graph of the
[7,4,3]-Hamming code.}
\label{PCM and Tanner graph of the Hamming code}
\end{figure}
%
% N_V(j), N_C(i)
%
Mathematically, we represent a \ac{vn} using the index $i \in
\mathcal{I} := \left[ 0:n-1 \right] := \left\{ 0,1,\ldots,n-1 \right\}$
and a \ac{cn} using the index $j \in \mathcal{J}
:= \left[ 0 : m-1 \right]$.
We can then encode the information contained in the graph by defining
the neighborhood of a variable node $i$ as
$\mathcal{N}_\text{V} (i) = \left\{ j \in \mathcal{J} : \bm{H}_{j,i}
= 1 \right\}$
and that of a check node $j$ as
$\mathcal{N}_\text{C} (j) = \left\{ i \in \mathcal{I} : \bm{H}_{j,i}
= 1 \right\}$.
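These neighborhoods are straightforward to extract from the \ac{pcm}, as the following Python sketch shows for the Hamming code from above:

```python
import numpy as np

# PCM of the [7,4,3] Hamming code from the figure above.
H = np.array([
    [0, 1, 1, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [1, 1, 0, 1, 0, 0, 1],
])
m, n = H.shape

# N_V(i): the checks that VN i participates in (nonzero rows of column i).
N_V = {i: [int(j) for j in np.flatnonzero(H[:, i])] for i in range(n)}
# N_C(j): the VNs that make up check j (nonzero columns of row j).
N_C = {j: [int(i) for i in np.flatnonzero(H[j, :])] for j in range(m)}

print(N_V[3])  # [0, 1, 2]: bit x_3 appears in all three checks
print(N_C[0])  # [1, 2, 3, 4]: the first parity check
```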
%
% Error floor and waterfall regions
%
We typically evaluate the performance of \ac{ldpc} codes using the
\ac{ber} or the \ac{fer} (a \textit{frame} refers to one whole
transmitted block in this context).
Considering an \ac{awgn} channel, \Cref{fig:ldpc-perf} shows a
qualitative performance characteristic of an \ac{ldpc} code
\cite[Fig.~1]{costello_spatially_2014}. We talk of the
\textit{waterfall} and the \textit{error floor} regions.
\begin{figure}[t]
\centering
\begin{tikzpicture}
\begin{axis}[
width=12cm,
height=9cm,
xlabel={Signal-to-noise ratio},
ylabel={Error rate},
% xmin=0, xmax=6,
enlarge x limits=false,
ymin=1e-9, ymax=1,
ticks=none,
% y tick label={},
ymode=log,
grid=both,
grid style={line width=0.2pt, draw=gray!30},
major grid style={line width=0.4pt, draw=gray!50},
legend pos=north east,
legend cell align={left},
]
\addplot+[mark=none, solid, smooth, KITblue] coordinates {
(4.5789E-01, 1.1821E-01)
(6.6842E-01, 9.4575E-02)
(8.6316E-01, 5.2657E-02)
(1.0421E+00, 2.2183E-02)
(1.1789E+00, 8.3588E-03)
(1.3368E+00, 1.4835E-03)
(1.4895E+00, 1.6852E-04)
(1.5842E+00, 2.8285E-05)
(1.6737E+00, 4.2465E-06)
(1.7684E+00, 3.4519E-07)
(1.8316E+00, 3.9213E-08)
(1.8684E+00, 6.2247E-09)
(1.9053E+00, 1E-09)
};
\addlegendentry{Regular}
\addplot+[mark=none, solid, smooth, KITorange] coordinates {
(4.5789E-01, 1.1821E-01)
(6.4211E-01, 4.9800E-02)
(7.5263E-01, 1.2700E-02)
(8.1579E-01, 2.3177E-03)
(8.6842E-01, 3.5779E-04)
(9.1053E-01, 5.3716E-05)
(9.4737E-01, 4.8818E-06)
(9.8947E-01, 6.5555E-07)
(1.0421E+00, 9.5713E-08)
% (1.0684E+00, 2.9670E-08)
(1.1474E+00, 1.2499E-08)
(1.3000E+00, 7.1560E-09)
(1.4579E+00, 6.0535E-09)
% (1.6105E+00, 5E-09)
(1.9579E+00, 4E-09)
(2.2947E+00, 3.1876E-09)
% (2.8842E+00, 2.0403E-09)
};
\addlegendentry{Irregular}
\draw[gray, densely dashed]
(axis cs:0.65, 2e-3) rectangle (axis cs:1.65, 5e-5);
\node[below] at (axis cs:1.15, 6e-5) {Waterfall};
\draw[gray, densely dashed]
(axis cs:1, 6e-8) rectangle (axis cs:2, 2e-9);
\node[above] at (axis cs:1.5, 7e-8) {Error floor};
\end{axis}
\end{tikzpicture}
\caption{
Qualitative performance characteristic of an \ac{ldpc} code
in an \ac{awgn} channel. Adapted from
\cite[Fig.~1]{costello_spatially_2014}.
}
\label{fig:ldpc-perf}
\end{figure}
Broadly, there are two kinds of \ac{ldpc} codes, \textit{regular} and
\textit{irregular}.
Regular codes are characterized by the fact that the weights, i.e.,
the numbers of ones, of their rows and columns are constant
\cite[Sec.~5.1.1]{ryan_channel_2009}.
Already in their original introduction, regular \ac{ldpc} codes were shown
to have a minimum distance that scales linearly with the block length $n$
for sufficiently large $n$ \cite[Ch.~2,~Theorem~1]{gallager_low_1960},
which is why they do not exhibit an error floor under \ac{ml} decoding.
Irregular codes, on the other hand, generally do exhibit an error floor,
their redeeming quality being the ability to reach near-capacity
performance in the waterfall region \cite[Intro.]{costello_spatially_2014}.
\subsection{Spatially-Coupled LDPC Codes}
A relatively recent development in the world of \ac{ldpc} codes is
that of \ac{sc}-\ac{ldpc} codes.
Their key feature is that they combine the best properties of regular
and irregular codes.
They have a minimum distance that grows linearly with $n$, promising
good error floor behavior, and capacity approaching
iterative decoding behavior, promising good performance in the
waterfall region \cite[Intro.]{costello_spatially_2014}.
The essential property of \ac{sc}-\ac{ldpc} codes is that codewords
from different \textit{spatial positions}, which would ordinarily be sent
one after the other independently, are coupled.
This is achieved by connecting some \acp{vn} of one spatial position to
\acp{cn} of another, resulting in a \ac{pcm} of the form
\cite[Eq.~1]{hassan_fully_2016}
%
\begin{align*}
\bm{H} =
\begin{pmatrix}
\bm{H}_0(1) & & \\
\vdots & \ddots & \\
\bm{H}_K(1) & & \bm{H}_0(L) \\
& \ddots & \\
& & \bm{H}_K(L) \\
\end{pmatrix}
,
\end{align*}
%
where $K \in \mathbb{N}$ is the \textit{coupling width} and $L \in
\mathbb{N}$ is the number of spatial positions.
This construction results in a Tanner graph as depicted in
\Cref{fig:sc-ldpc-tanner}.
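A minimal Python sketch of this construction, assuming time-invariant component matrices for simplicity (i.e., $\bm{H}_k(t) = \bm{H}_k$ for all spatial positions $t$):

```python
import numpy as np

def couple(H_comps, L):
    """Assemble a terminated SC-LDPC PCM from component matrices
    H_0, ..., H_K, assumed time-invariant and of equal shape m x n.
    Column block t holds H_0, ..., H_K stacked at row blocks t, ..., t+K.
    """
    K = len(H_comps) - 1
    m, n = H_comps[0].shape
    H = np.zeros(((L + K) * m, L * n), dtype=int)
    for t in range(L):                       # spatial position
        for k, Hk in enumerate(H_comps):     # offset along the diagonal
            H[(t + k) * m:(t + k + 1) * m, t * n:(t + 1) * n] = Hk
    return H

# Small illustrative component matrices (coupling width K = 1).
H0 = np.array([[1, 1, 0], [0, 1, 1]])
H1 = np.array([[1, 0, 1], [1, 1, 0]])
H = couple([H0, H1], L=3)
print(H.shape)  # (8, 9)
```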
\begin{figure}[t]
\centering
\tikzset{
VN/.style={
circle, fill=KITgreen, minimum width=1mm, minimum height=1mm,
},
CN/.style={
rectangle, fill=KITblue, minimum width=1mm, minimum height=1mm,
},
}
\begin{tikzpicture}[node distance=7mm and 1cm]
\node[VN] (vn00) {};
\node[VN, below = of vn00] (vn01) {};
\node[VN, below = of vn01] (vn02) {};
\node[VN, below = of vn02] (vn03) {};
\node[VN, below = of vn03] (vn04) {};
\coordinate (temp) at ($(vn01)!0.5!(vn02)$);
\node[CN, right = of temp] (cn00) {};
\node[CN, below = of cn00] (cn01) {};
\draw (vn00) -- (cn00);
\draw (vn01) -- (cn00);
\draw (vn03) -- (cn00);
\draw (vn01) -- (cn01);
\draw (vn02) -- (cn01);
\draw (vn04) -- (cn01);
\foreach \i in {1,2,3} {
\pgfmathtruncatemacro{\previ}{\i-1}
\node[VN, right = 25mm of vn\previ 0] (vn\i0) {};
\foreach \j in {1,...,4} {
\pgfmathtruncatemacro{\prevj}{\j-1}
\node[VN, below = of vn\i\prevj] (vn\i\j) {};
}
\coordinate (temp) at ($(vn\i1)!0.5!(vn\i2)$);
\node[CN, right = of temp] (cn\i0) {};
\node[CN, below = of cn\i0] (cn\i1) {};
\draw (vn\i0) -- (cn\i0);
\draw (vn\i1) -- (cn\i0);
\draw (vn\i3) -- (cn\i0);
\draw (vn\i1) -- (cn\i1);
\draw (vn\i2) -- (cn\i1);
\draw (vn\i4) -- (cn\i1);
}
\node[right = 25mm of vn30] (vn40) {};
\node[below = of vn40] (vn41) {};
\node[below = of vn41] (vn42) {};
\node[below = of vn42] (vn43) {};
\node[below = of vn43] (vn44) {};
\coordinate (temp) at ($(vn41)!0.5!(vn42)$);
\node[right = of temp] (cn40) {};
\node[below = of cn40] (cn41) {};
\foreach \i in {0,1,2} {
\pgfmathtruncatemacro{\next}{\i+1}
\pgfmathtruncatemacro{\nextnext}{\i+2}
\draw (vn\i 3) to[bend right] (cn\next 1);
\draw (vn\i 1) to[bend left] (cn\nextnext 0);
}
\draw (vn33) to[bend right] (cn41);
\node at ($(cn40)!0.5!(cn41)$) {\dots};
\draw[decorate, decoration={brace, amplitude=10pt}]
([xshift=-5mm,yshift=2mm]vn00.north) --
([xshift=5mm,yshift=2mm]vn00.north -| cn20.north)
node[midway, above=4mm] {$K$};
\end{tikzpicture}
\caption{
Visualization of the coupling between the Tanner graphs
of individual spatial positions.
}
\label{fig:sc-ldpc-tanner}
\end{figure}
Note that at the first and last few spatial positions, some \acp{cn}
have lower degrees.
This leads to more reliable information about the corresponding
\acp{vn}, which, as we will see, is later passed to subsequent
spatial positions during decoding.
This is precisely the effect that leads to the good performance of
\ac{sc}-\ac{ldpc} codes in the waterfall region \cite{costello_spatially_2014}.
\subsection{Iterative Decoding}
\label{subsec:Iterative Decoding}
% Introduction
\ac{ldpc} codes are generally decoded using efficient iterative
algorithms, something that is possible due to their sparsity
\cite[Sec.~5.3]{ryan_channel_2009}.
The algorithm originally proposed alongside \ac{ldpc} codes for this
purpose by Gallager in 1960 is now known as the \ac{spa}
\cite[Sec.~5.4.1]{ryan_channel_2009}, also called \ac{bp}.
The optimality criterion the \ac{spa} is built around is a
symbol-wise \ac{map} decision \cite[Sec.~5.4.1]{ryan_channel_2009}.
The core idea of the resulting algorithm is to view \acp{cn} as
representing single-parity check codes and \acp{vn} as representing
repetition codes.
The algorithm alternates between consolidating soft information about
the \acp{vn} in the \acp{cn}, and consolidating soft information about
the \acp{cn} in the \acp{vn}.
To this end, messages are passed back and forth along the edges of
the Tanner graph.
$L_{i\rightarrow j}$ represents a message passed from \ac{vn} $i$ to
\ac{cn} $j$, and $L_{i\leftarrow j}$ represents a message passed from
\ac{cn} $j$ to \ac{vn} $i$.
The \acp{vn} additionally receive messages \cite[Sec.~5.4.2]{ryan_channel_2009}
\begin{align*}
\tilde{L}_i = \log \frac{P(X_i=0 \vert Y_i=y_i)}{P(X_i=1 \vert Y_i=y_i)},
\end{align*}
computed from the channel outputs.
The consolidation of the information occurs in the \ac{vn} update
\begin{align*}
L_{i\rightarrow j} = \tilde{L}_i + \sum_{j'\in \mathcal{N}_\text{V}(i)\setminus
j} L_{i\leftarrow j'}
\end{align*}
and the \ac{cn} update
\begin{align*}
L_{i\leftarrow j} = 2\cdot \tanh^{-1} \left( \prod_{i'\in
\mathcal{N}_\text{C}(j)\setminus i} \tanh \frac{L_{i'\rightarrow j}}{2} \right)
.
\end{align*}
A basic assumption for the derivation of the \ac{spa} is that the
messages are statistically independent.
If the Tanner graph has cycles, however, this
condition is not met.
The shorter the cycles, the sooner this condition is violated and the
worse the approximation becomes \cite[Sec.~5.4.4]{ryan_channel_2009}.
Cycles of length four (so-called \emph{$4$-cycles}) are the shortest
possible cycles and are thus especially problematic.
% Min-sum algorithm
A simplification of the \ac{spa} is the min-sum decoder. Here, the
\ac{cn} update is approximated as \cite[Sec.~5.5.1]{ryan_channel_2009}
\begin{align*}
L_{i \leftarrow j} = \prod_{i' \in \mathcal{N}_\text{C}(j)\setminus i}
\sign \left( L_{i' \rightarrow j} \right)
\cdot \min_{i' \in \mathcal{N}_\text{C}(j)\setminus i} \lvert
L_{i'\rightarrow j} \rvert
.
\end{align*}
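The following Python sketch implements the min-sum message-passing schedule described above. It is an unoptimized illustration of the update equations, not a reference implementation:

```python
import numpy as np

def min_sum_decode(H, llr, max_iter=20):
    """Hard-output min-sum decoding from channel LLRs.

    Implements the VN update and the sign-min CN approximation given
    above (LLR convention: L > 0 means bit 0 is more likely).
    """
    m, n = H.shape
    msg_vc = np.tile(llr, (m, 1)) * H      # initial L_{i -> j}: channel LLRs
    msg_cv = np.zeros((m, n))              # L_{i <- j}
    x_hat = (llr < 0).astype(int)
    for _ in range(max_iter):
        # CN update: product of signs times minimum magnitude over
        # all neighboring VNs except the target VN.
        for j in range(m):
            idx = np.flatnonzero(H[j])
            for i in idx:
                others = idx[idx != i]
                sign = np.prod(np.sign(msg_vc[j, others]))
                msg_cv[j, i] = sign * np.min(np.abs(msg_vc[j, others]))
        # VN update: channel LLR plus all incoming CN messages except
        # the one from the target CN.
        total = llr + msg_cv.sum(axis=0)
        for j in range(m):
            idx = np.flatnonzero(H[j])
            msg_vc[j, idx] = total[idx] - msg_cv[j, idx]
        x_hat = (total < 0).astype(int)    # hard decision
        if not (H @ x_hat % 2).any():      # all parity checks satisfied
            return x_hat, True
    return x_hat, False

# Example: the [7,4,3] Hamming code, all-zero codeword sent, bit 2
# received unreliably with the wrong sign.
H = np.array([
    [0, 1, 1, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [1, 1, 0, 1, 0, 0, 1],
])
llr = np.array([2.1, 1.8, -0.5, 2.3, 1.9, 2.0, 1.7])
x_hat, ok = min_sum_decode(H, llr)
print(x_hat, ok)  # corrected back to the all-zero codeword
```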
% Sliding-window decoding
For \ac{sc}-\ac{ldpc} codes, the iterative decoding process is wrapped by a
windowing step. This is done to reduce the latency and memory requirements and
also the overall computational complexity \cite{costello_spatially_2014}.
To this end, the Tanner graph is split into several overlapping windows.
During decoding, the messages that are passed along the edges of the
graph in the overlapping regions are kept in memory and used for the
decoding of subsequent blocks \cite[Sec.~III.~C.]{hassan_fully_2016}.
\section{Quantum Mechanics and Quantum Information Science}
\label{sec:Quantum Mechanics and Quantum Information Science}
Designing codes and decoders for \ac{qec} is generally performed on a
layer of abstraction far removed from the quantum mechanical
processes underlying the actual physics.
Nevertheless, having a fundamental understanding of the related
quantum mechanical concepts is useful to grasp the unique constraints
of this field.
The purpose of this section is to convey these concepts to the reader.
%%%%%%%%%%%%%%%%
\subsection{Core Concepts and Notation}
\label{subsec:Notation}
% Wave functions
In quantum mechanics, the state of a particle is described by a
\emph{wave function} $\psi(x,t)$.
Born's statistical interpretation provides a connection between this
function and the observable world:
$\lvert \psi (x,t) \rvert^2$ is the \ac{pdf} of finding a particle at
position $x$ and time $t$ \cite[Sec.~1.2]{griffiths_introduction_1995}.
Note that this presupposes a normalization of $\psi$ such that
$\int_{-\infty}^{\infty} \lvert \psi(x,t) \rvert^2 dx = 1$.
% Dirac notation
Much of the related mathematics can be very elegantly expressed
using the language of linear algebra.
The so-called Bra-ket or Dirac notation is especially appropriate,
having been proposed by Paul Dirac in 1939 for the express purpose
of simplifying quantum mechanical notation \cite{dirac_new_1939}.
Two new symbols are defined, \emph{bra}s $\bra{\cdot}$ and
\emph{ket}s $\ket{\cdot}$.
Kets denote column vectors, while bras denote their Hermitian conjugates.
For example, two vectors specified by the labels $a$ and $b$
respectively are written as $\ket{a}$ and $\ket{b}$.
Their inner product is $\braket{a\vert b}$.
% Expressing wave functions using linear algebra
The connection we will make between quantum mechanics and linear
algebra is that we will model the state space of a system as a
\emph{function space}, the Hilbert space $L_2$.
We will represent the state of a particle with wave function
$\psi(x,t)$ using the vector $\ket{\psi}$
\cite[Sec.~3.3]{griffiths_introduction_1995}.
% Operators
Another important notion is that of an \emph{operator}, a transformation
that takes a function as an input and returns another function as an
output \cite[Sec.~3.2.2]{griffiths_introduction_1995}.
Operators are useful to describe the relations between different
quantities relating to a particle.
An example of this is the differential operator $\frac{\partial}{\partial x}$.
We define the \emph{commutator} of two operators $P_1$ and $P_2$ as
\begin{align*}
[P_1,P_2] = P_1P_2 - P_2P_1
\end{align*}
and the \emph{anticommutator} as
\begin{align*}
[P_1,P_2]_+ = P_1P_2 + P_2P_1
.%
\end{align*}
We say the two operators \emph{commute} iff $[P_1,P_2] = 0$, and they
\emph{anti-commute} iff $[P_1,P_2]_+ = 0$.
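As a quick numerical illustration, the following Python sketch checks these definitions for the Pauli matrices $X$ and $Z$, a standard pair of anti-commuting operators:

```python
import numpy as np

def commutator(P1, P2):
    return P1 @ P2 - P2 @ P1

def anticommutator(P1, P2):
    return P1 @ P2 + P2 @ P1

# The Pauli matrices X and Z serve as example operators here.
X = np.array([[0, 1], [1, 0]])
Z = np.array([[1, 0], [0, -1]])

print(commutator(X, Z))       # nonzero matrix: X and Z do not commute
print(anticommutator(X, Z))   # zero matrix: X and Z anti-commute
```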
%%%%%%%%%%%%%%%%
\subsection{Observables}
\label{subsec:Observables}
% Observable quantities
An \emph{observable quantity} $Q(x,p,t)$ is a quantity of a quantum
mechanical system that we can measure, such as the position $x$ or
momentum $p$ of a particle.
In general, such measurements are not deterministic, i.e.,
measurements on identically prepared states can yield different results.
There are some states, however, that are \emph{determinate} for a
specific observable: measuring those will always yield identical
observations \cite[Sec.~3.3]{griffiths_introduction_1995}.
% General expression for expected value of observable quantity
If we know the wave function of a particle, we should be able to
compute the expected value $\braket{Q}$ of any observable quantity we wish.
It can be shown that for any $Q$, we can find a
corresponding Hermitian operator $\hat{Q}$ such that
\cite[Sec.~3.3]{griffiths_introduction_1995}
\begin{align}
\label{eq:gen_expr_Q_exp}
\braket{Q} = \int_{-\infty}^{\infty} \psi^*(x,t) \hat{Q} \psi(x,t) dx
.%
\end{align}%
While the derivation of this relationship is out of the scope of this
work, we can at least look at an example to illustrate it.
Considering the position $Q = x$ of a particle and setting the observable
operator to $\hat{Q} = x$, we can write
\cite[Sec.~1.5]{griffiths_introduction_1995}
\begin{align*}
\braket{x} = \int_{-\infty}^{\infty} \psi^*(x,t) \cdot x \cdot \psi(x,t) dx
= \int_{-\infty}^{\infty} x \lvert \psi(x,t) \rvert ^2 dx
.%
\end{align*}
Note that $\lvert \psi(x,t) \rvert^2 $ represents the \ac{pdf} of
finding a particle in a specific state. We immediately see that the
formula simplifies to the direct calculation of the expected value.
% Determinate states and eigenvalues
Let us now examine how the observable operator $\hat{Q}$ relates to
the determinate states of the observable quantity.
We begin by translating \Cref{eq:gen_expr_Q_exp} into linear algebra as
\cite[Eq.~3.114]{griffiths_introduction_1995}
\begin{align}
\label{eq:gen_expr_Q_exp_lin}
\braket{Q} = \braket{\psi \vert \hat{Q}\psi}
.%
\end{align}
\Cref{eq:gen_expr_Q_exp_lin} expresses an inherently probabilistic
relationship.
The determinate states are inherently deterministic.
To relate the two, we note that since determinate states should
always yield the same measurement results, the variance of the
observable should be zero.
We thus compute \cite[Eq.~3.116]{griffiths_introduction_1995}
\begin{align}
0 &\overset{!}{=} \braket{(Q - \braket{Q})^2}
= \braket{e_n \vert (\hat{Q} - \braket{Q})^2 e_n} \nonumber\\
&= \braket{(\hat{Q} - \braket{Q})e_n \vert (\hat{Q} - \braket{Q})
e_n} \nonumber\\
&= \lVert (\hat{Q} - \braket{Q}) e_n \rVert^2 \nonumber\\[3mm]
&\hspace{-8mm}\Leftrightarrow (\hat{Q} - \braket{Q}) \ket{e_n} =
0 \nonumber\\
\label{eq:observable_eigenrelation}
&\hspace{-8mm}\Leftrightarrow \hat{Q}\ket{e_n}
= \underbrace{\braket{Q}}_{\lambda_n} \ket{e_n}
.%
\end{align}%
%
Because we have assumed the variance to be zero, the expected value
$\braket{Q}$ is now the deterministic measurement result
corresponding to the determinate state
$\ket{e_n},~n\in \mathbb{N}$.
We can see that the determinate states are the \emph{eigenstates} of
the observable operator $\hat{Q}$ and that the measurement values are
the corresponding \emph{eigenvalues} $\lambda_n$
\cite[Sec.~3.3]{griffiths_introduction_1995}.
% Determinate states as a basis
As we are modelling the wave function $\psi(x,t)$ as a vector
$\ket{\psi}$, we can find a set of basis vectors to decompose it into.
We can use the determinate states for this purpose, expressing the state as%
\footnote{
We are only considering the case of having a \emph{discrete
spectrum} here, i.e., having a discrete set of eigenvalues and vectors.
For continuous spectra, the procedure is analogous.
}
\begin{align}
\label{eq:determinate_basis}
\ket{\psi} = \sum_{n=1}^{\infty} c_n \ket{e_n}, \hspace{3mm}
c_n := \braket{e_n \vert \psi}
.%
\end{align}
Because of the normalization of the wave function such that
$\int_{-\infty}^{\infty} \lvert \psi(x,t) \rvert^2 dx = 1$, we have
$\sum_{n=1}^{\infty} \lvert c_n \rvert ^2 = 1$.
Inserting \Cref{eq:determinate_basis} into
\Cref{eq:gen_expr_Q_exp_lin} we obtain
% tex-fmt: off
\cite[Prob.~3.35c)]{griffiths_introduction_1995}
% tex-fmt: on
\begin{align*}
\braket{Q} = \left( \sum_{n=1}^{\infty} c_n^* \bra{e_n} \right)
\left( \sum_{m=1}^{\infty} c_m\hat{Q}\ket{e_m} \right)
= \sum_{n=1}^{\infty} \sum_{m=1}^{\infty} c_n^* c_m
\lambda_m\braket{e_n \vert e_m}
= \sum_{n=1}^{\infty} \lambda_n \lvert c_n \rvert ^2
.%
\end{align*}
We can thus interpret $\lvert c_n \rvert ^2$ as the probability of
obtaining value $\lambda_n$ from the measurement.
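The following Python sketch verifies this relationship numerically for a finite-dimensional stand-in: a Hermitian matrix takes the role of $\hat{Q}$ and a normalized vector that of $\ket{\psi}$, both chosen arbitrarily for illustration:

```python
import numpy as np

# Finite-dimensional stand-ins: a Hermitian observable and a
# normalized state vector (values chosen only for illustration).
Q = np.array([[1, 0], [0, -1]])
psi = np.array([3, 4j]) / 5              # sum of |c_n|^2 is 1

lam, e = np.linalg.eigh(Q)               # eigenvalues and eigenstates
c = e.conj().T @ psi                     # c_n = <e_n | psi>

expect_direct = (psi.conj() @ Q @ psi).real      # <psi | Q psi>
expect_spectral = float(np.sum(lam * np.abs(c) ** 2))

print(expect_direct, expect_spectral)    # both approximately -0.28
```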
% Recap
To summarize, we mathematically model an observable quantity
$Q(x,p,t)$ using a corresponding operator $\hat{Q}$, which allows us
to compute the expected value as $\braket{Q} = \braket{\psi
\vert \hat{Q} \psi}$.
The eigenvectors of $\hat{Q}$ are the determinate states
$\ket{e_n},~n\in \mathbb{N}$ and the eigenvalues are the respective
measurement outcomes.
We can decompose an arbitrary state as $\ket{\psi} = \sum_{n=1}^{\infty} c_n
\ket{e_n}$, where $\lvert c_n \rvert ^2$ represents the probability
of obtaining a certain measurement value.
Note that when we speak of an \emph{observable}, we are usually
referring to the operator $\hat{Q}$.
%%%%%%%%%%%%%%%%
\subsection{Projective Measurements}
\label{subsec:Projective Measurements}
% Projective measurements
The measurements we considered in the previous section, for which
\Cref{eq:gen_expr_Q_exp_lin} holds, belong to the category of
\emph{projective measurements}.
For these, certain restrictions such as repeatability apply: the act
of measuring a quantum state should \emph{collapse} it onto one of
the determinate states.
Further measurements should then yield the same value.
More general methods of modelling measurements exist, e.g., describing
destructive measurements \cite[Box~2.5]{nielsen_quantum_2010}, but
they are not relevant to this work.
% Projection operators
We can model the collapse of the original state onto one of the
superimposed basis states as a \emph{projection}.
To see this, we use
\Cref{eq:determinate_basis,eq:observable_eigenrelation} to compute
\begin{align*}
\hat{Q}\ket{\psi} = \sum_{n=1}^{\infty} c_n \hat{Q} \ket{e_n}
= \sum_{n=1}^{\infty} \lambda_n c_n \ket{e_n}
.%
\end{align*}%
We see that $\hat{Q}$ has the effect of multiplying the component
along each basis vector with the corresponding eigenvalue.
We decompose $\hat{Q}$ into its constituent parts that act on each of
the separate components as
\begin{align*}
\hat{Q} = \sum_{n=1}^{\infty} \lambda_n \hat{P}_n
\end{align*}
using \emph{projection operators} \cite[Eq.~3.160]{griffiths_introduction_1995}
\begin{align*}
\hat{P}_n := \ket{e_n}\bra{e_n}, \hspace{3mm} n\in \mathbb{N}
.
\end{align*}%
These project a vector onto the subspace spanned by $\ket{e_n}$.
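A short numerical sketch of the spectral decomposition and the collapse behavior, again using an arbitrary finite-dimensional Hermitian matrix as a stand-in for $\hat{Q}$:

```python
import numpy as np

Q = np.array([[1, 0], [0, -1]])     # illustrative Hermitian observable
lam, e = np.linalg.eigh(Q)

# Projection operators P_n = |e_n><e_n| onto each eigenspace.
P = [np.outer(e[:, n], e[:, n].conj()) for n in range(len(lam))]

# Spectral decomposition: Q = sum_n lambda_n P_n.
Q_rebuilt = sum(l * Pn for l, Pn in zip(lam, P))

# Collapse: projecting a state onto an eigenspace and renormalizing
# leaves a determinate state, so repeating the measurement yields the
# same eigenvalue.
psi = np.array([0.6, 0.8])
post = P[0] @ psi
post = post / np.linalg.norm(post)

print(np.allclose(Q_rebuilt, Q))             # True
print(np.allclose(Q @ post, lam[0] * post))  # True
```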
% % Using projection operators to measure if a state has a component
% % along a basis vector
%
% A particularly interesting property of projection operators is that
% \begin{align*}
% \hat{P}_n (\hat{P}_n \ket{\psi}) = \hat{P}_n^2 \ket{\psi}
% = \hat{P}_n \ket{\psi},
% \end{align*}%
% and the only way this can hold for any $\ket{\psi}$ is if $\hat{P}_n$
% only has the eigenvalues $0$ or $1$
% % tex-fmt: off
% \cite[Prob.~3.57a)]{griffiths_introduction_1995}.
% % tex-fmt: on
% The eigenvalues can again be interpreted as possible measurement results.
% We can thus use $\hat{P}$ as an observable and treat
% the eigenvalue as an indicator of the state having a component along
% the related basis vector.
%%%%%%%%%%%%%%%%
\subsection{Qubits and Multi-Qubit States}
\label{subsec:Qubits and Multi-Qubit States}
% Intro
A central concept for quantum computing is that of the \emph{qubit}.
We employ it analogously to the classical \emph{bit}.
For classical computers, we alter bits' states using \emph{gates}.
We can chain multiple of these gates together to build up more complex logic,
such as half-adders or eventually a full processor.
In principle, quantum computers work in a similar fashion, only that
instead of bits we use qubits and instead of, e.g., AND, OR, and XOR
operations we use \emph{quantum gates} \cite[Sec.~1.3]{nielsen_quantum_2010}.
% Qubits and multi-qubit states
We fix an orthonormal basis of $\mathbb{C}^2$ to be
\begin{align*}
\ket{0} =
\begin{pmatrix}
1 \\
0
\end{pmatrix}, \hspace{5mm}
\ket{1} =
\begin{pmatrix}
0 \\
1
\end{pmatrix}
.%
\end{align*}
A qubit is defined to be a system with quantum state
\begin{align}
\label{eq:gen_qubit_state}
\ket{\psi} =
\begin{pmatrix}
\alpha \\
\beta
\end{pmatrix}
= \alpha \ket{0} + \beta \ket{1}
,%
\end{align}
with $\alpha, \beta \in \mathbb{C}$ and $\lvert \alpha \rvert ^2 +
\lvert \beta \rvert ^2 = 1$.
The overall state of a composite quantum system is described using
the \emph{tensor product}, denoted as $\otimes$
\cite[Sec.~2.2.8]{nielsen_quantum_2010}.
Take for example the two qubits
\begin{align*}
\ket{\psi_1} = \alpha_1 \ket{0} + \beta_1 \ket{1},\hspace*{10mm}
\ket{\psi_2} = \alpha_2 \ket{0} + \beta_2 \ket{1}
.%
\end{align*}
We examine the state $\ket{\psi}$ of the composite system.
Assuming the qubits are independent, this is a \emph{product state}
$\ket{\psi} = \ket{\psi_1}\otimes\ket{\psi_2}$.
When not ambiguous, we may omit the tensor product symbol or even write
the entire product state as a single ket
\cite[Sec.~6.2]{griffiths_consistent_2001}.
We have
\begin{align}
\label{eq:product_state}
\begin{split}
\ket{\psi} = \ket{\psi_1} \ket{\psi_2}
&= \left( \alpha_1 \ket{0} + \beta_1 \ket{1} \right)
\left( \alpha_2 \ket{0} + \beta_2 \ket{1} \right) \\
&= \alpha_1\alpha_2\ket{00}
+ \alpha_1\beta_2\ket{01}
+ \beta_1\alpha_2\ket{10}
+ \beta_1\beta_2\ket{11}
.%
\end{split}
\end{align}
We call $\ket{x_1, \ldots, x_n},~ x_i \in \{0,1\},$ the
\emph{computational basis states} \cite[Sec.~4.6]{nielsen_quantum_2010}.
To additionally simplify set notation, we define
\begin{align*}
\mathcal{M}^{\otimes n} := \underbrace{\mathcal{M}\otimes \ldots
\otimes \mathcal{M}}_{n \text{ times}}
.%
\end{align*}
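As a quick numerical illustration (a minimal sketch assuming NumPy; the coefficient values are arbitrary), the Kronecker product reproduces exactly the coefficients of the expansion above:

```python
import numpy as np

ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)

# Two normalized single-qubit states (coefficients chosen arbitrarily).
a1, b1 = 3 / 5, 4 / 5
a2, b2 = 1 / np.sqrt(2), 1 / np.sqrt(2)
psi1 = a1 * ket0 + b1 * ket1
psi2 = a2 * ket0 + b2 * ket1

# The composite state is the Kronecker (tensor) product; its entries
# are the coefficients of |00>, |01>, |10>, |11> in the expansion.
psi = np.kron(psi1, psi2)
expected = np.array([a1 * a2, a1 * b2, b1 * a2, b1 * b2])
assert np.allclose(psi, expected)
assert np.isclose(np.linalg.norm(psi), 1.0)  # still normalized
```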
% Entanglement
States that cannot be decomposed into such products
are called \emph{entangled} \cite[Sec.~2.2.8]{nielsen_quantum_2010}.
Examples of such states are the \emph{Bell states}
\begin{align*}
\begin{split}
\ket{\psi_{00}} &= \frac{\ket{00} + \ket{11}}{\sqrt{2}} \hspace{15mm}
\ket{\psi_{01}} = \frac{\ket{01} + \ket{10}}{\sqrt{2}} \\
\ket{\psi_{10}} &= \frac{\ket{01} - \ket{10}}{\sqrt{2}} \hspace{15mm}
\ket{\psi_{11}} = \frac{\ket{00} - \ket{11}}{\sqrt{2}}
\end{split}
\hspace{4mm}.%
\end{align*}
Quantum entanglement plays a major role in the way information
is encoded on quantum systems compared to classical ones.
Instead of employing only the individual qubit states, the
information is stored in the correlations between the qubits
\cite[Sec.~2]{preskill_quantum_2018}.
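Whether a two-qubit state admits a product decomposition can be checked numerically via its Schmidt rank: reshaping the coefficient vector into a $2 \times 2$ matrix, a product state gives rank one, while an entangled state gives rank two. A minimal sketch, assuming NumPy:

```python
import numpy as np

# A two-qubit state over (|00>, |01>, |10>, |11>) reshaped into a 2x2
# coefficient matrix is rank one iff the state is a product state.
def schmidt_rank(psi):
    return np.linalg.matrix_rank(psi.reshape(2, 2))

product = np.kron([3 / 5, 4 / 5], [1 / np.sqrt(2), 1 / np.sqrt(2)])
bell = np.array([1, 0, 0, 1]) / np.sqrt(2)   # (|00> + |11>)/sqrt(2)

assert schmidt_rank(product) == 1   # separable
assert schmidt_rank(bell) == 2      # entangled: no product decomposition
```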
% The size of the vector space
As we can see in \Cref{eq:product_state}, the number of
computational basis states needed to express the full composite state
is $2^n$.
This is in contrast to classical systems, where the dimensionality of
the state space only grows linearly with $n$.
This exponential growth of the state space is what makes it difficult
to simulate quantum systems on classical hardware.
It is also what motivated the research into performing computations
using quantum hardware in the first place
\cite[Sec.~3]{feynman_simulating_1982}.
% Basic types of gates
After examining the modelling of single- and multi-qubit systems,
we now shift our focus to describing the evolution of their states.
We model state changes as operators.
Unlike a classical bit, which has only two possible states and thus
admits only a single non-trivial state change (the bit-flip), a general qubit
state as shown in \Cref{eq:gen_qubit_state} lives on a continuum of values.
Consequently, there is an infinite number of possible state changes.
Fortunately, we can express any operator as a linear combination of the
\emph{Pauli operators} \cite[Sec.~2.2]{gottesman_stabilizer_1997}
\cite[Sec.~2.2]{roffe_quantum_2019}
\begin{align*}
\begin{array}{c}
I\text{ Operator} \\
\hline\\
\ket{0} \mapsto \ket{0} \\
\ket{1} \mapsto \ket{1}
\end{array}%
\hspace{10mm}%
\begin{array}{c}
X\text{ Operator} \\
\hline\\
\ket{0} \mapsto \ket{1} \\
\ket{1} \mapsto \ket{0}
\end{array}%
\hspace{10mm}%
\begin{array}{c}
Z\text{ Operator} \\
\hline\\
\ket{0} \mapsto \phantom{-}\ket{0} \\
\ket{1} \mapsto -\ket{1}
\end{array}%
\hspace{10mm}%
\begin{array}{c}
Y\text{ Operator} \\
\hline\\
\ket{0} \mapsto \phantom{-}j\ket{1} \\
\hspace{2.75mm}\ket{1} \mapsto -j\ket{0} \hspace*{1mm}.
\end{array}
\end{align*}
In fact, if we allow for complex coefficients, the $X$ and $Z$
operators are sufficient to express any other operator as a linear
combination \cite[Sec.~2.2]{roffe_quantum_2019}.
$I$ is the identity operator and $X$ and $Z$ are referred to as
\emph{bit-flips} and \emph{phase-flips} respectively.
We call the set $\mathcal{G}_n = \left\{ \pm I,\pm jI, \pm X,\pm jX,
\pm Y,\pm jY, \pm Z, \pm jZ \right\}^{\otimes n}$ the \emph{Pauli
group} over $n$ qubits.
In the context of modifying qubit states, we also call operators \emph{gates}.
When working with multi-qubit systems, we can also apply Pauli gates
to individual qubits independently, which we write, e.g., as $I_1 X_2
I_3 Z_4 Y_5$.
We often omit the identity operators, instead writing, e.g., $X_2 Z_4 Y_5$.
Other important operators include the \emph{Hadamard} and
\emph{controlled-NOT (CNOT)} gates \cite[Sec.~1.3]{nielsen_quantum_2010}
\vspace*{-7mm}
\begin{figure}[H]
\centering
\begin{minipage}[t]{0.4\textwidth}
\centering
\begin{align*}
\begin{array}{c}
H\text{ Operator} \\
\hline\\
\ket{0} \mapsto \frac{1}{\sqrt{2}} \left( \ket{0} +
\ket{1} \right) \\[2mm]
\ket{1} \mapsto \frac{1}{\sqrt{2}} \left( \ket{0} -
\ket{1} \right)
\end{array}
\end{align*}
\end{minipage}%
\begin{minipage}[t]{0.4\textwidth}
\centering
\begin{align*}
\begin{array}{c}
CNOT\text{ Operator} \\
\hline\\
\ket{00} \mapsto \ket{00} \\
\ket{01} \mapsto \ket{01} \\
\ket{10} \mapsto \ket{11} \\
\hspace{2.75mm}\ket{11} \mapsto \ket{10} \hspace*{1mm}.
\end{array}
\end{align*}
\end{minipage}%
\end{figure}
\vspace{-4mm}
\noindent Many more operators relevant to quantum computing exist, but they are
not covered here as they are not central to this work.
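The mappings above correspond to small matrices, which makes the stated properties easy to verify numerically. The following sketch (assuming NumPy, and assuming the convention that qubit 1 is the leftmost tensor factor) checks that $Y = jXZ$, that $X$ and $Z$ anticommute, and that multi-qubit Pauli operators such as $X_2$ act via the tensor product:

```python
import numpy as np

I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)

# Y is a complex combination of X and Z: Y = j * X @ Z.
assert np.allclose(Y, 1j * X @ Z)

# X and Z anticommute: XZ = -ZX.
assert np.allclose(X @ Z, -Z @ X)

# Multi-qubit Paulis act qubit-wise via the tensor product,
# e.g. X_2 on a 3-qubit system is I (x) X (x) I.
X2 = np.kron(np.kron(I, X), I)
ket000 = np.zeros(8, dtype=complex)
ket000[0] = 1                    # |000>
ket010 = np.zeros(8, dtype=complex)
ket010[2] = 1                    # |010>
assert np.allclose(X2 @ ket000, ket010)
```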
%%%%%%%%%%%%%%%%
\subsection{Quantum Circuits}
\label{Quantum Circuits}
% Intro
Using these quantum gates, we can construct \emph{circuits} to manipulate
the states of qubits \cite[Sec.~1.3.4]{nielsen_quantum_2010}.
Circuits are read from left to right and each horizontal wire
represents a qubit whose state evolves as it passes through
successive gates.
% General notation
A single line carries a quantum state, while a double line
denotes a classical bit, typically used to carry the result of a measurement.
A measurement is represented by a meter symbol.
In general, gates are represented as labeled boxes placed on one or more wires.
An exception is the CNOT gate, whose target qubit is marked with the
symbol $\oplus$.
% Controlled gates & example
We can additionally add a control input to a gate.
This conditions its application on the state of another qubit
\cite[Sec.~4.3]{nielsen_quantum_2010}.
The control connection is represented by a vertical line connecting
the gate to the corresponding qubit, where a filled dot is placed.
A controlled gate applies the respective operation only if the
control qubit is in state $\ket{1}$.
An example of this is the CNOT gate introduced in
\Cref{subsec:Qubits and Multi-Qubit States}, which is depicted in
\Cref{fig:cnot_circuit}.
\begin{figure}[t]
\centering
\begin{quantikz}
\lstick{$\ket{\psi}_1$} & \ctrl{1} & \\
\lstick{$\ket{\psi}_2$} & \targ{} & \\
\end{quantikz}
\caption{CNOT gate circuit.}
\label{fig:cnot_circuit}
\end{figure}
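As a small worked example of these circuit semantics (a sketch assuming NumPy), applying a Hadamard gate to the first qubit of $\ket{00}$ followed by a CNOT produces the Bell state $\ket{\psi_{00}}$; reading the circuit left to right means the CNOT matrix is applied last:

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
I = np.eye(2)
# CNOT in the basis (|00>, |01>, |10>, |11>);
# the first qubit controls the second.
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])

psi_in = np.array([1, 0, 0, 0])              # |00>
psi_out = CNOT @ np.kron(H, I) @ psi_in      # H on qubit 1, then CNOT

bell = np.array([1, 0, 0, 1]) / np.sqrt(2)   # (|00> + |11>)/sqrt(2)
assert np.allclose(psi_out, bell)
```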
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Quantum Error Correction}
\label{sec:Quantum Error Correction}
% General motivation behind QEC
One of the major barriers on the road to building a functioning
quantum computer is the inevitability of errors during quantum
computation. These arise due to the difficulty in sufficiently isolating the
qubits from external noise \cite[Intro.]{roffe_quantum_2019}.
This isolation is critical for quantum systems, as the constant interactions
with the environment act as small measurements, an effect called
\emph{decoherence} of the quantum state
\cite[Intro.]{gottesman_stabilizer_1997}.
\ac{qec} is one approach of dealing with this problem, by protecting
the quantum state in a similar fashion to information in classical error
correction.
% The unique challenges of QEC
The problem setting of \ac{qec} differs slightly from the classical case.
Three main restrictions apply \cite[Sec.~2.4]{roffe_quantum_2019}:
\begin{itemize}
\item The no-cloning theorem states that it is
      impossible to exactly copy an arbitrary, unknown quantum state.
\item Qubits are susceptible to more types of errors than
just bit-flips, as we saw in
\Cref{subsec:Qubits and Multi-Qubit States}.
\item Directly measuring the state of a qubit collapses it onto
one of the determinate states, thereby potentially destroying
information.
\end{itemize}
% General idea (logical vs. physical gates) + notation
Much like in classical error correction, in \ac{qec} information
is protected by mapping it onto codewords in a higher-dimensional space,
thereby introducing redundancy.
To this end, $k \in \mathbb{N}$ \emph{logical qubits} are mapped onto
$n \in \mathbb{N}$ \emph{physical qubits}, $n>k$.
We circumvent the no-cloning restriction by not copying the state of any of
the $k$ logical qubits, instead spreading the total state out over all $n$
physical qubits \cite[Intro.]{calderbank_good_1996}.
To differentiate quantum codes from classical ones, we denote a
code with parameters $k,n$ and minimum distance $d_\text{min}$ using
double brackets, as $\llbracket n,k,d_\text{min} \rrbracket$
\cite[Sec.~4]{roffe_quantum_2019}.
% The backlog problem
Another difference between quantum and classical error correction
lies in the resource constraints.
For \ac{qec}, the most important property is low latency, not, e.g.,
low overall computational complexity.
This is due to the \emph{backlog problem}
\cite[Sec.~II.G.3.]{terhal_quantum_2015}: Certain gates can
exacerbate existing errors by spreading single-qubit errors into
multi-qubit errors.
We therefore wish to correct errors before passing qubits through such gates.
If the \ac{qec} system is not fast enough, there will be an increasing
backlog of information at this point in the circuit, leading to an
exponential slowdown in computation.
%%%%%%%%%%%%%%%%
\subsection{Stabilizer Measurements}
\label{subsec:Stabilizer Measurements}
% Setting the stage
Before we move on to the description of entire codes, we introduce
the notion of the \emph{stabilizer measurement}.
Consider the two-qubit repetition code
\cite[Sec.~2.4]{roffe_quantum_2019}, where we map
\begin{align*}
\ket{\psi} = \alpha \ket{0} + \beta \ket{1}
\hspace*{3mm} \mapsto \hspace*{3mm}
\ket{\psi}_\text{L}
= \alpha \underbrace{\ket{00}}_{=:\ket{0}_\text{L}} + \beta
\underbrace{\ket{11}}_{=:\ket{1}_\text{L}}
.%
\end{align*}
We call $\ket{\psi}_\text{L}$ the logical state, and
we define the \emph{codespace} as $\mathcal{C} := \text{span}\mleft\{
\ket{00}, \ket{11} \mright\}$ and the \emph{error subspace} as
$\mathcal{F} := \text{span} \mleft\{\ket{01}, \ket{10} \mright\}$.
Note that this code is only able to detect single $X$-type errors.
% Measuring stabilizers
To determine if an error occurred, we want to measure
whether a state belongs
to $\mathcal{C}$ or $\mathcal{F}$.
As explained in \Cref{subsec:Observables}, physical measurements
can be mathematically described using operators whose eigenvalues
are the possible measurement results.
Here, we need an operator with two eigenvalues and the corresponding
eigenspaces should be $\mathcal{C}$ and $\mathcal{F}$ respectively.
For the two-qubit code, $Z_1Z_2$ is such an operator:
\begin{align}
Z_1Z_2 E \ket{\psi}_\text{L} &= (+1) E \ket{\psi}_\text{L}
\hspace*{3mm} \forall
E \ket{\psi}_\text{L} \in \mathcal{C} \\
Z_1Z_2 E \ket{\psi}_\text{L} &= (-1) E \ket{\psi}_\text{L}
\hspace*{3mm} \forall
E \ket{\psi}_\text{L} \in \mathcal{F}
.%
\end{align}
Here, $E \in \left\{ I, X_1, X_2 \right\}$ is an operator describing a possible
error and $E \ket{\psi}_\text{L}$ is the resulting state after that error.
By measuring the corresponding eigenvalue, we can determine if
$E\ket{\psi}_\text{L}$ lies in $\mathcal{C}$ or $\mathcal{F}$.
To do this without directly observing (and thus potentially
collapsing) the logical state $\ket{\psi}_\text{L}$, we prepare an
ancilla qubit with state $\ket{0}_\text{A}$ and entangle it with
$\ket{\psi}_\text{L}$ in such a way that the eigenvalue is indicated
by measuring the ancilla qubit instead.
More specifically, using a stabilizer measurement circuit as shown in
\Cref{fig:stabilizer_measurement}, we transform the state of the
three-qubit system as
\begin{align}
\label{eq:error_projection}
E\ket{\psi}_\text{L} \ket{0}_\text{A} \hspace*{3mm} \rightarrow
\hspace*{3mm}
\underbrace{\frac{1}{2} \mleft( I_1I_2 + Z_1Z_2 \mright)}_{=:
P_\mathcal{C}} E\ket{\psi}_\text{L}
\ket{0}_\text{A}
+ \underbrace{\frac{1}{2} \mleft( I_1I_2 - Z_1Z_2 \mright)}_{=:
P_\mathcal{F}}
E\ket{\psi}_\text{L} \ket{1}_\text{A}
.%
\end{align}
If $E \ket{\psi}_\text{L} \in \mathcal{C}$, the second term will
cancel and we will deterministically measure $\ket{0}_\text{A}$ for
the ancilla qubit. Similarly, if $E \ket{\psi}_\text{L} \in
\mathcal{F}$, we will deterministically measure $\ket{1}_\text{A}$.
\begin{figure}[t]
\centering
% tex-fmt: off
\begin{quantikz}
\lstick[2]{$E\ket{\psi}_\text{L}$} & & \gate[2]{Z_1Z_2} & & & \\
& & & & & \\
\lstick{$\ket{0}_\text{A}$} & \gate{H} & \ctrl{-1} & \gate{H} & \meter{} & \setwiretype{c} \\
\end{quantikz}
% tex-fmt: on
\caption{Stabilizer measurement circuit for the two-qubit repetition code.}
\label{fig:stabilizer_measurement}
\end{figure}
% Digitization of errors
Note that it is possible for a vector $E\ket{\psi}_\text{L}$ to not completely
lie in either subspace.
In this case, we can interpret it as being in $\mathcal{C}$ or
$\mathcal{F}$ with a certain probability.
However, when we measure the stabilizer, we will find that the vector
lies either in one or the other.
This is because the act of measuring the error partly collapses the
state, eliminating the uncertainty about the type of the error
\cite[Sec.~10.2]{nielsen_quantum_2010}.
This can be seen in \Cref{eq:error_projection}, as the expressions
$P_\mathcal{C}$ and $P_\mathcal{F}$ constitute projection operators onto
$\mathcal{C}$ and $\mathcal{F}$.
For example, $P_\mathcal{C}$ eliminates all components of $E
\ket{\psi}_\text{L}$ that lie in $\mathcal{F}$.
This process, together with the fact that any coherent error can be
decomposed into a linear combination of $X$ and $Z$ errors, means
that it is sufficient for \ac{qec} to be able to correct only these
types of errors.
This effect is referred to as error \emph{digitization}
\cite[Sec.~2.2]{roffe_quantum_2019}.
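For the two-qubit code, the projector interpretation is easy to verify numerically (a minimal sketch assuming NumPy):

```python
import numpy as np

Z = np.diag([1.0, -1.0])
ZZ = np.kron(Z, Z)             # stabilizer Z_1 Z_2
I4 = np.eye(4)

P_C = (I4 + ZZ) / 2            # projector onto span{|00>, |11>}
P_F = (I4 - ZZ) / 2            # projector onto span{|01>, |10>}

# Both are projectors (P^2 = P) and they resolve the identity.
assert np.allclose(P_C @ P_C, P_C)
assert np.allclose(P_F @ P_F, P_F)
assert np.allclose(P_C + P_F, I4)

# P_C keeps codespace components and annihilates error components.
ket00 = np.array([1.0, 0, 0, 0])
ket01 = np.array([0, 1.0, 0, 0])
assert np.allclose(P_C @ ket00, ket00)
assert np.allclose(P_C @ ket01, 0 * ket01)
```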
% The stabilizer group
Operators such as $Z_1Z_2$ above are called \emph{stabilizers}.
More generally, an operator $P_i \in \mathcal{G}_n$ is called a stabilizer of an
$\llbracket n, k, d_\text{min} \rrbracket$ code $\mathcal{C}$, if
\begin{itemize}
\item It stabilizes all logical states, i.e.,
$P_i\ket{\psi}_\text{L} = (+1)\ket{\psi}_\text{L} ~\forall~
\ket{\psi}_\text{L} \in \mathcal{C}$.
\item It commutes with all other stabilizers $P_j$ of the code,
i.e., $[P_i, P_j] = 0$.
This property is important to be able to measure the
eigenvalue of a stabilizer without disturbing the
eigenvectors of the others \cite[Sec.~1.2]{gottesman_stabilizer_1997}.
\end{itemize}
Formally, we define the \emph{stabilizer group} $\mathcal{S}$ as
\cite[Sec.~4.1]{roffe_quantum_2019}
\begin{align*}
    \mathcal{S} = \left\{P_i \in \mathcal{G}_n ~:~ P_i \ket{\psi}_\text{L} =
    (+1)\ket{\psi}_\text{L} ~\forall~ \ket{\psi}_\text{L} \in \mathcal{C}
    \text{ and } [P_i,P_j] = 0 ~\forall~ P_j \in \mathcal{S}\right\}
    .%
\end{align*}
We care in particular about the commuting properties of stabilizers
with respect to possible errors.
The measurement circuit for an arbitrary stabilizer $P_i$ modifies
the state as \cite[Eq.~29]{roffe_quantum_2019}
\begin{align*}
E\ket{\psi}_\text{L}\ket{0}_\text{A}
\hspace{3mm}\mapsto\hspace{3mm}
\frac{1}{2} \left( I + P_i
\right)E\ket{\psi}_\text{L}\ket{0}_\text{A} + \frac{1}{2}
\left( I - P_i \right)E\ket{\psi}_\text{L} \ket{1}_\text{A}
.%
\end{align*}
If a given error $E$ anticommutes with $P_i$, we have
\begin{align*}
    EP_i \ket{\psi}_\text{L} &= -P_i E \ket{\psi}_\text{L} \\
    \Rightarrow E \ket{\psi}_\text{L} &= -P_i E \ket{\psi}_\text{L} \\
    \Rightarrow \left( I + P_i \right)E\ket{\psi}_\text{L} &= 0
\end{align*}
and the ancilla measurement deterministically yields $\ket{1}_\text{A}$.
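This anticommutation argument can be checked directly for the two-qubit code, with stabilizer $P_i = Z_1 Z_2$ and error $E = X_1$ (a minimal sketch assuming NumPy):

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.diag([1.0, -1.0])
ZZ = np.kron(Z, Z)         # stabilizer P_i = Z_1 Z_2
X1 = np.kron(X, I2)        # error E = X_1

# E anticommutes with the stabilizer ...
assert np.allclose(X1 @ ZZ, -ZZ @ X1)

# ... so for a logical state (here |0>_L = |00>) the term
# (I + P_i) E |psi>_L vanishes and only the |1>_A branch survives.
psi_L = np.array([1.0, 0, 0, 0])               # |00>
assert np.allclose((np.eye(4) + ZZ) @ (X1 @ psi_L), 0)
assert np.allclose((np.eye(4) - ZZ) @ (X1 @ psi_L), 2 * X1 @ psi_L)
```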
%%%%%%%%%%%%%%%%
\subsection{Stabilizer Codes}
\label{subsec:Stabilizer Codes}
% Structure of a stabilizer code
For classical binary linear block codes, we use $n-k$ parity-checks
to reduce the degrees of freedom introduced by the encoding operation.
Effectively, each parity-check defines a local code splitting the
vector space in half, with only one part containing valid codewords.
The global code is the intersection of all local codes.
We can do the same in the quantum case.
Each split is represented using a stabilizer, whose eigenvalues signify
whether a candidate vector lies in the local codespace or local error subspace.
It is only a valid codeword if it lies in the codespace of all local codes.
We call codes constructed this way \emph{stabilizer codes}.
% Syndrome extraction circuitry
Similar to the classical case, we can use a syndrome vector to
describe which local codes are violated.
To obtain the syndrome, we simply measure the corresponding
operators $P_i$, each using a circuit as explained in
\Cref{subsec:Stabilizer Measurements}.
Note that this is an abstract representation of the syndrome extraction.
For the actual implementation in hardware, we can transform this into
a circuit that requires only CNOT and H-gates
\cite[Sec.~10.5.8]{nielsen_quantum_2010}.
% Logical operators
In order to modify the logical state encoded using the physical
qubits, we can use \emph{logical operators} \cite[Sec.~4.2]{roffe_quantum_2019}.
For each logical qubit $i$, there is a pair of logical operators,
$\overline{X}_i$ and $\overline{Z}_i$.
These are operators that
\begin{itemize}
\item Commute with all the stabilizers in $\mathcal{S}$.
\item Anti-commute with one another, i.e., $[ \overline{X}_i,
\overline{Z}_i ]_{+} = \overline{X}_i \overline{Z}_i +
\overline{Z}_i \overline{X}_i = 0$.
\end{itemize}
We can also measure these operators to determine which logical state
a given physical state encodes \cite[Sec.~2.6]{derks_designing_2025}.
% Parity-check matrix
We can represent stabilizer codes using a \emph{check matrix}
\cite[Sec.~10.5.1]{nielsen_quantum_2010}
\begin{align*}
\bm{H} = \left[
\begin{array}{c|c}
\bm{H}_X & \bm{H}_Z
\end{array}
\right]
,%
\end{align*}
with $\bm{H} \in \mathbb{F}_2^{(n-k)\times(2n)}$.
This is similar to a classical \ac{pcm} in that it contains $n-k$
rows, each describing one constraint. Each constraint restricts an additional
degree of freedom of the higher-dimensional space we use to introduce
redundancy.
In contrast to the classical case, this matrix now has $2n$ columns,
as we have to consider both the $X$ and $Z$ type operators that make up
the stabilizers.
Take for example the Steane code \cite[Eq.~10.83]{nielsen_quantum_2010}.
We can describe it using the check matrix
\begin{align}
\label{eq:steane}
\bm{H}_\text{Steane} = \left[
\begin{array}{ccccccc|ccccccc}
0 & 0 & 0 & 1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 1 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 1 & 0 & 1
\end{array}
\right]
.%
\end{align}
The first $n$ columns correspond to $X$ operators acting on the
corresponding physical qubit, the remaining $n$ columns to the $Z$
operators; a $Y$ operator on qubit $i$ is represented by setting both
the $i$-th and the $(n+i)$-th entry.
\begin{figure}[t]
\centering
% tex-fmt: off
\begin{quantikz}
\lstick[2]{$E\ket{\psi}_\text{L}$} & & \gate[2]{P_1} & \gate[2]{P_2} & \gate[style={draw=none},2]{\ldots} & \gate[2]{P_{n-k}} & & & \\
& & & & & & & & \\
\lstick{$\ket{0}_{\text{A}_1}$} & \gate{H} & \ctrl{-1} & & & & \gate{H} & \meter{} & \setwiretype{c} \\
\lstick{$\ket{0}_{\text{A}_2}$} & \gate{H} & & \ctrl{-2} & & & \gate{H} & \meter{} & \setwiretype{c} \\
\vdots\setwiretype{n} & & & & & & & & \vdots \\
\lstick{$\ket{0}_{\text{A}_{n-k}}$} & \gate{H} & & & & \ctrl{-4} & \gate{H} & \meter{} & \setwiretype{c} \\
\end{quantikz}
% tex-fmt: on
\caption{
Illustration of a full syndrome extraction circuit.
Adapted from \cite[Figure~4]{roffe_quantum_2019}.
}
\label{fig:sec}
\end{figure}
%%%%%%%%%%%%%%%%
\subsection{Calderbank-Shor-Steane Codes}
\label{subsec:Calderbank-Shor-Steane Codes}
% Intro
Stabilizer codes are especially practical to work with when they can
handle $X$ and $Z$ type errors independently.
As $Z$ errors anti-commute with $X$ operators in the stabilizers and
vice versa, this property translates into being able to split the
stabilizers into a subset being made up of only $X$
operators and the rest only of $Z$ operators.
We call such codes \ac{css} codes.
We can see this property in \Cref{eq:steane} in the check matrix
of the Steane code.
% Construction
We can exploit this separate consideration of $X$ and $Z$ errors in
the construction of \ac{css} codes.
We combine two binary linear codes $\mathcal{C}_1$ and
$\mathcal{C}_2$, each responsible for correcting one type of error
\cite[Sec.~10.5.6]{nielsen_quantum_2010}.
Using the dual code of $\mathcal{C}_2$ \cite[Eq.~3.4]{ryan_channel_2009}
\begin{align*}
\mathcal{C}_2^\perp := \left\{ \bm{x}' \in \mathbb{F}_2^n :
\bm{x}' \bm{x}^\text{T} = 0 ~\forall \bm{x} \in \mathcal{C}_2 \right\}
,%
\end{align*}
we define $\bm{H}_X := \bm{H}(\mathcal{C}_2^\perp)$ and $\bm{H}_Z
:= \bm{H}(\mathcal{C}_1)$, and construct the check matrix as
\begin{align*}
\left[
\begin{array}{c|c}
\bm{H}_X & \bm{0} \\
\bm{0} & \bm{H}_Z
\end{array}
\right]
.%
\end{align*}
In order to yield a valid stabilizer code, $\mathcal{C}_1$ and
$\mathcal{C}_2$ must satisfy the commutativity condition
\begin{align}
\label{eq:css_condition}
\bm{H}_X \bm{H}_Z^\text{T} = \bm{0}
.%
\end{align}
We can ensure this is the case by choosing them such that
$\mathcal{C}_2 \subset \mathcal{C}_1$.
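For the Steane code, both $\bm{H}_X$ and $\bm{H}_Z$ equal the parity-check matrix of the $[7,4]$ Hamming code, so the commutativity condition can be verified directly (a minimal sketch assuming NumPy):

```python
import numpy as np

# Parity-check matrix of the [7,4] Hamming code, which supplies both
# the X and the Z checks of the Steane code.
H_hamming = np.array([[0, 0, 0, 1, 1, 1, 1],
                      [0, 1, 1, 0, 0, 1, 1],
                      [1, 0, 1, 0, 1, 0, 1]])

H_X = H_Z = H_hamming

# Commutativity condition H_X H_Z^T = 0 over F_2: every pair of rows
# overlaps in an even number of positions.
assert np.all((H_X @ H_Z.T) % 2 == 0)
```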
%%%%%%%%%%%%%%%%
\subsection{Quantum Low-Density Parity-Check Codes}
\label{subsec:Quantum Low-Density Parity-Check Codes}
% Intro
Various methods of constructing \ac{qec} codes exist
\cite{swierkowska_eccentric_2025}.
Topological codes, for example, encode information in the features of
a lattice and are intrinsically robust against local errors.
Among these, the \emph{surface code} is the most widely studied.
Another example is that of concatenated codes, which nest one code within
another, allowing for especially simple and flexible constructions
\cite[Sec.~3.2]{swierkowska_eccentric_2025}.
An area of research that has recently seen more attention is that of
quantum \ac{ldpc} (\acs{qldpc}) codes.
They have a much higher encoding efficiency than, e.g., the surface
code, which would be prohibitively expensive to scale up
\cite[Sec.~I]{bravyi_high-threshold_2024}.
% Bivariate Bicycle codes
A recent addition to the class of \ac{qldpc} codes is that of \ac{bb}
codes \cite[Sec.~3]{bravyi_high-threshold_2024}.
These are a special type of \ac{css} code, where $\bm{H}_X$ and
$\bm{H}_Z$ are constructed from two matrices $\bm{A}$ and $\bm{B}$ as
\begin{align*}
\bm{H}_X = [\bm{A} \vert \bm{B}]
\hspace*{5mm} \text{and} \hspace*{5mm}
\bm{H}_Z = [\bm{B}^\text{T} \vert \bm{A}^\text{T}]
.%
\end{align*}
This way, we can guarantee the satisfaction of the commutativity
condition (\Cref{eq:css_condition}).
To define $\bm{A}$ and $\bm{B}$ we first introduce some additional notation.
We denote the identity matrix as $\bm{I}_l \in \mathbb{F}_2^{l\times l}$ and
the \emph{cyclic shift matrix} as $\bm{S}_l \in \mathbb{F}_2^{l\times
l},~\left(\bm{S}_l\right)_{i,j}= \delta_{(i+1) \bmod l,\,j}$, with $l \in \mathbb{N}$.
We further define
\begin{align*}
\bm{x} = \bm{S}_l \otimes \bm{I}_m
\hspace*{5mm} \text{and} \hspace*{5mm}
\bm{y} = \bm{I}_l \otimes \bm{S}_m
.%
\end{align*}
We can then construct $\bm{A}$ and $\bm{B}$ as bivariate polynomials
\begin{align*}
\bm{A} = \bm{A}_1 + \bm{A}_2 + \bm{A}_3
\hspace*{5mm} \text{and} \hspace*{5mm}
\bm{B} = \bm{B}_1 + \bm{B}_2 + \bm{B}_3
,%
\end{align*}
where $\bm{A}_i$ and $\bm{B}_i$ are powers of $\bm{x}$ or $\bm{y}$.
\ac{bb} codes have large minimum distance $d_\text{min}$ and high rate,
offering a more than 10-fold reduction of encoding overhead over the
surface code.
Additionally, they possess short-depth syndrome measurement circuits,
leading to lower time requirements for the syndrome extraction
and thus lower error rates \cite[Sec.~1]{bravyi_high-threshold_2024}.
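The construction can be illustrated numerically. In the following sketch (assuming NumPy; the parameters $l = m = 6$ and the chosen exponents are purely illustrative, not an optimized code), the commutativity condition holds by construction because $\bm{x}$ and $\bm{y}$ commute:

```python
import numpy as np

l, m = 6, 6
S_l = np.roll(np.eye(l, dtype=int), 1, axis=1)   # cyclic shift matrix
S_m = np.roll(np.eye(m, dtype=int), 1, axis=1)

x = np.kron(S_l, np.eye(m, dtype=int))
y = np.kron(np.eye(l, dtype=int), S_m)

# A and B as sums of powers of x and y (exponents chosen for
# illustration only).
mp = np.linalg.matrix_power
A = (mp(x, 3) + y + mp(y, 2)) % 2
B = (mp(y, 3) + x + mp(x, 2)) % 2

H_X = np.hstack([A, B])
H_Z = np.hstack([B.T, A.T])

# x and y commute, hence A and B commute, and the CSS condition
# H_X H_Z^T = A B + B A = 0 holds over F_2 for any such choice.
assert np.array_equal(x @ y, y @ x)
assert np.all((H_X @ H_Z.T) % 2 == 0)
```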
% Syndrome-based BP
As we saw in \Cref{subsec:Stabilizer Measurements}, we work only
with the parity information contained in the syndrome, to avoid
disturbing the quantum states of individual qubits.
This necessitates a modification of the standard \ac{bp} algorithm
introduced in \Cref{subsec:Iterative Decoding}
\cite[Sec.~3.1]{yao_belief_2024}.
Instead of attempting to find the most likely codeword directly, the
algorithm will now try to find an error pattern $\hat{\bm{e}} \in
\mathbb{F}_2^n$ that satisfies
\begin{align*}
\bm{H} \hat{\bm{e}}^\text{T} = \bm{s}
.%
\end{align*}
To this end, we initialize the channel \acp{llr} as
\begin{align*}
\tilde{L}_i = \log{\frac{P(E_i = 0)}{P(E_i = 1)}} = \log{\frac{1 - p_i}{p_i}}
,%
\end{align*}
where $p_i$ is the prior probability of error of \ac{vn} $i$.
Additionally, we amend the \ac{cn} update to consider the parity
indicated by the syndrome, calculating
\begin{align*}
L_{i\leftarrow j} = 2\cdot (-1)^{s_j} \cdot \tanh^{-1} \left( \prod_{i'\in
\mathcal{N}(j)\setminus \{i\}} \tanh \frac{L_{i'\rightarrow j}}{2} \right)
.
\end{align*}
The resulting syndrome-based \ac{bp} algorithm is shown in
\Cref{alg:syndome_bp}.
% tex-fmt: off
\tikzexternaldisable
\begin{algorithm}[t]
\caption{Binary syndrome-based belief propagation (BP) algorithm.}
\label{alg:syndome_bp}
\begin{algorithmic}[1]
\State \textbf{Initialize:} $\tilde{L}_i \leftarrow
\log \frac{1-p_i}{p_i}$ for all $i \in \mathcal{I}$
\State \textbf{Initialize:} $L_{i \rightarrow j} \leftarrow
\tilde{L}_i$ for all $i \in \mathcal{I},\, j \in \mathcal{N}_\text{V}(i)$
\State \textbf{Initialize:} $\hat{\bm{e}} \leftarrow \bm{0}$
\For{$\ell = 1, \ldots, n_\text{iter}$}
\For{$j \in \mathcal{J}$}
\For{$i \in \mathcal{N}_\text{C}(j)$}
\State $\displaystyle L_{i \leftarrow j} \leftarrow
2\cdot(-1)^{s_j}\cdot\tanh^{-1}
\!\left(
\prod_{i' \in \mathcal{N}_\text{C}(j)\setminus\{i\}}
\tanh\frac{L_{i'\rightarrow j}}{2}
\right)$
\EndFor
\EndFor
\For{$i \in \mathcal{I}$}
\For{$j \in \mathcal{N}_\text{V}(i)$}
\State $\displaystyle L_{i \rightarrow j} \leftarrow
\tilde{L}_i +
\sum_{j' \in \mathcal{N}_\text{V}(i)\setminus\{j\}}
L_{i \leftarrow j'}$
\EndFor
\EndFor
\For{$i \in \mathcal{I}$}
\State $\displaystyle \hat{e}_i \leftarrow
\mathbbm{1}\left\{
\tilde{L}_i +
\sum_{j \in \mathcal{N}_\text{V}(i)} L_{i \leftarrow j} < 0
\right\}$
\EndFor
\If{$\bm{H}\hat{\bm{e}}^\text{T} = \bm{s}$}
\State \textbf{break}
\EndIf
\EndFor
\State \textbf{return} $\hat{\bm{e}}$
\end{algorithmic}
\end{algorithm}
\tikzexternalenable
% tex-fmt: on
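A minimal dense-matrix sketch of the syndrome-based \ac{bp} decoder (assuming NumPy; written for clarity rather than efficiency, using a flooding schedule):

```python
import numpy as np

def syndrome_bp(H, s, p, n_iter=20):
    """Dense-matrix syndrome-based BP with a flooding schedule."""
    m, n = H.shape
    L_ch = np.log((1 - p) / p) * np.ones(n)     # channel LLRs
    M = H * L_ch                                # VN-to-CN messages
    E = np.zeros((m, n))                        # CN-to-VN messages
    e_hat = np.zeros(n, dtype=int)
    for _ in range(n_iter):
        # CN update, including the syndrome sign (-1)^{s_j}.
        for j in range(m):
            idx = np.flatnonzero(H[j])
            t = np.tanh(M[j, idx] / 2)
            for k, i in enumerate(idx):
                prod = np.clip(np.prod(np.delete(t, k)),
                               -1 + 1e-12, 1 - 1e-12)
                E[j, i] = 2 * (-1) ** s[j] * np.arctanh(prod)
        # VN update, hard decision, and syndrome check.
        L_total = L_ch + E.sum(axis=0)
        for i in range(n):
            for j in np.flatnonzero(H[:, i]):
                M[j, i] = L_total[i] - E[j, i]
        e_hat = (L_total < 0).astype(int)
        if np.array_equal(H @ e_hat % 2, s):
            break
    return e_hat

# Toy example: two checks on three error bits, single error on bit 0.
H = np.array([[1, 1, 0],
              [0, 1, 1]])
e = np.array([1, 0, 0])
s = H @ e % 2
assert np.array_equal(syndrome_bp(H, s, p=0.1), e)
```

On this toy example, the decoder recovers the single error within two iterations.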
% Degeneracy and short cycles
Decoding \ac{qldpc} codes poses some unique challenges.
One issue is that of \emph{quantum degeneracy}.
Because errors that differ by a stabilizer have the same impact on
all codewords, there can be multiple minimum-weight solutions to the
quantum decoding problem \cite[Sec.~II.C.]{babar_fifteen_2015}
\cite[Sec.~V]{roffe_decoding_2020}.
This can cause the decoding algorithm to split its belief between
multiple equivalent solutions, hindering convergence
\cite[Sec.~5]{yao_belief_2024}.
Another problem is that due to the commutativity property of the stabilizers,
quantum codes inherently contain short cycles
\cite[Sec.~IV.C]{babar_fifteen_2015}.
As discussed in \Cref{subsec:Iterative Decoding}, these lead to
the violation of the independence assumption of the messages passed
during decoding, impeding performance.
% BPGD
The aforementioned issues both manifest themselves as convergence problems
of the \ac{bp} algorithm, and different ways of modifying the algorithm
to aid with convergence exist.
One approach is to use \ac{bp} with guided decimation (\acs{bpgd})
\cite[Alg.~1]{yao_belief_2024}.
Here, a number $T\in \mathbb{N}$ of \ac{bp} iterations are performed,
before \emph{decimating} the most reliable \ac{vn}, i.e., performing
a hard decision and excluding it from further decoding.
This constrains the solution space more and more as the decoding
progresses, encouraging the algorithm to converge to one of the
solutions \cite[Sec.~5]{yao_belief_2024}.
\Cref{alg:bpgd} shows this process.
Note that as the Tanner graph only has $n$ \acp{vn}, this is a
natural constraint on the maximum number of outer iterations of the algorithm.
Quantum degeneracy additionally necessitates some care in the way
error rates are computed in simulations.
We must account for the fact that multiple error patterns constitute
valid corrections by comparing the resulting logical states, computed
by measuring the logical operators.
This way, we obtain the \ac{ler}.
In \Cref{alg:bpgd}, the decimation is implemented by setting the
channel \ac{llr} of the selected \ac{vn} to $\pm\infty$.
This is equivalent to a hard decision: the \ac{vn}'s outgoing messages
saturate at $\pm\infty$ and are no longer influenced by incoming
messages, fixing the corresponding bit for the rest of the decoding.
% tex-fmt: off
\tikzexternaldisable
\begin{algorithm}[t]
\caption{Belief propagation with guided decimation (BPGD) algorithm.}
\label{alg:bpgd}
\begin{algorithmic}[1]
\State \textbf{Initialize:} $\tilde{L}_i \leftarrow
\log \frac{1-p_i}{p_i}$ for all $i \in \mathcal{I}$
\State \textbf{Initialize:} $L_{i \rightarrow j} \leftarrow
\tilde{L}_i$ for all $i \in \mathcal{I},\, j \in \mathcal{N}_\text{V}(i)$
\State \textbf{Initialize:} $\hat{\bm{e}} \leftarrow \bm{0}$
\State \textbf{Initialize:} $\mathcal{I}' \leftarrow \mathcal{I}$
\For{$r = 1, \ldots, n$}
\For{$\ell = 1, \ldots, T$}
\State Perform \ac{cn} update
\State Perform \ac{vn} update
\State $L^\text{total}_i \leftarrow \tilde{L}_i + \sum_{j \in \mathcal{N}_\text{V}(i)} L_{i \leftarrow j}$ for all $i \in \mathcal{I}$
\EndFor
\For{$i \in \mathcal{I}$}
\State $\displaystyle \hat{e}_i \leftarrow
\mathbbm{1}\left\{ L^\text{total}_i < 0 \right\}$
\EndFor
\If{$\bm{H}\hat{\bm{e}}^\text{T} = \bm{s}$}
\State \textbf{break}
\Else
\State $i_\text{max} \leftarrow \argmax_{i \in \mathcal{I}'} \lvert L^\text{total}_i \rvert $
\If{$L^\text{total}_{i_\text{max}} < 0$}
\State $\tilde{L}_{i_\text{max}} \leftarrow -\infty$
\Else
\State $\tilde{L}_{i_\text{max}} \leftarrow +\infty$
\EndIf
\State $\mathcal{I}' \leftarrow \mathcal{I}'\setminus\{i_\text{max}\}$
\EndIf
\EndFor
\State \textbf{return} $\hat{\bm{e}}$
\end{algorithmic}
\end{algorithm}
\tikzexternalenable
% tex-fmt: on
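The decimation mechanic can be illustrated on a deliberately degenerate toy problem (a sketch assuming NumPy; for simplicity, this variant restarts \ac{bp} with fresh messages after each decimation instead of continuing with the existing messages):

```python
import numpy as np

def bp_round(H, s, L_ch, M, T):
    """Run T flooding BP iterations; return the total LLRs."""
    m, n = H.shape
    E = np.zeros((m, n))
    for _ in range(T):
        for j in range(m):
            idx = np.flatnonzero(H[j])
            t = np.tanh(M[j, idx] / 2)
            for k, i in enumerate(idx):
                prod = np.clip(np.prod(np.delete(t, k)),
                               -1 + 1e-12, 1 - 1e-12)
                E[j, i] = 2 * (-1) ** s[j] * np.arctanh(prod)
        L_total = L_ch + E.sum(axis=0)
        for i in range(n):
            for j in np.flatnonzero(H[:, i]):
                M[j, i] = L_total[i] - E[j, i]
    return L_total

def bpgd(H, s, p, T=5):
    m, n = H.shape
    L_ch = np.log((1 - p) / p) * np.ones(n)
    undecided = set(range(n))
    e_hat = np.zeros(n, dtype=int)
    for _ in range(n):
        M = np.where(H == 1, L_ch, 0.0)     # (re-)initialize messages
        L_total = bp_round(H, s, L_ch, M, T)
        e_hat = (L_total < 0).astype(int)
        if np.array_equal(H @ e_hat % 2, s):
            break
        # Decimate the most reliable undecided VN: a channel LLR of
        # +/- infinity acts as a hard decision on that bit.
        i_max = max(undecided, key=lambda i: abs(L_total[i]))
        L_ch[i_max] = -np.inf if L_total[i_max] < 0 else np.inf
        undecided.remove(i_max)
    return e_hat

# Degenerate toy problem: one check over two bits with syndrome 1.
H = np.array([[1, 1]])
s = np.array([1])
e_hat = bpgd(H, s, p=0.1)
assert np.array_equal(H @ e_hat % 2, s)
```

With a single check over two bits and syndrome $1$, both error patterns $[1,0]$ and $[0,1]$ are valid solutions: plain \ac{bp} stalls at total \acp{llr} of exactly zero, and the decimation step breaks the tie.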