\chapter{Fundamentals}
\label{ch:Fundamentals}
\Ac{qec} is a field of research combining ``classical''
communications engineering and quantum information science.
This chapter provides the relevant theoretical background on both of
these topics and subsequently introduces the fundamentals of \ac{qec}.
% TODO: Is an explanation of BP with guided decimation needed in this chapter?
% TODO: Is an explanation of OSD needed in this chapter?
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Classical Error Correction}
\label{sec:Classical Error Correction}
The core concept underpinning error correcting codes is the
realization that introducing a finite amount of redundancy to
information before transmission can considerably reduce the error rate.
Specifically, Shannon proved in 1948 that for any channel, a block
code can be found that achieves arbitrarily small probability of
error at any communication rate up to the capacity of the channel
when the block length approaches infinity
\cite[Sec.~13]{shannon_mathematical_1948}.
In this section, we explore the concepts of ``classical'' (as in non-quantum)
error correction that are central to this work.
We start by looking at different ways of encoding information,
first considering binary linear block codes in general and then \ac{ldpc} and
\ac{sc}-\ac{ldpc} codes.
Finally, we pivot to the decoding process, specifically the \ac{bp}
algorithm.
\subsection{Binary Linear Block Codes}
%
% Codewords, n, k, rate
%
One particularly important class of coding schemes is that of binary
linear block codes.
The information to be protected takes the form of a sequence of
binary symbols, which is split into separate blocks.
Each block is encoded, transmitted, and decoded separately.
The encoding step introduces redundancy by mapping input messages
$\bm{u} \in \mathbb{F}_2^k$ of length $k \in \mathbb{N}$ (called the
\textit{information length}) onto \textit{codewords} $\bm{x} \in
\mathbb{F}_2^n$ of length $n \in \mathbb{N}$ (called the
\textit{block length}) with $n > k$.
A measure of the amount of introduced redundancy is the \textit{code
rate} $R = k/n$.
We call the set of all codewords $\mathcal{C}$ the \textit{code}
\cite[Sec.~3.1.1]{ryan_channel_2009}.
%
% d_min and the [] Notation
%
During the encoding process, a mapping from $\mathbb{F}_2^k$
onto $\mathcal{C} \subset \mathbb{F}_2^n$ takes place.
The input messages are mapped onto an expanded vector space, where
they are ``further apart'', giving rise to the error correcting
properties of the code.
This notion of the distance between two codewords $\bm{x}_1$ and
$\bm{x}_2$ can be expressed using the \textit{Hamming distance} $d(\bm{x}_1,
\bm{x}_2)$, which is defined as the number of positions in which they differ.
We define the \textit{minimum distance} of a code $\mathcal{C}$ as
%
\begin{align*}
d_\text{min} := \min \left\{ d(\bm{x}_1, \bm{x}_2) : \bm{x}_1,
\bm{x}_2 \in \mathcal{C}, \bm{x}_1 \neq \bm{x}_2 \right\}
.
\end{align*}
%
We can signify that a binary linear block code has information length
$k$, block length $n$ and minimum distance $d_\text{min}$ using the
notation $[n,k,d_\text{min}]$ \cite[Sec.~1.3]{macwilliams_theory_1977}.
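To make the definition concrete, the minimum distance of a small code can be computed directly from the pairwise definition. The following Python sketch (illustrative only, not part of this work's tooling) does so for the $[3,2,2]$ single-parity-check code, given as an explicit list of codewords:

```python
import itertools

# The [3, 2, 2] single-parity-check code as an explicit set of codewords.
code = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]

def hamming_distance(x1, x2):
    """Number of positions in which two words differ."""
    return sum(a != b for a, b in zip(x1, x2))

# Minimum over all pairs of distinct codewords, exactly as in the definition.
d_min = min(hamming_distance(x1, x2)
            for x1, x2 in itertools.combinations(code, 2))
print(d_min)    # -> 2
```

For a linear code, $d_\text{min}$ also equals the minimum Hamming weight of a nonzero codeword, which avoids the pairwise search.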
%
% Parity checks, H, and the syndrome
%
A particularly elegant way of describing the subspace $\mathcal{C}$ of
$\mathbb{F}_2^n$ that the codewords make up is the notion of
\textit{parity checks}.
Since $\lvert \mathcal{C} \rvert = 2^k$ and $\lvert \mathbb{F}_2^n
\rvert = 2^n$, we could introduce $n-k$ conditions to constrain the
additional degrees of freedom.
These conditions, called parity checks, take the form of equations
over $\mathbb{F}_2^n$, linking the individual positions of each codeword.
We can arrange the coefficients of these equations in a
\textit{parity-check matrix} (\acs{pcm}) $\bm{H} \in
\mathbb{F}_2^{(n-k) \times n}$ and equivalently define the code as
\cite[Sec.~3.1.1]{ryan_channel_2009}
\begin{align*}
\mathcal{C} = \left\{ \bm{x} \in \mathbb{F}_2^n :
\bm{H}\bm{x}^\text{T} = \bm{0} \right\}
.%
\end{align*}
Note that in general we may have linearly dependent parity checks,
prompting us to define the \ac{pcm} as $\bm{H} \in
\mathbb{F}_2^{m\times n}$ with $m \ge n-k$ instead.
The \textit{syndrome} $\bm{s} = \bm{H} \bm{v}^\text{T}$ describes
which parity checks a candidate codeword $\bm{v} \in \mathbb{F}_2^n$ violates.
The representation using the \ac{pcm} has the benefit of providing a
description of the code whose memory complexity does not grow
exponentially with $n$, in contrast to keeping track of all codewords directly.
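As an illustration of the syndrome, the following Python sketch (illustrative only) checks candidate words against one possible \ac{pcm} of the $[7,4,3]$ Hamming code, which also appears in a figure later in this chapter:

```python
import numpy as np

# One possible PCM of the [7, 4, 3] Hamming code.
H = np.array([[0, 1, 1, 1, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [1, 1, 0, 1, 0, 0, 1]])

def syndrome(H, v):
    """Syndrome s = H v^T over F_2; it is zero iff v is a codeword."""
    return np.mod(H @ v, 2)

x = np.zeros(7, dtype=int)   # the all-zero word is a codeword of every linear code
print(syndrome(H, x))        # -> [0 0 0]

e = x.copy()
e[0] = 1                     # a single bit error on x_1 ...
print(syndrome(H, e))        # ... yields the first column of H -> [0 1 1]
```

The second call shows that the syndrome of a single-bit error is exactly the corresponding column of $\bm{H}$, which is what a decoder exploits.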
%
% The decoding problem
%
Figure \ref{fig:Diagram of a transmission system} visualizes the
communication process \cite[Sec.~1.1]{ryan_channel_2009}.
An input message $\bm{u}\in \mathbb{F}_2^k$ is mapped onto a codeword $\bm{x}
\in \mathbb{F}_2^n$. This is passed on to a modulator, which
interacts with the physical channel.
A demodulator processes the channel output and forwards the result
$\bm{y}$ to a decoder.
We differentiate between \textit{soft-decision} decoding, where
$\bm{y} \in \mathbb{R}^n$, and \textit{hard-decision} decoding, where
$\bm{y} \in \mathbb{F}_2^n$ \cite[Sec.~1.5.1.3]{ryan_channel_2009}.
Finally, the decoder is responsible for obtaining an estimate
$\hat{\bm{u}} \in \mathbb{F}_2^k$ of the original input message.
This is done by first finding an estimate $\hat{\bm{x}}$ of the sent
codeword and undoing the encoding.
The decoding problem that we generally attempt to solve thus consists
in finding the best estimate $\hat{\bm{x}}$ given $\bm{y}$.
\begin{figure}[t]
\centering
\tikzset{
box/.style={
rectangle, draw=black, minimum width=17mm, minimum height=8mm,
},
}
\begin{tikzpicture}
[
node distance = 2mm and 7mm,
]
\node (in) {};
\node[box, right=of in] (enc) {Encoder};
\node[box, minimum width=25mm, right=of enc] (mod) {Modulator};
\node[box, below right=of mod] (cha) {Channel};
\node[box, minimum width=25mm, below left=of cha] (dem) {Demodulator};
\node[box, left=of dem] (dec) {Decoder};
\node[left=of dec] (out) {};
\draw[-{latex}] (in) -- (enc) node[midway, above] {$\bm{u}$};
\draw[-{latex}] (enc) -- (mod) node[midway, above] {$\bm{x}$};
\draw[-{latex}] (mod) -| (cha);
\draw[-{latex}] (cha) |- (dem);
\draw[-{latex}] (dem) -- (dec) node[midway, above] {$\bm{y}$};
\draw[-{latex}] (dec) -- (out) node[midway, above] {$\hat{\bm{u}}$};
\end{tikzpicture}
\caption{Overview of a transmission system.}
\label{fig:Diagram of a transmission system}
\end{figure}
%
%
% Hard vs. soft information
%
\subsection{Low-Density Parity-Check Codes}
%
% Core concept
%
Shannon's noisy-channel coding theorem is stated for codes whose block
length approaches infinity. This suggests that as the block length
becomes larger, the performance of the considered codes should
generally improve.
However, the size of the \ac{pcm}, and thus in general the decoding complexity,
of a linear block code grows quadratically with $n$.
This would quickly render decoding intractable as we increase the block length.
We can get around this problem by constructing $\bm{H}$ in such a
manner that the number of nonzero entries grows less than quadratically, e.g.,
only linearly.
This is exactly the motivation behind \ac{ldpc} codes
\cite[Ch.~1]{gallager_low_1960}.
%
% Tanner Graph, VNs and CNs
%
\ac{ldpc} codes belong to a class sometimes referred to as ``modern codes''.
These differ from ``classical codes'' in their decoding algorithms:
Classical codes are usually decoded using one-step hard-decision decoding,
whereas modern codes are suitable for iterative soft-decision
decoding \cite[Preface]{ryan_channel_2009}. The iterative decoding algorithms
in question are generally defined in terms of message passing on the
\textit{Tanner graph} of the code. The Tanner graph is a bipartite
graph that constitutes an alternative representation of the \ac{pcm}.
We define two types of nodes: \acp{vn}, corresponding to codeword
bits, and \acp{cn}, corresponding to individual parity checks.
We then construct the Tanner graph by connecting each \ac{cn} to
the \acp{vn} that make up the corresponding parity check
\cite[Sec.~5.1.2]{ryan_channel_2009}.
Figure \ref{PCM and Tanner graph of the Hamming code} shows this
construction for the $[7,4,3]$ Hamming code.
%
\begin{figure}[t]
\centering
\begin{align*}
\bm{H} =
\begin{pmatrix}
0 & 1 & 1 & 1 & 1 & 0 & 0 \\
1 & 0 & 1 & 1 & 0 & 1 & 0 \\
1 & 1 & 0 & 1 & 0 & 0 & 1 \\
\end{pmatrix}
\end{align*}
\vspace*{2mm}
\tikzset{
VN/.style={
circle, fill=KITgreen, minimum width=1mm, minimum height=1mm,
},
CN/.style={
rectangle, fill=KITblue, minimum width=1mm, minimum height=1mm,
},
}
\begin{tikzpicture}
\node[VN, label=above:$x_1$] (vn1) {};
\node[VN, right=12mm of vn1, label=above:$x_2$] (vn2) {};
\node[VN, right=12mm of vn2, label=above:$x_3$] (vn3) {};
\node[VN, right=12mm of vn3, label=above:$x_4$] (vn4) {};
\node[VN, right=12mm of vn4, label=above:$x_5$] (vn5) {};
\node[VN, right=12mm of vn5, label=above:$x_6$] (vn6) {};
\node[VN, right=12mm of vn6, label=above:$x_7$] (vn7) {};
\node[
CN, below=25mm of vn4,
label={below:$x_1 + x_3 + x_4 + x_6 = 0$}
] (cn2) {};
\node[
CN, left=40mm of cn2,
label={below:$x_2 + x_3 + x_4 + x_5 = 0$}
] (cn1) {};
\node[
CN, right=40mm of cn2,
label={below:$x_1 + x_2 + x_4 + x_7 = 0$}
] (cn3) {};
\foreach \n in {2,3,4,5} {
\draw (cn1) -- (vn\n);
}
\foreach \n in {1,3,4,6} {
\draw (cn2) -- (vn\n);
}
\foreach \n in {1,2,4,7} {
\draw (cn3) -- (vn\n);
}
\end{tikzpicture}
\caption{The \ac{pcm} and corresponding Tanner graph of the
$[7,4,3]$ Hamming code.}
\label{PCM and Tanner graph of the Hamming code}
\end{figure}
%
% N_V(j), N_C(i)
%
Mathematically, we represent a \ac{vn} using the index $i \in
\mathcal{I} := \left[
1 : n \right]$ and a \ac{cn} using the index $j \in \mathcal{J}
:= \left[ 1 : m \right]$.
We can then encode the information contained in the graph by defining
the neighborhood of a variable node $i$ as
$\mathcal{N}_\text{V} (i) = \left\{ j \in \mathcal{J} : \bm{H}_{j,i}
= 1 \right\}$
and that of a check node $j$ as
$\mathcal{N}_\text{C} (j) = \left\{ i \in \mathcal{I} : \bm{H}_{j,i}
= 1 \right\}$.
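These neighborhoods can be read directly off the rows and columns of $\bm{H}$. The following sketch (illustrative Python, using the $1$-based indexing of the text) computes them for the Hamming code \ac{pcm} shown above:

```python
import numpy as np

# PCM of the [7, 4, 3] Hamming code from the figure above.
H = np.array([[0, 1, 1, 1, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [1, 1, 0, 1, 0, 0, 1]])

def vn_neighborhood(H, i):
    """N_V(i): indices of the checks in which bit i participates (1-based)."""
    return {j + 1 for j in np.flatnonzero(H[:, i - 1])}

def cn_neighborhood(H, j):
    """N_C(j): indices of the bits involved in check j (1-based)."""
    return {i + 1 for i in np.flatnonzero(H[j - 1, :])}

print(vn_neighborhood(H, 4))   # x_4 participates in all three checks -> {1, 2, 3}
print(cn_neighborhood(H, 1))   # first check x_2 + x_3 + x_4 + x_5 -> {2, 3, 4, 5}
```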
%
% Error floor and waterfall regions
%
We typically evaluate the performance of \ac{ldpc} codes using the
\ac{ber} or the \ac{fer} (a \textit{frame} refers to one whole
transmitted block in this context).
Considering an \ac{awgn} channel, \autoref{fig:ldpc-perf} shows a
qualitative performance characteristic of an \ac{ldpc} code
\cite[Fig.~1]{costello_spatially_2014}. We talk of the
\textit{waterfall} and the \textit{error floor} regions.
\begin{figure}[t]
\centering
\begin{tikzpicture}
\begin{axis}[
width=12cm,
height=9cm,
xlabel={Signal-to-noise ratio},
ylabel={Error rate},
% xmin=0, xmax=6,
enlarge x limits=false,
ymin=1e-9, ymax=1,
ticks=none,
% y tick label={},
ymode=log,
grid=both,
grid style={line width=0.2pt, draw=gray!30},
major grid style={line width=0.4pt, draw=gray!50},
legend pos=north east,
legend cell align={left},
]
\addplot+[mark=none, solid, smooth, KITblue] coordinates {
(4.5789E-01, 1.1821E-01)
(6.6842E-01, 9.4575E-02)
(8.6316E-01, 5.2657E-02)
(1.0421E+00, 2.2183E-02)
(1.1789E+00, 8.3588E-03)
(1.3368E+00, 1.4835E-03)
(1.4895E+00, 1.6852E-04)
(1.5842E+00, 2.8285E-05)
(1.6737E+00, 4.2465E-06)
(1.7684E+00, 3.4519E-07)
(1.8316E+00, 3.9213E-08)
(1.8684E+00, 6.2247E-09)
(1.9053E+00, 1E-09)
};
\addlegendentry{Regular}
\addplot+[mark=none, solid, smooth, KITorange] coordinates {
(4.5789E-01, 1.1821E-01)
(6.4211E-01, 4.9800E-02)
(7.5263E-01, 1.2700E-02)
(8.1579E-01, 2.3177E-03)
(8.6842E-01, 3.5779E-04)
(9.1053E-01, 5.3716E-05)
(9.4737E-01, 4.8818E-06)
(9.8947E-01, 6.5555E-07)
(1.0421E+00, 9.5713E-08)
% (1.0684E+00, 2.9670E-08)
(1.1474E+00, 1.2499E-08)
(1.3000E+00, 7.1560E-09)
(1.4579E+00, 6.0535E-09)
% (1.6105E+00, 5E-09)
(1.9579E+00, 4E-09)
(2.2947E+00, 3.1876E-09)
% (2.8842E+00, 2.0403E-09)
};
\addlegendentry{Irregular}
\draw[gray, densely dashed]
(axis cs:0.65, 2e-3) rectangle (axis cs:1.65, 5e-5);
\node[below] at (axis cs:1.15, 6e-5) {Waterfall};
\draw[gray, densely dashed]
(axis cs:1, 6e-8) rectangle (axis cs:2, 2e-9);
\node[above] at (axis cs:1.5, 7e-8) {Error floor};
\end{axis}
\end{tikzpicture}
\caption{
Qualitative performance characteristic of an \ac{ldpc} code
in an \ac{awgn} channel. Adapted from
\cite[Fig.~1]{costello_spatially_2014}.
}
\label{fig:ldpc-perf}
\end{figure}
Broadly, there are two kinds of \ac{ldpc} codes, \textit{regular} and
\textit{irregular}.
Regular codes are characterized by the fact that the weights, i.e.,
the numbers of ones, of their rows and columns are constant
\cite[Sec.~5.1.1]{ryan_channel_2009}.
Already in their original introduction, regular \ac{ldpc} codes were shown
to have a minimum distance that scales linearly with the block length
$n$ for large $n$ \cite[Ch.~2,~Theorem~1]{gallager_low_1960},
which leads to them not exhibiting an error floor under \ac{ml} decoding.
Irregular codes, on the other hand, generally do exhibit an error floor,
their redeeming quality being the ability to reach near-capacity
performance in the waterfall region \cite[Intro.]{costello_spatially_2014}.
\subsection{Spatially-Coupled LDPC Codes}
A relatively recent development in the world of \ac{ldpc} codes is
that of \ac{sc}-\ac{ldpc} codes.
Their key feature is that they combine the best properties of regular
and irregular codes.
They have a minimum distance that grows linearly with $n$, promising
good error floor behavior, and capacity approaching
iterative decoding behavior, promising good performance in the
waterfall region \cite[Intro.]{costello_spatially_2014}.
The essential property of \ac{sc}-\ac{ldpc} codes is that codewords
from different \textit{spatial positions}, which would ordinarily be sent
one after the other independently, are coupled.
This is achieved by connecting some \acp{vn} of one spatial position to
\acp{cn} of another, resulting in a \ac{pcm} of the form
\cite[Eq.~1]{hassan_fully_2016}
%
\begin{align*}
\bm{H} =
\begin{pmatrix}
\bm{H}_0(1) & & \\
\vdots & \ddots & \\
\bm{H}_K(1) & & \bm{H}_0(L) \\
& \ddots & \\
& & \bm{H}_K(L) \\
\end{pmatrix}
,
\end{align*}
%
where $K \in \mathbb{N}$ is the \textit{coupling width} and $L \in
\mathbb{N}$ is the number of spatial positions.
This construction results in a Tanner graph as depicted in
\autoref{fig:sc-ldpc-tanner}.
\begin{figure}[t]
\centering
\tikzset{
VN/.style={
circle, fill=KITgreen, minimum width=1mm, minimum height=1mm,
},
CN/.style={
rectangle, fill=KITblue, minimum width=1mm, minimum height=1mm,
},
}
\begin{tikzpicture}[node distance=7mm and 1cm]
\node[VN] (vn00) {};
\node[VN, below = of vn00] (vn01) {};
\node[VN, below = of vn01] (vn02) {};
\node[VN, below = of vn02] (vn03) {};
\node[VN, below = of vn03] (vn04) {};
\coordinate (temp) at ($(vn01)!0.5!(vn02)$);
\node[CN, right = of temp] (cn00) {};
\node[CN, below = of cn00] (cn01) {};
\draw (vn00) -- (cn00);
\draw (vn01) -- (cn00);
\draw (vn03) -- (cn00);
\draw (vn01) -- (cn01);
\draw (vn02) -- (cn01);
\draw (vn04) -- (cn01);
\foreach \i in {1,2,3} {
\pgfmathtruncatemacro{\previ}{\i-1}
\node[VN, right = 25mm of vn\previ 0] (vn\i0) {};
\foreach \j in {1,...,4} {
\pgfmathtruncatemacro{\prevj}{\j-1}
\node[VN, below = of vn\i\prevj] (vn\i\j) {};
}
\coordinate (temp) at ($(vn\i1)!0.5!(vn\i2)$);
\node[CN, right = of temp] (cn\i0) {};
\node[CN, below = of cn\i0] (cn\i1) {};
\draw (vn\i0) -- (cn\i0);
\draw (vn\i1) -- (cn\i0);
\draw (vn\i3) -- (cn\i0);
\draw (vn\i1) -- (cn\i1);
\draw (vn\i2) -- (cn\i1);
\draw (vn\i4) -- (cn\i1);
}
\node[right = 25mm of vn30] (vn40) {};
\node[below = of vn40] (vn41) {};
\node[below = of vn41] (vn42) {};
\node[below = of vn42] (vn43) {};
\node[below = of vn43] (vn44) {};
\coordinate (temp) at ($(vn41)!0.5!(vn42)$);
\node[right = of temp] (cn40) {};
\node[below = of cn40] (cn41) {};
\foreach \i in {0,1,2} {
\pgfmathtruncatemacro{\next}{\i+1}
\pgfmathtruncatemacro{\nextnext}{\i+2}
\draw (vn\i 3) to[bend right] (cn\next 1);
\draw (vn\i 1) to[bend left] (cn\nextnext 0);
}
\draw (vn33) to[bend right] (cn41);
\node at ($(cn40)!0.5!(cn41)$) {\dots};
\draw[decorate, decoration={brace, amplitude=10pt}]
([xshift=-5mm,yshift=2mm]vn00.north) --
([xshift=5mm,yshift=2mm]vn00.north -| cn20.north)
node[midway, above=4mm] {K};
\end{tikzpicture}
\caption{
Visualization of the coupling between the Tanner graphs
of individual spatial positions.
}
\label{fig:sc-ldpc-tanner}
\end{figure}
Note that at the first and last few spatial positions, some \acp{cn}
have lower degrees.
This leads to more reliable information about the
\acp{vn} that, as we will see, is
later passed to subsequent spatial positions during decoding.
This is precisely the effect that leads to the good performance of
\ac{sc}-\ac{ldpc} codes in the waterfall region \cite{costello_spatially_2014}.
\subsection{Iterative Decoding}
% Introduction
\ac{ldpc} codes are generally decoded using efficient iterative
algorithms, something that is possible due to their sparsity
\cite[Sec.~5.3]{ryan_channel_2009}.
The algorithm originally proposed for this purpose alongside \ac{ldpc}
codes by Gallager in 1960 is now known as the \ac{spa}
\cite[Sec.~5.4.1]{ryan_channel_2009}, also called \ac{bp}.
The optimality criterion the \ac{spa} is built around is a
symbol-wise \ac{map} decision \cite[Sec.~5.4.1]{ryan_channel_2009}.
The core idea of the resulting algorithm is to view \acp{cn} as
representing single-parity check codes and \acp{vn} as representing
repetition codes.
The algorithm alternates between consolidating soft information about
the \acp{vn} in the \acp{cn}, and consolidating soft information about
the \acp{cn} in the \acp{vn}.
To this end, messages are passed back and forth along the edges of
the Tanner graph.
$L_{i\rightarrow j}$ denotes a message passed from \ac{vn} $i$ to
\ac{cn} $j$, and $L_{i\leftarrow j}$ denotes a message passed from
\ac{cn} $j$ to \ac{vn} $i$.
The \acp{vn} additionally receive messages \cite[Sec.~5.4.2]{ryan_channel_2009}
\begin{align*}
\tilde{L}_i = \log \frac{P(X=0 \vert Y=y)}{P(X=1 \vert Y=y)},
\end{align*}
computed from the channel outputs.
The consolidation of the information occurs in the \ac{vn} update
\begin{align*}
L_{i\rightarrow j} = \tilde{L}_i + \sum_{j'\in
\mathcal{N}_\text{V}(i)\setminus j} L_{i\leftarrow j'}
\end{align*}
and the \ac{cn} update
\begin{align*}
L_{i\leftarrow j} = 2\cdot \tanh^{-1} \left( \prod_{i'\in
\mathcal{N}_\text{C}(j)\setminus i} \tanh \frac{L_{i'\rightarrow j}}{2} \right)
.
\end{align*}
A basic assumption for the derivation of the \ac{spa} is that the
messages are statistically independent.
If the Tanner graph has cycles, however, this
condition is not met.
The shorter the cycles, the sooner this condition is violated and the
worse the approximation becomes \cite[Sec.~5.4.4]{ryan_channel_2009}.
Cycles of length four (so-called \emph{$4$-cycles}) are the shortest
possible cycles and are thus especially problematic.
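The two update rules can be combined into a complete decoder. The following Python sketch (illustrative only; it assumes a flooding schedule and a simple numerical guard, whereas real implementations use more careful numerics and scheduling) runs the \ac{spa} directly on $\bm{H}$ and stops once the syndrome vanishes:

```python
import numpy as np

def sum_product_decode(H, llr, max_iters=50):
    """Flooding-schedule SPA sketch. H: m x n PCM, llr: channel LLRs L~_i."""
    H = np.asarray(H)
    mask = H == 1
    L_v2c = np.where(mask, llr, 0.0)   # initial VN -> CN messages: channel LLRs
    x_hat = (np.asarray(llr) < 0).astype(int)
    for _ in range(max_iters):
        # CN update: L_{i<-j} = 2 atanh( prod_{i'!=i} tanh(L_{i'->j}/2) ),
        # realized as a leave-one-out division over each row (guarded against 0).
        t = np.where(mask, np.tanh(L_v2c / 2.0), 1.0)
        t = np.where(np.abs(t) < 1e-12, 1e-12, t)
        row_prod = np.prod(t, axis=1, keepdims=True)
        extr = np.clip(row_prod / t, -1 + 1e-12, 1 - 1e-12)
        L_c2v = np.where(mask, 2.0 * np.arctanh(extr), 0.0)
        # VN update: L_{i->j} = L~_i + sum_{j'!=j} L_{i<-j'}
        total = llr + L_c2v.sum(axis=0)
        L_v2c = np.where(mask, total - L_c2v, 0.0)
        # Hard decision and syndrome check (stop once all checks are satisfied).
        x_hat = (total < 0).astype(int)
        if not np.mod(H @ x_hat, 2).any():
            break
    return x_hat

# Hamming code PCM from above; all-zero codeword sent, bit x_1 received
# unreliably (negative LLR) -> the decoder corrects it.
H = np.array([[0, 1, 1, 1, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [1, 1, 0, 1, 0, 0, 1]])
llr = np.array([-2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0])
print(sum_product_decode(H, llr))   # -> [0 0 0 0 0 0 0]
```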
% Min-sum algorithm
A simplification of the \ac{spa} is the min-sum decoder. Here, the
\ac{cn} update is approximated as \cite[Sec.~5.5.1]{ryan_channel_2009}
\begin{align*}
L_{i \leftarrow j} = \prod_{i' \in \mathcal{N}_\text{C}(j)\setminus i}
\sign \left( L_{i' \rightarrow j} \right)
\cdot \min_{i' \in \mathcal{N}_\text{C}(j)\setminus i} \lvert
L_{i'\rightarrow j} \rvert
.
\end{align*}
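As a sketch of this approximation (illustrative Python, operating on the messages arriving at a single \ac{cn}), the leave-one-out sign and minimum can be computed as:

```python
import numpy as np

def min_sum_cn_update(msgs):
    """Min-sum CN update for the LLR messages L_{i'->j} arriving at one check.

    Returns the outgoing message L_{i<-j} for each neighboring VN i,
    computed from all *other* incoming messages (leave-one-out).
    """
    msgs = np.asarray(msgs, dtype=float)
    out = np.empty_like(msgs)
    for i in range(len(msgs)):
        others = np.delete(msgs, i)
        out[i] = np.prod(np.sign(others)) * np.min(np.abs(others))
    return out

print(min_sum_cn_update([2.0, -1.5, 0.5]))   # -> [-0.5  0.5 -1.5]
```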
% Sliding-window decoding
For \ac{sc}-\ac{ldpc} codes, the iterative decoding process is wrapped in a
windowing scheme. This reduces the latency, the memory requirements, and
the overall computational complexity \cite{costello_spatially_2014}.
To this end, the Tanner graph is split into several overlapping windows.
During decoding, the messages that are passed along the edges of the
graph in the overlapping regions are kept in memory and used for the
decoding of subsequent blocks \cite[Sec.~III.~C.]{hassan_fully_2016}.
\section{Quantum Mechanics and Quantum Information Science}
\label{sec:Quantum Mechanics and Quantum Information Science}
The design of codes and decoders for \ac{qec} generally takes place on a
layer of abstraction far removed from the quantum mechanical
processes underlying the actual qubits.
Nevertheless, having a fundamental understanding of the related
quantum mechanical concepts is useful to understand the unique constraints
of this field.
The purpose of this section is to convey these concepts to the reader.
%%%%%%%%%%%%%%%%
\subsection{Core Concepts and Notation}
\label{subsec:Notation}
% Wave functions
In quantum mechanics, the evolution of the state of a particle over time
and space is described by a \emph{wave function} $\psi(x,t)$.
The connection between this function and the world that we can observe
is the fact that $\lvert \psi (x,t) \rvert^2$ is the \ac{pdf} of
finding a particle in that particular state.
% Dirac notation
A lot of the related mathematics can be very elegantly expressed
using the language of linear algebra.
The so-called bra-ket or Dirac notation is especially appropriate,
having been proposed by Paul Dirac in 1939 for the express purpose
of simplifying quantum mechanical notation \cite{dirac_new_1939}.
Two new symbols are defined, \emph{bra}s $\bra{\cdot}$ and
\emph{ket}s $\ket{\cdot}$.
Kets denote ordinary vectors, while bras denote their Hermitian conjugates.
For example, two vectors specified by the labels $a$ and $b$
respectively are written as $\ket{a}$ and $\ket{b}$.
Their inner product is $\braket{a\vert b}$.
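In a finite-dimensional setting, this notation maps directly onto column vectors and conjugate transposes, as the following illustrative Python sketch shows (the two states are hypothetical examples):

```python
import numpy as np

# Kets as column vectors; a bra is the conjugate transpose of the ket.
ket_a = np.array([[1j], [0]])                # |a>  (hypothetical state)
ket_b = np.array([[1], [1]]) / np.sqrt(2)    # |b>  (hypothetical state)

bra_a = ket_a.conj().T                       # <a|
inner = (bra_a @ ket_b).item()               # the inner product <a|b>
print(inner)                                 # -> -0.707...j
```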
% Expressing wave functions using linear algebra
We can model a wave function $\psi(x,t)$ as a linear combination of different
\emph{basis functions} $e_n(x,t),~n\in \mathbb{N}$ as%
\begin{align*}
\psi(x,t) = \sum_{n=1}^{\infty} c_n \cdot e_n(x,t)
.%
\end{align*}
To express this relation using linear algebra, we represent
$\psi(x,t)$ and $e_n(x,t)$ as vectors $\ket{\psi}$ and $\ket{e_n}$.
We write%
\begin{align}
\label{eq:determinate_basis}
\ket{\psi} = \sum_{n=1}^{\infty} c_n \ket{e_n}
.%
\end{align}
% Operators
Another important notion is that of an \emph{operator}, a mathematical
object that takes a function as its input and returns another function as its output.
Operators are useful to describe the relations between different
quantities relating to a particle.
An example of this is the differential operator $\frac{\partial}{\partial x}$.
%%%%%%%%%%%%%%%%
\subsection{Observables}
\label{subsec:Observables}
% Observable quantities
An \emph{observable quantity} $Q$ is a physical quantity of a system
that can be measured, such as the position or momentum of a particle.
Due to the probabilistic nature of quantum mechanics, the result of a
measurement is not deterministic.
Thus, it is useful to consider the \emph{expected value} $\braket{Q}$
of an observable quantity in addition to individual measurement results.
% General expression for expected value of observable quantity
If we know the wave function of a particle, we should be able to
compute $\braket{Q}$ for any observable quantity we wish.
It can be shown that for any $Q$, we can compute a
corresponding operator $\hat{Q}$ such that%
\begin{align}
\label{eq:gen_expr_Q_exp}
\braket{Q} = \int_{-\infty}^{\infty} \psi^*(x,t) \hat{Q} \psi(x,t) dx
.%
\end{align}%
While the derivation of this relationship is out of the scope of this
work, we can at least look at an example to illustrate it.
Considering the position $Q = x$ of a particle and setting the observable
operator to $\hat{Q} = x$, we can write%
\begin{align*}
\braket{x} = \int_{-\infty}^{\infty} \psi^*(x,t) \cdot x \cdot \psi(x,t) dx
= \int_{-\infty}^{\infty} x \lvert \psi(x,t) \rvert ^2 dx
.%
\end{align*}
Note that $\lvert \psi(x,t) \rvert^2 $ represents the \ac{pdf} of
finding a particle in a specific state. We immediately see that the
formula simplifies to the direct calculation of the expected value.
% Determinate states and eigenvalues
% TODO: Introduce determinate states above
% TODO: Nicer phrasing
% TODO: Use different symbol for determinate states (not psi)
% TODO: Fix equation
Let us now examine how the observable operator $\hat{Q}$ relates to
the determinate states that make up the overall superposition state
of the particle.
We begin by translating \autoref{eq:gen_expr_Q_exp} into linear algebra as%
\begin{align}
\label{eq:gen_expr_Q_exp_lin}
\braket{Q} = \braket{\psi \vert \hat{Q}\psi}
.%
\end{align}
\autoref{eq:gen_expr_Q_exp_lin} expresses an inherently probabilistic
relationship, whereas the determinate states are inherently deterministic.
To relate the two, we look at those states $\ket{\psi}$ for which the
variance of the measurements of $Q$ is zero; these are exactly the
determinate states.%
\begin{align}
0 &\overset{!}{=} \braket{(Q - \braket{Q})^2}
= \braket{\psi \vert (\hat{Q} - \braket{Q})^2 \psi} \nonumber\\
&= \braket{(\hat{Q} - \braket{Q})\psi \vert (\hat{Q} - \braket{Q})
\psi} \nonumber\\
&= \lVert (\hat{Q} - \braket{Q}) \ket{\psi} \rVert^2 \nonumber\\[3mm]
&\hspace{-8mm}\Leftrightarrow (\hat{Q} - \braket{Q}) \ket{\psi} =
0 \nonumber\\
\label{eq:observable_eigenrelation}
&\hspace{-8mm}\Leftrightarrow \hat{Q}\ket{\psi}
= \underbrace{\braket{Q}}_{\lambda_n} \ket{\psi}
.%
\end{align}%
%
Here, moving one factor of $(\hat{Q} - \braket{Q})$ to the left side of
the inner product uses the fact that observable operators are Hermitian.
Because we have assumed the variance to be zero, $\braket{Q}$ is now
the deterministic measurement value corresponding to the determinate
state $\ket{\psi}$.
We can see that the determinate states are the \emph{eigenstates} of
the observable operator $\hat{Q}$ and that the corresponding
(deterministic) measurement values are the corresponding
\emph{eigenvalues} $\lambda_n$.
% Determinate states as a basis
% TODO: Rephrase
% TODO: Show that |c_n|^2 is the probability of finding a particle in
% a given state
% In particular, using the determinate states $\ket{e_n}$ as a basis to
% write the superimposed state
% \begin{align*}
% \ket{\psi} = \sum_{n=1}^{\infty} c_n \ket{e_n}
% ,
% \end{align*}
% Recap
% TODO: Mention that `observable` is used to refer to the observable operator
% TODO: Mention eigenstates and eigenvalues again
To summarize, we can mathematically express any observable quantity
$Q$ using a corresponding operator $\hat{Q}$.
This operator allows us to both compute the expected value of the
observable using \autoref{eq:gen_expr_Q_exp_lin}, and describe the
individual determinate states and corresponding measurement values
using \autoref{eq:observable_eigenrelation}.
%%%%%%%%%%%%%%%%
\subsection{Projective Measurements}
\label{subsec:Projective Measurements}
% Projective measurements
% TODO: Better introduce the collapse of the superposition state
The measurements we considered in the previous section, for which
\autoref{eq:gen_expr_Q_exp_lin} holds, belong to the category of
\emph{projective measurements}.
For these, certain restrictions such as repeatability apply: after
measuring a quantum state and thus collapsing it onto one of the
determinate states, further measurements should yield the same value.
More general methods of modelling measurements exist, e.g., describing
destructive measurements, but they are not relevant to us here
\cite[Box~2.5]{nielsen_quantum_2010}.
% Projection operators
% TODO: Fix notational issues related to e_n
We can model the collapse of the original state onto one of the
superimposed basis states as a \emph{projection}.
To see this, we insert \autoref{eq:determinate_basis} into
\autoref{eq:observable_eigenrelation}, obtaining%
\begin{align*}
\hat{Q}\ket{\psi} = \sum_{n=1}^{\infty} c_n \hat{Q} \ket{e_n}
= \sum_{n=1}^{\infty} \lambda_n c_n \ket{e_n}
.%
\end{align*}%
We see that $\hat{Q}$ has the effect of multiplying the component
along each basis vector with the corresponding eigenvalue.
We decompose $\hat{Q}$ into its constituent parts that act on each of
the separate components as
\begin{align*}
\hat{Q} = \sum_{n=1}^{\infty} \lambda_n \hat{P}_n
\end{align*}
using \emph{projection operators}
\begin{align*}
\hat{P}_n := \ket{e_n}\bra{e_n}, \hspace{3mm} n\in \mathbb{N}
.
\end{align*}%
These project a vector onto the subspace spanned by $\ket{e_n}$.
% Using projection operators to measure if a state has a component
% along a basis vector
A particularly interesting property of projection operators is that
\begin{align*}
\hat{P}_n (\hat{P}_n \ket{\psi}) = \hat{P}_n^2 \ket{\psi}
= \hat{P}_n \ket{\psi},
\end{align*}%
and the only way this can hold for every $\ket{\psi}$ is if the
eigenvalues of $\hat{P}_n$ are $0$ or $1$ (since any eigenvalue
$\lambda$ must satisfy $\lambda^2 = \lambda$).
As explained in the previous section, the eigenvalues are the results
of performing a measurement.
We can thus use the projection operator as an observable and treat
the eigenvalue as an indicator of the state having a component along
the related basis vector.
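These properties are easy to verify numerically. The following illustrative Python sketch builds $\hat{P}_0 = \ket{e_0}\bra{e_0}$ in $\mathbb{C}^2$ and checks idempotence, the eigenvalues, and the projection of a superposition (the example state is hypothetical):

```python
import numpy as np

# Projection operator P_0 = |e_0><e_0| onto the first basis vector of C^2.
e0 = np.array([[1.0], [0.0]])      # |e_0>
P0 = e0 @ e0.conj().T              # |e_0><e_0|

print(np.allclose(P0 @ P0, P0))    # idempotence P^2 = P -> True
print(np.linalg.eigvalsh(P0))      # eigenvalues -> [0. 1.]

psi = np.array([[0.6], [0.8]])     # hypothetical normalized superposition
print(P0 @ psi)                    # keeps only the component along |e_0>
```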
%%%%%%%%%%%%%%%%
\subsection{Qubits and Multi-Qubit States}
\label{subsec:Qubits and Multi-Qubit States}
\red{
\begin{itemize}
\item Qubits and multi-qubit states
\begin{itemize}
\item The qubit
\begin{itemize}
\item Similar structure to classical
computing: bits are modified with gates
-> quantum bits are modified with quantum gates
\end{itemize}
\item The tensor product
\item Information is not stored in the individual bit
states but in the correlations / entanglement between them
\item -> The size of the vector space
\item The X,Z and Y operators
\item (?) Notation of operators on multi-qubit states
\end{itemize}
\end{itemize}
}
\red{
\begin{itemize}
\item Representing wave functions as vectors (psi as label,
building a vector space using basis functions)
\end{itemize}
}
\red{\textbf{Tensor product}}
\red{\ldots
Take for example two systems with the determinate states $\ket{0}$
and $\ket{1}$. In general, the state of each can be written as the
superposition%
%
\begin{align*}
\alpha \ket{0} + \beta \ket{1}
.%
\end{align*}
%
Combining these two systems into one, the overall state becomes%
%
\begin{align*}
&\mleft( \alpha_1 \ket{0} + \beta_1 \ket{1} \mright) \otimes
\mleft( \alpha_2 \ket{0} + \beta_2 \ket{1} \mright) \\
= &\alpha_1 \alpha_2 \ket{0} \ket{0}
+ \alpha_1 \beta_2 \ket{0} \ket{1}
+ \beta_1 \alpha_2 \ket{1} \ket{0}
+ \beta_1 \beta_2 \ket{1} \ket{1}
% =: &\alpha_{00} \ket{00}
% + \alpha_{01} \ket{01}
% + \alpha_{10} \ket{10}
% + \alpha_{11} \ket{11}
.%
\end{align*}%
%
\ldots When not ambiguous in the context, the tensor product
symbol may be omitted, e.g.,
\begin{align*}
\ket{0} \otimes \ket{0} = \ket{0}\ket{0}
.%
\end{align*}
}
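Numerically, the tensor product corresponds to the Kronecker product. The following sketch (with hypothetical amplitudes) reproduces the expansion above:

```python
import numpy as np

ket0 = np.array([1.0, 0.0])   # |0>
ket1 = np.array([0.0, 1.0])   # |1>

# Hypothetical amplitudes with |alpha|^2 + |beta|^2 = 1 for each subsystem.
a1, b1 = 0.6, 0.8
a2, b2 = 1 / np.sqrt(2), 1 / np.sqrt(2)

psi1 = a1 * ket0 + b1 * ket1
psi2 = a2 * ket0 + b2 * ket1

# Combined state via the Kronecker product; coefficient ordering
# |00>, |01>, |10>, |11> matches the expansion in the text.
combined = np.kron(psi1, psi2)
print(combined)   # -> [a1*a2  a1*b2  b1*a2  b1*b2]
```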
As we will see, the core concept that gives quantum computing its
power is entanglement. When two quantum mechanical systems are
entangled, measuring the state of one will collapse that of the other.
Take for example two subsystems with the overall state
%
\begin{align*}
\ket{\psi} = \frac{1}{\sqrt{2}} \mleft( \ket{0}\ket{0} +
\ket{1}\ket{1} \mright)
.%
\end{align*}
%
If we measure the first subsystem as being in $\ket{0}$, we can
be certain that a measurement of the second subsystem will also yield $\ket{0}$.
Introducing a new notation for entangled states, we can write%
%
\begin{align*}
\ket{\psi} = \frac{1}{\sqrt{2}} \left( \ket{00} + \ket{11} \right)
.%
\end{align*}
%
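The perfect correlation of this state, and the fact that it cannot be written as a tensor product of two single-qubit states, can be checked numerically (illustrative sketch; the entanglement check uses the rank of the reshaped coefficient matrix):

```python
import numpy as np

# Bell state (|00> + |11>)/sqrt(2), coefficients ordered |00>, |01>, |10>, |11>.
bell = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)

# Joint outcome probabilities: only 00 and 11 occur, so the two
# subsystems' measurement results are perfectly correlated.
probs = np.abs(bell) ** 2
print(probs)                                       # -> [0.5 0.  0.  0.5]

# A product state kron(psi1, psi2) always has a rank-1 coefficient matrix;
# the Bell state's 2x2 reshape has rank 2, so it is entangled.
print(np.linalg.matrix_rank(bell.reshape(2, 2)))   # -> 2
```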
%%%%%%%%%%%%%%%%
\subsection{Quantum Gates}
\label{subsec:Quantum Gates}
\red{
\textbf{Content:}
\begin{itemize}
\item Bra-ket notation
\item The tensor product
\item Projective measurements (the related operators,
eigenvalues/eigenspaces, etc.)
\begin{itemize}
\item First explain what an operator is
\end{itemize}
\item Abstract intro to QC: Use gates to process qubit
states, similar to classical case
\item X, Z, Y operators/gates
\item Hadamard gate (+ X and Z are the same thing in different bases)
\item Notation of operators on multi-qubit states
\item The Pauli, Clifford and Magic groups
\end{itemize}
}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Quantum Error Correction}
\label{sec:Quantum Error Correction}
\red{
\textbf{Content:}
\begin{itemize}
\item General context
\begin{itemize}
\item Why we want QC
\item Why we need QEC (correcting errors due to noisy gates)
\item Main challenges of QEC compared to classical
error correction
\end{itemize}
\item Stabilizer codes
\begin{itemize}
\item Definition of a stabilizer code
\item The stabilizer and its generators (note somewhere
that the generators have to commute to be able to
be measured without disturbing each other)
\item syndrome extraction circuit
\item Stabilizer codes are effectively the QM
% TODO: Actually binary linear codes or just linear codes?
equivalent of binary linear codes (e.g.,
expressible via check matrix)
\end{itemize}
\item Digitization of errors
\item CSS codes
\item Color codes?
\item Surface codes?
\item Fault tolerant error correction (gates with which we do
error correction are also noisy)
\begin{itemize}
\item Transversal operations
\item \dots
\end{itemize}
\item Circuit level noise
\item Detector error model
\begin{itemize}
\item Columns of the check matrix represent different
possible error patterns $\rightarrow$ Check matrix
doesn't quite correspond to the codewords we used
initially anymore, but some similar structure is
still there (compare with syndrome)
\end{itemize}
\end{itemize}
\textbf{General Notes:}
\begin{itemize}
\item Give a brief overview of the history of QEC
\item Note (and research if this is actually correct) that QC
was developed on an abstract level before thinking of
what hardware to use
\item Note that there are other codes than stabilizer codes
(and research and give some examples), but only
stabilizer codes are considered in this work
\item Degeneracy
\item The QEC decoding problem (considering degeneracy)
\end{itemize}
}
\subsection{Stabilizer Codes}
\subsection{CSS Codes}
\subsection{Quantum Low-Density Parity-Check Codes}