algorithm.
% Codewords, n, k, rate
%
Binary linear block codes form one particularly important class of
coding schemes.
The information to be protected is represented by a sequence of
binary symbols, which is split into separate blocks.
Each block is encoded, transmitted, and decoded separately.
The encoding step introduces redundancy by mapping input messages
$\bm{u} \in \mathbb{F}_2^k$ of length $k \in \mathbb{N}$ (called the
\textit{information length}) onto \textit{codewords} $\bm{x} \in
\mathbb{F}_2^n$ of length $n \in \mathbb{N}$ (called the
\textit{block length}) with $n > k$.
The \textit{code rate} $R = k/n$ is a measure of the amount of
introduced redundancy.
We call the set of all codewords
$\mathcal{C} = \{\bm{x}^{(1)}, \bm{x}^{(2)}, \ldots, \bm{x}^{(2^k)}\}$
the \textit{code} \cite[Sec.~3.1.1]{ryan_channel_2009}.
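To make these notions concrete, the following sketch enumerates the codewords of the $[3,1]$ repetition code and computes its rate; the generator-matrix encoding $\bm{x} = \bm{u}\bm{G}$ used here is a standard construction, assumed purely for illustration since the text introduces the encoding map abstractly.

```python
import numpy as np
from itertools import product

# Toy example: the [3,1] repetition code, generated by G = (1 1 1).
G = np.array([[1, 1, 1]])
k, n = G.shape

# All 2^k codewords x = u G over F_2.
codewords = sorted(tuple(int(b) for b in (np.array(u) @ G) % 2)
                   for u in product([0, 1], repeat=k))
print(codewords)   # [(0, 0, 0), (1, 1, 1)]
print(k / n)       # the code rate R = 1/3
```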
%
% d_min and the [] Notation
$[n,k,d_\text{min}]$.
% Parity checks, H, and the syndrome
%
A particularly elegant way of describing the code space $\mathcal{C}$ is the
notion of \textit{parity checks}.
Since $\lvert \mathcal{C} \rvert = 2^k$ and $\lvert \mathbb{F}_2^n
\rvert = 2^n$, there are $n-k$ conditions constraining the additional
These conditions, called parity checks, take the form of equations
over $\mathbb{F}_2^n$, linking the individual positions of each codeword.
We can arrange the coefficients of these equations in a
\textit{parity-check matrix} (\acs{pcm}) $\bm{H} \in
\mathbb{F}_2^{(n-k) \times n}$, $\text{rank}(\bm{H}) = n-k$, and
equivalently define the code as \cite[Sec.~3.1.1]{ryan_channel_2009}
\begin{align*}
\mathcal{C} := \text{kern}(\bm{H}) = \left\{ \bm{x} \in \mathbb{F}_2^n :
\bm{H}\bm{x}^\mathsf{T} = \bm{0} \right\}
.%
\end{align*}
In general, we may have linearly dependent parity checks,
prompting us to define the \ac{pcm} as $\bm{H} \in
\mathbb{F}_2^{m\times n}$ with $m \ge n-k$ instead.
The \textit{syndrome} $\bm{s} = \bm{H} \bm{v}^\mathsf{T}$ describes
which parity checks a vector $\bm{v} \in \mathbb{F}_2^n$ violates.
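As a small sketch of the syndrome computation, consider one common parity-check matrix of the $[7,4]$ Hamming code (this particular $\bm{H}$ is an illustrative assumption; equivalent forms exist):

```python
import numpy as np

# One common parity-check matrix of the [7,4] Hamming code (illustrative choice).
H = np.array([[1, 1, 0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]])

def syndrome(H, v):
    # s = H v^T over F_2: flags the violated parity checks
    return (H @ v) % 2

x = np.zeros(7, dtype=int)              # the all-zero word is always a codeword
print(syndrome(H, x))                   # [0 0 0] -> no check violated
e = np.array([0, 0, 1, 0, 0, 0, 0])     # single bit-flip error
print(syndrome(H, (x + e) % 2))         # [0 1 1] -> column 2 of H
```

A nonzero syndrome thus identifies exactly the checks the noisy vector violates.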
The representation using the \ac{pcm} has the benefit of providing a
description of the code, the memory complexity of which does not grow
$\bm{y} \in \mathbb{R}^n$, and \textit{hard-decision} decoding, where
$\bm{y} \in \mathbb{F}_2^n$ \cite[Sec.~1.5.1.3]{ryan_channel_2009}.
Finally, the decoder is responsible for obtaining an estimate
$\hat{\bm{u}} \in \mathbb{F}_2^k$ of the original input message.
This can be done by first finding an estimate $\hat{\bm{x}}$ of the sent
codeword and undoing the encoding.
The decoding problem that we attempt to solve thus consists
in finding the best estimate $\hat{\bm{x}}$ given $\bm{y}$.
\begin{figure}[t]
%
Shannon's noisy-channel coding theorem is stated for codes whose block
length $n$ approaches infinity.
This suggests that as the block length becomes larger, the
performance of the considered codes should generally improve.
However, the size of the \ac{pcm} of a linear block code grows
quadratically with $n$.
This would quickly render decoding intractable as we increase the
This is exactly the motivation behind \ac{ldpc} codes
These differ from ``classical codes'' in their decoding algorithms:
Classical codes are usually decoded using one-step hard-decision decoding,
whereas modern codes are suitable for iterative soft-decision
decoding \cite[Preface]{ryan_channel_2009}.
For \ac{ldpc} codes, the iterative decoding algorithms
are generally defined in terms of message passing on the
\textit{Tanner graph} of a code. The Tanner graph is a bipartite
graph that constitutes an alternative representation of the \ac{pcm}.
We define two types of nodes: \Acp{vn}, corresponding to codeword
bits, and \acp{cn}, corresponding to individual parity checks.
Then, we construct the Tanner graph by connecting each \ac{cn} to
the \acp{vn} that make up the corresponding parity check
\cite[Sec.~5.1.2]{ryan_channel_2009}.
\Cref{PCM and Tanner graph of the Hamming code} shows the Tanner
Mathematically, we represent a \ac{vn} using the index $i \in
\mathcal{I} := \left[ 0 : n-1 \right]$
and a \ac{cn} using the index $j \in \mathcal{J}
:= \left[ 0 : m-1 \right]$.
We can then encode the information contained in the graph by defining
the neighborhood of a \ac{vn} $i$ as
$\mathcal{N}_\text{V} (i) = \left\{ j \in \mathcal{J} : H_{j,i}
= 1 \right\}$
and the neighborhood of a \ac{cn} $j$ as
$\mathcal{N}_\text{C} (j) = \left\{ i \in \mathcal{I} : H_{j,i}
= 1 \right\}$.
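These neighborhoods are straightforward to extract from the \ac{pcm}; the following sketch does so for a hypothetical small matrix (the specific $\bm{H}$ is assumed for illustration):

```python
import numpy as np

# Hypothetical small PCM with m = 3 checks and n = 6 variable nodes.
H = np.array([[1, 1, 0, 1, 0, 0],
              [0, 1, 1, 0, 1, 0],
              [1, 0, 0, 0, 1, 1]])

def N_V(H, i):
    # checks j connected to variable node i
    return {j for j in range(H.shape[0]) if H[j, i] == 1}

def N_C(H, j):
    # variable nodes i participating in check j
    return {i for i in range(H.shape[1]) if H[j, i] == 1}

print(N_V(H, 1))   # {0, 1}
print(N_C(H, 0))   # {0, 1, 3}
```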
%
the numbers of ones in their rows and columns are constant
Already during their introduction, regular \ac{ldpc} codes were shown to have
a minimum distance scaling linearly with the block length $n$ for
large values of $n$ \cite[Ch.~2,~Theorem~1]{gallager_low_1960},
which leads to a more favorable behavior of the error rate for high
signal-to-noise ratios.
Irregular codes, on the other hand, have more severe error floor behavior.
However, they have the ability to reach near-capacity performance
in the waterfall region \cite[Intro.]{costello_spatially_2014}.
\subsection{Spatially-Coupled LDPC Codes}
A more recent development in the field of \ac{ldpc} codes is that of
\ac{sc}-\ac{ldpc} codes.
Their key feature is that they combine the best properties of regular
and irregular codes.
waterfall region \cite[Intro.]{costello_spatially_2014}.
The essential property of \ac{sc}-\ac{ldpc} codes is that codewords
from different \textit{spatial positions}, which would ordinarily be sent
one after the other independently, are linked.
This is achieved by introducing edges between \acp{vn} of one spatial
position and \acp{cn} of another, resulting in a \ac{pcm} of the form
\cite[Eq.~1]{hassan_fully_2016}
%
\begin{align}
\label{eq:PCM}
\bm{H} =
\begin{pmatrix}
\bm{H}_0(1) & & \\
& \ddots & \\
& & \bm{H}_K(L) \\
\end{pmatrix}
,
\end{align}
%
where $K \in \mathbb{N}$ is the \textit{coupling width} and $L \in
\mathbb{N}$ is the number of spatial positions.
The parts of the \ac{pcm} left empty in \Cref{eq:PCM} are filled with zeros.
This construction results in a Tanner graph as depicted in
\Cref{fig:sc-ldpc-tanner}.
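As an illustration of this construction, the following sketch assembles such a band-diagonal $\bm{H}$ from hypothetical component matrices, assuming for simplicity the time-invariant case $\bm{H}_k(t) = \bm{H}_k$:

```python
import numpy as np

def coupled_pcm(blocks, L):
    # Place H_0..H_K of each spatial position t on a descending diagonal,
    # so position t couples into the checks of positions t..t+K.
    # All parts left unassigned remain zero.
    K = len(blocks) - 1
    mb, nb = blocks[0].shape
    H = np.zeros(((L + K) * mb, L * nb), dtype=int)
    for t in range(L):
        for k, Hk in enumerate(blocks):
            H[(t + k) * mb:(t + k + 1) * mb, t * nb:(t + 1) * nb] = Hk
    return H

# Tiny illustrative components (coupling width K = 1)
H0 = np.array([[1, 1]])
H1 = np.array([[1, 0]])
Hc = coupled_pcm([H0, H1], L=3)
print(Hc)   # a band-diagonal matrix, zero outside the coupled band
```

Note how the first and last block rows involve fewer component matrices, mirroring the lower check degrees at the boundary spatial positions discussed below.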
Note that at the first few spatial positions some \acp{cn} have lower degrees.
This leads to more reliable information about the
\acp{vn} that, as we will see, is
later passed to subsequent spatial positions during decoding.
This is precisely the effect that leads to the improved performance of
\ac{sc}-\ac{ldpc} codes in the waterfall region \cite{costello_spatially_2014}.
\subsection{Iterative Decoding}
% Introduction
Due to their sparse graphs, efficient iterative decoders exist for
\ac{ldpc} codes \cite[Sec.~5.3]{ryan_channel_2009}.
The decoding algorithm originally proposed alongside LDPC codes by
Gallager in 1960 is now known as the \ac{spa}
\cite[Sec.~5.4.1]{ryan_channel_2009}, also called \acf{bp}.
The optimality criterion the \ac{spa} is built around is a
bit-wise \ac{map} decision \cite[Sec.~5.4.1]{ryan_channel_2009}.
The core idea of the resulting algorithm is to view \acp{cn}
and \acp{vn} as representing individual local codes.
A \ac{cn} represents a single parity check on the connected \acp{vn},
should agree on its value; it can therefore be understood as a repetition code.
The algorithm alternates between consolidating soft information about
the \acp{vn} in the \acp{cn}, and consolidating soft information about
the \acp{cn} in the \acp{vn}.
To this end, messages computed in the nodes are passed back and forth
along the edges of the Tanner graph.
$L_{i\rightarrow j}$ represents a message passed from \ac{vn} $i$ to
\ac{cn} $j$, $L_{i\leftarrow j}$ represents a message passed from
\ac{cn} $j$ to \ac{vn} $i$.
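The following sketch illustrates what such message computations can look like, using the usual extrinsic variable-node rule and the min-sum check-node rule covered at the end of this subsection (both standard, but stated here as illustrative assumptions rather than the exact \ac{spa} updates):

```python
import numpy as np

def cn_update_minsum(msgs_in):
    # For each edge, combine all *other* incoming VN-to-CN messages:
    # product of signs times minimum magnitude (min-sum rule).
    msgs_in = np.asarray(msgs_in, dtype=float)
    out = np.empty_like(msgs_in)
    for k in range(len(msgs_in)):
        others = np.delete(msgs_in, k)
        out[k] = np.prod(np.sign(others)) * np.min(np.abs(others))
    return out

def vn_update(channel_llr, msgs_in):
    # For each edge, sum the channel LLR and all *other* CN-to-VN messages.
    msgs_in = np.asarray(msgs_in, dtype=float)
    return channel_llr + np.sum(msgs_in) - msgs_in

print(cn_update_minsum([2.0, -1.0, 3.0]))   # [-1.  2. -1.]
print(vn_update(0.5, [1.0, -2.0]))          # [-1.5  1.5]
```

Excluding the receiving edge's own message in both updates keeps the exchanged information extrinsic, which is what makes the iteration meaningful.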
The \acp{vn} additionally receive messages \cite[Sec.~5.4.2]{ryan_channel_2009}
\begin{align*}
\tilde{L}_i = \log \frac{P(X=0 \vert Y=y)}{P(X=1 \vert Y=y)},
possible cycles and are thus especially problematic.
% Min-sum algorithm
A simplification of the \ac{spa} is the min-sum algorithm. Here, the
\ac{cn} update is approximated as \cite[Sec.~5.5.1]{ryan_channel_2009}
\begin{align*}
L_{i \leftarrow j} = \prod_{i' \in \mathcal{N}_\text{C}(j)\setminus \{i\}}
decoding of subsequent blocks \cite[Sec.~III.~C.]{hassan_fully_2016}.
\label{sec:Quantum Mechanics and Quantum Information Science}
Designing codes and decoders for \ac{qec} is generally performed on a
layer of mathematical abstraction far removed from the quantum mechanical
processes underlying the actual physics.
Nevertheless, having a fundamental understanding of the related
quantum mechanical concepts is useful to grasp the unique constraints
function and the observable world:
$\lvert \psi (x,t) \rvert^2$ is the \ac{pdf} of finding a particle at
position $x$ and time $t$ \cite[Sec.~1.2]{griffiths_introduction_1995}.
Note that this presupposes a normalization of $\psi$ such that
\begin{align*}
\int_{-\infty}^{\infty} \lvert \psi(x,t) \rvert^2 dx = 1
.%
\end{align*}
% Dirac notation
The language of linear algebra allows one to express the related
mathematics particularly elegantly.
The so-called Bra-ket or Dirac notation, introduced
by Paul Dirac in 1939 for the express purpose of simplifying quantum
mechanical notation \cite{dirac_new_1939}, is especially appropriate.
Two new symbols are defined, \emph{bra} $\bra{\cdot}$ and
\emph{ket} $\ket{\cdot}$.
Kets denote column vectors, while bras denote their Hermitian conjugates.
For example, two vectors specified by the labels $a$ and $b$,
respectively, are written as $\ket{a}$ and $\ket{b}$.
Their inner product is $\braket{a\vert b}$.
% Expressing wave functions using linear algebra
The connection we will make between quantum mechanics and linear
algebra is that we will model the state space of a system as a
\emph{function space}, namely the Hilbert space $L_2$.
The state of a particle with wave function $\psi(x,t)$ is represented
by the vector $\ket{\psi}$ \cite[Sec.~3.3]{griffiths_introduction_1995}.
% Operators
Another important notion is that of an \emph{operator}, a transformation
that maps a function onto another function
\cite[Sec.~3.2.2]{griffiths_introduction_1995}.
Operators are useful to describe the relations between different
quantities relating to a particle.
An example of this is the differential operator $\partial_x$.
We define the \emph{commutator} of two operators $P_1$ and $P_2$ as
\begin{align*}
[P_1,P_2] = P_1P_2 - P_2P_1
We say the two operators \emph{commute} iff $[P_1,P_2] = 0$, and they
% Observable quantities
An \emph{observable} $Q(x,p,t)$ is a quantity of a quantum
mechanical system that we can measure, e.g., the position $x$ or
momentum $p$ of a particle.
In general, such measurements are not deterministic, i.e.,
measurements on identically prepared states can yield different results.
However, some states are \emph{determinate} for a
specific observable: Measuring those will always yield identical
outcomes \cite[Sec.~3.3]{griffiths_introduction_1995}.
% General expression for expected value of observable quantity
If the wave function of a particle is known, the expected value
$\braket{Q}$ of any observable quantity can be computed.
Indeed, for any $Q$, there exists a corresponding Hermitian operator
$\hat{Q}$ such that \cite[Sec.~3.3]{griffiths_introduction_1995}
\begin{align}
\label{eq:gen_expr_Q_exp}
\braket{Q} = \int_{-\infty}^{\infty} \psi^*(x,t) \hat{Q} \psi(x,t) dx
operator to $\hat{Q} = x$, we can write
= \int_{-\infty}^{\infty} x \lvert \psi(x,t) \rvert ^2 dx
.%
\end{align*}
Note that $\lvert \psi(x,t) \rvert^2 $ is the \ac{pdf} of
finding a particle at position $x$. Hence, we immediately see that
the formula simplifies to the direct calculation of the expected value.
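We can verify this numerically for an assumed example wave function, here a Gaussian wave packet centered at $\mu$, for which $\braket{x}$ must equal $\mu$:

```python
import numpy as np

# Illustrative Gaussian wave function centered at mu (an assumed example).
mu, sigma = 1.5, 0.7
x = np.linspace(-10.0, 10.0, 200001)
dx = x[1] - x[0]
psi = (2 * np.pi * sigma**2) ** -0.25 * np.exp(-(x - mu) ** 2 / (4 * sigma**2))

norm = np.sum(np.abs(psi) ** 2) * dx            # integral of |psi|^2 dx
x_exp = np.sum(psi.conj() * x * psi).real * dx  # integral of psi* x psi dx

print(round(norm, 6))   # 1.0 -> psi is normalized
print(round(x_exp, 6))  # 1.5 -> <x> equals mu
```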
% Determinate states and eigenvalues
We begin by translating \Cref{eq:gen_expr_Q_exp} into linear algebra as
\begin{align}
\label{eq:gen_expr_Q_exp_lin}
\braket{Q} = \braket{\psi \vert \hat{Q} \psi}
.%
\end{align}
\Cref{eq:gen_expr_Q_exp_lin} expresses an inherently probabilistic
relationship, whereas the determinate states are inherently deterministic.
To relate the two, we note that since determinate states should
always yield the same measurement results, the variance of the
observable must be zero.
We thus compute \cite[Eq.~3.116]{griffiths_introduction_1995}
\begin{align}
0 &\overset{!}{=} \braket{(Q - \braket{Q})^2}
= \braket{e_n \vert (\hat{Q} - \braket{Q})^2 e_n} \nonumber\\
&= \braket{(\hat{Q} - \braket{Q})e_n \vert (\hat{Q} - \braket{Q})
e_n} \nonumber\\
&= \lVert (\hat{Q} - \braket{Q}) e_n \rVert^2 \nonumber\\[3mm]
&\hspace{-14mm}\iff (\hat{Q} - \braket{Q}) \ket{e_n}
= 0 \nonumber\\
\label{eq:observable_eigenrelation}
&\hspace{-14mm}\iff \hat{Q}\ket{e_n}
= \underbrace{\braket{Q}}_{\lambda_n} \ket{e_n}
.%
\end{align}%
%
By setting the variance to zero, the expected value
$\braket{Q}$ becomes a deterministic measurement result
corresponding to the determinate state
$\ket{e_n},~n\in \mathbb{N}$.
The determinate states are precisely the \emph{eigenstates} of
the observable operator $\hat{Q}$, and the associated measurement
values are the corresponding \emph{eigenvalues} $\lambda_n$
\cite[Sec.~3.3]{griffiths_introduction_1995}.
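Numerically, this relationship can be illustrated with a toy Hermitian matrix standing in for $\hat{Q}$ (an assumed example):

```python
import numpy as np

# Toy Hermitian operator standing in for some observable Q-hat.
Q = np.array([[1.0, 0.0],
              [0.0, -1.0]])

eigvals, eigvecs = np.linalg.eigh(Q)
print(eigvals)   # [-1.  1.] -> the possible (deterministic) measurement outcomes

# Each column e_n is a determinate state: Q e_n = lambda_n e_n.
for lam, e in zip(eigvals, eigvecs.T):
    assert np.allclose(Q @ e, lam * e)
```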
% Determinate states as a basis
As we model the wave function $\psi(x,t)$ as a vector
$\ket{\psi}$, we can find a set of basis vectors to decompose it into.
In particular, we can use the determinate states for this purpose,
expressing the state as%
\footnote{
We only consider the case of having a \emph{discrete
spectrum} here, i.e., having a discrete set of eigenvalues and vectors.
$Q(x,p,t)$ using a corresponding operator $\hat{Q}$, which allows us
to compute the expected value as $\braket{Q} = \braket{\psi
\vert \hat{Q} \psi}$.
The eigenvectors of $\hat{Q}$ are the determinate states
$\ket{e_n},~n\in \mathbb{N}$, and the eigenvalues are the respective
measurement outcomes.
We can decompose an arbitrary state as $\ket{\psi} = \sum_{n=1}^{\infty} c_n
\ket{e_n}$, where $\lvert c_n \rvert ^2$ represents the probability
The measurements we considered in the previous section, for which
\Cref{eq:gen_expr_Q_exp_lin} holds, belong to the category of
\emph{projective measurements}.
For these, certain restrictions such as repeatability apply: the act
of measuring a quantum state \emph{collapses} it onto one of
the determinate states.
Further measurements then yield the same value.
More general methods of modelling measurements exist, e.g.,
destructive measurements \cite[Box~2.5]{nielsen_quantum_2010}, but
they are not relevant to this work.
% Projection operators
We model the collapse of the original state onto one of the
superimposed basis states as a \emph{projection}.
To see this, we use
\Cref{eq:determinate_basis,eq:observable_eigenrelation} to compute
the separate components as
using \emph{projection operators} \cite[Eq.~3.160]{griffiths_introduction_1995}
\begin{align*}
\hat{P}_n := \ket{e_n}\bra{e_n}, \hspace{3mm} n\in \mathbb{N}
,
\end{align*}%
which project a vector onto the subspace spanned by $\ket{e_n}$.
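A minimal numerical sketch of such a projector, with an assumed two-dimensional example state:

```python
import numpy as np

e0 = np.array([[1.0], [0.0]])       # basis ket |e_0>
P0 = e0 @ e0.T                      # projector |e_0><e_0|

psi = np.array([[0.6], [0.8]])      # a normalized example state
print((P0 @ psi).ravel())           # the component of psi along |e_0>: (0.6, 0)
assert np.allclose(P0 @ P0, P0)     # projectors are idempotent
```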
% % Using projection operators to measure if a state has a component
% % along a basis vector
% Intro
% TODO: Make sure `quantum gate` is proper terminology
A central concept for quantum computing is that of the \emph{qubit}.
It takes the place of the classical \emph{bit}.
For classical computers, we alter the state of a bit using \emph{gates}.
We can chain multiple of these gates together to build up more complex logic,
such as half-adders or eventually a full processor.
In principle, quantum computers work in a similar fashion, only that
A qubit is defined to be a system with quantum state
\alpha \\
\beta
\end{pmatrix}
= \alpha \ket{0} + \beta \ket{1}, \hspace{5mm} \alpha,\beta \in \mathbb{C}
.%
\end{align}
The overall state of a multi-qubit quantum system is described using
the \emph{tensor product}, denoted as $\otimes$
\cite[Sec.~2.2.8]{nielsen_quantum_2010}.
Take for example the two qubits
i.e.,
.%
\end{split}
\end{align}
We call $\ket{x_0, \ldots, x_n},~x_i \in \{0,1\}$ the
\emph{computational basis states} \cite[Sec.~4.6]{nielsen_quantum_2010}.
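A small sketch of the tensor product of qubit states, computed via the Kronecker product:

```python
import numpy as np

ket0 = np.array([1, 0])
ket1 = np.array([0, 1])

# |01> = |0> tensor |1>: a length-4 vector in the two-qubit state space
ket01 = np.kron(ket0, ket1)
print(ket01)                 # [0 1 0 0]

# the tensor product of two superpositions multiplies out coefficient-wise
plus = (ket0 + ket1) / np.sqrt(2)
print(np.kron(plus, plus))   # [0.5 0.5 0.5 0.5]
```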
To simplify set notation, we define
\begin{align*}
\mathcal{M}^{\otimes n} := \underbrace{\mathcal{M}\otimes \ldots
\otimes \mathcal{M}}_{n \text{ times}}
% Entanglement
States that are not able to be decomposed into products of single-qubit states
are called \emph{entangled} \cite[Sec.~2.2.8]{nielsen_quantum_2010}.
An example of such states are the \emph{Bell states}
\begin{align*}
we now shift our focus to describing the evolution of their states.
We model state changes as operators.
Unlike classical systems, where there are only two possible states and
thus the only possible state change is a bit-flip, a general qubit
state as shown in \Cref{eq:gen_qubit_state} lies on a continuum of values.
We thus technically also have an infinite number of possible state changes.
Fortunately, we can express any operator as a linear combination of the
\emph{Pauli operators} \cite[Sec.~2.2]{gottesman_stabilizer_1997}
In fact, if we allow for complex coefficients, the $X$ and $Z$
operators are sufficient to express any other operator as a linear
combination \cite[Sec.~2.2]{roffe_quantum_2019}.
Hereby, $I$ is the identity operator and $X$ and $Z$ are referred to as
\emph{bit-flips} and \emph{phase-flips}, respectively.
We call the set
\begin{align}
\mathcal{G}_n = \left\{ \pm I,\pm \mathrm{i}I, \pm
X,\pm \mathrm{i}X,
\pm Y,\pm \mathrm{i}Y, \pm Z, \pm \mathrm{i}Z \right\}^{\otimes n}
\end{align}
the \emph{Pauli group} over $n$ qubits.
In the context of modifying qubit states, we also call operators \emph{gates}.
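These relations are easy to check numerically; the following sketch verifies the bit-flip and phase-flip actions, the representation of $Y$ via $X$ and $Z$ with a complex coefficient, and the anticommutation of $X$ and $Z$:

```python
import numpy as np

I = np.eye(2)
X = np.array([[0, 1], [1, 0]])
Z = np.array([[1, 0], [0, -1]])
Y = 1j * X @ Z            # Y expressed as a (complex) combination of X and Z

ket0, ket1 = np.array([1, 0]), np.array([0, 1])
assert np.array_equal(X @ ket0, ket1)      # bit-flip: |0> -> |1>
assert np.array_equal(Z @ ket1, -ket1)     # phase-flip: |1> -> -|1>
assert np.array_equal(X @ Z, -(Z @ X))     # X and Z anticommute
```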
When working with multi-qubit systems, we can also apply Pauli gates
Other important operators include the \emph{Hadamard} and
\centering
\begin{align*}
\begin{array}{c}
\text{CNOT Operator} \\
\hline\\
\ket{00} \mapsto \ket{00} \\
\ket{01} \mapsto \ket{01} \\
\end{minipage}%
\end{figure}
\vspace{-4mm}
\noindent The CNOT operator is a 2-qubit gate that applies a bit-flip to the
second qubit conditioned on the state of the first one.
Many more operators relevant to quantum computing exist, but they are
not covered here as they are not central to this work.
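The mappings in the table above can be checked against the matrix representation of the CNOT operator in the computational basis:

```python
import numpy as np

# CNOT in the computational basis |00>, |01>, |10>, |11>
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])

basis = dict(zip(["00", "01", "10", "11"], np.eye(4, dtype=int)))

assert np.array_equal(CNOT @ basis["00"], basis["00"])   # control 0: no flip
assert np.array_equal(CNOT @ basis["01"], basis["01"])
assert np.array_equal(CNOT @ basis["10"], basis["11"])   # control 1: flip target
assert np.array_equal(CNOT @ basis["11"], basis["10"])
```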
%%%%%%%%%%%%%%%%
The control connection is represented by a vertical line connecting
the gate to the corresponding qubit, where a filled dot is placed.
A controlled gate applies the respective operation only if the
control qubit is in state $\ket{1}$.
\Cref{fig:cnot_circuit} depicts an example of this: The CNOT gate
introduced in \Cref{subsec:Qubits and Multi-Qubit States}.
\begin{figure}[t]
\centering
% General motivation behind QEC
One of the major barriers on the road to building a functioning and scalable
quantum computer is the inevitability of errors during quantum
computation. These arise due to the difficulty in sufficiently isolating the
qubits from external noise \cite[Sec.~1]{roffe_quantum_2019}.
with the environment act as small measurements, an effect called
\emph{decoherence} of the quantum state
\cite[Sec.~1]{gottesman_stabilizer_1997}.
\ac{qec} is one approach to dealing with this problem: it protects
a quantum state in a similar fashion to information in classical error
correction.
% The unique challenges of QEC
Three main restrictions apply \cite[Sec.~2.4]{roffe_quantum_2019}:
% General idea (logical vs. physical gates) + notation
Much like in classical error correction, \ac{qec} protects information by
introducing redundancy.
The information, represented by a state in a low-dimensional space,
is mapped onto an encoded state in a higher-dimensional space.
To this end, $k \in \mathbb{N}$ \emph{logical qubits} are mapped onto
$n \in \mathbb{N}$ \emph{physical qubits}, $n>k$.
We circumvent the no-cloning restriction by not copying the state of any of
This is due to the \emph{backlog problem}
\cite[Sec.~II.G.3.]{terhal_quantum_2015}: There are certain gates
at which the effect of existing errors on single qubits may be
exacerbated by transforming them to multi-qubit errors.
If we ensure decoding with sufficiently low latency, we can correct
the errors before passing qubits through such gates.
However, if the \ac{qec} system is not fast enough, there will be an increasing
backlog of information at this point in the circuit, leading to an
exponential slowdown in computation.
Note that this code is only able to detect single $X$-type errors.
% Measuring stabilizers
To determine if an error occurred, we aim to measure whether a
state belongs
% TODO: Remove footnote?
% \footnote{
% It is possible for a state to not completely lie in either subspace.
% }
to $\mathcal{C}$ or $\mathcal{F}$.
As explained in \Cref{subsec:Observables}, physical measurements
can be mathematically described using operators, whose eigenvalues
are the possible measurement results.
Here, we need an operator with two eigenvalues whose corresponding
eigenspaces are $\mathcal{C}$ and $\mathcal{F}$, respectively.
For the two-qubit repetition code, $Z_1Z_2 \in \mathcal{G}_2$ is such
an operator:
\begin{align}
Z_1Z_2 E \ket{\psi}_\text{L} &= (+1) E \ket{\psi}_\text{L}
\hspace*{3mm} \forall
@@ -1225,13 +1234,14 @@ For the two-qubit code, $Z_1Z_2$ is such an operator:
.%
\end{align}
$E \in \left\{ X,I \right\}$ is an operator describing a possible
single-qubit error and $E \ket{\psi}_\text{L}$ is the resulting state
after that error.
By measuring the corresponding eigenvalue, we can determine if
$E\ket{\psi}_\text{L}$ lies in $\mathcal{C}$ or $\mathcal{F}$.
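This eigenvalue relation can be verified numerically for the two-qubit repetition code:

```python
import numpy as np

I = np.eye(2)
X = np.array([[0, 1], [1, 0]])
Z = np.array([[1, 0], [0, -1]])

Z1Z2 = np.kron(Z, Z)
ket00 = np.kron([1, 0], [1, 0])       # logical |0>_L = |00>

# E = I: eigenvalue +1, the state lies in the code space C
assert np.array_equal(Z1Z2 @ ket00, ket00)

# E = X on the first qubit: eigenvalue -1, the state lies in F
flipped = np.kron(X, I) @ ket00
assert np.array_equal(Z1Z2 @ flipped, -flipped)
```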
% TODO: If necessary, cite \cite[Sec.~3]{roffe_quantum_2019} for the
% non-compromising meausrement of the information
To do this without directly observing, and thus potentially
collapsing, the logical state $\ket{\psi}_\text{L}$, we prepare an
ancilla qubit with state $\ket{0}_\text{A}$ and entangle it with
$\ket{\psi}_\text{L}$ in such a way that the eigenvalue is indicated
by measuring the ancilla qubit instead.
This effect is referred to as error \emph{digitization}
% The stabilizer group
Operators such as $Z_1Z_2$ above are called \emph{stabilizers}.
More generally, an operator $P_i \in \mathcal{G}_n$ is a stabilizer of an
$\llbracket n, k, d_\text{min} \rrbracket$ code $\mathcal{C}$, if
\begin{itemize}
\item It stabilizes all logical states, i.e.,
$P_i\ket{\psi}_\text{L} = (+1)\ket{\psi}_\text{L}, ~\forall~
\ket{\psi}_\text{L} \in \mathcal{C}$.
\item It commutes with all other stabilizers $P_j$ of the code,
i.e., $[P_i, P_j] = 0$.
Formally, we define the \emph{stabilizer group} $\mathcal{S}$ as
[P_i,P_j] = 0 \forall i,j\right\}
.%
\end{align*}
In particular, we care about the commuting properties of stabilizers
with respect to possible errors.
The measurement circuit for an arbitrary stabilizer $P_i$ modifies
the state as \cite[Eq.~29]{roffe_quantum_2019}
\begin{align*}
If a given error $E$ anticommutes with $P_i$, we have
\end{align*}
and measuring the ancilla $\text{A}_i$ corresponding to stabilizer
$P_i$ returns 1.
Similarly, if it commutes, the ancilla measurement returns 0.
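A small numerical check of this (anti)commutation criterion for the two-qubit example code:

```python
import numpy as np

I = np.eye(2)
X = np.array([[0, 1], [1, 0]])
Z = np.array([[1, 0], [0, -1]])

def anticommutes(A, B):
    return np.allclose(A @ B, -B @ A)

S = np.kron(Z, Z)                        # stabilizer of the two-qubit code
print(anticommutes(S, np.kron(X, I)))    # True  -> ancilla measurement returns 1
print(anticommutes(S, np.kron(X, X)))    # False -> ancilla measurement returns 0
```

Note that the undetectable error $X \otimes X$ commutes with the stabilizer, consistent with the code only detecting single $X$-type errors.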
%%%%%%%%%%%%%%%%
\subsection{Stabilizer Codes}
% Structure of a stabilizer code
Stabilizer codes are the quantum analogue of classical binary linear
block codes, for which we use $n-k$ parity checks
to reduce the degrees of freedom introduced by the encoding operation.
Effectively, each parity check defines a local code splitting the
vector space in half, with only one part containing valid codewords.
The global code is the intersection of all local codes.
We can do the same in the quantum case.
@@ -1377,19 +1389,23 @@ operators $P_i$, each using a circuit as explained in
\Cref{subsec:Stabilizer Measurements}.
Note that this is an abstract representation of the syndrome extraction.
For the actual implementation in hardware, we can transform this into
a circuit that requires only CNOT and $H$-gates
\cite[Sec.~10.5.8]{nielsen_quantum_2010}.
% Logical operators
In order to modify the logical state encoded using the physical
qubits, we can use \emph{logical operators} \cite[Sec.~4.2]{roffe_quantum_2019}.
For an $\llbracket n,k \rrbracket$ stabilizer code, there exist
logical operators generated by $2k$ representatives $\overline{X}_i,
\overline{Z}_j,~i,j\in[1:k]$ such that
\begin{itemize}
\item They commute with all stabilizers in $\mathcal{S}$.
\item For $i=j$, they anti-commute with one another, i.e., $[
\overline{X}_i, \overline{Z}_i ]_{+} = \overline{X}_i
\overline{Z}_i + \overline{Z}_i \overline{X}_i = 0$.
	\item For $i\neq j$, they commute with one another, i.e., $[ \overline{X}_i,
		\overline{Z}_j ] = \overline{X}_i \overline{Z}_j -
		\overline{Z}_j \overline{X}_i = 0$.
\end{itemize}
We can also measure these operators to find out the logical state a
@@ -1399,22 +1415,22 @@ physical state corresponds to \cite[Sec.~2.6]{derks_designing_2025}.
% TODO: Do I have to introduce before that stabilizers only need X
% and Z operators?
We can represent stabilizer codes using a binary \emph{check matrix}
$\bm{H} \in \mathbb{F}_2^{(n-k)\times(2n)}$
\cite[Sec.~10.5.1]{nielsen_quantum_2010} with
\begin{align*}
\bm{H} = \left[
\begin{array}{c|c}
\bm{H}_X & \bm{H}_Z
\end{array}
\right]
.%
\end{align*}
This is similar to a classical \ac{pcm} in that it contains $n-k$
rows, each describing one constraint. Each constraint restricts an additional
degree of freedom of the higher-dimensional space we use to introduce
redundancy.
In contrast to the classical case, this matrix has $2n$ columns,
as we have to consider both the $X$ and $Z$ type operators that make up
the stabilizers.
Take for example the Steane code \cite[Eq.~10.83]{nielsen_quantum_2010}.
@@ -1433,8 +1449,8 @@ We can describe it using the check matrix
\right]
.%
\end{align}
The first $n$ columns correspond to $X$ stabilizers acting on the
corresponding physical qubit, the rest to the $Z$ stabilizers.
\begin{figure}[t]
\centering
@@ -1463,27 +1479,27 @@ corresponding physical qubit, the rest to the $Z$ operators.
% Intro
Stabilizer codes are especially practical to work with when the
stabilizers can be split into one subset consisting only of
$Z$ stabilizers and one consisting only of $X$ stabilizers.
As $Z$ errors anti-commute with $X$ operators in the stabilizers and
vice versa, this property translates into being able to correct $X$
or $Z$ errors independently.
We call such codes \ac{css} codes.
We can see this property in \Cref{eq:steane} in the check matrix
of the Steane code.
% Construction
We can exploit this separate consideration of $X$ and $Z$ stabilizers in
the construction of \ac{css} codes.
We combine two binary linear codes $\mathcal{C}_1$ and
$\mathcal{C}_2$, each responsible for correcting either $Z$ or $X$ errors
\cite[Sec.~10.5.6]{nielsen_quantum_2010}.
Using the dual code of $\mathcal{C}_2$ \cite[Eq.~3.4]{ryan_channel_2009}
\begin{align*}
	\mathcal{C}_2^\perp := \left\{ \bm{x}' \in \mathbb{F}_2^n :
\bm{x}' \bm{x}^\mathsf{T} = 0 ~\forall \bm{x} \in \mathcal{C}_2 \right\}
,%
\end{align*}
we define $\bm{H}_X$ as the \ac{pcm} of $\mathcal{C}_2^\perp$ and $\bm{H}_Z$
@@ -1501,7 +1517,7 @@ In order to yield a valid stabilizer code, $\mathcal{C}_1$ and
$\mathcal{C}_2$ must satisfy the commutativity condition
\begin{align}
\label{eq:css_condition}
\bm{H}_X \bm{H}_Z^\mathsf{T} = \bm{0}
.%
\end{align}
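As a quick numerical check of \Cref{eq:css_condition} (a sketch assuming the standard $[7,4]$ Hamming \ac{pcm}): the Steane code uses the same matrix for $\bm{H}_X$ and $\bm{H}_Z$, and the condition holds because the Hamming code contains its dual:

```python
import numpy as np

# PCM of the [7,4] Hamming code
H_ham = np.array([[0, 0, 0, 1, 1, 1, 1],
                  [0, 1, 1, 0, 0, 1, 1],
                  [1, 0, 1, 0, 1, 0, 1]])

H_X = H_ham  # X-type checks of the Steane code
H_Z = H_ham  # Z-type checks of the Steane code

# Commutativity condition H_X H_Z^T = 0 over F_2
assert np.all((H_X @ H_Z.T) % 2 == 0)

# Full binary check matrix with the block structure [H_X | H_Z]
H = np.block([[H_X, np.zeros_like(H_Z)],
              [np.zeros_like(H_X), H_Z]])
assert H.shape == (6, 14)  # (n-k) x 2n with n = 7, k = 1
```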
We can ensure this by choosing $\mathcal{C}_1$ and $\mathcal{C}_2$
@@ -1516,15 +1532,15 @@ such that $\mathcal{C}_2 \subset \mathcal{C}_1$.
Various methods of constructing \ac{qec} codes exist
\cite{swierkowska_eccentric_2025}.
Topological codes, for example, encode information in the features of
a lattice in a way that allows for local interactions between qubits.
Among these, the \emph{surface code} is the most widely studied.
Another example are concatenated codes, which nest one code within
another, allowing for especially simple and flexible constructions
\cite[Sec.~3.2]{swierkowska_eccentric_2025}.
An area of research that has recently seen more attention is that of
quantum \ac{ldpc} (\acs{qldpc}) codes.
They have a much higher rate than, e.g., surface codes, scaling up
of which would be prohibitively expensive
\cite[Sec.~I]{bravyi_high-threshold_2024}.
% Bivariate Bicycle codes
@@ -1536,7 +1552,7 @@ $\bm{H}_Z$ are constructed from two matrices $\bm{A}$ and $\bm{B}$ as
\begin{align*}
\bm{H}_X = [\bm{A} \vert \bm{B}]
\hspace*{5mm} \text{and} \hspace*{5mm}
\bm{H}_Z = [\bm{B}^\mathsf{T} \vert \bm{A}^\mathsf{T}]
.%
\end{align*}
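A small numerical sketch of this construction (toy parameters of our own choosing, not a known good code; only the commutativity property is being illustrated):

```python
import numpy as np

def cyclic_shift(dim):
    """dim x dim cyclic shift (circulant permutation) matrix."""
    return np.roll(np.eye(dim, dtype=int), 1, axis=1)

l, m = 3, 4  # hypothetical circulant sizes
x = np.kron(cyclic_shift(l), np.eye(m, dtype=int))
y = np.kron(np.eye(l, dtype=int), cyclic_shift(m))

# A and B are sums (mod 2) of powers of the commuting matrices x and y
A = (x + np.linalg.matrix_power(y, 2)) % 2
B = (y + np.linalg.matrix_power(x, 2)) % 2

H_X = np.hstack([A, B])
H_Z = np.hstack([B.T, A.T])

# Since AB = BA, we get H_X H_Z^T = AB + BA = 2AB = 0 over F_2
assert np.all((H_X @ H_Z.T) % 2 == 0)
```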
This way, we can guarantee the satisfaction of the commutativity
@@ -1576,16 +1592,17 @@ This necessitates a modification of the standard \ac{bp} algorithm
introduced in \Cref{subsec:Iterative Decoding}
\cite[Sec.~3.1]{yao_belief_2024}.
Instead of attempting to find the most likely codeword directly, the
syndrome-based decoding algorithm tries to find an error pattern
$\hat{\bm{e}} \in \mathbb{F}_2^n$ that satisfies
\begin{align*}
\bm{H} \hat{\bm{e}}^\mathsf{T} = \bm{s}
.%
\end{align*}
To this end, we initialize the channel \acp{llr} as
\begin{align*}
\tilde{L}_i = \log{\frac{P(X_i = 0)}{P(X_i = 1)}} = \log{
\left( \frac{1 - p_i}{p_i} \right)
}
,%
\end{align*}
where $p_i$ is the prior probability of error of \ac{vn} $i$.
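This initialization can be sketched as follows (the prior values are hypothetical):

```python
import numpy as np

p = np.array([0.01, 0.05, 0.01, 0.02])  # hypothetical per-VN error priors
L_tilde = np.log((1 - p) / p)           # channel LLRs

# A small p_i yields a large positive LLR, i.e., a strong prior for "no error"
assert np.all(L_tilde > 0)
```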
@@ -1642,7 +1659,7 @@ The resulting syndrome-based \ac{bp} algorithm is shown in
\right\}$
\EndFor
\If{$\bm{H}\hat{\bm{e}}^\mathsf{T} = \bm{s}$}
\State \textbf{break}
\EndIf
@@ -1721,7 +1738,7 @@ This way, we obtain the \ac{ler}.
\mathbbm{1}\left\{ L^\text{total}_i \right\}$
\EndFor
\If{$\bm{H}\hat{\bm{e}}^\mathsf{T} = \bm{s}$}
\State \textbf{break}
\Else
\State $i_\text{max} \leftarrow \argmax_{i \in \mathcal{I}'} \lvert L^\text{total}_i \rvert $

View File

@@ -16,17 +16,19 @@ using qubits.
While the use of error correcting codes may facilitate this, it also
introduces two new challenges \cite[Sec.~4]{gottesman_introduction_2009}:
\begin{itemize}
\item To realize a quantum algorithm, we must be able to
perform operations on the encoded state in such a way that we
do not lose the protection against errors.
\item \ac{qec} systems, in particular the syndrome extraction
circuit, are themselves partially implemented in
quantum hardware.
		In addition to the errors we have originally introduced them
		for, these systems must therefore be able to account for the
		fact that they are implemented on noisy hardware themselves.
\end{itemize}
In the literature, both of these points are viewed under the umbrella
of \emph{fault-tolerant} quantum computing.
In this thesis, we focus on the second aspect.
It was recognized early on as a challenge of \ac{qec} that the correction
machinery itself may introduce new faults \cite[Sec.~III]{shor_scheme_1995}.
@@ -43,16 +45,16 @@ address both.
We model the possible occurrence of errors during any processing
stage as different \emph{error locations} $E_i,~i\in [1:N]$
in the circuit.
The parameter $N \in \mathbb{N}$ is the total number of considered
error locations.
The \emph{circuit error vector} $\bm{e} \in \{0,1\}^N$ is a vector
indicating which errors occurred, with
\begin{align*}
e_i :=
\begin{cases}
1, & \text{error $E_i$ occurred}, \\
0, & \text{otherwise}.
\end{cases}
\end{align*}
\Cref{fig:fault_tolerance_overview} illustrates the flow of errors.
Specifically for \ac{css} codes, a \ac{qec} procedure is deemed
@@ -72,12 +74,14 @@ fault-tolerant, if \cite[Def.~4.2]{derks_designing_2025}
where $t = \lfloor (d_\text{min} -1)/2 \rfloor$ is the number of
errors the code is able to correct.
The vectors $\bm{e}_{\text{output},X}$ and $\bm{e}_{\text{output},Z}$
denote only $X$ and $Z$ errors, respectively.
% TODO: Properly introduce d_min for QEC, specifically for CSS codes
In order to deal with internal errors that flip syndrome bits,
multiple rounds of syndrome measurements are performed.
Typically, the number of syndrome extraction rounds is chosen as
$d_\text{min}$; see, e.g.,
\cite{gong_toward_2024}\cite{koutsioumpas_automorphism_2025}.
% % This is the definition of a fault-tolerant QEC gadget
% A \ac{qec} procedure is deemed fault tolerant if
@@ -150,7 +154,7 @@ Typically, the number of syndrome extraction rounds is chosen as $d_\text{min}$.
% Intro
We collect the probabilities of error at each location in the
\emph{noise model}, represented by a vector $\bm{p} \in [0,1]^N$.
There are different types of noise models, each allowing for
different error locations in the circuit.
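Sampling from a noise model is straightforward; a sketch with hypothetical values:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8                                # hypothetical number of error locations
p = np.full(N, 0.01)                 # noise model: per-location error probabilities
e = (rng.random(N) < p).astype(int)  # circuit error vector, e_i ~ Bern(p_i)
```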
@@ -178,8 +182,7 @@ $\ket{\psi}_\text{L}$ as \emph{data qubits}.
Note that this is a concrete implementation using CNOT gates, as
opposed to the system-level view introduced in
\Cref{subsec:Stabilizer Codes}.
\Cref{fig:noise_model_types} visualizes the different types of noise models.
%%%%%%%%%%%%%%%%
\subsection{Bit-Flip Noise}
@@ -190,7 +193,7 @@ This corresponds to the classical \ac{bsc}, i.e., only $X$ errors on the
data qubits are possible \cite[Appendix~A]{gidney_new_2023}.
The occurrence of bit-flip errors is modeled as a Bernoulli process
$\text{Bern}(p)$.
\Cref{subfig:bit_flip} shows this type of noise model.
Note that bit-flip noise is not suitable for developing fault-tolerant
systems, as it does not account for errors during the syndrome extraction.
@@ -223,7 +226,7 @@ Here, we consider multiple rounds of syndrome measurements with a
depolarizing channel before each round.
Additionally, we allow for measurement errors by having $X$ error
locations right before each measurement \cite[Appendix~A]{gidney_new_2023}.
Note that it is enough to only consider $X$ errors before measuring,
since that is the only type of error directly affecting the
measurement outcomes.
This model is depicted in \Cref{subfig:phenomenological}.
@@ -253,7 +256,7 @@ While phenomenological noise is useful for some design aspects of
fault-tolerant circuitry, for simulations, circuit-level noise should
always be used \cite[Sec.~4.2]{derks_designing_2025}.
Note that this introduces new challenges during the decoding process,
as the decoding complexity is considerably increased due to the many
error locations.
\begin{figure}[t]
@@ -284,11 +287,11 @@ error locations.
framework for
passing information about a circuit used for \ac{qec} to a decoder.
They are also useful as a theoretical tool to aid in the design of
fault-tolerant \ac{qec} schemes; e.g., they can be used to easily
determine whether a measurement schedule is fault-tolerant
\cite[Example~12]{derks_designing_2025}.
Other approaches to implementing fault-tolerant circuits exist, e.g.,
flag error correction, which uses additional ancilla qubits to detect
potentially damaging high-weight errors \cite[Sec.~1]{chamberland_flag_2018}.
However, \acp{dem} offer some unique advantages
@@ -310,7 +313,7 @@ To achieve fault tolerance, the goal we strive towards is to
consider the internal errors in addition to the input errors during
the decoding process.
The core idea behind detector error models is to do this by defining
a new \emph{circuit code} describing the whole circuit.
Each \ac{vn} of this new code corresponds to an error location in the
circuit and each \ac{cn} corresponds to a syndrome measurement.
% This circuit code, combined with the prior probabilities of error
@@ -446,12 +449,11 @@ matrix} $\bm{\Omega} \in \mathbb{F}_2^{M\times N}$, with
\begin{align*}
\Omega_{\ell,i} =
\begin{cases}
1, & \text{error $i$ flips measurement $\ell$},\\
0, & \text{otherwise},
\end{cases}
\end{align*}
where $M \in \mathbb{N}$ is the number of performed syndrome measurements.
To obtain $\bm{\Omega}$, we must propagate Pauli errors through the
circuit, tracking which measurements they affect
\cite[Sec.~2.4]{derks_designing_2025}.
@@ -466,8 +468,8 @@ Each round yields an additional set of syndrome bits,
and we combine them by stacking them in a new vector
$\bm{s} \in \mathbb{F}_2^{R(n-k)}$, where $R \in \mathbb{N}$ is the
number of syndrome measurement rounds.
Thus, we have to replicate the rows of $\bm{H}_Z$, once for each
additional syndrome measurement, and obtain
\begin{align*}
\bm{\Omega}_0 =
\begin{pmatrix}
@@ -493,11 +495,11 @@ extraction circuitry, so we still consider only bit flip noise at this stage.
Recall that $\bm{\Omega}_0$ describes which \ac{vn} is connected to
which parity check and the syndrome indicates which parity checks
are violated.
Therefore, if an error occurs that corresponds to a single \ac{vn},
the measured syndrome is the corresponding column.
If errors occur at multiple locations, the resulting syndrome will be
the linear combination of the respective columns.
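For the three-bit repetition code, this construction can be sketched directly:

```python
import numpy as np

H_Z = np.array([[1, 1, 0],
                [0, 1, 1]])    # PCM of the 3-bit repetition code
R = 3                          # syndrome measurement rounds
Omega0 = np.vstack([H_Z] * R)  # replicate the rows once per round

e = np.array([1, 0, 0])        # bit flip on the first data qubit
s = (Omega0 @ e) % 2
assert np.array_equal(s, Omega0[:, 0])  # syndrome = corresponding column

e2 = np.array([1, 1, 0])       # two errors: syndrome = sum of the columns
assert np.array_equal((Omega0 @ e2) % 2, (Omega0[:, 0] + Omega0[:, 1]) % 2)
```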
Thus, we have
\begin{align*}
\bm{s} \in \text{span} \{\bm{\Omega}_0\}
.%
@@ -505,13 +507,13 @@ We thus have
% Expand to phenomenological
Next, we expand the error model to phenomenological noise, though
only considering $X$ errors in this case.
We introduce new error locations at the appropriate positions,
resulting in the circuit depicted in
\Cref{fig:rep_code_multiple_rounds_phenomenological}.
For each additional error location, we extend $\bm{\Omega}_0$ by
appending the corresponding syndrome vector as a column, yielding
\begin{gather}
\label{eq:syndrome_matrix_ex}
\bm{\Omega}_1 =
@@ -668,7 +670,7 @@ extraction round.
\begin{figure}[t]
\begin{gather*}
\hspace*{-31.8mm}%
\begin{array}{c}
E_6 \\
\downarrow
@@ -790,15 +792,14 @@ to a detector.
We should note at this point that the combination of measurements
into detectors has no bearing on the actual construction of the
syndrome extraction circuitry.
It is something that happens ``virtually'' and only affects the decoder.
Note that we can use the detector matrix $\bm{D}$ to describe the set
of possible measurement outcomes under the absence of noise.
Similar to the way we use a \ac{pcm} to describe the code space as
\begin{equation*}
\mathcal{C}
= \{ \bm{x} \in \mathbb{F}_2^{n} : \bm{H}\bm{x}^\mathsf{T} = \bm{0} \}
,%
\end{equation*}
the set of possible measurement outcomes is simply $\text{kern}\{\bm{D}\}$
@@ -815,7 +816,7 @@ affect the measurements (through $\bm{\Omega}$), and we know how the
measurements relate to the detectors (through $\bm{D}$).
For decoding, we are interested in the effect of the errors on the
detectors directly.
Thus, we construct the \emph{detector error matrix} $\bm{H} \in
\mathbb{F}_2^{D\times N}$ \cite[Def.~2.9]{derks_designing_2025} as
\begin{align*}
\bm{H} := \bm{D}\bm{\Omega}
@@ -843,10 +844,10 @@ violate the same set of detectors, i.e.,
\begin{align*}
\hspace{-15mm}
% tex-fmt: off
&& \bm{H} \bm{e}_1^\mathsf{T} & \neq \bm{H} \bm{e}_2^\mathsf{T} \\
\iff \hspace{-33mm} && \bm{H} \left( \bm{e}_1 - \bm{e}_2 \right)^\mathsf{T} & \neq 0 \\
\iff \hspace{-33mm} && \bm{D} \bm{\Omega} \left( \bm{e}_1 - \bm{e}_2 \right)^\mathsf{T} & \neq 0 \\
\iff \hspace{-33mm} && \bm{\Omega} \left( \bm{e}_1 - \bm{e}_2 \right)^\mathsf{T} & \notin \text{kern} \{\bm{D}\}
% tex-fmt: on
.%
\end{align*}
@@ -859,7 +860,7 @@ It may, however, change the decoding performance when using a practical decoder.
What constitutes a good set of detectors is difficult to assess
without performing explicit decoding simulations, since it ultimately
depends on the employed decoder.
For iterative decoders, high sparsity is generally beneficial, but
finding detectors that maximize sparsity is an NP-complete problem
\cite[Sec.~2.6]{derks_designing_2025}.
@@ -868,7 +869,7 @@ at a later stage.
To the measurement results from each syndrome extraction round we
can add the results from the previous round, as illustrated in
\Cref{fig:detectors_from_measurements_general}.
Thus, we have $D=n-k$.
Concretely, we denote the outcome of
measurement $\ell \in [1:n-k]$ in round $r \in [1:R]$ by
$m_\ell^{(r)} \in \mathbb{F}_2$
@@ -935,9 +936,10 @@ note that the error $E_6$ in
\Cref{fig:rep_code_multiple_rounds_phenomenological} has not only
triggered the measurements in the syndrome extraction round immediately
afterwards, but all subsequent ones as well.
To only see the effect of errors in the syndrome measurement round
immediately following them, we consider our newly defined detectors
instead of the measurements.
These effectively compute the difference between the measurements.
Each error can only trigger syndrome bits that follow it.
This is reflected in the triangular structure of $\bm{\Omega}$ in
@@ -945,7 +947,7 @@ This is reflected in the triangular structure of $\bm{\Omega}$ in
Combining the measurements into detectors according to
\Cref{eq:measurement_combination}, we are effectively performing
row additions in such a way as to clear the bottom left of the matrix.
The resulting detector error matrix
\begin{align*}
\bm{H} =
\left(
@@ -959,7 +961,7 @@ The detector error matrix
\end{array}
\right)
\end{align*}
has a block-diagonal structure.
Note that we exploit the fact that each syndrome measurement round is
identical to obtain this structure.
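The differencing of consecutive rounds can be sketched for the three-bit repetition code under $X$-only phenomenological noise. This is our own toy construction of $\bm{\Omega}$, $\bm{D}$, and $\bm{H} = \bm{D}\bm{\Omega}$, not the exact matrices from the figures:

```python
import numpy as np

H_Z = np.array([[1, 1, 0],
                [0, 1, 1]])  # 3-bit repetition code
n_k, n = H_Z.shape
R = 3

# Measurement syndrome matrix: a data error before round r flips all
# measurements from round r onward (triangular structure).
cols = []
for r in range(R):                    # data-qubit error locations
    for i in range(n):
        col = np.zeros(R * n_k, dtype=int)
        col[r * n_k:] = np.tile(H_Z[:, i], R - r)
        cols.append(col)
for r in range(R):                    # measurement error locations
    for l in range(n_k):
        col = np.zeros(R * n_k, dtype=int)
        col[r * n_k + l] = 1
        cols.append(col)
Omega = np.array(cols).T

# Detector matrix: round-r detectors XOR rounds r and r-1.
D = np.eye(R * n_k, dtype=int)
D[n_k:, :-n_k] += np.eye((R - 1) * n_k, dtype=int)

H = (D @ Omega) % 2                   # detector error matrix

# A data error now only triggers the detectors of its own round.
e = np.zeros(Omega.shape[1], dtype=int)
e[n] = 1                              # error on qubit 0 before round 2
s_det = (H @ e) % 2
assert s_det[:n_k].sum() == 0
assert s_det[n_k:2 * n_k].sum() > 0
assert s_det[2 * n_k:].sum() == 0
```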
@@ -1008,9 +1010,8 @@ error matrix $\bm{H}$ and the noise model $\bm{p}$.
\cite[Sec.~6]{derks_designing_2025}.
It serves as an abstract representation of a circuit and can be used
both to transfer information to a decoder but also to aid in the
design of fault-tolerant systems; e.g., it can be used to investigate
the properties of a circuit with respect to fault tolerance.
It contains all information necessary for the decoding process.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
@@ -1052,7 +1053,7 @@ value, the physical error rate $p_\text{phys}$.
% Per-round LER
Another important aspect to consider is the meaning of the
\ac{ler} in the context of a \ac{qec} system with multiple
rounds of syndrome measurements.
In order to facilitate the comparability of results obtained from
@@ -1063,7 +1064,7 @@ The simplest way of calculating the per-round \ac{ler} is by modeling
each round as an independent experiment.
For each experiment, an error might occur with a certain probability
$p_\text{e,round}$.
Then the overall probability of error is
\begin{align}
\hspace{-12mm}
p_\text{e,total} &= 1 - (1 - p_\text{e,round})^{R} \nonumber\\
@@ -1073,13 +1074,14 @@ The overall probability of error is then
.%
\hspace{12mm}
\end{align}
To this end, we approximate $p_\text{e,total}$ using a Monte Carlo
simulation and compute the per-round \ac{ler} according to
\Cref{eq:per_round_ler}.
This is the approach taken in \cite{gong_toward_2024}\cite{wang_fully_2025}.
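The inversion in \Cref{eq:per_round_ler} amounts to one line; a minimal sketch with hypothetical values:

```python
import numpy as np

def per_round_ler(p_total, R):
    """Invert p_total = 1 - (1 - p_round)**R for the per-round LER."""
    return 1 - (1 - p_total) ** (1 / R)

p_total, R = 0.05, 10  # hypothetical Monte Carlo estimate and round count
p_round = per_round_ler(p_total, R)

# Round trip recovers the total error probability
assert np.isclose(1 - (1 - p_round) ** R, p_total)
```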
Another approach \cite{chen_exponential_2021}%
\cite{bausch_learning_2024}\cite{beni_tesseract_2025} is to assume an
exponential decay for the \emph{logical fidelity} of the decoder
\cite[Eq.~(2)]{bausch_learning_2024}
\begin{align*}
F_\text{total} = (F_\text{round})^{R}
@@ -1104,10 +1106,10 @@ topic to our own work.
\subsection{Stim}
\label{subsec:Stim}
It is not immediately apparent how the \ac{dem} will look from
considering the \ac{pcm} of a code, because it heavily depends on the
exact circuit construction and choice of noise model.
As we noted in \Cref{subsec:Measurement Syndrome Matrix}, we
obtain a measurement syndrome matrix by propagating Pauli frames
through the circuit.
The standard choice of simulation tool used for this purpose is
@@ -1118,16 +1120,16 @@ pypi package.
In fact, it was in this tool that the concept of the \ac{dem} was
first introduced.
One capability of stim, and \acp{dem} in general, that we did not
explain in detail in this chapter is the merging of error mechanisms.
Since \acp{dem} differentiate errors based on their effect on the
measurements and not on their Pauli type and location
\cite[Sec.~1.4.3]{higgott_practical_2024}, it is natural to group
errors that have the same effect, i.e., the same syndrome.
This slightly lowers the computational complexity of decoding, as the
number of resulting \acp{vn} is reduced.
While stim is a useful tool for circuit simulation, it does not
include many utilities for building syndrome extraction circuitry automatically.
The user has to define most, if not all, of the circuit manually,
depending on the code in question.

View File

@@ -470,7 +470,7 @@ model and is difficult to predict beforehand.
The block-diagonal structure reflects the time-like locality
of the syndrome extraction circuit, with each block
corresponding to one syndrome measurement round.
Two consecutive windows are highlighted: The window size $W
\in \mathbb{N}$ controls the number of syndrome rounds
included in each window, while the step size $F \in
\mathbb{N}$ controls how many rounds separate the start of
@@ -701,7 +701,7 @@ estimates committed after decoding window $\ell$, we have to set
\begin{align*}
\left(\bm{s}\right)_{\mathcal{J}_\text{overlap}^{(\ell)}} =
\bm{H}_\text{overlap}^{(\ell)}
\left( \hat{\bm{e}}_\text{commit}^{(\ell)} \right)^\mathsf{T}
.%
\end{align*}
@@ -986,7 +986,7 @@ Note that the decoding procedure performed on the individual windows
\State $\displaystyle\left(\hat{\bm{e}}^\text{total}\right)_{\mathcal{I}^{(\ell)}_\text{commit}} \leftarrow \hat{\bm{e}}^{(\ell)}_\text{commit}$
\State $\displaystyle\left(\bm{s}\right)_{\mathcal{J}_\text{overlap}^{(\ell)}}
\leftarrow \bm{H}_\text{overlap}^{(\ell)}
\left( \hat{\bm{e}}_\text{commit}^{(\ell)} \right)^\mathsf{T}$
\If{$\ell < n_\text{win} - 1$}
\State $L^{(\ell+1)}_{i\leftarrow j} \leftarrow
L^{(\ell)}_{i\leftarrow j}
@@ -1013,8 +1013,8 @@ the most reliable \ac{vn}, meaning we perform a hard decision and
remove it from the following decoding process.
This means that when moving from one window to the next, we now have
more information available: Not just the \ac{bp} messages but also the
information about what \acp{vn} were decimated and to what values.
We call this \emph{decimation information} in the following.
We can extend \Cref{alg:warm_start_bp} by additionally passing the
decimation information after initializing the \ac{cn} to \ac{vn} messages.
@@ -1184,7 +1184,7 @@ decimation information after initializing the \ac{cn} to \ac{vn} messages.
% \State $\displaystyle\left(\hat{\bm{e}}^\text{total}\right)_{\mathcal{I}^{(\ell)}_\text{commit}} \leftarrow \hat{\bm{e}}^{(\ell)}_\text{commit}$
% \State $\displaystyle\left(\bm{s}\right)_{\mathcal{J}_\text{overlap}^{(\ell)}}
% \leftarrow \bm{H}_\text{overlap}^{(\ell)}
% \left( \hat{\bm{e}}_\text{commit}^{(\ell)} \right)^\mathsf{T}$
% \If{$\ell < n_\text{win} - 1$}
% \State $L^{(\ell+1)}_{i\leftarrow j} \leftarrow
% L^{(\ell)}_{i\leftarrow j}
@@ -1404,7 +1404,7 @@ The fact that the $W = 5$ curve is already very close to the
whole-block decoder indicates that the marginal benefit of enlarging
the window saturates after a certain point.
Thus, from a practical standpoint, the choice of $W$ represents a
trade-off between decoding latency and accuracy: Larger windows
delay the start of decoding by requiring more syndrome extraction
rounds to be collected upfront, while the diminishing returns above
$W = 4$ suggest that growing the window much further yields little
@@ -1511,7 +1511,7 @@ The dashed colored curves reproduce the cold-start results from
corresponding warm-start runs for the same window sizes
$W \in \{3, 4, 5\}$.
The remaining experimental parameters are unchanged:
The step size is fixed to $F = 1$,
the inner \ac{bp} decoder is allowed up to $200$ iterations per
window invocation, the black curve again gives the whole-block
reference, and the physical error rate is swept from $p = 0.001$ to
@@ -1707,7 +1707,7 @@ $n_\text{iter} \in [32, 512]$.
All curves decrease monotonically with the iteration budget, but
contrary to our expectation, none of them appears to fully saturate
within the swept range: Even at $n_\text{iter} = 4096$, every curve
still exhibits a noticeable downward slope.
At $n_\text{iter} = 32$, the whole-block curve lies below both the
$W=4$ and $W=5$ sliding-window curves.
@@ -1729,7 +1729,7 @@ mirroring the behavior already observed in \Cref{fig:whole_vs_cold_vs_warm}.
These observations are largely consistent with the effective-iterations
hypothesis put forward above.
The whole-block decoder eventually overtaking every windowed scheme
matches the prediction made there: With a sufficiently large
iteration budget, the whole-block decoder reaches an error rate
that none of the windowed schemes can beat, because of the more global
nature of the considered constraints.
@@ -1767,7 +1767,7 @@ sliding-window approach is still at an advantage.
Having examined the effect of the window size $W$, we next turn to
the second windowing parameter, the step size $F$.
We carry out an investigation analogous to the one above:
We first compare warm- and cold-start decoding across the full range
of physical error rates at a fixed iteration budget, and then we
examine the dependence on the iteration budget at a fixed physical
error rate.
@@ -1994,7 +1994,7 @@ At fixed $F$, the warm-start approach lies below
cold-start across the entire sweep, and at fixed
warm or cold start, smaller $F$ produces a lower \ac{ler}.
Both gaps grow as the physical error rate decreases:
The curves at $F = 1$ separate further from those at $F = 2$ and $F = 3$,
and the warm-start curves separate further from the cold-start ones.
In \Cref{fig:bp_f_over_iter}, all six curves again decrease
monotonically with the iteration budget, with no clear saturation
@@ -2016,7 +2016,7 @@ With $W$ held fixed, decreasing $F$ enlarges the overlap between
consecutive windows from $W - F$ to $W - F + 1$ syndrome measurement rounds, so
a smaller step size is beneficial for the same reason that a larger
window size is:
each \ac{vn} in an overlap region participates in more window
Each \ac{vn} in an overlap region participates in more window
invocations, and the warm-start modification effectively accumulates
iterations on it across these invocations.
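This overlap arithmetic can be made concrete with a small sketch (hypothetical helper, not the thesis code; it assumes windows of $W$ rounds whose start positions are $F$ rounds apart):

```python
def window_schedule(n_rounds, W, F):
    """Return the (start, end) rounds of each window invocation (sketch).

    Each window spans W syndrome measurement rounds, and consecutive
    windows start F rounds apart, so they overlap in W - F rounds.
    """
    windows = []
    start = 0
    while start + W <= n_rounds:
        windows.append((start, start + W))
        start += F
    return windows

# At fixed W = 5, stepping F down from 3 to 2 grows the overlap
# between consecutive windows from 2 to 3 rounds.
for F in (3, 2):
    (s0, e0), (s1, e1) = window_schedule(12, 5, F)[:2]
    print(f"F={F}: overlap={e0 - s1} rounds")
```

Shrinking $F$ by one round therefore enlarges every overlap region by one round, exactly as stated above.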
The widening of the warm/cold gap towards low iteration counts and
@@ -2281,7 +2281,7 @@ This is the opposite of what we observed for plain \ac{bp}, where
warm-start improved upon cold-start at every parameter setting.
The gap between the warm- and cold-start curves additionally widens
as the physical error rate decreases:
at the lowest sampled rate $p = 0.001$, the per-round \ac{ler} of the
At the lowest sampled rate $p = 0.001$, the per-round \ac{ler} of the
warm-start runs is more than two orders of magnitude above that of
the corresponding cold-start runs.
In \Cref{fig:bpgd_w}, larger window sizes yield lower per-round
@@ -2300,13 +2300,13 @@ than its cold-start counterpart is surprising in light of the results
for plain \ac{bp}, where the warm-start modification was uniformly beneficial.
The dependence on the window size in \Cref{fig:bpgd_w} is, on its own,
consistent with the same explanation that we gave for
\Cref{fig:whole_vs_cold}: larger windows expose the inner decoder to
\Cref{fig:whole_vs_cold}: Larger windows expose the inner decoder to
a larger fraction of the constraints encoded in the detector error
matrix at the time of decoding, and this benefits both warm- and
cold-start decoding.
The dependence on the step size in \Cref{fig:bpgd_f}, however, is the
opposite of the corresponding dependence under plain \ac{bp}
(\Cref{fig:bp_f_over_p}): for warm-start, smaller $F$ now degrades performance
(\Cref{fig:bp_f_over_p}): For warm-start, smaller $F$ now degrades performance
rather than helps, even though smaller $F$ implies a larger overlap
in both cases.
@@ -2564,7 +2564,7 @@ the warm-start curves now show a clear reordering as $n_\text{iter}$
grows.
At low iteration budgets the warm-start ordering matches the
cold-start ordering, with $F = 1$ best and $F = 3$ worst, but at the
largest iteration budget this ordering is fully inverted: warm-start
largest iteration budget this ordering is fully inverted: Warm-start
$F = 1$ is now the worst and $F = 3$ the best.
% [Interpretation] Figure 4.11
@@ -2596,7 +2596,7 @@ decoding performance.
The same mechanism explains the inversion of the step-size ordering
in \Cref{fig:bpgd_iter_F}.
At low iteration budgets, the ordering is set by the same overlap
argument as for plain \ac{bp}: smaller $F$ implies a larger overlap
argument as for plain \ac{bp}: Smaller $F$ implies a larger overlap
between consecutive windows, more shared messages, and therefore
better warm-start performance.
At large iteration budgets, the ordering is set by the premature hard
@@ -2777,7 +2777,7 @@ since the decimation decisions were made based on the messages themselves.
\Cref{fig:bpgd_msg} repeats the experiment of \Cref{fig:bpgd_wf}
with the modified warm-start procedure that carries over only the
\ac{bp} messages.
All other experimental parameters are unchanged: the maximum number
All other experimental parameters are unchanged: The maximum number
of inner \ac{bp} iterations is $n_\text{iter} = 5000$, and the
physical error rate is swept from $p = 0.001$ to $p = 0.004$ in steps
of $0.0005$.
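The message-only warm start can be sketched as follows (the function name and the dict-based message container are illustrative assumptions, not the thesis implementation):

```python
def message_only_warm_start(prev_msgs, overlap_vns, window_vns):
    """Initialize BP messages for the next window invocation (sketch).

    VNs in the overlap with the previous window inherit that window's
    final messages; all remaining VNs get the neutral LLR offset 0.0.
    Decimation decisions are deliberately NOT carried over.
    """
    return {v: (prev_msgs.get(v, 0.0) if v in overlap_vns else 0.0)
            for v in window_vns}

# VN 3 lies in the overlap and keeps its message; VNs 4 and 5 start cold.
msgs = message_only_warm_start({2: 1.5, 3: -2.0}, {3}, {3, 4, 5})
```

Dropping the decimation state while keeping the messages is the whole point of this variant: the next window is free to re-decide every VN.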
@@ -2810,7 +2810,7 @@ the warm-start regression observed in \Cref{fig:bpgd_wf},
and warm-start now consistently outperforms cold-start.
The dependence on the window size and the step size also recovers
the qualitative behavior we observed for plain \ac{bp} in
\Cref{fig:whole_vs_cold_vs_warm,fig:bp_f_over_p}: a larger overlap
\Cref{fig:whole_vs_cold_vs_warm,fig:bp_f_over_p}: A larger overlap
between consecutive windows, achieved either by enlarging $W$ or by
decreasing $F$, both improves the absolute decoding performance and
increases the warm-start advantage over cold-start.
@@ -2994,7 +2994,7 @@ cold-start curves across the entire range of $n_\text{iter}$ available to us.
\Cref{fig:bpgd_msg_iter} repeats the experiment of
\Cref{fig:bpgd_iter} with the modified warm-start procedure that
carries over only the \ac{bp} messages.
All other experimental parameters are unchanged: the physical error
All other experimental parameters are unchanged: The physical error
rate is fixed at $p = 0.0025$ and the iteration budget is swept over
$n_\text{iter} \in \{32, 128, 256, 512, 1024, 1536, 2048, 2560,
3072, 3584, 4096\}$.
@@ -3026,7 +3026,7 @@ initialization no longer freezes any \acp{vn} in the next window.
The dependence of this benefit on $W$ and $F$ also recovers the
pattern observed for plain \ac{bp} in
\Cref{fig:whole_vs_cold_vs_warm,fig:bp_f_over_p}:
larger overlap, achieved by larger $W$ or smaller $F$, yields more
Larger overlap, achieved by larger $W$ or smaller $F$, yields more
effective extra iterations and therefore a larger warm-start gain.
% BPGD conclusion
@@ -3048,7 +3048,7 @@ cold-start that follows the same behavior as for plain \ac{bp} with
regard to overlap.
A second observation specific to \ac{bpgd} is that its iteration
requirements are substantially larger than those of plain \ac{bp}:
the per-round \ac{ler} drops sharply only once the iteration budget
The per-round \ac{ler} drops sharply only once the iteration budget
is on the order of the number of \acp{vn} in each window.
Future work could include a softer treatment of the decimation state


@@ -7,7 +7,7 @@ This thesis investigates decoding under \acp{dem} for fault-tolerant
\ac{qec}, with a focus on low-latency decoding methods for \ac{qldpc} codes.
The repetition of the syndrome measurements, especially under
circuit-level noise, leads to a significant increase
in decoding complexity: in our experiments on the $\llbracket
in decoding complexity: In our experiments on the $\llbracket
144,12,12 \rrbracket$ \ac{bb} code with $12$ syndrome extraction
rounds, the check matrix grows from 144 \acp{vn} and 72
\acp{cn} to 9504 \acp{vn} and 1008 \acp{cn}.
@@ -46,18 +46,18 @@ min-sum algorithm.
For standard min-sum \ac{bp}, the warm start consistently
outperforms the cold start across the considered parameter ranges.
The size of the gain depends on the overlap between consecutive
windows: enlarging $W$ or shrinking $F$, both of which enlarge the
windows: Enlarging $W$ or shrinking $F$, both of which enlarge the
overlap, results in larger warm-start gains.
We observe that the underlying mechanism is an effective increase in
the number of \ac{bp} iterations spent on the \acp{vn} in the overlap
region: each such \ac{vn} is processed by multiple consecutive window
region: Each such \ac{vn} is processed by multiple consecutive window
invocations, and the warm start lets these invocations accumulate
iterations on the same \acp{vn} rather than restarting from scratch.
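The effective-iterations mechanism can be quantified with a small count (a hypothetical sketch, assuming windows of $W$ rounds starting every $F$ rounds):

```python
def covering_windows(r, n_rounds, W, F):
    """Count the window invocations whose span contains round r (sketch)."""
    count, start = 0, 0
    while start + W <= n_rounds:
        if start <= r < start + W:
            count += 1
        start += F
    return count

# With warm start, a VN in round r effectively accumulates the inner
# iteration budget once per covering window: roughly W / F times.
W, F, n_iter = 5, 1, 32
print(covering_windows(6, 12, W, F) * n_iter)  # → 160
```

A round deep inside the block is covered by roughly $W/F$ windows, so enlarging $W$ or shrinking $F$ multiplies the effective iteration budget accordingly.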
The gain is most pronounced at low iteration budgets, where
every additional iteration carries proportionally more information.
For \ac{bpgd}, we note that more information is available in the
overlap region of a window: in addition to the \ac{bp} messages,
overlap region of a window: In addition to the \ac{bp} messages,
there is information about which \acp{vn} were decimated and to what value.
Passing this decimation information to the next window in addition to
the messages turned out to worsen the performance considerably, which
@@ -66,7 +66,7 @@ overlap region.
Restricting the warm start to the \ac{bp} messages alone removed this effect.
The resulting message-only warm start recovered a consistent
improvement over cold-start that followed the same qualitative
behavior as for standard \ac{bp}: larger overlap, achieved by larger
behavior as for standard \ac{bp}: Larger overlap, achieved by larger
$W$ or smaller $F$, yielded a larger gain, and the
performance difference was most pronounced at low iteration budgets.


@@ -49,7 +49,7 @@ For both standard \ac{bp} and \ac{bpgd} decoding, the warm-start
initialization provides a consistent improvement across all examined
parameter settings.
We attribute this to an effective increase in \ac{bp} iterations on
variable nodes in the overlap regions: each such VN is processed by
variable nodes in the overlap regions: Each such VN is processed by
multiple consecutive windows, and warm-starting lets these
invocations accumulate iterations rather than restart from scratch.
Crucially, the warm-start modification incurs no additional


@@ -90,10 +90,10 @@
% \thesisHeadOfInstitute{Prof. Dr.-Ing. Peter Rost}
%\thesisHeadOfInstitute{Prof. Dr.-Ing. Peter Rost\\Prof. Dr.-Ing.
% Laurent Schmalen}
\thesisSupervisor{M.Sc. Jonathan Mandelbaum}
\thesisStartDate{01.11.2025}
\thesisEndDate{04.05.2026}
\thesisSignatureDate{04.05.2026}
\thesisSupervisor{Dr.-Ing. Hedongliang Liu\\M.Sc. Jonathan Mandelbaum}
\thesisStartDate{Nov. 1st, 2025}
\thesisEndDate{May 4th, 2026}
\thesisSignatureDate{May 4th, 2026}
\thesisSignature{res/Unterschrift_AT_blue.png}
\thesisSignatureHeight{2.4cm}
\thesisLanguage{english}
@@ -109,9 +109,11 @@
\cleardoublepage
\pagenumbering{arabic}
\newgeometry{a4paper,left=3cm,right=3cm,top=2cm,bottom=2.5cm}
\addtocontents{toc}{\protect\vspace*{-9mm}}
\tableofcontents
\tableofcontents
\cleardoublepage
\restoregeometry
\input{chapters/1_introduction.tex}
\input{chapters/2_fundamentals.tex}
@@ -124,10 +126,10 @@
% \listoftables
% \include{abbreviations}
% \cleardoublepage
% \phantomsection
% \addcontentsline{toc}{chapter}{List of Abbreviations}
% \printacronyms
\cleardoublepage
\phantomsection
\addcontentsline{toc}{chapter}{List of Abbreviations}
\printacronyms
\bibliography{lib/cel-thesis/IEEEabrv,src/thesis/bibliography}