diff --git a/src/thesis/chapters/2_fundamentals.tex b/src/thesis/chapters/2_fundamentals.tex
index 9269d5c..660a101 100644
--- a/src/thesis/chapters/2_fundamentals.tex
+++ b/src/thesis/chapters/2_fundamentals.tex
@@ -106,7 +106,7 @@ exponentially with $n$, in contrast to keeping track of all codewords directly.
 % The decoding problem
 %
 
-Figure \ref{fig:Diagram of a transmission system} visualizes the
+\Cref{fig:Diagram of a transmission system} visualizes the
 communication process \cite[Sec.~1.1]{ryan_channel_2009}.
 An input message $\bm{u}\in \mathbb{F}_2^k$ is mapped onto a codeword $\bm{x}
 \in \mathbb{F}_2^n$. This is passed on to a modulator, which
@@ -197,7 +197,7 @@ bits, and \acp{cn}, corresponding to individual parity checks.
 We then construct the Tanner graph by connecting each \ac{cn} to
 the \acp{vn} that make up the corresponding parity check
 \cite[Sec.~5.1.2]{ryan_channel_2009}.
-Figure \ref{PCM and Tanner graph of the Hamming code} shows this
+\Cref{PCM and Tanner graph of the Hamming code} shows this
 construction for the [7,4,3]-Hamming code.
 %
 \begin{figure}[t]
@@ -286,7 +286,7 @@ $\mathcal{N}_\text{C} (j) = \left\{ i \in \mathcal{I} : \bm{H}_{j,i}
 We typically evaluate the performance of LDPC codes using the
-\ac{ber} or the \ac{fer} (a \textit{frame} referes to one whole
+\ac{ber} or the \ac{fer} (a \textit{frame} refers to one whole
 transmitted block in this context).
-Considering an \ac{awgn} channel, \autoref{fig:ldpc-perf} shows a
+Considering an \ac{awgn} channel, \Cref{fig:ldpc-perf} shows a
 qualitative performance characteristic of an \ac{ldpc} code
 \cite[Fig.~1]{costello_spatially_2014}. We talk of the
 \textit{waterfall} and the \textit{error floor} regions.
@@ -415,7 +415,7 @@ This is achieved by connecting some \acp{vn} of one spatial position to
 where $K \in \mathbb{N}$ is the \textit{coupling width} and $L \in
 \mathbb{N}$ is the number of spatial positions.
 This construction results in a Tanner graph as depicted in
-\autoref{fig:sc-ldpc-tanner}.
+\Cref{fig:sc-ldpc-tanner}.
 
 \begin{figure}[t]
     \centering
@@ -701,14 +701,14 @@ formula simplifies to the direct calculation of the expected value.
 
 Let us now examine how the observable operator $\hat{Q}$ relates to
 the determinate states of the observable quantity.
-We begin by translating \autoref{eq:gen_expr_Q_exp} into linear algebra as
+We begin by translating \Cref{eq:gen_expr_Q_exp} into linear algebra as
 \cite[Eq.~3.114]{griffiths_introduction_1995}
 \begin{align}
     \label{eq:gen_expr_Q_exp_lin}
     \braket{Q} = \braket{\psi \vert \hat{Q}\psi}
     .%
 \end{align}
-\autoref{eq:gen_expr_Q_exp_lin} expresses an inherently probabilistic
+\Cref{eq:gen_expr_Q_exp_lin} expresses an inherently probabilistic
 relationship.
 The determinate states are inherently deterministic.
 To relate the two, we note that since determinate states should
@@ -757,8 +757,8 @@ We can use the determinate states for this purpose, expressing the state as%
 Because of the normalization of the wave function such that
 $\int_{-\infty}^{\infty} \lvert \psi(x,t) \rvert^2 dx = 1$, we have
 $\sum_{n=1}^{\infty} \lvert c_n \rvert ^2 = 1$.
-Inserting \autoref{eq:determinate_basis} into
-\autoref{eq:gen_expr_Q_exp_lin} we obtain
+Inserting \Cref{eq:determinate_basis} into
+\Cref{eq:gen_expr_Q_exp_lin} we obtain
 % tex-fmt: off
 \cite[Prob.~3.35c)]{griffiths_introduction_1995}
 % tex-fmt: on
@@ -795,7 +795,7 @@ referring to the operator $\hat{Q}$.
 % Projective measurements
 
 The measurements we considered in the previous section, for which
-\autoref{eq:gen_expr_Q_exp_lin} holds, belong to the category of
+\Cref{eq:gen_expr_Q_exp_lin} holds, belong to the category of
 \emph{projective measurements}.
 For these, certain restrictions such as repeatability apply: the act
 of measuring a quantum state should \emph{collapse} it onto one of
@@ -809,8 +809,8 @@ they are not relevant to this work.
 
 We can model the collapse of the original state onto one of the
 superimposed basis states as a \emph{projection}.
-To see this, we use Equations \ref{eq:determinate_basis} and
-\ref{eq:observable_eigenrelation} to compute
+To see this, we use
+\Cref{eq:determinate_basis,eq:observable_eigenrelation} to compute
 \begin{align*}
     \hat{Q}\ket{\psi} = \sum_{n=1}^{\infty} c_n \hat{Q} \ket{e_n}
     = \sum_{n=1}^{\infty} \lambda_n c_n \ket{e_n}
@@ -881,7 +881,8 @@ We fix an orthonormal basis of $\mathbb{C}^2$ to be
     .%
 \end{align*}
 A qubit is defined to be a system with quantum state
-\begin{align*}
+\begin{align}
+    \label{eq:gen_qubit_state}
     \ket{\psi} =
     \begin{pmatrix}
         \alpha \\
@@ -889,7 +890,7 @@ A qubit is defined to be a system with quantum state
     \end{pmatrix}
     = \alpha \ket{0} + \beta \ket{1}
     .%
-\end{align*}
+\end{align}
 The overall state of a composite quantum system is described using
 the \emph{tensor product}, denoted as $\otimes$
 \cite[Sec.~2.2.8]{nielsen_quantum_2010}.
@@ -950,7 +951,7 @@ information is stored in the correlations between the qubits
 
 % The size of the vector space
 
-As we can see in \autoref{eq:product_state}, the number of
+As we can see in \Cref{eq:product_state}, the number of
 computational basis states needed to express the full composite state
 is $2^n$.
 This is in contrast to classical systems, where the dimensionality of
@@ -968,7 +969,7 @@ we now shift our focus to describing the evolution of their states.
 We model state changes as operators.
 Unlike classical systems, where there are only two possible states and
 thus the only possible state change is a bit-flip, a general qubit
-state as shown in \autoref{eq:gen_qubit_state} lives on a continuum of values.
+state as shown in \Cref{eq:gen_qubit_state} lives on a continuum of values.
 We thus technically also have an infinite number of possible state changes.
 Fortunately, we can express any operator as a linear combination of the
 \emph{Pauli operators} \cite[Sec.~2.2]{gottesman_stabilizer_1997}
@@ -1083,8 +1084,8 @@ the gate to the corresponding qubit, where a filled dot is placed.
 A controlled gate applies the respective operation only if the
 control qubit is in state $\ket{1}$.
 An example of this is the CNOT gate introduced in
-\autoref{subsec:Qubits and Multi-Qubit States}, which is depicted in
-\autoref{fig:cnot_circuit}.
+\Cref{subsec:Qubits and Multi-Qubit States}, which is depicted in
+\Cref{fig:cnot_circuit}.
 
 \begin{figure}[t]
     \centering
@@ -1127,7 +1128,7 @@ Three main restrictions apply \cite[Sec.~2.4]{roffe_quantum_2019}:
         impossible to exactly copy the state of one qubit into another.
     \item Qubits are susceptible to more types of errors than
         just bit-flips, as we saw in
-        \autoref{subsec:Qubits and Multi-Qubit States}.
+        \Cref{subsec:Qubits and Multi-Qubit States}.
     \item Directly measuring the state of a qubit collapses it onto
         one of the determinate states, thereby potentially destroying
         information.
@@ -1198,7 +1199,7 @@ whether a state belongs
 %     $\mathcal{C}$ or $\mathcal{F}$ with a certain probability.
 % }
 to $\mathcal{C}$ or $\mathcal{F}$.
-As explained in \autoref{subsec:Observables}, physical measurements
+As explained in \Cref{subsec:Observables}, physical measurements
 can be mathematically described using operators whose eigenvalues
 are the possible measurement results.
 Here, we need an operator with two eigenvalues and the corresponding
@@ -1225,7 +1226,7 @@ ancilla qubit with state $\ket{0}_\text{A}$ and entangle it with
 $\ket{\psi}_\text{L}$ in such a way that the eigenvalue is indicated
 by measuring the ancilla qubit instead.
 More specifically, using a stabilizer measurement circuit as shown in
-\autoref{fig:stabilizer_measurement}, we transform the state of the
+\Cref{fig:stabilizer_measurement}, we transform the state of the
 three-qubit system as
 \begin{align}
     \label{eq:error_projection}
@@ -1270,7 +1271,7 @@ lies either in one or the other.
 This is because the act of measuring the error partly collapses the
 state, eliminating the uncertainty about the type of the error
 \cite[Sec.~10.2]{nielsen_quantum_2010}.
-This can be seen in \autoref{eq:error_projection}, as the expressions
+This can be seen in \Cref{eq:error_projection}, as the expressions
 $P_\mathcal{C}$ and $P_\mathcal{F}$ constitute projection operators onto
 $\mathcal{C}$ and $\mathcal{F}$.
 E.g., $P_\mathcal{C}$ will eliminate all components of $E
@@ -1348,7 +1349,7 @@ Similar to the classical case, we can use a syndrome vector to
 describe which local codes are violated.
 To obtain the syndrome, we simply measure the corresponding
 operators $P_i$, each using a circuit as explained in
-\autoref{subsec:Stabilizer Measurements}.
+\Cref{subsec:Stabilizer Measurements}.
 Note that this is an abstract representation of the syndrome extraction.
 For the actual implementation in hardware, we can transform this into
 a circuit that requires only CNOT and H-gates
@@ -1444,7 +1445,7 @@ vice versa, this property translates into being able to split the
 stabilizers into a subset being made up of only $X$
 operators and the rest only of $Z$ operators.
 We call such codes \ac{css} codes.
-We can see this property in \autoref{eq:steane} in the check matrix
+We can see this property in \Cref{eq:steane} in the check matrix
 of the Steane code.
 
 % Construction
@@ -1514,7 +1515,7 @@ $\bm{H}_Z$ are constructed from two matrices $\bm{A}$ and $\bm{B}$ as
     .%
 \end{align*}
 This way, we can guarantee the satisfaction of the commutativity
-condition (\autoref{eq:css_condition}).
+condition~\labelcref{eq:css_condition}.
 To define $\bm{A}$ and $\bm{B}$ we first introduce some additional notation.
 We denote the identity matrix as $\bm{I_l} \in \mathbb{F}^{l\times l}$ and
 the \emph{cyclic shift matrix} as $\bm{S_l} \in \mathbb{F}^{l\times
@@ -1543,11 +1544,11 @@ and thus lower error rates \cite[Sec.~1]{bravyi_high-threshold_2024}.
 
 % Syndrome-based BP
 
-As we saw in \autoref{subsec:Stabilizer Measurements}, we work only
+As we saw in \Cref{subsec:Stabilizer Measurements}, we work only
 with the parity information contained in the syndrome, to avoid
 disturbing the quantum states of individual qubits.
 This necessitates a modification of the standard \ac{bp} algorithm
-introduced in \autoref{subsec:Iterative Decoding}
+introduced in \Cref{subsec:Iterative Decoding}
 \cite[Sec.~3.1]{yao_belief_2024}.
 Instead of attempting to find the most likely codeword directly, the
 algorithm will now try to find an error pattern $\hat{\bm{e}} \in
@@ -1571,7 +1572,7 @@ indicated by the syndrome, calculating
     .
 \end{align*}
 The resulting syndrome-based \ac{bp} algorithm is shown in
-algorithm \ref{alg:syndome_bp}.
+\Cref{alg:syndome_bp}.
 
 % tex-fmt: off
 \tikzexternaldisable
@@ -1639,7 +1640,7 @@ direction to proceed in \cite[Sec.~5]{yao_belief_2024}.
 Another problem is that due to the commutativity property of the stabilizers,
 quantum codes inherently contain short cycles
 \cite[Sec.~IV.C]{babar_fifteen_2015}.
-As discussed in \autoref{subsec:Iterative Decoding}, these lead to
+As discussed in \Cref{subsec:Iterative Decoding}, these lead to
 the violation of the independence assumption of the messages passed
 during decoding, impeding performance.
 
@@ -1656,7 +1657,7 @@ a hard decision and excluding it from further decoding.
 This constrains the solution space more and more as the decoding
 progresses, encouraging the algorithm to converge to one of the
 solutions \cite[Sec.~5]{yao_belief_2024}.
-Algorithm \ref{alg:bpgd} shows this process.
+\Cref{alg:bpgd} shows this process.
 Note that as the Tanner graph only has $n$ \acp{vn}, this is a
 natural constraint on the maximum number of outer iterations of the algorithm.
 
diff --git a/src/thesis/chapters/3_fault_tolerant_qec.tex b/src/thesis/chapters/3_fault_tolerant_qec.tex
index 88c4c36..95de0f2 100644
--- a/src/thesis/chapters/3_fault_tolerant_qec.tex
+++ b/src/thesis/chapters/3_fault_tolerant_qec.tex
@@ -53,7 +53,7 @@ indicating which errors occurred, with
     \end{cases}
     .%
 \end{align*}
-\autoref{fig:fault_tolerance_overview} illustrates the flow of errors.
+\Cref{fig:fault_tolerance_overview} illustrates the flow of errors.
 Specifically for \ac{css} codes, a \ac{qec} procedure is deemed
 fault-tolerant, if \cite[Def.~4.2]{derks_designing_2025}
 \begin{gather*}
@@ -170,15 +170,15 @@ This is a code with check matrix
     .
 \end{gather}
 We can see that it has stabilizers $Z_1Z_2$ and $Z_2Z_3$.
-\autoref{fig:pure_syndrome_extraction} shows the corresponding
+\Cref{fig:pure_syndrome_extraction} shows the corresponding
 syndrome extraction circuit.
 We refer to the qubits carrying the logical state
 $\ket{\psi}_\text{L}$ as \emph{data qubits}.
 Note that this is a concrete implementation using CNOT gates, as
 opposed to the system-level view introduced in
-\autoref{subsec:Stabilizer Codes}.
+\Cref{subsec:Stabilizer Codes}.
 We visualize the different types of noise models in
-\autoref{fig:noise_model_types}.
+\Cref{fig:noise_model_types}.
 
 %%%%%%%%%%%%%%%%
 \subsection{Bit-Flip Noise}
@@ -187,7 +187,7 @@ We visualize the different types of noise models in
 The simplest type of noise model is \emph{bit-flip} noise.
 This corresponds to the classical \ac{bsc}, i.e., only $X$ errors on the
 data qubits are possible \cite[Appendix~A]{gidney_new_2023}.
-This type of noise model is shown in \autoref{subfig:bit_flip}.
+This type of noise model is shown in \Cref{subfig:bit_flip}.
 
 Note that we cannot use bit-flip noise to develop fault-tolerant
-systems, as it doesnt't account for errors during the syndrome extraction.
+systems, as it does not account for errors during the syndrome extraction.
@@ -199,7 +199,7 @@ systems, as it doesnt't account for errors during the syndrome extraction.
 Extending bit-flip noise to consider $X,Z$ or $Y$ instead of just $X$
 errors, we obtain the \emph{depolarizing channel}
 \cite[Sec.~7.6]{gottesman_stabilizer_1997}, depicted in
-\autoref{subfig:depolarizing}.
+\Cref{subfig:depolarizing}.
 It is well-suited for modeling memory experiments, where data qubits
 are stored idly for some period of time and errors accumulate due to
 decoherence.
@@ -223,7 +223,7 @@ locations right before each measurement \cite[Appendix~A]{gidney_new_2023}.
 Note that it is enough to only consider $X$ errors at these points,
 since that is the only type of error directly affecting the
 measurement outcomes.
-This model is depicted in \autoref{subfig:phenomenological}.
+This model is depicted in \Cref{subfig:phenomenological}.
 
 While not fully capturing all possible error mechanisms,
 phenomenological noise is already a significant step beyond the code
@@ -244,7 +244,7 @@ Specifically, we allow arbitrary $n$-qubit Pauli errors after each
 $n$-qubit gate \cite[Def.~2.5]{derks_designing_2025}.
 An $n$-qubit Pauli error is simply a series of correlated Pauli
 errors on each related individual qubit.
-This type of noise model is shown in \autoref{subfig:circuit_level}.
+This type of noise model is shown in \Cref{subfig:circuit_level}.
 
 While phenomenological noise is useful for some design aspects of
-fault tolerant circuitry, for simulations, circuit-level noise should
+fault-tolerant circuitry, for simulations, circuit-level noise should
@@ -457,7 +457,7 @@ circuit, tracking which measurements they affect
 
 We turn to our example of the three-qubit repetition code to
 illustrate the construction of the syndrome measurement matrix.
-We begin by extending our check matrix in \autoref{eq:rep_code_H}
+We begin by extending our check matrix in \Cref{eq:rep_code_H}
 to represent three rounds of syndrome extraction.
 Each round yields an additional set of syndrome bits,
 and we combine them by stacking them in a new vector
@@ -476,7 +476,7 @@ additional syndrome measurement, to obtain
     \end{pmatrix}
     .%
 \end{align*}
-\autoref{fig:rep_code_multiple_rounds_bit_flip}
+\Cref{fig:rep_code_multiple_rounds_bit_flip}
 depicts the corresponding circuit.
 Note that we have not yet introduced error locations in the syndrome
 extraction circuitry, so we still consider only bit flip noise at this stage.
@@ -499,7 +499,7 @@ We now wish to expand the error model to phenomenological noise, though
 only considering $X$ errors in this case.
 We introduce new error locations at the appropriate positions,
 arriving at the circuit depicted in
-\autoref{fig:rep_code_multiple_rounds_phenomenological}.
+\Cref{fig:rep_code_multiple_rounds_phenomenological}.
 For each additional error location, we extend $\bm{\Omega}$ by
 appending the corresponding syndrome vector as a column.
 \begin{gather}
@@ -823,7 +823,7 @@ For two detector matrices $\bm{D}_1$ and $\bm{D}_2$, as long as
 \end{gather}
 they describe the same set of possible measurement outcomes (under
 the absence of noise) and thus the same circuit.
-In fact, as long as \autoref{eq:kern_condition} holds, the detector
+In fact, as long as \Cref{eq:kern_condition} holds, the detector
 error matrices we construct from them can distinguish between the
 same pairs of error sets \cite[Lemma~6]{derks_designing_2025}.
 To see this, we note that we can distinguish between two circuit
@@ -856,7 +856,7 @@ There is, however, one way of defining the detectors that will prove useful
 at a later stage.
 To the measurement results from each syndrome extraction round we
 can add the results from the previous round, as illustrated in
-\autoref{fig:detectors_from_measurements_general}.
+\Cref{fig:detectors_from_measurements_general}.
 We thus have $D=n-k$.
 Concretely, we denote the outcome of
 measurement $\ell \in \{1,\ldots,n-k\}$ in round $r \in \{1,\ldots,R\}$ by
@@ -912,15 +912,15 @@ with $\bm{m}^{(0)} = \bm{0}$.
 \end{figure}
 
 We again turn our attention to the three-qubit repetition code.
-In \autoref{fig:rep_code_multiple_rounds_phenomenological} we can see
+In \Cref{fig:rep_code_multiple_rounds_phenomenological} we can see
 that $E_6$ has occurred and has subsequently tripped the last four measurements.
 We now take those measurements and combine them according to
-\autoref{eq:measurement_combination}.
+\Cref{eq:measurement_combination}.
 We can see this process graphically in
-\autoref{fig:detectors_from_measurements_rep_code}.
+\Cref{fig:detectors_from_measurements_rep_code}.
 To understand why this way of defining the detectors is useful, we
 note that the error $E_6$ in
-\autoref{fig:rep_code_multiple_rounds_phenomenological} has not only
+\Cref{fig:rep_code_multiple_rounds_phenomenological} has not only
 tripped the measurements in the syndrome extraction round immediately
 afterwards, but all subsequent ones as well.
 To only see errors in the rounds immediately following them, we
@@ -929,9 +929,9 @@ that effectively compute the difference between the measurements.
 
 Each error can only trip syndrome bits that follow it.
 This is reflected in the triangular structure of $\bm{\Omega}$ in
-\autoref{eq:syndrome_matrix_ex}.
+\Cref{eq:syndrome_matrix_ex}.
 Combining the measurements into detectors according to
-\autoref{eq:measurement_combination}, we are effectively performing
+\Cref{eq:measurement_combination}, we are effectively performing
 row additions in such a way as to clear the bottom left of the matrix.
 The detector error matrix
 \begin{align*}
@@ -1062,7 +1062,7 @@ The overall probability of error is then
     \hspace{12mm}
 \end{align}
 We approximate $p_\text{e,total}$ using a Monte Carlo simulation and
-compute the per-round-\ac{ler} using \autoref{eq:per_round_ler}.
+compute the per-round-\ac{ler} using \Cref{eq:per_round_ler}.
 This is a common approach taken in the literature
-\cite{gong_toward_2024}\cite{wang_fully_2025}.
+\cite{gong_toward_2024, wang_fully_2025}.
 
@@ -1086,7 +1086,7 @@ As it is related to the error rate through $F = 1 - 2p$, we obtain
 \end{align}
 
 We have chosen to use the first approach, i.e.,
-\autoref{eq:per_round_ler}, as the related literature is closer in
+\Cref{eq:per_round_ler}, as the related literature is closer in
 topic to our own work.
 
 %%%%%%%%%%%%%%%%
@@ -1096,7 +1096,7 @@ topic to our own work.
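-It is not immediately apparent how the \ac{dem} will look from looking
-at a code's \ac{pcm}, because it heavily depends on the exact circuit
+It is not immediately apparent what the \ac{dem} will look like from
+a code's \ac{pcm} alone, because it heavily depends on the exact circuit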
 construction and choice of noise model.
-As we noted in \autoref{subsec:Measurement Syndrome Matrix}, we can
+As we noted in \Cref{subsec:Measurement Syndrome Matrix}, we can
 obtain a measurement syndrome matrix by propagating Pauli frames
 through the circuit.
 The standard choice of simulation tool used for this purpose is
diff --git a/src/thesis/main.tex b/src/thesis/main.tex
index 3530a69..d767888 100644
--- a/src/thesis/main.tex
+++ b/src/thesis/main.tex
@@ -27,6 +27,7 @@
 \usepackage[noEnd=false]{algpseudocodex}
 \usepackage{nicematrix}
 \usepackage{colortbl}
+\usepackage{cleveref}
 
 \usetikzlibrary{calc, positioning, arrows, fit}
 \usetikzlibrary{external}
@@ -38,6 +39,11 @@
 
 \setcounter{MaxMatrixCols}{20}
 
+\Crefname{equation}{}{}
+\Crefname{section}{Section}{Sections}
+\Crefname{subsection}{Subsection}{Subsections}
+\Crefname{figure}{Figure}{Figures}
+
 %
 %
 % Custom commands