diff --git a/latex/thesis/chapters/theoretical_background.tex b/latex/thesis/chapters/theoretical_background.tex
index 95c71d5..b8d10e3 100644
--- a/latex/thesis/chapters/theoretical_background.tex
+++ b/latex/thesis/chapters/theoretical_background.tex
@@ -28,7 +28,8 @@ Additionally, a shorthand notation will be used to denote series of indices
 and of indexed variables:%
 %
 \begin{align*}
-    \left[ m:n \right] &:= \left\{ m, m+1, \ldots, n-1, n \right\} \\
+    \left[ m:n \right] &:= \left\{ m, m+1, \ldots, n-1, n \right\},
+    \hspace{5mm} m,n\in\mathbb{Z}\\
     x_{\left[ m:n \right] } &:= \left\{ x_m, x_{m+1}, \ldots, x_{n-1}, x_n \right\}
 .\end{align*}
 %
@@ -40,7 +41,7 @@ and the \textit{Hadamard power}, the operator $\circ$ will be used:%
     &:= \begin{bmatrix} a_1 b_1 & \ldots & a_n b_n \end{bmatrix} ^\text{T},
     \hspace{5mm} &&\boldsymbol{a}, \boldsymbol{b} \in \mathbb{R}^n, \hspace{2mm} n\in \mathbb{N} \\
     \boldsymbol{a}^{\circ k} &:= \begin{bmatrix} a_1^k \ldots a_n^k \end{bmatrix}^\text{T},
-    \hspace{5mm} &&\boldsymbol{a} \in \mathbb{R}^n, \hspace{2mm}k\in \mathbb{Z}
+    \hspace{5mm} &&\boldsymbol{a} \in \mathbb{R}^n, \hspace{2mm}n\in \mathbb{N}, k\in \mathbb{Z}
 .\end{alignat*}
 %
@@ -59,11 +60,12 @@ This is known as modulation. The modulation scheme chosen here is \ac{BPSK}:%
 .\end{align*}
 %
 The symbol that reaches the receiver, $\boldsymbol{y}$, is distorted by the channel.
-This distortion is described by the channel model, which here is chosen to be \ac{AWGN}:%
+This distortion is described by the channel model, which in the context of
+this thesis is chosen to be \ac{AWGN}:%
 %
 \begin{align*}
-    \boldsymbol{y} = \boldsymbol{x} + \boldsymbol{z},
-    \hspace{5mm} z_i \in \mathcal{N}\left( 0, \frac{\sigma^2}{2} \right),
+    \boldsymbol{y} = \boldsymbol{x} + \boldsymbol{n},
+    \hspace{5mm} n_i \sim \mathcal{N}\left( 0, \frac{\sigma^2}{2} \right),
     \hspace{2mm} i \in \left[ 1:n \right]
 .\end{align*}
 %
@@ -81,11 +83,11 @@ conducting this process, whereby \textit{data words} are mapped onto longer
 \textit{codewords}, which carry redundant information.
 \Ac{LDPC} codes have become especially popular, since they are able to reach
 arbitrarily small probabilities of error at coderates up to the capacity
-of the channel \cite[Sec. II.B.]{mackay_rediscovery} and their structure allows
-for very efficient decoding.
+of the channel \cite[Sec. II.B.]{mackay_rediscovery} while having a structure
+that allows for very efficient decoding.

-The lengths of the data words and codewords are denoted by $k$ and $n$,
-respectively.
+The lengths of the data words and codewords are denoted by $k\in\mathbb{N}$
+and $n\in\mathbb{N}$, respectively.
 The set of codewords $\mathcal{C} \subset \mathbb{F}_2^n$ of a binary linear
 code can be represented using the \textit{parity-check matrix}
 $\boldsymbol{H} \in \mathbb{F}_2^{m\times n}$, where $m$ represents
@@ -101,7 +103,7 @@ $\boldsymbol{c} \in \mathbb{F}_2^n$ using the \textit{generator matrix}
 $\boldsymbol{G} \in \mathbb{F}_2^{k\times n}$:%
 %
 \begin{align*}
-    \boldsymbol{c} = \boldsymbol{u}\boldsymbol{G}
+    \boldsymbol{c} = \boldsymbol{u}^\text{T}\boldsymbol{G}
 .\end{align*}
 %

@@ -110,7 +112,7 @@ as described in section \ref{sec:theo:Preliminaries: Channel Model and Modulatio
 The received signal $\boldsymbol{y}$ is then decoded to obtain an estimate of
 the transmitted codeword, $\hat{\boldsymbol{c}}$.
 Finally, the encoding procedure is reversed and an estimate of the originally
-sent data word, $\hat{\boldsymbol{u}}$, is obtained.
+sent data word, $\hat{\boldsymbol{u}}$, is produced.
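+As a brief illustration (the small example code and the \ac{BPSK} mapping
+$x_i = 1 - 2c_i$ are chosen here only for this example and may differ from
+the conventions used in the remainder of this thesis), consider $k = 2$,
+$n = 3$ and%
+%
+\begin{align*}
+    \boldsymbol{G} = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix},
+    \hspace{5mm}
+    \boldsymbol{u} = \begin{bmatrix} 1 & 0 \end{bmatrix}^\text{T}
+    \hspace{2mm}\Rightarrow\hspace{2mm}
+    \boldsymbol{c} = \begin{bmatrix} 1 & 0 & 1 \end{bmatrix}^\text{T},
+    \hspace{5mm}
+    \boldsymbol{x} = \begin{bmatrix} -1 & 1 & -1 \end{bmatrix}^\text{T}
+.\end{align*}
+%
+From the noisy observation $\boldsymbol{y} = \boldsymbol{x} + \boldsymbol{n}$
+the decoder then has to recover $\hat{\boldsymbol{c}}$ and, from it,
+$\hat{\boldsymbol{u}}$.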
 The methods examined in this work are all based on \textit{soft-decision}
 decoding, i.e., $\boldsymbol{y}$ is considered to be in $\mathbb{R}^n$ and no
 preliminary decision is made by a demodulator.
@@ -156,9 +158,6 @@ figure \ref{fig:theo:channel_overview}.%
 \label{fig:theo:channel_overview}
 \end{figure}

-\todo{Explicitly mention $\boldsymbol{n}$}
-\todo{Mapper $\to$ Modulator?}
-
 The decoding process itself is generally based either on the \ac{MAP} or
 the \ac{ML} criterion:%
 %
@@ -183,7 +182,7 @@ This is especially true for \ac{LDPC} codes, as the established decoding
 algorithms are \textit{message passing algorithms}, which are inherently
 graph-based.

-Binary linear codes with a parity-check matrix $\boldsymbol{H}$ can be
+A binary linear code with a parity-check matrix $\boldsymbol{H}$ can be
 visualized using a \textit{Tanner} or \textit{factor graph}:
 Each row of $\boldsymbol{H}$, which represents one parity-check, is viewed as
 a \ac{CN}.
@@ -263,8 +262,9 @@ The neighbourhood of the $i$th \ac{VN} is denoted by $N_v\left( i \right)$.
 For the code depicted in figure \ref{fig:theo:tanner_graph}, for example,
 $N_c\left( 1 \right) = \left\{ 1, 3, 5, 7 \right\}$ and
 $N_v\left( 3 \right) = \left\{ 1, 2 \right\}$.
-
-\todo{Define $d_i$ and $d_j$}
+The degree $d_j$ of a \ac{CN} is defined as the number of adjacent \acp{VN}:
+$d_j := \left| N_c\left( j \right) \right| $; the degree of a \ac{VN} is
+similarly defined as $d_i := \left| N_v\left( i \right) \right|$.

 Message passing algorithms are based on the notion of passing messages
 between \acp{CN} and \acp{VN}.
@@ -273,13 +273,17 @@ It aims to compute the posterior probabilities
 $p_{C_i \mid \boldsymbol{Y}}\left(c_i = 1 | \boldsymbol{y} \right),\hspace{2mm} i\in\mathcal{I}$
 \cite[Sec. III.]{mackay_rediscovery} and use them to calculate the estimate $\hat{\boldsymbol{c}}$.
 For cycle-free graphs this goal is reached after a finite
-number of steps and \ac{BP} is thus equivalent to \ac{MAP} decoding.
+number of steps and \ac{BP} is equivalent to \ac{MAP} decoding.
 When the graph contains cycles, however, \ac{BP} only approximates the
 probabilities and is sub-optimal.
 This leads to generally worse performance than \ac{MAP} decoding for practical codes.
 Additionally, an \textit{error floor} appears for very high \acp{SNR}, making
 the use of \ac{BP} impractical for applications where a very low \ac{BER} is
 desired \cite[Sec. 15.3]{ryan_lin_2009}.
+Another popular decoding method for \ac{LDPC} codes is the
+\textit{min-sum algorithm}.
+This is a simplification of \ac{BP} using an approximation of the
+non-linear $\tanh$ function to reduce the computational complexity.
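+As an illustration, written in a generic message notation that may differ
+from the notation used elsewhere in this thesis, the \ac{BP} update at the
+$j$th \ac{CN} and its min-sum approximation can be written as \cite{ryan_lin_2009}%
+%
+\begin{align*}
+    L_{j \to i} &= 2\tanh^{-1}\left( \prod_{i' \in N_c\left( j \right) \setminus \left\{ i \right\} } \tanh\left( \frac{L_{i' \to j}}{2} \right) \right) \\
+    &\approx \left( \prod_{i' \in N_c\left( j \right) \setminus \left\{ i \right\} } \operatorname{sign}\left( L_{i' \to j} \right) \right)
+    \min_{i' \in N_c\left( j \right) \setminus \left\{ i \right\} } \left| L_{i' \to j} \right|
+,\end{align*}
+%
+where $L_{i' \to j}$ denotes the message sent from the $i'$th \ac{VN} to the
+$j$th \ac{CN} and $L_{j \to i}$ the message sent back to the $i$th \ac{VN}.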

 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

@@ -438,12 +442,12 @@ which minimizes the objective function $g$.

 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

-\section{Optimization Methods}
+\section{An Introduction to the Proximal Gradient Method and ADMM}
 \label{sec:theo:Optimization Methods}

 \textit{Proximal algorithms} are algorithms for solving convex optimization
 problems, that rely on the use of \textit{proximal operators}.
-The proximal operator $\textbf{prox}_f : \mathbb{R}^n \rightarrow \mathbb{R}^n$
+The proximal operator $\textbf{prox}_{\lambda f} : \mathbb{R}^n \rightarrow \mathbb{R}^n$
 of a function $f:\mathbb{R}^n \rightarrow \mathbb{R}$ is defined by \cite[Sec.
 1.1]{proximal_algorithms}%
 %
@@ -456,8 +460,8 @@ of a function $f:\mathbb{R}^n \rightarrow \mathbb{R}$ is defined by
 This operator computes a point that is a compromise between minimizing $f$
 and staying in the proximity of $\boldsymbol{v}$.
 The parameter $\lambda$ determines how heavily each term is weighed.
-The \textit{proximal gradient method} is an iterative optimization method used to
-solve problems of the form%
+The \textit{proximal gradient method} is an iterative optimization method
+that utilizes proximal operators to solve problems of the form%
 %
 \begin{align*}
     \text{minimize}\hspace{5mm}f\left( \boldsymbol{x} \right) + g\left( \boldsymbol{x} \right)
@@ -473,11 +477,14 @@ and minimizing $g$ using the proximal operator
 ,\end{align*}
 %
 Since $g$ is minimized with the proximal operator and is thus not required
-to be differentiable, it can be used to encode the constraints of the problem.
+to be differentiable, it can be used to encode the constraints of the problem
+(e.g., in the form of an \textit{indicator function}, whose proximal operator
+is the projection onto the constraint set \cite[Sec. 1.2]{proximal_algorithms}).

-A special case of convex optimization problems are \textit{linear programs}.
-These are problems where the objective function is linear and the constraints
-consist of linear equalities and inequalities.
+The \ac{ADMM} is another iterative optimization method.
+In this thesis it will be used to solve a \textit{linear program}, a special
+type of convex optimization problem in which the objective function is linear
+and the constraints consist of linear equalities and inequalities.
 Generally, any linear program can be expressed in \textit{standard form}%
 \footnote{The inequality $\boldsymbol{x} \ge \boldsymbol{0}$ is to be interpreted componentwise.}