diff --git a/latex/thesis/chapters/proximal_decoding.tex b/latex/thesis/chapters/proximal_decoding.tex
index 879a285..edc79a9 100644
--- a/latex/thesis/chapters/proximal_decoding.tex
+++ b/latex/thesis/chapters/proximal_decoding.tex
@@ -17,7 +17,8 @@ Proximal decoding was proposed by Wadayama et al.
 as a novel formulation of optimization-based decoding \cite{proximal_paper}.
 With this algorithm, minimization is performed using the proximal gradient
 method.
-In contrast to \ac{LP} decoding, the objective function is based on a
+In contrast to \ac{LP} decoding, which will be covered in chapter
+\ref{chapter:lp_dec_using_admm}, the objective function is based on a
 non-convex optimization formulation of the \ac{MAP} decoding problem.
 
 In order to derive the objective function, the authors begin with the
@@ -121,8 +122,9 @@ and the decoding problem is reformulated to%
 .\end{align*}
 %
-For the solution of the approximate \ac{MAP} decoding problem, the two parts
-of equation (\ref{eq:prox:objective_function}) are considered separately:
+For the solution of the approximate \ac{MAP} decoding problem, using the
+proximal gradient method, the two parts of equation
+(\ref{eq:prox:objective_function}) are considered separately:
 the minimization of the objective function occurs in an alternating fashion,
 switching between the negative log-likelihood
 $L\left( \boldsymbol{y} \mid \boldsymbol{x} \right) $ and the scaled
@@ -140,10 +142,8 @@ descent:%
 .\end{align}%
 %
 For the second step, minimizing the scaled code-constraint polynomial, the
-proximal gradient method is used \todo{The proximal gradient method is not
-just used for the second step. It is the name for the alternating iterative process}
-and the \textit{proximal operator} of
-$\gamma h\left( \tilde{\boldsymbol{x}} \right) $ has to be computed.
+\textit{proximal operator} of $\gamma h\left( \tilde{\boldsymbol{x}} \right) $
+has to be computed.
 It is then immediately approximated with gradient-descent:%
 %
 \begin{align*}
@@ -258,7 +258,7 @@ It was subsequently reimplemented in C++ using the Eigen%
 linear algebra library to achieve higher performance.
 The focus has been set on a fast implementation, sometimes at the expense of
 memory usage, somewhat limiting the size of the codes the implemenation can be
-used with \todo{Is this sentence appropriate for a bachelor's thesis?}.
+used with.
 The evaluation of the simulation results has been wholly realized in Python.
 
 The gradient of the code-constraint polynomial \cite[Sec. 2.3]{proximal_paper}
@@ -309,8 +309,6 @@ matrix-vector multiplication.
 This is beneficial, as the libraries employed for the implementation are
 heavily optimized for such calculations (e.g., through vectorization of the
 operations).
-\todo{Note about how the equation with which the gradient is calculated is
-itself similar to a message-passing rule}
 
 The projection $\prod_{\eta}\left( . \right)$ also proves straightforward to
 compute, as it amounts to simply clipping each component of the vector onto
@@ -332,15 +330,13 @@ The convergence properties are reviewed and related to the decoding
 performance.
 Finally, the computational performance is examined on a theoretical basis as
 well as on the basis of the implementation completed in the context of this
-work.
+thesis.
 
 All simulation results presented hereafter are based on Monte Carlo
 simulations.
 The \ac{BER} and \ac{FER} curves in particular have been generated by
 producing at least 100 frame-errors for each data point, unless otherwise
 stated.
-\todo{Mention number of datapoints from which each graph was created for
-non ber and fer curves}
 
 \subsection{Choice of Parameters}
 
@@ -418,9 +414,9 @@ while the newly generated ones are shown with dashed lines.
 \noindent It is noticeable that for a moderately chosen value of $\gamma$
 ($\gamma = 0.05$) the decoding performance is better than for low
 ($\gamma = 0.01$) or high ($\gamma = 0.15$) values.
-The question arises if there is some optimal value maximazing the decoding
+The question arises whether there is some optimal value maximizing the decoding
 performance, especially since it seems to dramatically depend on $\gamma$.
-To better understand how $\gamma$ and the decoding performance are
+To better understand how they are
 related, figure \ref{fig:prox:results} was recreated, but with a considerably
 larger selection of values for $\gamma$.
 In this new graph, shown in figure \ref{fig:prox:results_3d}, instead of
@@ -431,11 +427,7 @@ The previously shown results are highlighted.
 Evidently, while the decoding performance does depend on the value of
 $\gamma$, there is no single optimal value offering optimal performance, but
 rather a certain interval in which it stays largely unchanged.
-When examining a number of different codes (figure
-\ref{fig:prox:results_3d_multiple}), it is apparent that while the exact
-landscape of the graph depends on the code, the general behaviour is the same
-in each case.
-
+%
 
 \begin{figure}[h]
 \centering
@@ -485,11 +477,15 @@ in each case.
 \cite[\text{204.33.484}]{mackay_enc}; $\omega = 0.05, K=200, \eta=1.5$
 }%
 %
-\noindent This indicates that while the choice of the parameter $\gamma$
+This indicates that while the choice of the parameter $\gamma$
 significantly affects the decoding performance, there is not much benefit
 attainable in undertaking an extensive search for an exact optimum.
 Rather, a preliminary examination providing a rough window for $\gamma$ may be
 sufficient.
+When examining a number of different codes (figure
+\ref{fig:prox:results_3d_multiple}), it is apparent that while the exact
+landscape of the graph depends on the code, the general behaviour is the same
+in each case.
 
 The parameter $\gamma$ describes the step-size for the optimization step
 dealing with the code-constraint polynomial;
@@ -497,10 +493,12 @@ the parameter $\omega$ describes the step-size for the step dealing with
 the negative-log likelihood.
 The relationship between $\omega$ and $\gamma$ is portrayed in figure
 \ref{fig:prox:gamma_omega}.
+The color of each cell indicates the \ac{BER} when the corresponding values
+are chosen for the parameters.
 The \ac{SNR} is kept constant at $\SI{4}{dB}$.
-Similar behaviour to $\gamma$ is exhibited: the \ac{BER} is minimized when
-keeping the value within certain bounds, without displaying a clear
-optimum.
+The \ac{BER} exhibits similar behaviour in its dependency on $\omega$ and
+on $\gamma$: it is minimized when keeping the values within certain
+bounds, without displaying a single clear optimum.
 It is noteworthy that the decoder seems to achieve the best performance for
 similar values of the two step sizes.
 Again, this consideration applies to a multitude of different codes, as
@@ -552,19 +550,21 @@ depicted in figure \ref{fig:prox:gamma_omega_multiple}.
 
 To better understand how to determine the optimal value for the parameter
 $K$, the average error is inspected.
-This time $\gamma$ and $\omega$ are held constant and the average error is
-observed during each iteration of the decoding process for a number of
-different \acp{SNR}.
-The plots have been generated by averaging the error over $\SI{500000}{}$ decodings.
+This time $\gamma$ and $\omega$ are held constant at $0.05$ and the average
+error is observed during each iteration of the decoding process, for a number
+of different \acp{SNR}.
+The plots have been generated by averaging the error over $\SI{500000}{}$
+decodings.
 As some decodings go one for more iterations than others, the number of
 values which are averaged for each datapoints vary.
 This explains the dip visible in all curves around $k=20$, since after
-this point more and more correct decodings stop iterating,
+this point more and more correct decodings are completed,
 leaving more and more faulty ones to be averaged.
-A this point the decline in the average error stagnates, rendering an
-increase in $K$ counterproductive as it only raises the average timing
-requirements of the decoding process.
-The higher the \ac{SNR}, the fewer decodings are present at each iteration
+Additionally, at this point the decline in the average error stagnates,
+rendering an increase in $K$ counterproductive as it only raises the average
+timing requirements of the decoding process.
+Another aspect to consider is that the higher the \ac{SNR}, the fewer
+decodings are present at each iteration
 to average, since a solution is found earlier.
 This explains the decreasing smootheness of the lines as the \ac{SNR} rises.
 Remarkably, the \ac{SNR} seems to not have any impact on the number of
@@ -740,9 +740,6 @@ performance.
 The decoding failure rate closely resembles the \ac{FER}, suggesting that the
 frame errors may largely be attributed to decoding failures.
 
-\todo{Maybe reference to the structure of the algorithm (1 part likelihood
-1 part constraints)}
-
 \subsection{Convergence Properties}
 \label{subsec:prox:conv_properties}
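
Editor's note (illustration, not part of the patch): the Python sketch below mirrors the decoding loop the chapter describes, alternating a gradient step on the negative log-likelihood (step size omega) with a gradient step approximating the proximal operator of the scaled code-constraint polynomial (step size gamma), then clipping onto [-eta, eta] and stopping early once a codeword is found. The function names, the explicit form of h(x), the AWGN gradient, and the loop-based evaluation of its gradient are illustrative assumptions, not the thesis' C++/Eigen implementation (which uses matrix-vector products); the default parameters echo the values quoted in the patch (omega = 0.05, gamma = 0.05, eta = 1.5, K = 200).

import numpy as np


def grad_h(x, H, alpha=1.0, beta=1.0):
    # Gradient of an ASSUMED code-constraint polynomial
    #   h(x) = alpha * sum_i (x_i^2 - 1)^2
    #        + beta  * sum_j (prod_{i in N(j)} x_i - 1)^2,
    # where N(j) are the bit positions checked by row j of H.
    # The thesis expresses this gradient via matrix-vector products; the
    # explicit loops here are only meant to be easy to read.
    g = 4.0 * alpha * (x ** 2 - 1.0) * x
    for row in H:
        idx = np.flatnonzero(row)
        p = np.prod(x[idx])
        for i in idx:
            # d/dx_i (p - 1)^2 = 2 (p - 1) * prod_{l != i} x_l
            g[i] += 2.0 * beta * (p - 1.0) * np.prod(x[idx[idx != i]])
    return g


def proximal_decode(y, H, omega=0.05, gamma=0.05, eta=1.5, K=200):
    # y: received vector (AWGN channel, bipolar mapping 0 -> +1, 1 -> -1)
    # H: binary parity-check matrix (m x n); omega, gamma: step sizes;
    # eta: clipping bound of the projection; K: maximum number of iterations.
    x = np.zeros(len(y))
    for _ in range(K):
        # Step 1: gradient step on the negative log-likelihood; for an AWGN
        # channel this gradient is proportional to (x - y), with the noise
        # variance absorbed into omega (assumption).
        x = x - omega * (x - y)
        # Step 2: gradient step approximating the proximal operator of
        # gamma * h(x).
        x = x - gamma * grad_h(x, H)
        # Projection: clip every component onto [-eta, eta].
        x = np.clip(x, -eta, eta)
        # Hard decision; stop early once all parity checks are satisfied.
        c = (x < 0).astype(int)
        if not np.any(H @ c % 2):
            break
    return c

The early stop is what makes higher-SNR decodings finish sooner, which is why fewer decodings remain to be averaged at late iterations in the average-error plots discussed above.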