diff --git a/latex/thesis/chapters/proximal_decoding.tex b/latex/thesis/chapters/proximal_decoding.tex
index 90c522e..7002f20 100644
--- a/latex/thesis/chapters/proximal_decoding.tex
+++ b/latex/thesis/chapters/proximal_decoding.tex
@@ -263,7 +263,8 @@ It was subsequently reimplemented in C++ using the Eigen%
 \footnote{\url{https://eigen.tuxfamily.org}}
 linear algebra library to achieve higher performance.
 The focus has been set on a fast implementation, sometimes at the expense of
-memory usage.
+memory usage, somewhat limiting the size of the codes the implementation can
+be used with \todo{Is this appropriate for a bachelor's thesis?}.
 The evaluation of the simulation results has been wholly realized in Python.
 
 The gradient of the code-constraint polynomial \cite[Sec. 2.3]{proximal_paper}
@@ -859,8 +860,7 @@ the frame errors may largely be attributed to decoding failures.
 The previous observation, that the \ac{FER} arises mainly due to the
 non-convergence of the algorithm instead of convergence to the wrong codeword,
 raises the question why the decoding process does not converge so often.
-In figure \ref{fig:prox:convergence}, the iterative process is visualized
-for each iteration.
+In figure \ref{fig:prox:convergence}, the iterative process is visualized.
 In order to be able to simultaneously consider all components of the vectors
 being dealt with, a BCH code with $n=7$ and $k=4$ is chosen.
 Each chart shows one component of the current estimates during a given
@@ -1076,7 +1076,8 @@ As such, the constraints are not being satisfied and the estimate is not
 converging towards a valid codeword.
 
 While figure \ref{fig:prox:convergence} shows only one instance of a decoding
-task, it is indicative of the general behaviour of the algorithm.
+task, and thus carries no statistical significance, it is indicative of the
+general behaviour of the algorithm.
 This can be justified by looking at the gradients themselves.
 In figure \ref{fig:prox:gradients} the gradients of the negative
 log-likelihood and the code-constraint polynomial for a repetition code with
@@ -1089,8 +1090,8 @@ estimate in opposing directions, leading to an oscillation as illustrated in
 figure \ref{fig:prox:convergence}.
 Consequently, this oscillation is an intrinsic property of the structure of
 the proximal decoding algorithm, where the two parts of the objective function
-are minimized in an alternating manner using their gradients.
-
+are minimized in an alternating manner using their gradients.%
+%
 \begin{figure}[H]
 	\centering
@@ -1127,6 +1128,7 @@ are minimized in an alternating manner using their gradients.
 		\caption{$\nabla L \left(\boldsymbol{y} \mid \tilde{\boldsymbol{x}}
 		\right) $ for a repetition code with $n=2$}
+		\label{fig:prox:gradients:L}
 	\end{subfigure}%
 	\hfill%
 	\begin{subfigure}[c]{0.5\textwidth}
@@ -1161,10 +1163,14 @@ are minimized in an alternating manner using their gradients.
 		\caption{$\nabla h \left( \tilde{\boldsymbol{x}} \right) $ for a
 		repetition code with $n=2$}
+		\label{fig:prox:gradients:h}
 	\end{subfigure}%
-\end{figure}
-
+	\caption{Gradients of the negative log-likelihood and the code-constraint
+	polynomial}
+	\label{fig:prox:gradients}
+\end{figure}%
+%
 While the initial net movement is generally directed in the right direction
 owing to the gradient of the negative log-likelihood, the final oscillation
 may well take place in a segment of space not corresponding to a valid
@@ -1173,17 +1179,34 @@ This also partly explains the difference in decoding performance when looking
 at the \ac{BER} and \ac{FER}, as it would lower the amount of bit errors while
 still yielding an invalid codeword.
 
+The higher the \ac{SNR}, the more likely the gradient of the negative
+log-likelihood is to point towards a valid codeword.
+The common component of the two gradients then pulls the estimate closer to
+a valid codeword before the oscillation sets in.
+This explains why the decoding performance is so much better for higher
+\acp{SNR}.
+
 When considering codes with larger $n$, the behaviour generally stays the
 same, with some minor differences.
 In figure \ref{fig:prox:convergence_large_n} the decoding process is
 visualized for one component of a code with $n=204$, for a single decoding.
-The two gradients still start to fight each other and the estimate still
-starts to oscillate, the same as illustrated on the basis of figure
-\ref{fig:prox:convergence} for a code with $n=7$.
+The two gradients still eventually oppose each other and the estimate still
+starts to oscillate, just as illustrated in figure
+\ref{fig:prox:convergence} for a code with $n=7$.
 However, in this case, the gradient of the code-constraint polynomial itself
 starts to oscillate, its average value being such that the effect of the
 gradient of the negative log-likelihood is counteracted.
 
+Looking at figure \ref{fig:prox:gradients:h}, it also becomes apparent why the
+value of the parameter $\gamma$ has to be kept small, as mentioned in section
+\ref{sec:prox:Decoding Algorithm}.
+Raising the value of $\gamma$ results in
+$h \left( \tilde{\boldsymbol{x}} \right)$ dominating the landscape of the
+objective function, thereby introducing local minima between the codewords,
+in the areas in which it is not immediately clear which codeword is the most
+likely one.
+
 In conclusion, as a general rule, the proximal decoding algorithm reaches an
 oscillatory state which it cannot escape as a consequence of its structure.
 In this state, the constraints may not be satisfied, leading to the algorithm
@@ -1237,8 +1260,8 @@ returning an invalid codeword.
 \label{sec:prox:Improved Implementation}
 
 As mentioned earlier, frame errors seem to mainly stem from decoding failures.
-This, coupled with the fact that the \ac{BER} indicates so much better
-performance than the \ac{FER}, leads to the assumption that only a small
+Coupled with the fact that the \ac{BER} indicates so much better
+performance than the \ac{FER}, this leads to the assumption that only a small
 number of components of the estimated vector may be responsible for an
 invalid result.
 If it was possible to limit the number of possibly wrong components of the
@@ -1247,13 +1270,66 @@ a limited number of possible results (``ML-in-the-List'' as it will
 subsequently be called) to improve the decoding performance.
 This concept is pursued in this section.
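+The \ac{ML}-in-the-List decision itself amounts to choosing, from a small list
+of candidate words, the one that best explains the received vector.
+The following is a minimal sketch of this decision rule, not the
+implementation used in this thesis; it assumes BPSK with the mapping
+$0 \mapsto +1$, $1 \mapsto -1$ over an AWGN channel, under which the \ac{ML}
+decision reduces to a minimum-Euclidean-distance search.
+\begin{verbatim}
+import numpy as np
+
+def ml_in_the_list(y, candidates):
+    """Return the most likely candidate word for the received vector y.
+
+    Sketch only: assumes BPSK (0 -> +1, 1 -> -1) over an AWGN channel,
+    so the ML decision is a minimum Euclidean distance search.
+    """
+    y = np.asarray(y, dtype=float)
+    best, best_dist = None, np.inf
+    for cand in candidates:
+        s = 1.0 - 2.0 * np.asarray(cand, dtype=float)  # BPSK-modulated candidate
+        dist = np.sum((y - s) ** 2)                     # squared distance to y
+        if dist < best_dist:
+            best, best_dist = cand, dist
+    return best
+\end{verbatim}
+Since such a decision is only taken over a short candidate list, its cost is
+negligible compared to the iterative decoding itself.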
-\begin{itemize}
-	\item Decoding performance and comparison with standard proximal decoding
-	\item Computational performance and comparison with standard proximal decoding
-	\item Conclusion
-	\begin{itemize}
-		\item Summary
-		\item Up to $\SI{1}{dB}$ gain possible
-	\end{itemize}
-\end{itemize}
+First, a criterion has to be found with which to assess the probability that
+a given component of an estimate is wrong.
+One compelling observation is that the closer an estimate is to a valid
+codeword, the smaller the magnitude of the gradient of the code-constraint
+polynomial becomes, as illustrated in figure \ref{fig:prox:gradients}.
+This gives rise to the notion that some property or behaviour of
+$\nabla h\left( \tilde{\boldsymbol{x}} \right) $ may be related to the
+confidence that a given bit is correct.
+And indeed, the magnitude of the oscillation of
+$\nabla h\left( \tilde{\boldsymbol{x}} \right)$ (introduced in a previous
+section) and the probability of a bit error are strongly correlated,
+a relationship depicted in figure \ref{fig:prox:correlation}.
+
+TODO: Figure
+
+\noindent The y-axis depicts whether there is a bit error and the x-axis the
+variance of $\nabla h\left( \tilde{\boldsymbol{x}} \right)$ past iteration
+$k=100$. While this is not exactly the magnitude of the oscillation, it is
+proportional to it and easier to compute.
+The datapoints are taken from a single decoding operation
+\todo{Generate same figure with multiple decodings}.
+
+Using this observation as a rule to determine the $N\in\mathbb{N}$ bits most
+likely to be wrong, all variations of the estimate with those bits modified
+can be generated.
+An \ac{ML}-in-the-List step can then be performed in order to determine the
+most likely candidate.
+This process is outlined in figure \ref{fig:prox:improved_algorithm}.
+
+Figure \ref{fig:prox:improved_results} shows the gain that can be achieved.
+Again, three values of $\gamma$ are chosen, for which the \ac{BER}, \ac{FER}
+and decoding failure rate are plotted.
+The simulation results for the original proximal decoding algorithm are shown
+with solid lines and the results for the improved version are shown with
+dashed lines.
+The gain seems to depend on the value of $\gamma$ and becomes more pronounced
+for higher \ac{SNR} values.
+This is to be expected, since with higher \ac{SNR} values the number of bit
+errors decreases, making the correction of those errors in the ML-in-the-List
+step more likely.
+In figure \ref{fig:prox:improved_results_multiple} the decoding performance
+of proximal decoding and of the improved algorithm is compared for a number
+of different codes.
+Similar behaviour can be observed in all cases, with varying improvement over
+standard proximal decoding.
+
+Interestingly, the time complexity of the improved algorithm differs little
+from that of standard proximal decoding.
+This is because the ML-in-the-List step is only performed when the
+proximal decoding algorithm produces an invalid result, which in absolute
+terms happens relatively infrequently.
+This is illustrated in figure \ref{fig:prox:time_complexity_comp}, where the
+average time needed to decode a single received frame is visualized for
+proximal decoding as well as for the improved algorithm.
+
+In conclusion, the decoding performance of proximal decoding can be improved
+by appending an ML-in-the-List step when the algorithm does not produce a
+valid result.
+The gain is in some cases as high as $\SI{1}{dB}$ and can be achieved with
+negligible computational performance penalty.
+The improvement is mainly noticeable for higher \ac{SNR} values and depends on
+the code as well as the chosen parameters.
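+
+As a rough illustration of the overall procedure, the following sketch
+combines the variance-based selection of the $N$ most suspect bits with the
+\ac{ML}-in-the-List decision.
+It is a sketch under stated assumptions rather than the actual C++
+implementation: the helper \texttt{prox\_decode}, the gradient history it is
+assumed to return, the hard-decision estimate in $\{0,1\}$ and the BPSK
+mapping over an AWGN channel are all illustrative.
+\begin{verbatim}
+import numpy as np
+from itertools import product
+
+def improved_decode(y, H, prox_decode, N=4, window_start=100):
+    """Sketch of proximal decoding with an appended ML-in-the-List step.
+
+    prox_decode(y) is assumed to return the hard-decision estimate (in {0,1})
+    together with the per-iteration gradients of the code-constraint
+    polynomial; H is the parity-check matrix.
+    """
+    x_hat, grad_history = prox_decode(y)
+
+    # Keep the result if it already satisfies all parity checks.
+    if not np.any((H @ x_hat) % 2):
+        return x_hat
+
+    # Per-component variance of the gradient past the chosen iteration,
+    # used as a proxy for the magnitude of its oscillation.
+    var = np.var(np.asarray(grad_history)[window_start:], axis=0)
+    suspect = np.argsort(var)[-N:]  # the N most strongly oscillating bits
+
+    # Enumerate all modifications of the suspect bits and pick the most
+    # likely candidate (minimum Euclidean distance for BPSK over AWGN).
+    best, best_dist = x_hat, np.inf
+    for flips in product((0, 1), repeat=N):
+        cand = x_hat.copy()
+        cand[suspect] = (cand[suspect] + flips) % 2
+        dist = np.sum((y - (1 - 2 * cand)) ** 2)
+        if dist < best_dist:
+            best, best_dist = cand, dist
+    return best
+\end{verbatim}
+Since this additional step is only executed for the relatively rare invalid
+results, its average cost per frame remains negligible, in line with
+figure \ref{fig:prox:time_complexity_comp}.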