Compare commits
10 Commits
0a1cf7745b ... release/th

| Author | SHA1 | Date |
|---|---|---|
| | feb8895d6b | |
| | 735b4f62ff | |
| | dd15a2affd | |
| | ca345d7d5b | |
| | 90ee310775 | |
| | 302275cb45 | |
| | 4572cde3e8 | |
| | a58b1dd42d | |
| | 0b12fcb419 | |
| | c088a92b3b | |
@@ -223,3 +223,12 @@
   date = {2023-04},
   url = {http://www.inference.org.uk/mackay/codes/data.html}
 }
+
+@article{adam,
+  title={Adam: A method for stochastic optimization},
+  author={Kingma, Diederik P and Ba, Jimmy},
+  journal={arXiv preprint arXiv:1412.6980},
+  year={2014},
+  doi={10.48550/arXiv.1412.6980}
+}
+
latex/thesis/chapters/acknowledgements.tex (new file, 18 lines)
@@ -0,0 +1,18 @@
+\chapter*{Acknowledgements}
+
+I would like to thank Prof. Dr.-Ing. Laurent Schmalen for granting me the
+opportunity to write my bachelor's thesis at the Communications Engineering Lab,
+as well as all other members of the institute for their help and many productive
+discussions, and for creating a very pleasant environment to do research in.
+
+I am very grateful to Dr.-Ing. Holger Jäkel
+for kindly providing me with his knowledge and many suggestions,
+and for his constructive criticism during the preparation of this work.
+
+Special thanks also to Mai Anh Vu for her invaluable feedback and support
+during the entire undertaking that is this thesis.
+
+Finally, I would like to thank my family, who have enabled me to pursue my
+studies in a field I thoroughly enjoy and who have supported me completely
+throughout my journey.
+
@@ -508,7 +508,7 @@ $\gamma \in \left\{ 0.01, 0.05, 0.15 \right\}$.
 \begin{tikzpicture}
 \begin{axis}[
 grid=both,
-xlabel={$E_b / N_0$}, ylabel={FER},
+xlabel={$E_b / N_0$ (dB)}, ylabel={FER},
 ymode=log,
 legend columns=1,
 legend pos=outer north east,
@@ -549,7 +549,7 @@ $\gamma \in \left\{ 0.01, 0.05, 0.15 \right\}$.
 \begin{tikzpicture}
 \begin{axis}[
 grid=both,
-xlabel={$E_b / N_0$}, ylabel={FER},
+xlabel={$E_b / N_0$ (dB)}, ylabel={FER},
 ymode=log,
 legend columns=1,
 legend pos=outer north east,
@@ -593,7 +593,7 @@ $\gamma \in \left\{ 0.01, 0.05, 0.15 \right\}$.
 \begin{tikzpicture}
 \begin{axis}[
 grid=both,
-xlabel={$E_b / N_0$}, ylabel={FER},
+xlabel={$E_b / N_0$ (dB)}, ylabel={FER},
 ymode=log,
 legend columns=1,
 legend pos=outer north east,
@@ -647,7 +647,7 @@ $\gamma \in \left\{ 0.01, 0.05, 0.15 \right\}$.
 \begin{tikzpicture}
 \begin{axis}[
 grid=both,
-xlabel={$E_b / N_0$}, ylabel={FER},
+xlabel={$E_b / N_0$ (dB)}, ylabel={FER},
 ymode=log,
 legend columns=1,
 legend pos=outer north east,
@@ -692,7 +692,7 @@ $\gamma \in \left\{ 0.01, 0.05, 0.15 \right\}$.
 \begin{tikzpicture}
 \begin{axis}[
 grid=both,
-xlabel={$E_b / N_0$}, ylabel={FER},
+xlabel={$E_b / N_0$ (dB)}, ylabel={FER},
 ymode=log,
 legend columns=1,
 legend pos=outer north east,
@@ -735,7 +735,7 @@ $\gamma \in \left\{ 0.01, 0.05, 0.15 \right\}$.
 \begin{tikzpicture}
 \begin{axis}[
 grid=both,
-xlabel={$E_b / N_0$}, ylabel={FER},
+xlabel={$E_b / N_0$ (dB)}, ylabel={FER},
 ymode=log,
 legend columns=1,
 legend pos=outer north east,
@@ -3,7 +3,7 @@
 
 In this chapter, proximal decoding and \ac{LP} Decoding using \ac{ADMM} are compared.
 First, the two algorithms are studied on a theoretical basis.
-Subsequently, their respective simulation results are examined and their
+Subsequently, their respective simulation results are examined, and their
 differences are interpreted based on their theoretical structure.
 
 
@@ -32,13 +32,13 @@ $\mathcal{P}_{d_j}, \hspace{1mm} j\in\mathcal{J}$, defined as%
 %
 by moving the constraints into the objective function, as shown in figure
 \ref{fig:ana:theo_comp_alg:admm}.
-Both algorithms are composed of an iterative approach consisting of two
-alternating steps.
 The objective functions of the two problems are similar in that they
-both comprise two parts: one associated to the likelihood that a given
-codeword was sent, stemming from the channel model, and one associated
-to the constraints the decoding process is subjected to, stemming from the
+both comprise two parts: one associated to the likelihood that a given
+codeword was sent, arising from the channel model, and one associated
+to the constraints the decoding process is subjected to, arising from the
 code used.
+Both algorithms are composed of an iterative approach consisting of two
+alternating steps, each minimizing one part of the objective function.
 %
 
 \begin{figure}[h]
@@ -109,7 +109,7 @@ return $\tilde{\boldsymbol{c}}$
 \end{subfigure}%
 
 
-\caption{Comparison of the proximal gradient method and \ac{ADMM}}
+\caption{Comparison of proximal decoding and \ac{LP} decoding using \ac{ADMM}}
 \label{fig:ana:theo_comp_alg}
 \end{figure}%
 %
@@ -139,7 +139,7 @@ This means that additional redundant parity-checks can be added successively
 until the codeword returned is valid and thus the \ac{ML} solution is found
 \cite[Sec. IV.]{alp}.
 
-In terms of time complexity the two decoding algorithms are comparable.
+In terms of time complexity, the two decoding algorithms are comparable.
 Each of the operations required for proximal decoding can be performed
 in $\mathcal{O}\left( n \right) $ time for \ac{LDPC} codes (see section
 \ref{subsec:prox:comp_perf}).
@@ -172,10 +172,10 @@ while stopping critierion unfulfilled do
 |\vspace{0.22mm}\Reactivatenumber|
 end for
 for i in $\mathcal{I}$ do
-$s_i \leftarrow s_i + \gamma \left[ 4\left( s_i^2 - 1 \right)s_i
-\phantom{\frac{4}{s_i}}\right.$|\Suppressnumber|
-|\Reactivatenumber|$\left.+ \frac{4}{s_i}\sum_{j\in N_v\left( i \right) }
-M_{j\to i} \right] $
+$s_i\leftarrow \Pi_\eta \left( s_i + \gamma \left( 4\left( s_i^2 - 1 \right)s_i
+\phantom{\frac{4}{s_i}}\right.\right.$|\Suppressnumber|
+|\Reactivatenumber|$\left.\left.+ \frac{4}{s_i}\sum_{j\in
+N_v\left( i \right) } M_{j\to i} \right)\right) $
 $r_i \leftarrow r_i + \omega \left( s_i - y_i \right)$
 end for
 end while
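The hunk above replaces the plain gradient step of the proximal variable-node update with a projected one. As a rough illustrative sketch (not the thesis implementation), assuming $\Pi_\eta$ denotes component-wise clipping onto $[-\eta, \eta]$ and `M[j, i]` holds the check-to-variable messages $M_{j\to i}$, the update could look like:

```python
import numpy as np

def prox_variable_update(s, M, gamma, eta):
    """One projected gradient step per variable node (illustrative sketch).

    s: current estimates s_i, shape (n,)
    M: check-to-variable messages M_{j->i}, shape (m, n)
    """
    # gradient term from the hunk: 4*(s_i^2 - 1)*s_i + (4/s_i) * sum_j M_{j->i}
    grad = 4.0 * (s**2 - 1.0) * s + (4.0 / s) * M.sum(axis=0)
    # Pi_eta is assumed here to be clipping onto [-eta, eta]
    return np.clip(s + gamma * grad, -eta, eta)
```

The clipping keeps each component of the iterate bounded, which is the point of introducing $\Pi_\eta$ in the modified update.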
@@ -216,7 +216,7 @@ return $\tilde{\boldsymbol{c}}$
 \end{subfigure}%
 
 
-\caption{The proximal gradient method and \ac{LP} decoding using \ac{ADMM}
+\caption{Proximal decoding and \ac{LP} decoding using \ac{ADMM}
 as message passing algorithms}
 \label{fig:comp:message_passing}
 \end{figure}%
@@ -232,7 +232,7 @@ With proximal decoding this minimization is performed for all constraints at once
 in an approximative manner, while with \ac{LP} decoding using \ac{ADMM} it is
 performed for each constraint individually and with exact results.
 In terms of time complexity, both algorithms are linear with
-respect to $n$ and are heavily parallelisable.
+respect to $n$ and are heavily parallelizable.
 
 
 
@@ -241,18 +241,18 @@ respect to $n$ and are heavily parallelisable.
 \label{sec:comp:res}
 
 The decoding performance of the two algorithms is compared in figure
-\ref{fig:comp:prox_admm_dec} in the form of the \ac{FER}.
+\ref{fig:comp:prox_admm_dec} in form of the \ac{FER}.
 Shown as well is the performance of the improved proximal decoding
 algorithm presented in section \ref{sec:prox:Improved Implementation}.
 The \ac{FER} resulting from decoding using \ac{BP} and,
-wherever available, the \ac{FER} of \ac{ML} decoding taken from
-\cite{lautern_channelcodes} are plotted as a reference.
+wherever available, the \ac{FER} of \ac{ML} decoding, taken from
+\cite{lautern_channelcodes}, are plotted as a reference.
 The parameters chosen for the proximal and improved proximal decoders are
 $\gamma=0.05$, $\omega=0.05$, $K=200$, $\eta = 1.5$ and $N=12$.
 The parameters chosen for \ac{LP} decoding using \ac{ADMM} are $\mu = 5$,
 $\rho = 1$, $K=200$, $\epsilon_\text{pri} = 10^{-5}$ and
 $\epsilon_\text{dual} = 10^{-5}$.
-For all codes considered in the scope of this work, \ac{LP} decoding using
+For all codes considered within the scope of this work, \ac{LP} decoding using
 \ac{ADMM} consistently outperforms both proximal decoding and the improved
 version, reaching very similar performance to \ac{BP}.
 The decoding gain heavily depends on the code, evidently becoming greater for
@@ -268,8 +268,12 @@ calculations performed in each case.
 With proximal decoding, the calculations are approximate, leading
 to the constraints never being quite satisfied.
 With \ac{LP} decoding using \ac{ADMM},
-the constraints are fulfilled for each parity check individualy after each
+the constraints are fulfilled for each parity check individually after each
 iteration of the decoding process.
+A further contributing factor might be the structure of the optimization
+process, as the alternating minimization with respect to the same variable
+leads to oscillatory behavior, as explained in section
+\ref{subsec:prox:conv_properties}.
 It should be noted that while in this thesis proximal decoding was
 examined with respect to its performance in \ac{AWGN} channels, in
 \cite{proximal_paper} it is presented as a method applicable to non-trivial
@@ -279,21 +283,21 @@ broadening its usefulness beyond what is shown here.
 The timing requirements of the decoding algorithms are visualized in figure
 \ref{fig:comp:time}.
 The datapoints have been generated by evaluating the metadata from \ac{FER}
-and \ac{BER} simulations using the parameters mentioned earlier when
+and \ac{BER} simulations and using the parameters mentioned earlier when
 discussing the decoding performance.
 The codes considered are the same as in sections \ref{subsec:prox:comp_perf}
 and \ref{subsec:admm:comp_perf}.
-While the \ac{ADMM} implementation seems to be faster the the proximal
-decoding and improved proximal decoding implementations, infering some
+While the \ac{ADMM} implementation seems to be faster than the proximal
+decoding and improved proximal decoding implementations, inferring some
 general behavior is difficult in this case.
 This is because of the comparison of actual implementations, making the
 results dependent on factors such as the grade of optimization of each of the
 implementations.
 Nevertheless, the run time of both the proximal decoding and the \ac{LP}
-decoding using \ac{ADMM} implementations is similar and both are
-reasonably performant, owing to the parallelisable structure of the
+decoding using \ac{ADMM} implementations is similar, and both are
+reasonably performant, owing to the parallelizable structure of the
 algorithms.
 %
 
 \begin{figure}[h]
 \centering
 
@@ -328,8 +332,6 @@ algorithms.
 \label{fig:comp:time}
 \end{figure}%
 %
-\footnotetext{asdf}
-%
 
 \begin{figure}[h]
 \centering
@@ -340,7 +342,7 @@ algorithms.
 \begin{tikzpicture}
 \begin{axis}[
 grid=both,
-xlabel={$E_b / N_0$}, ylabel={FER},
+xlabel={$E_b / N_0$ (dB)}, ylabel={FER},
 ymode=log,
 ymax=1.5, ymin=8e-5,
 width=\textwidth,
@@ -376,7 +378,7 @@ algorithms.
 \begin{tikzpicture}
 \begin{axis}[
 grid=both,
-xlabel={$E_b / N_0$}, ylabel={FER},
+xlabel={$E_b / N_0$ (dB)}, ylabel={FER},
 ymode=log,
 ymax=1.5, ymin=8e-5,
 width=\textwidth,
@@ -414,7 +416,7 @@ algorithms.
 \begin{tikzpicture}
 \begin{axis}[
 grid=both,
-xlabel={$E_b / N_0$}, ylabel={FER},
+xlabel={$E_b / N_0$ (dB)}, ylabel={FER},
 ymode=log,
 ymax=1.5, ymin=8e-5,
 width=\textwidth,
@@ -455,7 +457,7 @@ algorithms.
 \begin{tikzpicture}
 \begin{axis}[
 grid=both,
-xlabel={$E_b / N_0$}, ylabel={FER},
+xlabel={$E_b / N_0$ (dB)}, ylabel={FER},
 ymode=log,
 ymax=1.5, ymin=8e-5,
 width=\textwidth,
@@ -490,7 +492,7 @@ algorithms.
 \begin{tikzpicture}
 \begin{axis}[
 grid=both,
-xlabel={$E_b / N_0$}, ylabel={FER},
+xlabel={$E_b / N_0$ (dB)}, ylabel={FER},
 ymode=log,
 ymax=1.5, ymin=8e-5,
 width=\textwidth,
@@ -523,7 +525,7 @@ algorithms.
 \begin{tikzpicture}
 \begin{axis}[
 grid=both,
-xlabel={$E_b / N_0$}, ylabel={FER},
+xlabel={$E_b / N_0$ (dB)}, ylabel={FER},
 ymode=log,
 ymax=1.5, ymin=8e-5,
 width=\textwidth,
@@ -572,7 +574,7 @@ algorithms.
 \addlegendentry{\acs{LP} decoding using \acs{ADMM}}
 
 \addlegendimage{RoyalPurple, line width=1pt, mark=*, solid}
-\addlegendentry{\acs{BP} (20 iterations)}
+\addlegendentry{\acs{BP} (200 iterations)}
 
 \addlegendimage{Black, line width=1pt, mark=*, solid}
 \addlegendentry{\acs{ML} decoding}
@@ -580,8 +582,8 @@ algorithms.
 \end{tikzpicture}
 \end{subfigure}
 
-\caption{Comparison of decoding performance between proximal decoding and \ac{LP} decoding
-using \ac{ADMM}}
+\caption{Comparison of the decoding performance of the different decoder
+implementations for various codes}
 \label{fig:comp:prox_admm_dec}
 \end{figure}
 
@@ -1,44 +1,44 @@
-\chapter{Conclusion}%
+\chapter{Conclusion and Outlook}%
 \label{chapter:conclusion}
 
 In the context of this thesis, two decoding algorithms were considered:
 proximal decoding and \ac{LP} decoding using \ac{ADMM}.
 The two algorithms were first analyzed individually, before comparing them
-based on simulation results as well as their theoretical structure.
+based on simulation results as well as on their theoretical structure.
 
 For proximal decoding, the effect of each parameter on the behavior of the
-decoder was examined, leading to an approach to choosing the value of each
-of the parameters.
+decoder was examined, leading to an approach to optimally choose the value
+of each parameter.
 The convergence properties of the algorithm were investigated in the context
 of the relatively high decoding failure rate, to derive an approach to correct
-possible wrong componets of the estimate.
-Based on this approach, an improvement over proximal decoding was suggested,
+possibly wrong components of the estimate.
+Based on this approach, an improvement of proximal decoding was suggested,
 leading to a decoding gain of up to $\SI{1}{dB}$, depending on the code and
 the parameters considered.
 
-For \ac{LP} decoding using \ac{ADMM}, the circumstances brought about via the
-relaxation while formulating the \ac{LP} decoding problem were first explored.
+For \ac{LP} decoding using \ac{ADMM}, the circumstances brought about by the
+\ac{LP} relaxation were first explored.
 The decomposable nature arising from the relocation of the constraints into
 the objective function itself was recognized as the major driver in enabling
-the efficent implementation of the decoding algorithm.
+an efficient implementation of the decoding algorithm.
 Based on simulation results, general guidelines for choosing each parameter
-were again derived.
+were derived.
 The decoding performance, in form of the \ac{FER}, of the algorithm was
 analyzed, observing that \ac{LP} decoding using \ac{ADMM} nearly reaches that
 of \ac{BP}, staying within approximately $\SI{0.5}{dB}$ depending on the code
 in question.
 
-Finally, strong parallells were discovered with regard to the theoretical
+Finally, strong parallels were discovered with regard to the theoretical
 structure of the two algorithms, both in the constitution of their respective
-objective functions as in the iterative approaches used to minimize them.
+objective functions as well as in the iterative approaches used to minimize them.
 One difference noted was the approximate nature of the minimization in the
 case of proximal decoding, leading to the constraints never being truly
 satisfied.
 In conjunction with the alternating minimization with respect to the same
-variable leading to oscillatory behavior, this was identified as the
-root cause of its comparatively worse decoding performance.
+variable, leading to oscillatory behavior, this was identified as
+a possible cause of its comparatively worse decoding performance.
 Furthermore, both algorithms were expressed as message passing algorithms,
-justifying their similar computational performance.
+illustrating their similar computational performance.
 
 While the modified proximal decoding algorithm presented in section
 \ref{sec:prox:Improved Implementation} shows some promising results, further
@@ -46,7 +46,13 @@ investigation is required to determine how different choices of parameters
 affect the decoding performance.
 Additionally, a more mathematically rigorous foundation for determining the
 potentially wrong components of the estimate is desirable.
-Another area benefiting from future work is the expantion of the \ac{ADMM}
+A different method to improve proximal decoding might be to use
+moment-based optimization techniques such as \textit{Adam} \cite{adam}
+to try to mitigate the effect of local minima introduced in the objective
+function as well as the adversarial structure of the minimization when employing
+proximal decoding.
+
+Another area benefiting from future work is the expansion of the \ac{ADMM}
 based \ac{LP} decoder into a decoder approximating \ac{ML} performance,
 using \textit{adaptive \ac{LP} decoding}.
 With this method, the successive addition of redundant parity checks is used
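The hunk above proposes moment-based methods such as Adam \cite{adam} as a possible improvement. For reference, the standard Adam update (generic, not tied to the decoder; parameter names follow common convention) can be sketched as:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-2, b1=0.9, b2=0.999, eps=1e-8):
    """One generic Adam update step (Kingma & Ba).

    theta: parameters; grad: gradient at theta;
    m, v: first/second moment estimates; t: step counter starting at 1.
    """
    m = b1 * m + (1 - b1) * grad          # first moment (momentum)
    v = b2 * v + (1 - b2) * grad**2       # second moment (per-coordinate scale)
    m_hat = m / (1 - b1**t)               # bias correction
    v_hat = v / (1 - b2**t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

The momentum term is what could help the iterate escape shallow local minima of the non-convex proximal objective; whether this actually improves decoding is exactly the open question raised in the text.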
@@ -1,16 +1,51 @@
 \chapter{Introduction}%
 \label{chapter:introduction}
 
+Channel coding using binary linear codes is a way of enhancing the reliability
+of data by detecting and correcting any errors that may occur during
+its transmission or storage.
+One class of binary linear codes, \ac{LDPC} codes, has become especially
+popular due to being able to reach arbitrarily small probabilities of error
+at code rates up to the capacity of the channel \cite[Sec. II.B.]{mackay_rediscovery},
+while retaining a structure that allows for very efficient decoding.
+While the established decoders for \ac{LDPC} codes, such as \ac{BP} and the
+\textit{min-sum algorithm}, offer good decoding performance, they are suboptimal
+in most cases and exhibit an \textit{error floor} for high \acp{SNR}
+\cite[Sec. 15.3]{ryan_lin_2009}, making them unsuitable for applications
+with extreme reliability requirements.
+
-\begin{itemize}
-\item Problem definition
-\item Motivation
-\begin{itemize}
-\item Error floor when decoding with BP (seems to not be persent with LP decoding
-\cite[Sec. I]{original_admm})
-\item Strong theoretical guarantees that allow for better and better approximations
-of ML decoding \cite[Sec. I]{original_admm}
-\end{itemize}
-\item Results summary
-\end{itemize}
+Optimization based decoding algorithms are an entirely different way of approaching
+the decoding problem.
+The first introduction of optimization techniques as a way of decoding binary
+linear codes was conducted in Feldman's 2003 Ph.D. thesis and a subsequent paper,
+establishing the field of \ac{LP} decoding \cite{feldman_thesis}, \cite{feldman_paper}.
+There, the \ac{ML} decoding problem is approximated by a \textit{linear program}, i.e.,
+a linear, convex optimization problem, which can subsequently be solved using
+several different algorithms \cite{alp}, \cite{interior_point},
+\cite{original_admm}, \cite{pdd}.
+More recently, novel approaches such as \textit{proximal decoding} have been
+introduced. Proximal decoding is based on a non-convex optimization formulation
+of the \ac{MAP} decoding problem \cite{proximal_paper}.
 
+The motivation behind applying optimization methods to channel decoding is to
+utilize existing techniques in the broad field of optimization theory, as well
+as to find new decoding methods not suffering from the same disadvantages as
+existing message passing based approaches or exhibiting other desirable properties.
+\Ac{LP} decoding, for example, comes with strong theoretical guarantees
+allowing it to be used as a way of closely approximating \ac{ML} decoding
+\cite[Sec. I]{original_admm},
+and proximal decoding is applicable to non-trivial channel models such
+as \ac{LDPC}-coded massive \ac{MIMO} channels \cite{proximal_paper}.
+
+This thesis aims to further the analysis of optimization based decoding
+algorithms as well as to verify and complement the considerations present in
+the existing literature.
+Specifically, the proximal decoding algorithm and \ac{LP} decoding using
+the \ac{ADMM} \cite{original_admm} are explored within the context of
+\ac{BPSK} modulated \ac{AWGN} channels.
+Implementations of both decoding methods are produced, and based on simulation
+results from those implementations the algorithms are examined and compared.
+Approaches to determine the optimal value of each parameter are derived and
+the computational and decoding performance of the algorithms is examined.
+An improvement on proximal decoding is suggested, achieving up to 1 dB of gain,
+depending on the parameters chosen and the code considered.
@@ -5,14 +5,12 @@ This chapter is concerned with \ac{LP} decoding - the reformulation of the
 decoding problem as a linear program.
 More specifically, the \ac{LP} decoding problem is solved using \ac{ADMM}.
 First, the general field of \ac{LP} decoding is introduced.
-The application of \ac{ADMM} to the decoding problem is explained.
-Some notable implementation details are mentioned.
+The application of \ac{ADMM} to the decoding problem is explained and some
+notable implementation details are mentioned.
 Finally, the behavior of the algorithm is examined based on simulation
 results.
 
 
-
-
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \section{LP Decoding}%
 \label{sec:lp:LP Decoding}
@@ -547,7 +545,7 @@ parity-checks until a valid result is returned \cite[Sec. IV.]{alp}.
 
 
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-\section{Decoding Algorithm}%
+\section{Decoding Algorithm and Implementation}%
 \label{sec:lp:Decoding Algorithm}
 
 The \ac{LP} decoding formulation in section \ref{sec:lp:LP Decoding}
@@ -689,7 +687,6 @@ handled at the same time.
 This can also be understood by interpreting the decoding process as a message-passing
 algorithm \cite[Sec. III. D.]{original_admm}, \cite[Sec. II. B.]{efficient_lp_dec_admm},
 depicted in algorithm \ref{alg:admm}.
-\todo{How are the variables being initialized?}
 
 \begin{genericAlgorithm}[caption={\ac{LP} decoding using \ac{ADMM} interpreted
 as a message passing algorithm\protect\footnotemark{}}, label={alg:admm},
|
||||
subsequently replacing $\boldsymbol{T}_j \tilde{\boldsymbol{c}}$ with the
|
||||
computed value in the two updates \cite[Sec. 3.4.3]{distr_opt_book}.
|
||||
|
||||
The main computational effort in solving the linear program then amounts to
|
||||
The main computational effort in solving the linear program amounts to
|
||||
computing the projection operation $\Pi_{\mathcal{P}_{d_j}} \left( \cdot \right) $
|
||||
onto each check polytope. Various different methods to perform this projection
|
||||
have been proposed (e.g., in \cite{original_admm}, \cite{efficient_lp_dec_admm},
|
||||
@@ -743,14 +740,14 @@ have been proposed (e.g., in \cite{original_admm}, \cite{efficient_lp_dec_admm},
|
||||
The method chosen here is the one presented in \cite{original_admm}.
|
||||
|
||||
|
||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
\section{Implementation Details}%
|
||||
\label{sec:lp:Implementation Details}
|
||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
%\section{Implementation Details}%
|
||||
%\label{sec:lp:Implementation Details}
|
||||
|
||||
The development process used to implement this decoding algorithm was the same
|
||||
as outlined in section
|
||||
\ref{sec:prox:Implementation Details} for proximal decoding.
|
||||
At first, an initial version was implemented in Python, before repeating the
|
||||
\ref{sec:prox:Decoding Algorithm} for proximal decoding.
|
||||
First, an initial version was implemented in Python, before repeating the
|
||||
process using C++ to achieve higher performance.
|
||||
Again, the performance can be increased by reframing the operations in such
|
||||
a way that the computation can take place primarily with element-wise
|
||||
@@ -788,9 +785,13 @@ expression to be rewritten as%
 .\end{align*}
 %
 Defining%
+\footnote{
+In this case $d_1, \ldots, d_n$ refer to the degree of the variable nodes,
+i.e., $d_i,\hspace{1mm}i\in\mathcal{I}$.
+}
 %
 \begin{align*}
-\boldsymbol{D} := \begin{bmatrix}
+\boldsymbol{d} := \begin{bmatrix}
 d_1 \\
 \vdots \\
 d_n
@@ -800,19 +801,18 @@ Defining%
 \hspace{5mm}%
 \boldsymbol{s} := \sum_{j\in\mathcal{J}} \boldsymbol{T}_j^\text{T}
 \left( \boldsymbol{z}_j - \boldsymbol{u}_j \right)
-\end{align*}%
-\todo{Rename $\boldsymbol{D}$}%
+,\end{align*}%
 %
 the $\tilde{\boldsymbol{c}}$ update can then be rewritten as%
 %
 \begin{align*}
-\tilde{\boldsymbol{c}} \leftarrow \boldsymbol{D}^{\circ \left(-1\right)} \circ
+\tilde{\boldsymbol{c}} \leftarrow \boldsymbol{d}^{\circ \left(-1\right)} \circ
 \left( \boldsymbol{s} - \frac{1}{\mu}\boldsymbol{\gamma} \right)
 .\end{align*}
 %
 This modified version of the decoding process is depicted in algorithm \ref{alg:admm:mod}.
 
-\begin{genericAlgorithm}[caption={\ac{LP} decoding using \ac{ADMM} algorithm with rewritten
+\begin{genericAlgorithm}[caption={The \ac{LP} decoding using \ac{ADMM} algorithm with rewritten
 update steps}, label={alg:admm:mod},
 basicstyle=\fontsize{11}{16}\selectfont
 ]
@@ -831,16 +831,13 @@ while $\sum_{j\in\mathcal{J}} \lVert \boldsymbol{T}_j\tilde{\boldsymbol{c}}
 \left( \boldsymbol{z}_j - \boldsymbol{u}_j \right) $
 end for
 for $i$ in $\mathcal{I}$ do
-$\tilde{\boldsymbol{c}} \leftarrow \boldsymbol{D}^{\circ \left( -1\right)} \circ
+$\tilde{\boldsymbol{c}} \leftarrow \boldsymbol{d}^{\circ \left( -1\right)} \circ
 \left( \boldsymbol{s} - \frac{1}{\mu}\boldsymbol{\gamma} \right) $
 end for
 end while
 return $\tilde{\boldsymbol{c}}$
 \end{genericAlgorithm}
 
-\todo{Projection onto $[0, 1]^n$?}
-\todo{Variable initialization}
-
 
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \section{Analysis and Simulation Results}%
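The rewritten $\tilde{\boldsymbol{c}}$ update above is purely element-wise: the Hadamard inverse $\boldsymbol{d}^{\circ(-1)}$ of the variable-node degree vector scales the difference $\boldsymbol{s} - \frac{1}{\mu}\boldsymbol{\gamma}$. As a minimal sketch (vector names taken from the diff, function name hypothetical):

```python
import numpy as np

def c_tilde_update(d, s, gamma, mu):
    """Element-wise c-tilde update: d^{o(-1)} o (s - gamma/mu).

    d: variable-node degrees d_1..d_n; s: accumulated T_j^T (z_j - u_j);
    gamma: LLR vector; mu: ADMM penalty parameter.
    """
    # Hadamard inverse and Hadamard product reduce to plain element-wise ops
    return (1.0 / d) * (s - gamma / mu)
```

Because every component is independent, this step is trivially parallelizable, which is the "decomposable nature" credited in the conclusion with enabling an efficient implementation.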
@@ -855,6 +852,12 @@ Subsequently, the decoding performance is observed and compared to that of
 Finally, the computational performance of the implementation and time
 complexity of the algorithm are studied.
 
+As was the case in chapter \ref{chapter:proximal_decoding} for proximal decoding,
+the following simulation results are based on Monte Carlo simulations
+and the BER and FER curves have been generated by producing at least 100
+frame errors for each data point, except in cases where this is explicitly
+specified otherwise.
+
 \subsection{Choice of Parameters}
 
 The first two parameters to be investigated are the penalty parameter $\mu$
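The "at least 100 frame errors per data point" rule added above is a standard Monte Carlo stopping criterion. A generic sketch (the `decode_frame` callable is hypothetical, standing in for one simulated transmit/decode round that reports whether a frame error occurred):

```python
import random

def estimate_fer(decode_frame, min_frame_errors=100, max_frames=10**7):
    """Monte Carlo FER estimate: simulate frames until enough frame errors.

    decode_frame() returns True when the decoded frame is wrong.
    max_frames caps the run for very low error rates.
    """
    errors = frames = 0
    while errors < min_frame_errors and frames < max_frames:
        frames += 1
        errors += bool(decode_frame())
    return errors / frames
```

Stopping on a fixed error count (rather than a fixed frame count) keeps the relative uncertainty of each FER point roughly constant across SNR values, which is why the thesis applies the same rule to every data point.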
@@ -868,8 +871,8 @@ The code chosen for this examination is a (3,6) regular \ac{LDPC} code with
 $n=204$ and $k=102$ \cite[\text{204.33.484}]{mackay_enc}.
 When varying $\mu$, $\rho$ is set to 1 and when varying
 $\rho$, $\mu$ is set to 5.
-$K$ is set to 200 and $\epsilon_\text{dual}$ and $\epsilon_\text{pri}$ to
-$10^{-5}$.
+The maximum number of iterations $K$ is set to 200 and
+$\epsilon_\text{dual}$ and $\epsilon_\text{pri}$ to $10^{-5}$.
 The behavior that can be observed is very similar to that of the
 parameter $\gamma$ in proximal decoding, analyzed in section
 \ref{sec:prox:Analysis and Simulation Results}.
@@ -877,7 +880,7 @@ A single optimal value giving optimal performance does not exist; rather,
 as long as the value is chosen within a certain range, the performance is
 approximately equally good.
 
-\begin{figure}[h]
+\begin{figure}[H]
 \centering
 
 \begin{subfigure}[c]{0.48\textwidth}
@@ -971,8 +974,9 @@ The values chosen for the rest of the parameters are the same as before.
 It is visible that choosing a large value for $\rho$ as well as a small value
 for $\mu$ minimizes the average number of iterations and thus the average
 run time of the decoding process.
+The same behavior can be observed when looking at various%
 %
-\begin{figure}[h]
+\begin{figure}[H]
 \centering
 
 \begin{tikzpicture}
@@ -1007,10 +1011,240 @@ run time of the decoding process.
|
||||
\label{fig:admm:mu_rho_iterations}
|
||||
\end{figure}%
|
||||
%
|
||||
The same behavior can be observed when looking at a number of different codes,
|
||||
as shown in figure \ref{fig:admm:mu_rho_multiple}.
|
||||
\noindent different codes, as shown in figure \ref{fig:admm:mu_rho_multiple}.
|
||||
|
||||
To get an estimate for the maximum number of iterations $K$ necessary,
the average error during decoding can be used.
This is shown in figure \ref{fig:admm:avg_error} as an average over
$\SI{100000}{}$ decodings.
$\mu$ is set to $5$, $\rho$ is set to $1$, and the rest of the parameters are
again chosen as $\epsilon_\text{pri}=10^{-5}$ and
$\epsilon_\text{dual}=10^{-5}$.
Similarly to the results in section \ref{subsec:prox:choice}, a dip is
visible around the $20$ iteration mark.
This is because, as the number of iterations increases, more and more
decodings converge, leaving only the erroneous ones to be averaged.
The point at which the wrong decodings become dominant and the decoding
performance no longer increases is largely independent of the \ac{SNR},
allowing the maximum number of iterations to be chosen without considering
the \ac{SNR}.

\begin{figure}[H]
	\centering

	\begin{tikzpicture}
		\begin{axis}[
			grid=both,
			width=0.6\textwidth,
			height=0.45\textwidth,
			xlabel={Iteration}, ylabel={Average $\left\Vert \hat{\boldsymbol{c}}
				- \boldsymbol{c} \right\Vert$}
			]
			\addplot[ForestGreen, line width=1pt]
				table [col sep=comma, x=k, y=err,
					discard if not={SNR}{1.0},
					discard if gt={k}{100}]
				{res/admm/avg_error_20433484.csv};
			\addlegendentry{$E_b / N_0 = \SI{1}{dB}$}

			\addplot[RedOrange, line width=1pt]
				table [col sep=comma, x=k, y=err,
					discard if not={SNR}{2.0},
					discard if gt={k}{100}]
				{res/admm/avg_error_20433484.csv};
			\addlegendentry{$E_b / N_0 = \SI{2}{dB}$}

			\addplot[NavyBlue, line width=1pt]
				table [col sep=comma, x=k, y=err,
					discard if not={SNR}{3.0},
					discard if gt={k}{100}]
				{res/admm/avg_error_20433484.csv};
			\addlegendentry{$E_b / N_0 = \SI{3}{dB}$}

			\addplot[RoyalPurple, line width=1pt]
				table [col sep=comma, x=k, y=err,
					discard if not={SNR}{4.0},
					discard if gt={k}{100}]
				{res/admm/avg_error_20433484.csv};
			\addlegendentry{$E_b / N_0 = \SI{4}{dB}$}
		\end{axis}
	\end{tikzpicture}

	\caption{Average error over $\SI{100000}{}$ decodings. (3,6) regular
	\ac{LDPC} code with $n=204, k=102$ \cite[\text{204.33.484}]{mackay_enc}}
	\label{fig:admm:avg_error}
\end{figure}%

The last two parameters remaining to be examined are the tolerances for the
stopping criterion of the algorithm, $\epsilon_\text{pri}$ and
$\epsilon_\text{dual}$.
These are both set to the same value $\epsilon$.
The effect of their value on the decoding performance is visualized in figure
\ref{fig:admm:epsilon}.
All parameters except $\epsilon_\text{pri}$ and $\epsilon_\text{dual}$ are
kept constant, with $\mu=5$, $\rho=1$ and $E_b / N_0 = \SI{4}{dB}$, and
a maximum of 200 iterations is performed.
A lower tolerance initially leads to a dramatic decrease in the \ac{FER},
an effect that fades as the tolerance is decreased further.

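The role of the two tolerances can be made concrete with a small sketch. The function below is an illustrative reimplementation of the usual ADMM residual-based stopping check, not the code used for the simulations in this work; the variable names (`Pc`, `z_new`, `z_old`) are assumptions:

```python
# Hedged sketch of an ADMM stopping criterion: decoding stops once both
# the primal and the dual residual norm fall below their tolerances.
import numpy as np

def stopping_criterion(Pc, z_new, z_old, rho=1.0,
                       eps_pri=1e-5, eps_dual=1e-5):
    """Return True once both residual norms are below their tolerances.

    Pc    : stacked projections of the current iterate onto the checks
    z_new : replica variables after the current z-update
    z_old : replica variables from the previous iteration
    """
    r_pri = float(np.linalg.norm(Pc - z_new))            # primal residual
    r_dual = rho * float(np.linalg.norm(z_new - z_old))  # dual residual
    return r_pri < eps_pri and r_dual < eps_dual
```

Shrinking `eps_pri` and `eps_dual` makes this check stricter, trading additional iterations for a lower error rate.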
\begin{figure}[H]
	\centering

	\begin{tikzpicture}
		\begin{axis}[
			grid=both,
			xlabel={$\epsilon$}, ylabel={\acs{FER}},
			ymode=log,
			xmode=log,
			x dir=reverse,
			width=0.6\textwidth,
			height=0.45\textwidth,
			]
			\addplot[NavyBlue, line width=1pt, densely dashed, mark=*]
				table [col sep=comma, x=epsilon, y=FER,
					discard if not={SNR}{3.0},]
				{res/admm/fer_epsilon_20433484.csv};
		\end{axis}
	\end{tikzpicture}

	\caption{Effect of the value of the parameters $\epsilon_\text{pri}$ and
	$\epsilon_\text{dual}$ on the \acs{FER}. (3,6) regular \ac{LDPC} code with
	$n=204, k=102$ \cite[\text{204.33.484}]{mackay_enc}}
	\label{fig:admm:epsilon}
\end{figure}%

In conclusion, the parameters $\mu$ and $\rho$ should be chosen comparatively
small and large, respectively, to reduce the average runtime of the decoding
process, while keeping them within a certain range so as not to compromise the
decoding performance.
The maximum number of iterations performed can be chosen independently
of the \ac{SNR}.
Finally, small values should be given to the parameters
$\epsilon_{\text{pri}}$ and $\epsilon_{\text{dual}}$ to achieve the lowest
possible error rate.


\subsection{Decoding Performance}

In figure \ref{fig:admm:results}, the simulation results for the ``Margulis''
\ac{LDPC} code ($n=2640$, $k=1320$) presented by Barman et al. in
\cite{original_admm} are compared to the results from the simulations
conducted in the context of this thesis.
The parameters chosen were $\mu=3.3$, $\rho=1.9$, $K=1000$,
$\epsilon_\text{pri}=10^{-5}$ and $\epsilon_\text{dual}=10^{-5}$,
the same as in \cite{original_admm}.
The two \ac{FER} curves are practically identical.
Also shown is the curve resulting from \ac{BP} decoding, performing
1000 iterations.
The two algorithms perform similarly, staying within $\SI{0.5}{dB}$
of one another.

\begin{figure}[H]
	\centering

	\begin{tikzpicture}
		\begin{axis}[
			grid=both,
			xlabel={$E_b / N_0$ (dB)}, ylabel={\acs{FER}},
			ymode=log,
			width=0.6\textwidth,
			height=0.45\textwidth,
			legend style={at={(0.5,-0.57)},anchor=south},
			legend cell align={left},
			]
			\addplot[Turquoise, line width=1pt, mark=*]
				table [col sep=comma, x=SNR, y=FER,
					discard if gt={SNR}{2.2},
				]
				{res/admm/fer_paper_margulis.csv};
			\addlegendentry{\acs{ADMM} (Barman et al.)}
			\addplot[NavyBlue, densely dashed, line width=1pt, mark=triangle]
				table [col sep=comma, x=SNR, y=FER,]
				{res/admm/ber_margulis264013203.csv};
			\addlegendentry{\acs{ADMM} (Own results)}
			\addplot[RoyalPurple, line width=1pt, mark=*]
				table [col sep=comma, x=SNR, y=FER, discard if gt={SNR}{2.2},]
				{res/generic/fer_bp_mackay_margulis.csv};
			\addlegendentry{\acs{BP} (Barman et al.)}
		\end{axis}
	\end{tikzpicture}

	\caption{Comparison of datapoints from Barman et al. with own simulation results.
	``Margulis'' \ac{LDPC} code with $n = 2640$, $k = 1320$
	\cite[\text{Margulis2640.1320.3}]{mackay_enc}}
	\label{fig:admm:results}
\end{figure}%
%
In figure \ref{fig:admm:bp_multiple}, \ac{FER} curves for \ac{LP} decoding
using \ac{ADMM} and \ac{BP} are shown for various codes.
To ensure comparability, in all cases the number of iterations was set to
$K=200$.
The values of the other parameters were chosen as $\mu = 5$, $\rho = 1$,
$\epsilon_\text{pri} = 10^{-5}$ and $\epsilon_\text{dual}=10^{-5}$.
Comparing the simulation results for the different codes, it is apparent that
the difference in decoding performance depends on the code considered.
For all codes considered here, however, the performance of \ac{LP} decoding
using \ac{ADMM} comes close to that of \ac{BP}, again staying within
approximately $\SI{0.5}{dB}$.
More simulation results are presented in figure \ref{fig:comp:prox_admm_dec}
in section \ref{sec:comp:res}.

\subsection{Computational Performance}
\label{subsec:admm:comp_perf}

In terms of time complexity, the three steps of the decoding algorithm
in equations (\ref{eq:admm:c_update})--(\ref{eq:admm:u_update}) have to be
considered.
The $\tilde{\boldsymbol{c}}$- and $\boldsymbol{u}_j$-update steps are
$\mathcal{O}\left( n \right)$ \cite[Sec. III. C.]{original_admm}.
The complexity of the $\boldsymbol{z}_j$-update step depends on the projection
algorithm employed.
Since the implementation completed for this work uses the projection algorithm
presented in \cite{original_admm}, the $\boldsymbol{z}_j$-update step
also has linear time complexity.

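A back-of-the-envelope way to see this linear scaling: for a $(d_v, d_c)$-regular code the Tanner graph has $n \cdot d_v$ edges, and each of the three update steps touches every edge a constant number of times per iteration. The function below is purely illustrative (its name and the fixed $d_v$ are assumptions, not part of the implementation):

```python
# Illustrative sketch: per-iteration work grows linearly in the block
# length n for a d_v-regular code, since the edge count is n * d_v.

def edges_per_iteration(n, dv=3):
    """Number of Tanner-graph edges touched per ADMM iteration."""
    return n * dv

# Doubling n doubles the per-iteration work, i.e. O(n) behavior.
assert edges_per_iteration(408) == 2 * edges_per_iteration(204)
```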
\begin{figure}[H]
	\centering

	\begin{tikzpicture}
		\begin{axis}[grid=both,
			xlabel={$n$}, ylabel={Time per frame (s)},
			width=0.6\textwidth,
			height=0.45\textwidth,
			legend style={at={(0.5,-0.42)},anchor=south},
			legend cell align={left},]

			\addplot[NavyBlue, only marks, mark=triangle*]
				table [col sep=comma, x=n, y=spf]
				{res/admm/fps_vs_n.csv};
		\end{axis}
	\end{tikzpicture}

	\caption{Timing requirements of the implementation of \ac{LP} decoding using \ac{ADMM}}
	\label{fig:admm:time}
\end{figure}%

Simulation results from a range of different codes can be used to verify this
analysis.
Figure \ref{fig:admm:time} shows the average time needed to decode one
frame as a function of its length.
The codes used for this consideration are the same as in section
\ref{subsec:prox:comp_perf}.
The results are necessarily skewed, because these codes vary not only
in their length, but also in their construction scheme and rate.
Additionally, different optimization opportunities arise depending on the
length of a code, since for smaller codes dynamic memory allocation can be
completely omitted.
This may explain why the datapoint at $n=504$ is higher than would be expected
with linear behavior.
Nonetheless, the simulation results roughly match the expected behavior
following from the theoretical considerations.

\begin{figure}[H]
	\centering
	\vspace*{5cm}
\end{figure}

\begin{figure}[H]
	\centering

	\begin{subfigure}[t]{0.48\textwidth}
		% (subfigure content elided in this diff view)
	\end{subfigure}

	\caption{Dependence of the average number of iterations required on the parameters
	$\mu$ and $\rho$ for $E_b / N_0 = \SI{4}{dB}$ for various codes}
	\label{fig:admm:mu_rho_multiple}
\end{figure}


\begin{figure}[H]
	\centering

	\begin{subfigure}[t]{0.48\textwidth}
		\centering

		\begin{tikzpicture}
			\begin{axis}[
				grid=both,
				xlabel={$E_b / N_0$ (dB)}, ylabel={FER},
				ymode=log,
				width=\textwidth,
				height=0.75\textwidth,
				ymax=1.5, ymin=3e-7,
				]

				\addplot[Turquoise, line width=1pt, mark=*]
					table [x=SNR, y=FER, col sep=comma, discard if not={mu}{3.0}]
					{res/admm/ber_2d_963965.csv};
				\addplot [RoyalPurple, mark=*, line width=1pt]
					table [x=SNR, y=FER, col sep=comma]
					{res/generic/bp_963965.csv};
			\end{axis}
		\end{tikzpicture}

		\caption{$\left( 3, 6 \right)$-regular \ac{LDPC} code with $n=96, k=48$
		\cite[\text{96.3.965}]{mackay_enc}}
	\end{subfigure}%
	\hfill%
	\begin{subfigure}[t]{0.48\textwidth}
		\centering

		\begin{tikzpicture}
			\begin{axis}[
				grid=both,
				xlabel={$E_b / N_0$ (dB)}, ylabel={FER},
				ymode=log,
				width=\textwidth,
				height=0.75\textwidth,
				ymax=1.5, ymin=3e-7,
				]

				\addplot[Turquoise, line width=1pt, mark=*]
					table [x=SNR, y=FER, col sep=comma, discard if not={mu}{3.0}]
					{res/admm/ber_2d_bch_31_26.csv};
				\addplot [RoyalPurple, mark=*, line width=1pt]
					table [x=SNR, y=FER, col sep=comma]
					{res/generic/bp_bch_31_26.csv};
			\end{axis}
		\end{tikzpicture}

		\caption{BCH code with $n=31, k=26$}
	\end{subfigure}%

	\vspace{3mm}

	\begin{subfigure}[t]{0.48\textwidth}
		\centering

		\begin{tikzpicture}
			\begin{axis}[
				grid=both,
				xlabel={$E_b / N_0$ (dB)}, ylabel={FER},
				ymode=log,
				ymax=1.5, ymin=8e-5,
				width=\textwidth,
				height=0.75\textwidth,
				]

				\addplot[Turquoise, line width=1pt, mark=*]
					table [x=SNR, y=FER, col sep=comma,
						discard if not={mu}{3.0},
						discard if gt={SNR}{5.5}]
					{res/admm/ber_2d_20433484.csv};
				\addplot [RoyalPurple, mark=*, line width=1pt]
					table [x=SNR, y=FER, col sep=comma]
					{res/generic/bp_20433484.csv};
			\end{axis}
		\end{tikzpicture}

		\caption{$\left( 3, 6 \right)$-regular \ac{LDPC} code with $n=204, k=102$
		\cite[\text{204.33.484}]{mackay_enc}}
	\end{subfigure}%
	\hfill%
	\begin{subfigure}[t]{0.48\textwidth}
		\centering

		\begin{tikzpicture}
			\begin{axis}[
				grid=both,
				xlabel={$E_b / N_0$ (dB)}, ylabel={FER},
				ymode=log,
				ymax=1.5, ymin=8e-5,
				width=\textwidth,
				height=0.75\textwidth,
				]

				\addplot[Turquoise, line width=1pt, mark=*]
					table [x=SNR, y=FER, col sep=comma, discard if not={mu}{3.0}]
					{res/admm/ber_2d_20455187.csv};
				\addplot [RoyalPurple, mark=*, line width=1pt]
					table [x=SNR, y=FER, col sep=comma,
						discard if gt={SNR}{5}]
					{res/generic/bp_20455187.csv};
			\end{axis}
		\end{tikzpicture}

		\caption{$\left( 5, 10 \right)$-regular \ac{LDPC} code with $n=204, k=102$
		\cite[\text{204.55.187}]{mackay_enc}}
	\end{subfigure}%

	\vspace{3mm}

	\begin{subfigure}[t]{0.48\textwidth}
		\centering

		\begin{tikzpicture}
			\begin{axis}[
				grid=both,
				xlabel={$E_b / N_0$ (dB)}, ylabel={FER},
				ymode=log,
				ymax=1.5, ymin=8e-5,
				width=\textwidth,
				height=0.75\textwidth,
				]

				\addplot[Turquoise, line width=1pt, mark=*]
					table [x=SNR, y=FER, col sep=comma, discard if not={mu}{3.0}]
					{res/admm/ber_2d_40833844.csv};
				\addplot [RoyalPurple, mark=*, line width=1pt]
					table [x=SNR, y=FER, col sep=comma,
						discard if gt={SNR}{3}]
					{res/generic/bp_40833844.csv};
			\end{axis}
		\end{tikzpicture}

		\caption{$\left( 3, 6 \right)$-regular \ac{LDPC} code with $n=408, k=204$
		\cite[\text{408.33.844}]{mackay_enc}}
	\end{subfigure}%
	\hfill%
	\begin{subfigure}[t]{0.48\textwidth}
		\centering

		\begin{tikzpicture}
			\begin{axis}[
				grid=both,
				xlabel={$E_b / N_0$ (dB)}, ylabel={FER},
				ymode=log,
				ymax=1.5, ymin=8e-5,
				width=\textwidth,
				height=0.75\textwidth,
				]

				\addplot[Turquoise, line width=1pt, mark=*]
					table [x=SNR, y=FER, col sep=comma, discard if not={mu}{3.0}]
					{res/admm/ber_2d_pegreg252x504.csv};
				\addplot [RoyalPurple, mark=*, line width=1pt]
					table [x=SNR, y=FER, col sep=comma,
						discard if gt={SNR}{3}]
					{res/generic/bp_pegreg252x504.csv};
			\end{axis}
		\end{tikzpicture}

		\caption{\ac{LDPC} code (progressive edge growth construction) with $n=504, k=252$
		\cite[\text{PEGReg252x504}]{mackay_enc}}
	\end{subfigure}%

	\vspace{5mm}

	\begin{subfigure}[t]{\textwidth}
		\centering

		\begin{tikzpicture}
			\begin{axis}[hide axis,
				xmin=10, xmax=50,
				ymin=0, ymax=0.4,
				legend columns=1,
				legend cell align={left},
				legend style={draw=white!15!black}]

				\addlegendimage{Turquoise, line width=1pt, mark=*}
				\addlegendentry{\acs{LP} decoding using \acs{ADMM}}
				\addlegendimage{RoyalPurple, line width=1pt, mark=*, solid}
				\addlegendentry{\acs{BP} (200 iterations)}
			\end{axis}
		\end{tikzpicture}
	\end{subfigure}

	\caption{Comparison of the decoding performance of \ac{LP} decoding using \ac{ADMM}
	and \ac{BP} for various codes}
	\label{fig:admm:bp_multiple}
\end{figure}

\chapter{Theoretical Background}%
\label{chapter:theoretical_background}

In this chapter, the theoretical background necessary to understand the
decoding algorithms examined in this work is given.
First, the notation used is clarified.
The physical layer is detailed: the modulation scheme and channel model used.
A short introduction to channel coding with binary linear codes and especially
\ac{LDPC} codes is given.
The established methods of decoding \ac{LDPC} codes are briefly explained.
Lastly, the general process of decoding using optimization techniques is
described and an overview of the utilized optimization methods is given.

Additionally, a shorthand notation will be used, denoting a set of indices as%
%
\begin{align*}
	% (start of definition elided in this diff view)
	\hspace{5mm} m < n, \hspace{2mm} m,n\in\mathbb{Z}
.\end{align*}
%
In order to designate element-wise operations, in particular the \textit{Hadamard product}
and the \textit{Hadamard power}, the operator $\circ$ will be used:%
%
\begin{alignat*}{3}
	% (definitions elided in this diff view)
\end{alignat*}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Channel Model and Modulation}
\label{sec:theo:Preliminaries: Channel Model and Modulation}

In order to transmit a bit-word $\boldsymbol{c} \in \mathbb{F}_2^n$ of length
% (intervening text elided in this diff view)
conducting this process, whereby \textit{data words} are mapped onto longer
\textit{codewords}, which carry redundant information.
\Ac{LDPC} codes have become especially popular, since they are able to
reach arbitrarily small probabilities of error at code rates up to the capacity
of the channel \cite[Sec. II.B.]{mackay_rediscovery}, while having a structure
that allows for very efficient decoding.

The lengths of the data words and codewords are denoted by $k\in\mathbb{N}$
% (intervening text elided in this diff view)
the number of parity-checks:%
%
\begin{align*}
	\mathcal{C} = \left\{ \boldsymbol{c} \in \mathbb{F}_2^n \mid
	\boldsymbol{H}\boldsymbol{c}^\text{T} = \boldsymbol{0} \right\}
.\end{align*}
%
A data word $\boldsymbol{u} \in \mathbb{F}_2^k$ can be mapped onto a codeword
$\boldsymbol{c} \in \mathbb{F}_2^n$ using the \textit{generator matrix}
$\boldsymbol{G} \in \mathbb{F}_2^{k\times n}$:%
%
\begin{align*}
	\boldsymbol{c} = \boldsymbol{u}\boldsymbol{G}
.\end{align*}
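These definitions can be illustrated with a small numerical sketch. The matrices below form a standard-form (7,4) Hamming pair, assumed purely for illustration and not necessarily the matrices used elsewhere in this work:

```python
# Toy example: encode a data word u with G = [I_k | P] and verify that
# the resulting codeword satisfies every parity-check, H c^T = 0.
import numpy as np

P = np.array([[1, 1, 0],
              [0, 1, 1],
              [1, 1, 1],
              [1, 0, 1]])                          # parity part
G = np.hstack([np.eye(4, dtype=int), P])           # generator, G = [I_k | P]
H = np.hstack([P.T, np.eye(3, dtype=int)])         # parity-check, H = [P^T | I]

u = np.array([1, 0, 1, 1])                         # data word
c = (u @ G) % 2                                    # codeword c = uG over F_2
assert np.all((H @ c) % 2 == 0)                    # all parity-checks satisfied
```

With this construction, $HG^\text{T} = P^\text{T} + P^\text{T} = \boldsymbol{0}$ over $\mathbb{F}_2$, so every encoded word lies in the code.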
codewords:%
%
\begin{align*}
	\hat{\boldsymbol{c}} &= \argmax_{c\in\mathcal{C}} p_{\boldsymbol{C} \mid \boldsymbol{Y}}
	\left( \boldsymbol{c} \mid \boldsymbol{y} \right) \\
	&= \argmax_{c\in\mathcal{C}} \frac{f_{\boldsymbol{Y} \mid \boldsymbol{C}}
	\left( \boldsymbol{y} \mid \boldsymbol{c} \right) p_{\boldsymbol{C}}
	\left( \boldsymbol{c} \right)}{f_{\boldsymbol{Y}}\left( \boldsymbol{y} \right) } \\
	&= \argmax_{c\in\mathcal{C}} f_{\boldsymbol{Y} \mid \boldsymbol{C}}
	\left( \boldsymbol{y} \mid \boldsymbol{c} \right) p_{\boldsymbol{C}}
	\left( \boldsymbol{c} \right) \\
	&= \argmax_{c\in\mathcal{C}}f_{\boldsymbol{Y} \mid \boldsymbol{C}}
	\left( \boldsymbol{y} \mid \boldsymbol{c} \right)
.\end{align*}
Each row of $\boldsymbol{H}$, which represents one parity-check, is viewed as a
\ac{CN}.
Each component of the codeword $\boldsymbol{c}$ is interpreted as a \ac{VN}.
The relationship between \acp{CN} and \acp{VN} can then be plotted by noting
which components of $\boldsymbol{c}$ are considered for which parity-check.
Figure \ref{fig:theo:tanner_graph} shows the Tanner graph for the
(7,4) Hamming code, which has the following parity-check matrix
\cite[Example 5.7.]{ryan_lin_2009}:%
%
% (parity-check matrix elided in this diff view)
\begin{figure}[H]
	\centering

	\begin{tikzpicture}
		% (Tanner graph drawing elided in this diff view)
		\draw (cn3) -- (c7);
	\end{tikzpicture}

	\caption{Tanner graph for the (7,4) Hamming code}
	\label{fig:theo:tanner_graph}
\end{figure}%
%
@@ -285,15 +285,16 @@ Message passing algorithms are based on the notion of passing messages between
 \acp{CN} and \acp{VN}.
 \Ac{BP} is one such algorithm that is commonly used to decode \ac{LDPC} codes.
 It aims to compute the posterior probabilities
-$p_{C_i \mid \boldsymbol{Y}}\left(c_i = 1 | \boldsymbol{y} \right),\hspace{2mm} i\in\mathcal{I}$
-\cite[Sec. III.]{mackay_rediscovery} and use them to calculate the estimate $\hat{\boldsymbol{c}}$.
+$p_{C_i \mid \boldsymbol{Y}}\left(c_i = 1 | \boldsymbol{y} \right),\hspace{2mm} i\in\mathcal{I}$,
+see \cite[Sec. III.]{mackay_rediscovery}, and use them to calculate the estimate
+$\hat{\boldsymbol{c}}$.
 For cycle-free graphs this goal is reached after a finite
 number of steps and \ac{BP} is equivalent to \ac{MAP} decoding.
-When the graph contains cycles, however, \ac{BP} only approximates the probabilities
+When the graph contains cycles, however, \ac{BP} only approximates the \ac{MAP} probabilities
 and is sub-optimal.
 This leads to generally worse performance than \ac{MAP} decoding for practical codes.
 Additionally, an \textit{error floor} appears for very high \acp{SNR}, making
-the use of \ac{BP} impractical for applications where a very low \ac{BER} is
+the use of \ac{BP} impractical for applications where a very low error rate is
 desired \cite[Sec. 15.3]{ryan_lin_2009}.
 Another popular decoding method for \ac{LDPC} codes is the
 \textit{min-sum algorithm}.
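The min-sum algorithm named at the end of the hunk above approximates the exact \ac{BP} check-node computation by a sign/minimum rule on LLR messages. A minimal sketch of that rule for a single check node — a hypothetical Python helper, not code from the thesis:

```python
def min_sum_check_update(llrs):
    # Min-sum check-node rule: each outgoing message takes the product of the
    # signs and the minimum magnitude of all OTHER incoming LLR messages.
    out = []
    for j in range(len(llrs)):
        others = llrs[:j] + llrs[j + 1:]
        sign = 1.0
        for v in others:
            sign = -sign if v < 0 else sign
        out.append(sign * min(abs(v) for v in others))
    return out
```

The rule avoids the hyperbolic-tangent computations of exact \ac{BP}, which is what makes min-sum attractive in hardware, at the price of some decoding performance.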
@@ -341,7 +342,7 @@ In contrast to the established message-passing decoding algorithms,
 the perspective then changes from observing the decoding process in its
 Tanner graph representation with \acp{VN} and \acp{CN} (as shown in figure \ref{fig:dec:tanner})
 to a spatial representation (figure \ref{fig:dec:spatial}),
-where the codewords are some of the edges of a hypercube.
+where the codewords are some of the vertices of a hypercube.
 The goal is to find the point $\tilde{\boldsymbol{c}}$,
 which minimizes the objective function $g$.

@@ -457,29 +458,38 @@ which minimizes the objective function $g$.


 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-\section{An introduction to the proximal gradient method and ADMM}
+\section{A Short Introduction to the Proximal Gradient Method and ADMM}
 \label{sec:theo:Optimization Methods}
+
+In this section, the general ideas behind the optimization methods used in
+this work are outlined.
+The application of these optimization methods to channel decoding
+will be discussed in later chapters.
+Two methods are introduced, the \textit{proximal gradient method} and
+\ac{ADMM}.

 \textit{Proximal algorithms} are algorithms for solving convex optimization
-problems, that rely on the use of \textit{proximal operators}.
+problems that rely on the use of \textit{proximal operators}.
 The proximal operator $\textbf{prox}_{\lambda f} : \mathbb{R}^n \rightarrow \mathbb{R}^n$
 of a function $f:\mathbb{R}^n \rightarrow \mathbb{R}$ is defined by
 \cite[Sec. 1.1]{proximal_algorithms}%
 %
 \begin{align*}
-\textbf{prox}_{\lambda f}\left( \boldsymbol{v} \right) = \argmin_{\boldsymbol{x}} \left(
-f\left( \boldsymbol{x} \right) + \frac{1}{2\lambda}\lVert \boldsymbol{x}
-- \boldsymbol{v} \rVert_2^2 \right)
+\textbf{prox}_{\lambda f}\left( \boldsymbol{v} \right)
+= \argmin_{\boldsymbol{x} \in \mathbb{R}^n} \left(
+f\left( \boldsymbol{x} \right) + \frac{1}{2\lambda}\lVert \boldsymbol{x}
+- \boldsymbol{v} \rVert_2^2 \right)
 .\end{align*}
 %
 This operator computes a point that is a compromise between minimizing $f$
 and staying in the proximity of $\boldsymbol{v}$.
-The parameter $\lambda$ determines how heavily each term is weighed.
-The \textit{proximal gradient method} is an iterative optimization method
+The parameter $\lambda$ determines how each term is weighed.
+The proximal gradient method is an iterative optimization method
 utilizing proximal operators, used to solve problems of the form%
 %
 \begin{align*}
-\text{minimize}\hspace{5mm}f\left( \boldsymbol{x} \right) + g\left( \boldsymbol{x} \right)
+\underset{\boldsymbol{x} \in \mathbb{R}^n}{\text{minimize}}\hspace{5mm}
+f\left( \boldsymbol{x} \right) + g\left( \boldsymbol{x} \right)
 \end{align*}
 %
 that consists of two steps: minimizing $f$ with gradient descent
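The two steps named at the end of the hunk above (a gradient step on the smooth part $f$, a proximal step on the non-smooth part $g$) can be exercised on a scalar toy problem. The setup below is assumed for illustration, not taken from the thesis: f(x) = 0.5*(x - 3)^2 and g(x) = |x|, whose proximal operator is the well-known soft-thresholding function; the minimizer of f + g is x = 2:

```python
def prox_l1(v, t):
    # Soft-thresholding: the proximal operator of t * |x|.
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def proximal_gradient(grad_f, prox_g, x0, step, iters=200):
    # One gradient step on the smooth part f, then one proximal step on g.
    x = x0
    for _ in range(iters):
        x = prox_g(x - step * grad_f(x), step)
    return x

# Assumed toy problem: minimize 0.5*(x - 3)**2 + |x|; its minimizer is x = 2.
x_star = proximal_gradient(
    grad_f=lambda x: x - 3.0,   # gradient of 0.5*(x - 3)^2
    prox_g=prox_l1,             # prox of |x|, with parameter = step size
    x0=0.0,
    step=0.5,
)
```

Because the prox step handles $g$ exactly, $g$ never needs to be differentiable — the same property the text later exploits to encode constraints via an indicator function.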
@@ -492,14 +502,14 @@ and minimizing $g$ using the proximal operator
 ,\end{align*}
 %
 Since $g$ is minimized with the proximal operator and is thus not required
-to be differentiable, it can be used to encode the constraints of the problem
+to be differentiable, it can be used to encode the constraints of the optimization problem
 (e.g., in the form of an \textit{indicator function}, as mentioned in
 \cite[Sec. 1.2]{proximal_algorithms}).

-The \ac{ADMM} is another optimization method.
+\ac{ADMM} is another optimization method.
 In this thesis it will be used to solve a \textit{linear program}, which
-is a special type of convex optimization problem, where the objective function
-is linear, and the constraints consist of linear equalities and inequalities.
+is a special type of convex optimization problem in which the objective function
+is linear and the constraints consist of linear equalities and inequalities.
 Generally, any linear program can be expressed in \textit{standard form}%
 \footnote{The inequality $\boldsymbol{x} \ge \boldsymbol{0}$ is to be
 interpreted componentwise.}
@@ -507,38 +517,53 @@ interpreted componentwise.}
 %
 \begin{alignat}{3}
 \begin{alignedat}{3}
-\text{minimize }\hspace{2mm} && \boldsymbol{\gamma}^\text{T} \boldsymbol{x} \\
+\underset{\boldsymbol{x}\in\mathbb{R}^n}{\text{minimize }}\hspace{2mm}
+&& \boldsymbol{\gamma}^\text{T} \boldsymbol{x} \\
 \text{subject to }\hspace{2mm} && \boldsymbol{A}\boldsymbol{x} & = \boldsymbol{b} \\
-&& \boldsymbol{x} & \ge \boldsymbol{0}.
+&& \boldsymbol{x} & \ge \boldsymbol{0},
 \end{alignedat}
 \label{eq:theo:admm_standard}
 \end{alignat}%
 %
-A technique called \textit{Lagrangian relaxation} \cite[Sec. 11.4]{intro_to_lin_opt_book}
-can then be applied.
+where $\boldsymbol{x}, \boldsymbol{\gamma} \in \mathbb{R}^n$, $\boldsymbol{b} \in \mathbb{R}^m$
+and $\boldsymbol{A}\in\mathbb{R}^{m \times n}$.
+A technique called \textit{Lagrangian relaxation} can then be applied
+\cite[Sec. 11.4]{intro_to_lin_opt_book}.
 First, some of the constraints are moved into the objective function itself
 and weights $\boldsymbol{\lambda}$ are introduced. A new, relaxed problem
-is then formulated as
+is formulated as
 %
 \begin{align}
 \begin{aligned}
-\text{minimize }\hspace{2mm} & \boldsymbol{\gamma}^\text{T}\boldsymbol{x}
-+ \boldsymbol{\lambda}^\text{T}\left(\boldsymbol{b}
-- \boldsymbol{A}\boldsymbol{x} \right) \\
+\underset{\boldsymbol{x}\in\mathbb{R}^n}{\text{minimize }}\hspace{2mm}
+& \boldsymbol{\gamma}^\text{T}\boldsymbol{x}
++ \boldsymbol{\lambda}^\text{T}\left(
+\boldsymbol{A}\boldsymbol{x} - \boldsymbol{b}\right) \\
 \text{subject to }\hspace{2mm} & \boldsymbol{x} \ge \boldsymbol{0},
 \end{aligned}
 \label{eq:theo:admm_relaxed}
 \end{align}%
 %
+the new objective function being the \textit{Lagrangian}%
+\footnote{
+Depending on what literature is consulted, the definition of the Lagrangian differs
+in the order of $\boldsymbol{A}\boldsymbol{x}$ and $\boldsymbol{b}$.
+As will subsequently be seen, however, the only property of the Lagrangian having
+any bearing on the optimization process is that minimizing it gives a lower bound
+on the optimal objective of the original problem.
+This property is satisfied no matter the order of the terms, and the order
+chosen here is the one used in the \ac{LP} decoding literature making use of
+\ac{ADMM}.
+}%
 %
 \begin{align*}
 \mathcal{L}\left( \boldsymbol{x}, \boldsymbol{\lambda} \right)
 = \boldsymbol{\gamma}^\text{T}\boldsymbol{x}
-+ \boldsymbol{\lambda}^\text{T}\left(\boldsymbol{b}
-- \boldsymbol{A}\boldsymbol{x} \right)
++ \boldsymbol{\lambda}^\text{T}\left(
+\boldsymbol{A}\boldsymbol{x} - \boldsymbol{b}\right)
 .\end{align*}%
 %

 This problem is not directly equivalent to the original one, as the
 solution now depends on the choice of the \textit{Lagrange multipliers}
 $\boldsymbol{\lambda}$.
@@ -562,12 +587,12 @@ Furthermore, for uniquely solvable linear programs \textit{strong duality}
 always holds \cite[Theorem 4.4]{intro_to_lin_opt_book}.
 This means that not only is it a lower bound, the tightest lower
 bound actually reaches the value itself:
-In other words, with the optimal choice of $\boldsymbol{\lambda}$,
+in other words, with the optimal choice of $\boldsymbol{\lambda}$,
 the optimal objectives of the problems (\ref{eq:theo:admm_relaxed})
-and (\ref{eq:theo:admm_standard}) have the same value.
+and (\ref{eq:theo:admm_standard}) have the same value, i.e.,
 %
 \begin{align*}
-\max_{\boldsymbol{\lambda}} \, \min_{\boldsymbol{x} \ge \boldsymbol{0}}
+\max_{\boldsymbol{\lambda}\in\mathbb{R}^m} \, \min_{\boldsymbol{x} \ge \boldsymbol{0}}
 \mathcal{L}\left( \boldsymbol{x}, \boldsymbol{\lambda} \right)
 = \min_{\substack{\boldsymbol{x} \ge \boldsymbol{0} \\ \boldsymbol{A}\boldsymbol{x}
 = \boldsymbol{b}}}
@@ -577,7 +602,7 @@ and (\ref{eq:theo:admm_standard}) have the same value.
 Thus, we can define the \textit{dual problem} as the search for the tightest lower bound:%
 %
 \begin{align}
-\underset{\boldsymbol{\lambda}}{\text{maximize }}\hspace{2mm}
+\underset{\boldsymbol{\lambda}\in\mathbb{R}^m}{\text{maximize }}\hspace{2mm}
 & \min_{\boldsymbol{x} \ge \boldsymbol{0}} \mathcal{L}
 \left( \boldsymbol{x}, \boldsymbol{\lambda} \right)
 \label{eq:theo:dual}
@@ -600,7 +625,7 @@ using equation (\ref{eq:theo:admm_obtain_primal}); then, update $\boldsymbol{\la
 using gradient descent \cite[Sec. 2.1]{distr_opt_book}:%
 %
 \begin{align*}
-\boldsymbol{x} &\leftarrow \argmin_{\boldsymbol{x}} \mathcal{L}\left(
+\boldsymbol{x} &\leftarrow \argmin_{\boldsymbol{x} \ge \boldsymbol{0}} \mathcal{L}\left(
 \boldsymbol{x}, \boldsymbol{\lambda} \right) \\
 \boldsymbol{\lambda} &\leftarrow \boldsymbol{\lambda}
 + \alpha\left( \boldsymbol{A}\boldsymbol{x} - \boldsymbol{b} \right),
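The two dual-ascent updates in the hunk above can be tried on a small example. Note that for a purely linear objective the inner argmin can be unbounded, so this sketch (with made-up numbers, not from the thesis) uses a strictly convex quadratic objective with a single equality constraint, for which the argmin step has a closed form:

```python
# Illustrative problem (not from the thesis): minimize 0.5*||x - c||^2
# subject to a^T x = b. For this instance the optimum is x = [2, 0] with
# Lagrange multiplier lam = 1.
c = [3.0, 1.0]
a = [1.0, 1.0]
b = 2.0
alpha = 0.4   # dual step size
lam = 0.0     # Lagrange multiplier

for _ in range(100):
    # x-update: argmin_x 0.5*||x - c||^2 + lam*(a^T x - b)  =>  x = c - lam*a
    x = [ci - lam * ai for ci, ai in zip(c, a)]
    # lambda-update: gradient ascent on the dual function; the gradient is
    # the constraint residual A x - b.
    residual = sum(ai * xi for ai, xi in zip(a, x)) - b
    lam += alpha * residual
```

The multiplier climbs toward the value that makes the minimizer of the Lagrangian feasible — exactly the "tightest lower bound" interpretation of the dual problem discussed above.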
@@ -608,12 +633,12 @@ using gradient descent \cite[Sec. 2.1]{distr_opt_book}:%
 .\end{align*}
 %
 The algorithm can be improved by observing that when the objective function
-$g: \mathbb{R}^n \rightarrow \mathbb{R}$ is separable into a number
-$N \in \mathbb{N}$ of sub-functions
+$g: \mathbb{R}^n \rightarrow \mathbb{R}$ is separable into a sum of
+$N \in \mathbb{N}$ sub-functions
 $g_i: \mathbb{R}^{n_i} \rightarrow \mathbb{R}$,
 i.e., $g\left( \boldsymbol{x} \right) = \sum_{i=1}^{N} g_i
 \left( \boldsymbol{x}_i \right)$,
-where $\boldsymbol{x}_i,\hspace{1mm} i\in [1:N]$ are subvectors of
+where $\boldsymbol{x}_i\in\mathbb{R}^{n_i},\hspace{1mm} i\in [1:N]$ are subvectors of
 $\boldsymbol{x}$, the Lagrangian is as well:
 %
 \begin{align*}
@@ -624,18 +649,18 @@ $\boldsymbol{x}$, the Lagrangian is as well:
 \begin{align*}
 \mathcal{L}\left( \left( \boldsymbol{x}_i \right)_{i=1}^N, \boldsymbol{\lambda} \right)
 = \sum_{i=1}^{N} g_i\left( \boldsymbol{x}_i \right)
-+ \boldsymbol{\lambda}^\text{T} \left( \boldsymbol{b}
-- \sum_{i=1}^{N} \boldsymbol{A}_i\boldsymbol{x_i} \right)
++ \boldsymbol{\lambda}^\text{T} \left(
+\sum_{i=1}^{N} \boldsymbol{A}_i\boldsymbol{x_i} - \boldsymbol{b}\right)
 .\end{align*}%
 %
-The matrices $\boldsymbol{A}_i, \hspace{1mm} i \in [1:N]$ are partitions of
-the matrix $\boldsymbol{A}$, corresponding to
+The matrices $\boldsymbol{A}_i \in \mathbb{R}^{m \times n_i}, \hspace{1mm} i \in [1:N]$
+form a partition of $\boldsymbol{A}$, corresponding to
 $\boldsymbol{A} = \begin{bmatrix}
 \boldsymbol{A}_1 &
 \ldots &
 \boldsymbol{A}_N
 \end{bmatrix}$.
-The minimization of each term can then happen in parallel, in a distributed
+The minimization of each term can happen in parallel, in a distributed
 fashion \cite[Sec. 2.2]{distr_opt_book}.
 In each minimization step, only one subvector $\boldsymbol{x}_i$ of
 $\boldsymbol{x}$ is considered, regarding all other subvectors as being
@@ -643,7 +668,7 @@ constant.
 This modified version of dual ascent is called \textit{dual decomposition}:
 %
 \begin{align*}
-\boldsymbol{x}_i &\leftarrow \argmin_{\boldsymbol{x}_i}\mathcal{L}\left(
+\boldsymbol{x}_i &\leftarrow \argmin_{\boldsymbol{x}_i \ge \boldsymbol{0}}\mathcal{L}\left(
 \left( \boldsymbol{x}_i \right)_{i=1}^N, \boldsymbol{\lambda}\right)
 \hspace{5mm} \forall i \in [1:N]\\
 \boldsymbol{\lambda} &\leftarrow \boldsymbol{\lambda}
@@ -657,14 +682,15 @@ This modified version of dual ascent is called \textit{dual decomposition}:
 It only differs in the use of an \textit{augmented Lagrangian}
 $\mathcal{L}_\mu\left( \left( \boldsymbol{x} \right)_{i=1}^N, \boldsymbol{\lambda} \right)$
 in order to strengthen the convergence properties.
-The augmented Lagrangian extends the ordinary one with an additional penalty term
-with the penaly parameter $\mu$:
+The augmented Lagrangian extends the classical one with an additional penalty term
+with the penalty parameter $\mu$:
 %
 \begin{align*}
 \mathcal{L}_\mu \left( \left( \boldsymbol{x} \right)_{i=1}^N, \boldsymbol{\lambda} \right)
 = \underbrace{\sum_{i=1}^{N} g_i\left( \boldsymbol{x_i} \right)
-+ \boldsymbol{\lambda}^\text{T}\left( \boldsymbol{b}
-- \sum_{i=1}^{N} \boldsymbol{A}_i\boldsymbol{x}_i \right)}_{\text{Ordinary Lagrangian}}
++ \boldsymbol{\lambda}^\text{T}\left(\sum_{i=1}^{N}
+\boldsymbol{A}_i\boldsymbol{x}_i - \boldsymbol{b}\right)}
+_{\text{Classical Lagrangian}}
 + \underbrace{\frac{\mu}{2}\left\Vert \sum_{i=1}^{N} \boldsymbol{A}_i\boldsymbol{x}_i
 - \boldsymbol{b} \right\Vert_2^2}_{\text{Penalty term}},
 \hspace{5mm} \mu > 0
@@ -674,21 +700,20 @@ The steps to solve the problem are the same as with dual decomposition, with the
 condition that the step size be $\mu$:%
 %
 \begin{align*}
-\boldsymbol{x}_i &\leftarrow \argmin_{\boldsymbol{x}_i}\mathcal{L}_\mu\left(
+\boldsymbol{x}_i &\leftarrow \argmin_{\boldsymbol{x}_i \ge \boldsymbol{0}}\mathcal{L}_\mu\left(
 \left( \boldsymbol{x} \right)_{i=1}^N, \boldsymbol{\lambda}\right)
 \hspace{5mm} \forall i \in [1:N]\\
 \boldsymbol{\lambda} &\leftarrow \boldsymbol{\lambda}
 + \mu\left( \sum_{i=1}^{N} \boldsymbol{A}_i\boldsymbol{x}_i
 - \boldsymbol{b} \right),
-\hspace{5mm} \mu > 0
 % \boldsymbol{x}_1 &\leftarrow \argmin_{\boldsymbol{x}_1}\mathcal{L}_\mu\left(
 % \boldsymbol{x}_1, \boldsymbol{x_2}, \boldsymbol{\lambda}\right) \\
 % \boldsymbol{x}_2 &\leftarrow \argmin_{\boldsymbol{x}_2}\mathcal{L}_\mu\left(
 % \boldsymbol{x}_1, \boldsymbol{x_2}, \boldsymbol{\lambda}\right) \\
 % \boldsymbol{\lambda} &\leftarrow \boldsymbol{\lambda}
 % + \mu\left( \boldsymbol{A}_1\boldsymbol{x}_1 + \boldsymbol{A}_2\boldsymbol{x}_2
 % - \boldsymbol{b} \right),
 % \hspace{5mm} \mu > 0
 .\end{align*}
 %

 In subsequent chapters, the decoding problem will be reformulated as an
 optimization problem using two different methodologies.
 In chapter \ref{chapter:proximal_decoding}, a non-convex optimization approach
 is chosen and addressed using the proximal gradient method.
 In chapter \ref{chapter:lp_dec_using_admm}, an \ac{LP} based optimization problem is
 formulated and solved using \ac{ADMM}.

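The augmented-Lagrangian updates above can be tried end-to-end on a toy linear program. A sketch with illustrative values (not from the thesis): minimize x1 + 2*x2 subject to x1 + x2 = 1 and x >= 0, split into two scalar blocks minimized one after the other, in the spirit of the commented-out two-block updates shown in the hunk above:

```python
# Toy LP (illustrative values): minimize x1 + 2*x2
# subject to x1 + x2 = 1, x >= 0. Optimal point: x = [1, 0].
gamma = [1.0, 2.0]
mu = 1.0            # penalty parameter, also used as the dual step size
x = [0.0, 0.0]
lam = 0.0

for _ in range(50):
    for i in range(2):
        other = x[1 - i]
        # Closed-form block update:
        # argmin_{x_i >= 0} gamma[i]*x_i + lam*(x_i + other - 1)
        #                   + (mu/2)*(x_i + other - 1)**2
        x[i] = max(0.0, 1.0 - other - (gamma[i] + lam) / mu)
    lam += mu * (x[0] + x[1] - 1.0)   # dual update with step size mu
```

The quadratic penalty keeps each block update bounded even though the objective is linear — the property that makes the augmented Lagrangian (and \ac{ADMM}) applicable to \ac{LP} decoding where plain dual ascent is not.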
BIN latex/thesis/thesis.pdf (new file)
Binary file not shown.
@@ -14,7 +14,7 @@
 \thesisSupervisor{Dr.-Ing. Holger Jäkel}
 \thesisStartDate{24.10.2022}
 \thesisEndDate{24.04.2023}
-\thesisSignatureDate{Signature date} % TODO: Signature date
+\thesisSignatureDate{24.04.2023}
 \thesisLanguage{english}
 \setlanguage

@@ -35,6 +35,7 @@
 \usetikzlibrary{spy}
 \usetikzlibrary{shapes.geometric}
 \usetikzlibrary{arrows.meta,arrows}
+\tikzset{>=latex}

 \pgfplotsset{compat=newest}
 \usepgfplotslibrary{colorbrewer}
@@ -209,6 +210,7 @@
 %
 % 6. Conclusion

+\include{chapters/acknowledgements}

 \tableofcontents
 \cleardoublepage % make sure multipage TOCs are numbered correctly
@@ -220,7 +222,7 @@
 \include{chapters/comparison}
 % \include{chapters/discussion}
 \include{chapters/conclusion}
-\include{chapters/appendix}
+% \include{chapters/appendix}


 %\listoffigures
