\chapter{Proximal Decoding}%
\label{chapter:proximal_decoding}
In this chapter, the proximal decoding algorithm is examined. First, the algorithm itself is described. Then, some noteworthy aspects of the implementation are presented. Simulation results are shown, on the basis of which the behaviour of the algorithm is investigated for different codes and parameters. Finally, an improvement on proximal decoding is proposed.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Decoding Algorithm}%
\label{sec:prox:Decoding Algorithm}
Proximal decoding was proposed by Wadayama et al. as a novel formulation of optimization-based decoding \cite{proximal_paper}. With this algorithm, the minimization is performed using the proximal gradient method. In contrast to \ac{LP} decoding, the objective function is based on a non-convex optimization formulation of the \ac{MAP} decoding problem.

In order to derive the objective function, the authors begin with the \ac{MAP} decoding rule, expressed as a continuous maximization problem%
\footnote{Expanding the domain to be continuous does not materially change the meaning of the rule. The only change is that what previously were \acp{PMF} now have to be expressed in terms of \acp{PDF}.}
over $\tilde{\boldsymbol{x}}$:%
%
\begin{align}
	\hat{\boldsymbol{x}}
	= \argmax_{\tilde{\boldsymbol{x}} \in \mathbb{R}^{n}}
	f_{\tilde{\boldsymbol{X}} \mid \boldsymbol{Y}} \left( \tilde{\boldsymbol{x}} \mid \boldsymbol{y} \right)
	= \argmax_{\tilde{\boldsymbol{x}} \in \mathbb{R}^{n}}
	f_{\boldsymbol{Y} \mid \tilde{\boldsymbol{X}}} \left( \boldsymbol{y} \mid \tilde{\boldsymbol{x}} \right)
	f_{\tilde{\boldsymbol{X}}}\left( \tilde{\boldsymbol{x}} \right)%
	\label{eq:prox:vanilla_MAP}
.\end{align}%
%
The likelihood $f_{\boldsymbol{Y} \mid \tilde{\boldsymbol{X}}} \left( \boldsymbol{y} \mid \tilde{\boldsymbol{x}} \right) $ is a known function determined by the channel model. The prior \ac{PDF} $f_{\tilde{\boldsymbol{X}}}\left( \tilde{\boldsymbol{x}} \right)$ is also known, as the equal probability assumption is made on $\mathcal{C}$. However, since the considered domain is continuous, the prior \ac{PDF} cannot be dropped as a constant during the optimization, as is often done, and has a rather unwieldy representation:%
%
\begin{align}
	f_{\tilde{\boldsymbol{X}}}\left( \tilde{\boldsymbol{x}} \right)
	= \frac{1}{\left| \mathcal{C} \right| } \sum_{\boldsymbol{c} \in \mathcal{C} }
	\delta\big( \tilde{\boldsymbol{x}} - \left( -1 \right) ^{\boldsymbol{c}}\big)
	\label{eq:prox:prior_pdf}
.\end{align}%
%
In order to rewrite the prior \ac{PDF} $f_{\tilde{\boldsymbol{X}}}\left( \tilde{\boldsymbol{x}} \right)$, the so-called \textit{code-constraint polynomial} is introduced as:%
%
\begin{align*}
	h\left( \tilde{\boldsymbol{x}} \right)
	= \underbrace{\sum_{i=1}^{n} \left( \tilde{x}_i^2-1 \right) ^2}_{\text{Bipolar constraint}}
	+ \underbrace{\sum_{j=1}^{m} \left[ \left( \prod_{i\in N_c \left( j \right) } \tilde{x}_i \right) -1 \right] ^2}_{\text{Parity constraint}}%
.\end{align*}%
%
This function penalizes vectors far from any codeword and favors those close to one. To achieve this, the polynomial is composed of two parts: one term representing the bipolar constraint, pushing the solution of the continuous optimization problem towards discrete (bipolar) values, and one term representing the parity constraints, capturing the checks defined by the parity-check matrix $\boldsymbol{H}$.
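To make the role of the two terms more tangible, the following minimal sketch evaluates the code-constraint polynomial for a given estimate. It assumes that $\boldsymbol{H}$ is available as a dense 0/1 NumPy array; the function name and this representation are merely illustrative and do not reflect the implementation discussed later in this chapter.
\begin{lstlisting}[language=Python]
import numpy as np

def code_constraint_poly(x_tilde, H):
    """Evaluate h(x) for a real-valued estimate x_tilde (shape (n,)) and a
    parity-check matrix H given as a dense 0/1 array of shape (m, n)."""
    # Bipolar constraint: every component should lie at +1 or -1.
    bipolar = np.sum((x_tilde**2 - 1.0) ** 2)
    # Parity constraint: the product over each check neighbourhood N_c(j)
    # should equal +1; entries outside the neighbourhood are replaced by 1,
    # which leaves the product unchanged.
    masked = np.where(H == 1, x_tilde, 1.0)
    parity = np.sum((np.prod(masked, axis=1) - 1.0) ** 2)
    return bipolar + parity
\end{lstlisting}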
The prior \ac{PDF} is then approximated using the code-constraint polynomial as%
%
\begin{align}
	f_{\tilde{\boldsymbol{X}}}\left( \tilde{\boldsymbol{x}} \right) \approx
	\frac{1}{Z}\mathrm{e}^{-\gamma h\left( \tilde{\boldsymbol{x}} \right) }%
	\label{eq:prox:prior_pdf_approx}
,\end{align}%
%
where $Z$ is a normalization constant. The authors justify this approximation by arguing that for $\gamma \rightarrow \infty$, the approximation in equation (\ref{eq:prox:prior_pdf_approx}) approaches the original function in equation (\ref{eq:prox:prior_pdf}). This approximation can then be plugged into equation (\ref{eq:prox:vanilla_MAP}) and the likelihood can be rewritten using the negative log-likelihood $L \left( \boldsymbol{y} \mid \tilde{\boldsymbol{x}} \right) = -\ln\left( f_{\boldsymbol{Y} \mid \tilde{\boldsymbol{X}}}\left( \boldsymbol{y} \mid \tilde{\boldsymbol{x}} \right) \right) $:%
%
\begin{align*}
	\hat{\boldsymbol{x}} &= \argmax_{\tilde{\boldsymbol{x}} \in \mathbb{R}^{n}}
	\mathrm{e}^{- L\left( \boldsymbol{y} \mid \tilde{\boldsymbol{x}} \right) }
	\mathrm{e}^{-\gamma h\left( \tilde{\boldsymbol{x}} \right) } \\
	&= \argmin_{\tilde{\boldsymbol{x}} \in \mathbb{R}^n}
	L\left( \boldsymbol{y} \mid \tilde{\boldsymbol{x}} \right) + \gamma h\left( \tilde{\boldsymbol{x}} \right)%
.\end{align*}%
%
Thus, with proximal decoding, the objective function $g\left( \tilde{\boldsymbol{x}} \right)$ considered is%
%
\begin{align}
	g\left( \tilde{\boldsymbol{x}} \right) =
	L\left( \boldsymbol{y} \mid \tilde{\boldsymbol{x}} \right) + \gamma h\left( \tilde{\boldsymbol{x}} \right)%
	\label{eq:prox:objective_function}
\end{align}%
%
and the decoding problem is reformulated to%
%
\begin{align*}
	\text{minimize}\hspace{2mm} &L\left( \boldsymbol{y} \mid \tilde{\boldsymbol{x}} \right) + \gamma h\left( \tilde{\boldsymbol{x}} \right)\\
	\text{subject to}\hspace{2mm} &\tilde{\boldsymbol{x}} \in \mathbb{R}^n
.\end{align*}
%
For the solution of the approximate \ac{MAP} decoding problem, the two parts of equation (\ref{eq:prox:objective_function}) are considered separately: the minimization of the objective function occurs in an alternating fashion, switching between the negative log-likelihood $L\left( \boldsymbol{y} \mid \tilde{\boldsymbol{x}} \right) $ and the scaled code-constraint polynomial $\gamma h\left( \tilde{\boldsymbol{x}} \right) $. Two helper variables, $\boldsymbol{r}$ and $\boldsymbol{s}$, are introduced, describing the result of each of the two steps. The first step, minimizing the negative log-likelihood, is performed using gradient descent:%
%
\begin{align}
	\boldsymbol{r} \leftarrow \boldsymbol{s} - \omega \nabla L\left( \boldsymbol{y} \mid \boldsymbol{s} \right),
	\hspace{5mm}\omega > 0
	\label{eq:prox:step_log_likelihood}
.\end{align}%
%
For the second step, minimizing the scaled code-constraint polynomial, the proximal gradient method is used and the \textit{proximal operator} of $\gamma h\left( \tilde{\boldsymbol{x}} \right) $ has to be computed.
It is then immediately approximated by a single gradient-descent step:%
%
\begin{align*}
	\textbf{prox}_{\gamma h} \left( \tilde{\boldsymbol{x}} \right)
	&\equiv \argmin_{\boldsymbol{t} \in \mathbb{R}^n}
	\gamma h\left( \boldsymbol{t} \right) + \frac{1}{2} \left\Vert \boldsymbol{t} - \tilde{\boldsymbol{x}} \right\Vert^2 \\
	&\approx \tilde{\boldsymbol{x}} - \gamma \nabla h \left( \tilde{\boldsymbol{x}} \right),
	\hspace{5mm} \gamma > 0, \text{ small}
.\end{align*}%
%
The second optimization step thus becomes%
%
\begin{align*}
	\boldsymbol{s} \leftarrow \boldsymbol{r} - \gamma \nabla h\left( \boldsymbol{r} \right),
	\hspace{5mm}\gamma > 0,\text{ small}
.\end{align*}
%
While the approximation of the prior \ac{PDF} made in equation (\ref{eq:prox:prior_pdf_approx}) theoretically becomes better with larger $\gamma$, the constraint that $\gamma$ be small is important, as it keeps the effect of $h\left( \tilde{\boldsymbol{x}} \right) $ on the landscape of the objective function small. Otherwise, unwanted stationary points, including local minima, are introduced. The authors state that ``in practice, the value of $\gamma$ should be adjusted according to the decoding performance'' \cite[Sec. 3.1]{proximal_paper}.

In the case of \ac{AWGN}, the likelihood $f_{\boldsymbol{Y} \mid \tilde{\boldsymbol{X}}} \left( \boldsymbol{y} \mid \tilde{\boldsymbol{x}} \right)$ is%
%
\begin{align*}
	f_{\boldsymbol{Y} \mid \tilde{\boldsymbol{X}}} \left( \boldsymbol{y} \mid \tilde{\boldsymbol{x}} \right)
	= \frac{1}{\left( 2\pi\sigma^2 \right)^{n / 2}}\mathrm{e}^{
	-\frac{\lVert \boldsymbol{y}-\tilde{\boldsymbol{x}} \rVert^2 }
	{2\sigma^2}}
.\end{align*}
%
Thus, the gradient of the negative log-likelihood becomes%
\footnote{For the minimization, constants can be disregarded. For this reason, it suffices to consider only proportionality instead of equality.}%
%
\begin{align*}
	\nabla L \left( \boldsymbol{y} \mid \tilde{\boldsymbol{x}} \right)
	&\propto \nabla \lVert \boldsymbol{y} - \tilde{\boldsymbol{x}} \rVert^2\\
	&\propto \tilde{\boldsymbol{x}} - \boldsymbol{y}
,\end{align*}%
%
allowing equation (\ref{eq:prox:step_log_likelihood}) to be rewritten as%
%
\begin{align*}
	\boldsymbol{r} \leftarrow \boldsymbol{s} - \omega \left( \boldsymbol{s} - \boldsymbol{y} \right)
.\end{align*}
%
One thing to consider during the actual decoding process is that the gradient of the code-constraint polynomial can take on extremely large values. To avoid numerical instability, an additional step is added, in which all components of the current estimate are clipped to $\left[-\eta, \eta \right]$, where $\eta$ is a positive constant slightly larger than one:%
%
\begin{align*}
	\boldsymbol{s} \leftarrow \Pi_{\eta} \left( \boldsymbol{r} - \gamma \nabla h\left( \boldsymbol{r} \right) \right)
,\end{align*}
%
with $\Pi_{\eta}\left( \cdot \right) $ expressing the projection onto $\left[ -\eta, \eta \right]^n$. The iterative decoding process resulting from these considerations is summarized in algorithm \ref{alg:prox}.
\begin{genericAlgorithm}[caption={Proximal decoding algorithm for an \ac{AWGN} channel}, label={alg:prox}]
$\boldsymbol{s} \leftarrow \boldsymbol{0}$
for $K$ iterations do
	$\boldsymbol{r} \leftarrow \boldsymbol{s} - \omega \left( \boldsymbol{s} - \boldsymbol{y} \right) $
	$\boldsymbol{s} \leftarrow \Pi_\eta \left(\boldsymbol{r} - \gamma \nabla h\left( \boldsymbol{r} \right) \right)$
	$\hat{\boldsymbol{c}} \leftarrow \frac{1}{2} \left( \boldsymbol{1} - \text{sign}\left( \boldsymbol{s} \right) \right) $
	if $\boldsymbol{H}\hat{\boldsymbol{c}} = \boldsymbol{0}$ then
		return $\hat{\boldsymbol{c}}$
	end if
end for
return $\hat{\boldsymbol{c}}$
\end{genericAlgorithm}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Implementation Details}%
\label{sec:prox:Implementation Details}
The algorithm was first implemented in Python because of the fast development process and straightforward debugging. It was subsequently reimplemented in C++ using the Eigen%
\footnote{\url{https://eigen.tuxfamily.org}}
linear algebra library to achieve higher performance. The focus was placed on a fast implementation, sometimes at the expense of memory usage, which somewhat limits the size of the codes the implementation can be used with \todo{Is this appropriate for a bachelor's thesis?}. The evaluation of the simulation results has been wholly realized in Python.

The gradient of the code-constraint polynomial \cite[Sec. 2.3]{proximal_paper} is given by%
%
\begin{align*}
	\nabla h\left( \tilde{\boldsymbol{x}} \right) &=
	\begin{bmatrix}
		\frac{\partial}{\partial \tilde{x}_1}h\left( \tilde{\boldsymbol{x}} \right) &
		\ldots &
		\frac{\partial}{\partial \tilde{x}_n}h\left( \tilde{\boldsymbol{x}} \right)
	\end{bmatrix}^\text{T}, \\[1em]
	\frac{\partial}{\partial \tilde{x}_k}h\left( \tilde{\boldsymbol{x}} \right) &=
	4\left( \tilde{x}_k^2 - 1 \right) \tilde{x}_k + \frac{2}{\tilde{x}_k}
	\sum_{j\in N_v\left( k \right) }\left( \left( \prod_{i \in N_c\left( j \right)} \tilde{x}_i \right)^2
	- \prod_{i\in N_c\left( j \right) } \tilde{x}_i \right)
.\end{align*}
%
Since the products $\prod_{i\in N_c\left( j \right) } \tilde{x}_i,\hspace{2mm}j\in \mathcal{J}$ are the same for all components $\tilde{x}_k$ of $\tilde{\boldsymbol{x}}$, they can be precomputed. Defining%
%
\begin{align*}
	\boldsymbol{p} :=
	\begin{bmatrix}
		\prod_{i\in N_c\left( 1 \right) } \tilde{x}_i \\
		\vdots \\
		\prod_{i\in N_c\left( m \right) } \tilde{x}_i \\
	\end{bmatrix}
	\hspace{5mm} \text{and} \hspace{5mm}
	\boldsymbol{v} := \boldsymbol{p}^{\circ 2} - \boldsymbol{p}
,\end{align*}
%
the gradient can be written as%
%
\begin{align*}
	\nabla h\left( \tilde{\boldsymbol{x}} \right) =
	4\left( \tilde{\boldsymbol{x}}^{\circ 3} - \tilde{\boldsymbol{x}} \right)
	+ 2\tilde{\boldsymbol{x}}^{\circ -1} \circ \boldsymbol{H}^\text{T} \boldsymbol{v}
,\end{align*}
%
enabling the computation of the gradient primarily with element-wise operations and matrix-vector multiplication. This is beneficial, as the libraries used for the implementation are heavily optimized for such calculations (e.g., through vectorization of the operations). \todo{Note about how the equation with which the gradient is calculated is itself similar to a message-passing rule}

The projection $\Pi_{\eta}\left( \cdot \right)$ also proves straightforward to compute, as it amounts to simply clipping each component of the vector to $\left[ -\eta, \eta \right]$ individually.
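To tie the above together, the following minimal NumPy sketch combines the vectorized gradient computation with the iteration loop of algorithm \ref{alg:prox}. The dense representation of $\boldsymbol{H}$, the function names and the default parameter values are illustrative assumptions; the Python and C++/Eigen implementations produced for this work may differ in detail.
\begin{lstlisting}[language=Python]
import numpy as np

def grad_h(x, H):
    """Gradient of the code-constraint polynomial, using only element-wise
    operations and one matrix-vector product (H: dense 0/1 array, shape (m, n)).
    The components of x are assumed to be nonzero."""
    masked = np.where(H == 1, x, 1.0)
    p = np.prod(masked, axis=1)        # p_j = product of x_i over N_c(j)
    v = p**2 - p                       # v = p^2 - p (element-wise)
    return 4.0 * (x**3 - x) + (2.0 / x) * (H.T @ v)

def proximal_decode(y, H, omega=0.05, gamma=0.05, eta=1.5, K=200):
    """Proximal decoding for an AWGN channel, following algorithm alg:prox."""
    s = np.zeros_like(y)
    c_hat = np.zeros(H.shape[1], dtype=int)
    for _ in range(K):
        r = s - omega * (s - y)                           # step on the negative log-likelihood
        s = np.clip(r - gamma * grad_h(r, H), -eta, eta)  # approximate proximal step + projection
        c_hat = ((1 - np.sign(s)) / 2).astype(int)        # bipolar -> binary estimate
        if not np.any(H @ c_hat % 2):                     # parity check H c = 0 (mod 2)
            break
    return c_hat
\end{lstlisting}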
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Analysis and Simulation Results}%
\label{sec:prox:Analysis and Simulation Results}
In this section, the general behaviour of the proximal decoding algorithm is analyzed. The impact of the parameters $\gamma$, $\omega$, $K$ and $\eta$ is examined. The decoding performance is assessed on the basis of the \ac{BER} and the \ac{FER} as well as the \textit{decoding failure rate}, i.e., the rate at which the algorithm produces invalid results. The convergence properties are reviewed and related to the decoding performance. Finally, the computational performance is examined both theoretically and on the basis of the implementation completed in the context of this work.

All simulation results presented hereafter are based on Monte Carlo simulations. The \ac{BER} and \ac{FER} curves in particular have been generated by producing at least 100 frame errors for each data point, unless otherwise stated. \todo{Mention number of datapoints from which each graph was created for non ber and fer curves}

\subsection{Choice of Parameters}
First, the effect of the parameter $\gamma$ is investigated. Figure \ref{fig:prox:results} shows a comparison of the decoding performance of the proximal decoding algorithm as presented by Wadayama et al. in \cite{proximal_paper} and the implementation realized for this work.

\noindent The \ac{BER} curves for three different choices of the parameter $\gamma$ are shown, as well as the \ac{BER} curve resulting from decoding using \ac{BP}, as a reference. The results from Wadayama et al. are shown with solid lines, while the newly generated ones are shown with dashed lines.
\begin{figure}[H]
	\centering
	\begin{tikzpicture}
		\begin{axis}[grid=both,
			grid style={line width=.1pt},
			xlabel={$E_b / N_0$ (dB)},
			ylabel={BER},
			ymode=log,
			legend style={at={(0.5,-0.7)},anchor=south},
			width=0.6\textwidth,
			height=0.45\textwidth,
			ymax=1.2, ymin=0.8e-4,
			xtick={1, 2, ..., 5},
			xmin=0.9, xmax=5.6,
			legend columns=2,]
			\addplot [ForestGreen, mark=*, line width=1pt]
				table [x=SNR, y=gamma_0_15, col sep=comma] {res/proximal/ber_paper.csv};
			\addlegendentry{$\gamma = 0.15$ (Wadayama et al.)}
			\addplot [ForestGreen, mark=triangle, dashed, line width=1pt]
				table [x=SNR, y=BER, col sep=comma,
				discard if not={gamma}{0.15},
				discard if gt={SNR}{5.5},]
				{res/proximal/2d_ber_fer_dfr_20433484.csv};
			\addlegendentry{$\gamma = 0.15$ (Own results)}
			\addplot [NavyBlue, mark=*, line width=1pt]
				table [x=SNR, y=gamma_0_01, col sep=comma] {res/proximal/ber_paper.csv};
			\addlegendentry{$\gamma = 0.01$ (Wadayama et al.)}
			\addplot [NavyBlue, mark=triangle, dashed, line width=1pt]
				table [x=SNR, y=BER, col sep=comma,
				discard if not={gamma}{0.01},
				discard if gt={SNR}{5.5},]
				{res/proximal/2d_ber_fer_dfr_20433484.csv};
			\addlegendentry{$\gamma = 0.01$ (Own results)}
			\addplot [RedOrange, mark=*, line width=1pt]
				table [x=SNR, y=gamma_0_05, col sep=comma] {res/proximal/ber_paper.csv};
			\addlegendentry{$\gamma = 0.05$ (Wadayama et al.)}
			\addplot [RedOrange, mark=triangle, dashed, line width=1pt]
				table [x=SNR, y=BER, col sep=comma,
				discard if not={gamma}{0.05},
				discard if gt={SNR}{5.5},]
				{res/proximal/2d_ber_fer_dfr_20433484.csv};
			\addlegendentry{$\gamma = 0.05$ (Own results)}
			\addplot [RoyalPurple, mark=*, line width=1pt]
				table [x=SNR, y=BP, col sep=comma] {res/proximal/ber_paper.csv};
			\addlegendentry{BP (Wadayama et al.)}
		\end{axis}
	\end{tikzpicture}
	\caption{Comparison of datapoints from Wadayama et al.
		with own simulation results%
		\protect\footnotemark{}}
	\label{fig:prox:results}
\end{figure}
%
\footnotetext{(3,6)-regular LDPC code with $n = 204$, $k = 102$ \cite[\text{204.33.484}]{mackay_enc};
	$\omega = 0.05, K=200, \eta=1.5$ }%
%
\noindent It is noticeable that for a moderately chosen value of $\gamma$ ($\gamma = 0.05$) the decoding performance is better than for low ($\gamma = 0.01$) or high ($\gamma = 0.15$) values. The question arises whether there is some optimal value maximizing the decoding performance, especially since the performance seems to depend dramatically on $\gamma$.

To better understand how $\gamma$ and the decoding performance are related, figure \ref{fig:prox:results} was recreated, but with a considerably larger selection of values for $\gamma$. In this new graph, shown in figure \ref{fig:prox:results_3d}, instead of stacking the \ac{BER} curves on top of one another in the same plot, the visualization is extended to three dimensions. The previously shown results are highlighted. Evidently, while the decoding performance does depend on the value of $\gamma$, there is no single value offering optimal performance, but rather a certain interval in which the performance stays largely unchanged. When examining a number of different codes (figure \ref{fig:prox:results_3d_multiple}), it is apparent that while the exact landscape of the graph depends on the code, the general behaviour is the same in each case.
\begin{figure}[H]
	\centering
	\begin{tikzpicture}
		\begin{axis}[view={75}{30},
			zmode=log,
			xlabel={$E_b / N_0$ (dB)},
			ylabel={$\gamma$},
			zlabel={BER},
			%legend pos=outer north east,
			legend style={at={(0.5,-0.7)},anchor=south},
			ytick={0, 0.05, 0.1, 0.15},
			width=0.6\textwidth,
			height=0.45\textwidth,]
			\addplot3[surf, mesh/rows=17, mesh/cols=14, colormap/viridis]
				table [col sep=comma, x=SNR, y=gamma, z=BER]
				{res/proximal/2d_ber_fer_dfr_20433484.csv};
			\addlegendentry{$\gamma = \left[ 0\text{:}0.01\text{:}0.16 \right] $}
			\addplot3[NavyBlue, line width=1.5]
				table [col sep=comma, discard if not={gamma}{0.01}, x=SNR, y=gamma, z=BER]
				{res/proximal/2d_ber_fer_dfr_20433484.csv};
			\addlegendentry{$\gamma = 0.01$}
			\addplot3[RedOrange, line width=1.5]
				table [col sep=comma, discard if not={gamma}{0.05}, x=SNR, y=gamma, z=BER]
				{res/proximal/2d_ber_fer_dfr_20433484.csv};
			\addlegendentry{$\gamma = 0.05$}
			\addplot3[ForestGreen, line width=1.5]
				table [col sep=comma, discard if not={gamma}{0.15}, x=SNR, y=gamma, z=BER]
				{res/proximal/2d_ber_fer_dfr_20433484.csv};
			\addlegendentry{$\gamma = 0.15$}
		\end{axis}
	\end{tikzpicture}
	\caption{Visualization of the relationship between the decoding performance\protect\footnotemark{} and the parameter $\gamma$}
	\label{fig:prox:results_3d}
\end{figure}%
%
\footnotetext{(3,6)-regular LDPC code with $n = 204$, $k = 102$ \cite[\text{204.33.484}]{mackay_enc};
	$\omega = 0.05, K=200, \eta=1.5$ }%
%
\noindent This indicates \todo{This is a result fit for the conclusion} that while the choice of the parameter $\gamma$ significantly affects the decoding performance, there is little benefit in undertaking an extensive search for an exact optimum. Rather, a preliminary examination providing a rough window for $\gamma$ may be sufficient.

TODO: $\omega, K$

Changing the parameter $\eta$ does not appear to have a significant effect on the decoding performance when keeping the value within a reasonable window (``slightly larger than one'', as stated in \cite[Sec. 3.2]{proximal_paper}), which seems plausible considering that its only function is to ensure numerical stability.
Summarizing the above considerations, \ldots \begin{itemize} \item Conclusion: Number of iterations independent of \ac{SNR} \end{itemize} \begin{figure}[H] \centering \begin{subfigure}[c]{0.48\textwidth} \centering \begin{tikzpicture} \begin{axis}[view={75}{30}, zmode=log, xlabel={$E_b / N_0$ (dB)}, ylabel={$\gamma$}, zlabel={BER}, width=\textwidth, height=0.75\textwidth,] \addplot3[surf, mesh/rows=17, mesh/cols=10, colormap/viridis] table [col sep=comma, x=SNR, y=gamma, z=BER] {res/proximal/2d_ber_fer_dfr_963965.csv}; \addplot3[RedOrange, line width=1.5] table[col sep=comma, discard if not={gamma}{0.05}, x=SNR, y=gamma, z=BER] {res/proximal/2d_ber_fer_dfr_963965.csv}; \addplot3[NavyBlue, line width=1.5] table[col sep=comma, discard if not={gamma}{0.01}, x=SNR, y=gamma, z=BER] {res/proximal/2d_ber_fer_dfr_963965.csv}; \addplot3[ForestGreen, line width=1.5] table[col sep=comma, discard if not={gamma}{0.15}, x=SNR, y=gamma, z=BER] {res/proximal/2d_ber_fer_dfr_963965.csv}; \end{axis} \end{tikzpicture} \caption{$\left( 3, 6 \right)$-regular LDPC code with $n=96, k=48$ \cite[\text{96.3.965}]{mackay_enc}} \end{subfigure}% \hfill \begin{subfigure}[c]{0.48\textwidth} \centering \begin{tikzpicture} \begin{axis}[view={75}{30}, zmode=log, xlabel={$E_b / N_0$ (dB)}, ylabel={$\gamma$}, zlabel={BER}, width=\textwidth, height=0.75\textwidth,] \addplot3[surf, mesh/rows=17, mesh/cols=10, colormap/viridis] table [col sep=comma, x=SNR, y=gamma, z=BER] {res/proximal/2d_ber_fer_dfr_bch_31_26.csv}; \addplot3[RedOrange, line width=1.5] table[col sep=comma, discard if not={gamma}{0.05}, x=SNR, y=gamma, z=BER] {res/proximal/2d_ber_fer_dfr_bch_31_26.csv}; \addplot3[NavyBlue, line width=1.5] table[col sep=comma, discard if not={gamma}{0.01}, x=SNR, y=gamma, z=BER] {res/proximal/2d_ber_fer_dfr_bch_31_26.csv}; \addplot3[ForestGreen, line width=1.5] table[col sep=comma, discard if not={gamma}{0.15}, x=SNR, y=gamma, z=BER] {res/proximal/2d_ber_fer_dfr_bch_31_26.csv}; \end{axis} \end{tikzpicture} \caption{BCH code with $n=31, k=26$\\[2\baselineskip]} \end{subfigure} \begin{subfigure}[c]{0.48\textwidth} \centering \begin{tikzpicture} \begin{axis}[view={75}{30}, zmode=log, xlabel={$E_b/N_0$ (dB)}, ylabel={$\gamma$}, zlabel={BER}, width=\textwidth, height=0.75\textwidth,] \addplot3[surf, mesh/rows=17, mesh/cols=14, colormap/viridis] table [col sep=comma, x=SNR, y=gamma, z=BER] {res/proximal/2d_ber_fer_dfr_20433484.csv}; \addplot3[RedOrange, line width=1.5] table[col sep=comma, discard if not={gamma}{0.05}, x=SNR, y=gamma, z=BER] {res/proximal/2d_ber_fer_dfr_20433484.csv}; \addplot3[NavyBlue, line width=1.5] table[col sep=comma, discard if not={gamma}{0.01}, x=SNR, y=gamma, z=BER] {res/proximal/2d_ber_fer_dfr_20433484.csv}; \addplot3[ForestGreen, line width=1.5] table[col sep=comma, discard if not={gamma}{0.15}, x=SNR, y=gamma, z=BER] {res/proximal/2d_ber_fer_dfr_20433484.csv}; \end{axis} \end{tikzpicture} \caption{$\left( 3, 6 \right)$-regular LDPC code with $n=204, k=102$ \cite[\text{204.33.484}]{mackay_enc}} \end{subfigure}% \hfill \begin{subfigure}[c]{0.48\textwidth} \centering \begin{tikzpicture} \begin{axis}[view={75}{30}, zmode=log, xlabel={$E_b / N_0$ (dB)}, ylabel={$\gamma$}, zlabel={BER}, width=\textwidth, height=0.75\textwidth,] \addplot3[surf, mesh/rows=17, mesh/cols=10, colormap/viridis] table [col sep=comma, x=SNR, y=gamma, z=BER] {res/proximal/2d_ber_fer_dfr_20455187.csv}; \addplot3[RedOrange, line width=1.5] table[col sep=comma, discard if not={gamma}{0.05}, x=SNR, y=gamma, z=BER] 
{res/proximal/2d_ber_fer_dfr_20455187.csv}; \addplot3[NavyBlue, line width=1.5] table[col sep=comma, discard if not={gamma}{0.01}, x=SNR, y=gamma, z=BER] {res/proximal/2d_ber_fer_dfr_20455187.csv}; \addplot3[ForestGreen, line width=1.5] table[col sep=comma, discard if not={gamma}{0.15}, x=SNR, y=gamma, z=BER] {res/proximal/2d_ber_fer_dfr_20455187.csv}; \end{axis} \end{tikzpicture} \caption{$\left( 5, 10 \right)$-regular LDPC code with $n=204, k=102$ \cite[\text{204.55.187}]{mackay_enc}} \end{subfigure}% \begin{subfigure}[c]{0.48\textwidth} \centering \begin{tikzpicture} \begin{axis}[view={75}{30}, zmode=log, xlabel={$E_b / N_0$ (dB)}, ylabel={$\gamma$}, zlabel={BER}, width=\textwidth, height=0.75\textwidth,] \addplot3[surf, mesh/rows=17, mesh/cols=10, colormap/viridis] table [col sep=comma, x=SNR, y=gamma, z=BER] {res/proximal/2d_ber_fer_dfr_40833844.csv}; \addplot3[RedOrange, line width=1.5] table[col sep=comma, discard if not={gamma}{0.05}, x=SNR, y=gamma, z=BER] {res/proximal/2d_ber_fer_dfr_40833844.csv}; \addplot3[NavyBlue, line width=1.5] table[col sep=comma, discard if not={gamma}{0.01}, x=SNR, y=gamma, z=BER] {res/proximal/2d_ber_fer_dfr_40833844.csv}; \addplot3[ForestGreen, line width=1.5] table[col sep=comma, discard if not={gamma}{0.15}, x=SNR, y=gamma, z=BER] {res/proximal/2d_ber_fer_dfr_40833844.csv}; \end{axis} \end{tikzpicture} \caption{$\left( 3, 6 \right)$-regular LDPC code with $n=408, k=204$ \cite[\text{408.33.844}]{mackay_enc}} \end{subfigure}% \hfill \begin{subfigure}[c]{0.48\textwidth} \centering \begin{tikzpicture} \begin{axis}[view={75}{30}, zmode=log, xlabel={$E_b / N_0$ (dB)}, ylabel={$\gamma$}, zlabel={BER}, width=\textwidth, height=0.75\textwidth,] \addplot3[surf, mesh/rows=17, mesh/cols=10, colormap/viridis] table [col sep=comma, x=SNR, y=gamma, z=BER] {res/proximal/2d_ber_fer_dfr_pegreg252x504.csv}; \addplot3[RedOrange, line width=1.5] table[col sep=comma, discard if not={gamma}{0.05}, x=SNR, y=gamma, z=BER] {res/proximal/2d_ber_fer_dfr_pegreg252x504.csv}; \addplot3[NavyBlue, line width=1.5] table[col sep=comma, discard if not={gamma}{0.01}, x=SNR, y=gamma, z=BER] {res/proximal/2d_ber_fer_dfr_pegreg252x504.csv}; \addplot3[ForestGreen, line width=1.5] table[col sep=comma, discard if not={gamma}{0.15}, x=SNR, y=gamma, z=BER] {res/proximal/2d_ber_fer_dfr_pegreg252x504.csv}; \end{axis} \end{tikzpicture} \caption{LDPC code (Progressive Edge Growth Construction) with $n=504, k=252$ \cite[\text{PEGReg252x504}]{mackay_enc}} \end{subfigure}% \vspace{1cm} \begin{subfigure}[c]{\textwidth} \centering \begin{tikzpicture} \begin{axis}[hide axis, xmin=10, xmax=50, ymin=0, ymax=0.4, legend style={draw=white!15!black,legend cell align=left}] \addlegendimage{surf, colormap/viridis} \addlegendentry{$\gamma = \left[ 0\text{ : }0.01\text{ : }0.16 \right] $}; \addlegendimage{NavyBlue, line width=1.5pt} \addlegendentry{$\gamma = 0.01$}; \addlegendimage{RedOrange, line width=1.5pt} \addlegendentry{$\gamma = 0.05$}; \addlegendimage{ForestGreen, line width=1.5pt} \addlegendentry{$\gamma = 0.15$}; \end{axis} \end{tikzpicture} \end{subfigure} \caption{BER for $\omega = 0.05, K=100$ (different codes)} \label{fig:prox:results_3d_multiple} \end{figure} \subsection{Decoding Performance} \begin{figure}[H] \centering \begin{tikzpicture} \begin{axis}[grid=both, xlabel={$E_b / N_0$ (dB)}, ylabel={BER}, ymode=log, width=0.48\textwidth, height=0.36\textwidth, legend style={at={(0.05,0.05)},anchor=south west}, ymax=1.5, ymin=3e-7,] \addplot [ForestGreen, mark=*] table [x=SNR, y=BER, col sep=comma, 
discard if not={gamma}{0.15}] {res/proximal/2d_ber_fer_dfr_20433484.csv};
			\addlegendentry{$\gamma = 0.15$}
			\addplot [NavyBlue, mark=*]
				table [x=SNR, y=BER, col sep=comma,
				discard if not={gamma}{0.01}] {res/proximal/2d_ber_fer_dfr_20433484.csv};
			\addlegendentry{$\gamma = 0.01$}
			\addplot [RedOrange, mark=*]
				table [x=SNR, y=BER, col sep=comma,
				discard if not={gamma}{0.05}] {res/proximal/2d_ber_fer_dfr_20433484.csv};
			\addlegendentry{$\gamma = 0.05$}
		\end{axis}
	\end{tikzpicture}%
	\hfill%
	\begin{tikzpicture}
		\begin{axis}[grid=both,
			xlabel={$E_b / N_0$ (dB)},
			ylabel={FER},
			ymode=log,
			width=0.48\textwidth,
			height=0.36\textwidth,
			legend style={at={(0.05,0.05)},anchor=south west},
			ymax=1.5, ymin=3e-7,]
			\addplot [ForestGreen, mark=*]
				table [x=SNR, y=FER, col sep=comma,
				discard if not={gamma}{0.15}] {res/proximal/2d_ber_fer_dfr_20433484.csv};
			\addlegendentry{$\gamma = 0.15$}
			\addplot [NavyBlue, mark=*]
				table [x=SNR, y=FER, col sep=comma,
				discard if not={gamma}{0.01}] {res/proximal/2d_ber_fer_dfr_20433484.csv};
			\addlegendentry{$\gamma = 0.01$}
			\addplot [RedOrange, mark=*]
				table [x=SNR, y=FER, col sep=comma,
				discard if not={gamma}{0.05}] {res/proximal/2d_ber_fer_dfr_20433484.csv};
			\addlegendentry{$\gamma = 0.05$}
		\end{axis}
	\end{tikzpicture}\\[1em]
	\begin{tikzpicture}
		\begin{axis}[grid=both,
			xlabel={$E_b / N_0$ (dB)},
			ylabel={Decoding Failure Rate},
			ymode=log,
			width=0.48\textwidth,
			height=0.36\textwidth,
			legend style={at={(0.05,0.05)},anchor=south west},
			ymax=1.5, ymin=3e-7,]
			\addplot [ForestGreen, mark=*]
				table [x=SNR, y=DFR, col sep=comma,
				discard if not={gamma}{0.15}] {res/proximal/2d_ber_fer_dfr_20433484.csv};
			\addlegendentry{$\gamma = 0.15$}
			\addplot [NavyBlue, mark=*]
				table [x=SNR, y=DFR, col sep=comma,
				discard if not={gamma}{0.01}] {res/proximal/2d_ber_fer_dfr_20433484.csv};
			\addlegendentry{$\gamma = 0.01$}
			\addplot [RedOrange, mark=*]
				table [x=SNR, y=DFR, col sep=comma,
				discard if not={gamma}{0.05}] {res/proximal/2d_ber_fer_dfr_20433484.csv};
			\addlegendentry{$\gamma = 0.05$}
		\end{axis}
	\end{tikzpicture}
	\caption{Comparison of \ac{FER}, \ac{BER} and decoding failure rate\protect\footnotemark{}}
	\label{fig:prox:ber_fer_dfr}
\end{figure}%
%
\footnotetext{(3,6)-regular LDPC code with $n = 204$, $k = 102$ \cite[\text{204.33.484}]{mackay_enc};
	$\omega = 0.05, K=100, \eta=1.5$ }%
%
Until now, only the \ac{BER} has been considered to gauge the decoding performance. The \ac{FER}, however, shows considerably worse behaviour, as can be seen in figure \ref{fig:prox:ber_fer_dfr}. Besides the \ac{BER} and \ac{FER} curves, the figure also shows the \textit{decoding failure rate}. This is the rate at which the iterative process fails to produce a valid codeword, i.e., the stopping criterion (line 6 of algorithm \ref{alg:prox}) is never satisfied and the maximum number of iterations $K$ is reached without converging to a valid codeword. Three lines are plotted in each case, corresponding to different values of the parameter $\gamma$. The values chosen are the same as in figure \ref{fig:prox:results}, as they seem to adequately describe the behaviour across a wide range of values (see figure \ref{fig:prox:results_3d}).

It is apparent that the \ac{FER} and the decoding failure rate are extremely similar, especially for higher \acp{SNR}. This leads to the hypothesis that, at least for higher \acp{SNR}, frame errors arise mainly from the non-convergence of the algorithm rather than from convergence to a wrong codeword.
This line of thought will be picked up in section \ref{sec:prox:Improved Implementation} in an attempt to improve the algorithm.

In summary, the \ac{BER} and \ac{FER} indicate dissimilar decoding performance. The decoding failure rate closely resembles the \ac{FER}, suggesting that the frame errors may largely be attributed to decoding failures. \todo{Maybe reference to the structure of the algorithm (1 part likelihood 1 part constraints)}

\subsection{Convergence Properties}
The previous observation, that frame errors arise mainly from the non-convergence of the algorithm rather than from convergence to a wrong codeword, raises the question of why the decoding process fails to converge so often. In figure \ref{fig:prox:convergence}, the iterative process is visualized. In order to consider all components of the vectors involved simultaneously, a BCH code with $n=7$ and $k=4$ is chosen. Each chart shows one component of the current estimate in a given iteration (alternating between $\boldsymbol{r}$ and $\boldsymbol{s}$), as well as the gradients of the negative log-likelihood and the code-constraint polynomial, which influence the next estimate.
\begin{figure}[H]
	\begin{minipage}[c]{0.25\textwidth}
		\centering
		\begin{tikzpicture}[scale = 0.35]
			\begin{axis}[
				grid=both,
				xlabel={Iterations},
				width=8cm, height=3cm, scale only axis,
				xtick={0, 50, ..., 200},
				xticklabels={0, 25, ..., 100},
				]
				\addplot [NavyBlue, mark=none, line width=1]
					table [col sep=comma, x=k, y=comb_r_s_1] {res/proximal/comp_bch_7_4_combined.csv};
				\addplot [ForestGreen, mark=none, line width=1]
					table [col sep=comma, x=k, y=grad_L_1] {res/proximal/comp_bch_7_4_combined.csv};
				\addplot [RedOrange, mark=none, line width=1]
					table [col sep=comma, x=k, y=grad_h_1] {res/proximal/comp_bch_7_4_combined.csv};
				\addlegendentry{est}
				\addlegendentry{$\left(\nabla L \right)_2$}
				\addlegendentry{$\left(\nabla h \right)_2 $}
			\end{axis}
		\end{tikzpicture}\\
		\begin{tikzpicture}[scale = 0.35]
			\begin{axis}[
				grid=both,
				xlabel={Iterations},
				width=8cm, height=3cm, scale only axis,
				xtick={0, 50, ..., 200},
				xticklabels={0, 25, ..., 100},
				]
				\addplot [NavyBlue, mark=none, line width=1]
					table [col sep=comma, x=k, y=comb_r_s_2] {res/proximal/comp_bch_7_4_combined.csv};
				\addplot [ForestGreen, mark=none, line width=1]
					table [col sep=comma, x=k, y=grad_L_2] {res/proximal/comp_bch_7_4_combined.csv};
				\addplot [RedOrange, mark=none, line width=1]
					table [col sep=comma, x=k, y=grad_h_2] {res/proximal/comp_bch_7_4_combined.csv};
				\addlegendentry{est}
				\addlegendentry{$\left(\nabla L \right)_3$}
				\addlegendentry{$\left(\nabla h \right)_3 $}
			\end{axis}
		\end{tikzpicture}\\
		\begin{tikzpicture}[scale = 0.35]
			\begin{axis}[
				grid=both,
				xlabel={Iterations},
				width=8cm, height=3cm, scale only axis,
				xtick={0, 50, ..., 200},
				xticklabels={0, 25, ..., 100},
				]
				\addplot [NavyBlue, mark=none, line width=1]
					table [col sep=comma, x=k, y=comb_r_s_3] {res/proximal/comp_bch_7_4_combined.csv};
				\addplot [ForestGreen, mark=none, line width=1]
					table [col sep=comma, x=k, y=grad_L_3] {res/proximal/comp_bch_7_4_combined.csv};
				\addplot [RedOrange, mark=none, line width=1]
					table [col sep=comma, x=k, y=grad_h_3] {res/proximal/comp_bch_7_4_combined.csv};
				\addlegendentry{est}
				\addlegendentry{$\left(\nabla L \right)_4$}
				\addlegendentry{$\left(\nabla h \right)_4 $}
			\end{axis}
		\end{tikzpicture}
	\end{minipage}%
	\begin{minipage}[c]{0.5\textwidth}
		\vspace*{-1cm}
		\centering
		\begin{tikzpicture}[scale = 0.85,
			spy using outlines={circle, magnification=6, connect spies}]
			\begin{axis}[
				grid=both,
xlabel={Iterations}, width=8cm, height=3cm, scale only axis, xtick={0, 50, ..., 200}, xticklabels={0, 25, ..., 100}, ] \addplot [NavyBlue, mark=none, line width=1] table [col sep=comma, x=k, y=comb_r_s_0] {res/proximal/comp_bch_7_4_combined.csv}; \addplot [ForestGreen, mark=none, line width=1] table [col sep=comma, x=k, y=grad_L_0] {res/proximal/comp_bch_7_4_combined.csv}; \addplot [RedOrange, mark=none, line width=1] table [col sep=comma, x=k, y=grad_h_0] {res/proximal/comp_bch_7_4_combined.csv}; \addlegendentry{est} \addlegendentry{$\left(\nabla L \right)_1$} \addlegendentry{$\left(\nabla h \right)_1 $} \coordinate (spypoint) at (axis cs:100,0.53); \coordinate (magnifyglass) at (axis cs:175,2); \end{axis} \spy [black, size=2cm] on (spypoint) in node[fill=white] at (magnifyglass); \end{tikzpicture} \end{minipage}% \begin{minipage}[c]{0.25\textwidth} \centering \begin{tikzpicture}[scale = 0.35] \begin{axis}[ grid=both, xlabel={Iterations}, width=8cm, height=3cm, scale only axis, xtick={0, 50, ..., 200}, xticklabels={0, 25, ..., 100}, ] \addplot [NavyBlue, mark=none, line width=1] table [col sep=comma, x=k, y=comb_r_s_4] {res/proximal/comp_bch_7_4_combined.csv}; \addplot [ForestGreen, mark=none, line width=1] table [col sep=comma, x=k, y=grad_L_4] {res/proximal/comp_bch_7_4_combined.csv}; \addplot [RedOrange, mark=none, line width=1] table [col sep=comma, x=k, y=grad_h_4] {res/proximal/comp_bch_7_4_combined.csv}; \addlegendentry{est} \addlegendentry{$\left(\nabla L \right)_5$} \addlegendentry{$\left(\nabla h \right)_5 $} \end{axis} \end{tikzpicture}\\ \begin{tikzpicture}[scale = 0.35] \begin{axis}[ grid=both, xlabel={Iterations}, width=8cm, height=3cm, scale only axis, xtick={0, 50, ..., 200}, xticklabels={0, 25, ..., 100}, ] \addplot [NavyBlue, mark=none, line width=1] table [col sep=comma, x=k, y=comb_r_s_5] {res/proximal/comp_bch_7_4_combined.csv}; \addplot [ForestGreen, mark=none, line width=1] table [col sep=comma, x=k, y=grad_L_5] {res/proximal/comp_bch_7_4_combined.csv}; \addplot [RedOrange, mark=none, line width=1] table [col sep=comma, x=k, y=grad_h_5] {res/proximal/comp_bch_7_4_combined.csv}; \addlegendentry{est} \addlegendentry{$\left(\nabla L \right)_6$} \addlegendentry{$\left(\nabla h \right)_6 $} \end{axis} \end{tikzpicture}\\ \begin{tikzpicture}[scale = 0.35] \begin{axis}[ grid=both, xlabel={Iterations}, width=8cm, height=3cm, scale only axis, xtick={0, 50, ..., 200}, xticklabels={0, 25, ..., 100}, ] \addplot [NavyBlue, mark=none, line width=1] table [col sep=comma, x=k, y=comb_r_s_6] {res/proximal/comp_bch_7_4_combined.csv}; \addplot [ForestGreen, mark=none, line width=1] table [col sep=comma, x=k, y=grad_L_6] {res/proximal/comp_bch_7_4_combined.csv}; \addplot [RedOrange, mark=none, line width=1] table [col sep=comma, x=k, y=grad_h_6] {res/proximal/comp_bch_7_4_combined.csv}; \addlegendentry{est} \addlegendentry{$\left(\nabla L \right)_7$} \addlegendentry{$\left(\nabla h \right)_7 $} \end{axis} \end{tikzpicture} \end{minipage} \caption{Internal variables of proximal decoder as a function of the number of iterations ($n=7$)\protect\footnotemark{}} \label{fig:prox:convergence} \end{figure}% % \footnotetext{A single decoding is shown, using the BCH$\left( 7,4 \right) $ code; $\gamma = 0.05, \omega = 0.05, E_b / N_0 = \SI{5}{dB}$ }% % \noindent It is evident that in all cases, past a certain number of iterations, the estimate starts to oscillate around a particular value. After a certain point, the two gradients stop further approaching the value zero. 
In particular, this leads to the code-constraint polynomial not being minimized. As such, the constraints are not satisfied and the estimate does not converge towards a valid codeword.

While figure \ref{fig:prox:convergence} shows only one instance of a decoding task, with no statistical significance, it is indicative of the general behaviour of the algorithm. This can be justified by looking at the gradients themselves. In figure \ref{fig:prox:gradients}, the gradients of the negative log-likelihood and the code-constraint polynomial for a repetition code with $n=2$ are shown. Evidently, walking along the gradients in an alternating fashion produces a net movement in a certain direction, as long as the two gradients share a common component. As soon as this common component is exhausted, they start pulling the estimate in opposing directions, leading to an oscillation as illustrated in figure \ref{fig:prox:convergence}. Consequently, this oscillation is an intrinsic property of the structure of the proximal decoding algorithm, in which the two parts of the objective function are minimized in an alternating manner by use of their gradients.%
%
\begin{figure}[H]
	\centering
	\begin{subfigure}[c]{0.5\textwidth}
		\centering
		\begin{tikzpicture}
			\begin{axis}[xmin = -1.25, xmax=1.25,
				ymin = -1.25, ymax=1.25,
				xlabel={$x_1$},
				ylabel={$x_2$},
				width=\textwidth,
				height=0.75\textwidth,
				grid=major,
				grid style={dotted},
				view={0}{90}]
				\addplot3[point meta=\thisrow{grad_norm},
					point meta min=1,
					point meta max=3,
					quiver={u=\thisrow{grad_0}, v=\thisrow{grad_1},
						scale arrows=.05,
						every arrow/.append style={%
							line width=.3+\pgfplotspointmetatransformed/1000,
							-{Latex[length=0pt 5,width=0pt 3]}
						},
					},
					quiver/colored = {mapped color},
					colormap/rocket,
					-stealth,
				]
				table[col sep=comma] {res/proximal/2d_grad_L.csv};
			\end{axis}
		\end{tikzpicture}
		\caption{$\nabla L \left(\boldsymbol{y} \mid \tilde{\boldsymbol{x}} \right) $ for a repetition code with $n=2$}
		\label{fig:prox:gradients:L}
	\end{subfigure}%
	\hfill%
	\begin{subfigure}[c]{0.5\textwidth}
		\centering
		\begin{tikzpicture}
			\begin{axis}[xmin = -1.25, xmax=1.25,
				ymin = -1.25, ymax=1.25,
				xlabel={$x_1$},
				ylabel={$x_2$},
				grid=major,
				grid style={dotted},
				width=\textwidth,
				height=0.75\textwidth,
				view={0}{90}]
				\addplot3[point meta=\thisrow{grad_norm},
					point meta min=1,
					point meta max=4,
					quiver={u=\thisrow{grad_0}, v=\thisrow{grad_1},
						scale arrows=.03,
						every arrow/.append style={%
							line width=.3+\pgfplotspointmetatransformed/1000,
							-{Latex[length=0pt 5,width=0pt 3]}
						},
					},
					quiver/colored = {mapped color},
					colormap/rocket,
					-stealth,
				]
				table[col sep=comma] {res/proximal/2d_grad_h.csv};
			\end{axis}
		\end{tikzpicture}
		\caption{$\nabla h \left( \tilde{\boldsymbol{x}} \right) $ for a repetition code with $n=2$}
		\label{fig:prox:gradients:h}
	\end{subfigure}%
	\caption{Gradients of the negative log-likelihood and the code-constraint polynomial}
	\label{fig:prox:gradients}
\end{figure}%
%
While the initial net movement generally points in the right direction owing to the gradient of the negative log-likelihood, the final oscillation may well take place in a region of space not corresponding to a valid codeword, leading to the aforementioned non-convergence of the algorithm. This also partly explains the difference in decoding performance between the \ac{BER} and the \ac{FER}, as it lowers the number of bit errors while still yielding an invalid codeword.
The higher the \ac{SNR}, the more likely the gradient of the negative log-likelihood is to point towards a valid codeword. The common component of the two gradients then pulls the estimate closer to a valid codeword before the oscillation takes place. This explains why the decoding performance is so much better for higher \acp{SNR}.

When considering codes with larger $n$, the behaviour generally stays the same, with some minor differences. In figure \ref{fig:prox:convergence_large_n}, the decoding process is visualized for one component of a code with $n=204$, for a single decoding. The two gradients still eventually oppose each other and the estimate still starts to oscillate, just as illustrated in figure \ref{fig:prox:convergence} on the basis of a code with $n=7$. However, in this case, the gradient of the code-constraint polynomial itself starts to oscillate, its average value being such that the effect of the gradient of the negative log-likelihood is counteracted.

Looking at figure \ref{fig:prox:gradients:h}, it also becomes apparent why the value of the parameter $\gamma$ has to be kept small, as mentioned in section \ref{sec:prox:Decoding Algorithm}. Local minima are introduced between the codewords, in the areas in which it is not immediately clear which codeword is the most likely one. Raising the value of $\gamma$ results in $h \left( \tilde{\boldsymbol{x}} \right)$ dominating the landscape of the objective function, thereby introducing these local minima into it.

In conclusion, as a general rule, the proximal decoding algorithm reaches an oscillatory state which it cannot escape as a consequence of its structure. In this state, the constraints may not be satisfied, leading to the algorithm returning an invalid codeword.
\begin{figure}[H]
	\centering
	\begin{tikzpicture}
		\begin{axis}[
			grid=both,
			xlabel={Iterations},
			width=0.6\textwidth, height=0.45\textwidth, scale only axis,
			xtick={0, 100, ..., 400},
			xticklabels={0, 50, ..., 200},
			]
			\addplot [NavyBlue, mark=none, line width=1]
				table [col sep=comma, x=k, y=comb_r_s_0] {res/proximal/extreme_components_20433484_combined.csv};
			\addplot [ForestGreen, mark=none, line width=1]
				table [col sep=comma, x=k, y=grad_L_0] {res/proximal/extreme_components_20433484_combined.csv};
			\addplot [RedOrange, mark=none, line width=1]
				table [col sep=comma, x=k, y=grad_h_0] {res/proximal/extreme_components_20433484_combined.csv};
			\addlegendentry{est}
			\addlegendentry{$\left(\nabla L\right)_1$}
			\addlegendentry{$\left(\nabla h\right)_1$}
		\end{axis}
	\end{tikzpicture}
	\caption{Internal variables of proximal decoder as a function of the number of iterations ($n=204$)}
	\label{fig:prox:convergence_large_n}
\end{figure}%
\subsection{Computational Performance}
\begin{itemize}
	\item Theoretical analysis
	\item Simulation results to substantiate theoretical analysis
	\item Conclusion: $\mathcal{O}\left( n \right)$ time complexity, implementation heavily optimizable
\end{itemize}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Improved Implementation}%
\label{sec:prox:Improved Implementation}
As mentioned earlier, frame errors seem to stem mainly from decoding failures. Coupled with the fact that the \ac{BER} indicates so much better performance than the \ac{FER}, this leads to the assumption that only a small number of components of the estimated vector may be responsible for an invalid result.
If it were possible to limit the number of possibly wrong components of the estimate to a small subset, an \ac{ML}-decoding step could be performed on a limited number of possible results (``ML-in-the-List'', as it will subsequently be called) to improve the decoding performance. This concept is pursued in this section.

First, a guideline has to be found with which to assess the probability that a given component of an estimate is wrong. One compelling observation is that the closer an estimate is to a valid codeword, the smaller the magnitude of the gradient of the code-constraint polynomial, as illustrated in figure \ref{fig:prox:gradients}. This gives rise to the notion that the magnitude of some property or behaviour of $\nabla h\left( \tilde{\boldsymbol{x}} \right) $ may be related to the confidence that a given bit is correct. And indeed, the magnitude of the oscillation of $\nabla h\left( \tilde{\boldsymbol{x}} \right)$ (introduced in a previous section) and the probability of a bit error are strongly correlated, a relationship depicted in figure \ref{fig:prox:correlation}.

TODO: Figure

\noindent The y-axis depicts whether there is a bit error and the x-axis the variance of $\nabla h\left( \tilde{\boldsymbol{x}} \right)$ past the iteration $k=100$. While this is not exactly the magnitude of the oscillation, it is proportional to it and easier to compute. The datapoints are taken from a single decoding operation \todo{Generate same figure with multiple decodings}.

Using this observation as a rule to determine the $N\in\mathbb{N}$ bits most likely to be in error, all variations of the estimate with those bits modified can be generated. An \ac{ML}-in-the-List step can then be performed in order to determine the most likely candidate. This process is outlined in figure \ref{fig:prox:improved_algorithm}.
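To make this procedure concrete, the following sketch shows one possible realization of the ML-in-the-List step, which is only invoked when proximal decoding has not returned a valid codeword. It assumes that the gradients $\nabla h\left( \boldsymbol{r} \right)$ of all iterations have been recorded during decoding; the function name, the choice of $N$, the fixed iteration from which the variance is computed and the fallback behaviour are illustrative assumptions rather than a description of the actual implementation.
\begin{lstlisting}[language=Python]
import itertools
import numpy as np

def ml_in_the_list(x_hat, y, H, grad_h_history, N=4, k_start=100):
    """x_hat: bipolar estimate returned by proximal decoding (shape (n,));
    y: received vector; H: parity-check matrix (0/1, shape (m, n));
    grad_h_history: recorded gradients of h, one row per iteration."""
    # The variance of each gradient component past iteration k_start serves as
    # a proxy for the magnitude of its oscillation (larger = less reliable bit).
    variance = np.var(grad_h_history[k_start:], axis=0)
    unreliable = np.argsort(variance)[-N:]

    best, best_metric = None, np.inf
    # Enumerate all sign patterns on the N least reliable positions.
    for signs in itertools.product([-1.0, 1.0], repeat=N):
        candidate = x_hat.astype(float).copy()
        candidate[unreliable] = signs
        c = ((1 - candidate) / 2).astype(int)      # bipolar -> binary mapping x = (-1)^c
        if np.any(H @ c % 2):                      # keep only valid codewords
            continue
        metric = np.sum((y - candidate) ** 2)      # ML metric for the AWGN channel
        if metric < best_metric:
            best, best_metric = candidate, metric
    return best if best is not None else x_hat     # fall back to the original estimate
\end{lstlisting}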
Figure \ref{fig:prox:improved_results} shows the gain that can be achieved. Again, three values of $\gamma$ are chosen, for which the \ac{BER}, \ac{FER} and decoding failure rate are plotted. The simulation results for the original proximal decoding algorithm are shown with solid lines and the results for the improved version are shown with dashed lines. The gain seems to depend on the value of $\gamma$ and becomes more pronounced for higher \ac{SNR} values. This is to be expected, since with higher \ac{SNR} values the number of bit errors decreases, making the correction of those errors in the ML-in-the-List step more likely. In figure \ref{fig:prox:improved_results_multiple} the decoding performance of proximal decoding and of the improved algorithm is compared for a number of different codes. Similar behaviour can be observed in all cases, with varying improvement over standard proximal decoding.

Interestingly, the time complexity of the improved algorithm hardly differs from that of proximal decoding. This is because the ML-in-the-List step is only performed when the proximal decoding algorithm produces an invalid result, which, in absolute terms, happens relatively infrequently. This is illustrated in figure \ref{fig:prox:time_complexity_comp}, where the average time needed to decode a single received frame is visualized for proximal decoding as well as for the improved algorithm.

In conclusion, the decoding performance of proximal decoding can be improved by appending an ML-in-the-List step when the algorithm does not produce a valid result. The gain is in some cases as high as $\SI{1}{dB}$ and can be achieved with a negligible computational performance penalty. The improvement is mainly noticeable for higher \ac{SNR} values and depends on the code as well as the chosen parameters.