
\chapter{Proximal Decoding}%
\label{chapter:proximal_decoding}
In this chapter, the proximal decoding algorithm is examined.
First, the algorithm itself is described.
Then, noteworthy implementation details are presented.
Simulation results are shown, on the basis of which the behaviour of the
algorithm is investigated for different codes and parameters.
Finally, an improvement on proximal decoding is proposed.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Decoding Algorithm}%
\label{sec:prox:Decoding Algorithm}
Proximal decoding was proposed by Wadayama et al. as a novel formulation of
optimization-based decoding \cite{proximal_paper}.
With this algorithm, minimization is performed using the proximal gradient
method.
In contrast to \ac{LP} decoding, the objective function is based on a
non-convex optimization formulation of the \ac{MAP} decoding problem.
In order to derive the objective function, the authors begin with the
\ac{MAP} decoding rule, expressed as a continuous maximization problem%
\footnote{Expanding the domain to be continuous does not materially change
the meaning of the rule.
The only difference is that what previously were \acp{PMF} now have to be
expressed as \acp{PDF}.}
over $\boldsymbol{x}$:%
%
\begin{align}
\hat{\boldsymbol{x}} = \argmax_{\tilde{\boldsymbol{x}} \in \mathbb{R}^{n}}
f_{\tilde{\boldsymbol{X}} \mid \boldsymbol{Y}}
\left( \tilde{\boldsymbol{x}} \mid \boldsymbol{y} \right)
= \argmax_{\tilde{\boldsymbol{x}} \in \mathbb{R}^{n}} f_{\boldsymbol{Y}
\mid \tilde{\boldsymbol{X}}}
\left( \boldsymbol{y} \mid \tilde{\boldsymbol{x}} \right)
f_{\tilde{\boldsymbol{X}}}\left( \tilde{\boldsymbol{x}} \right)%
\label{eq:prox:vanilla_MAP}
.\end{align}%
%
The likelihood $f_{\boldsymbol{Y} \mid \tilde{\boldsymbol{X}}}
\left( \boldsymbol{y} \mid \tilde{\boldsymbol{x}} \right) $ is a known function
determined by the channel model.
The prior \ac{PDF} $f_{\tilde{\boldsymbol{X}}}\left( \tilde{\boldsymbol{x}} \right)$ is also
known, as all codewords in $\mathcal{C}$ are assumed to be equally likely.
However, since the considered domain is continuous,
the prior \ac{PDF} cannot be dropped as a constant during the maximization,
as is often done, and has a rather unwieldy representation:%
%
\begin{align}
f_{\tilde{\boldsymbol{X}}}\left( \tilde{\boldsymbol{x}} \right) =
\frac{1}{\left| \mathcal{C} \right| }
\sum_{\boldsymbol{c} \in \mathcal{C} }
\delta\big( \tilde{\boldsymbol{x}} - \left( -1 \right) ^{\boldsymbol{c}}\big)
\label{eq:prox:prior_pdf}
.\end{align}%
%
In order to rewrite the prior \ac{PDF}
$f_{\tilde{\boldsymbol{X}}}\left( \tilde{\boldsymbol{x}} \right)$,
the so-called \textit{code-constraint polynomial} is introduced as:%
%
\begin{align*}
h\left( \tilde{\boldsymbol{x}} \right) =
\underbrace{\sum_{i=1}^{n} \left( \tilde{x}_i^2-1 \right) ^2}_{\text{Bipolar constraint}}
+ \underbrace{\sum_{j=1}^{m} \left[
\left( \prod_{i\in N_c \left( j \right) } \tilde{x}_i \right)
-1 \right] ^2}_{\text{Parity constraint}}%
.\end{align*}%
%
The intention of this function is to penalize vectors far from a codeword and
to favor those close to one.
To this end, the polynomial is composed of two parts: one term
representing the bipolar constraint, providing for a discrete solution of the
continuous optimization problem, and one term representing the parity
constraints, accommodating the role of the parity-check matrix $\boldsymbol{H}$.
By construction, $h\left( \tilde{\boldsymbol{x}} \right) \ge 0$, with equality
exactly at the bipolar images of the codewords of $\mathcal{C}$.
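As a small illustration, consider the repetition code of length two,
consisting of the codewords $\left( 0,0 \right) $ and $\left( 1,1 \right) $
with the single parity check
$\boldsymbol{H} = \begin{bmatrix} 1 & 1 \end{bmatrix}$.
Its code-constraint polynomial reads%
%
\begin{align*}
h\left( \tilde{\boldsymbol{x}} \right) =
\left( \tilde{x}_1^2 - 1 \right)^2
+ \left( \tilde{x}_2^2 - 1 \right)^2
+ \left( \tilde{x}_1\tilde{x}_2 - 1 \right)^2
,\end{align*}%
%
whose zeros are precisely $\left( 1, 1 \right) $ and $\left( -1, -1 \right) $,
the bipolar images of the two codewords.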
The prior \ac{PDF} is then approximated using the code-constraint polynomial as:%
%
\begin{align}
f_{\tilde{\boldsymbol{X}}}\left( \tilde{\boldsymbol{x}} \right)
\approx \frac{1}{Z}\mathrm{e}^{-\gamma h\left( \tilde{\boldsymbol{x}} \right) }%
\label{eq:prox:prior_pdf_approx}
.\end{align}%
%
The authors justify this approximation by arguing that, for
$\gamma \rightarrow \infty$, the right-hand side of equation
(\ref{eq:prox:prior_pdf_approx}) approaches the original function in equation
(\ref{eq:prox:prior_pdf}), as the density concentrates on the zeros of
$h\left( \tilde{\boldsymbol{x}} \right)$.
This approximation can then be plugged into equation (\ref{eq:prox:vanilla_MAP})
and the likelihood can be rewritten using the negative log-likelihood
$L \left( \boldsymbol{y} \mid \tilde{\boldsymbol{x}} \right) = -\ln\left(
f_{\boldsymbol{Y} \mid \tilde{\boldsymbol{X}}}\left(
\boldsymbol{y} \mid \tilde{\boldsymbol{x}} \right) \right) $:%
%
\begin{align*}
\hat{\boldsymbol{x}} &= \argmax_{\tilde{\boldsymbol{x}} \in \mathbb{R}^{n}}
\mathrm{e}^{- L\left( \boldsymbol{y} \mid \tilde{\boldsymbol{x}} \right) }
\mathrm{e}^{-\gamma h\left( \tilde{\boldsymbol{x}} \right) } \\
&= \argmin_{\tilde{\boldsymbol{x}} \in \mathbb{R}^n} \big(
L\left( \boldsymbol{y} \mid \tilde{\boldsymbol{x}} \right)
+ \gamma h\left( \tilde{\boldsymbol{x}} \right)
\big)%
.\end{align*}%
%
Thus, with proximal decoding, the objective function
$g\left( \tilde{\boldsymbol{x}} \right)$ considered is%
%
\begin{align}
g\left( \tilde{\boldsymbol{x}} \right) = L\left( \boldsymbol{y} \mid \tilde{\boldsymbol{x}}
\right)
+ \gamma h\left( \tilde{\boldsymbol{x}} \right)%
\label{eq:prox:objective_function}
\end{align}%
%
and the decoding problem is reformulated to%
%
\begin{align*}
\text{minimize}\hspace{2mm} &L\left( \boldsymbol{y} \mid \tilde{\boldsymbol{x}} \right)
+ \gamma h\left( \tilde{\boldsymbol{x}} \right)\\
\text{subject to}\hspace{2mm} &\tilde{\boldsymbol{x}} \in \mathbb{R}^n
.\end{align*}
%
For the solution of the approximate \ac{MAP} decoding problem, the two parts
of equation (\ref{eq:prox:objective_function}) are considered separately:
the minimization of the objective function occurs in an alternating
fashion, switching between the negative log-likelihood
$L\left( \boldsymbol{y} \mid \tilde{\boldsymbol{x}} \right) $ and the scaled
code-constraint polynomial $\gamma h\left( \tilde{\boldsymbol{x}} \right) $.
Two helper variables, $\boldsymbol{r}$ and $\boldsymbol{s}$, are introduced,
describing the results of the two steps.
The first step, minimizing the negative log-likelihood, is performed using
gradient descent:%
%
\begin{align}
\boldsymbol{r} \leftarrow \boldsymbol{s} - \omega \nabla
L\left( \boldsymbol{y} \mid \boldsymbol{s} \right),
\hspace{5mm}\omega > 0
\label{eq:prox:step_log_likelihood}
.\end{align}%
%
For the second step, minimizing the scaled code-constraint polynomial, the
proximal gradient method is used and the \textit{proximal operator} of
$\gamma h\left( \tilde{\boldsymbol{x}} \right) $ has to be computed.
It is then immediately approximated with a single gradient-descent step:%
%
\begin{align*}
\textbf{prox}_{\gamma h} \left( \tilde{\boldsymbol{x}} \right) &\equiv
\argmin_{\boldsymbol{t} \in \mathbb{R}^n}
\left( \gamma h\left( \boldsymbol{t} \right) +
\frac{1}{2} \lVert \boldsymbol{t} - \tilde{\boldsymbol{x}} \rVert^2 \right)\\
&\approx \tilde{\boldsymbol{x}} - \gamma \nabla h \left( \tilde{\boldsymbol{x}} \right),
\hspace{5mm} \gamma > 0, \text{ small}
.\end{align*}%
%
The second step thus becomes%
%
\begin{align*}
\boldsymbol{s} \leftarrow \boldsymbol{r} - \gamma \nabla h\left( \boldsymbol{r} \right),
\hspace{5mm}\gamma > 0,\text{ small}
.\end{align*}
%
While the approximation of the prior \ac{PDF} made in equation (\ref{eq:prox:prior_pdf_approx})
theoretically becomes better
with larger $\gamma$, the constraint that $\gamma$ be small is important,
as it keeps the effect of $h\left( \tilde{\boldsymbol{x}} \right) $ on the landscape
of the objective function small.
Otherwise, unwanted stationary points, including local minima, are introduced.
The authors say that ``in practice, the value of $\gamma$ should be adjusted
according to the decoding performance.'' \cite[Sec. 3.1]{proximal_paper}.
In the case of \ac{AWGN}, the likelihood
$f_{\boldsymbol{Y} \mid \tilde{\boldsymbol{X}}}
\left( \boldsymbol{y} \mid \tilde{\boldsymbol{x}} \right)$
is%
%
\begin{align*}
f_{\boldsymbol{Y} \mid \tilde{\boldsymbol{X}}}
\left( \boldsymbol{y} \mid \tilde{\boldsymbol{x}} \right)
= \frac{1}{\left( 2\pi\sigma^2 \right)^{n/2}}\mathrm{e}^{
-\frac{\lVert \boldsymbol{y}-\tilde{\boldsymbol{x}}
\rVert^2 }
{2\sigma^2}}
.\end{align*}
%
Thus, the gradient of the negative log-likelihood becomes%
\footnote{For the minimization, constants can be disregarded. For this reason,
it suffices to consider only proportionality instead of equality.}%
%
\begin{align*}
\nabla L \left( \boldsymbol{y} \mid \tilde{\boldsymbol{x}} \right)
&\propto \nabla \lVert \boldsymbol{y} - \tilde{\boldsymbol{x}} \rVert^2\\
&\propto \tilde{\boldsymbol{x}} - \boldsymbol{y}
,\end{align*}%
%
allowing equation (\ref{eq:prox:step_log_likelihood}) to be rewritten as%
%
\begin{align*}
\boldsymbol{r} \leftarrow \boldsymbol{s}
- \omega \left( \boldsymbol{s} - \boldsymbol{y} \right)
.\end{align*}
%
One thing to consider during the actual decoding process is that the gradient
of the code-constraint polynomial can take on extremely large values.
To avoid numerical instability, an additional step is added, where all
components of the current estimate are clipped to $\left[-\eta, \eta \right]$,
where $\eta$ is a positive constant slightly larger than one:%
%
\begin{align*}
\boldsymbol{s} \leftarrow \Pi_{\eta} \left( \boldsymbol{r}
- \gamma \nabla h\left( \boldsymbol{r} \right) \right)
,\end{align*}
%
with $\Pi_{\eta}\left( \cdot \right) $ denoting the projection onto
$\left[ -\eta, \eta \right]^n$.
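Component-wise, the projection amounts to clipping each entry:%
%
\begin{align*}
\left( \Pi_{\eta}\left( \boldsymbol{x} \right) \right)_i =
\min\left( \eta, \max\left( -\eta, x_i \right) \right),
\hspace{5mm} i \in \left\{ 1, \ldots, n \right\}
.\end{align*}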
The iterative decoding process resulting from these considerations is shown in
figure \ref{fig:prox:alg}.
\begin{figure}[H]
\centering
\begin{genericAlgorithm}[caption={}, label={}]
$\boldsymbol{s} \leftarrow \boldsymbol{0}$
for $K$ iterations do
$\boldsymbol{r} \leftarrow \boldsymbol{s} - \omega \left( \boldsymbol{s} - \boldsymbol{y} \right) $
$\boldsymbol{s} \leftarrow \Pi_\eta \left(\boldsymbol{r} - \gamma \nabla h\left( \boldsymbol{r} \right) \right)$
$\boldsymbol{\hat{c}} \leftarrow \frac{1}{2} \left( \boldsymbol{1} - \text{sign}\left( \boldsymbol{s} \right) \right) $
if $\boldsymbol{H}\boldsymbol{\hat{c}} = \boldsymbol{0} \pmod{2}$ do
return $\boldsymbol{\hat{c}}$
end if
end for
return $\boldsymbol{\hat{c}}$
\end{genericAlgorithm}
\caption{Proximal decoding algorithm for an \ac{AWGN} channel}
\label{fig:prox:alg}
\end{figure}
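The following listing gives a minimal Python sketch of this loop for the
\ac{AWGN} channel, with illustrative parameter values; the function
\texttt{grad\_h}, implementing $\nabla h$ for the code at hand, is assumed to
be given (a vectorized version is derived in section
\ref{sec:prox:Implementation Details}).
\begin{lstlisting}[language=Python]
import numpy as np

def proximal_decode(y, H, grad_h, omega=0.05, gamma=0.05, eta=1.5, K=200):
    """Proximal decoding over an AWGN channel (minimal sketch)."""
    s = np.zeros_like(y)
    for _ in range(K):
        r = s - omega * (s - y)                        # gradient step on L
        s = np.clip(r - gamma * grad_h(r), -eta, eta)  # prox step on gamma*h
        c_hat = (1 - np.sign(s)).astype(int) // 2      # bipolar -> binary
        if not np.any(H @ c_hat % 2):                  # stopping criterion
            return c_hat
    return c_hat  # K iterations exhausted: possibly a decoding failure
\end{lstlisting}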
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Implementation Details}%
\label{sec:prox:Implementation Details}
The algorithm was first implemented in Python, owing to the fast development
cycle and straightforward debugging.
It was subsequently reimplemented in C++ using the Eigen%
\footnote{\url{https://eigen.tuxfamily.org}}
linear algebra library to achieve higher performance.
The focus was placed on execution speed, sometimes at the expense of memory
usage.
The evaluation of the simulation results has been wholly realized in Python.
The gradient of the code-constraint polynomial \cite[Sec. 2.3]{proximal_paper}
is given by%
%
\begin{align*}
\nabla h\left( \tilde{\boldsymbol{x}} \right) &= \begin{bmatrix}
\frac{\partial}{\partial \tilde{x}_1}h\left( \tilde{\boldsymbol{x}} \right) &
\ldots &
\frac{\partial}{\partial \tilde{x}_n}h\left( \tilde{\boldsymbol{x}} \right)
\end{bmatrix}^\text{T}, \\[1em]
\frac{\partial}{\partial \tilde{x}_k}h\left( \tilde{\boldsymbol{x}} \right)
&= 4\left( \tilde{x}_k^2 - 1 \right) \tilde{x}_k
+ \frac{2}{\tilde{x}_k} \sum_{j\in N_v\left( k \right) }\left(
\left( \prod_{i \in N_c\left( j \right)} \tilde{x}_i \right)^2
- \prod_{i\in N_c\left( j \right) } \tilde{x}_i \right)
.\end{align*}
%
Since the products
$\prod_{i\in N_c\left( j \right) } \tilde{x}_i,\hspace{2mm}j\in \mathcal{J}$
are the same for all components $\tilde{x}_k$ of $\tilde{\boldsymbol{x}}$, they can be
precomputed.
Defining%
%
\begin{align*}
\boldsymbol{p} := \begin{bmatrix}
\prod_{i\in N_c\left( 1 \right) } \tilde{x}_i \\
\vdots \\
\prod_{i\in N_c\left( m \right) } \tilde{x}_i \\
\end{bmatrix}
\hspace{5mm}
\text{and}
\hspace{5mm}
\boldsymbol{v} := \boldsymbol{p}^{\circ 2} - \boldsymbol{p}
,\end{align*}
%
the gradient can be written as%
%
\begin{align*}
\nabla h\left( \tilde{\boldsymbol{x}} \right) =
4\left( \tilde{\boldsymbol{x}}^{\circ 3} - \tilde{\boldsymbol{x}} \right)
+ 2\tilde{\boldsymbol{x}}^{\circ -1} \circ \boldsymbol{H}^\text{T}
\boldsymbol{v}
,\end{align*}
%
enabling the computation of the gradient primarily with element-wise
operations and matrix-vector multiplication.
This is beneficial, as the libraries used for the implementation are
heavily optimized for such calculations (e.g., through vectorization of the
operations).
\todo{Note about how the equation with which the gradient is calculated is
itself similar to a message-passing rule}
The projection $\Pi_{\eta}\left( \cdot \right)$ also proves straightforward
to compute, as it amounts to simply clipping each component of the vector to
$[-\eta, \eta]$ individually.
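As a sketch of how this translates into code, the gradient might be computed
as follows using NumPy (the Python and C++/Eigen implementations follow the
same pattern; the division by $\tilde{x}_k$ assumes, as does the analytical
formula, that no component is exactly zero):
\begin{lstlisting}[language=Python]
import numpy as np

def grad_h(x, H):
    """Gradient of the code-constraint polynomial (sketch).

    x: current estimate (length n); H: binary (m x n) parity-check matrix.
    """
    # Parity products p_j over N_c(j): use the rows of H as masks so that
    # positions not participating in check j contribute a factor of 1.
    p = np.prod(np.where(H == 1, x, 1.0), axis=1)
    v = p ** 2 - p
    # Element-wise operations plus one matrix-vector product H^T v.
    return 4.0 * (x ** 3 - x) + 2.0 / x * (H.T @ v)
\end{lstlisting}
For larger codes, $\boldsymbol{H}$ would be stored as a sparse matrix (e.g.,
with \texttt{scipy.sparse}), so that the product
$\boldsymbol{H}^\text{T}\boldsymbol{v}$ only touches the nonzero entries of
$\boldsymbol{H}$.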
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Analysis and Simulation Results}%
\label{sec:prox:Analysis and Simulation Results}
In this section, the general behaviour of the proximal decoding algorithm is
analyzed.
The impact of the parameters $\gamma$, as well as $\omega$, $K$ and $\eta$ is
examined.
The decoding performance is assessed on the basis of the \ac{BER} and the
\ac{FER}, as well as the \textit{decoding failure rate}, i.e., the rate at
which the algorithm produces invalid results.
The convergence properties are reviewed and related to the decoding
performance.
Finally, the computational performance is examined on a theoretical basis
as well as on the basis of the implementation completed in the context of this
work.
All simulation results presented hereafter are based on Monte Carlo
simulations.
The \ac{BER} and \ac{FER} curves in particular have been generated by
producing at least 100 frame errors for each data point, unless otherwise
stated.
\todo{Mention number of datapoints from which each graph was created for
non ber and fer curves}
\subsection{Choice of Parameters}
First, the effect of the parameter $\gamma$ is investigated.
Figure \ref{fig:prox:results} shows a comparison of the decoding performance
of the proximal decoding algorithm as presented by Wadayama et al. in
\cite{proximal_paper} and the implementation realized for this work.
\noindent The \ac{BER} curves for three different choices of the parameter
$\gamma$ are shown, as well as the \ac{BER} curve resulting from decoding
using \ac{BP}, as a reference.
The results from Wadayama et al. are shown with solid lines,
while the newly generated ones are shown with dashed lines.
\begin{figure}[H]
\centering
\begin{tikzpicture}
\begin{axis}[grid=both, grid style={line width=.1pt},
xlabel={$E_b / N_0$ (dB)}, ylabel={BER},
ymode=log,
legend style={at={(0.5,-0.7)},anchor=south},
width=0.6\textwidth,
height=0.45\textwidth,
ymax=1.2, ymin=0.8e-4,
xtick={1, 2, ..., 5},
xmin=0.9, xmax=5.6,
legend columns=2,]
\addplot [ForestGreen, mark=*, line width=1pt]
table [x=SNR, y=gamma_0_15, col sep=comma] {res/proximal/ber_paper.csv};
\addlegendentry{$\gamma = 0.15$ (Wadayama et al.)}
\addplot [ForestGreen, mark=triangle, dashed, line width=1pt]
table [x=SNR, y=BER, col sep=comma,
discard if not={gamma}{0.15},
discard if gt={SNR}{5.5},]
{res/proximal/2d_ber_fer_dfr_20433484.csv};
\addlegendentry{$\gamma = 0.15$ (Own results)}
\addplot [NavyBlue, mark=*, line width=1pt]
table [x=SNR, y=gamma_0_01, col sep=comma] {res/proximal/ber_paper.csv};
\addlegendentry{$\gamma = 0.01$ (Wadayama et al.)}
\addplot [NavyBlue, mark=triangle, dashed, line width=1pt]
table [x=SNR, y=BER, col sep=comma,
discard if not={gamma}{0.01},
discard if gt={SNR}{5.5},]
{res/proximal/2d_ber_fer_dfr_20433484.csv};
\addlegendentry{$\gamma = 0.01$ (Own results)}
\addplot [RedOrange, mark=*, line width=1pt]
table [x=SNR, y=gamma_0_05, col sep=comma] {res/proximal/ber_paper.csv};
\addlegendentry{$\gamma = 0.05$ (Wadayama et al.)}
\addplot [RedOrange, mark=triangle, dashed, line width=1pt]
table [x=SNR, y=BER, col sep=comma,
discard if not={gamma}{0.05},
discard if gt={SNR}{5.5},]
{res/proximal/2d_ber_fer_dfr_20433484.csv};
\addlegendentry{$\gamma = 0.05$ (Own results)}
\addplot [RoyalPurple, mark=*, line width=1pt]
table [x=SNR, y=BP, col sep=comma] {res/proximal/ber_paper.csv};
\addlegendentry{BP (Wadayama et al.)}
\end{axis}
\end{tikzpicture}
\caption{Comparison of datapoints from Wadayama et al. with own simulation results%
\protect\footnotemark{}}
\label{fig:prox:results}
\end{figure}
%
\footnotetext{(3,6) regular LDPC code with $n = 204$, $k = 102$
\cite[\text{204.33.484}]{mackay_enc}; $\omega = 0.05, K=200, \eta=1.5$
}%
%
\noindent It is noticeable that for a moderately chosen value of $\gamma$
($\gamma = 0.05$), the decoding performance is better than for low
($\gamma = 0.01$) or high ($\gamma = 0.15$) values.
The question arises whether there is some optimal value maximizing the
decoding performance, especially since the decoding performance seems to
depend dramatically on $\gamma$.
To better understand how $\gamma$ and the decoding performance are
related, figure \ref{fig:prox:results} was recreated, but with a considerably
larger selection of values for $\gamma$.
In this new graph, shown in figure \ref{fig:prox:results_3d}, instead of
stacking the \ac{BER} curves on top of one another in the same plot, the
visualization is extended to three dimensions.
The previously shown results are highlighted.
Evidently, while the decoding performance does depend on the value of
$\gamma$, there is no single value offering optimal performance, but rather
a certain interval within which the performance stays largely unchanged.
When examining a number of different codes (figure
\ref{fig:prox:results_3d_multiple}), it is apparent that while the exact
landscape of the graph depends on the code, the general behaviour is the same
in each case.
\begin{figure}[H]
\centering
\begin{tikzpicture}
\begin{axis}[view={75}{30},
zmode=log,
xlabel={$E_b / N_0$ (dB)},
ylabel={$\gamma$},
zlabel={BER},
%legend pos=outer north east,
legend style={at={(0.5,-0.7)},anchor=south},
ytick={0, 0.05, 0.1, 0.15},
width=0.6\textwidth,
height=0.45\textwidth,]
\addplot3[surf,
mesh/rows=17, mesh/cols=14,
colormap/viridis] table [col sep=comma,
x=SNR, y=gamma, z=BER]
{res/proximal/2d_ber_fer_dfr_20433484.csv};
\addlegendentry{$\gamma = \left[ 0\text{:}0.01\text{:}0.16 \right] $}
\addplot3[NavyBlue, line width=1.5] table [col sep=comma,
discard if not={gamma}{0.01},
x=SNR, y=gamma, z=BER]
{res/proximal/2d_ber_fer_dfr_20433484.csv};
\addlegendentry{$\gamma = 0.01$}
\addplot3[RedOrange, line width=1.5] table [col sep=comma,
discard if not={gamma}{0.05},
x=SNR, y=gamma, z=BER]
{res/proximal/2d_ber_fer_dfr_20433484.csv};
\addlegendentry{$\gamma = 0.05$}
\addplot3[ForestGreen, line width=1.5] table [col sep=comma,
discard if not={gamma}{0.15},
x=SNR, y=gamma, z=BER]
{res/proximal/2d_ber_fer_dfr_20433484.csv};
\addlegendentry{$\gamma = 0.15$}
\end{axis}
\end{tikzpicture}
\caption{Visualization of relationship between the decoding performance\protect\footnotemark{}
and the parameter $\gamma$}
\label{fig:prox:results_3d}
\end{figure}%
%
\footnotetext{(3,6) regular LDPC code with n = 204, k = 102 \cite[\text{204.33.484}]{mackay_enc};
$\omega = 0.05, K=200, \eta=1.5$
}%
%
\noindent This indicates \todo{This is a result fit for the conclusion}
that while the choice of the parameter $\gamma$ significantly affects the
decoding performance, little benefit is to be gained from an extensive search
for an exact optimum.
Rather, a preliminary examination providing a rough window for $\gamma$ may
be sufficient.
\todo{Effect of $\omega$ and $K$}
Changing the parameter $\eta$ does not appear to have a significant effect on
the decoding performance when keeping the value within a reasonable window
(``slightly larger than one'', as stated in \cite[Sec. 3.2]{proximal_paper}),
which seems plausible considering its only function is ensuring numerical
stability.
Summarizing the above considerations, \ldots
\begin{itemize}
\item Conclusion: Number of iterations independent of \ac{SNR}
\end{itemize}
\begin{figure}[H]
\centering
\begin{subfigure}[c]{0.48\textwidth}
\centering
\begin{tikzpicture}
\begin{axis}[view={75}{30},
zmode=log,
xlabel={$E_b / N_0$ (dB)},
ylabel={$\gamma$},
zlabel={BER},
width=\textwidth,
height=0.75\textwidth,]
\addplot3[surf,
mesh/rows=17, mesh/cols=10,
colormap/viridis] table [col sep=comma,
x=SNR, y=gamma, z=BER]
{res/proximal/2d_ber_fer_dfr_963965.csv};
\addplot3[RedOrange, line width=1.5] table[col sep=comma,
discard if not={gamma}{0.05},
x=SNR, y=gamma, z=BER]
{res/proximal/2d_ber_fer_dfr_963965.csv};
\addplot3[NavyBlue, line width=1.5] table[col sep=comma,
discard if not={gamma}{0.01},
x=SNR, y=gamma, z=BER]
{res/proximal/2d_ber_fer_dfr_963965.csv};
\addplot3[ForestGreen, line width=1.5] table[col sep=comma,
discard if not={gamma}{0.15},
x=SNR, y=gamma, z=BER]
{res/proximal/2d_ber_fer_dfr_963965.csv};
\end{axis}
\end{tikzpicture}
\caption{$\left( 3, 6 \right)$-regular LDPC code with $n=96, k=48$
\cite[\text{96.3.965}]{mackay_enc}}
\end{subfigure}%
\hfill
\begin{subfigure}[c]{0.48\textwidth}
\centering
\begin{tikzpicture}
\begin{axis}[view={75}{30},
zmode=log,
xlabel={$E_b / N_0$ (dB)},
ylabel={$\gamma$},
zlabel={BER},
width=\textwidth,
height=0.75\textwidth,]
\addplot3[surf,
mesh/rows=17, mesh/cols=10,
colormap/viridis] table [col sep=comma,
x=SNR, y=gamma, z=BER]
{res/proximal/2d_ber_fer_dfr_bch_31_26.csv};
\addplot3[RedOrange, line width=1.5] table[col sep=comma,
discard if not={gamma}{0.05},
x=SNR, y=gamma, z=BER]
{res/proximal/2d_ber_fer_dfr_bch_31_26.csv};
\addplot3[NavyBlue, line width=1.5] table[col sep=comma,
discard if not={gamma}{0.01},
x=SNR, y=gamma, z=BER]
{res/proximal/2d_ber_fer_dfr_bch_31_26.csv};
\addplot3[ForestGreen, line width=1.5] table[col sep=comma,
discard if not={gamma}{0.15},
x=SNR, y=gamma, z=BER]
{res/proximal/2d_ber_fer_dfr_bch_31_26.csv};
\end{axis}
\end{tikzpicture}
\caption{BCH code with $n=31, k=26$\\[2\baselineskip]}
\end{subfigure}
\begin{subfigure}[c]{0.48\textwidth}
\centering
\begin{tikzpicture}
\begin{axis}[view={75}{30},
zmode=log,
xlabel={$E_b/N_0$ (dB)},
ylabel={$\gamma$},
zlabel={BER},
width=\textwidth,
height=0.75\textwidth,]
\addplot3[surf,
mesh/rows=17, mesh/cols=14,
colormap/viridis] table [col sep=comma,
x=SNR, y=gamma, z=BER]
{res/proximal/2d_ber_fer_dfr_20433484.csv};
\addplot3[RedOrange, line width=1.5] table[col sep=comma,
discard if not={gamma}{0.05},
x=SNR, y=gamma, z=BER]
{res/proximal/2d_ber_fer_dfr_20433484.csv};
\addplot3[NavyBlue, line width=1.5] table[col sep=comma,
discard if not={gamma}{0.01},
x=SNR, y=gamma, z=BER]
{res/proximal/2d_ber_fer_dfr_20433484.csv};
\addplot3[ForestGreen, line width=1.5] table[col sep=comma,
discard if not={gamma}{0.15},
x=SNR, y=gamma, z=BER]
{res/proximal/2d_ber_fer_dfr_20433484.csv};
\end{axis}
\end{tikzpicture}
\caption{$\left( 3, 6 \right)$-regular LDPC code with $n=204, k=102$
\cite[\text{204.33.484}]{mackay_enc}}
\end{subfigure}%
\hfill
\begin{subfigure}[c]{0.48\textwidth}
\centering
\begin{tikzpicture}
\begin{axis}[view={75}{30},
zmode=log,
xlabel={$E_b / N_0$ (dB)},
ylabel={$\gamma$},
zlabel={BER},
width=\textwidth,
height=0.75\textwidth,]
\addplot3[surf,
mesh/rows=17, mesh/cols=10,
colormap/viridis] table [col sep=comma,
x=SNR, y=gamma, z=BER]
{res/proximal/2d_ber_fer_dfr_20455187.csv};
\addplot3[RedOrange, line width=1.5] table[col sep=comma,
discard if not={gamma}{0.05},
x=SNR, y=gamma, z=BER]
{res/proximal/2d_ber_fer_dfr_20455187.csv};
\addplot3[NavyBlue, line width=1.5] table[col sep=comma,
discard if not={gamma}{0.01},
x=SNR, y=gamma, z=BER]
{res/proximal/2d_ber_fer_dfr_20455187.csv};
\addplot3[ForestGreen, line width=1.5] table[col sep=comma,
discard if not={gamma}{0.15},
x=SNR, y=gamma, z=BER]
{res/proximal/2d_ber_fer_dfr_20455187.csv};
\end{axis}
\end{tikzpicture}
\caption{$\left( 5, 10 \right)$-regular LDPC code with $n=204, k=102$
\cite[\text{204.55.187}]{mackay_enc}}
\end{subfigure}%
\begin{subfigure}[c]{0.48\textwidth}
\centering
\begin{tikzpicture}
\begin{axis}[view={75}{30},
zmode=log,
xlabel={$E_b / N_0$ (dB)},
ylabel={$\gamma$},
zlabel={BER},
width=\textwidth,
height=0.75\textwidth,]
\addplot3[surf,
mesh/rows=17, mesh/cols=10,
colormap/viridis] table [col sep=comma,
x=SNR, y=gamma, z=BER]
{res/proximal/2d_ber_fer_dfr_40833844.csv};
\addplot3[RedOrange, line width=1.5] table[col sep=comma,
discard if not={gamma}{0.05},
x=SNR, y=gamma, z=BER]
{res/proximal/2d_ber_fer_dfr_40833844.csv};
\addplot3[NavyBlue, line width=1.5] table[col sep=comma,
discard if not={gamma}{0.01},
x=SNR, y=gamma, z=BER]
{res/proximal/2d_ber_fer_dfr_40833844.csv};
\addplot3[ForestGreen, line width=1.5] table[col sep=comma,
discard if not={gamma}{0.15},
x=SNR, y=gamma, z=BER]
{res/proximal/2d_ber_fer_dfr_40833844.csv};
\end{axis}
\end{tikzpicture}
\caption{$\left( 3, 6 \right)$-regular LDPC code with $n=408, k=204$
\cite[\text{408.33.844}]{mackay_enc}}
\end{subfigure}%
\hfill
\begin{subfigure}[c]{0.48\textwidth}
\centering
\begin{tikzpicture}
\begin{axis}[view={75}{30},
zmode=log,
xlabel={$E_b / N_0$ (dB)},
ylabel={$\gamma$},
zlabel={BER},
width=\textwidth,
height=0.75\textwidth,]
\addplot3[surf,
mesh/rows=17, mesh/cols=10,
colormap/viridis] table [col sep=comma,
x=SNR, y=gamma, z=BER]
{res/proximal/2d_ber_fer_dfr_pegreg252x504.csv};
\addplot3[RedOrange, line width=1.5] table[col sep=comma,
discard if not={gamma}{0.05},
x=SNR, y=gamma, z=BER]
{res/proximal/2d_ber_fer_dfr_pegreg252x504.csv};
\addplot3[NavyBlue, line width=1.5] table[col sep=comma,
discard if not={gamma}{0.01},
x=SNR, y=gamma, z=BER]
{res/proximal/2d_ber_fer_dfr_pegreg252x504.csv};
\addplot3[ForestGreen, line width=1.5] table[col sep=comma,
discard if not={gamma}{0.15},
x=SNR, y=gamma, z=BER]
{res/proximal/2d_ber_fer_dfr_pegreg252x504.csv};
\end{axis}
\end{tikzpicture}
\caption{LDPC code (Progressive Edge Growth Construction) with $n=504, k=252$
\cite[\text{PEGReg252x504}]{mackay_enc}}
\end{subfigure}%
\vspace{1cm}
\begin{subfigure}[c]{\textwidth}
\centering
\begin{tikzpicture}
\begin{axis}[hide axis,
xmin=10, xmax=50,
ymin=0, ymax=0.4,
legend style={draw=white!15!black,legend cell align=left}]
\addlegendimage{surf, colormap/viridis}
\addlegendentry{$\gamma = \left[ 0\text{ : }0.01\text{ : }0.16 \right] $};
\addlegendimage{NavyBlue, line width=1.5pt}
\addlegendentry{$\gamma = 0.01$};
\addlegendimage{RedOrange, line width=1.5pt}
\addlegendentry{$\gamma = 0.05$};
\addlegendimage{ForestGreen, line width=1.5pt}
\addlegendentry{$\gamma = 0.15$};
\end{axis}
\end{tikzpicture}
\end{subfigure}
\caption{BER for $\omega = 0.05, K=100$ (different codes)}
\label{fig:prox:results_3d_multiple}
\end{figure}
\subsection{Decoding Performance}
\begin{figure}[H]
\centering
\begin{tikzpicture}
\begin{axis}[grid=both,
xlabel={$E_b / N_0$ (dB)}, ylabel={BER},
ymode=log,
width=0.48\textwidth,
height=0.36\textwidth,
legend style={at={(0.05,0.05)},anchor=south west},
ymax=1.5, ymin=3e-7,]
\addplot [ForestGreen, mark=*]
table [x=SNR, y=BER, col sep=comma, discard if not={gamma}{0.15}]
{res/proximal/2d_ber_fer_dfr_20433484.csv};
\addlegendentry{$\gamma = 0.15$}
\addplot [NavyBlue, mark=*]
table [x=SNR, y=BER, col sep=comma, discard if not={gamma}{0.01}]
{res/proximal/2d_ber_fer_dfr_20433484.csv};
\addlegendentry{$\gamma = 0.01$}
\addplot [RedOrange, mark=*]
table [x=SNR, y=BER, col sep=comma, discard if not={gamma}{0.05}]
{res/proximal/2d_ber_fer_dfr_20433484.csv};
\addlegendentry{$\gamma = 0.05$}
\end{axis}
\end{tikzpicture}%
\hfill%
\begin{tikzpicture}
\begin{axis}[grid=both,
xlabel={$E_b / N_0$ (dB)}, ylabel={FER},
ymode=log,
width=0.48\textwidth,
height=0.36\textwidth,
legend style={at={(0.05,0.05)},anchor=south west},
ymax=1.5, ymin=3e-7,]
\addplot [ForestGreen, mark=*]
table [x=SNR, y=FER, col sep=comma, discard if not={gamma}{0.15}]
{res/proximal/2d_ber_fer_dfr_20433484.csv};
\addlegendentry{$\gamma = 0.15$}
\addplot [NavyBlue, mark=*]
table [x=SNR, y=FER, col sep=comma, discard if not={gamma}{0.01}]
{res/proximal/2d_ber_fer_dfr_20433484.csv};
\addlegendentry{$\gamma = 0.01$}
\addplot [RedOrange, mark=*]
table [x=SNR, y=FER, col sep=comma, discard if not={gamma}{0.05}]
{res/proximal/2d_ber_fer_dfr_20433484.csv};
\addlegendentry{$\gamma = 0.05$}
\end{axis}
\end{tikzpicture}\\[1em]
\begin{tikzpicture}
\begin{axis}[grid=both,
xlabel={$E_b / N_0$ (dB)}, ylabel={Decoding Failure Rate},
ymode=log,
width=0.48\textwidth,
height=0.36\textwidth,
legend style={at={(0.05,0.05)},anchor=south west},
ymax=1.5, ymin=3e-7,]
\addplot [ForestGreen, mark=*]
table [x=SNR, y=DFR, col sep=comma, discard if not={gamma}{0.15}]
{res/proximal/2d_ber_fer_dfr_20433484.csv};
\addlegendentry{$\gamma = 0.15$}
\addplot [NavyBlue, mark=*]
table [x=SNR, y=DFR, col sep=comma, discard if not={gamma}{0.01}]
{res/proximal/2d_ber_fer_dfr_20433484.csv};
\addlegendentry{$\gamma = 0.01$}
\addplot [RedOrange, mark=*]
table [x=SNR, y=DFR, col sep=comma, discard if not={gamma}{0.05}]
{res/proximal/2d_ber_fer_dfr_20433484.csv};
\addlegendentry{$\gamma = 0.05$}
\end{axis}
\end{tikzpicture}
\caption{Comparison\protect\footnotemark{} of \ac{FER}, \ac{BER} and
decoding failure rate; $\omega = 0.05, K=100$}
\label{fig:prox:ber_fer_dfr}
\end{figure}%
%
\footnotetext{(3,6) regular LDPC code with n = 204, k = 102 \cite[\text{204.33.484}]{mackay_enc}}%
%
Until now, only the \ac{BER} has been considered to assess the decoding
performance.
The \ac{FER}, however, shows considerably worse behaviour, as can be seen in
figure \ref{fig:prox:ber_fer_dfr}.
Besides the \ac{BER} and \ac{FER} curves, the figure also shows the
\textit{decoding failure rate}.
This is the rate at which the iterative process produces invalid codewords,
i.e., the stopping criterion (line 6 of the algorithm in figure
\ref{fig:prox:alg}) is never satisfied and the maximum number of iterations
$K$ is reached without converging to a valid codeword.
Three lines are plotted in each case, corresponding to different values of
the parameter $\gamma$.
The values chosen are the same as in figure \ref{fig:prox:results}, as they
seem to adequately describe the behaviour across a wide range of values
(see figure \ref{fig:prox:results_3d}).
It is apparent that the \ac{FER} and the decoding failure rate are extremely
similar, especially for higher \acp{SNR}.
This leads to the hypothesis that, at least for higher \acp{SNR}, frame errors
arise mainly due to the non-convergence of the algorithm instead of
convergence to the wrong codeword.
This line of thought will be picked up in section
\ref{sec:prox:Improved Implementation} in an attempt to improve the algorithm.
In summary, the \ac{BER} and \ac{FER} indicate dissimilar decoding
performance.
The decoding failure rate closely resembles the \ac{FER}, suggesting that
the frame errors may largely be attributed to decoding failures.
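For reference, the following sketch shows how the three metrics are tallied
in a Monte Carlo simulation; the transmission of the all-zero codeword and
the interface of \texttt{decode} (returning the estimate together with a
validity flag) are illustrative simplifications:
\begin{lstlisting}[language=Python]
import numpy as np

def tally(decode, channel, n, num_frames):
    """Estimate BER, FER and decoding failure rate (sketch)."""
    bit_err = frame_err = failures = 0
    c = np.zeros(n, dtype=int)            # all-zero codeword
    for _ in range(num_frames):
        y = channel((-1.0) ** c)          # bipolar mapping + AWGN noise
        c_hat, valid = decode(y)
        errors = np.count_nonzero(c_hat != c)
        bit_err += errors
        frame_err += errors > 0           # any bit error -> frame error
        failures += not valid             # stopping criterion never met
    return (bit_err / (n * num_frames),
            frame_err / num_frames,
            failures / num_frames)
\end{lstlisting}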
\todo{Maybe reference to the structure of the algorithm (1 part likelihood
1 part constraints)}
\subsection{Convergence Properties}
The previous observation that frame errors arise mainly from the
non-convergence of the algorithm, rather than from convergence to a wrong
codeword, raises the question of why the decoding process fails to converge
so often.
In figure \ref{fig:prox:convergence}, the iterative process is visualized
for each iteration.
In order to be able to simultaneously consider all components of the vectors
being dealt with, a BCH code with $n=7$ and $k=4$ is chosen.
Each chart shows one component of the current estimates during a given
iteration (alternating between $\boldsymbol{r}$ and $\boldsymbol{s}$), as well
as the gradients of the negative log-likelihood and the code-constraint
polynomial, which influence the next estimate.
\begin{figure}[H]
\begin{minipage}[c]{0.25\textwidth}
\centering
\begin{tikzpicture}[scale = 0.35]
\begin{axis}[
grid=both,
xlabel={Iterations},
width=8cm,
height=3cm,
scale only axis,
xtick={0, 50, ..., 200},
xticklabels={0, 25, ..., 100},
]
\addplot [NavyBlue, mark=none, line width=1]
table [col sep=comma, x=k, y=comb_r_s_1]
{res/proximal/comp_bch_7_4_combined.csv};
\addplot [ForestGreen, mark=none, line width=1]
table [col sep=comma, x=k, y=grad_L_1]
{res/proximal/comp_bch_7_4_combined.csv};
\addplot [RedOrange, mark=none, line width=1]
table [col sep=comma, x=k, y=grad_h_1]
{res/proximal/comp_bch_7_4_combined.csv};
\addlegendentry{est}
\addlegendentry{$\left(\nabla L \right)_2$}
\addlegendentry{$\left(\nabla h \right)_2 $}
\end{axis}
\end{tikzpicture}\\
\begin{tikzpicture}[scale = 0.35]
\begin{axis}[
grid=both,
xlabel={Iterations},
width=8cm,
height=3cm,
scale only axis,
xtick={0, 50, ..., 200},
xticklabels={0, 25, ..., 100},
]
\addplot [NavyBlue, mark=none, line width=1]
table [col sep=comma, x=k, y=comb_r_s_2]
{res/proximal/comp_bch_7_4_combined.csv};
\addplot [ForestGreen, mark=none, line width=1]
table [col sep=comma, x=k, y=grad_L_2]
{res/proximal/comp_bch_7_4_combined.csv};
\addplot [RedOrange, mark=none, line width=1]
table [col sep=comma, x=k, y=grad_h_2]
{res/proximal/comp_bch_7_4_combined.csv};
\addlegendentry{est}
\addlegendentry{$\left(\nabla L \right)_3$}
\addlegendentry{$\left(\nabla h \right)_3 $}
\end{axis}
\end{tikzpicture}\\
\begin{tikzpicture}[scale = 0.35]
\begin{axis}[
grid=both,
xlabel={Iterations},
width=8cm,
height=3cm,
scale only axis,
xtick={0, 50, ..., 200},
xticklabels={0, 25, ..., 100},
]
\addplot [NavyBlue, mark=none, line width=1]
table [col sep=comma, x=k, y=comb_r_s_3]
{res/proximal/comp_bch_7_4_combined.csv};
\addplot [ForestGreen, mark=none, line width=1]
table [col sep=comma, x=k, y=grad_L_3]
{res/proximal/comp_bch_7_4_combined.csv};
\addplot [RedOrange, mark=none, line width=1]
table [col sep=comma, x=k, y=grad_h_3]
{res/proximal/comp_bch_7_4_combined.csv};
\addlegendentry{est}
\addlegendentry{$\left(\nabla L \right)_4$}
\addlegendentry{$\left(\nabla h \right)_4 $}
\end{axis}
\end{tikzpicture}
\end{minipage}%
\begin{minipage}[c]{0.5\textwidth}
\vspace*{-1cm}
\centering
\begin{tikzpicture}[scale = 0.85, spy using outlines={circle, magnification=6,
connect spies}]
\begin{axis}[
grid=both,
xlabel={Iterations},
width=8cm,
height=3cm,
scale only axis,
xtick={0, 50, ..., 200},
xticklabels={0, 25, ..., 100},
]
\addplot [NavyBlue, mark=none, line width=1]
table [col sep=comma, x=k, y=comb_r_s_0]
{res/proximal/comp_bch_7_4_combined.csv};
\addplot [ForestGreen, mark=none, line width=1]
table [col sep=comma, x=k, y=grad_L_0]
{res/proximal/comp_bch_7_4_combined.csv};
\addplot [RedOrange, mark=none, line width=1]
table [col sep=comma, x=k, y=grad_h_0]
{res/proximal/comp_bch_7_4_combined.csv};
\addlegendentry{est}
\addlegendentry{$\left(\nabla L \right)_1$}
\addlegendentry{$\left(\nabla h \right)_1 $}
\coordinate (spypoint) at (axis cs:100,0.53);
\coordinate (magnifyglass) at (axis cs:175,2);
\end{axis}
\spy [black, size=2cm] on (spypoint)
in node[fill=white] at (magnifyglass);
\end{tikzpicture}
\end{minipage}%
\begin{minipage}[c]{0.25\textwidth}
\centering
\begin{tikzpicture}[scale = 0.35]
\begin{axis}[
grid=both,
xlabel={Iterations},
width=8cm,
height=3cm,
scale only axis,
xtick={0, 50, ..., 200},
xticklabels={0, 25, ..., 100},
]
\addplot [NavyBlue, mark=none, line width=1]
table [col sep=comma, x=k, y=comb_r_s_4]
{res/proximal/comp_bch_7_4_combined.csv};
\addplot [ForestGreen, mark=none, line width=1]
table [col sep=comma, x=k, y=grad_L_4]
{res/proximal/comp_bch_7_4_combined.csv};
\addplot [RedOrange, mark=none, line width=1]
table [col sep=comma, x=k, y=grad_h_4]
{res/proximal/comp_bch_7_4_combined.csv};
\addlegendentry{est}
\addlegendentry{$\left(\nabla L \right)_5$}
\addlegendentry{$\left(\nabla h \right)_5 $}
\end{axis}
\end{tikzpicture}\\
\begin{tikzpicture}[scale = 0.35]
\begin{axis}[
grid=both,
xlabel={Iterations},
width=8cm,
height=3cm,
scale only axis,
xtick={0, 50, ..., 200},
xticklabels={0, 25, ..., 100},
]
\addplot [NavyBlue, mark=none, line width=1]
table [col sep=comma, x=k, y=comb_r_s_5]
{res/proximal/comp_bch_7_4_combined.csv};
\addplot [ForestGreen, mark=none, line width=1]
table [col sep=comma, x=k, y=grad_L_5]
{res/proximal/comp_bch_7_4_combined.csv};
\addplot [RedOrange, mark=none, line width=1]
table [col sep=comma, x=k, y=grad_h_5]
{res/proximal/comp_bch_7_4_combined.csv};
\addlegendentry{est}
\addlegendentry{$\left(\nabla L \right)_6$}
\addlegendentry{$\left(\nabla h \right)_6 $}
\end{axis}
\end{tikzpicture}\\
\begin{tikzpicture}[scale = 0.35]
\begin{axis}[
grid=both,
xlabel={Iterations},
width=8cm,
height=3cm,
scale only axis,
xtick={0, 50, ..., 200},
xticklabels={0, 25, ..., 100},
]
\addplot [NavyBlue, mark=none, line width=1]
table [col sep=comma, x=k, y=comb_r_s_6]
{res/proximal/comp_bch_7_4_combined.csv};
\addplot [ForestGreen, mark=none, line width=1]
table [col sep=comma, x=k, y=grad_L_6]
{res/proximal/comp_bch_7_4_combined.csv};
\addplot [RedOrange, mark=none, line width=1]
table [col sep=comma, x=k, y=grad_h_6]
{res/proximal/comp_bch_7_4_combined.csv};
\addlegendentry{est}
\addlegendentry{$\left(\nabla L \right)_7$}
\addlegendentry{$\left(\nabla h \right)_7 $}
\end{axis}
\end{tikzpicture}
\end{minipage}
\caption{Internal variables of the proximal decoder
as a function of the number of iterations ($n=7$)\protect\footnotemark{}}
\label{fig:prox:convergence}
\end{figure}%
%
\footnotetext{A single decoding is shown, using the BCH$\left( 7,4 \right) $ code;
$\gamma = 0.05, \omega = 0.05, E_b / N_0 = \SI{5}{dB}$
}%
%
\noindent It is evident that in all cases, past a certain number of
iterations, the estimate starts to oscillate around a particular value.
After a certain point, the two gradients stop approaching zero.
In particular, this means that the code-constraint polynomial is no longer
being minimized.
As such, the constraints are not satisfied and the estimate does not
converge towards a valid codeword.
While figure \ref{fig:prox:convergence} shows only one instance of a decoding
task, it is indicative of the general behaviour of the algorithm.
This can be justified by looking at the gradients themselves.
In figure \ref{fig:prox:gradients} the gradients of the negative
log-likelihood and the code-constraint polynomial for a repetition code with
$n=2$ are shown.
It is obvious that walking along the gradients in an alternating fashion will
produce a net movement in a certain direction, as long as the two gradients
have a common component.
As soon as this common component is exhausted, they will start pulling the
estimate in opposing directions, leading to an oscillation as illustrated
in figure \ref{fig:prox:convergence}.
Consequently, this oscillation is an intrinsic property of the structure of
the proximal decoding algorithm, where the two parts of the objective function
are minimized in an alternating manner using their gradients.
\begin{figure}[H]
\centering
\begin{subfigure}[c]{0.5\textwidth}
\centering
\begin{tikzpicture}
\begin{axis}[xmin = -1.25, xmax=1.25,
ymin = -1.25, ymax=1.25,
xlabel={$x_1$}, ylabel={$x_2$},
width=\textwidth,
height=0.75\textwidth,
grid=major, grid style={dotted},
view={0}{90}]
\addplot3[point meta=\thisrow{grad_norm},
point meta min=1,
point meta max=3,
quiver={u=\thisrow{grad_0},
v=\thisrow{grad_1},
scale arrows=.05,
every arrow/.append style={%
line width=.3+\pgfplotspointmetatransformed/1000,
-{Latex[length=0pt 5,width=0pt 3]}
},
},
quiver/colored = {mapped color},
colormap/rocket,
-stealth,
]
table[col sep=comma] {res/proximal/2d_grad_L.csv};
\end{axis}
\end{tikzpicture}
\caption{$\nabla L \left(\boldsymbol{y} \mid \tilde{\boldsymbol{x}} \right) $
for a repetition code with $n=2$}
\end{subfigure}%
\hfill%
\begin{subfigure}[c]{0.5\textwidth}
\centering
\begin{tikzpicture}
\begin{axis}[xmin = -1.25, xmax=1.25,
ymin = -1.25, ymax=1.25,
xlabel={$x_1$}, ylabel={$x_2$},
grid=major, grid style={dotted},
width=\textwidth,
height=0.75\textwidth,
view={0}{90}]
\addplot3[point meta=\thisrow{grad_norm},
point meta min=1,
point meta max=4,
quiver={u=\thisrow{grad_0},
v=\thisrow{grad_1},
scale arrows=.03,
every arrow/.append style={%
line width=.3+\pgfplotspointmetatransformed/1000,
-{Latex[length=0pt 5,width=0pt 3]}
},
},
quiver/colored = {mapped color},
colormap/rocket,
-stealth,
]
table[col sep=comma] {res/proximal/2d_grad_h.csv};
\end{axis}
\end{tikzpicture}
\caption{$\nabla h \left( \tilde{\boldsymbol{x}} \right) $
for a repetition code with $n=2$}
\end{subfigure}%
\caption{Gradients of the two components of the objective function for a
repetition code with $n=2$}
\label{fig:prox:gradients}
\end{figure}
While the initial net movement is generally directed in the right direction
owing to the gradient of the negative log-likelihood, the final oscillation
may well take place in a segment of space not corresponding to a valid
codeword, leading to the aforementioned non-convergence of the algorithm.
This also partly explains the difference in decoding performance when looking
at the \ac{BER} and \ac{FER}, as it would lower the amount of bit errors while
still yielding an invalid codeword.
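This behaviour can be reproduced directly.
The following sketch applies the two alternating updates to the repetition
code with $n=2$ from figure \ref{fig:prox:gradients}; the received vector and
parameter values are illustrative:
\begin{lstlisting}[language=Python]
import numpy as np

def grad_h(x):
    """Gradient of h for the n = 2 repetition code (H = [[1, 1]])."""
    x1, x2 = x
    p = x1 * x2  # the single parity product
    return np.array([4 * (x1**2 - 1) * x1 + 2 * x2 * (p - 1),
                     4 * (x2**2 - 1) * x2 + 2 * x1 * (p - 1)])

omega, gamma, eta = 0.05, 0.05, 1.5
y = np.array([0.9, -0.4])  # sign pattern (+, -) is not a codeword
s = np.zeros(2)
for k in range(200):
    r = s - omega * (s - y)                        # step on the likelihood
    s = np.clip(r - gamma * grad_h(r), -eta, eta)  # step on the constraints
    print(k, s)  # inspect the trajectory of the estimate
\end{lstlisting}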
When considering codes with larger $n$, the behaviour generally stays the
same, with some minor differences.
In figure \ref{fig:prox:convergence_large_n} the decoding process is
visualized for one component of a code with $n=204$, for a single decoding.
The two gradients still start to fight each other and the estimate still
starts to oscillate, the same as illustrated on the basis of figure
\ref{fig:prox:convergence} for a code with $n=7$.
However, in this case, the gradient of the code-constraint polynomial itself
starts to oscillate, its average value being such that the effect of the
gradient of the negative log-likelihood is counteracted.
In conclusion, as a general rule, the proximal decoding algorithm reaches
an oscillatory state which it cannot escape as a consequence of its structure.
In this state, the constraints may not be satisfied, leading to the algorithm
returning an invalid codeword.
\begin{figure}[H]
\centering
\begin{tikzpicture}
\begin{axis}[
grid=both,
xlabel={Iterations},
width=0.6\textwidth,
height=0.45\textwidth,
scale only axis,
xtick={0, 100, ..., 400},
xticklabels={0, 50, ..., 200},
]
\addplot [NavyBlue, mark=none, line width=1]
table [col sep=comma, x=k, y=comb_r_s_0]
{res/proximal/extreme_components_20433484_combined.csv};
\addplot [ForestGreen, mark=none, line width=1]
table [col sep=comma, x=k, y=grad_L_0]
{res/proximal/extreme_components_20433484_combined.csv};
\addplot [RedOrange, mark=none, line width=1]
table [col sep=comma, x=k, y=grad_h_0]
{res/proximal/extreme_components_20433484_combined.csv};
\addlegendentry{est}
\addlegendentry{$\left(\nabla L\right)_1$}
\addlegendentry{$\left(\nabla h\right)_1$}
\end{axis}
\end{tikzpicture}
\caption{Internal variables of the proximal decoder as a function of the
number of iterations ($n=204$)}
\label{fig:prox:convergence_large_n}
\end{figure}%
\subsection{Computational Performance}
\begin{itemize}
\item Theoretical analysis
\item Simulation results to substantiate theoretical analysis
\item Conclusion: $\mathcal{O}\left( n \right)$ time complexity, implementation heavily
optimizable
\end{itemize}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Improved Implementation}%
\label{sec:prox:Improved Implementation}
As mentioned earlier, frame errors seem to mainly stem from decoding failures.
This, coupled with the fact that the \ac{BER} indicates so much better
performance than the \ac{FER}, leads to the assumption that only a small
number of components of the estimated vector may be responsible for an invalid
result.
If it were possible to narrow the possibly erroneous components of the
estimate down to a small subset, an \ac{ML} decoding step could be performed
over the resulting limited list of candidates (``ML-in-the-List'', as it will
subsequently be called) to improve the decoding performance.
This concept is pursued in this section.
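A minimal sketch of this idea is given below; the selection of the least
reliable positions by the magnitude of the final soft estimate
$\boldsymbol{s}$, as well as the names and the list size \texttt{t}, are
illustrative choices:
\begin{lstlisting}[language=Python]
import itertools
import numpy as np

def ml_in_the_list(y, H, s, t=4):
    """Enumerate the t least reliable bits and pick the best valid word."""
    c_hat = (1 - np.sign(s)).astype(int) // 2   # hard decision
    lr = np.argsort(np.abs(s))[:t]              # least reliable positions
    best, best_metric = None, np.inf
    for flips in itertools.product([0, 1], repeat=t):
        cand = c_hat.copy()
        cand[lr] = (cand[lr] + flips) % 2       # flip a subset of the t bits
        if np.any(H @ cand % 2):                # skip invalid candidates
            continue
        metric = np.sum((y - (-1.0) ** cand) ** 2)  # ML metric for AWGN
        if metric < best_metric:
            best, best_metric = cand, metric
    return best  # None if no candidate in the list is a valid codeword
\end{lstlisting}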
\begin{itemize}
\item Decoding performance and comparison with standard proximal decoding
\item Computational performance and comparison with standard proximal decoding
\item Conclusion
\begin{itemize}
\item Summary
\item Up to $\SI{1}{dB}$ gain possible
\end{itemize}
\end{itemize}