\chapter{Proximal Decoding}%
\label{chapter:proximal_decoding}
In this chapter, the proximal decoding algorithm is examined. First, the algorithm itself is described. Then, some noteworthy aspects of the implementation are presented. Simulation results are shown, on the basis of which the behaviour of the algorithm is investigated for different codes and parameters. Finally, an improvement on proximal decoding is proposed.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Decoding Algorithm}%
\label{sec:prox:Decoding Algorithm}
Proximal decoding was proposed by Wadayama et al. as a novel formulation of optimization-based decoding \cite{proximal_paper}. With this algorithm, the minimization is performed using the proximal gradient method. In contrast to \ac{LP} decoding, the objective function is based on a non-convex optimization formulation of the \ac{MAP} decoding problem.

In order to derive the objective function, the authors begin with the \ac{MAP} decoding rule, expressed as a continuous maximization problem%
\footnote{Expanding the domain to be continuous does not materially change the meaning of the rule. The only change is that what previously were \acp{PMF} now have to be expressed in terms of \acp{PDF}.}
over $\tilde{\boldsymbol{x}}$:%
%
\begin{align}
	\hat{\boldsymbol{x}}
	= \argmax_{\tilde{\boldsymbol{x}} \in \mathbb{R}^{n}}
	f_{\tilde{\boldsymbol{X}} \mid \boldsymbol{Y}} \left( \tilde{\boldsymbol{x}} \mid \boldsymbol{y} \right)
	= \argmax_{\tilde{\boldsymbol{x}} \in \mathbb{R}^{n}}
	f_{\boldsymbol{Y} \mid \tilde{\boldsymbol{X}}} \left( \boldsymbol{y} \mid \tilde{\boldsymbol{x}} \right)
	f_{\tilde{\boldsymbol{X}}}\left( \tilde{\boldsymbol{x}} \right)%
	\label{eq:prox:vanilla_MAP}
.\end{align}%
%
The likelihood $f_{\boldsymbol{Y} \mid \tilde{\boldsymbol{X}}} \left( \boldsymbol{y} \mid \tilde{\boldsymbol{x}} \right) $ is a known function determined by the channel model. The prior \ac{PDF} $f_{\tilde{\boldsymbol{X}}}\left( \tilde{\boldsymbol{x}} \right)$ is also known, as the equal probability assumption is made on $\mathcal{C}$. However, since the considered domain is continuous, the prior \ac{PDF} cannot be dropped as a constant during the optimization, as is often done, and has a rather unwieldy representation:%
%
\begin{align}
	f_{\tilde{\boldsymbol{X}}}\left( \tilde{\boldsymbol{x}} \right)
	= \frac{1}{\left| \mathcal{C} \right| } \sum_{\boldsymbol{c} \in \mathcal{C} }
	\delta\big( \tilde{\boldsymbol{x}} - \left( -1 \right) ^{\boldsymbol{c}}\big)
	\label{eq:prox:prior_pdf}
.\end{align}%
%
In order to rewrite the prior \ac{PDF} $f_{\tilde{\boldsymbol{X}}}\left( \tilde{\boldsymbol{x}} \right)$, the so-called \textit{code-constraint polynomial} is introduced as:%
%
\begin{align*}
	h\left( \tilde{\boldsymbol{x}} \right)
	= \underbrace{\sum_{i=1}^{n} \left( \tilde{x}_i^2-1 \right) ^2}_{\text{Bipolar constraint}}
	+ \underbrace{\sum_{j=1}^{m} \left[ \left( \prod_{i\in N_c \left( j \right) } \tilde{x}_i \right) -1 \right] ^2}_{\text{Parity constraint}}%
.\end{align*}%
%
This function penalizes vectors far from any codeword and favors those close to one. To achieve this, the polynomial is composed of two parts: one term representing the bipolar constraint, pushing the solution of the continuous optimization problem towards discrete (bipolar) values, and one term representing the parity constraints, capturing the checks defined by the parity-check matrix $\boldsymbol{H}$.
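To make the role of the two terms more tangible, the following minimal sketch evaluates the code-constraint polynomial for a given estimate. It assumes that $\boldsymbol{H}$ is available as a dense 0/1 NumPy array; the function name and this representation are merely illustrative and do not reflect the implementation discussed later in this chapter.
\begin{lstlisting}[language=Python]
import numpy as np

def code_constraint_poly(x_tilde, H):
    """Evaluate h(x) for a real-valued estimate x_tilde (shape (n,)) and a
    parity-check matrix H given as a dense 0/1 array of shape (m, n)."""
    # Bipolar constraint: every component should lie at +1 or -1.
    bipolar = np.sum((x_tilde**2 - 1.0) ** 2)
    # Parity constraint: the product over each check neighbourhood N_c(j)
    # should equal +1; entries outside the neighbourhood are replaced by 1,
    # which leaves the product unchanged.
    masked = np.where(H == 1, x_tilde, 1.0)
    parity = np.sum((np.prod(masked, axis=1) - 1.0) ** 2)
    return bipolar + parity
\end{lstlisting}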
The prior \ac{PDF} is then approximated using the code-constraint polynomial as%
%
\begin{align}
	f_{\tilde{\boldsymbol{X}}}\left( \tilde{\boldsymbol{x}} \right) \approx
	\frac{1}{Z}\mathrm{e}^{-\gamma h\left( \tilde{\boldsymbol{x}} \right) }%
	\label{eq:prox:prior_pdf_approx}
,\end{align}%
%
where $Z$ is a normalization constant. The authors justify this approximation by arguing that for $\gamma \rightarrow \infty$, the approximation in equation (\ref{eq:prox:prior_pdf_approx}) approaches the original function in equation (\ref{eq:prox:prior_pdf}). This approximation can then be plugged into equation (\ref{eq:prox:vanilla_MAP}) and the likelihood can be rewritten using the negative log-likelihood $L \left( \boldsymbol{y} \mid \tilde{\boldsymbol{x}} \right) = -\ln\left( f_{\boldsymbol{Y} \mid \tilde{\boldsymbol{X}}}\left( \boldsymbol{y} \mid \tilde{\boldsymbol{x}} \right) \right) $:%
%
\begin{align*}
	\hat{\boldsymbol{x}} &= \argmax_{\tilde{\boldsymbol{x}} \in \mathbb{R}^{n}}
	\mathrm{e}^{- L\left( \boldsymbol{y} \mid \tilde{\boldsymbol{x}} \right) }
	\mathrm{e}^{-\gamma h\left( \tilde{\boldsymbol{x}} \right) } \\
	&= \argmin_{\tilde{\boldsymbol{x}} \in \mathbb{R}^n}
	L\left( \boldsymbol{y} \mid \tilde{\boldsymbol{x}} \right) + \gamma h\left( \tilde{\boldsymbol{x}} \right)%
.\end{align*}%
%
Thus, with proximal decoding, the objective function $g\left( \tilde{\boldsymbol{x}} \right)$ considered is%
%
\begin{align}
	g\left( \tilde{\boldsymbol{x}} \right) =
	L\left( \boldsymbol{y} \mid \tilde{\boldsymbol{x}} \right) + \gamma h\left( \tilde{\boldsymbol{x}} \right)%
	\label{eq:prox:objective_function}
\end{align}%
%
and the decoding problem is reformulated to%
%
\begin{align*}
	\text{minimize}\hspace{2mm} &L\left( \boldsymbol{y} \mid \tilde{\boldsymbol{x}} \right) + \gamma h\left( \tilde{\boldsymbol{x}} \right)\\
	\text{subject to}\hspace{2mm} &\tilde{\boldsymbol{x}} \in \mathbb{R}^n
.\end{align*}
%
For the solution of the approximate \ac{MAP} decoding problem, the two parts of equation (\ref{eq:prox:objective_function}) are considered separately: the minimization of the objective function occurs in an alternating fashion, switching between the negative log-likelihood $L\left( \boldsymbol{y} \mid \tilde{\boldsymbol{x}} \right) $ and the scaled code-constraint polynomial $\gamma h\left( \tilde{\boldsymbol{x}} \right) $. Two helper variables, $\boldsymbol{r}$ and $\boldsymbol{s}$, are introduced, describing the result of each of the two steps. The first step, minimizing the negative log-likelihood, is performed using gradient descent:%
%
\begin{align}
	\boldsymbol{r} \leftarrow \boldsymbol{s} - \omega \nabla L\left( \boldsymbol{y} \mid \boldsymbol{s} \right),
	\hspace{5mm}\omega > 0
	\label{eq:prox:step_log_likelihood}
.\end{align}%
%
For the second step, minimizing the scaled code-constraint polynomial, the proximal gradient method is used and the \textit{proximal operator} of $\gamma h\left( \tilde{\boldsymbol{x}} \right) $ has to be computed.
It is then immediately approximated by a single gradient-descent step:%
%
\begin{align*}
	\textbf{prox}_{\gamma h} \left( \tilde{\boldsymbol{x}} \right)
	&\equiv \argmin_{\boldsymbol{t} \in \mathbb{R}^n}
	\gamma h\left( \boldsymbol{t} \right) + \frac{1}{2} \left\Vert \boldsymbol{t} - \tilde{\boldsymbol{x}} \right\Vert^2 \\
	&\approx \tilde{\boldsymbol{x}} - \gamma \nabla h \left( \tilde{\boldsymbol{x}} \right),
	\hspace{5mm} \gamma > 0, \text{ small}
.\end{align*}%
%
The second optimization step thus becomes%
%
\begin{align*}
	\boldsymbol{s} \leftarrow \boldsymbol{r} - \gamma \nabla h\left( \boldsymbol{r} \right),
	\hspace{5mm}\gamma > 0,\text{ small}
.\end{align*}
%
While the approximation of the prior \ac{PDF} made in equation (\ref{eq:prox:prior_pdf_approx}) theoretically becomes better with larger $\gamma$, the constraint that $\gamma$ be small is important, as it keeps the effect of $h\left( \tilde{\boldsymbol{x}} \right) $ on the landscape of the objective function small. Otherwise, unwanted stationary points, including local minima, are introduced. The authors state that ``in practice, the value of $\gamma$ should be adjusted according to the decoding performance'' \cite[Sec. 3.1]{proximal_paper}.

In the case of \ac{AWGN}, the likelihood $f_{\boldsymbol{Y} \mid \tilde{\boldsymbol{X}}} \left( \boldsymbol{y} \mid \tilde{\boldsymbol{x}} \right)$ is%
%
\begin{align*}
	f_{\boldsymbol{Y} \mid \tilde{\boldsymbol{X}}} \left( \boldsymbol{y} \mid \tilde{\boldsymbol{x}} \right)
	= \frac{1}{\left( 2\pi\sigma^2 \right)^{n / 2}}\mathrm{e}^{
	-\frac{\lVert \boldsymbol{y}-\tilde{\boldsymbol{x}} \rVert^2 }
	{2\sigma^2}}
.\end{align*}
%
Thus, the gradient of the negative log-likelihood becomes%
\footnote{For the minimization, constants can be disregarded. For this reason, it suffices to consider only proportionality instead of equality.}%
%
\begin{align*}
	\nabla L \left( \boldsymbol{y} \mid \tilde{\boldsymbol{x}} \right)
	&\propto \nabla \lVert \boldsymbol{y} - \tilde{\boldsymbol{x}} \rVert^2\\
	&\propto \tilde{\boldsymbol{x}} - \boldsymbol{y}
,\end{align*}%
%
allowing equation (\ref{eq:prox:step_log_likelihood}) to be rewritten as%
%
\begin{align*}
	\boldsymbol{r} \leftarrow \boldsymbol{s} - \omega \left( \boldsymbol{s} - \boldsymbol{y} \right)
.\end{align*}
%
One thing to consider during the actual decoding process is that the gradient of the code-constraint polynomial can take on extremely large values. To avoid numerical instability, an additional step is added, in which all components of the current estimate are clipped to $\left[-\eta, \eta \right]$, where $\eta$ is a positive constant slightly larger than one:%
%
\begin{align*}
	\boldsymbol{s} \leftarrow \Pi_{\eta} \left( \boldsymbol{r} - \gamma \nabla h\left( \boldsymbol{r} \right) \right)
,\end{align*}
%
with $\Pi_{\eta}\left( \cdot \right) $ expressing the projection onto $\left[ -\eta, \eta \right]^n$. The iterative decoding process resulting from these considerations is summarized in algorithm \ref{alg:prox}.
\begin{genericAlgorithm}[caption={Proximal decoding algorithm for an \ac{AWGN} channel}, label={alg:prox}]
$\boldsymbol{s} \leftarrow \boldsymbol{0}$
for $K$ iterations do
	$\boldsymbol{r} \leftarrow \boldsymbol{s} - \omega \left( \boldsymbol{s} - \boldsymbol{y} \right) $
	$\boldsymbol{s} \leftarrow \Pi_\eta \left(\boldsymbol{r} - \gamma \nabla h\left( \boldsymbol{r} \right) \right)$
	$\hat{\boldsymbol{c}} \leftarrow \frac{1}{2} \left( \boldsymbol{1} - \text{sign}\left( \boldsymbol{s} \right) \right) $
	if $\boldsymbol{H}\hat{\boldsymbol{c}} = \boldsymbol{0}$ then
		return $\hat{\boldsymbol{c}}$
	end if
end for
return $\hat{\boldsymbol{c}}$
\end{genericAlgorithm}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Implementation Details}%
\label{sec:prox:Implementation Details}
The algorithm was first implemented in Python because of the fast development process and straightforward debugging. It was subsequently reimplemented in C++ using the Eigen%
\footnote{\url{https://eigen.tuxfamily.org}}
linear algebra library to achieve higher performance. The focus was placed on a fast implementation, sometimes at the expense of memory usage, which somewhat limits the size of the codes the implementation can be used with \todo{Is this appropriate for a bachelor's thesis?}. The evaluation of the simulation results has been wholly realized in Python.

The gradient of the code-constraint polynomial \cite[Sec. 2.3]{proximal_paper} is given by%
%
\begin{align*}
	\nabla h\left( \tilde{\boldsymbol{x}} \right) &=
	\begin{bmatrix}
		\frac{\partial}{\partial \tilde{x}_1}h\left( \tilde{\boldsymbol{x}} \right) &
		\ldots &
		\frac{\partial}{\partial \tilde{x}_n}h\left( \tilde{\boldsymbol{x}} \right)
	\end{bmatrix}^\text{T}, \\[1em]
	\frac{\partial}{\partial \tilde{x}_k}h\left( \tilde{\boldsymbol{x}} \right) &=
	4\left( \tilde{x}_k^2 - 1 \right) \tilde{x}_k + \frac{2}{\tilde{x}_k}
	\sum_{j\in N_v\left( k \right) }\left( \left( \prod_{i \in N_c\left( j \right)} \tilde{x}_i \right)^2
	- \prod_{i\in N_c\left( j \right) } \tilde{x}_i \right)
.\end{align*}
%
Since the products $\prod_{i\in N_c\left( j \right) } \tilde{x}_i,\hspace{2mm}j\in \mathcal{J}$ are the same for all components $\tilde{x}_k$ of $\tilde{\boldsymbol{x}}$, they can be precomputed. Defining%
%
\begin{align*}
	\boldsymbol{p} :=
	\begin{bmatrix}
		\prod_{i\in N_c\left( 1 \right) } \tilde{x}_i \\
		\vdots \\
		\prod_{i\in N_c\left( m \right) } \tilde{x}_i \\
	\end{bmatrix}
	\hspace{5mm} \text{and} \hspace{5mm}
	\boldsymbol{v} := \boldsymbol{p}^{\circ 2} - \boldsymbol{p}
,\end{align*}
%
the gradient can be written as%
%
\begin{align*}
	\nabla h\left( \tilde{\boldsymbol{x}} \right) =
	4\left( \tilde{\boldsymbol{x}}^{\circ 3} - \tilde{\boldsymbol{x}} \right)
	+ 2\tilde{\boldsymbol{x}}^{\circ -1} \circ \boldsymbol{H}^\text{T} \boldsymbol{v}
,\end{align*}
%
enabling the computation of the gradient primarily with element-wise operations and matrix-vector multiplication. This is beneficial, as the libraries used for the implementation are heavily optimized for such calculations (e.g., through vectorization of the operations). \todo{Note about how the equation with which the gradient is calculated is itself similar to a message-passing rule}

The projection $\Pi_{\eta}\left( \cdot \right)$ also proves straightforward to compute, as it amounts to simply clipping each component of the vector to $\left[ -\eta, \eta \right]$ individually.
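To tie the above together, the following minimal NumPy sketch combines the vectorized gradient computation with the iteration loop of algorithm \ref{alg:prox}. The dense representation of $\boldsymbol{H}$, the function names and the default parameter values are illustrative assumptions; the Python and C++/Eigen implementations produced for this work may differ in detail.
\begin{lstlisting}[language=Python]
import numpy as np

def grad_h(x, H):
    """Gradient of the code-constraint polynomial, using only element-wise
    operations and one matrix-vector product (H: dense 0/1 array, shape (m, n)).
    The components of x are assumed to be nonzero."""
    masked = np.where(H == 1, x, 1.0)
    p = np.prod(masked, axis=1)        # p_j = product of x_i over N_c(j)
    v = p**2 - p                       # v = p^2 - p (element-wise)
    return 4.0 * (x**3 - x) + (2.0 / x) * (H.T @ v)

def proximal_decode(y, H, omega=0.05, gamma=0.05, eta=1.5, K=200):
    """Proximal decoding for an AWGN channel, following algorithm alg:prox."""
    s = np.zeros_like(y)
    c_hat = np.zeros(H.shape[1], dtype=int)
    for _ in range(K):
        r = s - omega * (s - y)                           # step on the negative log-likelihood
        s = np.clip(r - gamma * grad_h(r, H), -eta, eta)  # approximate proximal step + projection
        c_hat = ((1 - np.sign(s)) / 2).astype(int)        # bipolar -> binary estimate
        if not np.any(H @ c_hat % 2):                     # parity check H c = 0 (mod 2)
            break
    return c_hat
\end{lstlisting}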
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Analysis and Simulation Results}%
\label{sec:prox:Analysis and Simulation Results}
In this section, the general behaviour of the proximal decoding algorithm is analyzed. The impact of the parameters $\gamma$, $\omega$, $K$ and $\eta$ is examined. The decoding performance is assessed on the basis of the \ac{BER} and the \ac{FER} as well as the \textit{decoding failure rate}, i.e., the rate at which the algorithm produces invalid results. The convergence properties are reviewed and related to the decoding performance. Finally, the computational performance is examined both theoretically and on the basis of the implementation completed in the context of this work.

All simulation results presented hereafter are based on Monte Carlo simulations. The \ac{BER} and \ac{FER} curves in particular have been generated by producing at least 100 frame errors for each data point, unless otherwise stated. \todo{Mention number of datapoints from which each graph was created for non ber and fer curves}

\subsection{Choice of Parameters}
First, the effect of the parameter $\gamma$ is investigated. Figure \ref{fig:prox:results} shows a comparison of the decoding performance of the proximal decoding algorithm as presented by Wadayama et al. in \cite{proximal_paper} and the implementation realized for this work.

\noindent The \ac{BER} curves for three different choices of the parameter $\gamma$ are shown, as well as the \ac{BER} curve resulting from decoding using \ac{BP}, as a reference. The results from Wadayama et al. are shown with solid lines, while the newly generated ones are shown with dashed lines.
\begin{figure}[H]
	\centering
	\begin{tikzpicture}
		\begin{axis}[grid=both,
			grid style={line width=.1pt},
			xlabel={$E_b / N_0$ (dB)},
			ylabel={BER},
			ymode=log,
			legend style={at={(0.5,-0.7)},anchor=south},
			width=0.6\textwidth,
			height=0.45\textwidth,
			ymax=1.2, ymin=0.8e-4,
			xtick={1, 2, ..., 5},
			xmin=0.9, xmax=5.6,
			legend columns=2,]
			\addplot [ForestGreen, mark=*, line width=1pt]
				table [x=SNR, y=gamma_0_15, col sep=comma] {res/proximal/ber_paper.csv};
			\addlegendentry{$\gamma = 0.15$ (Wadayama et al.)}
			\addplot [ForestGreen, mark=triangle, dashed, line width=1pt]
				table [x=SNR, y=BER, col sep=comma,
				discard if not={gamma}{0.15},
				discard if gt={SNR}{5.5},]
				{res/proximal/2d_ber_fer_dfr_20433484.csv};
			\addlegendentry{$\gamma = 0.15$ (Own results)}
			\addplot [NavyBlue, mark=*, line width=1pt]
				table [x=SNR, y=gamma_0_01, col sep=comma] {res/proximal/ber_paper.csv};
			\addlegendentry{$\gamma = 0.01$ (Wadayama et al.)}
			\addplot [NavyBlue, mark=triangle, dashed, line width=1pt]
				table [x=SNR, y=BER, col sep=comma,
				discard if not={gamma}{0.01},
				discard if gt={SNR}{5.5},]
				{res/proximal/2d_ber_fer_dfr_20433484.csv};
			\addlegendentry{$\gamma = 0.01$ (Own results)}
			\addplot [RedOrange, mark=*, line width=1pt]
				table [x=SNR, y=gamma_0_05, col sep=comma] {res/proximal/ber_paper.csv};
			\addlegendentry{$\gamma = 0.05$ (Wadayama et al.)}
			\addplot [RedOrange, mark=triangle, dashed, line width=1pt]
				table [x=SNR, y=BER, col sep=comma,
				discard if not={gamma}{0.05},
				discard if gt={SNR}{5.5},]
				{res/proximal/2d_ber_fer_dfr_20433484.csv};
			\addlegendentry{$\gamma = 0.05$ (Own results)}
			\addplot [RoyalPurple, mark=*, line width=1pt]
				table [x=SNR, y=BP, col sep=comma] {res/proximal/ber_paper.csv};
			\addlegendentry{BP (Wadayama et al.)}
		\end{axis}
	\end{tikzpicture}
	\caption{Comparison of datapoints from Wadayama et al.
		with own simulation results%
		\protect\footnotemark{}}
	\label{fig:prox:results}
\end{figure}
%
\footnotetext{(3,6)-regular LDPC code with $n = 204$, $k = 102$ \cite[\text{204.33.484}]{mackay_enc};
	$\omega = 0.05, K=200, \eta=1.5$ }%
%
\noindent It is noticeable that for a moderately chosen value of $\gamma$ ($\gamma = 0.05$) the decoding performance is better than for low ($\gamma = 0.01$) or high ($\gamma = 0.15$) values. The question arises whether there is some optimal value maximizing the decoding performance, especially since the performance seems to depend dramatically on $\gamma$.

To better understand how $\gamma$ and the decoding performance are related, figure \ref{fig:prox:results} was recreated, but with a considerably larger selection of values for $\gamma$. In this new graph, shown in figure \ref{fig:prox:results_3d}, instead of stacking the \ac{BER} curves on top of one another in the same plot, the visualization is extended to three dimensions. The previously shown results are highlighted. Evidently, while the decoding performance does depend on the value of $\gamma$, there is no single value offering optimal performance, but rather a certain interval in which the performance stays largely unchanged. When examining a number of different codes (figure \ref{fig:prox:results_3d_multiple}), it is apparent that while the exact landscape of the graph depends on the code, the general behaviour is the same in each case.
\begin{figure}[H]
	\centering
	\begin{tikzpicture}
		\begin{axis}[view={75}{30},
			zmode=log,
			xlabel={$E_b / N_0$ (dB)},
			ylabel={$\gamma$},
			zlabel={BER},
			%legend pos=outer north east,
			legend style={at={(0.5,-0.7)},anchor=south},
			ytick={0, 0.05, 0.1, 0.15},
			width=0.6\textwidth,
			height=0.45\textwidth,]
			\addplot3[surf, mesh/rows=17, mesh/cols=14, colormap/viridis]
				table [col sep=comma, x=SNR, y=gamma, z=BER]
				{res/proximal/2d_ber_fer_dfr_20433484.csv};
			\addlegendentry{$\gamma = \left[ 0\text{:}0.01\text{:}0.16 \right] $}
			\addplot3[NavyBlue, line width=1.5]
				table [col sep=comma, discard if not={gamma}{0.01}, x=SNR, y=gamma, z=BER]
				{res/proximal/2d_ber_fer_dfr_20433484.csv};
			\addlegendentry{$\gamma = 0.01$}
			\addplot3[RedOrange, line width=1.5]
				table [col sep=comma, discard if not={gamma}{0.05}, x=SNR, y=gamma, z=BER]
				{res/proximal/2d_ber_fer_dfr_20433484.csv};
			\addlegendentry{$\gamma = 0.05$}
			\addplot3[ForestGreen, line width=1.5]
				table [col sep=comma, discard if not={gamma}{0.15}, x=SNR, y=gamma, z=BER]
				{res/proximal/2d_ber_fer_dfr_20433484.csv};
			\addlegendentry{$\gamma = 0.15$}
		\end{axis}
	\end{tikzpicture}
	\caption{Visualization of the relationship between the decoding performance\protect\footnotemark{} and the parameter $\gamma$}
	\label{fig:prox:results_3d}
\end{figure}%
%
\footnotetext{(3,6)-regular LDPC code with $n = 204$, $k = 102$ \cite[\text{204.33.484}]{mackay_enc};
	$\omega = 0.05, K=200, \eta=1.5$ }%
%
\noindent This indicates \todo{This is a result fit for the conclusion} that while the choice of the parameter $\gamma$ significantly affects the decoding performance, there is little benefit in undertaking an extensive search for an exact optimum. Rather, a preliminary examination providing a rough window for $\gamma$ may be sufficient.

TODO: $\omega, K$

Changing the parameter $\eta$ does not appear to have a significant effect on the decoding performance when keeping the value within a reasonable window (``slightly larger than one'', as stated in \cite[Sec. 3.2]{proximal_paper}), which seems plausible considering that its only function is to ensure numerical stability.
Summarizing the above considerations, \ldots \begin{itemize} \item Conclusion: Number of iterations independent of \ac{SNR} \end{itemize} \begin{figure}[H] \centering \begin{subfigure}[c]{0.48\textwidth} \centering \begin{tikzpicture} \begin{axis}[view={75}{30}, zmode=log, xlabel={$E_b / N_0$ (dB)}, ylabel={$\gamma$}, zlabel={BER}, width=\textwidth, height=0.75\textwidth,] \addplot3[surf, mesh/rows=17, mesh/cols=10, colormap/viridis] table [col sep=comma, x=SNR, y=gamma, z=BER] {res/proximal/2d_ber_fer_dfr_963965.csv}; \addplot3[RedOrange, line width=1.5] table[col sep=comma, discard if not={gamma}{0.05}, x=SNR, y=gamma, z=BER] {res/proximal/2d_ber_fer_dfr_963965.csv}; \addplot3[NavyBlue, line width=1.5] table[col sep=comma, discard if not={gamma}{0.01}, x=SNR, y=gamma, z=BER] {res/proximal/2d_ber_fer_dfr_963965.csv}; \addplot3[ForestGreen, line width=1.5] table[col sep=comma, discard if not={gamma}{0.15}, x=SNR, y=gamma, z=BER] {res/proximal/2d_ber_fer_dfr_963965.csv}; \end{axis} \end{tikzpicture} \caption{$\left( 3, 6 \right)$-regular LDPC code with $n=96, k=48$ \cite[\text{96.3.965}]{mackay_enc}} \end{subfigure}% \hfill \begin{subfigure}[c]{0.48\textwidth} \centering \begin{tikzpicture} \begin{axis}[view={75}{30}, zmode=log, xlabel={$E_b / N_0$ (dB)}, ylabel={$\gamma$}, zlabel={BER}, width=\textwidth, height=0.75\textwidth,] \addplot3[surf, mesh/rows=17, mesh/cols=10, colormap/viridis] table [col sep=comma, x=SNR, y=gamma, z=BER] {res/proximal/2d_ber_fer_dfr_bch_31_26.csv}; \addplot3[RedOrange, line width=1.5] table[col sep=comma, discard if not={gamma}{0.05}, x=SNR, y=gamma, z=BER] {res/proximal/2d_ber_fer_dfr_bch_31_26.csv}; \addplot3[NavyBlue, line width=1.5] table[col sep=comma, discard if not={gamma}{0.01}, x=SNR, y=gamma, z=BER] {res/proximal/2d_ber_fer_dfr_bch_31_26.csv}; \addplot3[ForestGreen, line width=1.5] table[col sep=comma, discard if not={gamma}{0.15}, x=SNR, y=gamma, z=BER] {res/proximal/2d_ber_fer_dfr_bch_31_26.csv}; \end{axis} \end{tikzpicture} \caption{BCH code with $n=31, k=26$\\[2\baselineskip]} \end{subfigure} \begin{subfigure}[c]{0.48\textwidth} \centering \begin{tikzpicture} \begin{axis}[view={75}{30}, zmode=log, xlabel={$E_b/N_0$ (dB)}, ylabel={$\gamma$}, zlabel={BER}, width=\textwidth, height=0.75\textwidth,] \addplot3[surf, mesh/rows=17, mesh/cols=14, colormap/viridis] table [col sep=comma, x=SNR, y=gamma, z=BER] {res/proximal/2d_ber_fer_dfr_20433484.csv}; \addplot3[RedOrange, line width=1.5] table[col sep=comma, discard if not={gamma}{0.05}, x=SNR, y=gamma, z=BER] {res/proximal/2d_ber_fer_dfr_20433484.csv}; \addplot3[NavyBlue, line width=1.5] table[col sep=comma, discard if not={gamma}{0.01}, x=SNR, y=gamma, z=BER] {res/proximal/2d_ber_fer_dfr_20433484.csv}; \addplot3[ForestGreen, line width=1.5] table[col sep=comma, discard if not={gamma}{0.15}, x=SNR, y=gamma, z=BER] {res/proximal/2d_ber_fer_dfr_20433484.csv}; \end{axis} \end{tikzpicture} \caption{$\left( 3, 6 \right)$-regular LDPC code with $n=204, k=102$ \cite[\text{204.33.484}]{mackay_enc}} \end{subfigure}% \hfill \begin{subfigure}[c]{0.48\textwidth} \centering \begin{tikzpicture} \begin{axis}[view={75}{30}, zmode=log, xlabel={$E_b / N_0$ (dB)}, ylabel={$\gamma$}, zlabel={BER}, width=\textwidth, height=0.75\textwidth,] \addplot3[surf, mesh/rows=17, mesh/cols=10, colormap/viridis] table [col sep=comma, x=SNR, y=gamma, z=BER] {res/proximal/2d_ber_fer_dfr_20455187.csv}; \addplot3[RedOrange, line width=1.5] table[col sep=comma, discard if not={gamma}{0.05}, x=SNR, y=gamma, z=BER] 
{res/proximal/2d_ber_fer_dfr_20455187.csv}; \addplot3[NavyBlue, line width=1.5] table[col sep=comma, discard if not={gamma}{0.01}, x=SNR, y=gamma, z=BER] {res/proximal/2d_ber_fer_dfr_20455187.csv}; \addplot3[ForestGreen, line width=1.5] table[col sep=comma, discard if not={gamma}{0.15}, x=SNR, y=gamma, z=BER] {res/proximal/2d_ber_fer_dfr_20455187.csv}; \end{axis} \end{tikzpicture} \caption{$\left( 5, 10 \right)$-regular LDPC code with $n=204, k=102$ \cite[\text{204.55.187}]{mackay_enc}} \end{subfigure}% \begin{subfigure}[c]{0.48\textwidth} \centering \begin{tikzpicture} \begin{axis}[view={75}{30}, zmode=log, xlabel={$E_b / N_0$ (dB)}, ylabel={$\gamma$}, zlabel={BER}, width=\textwidth, height=0.75\textwidth,] \addplot3[surf, mesh/rows=17, mesh/cols=10, colormap/viridis] table [col sep=comma, x=SNR, y=gamma, z=BER] {res/proximal/2d_ber_fer_dfr_40833844.csv}; \addplot3[RedOrange, line width=1.5] table[col sep=comma, discard if not={gamma}{0.05}, x=SNR, y=gamma, z=BER] {res/proximal/2d_ber_fer_dfr_40833844.csv}; \addplot3[NavyBlue, line width=1.5] table[col sep=comma, discard if not={gamma}{0.01}, x=SNR, y=gamma, z=BER] {res/proximal/2d_ber_fer_dfr_40833844.csv}; \addplot3[ForestGreen, line width=1.5] table[col sep=comma, discard if not={gamma}{0.15}, x=SNR, y=gamma, z=BER] {res/proximal/2d_ber_fer_dfr_40833844.csv}; \end{axis} \end{tikzpicture} \caption{$\left( 3, 6 \right)$-regular LDPC code with $n=408, k=204$ \cite[\text{408.33.844}]{mackay_enc}} \end{subfigure}% \hfill \begin{subfigure}[c]{0.48\textwidth} \centering \begin{tikzpicture} \begin{axis}[view={75}{30}, zmode=log, xlabel={$E_b / N_0$ (dB)}, ylabel={$\gamma$}, zlabel={BER}, width=\textwidth, height=0.75\textwidth,] \addplot3[surf, mesh/rows=17, mesh/cols=10, colormap/viridis] table [col sep=comma, x=SNR, y=gamma, z=BER] {res/proximal/2d_ber_fer_dfr_pegreg252x504.csv}; \addplot3[RedOrange, line width=1.5] table[col sep=comma, discard if not={gamma}{0.05}, x=SNR, y=gamma, z=BER] {res/proximal/2d_ber_fer_dfr_pegreg252x504.csv}; \addplot3[NavyBlue, line width=1.5] table[col sep=comma, discard if not={gamma}{0.01}, x=SNR, y=gamma, z=BER] {res/proximal/2d_ber_fer_dfr_pegreg252x504.csv}; \addplot3[ForestGreen, line width=1.5] table[col sep=comma, discard if not={gamma}{0.15}, x=SNR, y=gamma, z=BER] {res/proximal/2d_ber_fer_dfr_pegreg252x504.csv}; \end{axis} \end{tikzpicture} \caption{LDPC code (Progressive Edge Growth Construction) with $n=504, k=252$ \cite[\text{PEGReg252x504}]{mackay_enc}} \end{subfigure}% \vspace{1cm} \begin{subfigure}[c]{\textwidth} \centering \begin{tikzpicture} \begin{axis}[hide axis, xmin=10, xmax=50, ymin=0, ymax=0.4, legend style={draw=white!15!black,legend cell align=left}] \addlegendimage{surf, colormap/viridis} \addlegendentry{$\gamma = \left[ 0\text{ : }0.01\text{ : }0.16 \right] $}; \addlegendimage{NavyBlue, line width=1.5pt} \addlegendentry{$\gamma = 0.01$}; \addlegendimage{RedOrange, line width=1.5pt} \addlegendentry{$\gamma = 0.05$}; \addlegendimage{ForestGreen, line width=1.5pt} \addlegendentry{$\gamma = 0.15$}; \end{axis} \end{tikzpicture} \end{subfigure} \caption{BER for $\omega = 0.05, K=100$ (different codes)} \label{fig:prox:results_3d_multiple} \end{figure} \subsection{Decoding Performance} \begin{figure}[H] \centering \begin{tikzpicture} \begin{axis}[grid=both, xlabel={$E_b / N_0$ (dB)}, ylabel={BER}, ymode=log, width=0.48\textwidth, height=0.36\textwidth, legend style={at={(0.05,0.05)},anchor=south west}, ymax=1.5, ymin=3e-7,] \addplot [ForestGreen, mark=*] table [x=SNR, y=BER, col sep=comma, 
discard if not={gamma}{0.15}] {res/proximal/2d_ber_fer_dfr_20433484.csv};
			\addlegendentry{$\gamma = 0.15$}
			\addplot [NavyBlue, mark=*]
				table [x=SNR, y=BER, col sep=comma,
				discard if not={gamma}{0.01}] {res/proximal/2d_ber_fer_dfr_20433484.csv};
			\addlegendentry{$\gamma = 0.01$}
			\addplot [RedOrange, mark=*]
				table [x=SNR, y=BER, col sep=comma,
				discard if not={gamma}{0.05}] {res/proximal/2d_ber_fer_dfr_20433484.csv};
			\addlegendentry{$\gamma = 0.05$}
		\end{axis}
	\end{tikzpicture}%
	\hfill%
	\begin{tikzpicture}
		\begin{axis}[grid=both,
			xlabel={$E_b / N_0$ (dB)},
			ylabel={FER},
			ymode=log,
			width=0.48\textwidth,
			height=0.36\textwidth,
			legend style={at={(0.05,0.05)},anchor=south west},
			ymax=1.5, ymin=3e-7,]
			\addplot [ForestGreen, mark=*]
				table [x=SNR, y=FER, col sep=comma,
				discard if not={gamma}{0.15}] {res/proximal/2d_ber_fer_dfr_20433484.csv};
			\addlegendentry{$\gamma = 0.15$}
			\addplot [NavyBlue, mark=*]
				table [x=SNR, y=FER, col sep=comma,
				discard if not={gamma}{0.01}] {res/proximal/2d_ber_fer_dfr_20433484.csv};
			\addlegendentry{$\gamma = 0.01$}
			\addplot [RedOrange, mark=*]
				table [x=SNR, y=FER, col sep=comma,
				discard if not={gamma}{0.05}] {res/proximal/2d_ber_fer_dfr_20433484.csv};
			\addlegendentry{$\gamma = 0.05$}
		\end{axis}
	\end{tikzpicture}\\[1em]
	\begin{tikzpicture}
		\begin{axis}[grid=both,
			xlabel={$E_b / N_0$ (dB)},
			ylabel={Decoding Failure Rate},
			ymode=log,
			width=0.48\textwidth,
			height=0.36\textwidth,
			legend style={at={(0.05,0.05)},anchor=south west},
			ymax=1.5, ymin=3e-7,]
			\addplot [ForestGreen, mark=*]
				table [x=SNR, y=DFR, col sep=comma,
				discard if not={gamma}{0.15}] {res/proximal/2d_ber_fer_dfr_20433484.csv};
			\addlegendentry{$\gamma = 0.15$}
			\addplot [NavyBlue, mark=*]
				table [x=SNR, y=DFR, col sep=comma,
				discard if not={gamma}{0.01}] {res/proximal/2d_ber_fer_dfr_20433484.csv};
			\addlegendentry{$\gamma = 0.01$}
			\addplot [RedOrange, mark=*]
				table [x=SNR, y=DFR, col sep=comma,
				discard if not={gamma}{0.05}] {res/proximal/2d_ber_fer_dfr_20433484.csv};
			\addlegendentry{$\gamma = 0.05$}
		\end{axis}
	\end{tikzpicture}
	\caption{Comparison of \ac{FER}, \ac{BER} and decoding failure rate\protect\footnotemark{}}
	\label{fig:prox:ber_fer_dfr}
\end{figure}%
%
\footnotetext{(3,6)-regular LDPC code with $n = 204$, $k = 102$ \cite[\text{204.33.484}]{mackay_enc};
	$\omega = 0.05, K=100, \eta=1.5$ }%
%
Until now, only the \ac{BER} has been considered to gauge the decoding performance. The \ac{FER}, however, shows considerably worse behaviour, as can be seen in figure \ref{fig:prox:ber_fer_dfr}. Besides the \ac{BER} and \ac{FER} curves, the figure also shows the \textit{decoding failure rate}. This is the rate at which the iterative process fails to produce a valid codeword, i.e., the stopping criterion (line 6 of algorithm \ref{alg:prox}) is never satisfied and the maximum number of iterations $K$ is reached without converging to a valid codeword. Three lines are plotted in each case, corresponding to different values of the parameter $\gamma$. The values chosen are the same as in figure \ref{fig:prox:results}, as they seem to adequately describe the behaviour across a wide range of values (see figure \ref{fig:prox:results_3d}).

It is apparent that the \ac{FER} and the decoding failure rate are extremely similar, especially for higher \acp{SNR}. This leads to the hypothesis that, at least for higher \acp{SNR}, frame errors arise mainly from the non-convergence of the algorithm rather than from convergence to a wrong codeword.
This line of thought will be picked up in section \ref{sec:prox:Improved Implementation} in an attempt to improve the algorithm.

In summary, the \ac{BER} and \ac{FER} indicate dissimilar decoding performance. The decoding failure rate closely resembles the \ac{FER}, suggesting that the frame errors may largely be attributed to decoding failures. \todo{Maybe reference to the structure of the algorithm (1 part likelihood 1 part constraints)}

\subsection{Convergence Properties}
The previous observation, that frame errors arise mainly from the non-convergence of the algorithm rather than from convergence to a wrong codeword, raises the question of why the decoding process fails to converge so often. In figure \ref{fig:prox:convergence}, the iterative process is visualized. In order to consider all components of the vectors involved simultaneously, a BCH code with $n=7$ and $k=4$ is chosen. Each chart shows one component of the current estimate in a given iteration (alternating between $\boldsymbol{r}$ and $\boldsymbol{s}$), as well as the gradients of the negative log-likelihood and the code-constraint polynomial, which influence the next estimate.
\begin{figure}[H]
	\begin{minipage}[c]{0.25\textwidth}
		\centering
		\begin{tikzpicture}[scale = 0.35]
			\begin{axis}[
				grid=both,
				xlabel={Iterations},
				width=8cm, height=3cm, scale only axis,
				xtick={0, 50, ..., 200},
				xticklabels={0, 25, ..., 100},
				]
				\addplot [NavyBlue, mark=none, line width=1]
					table [col sep=comma, x=k, y=comb_r_s_1] {res/proximal/comp_bch_7_4_combined.csv};
				\addplot [ForestGreen, mark=none, line width=1]
					table [col sep=comma, x=k, y=grad_L_1] {res/proximal/comp_bch_7_4_combined.csv};
				\addplot [RedOrange, mark=none, line width=1]
					table [col sep=comma, x=k, y=grad_h_1] {res/proximal/comp_bch_7_4_combined.csv};
				\addlegendentry{est}
				\addlegendentry{$\left(\nabla L \right)_2$}
				\addlegendentry{$\left(\nabla h \right)_2 $}
			\end{axis}
		\end{tikzpicture}\\
		\begin{tikzpicture}[scale = 0.35]
			\begin{axis}[
				grid=both,
				xlabel={Iterations},
				width=8cm, height=3cm, scale only axis,
				xtick={0, 50, ..., 200},
				xticklabels={0, 25, ..., 100},
				]
				\addplot [NavyBlue, mark=none, line width=1]
					table [col sep=comma, x=k, y=comb_r_s_2] {res/proximal/comp_bch_7_4_combined.csv};
				\addplot [ForestGreen, mark=none, line width=1]
					table [col sep=comma, x=k, y=grad_L_2] {res/proximal/comp_bch_7_4_combined.csv};
				\addplot [RedOrange, mark=none, line width=1]
					table [col sep=comma, x=k, y=grad_h_2] {res/proximal/comp_bch_7_4_combined.csv};
				\addlegendentry{est}
				\addlegendentry{$\left(\nabla L \right)_3$}
				\addlegendentry{$\left(\nabla h \right)_3 $}
			\end{axis}
		\end{tikzpicture}\\
		\begin{tikzpicture}[scale = 0.35]
			\begin{axis}[
				grid=both,
				xlabel={Iterations},
				width=8cm, height=3cm, scale only axis,
				xtick={0, 50, ..., 200},
				xticklabels={0, 25, ..., 100},
				]
				\addplot [NavyBlue, mark=none, line width=1]
					table [col sep=comma, x=k, y=comb_r_s_3] {res/proximal/comp_bch_7_4_combined.csv};
				\addplot [ForestGreen, mark=none, line width=1]
					table [col sep=comma, x=k, y=grad_L_3] {res/proximal/comp_bch_7_4_combined.csv};
				\addplot [RedOrange, mark=none, line width=1]
					table [col sep=comma, x=k, y=grad_h_3] {res/proximal/comp_bch_7_4_combined.csv};
				\addlegendentry{est}
				\addlegendentry{$\left(\nabla L \right)_4$}
				\addlegendentry{$\left(\nabla h \right)_4 $}
			\end{axis}
		\end{tikzpicture}
	\end{minipage}%
	\begin{minipage}[c]{0.5\textwidth}
		\vspace*{-1cm}
		\centering
		\begin{tikzpicture}[scale = 0.85,
			spy using outlines={circle, magnification=6, connect spies}]
			\begin{axis}[
				grid=both,
xlabel={Iterations}, width=8cm, height=3cm, scale only axis, xtick={0, 50, ..., 200}, xticklabels={0, 25, ..., 100}, ] \addplot [NavyBlue, mark=none, line width=1] table [col sep=comma, x=k, y=comb_r_s_0] {res/proximal/comp_bch_7_4_combined.csv}; \addplot [ForestGreen, mark=none, line width=1] table [col sep=comma, x=k, y=grad_L_0] {res/proximal/comp_bch_7_4_combined.csv}; \addplot [RedOrange, mark=none, line width=1] table [col sep=comma, x=k, y=grad_h_0] {res/proximal/comp_bch_7_4_combined.csv}; \addlegendentry{est} \addlegendentry{$\left(\nabla L \right)_1$} \addlegendentry{$\left(\nabla h \right)_1 $} \coordinate (spypoint) at (axis cs:100,0.53); \coordinate (magnifyglass) at (axis cs:175,2); \end{axis} \spy [black, size=2cm] on (spypoint) in node[fill=white] at (magnifyglass); \end{tikzpicture} \end{minipage}% \begin{minipage}[c]{0.25\textwidth} \centering \begin{tikzpicture}[scale = 0.35] \begin{axis}[ grid=both, xlabel={Iterations}, width=8cm, height=3cm, scale only axis, xtick={0, 50, ..., 200}, xticklabels={0, 25, ..., 100}, ] \addplot [NavyBlue, mark=none, line width=1] table [col sep=comma, x=k, y=comb_r_s_4] {res/proximal/comp_bch_7_4_combined.csv}; \addplot [ForestGreen, mark=none, line width=1] table [col sep=comma, x=k, y=grad_L_4] {res/proximal/comp_bch_7_4_combined.csv}; \addplot [RedOrange, mark=none, line width=1] table [col sep=comma, x=k, y=grad_h_4] {res/proximal/comp_bch_7_4_combined.csv}; \addlegendentry{est} \addlegendentry{$\left(\nabla L \right)_5$} \addlegendentry{$\left(\nabla h \right)_5 $} \end{axis} \end{tikzpicture}\\ \begin{tikzpicture}[scale = 0.35] \begin{axis}[ grid=both, xlabel={Iterations}, width=8cm, height=3cm, scale only axis, xtick={0, 50, ..., 200}, xticklabels={0, 25, ..., 100}, ] \addplot [NavyBlue, mark=none, line width=1] table [col sep=comma, x=k, y=comb_r_s_5] {res/proximal/comp_bch_7_4_combined.csv}; \addplot [ForestGreen, mark=none, line width=1] table [col sep=comma, x=k, y=grad_L_5] {res/proximal/comp_bch_7_4_combined.csv}; \addplot [RedOrange, mark=none, line width=1] table [col sep=comma, x=k, y=grad_h_5] {res/proximal/comp_bch_7_4_combined.csv}; \addlegendentry{est} \addlegendentry{$\left(\nabla L \right)_6$} \addlegendentry{$\left(\nabla h \right)_6 $} \end{axis} \end{tikzpicture}\\ \begin{tikzpicture}[scale = 0.35] \begin{axis}[ grid=both, xlabel={Iterations}, width=8cm, height=3cm, scale only axis, xtick={0, 50, ..., 200}, xticklabels={0, 25, ..., 100}, ] \addplot [NavyBlue, mark=none, line width=1] table [col sep=comma, x=k, y=comb_r_s_6] {res/proximal/comp_bch_7_4_combined.csv}; \addplot [ForestGreen, mark=none, line width=1] table [col sep=comma, x=k, y=grad_L_6] {res/proximal/comp_bch_7_4_combined.csv}; \addplot [RedOrange, mark=none, line width=1] table [col sep=comma, x=k, y=grad_h_6] {res/proximal/comp_bch_7_4_combined.csv}; \addlegendentry{est} \addlegendentry{$\left(\nabla L \right)_7$} \addlegendentry{$\left(\nabla h \right)_7 $} \end{axis} \end{tikzpicture} \end{minipage} \caption{Internal variables of proximal decoder as a function of the number of iterations ($n=7$)\protect\footnotemark{}} \label{fig:prox:convergence} \end{figure}% % \footnotetext{A single decoding is shown, using the BCH$\left( 7,4 \right) $ code; $\gamma = 0.05, \omega = 0.05, E_b / N_0 = \SI{5}{dB}$ }% % \noindent It is evident that in all cases, past a certain number of iterations, the estimate starts to oscillate around a particular value. After a certain point, the two gradients stop further approaching the value zero. 
In particular, this leads to the code-constraint polynomial not being minimized. As such, the constraints are not satisfied and the estimate does not converge towards a valid codeword.

While figure \ref{fig:prox:convergence} shows only one instance of a decoding task, with no statistical significance, it is indicative of the general behaviour of the algorithm. This can be justified by looking at the gradients themselves. In figure \ref{fig:prox:gradients}, the gradients of the negative log-likelihood and the code-constraint polynomial for a repetition code with $n=2$ are shown. Evidently, walking along the gradients in an alternating fashion produces a net movement in a certain direction, as long as the two gradients share a common component. As soon as this common component is exhausted, they start pulling the estimate in opposing directions, leading to an oscillation as illustrated in figure \ref{fig:prox:convergence}. Consequently, this oscillation is an intrinsic property of the structure of the proximal decoding algorithm, in which the two parts of the objective function are minimized in an alternating manner by use of their gradients.%
%
\begin{figure}[H]
	\centering
	\begin{subfigure}[c]{0.5\textwidth}
		\centering
		\begin{tikzpicture}
			\begin{axis}[xmin = -1.25, xmax=1.25,
				ymin = -1.25, ymax=1.25,
				xlabel={$x_1$},
				ylabel={$x_2$},
				width=\textwidth,
				height=0.75\textwidth,
				grid=major,
				grid style={dotted},
				view={0}{90}]
				\addplot3[point meta=\thisrow{grad_norm},
					point meta min=1,
					point meta max=3,
					quiver={u=\thisrow{grad_0}, v=\thisrow{grad_1},
						scale arrows=.05,
						every arrow/.append style={%
							line width=.3+\pgfplotspointmetatransformed/1000,
							-{Latex[length=0pt 5,width=0pt 3]}
						},
					},
					quiver/colored = {mapped color},
					colormap/rocket,
					-stealth,
				]
				table[col sep=comma] {res/proximal/2d_grad_L.csv};
			\end{axis}
		\end{tikzpicture}
		\caption{$\nabla L \left(\boldsymbol{y} \mid \tilde{\boldsymbol{x}} \right) $ for a repetition code with $n=2$}
		\label{fig:prox:gradients:L}
	\end{subfigure}%
	\hfill%
	\begin{subfigure}[c]{0.5\textwidth}
		\centering
		\begin{tikzpicture}
			\begin{axis}[xmin = -1.25, xmax=1.25,
				ymin = -1.25, ymax=1.25,
				xlabel={$x_1$},
				ylabel={$x_2$},
				grid=major,
				grid style={dotted},
				width=\textwidth,
				height=0.75\textwidth,
				view={0}{90}]
				\addplot3[point meta=\thisrow{grad_norm},
					point meta min=1,
					point meta max=4,
					quiver={u=\thisrow{grad_0}, v=\thisrow{grad_1},
						scale arrows=.03,
						every arrow/.append style={%
							line width=.3+\pgfplotspointmetatransformed/1000,
							-{Latex[length=0pt 5,width=0pt 3]}
						},
					},
					quiver/colored = {mapped color},
					colormap/rocket,
					-stealth,
				]
				table[col sep=comma] {res/proximal/2d_grad_h.csv};
			\end{axis}
		\end{tikzpicture}
		\caption{$\nabla h \left( \tilde{\boldsymbol{x}} \right) $ for a repetition code with $n=2$}
		\label{fig:prox:gradients:h}
	\end{subfigure}%
	\caption{Gradients of the negative log-likelihood and the code-constraint polynomial}
	\label{fig:prox:gradients}
\end{figure}%
%
While the initial net movement generally points in the right direction owing to the gradient of the negative log-likelihood, the final oscillation may well take place in a region of space not corresponding to a valid codeword, leading to the aforementioned non-convergence of the algorithm. This also partly explains the difference in decoding performance between the \ac{BER} and the \ac{FER}, as it lowers the number of bit errors while still yielding an invalid codeword.
The higher the \ac{SNR}, the more likely the gradient of the negative log-likelihood is to point towards a valid codeword. The common component of the two gradients then pulls the estimate closer to a valid codeword before the oscillation takes place. This explains why the decoding performance is so much better for higher \acp{SNR}.

When considering codes with larger $n$, the behaviour generally stays the same, with some minor differences. In figure \ref{fig:prox:convergence_large_n}, the decoding process is visualized for one component of a code with $n=204$, for a single decoding. The two gradients still eventually oppose each other and the estimate still starts to oscillate, just as illustrated in figure \ref{fig:prox:convergence} on the basis of a code with $n=7$. However, in this case, the gradient of the code-constraint polynomial itself starts to oscillate, its average value being such that the effect of the gradient of the negative log-likelihood is counteracted.

Looking at figure \ref{fig:prox:gradients:h}, it also becomes apparent why the value of the parameter $\gamma$ has to be kept small, as mentioned in section \ref{sec:prox:Decoding Algorithm}. Local minima are introduced between the codewords, in the areas in which it is not immediately clear which codeword is the most likely one. Raising the value of $\gamma$ results in $h \left( \tilde{\boldsymbol{x}} \right)$ dominating the landscape of the objective function, thereby introducing these local minima into it.

In conclusion, as a general rule, the proximal decoding algorithm reaches an oscillatory state which it cannot escape as a consequence of its structure. In this state, the constraints may not be satisfied, leading to the algorithm returning an invalid codeword.
\begin{figure}[H]
	\centering
	\begin{tikzpicture}
		\begin{axis}[
			grid=both,
			xlabel={Iterations},
			width=0.6\textwidth, height=0.45\textwidth, scale only axis,
			xtick={0, 100, ..., 400},
			xticklabels={0, 50, ..., 200},
			]
			\addplot [NavyBlue, mark=none, line width=1]
				table [col sep=comma, x=k, y=comb_r_s_0] {res/proximal/extreme_components_20433484_combined.csv};
			\addplot [ForestGreen, mark=none, line width=1]
				table [col sep=comma, x=k, y=grad_L_0] {res/proximal/extreme_components_20433484_combined.csv};
			\addplot [RedOrange, mark=none, line width=1]
				table [col sep=comma, x=k, y=grad_h_0] {res/proximal/extreme_components_20433484_combined.csv};
			\addlegendentry{est}
			\addlegendentry{$\left(\nabla L\right)_1$}
			\addlegendentry{$\left(\nabla h\right)_1$}
		\end{axis}
	\end{tikzpicture}
	\caption{Internal variables of proximal decoder as a function of the number of iterations ($n=204$)}
	\label{fig:prox:convergence_large_n}
\end{figure}%
\subsection{Computational Performance}
\begin{itemize}
	\item Theoretical analysis
	\item Simulation results to substantiate theoretical analysis
	\item Conclusion: $\mathcal{O}\left( n \right)$ time complexity, implementation heavily optimizable
\end{itemize}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Improved Implementation}%
\label{sec:prox:Improved Implementation}
As mentioned earlier, frame errors seem to stem mainly from decoding failures. Coupled with the fact that the \ac{BER} indicates so much better performance than the \ac{FER}, this leads to the assumption that only a small number of components of the estimated vector may be responsible for an invalid result.
If it were possible to limit the number of possibly wrong components of the estimate to a small subset, an \ac{ML}-decoding step could be performed on a limited number of possible results (``ML-in-the-List'', as it will subsequently be called) to improve the decoding performance. This concept is pursued in this section.

First, a guideline has to be found with which to assess the probability that a given component of an estimate is wrong. One compelling observation is that the closer an estimate is to a valid codeword, the smaller the magnitude of the gradient of the code-constraint polynomial, as illustrated in figure \ref{fig:prox:gradients}. This gives rise to the notion that the magnitude of some property or behaviour of $\nabla h\left( \tilde{\boldsymbol{x}} \right) $ may be related to the confidence that a given bit is correct. And indeed, the magnitude of the oscillation of $\nabla h\left( \tilde{\boldsymbol{x}} \right)$ (introduced in a previous section) and the probability of a bit error are strongly correlated, a relationship depicted in figure \ref{fig:prox:correlation}.

TODO: Figure

\noindent The y-axis depicts whether there is a bit error and the x-axis the variance of $\nabla h\left( \tilde{\boldsymbol{x}} \right)$ past the iteration $k=100$. While this is not exactly the magnitude of the oscillation, it is proportional to it and easier to compute. The datapoints are taken from a single decoding operation \todo{Generate same figure with multiple decodings}.

Using this observation as a rule to determine the $N\in\mathbb{N}$ bits most likely to be in error, all variations of the estimate with those bits modified can be generated. An \ac{ML}-in-the-List step can then be performed in order to determine the most likely candidate. This process is outlined in figure \ref{fig:prox:improved_algorithm}.
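To make this procedure concrete, the following sketch shows one possible realization of the ML-in-the-List step, which is only invoked when proximal decoding has not returned a valid codeword. It assumes that the gradients $\nabla h\left( \boldsymbol{r} \right)$ of all iterations have been recorded during decoding; the function name, the choice of $N$, the fixed iteration from which the variance is computed and the fallback behaviour are illustrative assumptions rather than a description of the actual implementation.
\begin{lstlisting}[language=Python]
import itertools
import numpy as np

def ml_in_the_list(x_hat, y, H, grad_h_history, N=4, k_start=100):
    """x_hat: bipolar estimate returned by proximal decoding (shape (n,));
    y: received vector; H: parity-check matrix (0/1, shape (m, n));
    grad_h_history: recorded gradients of h, one row per iteration."""
    # The variance of each gradient component past iteration k_start serves as
    # a proxy for the magnitude of its oscillation (larger = less reliable bit).
    variance = np.var(grad_h_history[k_start:], axis=0)
    unreliable = np.argsort(variance)[-N:]

    best, best_metric = None, np.inf
    # Enumerate all sign patterns on the N least reliable positions.
    for signs in itertools.product([-1.0, 1.0], repeat=N):
        candidate = x_hat.astype(float).copy()
        candidate[unreliable] = signs
        c = ((1 - candidate) / 2).astype(int)      # bipolar -> binary mapping x = (-1)^c
        if np.any(H @ c % 2):                      # keep only valid codewords
            continue
        metric = np.sum((y - candidate) ** 2)      # ML metric for the AWGN channel
        if metric < best_metric:
            best, best_metric = candidate, metric
    return best if best is not None else x_hat     # fall back to the original estimate
\end{lstlisting}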
Figure \ref{fig:prox:improved_results} shows the gain that can be achieved. Again, three values of $\gamma$ are chosen, for which the \ac{BER}, \ac{FER} and decoding failure rate are plotted. The simulation results for the original proximal decoding algorithm are shown with solid lines and the results for the improved version are shown with dashed lines. The gain seems to depend on the value of $\gamma$ and becomes more pronounced for higher \ac{SNR} values. This is to be expected, since with higher \ac{SNR} values the number of bit errors decreases, making the correction of those errors in the ML-in-the-List step more likely. In figure \ref{fig:prox:improved_results_multiple} the decoding performance of proximal decoding and of the improved algorithm is compared for a number of different codes. Similar behaviour can be observed in all cases, with varying improvement over standard proximal decoding.

Interestingly, the time complexity of the improved algorithm hardly differs from that of proximal decoding. This is because the ML-in-the-List step is only performed when the proximal decoding algorithm produces an invalid result, which, in absolute terms, happens relatively infrequently. This is illustrated in figure \ref{fig:prox:time_complexity_comp}, where the average time needed to decode a single received frame is visualized for proximal decoding as well as for the improved algorithm.

In conclusion, the decoding performance of proximal decoding can be improved by appending an ML-in-the-List step when the algorithm does not produce a valid result. The gain is in some cases as high as $\SI{1}{dB}$ and can be achieved with a negligible computational performance penalty. The improvement is mainly noticeable for higher \ac{SNR} values and depends on the code as well as the chosen parameters.