Reworked proximal decoding up to and including the choice of parameters

Andreas Tsouchlos 2023-04-11 18:40:15 +02:00
parent 46ebd5aedc
commit 5c135e085e


@ -17,7 +17,8 @@ Proximal decoding was proposed by Wadayama et al. as a novel formulation of
optimization-based decoding \cite{proximal_paper}.
With this algorithm, minimization is performed using the proximal gradient
method.
In contrast to \ac{LP} decoding, which will be covered in chapter
\ref{chapter:lp_dec_using_admm}, the objective function is based on a
non-convex optimization formulation of the \ac{MAP} decoding problem.
In order to derive the objective function, the authors begin with the
@ -121,8 +122,9 @@ and the decoding problem is reformulated to%
.\end{align*}
%
For the solution of the approximate \ac{MAP} decoding problem, using the
proximal gradient method, the two parts of equation
(\ref{eq:prox:objective_function}) are considered separately:
the minimization of the objective function occurs in an alternating
fashion, switching between the negative log-likelihood
$L\left( \boldsymbol{y} \mid \boldsymbol{x} \right) $ and the scaled
@ -140,10 +142,8 @@ descent:%
.\end{align}%
%
For the second step, minimizing the scaled code-constraint polynomial, the
\textit{proximal operator} of $\gamma h\left( \tilde{\boldsymbol{x}} \right) $
has to be computed.
It is then immediately approximated with gradient-descent:%
%
\begin{align*}
@ -258,7 +258,7 @@ It was subsequently reimplemented in C++ using the Eigen%
linear algebra library to achieve higher performance.
The focus has been set on a fast implementation, sometimes at the expense of
memory usage, somewhat limiting the size of the codes the implementation can be
used with.
The evaluation of the simulation results has been wholly realized in Python.
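To make the structure of the iteration more tangible, a rough Python sketch is
given below.
It is not the implementation used in this work: it assumes an AWGN channel, so
that the gradient of the negative log-likelihood is proportional to
$\tilde{\boldsymbol{x}} - \boldsymbol{y}$, treats the gradient of the
code-constraint polynomial as a placeholder, assumes that the projection
$\prod_{\eta}\left( . \right)$ clips each component onto $[-\eta, \eta]$, and
omits the early termination once a valid codeword has been reached.
\begin{verbatim}
import numpy as np

# Rough sketch of the iteration structure (not the C++/Eigen implementation).
# Assumptions: AWGN channel, projection onto [-eta, eta] by clipping, and a
# placeholder grad_h for the gradient of the code-constraint polynomial.
def proximal_decode(y, grad_h, omega=0.05, gamma=0.05, eta=1.5, K=200):
    x = y.copy()                      # start from the channel output
    for _ in range(K):
        # Step 1: gradient step on the negative log-likelihood (step size omega).
        r = x - omega * (x - y)
        # Step 2: gradient-descent approximation of the proximal step of gamma*h.
        x = r - gamma * grad_h(r)
        # Projection: clip each component onto [-eta, eta].
        x = np.clip(x, -eta, eta)
        # (The actual decoder stops early once a valid codeword is reached.)
    return np.sign(x)                 # hard decision, purely illustrative
\end{verbatim}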
The gradient of the code-constraint polynomial \cite[Sec. 2.3]{proximal_paper}
@ -309,8 +309,6 @@ matrix-vector multiplication.
This is beneficial, as the libraries employed for the implementation are
heavily optimized for such calculations (e.g., through vectorization of the
operations).
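To illustrate this, the following sketch evaluates the gradient of an
illustrative code-constraint polynomial of the form
$h\left( \boldsymbol{x} \right) = \sum_{i} \left( 1 - \prod_{j \in N(i)} x_j \right)^2$
using only vectorized operations on a dense 0/1 adjacency matrix derived from
the parity-check matrix; the exact polynomial and scaling of
\cite{proximal_paper} may differ, but the computation is structured analogously.
\begin{verbatim}
import numpy as np

# Illustrative sketch (not the thesis implementation): gradient of an assumed
# code-constraint polynomial h(x) = sum_i (1 - prod_{j in N(i)} x_j)^2,
# written with vectorized matrix operations only.
# A[i, j] = 1 if variable j takes part in check i, else 0.
def code_constraint_gradient(A, x, eps=1e-12):
    masked = np.where(A == 1, x[np.newaxis, :], 1.0)   # shape (m, n)
    check_products = masked.prod(axis=1)               # prod over N(i), shape (m,)
    # Leave-one-out products prod_{k in N(i)\{j}} x_k via a guarded division
    # (adequate as long as the components of x stay away from zero).
    guarded = np.where(np.abs(masked) > eps, masked, eps)
    loo = check_products[:, np.newaxis] / guarded
    residual = 1.0 - check_products
    # dh/dx_j = -2 * sum_{i in M(j)} (1 - prod_i) * prod_{k in N(i)\{j}} x_k
    return -2.0 * ((A * loo).T @ residual)
\end{verbatim}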
The projection $\prod_{\eta}\left( . \right)$ also proves straightforward to
compute, as it amounts to simply clipping each component of the vector onto
@ -332,15 +330,13 @@ The convergence properties are reviewed and related to the decoding
performance.
Finally, the computational performance is examined on a theoretical basis
as well as on the basis of the implementation completed in the context of this
thesis.
All simulation results presented hereafter are based on Monte Carlo
simulations.
The \ac{BER} and \ac{FER} curves in particular have been generated by
producing at least 100 frame-errors for each data point, unless otherwise
stated.
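As an illustration of this stopping rule, a data point of a \ac{FER} curve
could be produced roughly as follows; the helper simulate_frame, standing in
for the simulation of a single frame, is purely hypothetical.
\begin{verbatim}
def estimate_fer(simulate_frame, snr_db, min_frame_errors=100):
    # Monte Carlo estimate of the frame-error rate at one SNR value.
    # simulate_frame(snr_db) is a hypothetical helper returning True on a
    # frame error; simulation continues until enough errors were observed.
    frames, errors = 0, 0
    while errors < min_frame_errors:
        frames += 1
        errors += int(simulate_frame(snr_db))
    return errors / frames
\end{verbatim}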
\subsection{Choice of Parameters}
@ -418,9 +414,9 @@ while the newly generated ones are shown with dashed lines.
\noindent It is noticeable that for a moderately chosen value of $\gamma$
($\gamma = 0.05$) the decoding performance is better than for low
($\gamma = 0.01$) or high ($\gamma = 0.15$) values.
The question arises whether there is some optimal value maximizing the decoding
performance, especially since it seems to dramatically depend on $\gamma$.
To better understand how they are
related, figure \ref{fig:prox:results} was recreated, but with a considerably
larger selection of values for $\gamma$.
In this new graph, shown in figure \ref{fig:prox:results_3d}, instead of
@ -431,11 +427,7 @@ The previously shown results are highlighted.
Evidently, while the decoding performance does depend on the value of
$\gamma$, there is no single value offering optimal performance, but
rather a certain interval in which it stays largely unchanged.
%
\begin{figure}[h]
\centering
@ -485,11 +477,15 @@ in each case.
\cite[\text{204.33.484}]{mackay_enc}; $\omega = 0.05, K=200, \eta=1.5$
}%
%
This indicates that while the choice of the parameter $\gamma$
significantly affects the decoding performance, there is not much benefit
attainable in undertaking an extensive search for an exact optimum.
Rather, a preliminary examination providing a rough window for $\gamma$ may
be sufficient.
When examining a number of different codes (figure
\ref{fig:prox:results_3d_multiple}), it is apparent that while the exact
landscape of the graph depends on the code, the general behaviour is the same
in each case.
The parameter $\gamma$ describes the step-size for the optimization step
dealing with the code-constraint polynomial;
@ -497,10 +493,12 @@ the parameter $\omega$ describes the step-size for the step dealing with the
negative log-likelihood.
The relationship between $\omega$ and $\gamma$ is portrayed in figure
\ref{fig:prox:gamma_omega}.
The color of each cell indicates the \ac{BER} when the corresponding values
are chosen for the parameters.
The \ac{SNR} is kept constant at $\SI{4}{dB}$.
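A grid of this kind could be produced along the following lines; the value
ranges and the helper estimate_ber, which stands in for a complete Monte
Carlo run at one parameter combination, are purely illustrative.
\begin{verbatim}
import numpy as np

def sweep_step_sizes(estimate_ber, snr_db=4.0):
    # Illustrative parameter sweep; estimate_ber(gamma, omega, snr_db) is a
    # hypothetical helper returning the simulated BER for one combination.
    gammas = np.linspace(0.01, 0.15, 15)
    omegas = np.linspace(0.01, 0.15, 15)
    grid = np.empty((gammas.size, omegas.size))
    for i, gamma in enumerate(gammas):
        for j, omega in enumerate(omegas):
            grid[i, j] = estimate_ber(gamma, omega, snr_db)
    return gammas, omegas, grid
\end{verbatim}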
The \ac{BER} exhibits similar behaviour in its dependence on $\omega$ and
on $\gamma$: it is minimized when the value is kept within certain
bounds, without displaying a single clear optimum.
It is noteworthy that the decoder seems to achieve the best performance for
similar values of the two step sizes.
Again, this consideration applies to a multitude of different codes, as
@ -552,19 +550,21 @@ depicted in figure \ref{fig:prox:gamma_omega_multiple}.
To better understand how to determine the optimal value for the parameter $K$,
the average error is inspected.
This time $\gamma$ and $\omega$ are held constant at $0.05$ and the average
error is observed during each iteration of the decoding process, for a number
of different \acp{SNR}.
The plots have been generated by averaging the error over $\SI{500000}{}$
decodings.
As some decodings go on for more iterations than others, the number of values
which are averaged for each data point varies.
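One way to carry out this averaging over trajectories of unequal length is
sketched below; the per-decoding error trajectories are padded with NaN so
that each iteration index is averaged only over the decodings still running.
\begin{verbatim}
import numpy as np

def average_error_per_iteration(trajectories, K=200):
    # trajectories: one 1-D array of per-iteration errors per decoding,
    # each of length <= K (decodings that finish early are shorter).
    padded = np.full((len(trajectories), K), np.nan)
    for row, t in enumerate(trajectories):
        padded[row, :len(t)] = t
    counts = np.sum(~np.isnan(padded), axis=0)  # samples available at each k
    # Iterations that no decoding reaches remain NaN in the returned average.
    return np.nanmean(padded, axis=0), counts
\end{verbatim}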
This explains the dip visible in all curves around $k=20$, since after
this point more and more correct decodings are completed,
leaving more and more faulty ones to be averaged.
Additionally, at this point the decline in the average error stagnates,
rendering an increase in $K$ counterproductive as it only raises the average
timing requirements of the decoding process.
Another aspect to consider is that the higher the \ac{SNR}, the fewer
decodings are present at each iteration
to average, since a solution is found earlier.
This explains the decreasing smoothness of the lines as the \ac{SNR} rises.
Remarkably, the \ac{SNR} seems to not have any impact on the number of
@ -740,9 +740,6 @@ performance.
The decoding failure rate closely resembles the \ac{FER}, suggesting that
the frame errors may largely be attributed to decoding failures.
\subsection{Convergence Properties}
\label{subsec:prox:conv_properties}