Reworked proximal decoding up to and including the choice of parameters
This commit is contained in: parent 46ebd5aedc, commit 5c135e085e
@ -17,7 +17,8 @@ Proximal decoding was proposed by Wadayama et al. as a novel formulation of
optimization-based decoding \cite{proximal_paper}.
With this algorithm, minimization is performed using the proximal gradient
method.
In contrast to \ac{LP} decoding, which will be covered in chapter
\ref{chapter:lp_dec_using_admm}, the objective function is based on a
non-convex optimization formulation of the \ac{MAP} decoding problem.

In order to derive the objective function, the authors begin with the
@ -121,8 +122,9 @@ and the decoding problem is reformulated to%
.\end{align*}
%

For the solution of the approximate \ac{MAP} decoding problem using the
proximal gradient method, the two parts of equation
(\ref{eq:prox:objective_function}) are considered separately:
the minimization of the objective function occurs in an alternating
fashion, switching between the negative log-likelihood
$L\left( \boldsymbol{y} \mid \boldsymbol{x} \right) $ and the scaled
@ -140,10 +142,8 @@ descent:%
.\end{align}%
%
For the second step, minimizing the scaled code-constraint polynomial, the
\textit{proximal operator} of $\gamma h\left( \tilde{\boldsymbol{x}} \right) $
has to be computed.
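Here, the \textit{proximal operator} is meant in the standard sense, with the
step size $\gamma$ scaling the function:%
%
\begin{align*}
\operatorname{prox}_{\gamma h}\left( \boldsymbol{v} \right) =
\underset{\boldsymbol{x}}{\operatorname{arg\,min}} \left(
\gamma h\left( \boldsymbol{x} \right)
+ \frac{1}{2} \left\| \boldsymbol{x} - \boldsymbol{v} \right\|_2^2 \right)
.\end{align*}%
%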
It is then immediately approximated with gradient descent:%
%
\begin{align*}
@ -258,7 +258,7 @@ It was subsequently reimplemented in C++ using the Eigen%
linear algebra library to achieve higher performance.
The focus has been set on a fast implementation, sometimes at the expense of
memory usage, somewhat limiting the size of the codes the implementation can be
used with.
The evaluation of the simulation results has been wholly realized in Python.
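
The following is a minimal NumPy sketch of one possible form of the
alternating update described above (not the C++ implementation used for the
simulations); it assumes an AWGN channel, for which the gradient of the
negative log-likelihood is proportional to
$\tilde{\boldsymbol{x}} - \boldsymbol{y}$, treats the initialization and the
exact placement of the projection as schematic, and leaves the gradient of the
code-constraint polynomial as a placeholder \texttt{grad\_h}:
\begin{verbatim}
import numpy as np

def proximal_decode(y, grad_h, K=200, omega=0.05, gamma=0.05, eta=1.5):
    # y      : received vector (AWGN channel assumed)
    # grad_h : gradient of the code-constraint polynomial (placeholder)
    # omega  : step size of the negative log-likelihood step
    # gamma  : step size of the code-constraint step
    # eta    : clipping bound of the projection
    # (early stopping once all parity checks are satisfied is omitted here)
    x = np.array(y, dtype=float)        # schematic initialization
    for _ in range(K):
        # gradient step on the negative log-likelihood
        # (proportional to x - y for an AWGN channel)
        x = x - omega * (x - y)
        # gradient-descent approximation of the prox of gamma*h,
        # followed by clipping each component onto [-eta, eta]
        x = np.clip(x - gamma * grad_h(x), -eta, eta)
    return np.sign(x)                   # hard decision on the bipolar estimate
\end{verbatim}
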
The gradient of the code-constraint polynomial \cite[Sec. 2.3]{proximal_paper}
@ -309,8 +309,6 @@ matrix-vector multiplication.
This is beneficial, as the libraries employed for the implementation are
heavily optimized for such calculations (e.g., through vectorization of the
operations).
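
To illustrate how this computation reduces to elementwise and matrix
operations, the following sketch assumes a product-form code-constraint
polynomial of the kind used for proximal decoding,
$h\left( \tilde{\boldsymbol{x}} \right) =
\alpha \sum_{i} \left( \prod_{j \in N(i)} \tilde{x}_j - 1 \right)^2
+ \beta \sum_{j} \left( \tilde{x}_j^2 - 1 \right)^2$,
where $N(i)$ denotes the variables participating in check $i$; the exact
expression used in this work follows \cite[Sec. 2.3]{proximal_paper} and may
differ in details such as the weighting constants.
The sketch could serve as the \texttt{grad\_h} placeholder used earlier:
\begin{verbatim}
import numpy as np

def code_constraint_gradient(H, x, alpha=1.0, beta=1.0):
    # H : (m, n) parity-check matrix with entries in {0, 1}
    # x : current estimate, assumed to contain no exactly-zero entries
    # Dense operations are used for clarity; the actual implementation
    # would exploit the sparsity of H.
    H = np.asarray(H, dtype=float)
    x = np.asarray(x, dtype=float)
    # product over the variables participating in each check
    masked = np.where(H > 0, x[np.newaxis, :], 1.0)
    p = masked.prod(axis=1)                          # one product per check
    # leave-one-out products, written as p_i / x_k for variable k in check i
    loo = H * (p[:, np.newaxis] / x[np.newaxis, :])
    grad_checks = 2.0 * alpha * ((p - 1.0)[:, np.newaxis] * loo).sum(axis=0)
    grad_bipolar = 4.0 * beta * x * (x * x - 1.0)    # from (x_j^2 - 1)^2
    return grad_checks + grad_bipolar
\end{verbatim}
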
The projection $\prod_{\eta}\left( . \right)$ also proves straightforward to
compute, as it amounts to simply clipping each component of the vector onto
@ -332,15 +330,13 @@ The convergence properties are reviewed and related to the decoding
performance.
Finally, the computational performance is examined on a theoretical basis
as well as on the basis of the implementation completed in the context of this
thesis.

All simulation results presented hereafter are based on Monte Carlo
simulations.
The \ac{BER} and \ac{FER} curves in particular have been generated by
producing at least 100 frame errors for each data point, unless otherwise
stated.

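As an illustration of this procedure, a single data point can be estimated
along the following lines (a schematic sketch rather than the evaluation code
of this work; \texttt{encode}, \texttt{transmit\_awgn} and \texttt{decode}
stand for the respective parts of the simulation chain):
\begin{verbatim}
import numpy as np

def simulate_point(encode, transmit_awgn, decode, snr_db, k,
                   min_frame_errors=100):
    # Estimate FER and BER at one SNR by Monte Carlo simulation: frames are
    # simulated until at least `min_frame_errors` frame errors are observed.
    # `decode` is assumed to return its estimate in the same representation
    # that `encode` produces.
    rng = np.random.default_rng()
    frames = frame_errors = bit_errors = total_bits = 0
    while frame_errors < min_frame_errors:
        info = rng.integers(0, 2, size=k)           # random information bits
        codeword = encode(info)
        received = transmit_awgn(codeword, snr_db)  # AWGN channel
        estimate = decode(received)
        errors = np.count_nonzero(estimate != codeword)
        bit_errors += errors
        frame_errors += int(errors > 0)
        total_bits += codeword.size
        frames += 1
    return frame_errors / frames, bit_errors / total_bits
\end{verbatim}
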
\subsection{Choice of Parameters}
@ -418,9 +414,9 @@ while the newly generated ones are shown with dashed lines.
\noindent It is noticeable that for a moderately chosen value of $\gamma$
($\gamma = 0.05$) the decoding performance is better than for low
($\gamma = 0.01$) or high ($\gamma = 0.15$) values.
The question arises whether there is some optimal value maximizing the decoding
performance, especially since it seems to depend dramatically on $\gamma$.
To better understand how $\gamma$ and the decoding performance are
related, figure \ref{fig:prox:results} was recreated, but with a considerably
larger selection of values for $\gamma$.
In this new graph, shown in figure \ref{fig:prox:results_3d}, instead of
@ -431,11 +427,7 @@ The previously shown results are highlighted.
Evidently, while the decoding performance does depend on the value of
$\gamma$, there is no single value offering optimal performance, but
rather a certain interval in which it stays largely unchanged.

%
\begin{figure}[h]
\centering
@ -485,11 +477,15 @@ in each case.
\cite[\text{204.33.484}]{mackay_enc}; $\omega = 0.05, K=200, \eta=1.5$
}%
%
This indicates that while the choice of the parameter $\gamma$
significantly affects the decoding performance, there is little benefit
in undertaking an extensive search for an exact optimum.
Rather, a preliminary examination providing a rough window for $\gamma$ may
be sufficient.
When examining a number of different codes (figure
\ref{fig:prox:results_3d_multiple}), it is apparent that while the exact
landscape of the graph depends on the code, the general behaviour is the same
in each case.
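
A coarse preliminary sweep of this kind could be set up as follows (a
schematic sketch; \texttt{simulate\_ber} stands for a Monte Carlo simulation
as outlined above, and the grid and window criterion are only examples):
\begin{verbatim}
import numpy as np

def gamma_sweep(simulate_ber, snr_db, gammas=np.linspace(0.01, 0.15, 15)):
    # Coarse sweep over the step size gamma at a fixed SNR, locating a
    # workable window rather than an exact optimum.
    # simulate_ber(gamma, snr_db) is assumed to return the measured BER.
    ber = np.array([simulate_ber(g, snr_db) for g in gammas])
    close = gammas[ber <= 2.0 * ber.min()]   # values close to the best BER
    return close.min(), close.max()
\end{verbatim}
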
The parameter $\gamma$ describes the step-size for the optimization step
dealing with the code-constraint polynomial;
@ -497,10 +493,12 @@ the parameter $\omega$ describes the step-size for the step dealing with the
negative log-likelihood.
The relationship between $\omega$ and $\gamma$ is portrayed in figure
\ref{fig:prox:gamma_omega}.
The color of each cell indicates the \ac{BER} when the corresponding values
are chosen for the parameters.
The \ac{SNR} is kept constant at $\SI{4}{dB}$.
The \ac{BER} exhibits similar behaviour in its dependency on $\omega$ and
on $\gamma$: it is minimized when keeping the value within certain
bounds, without displaying a single clear optimum.
It is noteworthy that the decoder seems to achieve the best performance for
similar values of the two step sizes.
Again, this consideration applies to a multitude of different codes, as
@ -552,19 +550,21 @@ depicted in figure \ref{fig:prox:gamma_omega_multiple}.

To better understand how to determine the optimal value for the parameter $K$,
the average error is inspected.
This time $\gamma$ and $\omega$ are held constant at $0.05$ and the average
error is observed during each iteration of the decoding process, for a number
of different \acp{SNR}.
The plots have been generated by averaging the error over $\SI{500000}{}$
decodings.
As some decodings go on for more iterations than others, the number of values
which are averaged for each data point varies.
This explains the dip visible in all curves around $k=20$, since after
this point more and more correct decodings are completed,
leaving more and more faulty ones to be averaged.
Additionally, at this point the decline in the average error stagnates,
rendering an increase in $K$ counterproductive, as it only raises the average
timing requirements of the decoding process.
Another aspect to consider is that the higher the \ac{SNR}, the fewer
decodings are present at each iteration
to average, since a solution is found earlier.
This explains the decreasing smoothness of the lines as the \ac{SNR} rises.
Remarkably, the \ac{SNR} seems not to have any impact on the number of
@ -740,9 +740,6 @@ performance.
The decoding failure rate closely resembles the \ac{FER}, suggesting that
the frame errors may largely be attributed to decoding failures.

\subsection{Convergence Properties}
\label{subsec:prox:conv_properties}