Reworked proximal decoding up to and including the choice of parameters
parent 46ebd5aedc, commit 5c135e085e
@@ -17,7 +17,8 @@ Proximal decoding was proposed by Wadayama et al. as a novel formulation of
optimization-based decoding \cite{proximal_paper}.
With this algorithm, minimization is performed using the proximal gradient
method.
In contrast to \ac{LP} decoding, which will be covered in chapter
\ref{chapter:lp_dec_using_admm}, the objective function is based on a
non-convex optimization formulation of the \ac{MAP} decoding problem.

In order to derive the objective function, the authors begin with the
@@ -121,8 +122,9 @@ and the decoding problem is reformulated to%
.\end{align*}
%

For the solution of the approximate \ac{MAP} decoding problem, using the
proximal gradient method, the two parts of equation
(\ref{eq:prox:objective_function}) are considered separately:
the minimization of the objective function occurs in an alternating
fashion, switching between the negative log-likelihood
$L\left( \boldsymbol{y} \mid \boldsymbol{x} \right) $ and the scaled
@@ -140,10 +142,8 @@ descent:%
.\end{align}%
%
For the second step, minimizing the scaled code-constraint polynomial, the
\textit{proximal operator} of $\gamma h\left( \tilde{\boldsymbol{x}} \right) $
has to be computed.
It is then immediately approximated with gradient-descent:%
%
\begin{align*}
@@ -258,7 +258,7 @@ It was subsequently reimplemented in C++ using the Eigen%
linear algebra library to achieve higher performance.
The focus has been set on a fast implementation, sometimes at the expense of
memory usage, somewhat limiting the size of the codes the implementation can be
used with.
The evaluation of the simulation results has been wholly realized in Python.
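As an illustration of the overall structure of the decoder, a minimal Python
sketch of the alternating iteration might look as follows; the single gradient
step used to approximate the proximal operator, the projection onto
$[-\eta, \eta]$, and the helpers \texttt{grad\_nll} and \texttt{grad\_h} are
assumptions made for illustration, not details taken from the thesis code or
from \cite{proximal_paper}.
\begin{verbatim}
import numpy as np

def proximal_decode(y, grad_nll, grad_h,
                    omega=0.05, gamma=0.05, eta=1.5, K=200):
    # grad_nll(x): gradient of the negative log-likelihood L(y | x)
    # grad_h(x):   gradient of the code-constraint polynomial h(x)
    x = np.zeros_like(y)
    for _ in range(K):
        # Step 1: gradient step on the negative log-likelihood
        # (step size omega).
        x = x - omega * grad_nll(x)
        # Step 2: approximate the proximal operator of gamma * h
        # by a single gradient step (step size gamma).
        x = x - gamma * grad_h(x)
        # Clip each component onto [-eta, eta]
        # (assumed projection interval).
        x = np.clip(x, -eta, eta)
        # A full decoder would also check a stopping criterion here
        # and terminate early once a valid codeword is found.
    return x
\end{verbatim}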

The gradient of the code-constraint polynomial \cite[Sec. 2.3]{proximal_paper}
@@ -309,8 +309,6 @@ matrix-vector multiplication.
This is beneficial, as the libraries employed for the implementation are
heavily optimized for such calculations (e.g., through vectorization of the
operations).
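As a rough sketch of how this looks in code (not the thesis implementation),
the accumulation of per-check terms onto the variable nodes is a single
matrix-vector product, and the projection discussed in the next paragraph
reduces to a componentwise clip; the per-check quantity \texttt{v} is a
placeholder, since its exact form follows the gradient formula of
\cite[Sec. 2.3]{proximal_paper}.
\begin{verbatim}
import numpy as np

def accumulate_check_terms(H, v):
    # H: (m x n) parity-check matrix as a 0/1 array (sparse in practice),
    # v: a per-check quantity derived from the gradient formula of h(.).
    # Summing v over the checks attached to each variable node is a
    # single matrix-vector product, which BLAS/Eigen vectorize well.
    return H.T @ v

def project_eta(x, eta):
    # Clip each component onto [-eta, eta] (assumed interval).
    return np.clip(x, -eta, eta)
\end{verbatim}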

The projection $\prod_{\eta}\left( . \right)$ also proves straightforward to
compute, as it amounts to simply clipping each component of the vector onto
@@ -332,15 +330,13 @@ The convergence properties are reviewed and related to the decoding
performance.
Finally, the computational performance is examined on a theoretical basis
as well as on the basis of the implementation completed in the context of this
thesis.

All simulation results presented hereafter are based on Monte Carlo
simulations.
The \ac{BER} and \ac{FER} curves in particular have been generated by
producing at least 100 frame-errors for each data point, unless otherwise
stated.
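Such a data point can be obtained with a loop of the following form (a sketch
only; \texttt{transmit\_frame} and \texttt{decode} are hypothetical
placeholders for the channel model and the decoder under test):
\begin{verbatim}
import numpy as np

def simulate_point(transmit_frame, decode, n_bits, min_frame_errors=100):
    # Monte Carlo estimate of BER and FER for a single data point:
    # keep simulating frames until at least `min_frame_errors`
    # frame errors have been observed.
    frames = frame_errors = bit_errors = 0
    while frame_errors < min_frame_errors:
        tx_bits, rx = transmit_frame()   # e.g. BPSK over AWGN
        est_bits = decode(rx)
        errors = np.count_nonzero(est_bits != tx_bits)
        bit_errors += errors
        frame_errors += (errors > 0)
        frames += 1
    return bit_errors / (frames * n_bits), frame_errors / frames
\end{verbatim}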


\subsection{Choice of Parameters}
@@ -418,9 +414,9 @@ while the newly generated ones are shown with dashed lines.
\noindent It is noticeable that for a moderately chosen value of $\gamma$
($\gamma = 0.05$) the decoding performance is better than for low
($\gamma = 0.01$) or high ($\gamma = 0.15$) values.
The question arises whether there is some optimal value maximizing the decoding
performance, especially since it seems to depend dramatically on $\gamma$.
To better understand how they are
related, figure \ref{fig:prox:results} was recreated, but with a considerably
larger selection of values for $\gamma$.
In this new graph, shown in figure \ref{fig:prox:results_3d}, instead of
@@ -431,11 +427,7 @@ The previously shown results are highlighted.
Evidently, while the decoding performance does depend on the value of
$\gamma$, there is no single value offering optimal performance, but
rather a certain interval in which it stays largely unchanged.
%

\begin{figure}[h]
	\centering
@@ -485,11 +477,15 @@ in each case.
\cite[\text{204.33.484}]{mackay_enc}; $\omega = 0.05, K=200, \eta=1.5$
}%
%
This indicates that while the choice of the parameter $\gamma$
significantly affects the decoding performance, there is little benefit
in undertaking an extensive search for an exact optimum.
Rather, a preliminary examination providing a rough window for $\gamma$ may
be sufficient.
When examining a number of different codes (figure
\ref{fig:prox:results_3d_multiple}), it is apparent that while the exact
landscape of the graph depends on the code, the general behaviour is the same
in each case.

The parameter $\gamma$ describes the step-size for the optimization step
dealing with the code-constraint polynomial;
@@ -497,10 +493,12 @@ the parameter $\omega$ describes the step-size for the step dealing with the
negative log-likelihood.
The relationship between $\omega$ and $\gamma$ is portrayed in figure
\ref{fig:prox:gamma_omega}.
The color of each cell indicates the \ac{BER} when the corresponding values
are chosen for the parameters.
The \ac{SNR} is kept constant at $\SI{4}{dB}$.
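The underlying data can be produced with a simple grid sweep, sketched below;
the grid boundaries and the helper \texttt{estimate\_ber} are assumptions made
for illustration.
\begin{verbatim}
import numpy as np

def sweep_gamma_omega(estimate_ber, snr_db=4.0, gammas=None, omegas=None):
    # Estimate the BER for every (gamma, omega) pair at a fixed SNR;
    # the resulting matrix can be rendered as a heat map, one colored
    # cell per parameter pair.
    gammas = np.linspace(0.01, 0.15, 15) if gammas is None else gammas
    omegas = np.linspace(0.01, 0.15, 15) if omegas is None else omegas
    ber = np.empty((len(gammas), len(omegas)))
    for i, g in enumerate(gammas):
        for j, w in enumerate(omegas):
            ber[i, j] = estimate_ber(gamma=g, omega=w, snr_db=snr_db)
    return ber
\end{verbatim}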
The \ac{BER} exhibits similar behaviour in its dependency on $\omega$ and
on $\gamma$: it is minimized when both values are kept within certain
bounds, without displaying a single clear optimum.
It is noteworthy that the decoder seems to achieve the best performance for
similar values of the two step sizes.
Again, this consideration applies to a multitude of different codes, as
@@ -552,19 +550,21 @@ depicted in figure \ref{fig:prox:gamma_omega_multiple}.

To better understand how to determine the optimal value for the parameter $K$,
the average error is inspected.
This time $\gamma$ and $\omega$ are held constant at $0.05$ and the average
error is observed during each iteration of the decoding process, for a number
of different \acp{SNR}.
The plots have been generated by averaging the error over $\SI{500000}{}$
decodings.
As some decodings go on for more iterations than others, the number of values
which are averaged for each datapoint varies.
This explains the dip visible in all curves around $k=20$, since after
this point more and more correct decodings are completed,
leaving more and more faulty ones to be averaged.
Additionally, at this point the decline in the average error stagnates,
rendering an increase in $K$ counterproductive as it only raises the average
timing requirements of the decoding process.
Another aspect to consider is that the higher the \ac{SNR}, the fewer
decodings are present at each iteration
to average, since a solution is found earlier.
This explains the decreasing smoothness of the lines as the \ac{SNR} rises.
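The averaging itself can be sketched as follows (the data layout is an
assumption): every decoding contributes its error values only up to the
iteration at which it terminated, so later data points are averaged over
fewer decodings.
\begin{verbatim}
import numpy as np

def average_error_per_iteration(error_trajectories, K=200):
    # `error_trajectories`: one 1-D array per decoding, holding the
    # error at every iteration until that decoding terminated
    # (lengths therefore differ between decodings).
    sums = np.zeros(K)
    counts = np.zeros(K)
    for traj in error_trajectories:
        n = min(len(traj), K)
        sums[:n] += traj[:n]
        counts[:n] += 1
    # Data point k is averaged only over the decodings that were
    # still running at iteration k.
    return sums / np.maximum(counts, 1)
\end{verbatim}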
Remarkably, the \ac{SNR} seems to have no impact on the number of
@@ -740,9 +740,6 @@ performance.
The decoding failure rate closely resembles the \ac{FER}, suggesting that
the frame errors may largely be attributed to decoding failures.


\subsection{Convergence Properties}
\label{subsec:prox:conv_properties}