diff --git a/latex/thesis/chapters/proximal_decoding.tex b/latex/thesis/chapters/proximal_decoding.tex
index 879a285..edc79a9 100644
--- a/latex/thesis/chapters/proximal_decoding.tex
+++ b/latex/thesis/chapters/proximal_decoding.tex
@@ -17,7 +17,8 @@ Proximal decoding was proposed by Wadayama et al.
 as a novel formulation of optimization-based decoding \cite{proximal_paper}.
 With this algorithm, minimization is performed using the proximal gradient
 method.
-In contrast to \ac{LP} decoding, the objective function is based on a
+In contrast to \ac{LP} decoding, which will be covered in chapter
+\ref{chapter:lp_dec_using_admm}, the objective function is based on a
 non-convex optimization formulation of the \ac{MAP} decoding problem.
 
 In order to derive the objective function, the authors begin with the
@@ -121,8 +122,9 @@ and the decoding problem is reformulated to%
 .\end{align*}
 %
-For the solution of the approximate \ac{MAP} decoding problem, the two parts
-of equation (\ref{eq:prox:objective_function}) are considered separately:
+For the solution of the approximate \ac{MAP} decoding problem, using the
+proximal gradient method, the two parts of equation
+(\ref{eq:prox:objective_function}) are considered separately:
 the minimization of the objective function occurs in an alternating fashion,
 switching between the negative log-likelihood
 $L\left( \boldsymbol{y} \mid \boldsymbol{x} \right) $ and the scaled
@@ -140,10 +142,8 @@ descent:%
 .\end{align}%
 %
 For the second step, minimizing the scaled code-constraint polynomial, the
-proximal gradient method is used \todo{The proximal gradient method is not
-just used for the second step. It is the name for the alternating iterative process}
-and the \textit{proximal operator} of
-$\gamma h\left( \tilde{\boldsymbol{x}} \right) $ has to be computed.
+\textit{proximal operator} of $\gamma h\left( \tilde{\boldsymbol{x}} \right) $
+has to be computed.
 It is then immediately approximated with gradient-descent:%
 %
 \begin{align*}
@@ -258,7 +258,7 @@ It was subsequently reimplemented in C++ using the Eigen%
 linear algebra library to achieve higher performance.
 The focus has been set on a fast implementation, sometimes at the expense of
 memory usage, somewhat limiting the size of the codes the implemenation can be
-used with \todo{Is this sentence appropriate for a bachelor's thesis?}.
+used with.
 The evaluation of the simulation results has been wholly realized in Python.
 
 The gradient of the code-constraint polynomial \cite[Sec. 2.3]{proximal_paper}
@@ -309,8 +309,6 @@ matrix-vector multiplication.
 This is beneficial, as the libraries employed for the implementation are
 heavily optimized for such calculations (e.g., through vectorization of the
 operations).
-\todo{Note about how the equation with which the gradient is calculated is
-itself similar to a message-passing rule}
 
 The projection $\prod_{\eta}\left( . \right)$ also proves straightforward to
 compute, as it amounts to simply clipping each component of the vector onto
@@ -332,15 +330,13 @@ The convergence properties are reviewed and related to the decoding
 performance.
 Finally, the computational performance is examined on a theoretical basis as
 well as on the basis of the implementation completed in the context of this
-work.
+thesis.
 
 All simulation results presented hereafter are based on Monte Carlo
 simulations.
 The \ac{BER} and \ac{FER} curves in particular have been generated by
 producing at least 100 frame-errors for each data point, unless otherwise
 stated.
-\todo{Mention number of datapoints from which each graph was created for
-non ber and fer curves}
 
 \subsection{Choice of Parameters}
 
@@ -418,9 +414,9 @@ while the newly generated ones are shown with dashed lines.
 \noindent It is noticeable that for a moderately chosen value of $\gamma$
 ($\gamma = 0.05$) the decoding performance is better than for low
 ($\gamma = 0.01$) or high ($\gamma = 0.15$) values.
-The question arises if there is some optimal value maximazing the decoding
+The question arises whether there is some optimal value maximizing the decoding
 performance, especially since it seems to dramatically depend on $\gamma$.
-To better understand how $\gamma$ and the decoding performance are
+To better understand how they are
 related, figure \ref{fig:prox:results} was recreated, but with a considerably
 larger selection of values for $\gamma$.
 In this new graph, shown in figure \ref{fig:prox:results_3d}, instead of
@@ -431,11 +427,7 @@ The previously shown results are highlighted.
 Evidently, while the decoding performance does depend on the value of
 $\gamma$, there is no single optimal value offering optimal performance, but
 rather a certain interval in which it stays largely unchanged.
-When examining a number of different codes (figure
-\ref{fig:prox:results_3d_multiple}), it is apparent that while the exact
-landscape of the graph depends on the code, the general behaviour is the same
-in each case.
-
+%
 
 \begin{figure}[h]
 \centering
@@ -485,11 +477,15 @@ in each case.
 \cite[\text{204.33.484}]{mackay_enc}; $\omega = 0.05, K=200, \eta=1.5$
 }%
 %
-\noindent This indicates that while the choice of the parameter $\gamma$
+This indicates that while the choice of the parameter $\gamma$
 significantly affects the decoding performance, there is not much benefit
 attainable in undertaking an extensive search for an exact optimum.
 Rather, a preliminary examination providing a rough window for $\gamma$ may be
 sufficient.
+When examining a number of different codes (figure
+\ref{fig:prox:results_3d_multiple}), it is apparent that while the exact
+landscape of the graph depends on the code, the general behaviour is the same
+in each case.
 
 The parameter $\gamma$ describes the step-size for the optimization step
 dealing with the code-constraint polynomial;
@@ -497,10 +493,12 @@ the parameter $\omega$ describes the step-size for the step dealing with
 the negative-log likelihood.
 The relationship between $\omega$ and $\gamma$ is portrayed in figure
 \ref{fig:prox:gamma_omega}.
+The color of each cell indicates the \ac{BER} when the corresponding values
+are chosen for the parameters.
 The \ac{SNR} is kept constant at $\SI{4}{dB}$.
-Similar behaviour to $\gamma$ is exhibited: the \ac{BER} is minimized when
-keeping the value within certain bounds, without displaying a clear
-optimum.
+The \ac{BER} exhibits similar behaviour in its dependency on $\omega$ and
+on $\gamma$: it is minimized when keeping the values within certain
+bounds, without displaying a single clear optimum.
 It is noteworthy that the decoder seems to achieve the best performance for
 similar values of the two step sizes.
 Again, this consideration applies to a multitude of different codes, as
@@ -552,19 +550,21 @@ depicted in figure \ref{fig:prox:gamma_omega_multiple}.
 
 To better understand how to determine the optimal value for the parameter
 $K$, the average error is inspected.
-This time $\gamma$ and $\omega$ are held constant and the average error is
-observed during each iteration of the decoding process for a number of
-different \acp{SNR}.
-The plots have been generated by averaging the error over $\SI{500000}{}$ decodings.
+This time $\gamma$ and $\omega$ are held constant at $0.05$ and the average
+error is observed during each iteration of the decoding process, for a number
+of different \acp{SNR}.
+The plots have been generated by averaging the error over $\SI{500000}{}$
+decodings.
 As some decodings go one for more iterations than others, the number of
 values which are averaged for each datapoints vary.
 This explains the dip visible in all curves around $k=20$, since after
-this point more and more correct decodings stop iterating,
+this point more and more correct decodings are completed,
 leaving more and more faulty ones to be averaged.
-A this point the decline in the average error stagnates, rendering an
-increase in $K$ counterproductive as it only raises the average timing
-requirements of the decoding process.
-The higher the \ac{SNR}, the fewer decodings are present at each iteration
+Additionally, at this point the decline in the average error stagnates,
+rendering an increase in $K$ counterproductive as it only raises the average
+timing requirements of the decoding process.
+Another aspect to consider is that the higher the \ac{SNR}, the fewer
+decodings are present at each iteration
 to average, since a solution is found earlier.
 This explains the decreasing smootheness of the lines as the \ac{SNR} rises.
 Remarkably, the \ac{SNR} seems to not have any impact on the number of
@@ -740,9 +740,6 @@ performance.
 The decoding failure rate closely resembles the \ac{FER}, suggesting that the
 frame errors may largely be attributed to decoding failures.
 
-\todo{Maybe reference to the structure of the algorithm (1 part likelihood
-1 part constraints)}
-
 \subsection{Convergence Properties}
 \label{subsec:prox:conv_properties}
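
Editor's note (illustration, not part of the patch): the Python sketch below mirrors the decoding loop the chapter describes, alternating a gradient step on the negative log-likelihood (step size omega) with a gradient step approximating the proximal operator of the scaled code-constraint polynomial (step size gamma), then clipping onto [-eta, eta] and stopping early once a codeword is found. The function names, the explicit form of h(x), the AWGN gradient, and the loop-based evaluation of its gradient are illustrative assumptions, not the thesis' C++/Eigen implementation (which uses matrix-vector products); the default parameters echo the values quoted in the patch (omega = 0.05, gamma = 0.05, eta = 1.5, K = 200).

import numpy as np


def grad_h(x, H, alpha=1.0, beta=1.0):
    # Gradient of an ASSUMED code-constraint polynomial
    #   h(x) = alpha * sum_i (x_i^2 - 1)^2
    #        + beta  * sum_j (prod_{i in N(j)} x_i - 1)^2,
    # where N(j) are the bit positions checked by row j of H.
    # The thesis expresses this gradient via matrix-vector products; the
    # explicit loops here are only meant to be easy to read.
    g = 4.0 * alpha * (x ** 2 - 1.0) * x
    for row in H:
        idx = np.flatnonzero(row)
        p = np.prod(x[idx])
        for i in idx:
            # d/dx_i (p - 1)^2 = 2 (p - 1) * prod_{l != i} x_l
            g[i] += 2.0 * beta * (p - 1.0) * np.prod(x[idx[idx != i]])
    return g


def proximal_decode(y, H, omega=0.05, gamma=0.05, eta=1.5, K=200):
    # y: received vector (AWGN channel, bipolar mapping 0 -> +1, 1 -> -1)
    # H: binary parity-check matrix (m x n); omega, gamma: step sizes;
    # eta: clipping bound of the projection; K: maximum number of iterations.
    x = np.zeros(len(y))
    for _ in range(K):
        # Step 1: gradient step on the negative log-likelihood; for an AWGN
        # channel this gradient is proportional to (x - y), with the noise
        # variance absorbed into omega (assumption).
        x = x - omega * (x - y)
        # Step 2: gradient step approximating the proximal operator of
        # gamma * h(x).
        x = x - gamma * grad_h(x, H)
        # Projection: clip every component onto [-eta, eta].
        x = np.clip(x, -eta, eta)
        # Hard decision; stop early once all parity checks are satisfied.
        c = (x < 0).astype(int)
        if not np.any(H @ c % 2):
            break
    return c

The early stop is what makes higher-SNR decodings finish sooner, which is why fewer decodings remain to be averaged at late iterations in the average-error plots discussed above.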