diff --git a/latex/thesis/chapters/proximal_decoding.tex b/latex/thesis/chapters/proximal_decoding.tex
index 90c522e..7002f20 100644
--- a/latex/thesis/chapters/proximal_decoding.tex
+++ b/latex/thesis/chapters/proximal_decoding.tex
@@ -263,7 +263,8 @@ It was subsequently reimplemented in C++ using the Eigen%
 \footnote{\url{https://eigen.tuxfamily.org}}
 linear algebra library to achieve higher performance.
 The focus has been set on a fast implementation, sometimes at the expense of
-memory usage.
+memory usage, somewhat limiting the size of the codes the implementation can
+be used with \todo{Is this appropriate for a bachelor's thesis?}.
 The evaluation of the simulation results has been wholly realized in Python.
 
 The gradient of the code-constraint polynomial \cite[Sec. 2.3]{proximal_paper}
@@ -859,8 +860,7 @@ the frame errors may largely be attributed to decoding failures.
 The previous observation, that the \ac{FER} arises mainly due to the
 non-convergence of the algorithm instead of convergence to the wrong codeword,
 raises the question why the decoding process does not converge so often.
-In figure \ref{fig:prox:convergence}, the iterative process is visualized
-for each iteration.
+In figure \ref{fig:prox:convergence}, the iterative process is visualized.
 In order to be able to simultaneously consider all components of the vectors
 being dealt with, a BCH code with $n=7$ and $k=4$ is chosen.
 Each chart shows one component of the current estimates during a given
@@ -1076,7 +1076,8 @@ As such, the constraints are not being satisfied and the estimate is not
 converging towards a valid codeword.
 
 While figure \ref{fig:prox:convergence} shows only one instance of a decoding
-task, it is indicative of the general behaviour of the algorithm.
+task, and thus carries no statistical significance, it is indicative of the
+general behaviour of the algorithm.
 This can be justified by looking at the gradients themselves.
 In figure \ref{fig:prox:gradients} the gradients of the negative
 log-likelihood and the code-constraint polynomial for a repetition code with
@@ -1089,8 +1090,8 @@ estimate in opposing directions, leading to an oscillation as illustrated in
 figure \ref{fig:prox:convergence}.
 Consequently, this oscillation is an intrinsic property of the structure of
 the proximal decoding algorithm, where the two parts of the objective function
-are minimized in an alternating manner using their gradients.
-
+are minimized in an alternating manner using their gradients.%
+%
 \begin{figure}[H]
 	\centering
@@ -1127,6 +1128,7 @@ are minimized in an alternating manner using their gradients.
 		\caption{$\nabla L \left(\boldsymbol{y} \mid \tilde{\boldsymbol{x}}
 		\right) $ for a repetition code with $n=2$}
+		\label{fig:prox:gradients:L}
 	\end{subfigure}%
 	\hfill%
 	\begin{subfigure}[c]{0.5\textwidth}
@@ -1161,10 +1163,14 @@ are minimized in an alternating manner using their gradients.
 		\caption{$\nabla h \left( \tilde{\boldsymbol{x}} \right) $ for a
 		repetition code with $n=2$}
+		\label{fig:prox:gradients:h}
 	\end{subfigure}%
-\end{figure}
-
+	\caption{Gradients of the negative log-likelihood and the code-constraint
+	polynomial}
+	\label{fig:prox:gradients}
+\end{figure}%
+%
 While the initial net movement is generally directed in the right direction
 owing to the gradient of the negative log-likelihood, the final oscillation
 may well take place in a segment of space not corresponding to a valid
@@ -1173,17 +1179,34 @@ This also partly explains the difference in decoding performance when looking
 at the \ac{BER} and \ac{FER}, as it would lower the amount of bit errors while
 still yielding an invalid codeword.
 
+The higher the \ac{SNR}, the more likely the gradient of the negative
+log-likelihood is to point towards a valid codeword.
+The common component of the two gradients then pulls the estimate closer to
+a valid codeword before the oscillation sets in.
+This explains why the decoding performance is so much better for higher
+\acp{SNR}.
+
 When considering codes with larger $n$, the behaviour generally stays the
 same, with some minor differences.
 In figure \ref{fig:prox:convergence_large_n} the decoding process is
 visualized for one component of a code with $n=204$, for a single decoding.
-The two gradients still start to fight each other and the estimate still
-starts to oscillate, the same as illustrated on the basis of figure
-\ref{fig:prox:convergence} for a code with $n=7$.
+The two gradients still eventually oppose each other and the estimate still
+starts to oscillate, just as illustrated in figure
+\ref{fig:prox:convergence} for a code with $n=7$.
 However, in this case, the gradient of the code-constraint polynomial itself
 starts to oscillate, its average value being such that the effect of the
 gradient of the negative log-likelihood is counteracted.
 
+Looking at figure \ref{fig:prox:gradients:h}, it also becomes apparent why the
+value of the parameter $\gamma$ has to be kept small, as mentioned in section
+\ref{sec:prox:Decoding Algorithm}.
+Raising the value of $\gamma$ results in
+$h \left( \tilde{\boldsymbol{x}} \right)$ dominating the landscape of the
+objective function, thereby introducing local minima between the codewords,
+in the areas in which it is not immediately clear which codeword is the most
+likely one.
+
 In conclusion, as a general rule, the proximal decoding algorithm reaches an
 oscillatory state which it cannot escape as a consequence of its structure.
 In this state, the constraints may not be satisfied, leading to the algorithm
@@ -1237,8 +1260,8 @@ returning an invalid codeword.
 \label{sec:prox:Improved Implementation}
 
 As mentioned earlier, frame errors seem to mainly stem from decoding failures.
-This, coupled with the fact that the \ac{BER} indicates so much better
-performance than the \ac{FER}, leads to the assumption that only a small
+Coupled with the fact that the \ac{BER} indicates so much better
+performance than the \ac{FER}, this leads to the assumption that only a small
 number of components of the estimated vector may be responsible for an
 invalid result.
 If it was possible to limit the number of possibly wrong components of the
@@ -1247,13 +1270,66 @@ a limited number of possible results (``ML-in-the-List'' as it will
 subsequently be called) to improve the decoding performance.
 This concept is pursued in this section.
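+The \ac{ML}-in-the-List decision itself amounts to choosing, from a small list
+of candidate words, the one that best explains the received vector.
+The following is a minimal sketch of this decision rule, not the
+implementation used in this thesis; it assumes BPSK with the mapping
+$0 \mapsto +1$, $1 \mapsto -1$ over an AWGN channel, under which the \ac{ML}
+decision reduces to a minimum-Euclidean-distance search.
+\begin{verbatim}
+import numpy as np
+
+def ml_in_the_list(y, candidates):
+    """Return the most likely candidate word for the received vector y.
+
+    Sketch only: assumes BPSK (0 -> +1, 1 -> -1) over an AWGN channel,
+    so the ML decision is a minimum Euclidean distance search.
+    """
+    y = np.asarray(y, dtype=float)
+    best, best_dist = None, np.inf
+    for cand in candidates:
+        s = 1.0 - 2.0 * np.asarray(cand, dtype=float)  # BPSK-modulated candidate
+        dist = np.sum((y - s) ** 2)                     # squared distance to y
+        if dist < best_dist:
+            best, best_dist = cand, dist
+    return best
+\end{verbatim}
+Since such a decision is only taken over a short candidate list, its cost is
+negligible compared to the iterative decoding itself.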
-\begin{itemize}
-	\item Decoding performance and comparison with standard proximal decoding
-	\item Computational performance and comparison with standard proximal decoding
-	\item Conclusion
-	\begin{itemize}
-		\item Summary
-		\item Up to $\SI{1}{dB}$ gain possible
-	\end{itemize}
-\end{itemize}
+First, a criterion has to be found with which to assess the probability that
+a given component of an estimate is wrong.
+One compelling observation is that the closer an estimate is to a valid
+codeword, the smaller the magnitude of the gradient of the code-constraint
+polynomial becomes, as illustrated in figure \ref{fig:prox:gradients}.
+This gives rise to the notion that some property or behaviour of
+$\nabla h\left( \tilde{\boldsymbol{x}} \right) $ may be related to the
+confidence that a given bit is correct.
+And indeed, the magnitude of the oscillation of
+$\nabla h\left( \tilde{\boldsymbol{x}} \right)$ (introduced in a previous
+section) and the probability of a bit error are strongly correlated,
+a relationship depicted in figure \ref{fig:prox:correlation}.
+
+TODO: Figure
+
+\noindent The y-axis depicts whether there is a bit error and the x-axis the
+variance of $\nabla h\left( \tilde{\boldsymbol{x}} \right)$ past iteration
+$k=100$. While this is not exactly the magnitude of the oscillation, it is
+proportional to it and easier to compute.
+The datapoints are taken from a single decoding operation
+\todo{Generate same figure with multiple decodings}.
+
+Using this observation as a rule to determine the $N\in\mathbb{N}$ bits most
+likely to be wrong, all variations of the estimate with those bits modified
+can be generated.
+An \ac{ML}-in-the-List step can then be performed in order to determine the
+most likely candidate.
+This process is outlined in figure \ref{fig:prox:improved_algorithm}.
+
+Figure \ref{fig:prox:improved_results} shows the gain that can be achieved.
+Again, three values of $\gamma$ are chosen, for which the \ac{BER}, \ac{FER}
+and decoding failure rate are plotted.
+The simulation results for the original proximal decoding algorithm are shown
+with solid lines and the results for the improved version are shown with
+dashed lines.
+The gain seems to depend on the value of $\gamma$ and becomes more pronounced
+for higher \ac{SNR} values.
+This is to be expected, since with higher \ac{SNR} values the number of bit
+errors decreases, making the correction of those errors in the ML-in-the-List
+step more likely.
+In figure \ref{fig:prox:improved_results_multiple} the decoding performance
+of proximal decoding and of the improved algorithm is compared for a number
+of different codes.
+Similar behaviour can be observed in all cases, with varying improvement over
+standard proximal decoding.
+
+Interestingly, the time complexity of the improved algorithm differs little
+from that of standard proximal decoding.
+This is because the ML-in-the-List step is only performed when the
+proximal decoding algorithm produces an invalid result, which in absolute
+terms happens relatively infrequently.
+This is illustrated in figure \ref{fig:prox:time_complexity_comp}, where the
+average time needed to decode a single received frame is visualized for
+proximal decoding as well as for the improved algorithm.
+
+In conclusion, the decoding performance of proximal decoding can be improved
+by appending an ML-in-the-List step when the algorithm does not produce a
+valid result.
+The gain is in some cases as high as $\SI{1}{dB}$ and can be achieved with
+negligible computational performance penalty.
+The improvement is mainly noticeable for higher \ac{SNR} values and depends on
+the code as well as the chosen parameters.
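+
+As a rough illustration of the overall procedure, the following sketch
+combines the variance-based selection of the $N$ most suspect bits with the
+\ac{ML}-in-the-List decision.
+It is a sketch under stated assumptions rather than the actual C++
+implementation: the helper \texttt{prox\_decode}, the gradient history it is
+assumed to return, the hard-decision estimate in $\{0,1\}$ and the BPSK
+mapping over an AWGN channel are all illustrative.
+\begin{verbatim}
+import numpy as np
+from itertools import product
+
+def improved_decode(y, H, prox_decode, N=4, window_start=100):
+    """Sketch of proximal decoding with an appended ML-in-the-List step.
+
+    prox_decode(y) is assumed to return the hard-decision estimate (in {0,1})
+    together with the per-iteration gradients of the code-constraint
+    polynomial; H is the parity-check matrix.
+    """
+    x_hat, grad_history = prox_decode(y)
+
+    # Keep the result if it already satisfies all parity checks.
+    if not np.any((H @ x_hat) % 2):
+        return x_hat
+
+    # Per-component variance of the gradient past the chosen iteration,
+    # used as a proxy for the magnitude of its oscillation.
+    var = np.var(np.asarray(grad_history)[window_start:], axis=0)
+    suspect = np.argsort(var)[-N:]  # the N most strongly oscillating bits
+
+    # Enumerate all modifications of the suspect bits and pick the most
+    # likely candidate (minimum Euclidean distance for BPSK over AWGN).
+    best, best_dist = x_hat, np.inf
+    for flips in product((0, 1), repeat=N):
+        cand = x_hat.copy()
+        cand[suspect] = (cand[suspect] + flips) % 2
+        dist = np.sum((y - (1 - 2 * cand)) ** 2)
+        if dist < best_dist:
+            best, best_dist = cand, dist
+    return best
+\end{verbatim}
+Since this additional step is only executed for the relatively rare invalid
+results, its average cost per frame remains negligible, in line with
+figure \ref{fig:prox:time_complexity_comp}.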