Minor wording changes; Added paragraphs in proximal improvement section

This commit is contained in:
Andreas Tsouchlos 2023-04-09 15:32:38 +02:00
parent 1632c0744e
commit 3b4e0e885f


@ -263,7 +263,8 @@ It was subsequently reimplemented in C++ using the Eigen%
\footnote{\url{https://eigen.tuxfamily.org}}
linear algebra library to achieve higher performance.
The focus has been set on a fast implementation, sometimes at the expense of
memory usage, somewhat limiting the size of the codes the implementation can be
used with \todo{Is this appropriate for a bachelor's thesis?}.
The evaluation of the simulation results has been wholly realized in Python.
The gradient of the code-constraint polynomial \cite[Sec. 2.3]{proximal_paper}
@ -859,8 +860,7 @@ the frame errors may largely be attributed to decoding failures.
The previous observation, that the \ac{FER} arises mainly due to the
non-convergence of the algorithm instead of convergence to the wrong codeword,
raises the question why the decoding process does not converge so often.
In figure \ref{fig:prox:convergence}, the iterative process is visualized.
In order to be able to simultaneously consider all components of the vectors
being dealt with, a BCH code with $n=7$ and $k=4$ is chosen.
Each chart shows one component of the current estimates during a given
@ -1076,7 +1076,8 @@ As such, the constraints are not being satisfied and the estimate is not
converging towards a valid codeword.
While figure \ref{fig:prox:convergence} shows only one instance of a decoding
task, with no statistical significance on its own, it is indicative of the
general behaviour of the algorithm.
This can be justified by looking at the gradients themselves.
In figure \ref{fig:prox:gradients} the gradients of the negative
log-likelihood and the code-constraint polynomial for a repetition code with
@ -1089,8 +1090,8 @@ estimate in opposing directions, leading to an oscillation as illustrated
in figure \ref{fig:prox:convergence}.
Consequently, this oscillation is an intrinsic property of the structure of
the proximal decoding algorithm, where the two parts of the objective function
are minimized in an alternating manner using their gradients.%
%
\begin{figure}[H]
\centering
@ -1127,6 +1128,7 @@ are minimized in an alternating manner using their gradients.
\caption{$\nabla L \left(\boldsymbol{y} \mid \tilde{\boldsymbol{x}} \right) $
for a repetition code with $n=2$}
\label{fig:prox:gradients:L}
\end{subfigure}%
\hfill%
\begin{subfigure}[c]{0.5\textwidth}
@ -1161,10 +1163,14 @@ are minimized in an alternating manner using their gradients.
\caption{$\nabla h \left( \tilde{\boldsymbol{x}} \right) $
for a repetition code with $n=2$}
\label{fig:prox:gradients:h}
\end{subfigure}%
\caption{Gradients of the negative log-likelihood and the code-constraint
polynomial}
\label{fig:prox:gradients}
\end{figure}%
%
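To make the interplay of the two gradients more tangible, the following purely
illustrative Python sketch performs such alternating gradient steps for a
repetition code with $n=2$, assuming an AWGN channel so that
$\nabla L \left( \boldsymbol{y} \mid \tilde{\boldsymbol{x}} \right)
= \left( \tilde{\boldsymbol{x}} - \boldsymbol{y} \right) / \sigma^2$.
The noise variance, the step sizes and the simple code-constraint polynomial
$h\left( \tilde{\boldsymbol{x}} \right)
= \left( \tilde{x}_1 \tilde{x}_2 - 1 \right)^2$ are assumptions made only for
this sketch; the actual implementation follows \cite{proximal_paper} and may
differ in both the polynomial and the update rule.
\begin{verbatim}
import numpy as np

# Illustrative setup: repetition code with n = 2, BPSK over an AWGN channel.
# Assumed code-constraint polynomial: h(x) = (x1 * x2 - 1)^2.
sigma2 = 0.5                      # assumed noise variance
y = np.array([0.8, -0.6])         # received word near the decision boundary

def grad_L(x):
    # Gradient of L(y | x) = ||y - x||^2 / (2 * sigma^2)
    return (x - y) / sigma2

def grad_h(x):
    # Gradient of the assumed polynomial (x1 * x2 - 1)^2
    return 2.0 * (x[0] * x[1] - 1.0) * np.array([x[1], x[0]])

omega, gamma = 0.1, 0.05          # assumed step sizes
x = y.copy()
for k in range(200):
    x = x - omega * grad_L(x)     # gradient step on the log-likelihood term
    x = x - gamma * grad_h(x)     # gradient step on the constraint term
    if k % 50 == 0:
        print(k, x, grad_L(x), grad_h(x))
\end{verbatim}
For received words close to the decision boundary, the two printed gradients
end up pointing in roughly opposite directions, so that the estimate settles
into the back-and-forth movement described above instead of reaching a valid
codeword.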
While the initial net movement generally points in the right direction,
owing to the gradient of the negative log-likelihood, the final oscillation
may well take place in a segment of space not corresponding to a valid
@ -1173,17 +1179,34 @@ This also partly explains the difference in decoding performance when looking
at the \ac{BER} and \ac{FER}, as it would lower the amount of bit errors while
still yielding an invalid codeword.
The higher the \ac{SNR}, the more likely the gradient of the negative
log-likelihood is to point to a valid codeword.
The common component of the two gradients then pulls the estimate closer to
a valid codeword before the oscillation takes place.
This explains why the decoding performance is so much better for higher
\acp{SNR}.
When considering codes with larger $n$, the behaviour generally stays the
same, with some minor differences.
In figure \ref{fig:prox:convergence_large_n} the decoding process is
visualized for one component of a code with $n=204$, for a single decoding.
The two gradients still eventually oppose each other and the estimate still
starts to oscillate, just as illustrated in figure \ref{fig:prox:convergence}
for a code with $n=7$.
However, in this case, the gradient of the code-constraint polynomial itself
starts to oscillate, its average value being such that the effect of the
gradient of the negative log-likelihood is counteracted.
Looking at figure \ref{fig:prox:gradients:h} it also becomes apparent why the
value of the parameter $\gamma$ has to be kept small, as mentioned in section
\ref{sec:prox:Decoding Algorithm}.
Local minima are introduced between the codewords, in the areas in which it is
not immediately clear which codeword is the most likely one.
Raising the value of $\gamma$ results in
$h \left( \tilde{\boldsymbol{x}} \right)$ dominating the landscape of the
objective function, thereby introducing these local minima.
In conclusion, as a general rule, the proximal decoding algorithm reaches
an oscillatory state which it cannot escape as a consequence of its structure.
In this state, the constraints may not be satisfied, leading to the algorithm
@ -1237,8 +1260,8 @@ returning an invalid codeword.
\label{sec:prox:Improved Implementation}
As mentioned earlier, frame errors seem to mainly stem from decoding failures.
Coupled with the fact that the \ac{BER} indicates so much better
performance than the \ac{FER}, this leads to the assumption that only a small
number of components of the estimated vector may be responsible for an invalid
result.
If it were possible to limit the number of possibly wrong components of the
@ -1247,13 +1270,66 @@ a limited number of possible results (``ML-in-the-List'' as it will
subsequently be called) to improve the decoding performance.
This concept is pursued in this section.
First, a criterion has to be found with which to assess the probability that
a given component of an estimate is wrong.
One compelling observation is that the closer an estimate is to a valid
codeword, the smaller the magnitude of the gradient of the
code-constraint polynomial, as illustrated in figure \ref{fig:prox:gradients}.
This gives rise to the notion that the magnitude of some property or behaviour
of $\nabla h\left( \tilde{\boldsymbol{x}} \right)$ may be related to the
confidence that a given bit is correct.
And indeed, the magnitude of the oscillation of
$\nabla h\left( \tilde{\boldsymbol{x}} \right)$ (introduced in a previous
section) and the probability of having a bit error are strongly correlated,
a relationship depicted in figure \ref{fig:prox:correlation}.
\todo{Figure}
\noindent The y-axis depicts whether there is a bit error and the x-axis the
variance of $\nabla h\left( \tilde{\boldsymbol{x}} \right)$ past iteration
$k=100$. While this is not exactly the magnitude of the oscillation, it is
proportional to it and easier to compute.
The data points are taken from a single decoding operation
\todo{Generate same figure with multiple decodings}.
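As an illustration of how this metric can be obtained, the following Python
sketch computes the per-component variance of
$\nabla h\left( \tilde{\boldsymbol{x}} \right)$ over all iterations past
$k=100$ from a recorded gradient history; the array layout and the function
names are assumptions made for this sketch and not part of the actual
implementation.
\begin{verbatim}
import numpy as np

def unreliability_scores(grad_h_history, k_min=100):
    # grad_h_history: array of shape (iterations, n) holding the gradient of
    # the code-constraint polynomial recorded at every iteration (assumed to
    # be available from the decoder).
    history = np.asarray(grad_h_history)
    # Per-component variance past iteration k_min, a proxy for the magnitude
    # of the oscillation of grad h.
    return np.var(history[k_min:], axis=0)

def most_suspect_bits(grad_h_history, N, k_min=100):
    # Indices of the N components with the largest variance, i.e. the bits
    # considered most likely to be in error.
    return np.argsort(unreliability_scores(grad_h_history, k_min))[-N:]
\end{verbatim}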
Using this observation as a rule to determine the $N\in\mathbb{N}$ bits most
likely to be in error, all variations of the estimate with those bits modified
can be generated.
An \ac{ML}-in-the-List step can then be performed in order to determine the
most likely candidate.
This process is outlined in figure \ref{fig:prox:improved_algorithm}.
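A minimal sketch of the candidate generation and the \ac{ML}-in-the-List
selection is given below, assuming a bipolar hard decision, an AWGN channel
(so that the most likely candidate is the one closest to $\boldsymbol{y}$ in
Euclidean distance) and an available parity-check matrix; preferring valid
codewords and falling back to the closest candidate overall are assumptions of
this sketch.
\begin{verbatim}
import itertools
import numpy as np

def ml_in_the_list(y, x_hat, suspect_bits, H):
    # y: received word, x_hat: bipolar (+-1) hard decision of the estimate,
    # suspect_bits: indices of the N bits considered most likely to be wrong,
    # H: binary parity-check matrix.
    best_valid = (np.inf, None)
    best_any = (np.inf, None)
    for signs in itertools.product([1, -1], repeat=len(suspect_bits)):
        cand = x_hat.copy().astype(float)
        cand[list(suspect_bits)] *= signs       # modify a subset of the bits
        metric = np.sum((y - cand) ** 2)        # ML metric for an AWGN channel
        bits = ((1 - cand) / 2).astype(int)     # map +1 -> 0, -1 -> 1
        if metric < best_any[0]:
            best_any = (metric, cand)
        if not np.any((H @ bits) % 2) and metric < best_valid[0]:
            best_valid = (metric, cand)
    # Prefer the most likely valid codeword; otherwise fall back to the most
    # likely candidate overall.
    return best_valid[1] if best_valid[1] is not None else best_any[1]
\end{verbatim}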
Figure \ref{fig:prox:improved_results} shows the gain that can be achieved.
Again, three values of $\gamma$ are chosen, for which the \ac{BER}, \ac{FER}
and decoding failure rate are plotted.
The simulation results for the original proximal decoding algorithm are shown
with solid lines and the results for the improved version are shown with
dashed lines.
The gain seems to depend on the value of $\gamma$ and to become more
pronounced for higher \ac{SNR} values.
This is to be expected, since with higher \ac{SNR} values the number of bit
errors decreases, making the correction of those errors in the ML-in-the-List
step more likely.
In figure \ref{fig:prox:improved_results_multiple} the decoding performance
of proximal decoding and the improved algorithm is compared for a number
of different codes.
Similar behaviour can be observed in all cases, with varying improvement over
standard proximal decoding.
Interestingly, the time complexity of the improved algorithm differs little
from that of proximal decoding.
This is because the ML-in-the-List step is only performed when the
proximal decoding algorithm produces an invalid result, which in absolute
terms happens relatively infrequently.
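The overall structure can be sketched as follows, using the helper functions
from the previous sketches; \texttt{proximal\_decode} is a hypothetical
placeholder for the existing implementation and is assumed to also return the
recorded history of $\nabla h\left( \tilde{\boldsymbol{x}} \right)$.
\begin{verbatim}
import numpy as np

def improved_decode(y, H, N=4, k_min=100):
    # proximal_decode() is a hypothetical placeholder for the existing
    # implementation; it is assumed to return the final estimate together
    # with the recorded history of grad h.
    x_hat, grad_h_history = proximal_decode(y)
    hard = np.sign(x_hat)
    bits = ((1 - hard) / 2).astype(int)
    if not np.any((H @ bits) % 2):
        return hard                  # valid codeword: no extra work needed
    # Only on a decoding failure: determine the suspect bits and run the
    # ML-in-the-List step.
    suspects = most_suspect_bits(grad_h_history, N, k_min)
    return ml_in_the_list(y, hard, suspects, H)
\end{verbatim}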
This is illustrated in figure \ref{fig:prox:time_complexity_comp}, where the
average time needed to decode a single received frame is visualized for
proximal decoding as well as for the improved algorithm.
In conclusion, the decoding performance of proximal decoding can be improved
by appending an ML-in-the-List step when the algorithm does not produce a
valid result.
The gain is in some cases as high as $\SI{1}{dB}$ and can be achieved with
negligible computational performance penalty.
The improvement is mainly noticeable for higher \ac{SNR} values and depends on
the code as well as the chosen parameters.