Reworked rest of proximal decoding and fixed most figures and captions

This commit is contained in:
Andreas Tsouchlos 2023-04-11 19:42:31 +02:00
parent 5c135e085e
commit 0d4b13ccda


@ -557,8 +557,8 @@ The plots have been generated by averaging the error over $\SI{500000}{}$
decodings.
As some decodings go on for more iterations than others, the number of values
which are averaged for each datapoint varies.
This explains the dip visible in all curves around $k=20$, since after
this point more and more correct decodings are completed,
This explains the dip visible in all curves around the 20th iteration, since
after this point more and more correct decodings are completed,
leaving more and more faulty ones to be averaged.
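The effect of this shrinking set of still-active decodings can be illustrated
with a short sketch of the averaging itself (error_traces is a hypothetical
list holding one per-iteration error sequence per decoding, each trace ending
at the iteration in which that decoding terminates):
\begin{verbatim}
def average_error_per_iteration(error_traces):
    # error_traces: one list of per-iteration error values per decoding;
    # a trace ends at the iteration in which that decoding terminates.
    max_len = max(len(trace) for trace in error_traces)
    averages = []
    for k in range(max_len):
        # Decodings that finished before iteration k no longer contribute,
        # so the number of averaged values shrinks with k.
        active = [trace[k] for trace in error_traces if len(trace) > k]
        averages.append(sum(active) / len(active))
    return averages
\end{verbatim}
Because the composition of the averaged set changes once decodings start to
terminate, artefacts such as the dip described above can appear in the curves.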
Additionally, at this point the decline in the average error stagnates,
rendering an increase in $K$ counterproductive as it only raises the average
@ -628,7 +628,7 @@ means to bring about numerical stability.
\subsection{Decoding Performance}
\begin{figure}[H]
\begin{figure}[h]
\centering
\begin{tikzpicture}
@ -749,13 +749,14 @@ non-convergence of the algorithm instead of convergence to the wrong codeword,
raises the question of why the decoding process so often fails to converge.
In figure \ref{fig:prox:convergence}, the iterative process is visualized.
In order to be able to simultaneously consider all components of the vectors
being dealt with, a BCH code with $n=7$ and $k=4$ has been chosen.
Each chart shows one component of the current estimate during a given
iteration (alternating between $\boldsymbol{r}$ and $\boldsymbol{s}$), as well
being dealt with, a BCH code with $n=7$ and $k=4$ is chosen.
Each plot shows one component of the current estimate during a given
iteration ($\boldsymbol{r}$ and $\boldsymbol{s}$ are counted as different
estimates and their values are interwoven to obtain the shown result), as well
as the gradients of the negative log-likelihood and the code-constraint
polynomial, which influence the next estimate.
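Schematically, the alternating structure that produces this interwoven
sequence of estimates can be sketched as follows (the gradient functions,
step sizes and exact update order are placeholders standing in for the update
rules derived earlier, not a faithful reimplementation):
\begin{verbatim}
def proximal_decode_trace(y, grad_L, grad_h, gamma, omega, K):
    # Minimal sketch: alternate gradient steps on the code-constraint
    # polynomial h and on the negative log-likelihood L, recording both
    # intermediate estimates r and s so they can be plotted interwoven.
    s = list(y)
    trace = []
    for _ in range(K):
        r = [s_i - gamma * g for s_i, g in zip(s, grad_h(s))]
        trace.append(r)
        s = [r_i - omega * g for r_i, g in zip(r, grad_L(r, y))]
        trace.append(s)
    return trace
\end{verbatim}
Plotting one component of the returned trace against the iteration index
yields the kind of per-component chart shown in figure
\ref{fig:prox:convergence}.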
\begin{figure}[H]
\begin{figure}[h]
\begin{minipage}[c]{0.25\textwidth}
\centering
@ -955,10 +956,10 @@ polynomial, which influence the next estimate.
%
\noindent It is evident that in all cases, past a certain number of
iterations, the estimate starts to oscillate around a particular value.
After a certain point, the two gradients stop further approaching the value
At the same time, the two gradients jointly stop approaching the value
zero.
In particular, this leads to the code-constraints polynomial not being
minimized.
This leads to the two terms of the objective function, and in particular the
code-constraint polynomial, not being minimized.
As such, the constraints are not being satisfied and the estimate is not
converging towards a valid codeword.
@ -969,17 +970,28 @@ This can be justified by looking at the gradients themselves.
In figure \ref{fig:prox:gradients} the gradients of the negative
log-likelihood and the code-constraint polynomial for a repetition code with
$n=2$ are shown.
The two valid codewords of the $n=2$ repetition code can be recognized in
figure \ref{fig:prox:gradients:h} as
$\boldsymbol{c}_1 = \begin{bmatrix} -1 & -1 \end{bmatrix} $ and
$\boldsymbol{c}_2 = \begin{bmatrix} 1 & 1 \end{bmatrix}$;
these are also the points producing the global minima of the code-constraint
polynomial.
The gradient of the negative log-likelihood points towards the received
word, as can be seen in figure \ref{fig:prox:gradients:L},
since, assuming \ac{AWGN} and no other information, the received word is the
estimate maximizing the likelihood.
It is obvious that walking along the gradients in an alternating fashion will
produce a net movement in a certain direction, as long as the two gradients
produce a net movement in a certain direction, as long as they
have a common component.
As soon as this common component is exhausted, they will start pulling the
estimate in opposing directions, leading to an oscillation as illustrated
in figure \ref{fig:prox:convergence}.
Consequently, this oscillation is an intrinsic property of the structure of
the proximal decoding algorithm, where the two parts of the objective function
are minimized in an alternating manner by use of their gradients.%
are minimized in an alternating manner by use of their gradients.
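This qualitative behaviour can be reproduced with a small numerical experiment
for the $n=2$ repetition code. The sketch below uses
$h\left( \boldsymbol{x} \right) = \left( 1 - x_1 x_2 \right)^2
+ \left( x_1^2 - 1 \right)^2 + \left( x_2^2 - 1 \right)^2$
as a simple stand-in whose global minima are the two valid codewords (not
necessarily the exact code-constraint polynomial used in this work), the
\ac{AWGN} negative log-likelihood gradient up to a constant factor, and
arbitrarily chosen step sizes and received values:
\begin{verbatim}
import numpy as np

def grad_h(x):
    # Gradient of the stand-in code-constraint polynomial for n = 2.
    x1, x2 = x
    return np.array([
        -2.0 * (1.0 - x1 * x2) * x2 + 4.0 * x1 * (x1**2 - 1.0),
        -2.0 * (1.0 - x1 * x2) * x1 + 4.0 * x2 * (x2**2 - 1.0),
    ])

def grad_L(x, y):
    # AWGN negative log-likelihood gradient (up to a constant factor);
    # stepping against it pulls the estimate towards the received word y.
    return x - y

y = np.array([0.9, -0.3])        # received word (example values)
x = y.copy()
gamma, omega = 0.05, 0.05        # step sizes chosen only for illustration
trace = []
for k in range(200):
    r = x - gamma * grad_h(x)    # step on the code-constraint term
    trace.append(r)
    x = r - omega * grad_L(r, y) # step on the negative log-likelihood term
    trace.append(x)
\end{verbatim}
Once the two gradients no longer share a common descent direction, the
interwoven trace hops back and forth between the two pulls instead of settling
on a single point, which is the oscillation discussed above.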
%
\begin{figure}[H]
\begin{figure}[h]
\centering
\begin{subfigure}[c]{0.5\textwidth}
@ -1058,12 +1070,6 @@ are minimized in an alternating manner by use of their gradients.%
\label{fig:prox:gradients}
\end{figure}%
%
\todo{Better explain what is visible on the two gradient plots: the two valid
codewords (-1, -1) and (1, 1); the section between them where a decision in
which direction to move is difficult; maybe say why the gradient of $L$ points
to one specific point}
While the initial net movement generally points in the right direction
owing to the gradient of the negative log-likelihood, the final oscillation
may well take place in a segment of space not corresponding to a valid
@ -1087,9 +1093,9 @@ not immediately clear which codeword is the most likely one.
Raising the value of $\gamma$ results in
$h \left( \tilde{\boldsymbol{x}} \right)$ dominating the landscape of the
objective function, thereby introducing these local minima into the objective
function. \todo{Show equation again and explain on the basis of the equation}
function.
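For reference, the two-term structure responsible for this behaviour can be
restated schematically (the precise definitions of $L$ and $h$ and the exact
weighting are the ones introduced earlier; the form below only serves to make
the role of $\gamma$ explicit):
\begin{equation*}
    f\left( \tilde{\boldsymbol{x}} \right) \approx
    L\left( \tilde{\boldsymbol{x}} \right)
    + \gamma \, h\left( \tilde{\boldsymbol{x}} \right).
\end{equation*}
A larger $\gamma$ scales up the contribution of
$h\left( \tilde{\boldsymbol{x}} \right)$ and therefore deepens its local
minima relative to the log-likelihood term.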
When considering codes with larger $n$, the behaviour generally stays the
When considering codes with larger $n$ the behaviour generally stays the
same, with some minor differences.
In figure \ref{fig:prox:convergence_large_n} the decoding process is
visualized for one component of a code with $n=204$, for a single decoding.
@ -1103,9 +1109,10 @@ gradient of the negative log-likelihood is counteracted.
In conclusion, as a general rule, the proximal decoding algorithm reaches
an oscillatory state which it cannot escape as a consequence of its structure.
In this state the constraints may not be satisfied, leading to the algorithm
returning an invalid codeword.
exhausting its maximum number of iterations without converging, and thus
returning an invalid codeword.
\begin{figure}[H]
\begin{figure}[h]
\centering
\begin{tikzpicture}
@ -1133,11 +1140,15 @@ returning an invalid codeword.
\end{axis}
\end{tikzpicture}
\caption{Internal variables of proximal decoder as a function of the iteration ($n=204$)}
\caption{Visualization of a single decoding operation\protect\footnotemark{}
for a code with $n=204$}
\label{fig:prox:convergence_large_n}
\end{figure}%
\todo{Fix captions / footnotes referencing the different codes in all figures}
%
\footnotetext{(3,6) regular \ac{LDPC} code with $n = 204$, $k = 102$
\cite[\text{204.33.484}]{mackay_enc}; $\gamma=0.05, \omega = 0.05, K=200, \eta=1.5$
}%
%
\subsection{Computational Performance}
@ -1161,16 +1172,16 @@ codes in an \ac{AWGN} channel is $\mathcal{O}\left( n \right)$, which is
practical since it is the same as that of \ac{BP}.
This theoretical analysis is also corroborated by the practical results shown
in figure \ref{fig:prox:time_comp}. \todo{Note about no very large $n$ codes being
used due to memory requirements?}
in figure \ref{fig:prox:time_comp}.
Some deviations from linear behaviour are unavoidable because not all codes
considered are actually \ac{LDPC} codes, or \ac{LDPC} codes constructed
according to the same scheme.
\todo{Mention on what hardware the results where generated}
Nonetheless, a generally linear relationship between the average time needed to
decode a received frame and the length $n$ of the frame can be observed.
These results were generated on an Intel Core i7-7700HQ 4-core CPU, running at
$\SI{2.80}{GHz}$ and utilizing all cores.
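A measurement of this kind can be sketched as follows (the decode callable and
the list of received frames stand in for the actual implementation and
simulation data; this is not the exact benchmarking code used here):
\begin{verbatim}
import time

def average_decode_time(decode, frames, repetitions=1):
    # Average wall-clock time per decoded frame; decode is a placeholder
    # for the proximal decoder implementation and frames a list of
    # received vectors of length n.
    start = time.perf_counter()
    for _ in range(repetitions):
        for frame in frames:
            decode(frame)
    elapsed = time.perf_counter() - start
    return elapsed / (repetitions * len(frames))
\end{verbatim}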
\begin{figure}[H]
\begin{figure}[h]
\centering
\begin{tikzpicture}
@ -1186,7 +1197,7 @@ decode a received frame and the length $n$ of the frame can be observed.
\end{axis}
\end{tikzpicture}
\caption{Time requirements of proximal decoding algorithm imlementation%
\caption{Time requirements of the proximal decoding algorithm implementation%
\protect\footnotemark{}}
\label{fig:prox:time_comp}
\end{figure}%
@ -1224,11 +1235,12 @@ $\nabla h\left( \tilde{\boldsymbol{x}} \right) $ may be related in its
magnitude to the confidence that a given bit is correct.
And indeed, the magnitude of the oscillation of
$\nabla h\left( \tilde{\boldsymbol{x}} \right)$ (introduced previously in
section \ref{subsec:prox:conv_properties}) and the probability of having a bit
section \ref{subsec:prox:conv_properties} and shown in figure
\ref{fig:prox:convergence_large_n}) and the probability of having a bit
error are strongly correlated, a relationship depicted in figure
\ref{fig:prox:correlation}.
\begin{figure}[H]
\begin{figure}[h]
\centering
\begin{tikzpicture}
@ -1249,19 +1261,23 @@ error are strongly correlated, a relationship depicted in figure
\end{axis}
\end{tikzpicture}
\caption{Correlation between bit error and amplitude of oscillation}
\caption{Correlation between the occurrence of a bit error and the
amplitude of oscillation of the gradient of the code-constraint polynomial%
\protect\footnotemark{}}
\label{fig:prox:correlation}
\end{figure}
\todo{Mention that the variance of the oscillation is measured
after a given number of iterations}
\end{figure}%
%
\footnotetext{(3,6) regular \ac{LDPC} code with $n = 204$, $k = 102$
\cite[\text{204.33.484}]{mackay_enc}; $\gamma = 0.05, \omega = 0.05, K=100, \eta=1.5$
}%
%
\noindent The y-axis depicts whether there is a bit error and the x-axis the
variance in $\nabla h\left( \tilde{\boldsymbol{x}} \right)$ past the iteration
$k=100$. While this is not exactly the magnitude of the oscillation, it is
variance in $\nabla h\left( \tilde{\boldsymbol{x}} \right)$ after the
100th iteration.
While this is not exactly the magnitude of the oscillation, it is
proportional to it and easier to compute.
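A minimal sketch of this reliability proxy, assuming the gradient of the
code-constraint polynomial has been recorded at every iteration of one
decoding (the array layout and helper names are hypothetical):
\begin{verbatim}
import numpy as np

def oscillation_variance(grad_history, start=100):
    # grad_history: array of shape (iterations, n) holding the gradient of
    # the code-constraint polynomial at every iteration of one decoding.
    # The per-component variance after iteration `start` serves as a proxy
    # for the oscillation amplitude of each bit.
    late = np.asarray(grad_history)[start:]
    return late.var(axis=0)

def most_suspect_bits(grad_history, N, start=100):
    # Indices of the N components with the largest oscillation, i.e. the
    # bits most likely to be in error according to the observed correlation.
    variances = oscillation_variance(grad_history, start)
    return np.argsort(variances)[-N:]
\end{verbatim}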
The datapoints are taken from a single decoding operation
\todo{Generate same figure with multiple decodings}.
Using this observation as a rule to determine the $N\in\mathbb{N}$ most
probably wrong bits, all variations of the estimate with those bits modified
@ -1286,14 +1302,14 @@ for $K$ iterations do
end for
$\textcolor{KITblue}{\text{Find }N\text{ most probably wrong bits}}$
$\textcolor{KITblue}{\text{Generate variations } \boldsymbol{\tilde{c}}_l,\hspace{1mm}
l\in [1:n]\text{ of } \boldsymbol{\hat{c}}\text{ with the }N\text{ bits modified}}$
l\in [1:2^N]\text{ of } \boldsymbol{\hat{c}}\text{ with the }N\text{ bits modified}}$
$\textcolor{KITblue}{\text{Compute }d_H\left( \boldsymbol{ \tilde{c}}_l,
\boldsymbol{\hat{c}} \right) \text{ for all valid codewords } \boldsymbol{\tilde{c}}_l}$
$\textcolor{KITblue}{\text{Output }\boldsymbol{\tilde{c}}_l\text{ with lowest }
d_H\left( \boldsymbol{ \tilde{c}}_l, \boldsymbol{\hat{c}} \right)}$
\end{genericAlgorithm}
\todo{Not hamming distance, correlation}
%\todo{Not hamming distance, correlation}
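The highlighted steps could be realized, for instance, as in the following
sketch (the parity-check matrix H, the hard-decision estimate c_hat in GF(2)
and the index set suspect_bits are assumed inputs; the selection metric
follows the pseudocode above):
\begin{verbatim}
import itertools
import numpy as np

def ml_in_the_list(c_hat, suspect_bits, H):
    # c_hat: hard-decision estimate as an integer array over {0, 1};
    # suspect_bits: indices of the N most probably wrong bits;
    # H: parity-check matrix over GF(2).
    best, best_distance = None, None
    for flips in itertools.product([0, 1], repeat=len(suspect_bits)):
        candidate = c_hat.copy()
        candidate[list(suspect_bits)] ^= np.array(flips)  # modify the N bits
        if np.any(H @ candidate % 2):    # keep valid codewords only
            continue
        distance = int(np.sum(candidate != c_hat))  # Hamming distance to c_hat
        if best_distance is None or distance < best_distance:
            best, best_distance = candidate, distance
    return best if best is not None else c_hat
\end{verbatim}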
Figure \ref{fig:prox:improved_results} shows the gain that can be achieved
when the number $N$ is chosen to be 12.
@ -1304,13 +1320,12 @@ with solid lines and the results for the improved version are shown with
dashed lines.
For the case of $\gamma = 0.05$, the numbers of frame errors produced for the
datapoints at $\SI{6}{dB}$, $\SI{6.5}{dB}$ and $\SI{7}{dB}$ are
70, 17 and 2, respectively. \todo{Redo simulation with higher number of iterations}
70, 17 and 2, respectively.
The gain seems to depend on the value of $\gamma$ and becomes more
pronounced for higher \ac{SNR} values.
This is to be expected, since with higher \ac{SNR} values the number of bit
errors decreases, making the correction of those errors in the ML-in-the-List
step more likely.
In figure \ref{fig:prox:improved:comp}, the decoding performance of the
improved algorithm is compared with that of standard proximal decoding for a
number of different codes.
@ -1320,8 +1335,9 @@ generate the point for the improved algorithm for $\gamma=0.05$ at
$\SI{5.5}{dB}$.
Similar behaviour can be observed in all cases, with varying improvement over
standard proximal decoding.
In some cases, a gain of $\SI{1}{dB}$ or even more can be achieved.
\begin{figure}[H]
\begin{figure}[h]
\centering
\begin{tikzpicture}
@ -1459,13 +1475,12 @@ average time needed to decode a single received frame is visualized for
proximal decoding as well as for the improved algorithm.
It should be noted that some variability in the data is to be expected,
since the timing of the actual simulations depends on a multitude of other
parameters such as the outside temperature (because of thermal throttling),
the scheduling choices of the operating system as well as variations in the
implementations themselves.
parameters such as the scheduling choices of the operating system as well as
variations in the implementations themselves.
Nevertheless, the empirical data serves, at least in part, to validate the
theoretical considerations.
\begin{figure}[H]
\begin{figure}[h]
\centering
\begin{tikzpicture}
@ -1501,8 +1516,7 @@ theoretical considerations.
In conclusion, the decoding performance of proximal decoding can be improved
by appending an ML-in-the-List step when the algorithm does not produce a
valid result.
The gain can in some cases be as high as $\SI{1}{dB}$ \todo{Explicitly mention this value earlier}
and is achievable with
The gain can in some cases be as high as $\SI{1}{dB}$ and is achievable with
negligible computational performance penalty.
The improvement is mainly noticeable for higher \ac{SNR} values and depends on
the code as well as the chosen parameters.