Reworked rest of proximal decoding and fixed most figures and captions

Andreas Tsouchlos 2023-04-11 19:42:31 +02:00
parent 5c135e085e
commit 0d4b13ccda


@@ -557,8 +557,8 @@ The plots have been generated by averaging the error over $\SI{500000}{}$
decodings.
As some decodings go on for more iterations than others, the number of values
which are averaged for each datapoint varies.
This explains the dip visible in all curves around the 20th iteration, since
after this point more and more correct decodings are completed,
leaving more and more faulty ones to be averaged.
Additionally, at this point the decline in the average error stagnates,
rendering an increase in $K$ counterproductive as it only raises the average
@@ -628,7 +628,7 @@ means to bring about numerical stability.
\subsection{Decoding Performance}
\begin{figure}[h]
\centering
\begin{tikzpicture}
@@ -749,13 +749,14 @@ non-convergence of the algorithm instead of convergence to the wrong codeword,
raises the question of why the decoding process fails to converge so often.
In figure \ref{fig:prox:convergence}, the iterative process is visualized.
In order to simultaneously consider all components of the vectors
involved, a BCH code with $n=7$ and $k=4$ is chosen.
Each plot shows one component of the current estimate at a given
iteration ($\boldsymbol{r}$ and $\boldsymbol{s}$ are treated as separate
estimates and their values are interleaved to obtain the result shown), as well
as the gradients of the negative log-likelihood and the code-constraint
polynomial, which influence the next estimate.
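To make the alternating structure described above concrete, the following is a minimal sketch of such a decoding loop. The gradient callables `grad_L` and `grad_h`, the role of $\gamma$ as the weight of the code-constraint term, the use of $\omega$ as a step size, and the interleaving of the $\boldsymbol{r}$/$\boldsymbol{s}$ sequences are illustrative assumptions, not the exact update rules of the thesis.

```python
import numpy as np

def proximal_decode_sketch(y, grad_L, grad_h, gamma=0.05, omega=0.05, K=200):
    """Hedged sketch of an alternating-gradient decoding loop.

    y            : received bipolar vector, also used as the initial estimate
    grad_L       : gradient of the negative log-likelihood term
    grad_h       : gradient of the code-constraint polynomial
    gamma, omega : assumed weight of the constraint term / step size
    K            : maximum number of iterations
    """
    r = np.asarray(y, dtype=float)
    trajectory = []                          # interleaved r/s values per iteration
    for _ in range(K):
        s = r - omega * grad_L(r, y)         # step on the likelihood term
        trajectory.append(s.copy())
        r = s - omega * gamma * grad_h(s)    # step on the code-constraint term
        trajectory.append(r.copy())
    return r, np.array(trajectory)
```

Plotting one component of `trajectory` against the iteration index reproduces the kind of per-component view shown in the figures.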
\begin{figure}[h]
\begin{minipage}[c]{0.25\textwidth}
\centering
@@ -955,10 +956,10 @@ polynomial, which influence the next estimate.
%
\noindent It is evident that in all cases, past a certain number of
iterations, the estimate starts to oscillate around a particular value.
Jointly, the two gradients stop approaching the value zero any further.
This leads to the two terms of the objective function, and in particular the
code-constraint polynomial, not being minimized.
As such, the constraints are not being satisfied and the estimate does not
converge towards a valid codeword.
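Operationally, "satisfying the constraints" can be checked through the syndrome of the parity-check matrix. The sketch below assumes the common BPSK convention that negative values map to bit 1; the convention used in the thesis may differ.

```python
import numpy as np

def is_valid_codeword(x_est, H):
    """Check whether the hard decision of a bipolar estimate satisfies all
    parity checks (zero syndrome).  H is the binary parity-check matrix;
    the 0 -> +1, 1 -> -1 mapping is an assumption."""
    bits = (np.asarray(x_est) < 0).astype(int)   # hard decision
    syndrome = H.dot(bits) % 2                   # one entry per check
    return not syndrome.any()
```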
@@ -969,17 +970,28 @@ This can be justified by looking at the gradients themselves.
In figure \ref{fig:prox:gradients} the gradients of the negative
log-likelihood and the code-constraint polynomial for a repetition code with
$n=2$ are shown.
The two valid codewords of the $n=2$ repetition code can be recognized in
figure \ref{fig:prox:gradients:h} as
$\boldsymbol{c}_1 = \begin{bmatrix} -1 & -1 \end{bmatrix}$ and
$\boldsymbol{c}_2 = \begin{bmatrix} 1 & 1 \end{bmatrix}$;
these are also the points producing the global minima of the code-constraint
polynomial.
The gradient of the negative log-likelihood points towards the received word,
as can be seen in figure \ref{fig:prox:gradients:L},
since, assuming \ac{AWGN} and no other information, that is the
estimate maximizing the likelihood.
It is obvious that walking along the gradients in an alternating fashion will
produce a net movement in a certain direction, as long as they
have a common component.
As soon as this common component is exhausted, they will start pulling the
estimate in opposing directions, leading to an oscillation as illustrated
in figure \ref{fig:prox:convergence}.
Consequently, this oscillation is an intrinsic property of the structure of
the proximal decoding algorithm, where the two parts of the objective function
are minimized in an alternating manner by use of their gradients.
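As a numerical illustration of the two competing pulls for the $n=2$ repetition code, the snippet below evaluates both gradients at a few points. The concrete polynomial used for the code constraint (a product-based penalty with minima at $\pm(1,1)$) and the noise variance are assumptions chosen for illustration, not the exact expressions of the thesis.

```python
import numpy as np

def grad_h(x):
    # Assumed stand-in for the code-constraint polynomial of the n=2
    # repetition code: h(x) = (x1*x2 - 1)^2 + (x1^2 - 1)^2 + (x2^2 - 1)^2,
    # minimal at the valid codewords (-1,-1) and (1,1).
    x1, x2 = x
    return np.array([
        2 * (x1 * x2 - 1) * x2 + 4 * (x1**2 - 1) * x1,
        2 * (x1 * x2 - 1) * x1 + 4 * (x2**2 - 1) * x2,
    ])

def grad_L(x, y, sigma2=1.0):
    # Negative log-likelihood gradient under AWGN; the descent direction
    # -grad_L points towards the received word y.
    return (np.asarray(x) - np.asarray(y)) / sigma2

y = np.array([0.8, -0.3])                      # example received word
for point in [(-1.5, -1.5), (0.0, 0.0), (1.5, 1.5)]:
    p = np.array(point)
    print(point, grad_L(p, y), grad_h(p))      # compare the two pulls
```

Following `-grad_L` alone would lead straight to the received word, while `-grad_h` pulls towards the nearest valid codeword; once these directions oppose each other, the alternating steps merely oscillate, as described above.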
%
\begin{figure}[h]
\centering
\begin{subfigure}[c]{0.5\textwidth}
@@ -1058,12 +1070,6 @@ are minimized in an alternating manner by use of their gradients.%
\label{fig:prox:gradients}
\end{figure}%
%
While the initial net movement is generally directed in the right direction
owing to the gradient of the negative log-likelihood, the final oscillation
may well take place in a segment of space not corresponding to a valid
@@ -1087,9 +1093,9 @@ not immediately clear which codeword is the most likely one.
Raising the value of $\gamma$ results in
$h \left( \tilde{\boldsymbol{x}} \right)$ dominating the landscape of the
objective function, thereby introducing these local minima into the objective
function.
When considering codes with larger $n$, the behaviour generally stays the
same, with some minor differences.
In figure \ref{fig:prox:convergence_large_n} the decoding process is
visualized for one component of a code with $n=204$, for a single decoding.
@@ -1103,9 +1109,10 @@ gradient of the negative log-likelihood is counteracted.
In conclusion, as a general rule, the proximal decoding algorithm reaches
an oscillatory state which it cannot escape as a consequence of its structure.
In this state the constraints may not be satisfied, leading to the algorithm
exhausting its maximum number of iterations without converging and returning
an invalid codeword.
\begin{figure}[h]
\centering
\begin{tikzpicture}
@@ -1133,11 +1140,15 @@ returning an invalid codeword.
\end{axis}
\end{tikzpicture}
\caption{Visualization of a single decoding operation\protect\footnotemark{}
for a code with $n=204$}
\label{fig:prox:convergence_large_n}
\end{figure}%
%
\footnotetext{$(3,6)$ regular \ac{LDPC} code with $n = 204$, $k = 102$
\cite[\text{204.33.484}]{mackay_enc}; $\gamma=0.05, \omega = 0.05, K=200, \eta=1.5$
}%
%
\subsection{Computational Performance}
@@ -1161,16 +1172,16 @@ codes in an \ac{AWGN} channel is $\mathcal{O}\left( n \right)$, which is
practical since it is the same as that of \ac{BP}.
This theoretical analysis is also corroborated by the practical results shown
in figure \ref{fig:prox:time_comp}.
Some deviations from linear behaviour are unavoidable because not all codes
considered are actually \ac{LDPC} codes, or \ac{LDPC} codes constructed
according to the same scheme.
Nonetheless, a generally linear relationship between the average time needed to
decode a received frame and the length $n$ of the frame can be observed.
These results were generated on an Intel Core i7-7700HQ 4-core CPU running at
$\SI{2.80}{GHz}$ and utilizing all cores.
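For reference, per-frame timing figures of this kind can be collected with a simple wrapper like the one below; the `decode` callable and the choice of frame sets are placeholders, not the actual simulation harness used for figure \ref{fig:prox:time_comp}.

```python
import time

def average_decoding_time(decode, frames, repeats=1000):
    """Average wall-clock time per decoded frame.

    decode  : callable mapping a received vector to a codeword estimate
              (placeholder for the actual decoder implementation)
    frames  : iterable of received vectors of length n
    """
    start = time.perf_counter()
    count = 0
    for _ in range(repeats):
        for y in frames:
            decode(y)
            count += 1
    return (time.perf_counter() - start) / count

# Plotting the returned average against n for several codes should reveal
# the roughly linear trend discussed above.
```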
\begin{figure}[h]
\centering
\begin{tikzpicture}
@@ -1186,7 +1197,7 @@ decode a received frame and the length $n$ of the frame can be observed.
\end{axis}
\end{tikzpicture}
\caption{Time requirements of the proximal decoding algorithm implementation%
\protect\footnotemark{}}
\label{fig:prox:time_comp}
\end{figure}%
@@ -1224,11 +1235,12 @@ $\nabla h\left( \tilde{\boldsymbol{x}} \right)$ may be related in its
magnitude to the confidence that a given bit is correct.
And indeed, the magnitude of the oscillation of
$\nabla h\left( \tilde{\boldsymbol{x}} \right)$ (introduced previously in
section \ref{subsec:prox:conv_properties} and shown in figure
\ref{fig:prox:convergence_large_n}) and the probability of having a bit
error are strongly correlated, a relationship depicted in figure
\ref{fig:prox:correlation}.
\begin{figure}[h]
\centering
\begin{tikzpicture}
@@ -1249,19 +1261,23 @@ error are strongly correlated, a relationship depicted in figure
\end{axis}
\end{tikzpicture}
\caption{Correlation between the occurrence of a bit error and the
amplitude of oscillation of the gradient of the code-constraint polynomial%
\protect\footnotemark{}}
\label{fig:prox:correlation}
\end{figure}%
%
\footnotetext{$(3,6)$ regular \ac{LDPC} code with $n = 204$, $k = 102$
\cite[\text{204.33.484}]{mackay_enc}; $\gamma = 0.05, \omega = 0.05, K=100, \eta=1.5$
}%
%
\noindent The y-axis depicts whether there is a bit error and the x-axis the
variance in $\nabla h\left( \tilde{\boldsymbol{x}} \right)$ after the
100th iteration.
While this is not exactly the magnitude of the oscillation, it is
proportional to it and easier to compute.
The datapoints are taken from a single decoding operation.
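The selection rule used in the next step can be sketched as follows; measuring the variance of the gradient past iteration 100 is taken from the text, while the array layout and function names are illustrative assumptions.

```python
import numpy as np

def most_suspect_bits(grad_h_history, N, k_start=100):
    """Rank bits by the oscillation of the code-constraint gradient.

    grad_h_history : array of shape (iterations, n) holding the gradient of
                     the code-constraint polynomial at every iteration
                     (layout is an assumption for illustration)
    N              : number of bit positions to return
    k_start        : iteration after which the variance is measured
    """
    late = np.asarray(grad_h_history)[k_start:]      # discard the transient
    per_bit_variance = late.var(axis=0)              # oscillation proxy per bit
    return np.argsort(per_bit_variance)[-N:]         # N largest variances
```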
Using this observation as a rule to determine the $N\in\mathbb{N}$ most
probably wrong bits, all variations of the estimate with those bits modified
@@ -1286,14 +1302,14 @@ for $K$ iterations do
end for
$\textcolor{KITblue}{\text{Find }N\text{ most probably wrong bits}}$
$\textcolor{KITblue}{\text{Generate variations } \boldsymbol{\tilde{c}}_l,\hspace{1mm}
l\in \mathbb{N}\text{ of } \boldsymbol{\hat{c}}\text{ with the }N\text{ bits modified}}$
$\textcolor{KITblue}{\text{Compute }d_H\left( \boldsymbol{ \tilde{c}}_l,
\boldsymbol{\hat{c}} \right) \text{ for all valid codewords } \boldsymbol{\tilde{c}}_l}$
$\textcolor{KITblue}{\text{Output }\boldsymbol{\tilde{c}}_l\text{ with lowest }
d_H\left( \boldsymbol{ \tilde{c}}_l, \boldsymbol{\hat{c}} \right)}$
\end{genericAlgorithm}
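A possible realization of the highlighted ML-in-the-List step is sketched below: it enumerates all $2^N$ flips of the suspect positions, keeps only candidates that satisfy the parity checks, and selects among them. The listing above ranks candidates by Hamming distance to the hard estimate $\boldsymbol{\hat{c}}$; a correlation-based ranking (hinted at in the commented-out note below) is included as an alternative. Function and variable names are illustrative, not the thesis implementation.

```python
import itertools
import numpy as np

def ml_in_the_list(c_hat, suspect_bits, H, y=None):
    """Try all 2^N modifications of the suspect bits of the hard estimate
    c_hat (bits in {0,1}) and return the best valid codeword, if any.

    Ranking: smallest Hamming distance to c_hat, or, if the received vector
    y is given, largest correlation with y.  H is the binary parity-check
    matrix; the 0 -> +1, 1 -> -1 bipolar mapping is an assumption.
    """
    candidates = []
    for flips in itertools.product([0, 1], repeat=len(suspect_bits)):
        c = c_hat.copy()
        c[list(suspect_bits)] ^= np.array(flips)       # modify the N bits
        if not (H.dot(c) % 2).any():                   # keep valid codewords only
            candidates.append(c)
    if not candidates:
        return c_hat                                   # no valid variation found
    if y is not None:
        return max(candidates, key=lambda c: np.dot(1 - 2 * c, y))
    return min(candidates, key=lambda c: np.count_nonzero(c != c_hat))
```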
%\todo{Not hamming distance, correlation}
Figure \ref{fig:prox:improved_results} shows the gain that can be achieved
when the number $N$ is chosen to be 12.
@@ -1304,13 +1320,12 @@ with solid lines and the results for the improved version are shown with
dashed lines.
For the case of $\gamma = 0.05$, the number of frame errors produced for the
datapoints at $\SI{6}{dB}$, $\SI{6.5}{dB}$ and $\SI{7}{dB}$ are
70, 17 and 2, respectively.
The gain seems to depend on the value of $\gamma$ and to become more
pronounced for higher \ac{SNR} values.
This is to be expected, since with higher \ac{SNR} values the number of bit
errors decreases, making the correction of those errors in the ML-in-the-List
step more likely.
In figure \ref{fig:prox:improved:comp} the decoding performance of
proximal decoding and the improved algorithm is compared for a number
of different codes.
@@ -1320,8 +1335,9 @@ generate the point for the improved algorithm for $\gamma=0.05$ at
$\SI{5.5}{dB}$.
Similar behaviour can be observed in all cases, with varying improvement over
standard proximal decoding.
In some cases, a gain of $\SI{1}{dB}$ or more can be achieved.
\begin{figure}[h]
\centering
\begin{tikzpicture}
@@ -1459,13 +1475,12 @@ average time needed to decode a single received frame is visualized for
proximal decoding as well as for the improved algorithm.
It should be noted that some variability in the data is to be expected,
since the timing of the actual simulations depends on a multitude of other
factors such as the scheduling choices of the operating system as well as
variations in the implementations themselves.
Nevertheless, the empirical data serves, at least in part, to validate the
theoretical considerations.
\begin{figure}[h]
\centering
\begin{tikzpicture}
@@ -1501,8 +1516,7 @@ theoretical considerations.
In conclusion, the decoding performance of proximal decoding can be improved
by appending an ML-in-the-List step when the algorithm does not produce a
valid result.
The gain can in some cases be as high as $\SI{1}{dB}$ and is achievable with
negligible computational performance penalty.
The improvement is mainly noticeable for higher \ac{SNR} values and depends on
the code as well as the chosen parameters.