Reworked rest of proximal decoding and fixed most figures and captions
parent 5c135e085e
commit 0d4b13ccda
@@ -557,8 +557,8 @@ The plots have been generated by averaging the error over $\SI{500000}{}$
 decodings.
 As some decodings go on for more iterations than others, the number of values
 which are averaged for each datapoint varies.
-This explains the dip visible in all curves around $k=20$, since after
-this point more and more correct decodings are completed,
+This explains the dip visible in all curves around the 20th iteration, since
+after this point more and more correct decodings are completed,
 leaving more and more faulty ones to be averaged.
 Additionally, at this point the decline in the average error stagnates,
 rendering an increase in $K$ counterproductive as it only raises the average
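The averaging scheme this hunk describes (per-iteration averages over a shrinking population of still-running decodings) can be illustrated with a small sketch. A hedged numpy example; the synthetic traces, lengths and decay are illustrative stand-ins, not the thesis code:

```python
import numpy as np

# Hedged sketch: average per-iteration error curves of decodings that stop
# after different numbers of iterations.  Traces that finish early contribute
# no values to later datapoints, so the number of averaged values varies per
# iteration; correct (low-error) decodings dropping out early produces the
# dip described above.
rng = np.random.default_rng(0)
K = 50                                # maximum number of iterations (assumed)
traces = []
for _ in range(1000):                 # stand-in for the 500000 decodings
    length = rng.integers(10, K + 1)  # iteration at which this decoding stops
    err = np.abs(rng.normal(1.0, 0.3)) * np.exp(-0.1 * np.arange(length))
    traces.append(err)

# Pad with NaN so that np.nanmean only averages the decodings still running.
padded = np.full((len(traces), K), np.nan)
for i, t in enumerate(traces):
    padded[i, : len(t)] = t
avg_error = np.nanmean(padded, axis=0)       # one datapoint per iteration
samples_per_point = np.sum(~np.isnan(padded), axis=0)
print(avg_error[:5], samples_per_point[:5])
```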
@@ -628,7 +628,7 @@ means to bring about numerical stability.
 
 \subsection{Decoding Performance}
 
-\begin{figure}[H]
+\begin{figure}[h]
 \centering
 
 \begin{tikzpicture}
@@ -749,13 +749,14 @@ non-convergence of the algorithm instead of convergence to the wrong codeword,
 raises the question of why the decoding process so often fails to converge.
 In figure \ref{fig:prox:convergence}, the iterative process is visualized.
 In order to be able to simultaneously consider all components of the vectors
-being dealt with, a BCH code with $n=7$ and $k=4$ has been chosen.
-Each chart shows one component of the current estimate during a given
-iteration (alternating between $\boldsymbol{r}$ and $\boldsymbol{s}$), as well
+being dealt with, a BCH code with $n=7$ and $k=4$ is chosen.
+Each plot shows one component of the current estimate during a given
+iteration ($\boldsymbol{r}$ and $\boldsymbol{s}$ are counted as different
+estimates and their values are interwoven to obtain the shown result), as well
 as the gradients of the negative log-likelihood and the code-constraint
 polynomial, which influence the next estimate.
 
-\begin{figure}[H]
+\begin{figure}[h]
 \begin{minipage}[c]{0.25\textwidth}
 \centering
 
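The interweaving of the $\boldsymbol{r}$ and $\boldsymbol{s}$ estimates mentioned in the new wording can be sketched in a few lines. The arrays here are hypothetical per-iteration component values, not data from the thesis:

```python
import numpy as np

# Hedged sketch: interleave the two per-iteration estimates r and s so that a
# single curve shows both, as the rewritten text describes.
r_hist = np.linspace(0.0, 0.9, 10)      # stand-in values for one component
s_hist = np.linspace(0.1, 1.0, 10)
interwoven = np.empty(2 * len(r_hist))
interwoven[0::2] = r_hist               # even slots: r estimates
interwoven[1::2] = s_hist               # odd slots: s estimates
print(interwoven[:6])
```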
@@ -955,10 +956,10 @@ polynomial, which influence the next estimate.
 %
 \noindent It is evident that in all cases, past a certain number of
 iterations, the estimate starts to oscillate around a particular value.
-After a certain point, the two gradients stop further approaching the value
+Jointly, the two gradients stop approaching the value
 zero.
-In particular, this leads to the code-constraints polynomial not being
-minimized.
+This leads to the two terms of the objective function, and in particular the
+code-constraint polynomial, not being minimized.
 As such, the constraints are not being satisfied and the estimate is not
 converging towards a valid codeword.
 
@@ -969,17 +970,28 @@ This can be justified by looking at the gradients themselves.
 In figure \ref{fig:prox:gradients} the gradients of the negative
 log-likelihood and the code-constraint polynomial for a repetition code with
 $n=2$ are shown.
+The two valid codewords of the $n=2$ repetition code can be recognized in
+figure \ref{fig:prox:gradients:h} as
+$\boldsymbol{c}_1 = \begin{bmatrix} -1 & -1 \end{bmatrix}$ and
+$\boldsymbol{c}_2 = \begin{bmatrix} 1 & 1 \end{bmatrix}$;
+these are also the points producing the global minima of the code-constraint
+polynomial.
+The gradient of the negative log-likelihood points towards the received
+codeword, as can be seen in figure \ref{fig:prox:gradients:L},
+since, assuming \ac{AWGN} and no other information, that is the
+estimate maximizing the likelihood.
+
 It is obvious that walking along the gradients in an alternating fashion will
-produce a net movement in a certain direction, as long as the two gradients
+produce a net movement in a certain direction, as long as they
 have a common component.
 As soon as this common component is exhausted, they will start pulling the
 estimate in opposing directions, leading to an oscillation as illustrated
 in figure \ref{fig:prox:convergence}.
 Consequently, this oscillation is an intrinsic property of the structure of
 the proximal decoding algorithm, where the two parts of the objective function
-are minimized in an alternating manner by use of their gradients.%
+are minimized in an alternating manner by use of their gradients.
 %
-\begin{figure}[H]
+\begin{figure}[h]
 \centering
 
 \begin{subfigure}[c]{0.5\textwidth}
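The oscillation mechanism described in this hunk can be reproduced with a toy version of the alternating scheme. A hedged Python sketch for the $n=2$ repetition code, assuming a plausible code-constraint polynomial $h(x) = (1 - x_1 x_2)^2 + \sum_i (x_i^2 - 1)^2$ (its global minima are the two valid codewords named above) and the \ac{AWGN} negative log-likelihood $L(x) = \lVert x - y \rVert^2 / 2$; the exact polynomial, update rule and step sizes in the thesis may differ:

```python
import numpy as np

# Hedged sketch (not the thesis implementation) of the alternating-gradient
# structure for the n=2 repetition code.
def grad_L(x, y):
    # gradient of L(x) = ||x - y||^2 / 2; points from x towards the received y
    return x - y

def grad_h(x):
    # h(x) = (1 - x1*x2)^2 + sum_i (x_i^2 - 1)^2   (assumed form)
    x1, x2 = x
    parity = -2.0 * (1.0 - x1 * x2) * np.array([x2, x1])
    bipolar = 4.0 * x * (x * x - 1.0)
    return parity + bipolar

y = np.array([0.9, -0.8])            # received word with conflicting signs
s = y.copy()
omega, gamma = 0.1, 0.1              # assumed step sizes
for k in range(200):
    r = s - omega * grad_L(s, y)     # step on the negative log-likelihood
    s = r - gamma * grad_h(r)        # step on the code-constraint polynomial
    if k >= 197:
        # r and s keep jumping between two distinct values: once the common
        # component of the two gradients is exhausted, they pull the estimate
        # in opposing directions instead of towards a valid codeword
        print(k, r, s)
```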
@@ -1058,12 +1070,6 @@ are minimized in an alternating manner by use of their gradients.%
 \label{fig:prox:gradients}
 \end{figure}%
 %
-
-\todo{Better explain what is visible on the two gradient plots: the two valid
-codewords (-1, -1) and (1, 1); the section between them where a decision in
-which direction to move is difficult; maybe say why the gradient of $L$ points
-to one specific point}
-
 While the initial net movement is generally directed in the right direction
 owing to the gradient of the negative log-likelihood, the final oscillation
 may well take place in a segment of space not corresponding to a valid
@@ -1087,9 +1093,9 @@ not immediately clear which codeword is the most likely one.
 Raising the value of $\gamma$ results in
 $h \left( \tilde{\boldsymbol{x}} \right)$ dominating the landscape of the
 objective function, thereby introducing these local minima into the objective
-function. \todo{Show equation again and explain on the basis of the equation}
+function.
 
-When considering codes with larger $n$, the behaviour generally stays the
+When considering codes with larger $n$ the behaviour generally stays the
 same, with some minor differences.
 In figure \ref{fig:prox:convergence_large_n} the decoding process is
 visualized for one component of a code with $n=204$, for a single decoding.
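The todo removed in this hunk asked to show the equation again; a hedged restatement of the two-term objective the section keeps referring to, with notation assumed (the thesis' own definition may differ):

```latex
% Hedged restatement (assumed notation): raising $\gamma$ scales the
% code-constraint term $h$ and lets its local minima dominate the landscape
% of the objective $f$.
\begin{equation*}
    f\left( \tilde{\boldsymbol{x}} \right)
        = L\left( \tilde{\boldsymbol{x}} \right)
        + \gamma \, h\left( \tilde{\boldsymbol{x}} \right)
\end{equation*}
```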
@@ -1103,9 +1109,10 @@ gradient of the negative log-likelihood is counteracted.
 In conclusion, as a general rule, the proximal decoding algorithm reaches
 an oscillatory state which it cannot escape as a consequence of its structure.
 In this state the constraints may not be satisfied, leading to the algorithm
-returning an invalid codeword.
+exhausting its maximum number of iterations without converging and returning
+an invalid codeword.
 
-\begin{figure}[H]
+\begin{figure}[h]
 \centering
 
 \begin{tikzpicture}
@@ -1133,11 +1140,15 @@ returning an invalid codeword.
 \end{axis}
 \end{tikzpicture}
 
-\caption{Internal variables of proximal decoder as a function of the iteration ($n=204$)}
+\caption{Visualization of a single decoding operation\protect\footnotemark{}
+for a code with $n=204$}
 \label{fig:prox:convergence_large_n}
 \end{figure}%
-
-\todo{Fix captions / footnotes referencing the different codes in all figures}
+%
+\footnotetext{(3,6) regular \ac{LDPC} code with $n = 204$, $k = 102$
+\cite[\text{204.33.484}]{mackay_enc}; $\gamma=0.05, \omega = 0.05, K=200, \eta=1.5$
+}%
+%
 
 
 \subsection{Computational Performance}
@@ -1161,16 +1172,16 @@ codes in an \ac{AWGN} channel is $\mathcal{O}\left( n \right)$, which is
 practical since it is the same as that of \ac{BP}.
 
 This theoretical analysis is also corroborated by the practical results shown
-in figure \ref{fig:prox:time_comp}. \todo{Note about no very large $n$ codes being
-used due to memory requirements?}
+in figure \ref{fig:prox:time_comp}.
 Some deviations from linear behaviour are unavoidable because not all codes
 considered are actually \ac{LDPC} codes, or \ac{LDPC} codes constructed
 according to the same scheme.
-\todo{Mention on what hardware the results where generated}
 Nonetheless, a generally linear relationship between the average time needed to
 decode a received frame and the length $n$ of the frame can be observed.
+These results were generated on an Intel Core i7-7700HQ 4-core CPU, running at
+$\SI{2.80}{GHz}$ and utilizing all cores.
 
-\begin{figure}[H]
+\begin{figure}[h]
 \centering
 
 \begin{tikzpicture}
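The linear-time behaviour discussed here can be checked with the kind of measurement the figure shows. A hedged sketch; `decode` is a hypothetical stand-in for the actual decoder implementation, and the code lengths are chosen for illustration only:

```python
import time
import numpy as np

# Hedged sketch: average the wall-clock time of many decodings for each
# frame length n, as in the timing figure referenced above.
def decode(frame):
    return np.sign(frame)             # placeholder for O(n) decoding work

def average_decode_time(n, trials=1000):
    rng = np.random.default_rng(0)
    frames = rng.normal(size=(trials, n))
    start = time.perf_counter()
    for frame in frames:
        decode(frame)
    return (time.perf_counter() - start) / trials

for n in (96, 204, 408, 816):         # illustrative code lengths
    print(n, average_decode_time(n))
```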
@@ -1186,7 +1197,7 @@ decode a received frame and the length $n$ of the frame can be observed.
 \end{axis}
 \end{tikzpicture}
 
-\caption{Time requirements of proximal decoding algorithm imlementation%
+\caption{Time requirements of the proximal decoding algorithm implementation%
 \protect\footnotemark{}}
 \label{fig:prox:time_comp}
 \end{figure}%
@@ -1224,11 +1235,12 @@ $\nabla h\left( \tilde{\boldsymbol{x}} \right) $ may be related in its
 magnitude to the confidence that a given bit is correct.
 And indeed, the magnitude of the oscillation of
 $\nabla h\left( \tilde{\boldsymbol{x}} \right)$ (introduced previously in
-section \ref{subsec:prox:conv_properties}) and the probability of having a bit
+section \ref{subsec:prox:conv_properties} and shown in figure
+\ref{fig:prox:convergence_large_n}) and the probability of having a bit
 error are strongly correlated, a relationship depicted in figure
 \ref{fig:prox:correlation}.
 
-\begin{figure}[H]
+\begin{figure}[h]
 \centering
 
 \begin{tikzpicture}
@@ -1249,19 +1261,23 @@ error are strongly correlated, a relationship depicted in figure
 \end{axis}
 \end{tikzpicture}
 
-\caption{Correlation between bit error and amplitude of oscillation}
+\caption{Correlation between the occurrence of a bit error and the
+amplitude of oscillation of the gradient of the code-constraint polynomial%
+\protect\footnotemark{}}
 \label{fig:prox:correlation}
-\end{figure}
-
-\todo{Mention that the variance of the oscillation is measured
-after a given number of iterations}
+\end{figure}%
+%
+\footnotetext{(3,6) regular \ac{LDPC} code with $n = 204$, $k = 102$
+\cite[\text{204.33.484}]{mackay_enc}; $\gamma = 0.05, \omega = 0.05, K=100, \eta=1.5$
+}%
+%
 
 \noindent The y-axis depicts whether there is a bit error and the x-axis the
-variance in $\nabla h\left( \tilde{\boldsymbol{x}} \right)$ past the iteration
-$k=100$. While this is not exactly the magnitude of the oscillation, it is
+variance in $\nabla h\left( \tilde{\boldsymbol{x}} \right)$ after the
+100th iteration.
+While this is not exactly the magnitude of the oscillation, it is
 proportional and easier to compute.
-The datapoints are taken from a single decoding operation
-\todo{Generate same figure with multiple decodings}.
+The datapoints are taken from a single decoding operation.
 
 Using this observation as a rule to determine the $N\in\mathbb{N}$ most
 probably wrong bits, all variations of the estimate with those bits modified
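The reliability measure described in this hunk (variance of the components of $\nabla h$ past iteration 100 as a proxy for the oscillation amplitude) can be sketched as follows. The gradient history and bit errors are synthetic stand-ins, not recorded decoder data:

```python
import numpy as np

# Hedged sketch: per-component variance of grad h past iteration 100 as a
# proxy for oscillation amplitude, correlated against bit errors.
rng = np.random.default_rng(0)
K, n = 200, 204
bit_errors = rng.random(n) < 0.05            # which bits ended up wrong
# wrong bits get a larger oscillation amplitude in this synthetic example
amplitude = np.where(bit_errors, 1.0, 0.1)
grad_history = amplitude * np.sin(np.arange(K)[:, None] * np.pi / 2)
grad_history += 0.01 * rng.normal(size=(K, n))

variance = np.var(grad_history[100:], axis=0)   # variance past iteration 100
corr = np.corrcoef(variance, bit_errors.astype(float))[0, 1]
print(corr)                                     # close to 1: strongly correlated
```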
@@ -1286,14 +1302,14 @@ for $K$ iterations do
 end for
 $\textcolor{KITblue}{\text{Find }N\text{ most probably wrong bits}}$
 $\textcolor{KITblue}{\text{Generate variations } \boldsymbol{\tilde{c}}_l,\hspace{1mm}
-l\in [1:n]\text{ of } \boldsymbol{\hat{c}}\text{ with the }N\text{ bits modified}}$
+l\in \mathbb{N}\text{ of } \boldsymbol{\hat{c}}\text{ with the }N\text{ bits modified}}$
 $\textcolor{KITblue}{\text{Compute }d_H\left( \boldsymbol{ \tilde{c}}_l,
 \boldsymbol{\hat{c}} \right) \text{ for all valid codewords } \boldsymbol{\tilde{c}}_l}$
 $\textcolor{KITblue}{\text{Output }\boldsymbol{\tilde{c}}_l\text{ with lowest }
 d_H\left( \boldsymbol{ \tilde{c}}_l, \boldsymbol{\hat{c}} \right)}$
 \end{genericAlgorithm}
 
-\todo{Not hamming distance, correlation}
+%\todo{Not hamming distance, correlation}
 
 Figure \ref{fig:prox:improved_results} shows the gain that can be achieved
 when the number $N$ is chosen to be 12.
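The appended ML-in-the-List step from the pseudocode above can be sketched as follows, assuming the estimate, the variance ranking and a parity-check matrix `H` are available. This follows the pseudocode's Hamming-distance selection; the commented-out todo suggests correlation may be the intended metric instead:

```python
import itertools
import numpy as np

# Hedged sketch of the ML-in-the-List step: flip the N most probably wrong
# bits (ranked by the oscillation variance of grad h), keep the variations
# that satisfy all parity checks, and return the valid one closest in
# Hamming distance to the estimate.
def ml_in_the_list(c_hat, variance, H, N):
    suspects = np.argsort(variance)[-N:]          # N most probably wrong bits
    best, best_dist = None, np.inf
    for flips in itertools.product((0, 1), repeat=N):
        c = c_hat.copy()
        c[suspects] ^= np.array(flips, dtype=c.dtype)
        if np.any(H @ c % 2):                     # not a valid codeword
            continue
        dist = np.count_nonzero(c != c_hat)       # Hamming distance to estimate
        if dist < best_dist:
            best, best_dist = c, dist
    return best                                   # None if no variation is valid

# Toy usage with a single parity check c0 + c1 + c2 = 0 (mod 2):
H = np.array([[1, 1, 1]])
c_hat = np.array([1, 0, 0])                       # violates the check
variance = np.array([0.9, 0.1, 0.4])              # bit 0 oscillates the most
print(ml_in_the_list(c_hat, variance, H, N=2))
```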
@@ -1304,13 +1320,12 @@ with solid lines and the results for the improved version are shown with
 dashed lines.
 For the case of $\gamma = 0.05$, the number of frame errors produced for the
 datapoints at $\SI{6}{dB}$, $\SI{6.5}{dB}$ and $\SI{7}{dB}$ are
-70, 17 and 2, respectively. \todo{Redo simulation with higher number of iterations}
+70, 17 and 2, respectively.
 The gain seems to depend on the value of $\gamma$, as well as become more
 pronounced for higher \ac{SNR} values.
 This is to be expected, since with higher \ac{SNR} values the number of bit
 errors decreases, making the correction of those errors in the ML-in-the-List
 step more likely.
-
 In figure \ref{fig:prox:improved:comp} the decoding performance
 between proximal decoding and the improved algorithm is compared for a number
 of different codes.
@@ -1320,8 +1335,9 @@ generate the point for the improved algorithm for $\gamma=0.05$ at
 $\SI{5.5}{dB}$.
 Similar behaviour can be observed in all cases, with varying improvement over
 standard proximal decoding.
+In some cases, a gain of up to $\SI{1}{dB}$ and higher can be achieved.
 
-\begin{figure}[H]
+\begin{figure}[h]
 \centering
 
 \begin{tikzpicture}
@@ -1459,13 +1475,12 @@ average time needed to decode a single received frame is visualized for
 proximal decoding as well as for the improved algorithm.
 It should be noted that some variability in the data is to be expected,
 since the timing of the actual simulations depends on a multitude of other
-parameters such as the outside temperature (because of thermal throttling),
-the scheduling choices of the operating system as well as variations in the
-implementations themselves.
+parameters such as the scheduling choices of the operating system as well as
+variations in the implementations themselves.
 Nevertheless, the empirical data serves, at least in part, to validate the
 theoretical considerations.
 
-\begin{figure}[H]
+\begin{figure}[h]
 \centering
 
 \begin{tikzpicture}
@@ -1501,8 +1516,7 @@ theoretical considerations.
 In conclusion, the decoding performance of proximal decoding can be improved
 by appending an ML-in-the-List step when the algorithm does not produce a
 valid result.
-The gain can in some cases be as high as $\SI{1}{dB}$ \todo{Explicitly mention this value earlier}
-and is achievable with
+The gain can in some cases be as high as $\SI{1}{dB}$ and is achievable with
 negligible computational performance penalty.
 The improvement is mainly noticeable for higher \ac{SNR} values and depends on
 the code as well as the chosen parameters.