Reworked rest of proximal decoding and fixed most figures and captions

Andreas Tsouchlos 2023-04-11 19:42:31 +02:00
parent 5c135e085e
commit 0d4b13ccda


@@ -557,8 +557,8 @@ The plots have been generated by averaging the error over $\SI{500000}{}$
decodings.
As some decodings go on for more iterations than others, the number of values
which are averaged for each datapoint varies.
This explains the dip visible in all curves around the 20th iteration, since
after this point more and more correct decodings are completed,
leaving more and more faulty ones to be averaged.
Additionally, at this point the decline in the average error stagnates,
rendering an increase in $K$ counterproductive as it only raises the average
@@ -628,7 +628,7 @@ means to bring about numerical stability.
\subsection{Decoding Performance}
\begin{figure}[h]
\centering
\begin{tikzpicture}
@@ -749,13 +749,14 @@ non-convergence of the algorithm instead of convergence to the wrong codeword,
raises the question of why the decoding process fails to converge so often.
In figure \ref{fig:prox:convergence}, the iterative process is visualized.
In order to simultaneously consider all components of the vectors
involved, a BCH code with $n=7$ and $k=4$ is chosen.
Each plot shows one component of the current estimate at a given
iteration ($\boldsymbol{r}$ and $\boldsymbol{s}$ are treated as separate
estimates and their values are interleaved to obtain the result shown), as well
as the gradients of the negative log-likelihood and the code-constraint
polynomial, which influence the next estimate.
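To make the alternating structure described above concrete, the following is a minimal sketch of such a decoding loop. The gradient callables `grad_L` and `grad_h`, the role of $\gamma$ as the weight of the code-constraint term, the use of $\omega$ as a step size, and the interleaving of the $\boldsymbol{r}$/$\boldsymbol{s}$ sequences are illustrative assumptions, not the exact update rules of the thesis.

```python
import numpy as np

def proximal_decode_sketch(y, grad_L, grad_h, gamma=0.05, omega=0.05, K=200):
    """Hedged sketch of an alternating-gradient decoding loop.

    y            : received bipolar vector, also used as the initial estimate
    grad_L       : gradient of the negative log-likelihood term
    grad_h       : gradient of the code-constraint polynomial
    gamma, omega : assumed weight of the constraint term / step size
    K            : maximum number of iterations
    """
    r = np.asarray(y, dtype=float)
    trajectory = []                          # interleaved r/s values per iteration
    for _ in range(K):
        s = r - omega * grad_L(r, y)         # step on the likelihood term
        trajectory.append(s.copy())
        r = s - omega * gamma * grad_h(s)    # step on the code-constraint term
        trajectory.append(r.copy())
    return r, np.array(trajectory)
```

Plotting one component of `trajectory` against the iteration index reproduces the kind of per-component view shown in the figures.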
\begin{figure}[h]
\begin{minipage}[c]{0.25\textwidth}
\centering
@@ -955,10 +956,10 @@ polynomial, which influence the next estimate.
%
\noindent It is evident that in all cases, past a certain number of
iterations, the estimate starts to oscillate around a particular value.
Jointly, the two gradients stop approaching the value zero any further.
This leads to the two terms of the objective function, and in particular the
code-constraint polynomial, not being minimized.
As such, the constraints are not being satisfied and the estimate does not
converge towards a valid codeword.
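Operationally, "satisfying the constraints" can be checked through the syndrome of the parity-check matrix. The sketch below assumes the common BPSK convention that negative values map to bit 1; the convention used in the thesis may differ.

```python
import numpy as np

def is_valid_codeword(x_est, H):
    """Check whether the hard decision of a bipolar estimate satisfies all
    parity checks (zero syndrome).  H is the binary parity-check matrix;
    the 0 -> +1, 1 -> -1 mapping is an assumption."""
    bits = (np.asarray(x_est) < 0).astype(int)   # hard decision
    syndrome = H.dot(bits) % 2                   # one entry per check
    return not syndrome.any()
```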
@@ -969,17 +970,28 @@ This can be justified by looking at the gradients themselves.
In figure \ref{fig:prox:gradients} the gradients of the negative
log-likelihood and the code-constraint polynomial for a repetition code with
$n=2$ are shown.
The two valid codewords of the $n=2$ repetition code can be recognized in
figure \ref{fig:prox:gradients:h} as
$\boldsymbol{c}_1 = \begin{bmatrix} -1 & -1 \end{bmatrix}$ and
$\boldsymbol{c}_2 = \begin{bmatrix} 1 & 1 \end{bmatrix}$;
these are also the points producing the global minima of the code-constraint
polynomial.
The gradient of the negative log-likelihood points towards the received word,
as can be seen in figure \ref{fig:prox:gradients:L},
since, assuming \ac{AWGN} and no other information, that is the
estimate maximizing the likelihood.
It is obvious that walking along the gradients in an alternating fashion will
produce a net movement in a certain direction, as long as they
have a common component.
As soon as this common component is exhausted, they will start pulling the
estimate in opposing directions, leading to an oscillation as illustrated
in figure \ref{fig:prox:convergence}.
Consequently, this oscillation is an intrinsic property of the structure of
the proximal decoding algorithm, where the two parts of the objective function
are minimized in an alternating manner by use of their gradients.
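As a numerical illustration of the two competing pulls for the $n=2$ repetition code, the snippet below evaluates both gradients at a few points. The concrete polynomial used for the code constraint (a product-based penalty with minima at $\pm(1,1)$) and the noise variance are assumptions chosen for illustration, not the exact expressions of the thesis.

```python
import numpy as np

def grad_h(x):
    # Assumed stand-in for the code-constraint polynomial of the n=2
    # repetition code: h(x) = (x1*x2 - 1)^2 + (x1^2 - 1)^2 + (x2^2 - 1)^2,
    # minimal at the valid codewords (-1,-1) and (1,1).
    x1, x2 = x
    return np.array([
        2 * (x1 * x2 - 1) * x2 + 4 * (x1**2 - 1) * x1,
        2 * (x1 * x2 - 1) * x1 + 4 * (x2**2 - 1) * x2,
    ])

def grad_L(x, y, sigma2=1.0):
    # Negative log-likelihood gradient under AWGN; the descent direction
    # -grad_L points towards the received word y.
    return (np.asarray(x) - np.asarray(y)) / sigma2

y = np.array([0.8, -0.3])                      # example received word
for point in [(-1.5, -1.5), (0.0, 0.0), (1.5, 1.5)]:
    p = np.array(point)
    print(point, grad_L(p, y), grad_h(p))      # compare the two pulls
```

Following `-grad_L` alone would lead straight to the received word, while `-grad_h` pulls towards the nearest valid codeword; once these directions oppose each other, the alternating steps merely oscillate, as described above.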
%
\begin{figure}[h]
\centering
\begin{subfigure}[c]{0.5\textwidth}
@@ -1058,12 +1070,6 @@ are minimized in an alternating manner by use of their gradients.%
\label{fig:prox:gradients}
\end{figure}%
%
While the initial net movement is generally directed in the right direction
owing to the gradient of the negative log-likelihood, the final oscillation
may well take place in a segment of space not corresponding to a valid
@@ -1087,9 +1093,9 @@ not immediately clear which codeword is the most likely one.
Raising the value of $\gamma$ results in
$h \left( \tilde{\boldsymbol{x}} \right)$ dominating the landscape of the
objective function, thereby introducing these local minima into the objective
function.
When considering codes with larger $n$, the behaviour generally stays the
same, with some minor differences.
In figure \ref{fig:prox:convergence_large_n} the decoding process is
visualized for one component of a code with $n=204$, for a single decoding.
@@ -1103,9 +1109,10 @@ gradient of the negative log-likelihood is counteracted.
In conclusion, as a general rule, the proximal decoding algorithm reaches
an oscillatory state which it cannot escape as a consequence of its structure.
In this state the constraints may not be satisfied, leading to the algorithm
exhausting its maximum number of iterations without converging and returning
an invalid codeword.
\begin{figure}[h]
\centering
\begin{tikzpicture}
@@ -1133,11 +1140,15 @@ returning an invalid codeword.
\end{axis}
\end{tikzpicture}
\caption{Visualization of a single decoding operation\protect\footnotemark{}
for a code with $n=204$}
\label{fig:prox:convergence_large_n}
\end{figure}%
%
\footnotetext{$(3,6)$ regular \ac{LDPC} code with $n = 204$, $k = 102$
\cite[\text{204.33.484}]{mackay_enc}; $\gamma=0.05, \omega = 0.05, K=200, \eta=1.5$
}%
%
\subsection{Computational Performance}
@@ -1161,16 +1172,16 @@ codes in an \ac{AWGN} channel is $\mathcal{O}\left( n \right)$, which is
practical since it is the same as that of \ac{BP}.
This theoretical analysis is also corroborated by the practical results shown
in figure \ref{fig:prox:time_comp}.
Some deviations from linear behaviour are unavoidable because not all codes
considered are actually \ac{LDPC} codes, or \ac{LDPC} codes constructed
according to the same scheme.
Nonetheless, a generally linear relationship between the average time needed to
decode a received frame and the length $n$ of the frame can be observed.
These results were generated on an Intel Core i7-7700HQ 4-core CPU running at
$\SI{2.80}{GHz}$ and utilizing all cores.
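For reference, per-frame timing figures of this kind can be collected with a simple wrapper like the one below; the `decode` callable and the choice of frame sets are placeholders, not the actual simulation harness used for figure \ref{fig:prox:time_comp}.

```python
import time

def average_decoding_time(decode, frames, repeats=1000):
    """Average wall-clock time per decoded frame.

    decode  : callable mapping a received vector to a codeword estimate
              (placeholder for the actual decoder implementation)
    frames  : iterable of received vectors of length n
    """
    start = time.perf_counter()
    count = 0
    for _ in range(repeats):
        for y in frames:
            decode(y)
            count += 1
    return (time.perf_counter() - start) / count

# Plotting the returned average against n for several codes should reveal
# the roughly linear trend discussed above.
```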
\begin{figure}[h]
\centering
\begin{tikzpicture}
@@ -1186,7 +1197,7 @@ decode a received frame and the length $n$ of the frame can be observed.
\end{axis}
\end{tikzpicture}
\caption{Time requirements of the proximal decoding algorithm implementation%
\protect\footnotemark{}}
\label{fig:prox:time_comp}
\end{figure}%
@@ -1224,11 +1235,12 @@ $\nabla h\left( \tilde{\boldsymbol{x}} \right)$ may be related in its
magnitude to the confidence that a given bit is correct.
And indeed, the magnitude of the oscillation of
$\nabla h\left( \tilde{\boldsymbol{x}} \right)$ (introduced previously in
section \ref{subsec:prox:conv_properties} and shown in figure
\ref{fig:prox:convergence_large_n}) and the probability of having a bit
error are strongly correlated, a relationship depicted in figure
\ref{fig:prox:correlation}.
\begin{figure}[h]
\centering
\begin{tikzpicture}
@@ -1249,19 +1261,23 @@ error are strongly correlated, a relationship depicted in figure
\end{axis}
\end{tikzpicture}
\caption{Correlation between the occurrence of a bit error and the
amplitude of oscillation of the gradient of the code-constraint polynomial%
\protect\footnotemark{}}
\label{fig:prox:correlation}
\end{figure}%
%
\footnotetext{$(3,6)$ regular \ac{LDPC} code with $n = 204$, $k = 102$
\cite[\text{204.33.484}]{mackay_enc}; $\gamma = 0.05, \omega = 0.05, K=100, \eta=1.5$
}%
%
\noindent The y-axis depicts whether there is a bit error and the x-axis the
variance in $\nabla h\left( \tilde{\boldsymbol{x}} \right)$ after the
100th iteration.
While this is not exactly the magnitude of the oscillation, it is
proportional to it and easier to compute.
The datapoints are taken from a single decoding operation.
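The selection rule used in the next step can be sketched as follows; measuring the variance of the gradient past iteration 100 is taken from the text, while the array layout and function names are illustrative assumptions.

```python
import numpy as np

def most_suspect_bits(grad_h_history, N, k_start=100):
    """Rank bits by the oscillation of the code-constraint gradient.

    grad_h_history : array of shape (iterations, n) holding the gradient of
                     the code-constraint polynomial at every iteration
                     (layout is an assumption for illustration)
    N              : number of bit positions to return
    k_start        : iteration after which the variance is measured
    """
    late = np.asarray(grad_h_history)[k_start:]      # discard the transient
    per_bit_variance = late.var(axis=0)              # oscillation proxy per bit
    return np.argsort(per_bit_variance)[-N:]         # N largest variances
```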
Using this observation as a rule to determine the $N\in\mathbb{N}$ most
probably wrong bits, all variations of the estimate with those bits modified
@@ -1286,14 +1302,14 @@ for $K$ iterations do
end for
$\textcolor{KITblue}{\text{Find }N\text{ most probably wrong bits}}$
$\textcolor{KITblue}{\text{Generate variations } \boldsymbol{\tilde{c}}_l,\hspace{1mm}
l\in \mathbb{N}\text{ of } \boldsymbol{\hat{c}}\text{ with the }N\text{ bits modified}}$
$\textcolor{KITblue}{\text{Compute }d_H\left( \boldsymbol{ \tilde{c}}_l,
\boldsymbol{\hat{c}} \right) \text{ for all valid codewords } \boldsymbol{\tilde{c}}_l}$
$\textcolor{KITblue}{\text{Output }\boldsymbol{\tilde{c}}_l\text{ with lowest }
d_H\left( \boldsymbol{ \tilde{c}}_l, \boldsymbol{\hat{c}} \right)}$
\end{genericAlgorithm}
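A possible realization of the highlighted ML-in-the-List step is sketched below: it enumerates all $2^N$ flips of the suspect positions, keeps only candidates that satisfy the parity checks, and selects among them. The listing above ranks candidates by Hamming distance to the hard estimate $\boldsymbol{\hat{c}}$; a correlation-based ranking (hinted at in the commented-out note below) is included as an alternative. Function and variable names are illustrative, not the thesis implementation.

```python
import itertools
import numpy as np

def ml_in_the_list(c_hat, suspect_bits, H, y=None):
    """Try all 2^N modifications of the suspect bits of the hard estimate
    c_hat (bits in {0,1}) and return the best valid codeword, if any.

    Ranking: smallest Hamming distance to c_hat, or, if the received vector
    y is given, largest correlation with y.  H is the binary parity-check
    matrix; the 0 -> +1, 1 -> -1 bipolar mapping is an assumption.
    """
    candidates = []
    for flips in itertools.product([0, 1], repeat=len(suspect_bits)):
        c = c_hat.copy()
        c[list(suspect_bits)] ^= np.array(flips)       # modify the N bits
        if not (H.dot(c) % 2).any():                   # keep valid codewords only
            candidates.append(c)
    if not candidates:
        return c_hat                                   # no valid variation found
    if y is not None:
        return max(candidates, key=lambda c: np.dot(1 - 2 * c, y))
    return min(candidates, key=lambda c: np.count_nonzero(c != c_hat))
```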
%\todo{Not hamming distance, correlation}
Figure \ref{fig:prox:improved_results} shows the gain that can be achieved
when the number $N$ is chosen to be 12.
@@ -1304,13 +1320,12 @@ with solid lines and the results for the improved version are shown with
dashed lines.
For the case of $\gamma = 0.05$, the number of frame errors produced for the
datapoints at $\SI{6}{dB}$, $\SI{6.5}{dB}$ and $\SI{7}{dB}$ are
70, 17 and 2, respectively.
The gain seems to depend on the value of $\gamma$ and to become more
pronounced for higher \ac{SNR} values.
This is to be expected, since with higher \ac{SNR} values the number of bit
errors decreases, making the correction of those errors in the ML-in-the-List
step more likely.
In figure \ref{fig:prox:improved:comp} the decoding performance of
proximal decoding and the improved algorithm is compared for a number
of different codes.
@@ -1320,8 +1335,9 @@ generate the point for the improved algorithm for $\gamma=0.05$ at
$\SI{5.5}{dB}$.
Similar behaviour can be observed in all cases, with varying improvement over
standard proximal decoding.
In some cases, a gain of $\SI{1}{dB}$ or more can be achieved.
\begin{figure}[h]
\centering
\begin{tikzpicture}
@@ -1459,13 +1475,12 @@ average time needed to decode a single received frame is visualized for
proximal decoding as well as for the improved algorithm.
It should be noted that some variability in the data is to be expected,
since the timing of the actual simulations depends on a multitude of other
factors such as the scheduling choices of the operating system as well as
variations in the implementations themselves.
Nevertheless, the empirical data serves, at least in part, to validate the
theoretical considerations.
\begin{figure}[h]
\centering
\begin{tikzpicture}
@@ -1501,8 +1516,7 @@ theoretical considerations.
In conclusion, the decoding performance of proximal decoding can be improved
by appending an ML-in-the-List step when the algorithm does not produce a
valid result.
The gain can in some cases be as high as $\SI{1}{dB}$ and is achievable with
negligible computational performance penalty.
The improvement is mainly noticeable for higher \ac{SNR} values and depends on
the code as well as the chosen parameters.