Fixed tilde over x; wrote convergence properties subsection; minor changes
parent 9c9aa11669, commit 2312f40d94
@ -270,28 +270,29 @@ The gradient of the code-constraint polynomial \cite[Sec. 2.3]{proximal_paper}
is given by%
%
\begin{align*}
	\nabla h\left( \tilde{\boldsymbol{x}} \right) &= \begin{bmatrix}
		\frac{\partial}{\partial \tilde{x}_1}h\left( \tilde{\boldsymbol{x}} \right) &
		\ldots &
		\frac{\partial}{\partial \tilde{x}_n}h\left( \tilde{\boldsymbol{x}} \right)
	\end{bmatrix}^\text{T}, \\[1em]
	\frac{\partial}{\partial \tilde{x}_k}h\left( \tilde{\boldsymbol{x}} \right)
	&= 4\left( \tilde{x}_k^2 - 1 \right) \tilde{x}_k
	+ \frac{2}{\tilde{x}_k} \sum_{j\in N_v\left( k \right) }\left(
	\left( \prod_{i \in N_c\left( j \right)} \tilde{x}_i \right)^2
	- \prod_{i\in N_c\left( j \right) } \tilde{x}_i \right)
.\end{align*}
%
Since the products
$\prod_{i\in N_c\left( j \right) } \tilde{x}_i,\hspace{2mm}j\in \mathcal{J}$
are the same for all components $\tilde{x}_k$ of $\tilde{\boldsymbol{x}}$, they can be
precomputed.
Defining%
%
\begin{align*}
	\boldsymbol{p} := \begin{bmatrix}
		\prod_{i\in N_c\left( 1 \right) } \tilde{x}_i \\
		\vdots \\
		\prod_{i\in N_c\left( m \right) } \tilde{x}_i \\
	\end{bmatrix}
	\hspace{5mm}
	\text{and}
@ -302,9 +303,9 @@ Defining%
the gradient can be written as%
%
\begin{align*}
	\nabla h\left( \tilde{\boldsymbol{x}} \right) =
	4\left( \tilde{\boldsymbol{x}}^{\circ 3} - \tilde{\boldsymbol{x}} \right)
	+ 2\tilde{\boldsymbol{x}}^{\circ -1} \circ \boldsymbol{H}^\text{T}
	\boldsymbol{v}
,\end{align*}
%
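As a quick sanity check, the componentwise partial derivatives and the vectorized expression can be compared numerically. This is only a sketch: the small parity-check matrix is a made-up example, and the choice $\boldsymbol{v} = \boldsymbol{p}^{\circ 2} - \boldsymbol{p}$ is read off from the componentwise formula, since the definition of $\boldsymbol{v}$ is not repeated here.

```python
import numpy as np

# Hypothetical small parity-check matrix (two checks, three variable nodes).
H = np.array([[1, 1, 0],
              [0, 1, 1]])

def grad_h_componentwise(x):
    """Componentwise gradient of the code-constraint polynomial h."""
    g = np.empty(len(x))
    for k in range(len(x)):
        acc = 0.0
        for j in range(H.shape[0]):
            if H[j, k]:                      # j in N_v(k)
                p_j = np.prod(x[H[j] == 1])  # product over i in N_c(j)
                acc += p_j**2 - p_j
        g[k] = 4 * (x[k]**2 - 1) * x[k] + (2 / x[k]) * acc
    return g

def grad_h_vectorized(x):
    """Vectorized form 4(x^{o3} - x) + 2 x^{o-1} o H^T v with v := p^{o2} - p
    (v inferred from the componentwise expression)."""
    # p_j = product of x_i over i in N_c(j); absent factors are replaced by 1
    p = np.prod(np.where(H == 1, x, 1.0), axis=1)
    v = p**2 - p
    return 4 * (x**3 - x) + 2 / x * (H.T @ v)

x = np.array([0.9, -0.8, 1.1])
assert np.allclose(grad_h_componentwise(x), grad_h_vectorized(x))
# At a valid codeword (all-ones in bipolar representation) the constraint
# gradient vanishes:
assert np.allclose(grad_h_vectorized(np.ones(3)), 0.0)
```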
@ -331,7 +332,7 @@ The impact of the parameters $\gamma$, as well as $\omega$, $K$ and $\eta$ is
examined.
The decoding performance is assessed on the basis of the \ac{BER} and the
\ac{FER}, as well as the \textit{decoding failure rate} -- the rate at which
the algorithm produces invalid results.
The convergence properties are reviewed and related to the decoding
performance.
Finally, the computational performance is examined on a theoretical basis
@ -497,6 +498,17 @@ undertaking an extensive search for an exact optimum.
Rather, a preliminary examination providing a rough window for $\gamma$ may
be sufficient.

TODO: $\omega, K$

Changing the parameter $\eta$ does not appear to have a significant effect on
the decoding performance when keeping its value within a reasonable window
(``slightly larger than one'', as stated in \cite[Sec. 3.2]{proximal_paper}),
which seems plausible, considering its only function is ensuring numerical
stability.

\begin{itemize}
	\item Conclusion: Number of iterations independent of \ac{SNR}
\end{itemize}

\begin{figure}[H]
	\centering

@ -716,19 +728,13 @@ be sufficient.
				\addlegendentry{$\gamma = 0.15$};
			\end{axis}
		\end{tikzpicture}
	\end{subfigure}

	\caption{BER for $\omega = 0.05, K=100$ (different codes)}
	\label{fig:prox:results_3d_multiple}
\end{figure}

\subsection{Decoding Performance}

\begin{figure}[H]
@ -815,67 +821,414 @@ is ensuring numerical stability.

Until now, only the \ac{BER} has been considered to assess the decoding
performance.
The \ac{FER}, however, shows considerably worse behaviour, as can be seen in
figure \ref{fig:prox:ber_fer_dfr}.
Besides the \ac{BER} and \ac{FER} curves, the figure also shows the
\textit{decoding failure rate}.
This is the rate at which the iterative process produces invalid codewords,
i.e., the stopping criterion (line 6 of algorithm \ref{TODO}) is never
satisfied and the maximum number of iterations $K$ is reached without
converging to a valid codeword.
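In simulation, the three quantities can be tallied per frame as follows. This is a sketch: \texttt{decode} is a hypothetical stand-in for the decoder under test, assumed to return the final hard-decision estimate together with a flag indicating whether the stopping criterion was satisfied within the iteration budget $K$.

```python
import numpy as np

def tally_rates(decode, received_frames, codeword):
    """Count bit errors, frame errors, and decoding failures over many frames.

    A frame error means the estimate differs from the transmitted codeword;
    a decoding failure means the stopping criterion was never satisfied.
    """
    n, num = len(codeword), len(received_frames)
    bit_err = frame_err = failures = 0
    for y in received_frames:
        x_hat, converged = decode(y)
        bit_err += int(np.sum(x_hat != codeword))
        frame_err += int(not np.array_equal(x_hat, codeword))
        failures += int(not converged)
    return bit_err / (num * n), frame_err / num, failures / num

# Dummy decoder for illustration: "fails" whenever the frame contains a one.
cw = np.zeros(4, dtype=int)
frames = [np.zeros(4, dtype=int), np.array([1, 0, 0, 0])]
ber, fer, dfr = tally_rates(lambda y: (y, not y.any()), frames, cw)
print(ber, fer, dfr)  # -> 0.125 0.5 0.5
```

Note that a decoding failure always implies a frame error, but not vice versa, which is what makes the comparison of \ac{FER} and decoding failure rate informative.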
Three lines are plotted in each case, corresponding to different values of
the parameter $\gamma$.
The values chosen are the same as in figure \ref{fig:prox:results}, as they
seem to adequately describe the behaviour across a wide range of values
(see figure \ref{fig:prox:results_3d}).

It is apparent that the \ac{FER} and the decoding failure rate are extremely
similar, especially for higher \acp{SNR}.
This leads to the hypothesis that, at least for higher \acp{SNR}, frame errors
arise mainly due to the non-convergence of the algorithm rather than
convergence to the wrong codeword.
This course of thought will be picked up in section
\ref{sec:prox:Improved Implementation} to try to improve the algorithm.

In summary, the \ac{BER} and \ac{FER} indicate dissimilar decoding
performance.
The decoding failure rate closely resembles the \ac{FER}, suggesting that
the frame errors may largely be attributed to decoding failures.

\todo{Maybe reference to the structure of the algorithm (1 part likelihood,
1 part constraints)}

\subsection{Convergence Properties}

The previous observation that frame errors arise mainly from the
non-convergence of the algorithm, rather than from convergence to the wrong
codeword, raises the question of why the decoding process fails to converge
so often.
In figure \ref{fig:prox:convergence}, the iterative process is visualized
iteration by iteration.
In order to be able to simultaneously consider all components of the vectors
involved, a BCH code with $n=7$ and $k=4$ is chosen.
Each chart shows one component of the current estimate during a given
iteration (alternating between $\boldsymbol{r}$ and $\boldsymbol{s}$), as well
as the gradients of the negative log-likelihood and the code-constraint
polynomial, which influence the next estimate.

\begin{figure}[H]
|
||||||
|
\begin{minipage}[c]{0.25\textwidth}
|
||||||
|
\centering
|
||||||
|
|
||||||
|
\begin{tikzpicture}[scale = 0.35]
|
||||||
|
\begin{axis}[
|
||||||
|
grid=both,
|
||||||
|
xlabel={Iterations},
|
||||||
|
width=8cm,
|
||||||
|
height=3cm,
|
||||||
|
scale only axis,
|
||||||
|
xtick={0, 50, ..., 200},
|
||||||
|
xticklabels={0, 25, ..., 100},
|
||||||
|
]
|
||||||
|
\addplot [NavyBlue, mark=none, line width=1]
|
||||||
|
table [col sep=comma, x=k, y=comb_r_s_1]
|
||||||
|
{res/proximal/comp_bch_7_4_combined.csv};
|
||||||
|
\addplot [ForestGreen, mark=none, line width=1]
|
||||||
|
table [col sep=comma, x=k, y=grad_L_1]
|
||||||
|
{res/proximal/comp_bch_7_4_combined.csv};
|
||||||
|
\addplot [RedOrange, mark=none, line width=1]
|
||||||
|
table [col sep=comma, x=k, y=grad_h_1]
|
||||||
|
{res/proximal/comp_bch_7_4_combined.csv};
|
||||||
|
\addlegendentry{est}
|
||||||
|
\addlegendentry{$\left(\nabla L \right)_2$}
|
||||||
|
\addlegendentry{$\left(\nabla h \right)_2 $}
|
||||||
|
\end{axis}
|
||||||
|
\end{tikzpicture}\\
|
||||||
|
\begin{tikzpicture}[scale = 0.35]
|
||||||
|
\begin{axis}[
|
||||||
|
grid=both,
|
||||||
|
xlabel={Iterations},
|
||||||
|
width=8cm,
|
||||||
|
height=3cm,
|
||||||
|
scale only axis,
|
||||||
|
xtick={0, 50, ..., 200},
|
||||||
|
xticklabels={0, 25, ..., 100},
|
||||||
|
]
|
||||||
|
\addplot [NavyBlue, mark=none, line width=1]
|
||||||
|
table [col sep=comma, x=k, y=comb_r_s_2]
|
||||||
|
{res/proximal/comp_bch_7_4_combined.csv};
|
||||||
|
\addplot [ForestGreen, mark=none, line width=1]
|
||||||
|
table [col sep=comma, x=k, y=grad_L_2]
|
||||||
|
{res/proximal/comp_bch_7_4_combined.csv};
|
||||||
|
\addplot [RedOrange, mark=none, line width=1]
|
||||||
|
table [col sep=comma, x=k, y=grad_h_2]
|
||||||
|
{res/proximal/comp_bch_7_4_combined.csv};
|
||||||
|
\addlegendentry{est}
|
||||||
|
\addlegendentry{$\left(\nabla L \right)_3$}
|
||||||
|
\addlegendentry{$\left(\nabla h \right)_3 $}
|
||||||
|
\end{axis}
|
||||||
|
\end{tikzpicture}\\
|
||||||
|
\begin{tikzpicture}[scale = 0.35]
|
||||||
|
\begin{axis}[
|
||||||
|
grid=both,
|
||||||
|
xlabel={Iterations},
|
||||||
|
width=8cm,
|
||||||
|
height=3cm,
|
||||||
|
scale only axis,
|
||||||
|
xtick={0, 50, ..., 200},
|
||||||
|
xticklabels={0, 25, ..., 100},
|
||||||
|
]
|
||||||
|
\addplot [NavyBlue, mark=none, line width=1]
|
||||||
|
table [col sep=comma, x=k, y=comb_r_s_3]
|
||||||
|
{res/proximal/comp_bch_7_4_combined.csv};
|
||||||
|
\addplot [ForestGreen, mark=none, line width=1]
|
||||||
|
table [col sep=comma, x=k, y=grad_L_3]
|
||||||
|
{res/proximal/comp_bch_7_4_combined.csv};
|
||||||
|
\addplot [RedOrange, mark=none, line width=1]
|
||||||
|
table [col sep=comma, x=k, y=grad_h_3]
|
||||||
|
{res/proximal/comp_bch_7_4_combined.csv};
|
||||||
|
\addlegendentry{est}
|
||||||
|
\addlegendentry{$\left(\nabla L \right)_4$}
|
||||||
|
\addlegendentry{$\left(\nabla h \right)_4 $}
|
||||||
|
\end{axis}
|
||||||
|
\end{tikzpicture}
|
||||||
|
\end{minipage}%
|
||||||
|
\begin{minipage}[c]{0.5\textwidth}
|
||||||
|
\vspace*{-1cm}
|
||||||
|
\centering
|
||||||
|
|
||||||
|
\begin{tikzpicture}[scale = 0.85, spy using outlines={circle, magnification=6,
|
||||||
|
connect spies}]
|
||||||
|
\begin{axis}[
|
||||||
|
grid=both,
|
||||||
|
xlabel={Iterations},
|
||||||
|
width=8cm,
|
||||||
|
height=3cm,
|
||||||
|
scale only axis,
|
||||||
|
xtick={0, 50, ..., 200},
|
||||||
|
xticklabels={0, 25, ..., 100},
|
||||||
|
]
|
||||||
|
\addplot [NavyBlue, mark=none, line width=1]
|
||||||
|
table [col sep=comma, x=k, y=comb_r_s_0]
|
||||||
|
{res/proximal/comp_bch_7_4_combined.csv};
|
||||||
|
\addplot [ForestGreen, mark=none, line width=1]
|
||||||
|
table [col sep=comma, x=k, y=grad_L_0]
|
||||||
|
{res/proximal/comp_bch_7_4_combined.csv};
|
||||||
|
\addplot [RedOrange, mark=none, line width=1]
|
||||||
|
table [col sep=comma, x=k, y=grad_h_0]
|
||||||
|
{res/proximal/comp_bch_7_4_combined.csv};
|
||||||
|
\addlegendentry{est}
|
||||||
|
\addlegendentry{$\left(\nabla L \right)_1$}
|
||||||
|
\addlegendentry{$\left(\nabla h \right)_1 $}
|
||||||
|
|
||||||
|
\coordinate (spypoint) at (axis cs:100,0.53);
|
||||||
|
\coordinate (magnifyglass) at (axis cs:175,2);
|
||||||
|
\end{axis}
|
||||||
|
\spy [black, size=2cm] on (spypoint)
|
||||||
|
in node[fill=white] at (magnifyglass);
|
||||||
|
\end{tikzpicture}
|
||||||
|
\end{minipage}%
|
||||||
|
\begin{minipage}[c]{0.25\textwidth}
|
||||||
|
\centering
|
||||||
|
|
||||||
|
\begin{tikzpicture}[scale = 0.35]
|
||||||
|
\begin{axis}[
|
||||||
|
grid=both,
|
||||||
|
xlabel={Iterations},
|
||||||
|
width=8cm,
|
||||||
|
height=3cm,
|
||||||
|
scale only axis,
|
||||||
|
xtick={0, 50, ..., 200},
|
||||||
|
xticklabels={0, 25, ..., 100},
|
||||||
|
]
|
||||||
|
\addplot [NavyBlue, mark=none, line width=1]
|
||||||
|
table [col sep=comma, x=k, y=comb_r_s_4]
|
||||||
|
{res/proximal/comp_bch_7_4_combined.csv};
|
||||||
|
\addplot [ForestGreen, mark=none, line width=1]
|
||||||
|
table [col sep=comma, x=k, y=grad_L_4]
|
||||||
|
{res/proximal/comp_bch_7_4_combined.csv};
|
||||||
|
\addplot [RedOrange, mark=none, line width=1]
|
||||||
|
table [col sep=comma, x=k, y=grad_h_4]
|
||||||
|
{res/proximal/comp_bch_7_4_combined.csv};
|
||||||
|
\addlegendentry{est}
|
||||||
|
\addlegendentry{$\left(\nabla L \right)_5$}
|
||||||
|
\addlegendentry{$\left(\nabla h \right)_5 $}
|
||||||
|
\end{axis}
|
||||||
|
\end{tikzpicture}\\
|
||||||
|
\begin{tikzpicture}[scale = 0.35]
|
||||||
|
\begin{axis}[
|
||||||
|
grid=both,
|
||||||
|
xlabel={Iterations},
|
||||||
|
width=8cm,
|
||||||
|
height=3cm,
|
||||||
|
scale only axis,
|
||||||
|
xtick={0, 50, ..., 200},
|
||||||
|
xticklabels={0, 25, ..., 100},
|
||||||
|
]
|
||||||
|
\addplot [NavyBlue, mark=none, line width=1]
|
||||||
|
table [col sep=comma, x=k, y=comb_r_s_5]
|
||||||
|
{res/proximal/comp_bch_7_4_combined.csv};
|
||||||
|
\addplot [ForestGreen, mark=none, line width=1]
|
||||||
|
table [col sep=comma, x=k, y=grad_L_5]
|
||||||
|
{res/proximal/comp_bch_7_4_combined.csv};
|
||||||
|
\addplot [RedOrange, mark=none, line width=1]
|
||||||
|
table [col sep=comma, x=k, y=grad_h_5]
|
||||||
|
{res/proximal/comp_bch_7_4_combined.csv};
|
||||||
|
\addlegendentry{est}
|
||||||
|
\addlegendentry{$\left(\nabla L \right)_6$}
|
||||||
|
\addlegendentry{$\left(\nabla h \right)_6 $}
|
||||||
|
\end{axis}
|
||||||
|
\end{tikzpicture}\\
|
||||||
|
\begin{tikzpicture}[scale = 0.35]
|
||||||
|
\begin{axis}[
|
||||||
|
grid=both,
|
||||||
|
xlabel={Iterations},
|
||||||
|
width=8cm,
|
||||||
|
height=3cm,
|
||||||
|
scale only axis,
|
||||||
|
xtick={0, 50, ..., 200},
|
||||||
|
xticklabels={0, 25, ..., 100},
|
||||||
|
]
|
||||||
|
\addplot [NavyBlue, mark=none, line width=1]
|
||||||
|
table [col sep=comma, x=k, y=comb_r_s_6]
|
||||||
|
{res/proximal/comp_bch_7_4_combined.csv};
|
||||||
|
\addplot [ForestGreen, mark=none, line width=1]
|
||||||
|
table [col sep=comma, x=k, y=grad_L_6]
|
||||||
|
{res/proximal/comp_bch_7_4_combined.csv};
|
||||||
|
\addplot [RedOrange, mark=none, line width=1]
|
||||||
|
table [col sep=comma, x=k, y=grad_h_6]
|
||||||
|
{res/proximal/comp_bch_7_4_combined.csv};
|
||||||
|
\addlegendentry{est}
|
||||||
|
\addlegendentry{$\left(\nabla L \right)_7$}
|
||||||
|
\addlegendentry{$\left(\nabla h \right)_7 $}
|
||||||
|
\end{axis}
|
||||||
|
\end{tikzpicture}
|
||||||
|
\end{minipage}
|
||||||
|
|
||||||
|
\caption{Internal variables of proximal decoder
|
||||||
|
as a function of the number of iterations ($n=7$)\protect\footnotemark{}}
|
||||||
|
\label{fig:prox:convergence}
|
||||||
|
\end{figure}%
|
||||||
|
%
|
||||||
|
\footnotetext{A single decoding is shown, using the BCH$\left( 7,4 \right) $ code;
|
||||||
|
$\gamma = 0.05, \omega = 0.05, E_b / N_0 = \SI{5}{dB}$
|
||||||
|
}%
|
||||||
|
%
|
||||||
\noindent It is evident that in all cases, past a certain number of
iterations, the estimate starts to oscillate around a particular value.
From this point on, the two gradients stop approaching zero.
In particular, the code-constraint polynomial is no longer being minimized.
As such, the constraints are not satisfied and the estimate does not
converge towards a valid codeword.

While figure \ref{fig:prox:convergence} shows only one instance of a decoding
task, it is indicative of the general behaviour of the algorithm.
This can be justified by looking at the gradients themselves.
In figure \ref{fig:prox:gradients}, the gradients of the negative
log-likelihood and the code-constraint polynomial are shown for a repetition
code with $n=2$.
Walking along the two gradients in an alternating fashion produces a net
movement in a certain direction as long as the gradients have a common
component.
As soon as this common component is exhausted, they start pulling the
estimate in opposing directions, leading to an oscillation as illustrated
in figure \ref{fig:prox:convergence}.
Consequently, this oscillation is an intrinsic property of the structure of
the proximal decoding algorithm, in which the two parts of the objective
function are minimized in an alternating manner using their gradients.
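This alternating pull can be caricatured in one dimension. The following is a toy sketch, not the actual decoder: one quadratic term pulls the estimate toward the received value, one constraint term pulls it toward the nearest modulation point $\pm 1$, and the step sizes are arbitrary.

```python
import numpy as np

# Toy 1-D caricature of the alternating minimization (NOT the actual decoder):
# f_L(x) = (x - y)^2 pulls the estimate toward the received value y, while
# f_h(x) = (x^2 - 1)^2 pulls it toward the nearest modulation point +-1.
y = 0.4                   # received value (arbitrary choice for illustration)
gamma, omega = 0.3, 0.3   # step sizes (arbitrary)

x, trace = 0.0, []
for _ in range(200):
    x -= gamma * 2 * (x - y)           # gradient step on f_L
    trace.append(x)
    x -= omega * 4 * (x**2 - 1) * x    # gradient step on f_h
    trace.append(x)

tail = np.array(trace[-40:])
# Once the common component of the two pulls is exhausted, the estimate keeps
# bouncing between them instead of settling on y or on +-1:
assert tail.max() - tail.min() > 0.2
```

The trace shows an initial net movement followed by a persistent bounded oscillation, mirroring the behaviour seen in the convergence plots.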
\begin{figure}[H]
	\centering

	\begin{subfigure}[c]{0.5\textwidth}
		\centering

		\begin{tikzpicture}
			\begin{axis}[xmin = -1.25, xmax=1.25,
				ymin = -1.25, ymax=1.25,
				xlabel={$x_1$}, ylabel={$x_2$},
				width=\textwidth,
				height=0.75\textwidth,
				grid=major, grid style={dotted},
				view={0}{90}]

				\addplot3[point meta=\thisrow{grad_norm},
					point meta min=1,
					point meta max=3,
					quiver={u=\thisrow{grad_0},
						v=\thisrow{grad_1},
						scale arrows=.05,
						every arrow/.append style={%
							line width=.3+\pgfplotspointmetatransformed/1000,
							-{Latex[length=0pt 5,width=0pt 3]}
						},
					},
					quiver/colored = {mapped color},
					colormap/rocket,
					-stealth,
					]
					table[col sep=comma] {res/proximal/2d_grad_L.csv};
			\end{axis}
		\end{tikzpicture}

		\caption{$\nabla L \left(\boldsymbol{y} \mid \tilde{\boldsymbol{x}} \right) $
		for a repetition code with $n=2$}
	\end{subfigure}%
	\hfill%
	\begin{subfigure}[c]{0.5\textwidth}
		\centering

		\begin{tikzpicture}
			\begin{axis}[xmin = -1.25, xmax=1.25,
				ymin = -1.25, ymax=1.25,
				xlabel={$x_1$}, ylabel={$x_2$},
				grid=major, grid style={dotted},
				width=\textwidth,
				height=0.75\textwidth,
				view={0}{90}]
				\addplot3[point meta=\thisrow{grad_norm},
					point meta min=1,
					point meta max=4,
					quiver={u=\thisrow{grad_0},
						v=\thisrow{grad_1},
						scale arrows=.03,
						every arrow/.append style={%
							line width=.3+\pgfplotspointmetatransformed/1000,
							-{Latex[length=0pt 5,width=0pt 3]}
						},
					},
					quiver/colored = {mapped color},
					colormap/rocket,
					-stealth,
					]
					table[col sep=comma] {res/proximal/2d_grad_h.csv};
			\end{axis}
		\end{tikzpicture}

		\caption{$\nabla h \left( \tilde{\boldsymbol{x}} \right) $
		for a repetition code with $n=2$}
	\end{subfigure}%

	\caption{Gradients of the negative log-likelihood and the code-constraint
	polynomial for a repetition code with $n=2$}
	\label{fig:prox:gradients}
\end{figure}


While the initial net movement generally points in the right direction,
owing to the gradient of the negative log-likelihood, the final oscillation
may well take place in a region of space not corresponding to a valid
codeword, leading to the aforementioned non-convergence of the algorithm.
This also partly explains the difference in decoding performance between
the \ac{BER} and the \ac{FER}, as such behaviour lowers the number of bit
errors while still yielding an invalid codeword.

When considering codes with larger $n$, the behaviour generally stays the
same, with some minor differences.
In figure \ref{fig:prox:convergence_large_n}, the decoding process is
visualized for one component of a code with $n=204$, for a single decoding.
The two gradients still begin to oppose each other and the estimate still
starts to oscillate, just as illustrated in figure
\ref{fig:prox:convergence} for a code with $n=7$.
However, in this case, the gradient of the code-constraint polynomial itself
starts to oscillate, its average value being such that the effect of the
gradient of the negative log-likelihood is counteracted.

In conclusion, as a general rule, the proximal decoding algorithm reaches
an oscillatory state which it cannot escape, as a consequence of its
structure.
In this state, the constraints may not be satisfied, leading to the
algorithm returning an invalid codeword.
\begin{figure}[H]
	\centering

	\begin{tikzpicture}
		\begin{axis}[
			grid=both,
			xlabel={Iterations},
			width=0.6\textwidth,
			height=0.45\textwidth,
			scale only axis,
			xtick={0, 100, ..., 400},
			xticklabels={0, 50, ..., 200},
			]
			\addplot [NavyBlue, mark=none, line width=1]
				table [col sep=comma, x=k, y=comb_r_s_0]
				{res/proximal/extreme_components_20433484_combined.csv};
			\addplot [ForestGreen, mark=none, line width=1]
				table [col sep=comma, x=k, y=grad_L_0]
				{res/proximal/extreme_components_20433484_combined.csv};
			\addplot [RedOrange, mark=none, line width=1]
				table [col sep=comma, x=k, y=grad_h_0]
				{res/proximal/extreme_components_20433484_combined.csv};
			\addlegendentry{est}
			\addlegendentry{$\left(\nabla L\right)_1$}
			\addlegendentry{$\left(\nabla h\right)_1$}
		\end{axis}
	\end{tikzpicture}

	\caption{Internal variables of proximal decoder as a function of the iteration ($n=204$)}
	\label{fig:prox:convergence_large_n}
\end{figure}%

\subsection{Computational Performance}

\begin{itemize}
	\item Theoretical analysis
	\item Simulation results to substantiate theoretical analysis
	\item Conclusion: $\mathcal{O}\left( n \right)$ time complexity, implementation heavily
	optimizable
\end{itemize}

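The $\mathcal{O}\left( n \right)$ claim can be made plausible by a rough per-iteration operation count. This is a sketch under the assumption of a sparse parity-check matrix with bounded row weight; the cost model and its constants are illustrative, not the analysis carried out below.

```python
# Rough per-iteration operation count for the gradient-based update
# (illustrative model): computing p costs one multiply per nonzero of H,
# v costs O(m), H^T v again one multiply-add per nonzero, and the
# elementwise terms of grad h cost O(n).
def ops_per_iteration(n, m, row_weight):
    nnz = m * row_weight          # nonzeros of a sparse H with fixed row weight
    return 2 * nnz + m + 3 * n    # p and H^T v, plus v and elementwise terms

# Doubling the code length (with m growing proportionally, as for a fixed-rate
# LDPC code) doubles the work, i.e. the cost per iteration is linear in n:
assert ops_per_iteration(2048, 1024, 6) == 2 * ops_per_iteration(1024, 512, 6)
```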
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Improved Implementation}%