From 2312f40d947bf8e07bed1b4d6732ae127bb18c5a Mon Sep 17 00:00:00 2001
From: Andreas Tsouchlos
Date: Sun, 9 Apr 2023 02:17:47 +0200
Subject: [PATCH] Fixed tilde over x; Wrote convergence properties subsection;
 Minor different changes

---
 latex/thesis/chapters/proximal_decoding.tex | 489 +++++++++++++++++---
 1 file changed, 421 insertions(+), 68 deletions(-)

diff --git a/latex/thesis/chapters/proximal_decoding.tex b/latex/thesis/chapters/proximal_decoding.tex
index ae09966..bde076c 100644
--- a/latex/thesis/chapters/proximal_decoding.tex
+++ b/latex/thesis/chapters/proximal_decoding.tex
@@ -270,28 +270,29 @@ The gradient of the code-constraint polynomial
 \cite[Sec. 2.3]{proximal_paper} is given by%
 %
 \begin{align*}
-    \nabla h\left( \boldsymbol{x} \right) &= \begin{bmatrix}
-        \frac{\partial}{\partial x_1}h\left( \boldsymbol{x} \right) &
+    \nabla h\left( \tilde{\boldsymbol{x}} \right) &= \begin{bmatrix}
+        \frac{\partial}{\partial \tilde{x}_1}h\left( \tilde{\boldsymbol{x}} \right) &
         \ldots &
-        \frac{\partial}{\partial x_n}h\left( \boldsymbol{x} \right) &
+        \frac{\partial}{\partial \tilde{x}_n}h\left( \tilde{\boldsymbol{x}} \right)
     \end{bmatrix}^\text{T}, \\[1em]
-    \frac{\partial}{\partial x_k}h\left( \boldsymbol{x} \right) &= 4\left( x_k^2 - 1 \right) x_k
-    + \frac{2}{x_k} \sum_{j\in N_v\left( k \right) }\left(
-    \left( \prod_{i \in N_c\left( j \right)} x_i \right)^2
-    - \prod_{i\in N_c\left( j \right) } x_i \right)
+    \frac{\partial}{\partial \tilde{x}_k}h\left( \tilde{\boldsymbol{x}} \right)
+    &= 4\left( \tilde{x}_k^2 - 1 \right) \tilde{x}_k
+    + \frac{2}{\tilde{x}_k} \sum_{j\in N_v\left( k \right) }\left(
+    \left( \prod_{i \in N_c\left( j \right)} \tilde{x}_i \right)^2
+    - \prod_{i\in N_c\left( j \right) } \tilde{x}_i \right)
 .\end{align*}
 %
 Since the products
-$\prod_{i\in N_c\left( j \right) } x_i,\hspace{2mm}j\in \mathcal{J}$
-are the same for all components $x_k$ of $\boldsymbol{x}$, they can be
+$\prod_{i\in N_c\left( j \right) } \tilde{x}_i,\hspace{2mm}j\in \mathcal{J}$
+are the same for all components $\tilde{x}_k$ of $\tilde{\boldsymbol{x}}$, they can be
 precomputed.
 Defining%
 %
 \begin{align*}
     \boldsymbol{p} := \begin{bmatrix}
-        \prod_{i\in N_c\left( 1 \right) }x_i \\
+        \prod_{i\in N_c\left( 1 \right) } \tilde{x}_i \\
         \vdots \\
-        \prod_{i\in N_c\left( m \right) }x_i \\
+        \prod_{i\in N_c\left( m \right) } \tilde{x}_i \\
     \end{bmatrix}
     \hspace{5mm}
     \text{and}
@@ -302,9 +303,9 @@ Defining%
 the gradient can be written as%
 %
 \begin{align*}
-    \nabla h\left( \boldsymbol{x} \right) =
-    4\left( \boldsymbol{x}^{\circ 3} - \boldsymbol{x} \right)
-    + 2\boldsymbol{x}^{\circ -1} \circ \boldsymbol{H}^\text{T}
+    \nabla h\left( \tilde{\boldsymbol{x}} \right) =
+    4\left( \tilde{\boldsymbol{x}}^{\circ 3} - \tilde{\boldsymbol{x}} \right)
+    + 2\tilde{\boldsymbol{x}}^{\circ -1} \circ \boldsymbol{H}^\text{T}
     \boldsymbol{v}
 ,\end{align*}
 %
@@ -331,7 +332,7 @@ The impact of the parameters $\gamma$, as well as $\omega$, $K$ and $\eta$
 is examined.
 The decoding performance is assessed on the basis of the \ac{BER} and the
 \ac{FER} as well as the \textit{decoding failure rate} -- the rate at which
-the algorithm produces erroneous results.
+the algorithm produces invalid results.
 The convergence properties are reviewed and related to the decoding
 performance.
 Finally, the computational performance is examined on a theoretical basis
@@ -497,6 +498,17 @@ undertaking an extensive search for an exact optimum.
 Rather, a preliminary examination providing a rough window for $\gamma$ may
 be sufficient.
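+
+The vectorized form of $\nabla h$ derived above translates almost directly
+into code.
+The following listing is a minimal sketch of such an implementation (not the
+implementation used for the simulations in this section), assuming that
+\texttt{H} holds the parity-check matrix as a dense \texttt{NumPy} array,
+that \texttt{x} holds the current estimate $\tilde{\boldsymbol{x}}$ with all
+entries bounded away from zero, and that the components of $\boldsymbol{v}$
+are given by $v_j = p_j^2 - p_j$, matching the component-wise gradient above:
+
+\begin{lstlisting}[language=Python]
+import numpy as np
+
+def grad_h(x, H):
+    """Gradient of the code-constraint polynomial h at x (vectorized)."""
+    # p[j] = prod_{i in N_c(j)} x_i: entries where H[j, i] = 0 are
+    # replaced by 1 so that they do not enter the row-wise product.
+    p = np.prod(np.where(H == 1, x, 1.0), axis=1)
+    v = p**2 - p
+    # elementwise: 4 * (x^3 - x) + (2 / x) * (H^T v)
+    return 4.0 * (x**3 - x) + (2.0 / x) * (H.T @ v)
+\end{lstlisting}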
+TODO: $\omega, K$
+
+Changing the parameter $\eta$ does not appear to have a significant effect on
+the decoding performance when keeping the value within a reasonable window
+(``slightly larger than one'', as stated in \cite[Sec. 3.2]{proximal_paper}),
+which seems plausible considering its only function is ensuring numerical
+stability.
+
+\begin{itemize}
+	\item Conclusion: Number of iterations independent of \ac{SNR}
+\end{itemize}
+

 \begin{figure}[H]
 	\centering

 		\addlegendentry{$\gamma = 0.15$};
 	\end{axis}
 	\end{tikzpicture}
+	\end{subfigure}

 	\caption{BER for $\omega = 0.05, K=100$ (different codes)}
 	\label{fig:prox:results_3d_multiple}
 \end{figure}

-A similar analysis was performed to determine the optimal values for the other
-parameters, $\omega$, $K$ and $\eta$.
-
-Changing the parameter $\eta$ does not appear to have a significant effect on
-the decoding performance, which seems sensible considering its only purpose
-is ensuring numerical stability.
-
 \subsection{Decoding Performance}

 \begin{figure}[H]
 \end{figure}

 Until now, only the \ac{BER} has been considered to assess the decoding
 performance.
-The \ac{FER}, however, shows considerably worse performance, as can be seen in
+The \ac{FER}, however, shows considerably worse behaviour, as can be seen in
 figure \ref{fig:prox:ber_fer_dfr}.
 Besides the \ac{BER} and \ac{FER} curves, the figure also shows the
 \textit{decoding failure rate}.
-This is the rate at which the iterative process produces erroneous codewords,
+This is the rate at which the iterative process produces an invalid result,
 i.e., the stopping criterion (line 6 of algorithm \ref{TODO}) is never
 satisfied and the maximum number of iterations $K$ is reached without
 converging to a valid codeword.
+Three lines are plotted in each case, corresponding to different values of
+the parameter $\gamma$.
+The values chosen are the same as in figure \ref{fig:prox:results}, as they
+seem to adequately describe the behaviour across a wide range of values
+(see figure \ref{fig:prox:results_3d}).

-One possible explanation might be found in the structure of the proxmal
-decoding algorithm \ref{TODO} itself.
-As it comprises two separate steps, one responsible for addressing the
-likelihood and one for addressing the constraints imposed by the parity-check
-matrix, the algorithm could tend to gravitate toward the correct codeword
-but then get stuck in a local minimum introduced by the code-constraint
-polynomial.
-This would yield fewer bit-errors, while still producing a frame error.
+It is apparent that the \ac{FER} and the decoding failure rate are extremely
+similar, especially for higher \acp{SNR}.
+This leads to the hypothesis that, at least for higher \acp{SNR}, frame errors
+arise mainly due to the non-convergence of the algorithm rather than
+convergence to the wrong codeword.
 This line of thought will be picked up in section
 \ref{sec:prox:Improved Implementation} to try to improve the algorithm.
+In summary, the \ac{BER} and the \ac{FER} suggest markedly different decoding
+performance.
+The decoding failure rate closely resembles the \ac{FER}, suggesting that
+frame errors may largely be attributed to decoding failures.
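+
+The distinction between the two rates can be made concrete with a short
+sketch.
+Assuming the bipolar estimate is mapped back to a binary word via
+$\hat{c}_i = (1 - \hat{x}_i) / 2$ (this mapping is an assumption made here
+for illustration), one decoding result can be classified as follows:
+
+\begin{lstlisting}[language=Python]
+import numpy as np
+
+def classify(x_hat, c_true, H):
+    """Classify one decoding for the FER and the decoding failure rate."""
+    c_hat = ((1 - np.sign(x_hat)) // 2).astype(int)  # bipolar -> binary
+    failure = np.any((H @ c_hat) % 2 != 0)  # stopping criterion never met
+    frame_error = np.any(c_hat != c_true)   # any wrong output
+    return frame_error, failure
+\end{lstlisting}
+
+Since the transmitted codeword itself satisfies the stopping criterion, every
+decoding failure is also a frame error; the converse does not hold, as a
+frame error can also arise from convergence to the wrong codeword.
+The observation that the two rates nearly coincide is what suggests that this
+latter case is rare at higher \acp{SNR}.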
+
+\todo{Maybe reference to the structure of the algorithm (1 part likelihood
+1 part constraints)}

-\begin{itemize}
-	\item Introduction
-	\begin{itemize}
-		\item asdf
-		\item ghjk
-	\end{itemize}
-	\item Reconstruction of results from paper
-	\begin{itemize}
-		\item asdf
-		\item ghjk
-	\end{itemize}
-	\item Choice of parameters, in particular gamma
-	\begin{itemize}
-		\item Introduction (``Looking at these results, the question arises \ldots'')
-		\item Different gammas simulated for same code as in paper
-		\item
-	\end{itemize}
-	\item The FER problem
-	\begin{itemize}
-		\item Intro (``\acs{FER} not as good as the \acs{BER} would have one assume'')
-		\item Possible explanation
-	\end{itemize}
-	\item Computational performance
-	\begin{itemize}
-		\item Theoretical analysis
-		\item Simulation results to substantiate theoretical analysis
-	\end{itemize}
-	\item Conclusion
-	\begin{itemize}
-		\item Choice of $\gamma$ code-dependant but decoding performance largely unaffected
-		by small variations
-		\item Number of iterations independent of \ac{SNR}
-		\item $\mathcal{O}\left( n \right)$ time complexity, implementation heavily
-		optimizable
-	\end{itemize}
-\end{itemize}

 \subsection{Convergence Properties}
+
+The previous observation, that frame errors arise mainly from the
+non-convergence of the algorithm rather than convergence to the wrong
+codeword, raises the question of why the decoding process fails to converge
+so often.
+In figure \ref{fig:prox:convergence}, the iterative process is visualized
+iteration by iteration.
+In order to consider all components of the involved vectors simultaneously,
+a BCH code with $n=7$ and $k=4$ is chosen.
+Each chart shows one component of the current estimate during a given
+iteration (alternating between $\boldsymbol{r}$ and $\boldsymbol{s}$), as well
+as the gradients of the negative log-likelihood and the code-constraint
+polynomial, which influence the next estimate.
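+
+The traces shown in the figure can be obtained by recording the internal
+variables of the decoder once per iteration.
+The following sketch illustrates only this alternating structure; the exact
+update rules, step sizes and initialization are those of algorithm
+\ref{TODO}, and the helpers \texttt{grad\_L} and \texttt{grad\_h} as well as
+the precise form of the two steps are assumptions made for illustration:
+
+\begin{lstlisting}[language=Python]
+import numpy as np
+
+trace = []
+r = y.copy()                        # start from the channel output
+for k in range(K):
+    s = r - omega * grad_L(r, y)    # likelihood step
+    r = s - gamma * grad_h(s, H)    # code-constraint step
+    trace.append((r.copy(), grad_L(r, y), grad_h(r, H)))
+    c_hat = ((1 - np.sign(r)) // 2).astype(int)  # assumed bipolar mapping
+    if not np.any((H @ c_hat) % 2):  # stopping criterion (line 6)
+        break
+\end{lstlisting}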
+ +\begin{figure}[H] + \begin{minipage}[c]{0.25\textwidth} + \centering + + \begin{tikzpicture}[scale = 0.35] + \begin{axis}[ + grid=both, + xlabel={Iterations}, + width=8cm, + height=3cm, + scale only axis, + xtick={0, 50, ..., 200}, + xticklabels={0, 25, ..., 100}, + ] + \addplot [NavyBlue, mark=none, line width=1] + table [col sep=comma, x=k, y=comb_r_s_1] + {res/proximal/comp_bch_7_4_combined.csv}; + \addplot [ForestGreen, mark=none, line width=1] + table [col sep=comma, x=k, y=grad_L_1] + {res/proximal/comp_bch_7_4_combined.csv}; + \addplot [RedOrange, mark=none, line width=1] + table [col sep=comma, x=k, y=grad_h_1] + {res/proximal/comp_bch_7_4_combined.csv}; + \addlegendentry{est} + \addlegendentry{$\left(\nabla L \right)_2$} + \addlegendentry{$\left(\nabla h \right)_2 $} + \end{axis} + \end{tikzpicture}\\ + \begin{tikzpicture}[scale = 0.35] + \begin{axis}[ + grid=both, + xlabel={Iterations}, + width=8cm, + height=3cm, + scale only axis, + xtick={0, 50, ..., 200}, + xticklabels={0, 25, ..., 100}, + ] + \addplot [NavyBlue, mark=none, line width=1] + table [col sep=comma, x=k, y=comb_r_s_2] + {res/proximal/comp_bch_7_4_combined.csv}; + \addplot [ForestGreen, mark=none, line width=1] + table [col sep=comma, x=k, y=grad_L_2] + {res/proximal/comp_bch_7_4_combined.csv}; + \addplot [RedOrange, mark=none, line width=1] + table [col sep=comma, x=k, y=grad_h_2] + {res/proximal/comp_bch_7_4_combined.csv}; + \addlegendentry{est} + \addlegendentry{$\left(\nabla L \right)_3$} + \addlegendentry{$\left(\nabla h \right)_3 $} + \end{axis} + \end{tikzpicture}\\ + \begin{tikzpicture}[scale = 0.35] + \begin{axis}[ + grid=both, + xlabel={Iterations}, + width=8cm, + height=3cm, + scale only axis, + xtick={0, 50, ..., 200}, + xticklabels={0, 25, ..., 100}, + ] + \addplot [NavyBlue, mark=none, line width=1] + table [col sep=comma, x=k, y=comb_r_s_3] + {res/proximal/comp_bch_7_4_combined.csv}; + \addplot [ForestGreen, mark=none, line width=1] + table [col sep=comma, x=k, y=grad_L_3] + {res/proximal/comp_bch_7_4_combined.csv}; + \addplot [RedOrange, mark=none, line width=1] + table [col sep=comma, x=k, y=grad_h_3] + {res/proximal/comp_bch_7_4_combined.csv}; + \addlegendentry{est} + \addlegendentry{$\left(\nabla L \right)_4$} + \addlegendentry{$\left(\nabla h \right)_4 $} + \end{axis} + \end{tikzpicture} + \end{minipage}% + \begin{minipage}[c]{0.5\textwidth} + \vspace*{-1cm} + \centering + + \begin{tikzpicture}[scale = 0.85, spy using outlines={circle, magnification=6, + connect spies}] + \begin{axis}[ + grid=both, + xlabel={Iterations}, + width=8cm, + height=3cm, + scale only axis, + xtick={0, 50, ..., 200}, + xticklabels={0, 25, ..., 100}, + ] + \addplot [NavyBlue, mark=none, line width=1] + table [col sep=comma, x=k, y=comb_r_s_0] + {res/proximal/comp_bch_7_4_combined.csv}; + \addplot [ForestGreen, mark=none, line width=1] + table [col sep=comma, x=k, y=grad_L_0] + {res/proximal/comp_bch_7_4_combined.csv}; + \addplot [RedOrange, mark=none, line width=1] + table [col sep=comma, x=k, y=grad_h_0] + {res/proximal/comp_bch_7_4_combined.csv}; + \addlegendentry{est} + \addlegendentry{$\left(\nabla L \right)_1$} + \addlegendentry{$\left(\nabla h \right)_1 $} + + \coordinate (spypoint) at (axis cs:100,0.53); + \coordinate (magnifyglass) at (axis cs:175,2); + \end{axis} + \spy [black, size=2cm] on (spypoint) + in node[fill=white] at (magnifyglass); + \end{tikzpicture} + \end{minipage}% + \begin{minipage}[c]{0.25\textwidth} + \centering + + \begin{tikzpicture}[scale = 0.35] + \begin{axis}[ + grid=both, + 
xlabel={Iterations},
+			width=8cm,
+			height=3cm,
+			scale only axis,
+			xtick={0, 50, ..., 200},
+			xticklabels={0, 25, ..., 100},
+			]
+			\addplot [NavyBlue, mark=none, line width=1]
+			table [col sep=comma, x=k, y=comb_r_s_4]
+			{res/proximal/comp_bch_7_4_combined.csv};
+			\addplot [ForestGreen, mark=none, line width=1]
+			table [col sep=comma, x=k, y=grad_L_4]
+			{res/proximal/comp_bch_7_4_combined.csv};
+			\addplot [RedOrange, mark=none, line width=1]
+			table [col sep=comma, x=k, y=grad_h_4]
+			{res/proximal/comp_bch_7_4_combined.csv};
+			\addlegendentry{est}
+			\addlegendentry{$\left(\nabla L \right)_5$}
+			\addlegendentry{$\left(\nabla h \right)_5 $}
+		\end{axis}
+	\end{tikzpicture}\\
+	\begin{tikzpicture}[scale = 0.35]
+		\begin{axis}[
+			grid=both,
+			xlabel={Iterations},
+			width=8cm,
+			height=3cm,
+			scale only axis,
+			xtick={0, 50, ..., 200},
+			xticklabels={0, 25, ..., 100},
+			]
+			\addplot [NavyBlue, mark=none, line width=1]
+			table [col sep=comma, x=k, y=comb_r_s_5]
+			{res/proximal/comp_bch_7_4_combined.csv};
+			\addplot [ForestGreen, mark=none, line width=1]
+			table [col sep=comma, x=k, y=grad_L_5]
+			{res/proximal/comp_bch_7_4_combined.csv};
+			\addplot [RedOrange, mark=none, line width=1]
+			table [col sep=comma, x=k, y=grad_h_5]
+			{res/proximal/comp_bch_7_4_combined.csv};
+			\addlegendentry{est}
+			\addlegendentry{$\left(\nabla L \right)_6$}
+			\addlegendentry{$\left(\nabla h \right)_6 $}
+		\end{axis}
+	\end{tikzpicture}\\
+	\begin{tikzpicture}[scale = 0.35]
+		\begin{axis}[
+			grid=both,
+			xlabel={Iterations},
+			width=8cm,
+			height=3cm,
+			scale only axis,
+			xtick={0, 50, ..., 200},
+			xticklabels={0, 25, ..., 100},
+			]
+			\addplot [NavyBlue, mark=none, line width=1]
+			table [col sep=comma, x=k, y=comb_r_s_6]
+			{res/proximal/comp_bch_7_4_combined.csv};
+			\addplot [ForestGreen, mark=none, line width=1]
+			table [col sep=comma, x=k, y=grad_L_6]
+			{res/proximal/comp_bch_7_4_combined.csv};
+			\addplot [RedOrange, mark=none, line width=1]
+			table [col sep=comma, x=k, y=grad_h_6]
+			{res/proximal/comp_bch_7_4_combined.csv};
+			\addlegendentry{est}
+			\addlegendentry{$\left(\nabla L \right)_7$}
+			\addlegendentry{$\left(\nabla h \right)_7 $}
+		\end{axis}
+	\end{tikzpicture}
+	\end{minipage}
+
+	\caption{Internal variables of the proximal decoder
+		as a function of the number of iterations ($n=7$)\protect\footnotemark{}}
+	\label{fig:prox:convergence}
+\end{figure}%
+%
+\footnotetext{A single decoding is shown, using the BCH$\left( 7,4 \right) $ code;
+	$\gamma = 0.05, \omega = 0.05, E_b / N_0 = \SI{5}{dB}$
+}%
+%
+\noindent It is evident that in all cases, past a certain number of
+iterations, the estimate starts to oscillate around a particular value.
+From this point on, the two gradients stop approaching zero.
+In particular, the code-constraint polynomial is no longer being minimized.
+As such, the constraints are not satisfied and the estimate does not
+converge towards a valid codeword.
+
+While figure \ref{fig:prox:convergence} shows only one instance of a decoding
+task, it is indicative of the general behaviour of the algorithm.
+This can be justified by looking at the gradients themselves.
+In figure \ref{fig:prox:gradients}, the gradients of the negative
+log-likelihood and the code-constraint polynomial are shown for a repetition
+code with $n=2$.
+Walking along the gradients in an alternating fashion evidently produces
+a net movement in a certain direction, as long as the two gradients
+have a common component.
+As soon as this common component is exhausted, the gradients start pulling the
+estimate in opposing directions, leading to an oscillation as illustrated
+in figure \ref{fig:prox:convergence}.
+Consequently, this oscillation is an intrinsic property of the structure of
+the proximal decoding algorithm, where the two parts of the objective function
+are minimized in an alternating manner using their gradients.
+
+\begin{figure}[H]
+	\centering
+
+	\begin{subfigure}[c]{0.5\textwidth}
+		\centering
+
+		\begin{tikzpicture}
+			\begin{axis}[xmin = -1.25, xmax=1.25,
+				ymin = -1.25, ymax=1.25,
+				xlabel={$\tilde{x}_1$}, ylabel={$\tilde{x}_2$},
+				width=\textwidth,
+				height=0.75\textwidth,
+				grid=major, grid style={dotted},
+				view={0}{90}]
+
+				\addplot3[point meta=\thisrow{grad_norm},
+				point meta min=1,
+				point meta max=3,
+				quiver={u=\thisrow{grad_0},
+					v=\thisrow{grad_1},
+					scale arrows=.05,
+					every arrow/.append style={%
+						line width=.3+\pgfplotspointmetatransformed/1000,
+						-{Latex[length=0pt 5,width=0pt 3]}
+					},
+				},
+				quiver/colored = {mapped color},
+				colormap/rocket,
+				-stealth,
+				]
+				table[col sep=comma] {res/proximal/2d_grad_L.csv};
+			\end{axis}
+		\end{tikzpicture}
+
+		\caption{$\nabla L \left(\boldsymbol{y} \mid \tilde{\boldsymbol{x}} \right) $}
+	\end{subfigure}%
+	\hfill%
+	\begin{subfigure}[c]{0.5\textwidth}
+		\centering
+
+		\begin{tikzpicture}
+			\begin{axis}[xmin = -1.25, xmax=1.25,
+				ymin = -1.25, ymax=1.25,
+				xlabel={$\tilde{x}_1$}, ylabel={$\tilde{x}_2$},
+				grid=major, grid style={dotted},
+				width=\textwidth,
+				height=0.75\textwidth,
+				view={0}{90}]
+				\addplot3[point meta=\thisrow{grad_norm},
+				point meta min=1,
+				point meta max=4,
+				quiver={u=\thisrow{grad_0},
+					v=\thisrow{grad_1},
+					scale arrows=.03,
+					every arrow/.append style={%
+						line width=.3+\pgfplotspointmetatransformed/1000,
+						-{Latex[length=0pt 5,width=0pt 3]}
+					},
+				},
+				quiver/colored = {mapped color},
+				colormap/rocket,
+				-stealth,
+				]
+				table[col sep=comma] {res/proximal/2d_grad_h.csv};
+			\end{axis}
+		\end{tikzpicture}
+
+		\caption{$\nabla h \left( \tilde{\boldsymbol{x}} \right) $}
+	\end{subfigure}%
+
+	\caption{Gradients of the negative log-likelihood and of the
+		code-constraint polynomial for a repetition code with $n=2$}
+	\label{fig:prox:gradients}
+\end{figure}
+
+While the initial net movement generally points in the right direction,
+owing to the gradient of the negative log-likelihood, the final oscillation
+may well take place in a region of space not corresponding to a valid
+codeword, leading to the aforementioned non-convergence of the algorithm.
+This also partly explains the difference in decoding performance when looking
+at the \ac{BER} and \ac{FER}, as it would lower the number of bit errors while
+still yielding an invalid codeword.
+
+When considering codes with larger $n$, the behaviour generally stays the
+same, with some minor differences.
+In figure \ref{fig:prox:convergence_large_n}, the decoding process is
+visualized for one component during a single decoding of a code with $n=204$.
+The two gradients still end up opposing each other and the estimate still
+starts to oscillate, as illustrated for a code with $n=7$ in figure
+\ref{fig:prox:convergence}.
+However, in this case, the gradient of the code-constraint polynomial itself
+starts to oscillate, its average value being such that the effect of the
+gradient of the negative log-likelihood is counteracted.
+
+In conclusion, as a general rule, the proximal decoding algorithm reaches
+an oscillatory state which it cannot escape as a consequence of its structure.
+In this state, the constraints may not be satisfied, in which case the
+algorithm fails to return a valid codeword.
+
+\begin{figure}[H]
+	\centering
+
+	\begin{tikzpicture}
+		\begin{axis}[
+			grid=both,
+			xlabel={Iterations},
+			width=0.6\textwidth,
+			height=0.45\textwidth,
+			scale only axis,
+			xtick={0, 100, ..., 400},
+			xticklabels={0, 50, ..., 200},
+			]
+			\addplot [NavyBlue, mark=none, line width=1]
+			table [col sep=comma, x=k, y=comb_r_s_0]
+			{res/proximal/extreme_components_20433484_combined.csv};
+			\addplot [ForestGreen, mark=none, line width=1]
+			table [col sep=comma, x=k, y=grad_L_0]
+			{res/proximal/extreme_components_20433484_combined.csv};
+			\addplot [RedOrange, mark=none, line width=1]
+			table [col sep=comma, x=k, y=grad_h_0]
+			{res/proximal/extreme_components_20433484_combined.csv};
+			\addlegendentry{est}
+			\addlegendentry{$\left(\nabla L\right)_1$}
+			\addlegendentry{$\left(\nabla h\right)_1$}
+		\end{axis}
+	\end{tikzpicture}
+
+	\caption{Internal variables of the proximal decoder as a function of the
+		number of iterations ($n=204$)}
+	\label{fig:prox:convergence_large_n}
+\end{figure}%
+
+
 \subsection{Computational Performance}

+\begin{itemize}
+	\item Theoretical analysis
+	\item Simulation results to substantiate theoretical analysis
+	\item Conclusion: $\mathcal{O}\left( n \right)$ time complexity, implementation heavily
+	optimizable
+\end{itemize}
+
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \section{Improved Implementation}%