Fixed tilde over x; Wrote convergence properties subsection; Various minor changes

This commit is contained in:
Andreas Tsouchlos 2023-04-09 02:17:47 +02:00
parent 9c9aa11669
commit 2312f40d94


@ -270,28 +270,29 @@ The gradient of the code-constraint polynomial \cite[Sec. 2.3]{proximal_paper}
is given by%
%
\begin{align*}
\nabla h\left( \tilde{\boldsymbol{x}} \right) &= \begin{bmatrix}
\frac{\partial}{\partial \tilde{x}_1}h\left( \tilde{\boldsymbol{x}} \right) &
\ldots &
\frac{\partial}{\partial \tilde{x}_n}h\left( \tilde{\boldsymbol{x}} \right)
\end{bmatrix}^\text{T}, \\[1em]
\frac{\partial}{\partial \tilde{x}_k}h\left( \tilde{\boldsymbol{x}} \right)
&= 4\left( \tilde{x}_k^2 - 1 \right) \tilde{x}_k
+ \frac{2}{\tilde{x}_k} \sum_{j\in N_v\left( k \right) }\left(
\left( \prod_{i \in N_c\left( j \right)} \tilde{x}_i \right)^2
- \prod_{i\in N_c\left( j \right) } \tilde{x}_i \right)
.\end{align*}
%
Since the products
$\prod_{i\in N_c\left( j \right) } \tilde{x}_i,\hspace{2mm}j\in \mathcal{J}$
are the same for all components $\tilde{x}_k$ of $\tilde{\boldsymbol{x}}$, they can be
precomputed.
Defining%
%
\begin{align*}
\boldsymbol{p} := \begin{bmatrix}
\prod_{i\in N_c\left( 1 \right) } \tilde{x}_i \\
\vdots \\
\prod_{i\in N_c\left( m \right) } \tilde{x}_i \\
\end{bmatrix}
\hspace{5mm}
\text{and}
@ -302,9 +303,9 @@ Defining%
the gradient can be written as%
%
\begin{align*}
\nabla h\left( \tilde{\boldsymbol{x}} \right) =
4\left( \tilde{\boldsymbol{x}}^{\circ 3} - \tilde{\boldsymbol{x}} \right)
+ 2\tilde{\boldsymbol{x}}^{\circ -1} \circ \boldsymbol{H}^\text{T}
\boldsymbol{v}
,\end{align*}
%
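To illustrate how the precomputed products are reused, the following is a
minimal sketch of this vectorized gradient, assuming NumPy, a dense binary
parity-check matrix and nonzero components $\tilde{x}_k$; the vector
$\boldsymbol{v}$ is taken to collect the terms $p_j^2 - p_j$, matching the
componentwise formula above, and the function name is hypothetical:
\begin{verbatim}
import numpy as np

def grad_h(x_tilde, H):
    # Sketch: gradient of the code-constraint polynomial.
    # H is a dense (m, n) {0,1} parity-check matrix; all x_tilde[k] != 0.
    # p[j] is the product of x_tilde[i] over i in N_c(j); positions with
    # H[j, i] = 0 contribute a factor of 1, so p is precomputed in one pass.
    p = np.prod(np.where(H == 1, x_tilde[np.newaxis, :], 1.0), axis=1)
    v = p**2 - p  # one entry (p_j^2 - p_j) per check node
    return 4.0 * (x_tilde**3 - x_tilde) + (2.0 / x_tilde) * (H.T @ v)
\end{verbatim}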
@ -331,7 +332,7 @@ The impact of the parameters $\gamma$, as well as $\omega$, $K$ and $\eta$ is
examined.
The decoding performance is assessed on the basis of the \ac{BER} and the
\ac{FER}, as well as the \textit{decoding failure rate}, i.e., the rate at which
the algorithm produces invalid results.
The convergence properties are reviewed and related to the decoding
performance.
Finally, the computational performance is examined on a theoretical basis
@ -497,6 +498,17 @@ undertaking an extensive search for an exact optimum.
Rather, a preliminary examination providing a rough window for $\gamma$ may
be sufficient.
TODO: $\omega, K$
Changing the parameter $\eta$ does not appear to have a significant effect on
the decoding performance when keeping the value within a reasonable window
(``slightly larger than one'', as stated in \cite[Sec. 3.2]{proximal_paper}),
which seems plausible considering its only function is ensuring numerical stability.
\begin{itemize}
\item Conclusion: Number of iterations independent of \ac{SNR}
\end{itemize}
\begin{figure}[H]
\centering
@ -716,19 +728,13 @@ be sufficient.
\addlegendentry{$\gamma = 0.15$};
\end{axis}
\end{tikzpicture}
\end{subfigure}
\caption{BER for $\omega = 0.05, K=100$ (different codes)}
\label{fig:prox:results_3d_multiple}
\end{figure}
\subsection{Decoding Performance}
\begin{figure}[H]
@ -815,67 +821,414 @@ is ensuring numerical stability.
Until now, only the \ac{BER} has been considered to assess the decoding
performance.
The \ac{FER}, however, shows considerably worse behaviour, as can be seen in
figure \ref{fig:prox:ber_fer_dfr}.
Besides the \ac{BER} and \ac{FER} curves, the figure also shows the
\textit{decoding failure rate}.
This is the rate at which the iterative process produces invalid codewords,
i.e., the stopping criterion (line 6 of algorithm \ref{TODO}) is never
satisfied and the maximum number of iterations $K$ is reached without
converging to a valid codeword.
Three lines are plotted in each case, corresponding to different values of
the parameter $\gamma$.
The values chosen are the same as in figure \ref{fig:prox:results}, as they
seem to adequately describe the behaviour across a wide range of values
(see figure \ref{fig:prox:results_3d}).
One possible explanation might be found in the structure of the proximal
decoding algorithm \ref{TODO} itself.
As it comprises two separate steps, one responsible for addressing the
likelihood and one for addressing the constraints imposed by the parity-check
matrix, the algorithm could tend to gravitate toward the correct codeword
but then get stuck in a local minimum introduced by the code-constraint
polynomial.
This would yield fewer bit-errors, while still producing a frame error.
It is apparent that the \ac{FER} and the decoding failure rate are extremely
similar, especially for higher \acp{SNR}.
This leads to the hypothesis that, at least for higher \acp{SNR}, frame errors
arise mainly due to the non-convergence of the algorithm instead of
convergence to the wrong codeword.
This course of thought will be picked up in section
\ref{sec:prox:Improved Implementation} to try to improve the algorithm.
In summary, the \ac{BER} and \ac{FER} indicate dissimilar decoding
performance.
The decoding failure rate closely resembles the \ac{FER}, suggesting that
the frame errors may largely be attributed to decoding failures.
\subsection{Convergence Properties}
The previous observation, that the \ac{FER} arises mainly due to the
non-convergence of the algorithm instead of convergence to the wrong codeword,
raises the question of why the decoding process so often fails to converge.
In figure \ref{fig:prox:convergence}, the iterative process is visualized
for each iteration.
In order to simultaneously consider all components of the vectors involved,
a BCH code with $n=7$ and $k=4$ is chosen.
Each chart shows one component of the current estimates during a given
iteration (alternating between $\boldsymbol{r}$ and $\boldsymbol{s}$), as well
as the gradients of the negative log-likelihood and the code-constraint
polynomial, which influence the next estimate.
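For orientation, this two-step structure can be sketched as follows, reusing
the \texttt{grad\_h} sketch from above.
The sketch is schematic only: the exact update rules, including the parameter
$\omega$, are those of algorithm \ref{TODO}, and an AWGN channel is assumed,
so that $\nabla L$ is taken proportional to the difference between the
estimate and the received vector:
\begin{verbatim}
def grad_L(y, s):
    # Assumption: AWGN channel, so the negative log-likelihood gradient
    # is proportional to (s - y); any scaling is absorbed into gamma.
    return s - y

def proximal_decode(y, H, gamma=0.05, K=100):
    # Schematic alternating loop (omega and other details omitted).
    s = y.copy()
    x_hat = (y < 0).astype(int)
    for _ in range(K):
        r = s - gamma * grad_L(y, s)     # likelihood step
        s = r - gamma * grad_h(r, H)     # code-constraint step
        x_hat = (s < 0).astype(int)      # bipolar map: +1 -> 0, -1 -> 1
        if not ((H @ x_hat) % 2).any():  # stopping criterion (valid codeword)
            return x_hat, True
    return x_hat, False                  # decoding failure, K exhausted
\end{verbatim}
The alternation between the two gradient steps is exactly what produces the
interplay visible in the charts below.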
\begin{figure}[H]
\begin{minipage}[c]{0.25\textwidth}
\centering
\begin{tikzpicture}[scale = 0.35]
\begin{axis}[
grid=both,
xlabel={Iterations},
width=8cm,
height=3cm,
scale only axis,
xtick={0, 50, ..., 200},
xticklabels={0, 25, ..., 100},
]
\addplot [NavyBlue, mark=none, line width=1]
table [col sep=comma, x=k, y=comb_r_s_1]
{res/proximal/comp_bch_7_4_combined.csv};
\addplot [ForestGreen, mark=none, line width=1]
table [col sep=comma, x=k, y=grad_L_1]
{res/proximal/comp_bch_7_4_combined.csv};
\addplot [RedOrange, mark=none, line width=1]
table [col sep=comma, x=k, y=grad_h_1]
{res/proximal/comp_bch_7_4_combined.csv};
\addlegendentry{est}
\addlegendentry{$\left(\nabla L \right)_2$}
\addlegendentry{$\left(\nabla h \right)_2 $}
\end{axis}
\end{tikzpicture}\\
\begin{tikzpicture}[scale = 0.35]
\begin{axis}[
grid=both,
xlabel={Iterations},
width=8cm,
height=3cm,
scale only axis,
xtick={0, 50, ..., 200},
xticklabels={0, 25, ..., 100},
]
\addplot [NavyBlue, mark=none, line width=1]
table [col sep=comma, x=k, y=comb_r_s_2]
{res/proximal/comp_bch_7_4_combined.csv};
\addplot [ForestGreen, mark=none, line width=1]
table [col sep=comma, x=k, y=grad_L_2]
{res/proximal/comp_bch_7_4_combined.csv};
\addplot [RedOrange, mark=none, line width=1]
table [col sep=comma, x=k, y=grad_h_2]
{res/proximal/comp_bch_7_4_combined.csv};
\addlegendentry{est}
\addlegendentry{$\left(\nabla L \right)_3$}
\addlegendentry{$\left(\nabla h \right)_3 $}
\end{axis}
\end{tikzpicture}\\
\begin{tikzpicture}[scale = 0.35]
\begin{axis}[
grid=both,
xlabel={Iterations},
width=8cm,
height=3cm,
scale only axis,
xtick={0, 50, ..., 200},
xticklabels={0, 25, ..., 100},
]
\addplot [NavyBlue, mark=none, line width=1]
table [col sep=comma, x=k, y=comb_r_s_3]
{res/proximal/comp_bch_7_4_combined.csv};
\addplot [ForestGreen, mark=none, line width=1]
table [col sep=comma, x=k, y=grad_L_3]
{res/proximal/comp_bch_7_4_combined.csv};
\addplot [RedOrange, mark=none, line width=1]
table [col sep=comma, x=k, y=grad_h_3]
{res/proximal/comp_bch_7_4_combined.csv};
\addlegendentry{est}
\addlegendentry{$\left(\nabla L \right)_4$}
\addlegendentry{$\left(\nabla h \right)_4 $}
\end{axis}
\end{tikzpicture}
\end{minipage}%
\begin{minipage}[c]{0.5\textwidth}
\vspace*{-1cm}
\centering
\begin{tikzpicture}[scale = 0.85, spy using outlines={circle, magnification=6,
connect spies}]
\begin{axis}[
grid=both,
xlabel={Iterations},
width=8cm,
height=3cm,
scale only axis,
xtick={0, 50, ..., 200},
xticklabels={0, 25, ..., 100},
]
\addplot [NavyBlue, mark=none, line width=1]
table [col sep=comma, x=k, y=comb_r_s_0]
{res/proximal/comp_bch_7_4_combined.csv};
\addplot [ForestGreen, mark=none, line width=1]
table [col sep=comma, x=k, y=grad_L_0]
{res/proximal/comp_bch_7_4_combined.csv};
\addplot [RedOrange, mark=none, line width=1]
table [col sep=comma, x=k, y=grad_h_0]
{res/proximal/comp_bch_7_4_combined.csv};
\addlegendentry{est}
\addlegendentry{$\left(\nabla L \right)_1$}
\addlegendentry{$\left(\nabla h \right)_1 $}
\coordinate (spypoint) at (axis cs:100,0.53);
\coordinate (magnifyglass) at (axis cs:175,2);
\end{axis}
\spy [black, size=2cm] on (spypoint)
in node[fill=white] at (magnifyglass);
\end{tikzpicture}
\end{minipage}%
\begin{minipage}[c]{0.25\textwidth}
\centering
\begin{tikzpicture}[scale = 0.35]
\begin{axis}[
grid=both,
xlabel={Iterations},
width=8cm,
height=3cm,
scale only axis,
xtick={0, 50, ..., 200},
xticklabels={0, 25, ..., 100},
]
\addplot [NavyBlue, mark=none, line width=1]
table [col sep=comma, x=k, y=comb_r_s_4]
{res/proximal/comp_bch_7_4_combined.csv};
\addplot [ForestGreen, mark=none, line width=1]
table [col sep=comma, x=k, y=grad_L_4]
{res/proximal/comp_bch_7_4_combined.csv};
\addplot [RedOrange, mark=none, line width=1]
table [col sep=comma, x=k, y=grad_h_4]
{res/proximal/comp_bch_7_4_combined.csv};
\addlegendentry{est}
\addlegendentry{$\left(\nabla L \right)_5$}
\addlegendentry{$\left(\nabla h \right)_5 $}
\end{axis}
\end{tikzpicture}\\
\begin{tikzpicture}[scale = 0.35]
\begin{axis}[
grid=both,
xlabel={Iterations},
width=8cm,
height=3cm,
scale only axis,
xtick={0, 50, ..., 200},
xticklabels={0, 25, ..., 100},
]
\addplot [NavyBlue, mark=none, line width=1]
table [col sep=comma, x=k, y=comb_r_s_5]
{res/proximal/comp_bch_7_4_combined.csv};
\addplot [ForestGreen, mark=none, line width=1]
table [col sep=comma, x=k, y=grad_L_5]
{res/proximal/comp_bch_7_4_combined.csv};
\addplot [RedOrange, mark=none, line width=1]
table [col sep=comma, x=k, y=grad_h_5]
{res/proximal/comp_bch_7_4_combined.csv};
\addlegendentry{est}
\addlegendentry{$\left(\nabla L \right)_6$}
\addlegendentry{$\left(\nabla h \right)_6 $}
\end{axis}
\end{tikzpicture}\\
\begin{tikzpicture}[scale = 0.35]
\begin{axis}[
grid=both,
xlabel={Iterations},
width=8cm,
height=3cm,
scale only axis,
xtick={0, 50, ..., 200},
xticklabels={0, 25, ..., 100},
]
\addplot [NavyBlue, mark=none, line width=1]
table [col sep=comma, x=k, y=comb_r_s_6]
{res/proximal/comp_bch_7_4_combined.csv};
\addplot [ForestGreen, mark=none, line width=1]
table [col sep=comma, x=k, y=grad_L_6]
{res/proximal/comp_bch_7_4_combined.csv};
\addplot [RedOrange, mark=none, line width=1]
table [col sep=comma, x=k, y=grad_h_6]
{res/proximal/comp_bch_7_4_combined.csv};
\addlegendentry{est}
\addlegendentry{$\left(\nabla L \right)_7$}
\addlegendentry{$\left(\nabla h \right)_7 $}
\end{axis}
\end{tikzpicture}
\end{minipage}
\caption{Internal variables of proximal decoder
as a function of the number of iterations ($n=7$)\protect\footnotemark{}}
\label{fig:prox:convergence}
\end{figure}%
%
\footnotetext{A single decoding is shown, using the BCH$\left( 7,4 \right) $ code;
$\gamma = 0.05, \omega = 0.05, E_b / N_0 = \SI{5}{dB}$
}%
%
\noindent It is evident that in all cases, past a certain number of
iterations, the estimate starts to oscillate around a particular value.
After a certain point, the two gradients stop approaching zero any further.
In particular, this means that the code-constraint polynomial is not being
minimized.
As such, the constraints are not being satisfied and the estimate is not
converging towards a valid codeword.
While figure \ref{fig:prox:convergence} shows only one instance of a decoding
task, it is indicative of the general behaviour of the algorithm.
This can be justified by looking at the gradients themselves.
In figure \ref{fig:prox:gradients} the gradients of the negative
log-likelihood and the code-constraint polynomial for a repetition code with
$n=2$ are shown.
It is obvious that walking along the gradients in an alternating fashion will
produce a net movement in a certain direction, as long as the two gradients
have a common component.
As soon as this common component is exhausted, they will start pulling the
estimate in opposing directions, leading to an oscillation as illustrated
in figure \ref{fig:prox:convergence}.
Consequently, this oscillation is an intrinsic property of the structure of
the proximal decoding algorithm, where the two parts of the objective function
are minimized in an alternating manner using their gradients.
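For the repetition code with $n=2$ (a single check node with
$N_c\left( 1 \right) = \left\{ 1, 2 \right\}$), the general formula for
$\nabla h$ given above specializes to%
%
\begin{align*}
\frac{\partial}{\partial \tilde{x}_1}h\left( \tilde{\boldsymbol{x}} \right)
&= 4\left( \tilde{x}_1^2 - 1 \right) \tilde{x}_1
+ 2\tilde{x}_2\left( \tilde{x}_1\tilde{x}_2 - 1 \right), \\
\frac{\partial}{\partial \tilde{x}_2}h\left( \tilde{\boldsymbol{x}} \right)
&= 4\left( \tilde{x}_2^2 - 1 \right) \tilde{x}_2
+ 2\tilde{x}_1\left( \tilde{x}_1\tilde{x}_2 - 1 \right)
,\end{align*}
%
which vanishes at the valid codewords $\pm\left( 1, 1 \right)$; away from
them, its pull competes with that of $\nabla L$, as shown in the
following plots.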
\begin{figure}[H]
\centering
\begin{subfigure}[c]{0.5\textwidth}
\centering
\begin{tikzpicture}
\begin{axis}[xmin = -1.25, xmax=1.25,
ymin = -1.25, ymax=1.25,
xlabel={$x_1$}, ylabel={$x_2$},
width=\textwidth,
height=0.75\textwidth,
grid=major, grid style={dotted},
view={0}{90}]
\addplot3[point meta=\thisrow{grad_norm},
point meta min=1,
point meta max=3,
quiver={u=\thisrow{grad_0},
v=\thisrow{grad_1},
scale arrows=.05,
every arrow/.append style={%
line width=.3+\pgfplotspointmetatransformed/1000,
-{Latex[length=0pt 5,width=0pt 3]}
},
},
quiver/colored = {mapped color},
colormap/rocket,
-stealth,
]
table[col sep=comma] {res/proximal/2d_grad_L.csv};
\end{axis}
\end{tikzpicture}
\caption{$\nabla L \left(\boldsymbol{y} \mid \tilde{\boldsymbol{x}} \right) $
for a repetition code with $n=2$}
\end{subfigure}%
\hfill%
\begin{subfigure}[c]{0.5\textwidth}
\centering
\begin{tikzpicture}
\begin{axis}[xmin = -1.25, xmax=1.25,
ymin = -1.25, ymax=1.25,
xlabel={$x_1$}, ylabel={$x_2$},
grid=major, grid style={dotted},
width=\textwidth,
height=0.75\textwidth,
view={0}{90}]
\addplot3[point meta=\thisrow{grad_norm},
point meta min=1,
point meta max=4,
quiver={u=\thisrow{grad_0},
v=\thisrow{grad_1},
scale arrows=.03,
every arrow/.append style={%
line width=.3+\pgfplotspointmetatransformed/1000,
-{Latex[length=0pt 5,width=0pt 3]}
},
},
quiver/colored = {mapped color},
colormap/rocket,
-stealth,
]
table[col sep=comma] {res/proximal/2d_grad_h.csv};
\end{axis}
\end{tikzpicture}
\caption{$\nabla h \left( \tilde{\boldsymbol{x}} \right) $
for a repetition code with $n=2$}
\end{subfigure}%
\end{figure}
While the initial net movement generally points in the right direction
owing to the gradient of the negative log-likelihood, the final oscillation
may well take place in a segment of space not corresponding to a valid
codeword, leading to the aforementioned non-convergence of the algorithm.
This also partly explains the difference in decoding performance when looking
at the \ac{BER} and \ac{FER}, as it would lower the number of bit errors while
still yielding an invalid codeword.
When considering codes with larger $n$, the behaviour generally stays the
same, with some minor differences.
In figure \ref{fig:prox:convergence_large_n} the decoding process is
visualized for one component of a code with $n=204$, for a single decoding.
The two gradients still begin to counteract each other and the estimate still
starts to oscillate, as illustrated in figure
\ref{fig:prox:convergence} for a code with $n=7$.
However, in this case, the gradient of the code-constraint polynomial itself
starts to oscillate, with its average value such that the effect of the
gradient of the negative log-likelihood is counteracted.
In conclusion, as a general rule, the proximal decoding algorithm reaches
an oscillatory state which it cannot escape as a consequence of its structure.
In this state, the constraints may not be satisfied, leading to the algorithm
returning an invalid codeword.
\begin{figure}[H]
\centering
\begin{tikzpicture}
\begin{axis}[
grid=both,
xlabel={Iterations},
width=0.6\textwidth,
height=0.45\textwidth,
scale only axis,
xtick={0, 100, ..., 400},
xticklabels={0, 50, ..., 200},
]
\addplot [NavyBlue, mark=none, line width=1]
table [col sep=comma, x=k, y=comb_r_s_0]
{res/proximal/extreme_components_20433484_combined.csv};
\addplot [ForestGreen, mark=none, line width=1]
table [col sep=comma, x=k, y=grad_L_0]
{res/proximal/extreme_components_20433484_combined.csv};
\addplot [RedOrange, mark=none, line width=1]
table [col sep=comma, x=k, y=grad_h_0]
{res/proximal/extreme_components_20433484_combined.csv};
\addlegendentry{est}
\addlegendentry{$\left(\nabla L\right)_1$}
\addlegendentry{$\left(\nabla h\right)_1$}
\end{axis}
\end{tikzpicture}
\caption{Internal variables of proximal decoder as a function of the number of iterations ($n=204$)}
\label{fig:prox:convergence_large_n}
\end{figure}%
\subsection{Computational Performance}
\begin{itemize}
\item Theoretical analysis
\item Simulation results to substantiate theoretical analysis
\item Conclusion: $\mathcal{O}\left( n \right)$ time complexity, implementation heavily
optimizable
\end{itemize}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Improved Implementation}%