\documentclass[journal]{IEEEtran}

\usepackage{amsmath,amsfonts}
\usepackage{float}
\usepackage{algorithmic}
\usepackage{algorithm}
\usepackage{siunitx}
\usepackage{dsfont}
\usepackage{mleftright}
\usepackage{bbm}
\usepackage[
backend=biber,
style=ieee,
sorting=nty,
]{biblatex}

\usepackage{tikz}
\usetikzlibrary{spy, arrows.meta,arrows}

\usepackage{pgfplots}
\pgfplotsset{compat=newest}

\hyphenation{op-tical net-works semi-conduc-tor IEEE-Xplore}

%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Inputs & Global Options
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%

\newif\ifoverleaf
%\overleaftrue % When enabled, this option allows the document to be compiled
% on overleaf:
% - common.tex is sourced from a different directory
% - TikZ Externalization is disabled
% - Figures are included from pre-built PDFs

%
% Figures
%

\ifoverleaf
\input{common.tex}
\else
\usetikzlibrary{external}
\tikzexternalize[prefix=build/]
\input{lib/latex-common/common.tex}
\fi

\pgfplotsset{colorscheme/cel}

\newcommand{\figwidth}{\columnwidth}
\newcommand{\figheight}{0.7\columnwidth}

\pgfplotsset{
FERPlot/.style={
line width=1pt,
densely dashed,
},
BERPlot/.style={
line width=1pt,
},
DFRPlot/.style={
only marks,
},
}

%
% Bibliography
%

\addbibresource{letter.bib}
\AtBeginBibliography{\footnotesize}

%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Title, Header, Footer, etc.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%

\begin{document}


\title{List-based Optimization of Proximal Decoding for Linear Block Codes}

\author{Andreas Tsouchlos, Holger Jäkel, and Laurent Schmalen
\thanks{The authors are with the Communications Engineering Lab (CEL), Karlsruhe Institute of Technology (KIT), corresponding author: \texttt{holger.jaekel@kit.edu}}}

\markboth{IEEE Communications Letters}{List-based Optimization of Proximal Decoding for Linear Block Codes}

\maketitle


%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Abstract & Index Terms
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%


\begin{abstract}
In this paper, the proximal decoding algorithm described in, e.g., \cite{proximal_paper}, is considered within the
context of \textit{additive white Gaussian noise} (AWGN) channels.
An analysis of the convergence behavior of the algorithm shows that
the estimate produced by proximal decoding inherently starts to oscillate
after a certain number of iterations.
Due to this oscillation, frame errors arising during decoding can often
be attributed to only a few remaining wrongly decoded bit positions.
In this letter, an improvement of the proximal decoding algorithm is proposed
by introducing an additional step in which an attempt is made to correct
these erroneous positions.
We suggest an empirical rule with which the components most likely needing
correction can be determined.
Using this insight and performing a subsequent ``ML-in-the-list'' decoding,
a gain of up to \SI{1}{dB} is achieved compared to conventional
proximal decoding, depending on the decoder parameters and the code.
\end{abstract}

\begin{IEEEkeywords}
Optimization-based decoding, Proximal decoding, ML-in-the-list.
\end{IEEEkeywords}


%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Content
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%


%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Introduction}

\IEEEPARstart{C}{hannel} coding using binary linear codes is a way of enhancing
the reliability of data by detecting and correcting errors that may occur
during its transmission or storage.
One class of binary linear codes, \textit{low-density parity-check} (LDPC)
codes, has become especially popular due to its ability to reach arbitrarily
small error probabilities at code rates up to the capacity of the channel
\cite{mackay99}, while retaining a structure that allows for very efficient
decoding.
While the established decoders for LDPC codes, such as belief propagation (BP)
and the min-sum algorithm, offer good decoding performance, they are generally
not optimal and exhibit an error floor for high
\textit{signal-to-noise ratios} (SNRs) \cite{channel_codes_book}, making them
inadequate for applications with extreme reliability requirements.

Optimization-based decoding algorithms are an entirely different way of
approaching the decoding problem;
they map the decoding problem onto an optimization problem in order to
leverage the vast knowledge from the field of optimization theory.
A number of such algorithms have been introduced.
The field of \textit{linear programming} (LP) decoding \cite{feldman_paper},
for example, represents one class of such algorithms, based on a relaxation
of the \textit{maximum likelihood} (ML) decoding problem as a linear program.
Many different optimization algorithms can be used to solve the resulting
problem \cite{ADMM, adaptive_lp_decoding, interior_point_decoding}.
Recently, proximal decoding for LDPC codes was presented by
Wadayama \textit{et al.} \cite{proximal_paper}.
Proximal decoding relies on a non-convex optimization formulation
of the \textit{maximum a posteriori} (MAP) decoding problem.

The aim of this work is to improve upon the performance of proximal decoding by
first presenting an analysis of the algorithm's behavior and then suggesting
an approach to mitigate some of its flaws.
This analysis is performed for
\textit{additive white Gaussian noise} (AWGN) channels.
We first observe that the algorithm initially moves the estimate in
the right direction; however, in the final steps of the decoding process,
convergence to the correct codeword is often not achieved.
We then attribute this behavior to the nature
of the decoding algorithm itself, which comprises two separate gradient-descent
steps working adversarially.

We, thus, propose a method to mitigate this effect by appending an
additional step to the iterative decoding process.
In this additional step, the components of the estimate with the highest
probability of being erroneous are identified.
New codewords are then generated, over which an ``ML-in-the-list''
\cite{ml_in_the_list} decoding is performed.
A procedure for identifying these components is proposed in this paper.
Using the improved algorithm, a gain of up to
$\SI{1}{dB}$ can be achieved compared to conventional proximal decoding,
depending on the decoder parameters and the code.


%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Preliminaries}

%%%%%%%%%%%%%%%%%%%%%
\subsection{Notation}

When considering binary linear codes, data words are mapped onto
codewords, the lengths of which are denoted by $k \in \mathbb{N}$
and $n \in \mathbb{N}$, respectively, with $k \le n$.
The set of codewords $\mathcal{C} \subset \mathbb{F}_2^n$ of a binary linear
code can be characterized using the parity-check matrix
$\boldsymbol{H} \in \mathbb{F}_2^{m \times n} $, where $m$ represents the
number of parity checks:
%
\begin{align*}
\mathcal{C} := \left\{ \boldsymbol{c} \in \mathbb{F}_2^n :
\boldsymbol{H}\boldsymbol{c}^\text{T} = \boldsymbol{0} \right\}
.\end{align*}
%

The check nodes $j \in \mathcal{J}:=\left\{1, \ldots, m\right\}$ each
correspond to a parity check, i.e., a row of $\boldsymbol{H}$.
The variable nodes $i \in \mathcal{I}:=\left\{1, \ldots, n\right\}$ correspond
to the components of a codeword being subjected to a parity check, i.e.,
to the columns of $\boldsymbol{H}$.
The neighborhood of a parity check $j$, i.e., the set of indices of components
relevant for the corresponding parity check, is denoted by
$\mathcal{N}_c(j) := \left\{i \in \mathcal{I}: \boldsymbol{H}\negthinspace_{j,i} = 1 \right\},
\hspace{2mm} j \in \mathcal{J}$.

In order to transmit a codeword $\boldsymbol{c} \in \mathbb{F}_2^n$, it is
mapped onto \textit{binary phase shift keying} (BPSK) symbols via
$\boldsymbol{x} = 1 - 2\boldsymbol{c}$, with
$ \boldsymbol{x} \in \left\{\pm 1\right\}^n$, which are then transmitted over an
AWGN channel.
The received vector $\boldsymbol{y} \in \mathbb{R}^n$ is decoded to obtain an
estimate of the transmitted codeword, denoted as
$\hat{\boldsymbol{c}} \in \mathbb{F}_2^n$.
A distinction is made between $\boldsymbol{x} \in \left\{\pm 1\right\}^n$
and $\tilde{\boldsymbol{x}} \in \mathbb{R}^n$,
the former denoting the BPSK symbols transmitted over the channel and
the latter being used as a variable during the optimization process.
The posterior probability of having transmitted $\boldsymbol{x}$ when receiving
$\boldsymbol{y}$ is expressed as a \textit{probability mass function} (PMF)
$P_{\boldsymbol{X}\mid\boldsymbol{Y}}(\boldsymbol{x} \mid \boldsymbol{y})$.
Likewise, the likelihood of receiving $\boldsymbol{y}$ upon transmitting
$\boldsymbol{x}$ is expressed as a \textit{probability density function} (PDF)
$f_{\boldsymbol{Y}\mid\boldsymbol{X}}(\boldsymbol{y} \mid \boldsymbol{x})$.
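For the AWGN channel considered here, with noise variance $\sigma^2$ per
component, this likelihood takes the well-known form
%
\begin{align*}
f_{\boldsymbol{Y}\mid\boldsymbol{X}}\mleft( \boldsymbol{y} \mid \boldsymbol{x} \mright)
= \left( 2 \pi \sigma^2 \right)^{-n/2}
\exp\mleft( - \frac{ \left\lVert \boldsymbol{y} - \boldsymbol{x} \right\rVert_2^2 }{ 2 \sigma^2 } \mright)
.\end{align*}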


%%%%%%%%%%%%%%%%%%%%%
\subsection{Proximal Decoding}

Proximal decoding was proposed by Wadayama \textit{et al.} as a novel formulation
of optimization-based decoding \cite{proximal_paper}.
With proximal decoding, the proximal gradient method \cite{proximal_algorithms}
is used to solve a non-convex optimization formulation of the MAP decoding
problem.

Under the assumption of equal prior probabilities for all codewords, MAP and ML
decoding are equivalent and, specifically for AWGN channels, correspond to a
nearest-neighbor decision.
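For the AWGN channel, this decision can be written explicitly as
%
\begin{align*}
\hat{\boldsymbol{x}}_\text{ML}
= \mathop{\arg\min}_{\boldsymbol{x} \in \left\{ 1 - 2\boldsymbol{c} \,:\, \boldsymbol{c} \in \mathcal{C} \right\}}
\left\lVert \boldsymbol{y} - \boldsymbol{x} \right\rVert_2
.\end{align*}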
For this reason, decoding can be carried out using a figure of merit that
describes the distance from a given vector to a codeword.
One such expression, formulated under the assumption of BPSK, is the
\textit{code-constraint polynomial} \cite{proximal_paper}
%
\begin{align*}
h( \tilde{\boldsymbol{x}} ) =
\underbrace{\sum_{i=1}^{n}
\left( \tilde{x}_i^2-1 \right) ^2}_{\text{Bipolar constraint}}
+ \underbrace{\sum_{j=1}^{m} \left[
\left( \prod_{i\in \mathcal{N}_c \left( j \right) } \tilde{x}_i \right)
-1 \right] ^2}_{\text{Parity constraint}}
.\end{align*}%
%
Its intent is to penalize vectors far from a codeword.
It comprises two terms: one representing the bipolar constraint due to transmitting BPSK
and one representing the parity constraint, incorporating all
information regarding the code.
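For illustration, consider the repetition code with $n=2$ and the single parity
check $\boldsymbol{H} = \begin{pmatrix} 1 & 1 \end{pmatrix}$, which also
underlies Fig.~\ref{fig:grad}: here,
%
\begin{align*}
h\mleft( \tilde{\boldsymbol{x}} \mright)
= \left( \tilde{x}_1^2-1 \right)^2 + \left( \tilde{x}_2^2-1 \right)^2
+ \left( \tilde{x}_1 \tilde{x}_2 - 1 \right)^2,
\end{align*}
%
which vanishes exactly at the BPSK images $\pm\left(1, 1\right)$ of the two codewords.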

The channel model can be considered using the negative log-likelihood
%
\begin{align*}
L \mleft( \boldsymbol{y} \mid \tilde{\boldsymbol{x}} \mright) = -\ln\mleft(
f_{\boldsymbol{Y} \mid \tilde{\boldsymbol{X}}} \mleft(
\boldsymbol{y} \mid \tilde{\boldsymbol{x}} \mright) \mright)
.\end{align*}
%
The information about the channel and the code is then consolidated in the objective
function \cite{proximal_paper}
%
\begin{align*}
g \mleft( \tilde{\boldsymbol{x}} \mright)
= L \mleft( \boldsymbol{y} \mid \tilde{\boldsymbol{x}} \mright)
+ \gamma h\mleft( \tilde{\boldsymbol{x}} \mright),
\hspace{5mm} \gamma > 0%
.\end{align*}
%
The objective function is minimized using the proximal gradient method, which,
for this objective function and an AWGN channel, amounts to iteratively
performing two gradient-descent steps \cite{proximal_paper}.
To this end, two helper variables, $\boldsymbol{r}$ and $\boldsymbol{s}$, are
introduced, describing the result of each of the two steps:
%
\begin{alignat}{3}
\boldsymbol{r} &\leftarrow \boldsymbol{s}
- \omega \mleft( \boldsymbol{s} - \boldsymbol{y} \mright),
\hspace{5mm}&&\omega > 0 \label{eq:r_update}\\
\boldsymbol{s} &\leftarrow \boldsymbol{r}
- \gamma \nabla h\mleft( \boldsymbol{r} \mright),
\hspace{5mm} &&\gamma > 0 \label{eq:s_update}
.\end{alignat}
%
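For the AWGN channel, $\nabla L \mleft( \boldsymbol{y} \mid \boldsymbol{s} \mright)
\propto \boldsymbol{s} - \boldsymbol{y}$, so that (\ref{eq:r_update}) is simply a
gradient-descent step on the negative log-likelihood, with the constant factor
$1/\sigma^2$ absorbed into the step size $\omega$.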
An equation for determining $\nabla h(\boldsymbol{r})$ is given in
\cite{proximal_paper}, where it is also proposed to initialize $\boldsymbol{s}=\boldsymbol{0}$.
It should be noted that the variables $\boldsymbol{r}$ and $\boldsymbol{s}$
represent $\tilde{\boldsymbol{x}}$ during different
stages of the decoding process.
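For reference, the components of this gradient can also be obtained by direct
differentiation of the code-constraint polynomial:
%
\begin{align*}
\frac{\partial h}{\partial r_i}
= 4 r_i \left( r_i^2-1 \right)
+ 2 \sum_{j \in \mathcal{J}: i \in \mathcal{N}_c(j)}
\mleft[ \left( \prod_{k\in \mathcal{N}_c \left( j \right) } r_k \right)
-1 \mright] \prod_{\substack{k\in \mathcal{N}_c \left( j \right) \\ k \neq i}} r_k
.\end{align*}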

As the gradient of the code-constraint polynomial can attain very large values
in some cases, an additional step is introduced in \cite{proximal_paper} to ensure numerical stability:
every estimate $\boldsymbol{s}$ is projected onto
$\left[-\eta, \eta\right]^n$ by a projection
$\Pi_\eta : \mathbb{R}^n \rightarrow \left[-\eta, \eta\right]^n$, where $\eta$
is a positive constant larger than one, e.g., $\eta = 1.5$.
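Component-wise, this projection amounts to clipping, i.e.,
$\left( \Pi_\eta \mleft( \boldsymbol{s} \mright) \right)_i
= \min\left\{ \eta, \max\left\{ -\eta, s_i \right\} \right\}$.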
The resulting decoding process as described in \cite{proximal_paper} is
presented in Algorithm \ref{alg:proximal_decoding}.

\begin{algorithm}
\caption{Proximal decoding algorithm for an AWGN channel \cite{proximal_paper}.}
\label{alg:proximal_decoding}

\begin{algorithmic}
\STATE $\boldsymbol{s} \leftarrow \boldsymbol{0}$
\STATE \textbf{for} $K$ iterations \textbf{do}
\STATE \hspace{5mm} $\boldsymbol{r} \leftarrow \boldsymbol{s} - \omega \left( \boldsymbol{s} - \boldsymbol{y} \right) $
\STATE \hspace{5mm} $\boldsymbol{s} \leftarrow \Pi_\eta \left(\boldsymbol{r} - \gamma \nabla h\left( \boldsymbol{r} \right) \right)$
\STATE \hspace{5mm} $\boldsymbol{\hat{c}} \leftarrow \mathbbm{1}_{\left\{ \boldsymbol{s} \le 0 \right\}}$
\STATE \hspace{5mm} \textbf{if} $\boldsymbol{H}\boldsymbol{\hat{c}}^\text{T} = \boldsymbol{0}$ \textbf{then}
\STATE \hspace{10mm} \textbf{return} $\boldsymbol{\hat{c}}$
\STATE \hspace{5mm} \textbf{end if}
\STATE \textbf{end for}
\STATE \textbf{return} $\boldsymbol{\hat{c}}$
\end{algorithmic}
\end{algorithm}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Improved Algorithm}

%%%%%%%%%%%%%%%%%%%%%
\subsection{Analysis of the Convergence Behavior}

In Fig. \ref{fig:fer vs ber}, the \textit{frame error rate} (FER),
\textit{bit error rate} (BER), and \textit{decoding failure rate} (DFR) of
proximal decoding are shown for an LDPC code with $n=204$ and $k=102$
\cite[204.33.484]{mackay}.
Here, a \emph{decoding failure} is defined as the return of a \emph{non-valid codeword}, i.e., as non-convergence of the algorithm.
The parameters chosen for this simulation are $\gamma=0.05, \omega=0.05,
\eta=1.5$ and $K=200$ ($K$ denoting the maximum number of iterations).
They were determined to offer the best performance in a preliminary examination,
in which the effect of changing multiple parameters was simulated over a wide
range of values.
It is apparent that the DFR completely dominates the FER above a certain SNR.
This means that most frame errors are not due to the algorithm converging
to the wrong codeword, but due to the algorithm not converging at all.

As proximal decoding is an optimization-based decoding method, one possible
explanation for this effect might be that during the decoding process, convergence
to the final codeword is often not achieved, although the estimate is moving in
the right direction.
This would suggest that most frame errors occur due to only a few incorrectly
decoded bits.%
%
\begin{figure}[t]
\centering


\ifoverleaf
\includegraphics{figs/letter-figure0.pdf}
\else
\begin{tikzpicture}
\begin{axis}[
grid=both,
xlabel={$E_\text{b} / N_0$ (dB)}, ylabel={},
ymode=log,
xmin=1, xmax=8,
ymax=1, ymin=1e-6,
% ytick={1e-0, 1e-2, 1e-4, 1e-6},
width=\figwidth,
height=\figheight,
legend pos = south west,
]
\addplot+[FERPlot, mark=o, mark options={solid}, scol1]
table [x=SNR, y=FER, col sep=comma,
discard if not={gamma}{0.05},
discard if gt={SNR}{9}]
{res/proximal_ber_fer_dfr_20433484.csv};
\addlegendentry{FER}
\addplot+[BERPlot, mark=*, scol1]
table [x=SNR, y=BER, col sep=comma,
discard if not={gamma}{0.05},
discard if gt={SNR}{7.5}]
{res/proximal_ber_fer_dfr_20433484.csv};
\addlegendentry{BER}
\addplot+[DFRPlot, mark=square*, scol0]
table [x=SNR, y=DFR, col sep=comma,
discard if not={gamma}{0.05},
discard if gt={SNR}{9}]
{res/proximal_ber_fer_dfr_20433484.csv};
\addlegendentry{DFR}
\end{axis}
\end{tikzpicture}
\fi

\caption{FER, DFR, and BER for a $\left( 3, 6 \right)$-regular LDPC code with
$n=204, k=102$ \cite[\text{204.33.484}]{mackay}.
Parameters used for simulation: $\gamma =0.05,\omega = 0.05,
\eta = 1.5, K=200$.
}
\label{fig:fer vs ber}
\end{figure}%
%

An approach for lowering the FER might then be to add an ``ML-in-the-list''
\cite{ml_in_the_list} step to the decoding process shown in Algorithm
\ref{alg:proximal_decoding}.
This step consists of determining the $N \in \mathbb{N}$ most probably
erroneous bit positions $\mathcal{I}'$, generating a list of $2^N$ codeword candidates out of the current estimate $\hat{\boldsymbol{c}}$ with bits in $\mathcal{I}'$ adopting all possible values, i.e., $\mathcal{L}'=\left\{ \hat{\boldsymbol{c}}'\in\mathbb{F}_2^n: \hat{c}'_i=\hat{c}_i, i\notin \mathcal{I}'\text{ and } \hat{c}'_i\in\mathbb{F}_2, i\in \mathcal{I}' \right\}$, and performing ML decoding on this list.
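For the value $N=8$ used in the simulations below, this list comprises $2^8 = 256$ candidates.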

This approach crucially relies on identifying the most probably erroneous bits.
Therefore, the convergence properties of proximal decoding are investigated.
Considering (\ref{eq:r_update}) and (\ref{eq:s_update}), Fig.
\ref{fig:grad} shows the two gradients along which the minimization is
performed for a repetition code with $n=2$.
It is apparent that a net movement will result as long as the two gradients
have a common component.
As soon as this common component is exhausted, they will work in opposing
directions, resulting in an oscillation of the estimate.
This behavior supports the conjecture that the reason for the high DFR is a
failure to converge to the correct codeword in the final steps of the
optimization process.%
%
\begin{figure}[t]
\centering

\ifoverleaf
\includegraphics{figs/letter-figure1.pdf}
\else
\begin{tikzpicture}
\begin{axis}[xmin = -1.25, xmax=1.25,
ymin = -1.25, ymax=1.25,
xlabel={$\tilde{x}_1$},
ylabel={$\tilde{x}_2$},
y label style={at={(axis description cs:-0.06,0.5)},anchor=south},
width=\figwidth,
height=\figheight,
grid=major, grid style={dotted},
view={0}{90}]
\addplot3[point meta=\thisrow{grad_norm},
point meta min=1,
point meta max=2.5,
quiver={u=\thisrow{grad_0},
v=\thisrow{grad_1},
scale arrows=.05,
every arrow/.append style={%
line width=.3
+\pgfplotspointmetatransformed/1000,
-{Latex[length=0pt 5,width=0pt 3]}
},
},
quiver/colored = {mapped color},
-stealth,
]
table[col sep=comma] {res/2d_grad_L.csv};
\end{axis}
\begin{axis}[hide axis,
width=\figwidth,
height=\figheight,
xmin=10, xmax=50,
ymin=0, ymax=0.4,
legend style={draw=white!15!black,
legend cell align=left,
empty legend,
at={(0.9775,0.97)},anchor=north east}]
\addlegendimage{mark=none}
\addlegendentry{
$\nabla L\left(\boldsymbol{y}
\mid \tilde{\boldsymbol{x}}\right)$
};
\end{axis}
\end{tikzpicture}
\fi

\vspace{3mm}

\ifoverleaf
\includegraphics{figs/letter-figure2.pdf}
\else
\begin{tikzpicture}
\begin{axis}[xmin = -1.25, xmax=1.25,
ymin = -1.25, ymax=1.25,
width=\figwidth,
height=\figheight,
xlabel={$\tilde{x}_1$},
ylabel={$\tilde{x}_2$},
y label style={at={(axis description cs:-0.06,0.5)},anchor=south},
grid=major, grid style={dotted},
view={0}{90}]
\addplot3[point meta=\thisrow{grad_norm},
point meta min=1,
point meta max=7,
quiver={u=\thisrow{grad_0},
v=\thisrow{grad_1},
scale arrows=.03,
every arrow/.append style={%
line width=.5
+\pgfplotspointmetatransformed/1000,
-{Latex[length=0pt 5,width=0pt 3]}
},
},
quiver/colored = {mapped color},
-stealth,
]
table[col sep=comma] {res/2d_grad_h.csv};
\end{axis}
\begin{axis}[hide axis,
width=\figwidth,
height=\figheight,
xmin=10, xmax=50,
ymin=0, ymax=0.4,
legend style={draw=white!15!black,
legend cell align=left,
empty legend,
at={(0.9775,0.97)},anchor=north east}]
\addlegendimage{mark=none}
\addlegendentry{$\nabla h\left(\tilde{\boldsymbol{x}}\right)$};
\end{axis}
\end{tikzpicture}
\fi
\caption{Gradients
$\nabla L\left(\boldsymbol{y} \mid \tilde{\boldsymbol{x}}\right)$
and $\nabla h \left( \tilde{\boldsymbol{x}} \right)$ for a repetition
code with $n=2$.
Shown for $\boldsymbol{y} = \begin{pmatrix} -0.5 & 0.8 \end{pmatrix}$.
}
\label{fig:grad}
\end{figure}%
%

In Fig. \ref{fig:prox:convergence_large_n}, we consider only component
$\left(\tilde{\boldsymbol{x}}\right)_1$ of the estimate during a
decoding operation for the LDPC code also used for Fig. \ref{fig:fer vs ber}.
Two properties may be observed.
First, we observe that the average absolute values of the two gradients are equal;
however, they have opposing signs,
leading to the aforementioned oscillation.
Second, the gradient of the code-constraint polynomial itself starts to
oscillate after a certain number of iterations.%
%
\begin{figure}
\centering

\ifoverleaf
\includegraphics{figs/letter-figure3.pdf}
\else
\begin{tikzpicture}
\begin{axis}[
grid=both,
xlabel={Iterations},
width=\figwidth,
height=\figheight,
xtick={0, 100, ..., 400},
xticklabels={0, 50, ..., 200},
xmin=0, xmax=300,
ymin=-4, ymax=2,
ytick={-4,-3,...,2},
legend pos = south east,
]
\addplot+ [mark=none, line width=1]
table [col sep=comma, x=k, y=comb_r_s_0,
discard if gt={k}{300}]
{res/extreme_components_20433484_combined.csv};
\addplot+ [mark=none, line width=1]
table [col sep=comma, x=k, y=grad_L_0,
discard if gt={k}{300}]
{res/extreme_components_20433484_combined.csv};
\addplot+ [mark=none, line width=1]
table [col sep=comma, x=k, y=grad_h_0,
discard if gt={k}{300}]
{res/extreme_components_20433484_combined.csv};
\addlegendentry{$\left(\tilde{\boldsymbol{x}}\right)_1$}
\addlegendentry{$\left(\nabla L\right)_1$}
\addlegendentry{$\left(\nabla h\right)_1$}
\end{axis}
\end{tikzpicture}
\fi

\caption{Visualization of component $\left(\tilde{\boldsymbol{x}}\right)_1$
for a decoding operation for a $\left( 3, 6 \right)$-regular LDPC code with
$n=204, k=102$ \cite[\text{204.33.484}]{mackay}.
Parameters used for simulation: $\gamma = 0.05, \omega = 0.05,
\eta = 1.5, E_b/N_0 = \SI{4}{dB}$.
}
\label{fig:prox:convergence_large_n}
\end{figure}%

%%%%%%%%%%%%%%%%%%%%%
\subsection{Improvement Using ``ML-in-the-List'' Step}

Considering the magnitude of the oscillation of the gradient of the code-constraint
polynomial, some interesting behavior may be observed. Let $\boldsymbol{i}'=(i'_1, \ldots, i'_n)$ be a permutation of $\{1,\ldots, n\}$ such that $\left(\nabla h\right)_{i'}$ is arranged according to increasing variance of the oscillation of its magnitude, i.e., $\text{Var}_\text{iter}(|\left(\nabla h\right)_{i'_1}|)\leq \cdots \leq \text{Var}_\text{iter}(|\left(\nabla h\right)_{i'_n}|)$, with $\text{Var}_\text{iter}(\cdot)$ denoting the empirical variance along the iterations.

Based on this ordering, Fig. \ref{fig:p_error} shows Monte Carlo estimates of the probability that the decoded bit $\hat{c}_{i'}$ at position $i'$ of the estimated codeword
is wrong. %, when the components of
%$\boldsymbol{c}$ are ordered from smallest to largest oscillation of
%$\left(\nabla h\right)_i$.
It can be observed that lower magnitudes of oscillation correlate with a higher probability that the corresponding bit was not decoded correctly.
Thus, this magnitude might be used as a feasible indicator
%for determining the probability that a given component was decoded incorrectly and, thus,
for identifying erroneously decoded bit positions as $\mathcal{I}'=\{i'_1, \ldots, i'_N\}$.%
%
\begin{figure}[H]
\centering

\ifoverleaf
\includegraphics{figs/letter-figure4.pdf}
\else
\begin{tikzpicture}
\begin{axis}[
grid=both,
ylabel=$P(\hat{c}_{i'} \ne c_{i'})$,
xlabel=$i'$,
ymode=log,
ymin=8e-9,ymax=1e-5,
xmin=0,xmax=200,
width=0.95\figwidth,
height=\figheight,
]
\addplot+ [scol0, mark=none, line width=1]
table [col sep=comma, y=p_error]{res/p_error.csv};
\end{axis}
\end{tikzpicture}
\fi

\caption{Probability that a component of the estimated codeword
$\boldsymbol{\hat{c}}\in \mathbb{F}_2^n$ is erroneous for a $\left( 3, 6 \right)$-regular
LDPC code with $n=204, k=102$ \cite[\text{204.33.484}]{mackay}.
Indices $i'$ are ordered according to increasing $\text{Var}_\text{iter}(|\left(\nabla h\right)_{i'}|)$.
Parameters used for the simulation: $\gamma = 0.05, \omega = 0.05,
\eta = 1.5, E_b/N_0 = \SI{4}{dB}$.
Simulated with $\num{100000000}$ transmissions of the all-zeros codeword.}
\label{fig:p_error}
\end{figure}

The complete improved algorithm is given in Algorithm \ref{alg:improved}.
First, the proximal decoding algorithm is applied.
If a valid codeword has been reached, i.e., if the algorithm has converged,
we return this solution.
Otherwise, $N \in \mathbb{N}$ components are selected based on the criterion
presented above.
Starting from the estimate $\boldsymbol{\hat{c}} \in \mathbb{F}_2^n$ resulting from proximal decoding,
the list $\mathcal{L}'$ of codeword candidates with bits in $\mathcal{I}'$ modified is
generated and an ``ML-in-the-list'' step (Algorithm \ref{alg:ml-in-the-list}) is performed.

\begin{algorithm}
\caption{Improved proximal decoding algorithm.
}
\label{alg:improved}

\begin{algorithmic}
\STATE $\boldsymbol{s} \leftarrow \boldsymbol{0}$
\STATE \textbf{for} $K$ iterations \textbf{do}
\STATE \hspace{5mm} $\boldsymbol{r} \leftarrow \boldsymbol{s} - \omega \left( \boldsymbol{s} - \boldsymbol{y} \right) $
\STATE \hspace{5mm} $\boldsymbol{s} \leftarrow \Pi_\eta \left(\boldsymbol{r} - \gamma \nabla h\left( \boldsymbol{r} \right) \right)$
\STATE \hspace{5mm} $\boldsymbol{\hat{c}} \leftarrow \mathbbm{1}_{ \left\{ \boldsymbol{s} \le 0 \right\}}$
\STATE \hspace{5mm} \textbf{if} $\boldsymbol{H}\boldsymbol{\hat{c}}^\text{T} = \boldsymbol{0}$ \textbf{then}
\STATE \hspace{10mm} \textbf{return} $\boldsymbol{\hat{c}}$
\STATE \hspace{5mm} \textbf{end if}
\STATE \textbf{end for}
\STATE $\textcolor{KITblue}{\text{$\mathcal{I}'\leftarrow \{i'_1,\ldots, i'_N\}$ (indices of the $N$ most probably erroneous bits)
%$\mathcal{I} = \{i_1,\ldots,i_N\}$
}
}$

\STATE $\textcolor{KITblue}{\text{%Generate candidates
$\mathcal{L}'\leftarrow\left\{ \boldsymbol{\hat{c}}'\in\mathbb{F}_2^n: \hat{c}'_i=\hat{c}_i, i\notin \mathcal{I}' \text{ and } \hat{c}'_i\in\mathbb{F}_2, i\in \mathcal{I}' \right\}
%\left(\boldsymbol{\hat{c}}_{l}\right)_{l=1}^{2^N}
$
%by varying bits in $\mathcal{I}$
}}
$\vspace{1mm}
%\STATE \hspace{20mm} \textcolor{KITblue}{(list of codeword candidates)}
\STATE $\textcolor{KITblue}{\textbf{return ML\textunderscore in\textunderscore the\textunderscore list}\left(
%\left(\boldsymbol{\hat{c}}_l\right)_{1=1}^{2^N}
\mathcal{L}'
\right)}$
\end{algorithmic}
\end{algorithm}

\begin{algorithm}
\caption{ML-in-the-List algorithm.}
\label{alg:ml-in-the-list}

\begin{algorithmic}
\STATE $\mathcal{L}'_\text{valid} \leftarrow \{ \boldsymbol{\hat{c}}'\in\mathcal{L}': \boldsymbol{H}{\boldsymbol{\hat{c}}'}^\text{T}=\boldsymbol{0}\}$ (select valid codewords)
% Find valid codewords within $\mathcal{L}'$
%under $\left(\boldsymbol{\hat{c}}_{l}\right)_{1=1}^{2^N}$
\STATE \textbf{if} $\mathcal{L}'_\text{valid}\neq\emptyset$ \textbf{then}
\STATE \hspace{5mm}
\textbf{return} $\arg\max \{ \langle 1-2\boldsymbol{\hat{c}}'_l, \boldsymbol{y} \rangle : \boldsymbol{\hat{c}}'_l\in\mathcal{L}'_\text{valid}\}$
%Compute $\langle \boldsymbol{\hat{c}}'_l, \boldsymbol{\hat{c}} \rangle$ for all variations $\boldsymbol{\hat{c}}'_l\in\mathcal{L}$
\STATE \textbf{else}
\STATE \hspace{5mm}
\textbf{return} $\arg\max \{ \langle 1-2 \boldsymbol{\hat{c}}'_l, \boldsymbol{y} \rangle : \boldsymbol{\hat{c}}'_l\in\mathcal{L}'\}$
%Compute $\langle \boldsymbol{\hat{c}}'_l, \boldsymbol{\hat{c}} \rangle$ for valid codewords $\boldsymbol{\hat{c}}'_l\in\mathcal{L}$
\STATE \textbf{end if}
\end{algorithmic}
\end{algorithm}%
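
Note that all candidate BPSK vectors $1-2\boldsymbol{\hat{c}}'_l$ have the same
norm $\sqrt{n}$, so maximizing the correlation
$\langle 1-2\boldsymbol{\hat{c}}'_l, \boldsymbol{y} \rangle$ is equivalent to
minimizing the Euclidean distance
$\lVert \boldsymbol{y} - (1-2\boldsymbol{\hat{c}}'_l) \rVert_2$, i.e., to the
nearest-neighbor (ML) decision on the list for the AWGN channel.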
%


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Simulation Results \& Discussion}

Fig. \ref{fig:results} shows the FER and BER of
proximal decoding as presented in \cite{proximal_paper} and of the improved
algorithm presented in this work, both applied to a $\left( 3,6 \right)$-regular LDPC
code with $n=204$ and $k=102$ \cite[204.33.484]{mackay}.
The parameters chosen for the simulation are
$\gamma = 0.05, \omega=0.05, \eta=1.5, K=200$.
Again, these parameters were chosen
as a preliminary examination
showed that they provide the best results for proximal decoding as well as
for the improved algorithm.
All points were generated by simulating until at least 100 frame errors were observed.
The number of possibly wrong components was set to $N=8$,
since this provides a reasonable gain without requiring an unreasonable amount
of memory and computational resources.
%
\begin{figure}
\centering

\ifoverleaf
\includegraphics{figs/letter-figure5.pdf}
\else
\newcommand{\lineintext}[1]{%
\begin{tikzpicture}
\draw[#1] (0,0) -- (1.5em,0);

% Dummy node taking up the space of a letter to fix spacing
\node[outer sep=0, inner sep=0] () at (0.75em,0) {\phantom{a}};
\end{tikzpicture}%
}

\begin{tikzpicture}
\begin{axis}[
grid=both,
xlabel={$E_\text{b} / N_0$ (dB)},
ymode=log,
xmin=1, xmax=8,
ymax=1, ymin=1e-6,
width=\figwidth,
height=\figheight,
legend pos=north east,
ylabel={BER (\lineintext{}), FER (\lineintext{dashed})},
]
\addplot+[FERPlot, mark=o, mark options={solid}, scol0, forget plot]
table [x=SNR, y=FER, col sep=comma,
discard if gt={SNR}{9}]
{res/bp_20433484.csv};

\addplot+[BERPlot, mark=*, scol0]
table [x=SNR, y=BER, col sep=comma,
discard if gt={SNR}{7.5}]
{res/bp_20433484.csv};
\addlegendentry{BP};

\addplot+[FERPlot, mark=o, mark options={solid}, scol1, forget plot]
table [x=SNR, y=FER, col sep=comma,
discard if not={gamma}{0.05},
discard if gt={SNR}{9}]
{res/proximal_ber_fer_dfr_20433484.csv};

\addplot+[BERPlot, mark=*, scol1]
table [x=SNR, y=BER, col sep=comma,
discard if not={gamma}{0.05},
discard if gt={SNR}{7.5}]
{res/proximal_ber_fer_dfr_20433484.csv};
\addlegendentry{Prox. dec.};

\addplot+[FERPlot, mark=triangle, mark options={solid}, scol2, forget plot]
table [x=SNR, y=FER, col sep=comma,
discard if not={gamma}{0.05},
discard if gt={SNR}{7.5}]
{res/improved_ber_fer_dfr_20433484.csv};

\addplot+[BERPlot, mark=triangle*, scol2]
table [x=SNR, y=BER, col sep=comma,
discard if not={gamma}{0.05},
discard if gt={SNR}{6.5}]
{res/improved_ber_fer_dfr_20433484.csv};
\addlegendentry{Improved};
\end{axis}
\end{tikzpicture}
\fi

\caption{FER and BER of proximal decoding \cite{proximal_paper} and the
improved algorithm for a $\left( 3, 6 \right)$-regular LDPC code with
$n=204, k=102$ \cite[\text{204.33.484}]{mackay}.
Parameters used for simulation: $\gamma=0.05, \omega=0.05, \eta=1.5,
K=200, N=8$.
}

\label{fig:results}
\end{figure}%
%

A noticeable improvement can be observed in both the FER and the BER.
The gain varies significantly
with the SNR, which is to be expected since higher SNR values result in a decreased number
of bit errors, making the correction of those errors in the
``ML-in-the-list'' step more likely.
For an FER of $10^{-6}$, the gain is approximately $\SI{1}{dB}$.
Similar behavior was observed with a number of different codes, e.g., \cite[\text{PEGReg252x504, 204.55.187, 96.3.965}]{mackay}.
Furthermore, no immediate relationship between the code length and the gain was observed
during our examinations.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Conclusion}

In this paper, an improvement on proximal decoding as presented by
Wadayama \textit{et al.} \cite{proximal_paper} is introduced for AWGN channels.
It relies on the observation that most errors in proximal decoding stem
from only a few components of the estimate being wrong.
These few erroneous components can mostly be corrected by appending an
additional step to the original algorithm that is only executed if the
algorithm has not converged.
A gain of up to $\SI{1}{dB}$ can be observed, depending on the code,
the parameters considered, and the SNR.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Acknowledgements}

This work has received funding in part from the European Research Council
(ERC) under the European Union’s Horizon 2020 research and innovation
programme (grant agreement No. 101001899) and in part from the German Federal
Ministry of Education and Research (BMBF) within the project Open6GHub
(grant agreement 16KISK010).


%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Bibliography
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%


\printbibliography

\end{document}