Reworked proximal decoding

Andreas Tsouchlos 2023-02-19 14:01:49 +01:00
parent 51bec2b5a7
commit 252736ff31


@@ -34,8 +34,8 @@ The goal is to arrive at a formulation, where a certain objective function
$f$ has to be minimized under certain constraints:%
%
\begin{align*}
\text{minimize}\hspace{2mm} &f\left( \boldsymbol{c} \right)\\
\text{subject to}\hspace{2mm} &\boldsymbol{c} \in D
,\end{align*}%
%
where $D$ is the domain of values attainable for $c$ and represents the
@@ -256,7 +256,7 @@ the transfer matrix would be $\boldsymbol{T}_j =
0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 \\
\end{bmatrix} $ (example taken from \cite[Sec. II, A]{efficient_lp_dec_admm})}
(i.e. the relevant components of $\boldsymbol{c}$ for parity-check $j$)
and $\mathcal{P}_{d}$ is the \textit{check polytope}, the convex hull of all
binary vectors of length $d$ with even parity%
@@ -274,6 +274,22 @@ Figures \ref{fig:dec:poly:local1} and \ref{fig:dec:poly:local2} show the local
codeword polytopes of each check node.
Their intersection, the relaxed codeword polytope $\overline{Q}$, is shown in
figure \ref{fig:dec:poly:relaxed}.
It can be seen that the relaxed codeword polytope $\overline{Q}$ introduces
vertices with fractional values;
these represent erroneous non-codeword solutions to the linear program and
correspond to the so-called \textit{pseudocodewords} introduced in
\cite{feldman_paper}.
However, since for \ac{LDPC} codes the number of constraints describing
$\overline{Q}$ grows only linearly with $n$ instead of exponentially, the
relaxation is far more tractable for practical applications.
The resulting formulation of the relaxed optimization problem is the following:%
%
\begin{align*}
\text{minimize}\hspace{2mm} &\sum_{i=1}^{n} \gamma_i c_i \\
\text{subject to}\hspace{2mm} &\boldsymbol{T}_j \boldsymbol{c} \in \mathcal{P}_{d_j},
\hspace{5mm}j\in\mathcal{J}
.\end{align*}%
%
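As a concrete illustration, the following Python sketch solves this relaxed
linear program for a small, made-up parity-check matrix with SciPy's
\texttt{linprog} routine; membership in each local check polytope is expressed
through the standard ``forbidden set'' inequalities (cf. \cite{feldman_paper}),
and both the matrix and the cost vector are arbitrary example values.
\begin{verbatim}
import itertools
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy parity-check matrix (m = 3 checks, n = 7 bits).
H = np.array([[1, 1, 0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]])

def lp_decode(gamma, H):
    """Relaxed LP decoding: minimize sum_i gamma_i * c_i subject to c lying
    in every local check polytope, described by the inequalities
    (sum over V of c_i) - (sum over N(j) without V of c_i) <= |V| - 1
    for every odd-sized subset V of the check's neighbourhood N(j)."""
    m, n = H.shape
    A_ub, b_ub = [], []
    for j in range(m):
        neigh = np.flatnonzero(H[j])                  # N(j)
        for size in range(1, len(neigh) + 1, 2):      # odd |V| only
            for V in itertools.combinations(neigh, size):
                row = np.zeros(n)
                row[neigh] = -1.0                     # -1 on the other neighbours
                row[list(V)] = 1.0                    # +1 on V
                A_ub.append(row)
                b_ub.append(len(V) - 1)
    res = linprog(gamma, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(0.0, 1.0)] * n)
    return res.x

# Example cost vector (log-likelihood ratios); a fractional optimum would
# correspond to a pseudocodeword.
print(lp_decode(np.array([2.1, -0.4, 1.3, 0.9, 1.7, 0.6, 1.2]), H))
\end{verbatim}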
%
%
% Codeword polytope visualization figure
@@ -566,22 +582,6 @@ figure \ref{fig:dec:poly:relaxed}.
\label{fig:dec:poly}
\end{figure}%
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
@@ -599,14 +599,16 @@ The resulting formulation of the relaxed optimization problem is the following:%
\section{Proximal Decoding}%
\label{sec:dec:Proximal Decoding}
Proximal decoding was proposed by Wadayama et al. as a novel formulation of
optimization-based decoding \cite{proximal_paper}.
With this algorithm, minimization is performed using the proximal gradient
method.
In contrast to \ac{LP} decoding, the objective function is based on a
non-convex optimization formulation of the \ac{MAP} decoding problem.
In order to derive the objective function, the authors begin with the
\ac{MAP} decoding rule, expressed as an optimization problem over the
continuous vector $\boldsymbol{x} \in \mathbb{R}^{n}$:%
%
\begin{align}
\hat{\boldsymbol{x}} = \argmax_{\boldsymbol{x} \in \mathbb{R}^{n}}
@@ -616,19 +618,37 @@ In order to derive the objective function, the authors reformulate the
\left( \boldsymbol{y} \mid \boldsymbol{x} \right)
f_{\boldsymbol{X}}\left( \boldsymbol{x} \right)%
\label{eq:prox:vanilla_MAP}
.\end{align}%
%
The likelihood $f_{\boldsymbol{Y} \mid \boldsymbol{X}}
\left( \boldsymbol{y} \mid \boldsymbol{x} \right) $ is a known function
determined by the channel model.
The prior \ac{PDF} $f_{\boldsymbol{X}}\left( \boldsymbol{x} \right)$ is also
known, as the equal probability assumption is made on
$\mathcal{C}\left( \boldsymbol{H} \right)$.
However, because in this case the considered domain is continuous,
the prior \ac{PDF} cannot be ignored as a constant during the optimization,
as is often done, and it has a rather unwieldy representation:%
%
\begin{align}
f_{\boldsymbol{X}}\left( \boldsymbol{x} \right) =
\frac{1}{\left| \mathcal{C}\left( \boldsymbol{H} \right) \right| }
\sum_{\boldsymbol{c} \in \mathcal{C}\left( \boldsymbol{H} \right) }
\delta\left( \boldsymbol{x} - \left( -1 \right) ^{\boldsymbol{c}}\right)
\label{eq:prox:prior_pdf}
\end{align}%
%
In order to rewrite the prior \ac{PDF}
$f_{\boldsymbol{X}}\left( \boldsymbol{x} \right)$,
the so-called \textit{code-constraint polynomial} is introduced:%
%
\begin{align*}
h\left( \boldsymbol{x} \right) =
\underbrace{\sum_{j=1}^{n} \left( x_j^2-1 \right) ^2}_{\text{Bipolar constraint}}
+ \underbrace{\sum_{i=1}^{m} \left[
\left( \prod_{j\in \mathcal{A}
\left( i \right) } x_j \right) -1 \right] ^2}_{\text{Parity constraint}}%
.\end{align*}%
%
The intention of this function is to provide a way to penalize vectors far
from a codeword and favor those close to a codeword.
@@ -636,69 +656,74 @@ In order to achieve this, the polynomial is composed of two parts: one term
representing the bipolar constraint, providing for a discrete solution of the
continuous optimization problem, and one term representing the parity
constraint, accommodating the role of the parity-check matrix $\boldsymbol{H}$.
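As a small, hypothetical illustration, the polynomial can be evaluated directly
from its definition: for the bipolar image $\left( -1 \right)^{\boldsymbol{c}}$
of a codeword both terms vanish, whereas other vectors are penalized.
The toy parity-check matrix used below is an arbitrary example, not one taken
from \cite{proximal_paper}.
\begin{verbatim}
import numpy as np

# Arbitrary toy parity-check matrix (m = 2 checks, n = 4 bits).
H = np.array([[1, 1, 1, 0],
              [0, 1, 1, 1]])

def code_constraint(x, H):
    """Code-constraint polynomial h(x): bipolar term plus parity term."""
    bipolar = np.sum((x**2 - 1.0) ** 2)
    parity = sum((np.prod(x[np.flatnonzero(row)]) - 1.0) ** 2 for row in H)
    return bipolar + parity

print(code_constraint(np.array([1.0, 1.0, 1.0, 1.0]), H))   # a codeword: 0.0
print(code_constraint(np.array([0.6, 1.0, 1.0, -0.8]), H))  # not a codeword: > 0
\end{verbatim}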
The prior \ac{PDF} is then approximated using the code-constraint polynomial,
a constant $\gamma > 0$, and a normalizing constant $Z$:%
%
\begin{align}
f_{\boldsymbol{X}}\left( \boldsymbol{x} \right)
\approx \frac{1}{Z}e^{-\gamma h\left( \boldsymbol{x} \right) }%
\label{eq:prox:prior_pdf_approx}
.\end{align}%
%
The authors justify this approximation by arguing that for
$\gamma \rightarrow \infty$, the approximation in equation
\ref{eq:prox:prior_pdf_approx} approaches the original function in equation
\ref{eq:prox:prior_pdf}.
This approximation can then be substituted into equation \ref{eq:prox:vanilla_MAP}
and the likelihood can be rewritten using the negative log-likelihood
$L \left( \boldsymbol{y} \mid \boldsymbol{x} \right) = -\ln\left(
f_{\boldsymbol{Y} \mid \boldsymbol{X}}\left(
\boldsymbol{y} \mid \boldsymbol{x} \right) \right) $:%
%
\begin{align*}
\hat{\boldsymbol{x}} &= \argmax_{\boldsymbol{x} \in \mathbb{R}^{n}}
e^{- L\left( \boldsymbol{y} \mid \boldsymbol{x} \right) }
e^{-\gamma h\left( \boldsymbol{x} \right) } \\
&= \argmin_{\boldsymbol{x} \in \mathbb{R}^n} \left(
L\left( \boldsymbol{y} \mid \boldsymbol{x} \right)
+ \gamma h\left( \boldsymbol{x} \right)
\right)%
.\end{align*}%
%
Thus, with proximal decoding, the objective function
$f\left( \boldsymbol{x} \right)$ considered is%
%
\begin{align}
f\left( \boldsymbol{x} \right) = L\left( \boldsymbol{y} \mid \boldsymbol{x} \right)
+ \gamma h\left( \boldsymbol{x} \right)%
\label{eq:prox:objective_function}
\end{align}%
%
and the decoding problem is reformulated to%
%
\begin{align*}
\text{minimize}\hspace{2mm} &L\left( \boldsymbol{y} \mid \boldsymbol{x} \right)
+ \gamma h\left( \boldsymbol{x} \right)\\
\text{subject to}\hspace{2mm} &\boldsymbol{x} \in \mathbb{R}^n
.\end{align*}
%
For the solution of the approximate \ac{MAP} decoding problem, the two parts
of equation \ref{eq:prox:objective_function} are considered separately:
the minimization of the objective function occurs in an alternating
fashion, switching between minimizing the negative log-likelihood
$L\left( \boldsymbol{y} \mid \boldsymbol{x} \right) $ and the scaled
code-constraint polynomial $\gamma h\left( \boldsymbol{x} \right) $.
Two helper variables, $\boldsymbol{r}$ and $\boldsymbol{s}$, are introduced,
describing the result of each of the two steps.
The first step, minimizing the negative log-likelihood, is performed using
gradient descent:%
%
\begin{align}
\boldsymbol{r} \leftarrow \boldsymbol{s} - \omega \nabla
L\left( \boldsymbol{y} \mid \boldsymbol{s} \right),
\hspace{5mm}\omega > 0
\label{eq:prox:step_log_likelihood}
.\end{align}%
%
For the second step, minimizing the scaled code-constraint polynomial, the
proximal gradient method is used and the \textit{proximal operator} of
$\gamma h\left( \boldsymbol{x} \right) $ has to be computed.
It is then immediately approximated by a gradient-descent step:%
%
\begin{align*}
\text{prox}_{\gamma h} \left( \boldsymbol{x} \right) &\equiv
@@ -709,8 +734,7 @@ immediately approximated by a gradient-descent step:%
\hspace{5mm} \gamma \text{ small}
.\end{align*}%
%
The second step thus becomes%
%
\begin{align*}
\boldsymbol{s} \leftarrow \boldsymbol{r} - \gamma \nabla h\left( \boldsymbol{r} \right),
@@ -725,42 +749,19 @@ of the objective function small.
Otherwise, unwanted stationary points, including local minima, are introduced.
The authors state that in practice, the value of $\gamma$ should be adjusted
according to the decoding performance.
%The components of the gradient of the code-constraint polynomial can be computed as follows:%
%%
%\begin{align*}
% \frac{\partial}{\partial x_k} h\left( \boldsymbol{x} \right) =
% 4\left( x_k^2 - 1 \right) x_k + \frac{2}{x_k}
% \sum_{i\in \mathcal{B}\left( k \right) } \left(
% \left( \prod_{j\in\mathcal{A}\left( i \right)} x_j\right)^2
% - \prod_{j\in\mathcal{A}\left( i \right) }x_j \right)
%.\end{align*}%
%\todo{Only multiplication?}%
%\todo{$x_k$: $k$ or some other indexing variable?}%
%%
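The gradient $\nabla h\left( \boldsymbol{x} \right)$ required by the second
step follows directly from the definition of the code-constraint polynomial.
The following Python sketch (an illustrative helper, not the authors'
implementation) evaluates it by using, for every check, the product of all
participating components except $x_k$, which avoids a division by $x_k$:
\begin{verbatim}
import numpy as np

# Same arbitrary toy parity-check matrix as in the earlier sketch.
H = np.array([[1, 1, 1, 0],
              [0, 1, 1, 1]])

def grad_code_constraint(x, H):
    """Gradient of h(x), i.e. of the bipolar term plus the parity term."""
    grad = 4.0 * (x**2 - 1.0) * x                # derivative of the bipolar term
    for row in H:
        idx = np.flatnonzero(row)                # A(i): variables of check i
        prod_all = np.prod(x[idx])
        for k in idx:
            # d/dx_k of (prod - 1)^2 = 2 * (prod - 1) * (product of the other x_j)
            grad[k] += 2.0 * (prod_all - 1.0) * np.prod(x[idx[idx != k]])
    return grad

print(grad_code_constraint(np.array([0.9, -1.1, 1.0, 0.2]), H))
\end{verbatim}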
In the case of \ac{AWGN}, the likelihood
$f_{\boldsymbol{Y} \mid \boldsymbol{X}}\left( \boldsymbol{y} \mid \boldsymbol{x} \right)$
is%
@@ -778,12 +779,50 @@ it suffices to consider only the proportionality instead of the equality.}%
\nabla L \left( \boldsymbol{y} \mid \boldsymbol{x} \right)
&\propto -\nabla \lVert \boldsymbol{y} - \boldsymbol{x} \rVert^2\\
&\propto \boldsymbol{x} - \boldsymbol{y}
,\end{align*}%
%
allowing equation \ref{eq:prox:step_log_likelihood} to be rewritten as%
%
\begin{align*}
\boldsymbol{r} \leftarrow \boldsymbol{s}
- \omega \left( \boldsymbol{s} - \boldsymbol{y} \right)
.\end{align*}
%
One thing to consider during the actual decoding process is that the gradient
of the code-constraint polynomial can take on extremely large values.
In order to avoid numerical instability, an additional step is added, in which
all components of the current estimate are clipped to $\left[-\eta, \eta \right]$,
where $\eta$ is a positive constant slightly larger than one:%
%
\begin{align*}
\boldsymbol{s} \leftarrow \Pi_{\eta} \left( \boldsymbol{r}
- \gamma \nabla h\left( \boldsymbol{r} \right) \right)
,\end{align*}
%
with $\Pi_{\eta}\left( \cdot \right) $ denoting the projection onto
$\left[ -\eta, \eta \right]^n$.
The iterative decoding process resulting from these considerations is shown in
figure \ref{fig:prox:alg}.
\begin{figure}[H]
\centering
\begin{genericAlgorithm}[caption={}, label={}]
$\boldsymbol{s} \leftarrow \boldsymbol{0}$
for $K$ iterations do
$\boldsymbol{r} \leftarrow \boldsymbol{s} - \omega \left( \boldsymbol{s} - \boldsymbol{y} \right) $
$\boldsymbol{s} \leftarrow \Pi_\eta \left(\boldsymbol{r} - \gamma \nabla h\left( \boldsymbol{r} \right) \right)$
$\boldsymbol{\hat{x}} \leftarrow \text{sign}\left( \boldsymbol{s} \right) $
$\boldsymbol{\hat{c}} \leftarrow \left( \boldsymbol{1} - \boldsymbol{\hat{x}} \right) / 2$
if $\boldsymbol{H}\boldsymbol{\hat{c}} = \boldsymbol{0}$ do
return $\boldsymbol{\hat{c}}$
end if
end for
return $\boldsymbol{\hat{c}}$
\end{genericAlgorithm}
\caption{Proximal decoding algorithm}
\label{fig:prox:alg}
\end{figure}
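Purely as an illustration of figure \ref{fig:prox:alg}, the following Python
sketch implements the loop for the \ac{AWGN} case; it reuses the hypothetical
\texttt{grad\_code\_constraint} helper and toy matrix from the earlier sketch,
and the parameter values are arbitrary placeholders rather than the ones
recommended in \cite{proximal_paper}.
\begin{verbatim}
import numpy as np

# grad_code_constraint and H are the helpers defined in the sketch above.

def proximal_decode(y, H, gamma=0.05, omega=0.5, eta=1.2, K=200):
    """Proximal decoding under AWGN, following the algorithm in the figure."""
    s = np.zeros(H.shape[1])
    c_hat = np.zeros(H.shape[1], dtype=int)
    for _ in range(K):
        r = s - omega * (s - y)                     # gradient step on L(y|x)
        # gradient step on gamma * h, clipped to [-eta, eta]^n for stability
        s = np.clip(r - gamma * grad_code_constraint(r, H), -eta, eta)
        x_hat = np.sign(s)                          # tentative bipolar estimate
        c_hat = (x_hat < 0).astype(int)             # map +1 -> 0, -1 -> 1
        if not np.any((H @ c_hat) % 2):             # all parity checks satisfied
            break
    return c_hat

# Example: the all-zero codeword (all +1 in bipolar form) plus noise.
rng = np.random.default_rng(0)
y = np.ones(H.shape[1]) + 0.4 * rng.standard_normal(H.shape[1])
print(proximal_decode(y, H))
\end{verbatim}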