Continued writing theoretical comparison of admm and proximal decoding

This commit is contained in:
Andreas Tsouchlos 2023-04-04 18:16:44 +02:00
parent 2ade886191
commit 70eac9515f


@@ -43,10 +43,10 @@ proximal operators.
They both follow an iterative approach consisting of two
alternating steps.
In both cases each step minimizes one distinct part of the objective function.
They do, however, have some fundamental differences.
In figure \ref{fig:ana:theo_comp_alg} the two algorithms are juxtaposed in their
proximal operator form, in conjunction with the optimization problems they
are meant to solve.%
%
\begin{figure}[H]
\centering
@@ -86,8 +86,7 @@ return $\boldsymbol{s}$
\text{minimize}\hspace{5mm} &
\underbrace{\boldsymbol{\gamma}^\text{T}\tilde{\boldsymbol{c}}}
_{\text{Likelihood}}
+ \underbrace{g\left( \boldsymbol{T}\tilde{\boldsymbol{c}} \right) }
_{\text{Constraints}} \\
\text{subject to}\hspace{5mm} &
\tilde{\boldsymbol{c}} \in \mathbb{R}^n
@@ -102,10 +101,12 @@ Initialize $\tilde{\boldsymbol{c}}, \boldsymbol{z}, \boldsymbol{u}, \boldsymbol{
while stopping criterion not satisfied do
$\tilde{\boldsymbol{c}} \leftarrow \textbf{prox}_{
\scaleto{\nu \cdot \boldsymbol{\gamma}^{\text{T}}\tilde{\boldsymbol{c}}}{8.5pt}}
\left( \tilde{\boldsymbol{c}}
- \frac{\mu}{\lambda}\boldsymbol{T}^\text{T}\left( \boldsymbol{T}\tilde{\boldsymbol{c}}
- \boldsymbol{z} + \boldsymbol{u} \right) \right)$
$\boldsymbol{z} \leftarrow \textbf{prox}_{\scaleto{g}{7pt}}
\left( \boldsymbol{T}\tilde{\boldsymbol{c}}
+ \boldsymbol{u} \right)$
$\boldsymbol{u} \leftarrow \boldsymbol{u}
+ \boldsymbol{T}\tilde{\boldsymbol{c}} - \boldsymbol{z}$
end while
@@ -121,35 +122,37 @@ return $\tilde{\boldsymbol{c}}$
\label{fig:ana:theo_comp_alg}
\end{figure}%
%
\noindent The objective functions of both problems are similar in that they
both comprise two parts: one associated with the likelihood that a given
codeword was sent and one associated with the constraints the codeword is
subject to.
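One way to make the link between this objective and the \ac{ADMM} iteration in
figure \ref{fig:ana:theo_comp_alg} explicit is the variable splitting sketched
below; this is only a sketch, under the assumption that $\boldsymbol{z}$ acts
as an auxiliary copy of $\boldsymbol{T}\tilde{\boldsymbol{c}}$:
\begin{align*}
    \text{minimize}\hspace{5mm}   & \boldsymbol{\gamma}^\text{T}\tilde{\boldsymbol{c}}
                                    + g\left( \boldsymbol{z} \right) \\
    \text{subject to}\hspace{5mm} & \boldsymbol{T}\tilde{\boldsymbol{c}} = \boldsymbol{z},
\end{align*}
whose scaled-form \ac{ADMM} updates then alternate between
$\tilde{\boldsymbol{c}}$, $\boldsymbol{z}$ and the dual variable
$\boldsymbol{u}$.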
Their major difference is that proximal decoding treats the constraints in a
global context, considering all parity checks at the same time in the second
step, whereas \ac{ADMM} considers each parity check separately, in a more
local context (line 4 in both algorithms).
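The local treatment with \ac{ADMM} is possible because the constraint term
separates over the individual parity checks; written out, and assuming that
each $g_j$ is the indicator function of the parity polytope
$\mathcal{P}_{d_j}$, this structure reads
\begin{equation*}
    g\left( \boldsymbol{T}\tilde{\boldsymbol{c}} \right)
        = \sum_{j\in\mathcal{J}} g_j\left( \boldsymbol{T}_j\tilde{\boldsymbol{c}} \right),
    \qquad
    g_j\left( \boldsymbol{v} \right) =
    \begin{cases}
        0       & \boldsymbol{v} \in \mathcal{P}_{d_j} \\
        +\infty & \text{otherwise,}
    \end{cases}
\end{equation*}
so that the proximal operator of $g$ can be evaluated check by check.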
This difference means that while with proximal decoding the alternating
minimization of the two parts of the objective function inevitably leads to
oscillatory behaviour (as explained in section (TODO)), this is not the
case with \ac{ADMM}, which partly explains the disparate decoding performance
of the two methods.
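As a purely illustrative one-dimensional example of this effect (not taken
from either decoder): if the likelihood part is $f(x) = a x$ with $a > 0$ and
the constraint part simply forces $x$ back to some fixed point $x_0$, then
alternating a gradient step on $f$ with step size $\eta$ and an exact
minimization of the constraint part produces the sequence
\begin{equation*}
    x_0 \;\rightarrow\; x_0 - \eta a \;\rightarrow\; x_0
        \;\rightarrow\; x_0 - \eta a \;\rightarrow\; \dots,
\end{equation*}
which keeps bouncing between two values instead of converging.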
Furthermore, while with proximal decoding the step handling the constraints
is realized using gradient descent, amounting to an approximation,
with \ac{ADMM} it reduces to a number of projections onto the parity polytopes
$\mathcal{P}_{d_j}$ (see
\ref{chapter:LD Decoding using ADMM as a Proximal Algorithm}),
which always yield exact results.
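For completeness, the reason the \ac{ADMM} constraint step is exact can be
sketched as follows: if $g_j$ is indeed the indicator function of
$\mathcal{P}_{d_j}$ as assumed above, its proximal operator is simply the
Euclidean projection onto that polytope (denoted $\Pi_{\mathcal{P}_{d_j}}$
here),
\begin{equation*}
    \textbf{prox}_{g_j}\left( \boldsymbol{v} \right)
        = \operatorname*{arg\,min}_{\boldsymbol{x}}
          \left( g_j\left( \boldsymbol{x} \right)
          + \frac{1}{2} \left\lVert \boldsymbol{x} - \boldsymbol{v} \right\rVert^2_2 \right)
        = \Pi_{\mathcal{P}_{d_j}}\left( \boldsymbol{v} \right),
\end{equation*}
which is computed exactly rather than approximated by a gradient step.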
The contrasting treatment of the constraints (global and approximate with
proximal decoding, local and exact with \ac{ADMM}) also leads to different
prospects when the decoding process gets stuck in a local minimum.
With proximal decoding this occurs due to the approximate nature of the
calculation, whereas with \ac{ADMM} it occurs due to the approximate
formulation of the constraints, independently of the optimization method
itself.
The resulting advantage of \ac{ADMM} is that getting stuck can easily be
detected: the algorithm returns a pseudocodeword, the components of which
are fractional.
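A simple criterion of this kind, given here only as an illustrative sketch
with a hypothetical tolerance $\epsilon$, would be to declare the output
$\tilde{\boldsymbol{c}}$ a pseudocodeword whenever
\begin{equation*}
    \max_{i} \, \min\left( \left| \tilde{c}_i \right|,
        \left| 1 - \tilde{c}_i \right| \right) > \epsilon,
\end{equation*}
i.e.\ whenever at least one component is not close to either $0$ or $1$.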
\begin{itemize}
\item The comparison of actual implementations is always debatable /