
\chapter{Analysis of Results}%
\label{chapter:Analysis of Results}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{LP Decoding using ADMM}%
\label{sec:ana:LP Decoding using ADMM}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Proximal Decoding}%
\label{sec:ana:Proximal Decoding}

\begin{itemize}
    \item Parameter choice
    \item FER
    \item Improved implementation
\end{itemize}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Comparison of BP, Proximal Decoding and LP Decoding using ADMM}%
\label{sec:ana:Comparison of BP, Proximal Decoding and LP Decoding using ADMM}

\begin{itemize}
    \item Decoding performance
    \item Complexity \& runtime (mention difficulty in reaching conclusive
        results when comparing implementations)
\end{itemize}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Theoretical Comparison of Proximal Decoding and LP Decoding using ADMM}%
\label{sec:Theoretical Comparison of Proximal Decoding and LP Decoding using ADMM}

In this section, some similarities between the proximal decoding algorithm
and \ac{LP} decoding using \ac{ADMM} are pointed out.
The two algorithms are compared and the differences in their computational
and decoding performance are explained on the basis of their theoretical
structure.

\ac{ADMM} and the proximal gradient method can both be expressed in terms of
proximal operators.
Both are iterative methods that alternate between two minimization steps,
each of which minimizes one distinct part of the objective function;
\ac{ADMM} additionally updates the variable $\boldsymbol{u}$ in every
iteration.
The approaches they are based on, however, are fundamentally different.
In figure~\ref{fig:ana:theo_comp_alg} the two algorithms are juxtaposed in
their proximal operator form, together with the optimization problems they
are meant to solve.%
%
\begin{figure}[H]
    \centering

    \begin{subfigure}{0.48\textwidth}
        \centering

        \begin{align*}
            \text{minimize}\hspace{2mm}   & \underbrace{L\left( \boldsymbol{y} \mid
                    \tilde{\boldsymbol{x}} \right)}_{\text{Likelihood}}
                    + \underbrace{\gamma h\left( \tilde{\boldsymbol{x}} \right)}
                        _{\text{Constraints}} \\
            \text{subject to}\hspace{2mm} &\tilde{\boldsymbol{x}} \in \mathbb{R}^n
        \end{align*}

        \begin{genericAlgorithm}[caption={}, label={},
            basicstyle=\fontsize{11}{17}\selectfont
            ]
Initialize variables
while stopping criterion not satisfied do
    $\boldsymbol{r} \leftarrow \boldsymbol{r}
        + \omega \nabla L\left( \boldsymbol{y} \mid \boldsymbol{s} \right) $
    $\boldsymbol{s} \leftarrow
        \textbf{prox}_{\scaleto{\gamma h}{7.5pt}}\left( \boldsymbol{r} \right) $|\Suppressnumber|
|\Reactivatenumber|
end while
return $\boldsymbol{s}$
        \end{genericAlgorithm}

        \caption{Proximal gradient method}
        \label{fig:ana:theo_comp_alg:prox}
    \end{subfigure}\hfill%
    \begin{subfigure}{0.48\textwidth}
        \centering

        \begin{align*}
            \text{minimize}\hspace{5mm} &
                \underbrace{\boldsymbol{\gamma}^\text{T}\tilde{\boldsymbol{c}}}
                    _{\text{Likelihood}}
                + \underbrace{\sum_{j\in\mathcal{J}} g_j\left(
                    \boldsymbol{T}_j\tilde{\boldsymbol{c}} \right) }
                    _{\text{Constraints}} \\
            \text{subject to}\hspace{5mm} &
                \tilde{\boldsymbol{c}} \in \mathbb{R}^n
%                \boldsymbol{T}_j\tilde{\boldsymbol{c}} = \boldsymbol{z}_j\hspace{3mm}
%                    \forall j\in\mathcal{J}
        \end{align*}

        \begin{genericAlgorithm}[caption={}, label={},
            basicstyle=\fontsize{11}{17}\selectfont
            ]
Initialize variables
while stopping criterion not satisfied do
    $\tilde{\boldsymbol{c}} \leftarrow \textbf{prox}_{
        \scaleto{\nu \cdot \boldsymbol{\gamma}^{\text{T}}\tilde{\boldsymbol{c}}}{8.5pt}}
        \left( \boldsymbol{z} - \boldsymbol{u} \right) $
    $\boldsymbol{z}_j \leftarrow \textbf{prox}_{\scaleto{g_j}{7pt}}
        \left( \boldsymbol{T}_j\tilde{\boldsymbol{c}}
            + \boldsymbol{T}_j\boldsymbol{u} \right) \hspace{5mm}\forall j\in\mathcal{J}$
    $\boldsymbol{u} \leftarrow \boldsymbol{u}
        + \tilde{\boldsymbol{c}} - \boldsymbol{z}$
end while
return $\tilde{\boldsymbol{c}}$
        \end{genericAlgorithm}

        \caption{\ac{ADMM}}
        \label{fig:ana:theo_comp_alg:admm}
    \end{subfigure}%


    \caption{Comparison of the proximal gradient method and \ac{ADMM}}
    \label{fig:ana:theo_comp_alg}
\end{figure}%
%
\todo{Show how $\tilde{\boldsymbol{c}} \leftarrow \textbf{prox}
    _{1 / \mu \cdot \boldsymbol{\gamma}^{\text{T}}\tilde{\boldsymbol{c}}}
        \left( \boldsymbol{z} - \boldsymbol{u} \right) $
is the same as
$\boldsymbol{\gamma}^\text{T}\tilde{\boldsymbol{c}}
    + \sum_{j\in\mathcal{J}} \boldsymbol{\lambda}^\text{T}_j
    \left( \boldsymbol{T}_j\tilde{\boldsymbol{c}} - \boldsymbol{z}_j \right)
    + \frac{\mu}{2}\sum_{j\in\mathcal{J}}
    \lVert \boldsymbol{T}_j\tilde{\boldsymbol{c}} - \boldsymbol{z}_j \rVert^2_2$}%
%
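\noindent For reference, the proximal operator that appears in both
algorithms can be written out explicitly.
The following is only a sketch under the standard definition of the proximal
operator; the generic function $f$, the scaling $\lambda > 0$ and the
auxiliary vectors $\boldsymbol{v}$ and $\boldsymbol{w}$ are notation
introduced here purely for illustration.
\begin{align*}
    \textbf{prox}_{\lambda f}\left( \boldsymbol{v} \right)
        = \operatorname*{arg\,min}_{\boldsymbol{w}} \left(
            f\left( \boldsymbol{w} \right)
            + \frac{1}{2\lambda}
                \lVert \boldsymbol{w} - \boldsymbol{v} \rVert_2^2 \right)
\end{align*}
For the linear term
$f\left( \boldsymbol{w} \right) = \boldsymbol{\gamma}^\text{T}\boldsymbol{w}$
with $\lambda = \nu$, setting the gradient of the expression in parentheses
to zero yields a closed form:
\begin{align*}
    \boldsymbol{\gamma}
        + \frac{1}{\nu}\left( \boldsymbol{w} - \boldsymbol{v} \right)
        = \boldsymbol{0}
    \hspace{5mm}\Longrightarrow\hspace{5mm}
    \textbf{prox}_{\nu \cdot \boldsymbol{\gamma}^\text{T}\boldsymbol{w}}
        \left( \boldsymbol{v} \right)
        = \boldsymbol{v} - \nu\boldsymbol{\gamma}
\end{align*}
Taken at face value, the first update in
figure~\ref{fig:ana:theo_comp_alg:admm} therefore amounts to shifting
$\boldsymbol{z} - \boldsymbol{u}$ by $-\nu\boldsymbol{\gamma}$.
This holds for the simplified form shown in the figure and does not account
for any scaling introduced by the matrices $\boldsymbol{T}_j$.
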
\noindent The objective functions of both problems are similar in that they
both comprise two parts: one associated with the likelihood that a given
codeword was sent and one associated with the constraints the codeword is
subject to.
Their major difference is that the two parts of the objective minimized with
proximal decoding are both functions of the same variable
$\tilde{\boldsymbol{x}}$, whereas with \ac{ADMM} the two parts are functions
of different variables: $\tilde{\boldsymbol{c}}$ and $\boldsymbol{z}_{[1:m]}$.
As a consequence, the alternating minimization of the two parts of the
objective function inevitably leads to oscillatory behaviour with proximal
decoding (as explained in section (TODO)), whereas this is not the
case with \ac{ADMM}.
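
The separation of variables can be made explicit by writing the \ac{ADMM}
problem in its split form.
The following is a sketch of the usual variable splitting, using the symbols
from figure~\ref{fig:ana:theo_comp_alg:admm}; it is meant as an illustration
rather than as the exact formulation used in the implementation.
\begin{align*}
    \text{minimize}\hspace{5mm} &
        \boldsymbol{\gamma}^\text{T}\tilde{\boldsymbol{c}}
        + \sum_{j\in\mathcal{J}} g_j\left( \boldsymbol{z}_j \right) \\
    \text{subject to}\hspace{5mm} &
        \boldsymbol{T}_j\tilde{\boldsymbol{c}} = \boldsymbol{z}_j
        \hspace{3mm} \forall j\in\mathcal{J}
\end{align*}
In this form the likelihood term depends only on $\tilde{\boldsymbol{c}}$
and the constraint term only on the $\boldsymbol{z}_j$.
The two parts are coupled solely through the equality constraints, so each
minimization step acts on its own variable instead of repeatedly moving a
shared one.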

Another aspect that partly explains the disparate decoding performance is
the way the minimization step handling the constraints is carried out.
With proximal decoding, it is performed using gradient descent, which
amounts to an approximation.
With \ac{ADMM}, it reduces to a number of projections onto the parity
polytopes $\mathcal{P}_{d_j}$, which always provide exact results.
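
The contrast between the two constraint steps can be illustrated as follows.
This is a sketch under two assumptions that are not spelled out in this
section: that each $g_j$ is the indicator function of the parity polytope
$\mathcal{P}_{d_j}$, and that the gradient-based step takes the common
first-order form.
The vector $\boldsymbol{v}$ and the projection symbol $\Pi$ are notation
introduced here for illustration.
\begin{align*}
    \textbf{prox}_{g_j}\left( \boldsymbol{v} \right)
        &= \operatorname*{arg\,min}_{\boldsymbol{z}_j \in \mathcal{P}_{d_j}}
            \frac{1}{2}\lVert \boldsymbol{z}_j - \boldsymbol{v} \rVert_2^2
        = \Pi_{\mathcal{P}_{d_j}}\left( \boldsymbol{v} \right) \\
    \textbf{prox}_{\gamma h}\left( \boldsymbol{r} \right)
        &\approx \boldsymbol{r} - \gamma \nabla h\left( \boldsymbol{r} \right)
\end{align*}
The first line is an exact Euclidean projection.
The second line follows from the stationarity condition
$\boldsymbol{s} = \boldsymbol{r} - \gamma \nabla h\left( \boldsymbol{s} \right)$
of the proximal step by replacing
$\nabla h\left( \boldsymbol{s} \right)$ with
$\nabla h\left( \boldsymbol{r} \right)$, which is where the approximation
enters.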

\begin{itemize}
    \item Comparing actual implementations is always contentious, since it
        is difficult to separate differences in algorithm performance from
        differences in implementation.
    \item No large difference in computational performance $\rightarrow$
        parallelism cannot come to fruition, as decoding is performed on the
        same number of cores for both algorithms (multiple decodings in
        parallel).
    \item Nonetheless, in real-time applications and in applications where
        the focus is not the mass decoding of raw data, \ac{ADMM} has
        advantages, since the decoding of a single codeword is performed
        faster.
    \item \ac{ADMM} faster than proximal decoding $\rightarrow$
        parallelism.
    \item Proximal decoding faster than \ac{ADMM} $\rightarrow$ unexpected,
        needs an explanation (larger number of iterations before
        convergence?).
\end{itemize}