\chapter{Theoretical Background}%
|
|
\label{chapter:theoretical_background}
|
|
|
|
In this chapter, the theoretical background necessary to understand the
|
|
decoding algorithms examined in this work is given.
|
|
First, the notation used is clarified.
|
|
The physical layer is detailed, namely the modulation scheme and the channel model used.
|
|
A short introduction to channel coding with binary linear codes and especially
|
|
\ac{LDPC} codes is given.
|
|
The established methods of decoding \ac{LDPC} codes are briefly explained.
|
|
Lastly, the general process of decoding using optimization techniques is described
|
|
and an overview of the utilized optimization methods is given.
|
|
|
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
\section{General Remarks on Notation}
|
|
\label{sec:theo:Notation}
|
|
|
|
Wherever the domain of a variable is expanded, this will be indicated with a tilde.
|
|
For example:%
|
|
%
|
|
\begin{align*}
|
|
x \in \left\{ -1, 1 \right\} &\to \tilde{x} \in \mathbb{R}\\
|
|
c \in \mathbb{F}_2 &\to \tilde{c} \in \left[ 0, 1 \right] \subseteq \mathbb{R}
|
|
.\end{align*}
|
|
%
|
|
Additionally, a shorthand notation will be used, denoting a set of indices as%
|
|
%
|
|
\begin{align*}
|
|
\left[ m:n \right] &:= \left\{ m, m+1, \ldots, n-1, n \right\},
|
|
\hspace{5mm} m < n, \hspace{2mm} m,n\in\mathbb{Z}
|
|
.\end{align*}
|
|
%
|
|
In order to designate element-wise operations, in particular the \textit{Hadamard product}
|
|
and the \textit{Hadamard power}, the operator $\circ$ will be used:%
|
|
%
|
|
\begin{alignat*}{3}
|
|
\boldsymbol{a} \circ \boldsymbol{b}
|
|
&:= \begin{bmatrix} a_1 b_1 & \ldots & a_n b_n \end{bmatrix} ^\text{T},
|
|
\hspace{5mm} &&\boldsymbol{a}, \boldsymbol{b} \in \mathbb{R}^n, \hspace{2mm} n\in \mathbb{N} \\
|
|
	\boldsymbol{a}^{\circ k} &:= \begin{bmatrix} a_1^k & \ldots & a_n^k \end{bmatrix}^\text{T},
|
|
\hspace{5mm} &&\boldsymbol{a} \in \mathbb{R}^n, \hspace{2mm}n\in \mathbb{N}, k\in \mathbb{Z}
|
|
.\end{alignat*}
|
|
%
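For instance, for the example vectors
$\boldsymbol{a} = \begin{bmatrix} 1 & 2 & 3 \end{bmatrix}^\text{T}$ and
$\boldsymbol{b} = \begin{bmatrix} 4 & 5 & 6 \end{bmatrix}^\text{T}$
(used purely for illustration), these definitions yield%
%
\begin{align*}
	\boldsymbol{a} \circ \boldsymbol{b}
	= \begin{bmatrix} 4 & 10 & 18 \end{bmatrix}^\text{T},
	\hspace{5mm}
	\boldsymbol{a}^{\circ 2}
	= \begin{bmatrix} 1 & 4 & 9 \end{bmatrix}^\text{T}
.\end{align*}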
|
|
|
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
\section{Channel Model and Modulation}
|
|
\label{sec:theo:Preliminaries: Channel Model and Modulation}
|
|
|
|
In order to transmit a bit-word $\boldsymbol{c} \in \mathbb{F}_2^n$ of length
|
|
$n$ over a channel, it has to be mapped onto a symbol vector
|
|
$\boldsymbol{x} \in \mathbb{R}^n$ that can be physically transmitted.
|
|
This is known as modulation. The modulation scheme chosen here is \ac{BPSK}:%
|
|
%
|
|
\begin{align*}
|
|
\boldsymbol{x} = 1 - 2\boldsymbol{c}
|
|
.\end{align*}
|
|
%
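As a brief illustration, the bit-word $\boldsymbol{c} = \left( 0, 1, 1, 0 \right)$
(chosen arbitrarily) is mapped onto the symbol vector
$\boldsymbol{x} = \left( 1, -1, -1, 1 \right)$:
each bit $0$ is mapped onto $+1$ and each bit $1$ onto $-1$.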
|
|
The transmitted symbol vector is distorted by the channel; the received vector is denoted by
|
|
$\boldsymbol{y} \in \mathbb{R}^n$.
|
|
This distortion is described by the channel model, which in the context of
|
|
this thesis is chosen to be \ac{AWGN}:%
|
|
%
|
|
\begin{align*}
|
|
\boldsymbol{y} = \boldsymbol{x} + \boldsymbol{n},
|
|
\hspace{5mm} n_i \sim \mathcal{N}\left( 0, \frac{\sigma^2}{2} \right),
|
|
\hspace{2mm} i \in \left[ 1:n \right]
|
|
.\end{align*}
|
|
%
|
|
|
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
\section{Channel Coding with LDPC Codes}
|
|
\label{sec:theo:Channel Coding with LDPC Codes}
|
|
|
|
Channel coding describes the process of adding redundancy to information
|
|
transmitted over a channel in order to detect and correct any errors
|
|
that may occur during the transmission.
|
|
Encoding the information using \textit{binary linear codes} is one way of
|
|
conducting this process, whereby \textit{data words} are mapped onto longer
|
|
\textit{codewords}, which carry redundant information.
|
|
\Ac{LDPC} codes have become especially popular, since they are able to
|
|
reach arbitrarily small probabilities of error at code rates up to the capacity
|
|
of the channel \cite[Sec. II.B.]{mackay_rediscovery}, while having a structure
|
|
that allows for very efficient decoding.
|
|
|
|
The lengths of the data words and codewords are denoted by $k\in\mathbb{N}$
|
|
and $n\in\mathbb{N}$, respectively, with $k \le n$.
|
|
The set of codewords $\mathcal{C} \subset \mathbb{F}_2^n$ of a binary
|
|
linear code can be represented using the \textit{parity-check matrix}
|
|
$\boldsymbol{H} \in \mathbb{F}_2^{m\times n}$, where $m$ represents
|
|
the number of parity-checks:%
|
|
%
|
|
\begin{align*}
|
|
\mathcal{C} := \left\{ \boldsymbol{c} \in \mathbb{F}_2^n :
|
|
\boldsymbol{H}\boldsymbol{c}^\text{T} = \boldsymbol{0} \right\}
|
|
.\end{align*}
|
|
%
|
|
A data word $\boldsymbol{u} \in \mathbb{F}_2^k$ can be mapped onto a codeword
|
|
$\boldsymbol{c} \in \mathbb{F}_2^n$ using the \textit{generator matrix}
|
|
$\boldsymbol{G} \in \mathbb{F}_2^{k\times n}$:%
|
|
%
|
|
\begin{align*}
|
|
\boldsymbol{c} = \boldsymbol{u}\boldsymbol{G}
|
|
.\end{align*}
|
|
%
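As a small example (used here only for illustration), consider the
single parity-check code with $k = 2$ and $n = 3$, defined by%
%
\begin{align*}
	\boldsymbol{G} = \begin{bmatrix}
		1 & 0 & 1 \\
		0 & 1 & 1
	\end{bmatrix},
	\hspace{5mm}
	\boldsymbol{H} = \begin{bmatrix}
		1 & 1 & 1
	\end{bmatrix}
.\end{align*}
%
The data word $\boldsymbol{u} = \left( 1, 0 \right)$ is encoded as
$\boldsymbol{c} = \boldsymbol{u}\boldsymbol{G} = \left( 1, 0, 1 \right)$,
which indeed satisfies
$\boldsymbol{H}\boldsymbol{c}^\text{T} = 1 + 0 + 1 = 0$ in $\mathbb{F}_2$.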
|
|
|
|
After obtaining a codeword from a data word, it is transmitted over a channel
|
|
as described in section \ref{sec:theo:Preliminaries: Channel Model and Modulation}.
|
|
The received signal $\boldsymbol{y}$ is then decoded to obtain
|
|
an estimate of the transmitted codeword, denoted as $\hat{\boldsymbol{c}}$.
|
|
Finally, the encoding procedure is reversed and an estimate of the originally
|
|
sent data word, $\hat{\boldsymbol{u}}$, is produced.
|
|
The methods examined in this work are all based on \textit{soft-decision} decoding,
|
|
i.e., $\boldsymbol{y}$ is considered to be in $\mathbb{R}^n$ and no preliminary decision
|
|
is made by a demodulator.
|
|
The process of transmitting and decoding a codeword is visualized in
|
|
figure \ref{fig:theo:channel_overview}.%
|
|
%
|
|
\begin{figure}[H]
|
|
\centering
|
|
|
|
\tikzstyle{box} = [rectangle, minimum width=1.5cm, minimum height=0.7cm,
|
|
rounded corners=0.1cm, text centered, draw=black, fill=KITgreen!80]
|
|
|
|
\begin{tikzpicture}[scale=1, transform shape]
|
|
\node (c) {$\boldsymbol{c}$};
|
|
\node[box, right=0.5cm of c] (bpskmap) {Mapper};
|
|
\node[right=1.5cm of bpskmap,
|
|
draw, circle, inner sep=0pt, minimum size=0.5cm] (add) {$+$};
|
|
\node[box, right=1.5cm of add] (decoder) {Decoder};
|
|
\node[box, right=1.5cm of decoder] (demapper) {Demapper};
|
|
\node[right=0.5cm of demapper] (out) {$\boldsymbol{\hat{c}}$};
|
|
|
|
\node (x) at ($(bpskmap.east)!0.5!(add.west) + (0,0.3cm)$) {$\boldsymbol{x}$};
|
|
\node (y) at ($(add.east)!0.5!(decoder.west) + (0,0.3cm)$) {$\boldsymbol{y}$};
|
|
\node (x_hat) at ($(decoder.east)!0.5!(demapper.west) + (0,0.3cm)$)
|
|
{$\boldsymbol{\hat{x}}$};
|
|
\node[below=0.5cm of add] (n) {$\boldsymbol{n}$};
|
|
|
|
\draw[->] (c) -- (bpskmap);
|
|
\draw[->] (bpskmap) -- (add);
|
|
\draw[->] (add) -- (decoder);
|
|
\draw[->] (n) -- (add);
|
|
\draw[->] (decoder) -- (demapper);
|
|
\draw[->] (demapper) -- (out);
|
|
|
|
\coordinate (top_left) at ($(x.north west) + (-0.1cm, 0.1cm)$);
|
|
\coordinate (top_right) at ($(y.north east) + (+0.1cm, 0.1cm)$);
|
|
\coordinate (bottom_center) at ($(n.south) + (0cm, -0.1cm)$);
|
|
\draw[dashed] (top_left) -- (top_right) |- (bottom_center) -| cycle;
|
|
\node[below=0.25cm of n] (text) {Channel};
|
|
\end{tikzpicture}
|
|
|
|
\caption{Overview of channel model and modulation}
|
|
\label{fig:theo:channel_overview}
|
|
\end{figure}
|
|
|
|
The decoding process itself is generally based either on the \ac{MAP} or the \ac{ML}
|
|
criterion:%
|
|
%
|
|
\begin{align*}
|
|
\hat{\boldsymbol{c}}_{\text{\ac{MAP}}} &= \argmax_{\boldsymbol{c} \in \mathcal{C}}
|
|
p_{\boldsymbol{C} \mid \boldsymbol{Y}} \left(\boldsymbol{c} \mid \boldsymbol{y}
|
|
\right) \\
|
|
\hat{\boldsymbol{c}}_{\text{\ac{ML}}} &= \argmax_{\boldsymbol{c} \in \mathcal{C}}
|
|
f_{\boldsymbol{Y} \mid \boldsymbol{C}} \left( \boldsymbol{y} \mid \boldsymbol{c}
|
|
\right)
|
|
.\end{align*}%
|
|
%
|
|
The two criteria are closely connected through Bayes' theorem and are equivalent
|
|
when the prior probability of transmitting a codeword is the same for all
|
|
codewords:
|
|
%
|
|
\begin{align*}
|
|
	\argmax_{\boldsymbol{c}\in\mathcal{C}} p_{\boldsymbol{C} \mid \boldsymbol{Y}}
	\left( \boldsymbol{c} \mid \boldsymbol{y} \right)
	&= \argmax_{\boldsymbol{c}\in\mathcal{C}} \frac{f_{\boldsymbol{Y} \mid \boldsymbol{C}}
	\left( \boldsymbol{y} \mid \boldsymbol{c} \right) p_{\boldsymbol{C}}
	\left( \boldsymbol{c} \right)}{f_{\boldsymbol{Y}}\left( \boldsymbol{y} \right) } \\
	&= \argmax_{\boldsymbol{c}\in\mathcal{C}} f_{\boldsymbol{Y} \mid \boldsymbol{C}}
	\left( \boldsymbol{y} \mid \boldsymbol{c} \right) p_{\boldsymbol{C}}
	\left( \boldsymbol{c} \right) \\
	&= \argmax_{\boldsymbol{c}\in\mathcal{C}} f_{\boldsymbol{Y} \mid \boldsymbol{C}}
	\left( \boldsymbol{y} \mid \boldsymbol{c} \right)
.\end{align*}
|
|
%
|
|
|
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
\section{Tanner Graphs and Belief Propagation}
|
|
\label{sec:theo:Tanner Graphs and Belief Propagation}
|
|
|
|
It is often helpful to visualize codes graphically.
|
|
This is especially true for \ac{LDPC} codes, as the established decoding
|
|
algorithms are \textit{message passing algorithms}, which are inherently
|
|
graph-based.
|
|
|
|
A binary linear code with a parity-check matrix $\boldsymbol{H}$ can be
|
|
visualized using a \textit{Tanner} or \textit{factor graph}:
|
|
Each row of $\boldsymbol{H}$, which represents one parity-check, is viewed as a
|
|
\ac{CN}.
|
|
Each component of the codeword $\boldsymbol{c}$ is interpreted as a \ac{VN}.
|
|
The relationship between \acp{CN} and \acp{VN} can then be plotted by noting
|
|
which components of $\boldsymbol{c}$ are considered for which parity-check.
|
|
Figure \ref{fig:theo:tanner_graph} shows the Tanner graph for the
|
|
(7,4) Hamming code, which has the following parity-check matrix
|
|
\cite[Example 5.7.]{ryan_lin_2009}:%
|
|
%
|
|
\begin{align*}
|
|
\boldsymbol{H} = \begin{bmatrix}
|
|
1 & 0 & 1 & 0 & 1 & 0 & 1 \\
|
|
0 & 1 & 1 & 0 & 0 & 1 & 1 \\
|
|
0 & 0 & 0 & 1 & 1 & 1 & 1
|
|
\end{bmatrix}
|
|
.\end{align*}
|
|
%
|
|
%
|
|
\begin{figure}[H]
|
|
\centering
|
|
|
|
\tikzstyle{checknode} = [color=KITblue, fill=KITblue,
|
|
draw, regular polygon,regular polygon sides=4,
|
|
inner sep=0pt, minimum size=12pt]
|
|
\tikzstyle{variablenode} = [color=KITgreen, fill=KITgreen,
|
|
draw, circle, inner sep=0pt, minimum size=10pt]
|
|
|
|
\begin{tikzpicture}[scale=1, transform shape]
|
|
\node[checknode,
|
|
label={[below, label distance=-0.4cm, align=center]
|
|
\acs{CN} 1\\$\left( c_1 + c_3 + c_5 + c_7 = 0 \right) $}]
|
|
(cn1) at (-4, -1) {};
|
|
\node[checknode,
|
|
label={[below, label distance=-0.4cm, align=center]
|
|
\acs{CN} 2\\$\left( c_2 + c_3 + c_6 + c_7 = 0 \right) $}]
|
|
(cn2) at (0, -1) {};
|
|
\node[checknode,
|
|
label={[below, label distance=-0.4cm, align=center]
|
|
\acs{CN} 3\\$\left( c_4 + c_5 + c_6 + c_7 = 0 \right) $}]
|
|
(cn3) at (4, -1) {};
|
|
\node[variablenode, label={[above, align=center] \acs{VN} 1\\$c_1$}] (c1) at (-4.5, 2) {};
|
|
\node[variablenode, label={[above, align=center] \acs{VN} 2\\$c_2$}] (c2) at (-3, 2) {};
|
|
\node[variablenode, label={[above, align=center] \acs{VN} 3\\$c_3$}] (c3) at (-1.5, 2) {};
|
|
\node[variablenode, label={[above, align=center] \acs{VN} 4\\$c_4$}] (c4) at (0, 2) {};
|
|
\node[variablenode, label={[above, align=center] \acs{VN} 5\\$c_5$}] (c5) at (1.5, 2) {};
|
|
\node[variablenode, label={[above, align=center] \acs{VN} 6\\$c_6$}] (c6) at (3, 2) {};
|
|
\node[variablenode, label={[above, align=center] \acs{VN} 7\\$c_7$}] (c7) at (4.5, 2) {};
|
|
|
|
\draw (cn1) -- (c1);
|
|
\draw (cn1) -- (c3);
|
|
\draw (cn1) -- (c5);
|
|
\draw (cn1) -- (c7);
|
|
|
|
\draw (cn2) -- (c2);
|
|
\draw (cn2) -- (c3);
|
|
\draw (cn2) -- (c6);
|
|
\draw (cn2) -- (c7);
|
|
|
|
\draw (cn3) -- (c4);
|
|
\draw (cn3) -- (c5);
|
|
\draw (cn3) -- (c6);
|
|
\draw (cn3) -- (c7);
|
|
\end{tikzpicture}
|
|
|
|
\caption{Tanner graph for the (7,4) Hamming code}
|
|
\label{fig:theo:tanner_graph}
|
|
\end{figure}%
|
|
%
|
|
\noindent \acp{CN} and \acp{VN}, and by extension the rows and columns of
|
|
$\boldsymbol{H}$, are indexed with the variables $j$ and $i$.
|
|
The sets of all \acp{CN} and all \acp{VN} are denoted by
|
|
$\mathcal{J} := \left[ 1:m \right]$ and $\mathcal{I} := \left[ 1:n \right]$, respectively.
|
|
The \textit{neighborhood} of the $j$th \ac{CN}, i.e., the set of all adjacent \acp{VN},
|
|
is denoted by $N_c\left( j \right)$.
|
|
The neighborhood of the $i$th \ac{VN} is denoted by $N_v\left( i \right)$.
|
|
For the code depicted in figure \ref{fig:theo:tanner_graph}, for example,
|
|
$N_c\left( 1 \right) = \left\{ 1, 3, 5, 7 \right\}$ and
|
|
$N_v\left( 3 \right) = \left\{ 1, 2 \right\}$.
|
|
The degree $d_j$ of a \ac{CN} is defined as the number of adjacent \acp{VN}:
|
|
$d_j := \left| N_c\left( j \right) \right| $; the degree of a \ac{VN} is
|
|
similarly defined as $d_i := \left| N_v\left( i \right) \right|$.
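For the Hamming code of figure \ref{fig:theo:tanner_graph}, for instance,
every \ac{CN} has degree $d_j = 4$, while the \ac{VN} degrees range from
$d_1 = 1$ to $d_7 = 3$.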
|
|
|
|
Message passing algorithms are based on the notion of passing messages between
|
|
\acp{CN} and \acp{VN}.
|
|
\Ac{BP} is one such algorithm that is commonly used to decode \ac{LDPC} codes.
|
|
It aims to compute the posterior probabilities
|
|
$p_{C_i \mid \boldsymbol{Y}}\left(c_i = 1 \mid \boldsymbol{y} \right),\hspace{2mm} i\in\mathcal{I}$
\cite[Sec. III.]{mackay_rediscovery}, and to use them to calculate the estimate
$\hat{\boldsymbol{c}}$.
|
|
For cycle-free graphs this goal is reached after a finite
|
|
number of steps and \ac{BP} is equivalent to \ac{MAP} decoding.
|
|
When the graph contains cycles, however, \ac{BP} only approximates the \ac{MAP} probabilities
|
|
and is sub-optimal.
|
|
This leads to generally worse performance than \ac{MAP} decoding for practical codes.
|
|
Additionally, an \textit{error floor} appears for very high \acp{SNR}, making
|
|
the use of \ac{BP} impractical for applications where a very low error rate is
|
|
desired \cite[Sec. 15.3]{ryan_lin_2009}.
|
|
Another popular decoding method for \ac{LDPC} codes is the
|
|
\textit{min-sum algorithm}.
|
|
It is a simplification of \ac{BP} that replaces the non-linear $\tanh$-based
check node update by a minimum operation, reducing the computational complexity.
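As an illustration of this approximation, consider the check node update in the
common log-likelihood-ratio formulation, with $m_{i \to j}$ and $m_{j \to i}$
denoting the messages passed between the $i$th \ac{VN} and the $j$th \ac{CN}
(this notation is used only for this example):%
%
\begin{align*}
	m_{j \to i} &= 2 \operatorname{artanh}\left( \prod_{i' \in N_c\left( j \right)
	\setminus \left\{ i \right\}} \tanh\left( \frac{m_{i' \to j}}{2} \right) \right) \\
	&\approx \left( \prod_{i' \in N_c\left( j \right) \setminus \left\{ i \right\}}
	\operatorname{sign}\left( m_{i' \to j} \right) \right)
	\min_{i' \in N_c\left( j \right) \setminus \left\{ i \right\}}
	\left| m_{i' \to j} \right|
.\end{align*}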
|
|
|
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
\section{Decoding using Optimization Methods}%
|
|
\label{sec:theo:Decoding using Optimization Methods}
|
|
|
|
%
|
|
% General methodology
|
|
%
|
|
|
|
The general idea behind using optimization methods for channel decoding
|
|
is to reformulate the decoding problem as an optimization problem.
|
|
This new formulation can then be solved with one of the many
|
|
available optimization algorithms.
|
|
|
|
Generally, the original decoding problem considered is either the \ac{MAP} or
|
|
the \ac{ML} decoding problem:%
|
|
%
|
|
\begin{align}
|
|
\hat{\boldsymbol{c}}_{\text{\ac{MAP}}} &= \argmax_{\boldsymbol{c} \in \mathcal{C}}
|
|
p_{\boldsymbol{C} \mid \boldsymbol{Y}} \left(\boldsymbol{c} \mid \boldsymbol{y}
|
|
\right) \label{eq:dec:map}\\
|
|
\hat{\boldsymbol{c}}_{\text{\ac{ML}}} &= \argmax_{\boldsymbol{c} \in \mathcal{C}}
|
|
f_{\boldsymbol{Y} \mid \boldsymbol{C}} \left( \boldsymbol{y} \mid \boldsymbol{c}
|
|
\right) \label{eq:dec:ml}
|
|
.\end{align}%
|
|
%
|
|
The goal is to arrive at a formulation in which an objective function
|
|
$g : \mathbb{R}^n \rightarrow \mathbb{R} $ must be minimized under certain constraints:%
|
|
%
|
|
\begin{align*}
|
|
\text{minimize}\hspace{2mm} &g\left( \tilde{\boldsymbol{c}} \right)\\
|
|
\text{subject to}\hspace{2mm} &\tilde{\boldsymbol{c}} \in D
|
|
,\end{align*}%
|
|
%
|
|
where $D \subseteq \mathbb{R}^n$ is the domain of values attainable for $\tilde{\boldsymbol{c}}$
|
|
and represents the constraints under which the minimization is to take place.
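As an illustration, for the \ac{BPSK} mapping and \ac{AWGN} channel of
section \ref{sec:theo:Preliminaries: Channel Model and Modulation}, maximizing
$f_{\boldsymbol{Y} \mid \boldsymbol{C}}\left( \boldsymbol{y} \mid \boldsymbol{c} \right)$
is equivalent to minimizing the squared Euclidean distance between
$\boldsymbol{y}$ and the modulated codeword, so one possible (still discrete)
formulation is%
%
\begin{align*}
	g\left( \tilde{\boldsymbol{c}} \right)
	= \left\Vert \boldsymbol{y} - \left( 1 - 2\tilde{\boldsymbol{c}} \right) \right\Vert_2^2,
	\hspace{5mm} D = \mathcal{C}
,\end{align*}
%
with the codewords embedded in $\left\{ 0, 1 \right\}^n \subset \mathbb{R}^n$;
the minimizer over this set is the \ac{ML} codeword.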
|
|
|
|
In contrast to the established message-passing decoding algorithms,
|
|
the perspective then changes from observing the decoding process in its
|
|
Tanner graph representation with \acp{VN} and \acp{CN} (as shown in figure \ref{fig:dec:tanner})
|
|
to a spatial representation (figure \ref{fig:dec:spatial}),
|
|
where the codewords form a subset of the vertices of the unit hypercube.
|
|
The goal is to find the point $\tilde{\boldsymbol{c}}$
that minimizes the objective function $g$.
|
|
|
|
%
|
|
% Figure showing decoding space
|
|
%
|
|
|
|
\begin{figure}[H]
|
|
\centering
|
|
|
|
\begin{subfigure}[c]{0.47\textwidth}
|
|
\centering
|
|
|
|
\tikzstyle{checknode} = [color=KITblue, fill=KITblue,
|
|
draw, regular polygon,regular polygon sides=4,
|
|
inner sep=0pt, minimum size=12pt]
|
|
\tikzstyle{variablenode} = [color=KITgreen, fill=KITgreen,
|
|
draw, circle, inner sep=0pt, minimum size=10pt]
|
|
|
|
\begin{tikzpicture}[scale=1, transform shape]
|
|
\node[checknode,
|
|
label={[below, label distance=-0.4cm, align=center]
|
|
\acs{CN}\\$\left( c_1 + c_2 + c_3 = 0 \right) $}]
|
|
(cn) at (0, 0) {};
|
|
\node[variablenode, label={[above, align=center] \acs{VN}\\$\left( c_1 \right)$}]
|
|
(c1) at (-2, 2) {};
|
|
\node[variablenode, label={[above, align=center] \acs{VN}\\$\left( c_2 \right)$}]
|
|
(c2) at (0, 2) {};
|
|
\node[variablenode, label={[above, align=center] \acs{VN}\\$\left( c_3 \right)$}]
|
|
(c3) at (2, 2) {};
|
|
|
|
\draw (cn) -- (c1);
|
|
\draw (cn) -- (c2);
|
|
\draw (cn) -- (c3);
|
|
\end{tikzpicture}
|
|
|
|
\caption{Tanner graph representation of a single parity-check code}
|
|
\label{fig:dec:tanner}
|
|
\end{subfigure}%
|
|
\hfill%
|
|
\begin{subfigure}[c]{0.47\textwidth}
|
|
\centering
|
|
|
|
\tikzstyle{codeword} = [color=KITblue, fill=KITblue,
|
|
draw, circle, inner sep=0pt, minimum size=4pt]
|
|
|
|
\tdplotsetmaincoords{60}{25}
|
|
\begin{tikzpicture}[scale=1, transform shape, tdplot_main_coords]
|
|
% Cube
|
|
|
|
\coordinate (p000) at (0, 0, 0);
|
|
\coordinate (p001) at (0, 0, 2);
|
|
\coordinate (p010) at (0, 2, 0);
|
|
\coordinate (p011) at (0, 2, 2);
|
|
\coordinate (p100) at (2, 0, 0);
|
|
\coordinate (p101) at (2, 0, 2);
|
|
\coordinate (p110) at (2, 2, 0);
|
|
\coordinate (p111) at (2, 2, 2);
|
|
|
|
\draw[] (p000) -- (p100);
|
|
\draw[] (p100) -- (p101);
|
|
\draw[] (p101) -- (p001);
|
|
\draw[] (p001) -- (p000);
|
|
|
|
\draw[dashed] (p010) -- (p110);
|
|
\draw[] (p110) -- (p111);
|
|
\draw[] (p111) -- (p011);
|
|
\draw[dashed] (p011) -- (p010);
|
|
|
|
\draw[dashed] (p000) -- (p010);
|
|
\draw[] (p100) -- (p110);
|
|
\draw[] (p101) -- (p111);
|
|
\draw[] (p001) -- (p011);
|
|
|
|
% Polytope Vertices
|
|
|
|
\node[codeword] (c000) at (p000) {};
|
|
\node[codeword] (c101) at (p101) {};
|
|
\node[codeword] (c110) at (p110) {};
|
|
\node[codeword] (c011) at (p011) {};
|
|
|
|
% Polytope Edges
|
|
|
|
% \draw[line width=1pt, color=KITblue] (c000) -- (c101);
|
|
% \draw[line width=1pt, color=KITblue] (c000) -- (c110);
|
|
% \draw[line width=1pt, color=KITblue] (c000) -- (c011);
|
|
%
|
|
% \draw[line width=1pt, color=KITblue] (c101) -- (c110);
|
|
% \draw[line width=1pt, color=KITblue] (c101) -- (c011);
|
|
%
|
|
% \draw[line width=1pt, color=KITblue] (c011) -- (c110);
|
|
|
|
% Polytope Annotations
|
|
|
|
\node[color=KITblue, below=0cm of c000] {$\left( 0, 0, 0 \right) $};
|
|
\node[color=KITblue, right=0.17cm of c101] {$\left( 1, 0, 1 \right) $};
|
|
\node[color=KITblue, right=0cm of c110] {$\left( 1, 1, 0 \right) $};
|
|
\node[color=KITblue, above=0cm of c011] {$\left( 0, 1, 1 \right) $};
|
|
|
|
% c
|
|
|
|
\node[color=KITgreen, fill=KITgreen,
|
|
draw, circle, inner sep=0pt, minimum size=4pt] (c) at (0.9, 0.7, 1) {};
|
|
\node[color=KITgreen, right=0cm of c] {$\tilde{\boldsymbol{c}}$};
|
|
\end{tikzpicture}
|
|
|
|
\caption{Spatial representation of a single parity-check code}
|
|
\label{fig:dec:spatial}
|
|
\end{subfigure}%
|
|
|
|
\caption{Different representations of the decoding problem}
|
|
\end{figure}
|
|
|
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
\section{A Short Introduction to the Proximal Gradient Method and ADMM}
|
|
\label{sec:theo:Optimization Methods}
|
|
|
|
In this section, the general ideas behind the optimization methods used in
|
|
this work are outlined.
|
|
The application of these optimization methods to channel decoding
|
|
will be discussed in later chapters.
|
|
Two methods are introduced: the \textit{proximal gradient method} and
|
|
\ac{ADMM}.
|
|
|
|
\textit{Proximal algorithms} are algorithms for solving convex optimization
|
|
problems that rely on the use of \textit{proximal operators}.
|
|
The proximal operator $\textbf{prox}_{\lambda f} : \mathbb{R}^n \rightarrow \mathbb{R}^n$
|
|
of a function $f:\mathbb{R}^n \rightarrow \mathbb{R}$ is defined by
|
|
\cite[Sec. 1.1]{proximal_algorithms}%
|
|
%
|
|
\begin{align*}
|
|
\textbf{prox}_{\lambda f}\left( \boldsymbol{v} \right)
|
|
= \argmin_{\boldsymbol{x} \in \mathbb{R}^n} \left(
|
|
f\left( \boldsymbol{x} \right) + \frac{1}{2\lambda}\lVert \boldsymbol{x}
|
|
- \boldsymbol{v} \rVert_2^2 \right)
|
|
.\end{align*}
|
|
%
|
|
This operator computes a point that is a compromise between minimizing $f$
|
|
and staying in the proximity of $\boldsymbol{v}$.
|
|
The parameter $\lambda$ determines how each term is weighted.
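For the function
$f\left( \boldsymbol{x} \right) = \frac{1}{2}\lVert \boldsymbol{x} \rVert_2^2$
(chosen purely as an illustration), the minimization can be carried out in
closed form, yielding%
%
\begin{align*}
	\textbf{prox}_{\lambda f}\left( \boldsymbol{v} \right)
	= \frac{1}{1 + \lambda}\boldsymbol{v}
.\end{align*}
%
For small $\lambda$, the result stays close to $\boldsymbol{v}$; for large
$\lambda$, it moves towards the minimizer $\boldsymbol{0}$ of $f$.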
|
|
The proximal gradient method is an iterative optimization method
|
|
utilizing proximal operators, used to solve problems of the form%
|
|
%
|
|
\begin{align*}
|
|
\underset{\boldsymbol{x} \in \mathbb{R}^n}{\text{minimize}}\hspace{5mm}
|
|
f\left( \boldsymbol{x} \right) + g\left( \boldsymbol{x} \right)
|
|
,\end{align*}
%
where $f$ is assumed to be differentiable.
Each iteration consists of two steps: a gradient descent step on $f$
and a proximal step on $g$ \cite[Sec. 4.2]{proximal_algorithms}:%
|
|
%
|
|
\begin{align*}
|
|
\boldsymbol{x} &\leftarrow \boldsymbol{x} - \lambda \nabla f\left( \boldsymbol{x} \right) \\
|
|
\boldsymbol{x} &\leftarrow \textbf{prox}_{\lambda g} \left( \boldsymbol{x} \right)
|
|
.\end{align*}
|
|
%
|
|
Since $g$ is minimized with the proximal operator and is thus not required
|
|
to be differentiable, it can be used to encode the constraints of the optimization problem
|
|
(e.g., in the form of an \textit{indicator function}, as mentioned in
|
|
\cite[Sec. 1.2]{proximal_algorithms}).
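For example, if $g$ is the indicator function of a closed convex set
$D \subseteq \mathbb{R}^n$, i.e., $g\left( \boldsymbol{x} \right) = 0$ for
$\boldsymbol{x} \in D$ and $g\left( \boldsymbol{x} \right) = \infty$ otherwise,
the proximal operator becomes the Euclidean projection onto $D$,%
%
\begin{align*}
	\textbf{prox}_{\lambda g}\left( \boldsymbol{v} \right)
	= \argmin_{\boldsymbol{x} \in D} \lVert \boldsymbol{x} - \boldsymbol{v} \rVert_2^2
,\end{align*}
%
and the proximal gradient method reduces to projected gradient descent.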
|
|
|
|
\ac{ADMM} is the second optimization method considered in this work.
|
|
In this thesis it will be used to solve a \textit{linear program}, which
|
|
is a special type of convex optimization problem in which the objective function
|
|
is linear and the constraints consist of linear equalities and inequalities.
|
|
Generally, any linear program can be expressed in \textit{standard form}%
|
|
\footnote{The inequality $\boldsymbol{x} \ge \boldsymbol{0}$ is to be
|
|
interpreted componentwise.}
|
|
\cite[Sec. 1.1]{intro_to_lin_opt_book}:%
|
|
%
|
|
\begin{alignat}{3}
|
|
\begin{alignedat}{3}
|
|
\underset{\boldsymbol{x}\in\mathbb{R}^n}{\text{minimize }}\hspace{2mm}
|
|
&& \boldsymbol{\gamma}^\text{T} \boldsymbol{x} \\
|
|
\text{subject to }\hspace{2mm} && \boldsymbol{A}\boldsymbol{x} & = \boldsymbol{b} \\
|
|
&& \boldsymbol{x} & \ge \boldsymbol{0},
|
|
\end{alignedat}
|
|
\label{eq:theo:admm_standard}
|
|
\end{alignat}%
|
|
%
|
|
where $\boldsymbol{x}, \boldsymbol{\gamma} \in \mathbb{R}^n$, $\boldsymbol{b} \in \mathbb{R}^m$
|
|
and $\boldsymbol{A}\in\mathbb{R}^{m \times n}$.
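For instance, an inequality constraint
$\boldsymbol{a}^\text{T}\boldsymbol{x} \le b$ (with $\boldsymbol{a}$ and $b$
used only in this example) can be brought into this form by introducing an
additional \textit{slack variable} $s \ge 0$ and requiring
$\boldsymbol{a}^\text{T}\boldsymbol{x} + s = b$ instead.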
|
|
A technique called \textit{Lagrangian relaxation} can then be applied
|
|
\cite[Sec. 11.4]{intro_to_lin_opt_book}.
|
|
First, some of the constraints are moved into the objective function itself
|
|
and weights $\boldsymbol{\lambda}$ are introduced. A new, relaxed problem
|
|
is formulated as
|
|
%
|
|
\begin{align}
|
|
\begin{aligned}
|
|
\underset{\boldsymbol{x}\in\mathbb{R}^n}{\text{minimize }}\hspace{2mm}
|
|
& \boldsymbol{\gamma}^\text{T}\boldsymbol{x}
|
|
+ \boldsymbol{\lambda}^\text{T}\left(
|
|
\boldsymbol{A}\boldsymbol{x} - \boldsymbol{b}\right) \\
|
|
\text{subject to }\hspace{2mm} & \boldsymbol{x} \ge \boldsymbol{0},
|
|
\end{aligned}
|
|
\label{eq:theo:admm_relaxed}
|
|
\end{align}%
|
|
%
|
|
the new objective function being the \textit{Lagrangian}%
|
|
\footnote{
|
|
Depending on what literature is consulted, the definition of the Lagrangian differs
|
|
in the order of $\boldsymbol{A}\boldsymbol{x}$ and $\boldsymbol{b}$.
|
|
As will subsequently be seen, however, the only property of the Lagrangian having
|
|
any bearing on the optimization process is that minimizing it gives a lower bound
|
|
on the optimal objective of the original problem.
|
|
This property is satisfied no matter the order of the terms and the order
|
|
chosen here is the one used in the \ac{LP} decoding literature making use of
|
|
\ac{ADMM}.
|
|
}%
|
|
%
|
|
\begin{align*}
|
|
\mathcal{L}\left( \boldsymbol{x}, \boldsymbol{\lambda} \right)
|
|
= \boldsymbol{\gamma}^\text{T}\boldsymbol{x}
|
|
+ \boldsymbol{\lambda}^\text{T}\left(
|
|
\boldsymbol{A}\boldsymbol{x} - \boldsymbol{b}\right)
|
|
.\end{align*}%
|
|
%
|
|
|
|
This problem is not directly equivalent to the original one, as the
|
|
solution now depends on the choice of the \textit{Lagrange multipliers}
|
|
$\boldsymbol{\lambda}$.
|
|
Interestingly, however, for this particular class of problems,
|
|
the minimum of the objective function (hereafter called \textit{optimal objective})
|
|
of the relaxed problem (\ref{eq:theo:admm_relaxed}) is a lower bound for
|
|
the optimal objective of the original problem (\ref{eq:theo:admm_standard})
|
|
\cite[Sec. 4.1]{intro_to_lin_opt_book}:%
|
|
%
|
|
\begin{align*}
|
|
\min_{\substack{\boldsymbol{x} \ge \boldsymbol{0} \\ \phantom{a}}}
|
|
\mathcal{L}\left( \boldsymbol{x}, \boldsymbol{\lambda}
|
|
\right)
|
|
\le
|
|
\min_{\substack{\boldsymbol{x} \ge \boldsymbol{0} \\ \boldsymbol{A}\boldsymbol{x}
|
|
= \boldsymbol{b}}}
|
|
\boldsymbol{\gamma}^\text{T}\boldsymbol{x}
|
|
.\end{align*}
|
|
%
|
|
Furthermore, for uniquely solvable linear programs \textit{strong duality}
|
|
always holds \cite[Theorem 4.4]{intro_to_lin_opt_book}.
|
|
This means that the lower bound is not merely valid but tight:
with the optimal choice of $\boldsymbol{\lambda}$,
|
|
the optimal objectives of the problems (\ref{eq:theo:admm_relaxed})
|
|
and (\ref{eq:theo:admm_standard}) have the same value, i.e.,
|
|
%
|
|
\begin{align*}
|
|
\max_{\boldsymbol{\lambda}\in\mathbb{R}^m} \, \min_{\boldsymbol{x} \ge \boldsymbol{0}}
|
|
\mathcal{L}\left( \boldsymbol{x}, \boldsymbol{\lambda} \right)
|
|
= \min_{\substack{\boldsymbol{x} \ge \boldsymbol{0} \\ \boldsymbol{A}\boldsymbol{x}
|
|
= \boldsymbol{b}}}
|
|
\boldsymbol{\gamma}^\text{T}\boldsymbol{x}
|
|
.\end{align*}
|
|
%
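As a minimal numerical illustration (unrelated to decoding), consider the
problem of minimizing $x_1 + x_2$ subject to $x_1 + x_2 = 1$ and
$\boldsymbol{x} \ge \boldsymbol{0}$, whose optimal objective is $1$.
Relaxing the equality constraint yields the Lagrangian
$\mathcal{L}\left( \boldsymbol{x}, \lambda \right)
= \left( 1 + \lambda \right)\left( x_1 + x_2 \right) - \lambda$ with%
%
\begin{align*}
	\min_{\boldsymbol{x} \ge \boldsymbol{0}}
	\mathcal{L}\left( \boldsymbol{x}, \lambda \right)
	= \begin{cases}
		-\lambda, & \lambda \ge -1\\
		-\infty, & \lambda < -1
	\end{cases}
.\end{align*}
%
Every choice of $\lambda$ thus yields a lower bound of at most $1$,
and the tightest bound, attained for $\lambda = -1$, equals the optimal
objective of the original problem.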
|
|
Thus, we can define the \textit{dual problem} as the search for the tightest lower bound:%
|
|
%
|
|
\begin{align}
|
|
\underset{\boldsymbol{\lambda}\in\mathbb{R}^m}{\text{maximize }}\hspace{2mm}
|
|
& \min_{\boldsymbol{x} \ge \boldsymbol{0}} \mathcal{L}
|
|
\left( \boldsymbol{x}, \boldsymbol{\lambda} \right)
|
|
\label{eq:theo:dual}
|
|
,\end{align}
|
|
%
|
|
and recover the solution $\boldsymbol{x}_{\text{opt}}$ to problem (\ref{eq:theo:admm_standard})
|
|
from the solution $\boldsymbol{\lambda}_\text{opt}$ to problem (\ref{eq:theo:dual})
|
|
by computing \cite[Sec. 2.1]{distr_opt_book}%
|
|
%
|
|
\begin{align}
|
|
\boldsymbol{x}_{\text{opt}} = \argmin_{\boldsymbol{x} \ge \boldsymbol{0}}
|
|
\mathcal{L}\left( \boldsymbol{x}, \boldsymbol{\lambda}_{\text{opt}} \right)
|
|
\label{eq:theo:admm_obtain_primal}
|
|
.\end{align}
|
|
%
|
|
|
|
The dual problem can then be solved iteratively using \textit{dual ascent}: starting with an
|
|
initial estimate for $\boldsymbol{\lambda}$, calculate an estimate for $\boldsymbol{x}$
|
|
using equation (\ref{eq:theo:admm_obtain_primal}); then, update $\boldsymbol{\lambda}$
|
|
using gradient descent \cite[Sec. 2.1]{distr_opt_book}:%
|
|
%
|
|
\begin{align*}
|
|
\boldsymbol{x} &\leftarrow \argmin_{\boldsymbol{x} \ge \boldsymbol{0}} \mathcal{L}\left(
|
|
\boldsymbol{x}, \boldsymbol{\lambda} \right) \\
|
|
\boldsymbol{\lambda} &\leftarrow \boldsymbol{\lambda}
|
|
+ \alpha\left( \boldsymbol{A}\boldsymbol{x} - \boldsymbol{b} \right),
|
|
\hspace{5mm} \alpha > 0
|
|
.\end{align*}
|
|
%
|
|
The algorithm can be improved by observing that when the objective function
|
|
$g: \mathbb{R}^n \rightarrow \mathbb{R}$ is separable into a sum of
|
|
$N \in \mathbb{N}$ sub-functions
|
|
$g_i: \mathbb{R}^{n_i} \rightarrow \mathbb{R}$,
|
|
i.e., $g\left( \boldsymbol{x} \right) = \sum_{i=1}^{N} g_i
|
|
\left( \boldsymbol{x}_i \right)$,
|
|
where $\boldsymbol{x}_i\in\mathbb{R}^{n_i},\hspace{1mm} i\in [1:N]$ are subvectors of
|
|
$\boldsymbol{x}$, the Lagrangian is separable as well. The problem then takes the form
|
|
%
|
|
\begin{align*}
|
|
\text{minimize }\hspace{5mm} & \sum_{i=1}^{N} g_i\left( \boldsymbol{x}_i \right) \\
|
|
\text{subject to}\hspace{5mm} & \sum_{i=1}^{N} \boldsymbol{A}_i\boldsymbol{x}_i
|
|
= \boldsymbol{b}
|
|
\end{align*}
%
with the corresponding Lagrangian
%
\begin{align*}
|
|
\mathcal{L}\left( \left( \boldsymbol{x}_i \right)_{i=1}^N, \boldsymbol{\lambda} \right)
|
|
= \sum_{i=1}^{N} g_i\left( \boldsymbol{x}_i \right)
|
|
+ \boldsymbol{\lambda}^\text{T} \left(
|
|
\sum_{i=1}^{N} \boldsymbol{A}_i\boldsymbol{x_i} - \boldsymbol{b}\right)
|
|
.\end{align*}%
|
|
%
|
|
The matrices $\boldsymbol{A}_i \in \mathbb{R}^{m \times n_i}, \hspace{1mm} i \in [1:N]$
|
|
form a partition of $\boldsymbol{A}$, corresponding to
|
|
$\boldsymbol{A} = \begin{bmatrix}
|
|
\boldsymbol{A}_1 &
|
|
\ldots &
|
|
\boldsymbol{A}_N
|
|
\end{bmatrix}$.
|
|
The minimization of each term can happen in parallel, in a distributed
|
|
fashion \cite[Sec. 2.2]{distr_opt_book}.
|
|
In each minimization step, only one subvector $\boldsymbol{x}_i$ of
|
|
$\boldsymbol{x}$ is considered, regarding all other subvectors as being
|
|
constant.
|
|
This modified version of dual ascent is called \textit{dual decomposition}:
|
|
%
|
|
\begin{align*}
|
|
\boldsymbol{x}_i &\leftarrow \argmin_{\boldsymbol{x}_i \ge \boldsymbol{0}}\mathcal{L}\left(
|
|
\left( \boldsymbol{x}_i \right)_{i=1}^N, \boldsymbol{\lambda}\right)
|
|
\hspace{5mm} \forall i \in [1:N]\\
|
|
\boldsymbol{\lambda} &\leftarrow \boldsymbol{\lambda}
|
|
+ \alpha\left( \sum_{i=1}^{N} \boldsymbol{A}_i\boldsymbol{x}_i
|
|
- \boldsymbol{b} \right),
|
|
\hspace{5mm} \alpha > 0
|
|
.\end{align*}
|
|
%
|
|
|
|
\ac{ADMM} works the same way as dual decomposition.
|
|
It only differs in the use of an \textit{augmented Lagrangian}
|
|
$\mathcal{L}_\mu\left( \left( \boldsymbol{x}_i \right)_{i=1}^N, \boldsymbol{\lambda} \right)$
|
|
in order to strengthen the convergence properties.
|
|
The augmented Lagrangian extends the classical one by an additional penalty term,
weighted by the penalty parameter $\mu$:
|
|
%
|
|
\begin{align*}
|
|
	\mathcal{L}_\mu \left( \left( \boldsymbol{x}_i \right)_{i=1}^N, \boldsymbol{\lambda} \right)
	= \underbrace{\sum_{i=1}^{N} g_i\left( \boldsymbol{x}_i \right)
|
|
+ \boldsymbol{\lambda}^\text{T}\left(\sum_{i=1}^{N}
|
|
\boldsymbol{A}_i\boldsymbol{x}_i - \boldsymbol{b}\right)}
|
|
_{\text{Classical Lagrangian}}
|
|
+ \underbrace{\frac{\mu}{2}\left\Vert \sum_{i=1}^{N} \boldsymbol{A}_i\boldsymbol{x}_i
|
|
- \boldsymbol{b} \right\Vert_2^2}_{\text{Penalty term}},
|
|
\hspace{5mm} \mu > 0
|
|
.\end{align*}
|
|
%
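For the toy problem considered above (minimizing $x_1 + x_2$ subject to
$x_1 + x_2 = 1$ and $\boldsymbol{x} \ge \boldsymbol{0}$), the penalty term is
simply $\frac{\mu}{2}\left( x_1 + x_2 - 1 \right)^2$: it vanishes for feasible
points and grows quadratically with the violation of the equality constraint.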
|
|
The steps to solve the problem are the same as with dual decomposition, with the added
|
|
condition that the step size be $\mu$:%
|
|
%
|
|
\begin{align*}
|
|
\boldsymbol{x}_i &\leftarrow \argmin_{\boldsymbol{x}_i \ge \boldsymbol{0}}\mathcal{L}_\mu\left(
|
|
	\left( \boldsymbol{x}_i \right)_{i=1}^N, \boldsymbol{\lambda}\right)
|
|
\hspace{5mm} \forall i \in [1:N]\\
|
|
\boldsymbol{\lambda} &\leftarrow \boldsymbol{\lambda}
|
|
+ \mu\left( \sum_{i=1}^{N} \boldsymbol{A}_i\boldsymbol{x}_i
|
|
- \boldsymbol{b} \right),
|
|
\hspace{5mm} \mu > 0
|
|
.\end{align*}
|
|
%
|
|
|
|
In subsequent chapters, the decoding problem will be reformulated as an
|
|
optimization problem using two different methodologies.
|
|
In chapter \ref{chapter:proximal_decoding}, a non-convex optimization approach
|
|
is chosen and addressed using the proximal gradient method.
|
|
In chapter \ref{chapter:lp_dec_using_admm}, an \ac{LP} based optimization problem is
|
|
formulated and solved using \ac{ADMM}.
|
|
|