Reworked theoretical background

Andreas Tsouchlos 2023-04-11 18:10:53 +02:00
parent c074d3e034
commit 327ad3934e


@ -28,7 +28,8 @@ Additionally, a shorthand notation will be used to denote series of indices and
of indexed variables:%
%
\begin{align*}
\left[ m:n \right] &:= \left\{ m, m+1, \ldots, n-1, n \right\},
\hspace{5mm} m,n\in\mathbb{Z}\\
x_{\left[ m:n \right] } &:= \left\{ x_m, x_{m+1}, \ldots, x_{n-1}, x_n \right\}
.\end{align*}
%
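As a concrete instance of this shorthand, for $m = 2$ and $n = 5$:%
%
\begin{align*}
\left[ 2:5 \right] = \left\{ 2, 3, 4, 5 \right\},
\hspace{5mm} x_{\left[ 2:5 \right] } = \left\{ x_2, x_3, x_4, x_5 \right\}
.\end{align*}
%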
@ -40,7 +41,7 @@ and the \textit{Hadamard power}, the operator $\circ$ will be used:%
&:= \begin{bmatrix} a_1 b_1 & \ldots & a_n b_n \end{bmatrix} ^\text{T},
\hspace{5mm} &&\boldsymbol{a}, \boldsymbol{b} \in \mathbb{R}^n, \hspace{2mm} n\in \mathbb{N} \\
\boldsymbol{a}^{\circ k} &:= \begin{bmatrix} a_1^k & \ldots & a_n^k \end{bmatrix}^\text{T},
\hspace{5mm} &&\boldsymbol{a} \in \mathbb{R}^n, \hspace{2mm}n\in \mathbb{N}, k\in \mathbb{Z}
.\end{alignat*}
%
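As a brief worked example with $n = 2$ and $k = 2$:%
%
\begin{align*}
\begin{bmatrix} 1 & 2 \end{bmatrix}^\text{T} \circ
\begin{bmatrix} 3 & 4 \end{bmatrix}^\text{T}
= \begin{bmatrix} 3 & 8 \end{bmatrix}^\text{T},
\hspace{5mm}
\left( \begin{bmatrix} 1 & 2 \end{bmatrix}^\text{T} \right)^{\circ 2}
= \begin{bmatrix} 1 & 4 \end{bmatrix}^\text{T}
.\end{align*}
%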
@ -59,11 +60,12 @@ This is known as modulation. The modulation scheme chosen here is \ac{BPSK}:%
.\end{align*}
%
The symbol that reaches the receiver, $\boldsymbol{y}$, is distorted by the channel.
This distortion is described by the channel model, which in the context of
this thesis is chosen to be \ac{AWGN}:%
%
\begin{align*}
\boldsymbol{y} = \boldsymbol{x} + \boldsymbol{n},
\hspace{5mm} n_i \sim \mathcal{N}\left( 0, \frac{\sigma^2}{2} \right),
\hspace{2mm} i \in \left[ 1:n \right]
.\end{align*}
%
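For example, a transmitted symbol $x_i = +1$ reaches the receiver as $y_i = 0.7$
when the corresponding noise sample happens to be $n_i = -0.3$; the variance
$\frac{\sigma^2}{2}$ determines how large such deviations typically are.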
@ -81,11 +83,11 @@ conducting this process, whereby \textit{data words} are mapped onto longer
\textit{codewords}, which carry redundant information.
\Ac{LDPC} codes have become especially popular, since they are able to
reach arbitrarily small probabilities of error at code rates up to the capacity
of the channel \cite[Sec. II.B.]{mackay_rediscovery} while having a structure
that allows for very efficient decoding.
The lengths of the data words and codewords are denoted by $k\in\mathbb{N}$
and $n\in\mathbb{N}$, respectively.
The set of codewords $\mathcal{C} \subset \mathbb{F}_2^n$ of a binary
linear code can be represented using the \textit{parity-check matrix}
$\boldsymbol{H} \in \mathbb{F}_2^{m\times n}$, where $m$ represents
@ -101,7 +103,7 @@ $\boldsymbol{c} \in \mathbb{F}_2^n$ using the \textit{generator matrix}
$\boldsymbol{G} \in \mathbb{F}_2^{k\times n}$:%
%
\begin{align*}
\boldsymbol{c} = \boldsymbol{u}^\text{T}\boldsymbol{G}
.\end{align*}
%
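As a purely illustrative toy example (not a code considered elsewhere in this
work), let $k = 2$, $n = 4$ and%
%
\begin{align*}
\boldsymbol{G} = \begin{bmatrix} 1 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 \end{bmatrix}
.\end{align*}
%
The data word $\boldsymbol{u} = \begin{bmatrix} 1 & 1 \end{bmatrix}^\text{T}$ is
then encoded, with all arithmetic carried out in $\mathbb{F}_2$, as
$\boldsymbol{c} = \boldsymbol{u}^\text{T}\boldsymbol{G}
= \begin{bmatrix} 1 & 1 & 1 & 0 \end{bmatrix}$.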
@ -110,7 +112,7 @@ as described in section \ref{sec:theo:Preliminaries: Channel Model and Modulatio
The received signal $\boldsymbol{y}$ is then decoded to obtain
an estimate of the transmitted codeword, $\hat{\boldsymbol{c}}$.
Finally, the encoding procedure is reversed and an estimate of the originally
sent data word, $\hat{\boldsymbol{u}}$, is produced.
The methods examined in this work are all based on \textit{soft-decision} decoding,
i.e., $\boldsymbol{y}$ is considered to be in $\mathbb{R}^n$ and no preliminary decision
is made by a demodulator.
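A common representation of this soft information is the \textit{log-likelihood
ratio}: assuming the mapping $c_i = 0 \mapsto x_i = +1$ and
$c_i = 1 \mapsto x_i = -1$, the \ac{AWGN} channel introduced above yields%
%
\begin{align*}
L_i := \ln \frac{p_{Y_i \mid C_i}\left( y_i \mid 0 \right)}
{p_{Y_i \mid C_i}\left( y_i \mid 1 \right)}
= \frac{4 y_i}{\sigma^2},
\hspace{5mm} i \in \left[ 1:n \right]
.\end{align*}
%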
@ -156,9 +158,6 @@ figure \ref{fig:theo:channel_overview}.%
\label{fig:theo:channel_overview}
\end{figure}
The decoding process itself is generally based either on the \ac{MAP} or the \ac{ML}
criterion:%
%
@ -183,7 +182,7 @@ This is especially true for \ac{LDPC} codes, as the established decoding
algorithms are \textit{message passing algorithms}, which are inherently
graph-based.
A binary linear code with a parity-check matrix $\boldsymbol{H}$ can be
visualized using a \textit{Tanner} or \textit{factor graph}:
Each row of $\boldsymbol{H}$, which represents one parity-check, is viewed as a
\ac{CN}.
@ -263,8 +262,9 @@ The neighbourhood of the $i$th \ac{VN} is denoted by $N_v\left( i \right)$.
For the code depicted in figure \ref{fig:theo:tanner_graph}, for example,
$N_c\left( 1 \right) = \left\{ 1, 3, 5, 7 \right\}$ and
$N_v\left( 3 \right) = \left\{ 1, 2 \right\}$.
The degree $d_j$ of a \ac{CN} is defined as the number of adjacent \acp{VN}:
$d_j := \left| N_c\left( j \right) \right| $; the degree of a \ac{VN} is
similarly defined as $d_i := \left| N_v\left( i \right) \right|$.
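For the code of figure \ref{fig:theo:tanner_graph}, for instance, the first
\ac{CN} has degree $\left| N_c\left( 1 \right) \right| = 4$ and the third
\ac{VN} has degree $\left| N_v\left( 3 \right) \right| = 2$.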
Message passing algorithms operate by iteratively exchanging messages between
\acp{CN} and \acp{VN}.
@ -273,13 +273,17 @@ It aims to compute the posterior probabilities
$p_{C_i \mid \boldsymbol{Y}}\left(c_i = 1 \mid \boldsymbol{y} \right),\hspace{2mm} i\in\mathcal{I}$
\cite[Sec. III.]{mackay_rediscovery} and use them to calculate the estimate $\hat{\boldsymbol{c}}$.
For cycle-free graphs this goal is reached after a finite
number of steps and \ac{BP} is equivalent to \ac{MAP} decoding.
When the graph contains cycles, however, \ac{BP} only approximates the probabilities
and is sub-optimal.
This leads to generally worse performance than \ac{MAP} decoding for practical codes.
Additionally, an \textit{error floor} appears for very high \acp{SNR}, making
the use of \ac{BP} impractical for applications where a very low \ac{BER} is
desired \cite[Sec. 15.3]{ryan_lin_2009}.
Another popular decoding method for \ac{LDPC} codes is the
\textit{min-sum algorithm}.
This is a simplification of \ac{BP} using an approximation of the
non-linear $\tanh$ function to improve the computational performance.
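To sketch the idea in the log-likelihood domain, with $m_{i' \to j}$ denoting
the message from the $i'$th \ac{VN} to the $j$th \ac{CN} (a notation chosen
here purely for illustration), the \ac{BP} update at the $j$th \ac{CN},%
%
\begin{align*}
m_{j \to i} = 2 \tanh^{-1} \left( \prod_{i' \in N_c\left( j \right) \setminus
\left\{ i \right\} } \tanh \left( \frac{m_{i' \to j}}{2} \right) \right)
,\end{align*}
%
is replaced in the min-sum algorithm by%
%
\begin{align*}
m_{j \to i} \approx \left( \prod_{i' \in N_c\left( j \right) \setminus
\left\{ i \right\} } \operatorname{sign} \left( m_{i' \to j} \right) \right)
\min_{i' \in N_c\left( j \right) \setminus \left\{ i \right\} }
\left| m_{i' \to j} \right|
.\end{align*}
%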
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
@ -438,12 +442,12 @@ which minimizes the objective function $g$.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{An Introduction to the Proximal Gradient Method and ADMM}
\label{sec:theo:Optimization Methods}
\textit{Proximal algorithms} are algorithms for solving convex optimization
problems that rely on the use of \textit{proximal operators}.
The proximal operator $\textbf{prox}_{\lambda f} : \mathbb{R}^n \rightarrow \mathbb{R}^n$
of a function $f:\mathbb{R}^n \rightarrow \mathbb{R}$ is defined by
\cite[Sec. 1.1]{proximal_algorithms}%
%
@ -456,8 +460,8 @@ of a function $f:\mathbb{R}^n \rightarrow \mathbb{R}$ is defined by
This operator computes a point that is a compromise between minimizing $f$
and staying in the proximity of $\boldsymbol{v}$.
The parameter $\lambda$ determines how heavily each term is weighted.
The \textit{proximal gradient method} is an iterative optimization method
that utilizes proximal operators to solve problems of the form%
%
\begin{align*}
\text{minimize}\hspace{5mm}f\left( \boldsymbol{x} \right) + g\left( \boldsymbol{x} \right)
@ -473,11 +477,14 @@ and minimizing $g$ using the proximal operator
,\end{align*}
%
Since $g$ is minimized with the proximal operator and is thus not required
to be differentiable, it can be used to encode the constraints of the problem
(e.g., in the form of an \textit{indicator function}, as mentioned in
\cite[Sec. 1.2]{proximal_algorithms}).
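A standard example is the indicator function of a closed convex set
$\mathcal{X} \subseteq \mathbb{R}^n$, i.e., $g\left( \boldsymbol{x} \right) = 0$
for $\boldsymbol{x} \in \mathcal{X}$ and $g\left( \boldsymbol{x} \right) = +\infty$
otherwise, for which $\textbf{prox}_{\lambda g}$ reduces to the Euclidean
projection onto $\mathcal{X}$ \cite[Sec. 1.2]{proximal_algorithms}; the
proximal gradient method then becomes projected gradient descent.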
The \ac{ADMM} is another optimization method.
In this thesis it will be used to solve a \textit{linear program}, a special
type of convex optimization problem in which the objective function is linear
and the constraints consist of linear equalities and inequalities.
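A small concrete instance is%
%
\begin{align*}
\text{minimize}\hspace{5mm} &x_1 + 2 x_2 \\
\text{subject to}\hspace{5mm} &x_1 + x_2 = 1,
\hspace{2mm} x_1, x_2 \ge 0
,\end{align*}
%
whose optimal solution is $x_1 = 1$, $x_2 = 0$.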
Generally, any linear program can be expressed in \textit{standard form}%
\footnote{The inequality $\boldsymbol{x} \ge \boldsymbol{0}$ is to be
interpreted componentwise.}