diff --git a/latex/thesis/chapters/theoretical_background.tex b/latex/thesis/chapters/theoretical_background.tex
index 95c71d5..b8d10e3 100644
--- a/latex/thesis/chapters/theoretical_background.tex
+++ b/latex/thesis/chapters/theoretical_background.tex
@@ -28,7 +28,8 @@ Additionally, a shorthand notation will be used to denote series of indices
 and of indexed variables:%
 %
 \begin{align*}
-    \left[ m:n \right] &:= \left\{ m, m+1, \ldots, n-1, n \right\} \\
+    \left[ m:n \right] &:= \left\{ m, m+1, \ldots, n-1, n \right\},
+    \hspace{5mm} m,n\in\mathbb{Z}\\
     x_{\left[ m:n \right] } &:= \left\{ x_m, x_{m+1}, \ldots, x_{n-1}, x_n \right\}
 .\end{align*}
 %
@@ -40,7 +41,7 @@ and the \textit{Hadamard power}, the operator $\circ$ will be used:%
     &:= \begin{bmatrix} a_1 b_1 & \ldots & a_n b_n \end{bmatrix} ^\text{T},
     \hspace{5mm} &&\boldsymbol{a}, \boldsymbol{b} \in \mathbb{R}^n, \hspace{2mm} n\in \mathbb{N} \\
     \boldsymbol{a}^{\circ k} &:= \begin{bmatrix} a_1^k \ldots a_n^k \end{bmatrix}^\text{T},
-    \hspace{5mm} &&\boldsymbol{a} \in \mathbb{R}^n, \hspace{2mm}k\in \mathbb{Z}
+    \hspace{5mm} &&\boldsymbol{a} \in \mathbb{R}^n, \hspace{2mm}n\in \mathbb{N}, k\in \mathbb{Z}
 .\end{alignat*}
 %
@@ -59,11 +60,12 @@ This is known as modulation. The modulation scheme chosen here is \ac{BPSK}:%
 .\end{align*}
 %
 The symbol that reaches the receiver, $\boldsymbol{y}$, is distorted by the channel.
-This distortion is described by the channel model, which here is chosen to be \ac{AWGN}:%
+This distortion is described by the channel model, which in the context of
+this thesis is chosen to be \ac{AWGN}:%
 %
 \begin{align*}
-    \boldsymbol{y} = \boldsymbol{x} + \boldsymbol{z},
-    \hspace{5mm} z_i \in \mathcal{N}\left( 0, \frac{\sigma^2}{2} \right),
+    \boldsymbol{y} = \boldsymbol{x} + \boldsymbol{n},
+    \hspace{5mm} n_i \sim \mathcal{N}\left( 0, \frac{\sigma^2}{2} \right),
     \hspace{2mm} i \in \left[ 1:n \right]
 .\end{align*}
 %
@@ -81,11 +83,11 @@ conducting this process, whereby \textit{data words} are mapped onto longer
 \textit{codewords}, which carry redundant information.
 \Ac{LDPC} codes have become especially popular, since they are able to reach
 arbitrarily small probabilities of error at coderates up to the capacity
-of the channel \cite[Sec. II.B.]{mackay_rediscovery} and their structure allows
-for very efficient decoding.
+of the channel \cite[Sec. II.B.]{mackay_rediscovery} while having a structure
+that allows for very efficient decoding.

-The lengths of the data words and codewords are denoted by $k$ and $n$,
-respectively.
+The lengths of the data words and codewords are denoted by $k\in\mathbb{N}$
+and $n\in\mathbb{N}$, respectively.
 The set of codewords $\mathcal{C} \subset \mathbb{F}_2^n$ of a binary linear
 code can be represented using the \textit{parity-check matrix}
 $\boldsymbol{H} \in \mathbb{F}_2^{m\times n}$, where $m$ represents
@@ -101,7 +103,7 @@ $\boldsymbol{c} \in \mathbb{F}_2^n$ using the \textit{generator matrix}
 $\boldsymbol{G} \in \mathbb{F}_2^{k\times n}$:%
 %
 \begin{align*}
-    \boldsymbol{c} = \boldsymbol{u}\boldsymbol{G}
+    \boldsymbol{c} = \boldsymbol{u}^\text{T}\boldsymbol{G}
 .\end{align*}
 %

@@ -110,7 +112,7 @@ as described in section \ref{sec:theo:Preliminaries: Channel Model and Modulatio
 The received signal $\boldsymbol{y}$ is then decoded to obtain an estimate of
 the transmitted codeword, $\hat{\boldsymbol{c}}$.
 Finally, the encoding procedure is reversed and an estimate of the originally
-sent data word, $\hat{\boldsymbol{u}}$, is obtained.
+sent data word, $\hat{\boldsymbol{u}}$, is produced.
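+As a brief illustration (the small example code and the \ac{BPSK} mapping
+$x_i = 1 - 2c_i$ are chosen here only for this example and may differ from
+the conventions used in the remainder of this thesis), consider $k = 2$,
+$n = 3$ and%
+%
+\begin{align*}
+    \boldsymbol{G} = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix},
+    \hspace{5mm}
+    \boldsymbol{u} = \begin{bmatrix} 1 & 0 \end{bmatrix}^\text{T}
+    \hspace{2mm}\Rightarrow\hspace{2mm}
+    \boldsymbol{c} = \begin{bmatrix} 1 & 0 & 1 \end{bmatrix}^\text{T},
+    \hspace{5mm}
+    \boldsymbol{x} = \begin{bmatrix} -1 & 1 & -1 \end{bmatrix}^\text{T}
+.\end{align*}
+%
+From the noisy observation $\boldsymbol{y} = \boldsymbol{x} + \boldsymbol{n}$
+the decoder then has to recover $\hat{\boldsymbol{c}}$ and, from it,
+$\hat{\boldsymbol{u}}$.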
 The methods examined in this work are all based on \textit{soft-decision}
 decoding, i.e., $\boldsymbol{y}$ is considered to be in $\mathbb{R}^n$ and no
 preliminary decision is made by a demodulator.
@@ -156,9 +158,6 @@ figure \ref{fig:theo:channel_overview}.%
 \label{fig:theo:channel_overview}
 \end{figure}

-\todo{Explicitly mention $\boldsymbol{n}$}
-\todo{Mapper $\to$ Modulator?}
-
 The decoding process itself is generally based either on the \ac{MAP} or
 the \ac{ML} criterion:%
 %
@@ -183,7 +182,7 @@ This is especially true for \ac{LDPC} codes, as the established decoding
 algorithms are \textit{message passing algorithms}, which are inherently
 graph-based.

-Binary linear codes with a parity-check matrix $\boldsymbol{H}$ can be
+A binary linear code with a parity-check matrix $\boldsymbol{H}$ can be
 visualized using a \textit{Tanner} or \textit{factor graph}:
 Each row of $\boldsymbol{H}$, which represents one parity-check, is viewed as
 a \ac{CN}.
@@ -263,8 +262,9 @@ The neighbourhood of the $i$th \ac{VN} is denoted by $N_v\left( i \right)$.
 For the code depicted in figure \ref{fig:theo:tanner_graph}, for example,
 $N_c\left( 1 \right) = \left\{ 1, 3, 5, 7 \right\}$ and
 $N_v\left( 3 \right) = \left\{ 1, 2 \right\}$.
-
-\todo{Define $d_i$ and $d_j$}
+The degree $d_j$ of a \ac{CN} is defined as the number of adjacent \acp{VN}:
+$d_j := \left| N_c\left( j \right) \right| $; the degree of a \ac{VN} is
+similarly defined as $d_i := \left| N_v\left( i \right) \right|$.

 Message passing algorithms are based on the notion of passing messages
 between \acp{CN} and \acp{VN}.
@@ -273,13 +273,17 @@ It aims to compute the posterior probabilities
 $p_{C_i \mid \boldsymbol{Y}}\left(c_i = 1 | \boldsymbol{y} \right),\hspace{2mm} i\in\mathcal{I}$
 \cite[Sec. III.]{mackay_rediscovery} and use them to calculate the estimate $\hat{\boldsymbol{c}}$.
 For cycle-free graphs this goal is reached after a finite
-number of steps and \ac{BP} is thus equivalent to \ac{MAP} decoding.
+number of steps and \ac{BP} is equivalent to \ac{MAP} decoding.
 When the graph contains cycles, however, \ac{BP} only approximates the
 probabilities and is sub-optimal.
 This leads to generally worse performance than \ac{MAP} decoding for practical codes.
 Additionally, an \textit{error floor} appears for very high \acp{SNR}, making
 the use of \ac{BP} impractical for applications where a very low \ac{BER} is
 desired \cite[Sec. 15.3]{ryan_lin_2009}.
+Another popular decoding method for \ac{LDPC} codes is the
+\textit{min-sum algorithm}.
+This is a simplification of \ac{BP} using an approximation of the
+non-linear $\tanh$ function to reduce the computational complexity.
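+As an illustration, written in a generic message notation that may differ
+from the notation used elsewhere in this thesis, the \ac{BP} update at the
+$j$th \ac{CN} and its min-sum approximation can be written as \cite{ryan_lin_2009}%
+%
+\begin{align*}
+    L_{j \to i} &= 2\tanh^{-1}\left( \prod_{i' \in N_c\left( j \right) \setminus \left\{ i \right\} } \tanh\left( \frac{L_{i' \to j}}{2} \right) \right) \\
+    &\approx \left( \prod_{i' \in N_c\left( j \right) \setminus \left\{ i \right\} } \operatorname{sign}\left( L_{i' \to j} \right) \right)
+    \min_{i' \in N_c\left( j \right) \setminus \left\{ i \right\} } \left| L_{i' \to j} \right|
+,\end{align*}
+%
+where $L_{i' \to j}$ denotes the message sent from the $i'$th \ac{VN} to the
+$j$th \ac{CN} and $L_{j \to i}$ the message sent back to the $i$th \ac{VN}.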

 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

@@ -438,12 +442,12 @@ which minimizes the objective function $g$.

 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

-\section{Optimization Methods}
+\section{An Introduction to the Proximal Gradient Method and ADMM}
 \label{sec:theo:Optimization Methods}

 \textit{Proximal algorithms} are algorithms for solving convex optimization
 problems, that rely on the use of \textit{proximal operators}.
-The proximal operator $\textbf{prox}_f : \mathbb{R}^n \rightarrow \mathbb{R}^n$
+The proximal operator $\textbf{prox}_{\lambda f} : \mathbb{R}^n \rightarrow \mathbb{R}^n$
 of a function $f:\mathbb{R}^n \rightarrow \mathbb{R}$ is defined by \cite[Sec.
 1.1]{proximal_algorithms}%
 %
@@ -456,8 +460,8 @@ of a function $f:\mathbb{R}^n \rightarrow \mathbb{R}$ is defined by
 This operator computes a point that is a compromise between minimizing $f$
 and staying in the proximity of $\boldsymbol{v}$.
 The parameter $\lambda$ determines how heavily each term is weighed.
-The \textit{proximal gradient method} is an iterative optimization method used to
-solve problems of the form%
+The \textit{proximal gradient method} is an iterative optimization method
+that utilizes proximal operators to solve problems of the form%
 %
 \begin{align*}
     \text{minimize}\hspace{5mm}f\left( \boldsymbol{x} \right) + g\left( \boldsymbol{x} \right)
@@ -473,11 +477,14 @@ and minimizing $g$ using the proximal operator
 ,\end{align*}
 %
 Since $g$ is minimized with the proximal operator and is thus not required
-to be differentiable, it can be used to encode the constraints of the problem.
+to be differentiable, it can be used to encode the constraints of the problem
+(e.g., in the form of an \textit{indicator function}, whose proximal operator
+is the projection onto the constraint set \cite[Sec. 1.2]{proximal_algorithms}).

-A special case of convex optimization problems are \textit{linear programs}.
-These are problems where the objective function is linear and the constraints
-consist of linear equalities and inequalities.
+The \ac{ADMM} is another iterative optimization method.
+In this thesis it will be used to solve a \textit{linear program}, a special
+type of convex optimization problem in which the objective function is linear
+and the constraints consist of linear equalities and inequalities.
 Generally, any linear program can be expressed in \textit{standard form}%
 \footnote{The inequality $\boldsymbol{x} \ge \boldsymbol{0}$ is to be interpreted componentwise.}