\acresetall
% This is an example chapter from 'Polar Codes for Software Radio'. Do not use it; delete it! It serves only as an example!
% Titles should be in title case (https://en.wikipedia.org/wiki/Title_case)
\chapter{An Example Chapter}\label{chapter:systemmodel}
Polar codes are defined for a specific system model.
The objective of this chapter is to introduce the key concepts.
Notation is introduced and important terms are revisited so that they can be referred to later on.
\section{Key Channel Coding Concepts}
The system model used throughout this thesis follows the remarks in~\cite{Richardson:2008:MCT} and~\cite{polar:arikan09}.
It is intended to define the domain for which polar codes are developed.
The objective of channel coding is to transmit information from a source to a sink over a point-to-point connection with as few errors as possible.
A source wants to transmit binary data $u \in \mathcal{U} = \{0, 1\}$ to a sink where $u$ represents one draw of a binary uniformly distributed random variable.
The source symbols are encoded, transmitted over a channel and subsequently decoded in order to pass an estimate $\hat{u}$ to the sink.
This thesis uses a common vector notation, which is briefly introduced here.
A variable $x$ may assume any value in an alphabet $\mathcal{X}$, i.e. $x \in \mathcal{X}$.
Multiple variables are combined into a (row) vector $\bm{x}^n = (x_0, \ldots, x_{n-1})$ of size $n$ with $\bm{x}^n \in \mathcal{X}^n$.
A subvector of $\bm{x}^n$ is denoted by $\bm{x}_i^j = (x_i, \ldots, x_{j-1})$, where $0 \leq i \leq j \leq n$.
For $i = j$, the subvector is empty.
A vector $\bm{x}^n$ ($n$ even) may be split into its even- and odd-indexed subvectors, denoted $\bm{x}_{0,\mathrm{e}}^{n} = (x_0, x_2, \ldots, x_{n-2})$ and $\bm{x}_{0,\mathrm{o}}^{n} = (x_1, x_3, \ldots, x_{n-1})$.
This zero-based numbering convention is in accordance with~\cite{dijkstra:zerocounting}, where the author argues strongly for this exact notation; some papers on polar codes follow it as well, e.g.~\cite{polar:talvardy:howtoCC}.
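The convention also maps directly onto Python's slice semantics; a minimal sketch (added here purely for illustration) makes the correspondence explicit:
\begin{verbatim}
# Zero-based vector notation as Python slices: x[i:j] yields
# the subvector (x_i, ..., x_{j-1}), matching the convention above.
x = [7, 1, 8, 2, 9, 3]      # a vector x^n with n = 6
n = len(x)

sub = x[2:5]                # x_2^5 = (x_2, x_3, x_4)
even = x[0:n:2]             # x_{0,e}^n = (x_0, x_2, x_4)
odd = x[1:n:2]              # x_{0,o}^n = (x_1, x_3, x_5)

assert sub == [8, 2, 9] and even == [7, 8, 9] and odd == [1, 2, 3]
\end{verbatim}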
\subsection{Encoder}
The encoder takes a frame $\bm{u}^k$ and maps it to a binary codeword $\bm{x}^n$, where $k$ and $n$ denote the vector sizes of a frame and a codeword respectively with $k \leq n$.
The set of all valid codewords of an encoder is a code $\mathcal{C}$.
It should be noted that $|\mathcal{C}| = |\mathcal{U}^k| = 2^k$ must hold in order for the code to be able to represent every possible frame.
Not all $2^n$ possible vectors in $\mathcal{X}^n$ are used for transmission.
The difference between the $2^n$ possible vectors and the $2^k$ codewords actually used is called redundancy.
With those two values, the code rate is defined as $r = \frac{k}{n}$.
It is a measure of efficient channel usage.
The encoder is assumed to be linear and to perform a one-to-one mapping of frames to codewords.
A code is linear if $\alpha \bm{x} + \alpha^\prime \bm{x}^\prime \in \mathcal{C}$ holds for all $\bm{x}, \bm{x}^\prime \in \mathcal{C}$ and all $\alpha, \alpha^\prime \in \mathbb{F}$.
It should be noted that all operations are done over the Galois field GF(2) or $\mathbb{F} = \{0, 1\}$ unless stated otherwise.
Then the expression can be simplified to
\begin{equation}
\bm{x} + \bm{x}^\prime \in \mathcal{C} \quad \forall \bm{x}, \bm{x}^\prime \in \mathcal{C}.
\end{equation}
A linear combination of two codewords must yield a codeword again.
For linear codes it is possible to find a generator matrix $\bm{G} \in \mathbb{F}^{k \times n}$ and obtain a codeword from a frame with $\bm{x}^n = \bm{u}^k \bm{G}$.
Every linear code can be brought into systematic form $\bm{G} = (\bm{I}_k\ \bm{P})$, possibly after a column permutation. Therein, $\bm{I}_k$ is the identity matrix of size $k \times k$.
If $\bm{G}$ is systematic, all elements of a frame $\bm{u}^k$ are also elements of the codeword $\bm{x}^n$.
Also, a parity check matrix $\bm{H} = (-\bm{P}^\top\ \bm{I}_{n-k})$ of size $(n-k) \times n$ can be obtained from the systematic $\bm{G}$; over GF(2), $-\bm{P}^\top = \bm{P}^\top$.
The parity check matrix can be used to define the code, as $\bm{H} \bm{x}^\top = \bm{0}^\top$, $\forall \bm{x} \in \mathcal{C}$.
Thus, a parity check matrix can be used to verify correct codeword reception; furthermore, it may be used to perform error correction.
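A minimal Python sketch illustrates encoding with a systematic $\bm{G}$ and the corresponding parity check; the particular $\bm{P}$ chosen below, one systematic form of the $(7,4)$ Hamming code, is merely an illustrative example:
\begin{verbatim}
import numpy as np

# Systematic generator and parity check matrices over GF(2);
# this P is one systematic form of the (7,4) Hamming code and
# serves only as an illustrative choice.
P = np.array([[1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1]])
G = np.hstack((np.eye(4, dtype=int), P))    # G = (I_k  P)
H = np.hstack((P.T, np.eye(3, dtype=int)))  # H = (P^T  I_{n-k}), -P = P in GF(2)

u = np.array([1, 0, 1, 1])                  # frame u^k
x = u @ G % 2                               # codeword x^n = u^k G
assert np.all(H @ x % 2 == 0)               # H x^T = 0 holds for every codeword
\end{verbatim}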
A code can be characterized by the minimum distance between any two codewords.
In order to obtain this value we use the Hamming distance.
This distance $d(\bm{v}^n,\bm{x}^n)$ equals the number of positions in $\bm{v}^n$ that differ from $\bm{x}^n$.
The minimum distance of a code is then defined by $d(\mathcal{C}) = \min\{d(\bm{x},\bm{v}): \bm{x},\bm{v} \in \mathcal{C}, \bm{x} \neq \bm{v}\}$.
For linear codes, this computation simplifies to comparing all codewords to the zero codeword, $d(\mathcal{C}) = \min\{d(\bm{x},\bm{0}): \bm{x} \in \mathcal{C}, \bm{x} \neq \bm{0}\}$, i.e. the minimum Hamming weight over all nonzero codewords.
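For small $k$, this minimum weight can be found by exhaustive search; a brute-force Python sketch might look as follows:
\begin{verbatim}
import itertools
import numpy as np

def min_distance(G):
    """Minimum distance of a linear code, computed as the minimum
    Hamming weight over all nonzero codewords (brute force)."""
    k, n = G.shape
    best = n
    for u in itertools.product([0, 1], repeat=k):
        x = np.array(u) @ G % 2
        if x.any():
            best = min(best, int(x.sum()))
    return best

min_distance(np.array([[1, 1, 1]]))  # 3 for the (3,1) repetition code
\end{verbatim}
The exhaustive search over all $2^k$ frames is feasible only for very small codes; it is shown here to make the definition concrete.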
\subsection{Channel Model}\label{sec:channel_model}
Channel coding relies on a generic channel model.
Its input is $x \in \mathcal{X}$ and its distorted output is $y \in \mathcal{Y}$.
A channel is denoted by $W: \mathcal{X} \rightarrow \mathcal{Y}$ along with its transition probability $W(y|x), x \in \mathcal{X}, y \in \mathcal{Y}$.
A \ac{DMC} does not have memory, thus every symbol transmission is independent of any other.
Combined with a binary input alphabet it is called a \ac{BDMC}.
For a symmetric channel whose inputs are mapped to $\pm 1$, $P(y|1) = P(-y|-1)$ must hold for an output alphabet $\mathcal{Y} \subset \mathbb{R}$~\cite{Richardson:2008:MCT}.
Assuming symmetry for a \ac{BDMC} leads to a symmetric \ac{BDMC}.
In Sec.~\ref{theory:channels}, several examples of such channels are discussed.
This channel concept may be extended to vector channels.
A vector channel $W^n$ corresponds to $n$ independent uses of a channel $W$ which is denoted as $W^n : \mathcal{X}^n \rightarrow \mathcal{Y}^n$.
Also, vector transition probabilities are denoted $W^n(\bm{y}^n|\bm{x}^n) = \prod_{i=0}^{n-1} W(y_i|x_i)$.
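As an illustration, $n$ independent uses of a \ac{BSC} with crossover probability $\epsilon$ (a hypothetical choice of $W$) can be simulated in a few lines of Python:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(42)

def bsc_vector_channel(x, eps):
    """One use of W^n, realized as n independent draws of a BSC
    with crossover probability eps (an illustrative choice of W)."""
    flips = rng.random(len(x)) < eps
    return (np.asarray(x) + flips) % 2

y = bsc_vector_channel([0, 1, 0, 0, 1, 1, 0], eps=0.1)
\end{verbatim}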
\subsection{Decoder}
A decoder receives a possibly corrupted word $\bm{y}$ and checks its validity by asserting $\bm{H} \bm{y}^\top = \bm{0}^\top$, thus performing error detection.
A more sophisticated decoder tries to correct errors by using redundant information transmitted in a codeword.
An optimal decoder strategy is to maximize the \emph{a posteriori} probability.
Given the probability of each codeword $P(\bm{x})$ and the channel transition probability $P(\bm{y}|\bm{x})$, the task at hand is to find the most likely transmitted codeword $\bm{x}$ under the observation $\bm{y}$, $P(\bm{x}|\bm{y})$.
This is denoted
\begin{equation}
\hat{\bm{x}}^{\text{MAP}} = \argmax_{\bm{x} \in \mathcal{C}} P(\bm{x}|\bm{y}) \stackrel{(i)}{=} \argmax_{\bm{x} \in \mathcal{C}} P(\bm{y}|\bm{x}) \frac{P(\bm{x})}{P(\bm{y})} \stackrel{(ii)}{=} \argmax_{\bm{x} \in \mathcal{C}} P(\bm{y}|\bm{x}) P(\bm{x})
\end{equation}
where we have used Bayes' rule in $(i)$ and the simplification in $(ii)$ is due to the fact that $P(\bm{y})$ is constant and does not change when varying $\bm{x}$.
Assume that every codeword is transmitted with identical probability $P(\bm{x}) = P(\bm{v})$, $\forall \bm{x}, \bm{v} \in \mathcal{C}$.
This simplifies the equation and yields the \ac{ML} decoder
\begin{equation}
\hat{\bm{x}}^{\text{ML}} = \argmax_{\bm{x} \in \mathcal{C}} P(\bm{y}|\bm{x})
\end{equation}
which estimates the most likely transmitted codeword from the received, possibly corrupted word~\cite{Richardson:2008:MCT}.
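For a \ac{BSC} with crossover probability $\epsilon < \frac{1}{2}$, maximizing $W^n(\bm{y}|\bm{x})$ is equivalent to minimizing the Hamming distance $d(\bm{y},\bm{x})$, so a brute-force \ac{ML} decoder can be sketched in Python as follows:
\begin{verbatim}
import numpy as np

def ml_decode_bsc(y, codebook):
    """Brute-force ML decoding over a BSC with eps < 1/2: pick the
    codeword at minimum Hamming distance from the received word."""
    y = np.asarray(y)
    dists = [int(np.sum(y != x)) for x in codebook]
    return codebook[int(np.argmin(dists))]

codebook = [np.array([0, 0, 0]), np.array([1, 1, 1])]
ml_decode_bsc([1, 0, 1], codebook)  # -> array([1, 1, 1])
\end{verbatim}
Enumerating all of $\mathcal{C}$ is only feasible for very small codes; practical decoders exploit the structure of the code instead.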
In conclusion, the task at hand is to find a code which inserts redundancy intelligently, so a decoder can use this information to detect and correct transmission errors.
\subsection{Asymptotically Good Codes}\label{theory:repetition_code}
A repetition code is a very simple code which helps clarify certain key concepts in the channel coding domain.
Assume that the encoder and decoder use a repetition code.
For example, a repetition code with $k=1$ and $n = 3$ has two codewords $\mathcal{C} = \{(0,0,0), (1,1,1)\}$.
Thus in this example $r=\frac{1}{3}$.
We can also obtain its generator and parity check matrices as
\begin{equation}
\bm{G} = \begin{pmatrix} 1 & 1 & 1 \end{pmatrix},\qquad \bm{H} = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}\,.
\end{equation}
The parity check matrix $\bm{H}$ can be used to detect whether a transmission error occurred by verifying $\bm{H} \bm{y}^\top = \bm{0}^\top$ for the received word $\bm{y}$.
In the case where an error occurred, an \ac{ML} decoder for a \ac{BSC} carries out a majority decision to estimate the most likely codeword.
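As a short worked example, assume $\bm{x} = (1,1,1)$ was sent over a \ac{BSC} and $\bm{y} = (1,0,1)$ was received.
The syndrome
\begin{equation}
\bm{H} \bm{y}^\top = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \neq \bm{0}^\top
\end{equation}
reveals the error, and since two of the three received bits equal one, the majority decision yields $\hat{\bm{x}} = (1,1,1)$.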
Repetition codes shed light on a problem common to many codes.
If the reliability of a code needs to be improved, it comes at the expense of a lower code rate.
Increasing $n$ comes at the expense of decreasing $r = \frac{1}{n}$ because $k=1$ for all repetition codes.
Thus, a very reliable repetition code has a vanishing rate, $\lim_{n \to \infty} r = 0$.
These results lead to the definition of asymptotically good codes $\mathcal{C}(n_s, k_s, d_s)$~\cite{Friedrichs:2010:error-control-coding}.
Two properties must hold for this class of codes:
\begin{equation}
R = \lim_{s \to \infty} \frac{k_s}{n_s} > 0 \quad \textrm{and} \quad \lim_{s \to \infty} \frac{d_s}{n_s} > 0.
\end{equation}
The asymptotic code rate must remain positive ($>0$); repetition codes, for example, do not satisfy this property. Furthermore, the distance between codewords must grow proportionally to the code block size $n$.
\section{Channels}\label{theory:channels}
Several common channel models exist to describe the characteristics of a physical transmission.
Their common properties were discussed in Sec.~\ref{sec:channel_model}; this section targets the differences.
The three most important channel models for polar codes are presented, namely the \ac{BSC}, the \ac{BEC} and the \ac{AWGN} channel.
\subsection{AWGN Channel}
An \ac{AWGN} channel as used in this thesis has a binary input alphabet and a continuous output alphabet $\mathcal{Y} = \mathbb{R}$.
Each input symbol is affected by Gaussian noise to yield an output symbol.
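With the common BPSK mapping $x \mapsto 1 - 2x$ (an illustrative assumption; the exact mapping is a design choice), one use of the channel can be written as
\begin{equation}
y_i = (1 - 2 x_i) + z_i, \qquad z_i \sim \mathcal{N}(0, \sigma^2),
\end{equation}
where the noise samples $z_i$ are drawn independently with variance $\sigma^2$.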
\subsection{Capacity and Reliability}
Channels are often characterized by two important measures: their capacity and their reliability.
These measures are introduced in this section. The channel capacity for symmetric \acp{BDMC} with input alphabet $\mathcal{X} = \{0,1\}$ can be calculated by
\begin{equation}
I(W) = \frac{1}{2} \sum_{y \in \mathcal{Y}} \sum_{x \in \mathcal{X}} W(y|x) \log_2 \frac{W(y|x)}{\frac{1}{2} (W(y|0) + W(y|1))},
\end{equation}
where we assume equiprobable channel input symbols $P(X=0) = P(X=1) = \frac{1}{2}$, which is the capacity-achieving input distribution for symmetric \acp{BDMC}.
The capacity defines the highest rate at which a reliable transmission (i.e., with a vanishing error probability after decoding) over a channel $W$ can be realized.
It is also called the Shannon capacity~\cite{sha49} for symmetric channels.
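As a worked example, consider a \ac{BSC} with crossover probability $\epsilon$, i.e. $W(y|x) = \epsilon$ for $y \neq x$ and $W(y|x) = 1 - \epsilon$ for $y = x$. Evaluating the sum above yields
\begin{equation}
I(W) = (1-\epsilon) \log_2 \left( 2(1-\epsilon) \right) + \epsilon \log_2 (2\epsilon) = 1 - h_2(\epsilon),
\end{equation}
where $h_2(\epsilon) = -\epsilon \log_2 \epsilon - (1-\epsilon) \log_2 (1-\epsilon)$ denotes the binary entropy function.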
The Bhattacharyya parameter
\begin{equation}
Z(W) = \sum_{y \in \mathcal{Y}} \sqrt{W(y|0) W(y|1)}
\end{equation}
is used to quantify a channel's reliability, where a lower value of $Z(W)$ indicates a more reliable channel.
Moreover, $Z(W)$ is an upper bound on the error probability of an \ac{ML} decision~\cite{polar:arikan09}.
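For instance, for a \ac{BSC} with crossover probability $\epsilon$ and a \ac{BEC} with erasure probability $\epsilon$, the sum evaluates to
\begin{equation}
Z(W_{\mathrm{BSC}}) = 2\sqrt{\epsilon(1-\epsilon)} \quad \textrm{and} \quad Z(W_{\mathrm{BEC}}) = \epsilon.
\end{equation}
A noiseless channel thus has $Z(W) = 0$, while a \ac{BSC} with $\epsilon = \frac{1}{2}$ reaches $Z(W) = 1$.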