\acresetall
\chapter{An Example Chapter}\label{chapter:systemmodel}
Polar codes are defined for a specific system model. The objective of this chapter is to introduce the key concepts. The notation is fixed and important terms are revisited so that later chapters can refer to them.
\section{Key Channel Coding Concepts}
The system model used throughout this thesis follows the remarks in~\cite{Richardson:2008:MCT} and~\cite{polar:arikan09}. It defines the domain for which polar codes are developed. The objective of channel coding is to transmit information from a source to a sink over a point-to-point connection with as few errors as possible. A source wants to transmit binary data $u \in \mathcal{U} = \{0, 1\}$ to a sink, where $u$ represents one draw of a binary uniformly distributed random variable. The source symbols are encoded, transmitted over a channel and decoded afterwards in order to pass an estimate $\hat{u}$ to the sink.

This thesis uses a common notation for vectors, which is briefly introduced here. A variable $x$ may assume any value in an alphabet $\mathcal{X}$, i.e. $x \in \mathcal{X}$. Multiple variables are combined into a (row) vector $\bm{x}^n = (x_0, \ldots, x_{n-1})$ of size $n$ with $\bm{x}^n \in \mathcal{X}^n$. A subvector of $\bm{x}^n$ is denoted by $\bm{x}_i^j = (x_i, \ldots, x_{j-1})$, where $0 \leq i \leq j \leq n$; for $i = j$ it is the empty vector. A vector $\bm{x}^n$ ($n$ even) may be split into its even and odd subvectors, denoted $\bm{x}_{0,\mathrm{e}}^{n} = (x_0, x_2, \ldots, x_{n-2})$ and $\bm{x}_{0,\mathrm{o}}^{n} = (x_1, x_3, \ldots, x_{n-1})$.
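As a small illustration of the zero-based even/odd split (a Python sketch for this chapter, not code used elsewhere in the thesis):

```python
# Illustrative sketch: the zero-based even/odd subvector split
# defined above, expressed with Python's slice notation.
def split_even_odd(x):
    """Return the even and odd subvectors of a vector of even length n."""
    assert len(x) % 2 == 0, "n must be even"
    return x[0::2], x[1::2]  # (x_0, x_2, ...), (x_1, x_3, ...)

even, odd = split_even_odd([0, 1, 1, 0, 1, 1, 0, 0])
# even holds indices 0, 2, 4, 6 and odd holds indices 1, 3, 5, 7
```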
This zero-based numbering convention is in accordance with~\cite{dijkstra:zerocounting}, where the author makes a strong case for this exact notation, and some papers on polar codes follow it too, e.g.~\cite{polar:talvardy:howtoCC}.
\subsection{Encoder}
The encoder takes a frame $\bm{u}^k$ and maps it to a binary codeword $\bm{x}^n$, where $k$ and $n$ denote the vector sizes of a frame and a codeword respectively, with $k \leq n$. The set of all valid codewords of an encoder is called a code $\mathcal{C}$. It should be noted that $|\mathcal{C}| = 2^k$ must hold in order for the code to be able to represent every possible frame. Not all possible vectors from $\mathcal{X}^n$ are used for transmission. The difference between the $2^n$ possible vectors and the $2^k$ codewords actually used is called redundancy. With those two values, the code rate is defined as $r = \frac{k}{n}$. It is a measure of efficient channel usage.

The encoder is assumed to be linear and to perform a one-to-one mapping of frames to codewords. A code is linear if $\alpha \bm{x} + \alpha^\prime \bm{x}^\prime \in \mathcal{C}$ holds for all $\bm{x}, \bm{x}^\prime \in \mathcal{C}$ and all $\alpha, \alpha^\prime \in \mathbb{F}$. It should be noted that all operations are carried out over the Galois field $\mathrm{GF}(2)$, i.e. $\mathbb{F} = \{0, 1\}$, unless stated otherwise. The expression then simplifies to
\begin{equation}
\bm{x} + \bm{x}^\prime \in \mathcal{C} \quad \forall \bm{x}, \bm{x}^\prime \in \mathcal{C}.
\end{equation}
In words, a linear combination of two codewords must again yield a codeword. For linear codes it is possible to find a generator matrix $\bm{G} \in \mathbb{F}^{k \times n}$ and obtain a codeword from a frame via $\bm{x}^n = \bm{u}^k \bm{G}$. Every linear code can be transformed into systematic form $\bm{G} = (\bm{I}_k\ \bm{P})$, possibly after column permutations. Therein, $\bm{I}_k$ is the identity matrix of size $k \times k$.
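A minimal sketch of this mapping in Python, where the generator matrix below is a hypothetical $(4,2)$ example in systematic form and not a code used in this thesis:

```python
# Hedged sketch: linear encoding x = u * G over GF(2).
# G is a hypothetical systematic generator matrix (I_2 | P).
def encode(u, G):
    """Multiply the frame u (length k) with the k x n matrix G over GF(2)."""
    k, n = len(G), len(G[0])
    return [sum(u[i] * G[i][j] for i in range(k)) % 2 for j in range(n)]

G = [[1, 0, 1, 1],
     [0, 1, 0, 1]]
x = encode([1, 1], G)  # -> [1, 1, 1, 0]
```

Since $\bm{G}$ is systematic here, the frame reappears unchanged in the first $k$ positions of the codeword.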
If $\bm{G}$ is systematic, all elements of a frame $\bm{u}^k$ are also elements of the codeword $\bm{x}^n$. Furthermore, a parity check matrix $\bm{H} = (-\bm{P}^\top\ \bm{I}_{n-k})$ of size $\dim\bm{H} = (n-k) \times n$ can be obtained from the systematic $\bm{G}$. The parity check matrix can be used to define the code, since $\bm{H} \bm{x}^\top = \bm{0}^\top$, $\forall \bm{x} \in \mathcal{C}$. Thus, a parity check matrix can be used to verify correct codeword reception, and it furthermore enables error correction.

A code can be characterized by the minimum distance between any two codewords. This is measured with the Hamming distance $d(\bm{v}^n,\bm{x}^n)$, which equals the number of positions in which $\bm{v}^n$ and $\bm{x}^n$ differ. The minimum distance of a code is then defined by $d(\mathcal{C}) = \min\{d(\bm{x},\bm{v}): \bm{x},\bm{v} \in \mathcal{C}, \bm{x} \neq \bm{v}\}$. For linear codes, the computation can be simplified by comparing all nonzero codewords to the zero codeword, $d(\mathcal{C}) = \min\{d(\bm{x},\bm{0}): \bm{x} \in \mathcal{C}, \bm{x} \neq \bm{0}\}$.
\subsection{Channel Model}\label{sec:channel_model}
Channel coding relies on a generic channel model. Its input is $x \in \mathcal{X}$ and its distorted output is $y \in \mathcal{Y}$. A channel is denoted by $W: \mathcal{X} \rightarrow \mathcal{Y}$ along with its transition probabilities $W(y|x), x \in \mathcal{X}, y \in \mathcal{Y}$. A \ac{DMC} has no memory, thus every symbol transmission is independent of any other. Combined with a binary input alphabet, it is called a \ac{BDMC}. For a symmetric channel, $P(y|1) = P(-y|-1)$ must hold for an output alphabet $y \in \mathcal{Y}, \mathcal{Y} \subset \mathbb{R}$~\cite{Richardson:2008:MCT}. A \ac{BDMC} that satisfies this condition is called a symmetric \ac{BDMC}. In Sec.~\ref{theory:channels}, several examples of such channels are discussed. This channel concept may be extended to vector channels.
A vector channel $W^n$ corresponds to $n$ independent uses of a channel $W$, which is denoted as $W^n : \mathcal{X}^n \rightarrow \mathcal{Y}^n$. The vector transition probabilities factor accordingly, $W^n(\bm{y}^n|\bm{x}^n) = \prod_{i=0}^{n-1} W(y_i|x_i)$.
\subsection{Decoder}
A decoder receives a possibly erroneous word $\bm{y}$ and checks its validity by asserting $\bm{H} \bm{y}^\top = \bm{0}^\top$, thus performing error detection. A more sophisticated decoder tries to correct errors by using the redundant information transmitted in a codeword. An optimal decoder strategy is to maximize the \emph{a posteriori} probability: given the probability of each codeword $P(\bm{x})$ and the channel transition probability $P(\bm{y}|\bm{x})$, the task at hand is to find the codeword $\bm{x}$ that is most likely under the observation $\bm{y}$, i.e. the one maximizing $P(\bm{x}|\bm{y})$. This is denoted
\begin{equation}
\hat{\bm{x}}^{\text{MAP}} = \argmax_{\bm{x} \in \mathcal{C}} P(\bm{x}|\bm{y}) \stackrel{(i)}{=} \argmax_{\bm{x} \in \mathcal{C}} P(\bm{y}|\bm{x}) \frac{P(\bm{x})}{P(\bm{y})} \stackrel{(ii)}{=} \argmax_{\bm{x} \in \mathcal{C}} P(\bm{y}|\bm{x}) P(\bm{x}),
\end{equation}
where Bayes' rule is used in $(i)$ and the simplification in $(ii)$ holds because $P(\bm{y})$ is constant and does not change when varying $\bm{x}$. Assume that every codeword is transmitted with identical probability, $P(\bm{x}) = P(\bm{v})$, $\forall \bm{x}, \bm{v} \in \mathcal{C}$. This simplifies the equation and yields the \ac{ML} decoder
\begin{equation}
\hat{\bm{x}}^{\text{ML}} = \argmax_{\bm{x} \in \mathcal{C}} P(\bm{y}|\bm{x}),
\end{equation}
which estimates the codeword most likely to have been transmitted given the received, possibly erroneous word~\cite{Richardson:2008:MCT}. In conclusion, the task at hand is to find a code which inserts redundancy intelligently, so that a decoder can use this information to detect and correct transmission errors.
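For very small codes, the \ac{ML} rule can be evaluated by exhaustive search over all codewords. The following Python sketch illustrates this for a \ac{BSC}; the crossover probability $p$ and the toy code are assumed parameters for illustration only:

```python
# Hedged sketch: brute-force ML decoding over a BSC with crossover
# probability p. It enumerates all |C| = 2^k codewords, so it is
# only feasible for tiny codes.
def ml_decode_bsc(y, code, p=0.1):
    """Return argmax_x P(y|x), where P(y|x) = p^d * (1-p)^(n-d)."""
    def likelihood(x):
        d = sum(a != b for a, b in zip(x, y))  # Hamming distance d(x, y)
        return (p ** d) * ((1 - p) ** (len(y) - d))
    return max(code, key=likelihood)

code = [(0, 0, 0), (1, 1, 1)]   # toy example: (3,1) repetition code
ml_decode_bsc((1, 0, 1), code)  # -> (1, 1, 1), the majority decision
```

For the \ac{BSC}, maximizing $P(\bm{y}|\bm{x})$ is equivalent to minimizing the Hamming distance $d(\bm{x},\bm{y})$, which is why the search returns the majority decision here.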
\subsection{Asymptotically Good Codes}\label{theory:repetition_code}
A repetition code is a very simple code which helps clarify several key concepts of the channel coding domain. Assume that the encoder and decoder use a repetition code. For example, a repetition code with $k=1$ and $n = 3$ has the two codewords $\mathcal{C} = \{(0,0,0), (1,1,1)\}$; thus, in this example, $r=\frac{1}{3}$. We can also state its generator and parity check matrices as
\begin{equation}
\bm{G} = \begin{pmatrix} 1 & 1 & 1 \end{pmatrix},\qquad \bm{H} = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}\,.
\end{equation}
The parity check matrix $\bm{H}$ can be used to detect whether a transmission error occurred by verifying $\bm{H} \bm{y}^\top = \bm{0}^\top$ for the received word $\bm{y}$. In case an error occurred, an \ac{ML} decoder for a \ac{BSC} carries out a majority decision to estimate the most likely codeword.

Repetition codes shed light on a problem common to many codes: improving the reliability of a code comes at the expense of a lower code rate. Since $k=1$ for all repetition codes, increasing $n$ decreases $r = \frac{1}{n}$. Thus, a very reliable repetition code has a vanishing rate, $\lim_{n \to \infty} r = 0$. This observation leads to the definition of asymptotically good codes $\mathcal{C}(n_s, k_s, d_s)$~\cite{Friedrichs:2010:error-control-coding}. Two properties must hold for this class of codes:
\begin{equation}
R = \lim_{s \to \infty} \frac{k_s}{n_s} > 0 \quad \textrm{and} \quad \lim_{s \to \infty} \frac{d_s}{n_s} > 0.
\end{equation}
The asymptotic code rate must be strictly positive, a property repetition codes fail to satisfy. Furthermore, the minimum distance must grow proportionally to the code block size $n_s$.
\section{Channels}\label{theory:channels}
Several common channel models exist to describe the characteristics of a physical transmission.
Common properties were discussed in Sec.~\ref{sec:channel_model}, whereas this section focuses on the differences. The three most important channel models for polar codes are presented, namely the \ac{BSC}, the \ac{BEC} and the \ac{AWGN} channel.
\subsection{AWGN Channel}
An \ac{AWGN} channel as used in this thesis has a binary input alphabet and a continuous output alphabet $\mathcal{Y} = \mathbb{R}$. Each input symbol is distorted by additive Gaussian noise to yield an output symbol.
\subsection{Capacity and Reliability}
Channels are often characterized by two important measures: their capacity and their reliability. These measures are introduced in this section. The channel capacity for symmetric \acp{BDMC} with input alphabet $\mathcal{X} = \{0,1\}$ can be calculated by
\begin{equation}
I(W) = \frac{1}{2} \sum_{y \in \mathcal{Y}} \sum_{x \in \mathcal{X}} W(y|x) \log_2 \frac{W(y|x)}{\frac{1}{2} (W(y|0) + W(y|1))},
\end{equation}
where equiprobable channel input symbols $P(X=0) = P(X=1) = \frac{1}{2}$ are assumed, which is the capacity-achieving input distribution for symmetric \acp{BDMC}. The capacity is the highest rate at which a reliable transmission (i.e., with a vanishing error probability after decoding) over a channel $W$ can be realized. For symmetric channels it coincides with the Shannon capacity~\cite{sha49}. The Bhattacharyya parameter
\begin{equation}
Z(W) = \sum_{y \in \mathcal{Y}} \sqrt{W(y|0) W(y|1)}
\end{equation}
quantifies a channel's reliability; a lower value of $Z(W)$ indicates a more reliable channel. Moreover, $Z(W)$ is an upper bound on the error probability of an \ac{ML} decision for a single use of the channel~\cite{polar:arikan09}.
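Both measures are easy to evaluate numerically for simple channels. The following Python sketch computes $I(W)$ and $Z(W)$ for a \ac{BSC}, with the crossover probability $\epsilon$ as an assumed illustration parameter; the results agree with the known closed forms $I(W) = 1 - h_2(\epsilon)$ and $Z(W) = 2\sqrt{\epsilon(1-\epsilon)}$:

```python
import math

# Hedged sketch: capacity I(W) and Bhattacharyya parameter Z(W)
# evaluated for a BSC with crossover probability eps.
def W_bsc(y, x, eps):
    """Transition probability W(y|x) of a BSC."""
    return eps if y != x else 1.0 - eps

def capacity(eps):
    """Symmetric capacity I(W), assuming equiprobable inputs."""
    I = 0.0
    for x in (0, 1):
        for y in (0, 1):
            w = W_bsc(y, x, eps)
            q = 0.5 * (W_bsc(y, 0, eps) + W_bsc(y, 1, eps))
            if w > 0.0:  # skip zero-probability terms (0 * log 0 = 0)
                I += 0.5 * w * math.log2(w / q)
    return I

def bhattacharyya(eps):
    """Z(W) = sum_y sqrt(W(y|0) W(y|1))."""
    return sum(math.sqrt(W_bsc(y, 0, eps) * W_bsc(y, 1, eps)) for y in (0, 1))

# capacity(0.0) -> 1.0 (noiseless), capacity(0.5) -> 0.0 (useless channel)
# bhattacharyya(0.1) -> 0.6
```

The two extremes illustrate the roles of the measures: a noiseless channel has $I(W) = 1$ and $Z(W) = 0$, while a completely noisy channel has $I(W) = 0$ and $Z(W) = 1$.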