First round of corrections
parent c088a92b3b
commit 0b12fcb419
@@ -508,7 +508,7 @@ $\gamma \in \left\{ 0.01, 0.05, 0.15 \right\}$.
 \begin{tikzpicture}
 \begin{axis}[
 grid=both,
-xlabel={$E_b / N_0$}, ylabel={FER},
+xlabel={$E_b / N_0$ (dB)}, ylabel={FER},
 ymode=log,
 legend columns=1,
 legend pos=outer north east,
@@ -549,7 +549,7 @@ $\gamma \in \left\{ 0.01, 0.05, 0.15 \right\}$.
 \begin{tikzpicture}
 \begin{axis}[
 grid=both,
-xlabel={$E_b / N_0$}, ylabel={FER},
+xlabel={$E_b / N_0$ (dB)}, ylabel={FER},
 ymode=log,
 legend columns=1,
 legend pos=outer north east,
@@ -593,7 +593,7 @@ $\gamma \in \left\{ 0.01, 0.05, 0.15 \right\}$.
 \begin{tikzpicture}
 \begin{axis}[
 grid=both,
-xlabel={$E_b / N_0$}, ylabel={FER},
+xlabel={$E_b / N_0$ (dB)}, ylabel={FER},
 ymode=log,
 legend columns=1,
 legend pos=outer north east,
@@ -647,7 +647,7 @@ $\gamma \in \left\{ 0.01, 0.05, 0.15 \right\}$.
 \begin{tikzpicture}
 \begin{axis}[
 grid=both,
-xlabel={$E_b / N_0$}, ylabel={FER},
+xlabel={$E_b / N_0$ (dB)}, ylabel={FER},
 ymode=log,
 legend columns=1,
 legend pos=outer north east,
@@ -692,7 +692,7 @@ $\gamma \in \left\{ 0.01, 0.05, 0.15 \right\}$.
 \begin{tikzpicture}
 \begin{axis}[
 grid=both,
-xlabel={$E_b / N_0$}, ylabel={FER},
+xlabel={$E_b / N_0$ (dB)}, ylabel={FER},
 ymode=log,
 legend columns=1,
 legend pos=outer north east,
@@ -735,7 +735,7 @@ $\gamma \in \left\{ 0.01, 0.05, 0.15 \right\}$.
 \begin{tikzpicture}
 \begin{axis}[
 grid=both,
-xlabel={$E_b / N_0$}, ylabel={FER},
+xlabel={$E_b / N_0$ (dB)}, ylabel={FER},
 ymode=log,
 legend columns=1,
 legend pos=outer north east,
@@ -340,7 +340,7 @@ algorithms.
 \begin{tikzpicture}
 \begin{axis}[
 grid=both,
-xlabel={$E_b / N_0$}, ylabel={FER},
+xlabel={$E_b / N_0$ (dB)}, ylabel={FER},
 ymode=log,
 ymax=1.5, ymin=8e-5,
 width=\textwidth,
@@ -376,7 +376,7 @@ algorithms.
 \begin{tikzpicture}
 \begin{axis}[
 grid=both,
-xlabel={$E_b / N_0$}, ylabel={FER},
+xlabel={$E_b / N_0$ (dB)}, ylabel={FER},
 ymode=log,
 ymax=1.5, ymin=8e-5,
 width=\textwidth,
@@ -414,7 +414,7 @@ algorithms.
 \begin{tikzpicture}
 \begin{axis}[
 grid=both,
-xlabel={$E_b / N_0$}, ylabel={FER},
+xlabel={$E_b / N_0$ (dB)}, ylabel={FER},
 ymode=log,
 ymax=1.5, ymin=8e-5,
 width=\textwidth,
@@ -455,7 +455,7 @@ algorithms.
 \begin{tikzpicture}
 \begin{axis}[
 grid=both,
-xlabel={$E_b / N_0$}, ylabel={FER},
+xlabel={$E_b / N_0$ (dB)}, ylabel={FER},
 ymode=log,
 ymax=1.5, ymin=8e-5,
 width=\textwidth,
@@ -490,7 +490,7 @@ algorithms.
 \begin{tikzpicture}
 \begin{axis}[
 grid=both,
-xlabel={$E_b / N_0$}, ylabel={FER},
+xlabel={$E_b / N_0$ (dB)}, ylabel={FER},
 ymode=log,
 ymax=1.5, ymin=8e-5,
 width=\textwidth,
@@ -523,7 +523,7 @@ algorithms.
 \begin{tikzpicture}
 \begin{axis}[
 grid=both,
-xlabel={$E_b / N_0$}, ylabel={FER},
+xlabel={$E_b / N_0$ (dB)}, ylabel={FER},
 ymode=log,
 ymax=1.5, ymin=8e-5,
 width=\textwidth,
@@ -2,15 +2,15 @@
 \label{chapter:introduction}

 Channel coding using binary linear codes is a way of enhancing the reliability
-of data by detecting and correcting any errors that may have occurred during
-transmission or storage.
+of data by detecting and correcting any errors that may occur during
+its transmission or storage.
 One class of binary linear codes, \ac{LDPC} codes, has become especially
 popular due to being able to reach arbitrarily small probabilities of error
 at code rates up to the capacity of the channel, while retaining a structure
 that allows for very efficient decoding.
 While the established decoders for \ac{LDPC} codes, such as \ac{BP} and the
-\textit{min-sum algorithm}, offer reasonable performance, they are suboptimal
-in most cases and exhibit a so called \textit{error floor} for high \acp{SNR},
-making them unsuitable for applications with extreme reliability requiremnts.
+\textit{min-sum algorithm}, offer reasonable decoding performance, they are suboptimal
+in most cases and exhibit an \textit{error floor} for high \acp{SNR},
+making them unsuitable for applications with extreme reliability requirements.
 Optimization based decoding algorithms are an entirely different way of approaching
 the decoding problem, in some cases coming with stronger theoretical guarantees
@@ -22,10 +22,10 @@ the existing literature by considering a variety of different codes.
 Specifically, the \textit{proximal decoding} \cite{proximal_paper}
 algorithm and \ac{LP} decoding using the \ac{ADMM} \cite{original_admm} are explored.
 The two algorithms are analyzed based on their theoretical structure
-and on results of simulations conducted in the scope of this work.
+and based on the results of the simulations conducted in the scope of this work.
 Approaches to determine the optimal value of each parameter are derived
 and the computational and decoding performance of the algorithms is examined.
-An improvement on proximal decoding is suggested, offering up to $\SI{1}{dB}$
-of gain in decoding performance, depending on the parameters chosen and the
+An improvement on proximal decoding is suggested, achieving up to $\SI{1}{dB}$
+of gain, depending on the parameters chosen and the
 code considered.

@@ -256,12 +256,15 @@ process and straightforward debugging ability.
 It was subsequently reimplemented in C++ using the Eigen%
 \footnote{\url{https://eigen.tuxfamily.org}}
 linear algebra library to achieve higher performance.
-The focus has been set on a fast implementation, sometimes at the expense of
+The focus has been on a fast implementation, sometimes at the expense of
 memory usage, somewhat limiting the size of the codes the implementation can be
 used with.
-The evaluation of the simulation results has been wholly realized in Python.
+The evaluation of decoding operations and subsequent calculation of \acp{BER},
+\acp{FER}, etc., has been wholly realized in Python.

-The gradient of the code-constraint polynomial \cite[Sec. 2.3]{proximal_paper}
+Concerning the proximal decoding algorithm itself, there are certain aspects
+presenting optimization opportunities during the implementation.
+The gradient of the code-constraint polynomial \cite[Sec. 2.3]{proximal_paper}, for example,
 is given by%
 %
 \begin{align*}
@@ -279,7 +282,7 @@ is given by%
 %
 Since the products
 $\prod_{i\in N_c\left( j \right) } \tilde{x}_i,\hspace{2mm}j\in \mathcal{J}$
-are the same for all components $\tilde{x}_k$ of $\tilde{\boldsymbol{x}}$, they can be
+are identical for all components $\tilde{x}_k$ of $\tilde{\boldsymbol{x}}$, they can be
 precomputed.
 Defining%
 %
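A minimal sketch of this precomputation idea, assuming a dense 0/1 numpy parity-check matrix; the function name and the eta guard are illustrative (the exact gradient expression is the one cited from the proximal paper):

import numpy as np

def precomputed_check_products(H, x_tilde, eta=1e-9):
    # One product per check node j over its neighborhood N_c(j),
    # computed once and reused for every component of the gradient.
    factors = np.where(H == 1, x_tilde[None, :], 1.0)
    check_products = factors.prod(axis=1)                 # shape (m,)
    # Products over N_c(j) \ {k}: divide the k-th factor back out,
    # guarding against near-zero entries (the role of eta in the thesis).
    safe = np.where(np.abs(x_tilde) > eta, x_tilde, eta)
    return H * (check_products[:, None] / safe[None, :])  # shape (m, n)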
@@ -325,7 +328,7 @@ The impact of the parameters $\gamma$, as well as $\omega$, $K$ and $\eta$ is
 examined.
 The decoding performance is assessed based on the \ac{BER} and the
 \ac{FER} as well as the \textit{decoding failure rate} - the rate at which
-the algorithm produces invalid results.
+the algorithm produces results that are not valid codewords.
 The convergence properties are reviewed and related to the decoding
 performance.
 Finally, the computational performance is examined on a theoretical basis
@@ -335,8 +338,9 @@ thesis.
 All simulation results presented hereafter are based on Monte Carlo
 simulations.
 The \ac{BER} and \ac{FER} curves in particular have been generated by
-producing at least 100 frame-errors for each data point, unless otherwise
+producing at least 100 frame errors for each data point, unless otherwise
 stated.
+\todo{Same text about monte carlo simulations and frame errors for admm}


 \subsection{Choice of Parameters}
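A sketch of this stopping rule, with simulate_frame and decode as hypothetical stand-ins for the C++ simulation pipeline described above:

import numpy as np

def estimate_fer(simulate_frame, decode, target_frame_errors=100,
                 max_frames=10**8):
    # Keep simulating until at least `target_frame_errors` frame errors
    # have been observed, so every data point has comparable accuracy.
    frames = frame_errors = 0
    while frame_errors < target_frame_errors and frames < max_frames:
        codeword, channel_output = simulate_frame()
        estimate = decode(channel_output)
        frame_errors += int(not np.array_equal(estimate, codeword))
        frames += 1
    return frame_errors / frames  # Monte Carlo FER estimate for this SNR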
@@ -478,10 +482,10 @@ significantly affects the decoding performance, there is not much benefit
 attainable in undertaking an extensive search for an exact optimum.
 Rather, a preliminary examination providing a rough window for $\gamma$ may
 be sufficient.
-When examining a number of different codes (figure
-\ref{fig:prox:results_3d_multiple}), it is apparent that while the exact
+When examining a number of different codes (see figure
+\ref{fig:prox:results_3d_multiple}), it is apparent that while the exact
 landscape of the graph depends on the code, the general behavior is the same
-in each case.
+for all codes analyzed in this thesis.

 The parameter $\gamma$ describes the step-size for the optimization step
 dealing with the code-constraint polynomial;
@@ -490,7 +494,7 @@ negative-log likelihood.
 The relationship between $\omega$ and $\gamma$ is portrayed in figure
 \ref{fig:prox:gamma_omega}.
 The color of each cell indicates the \ac{BER} when the corresponding values
-are chosen for the parameters.
+are chosen for the decoding.
 The \ac{SNR} is kept constant at $\SI{4}{dB}$.
 The \ac{BER} exhibits similar behavior in its dependency on $\omega$ and
 on $\gamma$: it is minimized when keeping the value within certain
@@ -547,7 +551,7 @@ error is observed during each iteration of the decoding process, for several
 different \acp{SNR}.
 The plots have been generated by averaging the error over $\SI{500000}{}$
 decodings.
-As some decodings go one for more iterations than others, the number of values
-which are averaged for each datapoints vary.
+As some decodings go on for more iterations than others, the number of values
+which are averaged for each data point varies.
 This explains the dip visible in all curves around the 20th iteration, since
 after this point more and more correct decodings are completed,
@@ -558,7 +562,7 @@ timing requirements of the decoding process.
 Another aspect to consider is that the higher the \ac{SNR}, the fewer
 decodings are present at each iteration
 to average, since a solution is found earlier.
-This explains the decreasing smoothness of the lines as the \ac{SNR} rises.
+This explains the decreasing smoothness of the lines as the \ac{SNR} increases.
 Remarkably, the \ac{SNR} seems to not have any impact on the number of
 iterations necessary to reach the point at which the average error
 stabilizes.
@@ -609,7 +613,7 @@ optimum values for the parameters $\gamma$ and $\omega$ appears to bring
 limited benefit;
 an initial rudimentary examination to find the general bounds in which the two
 values should lie is sufficient.
-The parameter $K$ is independent of the $SNR$ and raising its value above a
+The parameter $K$ is independent of the \ac{SNR} and raising its value above a
 certain threshold does not improve the decoding performance.
 The choice of $\eta$ is insignificant and the parameter is only relevant as a
 means to bring about numerical stability.
@@ -699,7 +703,7 @@ means to bring about numerical stability.

 Until now, only the \ac{BER} has been considered to gauge the decoding
 performance.
-The \ac{FER}, however, shows considerably worse behavior, as can be seen in
+The \ac{FER}, however, shows considerably different behavior, as can be seen in
 figure \ref{fig:prox:ber_fer_dfr}.
 Besides the \ac{BER} and \ac{FER} curves, the figure also shows the
 \textit{decoding failure rate}.
@@ -719,7 +723,8 @@ This leads to the hypothesis that, at least for higher \acp{SNR}, frame errors
 arise mainly due to the non-convergence of the algorithm instead of
 convergence to the wrong codeword.
 This course of thought will be picked up in section
-\ref{sec:prox:Improved Implementation} to try to improve the algorithm.
+\ref{sec:prox:Improved Implementation} when proposing a method to improve the
+algorithm.

 In summary, the \ac{BER} and \ac{FER} indicate dissimilar decoding
 performance.
@@ -730,10 +735,11 @@ the frame errors may largely be attributed to decoding failures.
 \subsection{Convergence Properties}
 \label{subsec:prox:conv_properties}

-The previous observation, that the \ac{FER} may arise mainly due to the
+The previous observation that the \ac{FER} may arise mainly due to the
 non-convergence of the algorithm instead of convergence to the wrong codeword,
-raises the question why the decoding process does not converge so often.
-In figure \ref{fig:prox:convergence}, the iterative process is visualized.
+raises the question of why the decoding process so often fails to converge.
+To better understand this issue, the iterative process is visualized in
+figure \ref{fig:prox:convergence}.
 In order to be able to simultaneously consider all components of the vectors
 being dealt with, a BCH code with $n=7$ and $k=4$ is chosen.
 Each plot shows one component of the current estimate during a given
@@ -961,7 +967,7 @@ As such, the constraints are not being satisfied and the estimate is not
 converging towards a valid codeword.

 While figure \ref{fig:prox:convergence} shows only one instance of a decoding
-task, with no statistical significance, it is indicative of the general
+task with no statistical significance, it is indicative of the general
 behavior of the algorithm.
 This can be justified by looking at the gradients themselves.
 In figure \ref{fig:prox:gradients} the gradients of the negative
@@ -1087,7 +1093,7 @@ value of the parameter $\gamma$ has to be kept small, as mentioned in section
 \ref{sec:prox:Decoding Algorithm}.
 Local minima are introduced between the codewords, in the areas in which it is
 not immediately clear which codeword is the most likely one.
-Raising the value of $\gamma$ results in
+Increasing the value of $\gamma$ results in
 $h \left( \tilde{\boldsymbol{x}} \right)$ dominating the landscape of the
 objective function, thereby introducing these local minima into the objective
 function.
@@ -1099,7 +1105,7 @@ visualized for one component of a code with $n=204$, for a single decoding.
 The two gradients still eventually oppose each other and the estimate still
 starts to oscillate, the same as illustrated in figure
 \ref{fig:prox:convergence} based on a code with $n=7$.
-However, in this case, the gradient of the code-constraint polynomial iself
+However, in this case, the gradient of the code-constraint polynomial itself
 starts to oscillate, its average value being such that the effect of the
 gradient of the negative log-likelihood is counteracted.

@@ -1171,7 +1177,7 @@ The codes considered are the BCH(31, 11) and BCH(31, 26) codes, a number of (3,
 regular \ac{LDPC} codes (\cite[\text{96.3.965, 204.33.484, 408.33.844}]{mackay_enc}),
 a (5,10) regular \ac{LDPC} code (\cite[\text{204.55.187}]{mackay_enc}) and a
 progressive edge growth construction code (\cite[\text{PEGReg252x504}]{mackay_enc}).
-Some deviations from linear behavior are unavoidable because not all codes
+Some deviations from linear behavior are unavoidable, since not all codes
 considered are actually \ac{LDPC} codes, or \ac{LDPC} codes constructed
 according to the same scheme.
 Nonetheless, a generally linear relationship between the average time needed to
@@ -1228,7 +1234,7 @@ And indeed, the magnitude of the oscillation of
 $\nabla h\left( \tilde{\boldsymbol{x}} \right)$ (introduced previously in
 section \ref{subsec:prox:conv_properties} and shown in figure
 \ref{fig:prox:convergence_large_n}) and the probability of having a bit
-error are strongly correlated, a relationship depicted in figure
+error are strongly correlated, as depicted in figure
 \ref{fig:prox:correlation}.
 %
 \begin{figure}[h]
@@ -1329,7 +1335,7 @@ In some cases, a gain of up to $\SI{1}{dB}$ or higher can be achieved.
 \begin{tikzpicture}
 \begin{axis}[
 grid=both,
-xlabel={$E_b / N_0$}, ylabel={BER},
+xlabel={$E_b / N_0$ (dB)}, ylabel={BER},
 ymode=log,
 width=0.48\textwidth,
 height=0.36\textwidth,
@@ -1360,7 +1366,7 @@ In some cases, a gain of up to $\SI{1}{dB}$ or higher can be achieved.
 \begin{tikzpicture}
 \begin{axis}[
 grid=both,
-xlabel={$E_b / N_0$}, ylabel={FER},
+xlabel={$E_b / N_0$ (dB)}, ylabel={FER},
 ymode=log,
 width=0.48\textwidth,
 height=0.36\textwidth,
@@ -1392,7 +1398,7 @@ In some cases, a gain of up to $\SI{1}{dB}$ or higher can be achieved.
 \begin{tikzpicture}
 \begin{axis}[
 grid=both,
-xlabel={$E_b / N_0$}, ylabel={Decoding Failure Rate},
+xlabel={$E_b / N_0$ (dB)}, ylabel={Decoding Failure Rate},
 ymode=log,
 width=0.48\textwidth,
 height=0.36\textwidth,
@@ -4,7 +4,7 @@
 In this chapter, the theoretical background necessary to understand this
 work is given.
 First, the notation used is clarified.
-The physical aspects are detailed - the used modulation scheme and channel model.
+The physical layer is detailed: the modulation scheme and the channel model used.
 A short introduction to channel coding with binary linear codes and especially
 \ac{LDPC} codes is given.
-The established methods of decoding LPDC codes are briefly explained.
+The established methods of decoding \ac{LDPC} codes are briefly explained.
@@ -204,7 +204,7 @@ Each row of $\boldsymbol{H}$, which represents one parity-check, is viewed as a
 Each component of the codeword $\boldsymbol{c}$ is interpreted as a \ac{VN}.
 The relationship between \acp{CN} and \acp{VN} can then be plotted by noting
 which components of $\boldsymbol{c}$ are considered for which parity-check.
-Figure \ref{fig:theo:tanner_graph} shows the tanner graph for the
+Figure \ref{fig:theo:tanner_graph} shows the Tanner graph for the
 (7,4) Hamming code, which has the following parity-check matrix
 \cite[Example 5.7.]{ryan_lin_2009}:%
 %
@@ -285,15 +285,16 @@ Message passing algorithms are based on the notion of passing messages between
 \acp{CN} and \acp{VN}.
 \Ac{BP} is one such algorithm that is commonly used to decode \ac{LDPC} codes.
 It aims to compute the posterior probabilities
-$p_{C_i \mid \boldsymbol{Y}}\left(c_i = 1 | \boldsymbol{y} \right),\hspace{2mm} i\in\mathcal{I}$
-\cite[Sec. III.]{mackay_rediscovery} and use them to calculate the estimate $\hat{\boldsymbol{c}}$.
+$p_{C_i \mid \boldsymbol{Y}}\left(c_i = 1 | \boldsymbol{y} \right),\hspace{2mm} i\in\mathcal{I}$,
+see \cite[Sec. III.]{mackay_rediscovery}, and use them to calculate the estimate
+$\hat{\boldsymbol{c}}$.
 For cycle-free graphs this goal is reached after a finite
 number of steps and \ac{BP} is equivalent to \ac{MAP} decoding.
-When the graph contains cycles, however, \ac{BP} only approximates the probabilities
+When the graph contains cycles, however, \ac{BP} only approximates the \ac{MAP} probabilities
 and is sub-optimal.
 This leads to generally worse performance than \ac{MAP} decoding for practical codes.
 Additionally, an \textit{error floor} appears for very high \acp{SNR}, making
-the use of \ac{BP} impractical for applications where a very low \ac{BER} is
+the use of \ac{BP} impractical for applications where a very low error rate is
 desired \cite[Sec. 15.3]{ryan_lin_2009}.
 Another popular decoding method for \ac{LDPC} codes is the
 \textit{min-sum algorithm}.
@@ -457,29 +458,38 @@ which minimizes the objective function $g$.


 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-\section{An introduction to the proximal gradient method and ADMM}
+\section{A Short Introduction to the Proximal Gradient Method and ADMM}
 \label{sec:theo:Optimization Methods}

+In this section, the general ideas behind the optimization methods used in
+this work are outlined.
+The application of these optimization methods to channel decoding
+will be discussed in later chapters.
+Two methods are introduced: the \textit{proximal gradient method} and
+\ac{ADMM}.
+
 \textit{Proximal algorithms} are algorithms for solving convex optimization
-problems, that rely on the use of \textit{proximal operators}.
+problems that rely on the use of \textit{proximal operators}.
 The proximal operator $\textbf{prox}_{\lambda f} : \mathbb{R}^n \rightarrow \mathbb{R}^n$
 of a function $f:\mathbb{R}^n \rightarrow \mathbb{R}$ is defined by
 \cite[Sec. 1.1]{proximal_algorithms}%
 %
 \begin{align*}
-\textbf{prox}_{\lambda f}\left( \boldsymbol{v} \right) = \argmin_{\boldsymbol{x}} \left(
-f\left( \boldsymbol{x} \right) + \frac{1}{2\lambda}\lVert \boldsymbol{x}
-- \boldsymbol{v} \rVert_2^2 \right)
+\textbf{prox}_{\lambda f}\left( \boldsymbol{v} \right)
+= \argmin_{\boldsymbol{x} \in \mathbb{R}^n} \left(
+f\left( \boldsymbol{x} \right) + \frac{1}{2\lambda}\lVert \boldsymbol{x}
+- \boldsymbol{v} \rVert_2^2 \right)
 .\end{align*}
 %
 This operator computes a point that is a compromise between minimizing $f$
 and staying in the proximity of $\boldsymbol{v}$.
-The parameter $\lambda$ determines how heavily each term is weighed.
-The \textit{proximal gradient method} is an iterative optimization method
+The parameter $\lambda$ determines how each term is weighted.
+The proximal gradient method is an iterative optimization method
 utilizing proximal operators, used to solve problems of the form%
 %
 \begin{align*}
-\text{minimize}\hspace{5mm}f\left( \boldsymbol{x} \right) + g\left( \boldsymbol{x} \right)
+\underset{\boldsymbol{x} \in \mathbb{R}^n}{\text{minimize}}\hspace{5mm}
+f\left( \boldsymbol{x} \right) + g\left( \boldsymbol{x} \right)
 \end{align*}
 %
 that consists of two steps: minimizing $f$ with gradient descent
@@ -492,14 +502,14 @@ and minimizing $g$ using the proximal operator
 ,\end{align*}
 %
 Since $g$ is minimized with the proximal operator and is thus not required
-to be differentiable, it can be used to encode the constraints of the problem
+to be differentiable, it can be used to encode the constraints of the optimization problem
 (e.g., in the form of an \textit{indicator function}, as mentioned in
 \cite[Sec. 1.2]{proximal_algorithms}).

-The \ac{ADMM} is another optimization method.
+\ac{ADMM} is another optimization method.
 In this thesis it will be used to solve a \textit{linear program}, which
-is a special type of convex optimization problem, where the objective function
-is linear, and the constraints consist of linear equalities and inequalities.
+is a special type of convex optimization problem in which the objective function
+is linear and the constraints consist of linear equalities and inequalities.
 Generally, any linear program can be expressed in \textit{standard form}%
 \footnote{The inequality $\boldsymbol{x} \ge \boldsymbol{0}$ is to be
 interpreted componentwise.}
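A minimal sketch of the two-step proximal gradient iteration described in the hunks above; the example objective and step size are illustrative, with the constraint x >= 0 encoded via an indicator function whose proximal operator is the projection onto the nonnegative orthant:

import numpy as np

def proximal_gradient(grad_f, prox_g, x0, step, iters=500):
    x = x0
    for _ in range(iters):
        x = prox_g(x - step * grad_f(x), step)  # gradient step, then prox step
    return x

# Example: minimize 0.5*||A x - b||^2 + g(x), where g is the indicator
# function of {x : x >= 0}; its proximal operator is simply max(x, 0).
rng = np.random.default_rng(0)
A, b = rng.normal(size=(30, 10)), rng.normal(size=30)
grad_f = lambda x: A.T @ (A @ x - b)
prox_g = lambda v, lam: np.maximum(v, 0.0)
x_hat = proximal_gradient(grad_f, prox_g, np.zeros(10), step=0.01)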
@@ -507,38 +517,53 @@ interpreted componentwise.}
 %
 \begin{alignat}{3}
 \begin{alignedat}{3}
-\text{minimize }\hspace{2mm} && \boldsymbol{\gamma}^\text{T} \boldsymbol{x} \\
+\underset{\boldsymbol{x}\in\mathbb{R}^n}{\text{minimize }}\hspace{2mm}
+&& \boldsymbol{\gamma}^\text{T} \boldsymbol{x} \\
 \text{subject to }\hspace{2mm} && \boldsymbol{A}\boldsymbol{x} & = \boldsymbol{b} \\
-&& \boldsymbol{x} & \ge \boldsymbol{0}.
+&& \boldsymbol{x} & \ge \boldsymbol{0},
 \end{alignedat}
 \label{eq:theo:admm_standard}
 \end{alignat}%
 %
+where $\boldsymbol{x}, \boldsymbol{\gamma} \in \mathbb{R}^n$, $\boldsymbol{b} \in \mathbb{R}^m$
+and $\boldsymbol{A}\in\mathbb{R}^{m \times n}$.
 A technique called \textit{Lagrangian relaxation} \cite[Sec. 11.4]{intro_to_lin_opt_book}
 can then be applied.
 First, some of the constraints are moved into the objective function itself
 and weights $\boldsymbol{\lambda}$ are introduced. A new, relaxed problem
-is then formulated as
+is formulated as
 %
 \begin{align}
 \begin{aligned}
-\text{minimize }\hspace{2mm} & \boldsymbol{\gamma}^\text{T}\boldsymbol{x}
-+ \boldsymbol{\lambda}^\text{T}\left(\boldsymbol{b}
-- \boldsymbol{A}\boldsymbol{x} \right) \\
+\underset{\boldsymbol{x}\in\mathbb{R}^n}{\text{minimize }}\hspace{2mm}
+& \boldsymbol{\gamma}^\text{T}\boldsymbol{x}
++ \boldsymbol{\lambda}^\text{T}\left(
+\boldsymbol{A}\boldsymbol{x} - \boldsymbol{b}\right) \\
 \text{subject to }\hspace{2mm} & \boldsymbol{x} \ge \boldsymbol{0},
 \end{aligned}
 \label{eq:theo:admm_relaxed}
 \end{align}%
 %
 the new objective function being the \textit{Lagrangian}%
+\footnote{
+Depending on what literature is consulted, the definition of the Lagrangian differs
+in the order of $\boldsymbol{A}\boldsymbol{x}$ and $\boldsymbol{b}$.
+As will subsequently be seen, however, the only property of the Lagrangian having
+any bearing on the optimization process is that minimizing it gives a lower bound
+on the optimal objective of the original problem.
+This property is satisfied no matter the order of the terms and the order
+chosen here is the one used in the \ac{LP} decoding literature making use of
+\ac{ADMM}.
+}%
 %
 \begin{align*}
 \mathcal{L}\left( \boldsymbol{x}, \boldsymbol{\lambda} \right)
 = \boldsymbol{\gamma}^\text{T}\boldsymbol{x}
-+ \boldsymbol{\lambda}^\text{T}\left(\boldsymbol{b}
-- \boldsymbol{A}\boldsymbol{x} \right)
++ \boldsymbol{\lambda}^\text{T}\left(
+\boldsymbol{A}\boldsymbol{x} - \boldsymbol{b}\right)
 .\end{align*}%
 %

 This problem is not directly equivalent to the original one, as the
 solution now depends on the choice of the \textit{Lagrange multipliers}
 $\boldsymbol{\lambda}$.
@@ -562,12 +587,12 @@ Furthermore, for uniquely solvable linear programs \textit{strong duality}
 always holds \cite[Theorem 4.4]{intro_to_lin_opt_book}.
 This means that not only is it a lower bound, the tightest lower
 bound actually reaches the value itself:
-In other words, with the optimal choice of $\boldsymbol{\lambda}$,
+in other words, with the optimal choice of $\boldsymbol{\lambda}$,
 the optimal objectives of the problems (\ref{eq:theo:admm_relaxed})
-and (\ref{eq:theo:admm_standard}) have the same value.
+and (\ref{eq:theo:admm_standard}) have the same value, i.e.,
 %
 \begin{align*}
-\max_{\boldsymbol{\lambda}} \, \min_{\boldsymbol{x} \ge \boldsymbol{0}}
+\max_{\boldsymbol{\lambda}\in\mathbb{R}^m} \, \min_{\boldsymbol{x} \ge \boldsymbol{0}}
 \mathcal{L}\left( \boldsymbol{x}, \boldsymbol{\lambda} \right)
 = \min_{\substack{\boldsymbol{x} \ge \boldsymbol{0} \\ \boldsymbol{A}\boldsymbol{x}
 = \boldsymbol{b}}}
@@ -577,7 +602,7 @@ and (\ref{eq:theo:admm_standard}) have the same value.
 Thus, we can define the \textit{dual problem} as the search for the tightest lower bound:%
 %
 \begin{align}
-\underset{\boldsymbol{\lambda}}{\text{maximize }}\hspace{2mm}
+\underset{\boldsymbol{\lambda}\in\mathbb{R}^m}{\text{maximize }}\hspace{2mm}
 & \min_{\boldsymbol{x} \ge \boldsymbol{0}} \mathcal{L}
 \left( \boldsymbol{x}, \boldsymbol{\lambda} \right)
 \label{eq:theo:dual}
@@ -600,7 +625,7 @@ using equation (\ref{eq:theo:admm_obtain_primal}); then, update $\boldsymbol{\lambda}$
 using gradient descent \cite[Sec. 2.1]{distr_opt_book}:%
 %
 \begin{align*}
-\boldsymbol{x} &\leftarrow \argmin_{\boldsymbol{x}} \mathcal{L}\left(
+\boldsymbol{x} &\leftarrow \argmin_{\boldsymbol{x} \ge \boldsymbol{0}} \mathcal{L}\left(
 \boldsymbol{x}, \boldsymbol{\lambda} \right) \\
 \boldsymbol{\lambda} &\leftarrow \boldsymbol{\lambda}
 + \alpha\left( \boldsymbol{A}\boldsymbol{x} - \boldsymbol{b} \right),
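A sketch of these two dual ascent updates on a toy problem; a strictly convex quadratic objective f(x) = 0.5*||x||^2 + c.T x (with the constraint x >= 0 dropped for brevity) replaces the purely linear LP objective here, since minimizing a linear Lagrangian over x can be unbounded, which is precisely the weakness the augmented Lagrangian below addresses:

import numpy as np

rng = np.random.default_rng(1)
m, n = 5, 12
A, b, c = rng.normal(size=(m, n)), rng.normal(size=m), rng.normal(size=n)

alpha = 1.0 / np.linalg.norm(A @ A.T, 2)   # safe dual step size
lam = np.zeros(m)
for _ in range(5000):
    x = -(c + A.T @ lam)                   # x <- argmin_x L(x, lambda)
    lam += alpha * (A @ x - b)             # gradient ascent step on lambda
print(np.linalg.norm(A @ x - b))           # primal residual, tends to 0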
@@ -608,12 +633,12 @@ using gradient descent \cite[Sec. 2.1]{distr_opt_book}:%
 .\end{align*}
 %
 The algorithm can be improved by observing that when the objective function
-$g: \mathbb{R}^n \rightarrow \mathbb{R}$ is separable into a number
-$N \in \mathbb{N}$ of sub-functions
+$g: \mathbb{R}^n \rightarrow \mathbb{R}$ is separable into a sum of
+$N \in \mathbb{N}$ sub-functions
 $g_i: \mathbb{R}^{n_i} \rightarrow \mathbb{R}$,
 i.e., $g\left( \boldsymbol{x} \right) = \sum_{i=1}^{N} g_i
 \left( \boldsymbol{x}_i \right)$,
-where $\boldsymbol{x}_i,\hspace{1mm} i\in [1:N]$ are subvectors of
+where $\boldsymbol{x}_i\in\mathbb{R}^{n_i},\hspace{1mm} i\in [1:N]$ are subvectors of
 $\boldsymbol{x}$, the Lagrangian is as well:
 %
 \begin{align*}
@@ -624,12 +649,12 @@ $\boldsymbol{x}$, the Lagrangian is as well:
 \begin{align*}
 \mathcal{L}\left( \left( \boldsymbol{x}_i \right)_{i=1}^N, \boldsymbol{\lambda} \right)
 = \sum_{i=1}^{N} g_i\left( \boldsymbol{x}_i \right)
-+ \boldsymbol{\lambda}^\text{T} \left( \boldsymbol{b}
-- \sum_{i=1}^{N} \boldsymbol{A}_i\boldsymbol{x_i} \right)
++ \boldsymbol{\lambda}^\text{T} \left(
+\sum_{i=1}^{N} \boldsymbol{A}_i\boldsymbol{x_i} - \boldsymbol{b}\right)
 .\end{align*}%
 %
-The matrices $\boldsymbol{A}_i, \hspace{1mm} i \in [1:N]$ are partitions of
-the matrix $\boldsymbol{A}$, corresponding to
+The matrices $\boldsymbol{A}_i \in \mathbb{R}^{m \times n_i}, \hspace{1mm} i \in [1:N]$
+form a partition of $\boldsymbol{A}$, corresponding to
 $\boldsymbol{A} = \begin{bmatrix}
 \boldsymbol{A}_1 &
 \ldots &
@@ -643,7 +668,7 @@ constant.
 This modified version of dual ascent is called \textit{dual decomposition}:
 %
 \begin{align*}
-\boldsymbol{x}_i &\leftarrow \argmin_{\boldsymbol{x}_i}\mathcal{L}\left(
+\boldsymbol{x}_i &\leftarrow \argmin_{\boldsymbol{x}_i \ge \boldsymbol{0}}\mathcal{L}\left(
 \left( \boldsymbol{x}_i \right)_{i=1}^N, \boldsymbol{\lambda}\right)
 \hspace{5mm} \forall i \in [1:N]\\
 \boldsymbol{\lambda} &\leftarrow \boldsymbol{\lambda}
@@ -657,14 +682,15 @@ This modified version of dual ascent is called \textit{dual decomposition}:
 It only differs in the use of an \textit{augmented Lagrangian}
 $\mathcal{L}_\mu\left( \left( \boldsymbol{x} \right)_{i=1}^N, \boldsymbol{\lambda} \right)$
 in order to strengthen the convergence properties.
-The augmented Lagrangian extends the ordinary one with an additional penalty term
-with the penaly parameter $\mu$:
+The augmented Lagrangian extends the classical one with an additional penalty term
+with the penalty parameter $\mu$:
 %
 \begin{align*}
 \mathcal{L}_\mu \left( \left( \boldsymbol{x} \right)_{i=1}^N, \boldsymbol{\lambda} \right)
 = \underbrace{\sum_{i=1}^{N} g_i\left( \boldsymbol{x_i} \right)
-+ \boldsymbol{\lambda}^\text{T}\left( \boldsymbol{b}
-- \sum_{i=1}^{N} \boldsymbol{A}_i\boldsymbol{x}_i \right)}_{\text{Ordinary Lagrangian}}
++ \boldsymbol{\lambda}^\text{T}\left(\sum_{i=1}^{N}
+\boldsymbol{A}_i\boldsymbol{x}_i - \boldsymbol{b}\right)}
+_{\text{Classical Lagrangian}}
 + \underbrace{\frac{\mu}{2}\left\Vert \sum_{i=1}^{N} \boldsymbol{A}_i\boldsymbol{x}_i
 - \boldsymbol{b} \right\Vert_2^2}_{\text{Penalty term}},
 \hspace{5mm} \mu > 0
@@ -674,21 +700,20 @@ The steps to solve the problem are the same as with dual decomposition, with the
 condition that the step size be $\mu$:%
 %
 \begin{align*}
-\boldsymbol{x}_i &\leftarrow \argmin_{\boldsymbol{x}_i}\mathcal{L}_\mu\left(
+\boldsymbol{x}_i &\leftarrow \argmin_{\boldsymbol{x}_i \ge \boldsymbol{0}}\mathcal{L}_\mu\left(
 \left( \boldsymbol{x} \right)_{i=1}^N, \boldsymbol{\lambda}\right)
 \hspace{5mm} \forall i \in [1:N]\\
 \boldsymbol{\lambda} &\leftarrow \boldsymbol{\lambda}
 + \mu\left( \sum_{i=1}^{N} \boldsymbol{A}_i\boldsymbol{x}_i
 - \boldsymbol{b} \right),
 \hspace{5mm} \mu > 0
-% \boldsymbol{x}_1 &\leftarrow \argmin_{\boldsymbol{x}_1}\mathcal{L}_\mu\left(
-% \boldsymbol{x}_1, \boldsymbol{x_2}, \boldsymbol{\lambda}\right) \\
-% \boldsymbol{x}_2 &\leftarrow \argmin_{\boldsymbol{x}_2}\mathcal{L}_\mu\left(
-% \boldsymbol{x}_1, \boldsymbol{x_2}, \boldsymbol{\lambda}\right) \\
-% \boldsymbol{\lambda} &\leftarrow \boldsymbol{\lambda}
-% + \mu\left( \boldsymbol{A}_1\boldsymbol{x}_1 + \boldsymbol{A}_2\boldsymbol{x}_2
-% - \boldsymbol{b} \right),
-% \hspace{5mm} \mu > 0
 .\end{align*}
 %

+In subsequent chapters, the decoding problem will be reformulated as an
+optimization problem using two different methodologies.
+In chapter \ref{chapter:proximal_decoding}, a non-convex optimization approach
+is chosen and addressed using the proximal gradient method.
+In chapter \ref{chapter:lp_dec_using_admm}, an \ac{LP} based optimization problem is
+formulated and solved using \ac{ADMM}.
+
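A sketch of the augmented-Lagrangian (method of multipliers) updates on the same toy problem as in the dual ascent sketch above; the closed-form x-update is specific to this illustrative quadratic objective:

import numpy as np

rng = np.random.default_rng(1)
m, n = 5, 12
A, b, c = rng.normal(size=(m, n)), rng.normal(size=m), rng.normal(size=n)

mu = 1.0                       # penalty parameter, also the dual step size
lam = np.zeros(m)
for _ in range(100):
    # x <- argmin_x L_mu(x, lambda), i.e. solve
    # (I + mu * A^T A) x = -(c + A^T lam) + mu * A^T b
    x = np.linalg.solve(np.eye(n) + mu * (A.T @ A),
                        -(c + A.T @ lam) + mu * (A.T @ b))
    lam += mu * (A @ x - b)    # multiplier update with step size mu
print(np.linalg.norm(A @ x - b))

Note that with several blocks x_i the quadratic penalty couples the blocks, so the minimization no longer decomposes exactly; alternating the minimization over the blocks is what leads to ADMM.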
@@ -35,6 +35,7 @@
 \usetikzlibrary{spy}
 \usetikzlibrary{shapes.geometric}
 \usetikzlibrary{arrows.meta,arrows}
+\tikzset{>=latex}

 \pgfplotsset{compat=newest}
 \usepgfplotslibrary{colorbrewer}