Implemented corrections; Changed lp dec figure text scaling

Andreas Tsouchlos 2023-04-09 18:07:18 +02:00
parent e2267929c2
commit aefb6cbae2
3 changed files with 72 additions and 70 deletions


@@ -18,7 +18,7 @@ To solve the resulting linear program, various optimization methods can be
used (see for example \cite{alp}, \cite{interior_point},
\cite{efficient_lp_dec_admm}, \cite{pdd}).
Feldman et al. begin by looking at the \ac{ML} decoding problem%
\footnote{They assume that all codewords are equally likely to be transmitted,
making the \ac{ML} and \ac{MAP} decoding problems equivalent.}%
%
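As an added point of reference (a sketch in the present notation, with $\mathcal{C}$ denoting the code and $f_{\boldsymbol{Y} \mid \boldsymbol{C}}$ the channel transition density; the exact statement used by the authors may differ), the \ac{ML} decoding rule reads%
%
\begin{align*}
\hat{\boldsymbol{c}}_{\text{ML}} = \argmax_{\boldsymbol{c} \in \mathcal{C}}
f_{\boldsymbol{Y} \mid \boldsymbol{C}} \left( \boldsymbol{y} \mid \boldsymbol{c} \right)
.\end{align*}%
%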
@@ -40,7 +40,7 @@ of the \acp{LLR} $\gamma_i$ \cite[Sec. 2.5]{feldman_thesis}:%
{f_{Y_i | C_i} \left( y_i \mid c_i = 1 \right) } \right)
.\end{align*}
%
The authors propose using the following cost function%
\footnote{In this context, \textit{cost function} and \textit{objective function}
have the same meaning.}
for the \ac{LP} decoding problem:%
@@ -51,7 +51,7 @@ for the \ac{LP} decoding problem:%
.\end{align*}
%
With this cost function, the exact integer linear program formulation of \ac{ML}
decoding becomes%
%
\begin{align*}
\text{minimize }\hspace{2mm} & \boldsymbol{\gamma}^\text{T}\boldsymbol{c} \\
@@ -65,7 +65,7 @@ As solving integer linear programs is generally NP-hard, this decoding problem
has to be approximated by a problem with looser constraints.
A technique called \textit{relaxation} is applied:
relaxing the constraints, thereby broadening the considered domain
(e.g., by lifting the integer requirement).
First, the authors present an equivalent \ac{LP} formulation of exact \ac{ML}
decoding, redefining the constraints in terms of the \textit{codeword polytope}
%
@@ -82,10 +82,10 @@ This corresponds to simply lifting the integer requirement.
However, since the number of constraints needed to characterize the codeword
polytope is exponential in the code length, this formulation is relaxed further.
By observing that each check node defines its own local single parity-check
code, and, thus, its own \textit{local codeword polytope},
the \textit{relaxed codeword polytope} $\overline{Q}$ is defined as the intersection of all
local codeword polytopes.
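For a check node of degree three, for instance (an added illustrative example, matching the situation shown in the figures later in this section), the local codeword polytope is the convex hull of the four even-parity vectors of length three:%
%
\begin{align*}
\text{conv}\left( \left\{ \left( 0,0,0 \right), \left( 1,1,0 \right),
\left( 1,0,1 \right), \left( 0,1,1 \right) \right\} \right)
.\end{align*}%
%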
This consideration leads to constraints that can be described as follows
\cite[Sec. II, A]{efficient_lp_dec_admm}:%
%
\begin{align*}
@@ -93,10 +93,10 @@ This consideration leads to constraints, that can be described as follows
\hspace{5mm}\forall j\in \mathcal{J}
,\end{align*}%
%
where $\mathcal{P}_{d_j}$ is the \textit{check polytope}, i.e., the convex hull of all
binary vectors of length $d_j$ with even parity%
\footnote{Essentially $\mathcal{P}_{d_j}$ is the set of vectors that satisfy
parity-check $j$, but extended to the continuous domain.},
and $\boldsymbol{T}_j$ is the \textit{transfer matrix}, which selects the
neighboring variable nodes
of check node $j$ (i.e., the relevant components of $\tilde{\boldsymbol{c}}$
@@ -139,7 +139,7 @@ and has only two possible codewords:
.\end{align*}
%
Figure \ref{fig:lp:poly:exact_ilp} shows the domain of exact \ac{ML} decoding.
The first relaxation onto the codeword polytope $\text{poly}\left( \mathcal{C} \right) $
is shown in figure \ref{fig:lp:poly:exact};
this expresses the constraints of the linear program equivalent to exact \ac{ML} decoding.
$\text{poly}\left( \mathcal{C} \right) $ is further relaxed onto the relaxed codeword polytope
@@ -169,7 +169,7 @@ local codeword polytopes of each check node.
draw, circle, inner sep=0pt, minimum size=4pt]
\tdplotsetmaincoords{60}{25}
\begin{tikzpicture}[scale=0.9, tdplot_main_coords]
% Cube
\coordinate (p000) at (0, 0, 0);
@@ -226,7 +226,7 @@ local codeword polytopes of each check node.
draw, circle, inner sep=0pt, minimum size=4pt]
\tdplotsetmaincoords{60}{25}
\begin{tikzpicture}[scale=0.9, tdplot_main_coords]
% Cube
\coordinate (p000) at (0, 0, 0);
@@ -290,7 +290,7 @@ local codeword polytopes of each check node.
draw, circle, inner sep=0pt, minimum size=4pt]
\tdplotsetmaincoords{60}{25}
\begin{tikzpicture}[scale=0.9, tdplot_main_coords]
% Cube
\coordinate (p000) at (0, 0, 0);
@@ -342,7 +342,7 @@ local codeword polytopes of each check node.
% Polytope Annotations
\node[color=KITblue, below=0cm of c000] {$\left( 0, 0, 0 \right) $};
\node[color=KITblue, right=0.07cm of c101] {$\left( 1, 0, 1 \right) $};
\node[color=KITblue, right=0cm of c110] {$\left( 1, 1, 0 \right) $};
\node[color=KITblue, above=0cm of c011] {$\left( 0, 1, 1 \right) $};
\end{tikzpicture}
@@ -354,7 +354,7 @@ local codeword polytopes of each check node.
draw, circle, inner sep=0pt, minimum size=4pt]
\tdplotsetmaincoords{60}{25}
\begin{tikzpicture}[scale=0.9, tdplot_main_coords]
% Cube
\coordinate (p000) at (0, 0, 0);
@@ -438,7 +438,7 @@ local codeword polytopes of each check node.
draw, circle, inner sep=0pt, minimum size=4pt]
\tdplotsetmaincoords{60}{25}
\begin{tikzpicture}[scale=0.9, tdplot_main_coords]
% Cube
\coordinate (p000) at (0, 0, 0);
@@ -483,7 +483,7 @@ local codeword polytopes of each check node.
\node[color=KITblue, below=0cm of c000] {$\left( 0, 0, 0 \right) $};
\node[color=KITblue, above=0cm of c011] {$\left( 0, 1, 1 \right) $};
\node[color=KITred, right=0cm of cpseudo]
{$\left( 1, \frac{1}{2}, \frac{1}{2} \right) $};
\end{tikzpicture}
@@ -607,7 +607,7 @@ The steps to solve the dual problem then become:
\hspace{3mm} &&\forall j\in\mathcal{J}
.\end{alignat*}
%
Luckily, the additional constraints only affect the $\boldsymbol{z}_j$-update steps.
Furthermore, the $\boldsymbol{z}_j$-update steps can be shown to be equivalent to projections
onto the check polytopes $\mathcal{P}_{d_j}$
and the $\tilde{\boldsymbol{c}}$-update can be computed analytically%
@@ -658,22 +658,19 @@ $\boldsymbol{\lambda}_j = \mu \cdot \boldsymbol{u}_j \,\forall\,j\in\mathcal{J}$
.\end{alignat*}
%
The reason \ac{ADMM} performs so well is the relocation of the constraints
$\boldsymbol{T}_j\tilde{\boldsymbol{c}}_j\in\mathcal{P}_{d_j}\,\forall\, j\in\mathcal{J}$
into the objective function itself.
The minimization of the new objective function can then take place simultaneously
with respect to all $\boldsymbol{z}_j, j\in\mathcal{J}$.
Effectively, all of the $\left|\mathcal{J}\right|$ parity constraints can be
handled at the same time.
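Each of these updates is the projection mentioned above; as an added sketch, assuming the usual scaled form with $\boldsymbol{u}_j = \boldsymbol{\lambda}_j / \mu$ (the sign convention may differ from the one used here), the $\boldsymbol{z}_j$-update then reads%
%
\begin{align*}
\boldsymbol{z}_j \leftarrow \Pi_{\mathcal{P}_{d_j}}
\left( \boldsymbol{T}_j \tilde{\boldsymbol{c}} + \boldsymbol{u}_j \right),
\hspace{5mm}\forall j\in \mathcal{J}
,\end{align*}%
%
where $\Pi_{\mathcal{P}_{d_j}}\left( \cdot \right) $ denotes the Euclidean projection onto the check polytope $\mathcal{P}_{d_j}$.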
This can also be understood by interpreting the decoding process as a message-passing
algorithm \cite[Sec. III. D.]{original_admm}, \cite[Sec. II. B.]{efficient_lp_dec_admm},
as depicted in algorithm \ref{alg:admm}.
\begin{genericAlgorithm}[caption={\ac{LP} decoding using \ac{ADMM} interpreted
as a message passing algorithm\protect\footnotemark{}}, label={alg:admm},
basicstyle=\fontsize{11}{16}\selectfont
]
Initialize $\tilde{\boldsymbol{c}}, \boldsymbol{z}_{[1:m]}$ and $\boldsymbol{u}_{[1:m]}$
@@ -694,11 +691,6 @@ while $\sum_{j\in\mathcal{J}} \lVert \boldsymbol{T}_j\tilde{\boldsymbol{c}} - \b
end for
end while
\end{genericAlgorithm}
%
\footnotetext{$\epsilon_{\text{pri}} > 0$ and $\epsilon_{\text{dual}} > 0$
are additional parameters


@@ -13,7 +13,7 @@ Finally, an improvement on proximal decoding is proposed.
\section{Decoding Algorithm}%
\label{sec:prox:Decoding Algorithm}
Proximal decoding was proposed by Wadayama et al. as a novel formulation of
optimization-based decoding \cite{proximal_paper}.
With this algorithm, minimization is performed using the proximal gradient
method.
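As a brief added reminder of the general scheme (a sketch using the notation introduced later in this chapter; the step sizes and ordering actually used by the authors are given below), one proximal gradient iteration for an objective of the form $L\left( \boldsymbol{y} \mid \tilde{\boldsymbol{x}} \right) + \gamma h\left( \tilde{\boldsymbol{x}} \right)$ alternates a gradient step on the first term with a proximal step on the second:%
%
\begin{align*}
\boldsymbol{r} &\leftarrow \boldsymbol{s} - \omega \nabla L\left( \boldsymbol{y} \mid \boldsymbol{s} \right) \\
\boldsymbol{s} &\leftarrow \textbf{prox}_{\gamma h} \left( \boldsymbol{r} \right)
,\end{align*}%
%
where the gradient is taken with respect to the estimate $\tilde{\boldsymbol{x}}$.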
@@ -83,7 +83,7 @@ The prior \ac{PDF} is then approximated using the code-constraint polynomial as:
\label{eq:prox:prior_pdf_approx}
.\end{align}%
%
The authors justify this approximation by arguing that for
$\gamma \rightarrow \infty$, the approximation in equation
(\ref{eq:prox:prior_pdf_approx}) approaches the original function in equation
(\ref{eq:prox:prior_pdf}).
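Intuitively (an added remark, under the assumption that the code-constraint polynomial is non-negative and vanishes exactly on the codewords), the factor $\mathrm{e}^{-\gamma h\left( \tilde{\boldsymbol{x}} \right) }$ concentrates on the codewords in this limit:%
%
\begin{align*}
\lim_{\gamma \rightarrow \infty} \mathrm{e}^{-\gamma h\left( \tilde{\boldsymbol{x}} \right) } =
\begin{cases}
1, & h\left( \tilde{\boldsymbol{x}} \right) = 0 \\
0, & h\left( \tilde{\boldsymbol{x}} \right) > 0
\end{cases}
.\end{align*}%
%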
@@ -97,10 +97,9 @@ $L \left( \boldsymbol{y} \mid \tilde{\boldsymbol{x}} \right) = -\ln\left(
\hat{\boldsymbol{x}} &= \argmax_{\tilde{\boldsymbol{x}} \in \mathbb{R}^{n}}
\mathrm{e}^{- L\left( \boldsymbol{y} \mid \tilde{\boldsymbol{x}} \right) }
\mathrm{e}^{-\gamma h\left( \tilde{\boldsymbol{x}} \right) } \\
&= \argmin_{\tilde{\boldsymbol{x}} \in \mathbb{R}^n}
L\left( \boldsymbol{y} \mid \tilde{\boldsymbol{x}} \right)
+ \gamma h\left( \tilde{\boldsymbol{x}} \right)%
.\end{align*}%
%
Thus, with proximal decoding, the objective function
@@ -148,13 +147,13 @@ It is then immediately approximated with gradient-descent:%
\begin{align*}
\textbf{prox}_{\gamma h} \left( \tilde{\boldsymbol{x}} \right) &\equiv
\argmin_{\boldsymbol{t} \in \mathbb{R}^n}
\gamma h\left( \boldsymbol{t} \right) +
\frac{1}{2} \left\Vert \boldsymbol{t} - \tilde{\boldsymbol{x}} \right\Vert_2^2 \\
&\approx \tilde{\boldsymbol{x}} - \gamma \nabla h \left( \tilde{\boldsymbol{x}} \right),
\hspace{5mm} \gamma > 0, \text{ small}
.\end{align*}%
%
The second optimization step thus becomes%
%
\begin{align*}
\boldsymbol{s} \leftarrow \boldsymbol{r} - \gamma \nabla h\left( \boldsymbol{r} \right),
@@ -228,13 +227,11 @@ where $\eta$ is a positive constant slightly larger than one:%
$\Pi_{\eta}\left( \cdot \right) $ expressing the projection onto
$\left[ -\eta, \eta \right]^n$.
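Written out componentwise (an added clarification), this projection simply clips each entry of its argument to the interval $\left[ -\eta, \eta \right]$:%
%
\begin{align*}
\left( \Pi_{\eta}\left( \boldsymbol{x} \right) \right)_i
= \min\left( \eta, \max\left( -\eta, x_i \right) \right),
\hspace{5mm} i \in [1:n]
.\end{align*}%
%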
The iterative decoding process resulting from these considerations is
summarized in algorithm \ref{alg:prox}.
\begin{genericAlgorithm}[caption={Proximal decoding algorithm for an \ac{AWGN} channel},
label={alg:prox}]
$\boldsymbol{s} \leftarrow \boldsymbol{0}$
for $K$ iterations do
$\boldsymbol{r} \leftarrow \boldsymbol{s} - \omega \left( \boldsymbol{s} - \boldsymbol{y} \right) $
@@ -245,12 +242,7 @@ for $K$ iterations do
end if
end for
return $\boldsymbol{\hat{c}}$
\end{genericAlgorithm}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
@@ -425,8 +417,7 @@ while the newly generated ones are shown with dashed lines.
($\gamma = 0.05$) the decoding performance is better than for low
($\gamma = 0.01$) or high ($\gamma = 0.15$) values.
The question arises if there is some optimal value maximizing the decoding
performance, especially since it seems to depend dramatically on $\gamma$.
To better understand how $\gamma$ and the decoding performance are
related, figure \ref{fig:prox:results} was recreated, but with a considerably
larger selection of values for $\gamma$.
@@ -814,22 +805,23 @@ Summarizing the above considerations, \ldots
\end{axis}
\end{tikzpicture}
\caption{Comparison of \ac{FER}, \ac{BER} and decoding failure rate\protect\footnotemark{}}
\label{fig:prox:ber_fer_dfr}
\end{figure}%
%
\footnotetext{(3,6) regular LDPC code with $n = 204$, $k = 102$
\cite[\text{204.33.484}]{mackay_enc}; $\omega = 0.05, K=100, \eta=1.5$
}%
%
Until now, only the \ac{BER} has been considered to gauge the decoding
performance.
The \ac{FER}, however, shows considerably worse behaviour, as can be seen in
figure \ref{fig:prox:ber_fer_dfr}.
Besides the \ac{BER} and \ac{FER} curves, the figure also shows the
\textit{decoding failure rate}.
This is the rate at which the iterative process produces invalid codewords,
i.e., the stopping criterion (line 6 of algorithm \ref{alg:prox}) is never
satisfied and the maximum number of iterations $K$ is reached without
converging to a valid codeword.
Three lines are plotted in each case, corresponding to different values of


@@ -316,10 +316,10 @@ $g : \mathbb{R}^n \rightarrow \mathbb{R} $ must be minimized under certain const
,\end{align*}%
%
where $D \subseteq \mathbb{R}^n$ is the domain of values attainable for $\tilde{\boldsymbol{c}}$
and represents the constraints under which the minimization is to take place.
In contrast to the established message-passing decoding algorithms,
the perspective then changes from observing the decoding process in its
Tanner graph representation with \acp{VN} and \acp{CN} (as shown in figure \ref{fig:dec:tanner})
to a spatial representation (figure \ref{fig:dec:spatial}),
where the codewords are some of the vertices of a hypercube.
@@ -495,8 +495,8 @@ interpreted componentwise.}
A technique called \textit{lagrangian relaxation} \cite[Sec. 11.4]{intro_to_lin_opt_book}
can then be applied.
First, some of the constraints are moved into the objective function itself
and weights $\boldsymbol{\lambda}$ are introduced. A new, relaxed problem
is then formulated as
%
\begin{align}
\begin{aligned}
@@ -555,7 +555,8 @@ and (\ref{eq:theo:admm_standard}) have the same value.
Thus, we can define the \textit{dual problem} as the search for the tightest lower bound:%
%
\begin{align}
\underset{\boldsymbol{\lambda}}{\text{maximize }}\hspace{2mm}
& \min_{\boldsymbol{x} \ge \boldsymbol{0}} \mathcal{L}
\left( \boldsymbol{x}, \boldsymbol{\lambda} \right)
\label{eq:theo:dual}
,\end{align}
@@ -565,7 +566,7 @@ from the solution $\boldsymbol{\lambda}_\text{opt}$ to problem (\ref{eq:theo:dua
by computing \cite[Sec. 2.1]{admm_distr_stats}%
%
\begin{align}
\boldsymbol{x}_{\text{opt}} = \argmin_{\boldsymbol{x} \ge \boldsymbol{0}}
\mathcal{L}\left( \boldsymbol{x}, \boldsymbol{\lambda}_{\text{opt}} \right)
\label{eq:theo:admm_obtain_primal}
.\end{align}
@@ -584,7 +585,14 @@ using gradient descent \cite[Sec. 2.1]{admm_distr_stats}:%
\hspace{5mm} \alpha > 0
.\end{align*}
%
The algorithm can be improved by observing that when the objective function
$g: \mathbb{R}^n \rightarrow \mathbb{R}$ is separable into a number
$N \in \mathbb{N}$ of sub-functions
$g_i: \mathbb{R}^{n_i} \rightarrow \mathbb{R}$,
i.e., $g\left( \boldsymbol{x} \right) = \sum_{i=1}^{N} g_i
\left( \boldsymbol{x}_i \right)$,
where $\boldsymbol{x}_i,\hspace{1mm} i\in [1:N]$ are subvectors of
$\boldsymbol{x}$, the lagrangian is as well:
%
\begin{align*}
\text{minimize }\hspace{5mm} & \sum_{i=1}^{N} g_i\left( \boldsymbol{x}_i \right) \\
@@ -598,8 +606,18 @@ The algorithm can be improved by observing that when the objective function is s
- \sum_{i=1}^{N} \boldsymbol{A}_i\boldsymbol{x_i} \right)
.\end{align*}%
%
The matrices $\boldsymbol{A}_i, \hspace{1mm} i \in [1:N]$ are the blocks of a
columnwise partition of the matrix $\boldsymbol{A}$, corresponding to
$\boldsymbol{A} = \begin{bmatrix}
\boldsymbol{A}_1 &
\ldots &
\boldsymbol{A}_N
\end{bmatrix}$.
The minimization of each term can then happen in parallel, in a distributed
fashion \cite[Sec. 2.2]{admm_distr_stats}.
In each minimization step, only one subvector $\boldsymbol{x}_i$ of
$\boldsymbol{x}$ is considered, treating all other subvectors as
constant.
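As a small added illustration, for $N = 2$ the minimization of the lagrangian over $\boldsymbol{x}$ splits into two independent subproblems that can be solved in parallel (the constant term $\boldsymbol{\lambda}^\text{T}\boldsymbol{b}$ has been dropped):%
%
\begin{align*}
\boldsymbol{x}_1 &\leftarrow \argmin_{\boldsymbol{x}_1}
\left( g_1\left( \boldsymbol{x}_1 \right)
- \boldsymbol{\lambda}^\text{T}\boldsymbol{A}_1\boldsymbol{x}_1 \right) \\
\boldsymbol{x}_2 &\leftarrow \argmin_{\boldsymbol{x}_2}
\left( g_2\left( \boldsymbol{x}_2 \right)
- \boldsymbol{\lambda}^\text{T}\boldsymbol{A}_2\boldsymbol{x}_2 \right)
.\end{align*}%
%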
This modified version of dual ascent is called \textit{dual decomposition}:
%
\begin{align*}
@@ -616,7 +634,7 @@ This modified version of dual ascent is called \textit{dual decomposition}:
The \ac{ADMM} works the same way as dual decomposition.
It only differs in the use of an \textit{augmented lagrangian}
$\mathcal{L}_\mu\left( \boldsymbol{x}_{[1:N]}, \boldsymbol{\lambda} \right)$
in order to improve the convergence properties.
The augmented lagrangian extends the ordinary one with an additional penalty term
with the penalty parameter $\mu$:
%
@@ -625,8 +643,8 @@ with the penalty parameter $\mu$:
= \underbrace{\sum_{i=1}^{N} g_i\left( \boldsymbol{x_i} \right)
+ \boldsymbol{\lambda}^\text{T}\left( \boldsymbol{b}
- \sum_{i=1}^{N} \boldsymbol{A}_i\boldsymbol{x}_i \right)}_{\text{Ordinary lagrangian}}
+ \underbrace{\frac{\mu}{2}\left\Vert \sum_{i=1}^{N} \boldsymbol{A}_i\boldsymbol{x}_i
- \boldsymbol{b} \right\Vert_2^2}_{\text{Penalty term}},
\hspace{5mm} \mu > 0
.\end{align*}
%
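For completeness, an added sketch of the resulting update steps in the common two-block case ($N = 2$); the concrete form used for \ac{LP} decoding may differ in details such as the sign convention of the dual update:%
%
\begin{align*}
\boldsymbol{x}_1 &\leftarrow \argmin_{\boldsymbol{x}_1}
\mathcal{L}_\mu\left( \boldsymbol{x}_1, \boldsymbol{x}_2, \boldsymbol{\lambda} \right) \\
\boldsymbol{x}_2 &\leftarrow \argmin_{\boldsymbol{x}_2}
\mathcal{L}_\mu\left( \boldsymbol{x}_1, \boldsymbol{x}_2, \boldsymbol{\lambda} \right) \\
\boldsymbol{\lambda} &\leftarrow \boldsymbol{\lambda}
+ \mu \left( \boldsymbol{b} - \boldsymbol{A}_1\boldsymbol{x}_1 - \boldsymbol{A}_2\boldsymbol{x}_2 \right)
.\end{align*}%
%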