<aside> ⚠️ The Problem of Growing Data: As the robot explores more, it gathers more data. This includes its own positions at different times and the positions of landmarks. Keeping track of all this data requires a lot of memory and computational power. Over time, as the robot keeps exploring, this becomes increasingly difficult to manage.

</aside>

Definition:

Marginalization in SLAM is a way to manage this growing data problem. It simplifies the map and the history of the robot’s locations by removing older or less important variables while keeping their effect on the remaining estimate. Think of it as summarizing a long book into a short paragraph that captures the main points. The goal is to reduce the amount of data the robot needs to handle.

<aside> 💡 In the information form and the square root information form, marginalization can be done using matrix factorization. The case considered here is linear inference, where eliminating a variable from the graph corresponds to matrix factorization and marginalization corresponds to dropping rows and columns, as discussed below.

</aside>

Information Form [ Linear Inference ]

$$ \begin{aligned} p(\mathbf{x}, \mathbf{y}) &= \mathcal{N}\left(\Lambda^{-1} \begin{bmatrix} \eta_{x} \\ \eta_{y} \end{bmatrix} , \begin{bmatrix} \Lambda_{xx} & \Lambda_{xy} \\ \Lambda_{xy}^T & \Lambda_{yy} \end{bmatrix}^{-1}\right) \\ &\text{where } \mathbf{x}, \mathbf{y} \text{ are vectors of random variables,} \\ &\mathcal{N} \text{ denotes the Gaussian distribution,} \\ &\Lambda \text{ is the information (precision) matrix, i.e. the inverse of the covariance matrix,} \\ &\Lambda^{-1} \text{ is therefore the covariance matrix,} \\ &\eta \text{ is the information vector, the product of the information matrix and the mean vector } (\eta = \Lambda \mu), \\ &\text{and } \Lambda_{xx}, \Lambda_{xy}, \Lambda_{yx}, \Lambda_{yy} \text{ are sub-blocks of } \Lambda \text{ describing the relationships between the two sets of variables.} \end{aligned} $$
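A minimal numpy sketch of the conversion between the moment form $(\mu, \Sigma)$ and the information form $(\eta, \Lambda)$; the block sizes, seed, and variable names are illustrative assumptions, not part of the original text.

```python
import numpy as np

# Sketch: a joint Gaussian over (x, y) in moment form vs. information form.
rng = np.random.default_rng(0)
n_x, n_y = 2, 3

# Build an arbitrary symmetric positive-definite covariance and a mean.
A = rng.standard_normal((n_x + n_y, n_x + n_y))
Sigma = A @ A.T + (n_x + n_y) * np.eye(n_x + n_y)
mu = rng.standard_normal(n_x + n_y)

# Information form: Lambda = Sigma^{-1}, eta = Lambda @ mu.
Lambda = np.linalg.inv(Sigma)
eta = Lambda @ mu

# Recover the moment form again: mean = Lambda^{-1} eta, covariance = Lambda^{-1}.
assert np.allclose(np.linalg.solve(Lambda, eta), mu)
assert np.allclose(np.linalg.inv(Lambda), Sigma)
```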

Marginalization in information form [ Linear Inference ]

In the information form, the information matrix of $\mathbf{y}$ after marginalizing out $\mathbf{x}$ is the Schur complement of $\Lambda_{xx}$ in $\Lambda$, written $\Lambda / \Lambda_{xx}$:

$$ \Lambda / \Lambda_{xx} = \Lambda_{yy} - \Lambda_{yx} \Lambda_{xx}^{-1} \Lambda_{xy} $$

$$ \Sigma_{\text{marginal}} = (\Lambda / \Lambda_{xx})^{-1} $$
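A numpy sketch of this marginalization, checked against the covariance form (where marginalizing is just keeping the $\mathbf{y}$ block of $\Sigma$). The update of the information vector, $\eta_y - \Lambda_{yx}\Lambda_{xx}^{-1}\eta_x$, is not stated above but follows from the same elimination; the sizes and seed are assumptions for illustration.

```python
import numpy as np

# Sketch: marginalizing x out of p(x, y) in the information form.
rng = np.random.default_rng(1)
n_x, n_y = 2, 3
A = rng.standard_normal((n_x + n_y, n_x + n_y))
Sigma = A @ A.T + (n_x + n_y) * np.eye(n_x + n_y)
mu = rng.standard_normal(n_x + n_y)
Lambda = np.linalg.inv(Sigma)
eta = Lambda @ mu

Lxx, Lxy = Lambda[:n_x, :n_x], Lambda[:n_x, n_x:]
Lyx, Lyy = Lambda[n_x:, :n_x], Lambda[n_x:, n_x:]

# Schur complement of Lambda_xx: the marginal information matrix of y.
Lambda_marg = Lyy - Lyx @ np.linalg.solve(Lxx, Lxy)
# Corresponding marginal information vector: eta_y - Lambda_yx Lambda_xx^{-1} eta_x.
eta_marg = eta[n_x:] - Lyx @ np.linalg.solve(Lxx, eta[:n_x])

# In covariance form, marginalization is just dropping the x rows/columns of Sigma.
assert np.allclose(np.linalg.inv(Lambda_marg), Sigma[n_x:, n_x:])
assert np.allclose(np.linalg.solve(Lambda_marg, eta_marg), mu[n_x:])
```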

Square root information form [ Linear Inference ]

$$ \begin{aligned} p(\mathbf{x}, \mathbf{y}) &= \mathcal{N}(R^{-1}\mathbf{d},\; R^{-1}R^{-T}), \\ R &= \begin{bmatrix} R_{xx} & S_{xy} \\ 0 & R_{yy} \end{bmatrix}, \quad \mathbf{d} = \begin{bmatrix} d_{x} \\ d_{y} \end{bmatrix}. \end{aligned} $$

<aside> ➕ Legend: $R$ is the upper-triangular square root information matrix with $R^T R = \Lambda$, and $\mathbf{d} = R\boldsymbol{\mu}$ is the corresponding square root information vector, partitioned into blocks matching $\mathbf{x}$ and $\mathbf{y}$.

</aside>

<aside> 💡 If we decompose the information matrix with a Cholesky factorization, $\Lambda = R^T R$, and substitute it into the information form, we obtain this expression.

This formulation is numerically more stable.

</aside>
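A small numpy sketch of this substitution: numpy's Cholesky returns a lower-triangular factor, so the upper-triangular $R$ with $R^T R = \Lambda$ is its transpose; the dimensions and seed are illustrative assumptions.

```python
import numpy as np

# Sketch: obtaining the square root information form (R, d) from (Lambda, eta).
rng = np.random.default_rng(2)
n = 5
A = rng.standard_normal((n, n))
Sigma = A @ A.T + n * np.eye(n)
mu = rng.standard_normal(n)
Lambda = np.linalg.inv(Sigma)
eta = Lambda @ mu

R = np.linalg.cholesky(Lambda).T     # upper triangular, R.T @ R == Lambda
d = np.linalg.solve(R.T, eta)        # from eta = R.T @ d, equivalently d = R @ mu

assert np.allclose(R.T @ R, Lambda)
assert np.allclose(np.linalg.solve(R, d), mu)                       # mean = R^{-1} d
assert np.allclose(np.linalg.inv(R) @ np.linalg.inv(R).T, Sigma)    # R^{-1} R^{-T} = Sigma
```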

Marginalization in square root information form [ Linear Inference ]

Marginalization is as easy as in the covariance form. Eliminating the first $n_x$ rows and the first $n_x$ columns of $R$ (and the first $n_x$ entries of $\mathbf{d}$) yields $R_{yy}$ as the new square root information matrix, along with the square root information vector $d_y$.
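A numpy sketch of this, verifying that $R_{yy}^T R_{yy}$ equals the Schur complement from the information form and that $R_{yy}^{-1} d_y$ recovers the marginal mean of $\mathbf{y}$; block sizes and seed are assumptions, and the result relies on $\mathbf{x}$ coming first in the elimination ordering.

```python
import numpy as np

# Sketch: marginalizing x out in the square root information form.
rng = np.random.default_rng(3)
n_x, n_y = 2, 3
A = rng.standard_normal((n_x + n_y, n_x + n_y))
Sigma = A @ A.T + (n_x + n_y) * np.eye(n_x + n_y)
mu = rng.standard_normal(n_x + n_y)
Lambda = np.linalg.inv(Sigma)

R = np.linalg.cholesky(Lambda).T   # upper triangular square root information matrix
d = R @ mu                         # square root information vector

# Drop the first n_x rows/columns of R and the first n_x entries of d.
R_yy, d_y = R[n_x:, n_x:], d[n_x:]

# Compare with the Schur complement from the information form.
Lxx, Lxy = Lambda[:n_x, :n_x], Lambda[:n_x, n_x:]
Lyx, Lyy = Lambda[n_x:, :n_x], Lambda[n_x:, n_x:]
schur = Lyy - Lyx @ np.linalg.solve(Lxx, Lxy)

assert np.allclose(R_yy.T @ R_yy, schur)                   # marginal information matrix of y
assert np.allclose(np.linalg.solve(R_yy, d_y), mu[n_x:])   # marginal mean of y
```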


Non-Linear Inference

Bayes Tree