# 1: Discrete linear Kalman filter


We already know how to represent a state as a stochastic (noisy) variable by writing down equations when the noise is Gaussian. In particular, we have shown that for each state we have to focus on two properties: its mean and its standard deviation, via its variance. When we have a collection of states, we stack their means in the vector $\hat x$ and we collect the variances in the square matrix $P$, where the diagonal entries are the variances, e.g. $\sigma_{pp} = \sigma_p^2 = \sigma_p \sigma_p$, and the off-diagonal entries are the covariances between pairs of states.

We also know how to describe the evolution of the means and the variances of a system in a simple way, namely via discrete events where the couplings between the states are linear:

$\begin{cases}\hat x(k+1) = F\hat x(k) + Gu(k) \\ P(k+1) = FP(k)F^T + G\sigma_{u(k)}^2G^T. \end{cases}\quad (1)$
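As a minimal sketch of (1) in numpy (borrowing the $F$ and $G$ that appear later in this post; the call at the end is just illustrative), the propagation is a couple of matrix products:

```python
import numpy as np

dt = 0.001  # sample time [s], as used later in the post

# State x = [position, velocity, accelerometer bias]; F and G are the
# matrices from post 0.5, repeated further down in this post.
F = np.array([[1.0, dt, -dt**2 / 2],
              [0.0, 1.0, -dt],
              [0.0, 0.0, 1.0]])
G = np.array([[dt**2 / 2], [dt], [0.0]])

def predict(x_hat, P, u, sigma_u):
    """One step of (1): propagate the mean and the covariance."""
    x_hat = F @ x_hat + G[:, 0] * u
    P = F @ P @ F.T + G @ G.T * sigma_u**2
    return x_hat, P

x_hat, P = np.zeros(3), np.eye(3)   # the initial guess used in the post
x_hat, P = predict(x_hat, P, u=0.07, sigma_u=0.01)
```

Note that the covariance update never shrinks $P$: every prediction adds $G\sigma_u^2G^T$, so the uncertainty can only grow until a measurement is fused.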

At this point we are going to simulate system (1) with the $x$, $F$, $G$ and $u$ introduced in the previous posts. You can download the Python script from my GitHub. We set the initial conditions

$\hat x = \begin{bmatrix}0 & 0 & 0\end{bmatrix}^T, \sigma = \begin{bmatrix}1 & 1 & 1\end{bmatrix}^T \implies P = I$,

which means that our guess for the position, velocity and bias is centered at $0$ with a standard deviation of $1$ [m], [m/s] and [m/s²] respectively. We consider that in our initial guess all the cross correlations are zero. We also consider that our system is at rest, so $a(k) = a(k+1) = 0$, but the accelerometer is affected by Gaussian noise with $\sigma_a = 0.01$ and $\hat a = 0.07$ [m/s²]. Note that $\hat a$ is equivalent to a bias in the accelerometer, since we are shifting the real measurement $a$ by the mean of the Gaussian distribution $f_a$.

Since we do not know the position with total accuracy, nor the velocity, what (1) does is evolve all the possible combinations of positions and velocities each time a value of the accelerometer $u(k)$ enters the system. Therefore, the worst case at the start $k=0$ is to have a position $p = \pm 1$ and velocity $v = \pm 1$, and at the next time $k = 1$ the dynamics (1) will again evaluate all the possible combinations, which will certainly span a bigger range than at the starting time. Hence the confidence about the position and velocity decreases, or in other words, the vector $\sigma$, via the matrix $P$, keeps growing and growing.

The position, velocity and estimated bias are shown in the first, second and third plots respectively. The fourth plot is the signal of the biased and noisy accelerometer. The black vertical bars represent the actual or true values of the position, velocity and bias. Note that since in (1) we have $b(k+1) = b(k)$, the estimation of the bias does not change at all. And of course, the confidence about the position and velocity of the system gets worse and worse.

Now we are ready to present the Kalman filtering technique in its simplest version, the Discrete linear Kalman filter. We call it discrete and linear because it deals with discrete and linear models like (1); we will see how to handle other kinds of models in later posts. Let us consider that from time to time our system is able to measure its position, e.g. via GPS, and of course we describe that measurement with a Gaussian function $f_{GPS}$, with mean $\hat p_{GPS}$ and standard deviation $\sigma_{GPS}$.

At the moment of taking the reading from the GPS, we find ourselves with two Gaussian distributions, $f_p$ and $f_{GPS}$, describing the same information. We have already seen in equation (5) from post 0 how to fuse two Gaussian functions, and now we are going to show how to do that in a more general way. In order to fuse two pieces of information, the first step is always to express them in the same coordinates, e.g. the same units: you cannot compare apples and oranges. Therefore we need to extract the information $y$ from our system in an appropriate way such that we can compare it with our measurement $y_m$. In our example, the GPS measures the position, which is the first state in $x$, so we can write

$\hat y = \begin{bmatrix}1 & 0 & 0\end{bmatrix}\hat x = H_1\hat x, \quad (2)$

but it might happen that we are measuring the velocity of the system with a radar, in such a case we will have that

$\hat y = \begin{bmatrix}0 & 1 & 0\end{bmatrix}\hat x = H_2\hat x, \quad (3)$

or if we measure position and velocity at the same time

$\hat y = \begin{bmatrix}1& 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}\hat x = H_3\hat x, \quad (4)$

or we have a bizarre sensor that measures a linear combination of position and velocity, all in one number

$\hat y = \begin{bmatrix}6& 3 & 0\end{bmatrix}\hat x = H_4\hat x. \quad (5)$

This matrix $H$ is what we call an observation model. Sometimes it is very obvious, as in (2) or (3), but in more advanced examples, such as determining the attitude of drones, we will encounter observation models like (5). Note that we have assumed that the units of $\hat p$ and $\hat p_{GPS}$ are the same; otherwise, instead of a one, $H_1$ in (2) should contain the corresponding factor for changing units.
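In code, the observation models (2)-(5) are just constant matrices, and the predicted measurement is $H\hat x$. A small sketch (numpy assumed, with a made-up state vector for illustration):

```python
import numpy as np

x_hat = np.array([2.0, 0.5, 0.07])   # illustrative state: p, v, b

H1 = np.array([[1.0, 0.0, 0.0]])     # GPS: position only, eq. (2)
H2 = np.array([[0.0, 1.0, 0.0]])     # radar: velocity only, eq. (3)
H3 = np.array([[1.0, 0.0, 0.0],
               [0.0, 1.0, 0.0]])     # position and velocity, eq. (4)
H4 = np.array([[6.0, 3.0, 0.0]])     # "bizarre" mixed sensor, eq. (5)

y1 = H1 @ x_hat                      # predicted GPS reading
y4 = H4 @ x_hat                      # predicted mixed reading: 6p + 3v
```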

Let us rewrite the result of fusing two scalar pieces of information represented as Gaussian variables $f_{y_1}$ and $f_{y_2}$:

$\frac{1}{\sqrt{2\pi} \sigma_{y_1}} e^{-\left(\frac{y-\hat y_1}{\sqrt{2}\sigma_{y_1}}\right)^2}\frac{1}{\sqrt{2\pi} \sigma_{y_2}} e^{-\left(\frac{y-\hat y_2}{\sqrt{2}\sigma_{y_2}}\right)^2} \propto \frac{1}{\sqrt{2\pi} \sigma_{y_f}} e^{-\left(\frac{y-\hat y_f}{\sqrt{2}\sigma_{y_f}}\right)^2}, \quad (6)$

where we can identify that

$\begin{cases} \hat y_f = \frac{\hat y_1\sigma_{y_2}^2 +\hat y_2\sigma_{y_1}^2}{\sigma_{y_1}^2 +\sigma_{y_2}^2} \\ \sigma_{y_f}^2=\frac{\sigma_{y_1}^2 \sigma_{y_2}^2}{\sigma_{y_1}^2+\sigma_{y_2}^2}, \end{cases} \quad (7)$

where $\hat y_f$ and $\sigma_{y_f}$ are scalar numbers. One can see that we are fusing the means and the standard deviations from two different scalar sources. Note that we can write (7) in the following way

$\begin{cases} \hat y_f = \hat y_1 + k (\hat y_2 - \hat y_1) \\ \sigma_{y_f} ^2= \sigma_{y_1}^2 - k \sigma_{y_1}^2 \\ k = \frac{\sigma_{y_1}^2}{\sigma_{y_1}^2+\sigma_{y_2}^2}.\end{cases} \quad (8)$
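A quick numerical sketch of the scalar fusion, using form (8) and checking it against form (7); the numbers are illustrative, not from the post:

```python
def fuse_scalar(y1, s1_sq, y2, s2_sq):
    """Fuse two scalar Gaussian estimates using form (8)."""
    k = s1_sq / (s1_sq + s2_sq)      # scalar gain
    y_f = y1 + k * (y2 - y1)
    s_f_sq = s1_sq - k * s1_sq
    return y_f, s_f_sq

# With equal confidences, the fused mean lands in the middle, and the
# fused variance is smaller than either input: fusing two sources never
# makes the estimate less confident.
y_f, s_f_sq = fuse_scalar(0.0, 1.0, 1.0, 1.0)
```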

However, equations (1)-(5) involve vectors and matrices, so now we present (6)-(8) in a form that fuses vectorial quantities.

Let us take the Gaussians $f_p, f_v, f_b$ and stack them into a vector, i.e. we substitute $\hat y_1$ in (7) or (8) by $H\hat x$. Then we take the corresponding Gaussians from the measurements, stack them into another vector $\hat y_m$, and put it in (7) or (8) in the place of $\hat y_2$. Note that we need to employ $H$ in order to compare $\hat x$ with $\hat y_m$. Now we compute all the fusions between the two vectors $H\hat x$ and $\hat y_m$ as we did for the scalars in (6), i.e. we fuse all the components of the former vector with all the components of the latter (the calculations are tedious and not illustrative, so I omit them here, but you will see the analogy). Finally we arrive at the compact vector/matrix expression for (8), namely

$\begin{cases} \hat x_f = \hat x + K(\hat y_m - H\hat x) \\ P_f = P - KHP \\ K = PH^T(HPH^T + R)^{-1}, \end{cases} \quad (9)$

where $\hat y_m$ is the stacked vector of the means of the measurements, $K$ is the famous Kalman gain (yes, it was Kalman who performed this computation and published it!) and $R := \operatorname{diag}(\sigma_{m1}^2, \dots, \sigma_{mj}^2)$ is the covariance matrix given by the standard deviations of the $j$ measurements. For example, in (4) we have $j = 2$, the two standard deviations come from the GPS and radar sensors, and correspondingly $\hat y_m = \begin{bmatrix}\hat p_{GPS} & \hat v_{radar} \end{bmatrix}^T$.
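A minimal numpy sketch of the update (9), here with the GPS-only observation model (2) and the $\sigma_{GPS} = 2$ [m] used in the example below; the state and covariance are otherwise illustrative:

```python
import numpy as np

def update(x_hat, P, y_m, H, R):
    """Fuse the measurement y_m into the state as in (9)."""
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)  # Kalman gain
    x_f = x_hat + K @ (y_m - H @ x_hat)           # corrected mean
    P_f = P - K @ H @ P                           # corrected covariance
    return x_f, P_f

x_hat = np.zeros(3)                   # prior: p, v, b centered at 0
P = np.eye(3)                         # with standard deviation 1
H = np.array([[1.0, 0.0, 0.0]])       # observation model (2)
R = np.array([[2.0**2]])              # sigma_GPS = 2 m
x_f, P_f = update(x_hat, P, np.array([0.0]), H, R)
```

With these numbers the gain for the position is $1/(1+4) = 0.2$, so the position variance drops from $1$ to $0.8$ after the fusion, while the unobserved states are untouched (their cross covariances with the position are zero here).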

One can perform a quick identification between (8) and (9) by assigning

$\begin{cases}\hat y_f \to H\hat x_f \\ \sigma_{y_f}^2 \to HP_fH^T \\ \hat y_1 \to H\hat x \\ \hat y_2 \to \hat y_m \\ \sigma_{y_1}^2 \to HPH^T \\ \sigma_{y_2}^2 \to R \end{cases}.$

If you substitute these equivalences into (8) and write the operations in matrix form instead of scalars, you will clearly see the similarity between (8) and (9).

After the fusion in (9), we consider $\hat x_f$ and $P_f$ as the new $\hat x$ and $P$ in (1), and we are ready for the next iteration.

• The most expensive computation in the Kalman filter is the matrix inversion in (9). Sometimes it is not advisable to take 892471489 measurements into (9) at the same time. For example, one can apply (2) and (3) in two different updates at different times instead of (4) at the same time. With this simple trick you distribute the computational effort of the filter over time!
• Sometimes we deal with really small numbers in $P$, i.e. high accuracy, and this can result in numerical errors because of the floating-point operations in a computer. One should always check that $P_f$ is positive definite, for example by a quick inspection of the diagonal terms. If they are not all positive, then one should discard the fusion. This test can also be a red flag for checking that you have implemented the filter correctly.
• Never feed your filter with incorrect or false information. For example, if you are updating your system with (4) but the measurement from the GPS is not available at that moment, then employ (3) to correctly update the states; do not put in a dummy position, e.g. a past value! You will get worse performance by doing so.
• You do not need to update your system all the time. For example, check the order of magnitude of an acceptable covariance for your system (let us say meters): you do not need to constantly poll a sensor that is expensive in communications or energy (to get femtometers of accuracy) just to fuse/update your states.
• Kalman filtering is sometimes an overkill solution for your problem. If you are happy measuring a position five times per second with an accuracy of meters, then just take the measurement of the GPS; you do not need an inertial measurement unit with a Kalman filter. A big vehicle/object that moves slowly is one example. From another point of view, if your mission is short enough and your inputs, e.g. the accelerometer, have a very small bias and standard deviation, then you do not need to fuse/update your states. The latter is pretty common in space applications, like a re-entry scenario with tactical-grade IMUs (intercontinental missiles that do not rely on GPS are an example).
• Just because your system works does not necessarily mean that your Kalman filter does. Check all the steps as we have described in this guide.
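The positive-definiteness check from the second bullet can be sketched in a couple of lines (the matrices here are made up for illustration):

```python
import numpy as np

def fusion_is_sane(P_f):
    """Quick inspection of the diagonal of P_f: all variances must be
    positive, otherwise the fusion should be discarded."""
    return bool(np.all(np.diag(P_f) > 0.0))

P_good = np.diag([0.8, 1.0, 1.0])
P_bad = np.diag([-1e-9, 1.0, 1.0])   # a tiny floating-point error crept in
ok, bad = fusion_is_sane(P_good), fusion_is_sane(P_bad)
```

A diagonal inspection is only a quick necessary condition, not a full positive-definiteness test, but it is cheap enough to run after every fusion.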

## Application example: calibrating an accelerometer

We take (1) with the matrices $F$ and $G$ as in post 0.5, and the model of the accelerometer $f_a$ with $\hat a = 0.07$ and $\sigma_a = 0.01$ [m/s²]:

$F = \begin{bmatrix}1 & \Delta T & -\frac{\Delta T^2}{2} \\ 0 & 1 & -\Delta T \\ 0 & 0 & 1\end{bmatrix}, \quad G=\begin{bmatrix}\frac{\Delta T^2}{2} & \Delta T & 0\end{bmatrix}^T,$

with $\Delta T = 0.001$ [s]. Every 1000 iterations, i.e. each second, we take a measurement of the position, i.e. we use the observation model (2). Since our IMU is at rest on our table, the position does not change, so the measurement will be $\hat y_{GPS} = 0$, and we set a confidence of $\sigma_{y_{GPS}} = 2$ [m]. Note that we could put a smaller value for the standard deviation (never zero!), but we keep it high for demonstration purposes.

As before, we set $\hat x = \begin{bmatrix}0 & 0 & 0\end{bmatrix}^T$ as the initial condition for the states in (1), with an initial standard deviation $\sigma = \begin{bmatrix}1 & 1 & 1\end{bmatrix}^T$.

We note in the following simulation (you can download the Python script at my GitHub) that, indeed, we tell the system every second: hey, you are at position 0 with a confidence of plus/minus 2 meters! Then the guesses of the position and the velocity correct themselves, going back to the vertical black lines (the actual values). The system learns with each measurement: the state of the bias of the accelerometer (third plot) approaches the value of 0.07 with a higher and higher confidence after each update. We also plot in blue the guess of the bias in the accelerometer signal (bottom plot), and we see how it converges to the actual bias of the signal. After almost 15 seconds we have calibrated the accelerometer: we have identified its bias!
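The whole experiment can be sketched end to end in numpy; the constants ($\Delta T$, $\sigma_a$, the bias, $\sigma_{y_{GPS}}$) are the ones from the post, while the random seed and loop structure are my own choices:

```python
import numpy as np

rng = np.random.default_rng(0)

dt, sigma_a, bias = 0.001, 0.01, 0.07
F = np.array([[1.0, dt, -dt**2 / 2],
              [0.0, 1.0, -dt],
              [0.0, 0.0, 1.0]])
G = np.array([[dt**2 / 2], [dt], [0.0]])
H = np.array([[1.0, 0.0, 0.0]])      # GPS observation model (2)
R = np.array([[2.0**2]])             # sigma_GPS = 2 m

x_hat, P = np.zeros(3), np.eye(3)    # initial guess
for k in range(15_000):              # 15 seconds of simulation
    # the accelerometer at rest reads only its bias plus noise
    u = bias + sigma_a * rng.standard_normal()
    # prediction step (1)
    x_hat = F @ x_hat + G[:, 0] * u
    P = F @ P @ F.T + G @ G.T * sigma_a**2
    # update step (9): a GPS reading y_m = 0 once per second
    if (k + 1) % 1000 == 0:
        K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
        x_hat = x_hat + K @ (np.array([0.0]) - H @ x_hat)
        P = P - K @ H @ P

# By the end, the bias state x_hat[2] has moved toward 0.07 and its
# variance P[2, 2] has shrunk from the initial 1.
```

Because the bias never enters the GPS measurement directly, it is corrected only through the cross covariances that $F$ builds between the bias and the position, which is exactly why the convergence takes several updates rather than one.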