After a healthy holiday I came to the conclusion a low pass filter isn’t the way to go. Low pass filters are good for removing high frequency noise. But the errors the ultrasonic sensor give me aren’t like that. The errors it gives me are single misreadings occurring at random intervals (far apart). These misreadings give very high values most of the time. I’ll call these errors spikes as they look like spikes when you plot US readings graphically.

In a low pass filter these spikes are treated as valid values where I just want to ignore them. Also, in a low pass filter a spike affects the filter for some time after it occurred. This is also an unwanted effect of a low pass filter in my case.

There is a statistic called median that is not very sensitive to exceptional values. It is more or less like the mean value of a couple of readings but is calculated in a different way. The mean value is defined as the sum of all readings divided by the number of readings. The median is defined as the reading that separate the higher half from the lower half of all readings. Let me give an example. Suppose I got 5 readings from the US sensor, 100, 98, 255, 94, 92. The value of 255 is a misreading, the actual distance was 96 at that time. The median of these readings is 98, as there are 2 numbers bigger than 98 and two numbers are smaller. The median is quite close to the actual value of 96. In this case the median gives a better estimate of the average reading than the mean value (127,8).

A low pass filter is like the mean, they both assume all readings are valid although maybe not very precise on their own. The median is different in this respect, it ignores the biggest and smallest readings and assumes only that the most average readings are valid. If I want to get rid of the spikes in the US sensor readings I need a filter that behaves like the median. Luckily it is not to difficult to create a filter based on medians. The filter should just return the median of the last couple of readings form the US sensor.

So how does a median filter behave? Suppose a median filter that returns the median value of the three most recent readings and an object is approaching the US sensor. This filter would be able to completely filter out all spikes that last only one reading (unlike the low pass filter). If a spike lasts two or more readings it will not be ignored at all. The filter could still be able to ignore these situations when it uses more than three readings.
Also, the filter needs time to responds to changes in readings (just like the low pass filter). This becomes clear if an object is closing in on the US sensor. The result of the filter would lag one reading from the actual situation. A median filter based on more than three readings would suffer from even more lag. So there is a trade-off between the strength of the filter and the time it needs to adjust.

It’s good to be aware of the potential difficulties in programming a median filter. First,  one needs to keep track of the history of readings, this takes memory space and requires the use of queues, arrays or lists. Second, the readings have to be sorted by value for being able to find the median. This has to be done again for every reading.

The US sensor updates every 50 ms, so it is of no use polling it faster.  Spikes lasting longer than one reading are rare, so taking the median of the last three readings will most often be sufficient. In this case one can even create a median filter without queues and sorting routines.