Bug in Linux/PX4 HAL?

Hi folks,

Apologies for the wall-o-text. Not really sure where to ask my questions, so if I have the wrong forum, my apologies.

I’ve been reading the source, and building on my BBB trying to get my head around the code, and when running it I’ve been seeing a corruption issue I wanted to ask a few questions about.

The first question is: I think the buffer filling/emptying logic in libraries/AP_HAL_Linux/UARTDriver.cpp is of the “full means leave the last slot empty” sort, correct? So you can tell the difference between full and empty by head/tail equality checks…

If so, I wonder about this code in UARTDriver::_timer_tick():

// try to fill the read buffer uint16_t _head; n = BUF_SPACE(_readbuf); if (n > 0) { if (_readbuf_tail < _head) { // one read will do assert(_readbuf_tail+n <= _readbuf_size); _read_fd(&_readbuf[_readbuf_tail], n); } else { uint16_t n1 = _readbuf_size - _readbuf_tail; assert(_readbuf_tail+n1 <= _readbuf_size); int ret = _read_fd(&_readbuf[_readbuf_tail], n1); if (ret == n1 && n != n1) { assert(_readbuf_tail+(n-n1) <= _readbuf_size); _read_fd(&_readbuf[_readbuf_tail], n - n1); } }
From UARTDriver.cpp (indents are mine):

#define BUF_SPACE(buf) (((_head=buf##_head) > buf##_tail) \ ?(_head - buf##_tail) - 1 :((buf##_size - buf##_tail) + _head) - 1)
Seems to me that if _readbuf_head == 0, then the calculation for n1 doesn’t allow for the empty slot like BUF_SPACE() does. The manifestation of this is that n will be one less than n1. So, the second question is: this is a problem, right? See below for more on why.

You’d think the assertion would trigger in this case, but it doesn’t, I believe because of integer promotion rules. You end up with

assert(_readbuf_tail+(n - (n + 1)) <= _readbuf_size) or

assert(_readbuf_tail+(65535) <= _readbuf_size) only, because of integer promotion it’s really

assert(_readbuf_tail+(-1) <= _readbuf_size) so, the assertion holds and execution continues. So, I think the assertion should be

assert((uint16_t)(_readbuf_tail+(n-n1)) <= _readbuf_size) Other assertions need similar treatment.

I think that if the code is left as is, it means that the second _read_fd() will try to read 65535 bytes, which does bad things (manifestation on Linux is that the heap corruption detector triggers an abort).

I’m a little unsure of how you should fix the n1 calc. You could do:

uint16_t n1 = _readbuf_size - _readbuf_tail - (_readbuf_head == 0 ? 1 : 0); but I’m not sure you couldn’t be left with a zero-length read. If so, then you need to allow for one read, but on the other side if you will…

Final question: I wouldn’t bother people with this, only - doesn’t the PX4 use the same logic? Granted, it’s not going to happen often, but once may be enough.

Mitch

Hi Mitch,
You are absolutely right! I independently found this bug well after you did, and I wish I’d read your post much earlier.
Maybe post to the drones-discuss in future when you find bugs like this? Unfortunately I don’t have time to read this forum very often :frowning:
Cheers, Tridge