Update APJ firmware via OTA using ESP8266 (project in development)

Hello everyone. I would like to present a project I have been working on for some time. Its main goal is to transfer APJ firmware over the air (OTA) with the help of the ESP8266, i.e. a real remote upload. In the ArduPilot world this is normally done over a USB cable, using programs like Mission Planner or the Python script “uploader.py”.

These solutions work very well today, but they are limited in convenience and speed. In my case, working with many drones, every new firmware version means a mass upload that has to be done in the shortest possible time. This is where OTA helps: with OTA programming I can do a mass upload quickly over the Wi-Fi network created by the ESP8266.

THE PROJECT:

Let’s move on to what I have achieved so far.
First of all, I had to overcome several problems just to start trying out the OTA idea. In my hardware the ESP8266 (WROOM-02) communicates with the FC via TX and RX on the UART2 port; through this link the ESP8266 acts as a bridge, giving UDP access to the FC from Mission Planner, MAVProxy, etc.

The first thing I did was modify hwdef-bl.dat so that the BOOTLOADER communicates through the UART2 port at a baud rate of 921600. (The bootloader normally runs at 115200, but this caused problems with the ESP8266, which runs at 921600, so I adapted the bootloader to the ESP's baud rate.) The last change concerns SERIAL_ORDER, i.e. the order of the serial ports the bootloader uses at boot: I put USART2 first.

Below are the changes to hwdef-bl.dat:


#CHANGE BAUDRATE
define BOOTLOADER_BAUDRATE 921600

# order of UARTs (and USB) for bootloading 
SERIAL_ORDER USART2 OTG1

# USART2 
PD5 USART2_TX USART2
PD6 USART2_RX USART2
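As a rough illustration of why the baud rate matters, the sketch below estimates the minimum time needed just to clock a firmware image through the UART. It assumes 8N1 framing (10 bits on the wire per data byte) and the ~1.4MB image size mentioned later in this thread, and it ignores erase time, protocol overhead and Wi-Fi latency, so real uploads take longer. The function name is illustrative.

```python
def transfer_seconds(image_bytes: int, baud: int, bits_per_byte: int = 10) -> float:
    """Lower bound on UART transfer time; 8N1 framing sends 10 bits per data byte."""
    return image_bytes * bits_per_byte / baud


IMAGE_BYTES = 1_400_000  # approximate firmware size, assumed for illustration

for baud in (115200, 921600):
    print(f"{baud:>7} baud: {transfer_seconds(IMAGE_BYTES, baud):6.1f} s")
```

At 921600 baud the raw transfer is 8 times shorter than at 115200 (roughly 15 s versus roughly 2 minutes for 1.4MB).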

The second set of changes was made inside the original mavesp8266 firmware, using the latest pull at this link:

My firmware modifications mainly consist of switching the ESP8266 into raw mode (plus several functions to manage the raw mode state) when a specific MAVLink command arrives: MAV_CMD_PREFLIGHT_REBOOT_SHUTDOWN with param1=3, which tells the autopilot to reboot and stay in the bootloader.
https://mavlink.io/en/messages/common.html#MAV_CMD_PREFLIGHT_REBOOT_SHUTDOWN
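For reference, a minimal sketch of how a ground-side Python tool might send this command. It assumes pymavlink's `command_long_send()` (exposed by the `.mav` attribute of a `mavutil` connection); the function name `request_bootloader` and the constant names are illustrative, and 246 is the numeric ID of MAV_CMD_PREFLIGHT_REBOOT_SHUTDOWN in the MAVLink common set.

```python
# Numeric ID of MAV_CMD_PREFLIGHT_REBOOT_SHUTDOWN in the MAVLink common message set
MAV_CMD_PREFLIGHT_REBOOT_SHUTDOWN = 246
REBOOT_TO_BOOTLOADER = 3  # param1 = 3: reboot the autopilot and keep it in the bootloader


def request_bootloader(mav, target_system: int, target_component: int) -> None:
    """Send COMMAND_LONG(MAV_CMD_PREFLIGHT_REBOOT_SHUTDOWN, param1=3).

    `mav` is any object providing pymavlink's command_long_send(),
    e.g. the `.mav` attribute of a mavutil connection (assumption).
    """
    mav.command_long_send(
        target_system,
        target_component,
        MAV_CMD_PREFLIGHT_REBOOT_SHUTDOWN,
        0,                     # confirmation
        REBOOT_TO_BOOTLOADER,  # param1
        0, 0, 0, 0, 0, 0,      # param2..param7, unused here
    )
```

With pymavlink this would be used roughly as `conn = mavutil.mavlink_connection("udpin:0.0.0.0:14550")`, `conn.wait_heartbeat()`, then `request_bootloader(conn.mav, conn.target_system, conn.target_component)`. Passing the `mav` object in also makes the function easy to test with a fake.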

Here are the changes made to the esp8266 firmware:

// enter raw mode
void
MavESP8266Component::_enterRawMode(mavlink_command_long_t *cmd, uint8_t compID)
{
    if (_in_raw_mode) {
        return;
    }

    if (cmd) {
        getWorld()->getLogger()->log("Raw mode enabled (cmd %d %d)\n", cmd->command, compID);
    } else {
        getWorld()->getLogger()->log("Raw mode enabled\n");
    }

    _in_raw_mode = true;
    _in_raw_mode_time = 0;
}

////////////////////////////////

bool
MavESP8266Component::inRawMode() { // true while the ESP8266 is in raw mode
  // switch out of raw mode when not needed anymore
  // (the subtraction form stays correct when millis() wraps around)
  if (_in_raw_mode_time > 0 && millis() - _in_raw_mode_time > 5000) {
      _exitRawMode();
  }

  return _in_raw_mode;
}

///////////////////////////////

void 
MavESP8266Component::_exitRawMode()
{
    if (!_in_raw_mode) {
        _in_raw_mode_time = 0;
        return;
    }

    _in_raw_mode = false;
    _in_raw_mode_time = 0;
    raw=false;
    getWorld()->getLogger()->log("Raw mode disabled\n");

    // Restore original baud rate
    Serial.end();
    Serial.begin(getWorld()->getParameters()->getUartBaudRate());
}


////////////////////////////////////

// recognize FC reboot to bootloader command and switch to raw mode for bootloader protocol to work
if (cmd->param1 == 3) { // changed from param1 > 0 to param1 == 3
    _enterRawMode(cmd, compID);
    return false;
}

With these changes to the ESP firmware, raw mode is activated when the REBOOT_SHUTDOWN command arrives. Raw mode is needed to allow low-level communication between the ESP8266 and the FC while the FC is in the bootloader.

First tests:

Having set up a basic connection between the ESP8266 and the bootloader, I started the first tests: sending a few bytes to see the FC's response and check whether the link between the two was stable. I worked entirely in an Ubuntu environment, and my plan was the following:

  1. Use socat to create a virtual serial port connected directly over UDP to the network created by the ESP8266:
socat pty,rawer,link=/tmp/udp-serial-bridge udp4-datagram:192.168.4.1:14555,bind=:14550
  2. Run tcpdump to display the UDP traffic to and from the ESP8266:
sudo tcpdump -i enp0s3 -X host 192.168.4.1 and udp
  3. Test a first handshake between mavesp8266 and the FC by sending the GET_SYNC + EOC bytes 0x21 0x20:
echo "21 20" | xxd -r -p | socat - udp:192.168.4.1:14555
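The same handshake can be scripted without socat. A minimal Python sketch, assuming (as in the socat command) that the ESP8266 is at 192.168.4.1, listens on UDP port 14555 and sends replies to local port 14550; the helper names `open_bridge_socket` and `udp_get_sync` are illustrative. It expects INSYNC (0x12) followed by OK (0x10), the reply defined by the bootloader protocol, and retries because UDP datagrams can be lost.

```python
import socket

GET_SYNC, EOC = b"\x21", b"\x20"  # bootloader command bytes
INSYNC, OK = b"\x12", b"\x10"     # expected reply: INSYNC followed by OK


def open_bridge_socket(local_port: int = 14550, timeout: float = 1.0) -> socket.socket:
    """UDP socket bound to the local port the ESP8266 sends to (assumed 14550, as with socat)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", local_port))
    sock.settimeout(timeout)
    return sock


def udp_get_sync(sock: socket.socket, host: str, port: int = 14555, tries: int = 5) -> bool:
    """Send GET_SYNC+EOC to host:port and wait for INSYNC+OK, retrying on timeout."""
    for _ in range(tries):
        sock.sendto(GET_SYNC + EOC, (host, port))
        reply = b""
        try:
            while len(reply) < 2:
                data, (src_ip, _src_port) = sock.recvfrom(64)
                if src_ip == host:  # host must be an IP address for this filter to match
                    reply += data
        except socket.timeout:
            continue  # reply lost or bootloader not ready: try again
        if reply[:2] == INSYNC + OK:
            return True
    return False
```

Usage: `sock = open_bridge_socket()`, then `udp_get_sync(sock, "192.168.4.1")` once the FC is in the bootloader and the ESP is in raw mode.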

The test started by connecting to the access point created by the ESP8266 and then creating the virtual serial/UDP port. Next I sent a MAVLink command to put the FC into the bootloader, which also puts the ESP into raw mode (thanks to the firmware changes above). In this state I started tcpdump and sent the bytes 0x21 0x20. If communication works, the FC replies with 0x12 0x10 (INSYNC, OK).

The result was very satisfactory: tcpdump showed both the outgoing and the incoming bytes, confirming that the virtual connection between the ESP8266 and the FC worked!

How to transfer and upload the APJ file via OTA, and some errors:

Once I was sure the connection worked, I asked myself what to use to load the APJ firmware and carry out the usual procedure:
erasing, programming, verifying

So I decided to use px_uploader.py, a slightly modified Python script for uploading APJ firmware over USB.

Since the script was designed for USB, I had to add a kind of “retry” when sending packets, because data loss is more frequent over Wi-Fi (I can post the changes to the script if needed). In the end I launched the script with this command:

python3 px_uploader.py --baud-bootloader 921600 --port /tmp/udp-serial-bridge firmware.apj

After a few failed attempts I managed to do the first OTA upload! It took about a minute and a half, but it worked perfectly.

Since this is still a hack, there are some errors, which I report below:

"CRC (Cyclic Redundancy Check) FAILED" -> probably caused by a write problem during the programming phase: I believe the verification phase detects that what was written during programming does not match the firmware image.

"BOOTLOADER OPERATION FAILED" -> caused by a communication failure between the bootloader and the script.

There is also a problem with no error message: programming frequently stalls at 83%.

CONCLUSION AND REQUEST FOR HELP OR ADVICE

A big thank you goes to @ntamas for helping me with most of this project and its issues.

This method works, although it still has problems and limitations. The most important one is that it can only upload to one drone at a time, which completely defeats the final purpose of my project.

I am therefore asking for advice on how to build a system, ideally command-line based, that can handle UDP traffic with several concurrent routines and works like today's uploader.py script, perhaps as a Python script using trio (an asynchronous solution with multiple coroutines). I also ask the real experts, such as @tridge, for their opinion and advice on this project. Thanks for reading! Davide


I’ve been planning to write an alternative version of px_uploader.py using Python's newer async/await syntax, with an abstraction layer for the I/O, so that most of the uploader module would not need to know whether it is talking to the bootloader over a serial line or over the network with UDP packets. I probably won’t have time for this in the next few days, but if nothing else comes up I’ll return to it sooner or later.
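A rough sketch of that idea using the standard library's asyncio: the protocol code talks to a `Link` with `send()`/`recv()` and never knows what is underneath. `Link`, `LoopbackBootloader` and `get_sync` are hypothetical names; a real implementation would provide serial and UDP classes with the same two methods.

```python
import asyncio
from typing import Protocol

GET_SYNC, EOC = b"\x21", b"\x20"
INSYNC, OK = b"\x12", b"\x10"


class Link(Protocol):
    """Hypothetical I/O interface shared by a serial-port and a UDP implementation."""

    async def send(self, data: bytes) -> None: ...

    async def recv(self, n: int) -> bytes: ...


class LoopbackBootloader:
    """In-memory fake link that answers GET_SYNC+EOC with INSYNC+OK (for testing)."""

    def __init__(self) -> None:
        self._rx: asyncio.Queue = asyncio.Queue()  # create inside a running event loop

    async def send(self, data: bytes) -> None:
        if data == GET_SYNC + EOC:
            for b in INSYNC + OK:
                self._rx.put_nowait(bytes([b]))

    async def recv(self, n: int) -> bytes:
        out = b""
        while len(out) < n:
            out += await self._rx.get()
        return out


async def get_sync(link: Link, timeout: float = 1.0) -> None:
    """Protocol logic only: identical whether `link` is a serial line or UDP."""
    await link.send(GET_SYNC + EOC)
    reply = await asyncio.wait_for(link.recv(2), timeout)
    if reply != INSYNC + OK:
        raise RuntimeError(f"unexpected reply {reply!r}")


async def main() -> None:
    await get_sync(LoopbackBootloader())
    print("in sync")


if __name__ == "__main__":
    asyncio.run(main())
```

Swapping `LoopbackBootloader` for a UDP-backed class would leave `get_sync` untouched, which is the point of the abstraction.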


Analysing the possibility of using “trio” for this project:

Today I studied the ‘trio’ Python library a bit and followed its tutorial:

https://trio.readthedocs.io/en/latest/tutorial.html#networking-with-trio

I believe this library is an excellent option for managing several uploads to multiple drones simultaneously.

From what I have read so far (and will keep investigating over the next few days), this library lets several functions run concurrently. It does not create real threads (trio runs its tasks concurrently in a single thread), but it does provide independent blocks that can work without depending on each other.

The main idea would be to create a template block that performs all the operations needed for an upload (erasing, programming, verifying), including everything that happens beforehand, such as synchronising with the bootloader.

Using this block as a template, the system could be extended to multiple drones, each represented as a separate object or entity.

CREATE A BIDIRECTIONAL CONNECTION WITH MULTIPLE DRONES:

An interesting question is how to connect to many drones at the same time. Initially the plan was to create multiple virtual serial ports, but this is probably not the standard solution and would lead to problems later on.

I believe keeping the ESP in raw mode is mandatory, because without it we could not communicate at a low level with the bootloader. We could then handle the incoming UDP packets differently, with a socket between the drone's address (for example, for drone 20: 192.168.1.20:14555) and its template block in the Python script. I think the reading could be handled by the functions already present in px_uploader.py.
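A minimal sketch of the fan-out, assuming one independent upload task per drone and the address pattern 192.168.1.<id>:14555 from the example above. It uses the standard library's `asyncio.gather` instead of trio; with trio the equivalent is starting one task per drone with `nursery.start_soon()` inside `async with trio.open_nursery()`, catching errors inside each task so one failing drone does not cancel the others. `drone_address` and `upload_one` are hypothetical placeholders for the real sync/erase/program/verify block.

```python
import asyncio


def drone_address(drone_id: int) -> tuple:
    """Assumed addressing scheme: drone N is reachable at 192.168.1.N:14555."""
    return (f"192.168.1.{drone_id}", 14555)


async def upload_one(drone_id: int, image: bytes) -> str:
    """Placeholder for the per-drone block (sync, erase, program, verify)."""
    host, port = drone_address(drone_id)
    await asyncio.sleep(0.01)  # stands in for the real bootloader conversation
    return f"{host}:{port} <- {len(image)} bytes"


async def upload_all(drone_ids, image: bytes) -> dict:
    """Run one task per drone concurrently; a failure is returned, not raised."""
    drone_ids = list(drone_ids)
    results = await asyncio.gather(
        *(upload_one(d, image) for d in drone_ids), return_exceptions=True
    )
    return dict(zip(drone_ids, results))


if __name__ == "__main__":
    print(asyncio.run(upload_all([20, 21, 22], b"\xff" * 1024)))
```

The firmware image is decoded once and the same bytes object is shared by every task.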

SD CARD PROJECT IN DEVELOPMENT? @tridge:

I have also learned that uploading APJ firmware directly from the SD card on the drone may be added during 2022. I think this is a great possibility and, compared to the project we are working on, it is more standardised and better integrated with Mission Planner and ArduPilot. I'd like to know from @tridge how this idea would be developed, and whether it would be possible to do a mass upload through MAVFTP to the SD cards of a group of drones.

Thanks, Davide


Analysing px_uploader.py script

Today I analysed the Python script ‘px_uploader.py’, thinking about which parts to integrate into an OTA system and which not. Starting with the CRC table, I believe it can be shared by all the entities representing the drones (from what I understand, crctab is used in the verification process).

crctab = array.array(
    "I",
    [
        0x00000000,
        0x77073096,
        0xEE0E612C,
        0x990951BA,
        0x076DC419,
        0x706AF48F,
        0xE963A535,
        0x9E6495A3,
        0x0EDB8832,
        0x79DCB8A4,
        0xE0D5E91E,
        0x97D2D988,
        # ... (remaining entries omitted)
    ],
)
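The entries of crctab are the standard table for the reflected CRC-32 polynomial 0xEDB88320, so they could also be generated at startup instead of being hard-coded. A sketch (the names `make_crctab` and `standard_crc32` are illustrative): the update loop follows the style of the script's `crc32()` as called in the `crc()` method below (start state 0, no final inversion), which differs from zlib's CRC-32 unless you start from 0xFFFFFFFF and invert the result.

```python
def make_crctab():
    """Build the 256-entry table for the reflected CRC-32 polynomial 0xEDB88320."""
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
        table.append(c)
    return table


CRCTAB = make_crctab()


def crc32(data: bytes, state: int = 0) -> int:
    """Table-driven CRC update without pre- or post-inversion."""
    for byte in data:
        state = CRCTAB[(state ^ byte) & 0xFF] ^ (state >> 8)
    return state


def standard_crc32(data: bytes) -> int:
    """Conventional CRC-32 (as in zlib): start from all ones, invert at the end."""
    return crc32(data, 0xFFFFFFFF) ^ 0xFFFFFFFF
```

Since the table never changes, one copy can serve every drone entity.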

Next, the “firmware” class loads the file: an APJ file is JSON, and its "image" field holds the base64-encoded, zlib-compressed firmware, which the class decodes so the script can use it. Here too there is no need to open the file once per entity (drone): decoding it once is enough.

class firmware(object):
    """Loads a firmware file"""

    desc = {}
    image = bytes()
    crcpad = bytearray(b"\xff\xff\xff\xff")

    def __init__(self, path):

        # read the file
        f = open(path, "r")
        self.desc = json.load(f)
        f.close()

        self.image = bytearray(zlib.decompress(base64.b64decode(self.desc["image"])))

        # pad image to 4-byte length
        while (len(self.image) % 4) != 0:
            self.image.append(255)  # ntamas: fix for Python 3.x

    def property(self, propname):
        return self.desc[propname]

    def crc(self, padlen):
        state = crc32(self.image, int(0))
        for i in range(len(self.image), (padlen - 1), 4):
            state = crc32(self.crcpad, state)
        return state
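A sketch of decoding the APJ only once and sharing the result, assuming (as the class above does) that the file is JSON whose "image" field holds base64-encoded, zlib-compressed firmware. `load_apj` is a hypothetical helper, not part of px_uploader.py.

```python
import base64
import json
import zlib


def load_apj(text: str):
    """Decode an .apj document once; returns (metadata dict, image padded to 4 bytes)."""
    desc = json.loads(text)
    image = bytearray(zlib.decompress(base64.b64decode(desc["image"])))
    image.extend(b"\xff" * (-len(image) % 4))  # same 0xFF padding as firmware.__init__
    return desc, bytes(image)
```

For example, read the file with `open("firmware.apj")`, call `load_apj()` on its contents once, and hand the same `image` to every drone's upload task.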

The uploader class, on the other hand, does the actual upload and contains most of the functions, so I think it needs to be analysed in detail to see whether its functions can be split across several entities (drones).

I also want to share the modified parts of px_uploader.py that implement a first form of retry, which does not exist in “uploader.py”. Again thanks to ntamas's help, I was able to modify “px_uploader.py” to allow a re-sync and a check during the programming phase.

This is the sync function, modified to make 5 attempts before giving up:

    def __sync(self):
        # send a stream of ignored bytes longer than the longest possible conversation
        # that we might still have in progress
        # self.__send(uploader.NOP * (uploader.PROG_MULTI_MAX + 2))

        self.port.flushInput()

        tries = 5
        while True:
            self.__send(uploader.GET_SYNC + uploader.EOC)
            try:
                self.__getSync()
            except RuntimeError:
                tries -= 1
                if not tries:
                    raise
            else:
                return

This other function does something similar, again with 5 attempts; it is used during the programming phase:

    # send a PROG_MULTI command to write a collection of bytes
    def __program_multi(self, data):

        tries = 5

        while True:
            length = len(data).to_bytes(1, byteorder="big")

            try:
                self.__send(uploader.PROG_MULTI)
                self.__send(length)
                self.__send(data)
                self.__send(uploader.EOC)
                self.__getSync()
            except RuntimeError:
                # probably a timeout
                tries -= 1
                if not tries:
                    # re-raise exception
                    raise
                else:
                    # try to re-sync
                    self.__sync()
            else:
                return

And also this modification to the message shown when the firmware does not match the board:

if incomp:
    msg = (
        "Firmware not suitable for this board (board_type=%u (%s) board_id=%u (%s))"
        % (
            self.board_type,
            self.board_name_for_board_id(self.board_type),
            fw.property("board_id"),
            self.board_name_for_board_id(fw.property("board_id")),
        )
    )

I will try to analyse it further and post my thoughts here in the discussion :slight_smile:

Thanks

UPDATE ON CRC ERROR:

Thanks to the help of @ntamas, we managed to figure out what could be causing the “CRC Failed” error returned by the modified Python script. The error could be caused by incorrect writes during the second step performed by px_uploader.py, programming. The write/programming phase works like this:

The bootloader firmware keeps a “write pointer” that it uses to write to the flash memory of the flight controller.


When the command 0x23 0x20 (CHIP_ERASE + EOC) is sent, the bootloader erases the flash and sets the write pointer to 0.

Afterwards the uploader (script) sends a stream of packets containing firmware fragments, like this:

0x27 0xFC -> [252 bytes of data] -> 0x20

0x27 is the PROG_MULTI command: "write multiple bytes".

0xFC is the length byte: 252 bytes of data follow.

0x20 is the usual EOC (end of command) byte.

When these 252 bytes have been written correctly, the bootloader sends back:

0x12 0x10 (INSYNC, OK) as the reply.
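The framing just described can be expressed as a small generator. A sketch, with `prog_multi_frames` as an illustrative name; 252 bytes per chunk matches the example above, and the last chunk is simply shorter.

```python
PROG_MULTI, EOC = b"\x27", b"\x20"
PROG_MULTI_MAX = 252  # data bytes per frame, as in the example above


def prog_multi_frames(image: bytes, chunk: int = PROG_MULTI_MAX):
    """Yield PROG_MULTI frames: 0x27, one length byte, the data, then 0x20 (EOC)."""
    for off in range(0, len(image), chunk):
        data = image[off:off + chunk]
        yield PROG_MULTI + bytes([len(data)]) + data + EOC
```

A 300-byte image, for instance, becomes one frame starting 0x27 0xFC (252 bytes) and one starting 0x27 0x30 (48 bytes).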

What leads to the CRC error is the loss of this reply, which causes a misunderstanding between the script and the bootloader. The bootloader wrote the 252 bytes correctly, but since its positive reply was lost, the script cannot know that the write succeeded.

The script therefore retries and sends the same 252 bytes again. The bootloader, whose write pointer has already moved forward, writes them a second time at the next position, so every chunk after that point ends up shifted in flash. During the verification phase the CRC no longer matches, because there are bytes where they should not be.
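The failure mode can be reproduced with a toy model (all names hypothetical, not the real bootloader or uploader code): each write lands at the write pointer, which then advances; when the reply to one chunk is lost, the uploader resends the same chunk, as the retry loop in `__program_multi` does.

```python
class FakeBootloader:
    """Toy model: each PROG_MULTI write lands at the write pointer, which then advances."""

    def __init__(self, size: int) -> None:
        self.flash = bytearray(b"\xff" * size)
        self.ptr = 0  # set to 0 by CHIP_ERASE on the real bootloader

    def prog_multi(self, data: bytes) -> None:
        self.flash[self.ptr:self.ptr + len(data)] = data
        self.ptr += len(data)


def naive_upload(bl: FakeBootloader, image: bytes, chunk: int, lose_ack_at: int) -> None:
    """Blindly resend a chunk whose INSYNC/OK reply was lost."""
    for i, off in enumerate(range(0, len(image), chunk)):
        data = image[off:off + chunk]
        bl.prog_multi(data)        # the write itself succeeds...
        if i == lose_ack_at:
            bl.prog_multi(data)    # ...but the reply is lost, so the chunk is sent again


bl = FakeBootloader(24)
naive_upload(bl, bytes(range(16)), chunk=4, lose_ack_at=1)
print(list(bl.flash[:20]))  # chunk 1 appears twice; everything after it is shifted
```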

Possible solutions to this problem:

  1. The bootloader sends back the current position of the write pointer after a write operation (so the uploader script can detect when it differs from what it expects)

  2. The uploader script sends, with every chunk of data, the address at which to write it, eliminating the need for an internal write pointer in the bootloader
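Option 1 can be sketched with the same kind of toy model. Here `get_write_ptr()` is a hypothetical query standing in for whatever the bootloader would report, and `send_chunk` is an illustrative name: when a reply is missing, the uploader compares the pointer with the chunk's offset instead of resending blindly.

```python
class FakeBootloader:
    """Toy bootloader: writes land at the write pointer, which then advances."""

    def __init__(self, size: int) -> None:
        self.flash = bytearray(b"\xff" * size)
        self.ptr = 0

    def prog_multi(self, data: bytes) -> None:
        self.flash[self.ptr:self.ptr + len(data)] = data
        self.ptr += len(data)

    def get_write_ptr(self) -> int:
        """Hypothetical query returning the current write pointer."""
        return self.ptr


def send_chunk(bl, off: int, data: bytes, write_arrives=True, ack_arrives=True) -> None:
    """Send one chunk; if the INSYNC/OK reply is missing, decide from the write pointer."""
    if write_arrives:
        bl.prog_multi(data)
    if ack_arrives:
        return
    ptr = bl.get_write_ptr()
    if ptr == off + len(data):
        return                 # the chunk was written; only the reply was lost
    if ptr == off:
        bl.prog_multi(data)    # the chunk never arrived: resending is safe
        return
    raise RuntimeError(f"write pointer at {ptr}, expected {off}")
```

Either lost packet (the data or the reply) is now handled without writing any chunk twice.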

Over the next two days I will look into where, in the bootloader source files, the code relevant to this modification lives.

For any advice or opinions, please post in this thread. Thanks! :slight_smile:

@tridge @rmackay9

A thread on OTA firmware has been created in the General channel of the ArduPilot Discord.

If you have something to say, feel free to write there. Thanks!

Possible addition to the bootloader source files:

To improve handling of the bootloader write pointer, a GET_WRITE_OFFSET command has been implemented. It lets the uploader script know where the write pointer is and, through an ACK, lets the client know whether it has to resend the last 252-byte firmware chunk or can continue with the next one.

Another idea:

Another idea, proposed by Buzz and previously considered for this project, is to transfer the firmware over TCP into the SPIFFS of the ESP8266 and then, once the FC has switched to the bootloader, send it from SPIFFS to the FC.

The ESP8266 has 4MB of flash memory; with 2MB of SPIFFS there is more than enough room for a standard firmware, which weighs about 1.4MB.

Thanks ! :slight_smile:

Updates on the resolution of the CRC problem:

Thanks to the help of ntamas, the CRC problem has probably been solved. It was caused by incorrect writes during the programming phase, very likely due to the loss of the 0x12 0x10 reply packet.

Changes have been made both to the bootloader source files and to the uploader script, which has been COMPLETELY rewritten for our needs.

The bootloader revision will therefore have to be updated to introduce this new feature.

In particular, a new opcode has been added, GET_WRITE_PTR, which returns the position of the write pointer so the Python script has tighter control.

With this modification, it was possible to do 10 OTA uploads without CRC errors!

All the documentation will be published in a GitHub repository very soon, so stay tuned!
