Altitude estimation error - real altitude far higher than EKF altitude

This week our quadcopter climbed to a very high altitude while the reported altitude stayed at the desired value. The excessive climb caused a temporary loss of communication. After communication was restored, the EKF altitude had been updated and was indicating a more or less correct altitude.
Unfortunately I don’t have an ArduCopter log file, as we have not (yet?) been able to retrieve the drone. Below is a graph of the altitude data that I do have, which comes from our own telemetry link. The three altitudes shown are the EKF altitude (the relative altitude in the GLOBAL_POSITION_INT MAVLink message), the lidar altitude from the rangefinder MAVLink message, and the pressure altitude (1013 hPa ref), which is calculated from the raw pressure in the SCALED_PRESSURE MAVLink message. The vertical jump in the graph marks the part where communication was lost for about 45s.
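For anyone who wants to reproduce the pressure altitude curve, here is a minimal sketch of the conversion, assuming the standard ISA troposphere model and an absolute pressure in hPa (as in the `press_abs` field of SCALED_PRESSURE). The constant 1013.25 hPa and the helper names are my own choices for illustration, not taken from the firmware.

```python
# ISA troposphere constants (assumption: standard atmosphere, valid below ~11 km)
P0_HPA = 1013.25   # reference sea-level pressure, hPa
T0_K = 288.15      # reference sea-level temperature, K
LAPSE = 0.0065     # temperature lapse rate, K/m
G = 9.80665        # gravitational acceleration, m/s^2
R_AIR = 287.053    # specific gas constant of dry air, J/(kg K)


def pressure_altitude_m(press_abs_hpa: float) -> float:
    """Convert absolute pressure (hPa) to ISA pressure altitude (m)."""
    exponent = R_AIR * LAPSE / G  # roughly 0.19
    return (T0_K / LAPSE) * (1.0 - (press_abs_hpa / P0_HPA) ** exponent)


def height_above_takeoff_m(press_abs_hpa: float, takeoff_press_hpa: float) -> float:
    """Pressure altitude relative to the pressure recorded at takeoff."""
    return pressure_altitude_m(press_abs_hpa) - pressure_altitude_m(takeoff_press_hpa)
```

Subtracting the takeoff value makes the curve directly comparable to the relative altitude from GLOBAL_POSITION_INT, since both then start at zero.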

In the graph we see the pressure altitude and lidar altitude agree well for the part where the lidar is working, indicating that the pressure measurement is reliable.

We fly a custom version of Copter 4.4.4 and are using EKF2 with the barometer as primary source. For position we have a Here 4 and a Here 3 GPS unit, which are configured to do 3D velocity and 2D positioning in the EKF. We are flying a fleet of these multicopters and never had significant issues with altitude estimation. This airframe already had quite a few flight hours without any issues.

I am wondering what can cause the EKF altitude estimate to diverge so much from the barometric altitude, even though the barometer is configured as the primary altitude source. Was it a faulty GPS measurement that indicated a large downward velocity? Was it the accelerometers that did not work well?
I’m also wondering how the EKF can diverge so much without triggering a warning.

Can someone shed some light on this?

Without a log, it’s really hard to diagnose.

Additionally:

Recommend you rebase your custom firmware on 4.5.4.

Recommend you use EKF3 in the future. There is no good reason that I’m aware of to continue using EKF2.

Unfortunately, we were not able to retrieve the airframe, so I do not have a log. I connected Mission Planner after we found the altitude to be off, so I have a TLog of the latter part of the flight, but not of the part where the altitude was really wrong. I was hoping that someone could suggest some possible reasons why the EKF would fail.

As far as I know, the EKF uses the accelerometers and GPS velocity for the prediction step and uses the baro altitude to correct this estimate. So either the covariance of the barometer got so big that its correction was rendered negligible, or the prediction was so wrong that even a huge altitude difference (600m) with the barometer could not compensate for it. Both seem unlikely to me.
I know that without the logs, we will probably never know what happened, but I would like to find a plausible explanation of what could have caused this.
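To make the two scenarios above concrete, here is a toy one-state Kalman filter. It is only a sketch under my own assumptions (scalar altitude state, noise-free baro, made-up noise values) and is not how EKF2 is actually implemented. The prediction integrates a vertical velocity that may be biased, and the baro altitude corrects it.

```python
def altitude_error_after(true_climb_mps, vel_bias_mps, baro_var,
                         steps=600, dt=0.1, process_var=0.01):
    """Return (true altitude - estimated altitude) after `steps` iterations.

    All numeric defaults are illustrative assumptions, not ArduPilot values.
    """
    h_true = 0.0   # real altitude, m
    h_est = 0.0    # filter estimate, m
    p = 1.0        # estimate variance, m^2
    for _ in range(steps):
        h_true += true_climb_mps * dt
        # Prediction: integrate the (possibly biased) measured velocity
        h_est += (true_climb_mps + vel_bias_mps) * dt
        p += process_var
        # Correction: the gain shrinks as the baro variance grows
        k = p / (p + baro_var)
        h_est += k * (h_true - h_est)
        p *= (1.0 - k)
    return h_true - h_est


# Real climb of 5 m/s for 60 s while the measured velocity reads zero
print(altitude_error_after(5.0, -5.0, baro_var=1.0))   # trusted baro: error stays small
print(altitude_error_after(5.0, -5.0, baro_var=1e6))   # distrusted baro: error near 300 m
```

In this toy model a fully wrong velocity alone only produces an offset of a few metres as long as the baro is trusted; the estimate only drifts by hundreds of metres when the gain is close to zero. That is why both explanations seem unlikely on their own.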

Regarding the other comments: I was planning to rebase to 4.5 soon, but haven’t had the time for it yet. We are still using EKF2 because it has worked well so far, but I might need to expedite changing to EKF3 after this incident.

Here is another interesting graph from the data that we have. It shows the values from the EKF status message. The vertical position value is clearly elevated because of the large altitude offset, although I would have expected it to be even higher, considering that the difference between the EKF estimate and the barometer altitude was more than 500m. The interesting line is the velocity value: it shows a periodic spike that could provide a clue to what was causing the erroneous behavior. The EKF failsafe threshold is set to 0.8, so this was not reached before we lost communication. When we regained communication, the drone was in LAND mode, indicating that it had triggered the EKF failsafe.
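To check whether the velocity spikes are really periodic, something like the following could be run over the telemetry. It is a rough sketch: it assumes the samples are `(time_s, value)` pairs taken from the velocity field of the EKF status message, and the function names and the spike level are hypothetical choices of mine.

```python
FS_THRESHOLD = 0.8  # EKF failsafe threshold as configured on this drone


def spike_times(samples, level):
    """Times of local maxima above `level`; samples are (time_s, value) in time order."""
    peaks = []
    for i in range(1, len(samples) - 1):
        t, v = samples[i]
        if v > level and v > samples[i - 1][1] and v >= samples[i + 1][1]:
            peaks.append(t)
    return peaks


def spike_intervals(peaks):
    """Time between consecutive spikes; a near-constant value suggests a periodic cause."""
    return [b - a for a, b in zip(peaks, peaks[1:])]


def times_over_threshold(samples, threshold=FS_THRESHOLD):
    """Times at which the value reached the failsafe threshold."""
    return [t for t, v in samples if v >= threshold]
```

If the intervals come out nearly constant, the period could be compared against anything on the vehicle that runs at a fixed rate.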

@Yuri_Rage If it helps, I can provide the data that I have, which is based on the MAVLink telemetry from the flight controller. I can make a .csv file with the relevant data. I also have the parameter file of the drone if that is of any help.