Pixhawk persistent Bad AHRS

Hi guys,

hopefully someone can help with my issue.
I repeatedly get a “Bad AHRS” error in Mission Planner. I’m unsure of the cause and I can’t find any information about this error.
I haven’t flown the quadcopter; I am receiving this error while it’s on a table.

Firmware 3.3.1
PX4 v2.4.6

It would be great if someone could point me towards the cause of this error.
Thanks in advance and have a nice day.


Anyone know what sensors are used for AHRS (attitude and heading reference system)?

I spent all night looking online for any help with this error and with no success, and I don’t know where to turn.

Guess I’m on my own :cry:

Tonight I’ll look through the ArduPilot source code to find out which sensors AHRS uses and how it determines “Bad AHRS”,

so I can try to resolve this issue on my own.

After a quick look at the source code, I have found a virtual healthy method on the AHRS class.

AP_AHRS_NavEKF.cpp extends AP_AHRS_DCM.cpp, which extends AP_AHRS.cpp, all within the ArduPilot libraries folder.
From what I understand, 3.3.1 uses the EKF, so I guess AP_AHRS_NavEKF.cpp is reporting as unhealthy for some reason, which then changes a bit flag in the “onboard_control_sensors_health” property of the MAVLink message “mavlink_sys_status_t”.

Looking at my tlog, “onboard_control_sensors_health” is 6355979 (11000001111110000001011) but changes to 4258827 (10000001111110000001011) a few times. The only difference between the two is the AHRS bit (mask 01000000000000000000000), which tells me that AHRS is reporting as unhealthy. I guess Mission Planner tests onboard_control_sensors_health and then displays “Bad AHRS” to me.
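For anyone who wants to check their own tlog values, here is a minimal sketch of the bit test. It assumes the AHRS flag is MAV_SYS_STATUS_AHRS (0x200000, i.e. 2097152) from the MAVLink MAV_SYS_STATUS_SENSOR enum; the helper names are made up for illustration and are not part of Mission Planner or ArduPilot.

```cpp
#include <cstdint>

// Assumed value of MAV_SYS_STATUS_AHRS in the MAVLink
// MAV_SYS_STATUS_SENSOR enum (bit 21 = 2097152).
constexpr std::uint32_t kAhrsBit = 0x200000u;

// Hypothetical helper: true when the AHRS bit is set in
// onboard_control_sensors_health, i.e. AHRS is reported healthy.
bool ahrs_healthy(std::uint32_t sensors_health) {
    return (sensors_health & kAhrsBit) != 0;
}

// Hypothetical helper: which health bits differ between two samples.
std::uint32_t changed_bits(std::uint32_t a, std::uint32_t b) {
    return a ^ b;
}
```

With the two values from my tlog, 6355979 has the bit set and 4258827 does not, and the XOR of the two is exactly the AHRS mask.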

So it is time to dig into the AP_AHRS_NavEKF.cpp, AP_AHRS_DCM.cpp and AP_AHRS.cpp classes to see what’s going on.


ArduCopter uses AP_AHRS_NavEKF.cpp.
I found the line where it is instantiated (copter.h:191).

It seems a bit strange, though, that NavEKF and NavEKF2 use AP_AHRS_NavEKF,
and AP_AHRS_NavEKF uses NavEKF and NavEKF2:

NavEKF EKF{&ahrs, barometer, sonar};
NavEKF2 EKF2{&ahrs, barometer, sonar};
AP_AHRS_NavEKF ahrs{ins, barometer, gps, sonar, EKF, EKF2, AP_AHRS_NavEKF::FLAG_ALWAYS_USE_EKF};

The AP_AHRS_NavEKF::healthy method:

There is a switch statement on ekf_type(), which I believe would return 1.
So it appears that NavEKF.healthy is returning false, which leads to AP_AHRS_NavEKF::healthy returning false.

Time to look through the NavEKF class to see what its healthy method does.

The NavEKF.healthy method simply calls AP_NavEKF_core->healthy,

so either one or both of these code extracts is returning false, leading to my “Bad AHRS”.

if (frontend._fallback && velTestRatio > 1 && posTestRatio > 1 && hgtTestRatio > 1) {
    // all three metrics being above 1 means the filter is
    // extremely unhealthy.
    return false;
}

// barometer and position innovations must be within limits when on-ground
float horizErrSq = sq(innovVelPos[3]) + sq(innovVelPos[4]);
if (!vehicleArmed && (!hgtHealth || fabsf(hgtInnovFiltState) > 1.0f || horizErrSq > 2.0f)) {
    return false;
}

As no one else could answer my question, here is what I have found so far.
Going by the first extract, “Bad AHRS” can occur if “velocity_variance”, “pos_horiz_variance” and “pos_vert_variance” are all greater than 1 (and only when _fallback is set).

Looking at my log, this is not the case (if I’m reading the log correctly, my highest variance is 0.11).
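To make the logic of that first condition easier to see, here is a small standalone sketch of it. The function name is made up, and I am assuming the logged variances can be compared against 1 in the same way as the test ratios in the quoted code; treat it as an illustration rather than the real ArduPilot routine.

```cpp
// Hypothetical standalone version of the first quoted condition.
// It only reports "extremely unhealthy" when the fallback flag is set
// AND all three ratios are above 1.
bool fallback_check_fails(bool fallback,
                          float velTestRatio,
                          float posTestRatio,
                          float hgtTestRatio) {
    return fallback &&
           velTestRatio > 1.0f &&
           posTestRatio > 1.0f &&
           hgtTestRatio > 1.0f;
}
```

With a highest value of 0.11 this check cannot fail, and a single ratio above 1 is not enough on its own either.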

on to the other condition that may be returning false

// barometer and position innovations must be within limits when on-ground
float horizErrSq = sq(innovVelPos[3]) + sq(innovVelPos[4]);
if (!vehicleArmed && (!hgtHealth || fabsf(hgtInnovFiltState) > 1.0f || horizErrSq > 2.0f)) {
    return false;
}

This happens with us quite a bit too, we solve it by rebooting while keeping the Pixhawk absolutely still.

There’s another discussion here: diydrones.com/forum/topics/bad-ahrs but nothing definitive in solving the problem.

Great detective work though, I had no idea what in the code would trigger the error.


Thanks, Graham, for getting back to me. I’ll give your solution a go tonight and get back to you.
I’ve been working on this quadcopter for months with one issue after another, and have only flown twice and crashed twice :cry:

This is my third build attempt and my patience is running out.

As you say “this happens with us quite a bit too”, I think it will be worthwhile continuing my debugging so I can definitively answer why this is happening (or at least in my situation), so the results could lead to a fix.

// barometer and position innovations must be within limits when on-ground
float horizErrSq = sq(innovVelPos[3]) + sq(innovVelPos[4]);
if (!vehicleArmed && (!hgtHealth || fabsf(hgtInnovFiltState) > 1.0f || horizErrSq > 2.0f)) {
    return false;
}

vehicleArmed = false
badIMUdata = false (it appears this is detected with the help of GPS as well, and I’m not using GPS)
hgtHealth = ((hgtTestRatio < 1.0f) || badIMUdata) || hgtTimeout || (constPosMode && !vehicleArmed);
hgtInnovFiltState = ??? (hgtInnovFiltState += (innovVelPos[5]-hgtInnovFiltState)*alpha;)

innovVelPos[3] = posInnov.x
innovVelPos[4] = posInnov.y
horizErrSq = sq(innovVelPos[3]) + sq(innovVelPos[4]) // I believe this is GPS data; I have no GPS, so I would expect this value to be 0

So I’m guessing the cause of my issue could be hgtInnovFiltState.
To be honest I’m guessing a bit now, so I’ll make a custom build tonight and remove some of these conditions until my error goes away, to isolate my issue.

hgtInnovFiltState = ??? (hgtInnovFiltState += (innovVelPos[5]-hgtInnovFiltState)*alpha;)

this seems to be a calculation done on the barometer.

innovVelPos[5] += constrain_float(-innovVelPos[5]+gndBaroInnovFloor, 0.0f, gndBaroInnovFloor+gndMaxBaroErr);
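The hgtInnovFiltState update quoted above has the shape of a simple first-order low-pass filter on the height innovation. Here is a minimal sketch of how that kind of filter behaves. The alpha value used in the usage note is an assumption picked for illustration (I have not worked out what ArduPilot computes for it), and the function names are made up.

```cpp
#include <vector>

// One step of the first-order low-pass update:
//   state += (sample - state) * alpha
float lowpass_step(float state, float sample, float alpha) {
    return state + (sample - state) * alpha;
}

// Run the filter over a series of samples, starting from a state of zero,
// and return the final filtered value.
float lowpass_run(const std::vector<float>& samples, float alpha) {
    float state = 0.0f;
    for (float s : samples) {
        state = lowpass_step(state, s, alpha);
    }
    return state;
}
```

With an assumed alpha of 0.1, a single 1.4 sample only moves a zero state to about 0.14, so a filtered value is much less sensitive to one-off spikes than the raw innovation.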

I have found a very useful bit of data being logged.
EKF4 has a fault status field (FS) in the Mission Planner log review.

In the image you can see my FS is 128 (10000000), but then it changes to 0,
so the EKF is happy about its filters. So this doesn’t look like the answer I need, but it might help someone out :slight_smile:

Fault status bitmask, copied as-is from the ArduPilot source. There is a mistake in the numbering below: it skips 4 and lists 7 twice.

return the filter fault status as a bitmasked integer
0 = quaternions are NaN
1 = velocities are NaN
2 = badly conditioned X magnetometer fusion
3 = badly conditioned Y magnetometer fusion
5 = badly conditioned Z magnetometer fusion
6 = badly conditioned airspeed fusion
7 = badly conditioned synthetic sideslip fusion
7 = filter is not initialised
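Here is a minimal sketch for decoding the FS value from a log. It assumes the eight entries in the list map to bits 0 to 7 in the order listed (which is the only way an FS of 128 makes sense as a single flag); the function name is made up for illustration.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical decoder for the EKF4 FS field.
// Assumption: the eight listed faults occupy bits 0..7 in list order.
std::vector<std::string> decode_faults(std::uint8_t fs) {
    static const char* const kNames[8] = {
        "quaternions are NaN",
        "velocities are NaN",
        "badly conditioned X magnetometer fusion",
        "badly conditioned Y magnetometer fusion",
        "badly conditioned Z magnetometer fusion",
        "badly conditioned airspeed fusion",
        "badly conditioned synthetic sideslip fusion",
        "filter is not initialised",
    };
    std::vector<std::string> active;
    for (int bit = 0; bit < 8; ++bit) {
        if (fs & (1u << bit)) {
            active.push_back(kNames[bit]);
        }
    }
    return active;
}
```

Under that assumption my FS of 128 decodes to just “filter is not initialised”, which fits with it dropping to 0 shortly afterwards.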

struct log_EKF4 pkt4 = {
    LOG_PACKET_HEADER_INIT(LOG_EKF4_MSG),
    time_us : hal.scheduler->micros64(),
    sqrtvarV : (int16_t)(100*velVar),
    sqrtvarP : (int16_t)(100*posVar),
    sqrtvarH : (int16_t)(100*hgtVar),
    sqrtvarMX : (int16_t)(100*magVar.x),
    sqrtvarMY : (int16_t)(100*magVar.y),
    sqrtvarMZ : (int16_t)(100*magVar.z),
    sqrtvarVT : (int16_t)(100*tasVar),
    offsetNorth : (int8_t)(offset.x),
    offsetEast : (int8_t)(offset.y),
    faults : (uint8_t)(faultStatus),
    timeouts : (uint8_t)(timeoutStatus),
    solution : (uint16_t)(solutionStatus.value)
};

When I say quite a bit: we’ve logged about 800 hours in 18 months and we get that warning on about one in three boots. As I haven’t found a solution other than keeping it still when booting, we kinda live with it and attribute it to the nature of open source software.

Also there is a FilterStatus (SS) field in the EKF4 data. But the fault status in my previous post is used in the AHRS health check, whereas this one isn’t.

As you can see from my image, I have 165 (10100101),
and I believe it means: attitude estimate valid, vertical velocity estimate valid, vertical position estimate valid and constant position mode.

return filter function status as a bitmasked integer
0 = attitude estimate valid
1 = horizontal velocity estimate valid
2 = vertical velocity estimate valid
3 = relative horizontal position estimate valid
4 = absolute horizontal position estimate valid
5 = vertical position estimate valid
6 = terrain height estimate valid
7 = constant position mode
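To double-check a reading like 165 against that list, a bit test along these lines can help. This is only a sketch: the enum and function names are made up here and simply follow the bit numbers in the list.

```cpp
#include <cstdint>

// Hypothetical names for the bit positions listed above.
enum FilterStatusBit : std::uint8_t {
    ATTITUDE = 0,
    HORIZ_VEL = 1,
    VERT_VEL = 2,
    HORIZ_POS_REL = 3,
    HORIZ_POS_ABS = 4,
    VERT_POS = 5,
    TERRAIN_HEIGHT = 6,
    CONST_POS_MODE = 7
};

// True when the given status bit is set in the logged SS value.
bool has_status(std::uint16_t ss, FilterStatusBit bit) {
    return ((ss >> bit) & 1u) != 0;
}
```

For 165 (128 + 32 + 4 + 1) this gives bits 0, 2, 5 and 7, which matches my reading: attitude, vertical velocity, vertical position and constant position mode.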

Another useful bit of data is EKF4 SV = velocity variance, SP = position variance, SH = height variance.

Going by the fallback check quoted earlier, if all three of these are greater than 1 then AHRS is unhealthy.

But mine are all good and below 1 (you can see this on the graph),
though height variance (SH) is a bit erratic compared to the other values for me.

ArduPilot’s EKF is failing a health check.
The value reported as EKF3 - IPD is the cause: if this value goes above 1 while ArduPilot is disarmed, a “Bad AHRS” error will be given.
This issue will not occur if ArduPilot is armed.

How I got to the answer
I’m going to walk through the EKF health check to find the cause of “Bad AHRS”.

Test 1: Passed
In my previous post I checked the fault status code, which was 0, so test 1 passed.

Test 2: Passed
Again, in a previous post I checked my variances. The height variance was a bit erratic compared to the other values, but they were all below 1, so test 2 passed.
I’m not sure what the value of _fallback is, but either way it doesn’t matter: if it’s false the test won’t run, and if it’s true the test would pass because I’ve checked the other values.

Test 3: Failed
Test 3.1: Not armed, so the next tests run (if armed, Test 3 as a whole passes)

!vehicleArmed

The quadcopter is not armed, so the next tests run.

Test 3.2: Failed

fabsf(innovVelPos[5]) > 1.0f 

innovVelPos[5] = Log Review EKF3 - IPD = 1.4 (highest value; this value is normally below 1 but spikes above it every now and then)
So this value has gone above 1 and this test failed, causing the routine to return false (unhealthy EKF, which is also unhealthy AHRS, i.e. “Bad AHRS”).

Test 3.3: Passed

float horizErrSq = sq(innovVelPos[3]) + sq(innovVelPos[4]);
horizErrSq > 1.0f

innovVelPos[3] = Log Review EKF3 - IPN = -0.08 (highest value)
innovVelPos[4] = Log Review EKF3 - IPE = -0.05 (matching row, but also the highest value)

horizErrSq = sq(-0.08) + sq(-0.05)
0.0089 = 0.0064 + 0.0025

So 0.0089 is less than 1, so this test passed.
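The on-ground part of the check can be reproduced with the logged values. This is a minimal sketch: the function names are made up, the thresholds are the 1.0f values from the version of healthy() I am looking at, and it assumes IPN, IPE and IPD in the log correspond to innovVelPos[3], [4] and [5].

```cpp
#include <cmath>

// Square of a value; squares are never negative.
float sq(float v) {
    return v * v;
}

// Hypothetical standalone version of the disarmed innovation check.
// Returns true when both the height innovation (IPD) and the horizontal
// position innovations (IPN, IPE) are within limits.
bool on_ground_innovations_ok(float ipn, float ipe, float ipd) {
    const float horizErrSq = sq(ipn) + sq(ipe);
    return !(std::fabs(ipd) > 1.0f || horizErrSq > 1.0f);
}
```

With IPN = -0.08 and IPE = -0.05 the horizontal term comes out at about 0.0089, so it is the IPD spike of 1.4 on its own that makes the check fail.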

I’ve annotated the source below; I have only added comments.

bool NavEKF::healthy(void) const
{

    // Test 1: get the filter fault status code; if it is non-zero then the EKF is unhealthy
    uint8_t faultInt;
    getFilterFaults(faultInt);
    if (faultInt > 0) {
        return false;
    }

    // Test 2: check fallback and variances
    if (_fallback && velTestRatio > 1 && posTestRatio > 1 && hgtTestRatio > 1) {
        // all three metrics being above 1 means the filter is
        // extremely unhealthy.
        return false;
    }
    // Give the filter a second to settle before use
    if ((imuSampleTime_ms - ekfStartTime_ms) < 1000 ) {
        return false;
    }

    // Test 3: when not armed (which is when everyone seems to have this issue??)
    // barometer and position innovations must be within limits when on-ground
    float horizErrSq = sq(innovVelPos[3]) + sq(innovVelPos[4]);
    if (!vehicleArmed && (fabsf(innovVelPos[5]) > 1.0f || horizErrSq > 1.0f)) {
        return false;
    }

    // all OK
    return true;
}

So I have found out in the code why AHRS is being reported as bad.
EKF3 - IPD is related to the barometer,

but I don’t currently know fully how IPD is calculated:

innovVelPos[5] = statesAtHgtTime.position.z - observation[5];

There are many ways to fix this, but I don’t fully understand what the value IPD is, so I’m unsure of the safest way to correct it.

You could simply change the test to “> 2”, but I don’t know what the consequences of doing this would be:

// barometer and position innovations must be within limits when on-ground
float horizErrSq = sq(innovVelPos[3]) + sq(innovVelPos[4]);
if (!vehicleArmed && (fabsf(innovVelPos[5]) > 2.0f || horizErrSq > 1.0f)) {
    return false;
}

I think I should open an issue on the ArduPilot GitHub so the experts can fix this.

This has been logged on the ArduPilot project on GitHub:

github.com/diydrones/ardupilot/issues/3171

Hopefully we will have a solution to this issue soon.
Also, I hope my posts have been helpful so far :slight_smile:

The bottom line is that this error appears to be misreported while not armed.
If you get this while armed, I think you are in trouble and something has really failed,
but my previous posts should help you diagnose what, if you have the dataflash log :slight_smile:


Sorry, but you will have to excuse the number of grammar errors; I’ve spent my evening on this and didn’t double-check my posts before submitting.