PixHawk crashed while ground testing

I was going to add this to the end of a previous post about a crash but felt it needed a new topic.

OK, this problem is getting deeper and more insidious the more testing I do.

Today I actually had the Pixhawk crash on me while ground testing. Not the copter, the Pixhawk flight controller.
This I have never encountered before.

First thing I needed to find out today was why Loiter was being refused.
Nothing in the logs that tells me, and nothing on the GCS (Mission Planner).

So I set the copter up minus props, first in the workshop, where I could get expected results, and then in 2 locations outside.
As expected, indoors it refused to go into Loiter, with no message on the GCS other than ‘Loiter requires 3D Fix’, which is expected inside. Outside, with 17 sats and an HDOP of 0.6, it refused Loiter with the 2 tones from the beeper but no message on the GCS.
By this time I had it sitting in the back yard for good reception and it was still refusing with no feedback as to why.
It was only when I switched to Loiter while disarmed and tried to arm that I got the message ‘PreArm: GPS Horiz error 6m (needs 5.0)’.
Nothing could coerce it into Loiter.

While it was in a clear area I switched to Auto (no props, remember) and immediately received the message on the GCS ‘Auto: Failsafe Terrain data missing’.
I do not have Use Terrain Data turned on, in fact I went into Params, set it to On and then set it to Off to be sure.
So auto is not selectable.

Now while I was doing all this switching the copter was in the back yard and I was in the workshop with the RC Tx and Laptop running Mission Planner, while keeping an ear on the error beeps from the copter.
At some point in between arming in Stabilise, switching to Loiter and Auto and disarming in all manner of combinations, the GCS stopped displaying any more messages and I heard no beeps from the copter.
The radio was still a solid green connection to the copter but no data was coming in.

On investigation I found the following:
1: The battery still had 45%
2: The motors were beeping
3: The Safety Switch was a solid red
4: No RGB LEDs were active
5: Although pressing the Safety Switch produced the right change in the switch light, 3 flash sequence, press again, back to solid, the motors continued to beep.
6: No telemetry was coming out.
7: Unplugging the battery and plugging it back in brought everything back to life.
8: No amount of mode switching after that could reproduce the FC crash.

I should mention again that we are using a Septentrio GPS with the protocol switched to SBF.
This is the only change made from when the copter was originally flight tested and it flew quite well with FW 3.3.3

3DR Pixhawk w/Crius HV Power Module
BEC into output rail w/Zener over voltage protection
MavToHOTT telemetry module in Comm2
Septentrio RTK GPS
Graupner Tx and Rx
T-motor U5 motors w/T-Air 40A ESC’s
T-motor 15x5 Props
1000 size Octo frame
6S 22000 battery
RFD900 Telem radios

This is a major worry if I can get a FC to actually crash.
And in light of this I am having grave doubts about where 3.4 is actually going.
Can anyone, please, please allay my fears with any information, questions, anything, about what might be going on here.
I cannot in all conscience put copters this size in the air if there is the remotest possibility of the flight FW crashing the processor totally like this.

2 22-09-2016 11-48-24 AM.bin.zip (825.0 KB)

This is the first report I’ve heard of a FC crash. @CraigElder can we mention this on the dev call tomorrow please.

Thank you for the log, without it we have almost nothing to go on to track this down.

@MagicRuB I was planning to bring a similar event on the dev call

@mboland did you have the gps logging as well, would be good to get that log as well

this is a graph that shows some very odd data.

something is up with the IMU

@Michael_Oborne Just checked and data logging was not turned on.
I have set it to be initiated by the button on the unit which we will do just before flight.

Next tests I will make sure logging is on on the AsteRx-m UAS.
The files on the internal card are already very large from our tests and the AsteRx-m UAS does not split its data up but continues to log to just 2 files ???

Fortunately I am not the one that has to cope with their logging but I will get extracts from the next tests.

Do you think we might have a bad PixHawk?

This machine was stationary on the ground during these tests.

did you place the drone on the ground? or were you holding it the entire time? just trying to account for z acceleration.

I thought I may have solved the mystery of what was going on here with the PixHawk crashing but not yet the GPS problems, but it seems not.

This Octo has the battery on top of the dome covering the flight stack (not my design) so the internal Pixhawk compass and the internal AsteRx-m UAS compass are sandwiched in there.
To get a good compass reading I have placed another compass (HMC5983) on the GPS mast (an active multi frequency GNSS antenna).

I was alerted by the comment from @Michael_Oborne on the strange data so I have been looking very closely at the setup and found that the connections of the SDA and SCL lines on the I2C connection to the top compass were reversed. Funny that it did not show up as I2C errors.

The copter has been ground tested extensively with no crashes of the FC observed.
I couldn’t post the log as it was >46Mb in size but it has clarified at least one bug and one unresolved condition that I will need to start a new topic on.

@Michael_Oborne the Octo was sitting on the ground for all testing.

This is similar to the issue I just posted - although mine only has motor 2 affected.

It seems it is NOT the crossed I2C lines that were the cause of the FC crashing.
I corrected the compass wiring and remounted the compass away from everything (especially the active GPS antenna in its heavy metal case) and now have a solid compass reading with a variance of +/- 0.4deg.
Best I have ever had.

Today's tests were centred on our RTK GPS and 20 minutes after powering up the Octo the motors started beeping and all response was lost. This happened while we were working on the GPS Base station Rx connected to the laptop, so I was not paying attention to the Octo while it was sitting out in the back yard.

We had at least another 2 FC crashes after that at random times.
I have included a log from one of them 2016-10-04 11-41-57Crash.bin.zip (2.4 MB)

At the end of the testing I also left the Octo powered up in the hope of getting another crash while the SBF GPS was logging but to no avail.

So this is still an unresolved issue and the finger is pointing to the SBF routines.
I will be forwarding the logs to @Michael_Oborne who has asked to view matching logs from the PixHawk and the AsteRx-m UAS.
I should just repeat here that this setup has flown very solidly with the AsteRx-m UAS in NMEA mode.

Got some testing in today and had several FC crash events.
For anyone that is interested I have some matching Pixhawk and AsteRx-m UAS logs.
I am not sure which is relevant, the NMEA or the SBF (this one I assume) from the AsteRx-m UAS so I put both up on the server.

Having tried to upload the logs, it appears a couple are too big so here is the URL you can download them from
Octo logs

If there is anything else anyone needs to know please ask.

Mike

Just to keep this thread up to date with our testing.

Completed a round of props off ground tests in the back paddock with the GPS base station setup and RTK link through GPS insertion working very well.
Our stationary point variation was +/-6mm
RTK_Results_20161013.pdf (227.8 KB)

NO FC crash was observed while testing outside, but one did happen inside the workshop while downloading logs.
Next day I left the Octo powered up for approx 3.5 hours in the workshop with no FC crash evident. ???
Next tests will be outside with a 3D Lock in the next day or so.
I am hunting for a reproducible scenario.

The point of the tests was to verify the GPS co-ordinates we have on the GCS against the actual co-ordinates as measured manually. Our standard survey peg is accurate to 1.2mm.
What was observed was a 20cm Easting and a 5cm Northing difference in GCS reported position from actual.
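To put the two components together, here is a minimal sketch of the combined horizontal offset. The function name is just for illustration, and the signs of the Easting and Northing differences are assumed positive since the direction of the offset was not recorded above.

```python
import math

def horizontal_offset(d_east_m, d_north_m):
    """Return (magnitude in metres, bearing in degrees clockwise from north)."""
    magnitude = math.hypot(d_east_m, d_north_m)
    # atan2(east, north) gives a compass-style bearing; wrap into 0..360
    bearing = math.degrees(math.atan2(d_east_m, d_north_m)) % 360.0
    return magnitude, bearing

# Observed differences from the test: 20cm Easting, 5cm Northing
# (signs assumed positive for this illustration)
mag, brg = horizontal_offset(0.20, 0.05)
print(f"offset {mag:.3f} m at bearing {brg:.1f} deg")
```

So the two components add up to a little over 20cm in total, almost all of it in the Easting.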

This is where all this exceeds my understanding so I will quote Jason, a GIS Consultant with over 30 years experience working for some of the largest corporations and government agencies in Australia.

Mike
The Septentrio stores coordinates in the coordinate system we are using (MGA94 - Zone nn). MGA94 uses the GDA94 datum, which is slightly different to the WGS84 datum that GPS uses. While they use the same spheroid (GRS80) they use slightly different transformation parameters, scales etc which equates to about a 20cm difference horizontally. In Australia we also have a specific height datum tied to port datums and approximated using a gravimetric grid (Ausgeoid09 being the latest accepted model). The GPS uses an ellipsoid height based on the spherical model used to approximate the earth's surface. The Septentrio sends coordinates to the Pixhawk in WGS84.

So…… in short. It would be good if all systems (Septentrio RTK GNSS and Pixhawk and Mission Planner) all used the same datums and coordinate systems. The simplest solution would be to have Mission Planner allow you to select the coordinate system (other than WGS84) that you want to use for your project including implementation of Ausgeoid09 to calculate height in AHD.

MGA94 is ITRF compliant and as such accounts for the 7cm per year continental drift. Also very important if you want to work in cm accuracies over extended periods in same location ie landfill airspace monitoring.

Regards

Jason
Spatial Analyst

This apparently accounts for the Easting and Northing errors we were seeing.

There is also the issue of decimal places for the needed extra accuracy in RTK, as the current 6 decimal places will not give us sub cm accuracy.
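As a rough check on the decimal places point, here is a minimal sketch of how much ground distance one step in the last decimal place of a lat/lon value represents. It assumes a simple spherical approximation using the WGS84 equatorial radius, and the function name is only for illustration.

```python
import math

WGS84_A = 6378137.0  # WGS84 semi-major axis in metres

def degree_step_in_metres(decimals, latitude_deg=0.0):
    """Ground distance of one unit in the last decimal place of a
    decimal-degree coordinate (spherical approximation).

    Returns (north_step_m, east_step_m); the east step shrinks with
    the cosine of the latitude.
    """
    metres_per_degree = 2.0 * math.pi * WGS84_A / 360.0
    step_deg = 10.0 ** -decimals
    north = step_deg * metres_per_degree
    east = north * math.cos(math.radians(latitude_deg))
    return north, east

for decimals in (6, 7, 8):
    north, east = degree_step_in_metres(decimals)
    print(f"{decimals} decimals: {north * 100:.2f} cm per step (at the equator)")
```

With 6 decimal places each step is roughly 11cm, 7 places gives about 1.1cm, and it takes 8 places before the step size is comfortably below a centimetre.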

@Michael_Oborne was kind enough to give a quick reply while he is still on vacation (?) :)
I quote:

Michael_Oborne (11h):
the issue will be the fact that google earth provides the images in wgs84, on a global scale. not just Australia. so while it would be nice to fully support mga/utm/ gda94, I don’t think this will be viable. I will have a look when I get home though.

I used to work for a surveying company, so was aware of this.

Michael

More to follow.

At present I am trying to find any reproducible events that will trigger the FC crash.
Yesterday's testing:
Loaded mission - height relative, 4 waypoints around a paddock, set home.
15 sats hdop 0.7- green LED in Stabilise

  1. Pre arm - switch to Loiter, blue LED, no GCS message
  2. Pre arm - switch to Auto, GCS message “FS: Terrain data missing” (all terrain parameters were set to 0)
  3. Arm in Stab - switch to Loiter - reject tones - LED stays green - no GCS message
  4. Switch to Auto - reject tones - LED still green - no GCS message
  5. Switch back to Stab - disarm

GCS message "Bad AHRS"
Repeated the above steps but at some stage between steps 4 and 5 the FC hung.
Green LEDs solid, safety switch solid on, motors beeping (no ppm signal), no data being sent from FC.
Logs Here

What was noticeable about these tests was the inconsistency in LED function when modes are rejected.
Pre-arm switch to auto mode will give you a switch from green to blue if the tests fail, which I am told is the new way the LED and flight modes work.
But arm in Stab and switch to an Auto mode and the LED stays green with a reject tone from the FC but no message on the GCS.
Disarm in an auto mode (while rejected from that mode) and the LED STAYS green, giving no indication that you are NOT really in that auto mode. You can arm again with the LED going from flashing green to solid green thinking you are OK to go in that auto mode but in reality you are NOT, and there are no GCS messages to indicate this.

Hi Mike,

Thank you for your work on this, if you can find a reproducible way to crash the FC it would be a great help to track this issue down. Let me ask this: have these crashes always happened while using the SBF protocol or have you ruled that out?

Regarding the led and mode change: when you were armed the mode change to Auto was rejected. So you stayed in Stab and that is why the led stayed green. Regarding a message on the GCS, you should open an issue with that request, I think it is a good one.

On the trail of this crashing I have been able to repeat the crash a number of times but not with a specific recipe.

Log of last crash test

3.4-rc7
Sats 15, HDOP 0.7

Ground test no props open area
Switch to Loiter and arm - GPS horiz error >5
repeat 5 times
Switch to Auto - FS:Terrain data missing
Try to arm - mode not armable
Switched back to Stab - EKF primary changed:0
Arm in Stab - switch to Loiter - disarm
Arm in Loiter - no message on GCS other than still in Stab - LEDs still green

Repeat the above again and left copter to auto disarm in Loiter with throttle at Zero

Crash.

I also get occasional ‘bad AHRS’ in the HUD if I:
Arm in Stab - switch to Loiter - disarm in Loiter - arm again in Loiter (really in Stab) - switch to Auto - rejected - back to Loiter - disarm - HUD displays ‘bad AHRS’ message.

I wonder if it might be a power problem instead of a software issue.

Are you powering the Pixhawk through a power module, or is it just powered through the rear servo rail? If it’s just the rear servo rail then that could be the problem, because I hear from Craig Elder that if the rear servo rail exceeds 5.7V then the flight controller will reboot. In the POWR message it’s reaching 5.9V.

I’ve heard one other report somewhere on this forum of a person trying to power a lot of peripherals (including an RTK GPS) all through the Pixhawk and that person was experiencing brownouts. That’s anecdotal but still, separating the powering of the other devices from the Pixhawk might make the problem go away.
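For anyone wanting to check their own logs for the servo rail condition, a minimal sketch of the comparison, assuming the servo rail voltage has already been exported from the log as (time, volts) pairs. The function name and the sample values are made up for illustration, and the 5.7V limit is simply the figure quoted above, not a verified specification.

```python
SERVO_RAIL_LIMIT_V = 5.7  # figure quoted in this thread; treat as an assumption

def over_limit_samples(samples, limit_v=SERVO_RAIL_LIMIT_V):
    """Return the (time_s, volts) samples whose voltage exceeds the limit.

    samples: iterable of (time_s, vservo_volts) pairs, e.g. exported
    from the servo rail voltage field of a dataflash log.
    """
    return [(t, v) for t, v in samples if v > limit_v]

# Made-up readings for illustration only, not taken from the posted logs
samples = [(0.0, 5.31), (1.0, 5.62), (2.0, 5.90), (3.0, 5.55)]

for t, v in over_limit_samples(samples):
    print(f"t={t:.1f}s: servo rail at {v:.2f}V exceeds {SERVO_RAIL_LIMIT_V}V")
```

If the list comes back empty for a log that ends in a crash, the servo rail overvoltage is less likely to be the cause of that particular event.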

Thank you so much @rmackay9

I am powering the PH from both the Power Module and the servo rail with the Zener diode attached on the servo rail, as the instructions highly recommend, to limit voltage.

Due to this I had always checked the Vcc to make sure all was well and see very consistent Vcc values.


I had not, however, looked as closely at the VServo voltage.
I do believe the voltage maximum in the Wiki for the servo rail is even in RED, hence the addition of the Zener.
I obviously put too much faith in a simple zener to regulate the peak voltage.

I will remove this supply on the servo rail, put there only for redundancy, and retest everything.

I live in hope once again, thanks Randy.

Looks like my hopes were dashed.

Removed the VServo supply and had multiple crashes (5), in short order.
Thought I might have a recipe, as there was no Rx (it being powered from the servo rail) and I was using GCS commands to test modes and arming, but it was rather random again.

These are some logs

and a pic of the VServo voltage

So removing the VServo voltage seemed to make it worse???
Next step is to set the Septentrio GPS back to NMEA and see if that makes a difference.
If not I will try replacing the PixHawk.
After that I have no idea.

Can you recreate any crashes if you flash plane on it? (Disconnect props/motors obviously if you are flashing plane firmware) I’ve never encountered any FC crash with plane code, which leads me to believe that it is either a hardware difference somewhere on your setup, or a firmware difference. If you want to replicate the setup I’ve had success with you can flash Plane 3.6.0 and that has had many hours of flight testing from me, without any problems with SBF format.

The next question is if you don’t encounter a crash with plane code, what happens if you raise the SCHED_LOOP_RATE (http://ardupilot.org/plane/docs/parameters.html#sched-loop-rate-scheduling-main-loop-rate) to 400, can you encounter any crashes then?