Quad falls out of the sky (a second one)

Well, I posted a few weeks ago about a quad (with newer firmware) that fell out of the sky. Yesterday I had a second one. I am not sure (or even suspect) that there is a strong connection between the two events, except that this second one is an older sister (1+ years older) of the one that fell a few weeks ago, and they share a number of physical elements (Cube Orange, 4-in-1 ESC, motors, propellers, circuit board, voltage regulators, batteries, etc.). However, the failure modes appear (possibly) to be different. Also perhaps relevant: the first one fell out of the sky while being set up (it was brand new, on its second flight), while this second one has tens (if not hundreds) of flights under its belly from the last 1+ years.

What happened here is that the quad was happily executing a mission in Guided mode when it “dropped an arm” (showed us its belly) and then dropped like a rock from 50m altitude, about 300m away. I looked at the logs and have some theories, but if you could have a look, let me know what you think.

Here is the log: log_80_2023-5-6-16-09-12.bin - Google Drive

QGroundControl reported no errors - the only four messages during the flight were:
[16:04:51.627] Info: EKF3 IMU0 MAG0 in-flight yaw alignment complete
[16:04:51.689] Info: EKF3 IMU1 MAG0 in-flight yaw alignment complete
[16:10:09.243] Info: EKF3 IMU0 is using GPS
[16:10:09.244] Info: EKF3 IMU1 is using GPS

QGC screenshot:

Your help is highly appreciated! After flying a few of these quads for 2+ years without an incident, it’s a bit disturbing to lose two in two weeks (give or take).

For reference, this is the post for the first quad that failed:

Strange to me that the altitude drop isn’t even recorded:

But we do see the attitude controller going haywire just before the sudden loss of logs:

Was the copter still powered on when it was retrieved?

A catastrophic failure or loss of power to the Flight Controller. After repair, update to latest Stable, as there are fixes for the Cube Orange since 4.1.5 (old stuff).
And I would not configure for Dshot1200 when you re-build; use Dshot 600. Some tuning should also be done - there are a lot of default parameters.

Weirdly enough, the copter seemed to be powered (at least to some degree) when it was retrieved: the GPS was clearly still connected and blinking, which suggests it was still getting instructions from ArduPilot. However, after taking pictures of the crash, I immediately disconnected the batteries (one was pretty deformed in the event). After that I measured the batteries (per cell) and they all looked good. Also, the batteries were 2 x 7Ah 6S LiPo in parallel, so it’s unlikely that one of them croaked.

I agree with you that, per the analysis, the copter just starts to rotate and then the log ceases. Visually, I can confirm that the rotation continued; I think it got to 180 degrees (or thereabouts) by the time it hit the ground - roughly upside down. One arm (and a motor) hit first and took the brunt of the damage (broken arm, broken motor mount).

Thanks for looking,
Mihai

Dave,

thanks for the input. Let me address your points one by one:

Regarding the firmware, you’re right, I should update it. However, sometimes, if it ain’t broke, don’t fix it: I’ve used this firmware for the past two years without (almost) any problems. The only serious reason I found to update is that if I keep the drone upside down for two hours (which sometimes happens, as folks work on the payload), the accumulated errors overflow a buffer - something that was fixed in a newer firmware. The main reason I’m reluctant to update is that sometimes new updates break things. It happened to me when we updated a big bird (15kg AUW) from 3.x to 4.x and the motors started twitching uncontrollably! That taught me not to update just to be on the bleeding edge. Thankfully the problem was fixed fairly fast after that, but I had to revert to the previous version to fly that week.

Regarding Dshot1200 - I heard (from the other fallen bird’s thread) that it’s possibly a problem, and it makes sense to me that since the internal loop is likely not running that fast, it’s overkill; I’ll talk to my vehicle engineer to fix it. However! I fail to see how that could lead to this particular crash. I agree with you that it’s a “good to fix”, but I don’t see it as a “root cause”.

Finally, I hear you about the loss of power at the flight controller. It makes sense looking at the log. However, here is the situation redundancy-wise: we have two batteries in parallel (7Ah, 6S LiPo) that tested well after the crash (despite the 50m fall, and despite one of them clearly having been first in line to make contact with the ground). These batteries feed a pretty well designed PCB (no wires) that carries two 5V power regulators (with generous capacity - 5A or 8A, I don’t recall exactly), which then feed the Cube. Looking at the power feeding the Cube, it seems super-civilized throughout the flight (to the last sample). So I do have doubts that this is it.

Thank you for looking,
Mihai

If you have a catastrophic crash like this it’s important to rule out a watchdog. If you do have a watchdog it will show in the GCS after the reboot, but only the first time - so it’s easy to miss if you reboot more than once. If the copter still had power after it crashed, the log should show this - or the log after the crash log, if it rebooted by itself.
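If it helps, here is a minimal sketch (not a polished tool) of how the dataflash .bin could be scanned with pymavlink for watchdog evidence. I’m assuming recent ArduPilot firmware writes a WDOG record and/or a “WDG”-style text message after a watchdog reset, so adjust the message types if that’s not the case:

# Minimal sketch: scan an ArduPilot dataflash .bin for watchdog evidence.
# Assumes the firmware writes a WDOG record and/or a "WDG"/"watchdog" text
# message (MSG) after a watchdog reset - adjust the message types if not.
from pymavlink import mavutil

mlog = mavutil.mavlink_connection("log_80_2023-5-6-16-09-12.bin")

while True:
    m = mlog.recv_match(type=["WDOG", "MSG"])
    if m is None:
        break
    if m.get_type() == "WDOG":
        print("Watchdog record:", m)
    else:
        text = getattr(m, "Message", "")
        if "WDG" in text or "watchdog" in text.lower():
            print("Watchdog text message:", text)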

Nope - I didn’t see anything in the GCS. In fact, here is the GCS log: the first four lines are from the flight, then the walk of shame, then the reboot 18 minutes later. Interestingly, although the autopilot was alive after the crash, there are no messages in between (probably too far away - 300m or so - and too low (on the ground) for the telemetry link).

[16:04:51.627] Info: EKF3 IMU0 MAG0 in-flight yaw alignment complete
[16:04:51.689] Info: EKF3 IMU1 MAG0 in-flight yaw alignment complete
[16:10:09.243] Info: EKF3 IMU0 is using GPS
[16:10:09.244] Info: EKF3 IMU1 is using GPS
[16:28:02.681] Info: Calibrating barometer
[16:28:04.310] Info: Barometer 1 calibration complete
[16:28:04.310] Info: Barometer 2 calibration complete
[16:28:05.093] Info: Initialising ArduPilot
[16:28:05.191] Info: ArduPilot Ready
[16:28:05.204] Info: AHRS: DCM active
[16:28:05.501] Info: RCOut: PWM:1-8 DS1200:9-12
[16:28:05.501] Info: GPS 1: specified as UAVCAN1-125
[16:28:06.599] Info: EKF3 IMU0 buffs IMU=17 OBS=7 OF=16 EN:16 dt=0.0120
[16:28:06.671] Info: EKF3 IMU1 buffs IMU=17 OBS=7 OF=16 EN:16 dt=0.0120
[16:28:07.718] Info: EKF3 IMU0 initialised
[16:28:07.718] Info: EKF3 IMU1 initialised
[16:28:07.760] Info: AHRS: EKF3 active
[16:28:09.753] Info: EKF3 IMU0 tilt alignment complete
[16:28:09.811] Info: EKF3 IMU1 tilt alignment complete
[16:28:09.811] Info: EKF3 IMU0 MAG0 initial yaw alignment complete
[16:28:09.856] Info: EKF3 IMU1 MAG0 initial yaw alignment complete
[16:28:18.349] Info: EKF3 IMU0 origin set
[16:28:18.349] Info: EKF3 IMU1 origin set
[16:28:20.248] Critical: PreArm: Fence enabled, need position estimate
[16:28:20.248] Critical: PreArm: Fence requires position
[16:28:46.051] Info: EKF3 IMU0 is using GPS
[16:28:46.231] Info: EKF3 IMU1 is using GPS
[16:28:52.183] Critical: PreArm: vehicle outside fence
[16:29:23.349] Critical: PreArm: vehicle outside fence
[16:29:36.152] Critical: GPS Glitch
[16:29:42.781] Critical: EKF variance

I’ll have to out myself as the “vehicle engineer” in question.

I will definitely dial it back to Dshot 600 or so. I didn’t really have time to dive into the intricacies of Dshot at the time and just selected the rate T-motor referenced.

Mihai can attest that I also nag them regularly to let me update some of these drones, but once the drones are “in the field” I need to make sure I’m available to debug any issues that may come from an update. Rest assured, any drones that leave my hands after being worked on are always going to be on the latest or near-latest stable firmware.

We have been trying to find someone who can take the reins as a “field engineer”, so to speak, for these vehicles. That’s difficult at a university, since most of the work needs to be done by students, and students are by nature temporary. I have to divide my time between designing / building these drones and another project I work on, so I’m usually not in the field on regular flying days.

I got the carcass of said failed drone and did some testing…

Everything points to a very sudden loss of power or some sort of transient that caused the cube to reboot.

I had a suspicion there was an ESC failure, partly inspired by a sudden increase in current consumption in one of the last measurement points and a small corresponding voltage sag. Unfortunately, I put what remained of the drone back together on the workbench and, at least with no propellers attached, the ESC has no issues spinning the motors, and there’s no visible damage on it or any other electronic components.

The ESC is a T-motor F55A pro II

And as Mihai already mentioned, the Cube VCC looked fine up until the last moments.

It seems very likely something caused the Cube to reboot in flight, and it was probably already booting back up by the time it hit the ground or shortly after; the crew would have almost certainly missed the watchdog message anyway.

One suspicion I have involves the payload, which has a computer on board connected via USB. I can’t think of a way to verify this, but I suspect it may have contributed to a power issue. Let me explain:

The custom carrier board we have for the Cube in this drone has two high-quality 8A TDK buck converter modules. These feed into the POWER 1 and POWER 2 inputs on the Cube power management unit. They also power many other things on the drone, but are never used anywhere close to their limits.

There are potentiometers on the board for adjusting the output voltage of these converter modules; in the haste of building a bunch of these drones last year, the voltage got left at what turned out to be the default of about 4.95V…

Now, under normal circumstances this is fine for the Cube, even though they recommend 5.3V. Why do they recommend 5.3V? Because if you have USB connected, their integrated power selection circuit will always choose the higher voltage.

I know from experience that a typical USB port will not power the Cube and the connected peripherals on this drone without some very strange behavior and often random rebooting.

So my theory is that at some point the power selection circuit used the USB power from the companion computer rather than either of the onboard 5V converters; the USB port was not able to power everything successfully, and a reboot happened.

Now, obviously the PMU should have gone back to using the other voltage rails when the USB power sagged, and I can’t say why it may not have done that. But it’s not hard for me to imagine that, if it was working right on the edge, there was some back-and-forth and hysteresis in how quickly it was willing to switch back. Hopefully what I’m trying to say makes sense…

Hard to say without doing a lot of testing and/or having a schematic for the PMU.

Here are some pictures of the carrier board for reference. If you spot the missing capacitor, don’t worry about that - I knocked it off when disassembling the drone.



Thank you for chiming in, Mark! I have to say that I tend to buy your explanation of the USB from the companion computer powering the board for a while, especially since we had problems with the USB on the companion computer power-cycling on the very next flight (sister drone, sister companion computer)! The only thing that throws me off in this theory is the lack of evidence in the log, but that may have been missed by falling between samples. Also, it does not explain the 75A current reading at the end.
Mihai

I agree that is a serious and suspicious current spike at the end of that log.
The root cause looks like the battery voltage disappeared. The current spike may just be the ESC trying to momentarily drive the motors to the commanded RPM as the voltage drops.

At boot-up the power flags are

  • 3 - Power brick 1 valid and Power brick 2 valid

then after almost 1 minute of flight, the power flags change to

  • 39 - MAV_POWER_STATUS_USB_CONNECTED (and MAV_POWER_STATUS_CHANGED bit)

and then they frequently change back and forth between USB connected and USB not connected:

  • 35 - USB disconnected again (MAV_POWER_STATUS_USB_CONNECTED bit cleared, MAV_POWER_STATUS_CHANGED bit set)
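For reference, here is a rough sketch of how those logged values decode against the MAV_POWER_STATUS bitmask (this is the Flags field of the POWR message in the dataflash log; on the Cube the two “valid” bits are reported for the two power brick inputs, as above):

# Rough sketch: decode the logged POWR.Flags values against the
# MAV_POWER_STATUS bitmask. Bit descriptions follow the post above; the
# enum names in parentheses are from the standard MAVLink definitions.
POWER_BITS = {
    1:  "power brick 1 valid",
    2:  "power brick 2 valid",
    4:  "USB connected (MAV_POWER_STATUS_USB_CONNECTED)",
    8:  "peripheral overcurrent",
    16: "high-power peripheral overcurrent",
    32: "changed since boot (MAV_POWER_STATUS_CHANGED)",
}

def decode(flags):
    return ", ".join(name for bit, name in POWER_BITS.items() if flags & bit)

for value in (3, 39, 35):
    print(value, "->", decode(value))
# 3  -> power brick 1 valid, power brick 2 valid
# 39 -> the above, plus USB connected, changed since boot
# 35 -> the above, plus changed since boot (the USB bit cleared again)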

Can you just remove the power wire from the USB connection to the companion computer?
Adjusting the 5V regulator output like you suggested would be good too, and I’d probably even set
BRD_VBUS_MIN,4.8

I like the carrier board design - good work!
The only thing I worry about with those bare circuit boards is hard-mounting them to the frame. I can’t see what you’ve done under the circuit board to mount it via the screw holes, but hard-mounts make the board a structural part of the frame. I suspect this might cause cracked tracks or component failure over the long term. I would be inclined to mount the board using antivibration mounts so the board is independent of the frame.
Something like these:

EDIT
I think you should set these:

BATT_FS_CRT_ACT,1
BATT_FS_LOW_ACT,2 // or 3

and you can set these for ESC-data driven harmonic notch filter

INS_HNTCH_ENABLE,1  // set this then refresh params to see the rest
INS_HNTCH_MODE,3
INS_HNTCH_REF,1
INS_HNTCH_FREQ,50
INS_HNTCH_BW,25
INS_LOG_BAT_MASK,1
INS_LOG_BAT_OPT,4

Wow, that’s some good analysis, Shawn! Thanks for looking. I had no idea about the power flags (I’m sure there are other hidden gems in those dataflash logs!). This makes complete sense and is consistent with what we saw on its sister drone (with similar hardware and companion computer), where the autopilot would actually reset the companion computer’s USB bus at times. The radical cure (remove the power wire from the USB to the companion computer) also makes a lot of sense!

Thanks a lot,
Mihai

Thanks Shawn,

I actually did consider isolating the PCB originally, but the early versions of this drone had fantastically low vibrations and I didn’t want to mess with that. Obviously, from the logs, they are “OK” at best now, so something has degraded or changed from drone to drone. I’m going to order some of those mounts and see how it goes.

We also obviously need a better system in place for tracking parameters, and especially changes to them, as the battery failsafe and H-notch should have been configured on these drones. Sometimes experimenters / pilots need to make param changes in the field for one reason or another, and those changes don’t get documented or reverted.

I am also closer to understanding what happened in this crash:

I was inspecting the drone and noticed that what I thought at first glance were broken M3 screws in a motor mount were actually still completely intact (the mount is a weird 3D-printed design, and the screws are not normally visible). After washing the mud out of the motor, I can see its mounting threads are also still intact. Basically, the screws backed out and the motor tried to go do its own thing, restrained only by the one surviving motor wire.

We originally ruled this out as a likely possibility (and assumed the damage to the motor was from the 50m fall) because the drone didn’t make any attempt at saving itself according to observers on the ground (and obviously the logs don’t show any attempt either).

What I suspect now is that the initial catastrophic event was triggered by the motor coming loose, and that it coincided with or triggered a USB power / connection issue. I’m still not happy that VCC showed no issues, but I don’t know the details of how that value is measured / logged, or whether it could have caught the early moments of this type of failure.
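For what it’s worth, a quick sketch like the one below could at least tell us how often that value is sampled in the log, so we’d know how large a gap a brief transient could hide in. I’m assuming the value in question is the Vcc field of the POWR message:

# Quick sketch: check how often the flight controller Vcc is logged and what
# the final readings were. Assumes the value of interest is POWR.Vcc and
# that TimeUS is in microseconds.
from pymavlink import mavutil

mlog = mavutil.mavlink_connection("log_80_2023-5-6-16-09-12.bin")

times, vcc = [], []
while True:
    m = mlog.recv_match(type="POWR")
    if m is None:
        break
    times.append(m.TimeUS / 1e6)
    vcc.append(m.Vcc)

gaps = sorted(b - a for a, b in zip(times, times[1:]))
if gaps:
    print("POWR samples:", len(times))
    print("median interval between samples: %.2f s" % gaps[len(gaps) // 2])
    print("last few Vcc readings:", vcc[-5:])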

A secondary, compounding possibility: the shock to the frame from the motor coming loose made the Cube disconnect from the carrier board momentarily. If you look closely, we only have 2 of 4 screws attached. That particular batch of Cubes we got had the worst-quality screws I have ever encountered, and we were lucky if 2 survived being removed and reinstalled once. We have since started throwing the included screws away immediately and using much better ones from McMaster-Carr.

At any rate I think we have some actionable improvements to make. Although the motor theory can’t apply to the other drone crash, as all its motors are still well attached.

I am also considering building my own version of the Cube’s “PMU” for better control / transparency over what is happening power-wise, perhaps interfacing with a basic microcontroller on the carrier board that keeps a separate log of what’s happening with the power and other onboard systems. There are a ton of features on the one they sell that we don’t need / use, and honestly it’s kind of annoying to deal with from a manufacturing perspective.

Obviously that path also has the potential to introduce a bunch of new failure modes lol

Edit: I did actually manage to find a schematic of the PMU. The main power-switching IC, the LT4417, should always prefer POWER 1 over USB as long as the voltage is within spec (it does NOT select the highest voltage), so I’m not as confident in my original theory. Still, without completely taking apart the exact PMU and measuring every resistor and capacitor, I can’t know exactly what over- and under-voltage thresholds were configured.

I saw that in the photo and assumed it was just a dummy assembly stage and not ready for flight.

I’m only recommending those antivibration mounts as a way of disconnecting the PCB from being a structural part of the frame so it doesn’t get flexed, rather than for lowering vibrations. So quite firm mounts would be OK; they don’t have to be super-soft.

Yep that pic was the board pulled fresh out of the drone.

As far as the mounts go, we’re on the same page about that - I knew you were not suggesting them for the sake of vibrations. But of course, changing anything typically affects those too. I’ll do some playing around with it, mechanically protect the board, and perhaps help with the vibes at the same time.

Also, if all the copters are built the same, you could easily apply a standard set of parameters from a file. A param file is just text, so strip out all the calibration and non-tuning / non-safety related params, leaving just the ones you need.
It could be re-applied (or compared) in about a second before the copter goes out on a job.
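As a rough illustration (the file name and script below are just examples, not a standard tool), a trimmed “fleet standard” file could be checked against the PARM records in any dataflash log, so parameter drift shows up before - or, as in this case, after - a flight. MAVProxy’s param load / param diff commands can do much the same interactively:

# Rough sketch: compare a trimmed "fleet standard" param file (NAME,VALUE per
# line, Mission Planner style) against the parameters recorded in a dataflash
# log's PARM messages. File name and tolerance are illustrative only.
from pymavlink import mavutil

def load_param_file(path):
    params = {}
    with open(path) as f:
        for line in f:
            line = line.split("#")[0].strip()
            parts = line.replace(",", " ").split()
            if len(parts) >= 2:
                try:
                    params[parts[0]] = float(parts[1])
                except ValueError:
                    pass  # skip headers or non-numeric lines
    return params

standard = load_param_file("fleet_standard.param")   # hypothetical reference file

logged = {}
mlog = mavutil.mavlink_connection("log_80_2023-5-6-16-09-12.bin")
while True:
    m = mlog.recv_match(type="PARM")
    if m is None:
        break
    logged[m.Name] = m.Value

for name, want in sorted(standard.items()):
    have = logged.get(name)
    if have is None or abs(have - want) > 1e-6:
        print("%s: expected %s, log has %s" % (name, want, have))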

You can even go further and set up your own defaults and even read-only params, but that’s getting more complex to implement. It really suits production.

That’s definitely the long term plan, just a matter of getting ourselves organized enough to actually do it!