PreArm: Internal Errors

I have logs from yesterday that were collected with LOG_DISARMED enabled that I’ve uploaded to Google Drive. The one I’ve linked here includes the following from LogAnalyzer:
Found NaN in POS.RelOriginAlt
Found NaN in STER.LatAcc
Found NaN in THR.Speed
There are a variety of other logs I can link to if that would help.
I will grab some logs later today with REPLAY and BUFSIZE included.

I worked with the rover for 2+ hours trying to get it online for some tuning to no avail. Every boot included the internal error despite reloading Rover 4.0 and then Rover 4.1 on the off chance that something was corrupted in the firmware, but no dice. With no information on the source of the fault I don’t know if this is a hardware or software issue, but I’m dead in the water until a solution is found.

As an aside, I did get a build environment set up in WSL, forked the source, and successfully built Rover from source, so I can test code changes if that becomes necessary.

I was helping someone earlier today who started getting the same problem.

Updating to the most recent master apparently helped them - but now it
feels like it might just be hiding from him.

Unfortunately the log you uploaded didn’t contain any MON messages for
some reason; if you could check the other logs you’ve got for those that
would be handy.

What specifically would I be looking for with respect to MON messages? A search through the 10 text .log files from yesterday for MON only matched the BATTX_MONITOR messages.
https://drive.google.com/drive/folders/1asc28sVLjJVOgbN2mkDolh_W868gdMmY?usp=sharing
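A plain substring search for MON will also hit any line that merely contains those letters, which would explain the BATTX_MONITOR matches. Here is a minimal sketch that matches on the message type only, assuming the text logs are comma-separated with the message type as the first field. The function name and the two made-up sample lines are hypothetical, not taken from any log.

```python
def lines_of_type(lines, msg_type):
    """Return the lines whose first comma-separated field equals msg_type."""
    wanted = msg_type.strip().upper()
    return [ln for ln in lines
            if ln.split(",", 1)[0].strip().upper() == wanted]


sample = [
    "MSG, 4388752, RC Protocol: SBUS",   # line from a log in this thread
    "PARM, 1000, BATT_MONITOR, 4",       # made-up line: contains "MON" but is not a MON message
    "MON, 2000, 0, 0",                   # made-up placeholder, real MON fields not shown
]

mon_lines = lines_of_type(sample, "MON")
print(mon_lines)
```

If this returns nothing for a whole file, the log has no MON messages at all, rather than the search having missed them.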

As Peter said, I had a similar problem with my rover yesterday. I was running master from around Dec 11; I rebased to the latest master and the issue went away. Not sure what the problem was, though.

Here are a couple from today with LOG_REPLAY, LOG_DISARMED and LOG_FILE_BUFSIZE set.

These have MSG hits in them:

MSG, 4388515, ArduRover V4.1.0-dev (8b8029fd)
MSG, 4388551, ChibiOS: 331fe75d
MSG, 4388677, fmuv3 003C0019 30375105 36383839
MSG, 4388718, Param space used: 1375/3840
MSG, 4388752, RC Protocol: SBUS
MSG, 4388776, New mission
ATT, 4401359, 0, 0.73, 0, 2.61, 0, 0.09, 0.8, 1
PIDS, 4401382, 0, 0, 0, 0, 0, 0, 0, 0
PIDA, 4401403, 0, 0, 0, 0, 0, 0, 0, 0
IMU, 4401423, 0, 0.0001543595, -0.0001342529, -0.0002600986, -0.5872825, 0.7100661, -9.669054, 0, 0, 52.96917, 1, 1, 7996, 999
IMU, 4401423, 1, -0.003376344, -0.001270836, 0.001530997, -0.6771219, 0.7474126, -9.626653, 0, 0, 0, 1, 1, 756, 1000
BARO, 4401961, 0, 0.3673435, 48656.24, 46.7, 0.0656326, 4401, 0, 35, 1
THR, 4402027, 0, 0, 0, NaN, 0.7100661
NTUN, 4402043, 0, 0, 0, 9, 0
STER, 4402065, 0, 0, 0, NaN, 0, -0.09594557
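For anyone wanting to scan their own text logs for the same symptom, here is a rough sketch that flags NaN fields in lines like the ones above. It assumes comma-separated lines with the message type first; the function name is hypothetical. It reports the field position only, since mapping a position to a name such as THR.Speed needs the log's format definitions, which this does not attempt.

```python
import math


def nan_fields(line):
    """Return (message_type, [positions]) of fields that parse as NaN.

    Positions count the message type as field 0.
    """
    parts = [p.strip() for p in line.split(",")]
    bad = []
    for idx, value in enumerate(parts[1:], start=1):
        try:
            if math.isnan(float(value)):
                bad.append(idx)
        except ValueError:
            continue  # non-numeric text, e.g. the body of a MSG line
    return parts[0], bad


thr = "THR, 4402027, 0, 0, 0, NaN, 0.7100661"
ster = "STER, 4402065, 0, 0, 0, NaN, 0, -0.09594557"
ntun = "NTUN, 4402043, 0, 0, 0, 9, 0"

for line in (thr, ster, ntun):
    print(nan_fields(line))
```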

@jmachuca: I will download today’s latest build from https://firmware.ardupilot.org/Rover/latest/fmuv3/ and load it up to see if it clears the errors. Using GPS yaw, so 4.1 is required, but I’ll fall back to 4.0 Stable for testing if necessary.

Let me know if you need anything further. Very interested in resolving this :slight_smile:

Flashed the latest 4.1.0 firmware, still had the Internal Error.
Flashed to 4.0 Stable and got this:
Restored watchdog attitude 2 -8 0
fmuv3 003C0019 30375105 36383839
ChibiOS: 0997003f
ArduRover V4.0.0 (0e52bafa)

With 4.0.0 I am seeing Internal Error 0x800 instead and MP is spamming EKF3 Waiting for GPS config messages.
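A small aside on reading the error value. Assuming the hex number is a bitmask with one bit per error type (an assumption on my part, not confirmed in this thread), this sketch lists which bit positions are set. The function name is hypothetical, and turning a bit position into an error name needs the firmware source, which is not reproduced here.

```python
def set_bits(mask):
    """Return the positions of the bits set in an integer mask, lowest first."""
    return [bit for bit in range(mask.bit_length()) if mask & (1 << bit)]


print(set_bits(0x800))    # value reported on 4.0.0 in this thread
print(set_bits(0x8000))   # value from the related thread title
```

Under that assumption, a different hex value between firmware versions would mean a different error type was flagged, not just a different count of the same error.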

Anyway, .bin files in the ‘Afternoon’ directory for the 4.1 update, then the 4.0 fallback.

@Highly please provide general access to that folder to avoid people having to request.

Could you detail the hardware involved here, please? @jmachuca77’s was a Black with basically nothing on it. Sadly I’m currently away from all my hardware so I won’t be able to replicate here.

Need to get those WDOG and MON messages from the logs. I don’t think replay will be required for this, if it’s the same thing we were seeing with @jmachuca.

Jaime’s logs were pointing to the GCS code - something pausing in there and taking way too long to do what it is supposed to. There are a couple of ways to try to trace this, but the best is probably to start sticking assignments to a static uint8_t all over the place and including that number in the output from the MON thread.
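To make the breadcrumb idea concrete, here is a small illustration in Python. It is not ArduPilot code: the real change would be C++ assignments to a static uint8_t inside the firmware, and `LoopProbe`, `main_loop_once`, `run_demo` and the numbered checkpoints are all hypothetical. A worker records a number before each step and then stalls; a monitor that times out waiting for the loop reports the last number it saw.

```python
import threading


class LoopProbe:
    def __init__(self):
        self.breadcrumb = 0                  # stands in for the static uint8_t
        self.loop_done = threading.Event()   # set when a loop iteration completes
        self.release = threading.Event()     # lets the demo end the simulated stall

    def main_loop_once(self):
        self.breadcrumb = 1      # before a first (hypothetical) step
        self.breadcrumb = 2      # before a second (hypothetical) step
        self.breadcrumb = 3      # just before the call that never returns in time
        self.release.wait()      # simulated stall
        self.breadcrumb = 4      # only reached once the stall is over
        self.loop_done.set()

    def monitor(self, timeout_s):
        """Return None if the loop finished in time, else the last breadcrumb."""
        if self.loop_done.wait(timeout_s):
            return None
        return self.breadcrumb


def run_demo(timeout_s=0.2):
    probe = LoopProbe()
    worker = threading.Thread(target=probe.main_loop_once, daemon=True)
    worker.start()
    stuck_at = probe.monitor(timeout_s)   # times out because the worker is stalled
    probe.release.set()                   # end the stall so the thread can exit
    worker.join()
    return stuck_at


print("loop was stuck after breadcrumb", run_demo())
```

The number the monitor reports narrows the stall down to the code between that assignment and the next one, and adding more assignments in that region narrows it further.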

Also, @Notorious7 - once is too often for us to be getting internal errors :slight_smile: They’re supposed to mark serious problems in the code. The main loop being stuck means quadcopters don’t stabilize (for example), which is bad.

Links updated. I guess a subdirectory of a shared folder is not by default shared. Sorry.
The latest bin files are here: https://drive.google.com/drive/folders/1NUz3nytNSFg19s6qtoEvWs5LlHfdknzx?usp=sharing

The hardware is currently just a Pixhawk 2.4.8 with power, an SBUS receiver, a telemetry radio, and two F9P GPS units. The rover has encoders (not currently in use) and the internal compass is disabled. Nothing external on I2C or CAN at the moment.

@peterbarker The MSG entries from all of today’s logs are attached.
messages.txt (17.0 KB)

I’ve been bench testing the GPS yaw setup for a week or so using the 4.1.0 firmware and never saw these messages. I moved the brain box out to the Rover, ran it for a while getting some mechanicals sorted out, and then began PID tuning with ‘learn throttle’. The messages started shortly after that point, and they didn’t go away with a fallback to 4.0.

Is there a chance this could be a hardware issue? The only changes from a configuration where the messages didn’t appear are:

  1. I added a second F9P GPS unit attached to Serial 4
  2. I changed the GPS types on the serial ports from normal GPS to 17/18 to configure the F9Ps for yaw; serial port 2 on each F9P is interconnected for RTCM3.
  3. I moved my static base RTCM3 feed from a direct connection on F9P #1 serial port 2 to injection over MAVLink2.
  4. I disabled the internal compass and removed the external RM3100 from the I2C bus.

If only I were running everything on 2MB flash boards. Should I remove some features I am not using in AP_config.h and try to merge this PR for 1MB flash board builds? AP_Math: Log source line of constrain_float nan's by WickedShell · Pull Request #15662 · ArduPilot/ardupilot

The only thing I can think of is the serial port carrying your telemetry data.
I have had this problem several times and only one thing has worked: re-uploading the firmware or doing a clear/reset, and then it’s done.

Happy flying :slightly_smiling_face:


@Moksh I have reloaded firmware and flashed between versions a few times, but always Rover code. I might try flashing Plane then Rover to clear everything out. Thanks.

Had a look at the logs.

Sadly, the only log showing a WDOG message was a 4.0.x log - and that’s missing a critical entry which would allow me to track things.

… and no MON messages in any of the logs.

Does disabling GPS-yaw make the problem go away?

I note that in 0000007.BIN the GPS wasn’t redetected…

This is probably related:

Moving it to the Rover probably changed the power setup…

The only hardware change in any of this was to replace the telemetry radios with Holybro-branded 915MHz units. The only things powered by the Pixhawk are the telemetry radio, the R/C receiver, and the two speed controllers on the PWM pins. Both F9Ps are powered from regulated 12V from a BEC fed by the system supply. The Pixhawk is powered from a regular power module on the system supply. The drivetrain is powered from a 24V 30A DC converter.

I presume the reason that the GPS wasn’t redetected in 007.bin might be that the configuration settings for GPS are type 17/18 in 4.1.0 for the yaw configuration - is Ublox Moving Base autoconfiguration configured to work on 4.0.0 using those settings for GPS type? I believe that 007.bin was after the reflash to 4.0.0?

Here is the general layout of the system right now for reference.

And the current machine layout. Electronics are floating in foam in the watertight box.

And the full configuration is something like this


Is your problem still there?

I just went out to the Rover and did the following:

  1. USB connect to Pixhawk, Rover power OFF
  2. Save Configuration to File
  3. Load Firmware -> Plane 4.0.7 Official
  4. Reboot
  5. Load Firmware -> Custom Firmware -> Rover 4.1.0 Latest
  6. Reboot
  7. USB connect to Rover, sync params
  8. Config -> Param tree -> Load from File (Enable changed, resync params)
  9. Reboot
  10. Power on Rover (GPS, drive system)
  11. Disconnect USB, connect telemetry radio, connect to Rover via telemetry
  12. Power down, power up Rover
  13. Collect BIN files

After flashing Plane to the Pixhawk I saw NO NEW ‘Internal Error’ messages.

BIN files for this series of actions are located here

@peterbarker - I suspect the GPS was not detected in 007.bin because I had probably connected via USB with main power to the Rover turned off. That would have kept power from reaching the GPS units and left EKF3 looking. USB power would have been feeding the Pixhawk, the R/C receiver and the telemetry radio, which may have dragged the rail voltage down.

It looks like I spoke too soon.

I disabled GPS yaw. Set Serial 4 to Protocol 0, set GPS 2 to Type 0 from 18.
Set GPS 1 to type 2 from 17. Rebooted.
Went to enable the internal compass and it was registered as MISSING. The LSM303D on the SPI bus is not recognized as present.
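For anyone following along, the change set above can be written down as a before/after comparison. This is only a sketch: the parameter names GPS_TYPE, GPS_TYPE2 and SERIAL4_PROTOCOL are my mapping of "GPS 1 type", "GPS 2 type" and "Serial 4 protocol", the earlier SERIAL4_PROTOCOL value of 5 is assumed because it was not stated, and `param_changes` is a hypothetical helper.

```python
def param_changes(before, after):
    """Return {name: (old, new)} for every parameter whose value differs."""
    names = sorted(set(before) | set(after))
    return {
        name: (before.get(name), after.get(name))
        for name in names
        if before.get(name) != after.get(name)
    }


# GPS yaw configuration (SERIAL4_PROTOCOL = 5 is an assumed earlier value)
before = {"GPS_TYPE": 17, "GPS_TYPE2": 18, "SERIAL4_PROTOCOL": 5}
# GPS yaw disabled, values as described in the post
after = {"GPS_TYPE": 2, "GPS_TYPE2": 0, "SERIAL4_PROTOCOL": 0}

for name, (old, new) in param_changes(before, after).items():
    print(f"{name}: {old} -> {new}")
```

The same helper could compare two saved parameter files once they are loaded into dictionaries, which makes it easier to confirm that nothing else changed between a working and a failing configuration.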

See also PreArm: Internal errors 0x8000 l:404 main_loop_stk after upgrading to latest 4.1.0-dev

@peterbarker Is there something I can do to help dig into this further? Any debug code to enable for a local build, etc?