Memory error after GCS failsafe

Hello all,

While testing ArduSub, I ran into what looks like a memory error.

You can reproduce the error with the following steps:

  1. add a location in location.txt

  2. execute SITL
    /Tools/autotest/ -v ArduSub --console -L newyork -D --gdb

  3. set the following parameters
    param set FENCE_ENABLE 1
    param set FENCE_TYPE 7
    param set FENCE_RADIUS 10000
    param set FS_GCS_ENABLE 2

  4. arming

  5. fence breach

  6. mode circle

  7. set heartbeat 0 (to trigger GCS failsafe)

  8. set heartbeat 1

  9. arming

Test version: master

Edit: there is a simpler way to trigger this issue.

  1. run a mission in Auto mode
  2. disarm
  3. arm

I spent a little time looking into this and I think I found the issue and a fix. PR here

When Mode:auto_loiter becomes active, the controllers are not updating their last-called time. Then, upon starting the mission, because the controllers have not been recently initialized, we get a nice SITL PANIC about being uninitialized.
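To make the failure mode a bit more concrete, here is a toy Python model of that kind of "was I called recently?" check. This is only a sketch and not ArduPilot code: the class name, the method names, and the 100 ms timeout are all made up for illustration.

```python
class PosControlSketch:
    """Toy model of a controller that expects to be called regularly."""

    TIMEOUT_US = 100_000  # assumed staleness threshold, not taken from ArduPilot

    def __init__(self):
        # None means init() has never been called.
        self.last_update_us = None

    def init(self, now_us):
        """Reset the controller and record when that happened."""
        self.last_update_us = now_us

    def is_active(self, now_us):
        """True only if the controller was initialised or run recently."""
        if self.last_update_us is None:
            return False
        return now_us - self.last_update_us <= self.TIMEOUT_US

    def update(self, now_us):
        """Run one step; refuse to run on stale state (stands in for the panic)."""
        if not self.is_active(now_us):
            raise RuntimeError("controller run without a recent init")
        self.last_update_us = now_us
```

If a mode becomes active without refreshing the timestamp, the first `update()` after a long gap hits the error branch, which is roughly the situation described here.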

BTW if you leave off the --gdb, the stack trace will be printed and captured, showing the line number of the panic:

    /Tools/autotest/ -v ArduSub --console -L newyork -D

You can then share the offending lines in the screenshot above :slight_smile: You can attach GDB after a panic as well.

Hello @hendjosh,
Thanks for your work!

I have tested the PR, and it seems to resolve part of this issue.
Unfortunately, I can still trigger the issue with the following steps:

  1. Circle mode
  2. disarm
  3. arm

Thanks for the testing!

I had a feeling I should check all the other modes too… but I was a bit lazy after looking into guided mode and guided seemingly being ok.

This really calls for adding a bunch of autotests to sub that are missing currently for some of its modes.

If you feel like being extra awesome and checking any other modes, let me know. :slight_smile:

Hello @hendjosh,

Absolutely, I'd like to check the other modes!
After repeatedly testing the PR on SITL, I can only reproduce this issue in circle mode.

I found another memory error, but I guess that it is not related to controller initialization.
When a waypoint uses a “Terrain” reference frame, it also leads to a memory error…
I know that using the terrain reference frame is unusual in ArduSub. I just wanted to let you know, just in case :slight_smile:

Note that my PR only touched auto mode, so it won't help in other spots.

For the terrain issue, send along the stack trace and the pertinent reproduction details and I will see if I’m able to figure it out.

Otherwise feel free to submit your own PRs if you figure it out.

The steps to trigger the issue:

  1. ./Tools/autotest/ -v ArduSub --console -L newyork -D -w --speedup=4

  2. upload and run this mission

  3. You can probably see the error as shown below.

This is the stack trace.

Check the PR above again for the fix for circle_mode.

I could not replicate the Guided_mode error in pos_control_run() with your steps, but I didn’t spend enough time on it.

In the PR I also included a change for guided_mode, where your internal error comes from, for something that didn’t look right. But I don’t think it is the full fix, since I can’t replicate the error. There might be something else going on here.

Hello @hendjosh,

Thanks for the commit :slight_smile:
It seems that the PR has fixed the controller initialization issue for circle mode.
I noticed that the guided_mode issue can be reproduced only if you use the QGC daily version (048081c44).

Huh, that is weird about the QGC thing. I don’t think that has anything to do with the issue, but of course I’m wrong a lot.

I managed to trigger an error in control_guided using mavproxy

  1. start sitl sub as usual
  2. mode guided
  3. arm throttle
  4. Get an internal error
    I still haven’t figured out why this one occurs or how to fix it. It is not quite like the others.

Are you using master + my small PR only? I noticed before that your mission gives the following output:
BAD NAV AltFrame, because WP2 is not in REL_ALT (the only accepted frame, it seems?)

Also any chance you see a bunch of EKF lane switches in SITL? Totally separate issue. But I rarely use sub sitl so there may be settings I need.

It seems likely to me that the control_guided issue occurs only if SITL's simulation speed is increased through the --speedup option.
The reason is that I could not reproduce the issue when I used --speedup=1.

Are you using master + my small PR only?

Yes, I tested master + your PR.

In particular, I saw a bunch of EKF lane switches (0/1) after the vehicle finished the mission.

Ahh, you are right about the speedup.

I created an issue for this here, as I couldn’t find the answer after spending some time on this one. Issue

I have tested Master and ArduSub 4.1 and it seems that the control_guided issue is resolved.
Thanks for your effort!

While testing I found another issue :frowning:
You can probably reproduce the issue with the following steps:

  1. Add a location in location.txt

  2. Execute SITL
    /Tools/autotest/ -v ArduSub --console -L newyork -D --gdb --speedup=4

  3. set the following parameters
    param set FENCE_ENABLE 1
    param set FENCE_TYPE 7
    param set FENCE_RADIUS 10000
    param set FS_GCS_ENABLE 2

  4. Upload and run this mission

This issue only happens if SITL's simulation speed is increased through the --speedup option, as below.
Master: --speedup=6
ArduSub 4.1: --speedup=4

It seems like it is the same issue linked above…

OK, I was able to reproduce here - great instructions, thanks.

The only thing I had to do differently to the above instructions was to “mode guided” before “mode auto”. Also turning off the GCS failsafe caused less pain…

@Leonardthall this is another case where the sanity check that we’ve been calling the position controller regularly is failing: we’ve started to call the position controller again after a prolonged period of time, but nobody called init.
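One way to picture "nobody called init" is a guard at the top of a mode's run step that re-initialises the controller whenever it has gone stale. The sketch below is just that idea in Python; `run_mode_step` and the 100 ms timeout are hypothetical, and this is not necessarily what any of the PRs actually do.

```python
def run_mode_step(last_update_us, now_us, timeout_us=100_000):
    """Return (new_last_update_us, reinitialised) for one run step.

    last_update_us is None if the controller was never initialised.
    timeout_us is an assumed staleness threshold, not an ArduPilot value.
    """
    stale = last_update_us is None or now_us - last_update_us > timeout_us
    # When stale, a real mode would reset the controller state here
    # before running it; this sketch only reports that a reset was needed.
    return now_us, stale
```

With a guard like this, a long gap between calls leads to a re-init instead of tripping the sanity check.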

Yeah, my reproduction is mode guided, then arm, and bam, you hit the error.

However, my trouble with this one is that I think we are already keeping the xy_controller initialized via the call to wp_nav.wp_and_spline_init(). If you follow it down, it eventually calls AC_PosControl::init_xy_controller(), which sets _last_update_xy_us.

My current theory…
I think there is an extra delay when we arm, which means it takes too long between the last init_xy() and the call to is_active_xy().
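A rough back-of-the-envelope version of this theory, in Python. The 2500 us loop period and the 100 ms timeout are placeholder numbers I picked for illustration, not values read from the ArduPilot source, and `still_active_after_gap` is a hypothetical helper.

```python
def still_active_after_gap(skipped_loops, loop_period_us=2_500, timeout_us=100_000):
    """Return True if a controller initialised just before a gap of
    skipped_loops loop periods would still count as recently updated."""
    gap_us = skipped_loops * loop_period_us
    return gap_us <= timeout_us


# A short stall of 10 loops (25 ms) stays inside the assumed timeout.
short_gap = still_active_after_gap(skipped_loops=10)

# A longer stall of 100 loops (250 ms), e.g. during arming, does not.
long_gap = still_active_after_gap(skipped_loops=100)
```

If arming really does stall things for longer than the timeout, the init done before arming would no longer count as recent by the time the check runs.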