SITL error on MacOS Monterey 12.1 [almost SOLVED]

Unfortunately, there are very few devs who use macOS. (I’m not one of them; I use WSL2 on Windows.)

The floating-point exception is interesting: usually a stack trace is printed, which would let us track down what happened, and the same information is written to the dumpstack.sh_nknown.5365.out file. Unfortunately, it looks like that failed?

I’m not sure how comfortable you are with debugging, but if you attach a debugger and then reboot from MAVProxy, you may be able to find the offending line of code.

The "EOF on TCP socket" message at the end is most likely caused by the floating-point exception.

Not much with gdb. I’ve seen two problems:

With both solved, this is the information provided:
% sim_vehicle.py -v ArduCopter -f quad

ERROR: Floating point exception - aborting
Running: sh dumpstack.sh 1693 >dumpstack.sh_nknown.1693.out 2>&1
dumpstack.sh has been run. Output was:
-------------- begin dumpstack.sh output ----------------
[New Thread 0x1b03 of process 1693]
[New Thread 0x2303 of process 1693]
[New Thread 0x2403 of process 1693]
Error calling thread_get_state for GP registers for thread 0x1b03

warning: Mach error at "…/…/gdb/i386-darwin-nat.c:132" in function "virtual void i386_darwin_nat_target::fetch_registers(struct regcache *, int)": (os/kern) invalid argument (0x4)

warning: unhandled dyld version (17)
0x00007ff8073fd3da in ?? ()
#0 0x00007ff8073fd3da in ?? ()
No symbol table info available.
#1 0x00007ff8073128a9 in ?? ()
No symbol table info available.
#2 0x0000000000000000 in ?? ()
No symbol table info available.

Thread 3 (Thread 0x2403 of process 1693):
#0 0x00007ff807403f12 in ?? ()
No symbol table info available.
#1 0x00007ff80737da8e in ?? ()
No symbol table info available.
#2 0x0000000000000001 in ?? ()
No symbol table info available.
#3 0x0000000000000000 in ?? ()
No symbol table info available.

Thread 2 (Thread 0x2303 of process 1693):
#0 0x00007ff8073fd3da in ?? ()
No symbol table info available.
#1 0x00007ff8073128a9 in ?? ()
No symbol table info available.
#2 0xe5d1eb68c6a3003e in ?? ()
No symbol table info available.
#3 0x00007000089f7c40 in ?? ()
No symbol table info available.
#4 0x000000000000000b in ?? ()
No symbol table info available.
#5 0x000000010ca1e5f0 in ?? ()
No symbol table info available.
#6 0x000000010ca1d490 in ?? ()
No symbol table info available.
#7 0x0000000002b65489 in ?? ()
No symbol table info available.
#8 0x00007000089f7bf0 in ?? ()
No symbol table info available.
#9 0x00007ff8073127df in ?? ()
No symbol table info available.
#10 0x0000000000000000 in ?? ()
No symbol table info available.

Thread 1 (Thread 0x1b03 of process 1693):
#0 0x00007ff8073fd3da in ?? ()
No symbol table info available.
#1 0x00007ff8073128a9 in ?? ()
No symbol table info available.
#2 0x0000000000000000 in ?? ()
No symbol table info available.
A debugging session is active.

        Inferior 1 [process 1693] will be detached.

Quit anyway? (y or n) [answered Y; input not from terminal]
-------------- end dumpstack.sh output ----------------
Running: sh dumpcore.sh 1693 >dumpcore.sh_nknown.1693.out 2>&1
dumpcore.sh has been run. Output was:
-------------- begin dumpcore.sh output ----------------
[New Thread 0x1a03 of process 1693]
[New Thread 0x1b03 of process 1693]
[New Thread 0x2303 of process 1693]
Error calling thread_get_state for GP registers for thread 0x1a03

warning: Mach error at "…/…/gdb/i386-darwin-nat.c:132" in function "virtual void i386_darwin_nat_target::fetch_registers(struct regcache *, int)": (os/kern) invalid argument (0x4)

warning: unhandled dyld version (17)
0x00007ff8073fd3da in ?? ()
/tmp/gdb.1720:2: Error in sourced command file:
Can’t create a corefile
-------------- end dumpcore.sh output ----------------
zsh: abort -S --model + --speedup 1 --slave 0 --defaults -I0

Please enlighten me; perhaps I could add some prints before the possible calls to dumpstack.sh?

Somewhere around here might be a good spot for a breakpoint: ardupilot/libraries/AP_HAL_SITL/SITL_cmdline.cpp at d62e946d484b68c12ba96a0ca55d2bbb64e923a3 (ArduPilot/ardupilot on GitHub)

You should be able to attach the debugger after the floating point exception has occurred.

You also need to build with debug symbols, using the -D option:
./sim_vehicle.py -v ArduCopter -f quad -D
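Attaching the debugger after the FPE only works if the process is still alive at that point. A minimal sketch of one way to keep it alive on a POSIX system: a hypothetical SIGFPE handler (install_wait_for_debugger() is not part of ArduPilot, and this is not how SITL's own handler works) that prints the process id and then blocks, so gdb can be attached to the faulting process.

```cpp
#include <signal.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

namespace {
// The hint is prepared before any fault, because snprintf is not async-signal-safe.
char g_hint[80];
std::size_t g_hint_len = 0;

// SIGFPE handler: print the hint, then block so the faulting state stays alive.
void wait_for_debugger(int) {
    ssize_t written = write(STDERR_FILENO, g_hint, g_hint_len);
    (void)written;
    for (;;) {
        pause();
    }
}
}  // namespace

// Hypothetical helper: formats the "attach gdb" hint for a given process id.
int format_attach_hint(long pid, char *buf, std::size_t size) {
    return std::snprintf(buf, size, "SIGFPE caught; attach with: gdb -p %ld\n", pid);
}

// Installs the handler; returns false if the hint does not fit or sigaction fails.
bool install_wait_for_debugger() {
    const int n = format_attach_hint(static_cast<long>(getpid()), g_hint, sizeof g_hint);
    if (n <= 0 || static_cast<std::size_t>(n) >= sizeof g_hint) {
        return false;
    }
    g_hint_len = static_cast<std::size_t>(n);
    struct sigaction sa {};
    sa.sa_handler = wait_for_debugger;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGFPE, &sa, nullptr) == 0;
}
```

Calling install_wait_for_debugger() early in main() and then running gdb -p <pid> from another terminal, followed by bt full, should show the faulting frame together with its local variables, provided the binary was built with -D.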

I know this isn’t likely to be the popular answer, but it might save you some frustration.

Maybe you could run a Linux build environment in a VM, host SITL in that VM, and use network passthrough to connect your MacOS GCS software.


This was a good workaround. I did the following:

  • % sim_vehicle.py -v ArduCopter -f quad -D: 869 modules compiled. Surprisingly, no FPE.
  • % sim_vehicle.py -v ArduCopter -f quad: 869 modules compiled. FPE.
  • % sim_vehicle.py -v ArduCopter -f quad -D: 869 modules compiled. No FPE.
  • % sim_vehicle.py -v ArduCopter -f quad -D: no FPE.
  • % sim_vehicle.py -v ArduCopter -f quad -D -l 33.74111111,-118.374600,44,270 -t 33.74111111,-118.374500,44,270 --use-dir=BigW --console -m "--cmd=\"module load misseditor;module load graph;graph RANGEFINDER.distance;param set DISARM_DELAY 0;calpress;batreset\"" (more complex): no FPE. I could connect Mission Planner (MP) from another Windows PC and run an endless mission without problems.

So, to locate the failure without symbols, dumpcore.sh could be modified (with echos or similar) to give a clue about where it happens.
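Along the same lines, the process itself can print a raw call stack when SIGFPE arrives, without a full debug build. A minimal sketch using backtrace() and backtrace_symbols_fd() from <execinfo.h> (available in glibc and on macOS); the function names are hypothetical and this is not ArduPilot's implementation. Function names only appear where the binary's symbol table provides them (on Linux, typically only when linked with -rdynamic); otherwise the output is raw addresses, which can be mapped back later, e.g. with addr2line on Linux or atos on macOS.

```cpp
#include <execinfo.h>
#include <signal.h>
#include <unistd.h>
#include <cassert>

// Writes the current call stack to fd, one frame per line; returns the frame count.
int write_backtrace(int fd) {
    void *frames[64];
    const int n = backtrace(frames, 64);
    backtrace_symbols_fd(frames, n, fd);
    return n;
}

namespace {
void print_stack_on_fpe(int sig) {
    static const char msg[] = "SIGFPE: call stack follows\n";
    ssize_t written = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)written;
    write_backtrace(STDERR_FILENO);
    // Restore the default action and re-raise, so the process still aborts as before.
    signal(sig, SIG_DFL);
    raise(sig);
}
}  // namespace

// Installs the handler; returns false if sigaction fails.
bool install_fpe_backtrace() {
    void *warmup[1];
    backtrace(warmup, 1);  // the first call may load the unwinder; do it outside the handler
    struct sigaction sa {};
    sa.sa_handler = print_stack_on_fpe;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGFPE, &sa, nullptr) == 0;
}
```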

Is there a scientific explanation of why it works when compiled with debugging symbols?
(These things happen.)

Since then I have been using SITL on Ubuntu on another PC, but -D may be a good workaround for SITL on macOS for now.

Hmm… building with debugging symbols fixes the issue? That shouldn’t be the case… but compilers are a magic mystery box to me.

One thought: are you using clang to compile SITL?

If you want to create an issue on GitHub for it, feel free.

And I assume all of your examples run fine without issues?

I’m not familiar with clang. While build/sitl/bin/arducopter is being built, in another window:
% ps ax|grep clang
2297 s003 S+ 0:00.03 /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang++ …
2300 s003 R+ 0:01.61 /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang …
So yes, it is clang from Apple’s Xcode, which was upgraded along with the Mojave-to-Monterey upgrade.

What happens is still too vague.
To understand it better, instead of running sim_vehicle.py, I ran build/sitl/bin/arducopter directly in one window (compiled without symbols, i.e. without -D):
% build/sitl/bin/arducopter -S --model quad
Suggested EK3_BCOEF_* = 16.288, EK3_MCOEF = 0.209
Starting sketch ‘ArduCopter’
Starting SITL input
Using Irlock at port : 9005
bind port 5760 for 0
Serial port 0 on TCP port 5760
Waiting for connection …
It waits forever.

Running mavproxy in another window:
% mavproxy.py --out 127.0.0.1:14550 --out 127.0.0.1:14551 --master tcp:127.0.0.1:5760 --sitl 127.0.0.1:5501
Connect tcp:127.0.0.1:5760 source_system=255
Log Directory:
Telemetry log: mav.tlog
Waiting for heartbeat from tcp:127.0.0.1:5760
MAV> online system 1
Mode(0)> Mode Mode(0)
APM: Barometer 1 calibration complete
APM: Barometer 2 calibration complete
Init Gyro**APM: ArduCopter V4.2.0-dev (97fee2d1)
APM: Mac-mini.local
APM: Frame: UNSUPPORTED
APM: ArduPilot Ready
APM: AHRS: DCM active
Received 1254 parameters (ftp)
Saved 1254 parameters to mav.parm
fence present
APM: EKF3 IMU0 buffs IMU=19 OBS=7 OF=17 EN:17 dt=0.0120
APM: EKF3 IMU1 buffs IMU=19 OBS=7 OF=17 EN:17 dt=0.0120
APM: EKF3 IMU0 initialised
APM: EKF3 IMU1 initialised
APM: AHRS: EKF3 active
APM: EKF3 IMU0 tilt alignment complete
APM: EKF3 IMU1 tilt alignment complete
APM: EKF3 IMU0 MAG0 initial yaw alignment complete
APM: EKF3 IMU1 MAG0 initial yaw alignment complete
APM: GPS 1: detected as u-blox at 230400 baud
APM: PreArm: Check firmware or FRAME_CLASS
APM: PreArm: 3D Accel calibration needed
APM: EKF3 IMU1 origin set
APM: EKF3 IMU0 origin set
APM: EKF3 IMU0 is using GPS
APM: EKF3 IMU1 is using GPS

On the build/sitl/bin/arducopter window:
Connection on serial port 5760
bind port 5762 for 2
Serial port 2 on TCP port 5762
bind port 5763 for 3
Serial port 3 on TCP port 5763
Home: -35.363262 149.165237 alt=584.000000m hdg=353.000000
Smoothing reset at 0.001
validate_structures:469: Validating structures

About 30 seconds later:
On the mavproxy window:
no link
link 1 down
no link
no link
no link

On the build/sitl/bin/arducopter window:
ERROR: Floating point exception - aborting
Running: sh Tools/scripts/dumpstack.sh 3979 >dumpstack.sh_nknown.3979.out 2>&1

It seems so, with -D. I have some SITL tests to do, and I’ll do them with -D, as well as pulling new versions with git. I also keep older versions, and the problem is the same on them after the Mojave-to-Monterey upgrade (I haven’t tested -D on them).

Also, I found that MAVProxy could be upgraded. After doing so (1.8.34 to 1.8.46), along with other minor checks and upgrades, nothing changed.

BTW, about the two-window check without -D (one window running build/sitl/bin/arducopter and the other running mavproxy.py …): if you don’t run mavproxy.py at all (or just don’t open the second window), but instead connect over TCP from another PC on the network, for example with MP (mavproxy.py is a GCS, just like MP), the FPE still occurs after about 30 seconds and MP stops communicating. So the problem is not in mavproxy.py.

The floating point exception error is happening in the C++ code. Sorry I wasn’t clear on that.

The issue is reproducing it and finding the offending line in the C++ code.

This documentation describes how to debug using VS Code: "Debugging with GDB using VSCode" in the ArduPilot Dev documentation.

Of course, you will need to adapt the installation steps for macOS (which I think you have done already), and be aware that some directions may need to be modified for your OS (that section was written for Linux/WSL).

Nice. I’ll try it, but it will take time to understand fully.

Meanwhile, here is the two-window check without symbols (no -D), with arducopter running under gdb:
% gdb build/sitl/bin/arducopter
GNU gdb (GDB) 11.1
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-apple-darwin21.1.0".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
https://www.gnu.org/software/gdb/bugs/.
Find the GDB manual and other documentation resources online at:
http://www.gnu.org/software/gdb/documentation/.
For help, type "help".
Type "apropos word" to search for commands related to "word"…
Reading symbols from build/sitl/bin/arducopter…
(No debugging symbols found in build/sitl/bin/arducopter)
(gdb) set args -S --model quad
(gdb) r
Starting program: /Users/xxxxxx/Documents_outofthecloud/Desarrollo/Pythonadas/ardupilot/build/sitl/bin/arducopter -S --model quad
[New Thread 0x1803 of process 2288]
[New Thread 0x1b03 of process 2288]
warning: unhandled dyld version (17)
Suggested EK3_BCOEF_* = 16.288, EK3_MCOEF = 0.209
Starting sketch ‘ArduCopter’
Starting SITL input
Using Irlock at port : 9005
bind port 5760 for 0
Serial port 0 on TCP port 5760
Waiting for connection …
Connection on serial port 5760
bind port 5762 for 2
Serial port 2 on TCP port 5762
bind port 5763 for 3
Serial port 3 on TCP port 5763
Home: -35.363262 149.165237 alt=584.000000m hdg=353.000000
Smoothing reset at 0.001
validate_structures:469: Validating structures
[New Thread 0x1907 of process 2288]
[New Thread 0x2203 of process 2288]
Thread 2 received signal SIGFPE, Arithmetic exception.
0x000000010004c794 in AP_Declination::get_mag_field_ef(float, float, float&, float&, float&) ()
(gdb)

Recall that all this started after upgrading directly from Mojave to Monterey, which is difficult to relate to magnetic fields.

There are a few divisions there involving SAMPLING_RES. How could I insert code there (something like printf) so that its output appears when running build/sitl/bin/arducopter?
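One way to find which division misbehaves, without stopping the program, is to test the floating-point status flags around each suspect expression. A sketch using the standard <cfenv> functions; fp_flags_raised() and report_fp_flags() are hypothetical helpers, not ArduPilot code. This only helps if floating-point traps are disabled in that build: with traps enabled, SIGFPE fires at the faulting instruction before any check runs. Compilers also do not always keep FP operations ordered relative to fenv calls, so treat the result as a hint rather than proof.

```cpp
#include <cassert>
#include <cfenv>
#include <cstdio>

// Hypothetical helper: clears the FP status flags, runs fn, and returns which of
// the divide-by-zero / invalid / overflow flags it raised (0 if none).
template <typename Fn>
int fp_flags_raised(Fn &&fn) {
    std::feclearexcept(FE_ALL_EXCEPT);
    fn();
    return std::fetestexcept(FE_DIVBYZERO | FE_INVALID | FE_OVERFLOW);
}

// printf-style report to stderr, i.e. the terminal running build/sitl/bin/arducopter.
void report_fp_flags(const char *func, int line, int flags) {
    if (flags == 0) {
        return;
    }
    std::fprintf(stderr, "[%s:%d] FP flags:%s%s%s\n", func, line,
                 (flags & FE_DIVBYZERO) ? " DIVBYZERO" : "",
                 (flags & FE_INVALID) ? " INVALID" : "",
                 (flags & FE_OVERFLOW) ? " OVERFLOW" : "");
}
```

Inside get_mag_field_ef() it could be used as, for example, report_fp_flags(__func__, __LINE__, fp_flags_raised([&] { lat_cells = floorf(latitude_deg / SAMPLING_RES); })); where lat_cells is a hypothetical local float.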

If you can get the full output of that stack trace, it should give a line number for the error and the values of the local variables in the function.

That gets us one step closer.

BTW, is this on master? What is the output of git status?

printf calls are accepted and produce output.
So, on master as of 2022-01-03 (870 modules compiled, one more than before), I modified AP_Declination.cpp as follows:

uint32_t tracegg = 0;  // call counter, printed as the last field of the trace
bool AP_Declination::get_mag_field_ef(float latitude_deg, float longitude_deg, float &intensity_gauss, float &declination_deg, float &inclination_deg)
{
    bool valid_input_data = true;
    // trace line; intensity_gauss, declination_deg and inclination_deg are outputs,
    // so at this point they still hold whatever the caller passed in
    printf("%d %f %f %f %f %f %u\n", (int)SAMPLING_RES, latitude_deg, longitude_deg, intensity_gauss, declination_deg, inclination_deg, (unsigned)tracegg++);
    /* round down to nearest sampling resolution */
    int32_t min_lat = static_cast<int32_t>(static_cast<int32_t>(floorf(latitude_deg / SAMPLING_RES)) * SAMPLING_RES);

No FPE (in fact, with any printf inserted no FPE appears), and even a mission can be started:


But I don’t know whether intensity_gauss being 0.00000 has any effect on the FPE (apparently it is not used as a divisor).
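Even without a division, the rounding step shown above can fault: in C++, converting a float whose truncated value does not fit in the target integer type (including NaN and infinity) is undefined behaviour, and on x86 the conversion instruction signals an invalid-operation exception, which becomes SIGFPE if that exception is unmasked. Whether that is what happens here is not established in this thread. The sketch below is a hypothetical checked version of the min_lat computation, useful only for testing that hypothesis; fits_int32() and round_down_to_res() are not ArduPilot functions.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// True if v can be converted to std::int32_t without undefined behaviour:
// it must be finite and its truncated value must lie in [INT32_MIN, INT32_MAX].
bool fits_int32(float v) {
    return std::isfinite(v) && v >= -2147483648.0f && v < 2147483648.0f;
}

// Hypothetical checked "round down to nearest sampling resolution":
// returns false instead of performing an invalid float-to-int conversion.
bool round_down_to_res(float deg, float res, std::int32_t &out) {
    if (!(res > 0.0f)) {  // also rejects NaN
        return false;
    }
    const float cells = std::floor(deg / res);
    if (!fits_int32(cells)) {
        return false;
    }
    const float snapped = static_cast<float>(static_cast<std::int32_t>(cells)) * res;
    if (!fits_int32(snapped)) {
        return false;
    }
    out = static_cast<std::int32_t>(snapped);
    return true;
}
```

With the SITL home position from the logs (-35.363262, 149.165237) and a resolution of 10, this gives -40 and 140.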

Hey @Webillo, I saw a similar error when I started contributing to ArduPilot. When I looked for the cause, it turned out to be a compiler issue: the gcc/g++ on a Mac is actually clang in disguise. I installed gcc/g++ using Homebrew and fixed some aliases in /usr/local/bin to get it working properly. I haven’t seen the error since.

Thanks a lot; this makes sense. Meanwhile, I have been compiling with -D, and simulations have worked (without -D, the FPE still appears).

Given how many years Xcode has been in use, this is very surprising. Don’t Apple people test and debug?

I have gcc/g++ from Homebrew installed, but when I try to use them I get compilation errors; I’ll look into it later.

Now, about what appears at the start of the build output:

Checking for ‘g++’ (C++ compiler) : not found
Checking for ‘clang++’ (C++ compiler) : /usr/bin/clang++
Checking for ‘gcc’ (C compiler) : not found
Checking for ‘clang’ (C compiler) : /usr/bin/clang

it seems clear that it compiles with Apple clang, which for some reason produces the FPE without -D.

But both Apple gcc and g++ exist, and for some reason they are not found:
% file `which gcc`
/usr/bin/gcc: Mach-O universal binary with 2 architectures: [x86_64:Mach-O 64-bit executable x86_64] [arm64e:Mach-O 64-bit executable arm64e]
/usr/bin/gcc (for architecture x86_64): Mach-O 64-bit executable x86_64
/usr/bin/gcc (for architecture arm64e): Mach-O 64-bit executable arm64e
% file `which g++`
/usr/bin/g++: Mach-O universal binary with 2 architectures: [x86_64:Mach-O 64-bit executable x86_64] [arm64e:Mach-O 64-bit executable arm64e]
/usr/bin/g++ (for architecture x86_64): Mach-O 64-bit executable x86_64
/usr/bin/g++ (for architecture arm64e): Mach-O 64-bit executable arm64e

On build/config.log appears:
from /Users/xxxxxx/Documents_outofthecloud/Desarrollo/Pythonadas/ardupilot: Could not find gcc/g++ (only Clang), if renamed try eg: CC=gcc48 CXX=g++48 waf configure
not found

That doesn’t make much sense in my configuration, but running this instead from the ardupilot directory:
CC=gcc CXX=g++ ./waf configure
gives:

Checking for ‘g++’ (C++ compiler) : not found
Checking for ‘clang++’ (C++ compiler) : g++
Checking for ‘gcc’ (C compiler) : not found
Checking for ‘clang’ (C compiler) : gcc

so this may suggest a workaround so that it compiles with Apple gcc/g++ instead of Apple clang/clang++.

How can I make the build use the CC=gcc CXX=g++ definitions?

I am not sure what errors you are talking about (please share screenshots so it is clearer). I have been compiling with Homebrew gcc/g++ since I began building ArduPilot binaries, and it has never given unexpected errors. Also, AFAIK there is no such thing as Apple gcc: the gcc and g++ shipped with our MacBooks are actually just launchers for clang. It’s clang that works under the hood.
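Since the name of the command does not tell you which compiler is behind it, one way to double-check what actually built a binary is to report the compiler's predefined macros at startup. A minimal sketch; the function names are hypothetical and ArduPilot does not necessarily provide anything like this. built_optimized() is included because a -D build and a normal build may differ in more than just debug symbols.

```cpp
#include <cassert>
#include <cstring>

// Which compiler built this translation unit, from predefined macros.
// __clang__ is tested before __GNUC__ because clang (including Apple's
// /usr/bin/gcc and /usr/bin/g++ launchers) also defines __GNUC__.
const char *compiler_family() {
#if defined(__apple_build_version__)
    return "Apple clang";
#elif defined(__clang__)
    return "clang";
#elif defined(__GNUC__)
    return "GCC";
#else
    return "unknown";
#endif
}

// Version string reported by the compiler (GCC and clang both define __VERSION__).
const char *compiler_version() {
#if defined(__VERSION__)
    return __VERSION__;
#else
    return "unknown";
#endif
}

// True if this file was compiled with optimization enabled (any level other than -O0).
bool built_optimized() {
#if defined(__OPTIMIZE__)
    return true;
#else
    return false;
#endif
}
```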

I mean that when I modify PATH so that compilation uses gcc/g++ from Homebrew, I get compilation errors that I still have to look into (there are too many of them).

By Apple gcc/g++ I mean the ones on a normal Apple installation, in my case a 2014 Mac mini (Intel). Anyway, I tried commenting out two lines in modules/waf/waflib/Tool/c_config.py, which produced:

Autoconfiguration : enabled
Setting board to : sitl
Using toolchain : native
Checking for ‘g++’ (C++ compiler) : /usr/bin/g++
Checking for ‘gcc’ (C compiler) : /usr/bin/gcc
Checking for c flags ‘-MMD’ : yes
Checking for cxx flags ‘-MMD’ : yes
CXX Compiler : g++ 13.0.0

but (without -D) the build, although successful, produced a lot of warnings, and in the end the same FPE appeared, so no change.

I can tell you how I got it to compile with Homebrew gcc.
There are aliases in /usr/local/bin pointing to the preinstalled g++ and gcc. I replaced them with aliases pointing to /usr/local/Cellar/gcc/11.2.0/bin/g++-11 and /usr/local/Cellar/gcc/11.2.0/bin/gcc-11 for g++ and gcc respectively, and it has compiled with these ever since. You can try it and see if it works. It’s simple and straightforward: no changing paths or altering files.
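To confirm where such an alias or symlink really leads, the chain can be resolved programmatically. A minimal sketch with C++17 std::filesystem::canonical(), which follows every symlink in the path; resolve_link() is a hypothetical helper. From a shell, ls -l on the link gives the same information one hop at a time.

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: follows every symlink in path p and returns the final
// target, or an empty string if p does not exist or cannot be resolved.
std::string resolve_link(const std::string &p) {
    std::error_code ec;
    const fs::path target = fs::canonical(p, ec);
    return ec ? std::string() : target.string();
}
```

If the links are set up as described above, resolve_link("/usr/local/bin/g++") should return a path under /usr/local/Cellar/gcc/11.2.0/bin/.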

Thanks. In fact, what happened is that I had Homebrew gcc/g++ installed while on Mojave, and later upgraded directly to Monterey. gcc/g++ appeared installed, but a lot of headers were missing when compiling. So I did
% brew reinstall gcc
and added a couple of symbolic links in /usr/local/bin. The configure step now shows:

Setting board to : sitl
Using toolchain : native
Checking for ‘g++’ (C++ compiler) : /usr/local/bin/g++
Checking for ‘gcc’ (C compiler) : /usr/local/bin/gcc
Checking for c flags ‘-MMD’ : yes
Checking for cxx flags ‘-MMD’ : yes
CXX Compiler : g++ 11.2.0

There are far fewer warnings in the build now, and it seems to work very well. Thanks a lot.

So for MacOS Monterey (probably also Catalina and Big Sur):

  • If using the Apple-provided toolchain (Xcode), use -D for now as a workaround.
  • Better: install/reinstall Homebrew gcc/g++, as you say.