Unfortunately, very few devs use macOS. (I'm not one of them; I use WSL2 on Windows.)
The floating-point exception is interesting. Usually a stack trace is printed, which would let us track down what happened. The same information is also written to the dumpstack.sh_nknown.5365.out file. Unfortunately, it looks like that failed?
I'm not sure how comfortable you are with debugging, but if you can attach a debugger and reboot from MAVProxy, you may be able to find the offending line of code.
The "EOF on TCP socket" message is most likely a consequence of the floating-point exception: once the process aborts, its TCP connection closes.
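To make the term concrete: a floating-point exception is an IEEE 754 condition (division by zero, an invalid operation such as 0/0, overflow) that the CPU normally just records in a status flag. The sketch below uses only the standard <cfenv> header: it performs one division and returns the flags it raised. As far as I can tell, SITL installs a SIGFPE handler (that is what prints "Floating point exception - aborting") and enables trapping, so the same condition kills the process instead of only setting a flag. Note that integer division by zero raises SIGFPE on x86 regardless of these settings. The function name raised_flags is hypothetical, just for illustration.

```cpp
#include <cfenv>

// Hypothetical helper: perform one float division and return the IEEE 754
// exception flags it raised (a bitmask of FE_DIVBYZERO, FE_INVALID, ...).
// With trapping disabled (the default) the flags are only recorded;
// with trapping enabled the same operation would deliver SIGFPE instead.
int raised_flags(float numerator, float denominator)
{
    std::feclearexcept(FE_ALL_EXCEPT);   // start from a clean state

    // volatile keeps the compiler from constant-folding the division
    // or moving it past the fetestexcept() call below.
    volatile float n = numerator;
    volatile float d = denominator;
    volatile float q = n / d;
    (void)q;

    return std::fetestexcept(FE_ALL_EXCEPT);
}
```

For example, raised_flags(1.0f, 0.0f) has FE_DIVBYZERO set, raised_flags(0.0f, 0.0f) has FE_INVALID set, and raised_flags(1.0f, 2.0f) returns 0 because 0.5 is exact.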
With both solved, this is the information provided:
    % sim_vehicle.py -v ArduCopter -f quad
    …
    ERROR: Floating point exception - aborting
    Running: sh dumpstack.sh 1693 >dumpstack.sh_nknown.1693.out 2>&1
    dumpstack.sh has been run. Output was:
    -------------- begin dumpstack.sh output ----------------
    [New Thread 0x1b03 of process 1693]
    [New Thread 0x2303 of process 1693]
    [New Thread 0x2403 of process 1693]
    Error calling thread_get_state for GP registers for thread 0x1b03
    warning: Mach error at "…/…/gdb/i386-darwin-nat.c:132" in function "virtual void i386_darwin_nat_target::fetch_registers(struct regcache *, int)": (os/kern) invalid argument (0x4)
    warning: unhandled dyld version (17)
    0x00007ff8073fd3da in ?? ()
    #0 0x00007ff8073fd3da in ?? ()
    No symbol table info available.
    #1 0x00007ff8073128a9 in ?? ()
    No symbol table info available.
    #2 0x0000000000000000 in ?? ()
    No symbol table info available.
    Thread 3 (Thread 0x2403 of process 1693):
    #0 0x00007ff807403f12 in ?? ()
    No symbol table info available.
    #1 0x00007ff80737da8e in ?? ()
    No symbol table info available.
    #2 0x0000000000000001 in ?? ()
    No symbol table info available.
    #3 0x0000000000000000 in ?? ()
    No symbol table info available.
    Thread 2 (Thread 0x2303 of process 1693):
    #0 0x00007ff8073fd3da in ?? ()
    No symbol table info available.
    #1 0x00007ff8073128a9 in ?? ()
    No symbol table info available.
    #2 0xe5d1eb68c6a3003e in ?? ()
    No symbol table info available.
    #3 0x00007000089f7c40 in ?? ()
    No symbol table info available.
    #4 0x000000000000000b in ?? ()
    No symbol table info available.
    #5 0x000000010ca1e5f0 in ?? ()
    No symbol table info available.
    #6 0x000000010ca1d490 in ?? ()
    No symbol table info available.
    #7 0x0000000002b65489 in ?? ()
    No symbol table info available.
    #8 0x00007000089f7bf0 in ?? ()
    No symbol table info available.
    #9 0x00007ff8073127df in ?? ()
    No symbol table info available.
    #10 0x0000000000000000 in ?? ()
    No symbol table info available.
    Thread 1 (Thread 0x1b03 of process 1693):
    #0 0x00007ff8073fd3da in ?? ()
    No symbol table info available.
    #1 0x00007ff8073128a9 in ?? ()
    No symbol table info available.
    #2 0x0000000000000000 in ?? ()
    No symbol table info available.
    A debugging session is active.
    Inferior 1 [process 1693] will be detached.
    Quit anyway? (y or n) [answered Y; input not from terminal]
    -------------- end dumpstack.sh output ----------------
    Running: sh dumpcore.sh 1693 >dumpcore.sh_nknown.1693.out 2>&1
    dumpcore.sh has been run. Output was:
    -------------- begin dumpcore.sh output ----------------
    [New Thread 0x1a03 of process 1693]
    [New Thread 0x1b03 of process 1693]
    [New Thread 0x2303 of process 1693]
    Error calling thread_get_state for GP registers for thread 0x1a03
    warning: Mach error at "…/…/gdb/i386-darwin-nat.c:132" in function "virtual void i386_darwin_nat_target::fetch_registers(struct regcache *, int)": (os/kern) invalid argument (0x4)
    warning: unhandled dyld version (17)
    0x00007ff8073fd3da in ?? ()
    /tmp/gdb.1720:2: Error in sourced command file:
    Can't create a corefile
    -------------- end dumpcore.sh output ----------------
    zsh: abort -S --model + --speedup 1 --slave 0 --defaults -I0
Please enlighten me, perhaps by adding some prints before the possible calls to dumpstack.sh.
With -D:
    % sim_vehicle.py -v ArduCopter -f quad -D
No FPE.
A more complex run, also with -D:
    % sim_vehicle.py -v ArduCopter -f quad -D -l 33.74111111,-118.374600,44,270 -t 33.74111111,-118.374500,44,270 --use-dir=BigW --console -m "--cmd=\"module load misseditor;module load graph;graph RANGEFINDER.distance;param set DISARM_DELAY 0;calpress;batreset\""
No FPE either. I could connect MP from another Windows PC and run an endless mission without problems.
So, to locate the failure without symbols, dumpcore.sh could be modified (with echos or similar) to give a clue about where it happens.
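An alternative to editing dumpcore.sh: both scripts rely on gdb attaching to the dying process, and as far as I can tell that is exactly what fails here on macOS ("Error calling thread_get_state ... (os/kern) invalid argument"). A process can also print its own call stack with backtrace() from <execinfo.h>, which is available in both glibc and the macOS C library. The sketch below is only an illustration; capture_backtrace is a hypothetical name, not an ArduPilot function. Without debug info, frames show whatever names the symbol table provides, or bare addresses. Inside a signal handler, backtrace_symbols_fd() is the safer variant, because backtrace_symbols() allocates memory.

```cpp
#include <execinfo.h>   // backtrace(), backtrace_symbols()
#include <cstdlib>      // std::free
#include <string>
#include <vector>

// Hypothetical helper: capture the current call stack as readable strings.
// Without debug info the names come only from the symbol table, so some
// frames may appear as raw addresses (on Linux, linking with -rdynamic helps).
std::vector<std::string> capture_backtrace(int max_frames = 32)
{
    std::vector<std::string> out;
    if (max_frames <= 0) {
        return out;
    }

    std::vector<void *> frames(static_cast<std::size_t>(max_frames));
    const int n = backtrace(frames.data(), max_frames);

    // backtrace_symbols() returns one malloc'ed block; free it once.
    char **symbols = backtrace_symbols(frames.data(), n);
    if (symbols == nullptr) {
        return out;
    }
    for (int i = 0; i < n; ++i) {
        out.emplace_back(symbols[i]);
    }
    std::free(symbols);
    return out;
}
```

Printing each entry of capture_backtrace() to stderr at a suspected point shows how execution got there, without depending on gdb.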
Is there a technical explanation for why it works when compiled with debugging symbols?
(These things happen.)
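A possible explanation, not verified for this case: as far as I know, -D makes sim_vehicle.py configure waf with --debug, which builds with -O0 as well as -g. Debug symbols by themselves do not change the generated code, but turning optimisation off does. Clang by default assumes the program never observes floating-point exceptions (its default is -ffp-exception-behavior=ignore), so at -O2 it is allowed to execute a division that the source only performs behind a check. For example, it may compute the division unconditionally and then select the result. With traps disabled that is harmless. With SITL's traps enabled, the discarded division still raises SIGFPE. GCC enables -ftrapping-math by default, which would be consistent with the Homebrew GCC build working. The two functions below are hypothetical and only illustrate the transformation; they are not code from AP_Declination.

```cpp
// Source form: the division only executes when d is non-zero.
float safe_ratio(float n, float d)
{
    if (d != 0.0f) {
        return n / d;
    }
    return 0.0f;
}

// What an optimiser that ignores FP exceptions is allowed to turn the
// function above into: the division runs unconditionally and the branch
// becomes a select. With d == 0 this raises FE_DIVBYZERO (and, with traps
// enabled, SIGFPE) even though the quotient is discarded.
float speculated_ratio(float n, float d)
{
    const float q = n / d;            // always executed
    return (d != 0.0f) ? q : 0.0f;    // result identical to safe_ratio
}
```

Both functions return 0.0f for d == 0, so the difference is invisible unless FP traps are enabled. If this is the cause, building with clang's -ffp-exception-behavior=strict should prevent the transformation, but I have not tested that.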
Since then I have been using SITL on Ubuntu on another PC, but for now -D may be a good workaround for SITL on macOS.
I'm not familiar with clang. While build/sitl/bin/arducopter is being built, in another window:
    % ps ax|grep clang
    2297 s003 S+ 0:00.03 /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang++ …
    2300 s003 R+ 0:01.61 /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang …
So yes, it is clang from Apple's Xcode, which was upgraded along with Mojave -> Monterey.
What is happening is still too vague.
To understand better what happens, instead of running sim_vehicle.py, I ran build/sitl/bin/arducopter directly in one window (compiled without symbols, i.e. no -D):
    % build/sitl/bin/arducopter -S --model quad
    Suggested EK3_BCOEF_* = 16.288, EK3_MCOEF = 0.209
    Starting sketch 'ArduCopter'
    Starting SITL input
    Using Irlock at port : 9005
    bind port 5760 for 0
    Serial port 0 on TCP port 5760
    Waiting for connection …
It waits forever.
Running MAVProxy in another window:
    % mavproxy.py --out 127.0.0.1:14550 --out 127.0.0.1:14551 --master tcp:127.0.0.1:5760 --sitl 127.0.0.1:5501
    Connect tcp:127.0.0.1:5760 source_system=255
    Log Directory:
    Telemetry log: mav.tlog
    Waiting for heartbeat from tcp:127.0.0.1:5760
    MAV> online system 1
    Mode(0)> Mode Mode(0)
    APM: Barometer 1 calibration complete
    APM: Barometer 2 calibration complete
    Init Gyro**
    APM: ArduCopter V4.2.0-dev (97fee2d1)
    APM: Mac-mini.local
    APM: Frame: UNSUPPORTED
    APM: ArduPilot Ready
    APM: AHRS: DCM active
    Received 1254 parameters (ftp)
    Saved 1254 parameters to mav.parm
    fence present
    APM: EKF3 IMU0 buffs IMU=19 OBS=7 OF=17 EN:17 dt=0.0120
    APM: EKF3 IMU1 buffs IMU=19 OBS=7 OF=17 EN:17 dt=0.0120
    APM: EKF3 IMU0 initialised
    APM: EKF3 IMU1 initialised
    APM: AHRS: EKF3 active
    APM: EKF3 IMU0 tilt alignment complete
    APM: EKF3 IMU1 tilt alignment complete
    APM: EKF3 IMU0 MAG0 initial yaw alignment complete
    APM: EKF3 IMU1 MAG0 initial yaw alignment complete
    APM: GPS 1: detected as u-blox at 230400 baud
    APM: PreArm: Check firmware or FRAME_CLASS
    APM: PreArm: 3D Accel calibration needed
    APM: EKF3 IMU1 origin set
    APM: EKF3 IMU0 origin set
    APM: EKF3 IMU0 is using GPS
    APM: EKF3 IMU1 is using GPS
On the build/sitl/bin/arducopter window:
    Connection on serial port 5760
    bind port 5762 for 2
    Serial port 2 on TCP port 5762
    bind port 5763 for 3
    Serial port 3 on TCP port 5763
    Home: -35.363262 149.165237 alt=584.000000m hdg=353.000000
    Smoothing reset at 0.001
    validate_structures:469: Validating structures
30 seconds later:
On the MAVProxy window:
    no link
    link 1 down
    no link
    no link
    no link
On the build/sitl/bin/arducopter window:
    ERROR: Floating point exception - aborting
    Running: sh Tools/scripts/dumpstack.sh 3979 >dumpstack.sh_nknown.3979.out 2>&1
It seems so, now with -D. I have some SITL tests to do, and I'll do them with -D, also when pulling new versions with git. But I keep older versions, and they show the same problem after Mojave -> Monterey (I haven't tested -D on them).
BTW, about the two-window check without -D (one window running build/sitl/bin/arducopter, the other running mavproxy.py …): if you don't run mavproxy.py (or don't even open the second window) but instead connect over TCP from another PC on the network, for example with MP (mavproxy.py is a GCS, just like MP), the FPE still occurs after 30 seconds and MP stops communicating. So the problem is not in mavproxy.py.
Of course you will need to adapt the installation steps for macOS (which I think you have already done). Be aware that some directions may need to be modified for your OS (that section was written for Linux / WSL).
Nice. I'll try it, but it will take some time to fully understand.
Meanwhile, with the two-window check without symbols (no -D):
    % gdb build/sitl/bin/arducopter
    GNU gdb (GDB) 11.1
    Copyright (C) 2021 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.
    Type "show copying" and "show warranty" for details.
    This GDB was configured as "x86_64-apple-darwin21.1.0".
    Type "show configuration" for configuration details.
    For bug reporting instructions, please see:
    https://www.gnu.org/software/gdb/bugs/.
    Find the GDB manual and other documentation resources online at:
    http://www.gnu.org/software/gdb/documentation/.
    For help, type "help".
    Type "apropos word" to search for commands related to "word"…
    Reading symbols from build/sitl/bin/arducopter…
    (No debugging symbols found in build/sitl/bin/arducopter)
    (gdb) set args -S --model quad
    (gdb) r
    Starting program: /Users/xxxxxx/Documents_outofthecloud/Desarrollo/Pythonadas/ardupilot/build/sitl/bin/arducopter -S --model quad
    [New Thread 0x1803 of process 2288]
    [New Thread 0x1b03 of process 2288]
    warning: unhandled dyld version (17)
    Suggested EK3_BCOEF_* = 16.288, EK3_MCOEF = 0.209
    Starting sketch 'ArduCopter'
    Starting SITL input
    Using Irlock at port : 9005
    bind port 5760 for 0
    Serial port 0 on TCP port 5760
    Waiting for connection …
    Connection on serial port 5760
    bind port 5762 for 2
    Serial port 2 on TCP port 5762
    bind port 5763 for 3
    Serial port 3 on TCP port 5763
    Home: -35.363262 149.165237 alt=584.000000m hdg=353.000000
    Smoothing reset at 0.001
    validate_structures:469: Validating structures
    [New Thread 0x1907 of process 2288]
    [New Thread 0x2203 of process 2288]
    Thread 2 received signal SIGFPE, Arithmetic exception.
    0x000000010004c794 in AP_Declination::get_mag_field_ef(float, float, float&, float&, float&) ()
    (gdb)
Recall that all this started after upgrading directly from Mojave to Monterey, which is hard to relate to magnetic fields.
There are a few divisions there involving SAMPLING_RES. How could I insert code there (similar to printf) whose output appears when running build/sitl/bin/arducopter?
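Because SITL runs as an ordinary process, plain fprintf(stderr, ...) (or ::printf) output should appear in the window running build/sitl/bin/arducopter. Below is a minimal sketch with two hypothetical helpers. The first reports non-finite values. The second prints the operands of a division before performing it, calling fflush() so the message is already on screen if a SIGFPE follows. If SAMPLING_RES is a non-zero constant, a division by it cannot divide by zero. Another common source of FE_INVALID is converting a NaN or out-of-range float to an integer (for example, a table index), so checking the latitude/longitude arguments first seems worthwhile.

```cpp
#include <cmath>
#include <cstdio>

// Hypothetical debug helper: report a value that is NaN or infinite.
// Returns true when the value is finite.
bool debug_check_finite(const char *where, const char *name, float value)
{
    if (std::isfinite(value)) {
        return true;
    }
    std::fprintf(stderr, "%s: %s = %f is not finite\n", where, name, value);
    std::fflush(stderr);   // make sure the line is out before any abort
    return false;
}

// Hypothetical wrapper for a division: print suspicious operands first,
// so the message is visible even if the division itself traps.
float debug_div(const char *where, float num, float den)
{
    if (den == 0.0f || !std::isfinite(num) || !std::isfinite(den)) {
        std::fprintf(stderr, "%s: suspicious division %f / %f\n", where, num, den);
        std::fflush(stderr);
    }
    return num / den;   // behaviour unchanged: the division still happens
}
```

For example, at the top of AP_Declination::get_mag_field_ef() one could add debug_check_finite(__func__, "lat", lat), where lat stands for the function's first float parameter (its real name may differ). A suspect expression like x / SAMPLING_RES could become debug_div(__func__, x, SAMPLING_RES).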
Hey @Webillo, I saw a similar error when I started contributing to ArduPilot. When I looked for the cause, it turned out to be a compiler issue: the gcc/g++ on a Mac is actually clang in disguise. I installed gcc/g++ using Homebrew and fixed some aliases in /usr/local/bin to get it working properly. I haven't seen the error since.
I had gcc/g++ from Homebrew installed, but when I try to use them I get compilation errors; I'll try again later.
Now, about what appears in the build output:
    …
    Checking for 'g++' (C++ compiler) : not found
    Checking for 'clang++' (C++ compiler) : /usr/bin/clang++
    Checking for 'gcc' (C compiler) : not found
    Checking for 'clang' (C compiler) : /usr/bin/clang
…
It seems clear that it compiles with Apple clang, which for some reason produces the FPE without -D.
But both Apple gcc and g++ exist, and for some reason they are not found:
    % file `which gcc`
    /usr/bin/gcc: Mach-O universal binary with 2 architectures: [x86_64:Mach-O 64-bit executable x86_64] [arm64e:Mach-O 64-bit executable arm64e]
    /usr/bin/gcc (for architecture x86_64): Mach-O 64-bit executable x86_64
    /usr/bin/gcc (for architecture arm64e): Mach-O 64-bit executable arm64e
    % file `which g++`
    /usr/bin/g++: Mach-O universal binary with 2 architectures: [x86_64:Mach-O 64-bit executable x86_64] [arm64e:Mach-O 64-bit executable arm64e]
    /usr/bin/g++ (for architecture x86_64): Mach-O 64-bit executable x86_64
    /usr/bin/g++ (for architecture arm64e): Mach-O 64-bit executable arm64e
build/config.log contains:
    from /Users/xxxxxx/Documents_outofthecloud/Desarrollo/Pythonadas/ardupilot: Could not find gcc/g++ (only Clang), if renamed try eg: CC=gcc48 CXX=g++48 waf configure
    not found
That doesn't make much sense in my configuration, but I tried this instead from the ardupilot directory:
    CC=gcc CXX=g++ ./waf configure
gives:
    …
    Checking for 'g++' (C++ compiler) : not found
    Checking for 'clang++' (C++ compiler) : g++
    Checking for 'gcc' (C compiler) : not found
    Checking for 'clang' (C compiler) : gcc
…
So this may suggest a workaround to make it compile with Apple gcc/g++ instead of Apple clang/clang++.
How can I incorporate the CC=gcc CXX=g++ definitions into the build?
I'm not sure which errors you are talking about (please share screenshots to make it clearer). I have been compiling with Homebrew gcc/g++ since I began building ArduPilot binaries, and it has never given unexpected errors. Also, AFAIK there is no such thing as Apple gcc: the gcc and g++ shipped with our MacBooks are just launchers for clang. It's actually clang that works under the hood.
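This can be checked from the compiler's predefined macros. Clang defines __GNUC__ for GCC compatibility, but it also defines __clang__, so code built by Apple's /usr/bin/g++ identifies itself as clang. As far as I can tell, this is also why waf reports "Could not find gcc/g++ (only Clang)": its gcc check seems to reject a compiler that defines __clang__. If so, the "g++ 13.0.0" version printed after patching c_config.py would be Apple clang's version, which fits the FPE not changing. A minimal sketch (compiler_name is a hypothetical helper):

```cpp
#include <string>

// Hypothetical helper: report which compiler built this translation unit.
// Order matters: clang also defines __GNUC__ (for GCC compatibility), so
// __clang__ must be tested first, otherwise Apple's /usr/bin/g++ (which is
// clang) would be reported as GCC.
std::string compiler_name()
{
#if defined(__clang__)
    return std::string("clang ") + __clang_version__;
#elif defined(__GNUC__)
    return std::string("GCC ") + __VERSION__;
#else
    return "unknown compiler";
#endif
}
```

Printing compiler_name() from a small test program built with /usr/bin/g++ should show a clang version string, while the same program built with Homebrew's g++-11 should show GCC.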
I mean that when I modify PATH so that compilation uses gcc/g++ from Homebrew, I get compilation errors that I still have to look into (too many errors).
By Apple gcc/g++ I mean those on a normal Apple installation, in my case a Mac mini 2014 (Intel). Anyhow, I tried commenting out two lines in modules/waf/waflib/Tool/c_config.py, which produced:
    …
    Autoconfiguration : enabled
    Setting board to : sitl
    Using toolchain : native
    Checking for 'g++' (C++ compiler) : /usr/bin/g++
    Checking for 'gcc' (C compiler) : /usr/bin/gcc
    Checking for c flags '-MMD' : yes
    Checking for cxx flags '-MMD' : yes
    CXX Compiler : g++ 13.0.0
…
But (without -D) the build, although successful, produced a lot of warnings, and in the end the same FPE appeared, so no change.
I can tell you how I made it compile using Homebrew gcc.
There are aliases in /usr/local/bin pointing to the preinstalled g++ and gcc. I replaced them with aliases pointing to /usr/local/Cellar/gcc/11.2.0/bin/g++-11 and /usr/local/Cellar/gcc/11.2.0/bin/gcc-11 for g++ and gcc respectively, and it has compiled with those ever since. You can try it and see if it works. It's simple and straightforward: no changing paths and no altering files.
Thanks. In fact, what happened is that I had Homebrew gcc/g++ installed while on Mojave and later upgraded directly to Monterey. gcc/g++ appeared to be installed, but a lot of headers were missing when compiling. So I ran
    % brew reinstall gcc
and, after adding a couple of symbolic links in /usr/local/bin, got:
    …
    Setting board to : sitl
    Using toolchain : native
    Checking for 'g++' (C++ compiler) : /usr/local/bin/g++
    Checking for 'gcc' (C compiler) : /usr/local/bin/gcc
    Checking for c flags '-MMD' : yes
    Checking for cxx flags '-MMD' : yes
    CXX Compiler : g++ 11.2.0
…
There are far fewer warnings in the build now, and it seems to work very well. Thanks a lot.
So, for macOS Monterey (and probably also Catalina and Big Sur):
If using the Apple-provided toolchain (Xcode), use -D as a workaround for now.
Better still, install/reinstall Homebrew gcc/g++, as you suggest.