SITL TCP socket closure when connecting with mavsdk

I am debugging a issue I have when connecting to SITL over TCP port 5763 with a minimal MAVSDK implementation (ie just using mavlink_passthrough plugin). Upon connection, the SITL HAL complains of an EOF and closes the socket. It seems that the This is a problem that happens only about 30% of the time.
It seems that the system sends a Command Long of type 520 CMD request autopilot capabilities, and tries to send a few others but they time out.
When running SITL in debug mode, It appears that there is a runtime fault which raises a SIGFPE

Thread 1 "arducopter" received signal SIGFPE, Arithmetic exception.
--Type <RET> for more, q to quit, c to continue without paging--
GCS_MAVLINK::convert_COMMAND_LONG_to_COMMAND_INT (in=..., out=..., frame=MAV_FRAME_GLOBAL_RELATIVE_ALT)
    at ../../libraries/GCS_MAVLink/GCS_Common.cpp:4588
4588            out.y = in.param6;

and the backtrace is:

#0  GCS_MAVLINK::convert_COMMAND_LONG_to_COMMAND_INT (in=..., out=..., frame=MAV_FRAME_GLOBAL_RELATIVE_ALT)
    at ../../libraries/GCS_MAVLink/GCS_Common.cpp:4588
#1  0x00005555556d8272 in GCS_MAVLINK::handle_command_long (this=0x555555ac1d70, msg=...)
    at ../../libraries/GCS_MAVLink/GCS_Common.cpp:4622
#2  0x00005555556d6250 in GCS_MAVLINK::handle_common_message (this=0x555555ac1d70, msg=...)
    at ../../libraries/GCS_MAVLink/GCS_Common.cpp:3592
#3  0x00005555555b6199 in GCS_MAVLINK_Copter::handleMessage (this=0x555555ac1d70, msg=...) at ../../ArduCopter/GCS_Mavlink.cpp:1422
#4  0x00005555556d0c4e in GCS_MAVLINK::packetReceived (this=0x555555ac1d70, status=..., msg=...)
    at ../../libraries/GCS_MAVLink/GCS_Common.cpp:1459
#5  0x00005555555b3d9b in GCS_MAVLINK_Copter::packetReceived (this=0x555555ac1d70, status=..., msg=...)
    at ../../ArduCopter/GCS_Mavlink.cpp:585
#6  0x00005555556d0e9e in GCS_MAVLINK::update_receive (this=0x555555ac1d70, max_time_us=1000)
    at ../../libraries/GCS_MAVLink/GCS_Common.cpp:1510
#7  0x00005555556d29b0 in GCS::update_receive (this=0x555555a2c3a0 <copter+23008>)
    at ../../libraries/GCS_MAVLink/GCS_Common.cpp:2171
#8  0x00005555555b1217 in Functor<void>::method_wrapper<GCS, &GCS::update_receive> (obj=0x555555a2c3a0 <copter+23008>)
    at ../../libraries/AP_HAL/utility/functor.h:88
#9  0x000055555565b864 in Functor<void>::operator() (this=0x5555559fbe40 <Copter::scheduler_tasks+992>)
    at ../../libraries/AP_HAL/utility/functor.h:54
#10 0x00005555556add75 in AP_Scheduler::run (this=0x555555a26cf8 <copter+824>, time_available=2500)
    at ../../libraries/AP_Scheduler/AP_Scheduler.cpp:251
#11 0x00005555556ae22e in AP_Scheduler::loop (this=0x555555a26cf8 <copter+824>) at ../../libraries/AP_Scheduler/AP_Scheduler.cpp:376
#12 0x00005555556b3993 in AP_Vehicle::loop (this=0x555555a269c0 <copter>) at ../../libraries/AP_Vehicle/AP_Vehicle.cpp:218
#13 0x00005555557d204e in HAL_SITL::run (this=0x555555a45a00 <AP_HAL::get_HAL()::hal>, argc=11, argv=0x7fffffffd9b8, 
    callbacks=0x555555a269c0 <copter>) at ../../libraries/AP_HAL_SITL/HAL_SITL_Class.cpp:270
#14 0x00005555555b0845 in main (argc=11, argv=0x7fffffffd9b8) at ../../ArduCopter/Copter.cpp:772

FYI, it is not just CMD_REQUEST_AUTOPILOT_CAPABILITIES, when I remove that request, the next command long message (a MAV_CMD_SET_MESSAGE_INTERVAL) also throws the same error.

On one occasion, the value for in.param6 was 2.e+38 or some similar large number. As this conversion from long to int only happens once in handle command long, and only for logging purposes, Commenting out the call to this function on lines 4621 - 4623 in GCS_Common.cpp, has stopped this error from killing the autopilot.

Hi Peter,
From memory, the issue here is that MAVSDK sends NaN for parameters 5 and 6 for the long command, which ardupilot doesn’t handle. This fixed things for me.

void GCS_MAVLINK::convert_COMMAND_LONG_to_COMMAND_INT(const mavlink_command_long_t &in, mavlink_command_int_t &out)
{
    out.target_system = in.target_system;
    out.target_component = in.target_component;
    out.frame = MAV_FRAME_GLOBAL_RELATIVE_ALT; // FIXME?
    out.command = in.command;
    out.current = 0;
    out.autocontinue = 0;
    out.param1 = in.param1;
    out.param2 = in.param2;
    out.param3 = in.param3;
    out.param4 = in.param4;
    if (command_long_stores_location((MAV_CMD)in.command)) {
        out.x = in.param5 *1e7;
        out.y = in.param6 *1e7;
    } else {
        out.x = in.param5;
        out.y = in.param6;
        if(isnan(in.param5))
        {
            out.x = 0;
        }
        else
        {
            out.x = (int32_t)in.param5;
        }
        if(isnan(in.param6))
        {
            out.y = 0;
        }
        else
        {
            out.y = (int32_t)in.param6;
        }
              
        
    }
    out.z = in.param7;
}

Thanks Luke, that looks fine.

In MAVSDK there is a check for autopilot version between Ardupilot and PX4 based on a field in the heartbeat message. This is on line 437 of the mavlink_commands.cpp in mavsdk. Based on that determination, It is supposed to set either 0.0f or NAN depending on the autopilot version.

I think that code is not working when the float value is valid but unititalised (ie not NAN) and the check on 429 always returns true. std::isnan() may not pick this up either.

There is also a problem on the ardupilot side because casting a float to integer results in undefined behaviour when that value causes a integer overflow which is why I think the system is throwing SIGFPE. I think this can happen even if the value is not a NAN.

But agree initialising all parameters how you like is the best way to go.