Discussion of cyphal support

tridge · March 12, 2023, 5:45am

A PR has been proposed to add “cyphal” support, which is a new CAN protocol touted as a successor to DroneCAN. I’m opening this discussion to explain why the ArduPilot dev team are not keen on incorporating cyphal support.
The PR is here:

github.com/ArduPilot/ardupilot

add cyphal (uavcan v1) support

ArduPilot:master ← PonomarevDA:pr-uavcan-v1

opened 07:52AM - 31 Mar 22 UTC

PonomarevDA

+2235 -0

This PR adds a minimal implementation of Cyphal (uavcan v1.0) for ESC controlling. It has the following required for any application functions:
- Node heartbeat (the node publishes uavcan.node.Heartbeat)
- Generic node information (the node response on uavcan.node.GetInfo.1.0 request)
- Register interface (response on uavcan.register.Access.1.0 and uavcan.register.List.1.0)

I also implemented templates for the following data types, but they do nothing yet. They might be completed if someone needs them or even removed for a while:
- uavcan.node.port.List (sends all supported port id)
- uavcan.node.ExecuteCommand (handles requests on restart or save parameters commands)

It also interacts with all necessary for ESC controlling data types, such as:
- reg.udral.service.actuator.common.sp.Vector4.0.1
- reg.udral.service.common.Readiness.0.1
- reg.udral.physics.electricity.PowerTs_0_1
- reg.udral.service.actuator.common.Feedback_0_1
- reg.udral.service.actuator.common.Status_0_1
- reg.udral.physics.dynamics.rotation.PlanarTs_0_1

All ESC related data types have non-fixed subject identifier. It is handled by the register interface that is based on AP_Param library. Since the implementation doesn't support automatically port id enumeration, you need to either manually or using a script from external server set their id (for ardupilot and for esc's). Typical set of parameters and corresponded registers might be as on the picture below:
![ardupilot_params](https://user-images.githubusercontent.com/36133264/160892348-bdcf6122-24b3-4ac9-b768-44970b7c4562.png)

I also forwarded esc telemetry to the corresponded MAVLink message:
![esc_telem](https://user-images.githubusercontent.com/36133264/160892583-821a31f1-431a-4174-bb10-59983c528031.png)

The implementation is based on `libcanard`, `o1heap`, `public_regulated_data_types` repositories which are added as submodules. It also uses `nunavut` for generating the code from DSDL.

I tried to implement it in the similar way as AP_UAVCAN is implemented. You may use both versions of UAVCAN simultaneously, but on different CAN buses only. According to the build summary comparison, on CUAV v5 the current branch used 14244 bytes of flash memory more than master branch. Although free flash on this branch on this hardware is 317088 bytes.

The firmware was tested on real copter based on cuav v5. I attached here a few flight examples [first](https://youtu.be/wBNh1XV1EMQ) and [second](https://youtu.be/FE_kxwhicWM) and corresponded [log files](https://drive.google.com/drive/folders/11hBPpcgdHG1a3gUMJ8z9DtMmeTAbuwBR?usp=sharing). I used Holybro kotleta20 ESCs based on sapog v3 firmware. Here should be noticed that I tested it before merging master branch into this branch. I may provide more tests in future. I also tried `Tools/scripts/build_all.sh` and it's ok.

I also want to notice that even if it was tested it is a still just a first version of implementation. It has a few assumptions, for example:
- it is expected to use it only for quadcopter since it uses exactly 4 ESCs,
- it works only on first CAN bus now.

I didn't test it on other hardware. Any feedback is welcome.

First off, I’d like to say that @ponomarevDA has put a lot of effort into this PR. The HITL demonstration is particularly nice. Normally we would be delighted to merge a new feature like this, but in the case of cyphal there are some issues.
The PR itself just adds ESC support, but @ponomarevDA has done support for GPS, mag, baro and rangefinder in the cyphal-hitl branch:

that branch gives a much better idea of what the future ArduPilot implementation would look like as such a complex CAN protocol that can only do ESCs would be fairly pointless. I can understand however that the initial PR would try to start simple.
Background
The effort to create what is now called cyphal has been going on for many years. Initially called “UAVCAN v1”, it got renamed to cyphal when DroneCAN split off from UAVCAN development.
The key person behind cyphal (and UAVCAN) is @Pavel_Kirienko. Pavel has written a summary of why he thinks cyphal is superior here:

If you haven’t read that then you should probably read it now.
Along with others in the ArduPilot dev team, I’ve been involved in the discussion of UAVCAN-v1 (what is now cyphal) for years. We’ve pointed out the fundamental issues with it for a long time. Much of that discussion is no longer available publicly, as Pavel moved the discussion from the public section of the uavcan forums to a private section, making all those discussions no longer available.
Luckily I kept some notes locally before I posted some of the critiques of the proposed message structure. I’ve put those (very rough!) notes up here:
http://uav.tridgell.net/cyphal/
that was from May 2021, and much of the criticism I gave then is still valid with the current cyphal design.
Key Objections to Cyphal
The most important objections to cyphal are:

it is incompatible with DroneCAN, so you cannot mix DroneCAN and cyphal on the same CAN bus
it fragments the ecosystem for CAN peripherals just when we finally have great traction in getting a really good set of peripherals available for users and vendors
the design of the cyphal protocol is very poor, it is driven by a fundamentally flawed philosophy

Compatibility
When UAVCAN-v1 was first proposed Pavel decided to do it as a clean-sheet protocol. This is very tempting for protocol designers, as designing a new protocol from scratch gives you a lot of freedom and lets you ignore all the hard compatibility issues that are involved with evolving an existing widely used protocol.
Contrast what happened with cyphal and with MAVLink. From mavlink 0.9 to 1.0 to 2.0 we put the effort in to make it as seamless as possible for end users while still providing major benefits of new features. This has resulted in a great user experience.
During the discussion of uavcan-v1 (a discussion which is now invisible unfortunately) I pointed this out as a fundamental issue. Pavel said that he hadn’t realised that uavcan-v0 was widely used (which was a surprise, given just how much it has dominated the industry for many years now) and he agreed to try to make it compatible.
Pavel came up with a modified encoding of uavcan-v1 that used a trick to allow uavcan-v1 and uavcan-v0 to co-exist on the same CAN bus. It did weaken the protections in the protocol a bit, but was a reasonable solution.
After that was proposed I asked that this compatibility be a central focus of the development, with real-world testing of mixed networks and devices. As far as I am concerned cyphal is dead in the water without this compatibility, proven on real vehicles, so a DroneCAN GPS can be on the same bus as a cyphal baro or ESC as well as any other combination. The network analysis tools need to support viewing mixed protocols (like mavlogdump and MAVExplorer support multiple versions of mavlink) and the whole process of changing between DroneCAN and cyphal needs to be as seamless as possible for the end user.
Unfortunately the proposed PR has not done that. It has stayed with the clean-sheet approach. That is vastly easier from a development point of view, but much much worse for end users with large investments in expensive DroneCAN peripherals.
Ecosystem Fragmentation
Because mixing protocols on the same CAN bus is hard (and in many cases impossible), it is absolutely critical that we not fragment the ecosystem, or we end up losing the ecosystem scaling property that is needed to make CAN peripherals viable.
A good example is a conversation I had recently with a major ESC vendor looking to do CAN ESCs. They told me that they looked and saw lots of different versions of protocols and didn’t know which one to pick, so they decided to do their own proprietary protocol, which just makes the situation worse. That type of decision gets repeated for scores of vendors.
Poor Design
The cyphal protocol is a protocol driven by buzzwords and philosophy. Real-world devices are jammed into the philosophical framework like jamming a square peg into a round hole.
All the “modern”, “data-centric-publish-subscribe” and other stuff is just window dressing around a poor design. Unfortunately discussions I’ve attempted to have with Pavel on this have led to what feels like rather patronising religious preaching, not protocol engineering. I know what publish subscribe models are. I also know when I see the principles being applied blindly without thinking about the context it is being applied in.
Let’s look at some real examples in the proposed cyphal PR to understand why the design of cyphal is an issue. I’ll use the cyphal-hitl branch as the base for discussions as a lot of the issues only become clear when you go beyond the simple case of ESCs (although the simple ESC example also has some very fundamental issues).
This image from the PR gives a good idea of how the philosophy of cyphal pans out:

It has the following parameters for the first ESC:

CAN_D1_UC1_DYN1
CAN_D1_UC1_EH1
CAN_D1_UC1_FB1
CAN_D1_UC1_POW1

In the branch it has support for a maximum of 4 ESCs, so it is 16 parameters for max 4 ESCs. Each of these has to be assigned and match the corresponding parameters of each of the ESC nodes. There are tools to help do this.
There are a number of real issues with this approach. First off, it doesn’t scale. We support up to 32 actuators in ArduPilot. With the DroneCAN approach the existing SERVOn_xxxx actuator IDs are what is used on the CAN bus. So the end user only has to know the one ID. All of the different types of data associated with an ESC are associated with that one ID.
With the cyphal approach, the reason there are so many parameters is that each little piece of information is a separate topic and a separate message on the (overloaded!) CAN bus. So the RPM comes back using reg_udral_physics_dynamics_rotation_PlanarTs. That contains:

a 56 bit timestamp
and a reg_udral_physics_dynamics_rotation_Planar value

then inside the reg_udral_physics_dynamics_rotation_Planar value we have a reg.udral.physics.kinematics.rotation.Planar and a uavcan.si.unit.torque.Scalar torque value.
The reg.udral.physics.kinematics.rotation.Planar value contains a uavcan.si.unit.angle.Scalar for angular position, a uavcan.si.unit.angular_velocity.Scalar for angular velocity and a uavcan.si.unit.angular_acceleration.Scalar for angular acceleration.

As a tree it looks like this:

reg.udral.physics.dynamics.rotation.PlanarTs.0.1 length=23
        timestamp: uavcan.time.SynchronizedTimestamp.1.0 length=7
                uavcan.time.SynchronizedTimestamp.1.0 length=7
                        microsecond: truncated uint56 length=7
        value: reg.udral.physics.dynamics.rotation.Planar.0.1 length=16
                reg.udral.physics.dynamics.rotation.Planar.0.1 length=16
                        kinematics: reg.udral.physics.kinematics.rotation.Planar.0.1 length=12
                                reg.udral.physics.kinematics.rotation.Planar.0.1 length=12
                                        angular_position: uavcan.si.unit.angle.Scalar.1.0 length=4
                                                uavcan.si.unit.angle.Scalar.1.0 length=4
                                                        radian: saturated float32 length=4
                                        angular_velocity: uavcan.si.unit.angular_velocity.Scalar.1.0 length=4
                                                uavcan.si.unit.angular_velocity.Scalar.1.0 length=4
                                                        radian_per_second: saturated float32 length=4
                                        angular_acceleration: uavcan.si.unit.angular_acceleration.Scalar.1.0 length=4
                                                uavcan.si.unit.angular_acceleration.Scalar.1.0 length=4
                                                        radian_per_second_per_second: saturated float32 length=4
                        torque: uavcan.si.unit.torque.Scalar.1.0 length=4
                                uavcan.si.unit.torque.Scalar.1.0 length=4
                                        newton_meter: saturated float32 length=4

total length is 23 bytes. Now go look at CyphalDynamicsSubscriber::handler() in the proposed pull request. After all of that complex tree of information, all it actually gets is a 16 bit RPM. That’s it. Everything else is thrown away because it just isn’t useful information and ESCs mostly can’t provide it anyway.
That is just for one of the 4 topics for ESCs. The same over-engineering happens at every level.
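To put the 23 bytes in context, here is a rough sketch of the size arithmetic for the tree above, and of how many classic CAN frames one such transfer would need. The field sizes come from the tree. The framing constants (7 usable bytes per classic CAN frame once the tail byte is taken out, plus a 2 byte CRC appended to any transfer that spans more than one frame) are assumptions about the Cyphal/CAN transport, and every name below is made up for illustration.

```cpp
#include <cstddef>

// Field sizes in bytes, taken from the PlanarTs tree above.
constexpr std::size_t kTimestampBytes = 7;                               // truncated uint56
constexpr std::size_t kFloat32Bytes = 4;
constexpr std::size_t kKinematicsBytes = 3 * kFloat32Bytes;              // position, velocity, acceleration
constexpr std::size_t kPlanarBytes = kKinematicsBytes + kFloat32Bytes;   // kinematics + torque
constexpr std::size_t kPlanarTsBytes = kTimestampBytes + kPlanarBytes;   // whole message

// Assumed classic CAN framing: 8 data bytes per frame, one of them a tail
// byte, and a 2 byte CRC added only when the transfer needs several frames.
constexpr std::size_t kPayloadPerFrame = 7;
constexpr std::size_t kTransferCrcBytes = 2;

// Number of CAN frames needed to carry payload_bytes under those assumptions.
constexpr std::size_t frames_for_transfer(std::size_t payload_bytes)
{
    return payload_bytes <= kPayloadPerFrame
        ? 1
        : (payload_bytes + kTransferCrcBytes + kPayloadPerFrame - 1) / kPayloadPerFrame;
}

// What the handler actually keeps: a 16 bit RPM.
constexpr std::size_t kUsedBytes = 2;
```

Under those assumptions `frames_for_transfer(kPlanarTsBytes)` is 4 frames per ESC per update, while `frames_for_transfer(kUsedBytes)` is a single frame.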
Now let’s look at what happens when commands are sent to ESCs. As ESCs are the only thing supported in the proposed PR you might expect it to be pretty well developed. Unfortunately we see things like this:

    for (auto esc_idx = 0; esc_idx < 4; esc_idx++) {

so, it is assuming a maximum of 4 ESCs. Only quadcopters then.
This is reinforced further down:

    for (uint_fast8_t sp_idx = 0; sp_idx < 4; sp_idx++) {
        _vector4_sp.value[sp_idx] = (hal.rcout->scale_esc_to_unity(srv_config[sp_idx].pulse) + 1.0) / 2.0;
    }

it is encoding the demanded speed using a vector4 which looks like this:

reg.udral.service.actuator.common.sp.Vector4.0.1 length=516
        value: saturated float16[4] length=2

yep, ESC commands are encoded with a vector4 (as an aside, why does the length show up as 516 bytes? It should be 8 bytes).
The design is that if commanding 8 ESCs you use a Vector8, for 16 ESCs you use Vector16, etc. Why do it like this? It just leads to code like in the PR which can only control a quadcopter.
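For reference, the arithmetic in that loop is only a remap and does not depend on the ESC count. A minimal sketch, assuming `scale_esc_to_unity()` returns a value in the range -1 to 1 (an assumption about the HAL, not something shown in the PR excerpt). `to_unit_range`, `Setpoints`, `fill_setpoints` and the example values are hypothetical names and data, not part of the PR.

```cpp
#include <cstddef>

// Remap a value from [-1, 1] to [0, 1], the same arithmetic as the PR loop.
constexpr float to_unit_range(float unity)
{
    return (unity + 1.0f) / 2.0f;
}

// Setpoint container sized by the number of ESCs rather than fixed at 4.
template <std::size_t N>
struct Setpoints {
    float value[N];
};

// Fill N setpoints from N unity-scaled outputs.
template <std::size_t N>
constexpr Setpoints<N> fill_setpoints(const float (&unity)[N])
{
    Setpoints<N> sp{};
    for (std::size_t i = 0; i < N; i++) {
        sp.value[i] = to_unit_range(unity[i]);
    }
    return sp;
}

// Made-up outputs for an 8 motor vehicle, already scaled to [-1, 1].
constexpr float kExampleUnity[8] = {-1.0f, -0.5f, 0.0f, 0.5f, 1.0f, 1.0f, -1.0f, 0.0f};

// Four float16 values occupy 4 * 2 = 8 bytes.
constexpr std::size_t kVector4Bytes = 4 * 2;
```

`fill_setpoints(kExampleUnity)` produces eight values in the range 0 to 1, with no change to the remap itself.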
I should say that the ESC command messages in DroneCAN right now aren’t great, but if you have the opportunity to re-design then why go to such great lengths to make it worse than what we have now?
Philosophy Driven Design
Really the fundamental problem isn’t the PR. The problem is the philosophy driven protocol design that ignores the real world properties of devices we actually use. The philosophy touts re-use and composability as an advantage, but also demonstrates why applying those principles blindly is such a bad idea. The philosophy also touts only using SI units, ignoring the fact that SI units are often not what is really needed. An example is the above with RPM. Most ESCs don’t know the RPM. They know the eRPM which is scaled by the number of motor poles which they often don’t know. The philosophy of cyphal means it transmits angular velocity in radians/second, but eRPM doesn’t fit that mold. The PR just ignores that and assumes the angular velocity can be directly scaled to RPM ignoring motor pole count.
Having a flag that says “this is an eRPM and needs to be scaled by number of poles” would allow for ESCs to either be motor agnostic, and send eRPM, or know the motor and send RPM. That would be great, but where would this flag fit into the cyphal mold? Yet another topic?
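The conversions involved can be sketched as follows, assuming the usual convention that eRPM equals mechanical RPM multiplied by the number of pole pairs (motor poles divided by two). The `EscSpeed` record and its `is_erpm` flag are hypothetical and exist in neither protocol; they only illustrate the kind of flag described above.

```cpp
constexpr double kPi = 3.14159265358979323846;

// rad/s to revolutions per minute: one revolution is 2*pi radians.
constexpr double rad_per_sec_to_rpm(double rad_per_sec)
{
    return rad_per_sec * 60.0 / (2.0 * kPi);
}

// Electrical RPM to mechanical RPM. pole_pairs must be non-zero; an ESC that
// does not know the motor cannot do this conversion itself.
constexpr double erpm_to_rpm(double erpm, unsigned pole_pairs)
{
    return erpm / static_cast<double>(pole_pairs);
}

// Hypothetical telemetry record carrying the flag.
struct EscSpeed {
    double value;   // RPM or eRPM, depending on is_erpm
    bool is_erpm;   // true: the receiver must divide by pole pairs
};

// The receiver applies the pole pair scaling only when the ESC asks for it.
constexpr double mechanical_rpm(EscSpeed s, unsigned pole_pairs)
{
    return s.is_erpm ? erpm_to_rpm(s.value, pole_pairs) : s.value;
}
```

For a 14 pole motor (7 pole pairs), `mechanical_rpm(EscSpeed{14000.0, true}, 7)` gives 2000 RPM, whereas treating the same figure as plain RPM would overstate the speed by a factor of 7.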
Cyphal and GNSS Devices
Now let’s look at the GPS implementation in the cyphal-hitl branch. As the data from a sensor becomes more complex the problems with the cyphal approach become more apparent.
The AP_GPS_CYPHAL GPS driver implements 5 topics. They are:

position with reg.udral.physics.kinematics.geodetic.PointStateVarTs
yaw with uavcan_si_sample_angle_Scalar
num sats with uavcan_primitive_scalar_Integer16
status with uavcan_primitive_scalar_Integer16
pdop with uavcan_primitive_scalar_Integer16

Let’s expand each of those:

reg.udral.physics.kinematics.geodetic.PointStateVarTs.0.1 length=67
        timestamp: uavcan.time.SynchronizedTimestamp.1.0 length=7
                uavcan.time.SynchronizedTimestamp.1.0 length=7
                        microsecond: truncated uint56 length=7
        value: reg.udral.physics.kinematics.geodetic.PointStateVar.0.1 length=60
                reg.udral.physics.kinematics.geodetic.PointStateVar.0.1 length=60
                        position: reg.udral.physics.kinematics.geodetic.PointVar.0.1 length=36
                                reg.udral.physics.kinematics.geodetic.PointVar.0.1 length=36
                                        value: reg.udral.physics.kinematics.geodetic.Point.0.1 length=24
                                                reg.udral.physics.kinematics.geodetic.Point.0.1 length=24
                                                        latitude: saturated float64 length=8
                                                        longitude: saturated float64 length=8
                                                        altitude: uavcan.si.unit.length.WideScalar.1.0 length=8
                                                                uavcan.si.unit.length.WideScalar.1.0 length=8
                                                                        meter: saturated float64 length=8
                                        covariance_urt: saturated float16[6] length=2
                        velocity: reg.udral.physics.kinematics.translation.Velocity3Var.0.2 length=24
                                reg.udral.physics.kinematics.translation.Velocity3Var.0.2 length=24
                                        value: uavcan.si.unit.velocity.Vector3.1.0 length=12
                                                uavcan.si.unit.velocity.Vector3.1.0 length=12
                                                        meter_per_second: saturated float32[3] length=4
                                        covariance_urt: saturated float16[6] length=2
uavcan.si.sample.angle.Scalar.1.0 length=11
        timestamp: uavcan.time.SynchronizedTimestamp.1.0 length=7
                uavcan.time.SynchronizedTimestamp.1.0 length=7
                        microsecond: truncated uint56 length=7
        radian: saturated float32 length=4
uavcan.primitive.scalar.Integer16.1.0 length=2
        value: saturated int16 length=2
uavcan.primitive.scalar.Integer16.1.0 length=2
        value: saturated int16 length=2

so 86 bytes total and split over 4 messages. The thing is, this is missing lots of absolutely critical information you need for a CAN GPS. For example, no accuracy information. Nothing to support moving baseline yaw. No GPS time information (time of week and MS time within week, which is critical).
The cyphal driver just fills in the missing information with arbitrary values, for example this code:

    state.horizontal_accuracy = 0.1;
    state.vertical_accuracy = 0.1;
    state.speed_accuracy = 0.1;

these numbers really matter as they affect the EKF fusion process, you can’t just make them up.
Following the cyphal philosophy, each of these missing pieces will be yet more topics. By the time we got a useful cyphal GPS we’d end up with a dozen or more topics (possibly a lot more?), each with their own ID parameters, for each GPS, so more things for end users to mess up.
Ease of Development vs Ease of Use
Another way to look at cyphal is where it comes down on the ease of use vs ease of development spectrum.
Doing a clean-sheet protocol was an “ease of development” decision. It is vastly easier (and much more fun!) to do a protocol from scratch, but it is awful for end user deployment.
Same with not having the messages have an identifier in them to say what the message is. Pavel quite rightly points out that having an identifier in the protocol means you can get collisions and mistakes can be made. Of course that is true, but it is developer pain, whereas putting the IDs into user facing parameters is user pain. There are vastly more users than developers and the ratio of users to developers rapidly increases as the protocol becomes more widespread. Part of the job of a good developer is to take on the pain so that end users get an easier time.
I know you can develop tools to try to check the users work, but you shouldn’t need to do that and it will still be error prone, and the end user won’t have the same sophisticated knowledge needed to debug things that the developers are expected to have.
Think about dyslexic users (and yes, they are quite common). Getting CAN_D1_UC1_DYN3=2374 instead of 2347 could mean that ArduPilot could decode a GPS latitude/longitude as an ESC RPM. This sort of “network cast” is at the heart of the problems that cyphal creates.
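A toy model of that failure mode, under stated assumptions: the payload carries no type identifier, so the receiver's only filter is the subject ID the user typed into the parameter. The `Transfer` struct, the two byte little-endian "RPM" decoder and the payload bytes are all invented for illustration and are not the real serialisation.

```cpp
#include <cstdint>

// A received transfer in this toy model: a subject ID plus raw bytes.
// Nothing in the bytes says which data type they were serialised from.
struct Transfer {
    uint16_t subject_id;
    uint8_t payload[8];
};

// Toy "ESC RPM" decoder: first two payload bytes, little-endian.
constexpr uint16_t decode_rpm(const Transfer &t)
{
    return static_cast<uint16_t>(t.payload[0] | (t.payload[1] << 8));
}

// The receiver's only check is the user-configured subject ID.
constexpr bool esc_handler_accepts(const Transfer &t, uint16_t configured_id)
{
    return t.subject_id == configured_id;
}

// Bytes that some other node (say a GPS) published on subject 2374.
constexpr Transfer kGpsTransfer{2374, {0x10, 0x27, 0xAA, 0xBB, 0xCC, 0xDD, 0xEE, 0xFF}};
```

With the parameter set to the intended 2347 the ESC handler ignores `kGpsTransfer`. With the digits swapped to 2374 it accepts the transfer, and `decode_rpm(kGpsTransfer)` returns 10000 as if it were a valid RPM.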
If we merged the cyphal PR then the expectation would be that we would also merge support for mag, baro, GPS, airspeed, rangefinder etc etc. Following the cyphal philosophy we’d end up with many hundreds of topics, and as part of the philosophy is that topics don’t carry IDs that associate with actual devices, you need to have a parameter for each of these topics for each instance you could have of the device.
So if we need 5 topics for servos (likely we’d need quite a few more as servos give a lot of information) and we support 32 servos then that is 160 parameters just for servos! Once we have a full implementation that matches the current DroneCAN one then we’d end up with maybe 500 or so topic ID parameters?
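The counting behind those numbers is simple enough to write down. A small sketch with hypothetical function names: one topic ID parameter per topic per device instance, against one ID per device when the device ID travels in the message.

```cpp
#include <cstddef>

// One topic ID parameter per topic, per device instance.
constexpr std::size_t topic_id_params(std::size_t topics_per_device, std::size_t instances)
{
    return topics_per_device * instances;
}

// With a device ID carried in the message, the user sets one ID per device.
constexpr std::size_t device_id_params(std::size_t instances)
{
    return instances;
}
```

`topic_id_params(4, 4)` gives the 16 ESC parameters in the PR, and `topic_id_params(5, 32)` gives the 160 servo parameters mentioned above, against `device_id_params(32)`, which is 32.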
I hope this makes it clear why we aren’t keen on cyphal, and wish that Pavel had listened more years ago when we made it abundantly clear what the issues were.

iampete · March 12, 2023, 3:21pm

Ignoring the discussion on the protocol and setup. Some questions for those keen on Cyphal support:

What would users gain?

What Cyphal devices are currently available?

I have seen a number of devices that support both Cyphal and DroneCAN, on those devices what functionality would the user gain by using Cyphal over DroneCAN?

It seems to me that there will be some teething troubles between what Cyphal is now and where it would like to be. If the ArduPilot team has reservations about the protocol we could defer support until Cyphal has worked out those issues and has a wider range of supported hardware.

Pavel_Kirienko · March 12, 2023, 9:47pm

Andrew, thank you for starting this conversation. Much time has passed since we last spoke about Cyphal; the project has made some considerable progress meanwhile.

Both of us have spent enough time going back and forth on the architectural issues since 2019, so at this point I think we understand each other’s positions well enough so you will forgive me for not re-iterating the same arguments here once again, as I suspect it will not move us closer to convergence. Those who have not been part of our prior debates can catch up on the conversation by skimming through the following topics, which I just made accessible to the general public, albeit in a read-only mode:

The linked discussions are very long and the subject matter is complex. Those who are not willing to invest the time needed to read and understand them all will benefit from my abridged article that Andrew has already linked in the OP post: Cyphal vs. DroneCAN - General - OpenCyphal Forum. Further, the Cyphal Guide explains the rationale behind the design decisions, the breaking changes, and provides hands-on examples on how and why the old approaches fall short in 2023 while they were adequate in 2015: The Cyphal Guide - Applications & Usage - OpenCyphal Forum.

Cyphal is still a very young technology, despite being in development for several years, as our work is conducted in long iterations of diligent design followed by the collection of empirical feedback from the field. Despite being young, Cyphal has already been successfully adopted in some of the most advanced and challenging projects across various fields spanning from robotics and CubeSats up to the most advanced autonomous flying vehicles and man-carrying piloted VTOLs. I will desist from listing the specific companies and products, hoping that the interested parties will step up and speak for themselves in this thread if they deem so acceptable (which I hope they do).

While the intended user base of Cyphal differs from that of DroneCAN, they do intersect, and for those businesses and researchers that end up in the intersection, the lack of support for Cyphal in ArduPilot — needless to say, one of the best open-source autopilots — is a major obstacle. This is manifested not only in the inability to use Cyphal hardware with ArduPilot directly but also in the negative network effects on the nascent Cyphal ecosystem.

Rather than recycling the well-known arguments on the architecture, I would suggest that we turn the conversation to the practical, problem-solving side and see whether there might be some middle ground that is acceptable to all involved parties. Speaking on behalf of a large number of enterprises and researchers that have been investing in Cyphal heavily over the past four years, I propose that we work out a set of specific conditions that will render the pull request acceptable to the ArduPilot maintainers. According to the core design goals behind Cyphal, the following aspects of the protocol cannot be changed and no compromise can be made on these:

Fixed port identifiers are not going to happen (see “Cyphal vs. DroneCAN”).
The use of SI is non-negotiable (see “Cyphal Guide”).
Type polymorphism will stay (aka “network casting”, as aptly named by Andrew).
Protocol tunneling (I2C-over-Cyphal, uBlox-over-Cyphal, etc.) is not going to happen.
The transport protocol is unalterable (field deployments have already been made and it would be harmful to the growing Cyphal ecosystem).

Can we work out a set of changes to the existing implementation that would appease the ArduPilot maintainers while staying within the above-defined red lines? We are particularly open to modifying the UDRAL (or DS-015) standard, which is, and always was, a mere draft proposal; all of our attempts to discuss UDRAL to date always derailed into the discussion of the core design goals of Cyphal and other irrelevant matters. If one proposed a scratch reimplementation of UDRAL that does not break the red lines, I am certain the growing Cyphal community would welcome that effort and provide constructive criticism.

In anticipation of objections on the grounds of configuration complexity, I would like to point out that we have recently released (the alpha version of) Yukon — the new, comprehensive GUI tool for configuration and diagnostics of Cyphal-based networks, brought to us by @Silver_Valdvee. With Yukon, a dyslexic user does not need to distinguish 2347 from 2374 to validate the correctness of the network configuration, as any mistake would be rendered clearly visible by the visual connectivity graph on screen.

As to the implementation complexity, I would like the ArduPilot maintainers to notice two things. First, the fairly complete Cyphal driver only requires ca. 20 kB of ROM, which is a negligible amount for the higher-end ArduPilot build targets; further expansion of the driver’s functionality is not expected to alter its footprint significantly because, due to Cyphal’s design, once the core is in place, additional functions add very little complexity on top. Second, as a matter of discretionary revelation, the work on the Cyphal driver (masterfully done by @ponomarevDA) has been sponsored by several enterprises that are interested in seeing this protocol supported in both of the major open-source flight control stacks out there; as you know, this interest is not going to disappear overnight, which means that the Cyphal-related part of the ArduPilot codebase will continue to receive high-quality support for the years to come, not being a burden on the ArduPilot maintainers.

Last but not least, I would like to address some of the specific points you raised in the OP post separately. Please find my responses below.

We’ve pointed out the fundamental issues with it for a long time. Much of that discussion is no longer available publicly, as Pavel moved the discussion from the public section of the uavcan forums to a private section, making all those discussions no longer available.

Making the discussions private was a conscious decision. The point is that open debates tend to attract participants who lack sufficient understanding of the subject matter to contribute to the discussion meaningfully. Those who want to see the ill effects of uneducated opinions on the quality of discussion can see them in one of the topics I linked at the beginning of my response. It did get to the point of direct insults, where I had to intervene and relieve the thread of the unhelpful contributions so that the discussion could continue in a more civilized manner.

it is incompatible with DroneCAN, so you cannot mix DroneCAN and cyphal on the same CAN bus
<…>
As far as I am concerned cyphal is dead in the water without this compatibility, proven on real vehicles, so a DroneCAN GPS can be on the same bus as a cyphal baro or ESC as well as any other combination.

This is wrong; Cyphal/CAN and DroneCAN can coexist on the same bus without conflicting with each other.

it fragments the ecosystem for CAN peripherals just when we finally have great traction in getting a really good set of peripherals available for users and vendors

So does any other CAN protocol, yet ArduPilot supports many of them.

the design of the cyphal protocol is very poor, it is driven by a fundamentally flawed philosophy
<…>
The cyphal protocol is a protocol driven by buzzwords and philosophy.

The design of Cyphal is certainly not perfect, but I believe it to be superior to DroneCAN (see the article). Having designed quite a few protocols in my career, including and especially DroneCAN, I believe I have the grounds to claim that my understanding of the subject matter is at least as good as yours. I am likewise tempted to use emotionally loaded words to describe DroneCAN, but you might see perhaps how that would not serve the discussion well. May I suggest that we refrain from emotional statements in the interest of reaching convergence faster?

During the discussion of uavcan-v1 (a discussion which is now invisible unfortunately) I pointed this out as a fundamental issue. Pavel said that he hadn’t realised that uavcan-v0 was widely used (which was a surprise, given just how much it has dominated the industry for many years now) and he agreed to try to make it compatible.

I don’t think I said that. The discussion that I believe you are referring to is and has always been public: UAVCAN v1.0 and ArduPilot - Development & Maintenance - OpenCyphal Forum

With the cyphal approach, the reason there are so many parameters is that each little piece of information is a separate topic and a separate message on the (overloaded!) CAN bus.

I wrote about this in the article you linked, but I will repeat here once again that the bandwidth issues are, in fact, overstated. Cyphal can be configured to utilize the limited network bandwidth more efficiently than DroneCAN, which I illustrated with the help of the bandwidth estimation spreadsheets. I urged you to provide sensible counter-examples where Cyphal/CAN falls short of DroneCAN in terms of its bandwidth utilization but never managed to get a comprehensive response.

The thing is, this is missing lots of absolutely critical information you need for a CAN GPS. For example, no accuracy information. Nothing to support moving baseline yaw. No GPS time information (time of week and MS time within week, which is critical).

The code you are referring to serves only as a minimal demonstration of the GNSS interface. The accuracy information, in particular, is communicated via the covariance matrices; it is indeed not used by this demo, but this is to be considered a deficiency of the demo.
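As a rough illustration of how accuracy figures could be derived from the covariance fields rather than hard-coded, the sketch below assumes that `covariance_urt` holds the upper-right triangle of a symmetric 3x3 matrix in row-major order, in squared metres, with the first two axes horizontal. Those assumptions, the example values and all of the names are illustrative and are not taken from the UDRAL definitions or the demo code.

```cpp
#include <cmath>

// Indices into a 6 element upper-right triangle of a symmetric 3x3 matrix,
// assuming row-major order: xx, xy, xz, yy, yz, zz.
enum UrtIndex { XX = 0, XY = 1, XZ = 2, YY = 3, YZ = 4, ZZ = 5 };

// Variance in the horizontal plane, assuming the first two axes are horizontal.
constexpr float horizontal_variance(const float (&urt)[6])
{
    return urt[XX] + urt[YY];
}

// Variance along the third axis, assumed vertical.
constexpr float vertical_variance(const float (&urt)[6])
{
    return urt[ZZ];
}

// One-sigma accuracy is the square root of the variance.
inline float accuracy_from_variance(float variance)
{
    return std::sqrt(variance);
}

// Made-up covariance values, already converted from float16 to float.
constexpr float kExampleUrt[6] = {0.25f, 0.0f, 0.0f, 0.5f, 0.0f, 1.0f};
```

A driver could then set its horizontal accuracy from `accuracy_from_variance(horizontal_variance(kExampleUrt))` and its vertical accuracy from `accuracy_from_variance(vertical_variance(kExampleUrt))` instead of a fixed 0.1.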

Regarding your general analysis of the GNSS interface defined in UDRAL, I am afraid it is incorrect and highly misleading to someone unfamiliar with Cyphal. I have provided a response to your (emotionally loaded) post on GNSS, where I explained why your conclusions are incorrect. It hardly seems fair to share your incorrect analysis without my corrective response. Please find it here: Problems with DS-015 - #32 by pavel.kirienko - UDRAL: Cyphal Drone Application Layer - OpenCyphal Forum

As I said earlier, we are open to redesigning the UDRAL standard from scratch as long as the new design stays within the core principles of Cyphal outlined earlier. Maybe we should focus on that instead?

tridge · March 12, 2023, 11:28pm

@Pavel_Kirienko thanks for making the old discussions available again (although they do require a login; can you make them readable without login? Note that this discussion is readable without login)

it certainly can’t in the proposed PR, and if I have understood the px4 implementation (which I may not have) then it can’t co-exist in px4 either (@dagar can you confirm?)
As I said when I set this as an absolute requirement right from the start, co-existence needs to be the number one priority. It needs to be demonstrated, not just theoretical, it needs to work for the tooling (eg. GUI bus analyser needs to decode both protocols at the same time in mixed environments) and needs to be seamless for end users.
Achieving that co-existence would require some significant changes. It likely is possible, but it can’t just be hand waved and put off.

I’m sure it’s an impressive tool for experts, but if that screenshot is representative then it won’t in any way help end users sort out issues. It is vastly too complex. End users should not be expected to be networking experts and certainly not experts in cyphal. They bought a GPS and plugged it in. They expect to be able to say “I’d like to use my new CAN GPS” and it should then just work. Showing them a diagram like that is a way to get very unhappy users.

and these days what we are doing is putting them in as lua scripts, see libraries/AP_Scripting/drivers
cyphal could be done as a lua script, although the complexity of cyphal makes that a massive piece of work.

the core principles are what lead to the issues. The lack of stable identifiers in messages is what leads to configuration being hard. That plus the rejection of device instances in messages leads to the combinatoric problem of num_topics*num_instances parameters being needed for each type of peripheral.
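To put rough numbers on that product, here is a minimal sketch, assuming one port-ID parameter per (topic, instance) pair. The function names and the example counts (6 topics, 8 ESCs) are illustrative assumptions, not figures taken from the PR.

```cpp
#include <cstddef>

// Assumption: every (topic, instance) pair needs its own port-ID parameter,
// because messages carry no instance field.
constexpr std::size_t port_id_params(std::size_t num_topics, std::size_t num_instances)
{
    return num_topics * num_instances;
}

// Assumption: if each message carried an instance ID, one port-ID per topic would do.
constexpr std::size_t port_id_params_with_instance_field(std::size_t num_topics)
{
    return num_topics;
}

// Illustrative only: 6 ESC-related topics on an 8-motor vehicle.
static_assert(port_id_params(6, 8) == 48, "one parameter per topic per ESC");
static_assert(port_id_params_with_instance_field(6) == 6, "one parameter per topic");
```

The gap between the two results grows linearly with the number of instances, and the whole calculation repeats for each type of peripheral.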
Right now cyphal looks like an early stage experiment. I went and had a look at the px4 implementation, as I had noticed cyphal went in there. There is no way the px4 implementation can actually be in use. Here are some snippets:

		reg_udral_physics_kinematics_geodetic_Point_0_1 geo {};
		size_t geo_size_in_bits = receive.payload_size;
		reg_udral_physics_kinematics_geodetic_Point_0_1_deserialize_(&geo, (const uint8_t *)receive.payload, &geo_size_in_bits);

		double lat = geo.latitude;
		double lon = geo.longitude;
		double alt = geo.altitude.meter;
		PX4_INFO("Latitude: %f, Longitude: %f, Altitude: %f", lat, lon, alt);
		/// do something with the data

so it parses GNSS then throws away all the data.

			reg_udral_service_actuator_common_sp_Vector31_0_1 msg_sp {0};
			size_t payload_size = reg_udral_service_actuator_common_sp_Vector31_0_1_SERIALIZATION_BUFFER_SIZE_BYTES_;

			for (uint8_t i = 0; i < MAX_ACTUATORS; i++) {
				if (i < num_outputs) {
					msg_sp.value[i] = static_cast<float>(outputs[i]);

				} else {
					// "unset" values published as NaN
					msg_sp.value[i] = NAN;
				}
			}

so ESC sending always uses Vector31, which is 31 float16, which means 62 bytes of data, which means for a 4 ESC aircraft it wastes a lot of bandwidth.
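The frame cost can be estimated with a short sketch. It assumes classic CAN (8 data bytes per frame), one tail byte per frame, and a 2-byte transfer CRC appended to multi-frame transfers. The helper names are hypothetical.

```cpp
#include <cstddef>

// Assumptions: classic CAN, Cyphal/CAN-style framing.
constexpr std::size_t kFrameDataBytes = 8;  // data bytes in a classic CAN frame
constexpr std::size_t kTailBytes = 1;       // tail byte carried in every frame
constexpr std::size_t kCrcBytes = 2;        // transfer CRC, multi-frame transfers only
constexpr std::size_t kPayloadPerFrame = kFrameDataBytes - kTailBytes;  // 7

// Serialized size of an array of n float16 values.
constexpr std::size_t float16_array_bytes(std::size_t n)
{
    return n * 2;
}

// Number of CAN frames needed to carry one transfer of the given payload size.
constexpr std::size_t frames_for_payload(std::size_t payload_bytes)
{
    return payload_bytes <= kPayloadPerFrame
               ? 1
               : (payload_bytes + kCrcBytes + kPayloadPerFrame - 1) / kPayloadPerFrame;
}

static_assert(float16_array_bytes(31) == 62, "Vector31 payload");
static_assert(frames_for_payload(float16_array_bytes(31)) == 10, "Vector31: 10 frames");
static_assert(frames_for_payload(float16_array_bytes(4)) == 2, "4 setpoints: 2 frames");
```

Under these assumptions a quadcopter setpoint sent as Vector31 costs 10 frames per update, where a 4-element vector would cost 2.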
There are likely branches of px4 that do a lot better. The point is that both the proposed ArduPilot PR and the current merged px4 implementation of cyphal are experimental. ArduPilot aims to be a production-ready autopilot for serious use, with ease of use, ease of setup and robustness as priorities. Despite many years of effort cyphal is still an academic exercise, and the rigidity of your design principles is keeping it that way.
Right now cyphal is actively damaging the CAN ecosystem by dividing efforts on the creation of a rich ecosystem of peripherals.

Roman_Fedorenko · March 12, 2023, 11:54pm

As for the Cyphal devices currently available, RaccoonLab is proud to offer a complete range of devices for controlling multirotor and VTOL planes, including CAN-PWM adapters, GNSS+Mag+Baro nodes, rangefinders and airspeed sensors:

https://docs.raccoonlab.co/guide/

We support both DroneCAN and Cyphal protocols, but we believe Cyphal is a better choice for future critical applications such as drone delivery and air taxi due to its flexibility and compatibility with non-standard applications. We are already working with two projects that are using Cyphal to create delivery drones.

It’s important to note that our customers need Ardupilot support, so the topic discussed is very important to us. We believe that Ardupilot as a platform could support both DroneCAN and Cyphal. We believe it is important to give users more options and flexibility, rather than limiting them to one protocol.

tridge · March 13, 2023, 1:21am

the closest we came to agreement during the past discussions was in this thread:
https://forum.opencyphal.org/t/port-type-safety-enforcement/1303/96
(login required unfortunately).
In particular the proposal to use one of the reserved bits to distinguish between regulated and unregulated types. It allows for unregulated types for the non-standard applications that @Roman_Fedorenko mentioned (welcome Roman btw!) and would allow for us to have a set of regulated types where markup in the DSDL sets the ID to use.
So the proposal would be:

use reserved bit to distinguish regulated vs non-regulated types
develop sane message sets for each of the key peripheral types
in those messages include an instance ID, so we don’t have to have a different ID for each of your ESCs/servos/GPS
make it work properly mixed with existing DroneCAN devices, with real testing and UI tools that understand mixed bus usage

that would be a real step forward, making configuration vastly simpler and less error prone.
@VadimZ proposed this be done via a separate config file at the time. I much prefer markup in the DSDL itself.

james_pattison · March 13, 2023, 1:49am

I’d be interested in the views of @dk7xe.g or his colleagues, noting the efforts they’ve put into Cyphal. If you could wind back three years, would you make the same decisions? Why/why not?

aentinger · March 13, 2023, 7:52am

Hi

As maintainer of 107-Arduino-Cyphal which, as the name suggests, is an Arduino library that allows comfortable high-level access to the Cyphal protocol, I’m obviously biased towards Cyphal.

I do not want to get involved into discussions of technical pros and cons of DroneCAN vs Cyphal as I believe this specific discussion (x vs. y) doesn’t really matter. What imho matters is that Cyphal is here to stay, the same way that DroneCAN is here to stay too.

Both protocols have merits and while I believe that Cyphal is better designed in some key aspects (-.- now I did get involved into the discussion which I tried to avoid) I’m realist enough to recognize that legacy technology (protocols (as a ham I’m thinking of AX.25 here), interfaces, code, etc.) has a way of sticking around (RS232 anyone?) for a v e r y long time.

Consequently I foresee a long future for DroneCAN- as well as Cyphal-enabled devices.

ArduPilot as an autopilot platform only benefits from adding yet another communication protocol, as it increases the platform’s overall relevance.

Just my 2 cents,
Cheers, Alex

PS: We are successfully using Cyphal within the l3xz project, a ROS enabled, Cyphal/CAN networked mixed electric/hydraulic hexapod robot.

EDIT: I’m also aware of several companies using Cyphal within their products (forgot to mention that on the first write-up), and I’m not counting either Pavel’s company or my own.

atikhono · March 13, 2023, 8:34am

Hi,

I’m a representative of Turing Flying Machines, a company developing and manufacturing vertical take-off and landing UAVs for aerial logistics in two weight classes (TFM-15 and T-300).

We already actively use Cyphal-based equipment on our VTOL platforms and plan to expand its use in the future.
When comparing DroneCAN and Cyphal, we chose Cyphal because it is more suited for the full control reservation tasks, which are particularly important for the mission critical aerial delivery applications. We believe that Cyphal’s capabilities offer great potential for further development too, including seamless transition from Cyphal/CAN to Cyphal/UDP.

From our perspective, we favor the development of the Cyphal protocol and the ecosystem around it, in particular its implementation in the leading autopilot, ArduPilot. We believe that supporting Cyphal in ArduPilot is very important for the scale and quality of the ecosystem.

devittdv · March 13, 2023, 8:43am

Let me add a few arguments regarding the use of Cyphal. I’m Dmitry from https://www.digitalautosystems.ru/ . We develop different unmanned vehicles for delivery and monitoring in agriculture and mining (boats, VTOL, underwater drones, construction equipment). In particular, some of our solutions work on Ardupilot, for which many thanks to the community!
We have been developing our solutions on the dronecan / uavcan ecosystem for a long time and we are interested in a more flexible architecture for integration into our projects with data redundancy and a decentralized bus inside the board.

We have now launched industrial production of a VTOL based on ArduPilot and DroneCAN, and we are very interested in Cyphal support for further integration of our sensors into a single ecosystem. I attached the actual video to the post.

We are currently uncomfortable with some properties of DroneCAN that are managed by using the new standard:

ethernet support
we need continuous customization of messages for various tasks (adding a payload, integrating a landing station for a drone, controlling an internal combustion engine unit or integrating an additional sensor for a boat). Now it is tedious to use a different set of custom messages between teams in different projects. With Cyphal, this situation looks better.

tridge · March 13, 2023, 9:02am

welcome Alexander.
Interesting that you mention control reservation tasks. That sounds like cyphal being used in the role that MAVLink is now used in ArduPilot. DroneCAN (and cyphal in the proposed PR) is a sensor and peripheral network implementation. It is not a control system. There is no way (or proposed way) to control missions, re-task the vehicle, change control modes or do any of the things associated with a control protocol.

I know it’s one of Pavel’s talking points to say that DroneCAN can’t do UDP, but I actually use DroneCAN over UDP all the time, via mavcan. I often help diagnose and fix DroneCAN setups on ArduPilot vehicles on the other side of the world, using https://support.ardupilot.org, which is UDP based. This is a standard part of stable releases of ArduPilot and has been for some time.

and this comes to the heart of the issue. I see people wanting ArduPilot to support cyphal for the ways it would help cyphal gain credibility and entice more vendors to adopt cyphal.
What I don’t see is how it benefits ArduPilot. The proposed PR is not actually useful in any real way (see all the holes in it above) and the issues just get worse as it gets more complete (see the scalability issues I describe above that make achieving feature parity with DroneCAN extremely painful, or maybe impossible). Similarly the implementation in px4 is not actually useful (also see above). In both cases the implementation is a placeholder, not something usable.
I also see the increased adoption of cyphal, given the major architectural issues I have pointed out again and again, as being a net negative for ArduPilot. It will inevitably lead to more vendors implementing cyphal on peripherals instead of DroneCAN, as Pavel encourages vendors, telling them it is the future and how much better it is. I just don’t agree it is better, in fact I think in its current form it is actually much worse.
So it seems to be all downside for the ArduPilot project. My responsibility is to the ArduPilot project and ArduPilot users, not to the cyphal project.
I’m sorry if some people were under the impression that ArduPilot would just accept cyphal. We made our objections abundantly clear over many years, and we’ve spent a lot of time trying to work with Pavel to find common ground, but kept hitting those red lines that Pavel will not cross. Pavel then hid those discussions, so perhaps you were not aware of the objections and our very clear statements that we were not going to accept this as is.
Now regarding using cyphal as a control protocol, I’ve been involved in projects to control some pretty large UAVs, and it scares me to think that a protocol working with the design principles of cyphal will be used for control flow for these types of UAVs. When sending control commands you want to be darn sure that the recipient of the command interprets those bytes in the way the sender intends. As a computer scientist and someone who has been working on UAV control for a long time I think you should re-evaluate the wisdom of using the protocol in this way.

Pavel_Kirienko · March 13, 2023, 1:54pm

We are on the same page here, and we will look into ensuring that it is possible to enable both DroneCAN and Cyphal/CAN drivers simultaneously. I don’t think this will entail a significant engineering effort since the two protocols are easily distinguishable on a per-frame basis (via the toggle bit).
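A minimal sketch of what per-frame discrimination by the toggle bit could look like. It assumes the tail-byte layout shared by both protocols (bit 7 start-of-transfer, bit 5 toggle) and that the toggle bit of the first frame of a transfer is 0 in DroneCAN and 1 in Cyphal/CAN. The enum and function names are hypothetical.

```cpp
#include <cstdint>

enum class CanProtocol { DroneCAN, Cyphal, Unknown };

constexpr std::uint8_t kStartOfTransferBit = 0x80;  // bit 7 of the tail byte
constexpr std::uint8_t kToggleBit = 0x20;           // bit 5 of the tail byte

// Classifies a frame from its tail byte (the last data byte of the frame).
// Only start-of-transfer frames can be classified in isolation: in later
// frames of a multi-frame transfer the toggle bit alternates in both protocols.
constexpr CanProtocol classify_by_tail_byte(std::uint8_t tail)
{
    return (tail & kStartOfTransferBit) == 0
               ? CanProtocol::Unknown
               : ((tail & kToggleBit) != 0 ? CanProtocol::Cyphal : CanProtocol::DroneCAN);
}

static_assert(classify_by_tail_byte(0xE0) == CanProtocol::Cyphal, "single-frame, toggle=1");
static_assert(classify_by_tail_byte(0xC0) == CanProtocol::DroneCAN, "single-frame, toggle=0");
```

A driver that shares one bus between both stacks would still have to remember the classification of each in-progress multi-frame transfer in order to route its continuation frames.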

First, the process of configuring a new Cyphal node is not as complex as you apparently imagine it to be; essentially, it amounts to connecting a few lines in the connectivity graph (even this part will eventually be mostly automated away). This is not an obstacle for the intended user base of Cyphal, and those users who cannot tolerate using a dedicated tool for network configuration will always be able to continue using DroneCAN. As @aentinger said, we are not proposing to obsolete DroneCAN but provide an alternative protocol for the benefit of the more complex systems.

Let us please not go there again. I already know what you are going to say next, you know my arguments against it. I disagree with your view of what makes a good communication protocol, and I don’t expect you to accept my arguments and change your view of the matter regardless of anything. Can we please leave this matter at rest and focus on the things that are possible to agree on? Specifically, to reiterate what I have said earlier, I am offering to forget UDRAL and redesign a new DSDL namespace from scratch. Can we focus on that? Maybe we could invite @ponomarevDA to contribute his view of what a new, “sane”, DSDL namespace that will be accepted by ArduPilot might look like?

Let us please stop trying to break Cyphal. As I have already said earlier on several occasions, lack of the fixed port-IDs is an essential part of the new design, and they are never going to be re-introduced. In the interest of saving time to all of the involved parties, I suggest we focus on finding a middle-ground solution that does not turn Cyphal into a clone of MAVLink/DroneCAN/I2C.

tridge:

The proposed PR is not actually useful in any real way (see all the holes in it above) and the issues just get worse as it gets more complete (see the scalability issues I describe above that make achieving feature parity with DroneCAN extremely painful, or maybe impossible). Similarly the implementation in px4 is not actually useful (also see above). In both cases the implementation is a placeholder, not something usable.
I also see the increased adoption of cyphal, given the major architectural issues I have pointed out again and again, as being a net negative for ArduPilot. It will inevitably lead to more vendors implementing cyphal on peripherals instead of DroneCAN, as Pavel encourages vendors, telling them it is the future and how much better it is. I just don’t agree it is better, in fact I think in its current form it is actually much worse.

What you dress as “architectural issues” are actually the key design features that allow users to bypass the unreasonable rigidity of DroneCAN and MAVLink at virtually no extra cost aside from the marginal increase in the number of configuration parameters. The concerns related to the data type misinterpretation or bandwidth efficiency are mostly imaginary, as I explained in detail in the aforementioned articles. Your refusal to accept the fact that the rigidity of the existing approaches is limiting the advancement of new designs is hurting the entire ecosystem and the ArduPilot community in particular, even if some of its members are not yet able to comprehend this.

I urge you to revisit my suggestion to work out a new DSDL namespace that you will find adequate. I would be the last person to claim that the UDRAL standard is perfect; rather, it is an approximation of what I see as the new, improved DCPS architecture. If you want to ditch that and go back to the cruder interfaces of the past, I have no problem with that as long as we stay within the red lines.

Sorry, that was a configuration mistake on my part. The discussions should be visible to everyone now.

maksimdrachov · March 13, 2023, 2:52pm

Full disclosure: I’m a volunteer open-source contributor to OpenCyphal.

Allow me to put in my 2 cents:

Reading the threads that have taken place on this issue, it seems clear to me that both Andrew and Pavel have a lot of history debating this issue, however I don’t see how it might be resolved on the basis of technical arguments, since the difference (from my point of view) comes down to:

philosophical: Pavel’s decision to develop a new UAVCAN-inspired protocol, arguing that the lessons learned could be leveraged towards improving a more general communication protocol.
economical: Andrew’s unwillingness to disturb the current ecosystem for DroneCAN.

Now, I’m not experienced/technical enough to be able to discuss the technical merits of either side of the argument, so I won’t even try to go there. However, as I have already noted, I don’t think this is necessarily a technical argument.

Now taking a look at the core of the disagreement, I can only note the following:

Pavel and Andrew will not agree on which is better/best, and it seems futile to try to force either party to agree. However, I don’t see how not adding support for Cyphal improves ArduPilot. It seems clear that many companies are using the new protocol, so in the end, if the development of this support is denied to this part of the ArduPilot community, something is definitely lost. Giving the developers themselves the option to choose which protocol they want to use seems like the most sensible solution.

iampete · March 13, 2023, 2:54pm

I’m still struggling to understand the advantage. Maybe someone could explain what the new protocol adds.

The main argument here seems to be that the current approach is too rigid? Does this mean it’s too hard to add new messages?

The Cyphal home page has lots of features listed but they’re all developer focused; I would like to understand what this means for users. Thanks to @Roman_Fedorenko there is some hardware that people can buy, but why would you use Cyphal mode vs DroneCAN mode? The main thing the vast majority of users want is to buy something, plug it in and have it work with minimal config; they don’t care (nor should they have to) if protocol X is better than protocol Y.

Pavel_Kirienko · March 13, 2023, 3:11pm

Hi Peter. It is hard to translate the advantages of Cyphal to the user front if we are focused on the currently available DroneCAN devices. What a DroneCAN node can do, Cyphal node can do as well; there may be a negligible increase in the configuration complexity, but that’s that (Andrew tends to overdramatize this part, presumably because he lacks the experience of actually using Cyphal nodes). Consider checking out this video made by @ponomarevDA (it shows an old version of Yukon the GUI tool, but it does convey the idea well):

Where Cyphal does excel is in its ability to build decentralized systems. If you need to make a motor controller talk to a BMS or make one node consume data from a specific sensor bypassing the central master node (that being the flight controller), you have to use Cyphal, as DroneCAN simply breaks here (see “more practical examples”). Thus, the advantage of Cyphal is its ability to bring entirely new kinds of products to the market, aside from (and including!) the simple “sensor nodes” that are already there.

To add to that, consider also that Cyphal supports not only CAN but also Ethernet (like the new 10base-T1S, which, along with some other standards, will render CAN obsolete in a few years) as the first-class transports. DroneCAN lacks this support as it is fully CAN-centered. Supporting other transports is not the same as being able to ferry CAN frames over multicast UDP etc.

khancyr · March 13, 2023, 3:12pm

Cyphal is advertised as a replacement and upgrade for DroneCAN which isn’t true. That is another protocol, and according to the reading, not dedicated to CAN only.
If that is truly a replacement for DroneCAN, why should we continue to develop DroneCAN? But as you have stated, it is a more general protocol. So that complicates things … And even more so if the API can change as is now proposed (in our favor this time, but breaking changes are always hard to manage).

Not that clear; besides the PR I truly don’t remember demands for it. It doesn’t mean that there weren’t any, but it isn’t in much demand compared with DroneCAN, or DDS for ethernet stuff.

Pavel_Kirienko · March 13, 2023, 3:19pm

Cyphal is, indeed, a replacement and upgrade to DroneCAN while being a different and more general protocol. These things are not mutually exclusive. At the same time, we are not suggesting discarding DroneCAN overnight.

To serve the numerous existing deployments out there? I see no issues on this side.

Unlike DroneCAN, Cyphal allows you to change the DSDL definitions without breaking the network compatibility. No breaking changes are involved here.

Yuri_Rage · March 13, 2023, 4:08pm

I’ll keep this brief, so as not to stray into the “lacking sufficient understanding” category, but I hope some end user perspective might be useful?

20kb is not a trivial cost. Many recent PRs are focused on savings in the 10s of bytes, much less on the order of kb. I do recognize there is space available on the more robust autopilots, but does putting Cyphal support into that space best serve the user-base? Which brings me to:

I could only find one vendor from whom to buy a small assortment of Cyphal-protocol hardware, leading me to believe it is not widely adopted at all. I’m sure there are other vendors who do not advertise so publicly, but if the hardware is so minimally proliferated, it certainly seems that Cyphal support is not an emergent need for the vast majority of users.

Finally, and perhaps most importantly from an end-user/configuration point of view: Can the need for potentially hundreds of new parameters (as mentioned by Tridge) be addressed in a more user-friendly way? Is there an alternative to that?

Pavel_Kirienko · March 13, 2023, 4:22pm

It is undoubtedly true that Cyphal is not nearly as widespread as UAVCAN v0, aka DroneCAN, as it is a new protocol, so this comparison hardly adds new information, nor is it fair. Looking at the current adoption rates, I predict that Cyphal will quickly overtake DroneCAN due to its strong design and a much more robust implementation base. We already have testimony from some of the early adopters presented in this thread who can recognize the advantages offered by the new architecture.

@ponomarevDA had a proposal to this end, but I wouldn’t venture to judge on its technical merits and feasibility. He may pitch in himself.

ponomarevDA · March 13, 2023, 4:31pm

Hi. Thanks for starting the conversation and giving feedback to the PR.

As it was rightly understood, within this PR I tried to make a first step with a minimal Cyphal implementation to collect feedback from developers for further improvements. cyphal-hitl and cyphal branches are what I prepared as further steps. I’m still free to improve or extend the PR, fix and test something, and I’m really happy to get a more detailed review of what I’ve done.

I can start by going beyond the quadcopter example by increasing the number of ESCs (hopefully with the HITL simulator it will be easier to test more complex airframes).

Then, since it is said that co-existence must be the number one priority, I will try to make both protocols work on the same bus. I also need to note that it is currently possible to have devices based on both protocols on different buses.

I will also complete the gnss message set (it is now just a minimal example to make HITL work and to fly with an assumption that accuracy is good enough) and review ESC feedback one more time, so we can discuss it.

Few words about the parameters and the possibility of misconfiguration.
Parameters with port IDs are just an internal representation of Cyphal registers. You don’t need to manipulate them manually.
When I started with ArduPilot & Cyphal, I made configuration mistakes from time to time. However, they were not a significant problem because they are easily caught during the pre-flight testing process. Then I wrote my own scripts based on pycyphal and Yakut (you can actually see the output of one of them in the screenshot attached to the first post) and it became easier. But that was ~1 year ago. Today we have Yukon, which makes it even easier. Also, I think an automatic check of network configuration correctness can be implemented in the near future. Since we have port.List and *.type registers, this seems to be trivial.
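As a rough illustration of such a correctness check, the sketch below flags any port ID on which two bindings disagree about the data type. It assumes the (port ID, type name) pairs have already been collected from the nodes, for example from their port lists and *.type registers. `PortBinding` and `find_type_conflicts` are hypothetical names, and this is host-side C++, not ArduPilot code.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

// One publisher or subscriber binding, as gathered from a node (hypothetical structure).
struct PortBinding {
    std::uint16_t port_id;
    std::string type_name;  // e.g. "reg.udral.service.common.Readiness.0.1"
};

// Returns the set of port IDs whose bindings do not all name the same data type.
inline std::set<std::uint16_t> find_type_conflicts(const std::vector<PortBinding>& bindings)
{
    std::map<std::uint16_t, std::string> first_seen;
    std::set<std::uint16_t> conflicts;
    for (const auto& b : bindings) {
        const auto result = first_seen.emplace(b.port_id, b.type_name);
        const bool already_bound = !result.second;
        if (already_bound && result.first->second != b.type_name) {
            conflicts.insert(b.port_id);
        }
    }
    return conflicts;
}
```

A check like this catches type mismatches only. A port that is wired to the wrong but identically typed subject (say, two ESC feedback streams swapped) would pass it.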
If the number of parameters is still uncomfortable, we can consider storing them on an SD card. For example, we can put a YAML file with register names and their port IDs on the card and read and parse it once during driver initialization. If it has an entry with the corresponding register name, we set the corresponding port ID. If not, we disable it. On receiving ExecuteCommand we can write it back to the SD card. Can this help with the scalability issue? I have to admit that this is just an idea and I haven’t looked into how the SD card interface is implemented in ArduPilot. Are there any pitfalls here?
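A minimal sketch of the lookup side of that idea, under several assumptions: the file is a flat list of `name: id` lines (a small subset of YAML), a missing entry means the port is disabled, and 65535 marks an unset port ID. The register names in the usage example are illustrative, and the code uses the host C++ standard library rather than ArduPilot's filesystem or parameter APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>

constexpr std::uint16_t kPortIdUnset = 65535;  // assumption: "unset" marker

// Strips leading and trailing spaces, tabs and carriage returns.
inline std::string trim(const std::string& s)
{
    const char* ws = " \t\r";
    const auto first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";
    const auto last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Parses flat "register.name: port_id" lines; '#' starts a comment.
// Lines that do not match are skipped rather than treated as errors.
inline std::map<std::string, std::uint16_t> parse_port_config(const std::string& text)
{
    std::map<std::string, std::uint16_t> out;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        const auto hash = line.find('#');
        if (hash != std::string::npos) line.erase(hash);
        const auto colon = line.find(':');
        if (colon == std::string::npos) continue;
        const std::string key = trim(line.substr(0, colon));
        const std::string val = trim(line.substr(colon + 1));
        if (key.empty() || val.empty() || val.size() > 5) continue;
        if (val.find_first_not_of("0123456789") != std::string::npos) continue;
        const unsigned long id = std::stoul(val);
        if (id > 65535UL) continue;
        out[key] = static_cast<std::uint16_t>(id);
    }
    return out;
}

// Port ID for a register, or kPortIdUnset (disabled) when the file has no entry.
inline std::uint16_t port_id_for(const std::map<std::string, std::uint16_t>& cfg,
                                 const std::string& register_name)
{
    const auto it = cfg.find(register_name);
    return it == cfg.end() ? kPortIdUnset : it->second;
}
```

For example, parsing the two lines `uavcan.pub.esc.setpoint.id: 2000` and `uavcan.sub.esc.feedback.id: 2001` gives two entries, and looking up any other register name returns `kPortIdUnset`. Writing the file back on ExecuteCommand and handling a missing or corrupt card are left out of the sketch.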