One of my white whales as an engineer obsessed with designing robust systems is achieving true modularity. It’s actually pretty funny, since I don’t even come from a systems engineering background; I’m more of a circuits guy. Nonetheless, when I joined CWRU Motorsports my first year of college, I saw a system made out of Arduinos and speaker wire and figured, “These guys don’t know that I don’t know what I’m doing. This is a perfect place for a distributed system!”

Why though?

Baja SAE may have started out as a mechanical engineering competition, but we EEs and CompEs are beginning to take over. It really started with the development of the holy grail of Baja transmissions, the Electronic Continuously Variable Transmission. An ECVT provides the weight and packaging advantages of a conventional CVT, but without the flyweights and ramps and finicky tuning. Instead, you simply specify a shift curve which is carried out by some electronic actuator like a DC motor.
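To make "specify a shift curve" a bit more concrete, here is a minimal sketch of how a curve could be stored as a lookup table and interpolated. The choice of vehicle speed as the input, the breakpoints, and the names `SHIFT_CURVE` and `target_ratio` are all assumptions for illustration, not our actual tune or firmware.

```python
from bisect import bisect_right

# Hypothetical shift curve: (vehicle speed in km/h, target CVT ratio).
# These breakpoints are invented for illustration only.
SHIFT_CURVE = [(0.0, 3.9), (10.0, 3.9), (30.0, 2.0), (50.0, 0.9)]


def target_ratio(speed_kph, curve=SHIFT_CURVE):
    """Linearly interpolate the target ratio, clamping outside the curve."""
    speeds = [s for s, _ in curve]
    if speed_kph <= speeds[0]:
        return curve[0][1]
    if speed_kph >= speeds[-1]:
        return curve[-1][1]
    i = bisect_right(speeds, speed_kph)  # first breakpoint strictly above speed
    (s0, r0), (s1, r1) = curve[i - 1], curve[i]
    return r0 + (r1 - r0) * (speed_kph - s0) / (s1 - s0)
```

The actuator controller would then drive the sheaves toward whatever ratio the table returns, which is the part that replaces flyweights and ramps.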

In the 2023 Competition Season, CWRU Motorsports unveiled our Semi-Active suspension system. We are only the second team to run one in competition, and the first one to do it without the use of costly and destructive magneto-rheological fluid. In the span of only about 6 months, we went from a conceptual system architecture of valves and drivers to a functional on-car system.

In retrospect, our first implementation left a lot to be desired. The main thing that bothered me was a lack of communication among the suspension system, the four-wheel drive system (our other major consumer of power, which you can read about here), and our data logging system. The steering wheel had a UI for the driver to control the four-wheel drive, suspension, and radio, but these systems all split out of a single 12-conductor connector. This was not ideal from a packaging, systems, or assembly standpoint, so how can we fix it?

Safety

One of the most important reasons I didn’t like the 2023 solution is its lack of robustness and safety. I felt that the system architecture was hard to analyze when looking for failure modes, and hard to fully cover with tests. This meant that we went into competitions not knowing what might fail, how it would fail, or how to fix it. In high-reliability and life-critical systems, these are both things that would keep a product from shipping. Since we have a driver in our car interacting with these systems, I really want to make sure that my team does everything in our power to design a system that protects them in the case of a failure.

CAN Bus

It’s pretty obvious that CAN is the solution here. It’s used in actual cars for exactly what we are attempting to do, and it works well. Plus, it has a lot of nice features like hardware-level filtering and deterministic arbitration. The only downside to CAN is its lack of a high-level representation. So, we decided to combine it with Cyphal/CAN, a deterministic pub-sub communications protocol designed for high-reliability systems. Cyphal was particularly friendly to me, as I immediately felt at home with its Data Structure Description Language, or DSDL. It bears a striking resemblance to ROS’s implementation of a similar concept in the form of .srv and .msg files. Using Cyphal means we can have a given actuator subscribe to a command published by a central computer, or even by multiple computers. It also means we can have that same central computer anonymously subscribe to all messages being sent on the bus at once, and save them to some data structure for later analysis.
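For anyone wondering what "deterministic arbitration" buys us, here is a rough sketch of the bitwise process in plain Python. It is a simulation of the idea, not driver code, and the function name `arbitrate` and the example IDs are made up for illustration.

```python
def arbitrate(can_ids, id_bits=29):
    """Return the ID that wins bitwise CAN arbitration.

    Every contender transmits its ID most significant bit first. A 0 bit is
    dominant and overrides a 1 (recessive) bit, so a node that sends 1 but
    reads back 0 drops out. The survivor is the numerically lowest ID.
    """
    contenders = set(can_ids)
    if not contenders:
        raise ValueError("need at least one contending ID")
    for bit in reversed(range(id_bits)):
        # The bus behaves like a wired-AND: any 0 pulls the line to 0.
        bus_level = min((cid >> bit) & 1 for cid in contenders)
        contenders = {cid for cid in contenders if (cid >> bit) & 1 == bus_level}
    (winner,) = contenders
    return winner
```

Because the lowest ID always wins without corrupting the frame, message priority is decided by the ID itself. Cyphal/CAN leans on this by placing its priority field in the top bits of the 29-bit ID.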

Implementation

My favorite part of Cyphal is the fact that the hardware implementation can so closely mirror the abstract network architecture. A “node” on the Cyphal network is usually a single PCB, with hardware designed to do exactly what the node needs to do, and nothing more. This promotes one of my favorite things as an engineer: reuse. For example, we use Hall Effect speed sensors everywhere on our cars. They tell us wheel speed, shaft speed, engine speed, etc. So, an obvious and correct solution would be to create a Hall Effect Node. This node’s only job would be to read Hall Effect sensors, but it would do a good job of it, and it could get reused everywhere we need rotational speed.
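As a rough idea of the one job a Hall Effect Node would do, here is a minimal sketch that turns edge timestamps into a speed. The function name `rpm_from_pulses`, the averaging-window approach, and the idea that timestamps arrive as a list of seconds are all assumptions for illustration; real firmware would more likely use timer capture hardware.

```python
def rpm_from_pulses(timestamps_s, pulses_per_rev):
    """Estimate shaft speed in RPM from rising-edge timestamps (seconds).

    Uses the average period over the whole window, so it needs at least two
    edges. Returns 0.0 when there is not enough data to measure a period.
    """
    if pulses_per_rev <= 0:
        raise ValueError("pulses_per_rev must be positive")
    if len(timestamps_s) < 2:
        return 0.0
    elapsed = timestamps_s[-1] - timestamps_s[0]
    if elapsed <= 0:
        return 0.0
    pulses = len(timestamps_s) - 1  # intervals between consecutive edges
    revolutions = pulses / pulses_per_rev
    return 60.0 * revolutions / elapsed
```

The only thing that changes between a wheel, a shaft, and the engine is `pulses_per_rev`, which is why one node design could cover all of them.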

Testing and coverage

With the use of Cyphal, we can also do some pretty nice bench testing and mocking of our systems. Linux has mainline support for CAN through SocketCAN, so developing desktop applications to both inspect the network and fully simulate it is surprisingly easy. A computer running multiple instances of our node software, interacting with each other using real CAN frames over SocketCAN, is a great way to test the robustness of a software system. In addition, since we log the messages flowing on the actual car, we can play back real traffic to test nodes, as well as to inspect failure modes of a particular node or of the system as a whole.
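Here is a minimal sketch of what log playback over SocketCAN could look like using only Python's standard library. The log format of `(timestamp_s, can_id, data)` tuples, the function names, and the `vcan0` interface name are assumptions for illustration, not our actual tooling. The frame layout follows the Linux `struct can_frame`, and frames are marked as extended because Cyphal/CAN uses 29-bit IDs.

```python
import struct

# Linux SocketCAN `struct can_frame`: 32-bit ID plus flags, 8-bit length,
# 3 padding bytes, 8 data bytes (16 bytes total).
CAN_FRAME_FMT = "=IB3x8s"
CAN_EFF_FLAG = 0x80000000  # marks a 29-bit extended ID
CAN_EFF_MASK = 0x1FFFFFFF  # extracts the 29-bit ID


def pack_frame(can_id, data):
    """Build the raw bytes for one classic CAN frame with an extended ID."""
    if len(data) > 8:
        raise ValueError("classic CAN carries at most 8 data bytes")
    return struct.pack(CAN_FRAME_FMT, can_id | CAN_EFF_FLAG, len(data),
                       data.ljust(8, b"\x00"))


def unpack_frame(raw):
    """Return (can_id, data) from raw SocketCAN frame bytes."""
    raw_id, length, data = struct.unpack(CAN_FRAME_FMT, raw)
    return raw_id & CAN_EFF_MASK, data[:length]


def replay_schedule(log):
    """Yield (delay_s, raw_frame) pairs from (timestamp_s, can_id, data) records."""
    previous = None
    for timestamp_s, can_id, data in log:
        delay = 0.0 if previous is None else max(0.0, timestamp_s - previous)
        previous = timestamp_s
        yield delay, pack_frame(can_id, data)


def replay(log, interface="vcan0"):
    """Send a log onto a SocketCAN interface (Linux only; 'vcan0' is assumed)."""
    import socket
    import time
    with socket.socket(socket.AF_CAN, socket.SOCK_RAW, socket.CAN_RAW) as bus:
        bus.bind((interface,))
        for delay, raw in replay_schedule(log):
            time.sleep(delay)  # preserve the original spacing between frames
            bus.send(raw)
```

Keeping the packing and scheduling separate from the socket means the timing logic can be checked on any machine, while `replay` itself only runs where a CAN or virtual CAN interface exists.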

Conclusion

I hope I’ve done a decent job of explaining our approach to distributed systems on CWRU Motorsports. There’s still some stuff up in the air, but overall I am happy with the “hard part”. This is an ever-evolving system though, and one that I am constantly thinking about. Hopefully, this project will take up more of my portfolio, and I’m looking forward to seeing what else I learn from it!