Continuous Integration and Continuous Delivery (CI/CD) for Robots
by Gerry Ens, CTO, Go West Robotics
Go West Robotics Posted 02/01/2021
If there is one thing everyone can agree on, it is that no one can agree on a single formal definition of “CI/CD”. While definitions may vary, almost everyone relies on a similar diagram, based on an infinite loop, to represent the “continuous” nature of the practices that are universal to continuous integration and delivery:
- Development practices are agile and constant updates are made to code libraries. Depending on the industry and the platform executing the code, updates to production systems could occur multiple times per day (without the end user even knowing).
- Code release and deployment processes are automated to the greatest extent possible to make it easier for developers to ship stable, secure software and respond quickly to customer needs. Your process should allow developers to ship high-quality code with as little effort as possible.
- Efficient practices balance developer autonomy against quality demands, and the best processes emphasize continuous testing, using software libraries and peer code reviews, to identify regressions and prevent new bugs before they impact system performance or customer satisfaction.
- Intelligent reporting systems that provide insight into system performance and failures are critical for managers to support CI/CD practices.
- Well-architected CI/CD systems execute in the cloud to control execution costs, decrease compilation and test runtimes, and provide accessibility to remote development teams and managers.
Simply put, CI/CD can help your developers write better code faster, while getting it into the hands of your customers quicker and more easily.
Given the emphasis we place on testing, our team prefers to go one step beyond just CI/CD and use the term “CI/CT/CD” to represent Continuous Integration, Continuous Testing and Continuous Delivery.
Background on CI/CT/CD
Before the advent of the Internet of Things (IoT) and cloud processing, hardware products (including robots) were unable to dynamically update their software, requiring users or integrators to perform a manual firmware update procedure, often involving on-premise updates to cabling and computers. This process allowed the software schedule to trail the hardware development schedule and allowed more time for testing. There was no expectation that software updates were available all the time for these products. In today’s environment, that expectation has changed. Customers expect new features to be provided constantly and they expect bugs to be remedied quickly, without user interaction or expensive downtime.
Origins of CI/CT/CD
In pure software projects such as mobile app, web, or SaaS development, CI/CT/CD has been part of the software development life cycle (SDLC) for quite some time. Different Agile project management strategies (Scrum, Test-Driven Development, Extreme Programming, etc.) all prescribe some form of these practices as part of their DevOps methods. Automated test development has contributed heavily to the acceleration of feature delivery and turnaround time on software. Automating tests has disconnected the test schedule from the limits of a human schedule. Running multiple tests in parallel has allowed for a dramatic speedup of test execution, making it practical to run tests continuously at every build. While test execution has become faster, development of the tests still takes time. Due to the relative similarity between coding features and tests in pure software products, combined with the emergence of software test frameworks and platforms, automated software testing has become an integral part of a software developer’s job.
Application to Hardware Products
Following the examples of mobile phone OS updates, router firmware updates, and other connected devices, expectations for frequent software updates have become the norm for most hardware, including robots. The above development practices have been mapped to robotics development with varying degrees of success, with problems commonly arising in the automated test aspects of development.
Unit tests that exercise individual components of the robot software in isolation are generally produced by developers as part of their standard process to ensure that a specific block of code operates as intended now and in the future. Testing becomes more difficult, though, for integration or system level testing, as the software must communicate with real hardware. For robotics specifically, the actions taken by a robot (e.g., movement) require high-fidelity sensor data, which further complicates testing needs. Simulation helps overcome this hurdle.
Most, if not all, robotics companies utilize simulators at some point in their development process. Some build their own and some use off-the-shelf products, both of which tend to require a significant amount of roboticists’ time to develop and configure. Although simulation provides an invaluable means of vetting code changes and new development by rapid trial and error, simulation is only truly capable of telling a developer that something does not work. Simulation success is by no means a guarantee of success on real hardware, as the simulation is only as good as the simulated environment. As the saying goes, “Simulations are doomed to succeed,” but they are a critical first step to provide rapid feedback about code quality to robotics developers, engineers, and their managers.
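The difference between a unit test that runs in isolation and a test that needs real hardware can be sketched in a few lines of Python. This is only an illustration, not a description of any particular robot stack: the `RangeSensor` interface, the `FakeRangeSensor` test double, and the `safe_speed` function are all hypothetical names invented for this example. The idea is that the logic under test depends on an interface rather than on a hardware driver, so a scripted reading can stand in for the sensor.

```python
from typing import Protocol


class RangeSensor(Protocol):
    """Hypothetical interface for a forward-facing distance sensor."""

    def distance_m(self) -> float:
        ...


class FakeRangeSensor:
    """Test double: returns a scripted reading instead of touching hardware."""

    def __init__(self, reading_m: float) -> None:
        self.reading_m = reading_m

    def distance_m(self) -> float:
        return self.reading_m


def safe_speed(sensor: RangeSensor,
               max_speed: float = 1.0,
               stop_distance_m: float = 0.5) -> float:
    """Return a commanded speed based on the distance to the nearest obstacle.

    The robot stops inside stop_distance_m, then ramps up linearly and
    reaches max_speed once the obstacle is 2 * stop_distance_m away.
    """
    distance = sensor.distance_m()
    if distance <= stop_distance_m:
        return 0.0
    ramp = (distance - stop_distance_m) / stop_distance_m
    return min(max_speed, max_speed * ramp)


def test_stops_when_obstacle_is_close() -> None:
    assert safe_speed(FakeRangeSensor(0.3)) == 0.0


def test_full_speed_when_path_is_clear() -> None:
    assert safe_speed(FakeRangeSensor(5.0)) == 1.0


if __name__ == "__main__":
    test_stops_when_obstacle_is_close()
    test_full_speed_when_path_is_clear()
    print("unit tests passed")
```

A test like this says nothing about how a real sensor behaves, which is exactly the gap that simulation and, eventually, hardware testing have to cover.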
CI Tool Integration
The most important consideration when developing or choosing an integration platform is an intuitive user interface. This gives users clear feedback on results and ensures they can find the data they need to make informed decisions. In addition to a user-friendly dashboard, the best continuous integration systems have easy-to-use APIs that allow tests to be triggered and results to be gathered by standard integration suites like Jenkins, Travis, CodePipeline, GoCD, etc.
Unfortunately, the binary nature of many continuous integration suites (i.e., tests only “pass” or “fail”) and their limited test result data lead to costly customization. They can also lead to steep learning curves when managers bring new members onto their team. In contrast, a clean and intuitive interface can be created from industry standard web components in a fraction of the time and can give developers useful details about failing tests beyond a simple “fail” state. For example, we often configure our test dashboards to give the test executor a full system log dump captured at each test’s completion.
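One way to picture a result that carries more than “pass” or “fail” is a small record that still collapses to the binary exit code a CI suite expects. The following Python sketch is an assumption made for illustration and is not our dashboard’s actual data model: the `Outcome` values, the `MissionResult` fields, and the method names are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Outcome(Enum):
    """Hypothetical outcome states, richer than a bare pass/fail."""
    PASSED = "passed"
    FAILED = "failed"
    TIMED_OUT = "timed_out"


@dataclass
class MissionResult:
    """Hypothetical record of one simulated test mission."""
    name: str
    outcome: Outcome
    duration_s: float
    failed_step: Optional[str] = None                  # where the mission broke
    log_tail: List[str] = field(default_factory=list)  # last lines of the system log

    def summary(self) -> str:
        """One-line description suitable for a dashboard row."""
        line = f"{self.name}: {self.outcome.value} in {self.duration_s:.1f}s"
        if self.failed_step:
            line += f" (failed at: {self.failed_step})"
        return line

    def exit_code(self) -> int:
        """Collapse to the binary signal a CI suite understands (0 = success)."""
        return 0 if self.outcome is Outcome.PASSED else 1


if __name__ == "__main__":
    result = MissionResult(
        name="pick_item",
        outcome=Outcome.FAILED,
        duration_s=12.34,
        failed_step="grasp",
        log_tail=["gripper timeout"],
    )
    print(result.summary())
    print("exit code:", result.exit_code())
```

The CI suite only ever sees the exit code, while the dashboard can display the failed step and the log tail alongside it.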
Shorter Test Times and Better Results
When robot code has been optimized to reduce state dependencies, longer simulated missions can be broken down into many smaller missions. For example, a long mission might have a robot leave its charging station, go to the warehouse, pick an item from a specified bin, deliver the item to a new location, and ultimately return to its “home.” For ideal test case execution, each of these intermediary steps might become its own mission.
Once missions are dissected into base cases, several advantages become available. First, test cases provide more targeted feedback when they fail: failures can be isolated to the smaller subset of code contained in the short mission. Second, the tests can be executed in parallel to significantly decrease overall test execution time, providing feedback to developers much more quickly. In addition, the component missions can be reused by other developers as they build out their own simulated missions.
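The parallel execution described above can be sketched with Python’s standard `concurrent.futures` module. Everything specific here is an assumption for illustration: the mission names are taken from the warehouse example, `fake_simulate` is a placeholder that sleeps instead of launching a simulator, and one mission is hard-coded to fail so the targeted feedback is visible. A thread pool is used on the assumption that each worker mostly waits on an external simulator process.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical short missions split out of the long warehouse mission.
MISSIONS: List[str] = [
    "leave_charging_station",
    "go_to_warehouse",
    "pick_item_from_bin",
    "deliver_item",
    "return_home",
]


def fake_simulate(mission: str) -> bool:
    """Placeholder for one simulator run; returns True when the mission passes."""
    time.sleep(0.05)  # stand-in for simulator run time
    return mission != "pick_item_from_bin"  # pretend exactly one mission fails


def run_missions_in_parallel(missions: List[str],
                             simulate: Callable[[str], bool],
                             max_workers: int = 4) -> Dict[str, bool]:
    """Run every mission concurrently and map each name to its pass/fail result."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        outcomes = pool.map(simulate, missions)  # results come back in input order
        return dict(zip(missions, outcomes))


if __name__ == "__main__":
    results = run_missions_in_parallel(MISSIONS, fake_simulate)
    failed = [name for name, passed in results.items() if not passed]
    print("failed missions:", failed)
```

Because each mission is small, a failure points at one step of the original long mission rather than at the mission as a whole.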
Execution in the Cloud
To really save developer time and reduce processing costs, the best CI/CT/CD pipelines leverage cloud computing to optimize test execution. While many developers may have expensive and capable computers, few have local systems that can compete with the computing capacity of high-end cloud instances. Importantly, if cloud execution scripts are implemented appropriately, companies pay for cloud resources only for the relatively short duration required for test execution.
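The pay-for-duration point can be made concrete with a little arithmetic. The Python sketch below assumes billing that is strictly proportional to run time with no minimum charge, and the instance counts, run times, and hourly rate are made-up numbers chosen only to keep the math easy to follow. Real cloud pricing differs by provider and instance type.

```python
def on_demand_cost(num_instances: int, run_minutes: float, hourly_rate: float) -> float:
    """Cost of running num_instances for run_minutes each.

    Assumes billing proportional to run time with no minimum charge.
    """
    return num_instances * run_minutes * hourly_rate / 60.0


if __name__ == "__main__":
    hourly_rate = 2.0  # hypothetical price per instance-hour

    # Five 6-minute missions run back to back on a single instance.
    serial_cost = on_demand_cost(num_instances=1, run_minutes=30, hourly_rate=hourly_rate)

    # The same five missions run at once on five instances.
    parallel_cost = on_demand_cost(num_instances=5, run_minutes=6, hourly_rate=hourly_rate)

    print(f"serial:   30 min wall-clock, cost {serial_cost:.2f}")
    print(f"parallel:  6 min wall-clock, cost {parallel_cost:.2f}")
```

Under these assumptions both runs consume the same 30 instance-minutes, so the parallel run returns feedback five times sooner at the same compute cost.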