Ensuring error-free operation of large IoT device fleets
Percepio has developed an integrated solution that lets IoT device developers monitor software behaviour after their firmware has been deployed, so they can detect issues immediately, debug them and deploy fixes before many customers, if any, are affected. Johan Kraft, PhD, CEO and Founder, explains more.
The global roll-out of the Internet of Things (IoT) is putting increasing focus on the quality of embedded code and how it operates in the field. With thousands or even millions of connected devices and sensor nodes operating in all kinds of environments, even small software bugs can become big business problems. After all, both the integrity of the IoT data produced and the experience of thousands of users are potentially at stake.
Yet it is well known that bugs are practically inevitable. Some defects typically end up in the field; various studies put the figure at around five percent on average. Some of these missed bugs may stay below the radar, or customers may work around them by rebooting the device, but others can be far more serious. Identifying software bugs in the field is a real challenge, and as projects grow to combine embedded control and cloud systems, the problem grows with them.
With large fleets of connected IoT devices, the challenges are even more significant. Connectivity increases the amount of code and the system complexity, which makes verification even more difficult. Testing every possible combination of factors for every edge and corner case is rarely practical, so realistically some risk of bugs always remains. Acknowledging this risk and mitigating the effect of any problem becomes increasingly important given the vast number of nodes being deployed.
Catch, analyse, communicate and act
There is also the challenge of being pre-emptive at such a scale. A particular bug in one device may not be obvious, yet it may be a precursor to a bigger problem. Catching, communicating and acting on problems early helps to ensure reliable product performance and a good user experience, and can save millions of dollars in development costs, customer support and product returns, not to mention the cost of repairing reputational damage.
The approach Percepio has adopted is to capture errors in the code in the field as they occur and notify developers in real time. This is achieved by combining monitoring code in the device with an advanced cloud infrastructure that stores and processes the captured alerts.
The monitoring agent, called DevAlert Firmware Monitor or DFM, is a compact software library that developers add to firmware running on a real-time operating system such as FreeRTOS or Azure RTOS. It keeps track of recent software events and gives the application's error-handling code a way to report any software- and hardware-related errors it detects, which embedded error handling typically does anyway.
The key is cloud integration
DFM makes it easy for device developers to add automatic reporting of abnormal conditions in the device software, such as run-time errors, warnings or abnormal performance metrics, using the device's existing IoT cloud connection and cloud account. With each report, DFM includes a software trace of the most recent software events leading up to the error, which provides vital context and makes the problem easier to analyse and fix.
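To make the mechanism more concrete, here is a minimal sketch in C of the general technique: a fixed-size ring buffer that keeps only the most recent events, and a report function that attaches a chronological snapshot of that buffer to an error code. It is not Percepio's DFM API. The names `trace_log_event`, `report_alert`, `trace_buffer_t` and `alert_t`, the eight-event capacity and the event format are all hypothetical, and sending the alert over the device's cloud connection is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch, NOT the DFM API: a ring buffer of recent events
 * plus an alert that carries a snapshot of those events. */
#define TRACE_CAPACITY 8u

typedef struct {
    uint32_t timestamp;   /* e.g. RTOS tick count when the event occurred */
    uint16_t event_id;    /* application-defined event identifier */
} trace_event_t;

typedef struct {
    trace_event_t events[TRACE_CAPACITY];
    size_t next;          /* slot where the next event will be written */
    size_t count;         /* number of valid events, at most TRACE_CAPACITY */
} trace_buffer_t;

typedef struct {
    uint16_t error_code;
    size_t event_count;
    trace_event_t events[TRACE_CAPACITY]; /* oldest event first */
} alert_t;

static void trace_init(trace_buffer_t *tb) {
    memset(tb, 0, sizeof *tb);
}

/* Record an event, overwriting the oldest one once the buffer is full. */
static void trace_log_event(trace_buffer_t *tb, uint32_t timestamp, uint16_t event_id) {
    tb->events[tb->next].timestamp = timestamp;
    tb->events[tb->next].event_id = event_id;
    tb->next = (tb->next + 1u) % TRACE_CAPACITY;
    if (tb->count < TRACE_CAPACITY) {
        tb->count++;
    }
}

/* Fill an alert with the error code and the recent events in chronological
 * order. Uploading it over the cloud connection is not shown. */
static void report_alert(const trace_buffer_t *tb, uint16_t error_code, alert_t *out) {
    /* The oldest valid event sits 'count' slots behind 'next'. */
    size_t start = (tb->next + TRACE_CAPACITY - tb->count) % TRACE_CAPACITY;
    assert(tb->count <= TRACE_CAPACITY);
    out->error_code = error_code;
    out->event_count = tb->count;
    for (size_t i = 0; i < tb->count; i++) {
        out->events[i] = tb->events[(start + i) % TRACE_CAPACITY];
    }
}
```

A ring buffer suits this job because its memory use is fixed and logging an event costs only a few stores, while the oldest context is discarded automatically. A real device would also need to protect the buffer against concurrent access from multiple tasks and interrupt handlers, which this sketch does not do.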
On its own, however, the firmware monitor could produce a flood of data. A second component, the DevAlert Classification Engine, pre-analyses this data, examining error codes and other symptoms to identify new and unique issues. All alerts are stored, but only new issues are forwarded to the developers automatically, so they are not swamped by duplicate alerts.
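The deduplication principle can be illustrated with an equally simplified sketch. The real Classification Engine runs in the cloud and its matching rules are not described here, so this C fragment only shows the idea: every alert is counted, but only alerts with a previously unseen signature are forwarded. The signature (error code plus the ID of the last traced event), the fixed table size and the names `classify_alert` and `issue_db_t` are hypothetical assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch of duplicate suppression, NOT the actual DevAlert
 * Classification Engine. An issue is identified by an assumed signature:
 * the error code plus the ID of the last event traced before the error. */
#define MAX_ISSUES 64u

typedef struct {
    uint16_t error_code;
    uint16_t last_event_id;
} issue_signature_t;

typedef struct {
    issue_signature_t signature;
    uint32_t occurrences;   /* how many alerts matched this issue */
} issue_t;

typedef struct {
    issue_t issues[MAX_ISSUES];
    size_t count;           /* number of distinct issues known */
    uint32_t total_alerts;  /* every alert is counted, new or not */
} issue_db_t;

static void issue_db_init(issue_db_t *db) {
    memset(db, 0, sizeof *db);
}

/* Returns true if the alert should be forwarded to developers (new
 * signature, or the table is full so it cannot be classified), and
 * false if it duplicates a known issue. */
static bool classify_alert(issue_db_t *db, uint16_t error_code, uint16_t last_event_id) {
    assert(db->count <= MAX_ISSUES);
    db->total_alerts++;
    for (size_t i = 0; i < db->count; i++) {
        issue_t *issue = &db->issues[i];
        if (issue->signature.error_code == error_code &&
            issue->signature.last_event_id == last_event_id) {
            issue->occurrences++;   /* known issue: count it, do not notify */
            return false;
        }
    }
    if (db->count < MAX_ISSUES) {
        issue_t *issue = &db->issues[db->count++];
        issue->signature.error_code = error_code;
        issue->signature.last_event_id = last_event_id;
        issue->occurrences = 1u;
    }
    return true;
}
```

A production system would likely match on richer symptoms and use a hash table or database rather than a linear scan, but the behaviour of forwarding only new issues is the same.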
Once an alert has been received, the developer can download the related software trace and analyse it using Percepio's visual trace diagnostics tool, Tracealyzer. This lets engineers focus on the real work of testing a fix back in the lab and then rolling out an over-the-air (OTA) update before most customers even notice the issue.
The speed with which you fix a bug matters greatly. Most bugs in deployed firmware do not show up immediately for all users; if they did, they would almost certainly have been caught during testing. So the faster an update can be provided, the fewer customers will be affected.
A winning combo
Bugs in the IoT sphere can be a challenge for the entire organisation. The three elements in DevAlert all play a part in helping companies deploy and monitor complex IoT firmware and provide OTA updates when needed. DevAlert was developed in close cooperation with several manufacturers of IoT-ready boards, including STMicroelectronics and Infineon Technologies.
This combination of a development tool with cloud data management helps IoT development organisations reduce support costs as they roll out devices in the millions, minimising the number of users affected by missed bugs and improving the overall customer experience.