
Using a Device Firmware Monitor (DFM) to deal with software bugs

17th July 2019
Joe Bush

Johan Kraft, CEO and Founder of Percepio, introduces a new solution that leverages cloud connectivity to root out software bugs after device deployment. Computer scientist Edsger Dijkstra once said, “Program testing can be used to show the presence of bugs, but never to show their absence”. Despite developers testing their software as much as possible, they just can’t prove that bugs don’t exist in the system.

Missed bugs are common: around 95% of all bugs introduced during embedded software development are found, meaning that five percent remain in the production firmware. And since even great programmers (the top one percent) introduce around 11 defects per KLOC (1,000 lines of code), missing five percent of the bugs is significant.

Assuming a 100 KLOC application with 20 defects per KLOC and five percent of the bugs missed, you end up with 100 bugs in your shipped product. Some are perhaps harmless or very unlikely to ever cause any trouble, but you just can't know for sure. Your system may appear to work just fine, as the bugs you miss are probably related to unexpected scenarios and corner cases. However, once the system is exposed to large numbers of real-life use cases, these bugs may cause all sorts of trouble. A famous example is NASA's Mars Pathfinder mission, which nearly failed due to a software issue. In that case, the problem was analysed and fixed thanks to remote diagnostics and update capability.
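Spelled out, the estimate is the product of code size S (in KLOC), defect injection rate d (defects per KLOC) and miss rate m. Applying the same formula to the 11 defects per KLOC quoted for the top one percent of programmers still leaves 55 bugs in a 100 KLOC product:

```latex
\begin{aligned}
N_{\text{shipped}} &= S \cdot d \cdot m \\
&= 100~\text{KLOC} \times 20~\tfrac{\text{defects}}{\text{KLOC}} \times 0.05 = 100~\text{defects} \\
N_{\text{shipped, top 1\%}} &= 100~\text{KLOC} \times 11~\tfrac{\text{defects}}{\text{KLOC}} \times 0.05 = 55~\text{defects}
\end{aligned}
```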

Once a product is deployed, it can be extremely difficult to get any useful information about which issues actually occur. In practice, development teams rely on their customers to report any issues, a responsibility the customers have not agreed to and thus can't be expected to fulfil. With a connected IoT device, though, developers can leverage a new service, the Device Firmware Monitor (DFM), to report issues during testing or in the field. Let's look at what the DFM is and how it can help developers.

What is the DFM? 

The DFM allows development teams to become aware of issues in their deployed devices and retrieve trace data, allowing the team to analyse and identify the root cause. Once the cause is identified, teams can quickly provide a fix for the software and patch it using over-the-air updates before most users are affected by the issue.

The DFM can be thought of as a software ‘flight recorder’ that leverages cloud connectivity. A small trace recorder library is installed in the code base and records the software behaviour to a RAM ring buffer, based on code instrumentation in the RTOS kernel and other relevant APIs.
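To make the "flight recorder" idea concrete, here is a minimal sketch of a RAM ring buffer in C. The names (trace_record, trace_snapshot, trace_event_t) and the event layout are hypothetical and do not reflect Percepio's actual recorder library. In practice, a function like trace_record would be called from RTOS trace hooks, for example FreeRTOS's traceTASK_SWITCHED_IN() macro.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event record; not Percepio's actual trace format. */
typedef struct {
    uint32_t timestamp;  /* e.g. a hardware cycle counter or RTOS tick */
    uint16_t event_code; /* which event occurred, e.g. task switch or queue send */
    uint16_t param;      /* event-specific data, e.g. a task or object ID */
} trace_event_t;

#define TRACE_BUF_LEN 256u /* capacity in events; oldest entries are overwritten */

static trace_event_t g_events[TRACE_BUF_LEN];
static uint32_t g_next;  /* slot that the next event will be written to */
static uint32_t g_count; /* number of valid events, at most TRACE_BUF_LEN */

/* Append one event. Intended to be called from instrumentation hooks; on a
 * real target this must be protected against preemption (e.g. by briefly
 * masking interrupts), which is omitted here. */
void trace_record(uint32_t timestamp, uint16_t event_code, uint16_t param)
{
    trace_event_t *e = &g_events[g_next];
    e->timestamp = timestamp;
    e->event_code = event_code;
    e->param = param;
    g_next = (g_next + 1u) % TRACE_BUF_LEN;
    if (g_count < TRACE_BUF_LEN) {
        g_count++;
    }
}

/* Copy the most recent events (up to max) into out, oldest first, e.g. when
 * an error is detected. Returns the number of events copied. */
size_t trace_snapshot(trace_event_t *out, size_t max)
{
    size_t n = (g_count < max) ? g_count : max;
    /* Index of the oldest of the n most recent events. */
    size_t start = (g_next + TRACE_BUF_LEN - n) % TRACE_BUF_LEN;
    for (size_t i = 0; i < n; i++) {
        out[i] = g_events[(start + i) % TRACE_BUF_LEN];
    }
    return n;
}
```

Because the buffer has a fixed size and overwrites its oldest entries, memory use stays bounded while the buffer always holds the latest history, which is exactly the part that matters when something goes wrong.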

When the system misbehaves, an error message and the trace data recorded in the background can then be transmitted (directly or after a reboot) through a communication interface such as Wi-Fi or Ethernet to a cloud service that stores a report and notifies the developer. The developer can then open this trace data in Tracealyzer to review what was happening in the system leading up to the error, and reproduce those events on the bench so that the issue can be resolved quickly. A general overview of how this works can be seen in the image below.
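The "directly or after a reboot" option matters because a fault handler is a poor place to run a network stack. Below is a minimal C sketch of the deferred path, assuming a report structure kept in RAM that survives a warm reset. All names (crash_report_t, dfm_capture_error, dfm_pending_report, dfm_clear_report) are hypothetical and are not Percepio's API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define REPORT_MAGIC 0xD1A6F00Du /* arbitrary value meaning "report pending" */
#define REPORT_TRACE_BYTES 512u  /* space reserved for trace data */

/* Hypothetical crash report layout. On a real MCU this variable would live in
 * RAM that the startup code does not clear (e.g. a toolchain-specific
 * ".noinit" linker section), so that it survives a warm reset. */
typedef struct {
    uint32_t magic;
    uint32_t error_code;
    uint32_t trace_len;
    uint8_t trace[REPORT_TRACE_BYTES];
} crash_report_t;

static crash_report_t g_report;

bool dfm_report_pending(void)
{
    return g_report.magic == REPORT_MAGIC;
}

/* Called from an error or fault handler. Only copies data and sets a flag;
 * transmission is deferred because the network stack may not be usable in
 * this state. Keeps the first error if several occur before the report is
 * sent, since that one is usually closest to the root cause. */
void dfm_capture_error(uint32_t error_code, const uint8_t *trace, size_t len)
{
    if (dfm_report_pending()) {
        return;
    }
    if (len > REPORT_TRACE_BYTES) {
        /* Keep the newest bytes: they lead up to the error. */
        trace += len - REPORT_TRACE_BYTES;
        len = REPORT_TRACE_BYTES;
    }
    g_report.error_code = error_code;
    g_report.trace_len = (uint32_t)len;
    memcpy(g_report.trace, trace, len);
    /* Written last so an interrupted capture is not marked valid; a real port
     * would also add a compiler/memory barrier before this store. */
    g_report.magic = REPORT_MAGIC;
}

/* At boot, once connectivity is up: returns the pending report or NULL. */
const crash_report_t *dfm_pending_report(void)
{
    return dfm_report_pending() ? &g_report : NULL;
}

/* Call only after the cloud service has acknowledged the upload, so that a
 * failed transmission is retried on the next boot. */
void dfm_clear_report(void)
{
    g_report.magic = 0u;
}
```

At boot, the application would call dfm_pending_report() once its network connection is up, send the report with whatever connectivity stack it uses, and call dfm_clear_report() only after the upload is acknowledged. Because uninitialised RAM can hold any value after power-on, a production version would likely also store a checksum rather than trusting the magic value alone.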

What can DFM detect? 

The DFM can detect a wide range of potential problems in device firmware. First, developers can set up alerts for typical issues such as failed assertions or a fault handler being triggered. They can also set up custom triggers to detect problems such as timeouts, stack overflows or other errors that can occur in a real-time embedded system. Developers can customise the firmware to detect and report only what they consider to be issues. This may also include warnings, e.g. that stack usage has exceeded 95%.
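As an illustration of such custom triggers, the C sketch below shows a reporting assertion and a stack-usage warning based on stack "painting": the stack is filled with a known pattern, and the words that still hold it have never been used. FreeRTOS's uxTaskGetStackHighWaterMark() reports a comparable high-water mark. The report_issue function, the CHECKED_ASSERT macro and the other names are hypothetical stand-ins for whatever reporting path a real DFM integration provides.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL_PATTERN 0xA5A5A5A5u /* arbitrary "never written" marker */
#define STACK_WARN_PERCENT 95u         /* warn when peak usage exceeds this */

typedef enum { ISSUE_NONE, ISSUE_ASSERT_FAILED, ISSUE_STACK_WARNING } issue_type_t;

/* Stand-in for the reporting path: records the latest issue. A real system
 * would snapshot the trace buffer and queue a report here. */
static issue_type_t g_last_issue = ISSUE_NONE;
static const char *g_last_file;
static int g_last_line;
static unsigned g_reported;

static void report_issue(issue_type_t type, const char *file, int line)
{
    g_last_issue = type;
    g_last_file = file;
    g_last_line = line;
    g_reported++;
}

/* Custom assertion: reports the failure instead of halting the device. */
#define CHECKED_ASSERT(cond) \
    do { if (!(cond)) report_issue(ISSUE_ASSERT_FAILED, __FILE__, __LINE__); } while (0)

/* Fill a task stack with the pattern before the task starts running. */
void stack_paint(uint32_t *stack, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        stack[i] = STACK_FILL_PATTERN;
    }
}

/* Peak usage in percent, assuming the stack grows downwards from
 * stack[words - 1] towards stack[0] (as on Arm Cortex-M). Words at the low
 * end that still hold the pattern have never been used. */
unsigned stack_peak_percent(const uint32_t *stack, size_t words)
{
    size_t untouched = 0;
    while (untouched < words && stack[untouched] == STACK_FILL_PATTERN) {
        untouched++;
    }
    return (unsigned)(((words - untouched) * 100u) / words);
}

/* Periodic check, e.g. from a low-priority task. Reports a warning and
 * returns true when peak usage exceeds the threshold. */
bool stack_check(const uint32_t *stack, size_t words)
{
    if (stack_peak_percent(stack, words) > STACK_WARN_PERCENT) {
        report_issue(ISSUE_STACK_WARNING, __FILE__, __LINE__);
        return true;
    }
    return false;
}
```

The pattern check can underestimate usage if a used word happens to equal the pattern, which is why such checks serve as early warnings rather than exact measurements. Repeated warnings from the same check are not a problem here, since grouping duplicates is the cloud service's job.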

When an issue is detected, the error message and the trace data are uploaded to the cloud service, which stores them and notifies the developer about the issue. One nice thing about the DFM is that if a team has 1,000 devices in the field all reporting the same bug, they aren't notified 1,000 times.


Instead, they are notified once and informed that there have been 1,000 detections of this unique issue so far. This automatic classification keeps the developer's inbox from being overwhelmed by multiple devices reporting the same bug. It also ensures that each unique issue is noticed, even if it has been reported only once among a large volume of other reports.
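The article does not say how the DFM decides that two reports describe the same issue. One plausible approach, assumed here purely for illustration, is to compare a small signature per report (firmware version, error type and fault location) and count repeats instead of notifying again. The sketch is in C to match the other examples; a cloud back end would more likely do this with a database query, and every name below is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical signature: the fields used to decide whether two reports
 * describe the same bug. */
typedef struct {
    uint32_t fw_version;    /* firmware build that produced the report */
    uint32_t error_code;    /* e.g. assert, hard fault, stack warning */
    uint32_t fault_address; /* e.g. faulting PC or address of the failed check */
} issue_signature_t;

typedef struct {
    issue_signature_t sig;
    uint32_t occurrences;
} issue_entry_t;

#define MAX_UNIQUE_ISSUES 64u

static issue_entry_t g_issues[MAX_UNIQUE_ISSUES];
static size_t g_unique_issues;

/* Field-by-field comparison (memcmp could also compare struct padding). */
static int same_signature(const issue_signature_t *a, const issue_signature_t *b)
{
    return a->fw_version == b->fw_version &&
           a->error_code == b->error_code &&
           a->fault_address == b->fault_address;
}

/* Classify an incoming report. Returns 1 for a new issue (notify the
 * developer), 0 for a repeat (only increment its counter), or -1 if the
 * table of unique issues is full. */
int classify_report(const issue_signature_t *sig)
{
    for (size_t i = 0; i < g_unique_issues; i++) {
        if (same_signature(&g_issues[i].sig, sig)) {
            g_issues[i].occurrences++;
            return 0;
        }
    }
    if (g_unique_issues == MAX_UNIQUE_ISSUES) {
        return -1;
    }
    g_issues[g_unique_issues].sig = *sig;
    g_issues[g_unique_issues].occurrences = 1u;
    g_unique_issues++;
    return 1;
}
```

With this scheme, 1,000 identical reports produce one notification and a counter of 1,000, while a report with any differing field is treated as a new issue and notified on its own.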

Conclusions

In our connected world, there is no longer a need to rely on the end customer to report when a device isn't working as expected in the field. Using the DFM, development teams can make sure they are alerted as soon as issues are detected, in the field or on the bench, with meaningful diagnostics that allow the bug to be resolved quickly. This capability won't just improve the quality of embedded software; it will also allow early adopters to get ahead of their competition.

© Copyright 2024 Electronic Specifier