Ensuring embedded browsers benefit from multi-core silicon

3rd September 2018

Ekioh

Alex Lynn


In this article, Stephen Reeder, Commercial Director at Ekioh, explains how embedded browsers can benefit from multi-core silicon, and examines the impact of multi-core hardware on software.

HTML browsers are now in widespread use for application rendering and user interface (UI) presentation on embedded systems, from set-top boxes to car infotainment. The key reasons are that they simplify UI authoring and provide a level of hardware abstraction that enables authoring to happen in parallel with hardware design. Browsers are, however, processor-hungry components, and achieving the desired level of application responsiveness directly drives CPU selection.

Historically, this hasn't been a problem: processing power has been increasing year-on-year, providing comfort that if a UI design is not responsive enough on cost-effective silicon today, it will be by the time that mass production hardware is selected. But as processor clock speeds increase, the heat they generate also increases, and keeping this heat below acceptable limits demands the periodic reduction of operating voltage and process size. Eventually such reductions become impractical for physical or cost reasons, which is leading the silicon industry to exploit the potential of multi-core processing.

Within a multi-core device, two or more processors are packaged together to provide increased overall processing power without having to increase the speed of the individual cores. A reduction in clock speed is usual to compensate for the additional heat generated by the increase in core count, so whilst individual core speed often falls, the gains in overall processing power are significant. Year-on-year overall processing power increases of 30% are not uncommon in consumer electronics processing despite the individual core speeds remaining broadly flat over the same period.

Impact of multi-core hardware on software

With a multi-core device, advertised performance can only be achieved if the task being performed can be split effectively so that each core is loaded equally. The presence of any sequential tasks that cannot be split, plus any overhead in communicating between parallel tasks, will govern the upper limit of the overall performance gain. Applications using an inherently single threaded design are unable to reap the benefits of multi-core silicon by themselves, although some benefits can be seen if multiple applications are run at the same time.
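The limit described here is commonly formalised as Amdahl's law. The following is a minimal sketch rather than anything taken from the article: the function name and the sample fractions are illustrative assumptions, and the overhead of communicating between parallel tasks is ignored.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speed-up when only part of a task can be split across cores."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    # The sequential part runs at full length; only the rest is divided up.
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Illustrative fractions only; none of these are measured values.
    for fraction in (0.10, 0.50, 0.90):
        print(f"{fraction:.0%} parallel on 4 cores: {amdahl_speedup(fraction, 4):.2f}x")
```

A fully sequential task (`parallel_fraction=0.0`) gains nothing from extra cores, while a perfectly divisible one scales with the core count.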

When considering how the benefits of multi-core processors impact browsers, it is necessary to delve into the browser’s internal workings. Browser activity largely comprises Networking (where content and resources are loaded), Parsing (to create the Document Object Model), Scripting, Layout (where text and image placement are calculated), Painting (drawing images and text into offscreen buffers), and Compositing the offscreen buffers onto the screen.

The table below shows the breakdown of tasks from a sample of various types of HTML applications. It shows that scripting and layout are by far the largest consumers of time, and whilst their combined size is fairly consistent, their balance varies with the type of application.

Historically all browser activity has been contained within a single thread and so has been unable to benefit from multi-core silicon developments. More recently some browsers have allocated networking and compositing into separate processes, but the key tasks of scripting and layout, which between them typically account for over 90% of the processor load, remain on a single thread.

This presents something of a problem when the browser is used for an application or UI. If 90% of a browser’s processing load is single threaded, and therefore cannot be split across multiple cores, the benefits of the shift to multi-core silicon cannot be realised. Indeed, as the average speed per core of a quad core device is considerably lower than that of an equivalent single core device, browser performance will be noticeably reduced.
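As a rough worked illustration using Amdahl's law (an assumption here, not a calculation from the article), take the 90% single-threaded share as the serial fraction $s = 0.9$ on $N = 4$ cores. The relative per-core speed $r$ is a hypothetical parameter, since no figure is given for it.

```latex
S(N) = \frac{1}{s + \frac{1 - s}{N}}
     = \frac{1}{0.9 + \frac{0.1}{4}}
     = \frac{1}{0.925} \approx 1.08,
\qquad
S_{\text{net}} = r \cdot S(N)
```

Under these assumptions the four cores together yield at most about an 8% gain, so the browser ends up slower than on the single-core part whenever $r < 0.925$.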

In the scenario where the browser is used to do multiple things at once, such as having several web pages open, each page can run on a separate core so any reduction in individual page responsiveness will be less noticeable. Hence there is no visible impact when running a general desktop browser on multi-core silicon. However, UI rendering scenarios tend to run as a single page and so single application performance is key; relying upon a traditional browser design will result in a marked decrease in responsiveness when multi-core silicon is used.

In order to properly benefit from multi-core silicon, the browser must be able to keep each of the cores fully loaded in both the single page and multi-tab use cases.

Recent browser evolution has focused upon the multi-tab use case, but has done little to benefit the single page performance needed for application and UI rendering. Instead, a new approach is needed where each page uses as many cores as possible in order to benefit from their combined performance.

Designing a new browser

Splitting the browser’s activities into smaller tasks and executing them on separate cores reduces the overall time taken until the page is usable. Since the performance of a multithreaded browser is determined by the duration of the longest task, the aim is to break the work down into as many small tasks as possible and execute them in parallel. A first step in designing a multithreaded browser is to identify a series of tasks and implement a method for their parallel execution. An obvious candidate is image decode, as decoding one JPEG has no interaction with another.
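The image decode case can be sketched as a pool of workers, each handling one image with no shared state. This is an illustration only, not Flow's implementation: to stay self-contained it uses `zlib` decompression as a stand-in for a JPEG decoder, and `decode_image` and `decode_all` are hypothetical names.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def decode_image(compressed: bytes) -> bytes:
    """Stand-in for a JPEG decoder: inflate one independently compressed buffer."""
    return zlib.decompress(compressed)


def decode_all(images: list[bytes], workers: int = 4) -> list[bytes]:
    """Decode every image on a worker pool; results come back in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(decode_image, images))


if __name__ == "__main__":
    originals = [bytes([i]) * 200_000 for i in range(8)]  # fake pixel data
    compressed = [zlib.compress(data) for data in originals]
    decoded = decode_all(compressed)
    print(decoded == originals)
```

Because no decode reads or writes another's data, the tasks need no locking, and the total time tends towards that of the slowest single decode rather than the sum of all of them.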

There is considerable scope for splitting layout into multiple independent parallel tasks. When considering a printed page consisting of paragraphs of text, nothing in one paragraph affects the word wrapping of other paragraphs, just their vertical position. If a word were added to a paragraph the only effect would be the possible shifting up or down of the following paragraphs. This is important since each paragraph can be considered independent and word wrapped in parallel.

HTML works on the same principle. It considers every object as a simple rectangular box where the position of each box can affect the position of those around it, but not their contents. Once the width of each box is calculated, it is possible to lay out its contents independently of the others. These layout tasks can then be executed in parallel and spread across the available cores in a multi-core environment.
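The two-phase idea can be sketched as follows: wrap each box's text independently, then run a cheap sequential pass that only assigns vertical positions. This is a simplified illustration under stated assumptions, not Flow's algorithm: boxes are plain paragraphs of a fixed character width, heights are counted in lines, and `LaidOutBox`, `wrap_box` and `layout_page` are hypothetical names.

```python
import textwrap
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class LaidOutBox:
    lines: list[str]  # wrapped text, computed independently per box
    y: int = 0        # vertical offset in lines, assigned afterwards

    @property
    def height(self) -> int:
        return len(self.lines)


def wrap_box(text: str, width: int) -> LaidOutBox:
    """Lay out one box's contents; depends only on its own text and width."""
    return LaidOutBox(lines=textwrap.wrap(text, width=width))


def layout_page(paragraphs: list[str], width: int, workers: int = 4) -> list[LaidOutBox]:
    # Phase 1: wrap every paragraph on the worker pool (no shared state).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        boxes = list(pool.map(lambda p: wrap_box(p, width), paragraphs))
    # Phase 2: a sequential pass stacks the boxes; only positions change.
    y = 0
    for box in boxes:
        box.y = y
        y += box.height
    return boxes


if __name__ == "__main__":
    for box in layout_page(["one two three four", "five six"], width=9):
        print(box.y, box.lines)
```

In CPython the interpreter lock prevents pure-Python threads from running truly concurrently, so this shows the structure of the split rather than a real speed-up; a browser would run the first phase in native threads.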

User interfaces lend themselves to parallel layout. They tend to consist of many independently positioned items, which have no knock-on effects on other objects. For instance, in an onscreen electronic programme guide (EPG) for TV, each channel row or even each programme cell is often ‘absolutely positioned’. Even though these often contain little or no text, there is still a benefit to being able to lay them out in parallel, especially when animating their size or position.

Putting theory into practice

Ekioh’s Flow HTML5 browser uses the parallel multithreaded layout architecture described above to unlock the full potential of multi-core silicon and the results underscore the company's view that multithreading is the way forwards.

In a series of side-by-side performance tests comparing Ekioh Flow, Apple Safari (WebKit), Google Chrome (Blink) and Mozilla Firefox (Gecko) on a quad core MacBook Pro, Flow outperformed all three:

In an extreme layout stress test involving the layout of over 70,000 paragraphs of text, Flow took just 4.6 seconds to complete compared with 10.9 seconds on Firefox, 17.6 on Safari and over a minute on Chrome.
Running the popular ‘The Man In Blue’ animation benchmark (with 1000 particles), Flow showed a clear performance advantage achieving 54.5 frames per second (fps) compared to 42.5fps on Safari, 29.5fps on Chrome and 16.5fps on Firefox.

Taking the ‘UI Applications’ use case from Table 1, and extrapolating the results from the tests above, shows that Flow can be almost twice as fast as traditional browsers, as shown below.

Testing on a quad core ARM based platform (4 x 3K DMIPS per core) showed a similar pattern of results when compared against the box’s default WebKit based browser:

In the extreme layout stress test, Flow took 20 seconds compared with 131 seconds for WebKit.
In ‘The Man In Blue’ animation benchmark (with 500 particles), Flow achieved 17fps compared with 12fps on WebKit.
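The figures quoted for both platforms can be turned into relative ratios with simple arithmetic. The sketch below uses only numbers stated in the text; the helper name `times_faster` is an illustrative assumption, and Chrome's layout time is omitted because it is given only as "over a minute".

```python
def times_faster(baseline: float, candidate: float) -> float:
    """Ratio of a baseline duration to a candidate duration (higher = candidate faster)."""
    return baseline / candidate


# Layout stress test durations in seconds, as quoted in the text.
flow_mac_s, firefox_s, safari_s = 4.6, 10.9, 17.6
flow_arm_s, webkit_arm_s = 20.0, 131.0

# Animation benchmark frame rates in fps, as quoted in the text.
flow_mac_fps, safari_fps, chrome_fps, firefox_fps = 54.5, 42.5, 29.5, 16.5
flow_arm_fps, webkit_arm_fps = 17.0, 12.0

if __name__ == "__main__":
    print(f"Layout, MacBook Pro: {times_faster(firefox_s, flow_mac_s):.2f}x vs Firefox, "
          f"{times_faster(safari_s, flow_mac_s):.2f}x vs Safari")
    print(f"Layout, ARM: {times_faster(webkit_arm_s, flow_arm_s):.2f}x vs WebKit")
    # For frame rates, higher is better, so the ratio is Flow over the other browser.
    print(f"Animation, MacBook Pro: {flow_mac_fps / safari_fps:.2f}x Safari, "
          f"{flow_mac_fps / chrome_fps:.2f}x Chrome, {flow_mac_fps / firefox_fps:.2f}x Firefox")
    print(f"Animation, ARM: {flow_arm_fps / webkit_arm_fps:.2f}x WebKit")
```

The ratios differ widely between the layout and animation tests, which suggests the advantage depends heavily on how much of the workload is layout.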

Scalability testing of Flow’s multithreaded layout shows that layout times continue to fall with each additional core and that significant benefits are achieved on both dual and quad core devices. With the consumer electronics and embedded systems marketplaces moving to multi-core silicon, it is expected that Flow will be the benchmark against which others’ performance will be measured for some time to come.

Connectivity is a growing trend within embedded consumer electronics, cars and industrial systems. This has driven an increased customer focus on the user interface as a key product differentiator. Customers no longer tolerate a slow and unresponsive experience - if the UI isn’t beautiful, functional and effortless, they’ll simply choose a different product from a rival supplier. When it comes to using HTML browsers, the switch to multi-core silicon threatens this experience, risking damage to brand reputation and ultimately sales. Now is therefore the time for embedded engineers to look at their options to ensure their products continue to meet consumer expectations for the experience they deliver.