The combination of data, speech and video, new wireless standards, and the high resolution images used in (U)HDTV means transmission frequencies and bandwidths are constantly increasing. Stephan Leng, Applications Engineer, Electronics Division HEITEC AG explains.
Networks and the underlying control architectures not only have to cope with these increasing data speeds, they also need to operate without interruption. Availability Class 5 (‘5-nines’ or 99.999%), which highly reliable systems must satisfy, permits a total downtime, including maintenance work and updates, of no more than about five minutes and 16 seconds per year.
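The downtime figure follows from simple arithmetic: the permitted outage is the unavailable fraction of the year. The short Python sketch below illustrates this, assuming an average year of 365.25 days, which gives roughly 315.6 seconds for five nines. The function name `downtime_budget` is purely illustrative and not taken from any standard.

```python
from datetime import timedelta


def downtime_budget(availability: float,
                    period: timedelta = timedelta(days=365.25)) -> timedelta:
    """Return the maximum downtime allowed within `period`.

    `availability` is a fraction between 0 and 1, e.g. 0.99999 for five nines.
    The default period is an assumed average year of 365.25 days.
    """
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return period * (1.0 - availability)


if __name__ == "__main__":
    for nines in (3, 4, 5):
        availability = 1.0 - 10.0 ** -nines
        seconds = downtime_budget(availability).total_seconds()
        print(f"{nines} nines: {seconds / 60:.2f} minutes of downtime per year")
```

Each additional nine cuts the annual budget by a factor of ten, which is why planned maintenance has to be possible while the system keeps running.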
High availability - the requirement for a system to remain available even if a component suffers an outage - plays an important role in many fields today, from secure energy supply systems to telecommunications. In the first example, energy must be transmitted across long distances, while the second involves transmitting data via an optical fibre network that will ensure, for example, that high resolution data is streamed under full load conditions to millions of HDTV sets during a World Cup football match or the Olympic Games.
Any football fan watching a big match will understand very well just how essential high availability is in these cases (the fact that data conversion causes the stream to be delayed by a few seconds, meaning that neighbours who watch their TV programmes via local antenna get to celebrate the winning goal a few seconds earlier, is a structural problem in the streaming process itself that even high availability can’t fix).
There are a number of ways to approach making a system and its functionality as highly available as possible and to eliminate ‘single point of failure’ risks. One involves cluster systems in which all of the functions are contained in a running system with all their connections, with a redundant system provided in parallel that can take over the functions of the first in a matter of milliseconds if it should fail. Alternatively, all the components that fall below a specific reliability level can be redundantly arranged within a single system. The two approaches are also often used in combination.
Because an outage in a single element can cause the entire system to fail, redundant systems are usually structured in two parts - mechanical components like chassis, plugs and backplanes are arranged singly, without redundancy, due to their very high MTBF (mean time between failures); whereas a duplicate - redundant - arrangement is adopted for the active electronics (assemblies), power supplies and especially fans, which have a much lower MTBF because of their construction and moving parts. In addition, all active elements must be equipped with hot-swap mechanisms that allow them to be exchanged quickly while the system is in operation, and this functionality must be supported by both the system software and the individual hardware elements. If a power supply fails, there must be a guarantee that the remaining power supplies will continue to provide the system with sufficient electricity. If a fan should fail, the remaining fans must continue to reliably cool the system, and if a card stops working, it must be possible to replace it quickly.
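The reasoning behind duplicating only the low-MTBF parts can be made concrete with the standard steady-state availability formulas. The following Python sketch is a rough estimate under stated assumptions: failures are independent, each unit's availability is MTBF / (MTBF + MTTR), and one working unit of a redundant pair is enough. All MTBF and repair-time figures are invented for illustration and are not HEITEC or vendor data.

```python
from math import prod


def availability(mtbf_h: float, mttr_h: float) -> float:
    """Steady-state availability of a single repairable unit."""
    return mtbf_h / (mtbf_h + mttr_h)


def redundant(unit_availability: float, copies: int) -> float:
    """Availability of parallel units when one working unit suffices."""
    return 1.0 - (1.0 - unit_availability) ** copies


def series(*availabilities: float) -> float:
    """Availability of a chain in which every element must work."""
    return prod(availabilities)


# Hypothetical figures in hours, chosen only to show the effect.
chassis = availability(mtbf_h=2_000_000, mttr_h=4)  # passive, very high MTBF
fan = availability(mtbf_h=50_000, mttr_h=4)         # moving parts, lower MTBF
psu = availability(mtbf_h=100_000, mttr_h=4)

single = series(chassis, fan, psu)
duplicated = series(chassis, redundant(fan, 2), redundant(psu, 2))

if __name__ == "__main__":
    print(f"no redundancy:            {single:.7f}")
    print(f"fans and supplies doubled: {duplicated:.7f}")
```

With these assumed numbers the single chassis dominates the result once the fans and power supplies are duplicated, which is consistent with leaving high-MTBF mechanical parts without redundancy. Hot-swap capability enters the calculation through the short repair time.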
This means that the chassis technology with integrated backplane plays a vital role. These elements not only support redundancy and hot-swap capability, they also guarantee a high level of signal integrity, which ensures both availability and ease of maintenance as well as strong and reliable performance. The ability of the overall application to function is quickly put at risk if even one small component in the greater overall system fails to play its part, or if the way the software, hardware and mechanical elements work together is not properly coordinated. So what can be done in the area of chassis technology to achieve this, effectively and with an eye on costs?
There are different approaches depending on the application. AdvancedTCA can provide the basis for implementing a high availability and high performance data control system. This standard was developed for carrier grade applications in the telecommunications field as a means of guaranteeing high availability even in a context of high throughput. It also offers customers the most cost efficient off-the-shelf base possible, along with modularity and access to an extensive ecosystem of standardised cards and software for future adaptations and updates.
The required AdvancedTCA base platform package is based on existing HEITEC 19” chassis technology components and a backplane in 9U Eurocard format, along with other standard elements. The advantage here is that there is a large number of standard cards on the market that cover the bulk of the requirements, which gives customers the ability to look for systems offering the card functionality they need within the large ATCA ecosystem pool. However, what is specifically required right now in terms of mechanics?
Because the latest control and transmission systems process huge volumes of data, and state-of-the-art Ethernet networks are already aiming to transmit beyond 100 Gbit/s, the waste heat generated raises temperatures within the system and can sometimes result in hot spots. This means that system cooling and the underlying heat management equipment have a substantial role to play. For example, it’s not enough to arrange a redundant layout for the fans, which have a relatively high failure rate (a low MTBF) compared with the passive elements on account of their moving parts - they also have to be hot-swap capable when maintenance is required, as well as easy to access and capable of being replaced by non-specialists with no risk of error. This also applies to all other components with a low MTBF, for example, power supplies.
Figure: High-availability applications must allow active components to be exchanged easily whenever required
Several fans have been integrated for the high speed data processing application. If one of them fails, the remaining fans continue to cool the system reliably, preventing it from overheating or spontaneously cutting out. On the electronics side, the shelf manager compensates for a failed fan by increasing the speed of the others. Mechanically, cooling is supported by directing the air flow from the lower front to the upper rear, while a valve system ensures that the air moved by the remaining fans doesn’t escape to the front but stays within the system and maintains the residual pressure needed to continue cooling the entire system effectively.
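The compensation performed by the shelf manager can be pictured with a small Python sketch. It is a simplified model and not the algorithm of any actual shelf manager: it assumes identical fans and airflow proportional to fan speed, and the function names are hypothetical.

```python
def compensated_speed(total_fans: int, failed_fans: int,
                      nominal_speed: float) -> float:
    """Speed (fraction of maximum, 0..1) each surviving fan needs to run at.

    Assumes identical fans and airflow proportional to speed, so the
    surviving fans must share the airflow of the failed ones.
    """
    working = total_fans - failed_fans
    if working <= 0:
        raise RuntimeError("no working fans left: cooling cannot be maintained")
    required = nominal_speed * total_fans / working
    return min(required, 1.0)  # a fan cannot exceed its maximum speed


def cooling_is_sufficient(total_fans: int, failed_fans: int,
                          nominal_speed: float) -> bool:
    """True if the surviving fans can deliver the original airflow."""
    working = total_fans - failed_fans
    return working > 0 and nominal_speed * total_fans <= working


if __name__ == "__main__":
    # Four fans at 60 % of maximum speed, one of which fails.
    speed = compensated_speed(total_fans=4, failed_fans=1, nominal_speed=0.6)
    print(f"surviving fans run at {speed:.0%} of maximum speed")
    print("airflow maintained:", cooling_is_sufficient(4, 1, 0.6))
```

The model also shows why the fans need headroom in normal operation: if they already run close to full speed, the loss of one fan cannot be fully compensated.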
Active components easy to replace
Even a task as straightforward as removing and inserting a card is becoming something of a challenge as standards evolve, increasingly complex applications demand greater bandwidth, data volumes grow and latencies shrink. The more pins a connector has, the greater the force needed to remove and insert the card. Handles with a lever/pulling function reduce this effort. Integrated microswitches alert the software that the service technician wants to replace an assembly.
Triggering a direct alarm via the microswitches and forwarding it to an external monitoring unit is implemented in line with the telecommunications specification. The system is designed for rapid access to the redundantly configured components so that they can be replaced quickly and easily while the system is in operation. This also includes unambiguous labelling to simplify maintenance - for example, colour coding the guide rails on the front to identify them and rule out any risk of incorrect operation.
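The interplay between handle microswitch, system software and external monitoring can be sketched as a small state machine. The Python example below is a deliberately simplified illustration and not the hot-swap state model defined in the AdvancedTCA specification. The class, state and method names are hypothetical, and a plain callable stands in for the external monitoring unit.

```python
from enum import Enum, auto
from typing import Callable


class SlotState(Enum):
    ACTIVE = auto()
    DEACTIVATION_REQUESTED = auto()
    SAFE_TO_REMOVE = auto()
    EMPTY = auto()


class Slot:
    def __init__(self, name: str, alarm_sink: Callable[[str], None]):
        self.name = name
        self.state = SlotState.ACTIVE
        self._alarm_sink = alarm_sink  # stands in for the monitoring unit

    def handle_opened(self) -> None:
        """Microswitch reports that the technician has opened the handle."""
        if self.state is SlotState.ACTIVE:
            self.state = SlotState.DEACTIVATION_REQUESTED
            self._alarm_sink(f"{self.name}: removal requested")

    def software_quiesced(self) -> None:
        """System software has handed the workload to the redundant unit."""
        if self.state is SlotState.DEACTIVATION_REQUESTED:
            self.state = SlotState.SAFE_TO_REMOVE
            self._alarm_sink(f"{self.name}: safe to remove")

    def card_removed(self) -> None:
        """Card has left the slot, with or without clearance."""
        if self.state is not SlotState.SAFE_TO_REMOVE:
            self._alarm_sink(f"{self.name}: card pulled without clearance")
        self.state = SlotState.EMPTY


if __name__ == "__main__":
    slot = Slot("slot 3", alarm_sink=print)
    slot.handle_opened()
    slot.software_quiesced()
    slot.card_removed()
```

The point of the sketch is the ordering: the switch only announces the technician's intent, and the card counts as safe to pull once the software has confirmed that the redundant unit has taken over.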
High performance serial data processing and high frequency connections place tougher demands on clean signal routing and PCB interfaces. Signal integrity, and therefore unimpeded transmission, is extremely important but very difficult to achieve, especially with the high volumes of data that have become the norm today. Transmission rates in the multi-Gbit/s range can lead to high frequency effects that were absent in earlier parallel bus systems.
Even the most minor disrupting factors can have a massive negative effect on signal quality. In a complex system where every channel supports serial transmission, this places major demands on PCB structures, cabling, electromagnetic compatibility and plug-in connections. In particular, incorporating an AdvancedTCA backplane, with its extremely high speed point-to-point connections between the boards, means that the plug connections and backplane have to work together as seamlessly as possible, with reliable contact guaranteed for at least 15 years. This implies even greater demands in terms of design, simulation and testing to detect interference pulses, impedance mismatches or EMC effects. A risk analysis dealing with these issues should be performed very early in the design stage to identify and prevent interfering influences, both to substantially reduce subsequent testing costs and effort and to minimise risk in complex applications.
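One of the effects such an analysis looks for is an impedance discontinuity, for instance at the transition from backplane trace to connector. The Python sketch below applies the standard reflection coefficient formula to show how much of a signal is reflected at a mismatch. The 100 ohm nominal line impedance and the 90 ohm connector value are assumptions chosen for illustration, not figures from the article.

```python
import math


def reflection_coefficient(z_load: float, z_line: float) -> float:
    """Fraction of the incident voltage wave reflected at a discontinuity."""
    return (z_load - z_line) / (z_load + z_line)


def return_loss_db(gamma: float) -> float:
    """Return loss in dB; larger values mean less reflected energy."""
    if gamma == 0:
        return math.inf  # perfect match, nothing is reflected
    return -20.0 * math.log10(abs(gamma))


if __name__ == "__main__":
    z_line = 100.0       # assumed nominal differential impedance in ohms
    z_connector = 90.0   # assumed impedance at the connector transition
    gamma = reflection_coefficient(z_connector, z_line)
    print(f"reflection coefficient: {gamma:+.4f}")
    print(f"return loss: {return_loss_db(gamma):.1f} dB")
```

A single mismatch of this size reflects about five per cent of the signal amplitude. On a real channel several such transitions add up, which is why simulation and measurement of the complete path are needed.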
High transmission speed, however, is vital not only for the backplane but also for the various assemblies. Because these applications require extremely high real-time performance across the entire embedded system (also known more recently as a ‘cyber-physical system’, or CPS), signal integrity can also be verified at the assembly level to suit specific customer requirements. All assemblies, their functionalities and the way they work together with the backplane can be reviewed. This means that the backplane can be checked for signal integrity under differing conditions to ensure that unimpeded operation with higher processor performance can be guaranteed, even following a potential system update involving additional cards.
Moving from the world of communication to other areas of application such as energy transmission, the approaches remain the same in terms of mechanics, because here too the essential elements are high control system availability and high data speeds. However, the interfaces in energy applications often differ and can be highly application-specific. In the present instance, the benefits of the ATCA system architecture are adapted for this market: the basic mechanical structure of ATCA is supplemented by a 6U backplane built to individual customer specifications. Sturdy, standard CompactPCI connectors are used, and the platform is expanded to 21” to provide the number of slots the application requires. All of this is based on existing standard HEITEC chassis technology, so that a total system can be implemented with the shortest possible development time while keeping the cost at a reasonable level.
Using established industry standards, in this case AdvancedTCA, offers customers a technologically advanced modular solution for implementing a high availability control unit at the most reasonable price possible. The shortest possible time-to-market must be considered, as well as the scalability specifically required for high speed data processing and other demanding industrial applications, without compromising performance.
In some fields of application, however, the demands are highly specific and can’t be covered by the existing standard alone. This calls for a company that can make rapid adjustments in close collaboration with the customer at a reasonable cost, even in relatively small projects, by offering an extensive, modular standard chassis portfolio and experience in a variety of market segments and relevant target applications.
With its decades of experience in chassis technology, development, and manufacture for all market segments, HEITEC can work in synergy with its customers to provide the required components based on best practices and state-of-the-art technologies, standard mechanical components, and adaptations to suit specific customer requirements, from the initial concept through to full implementation.