Accelerating Deep Packet Inspection

24th June 2013
Nat Bowers

Innovative ‘next-generation’ applications now rely on deep packet inspection (DPI) as a fundamental capability for security, policy enforcement, QoS and new service provider network functions. By Kin-Yip Liu, Senior Director, Systems Applications Engineering, Infrastructure Processor Group, Cavium.

More than ten years ago, data networking applications communicated over the network entirely in terms of ports and addresses: applications used designated ports when sending and receiving data, users could be identified by their IP addresses, and a file transfer application and a web browser used different TCP ports. A firewall configured to prevent certain users from sending files out of an enterprise simply blocked the port used by the file transfer application whenever the traffic came from those users' IP addresses. In terms of the standard seven-layer OSI model, such a firewall only needed to work up to layer 4, the transport layer.
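
As a rough illustration, here is a minimal Python sketch of that kind of layer-4 decision; the blocked addresses and ports are invented for the example.

```python
# Hypothetical layer-4 firewall rule: block FTP (ports 20/21) from selected hosts.
BLOCKED_SRC_IPS = {"10.0.0.17", "10.0.0.42"}   # users not allowed to send files out
BLOCKED_DST_PORTS = {20, 21}                   # classic FTP control/data ports

def allow_packet(src_ip: str, dst_port: int) -> bool:
    """Port/address filtering only: nothing above layer 4 is inspected."""
    if src_ip in BLOCKED_SRC_IPS and dst_port in BLOCKED_DST_PORTS:
        return False
    return True

print(allow_packet("10.0.0.17", 21))   # False: file transfer blocked
print(allow_packet("10.0.0.17", 80))   # True: web browsing still allowed
```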

Modern applications are mostly web based and, at the transport layer, they all go through the same ports used for web traffic. The port number by itself therefore cannot identify the application and does not provide much intelligence. Moreover, many new proprietary applications, such as file-sharing and voice-over-IP (VoIP) applications, emerge every day. To work out which application, what content and which user lie behind an IP packet, deep packet inspection (DPI) processing is required. DPI processing involves reassembling end-to-end communication flows from the packetised data carried in IP packets and then analysing those flows above the transport layer.

Why DPI?



Applications

DPI applications tend to focus on one or more of the following areas: security, data leakage prevention (DLP), application and user recognition, and network performance and usage monitoring.

A basic DPI application detects which applications and users are active, what kinds of content they are transferring, and how much network bandwidth they consume. The network administrator can monitor which users are consuming how much of the network bandwidth, with which applications and which types of content, and can then apply the intelligence collected from DPI processing for security, quality-of-service (QoS) and policy enforcement purposes.
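
A minimal sketch of this kind of usage accounting is shown below; it assumes the user and application labels have already been produced by earlier DPI classification.

```python
from collections import defaultdict

# Bytes observed per (user, application) pair; labels come from DPI classification.
usage = defaultdict(int)

def account(user: str, app: str, packet_len: int) -> None:
    usage[(user, app)] += packet_len

account("alice", "video-streaming", 1400)
account("alice", "video-streaming", 1400)
account("bob", "voip", 200)

for (user, app), nbytes in sorted(usage.items()):
    print(f"{user:6s} {app:16s} {nbytes} bytes")
```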

Here are some examples in these respective areas.

Security: The IT department of an enterprise can monitor content leaving the enterprise network and ensure that sensitive data does not leak out; this is DLP. By monitoring network usage and activity, potential attacks such as DDoS (distributed denial of service) and even previously unknown malware (also referred to as zero-day malware attacks) can be detected, logged and responded to. An automatic response may be to identify and block the malicious packet flows, or to block certain users and/or types of activity. An IT department can also filter access to unauthorised websites according to the enterprise's policy, as well as block spam and potential malware from entering the enterprise network.

QoS and policy enforcement: Network administrators can enforce QoS. For example, VoIP packets and company webcasts can be given higher priority and allocated a guaranteed amount of network bandwidth, while bulk data such as email and file transfers may be deprioritised. Employees accessing personal web sites and video sites may be deprioritised or rate-limited, depending on enterprise policy. Beyond the enterprise network, the cellular wireless core network has been using DPI technologies to manage QoS and bandwidth usage policy. For example, wireless service providers can detect VoIP usage, provide QoS comparable to that of voice calls, and meter and charge for the VoIP traffic. Service providers can also meter data bandwidth usage at fine granularity and decide what to charge for: when cellular users access the service provider's own web site, or purchase from certain partner web sites, the corresponding data usage is not counted towards the users' data plans.
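
The sketch below shows how such policy decisions might be tabulated once DPI has labelled the traffic; the application labels, priorities and zero-rating flags are illustrative assumptions, not a real operator policy.

```python
# Hypothetical policy table: application label -> (priority, counts towards data plan).
POLICY = {
    "voip":            (0, True),    # highest priority, metered
    "company-webcast": (1, True),
    "email":           (5, True),
    "operator-portal": (5, False),   # zero-rated: not counted against the plan
    "personal-video":  (7, True),    # lowest priority
}

def policy_for(app: str):
    return POLICY.get(app, (5, True))   # default: best-effort, metered

priority, metered = policy_for("operator-portal")
print(priority, metered)   # 5 False
```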

DPI Processing Flow

Network applications involve end-to-end communication flows. For example, when a user accesses a web site, there is a TCP flow between the user's web browser and the remote web server serving the corresponding web page. End-to-end communication flows are packetised into IP packets, and the individual IP packets are routed through the network in between. DPI applications tend to sit at the boundary of a network, where the input data to be worked on is a stream of individual packets.
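
As a minimal illustration, the sketch below groups already-parsed packets into flows by their 5-tuple, which is the starting point for reassembly; the field names and addresses are assumptions for the example.

```python
from collections import defaultdict

# 5-tuple -> list of payloads, in arrival order, ready for reassembly.
flows = defaultdict(list)

def flow_key(pkt: dict) -> tuple:
    return (pkt["src_ip"], pkt["dst_ip"], pkt["src_port"], pkt["dst_port"], pkt["proto"])

packets = [
    {"src_ip": "192.0.2.1", "dst_ip": "198.51.100.7", "src_port": 49152,
     "dst_port": 443, "proto": "TCP", "payload": b"\x16\x03\x01..."},
    {"src_ip": "192.0.2.1", "dst_ip": "198.51.100.7", "src_port": 49152,
     "dst_port": 443, "proto": "TCP", "payload": b"...more of the same flow"},
]
for pkt in packets:
    flows[flow_key(pkt)].append(pkt["payload"])

print(len(flows), "flow(s) observed")   # 1 flow(s) observed
```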

DPI processing starts with reassembling the end-to-end communication flows by examining individual IP packets. The end-to-end communication flows tend to be at the transport layer (layer 4 of the OSI model). Sometimes, for example inside cellular wireless networks, packets travel through tunnels such as GTP (GPRS tunnelling protocol), and additional processing is required to get through the tunnels. This level of processing tends to be based on standard protocols such as TCP, IP, UDP and GTP.
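
The sketch below illustrates the tunnel step for GTP-U (the user-plane variant of GTP) in simplified form; it assumes the 8-byte mandatory GTP-U header only and ignores the optional sequence-number and extension fields a real implementation must handle.

```python
import struct

def gtpu_inner_packet(udp_payload: bytes) -> bytes:
    """Strip a minimal GTP-U header and return the encapsulated (inner) IP packet."""
    flags, msg_type, length, teid = struct.unpack("!BBHI", udp_payload[:8])
    if msg_type != 0xFF:                 # 0xFF = G-PDU, i.e. an encapsulated user packet
        raise ValueError("not a G-PDU")
    if flags & 0x07:                     # sequence / N-PDU / extension flags set
        raise NotImplementedError("optional GTP-U fields not handled in this sketch")
    return udp_payload[8:8 + length]     # the inner IP packet, ready for flow reassembly

# Demo with a hand-built minimal G-PDU carrying a 4-byte dummy inner payload.
demo = struct.pack("!BBHI", 0x30, 0xFF, 4, 0x12345678) + b"\xde\xad\xbe\xef"
print(gtpu_inner_packet(demo).hex())     # deadbeef
```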

Above the transport layer, DPI processing needs to deal with proprietary and new protocols. Data from the packet flow is extracted and matched against signatures made of strings and regular expressions. By analysing which signatures have matched, the DPI application gets clues as to which application, content and user may be behind the packet flow. The DPI application also maintains statistics such as the bit rate of individual flows. Such statistics provide heuristics that help to identify the application; a video streaming application, for example, consumes characteristic bit rates. In addition, the ports and header fields of known protocol layers can give further clues.
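
A minimal sketch of this matching-plus-heuristics step is shown below; the signature set and the bit-rate threshold are illustrative assumptions rather than a real classifier.

```python
import re

# Illustrative signature set matched against reassembled flow data.
SIGNATURES = {
    "http":       re.compile(rb"^(GET|POST|HEAD) [^ ]+ HTTP/1\.[01]"),
    "tls-client": re.compile(rb"^\x16\x03[\x01-\x04]"),          # TLS handshake record
    "sip":        re.compile(rb"^(INVITE|REGISTER) sip:"),
}

def classify(flow_data: bytes, bits_per_second: float) -> str:
    for app, sig in SIGNATURES.items():
        if sig.search(flow_data):
            return app
    # Heuristic fallback: a sustained multi-megabit flow looks like video streaming.
    if bits_per_second > 2_000_000:
        return "probable-video-stream"
    return "unknown"

print(classify(b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n", 50_000))  # http
```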

Once the application, content and user information is identified, the DPI application can utilise the collected intelligence to work on security, QoS, and policy enforcement types of processing.

Figure: Typical DPI application (applications recognition and analysis)



Accelerating DPI

DPI processing is complicated, especially in the context of very high network throughput. For example, each CPU in the current generation of deployed cellular core network nodes can process up to 40Gbit/s of network traffic today, and next-generation CPUs, which will be sampling soon, can process up to 100Gbit/s. To perform DPI processing together with other functions, such as standard wireless core network functions, at 100Gbit/s line rate while minimising power consumption, hardware acceleration is required; a software-only implementation can meet neither the performance nor the power requirements.

The decisions on what types of processing to accelerate are driven by balancing performance and power requirements against flexibility. In other words, the hardware accelerators must deliver a speed-up of many times without taking flexibility away from the developers implementing the software application.

The following examples of hardware accelerators are based on a next-generation CPU which delivers 100Gbit/s of DPI and application processing throughput and is optimised for high performance applications such as the wireless core network. Note that this CPU is really an SoC (system-on-chip) which integrates 64-bit CPU cores, a wide variety of application-relevant hardware accelerators, four DDR3 or DDR4 memory channels, and a rich set of I/O interfaces such as 40GE, 10GE, PCIe Gen3, Interlaken and Interlaken-lookaside ports.

Figure: Typical DPI processing flow (applications recognition and analysis)



Packet receive hardware: Parses the standard and proprietary layer 2 through layer 4 (or higher layer) headers of received packets to extract packet flow, QoS and other metadata for subsequent processing.
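
In software terms, this parsing step looks roughly like the sketch below, which assumes an untagged Ethernet frame carrying IPv4/TCP with no IP options; the real hardware also handles proprietary and higher-layer headers at line rate.

```python
import struct

def parse_frame(frame: bytes) -> dict:
    """Extract flow and QoS metadata from an Ethernet/IPv4/TCP frame (no VLAN, no IP options)."""
    eth_type = struct.unpack("!H", frame[12:14])[0]
    assert eth_type == 0x0800, "IPv4 only in this sketch"
    ip = frame[14:]
    ihl = (ip[0] & 0x0F) * 4                  # IPv4 header length in bytes
    dscp = ip[1] >> 2                         # QoS marking
    proto = ip[9]
    src_ip, dst_ip = ip[12:16], ip[16:20]
    src_port, dst_port = struct.unpack("!HH", ip[ihl:ihl + 4])
    return {"dscp": dscp, "proto": proto, "src_ip": src_ip, "dst_ip": dst_ip,
            "src_port": src_port, "dst_port": dst_port}

# Hand-built minimal frame: Ethernet header, 20-byte IPv4 header, TCP ports 49152 -> 443.
frame = (
    b"\xaa" * 6 + b"\xbb" * 6 + b"\x08\x00"
    + b"\x45\x00\x00\x28" + b"\x00\x00\x00\x00" + b"\x40\x06\x00\x00"
    + bytes([192, 0, 2, 1]) + bytes([198, 51, 100, 7])
    + b"\xc0\x00\x01\xbb"
)
print(parse_frame(frame))
```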

TCP acceleration hardware: Accelerating time-consuming TCP processing steps, such as checksum calculation and checking and TCP retransmission management, provides the performance, power efficiency and flexibility desired. Note that a black-box TCP offload is not desirable, because it would be too inflexible.
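
As an example of the per-packet work being offloaded, the sketch below computes the 16-bit ones'-complement Internet checksum that TCP uses; it is shown over a raw byte buffer, whereas the real calculation also covers the TCP pseudo-header.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones'-complement sum used by IP, TCP and UDP."""
    if len(data) % 2:
        data += b"\x00"                            # pad odd-length data
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)   # fold carries back in
    return ~total & 0xFFFF

print(hex(internet_checksum(b"\x45\x00\x00\x3c")))   # 0xbac3
```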

Multi-field classification look-up hardware: To classify packet flows at very fine granularity, multi-field classification hardware can speed up the classification process drastically. For example, a flow may be classified by source and destination IP address, source and destination port, protocol field, proprietary header fields, certain content in the packet data, and so on. There may be 8, 10 or more such tuples (i.e. fields), and each tuple may require a different kind of look-up: range match, wildcard match or exact match. Traditionally, an additional specialty hardware component, a TCAM, is used alongside the CPU; some next-generation CPUs integrate multi-field classification look-up hardware on-chip.
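
A software sketch of such a multi-field look-up is shown below; the rules, fields and actions are invented for the example, and the on-chip hardware performs the equivalent matching at far higher rates.

```python
# Each rule mixes exact, range and wildcard matches; first matching rule wins.
WILDCARD = None

RULES = [
    # (src_ip, dst_ip, proto, dst_port_range, action)
    ("10.0.0.17", WILDCARD, "TCP",    (20, 21),     "drop"),
    (WILDCARD,    WILDCARD, "UDP",    (5060, 5061), "voip-queue"),
    (WILDCARD,    WILDCARD, WILDCARD, (0, 65535),   "default"),
]

def classify_flow(src_ip, dst_ip, proto, dst_port) -> str:
    for r_src, r_dst, r_proto, (lo, hi), action in RULES:
        if ((r_src is WILDCARD or r_src == src_ip) and
            (r_dst is WILDCARD or r_dst == dst_ip) and
            (r_proto is WILDCARD or r_proto == proto) and
            lo <= dst_port <= hi):
            return action
    return "default"

print(classify_flow("10.0.0.17", "203.0.113.5", "TCP", 21))   # drop
```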

Pattern matching hardware engines: Once the software has reassembled the end-to-end communication flows, it offloads the string and regular expression based signature look-ups to these engines. It is important for the engines to work seamlessly across packet boundaries, so that they can match data in a flow which spans multiple packets. In addition, users must be able to write regular expressions using comprehensive, advanced syntax: PCRE (Perl compatible regular expressions) and POSIX (Portable Operating System Interface) syntax must be supported, as should advanced features such as back-references. With such support, users can develop sophisticated signatures that are easy to maintain and port.
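
The short sketch below, using an invented signature, shows why cross-packet matching matters: a pattern split across two packets is missed when each packet is scanned on its own, but found once the flow data is reassembled.

```python
import re

# Illustrative DLP-style signature for an outgoing file attachment.
signature = re.compile(rb"Content-Disposition: attachment")

packet1 = b"...HTTP/1.1 200 OK\r\nContent-Dispo"
packet2 = b"sition: attachment; filename=secret.xls\r\n"

print(bool(signature.search(packet1)), bool(signature.search(packet2)))  # False False
print(bool(signature.search(packet1 + packet2)))                         # True
```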

Compression/decompression engines: Where the data is compressed, hardware offload for decompression and compression provides significantly higher performance at lower power consumption than decompressing and compressing the data in software.
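
A minimal sketch of the work being offloaded, using deflate compression from Python's standard library:

```python
import zlib

# Inflate compressed payload so that signatures can be matched against the original data.
original = b"GET /report.xls HTTP/1.1\r\n" * 10
compressed = zlib.compress(original)

inflated = zlib.decompress(compressed)
assert inflated == original
print(len(compressed), "compressed bytes ->", len(inflated), "bytes available for inspection")
```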

Crypto engines: Where the data is encrypted and the DPI application sits at a point in the network where it holds the security keys, crypto engines provide significantly higher performance at lower power consumption than decrypting and encrypting the data in software.
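
A minimal sketch of that software path, assuming the DPI point legitimately holds the session key and using AES-GCM from the third-party cryptography package (pip install cryptography):

```python
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def decrypt_record(key: bytes, nonce: bytes, ciphertext: bytes, aad: bytes = b"") -> bytes:
    # Software decryption of one AEAD record; this is the work a crypto engine offloads.
    return AESGCM(key).decrypt(nonce, ciphertext, aad)

# Round-trip demo with a hypothetical session key and nonce.
key = AESGCM.generate_key(bit_length=128)
nonce = b"\x00" * 12
ciphertext = AESGCM(key).encrypt(nonce, b"GET /index.html HTTP/1.1\r\n", b"")
print(decrypt_record(key, nonce, ciphertext))
```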

Packet transmit hardware: Not only offloads packet transmission processing from the CPU, but also provides integrated traffic management, policing, shaping and scheduling functions. Based on the intelligence learnt from DPI processing, packets are transmitted according to more intelligent QoS and policy enforcement decisions.
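
As a simple illustration of the policing/shaping function, the sketch below implements a token bucket that limits one flow to a configured rate; the rate and burst values are arbitrary.

```python
import time

class TokenBucket:
    """Allow a flow to transmit at most `rate` bytes/s with a bounded burst."""
    def __init__(self, rate_bytes_per_s: float, burst_bytes: float):
        self.rate, self.capacity = rate_bytes_per_s, burst_bytes
        self.tokens, self.last = burst_bytes, time.monotonic()

    def allow(self, packet_len: int) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if packet_len <= self.tokens:
            self.tokens -= packet_len
            return True
        return False                      # rate exceeded: drop or queue the packet

bucket = TokenBucket(rate_bytes_per_s=125_000, burst_bytes=3_000)     # roughly 1Mbit/s
print(bucket.allow(1_500), bucket.allow(1_500), bucket.allow(1_500))  # True True False
```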

Figure: Example of a 100Gbit/s DPI SoC (OCTEON III CN78XX)



DPI is a fundamental capability required by modern networking applications. DPI processing is complex and benefits enormously from hardware acceleration. Many steps of the overall DPI processing flow can be accelerated effectively; pattern or regular expression matching is only one of them. All of the hardware accelerators relevant to DPI processing described above are integrated into the OCTEON III CN78XX multicore SoC processor, a next-generation 100Gbit/s DPI-capable CPU.

Author profile: Kin-Yip Liu is Senior Director, Systems Applications Engineering, Infrastructure Processor Group, Cavium Inc. Liu has worked in the field of microprocessors, network processors and multicore processors for 23 years at Intel and Cavium. He designed and architected x86, Itanium and IXP processors at Intel and now leads Cavium's multicore SoC engineering team.
