Scaling up Servers

4th June 2015

Phil Ling

With massive projected growth attributed to the Internet of Things, how are semiconductor vendors keeping up with demands for capacity and performance from internet and data centre infrastructure? Sally Ward-Foxton finds out.

With more and more connected devices becoming part of the Internet of Things (IoT), the amount of data being communicated between these devices is increasing at an extremely rapid rate. Coupled with this, the evolution of IT systems, which now place all or most data in remote servers and data centres (‘in the cloud’), means that new server technologies must develop quickly to keep up with demands for increased capacity and bandwidth. These requirements are passed on to semiconductor manufacturers, but how are they responding to demands for space saving, increased performance and better energy efficiency from the server market?

Curtis Pulley, Data Centre Business Development Manager at Xilinx, explains that heterogeneous computing, or using more than one type of processor in a multi-core system, is a rapidly accelerating trend in the data centre industry. The idea is that dissimilar processors have different strengths and therefore handle different types of task better, so performance can be improved if the approach is implemented correctly. This is leading to increased use of FPGAs in the server world; FPGAs can be programmed to perform specific algorithms in hardware, making them effective hardware accelerators for number-crunching tasks.
“FPGAs’ parallel architecture is inherently optimised to accelerate specific workloads in data centres to achieve superior performance per Watt metrics for a given compute workload,” commented Pulley, adding: “Xilinx is investing in this area on multiple fronts to deliver to the exponentially growing compute needs in the data centre while delivering the optimally powered solution. The internet of things is seen as one of the drivers pushing these compute demands in the data centre.”
Pulley also points to the leading-edge process technology used by semiconductor manufacturers to deliver more and more impressive power and compute features. For example, Xilinx’s 7 Series of FPGAs uses the 28nm process node, but forthcoming UltraScale products are currently ramping in 20nm, and will be in 16nm by the end of this year, he says. In general, continuing to move to smaller process nodes as they become available means smaller and more cost-effective FPGAs, though this is sometimes at the expense of leakage current and therefore power consumption. This trend is set to continue as new transistor architectures are invented to minimise leakage.
Getting the best possible performance and power efficiency out of FPGAs also depends on how you use them. Xilinx has been working on tools which help maximise the devices’ utilisation and productivity.
“Critical to the IoT space, we announced the SDNet Design Environment last year,” said Pulley. “SDNet enables the easy creation of high performance packet processing systems, based on compiling high-level user defined specifications to optimise all-programmable FPGAs and SoCs. This capability enables the creation of solutions to manage the new network traffic flows being driven by the emerging IoT space.”

Performance bottlenecks
Historically in the server market, there has been a tendency to neglect the I/O and networking capability necessary for successful utilisation of the compute power provided by all the processors. This causes problems because the compute power of today’s high-end CPUs and FPGAs is wasted if those devices cannot communicate with the rest of the system at top speed.
Mario Maccariello, Senior Marketing and Business Development Manager for the Computer and Storage Business Unit at Altera, says FPGA makers are aware of this bottleneck and are continuing to do their part to improve I/O performance: “Often the available compute FLOPs are not efficiently utilised as they are starved of data due to the memory bandwidth and I/O bottleneck,” he said, explaining that the specific challenge of IoT applications is that they create a massive amount of data, most of which is not useful.
“Altera FPGAs are used to filter this data so that only ‘actionable’ data is stored and processed in the data centre,” he explained. “In addition to filtering data in real time, FPGAs are used to compress and, where appropriate, encrypt data at line speed. This was traditionally done by the CPU, but an FPGA offloads these functions from the processor, significantly improving data centre efficiency and reducing the requirement for storage and network bandwidth.”
“Altera’s FPGAs can significantly reduce the bottleneck through filtering and compression, and also by providing high bandwidth, high efficiency 2.5D data paths between the memory and the on-FPGA CPU offload engine,” he added.
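The filter-then-compress idea Maccariello describes can be illustrated with a minimal software sketch. This is only an analogy for what an FPGA would do in hardware at line speed; the threshold rule, the function names and the sample sensor readings are all assumptions made up for illustration, and compression here uses Python's standard `zlib` module rather than anything Altera-specific.

```python
import json
import zlib


def is_actionable(reading, threshold=75.0):
    """Hypothetical rule: a reading is 'actionable' if it meets the threshold."""
    return reading["value"] >= threshold


def filter_and_compress(readings, threshold=75.0):
    """Drop non-actionable readings, then compress what is left for storage."""
    kept = [r for r in readings if is_actionable(r, threshold)]
    payload = json.dumps(kept).encode("utf-8")
    return kept, zlib.compress(payload)


# Made-up sample data: 100 readings with values 0.0 to 99.0.
readings = [{"sensor": i % 4, "value": float(i)} for i in range(100)]
raw_size = len(json.dumps(readings).encode("utf-8"))

kept, blob = filter_and_compress(readings)
restored = json.loads(zlib.decompress(blob).decode("utf-8"))

print(f"readings kept: {len(kept)} of {len(readings)}")
print(f"bytes before: {raw_size}, bytes stored: {len(blob)}")
```

In this toy example only a quarter of the readings survive the filter, and the stored blob is far smaller than the raw stream, which is the effect on storage and network bandwidth that the quote refers to.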
Maccariello explained that FPGAs’ popularity in the server market has been helped by the development of the OpenCL framework, which allows users with only software development experience to program FPGA hardware. Altera offers an OpenCL SDK which includes an emulator that steps through the code on an x86 to make sure it’s correct, a profiler that reports kernel performance to help ensure coalesced memory accesses and stall-free pipelines, and an OpenCL compiler which produces the entire FPGA image in one step. Commercial off-the-shelf server-qualified FPGA hardware is also available from third-party Altera partners for those not willing or able to learn OpenCL.
Of course, it helps that Altera’s FPGA technology has been adopted by several high-profile users, such as Microsoft and IBM, showing FPGAs to be an effective hardware accelerator for data centre applications, enabling more throughput and lower latencies.

Memory and storage
As well as computing power, data storage demands are expanding rapidly and it’s often difficult for data centres to predict how much storage will be required. Memory and storage technology for servers is rapidly evolving to keep up and allow systems to achieve the necessary flexibility and scalability.
“Cloud-computing storage technology has significantly improved enterprise operations, with adoption rates soaring as the technology matures,” commented Paul Rowan, General Manager for Storage Products at Toshiba Electronics Europe, explaining that hard disk drives (HDDs) are still widely used in enterprise server applications between a terabyte and a petabyte, due to their low cost per gigabyte and higher capacities per drive. This situation is slowly changing with the increasing affordability of enterprise SSDs, which is driven by a reduction in the cost per gigabyte.
“This reduction in cost per gigabyte (for SSDs) is being supported by the adoption of new cell technologies such as the shift from single level to multi-level cell NAND flash memory, the shift to smaller processing nodes that enable higher bit densities and therefore greater capacities per unit size of chip, and increased economies of scale as demand grows,” Rowan said.
Recently, Toshiba announced development of the world’s first 48-layer three-dimensional stacked cell structure flash memory, called BiCS, a 2-bit-per-cell 128-gigabit (16-gigabyte) device. The BiCS devices are based on a 48-layer stacking process, which improves write/erase endurance and boosts write speed.
“The decision as to how much data to store on different types of drives is determined by analysing the performance requirements of the data and the cost per gigabyte,” Rowan said. “As the cost per gigabyte of NAND flash memory falls, the trend will increasingly favour flash-based memory options.” Rowan also pointed out that the current 6Gbit/s SATA interfaces used by consumer HDDs and SSDs are a bit of a bottleneck for enterprise SSDs; he expects that increasing numbers of systems will migrate to faster interfaces that do not throttle performance.
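The trade-off Rowan describes, performance requirement against cost per gigabyte, can be sketched as a simple selection rule: pick the cheapest tier that still meets the workload's needs. This is a hypothetical illustration only; the tier names, prices and IOPS figures below are invented placeholders, not vendor or market data, and real placement decisions weigh many more factors.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Tier:
    name: str
    cost_per_gb: float  # placeholder figure, not a real price
    max_iops: int       # placeholder figure, not a real benchmark


# Invented example tiers, ordered from cheapest to fastest.
TIERS = [
    Tier("hdd", 0.05, 200),
    Tier("sata_ssd", 0.50, 50_000),
    Tier("pcie_ssd", 1.00, 500_000),
]


def choose_tier(required_iops, tiers=TIERS):
    """Return the cheapest tier whose performance meets the requirement."""
    candidates = [t for t in tiers if t.max_iops >= required_iops]
    if not candidates:
        raise ValueError("no tier satisfies the performance requirement")
    return min(candidates, key=lambda t: t.cost_per_gb)


print(choose_tier(100).name)      # low-demand data lands on the cheapest tier
print(choose_tier(10_000).name)   # demanding data is pushed to flash

# If flash cost per gigabyte falls far enough, it wins even for cold data.
cheaper_flash = [
    replace(t, cost_per_gb=0.04) if t.name == "sata_ssd" else t for t in TIERS
]
print(choose_tier(100, cheaper_flash).name)
```

The last call shows the trend in the quote: once the placeholder flash price drops below the disk price, the same rule selects flash for data that previously sat on hard disk.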
“In storage systems, 12Gbit/s SAS interfaces can enable up to four times the internal performance capabilities of SATA interfaces,” he claimed. “It is worth noting that NAND flash-based storage can even outperform the SAS 12Gbit/s interface, and this will lead to the increasing adoption of PCI Express-based enterprise SSDs.”
PCIe 3.0 interfaces support raw transfer rates of 8GT/s (roughly 8Gbit/s) per lane, and the total data rate scales with the number of lanes used. Though PCIe has been used in all-flash arrays for some time, there are some drawbacks: such devices lack hot-swap capability, and they can be rather complex to set up, as special proprietary drivers are needed to allow flash to use the PCIe bus. A new standard called NVMe (Non-Volatile Memory Express), which runs over PCIe while allowing drives in SAS/SATA-style form factors, addresses these issues, but widespread adoption will take a few more years, Rowan believes.
“The missing piece of the ecosystem of controller and expander modules will limit the usage of PCIe-based enterprise SSDs to a few devices directly integrated into the compute server,” he commented. “Wide usage of PCIe based SSDs in larger storage arrays will require an ecosystem of controller and expander modules similar to that established for SATA/SAS. This will take a few more years to establish.”
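As a rough back-of-envelope comparison of the interface line rates mentioned in this section, the sketch below converts each one into approximate usable payload bandwidth. It assumes 8b/10b line encoding for SATA and SAS, and PCIe 3.0 with 128b/130b encoding for the 8Gbit/s-per-lane figure (the PCIe generation and the four-lane configuration are assumptions, not stated by Rowan). Protocol overhead above the line encoding is ignored, so real-world throughput will be lower.

```python
# name: (line rate in Gbit/s per lane, encoding efficiency, number of lanes)
INTERFACES = {
    "SATA 6Gbit/s": (6.0, 8 / 10, 1),
    "SAS 12Gbit/s": (12.0, 8 / 10, 1),
    "PCIe 3.0 x1": (8.0, 128 / 130, 1),   # assumed PCIe generation
    "PCIe 3.0 x4": (8.0, 128 / 130, 4),   # assumed lane count
}


def payload_mb_per_s(line_rate_gbit, efficiency, lanes=1):
    """Approximate payload bandwidth in MB/s after line-encoding overhead."""
    bits_per_s = line_rate_gbit * 1e9 * efficiency * lanes
    return bits_per_s / 8 / 1e6


for name, (rate, efficiency, lanes) in INTERFACES.items():
    print(f"{name}: {payload_mb_per_s(rate, efficiency, lanes):.0f} MB/s")
```

Under these assumptions the figures come out at about 600, 1,200, 985 and 3,938 MB/s respectively. On line rate alone, 12Gbit/s SAS is twice as fast as 6Gbit/s SATA, so the "up to four times" figure quoted above presumably reflects other factors that are not detailed here.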