Algorithmic trading made easy
Since the millennium, algorithmic trading has been gaining traction and is now widely used by investment banks and by pension, mutual and hedge funds that may need to spread out the execution of a larger order or perform trades too fast for human traders to react to. A study in 2019 showed that around 92% of trading in the foreign exchange market was performed by trading algorithms rather than humans. However, the systems deployed today are by no means perfect, and the industry is split between those using hardware and those using software. This has created a huge gap in terms of capability and performance. Alastair Richardson, Global Business Development - Financial Technology, Xilinx, explains.
Richardson described those deploying accelerated hardware based on FPGA technology, and those running software on CPUs, as the ‘haves and the have-nots’ of algorithmic trading. However, despite the advantages of hardware acceleration, multiple barriers to entry have until now limited its take-up, namely the need to hire hardware developers, high costs, long lead times, and the often high risks associated with projects that involve moving solutions to FPGAs.
There is a huge drive within the trading industry towards achieving lower latency, as this can significantly impact a trader’s total transaction cost analysis. “If you look at the overall lifecycle of a trade, if it’s not fast enough compared to the competition, this can negatively impact the value of the trade,” said Richardson.
“Therefore, the lower latency you can achieve, the more likely you are to find alpha (the excess return of an investment relative to a benchmark), and the more opportunities there are to trade. It also minimises your losses to high-frequency trading firms, who can detect your algorithm while you’re still in execution, which effectively enables slippage or loss of potential revenue as the market moves, before you can finish executing your overall strategy.” The need for lower latency is continuously evolving and getting ever more competitive.
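The link between latency and slippage described above can be made concrete with a small worked example. The following is a minimal sketch, not part of the Xilinx platform: it assumes a parent buy order split into equal child orders, a price that drifts against the buyer by a fixed amount per microsecond, and a fixed latency per child order. The function names, the linear drift model and all the figures are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Fill:
    quantity: int
    price: float


def simulate_sliced_buy(total_qty, n_slices, arrival_price, drift_per_us, latency_us):
    """Split a parent buy order into near-equal child orders and price each fill.

    Hypothetical model: the market moves against the buyer by drift_per_us
    for every microsecond that passes, and each child order needs latency_us
    to reach the market, so later slices fill at worse prices.
    """
    base, remainder = divmod(total_qty, n_slices)
    fills = []
    for i in range(n_slices):
        # Spread any remainder over the first few slices.
        qty = base + (1 if i < remainder else 0)
        elapsed_us = (i + 1) * latency_us
        fills.append(Fill(qty, arrival_price + drift_per_us * elapsed_us))
    return fills


def slippage_cost(fills, arrival_price):
    """Total extra cost paid relative to the price when the order arrived."""
    return sum(f.quantity * (f.price - arrival_price) for f in fills)


if __name__ == "__main__":
    # Illustrative numbers only: 10,000 shares in 10 slices, arrival price 100.
    for label, latency_us in (("50 us per child order", 50.0), ("1 us per child order", 1.0)):
        fills = simulate_sliced_buy(10_000, 10, 100.0, 0.0001, latency_us)
        print(f"{label}: slippage {slippage_cost(fills, 100.0):.2f}")
```

Under these assumed figures, the slower path pays 50 times the slippage of the faster one, because in this simple model the cost scales linearly with the time each child order spends in flight. Real markets do not drift linearly, so the sketch shows only the direction of the effect.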
However, a problem has arisen: CPUs (central processing units) have hit their limit and are no longer getting any faster. And because everything in electronic trading is network connected, and a trade has to get from the CPU or logic base to the network, a standard CPU-based infrastructure is limited by the PCI (peripheral component interconnect) bus. “With an FPGA, however, you’ve got the ability to directly connect into the network and gain that latency advantage that’s just impossible to achieve with a CPU or a software-based solution,” Richardson added.
Entry to high-frequency trading (HFT) has traditionally been expensive. However, it is now becoming available to a much wider market, giving software developers the ability to make the transition and take advantage of hardware.
Xilinx Accelerated Algorithmic Trading
To meet this need, Xilinx has introduced Accelerated Algorithmic Trading, a composable, open-source trading system that enables traders to implement sophisticated strategies with sub-microsecond latency.
Richardson added: “This is the first time it has been possible for a software developer to buy standard Xilinx hardware out of the box, and have a full solution stack with open-source Vitis libraries, with zero licence fees.
“Full algorithmic trading reference designs are available to start porting your software-based platform onto an FPGA. It is designed for a low latency framework, and to provide all the required IP cores that you need to start porting your application onto hardware.”
A key element of Xilinx Accelerated Algorithmic Trading, Richardson explained, is the ability to achieve sub-microsecond latency that is impossible to achieve with a CPU-based system, adding: “This is a real breakthrough for a software developer.”
In addition, the fact that it has been developed as an open-source platform means that users can use Xilinx’s libraries, change and manipulate them, and still retain their own IP. This is key for all trading firms that want to create a differentiator in their algorithm, in the way they interact with the number of exchanges worldwide, and in their respective asset classes. “This modularity and the ability to plug-and-play these various components is key to being successful in this space,” Richardson continued.
As an example, Xilinx Accelerated Algorithmic Trading has been rolled out at the CME, one of the largest exchanges in the world, and is based on fixed trading systems that are used on a number of exchanges around the globe. This technology and design can be ported very easily between the different exchanges.
It also allows IP cores to be mixed and matched, both with existing in-house IP cores and with those from third parties. Richardson added: “For example, you can take our tick-to-trade system with our existing TCP core, but you might want something specific for a particular market, to give you that little edge in the part that you need.
“That component can be swapped out with many of our partners going forward, who can mix and match these IP cores into their design to give them the required performance that they need. It’s also the first time there has been a standardised piece of hardware and a platform where you can test out multiple vendors without having to buy bespoke hardware for that specific application.
“This really gives the end trader and the end user a lot of confidence, not only that they’ve got a stable platform to build upon, but that it’s future-proofed. And that really helps accelerate their time to market, which can be as low as weeks, in many cases, not the years that people were used to.”
The pathway to lower latency
Today, the vast majority of Xilinx customers within the software space have been accelerated using Xilinx Solarflare technology, which has helped traders get an edge from CPU-based trading by transmitting data into the CPU with as low latency as possible.
At the other extreme, Xilinx has seen a lot of high-frequency traders go down custom hardware development routes, creating their own boards and integrating Xilinx UltraScale IP and chips into their own designs. While this requires a huge amount of resources and development, it does help to achieve the lowest possible latency.
With the Accelerated Algorithmic Trading platform, Xilinx is creating two new use cases that bridge the gap between these extremes and enable software developers to significantly reduce the latency of their systems.
Richardson continued: “That’s done in two ways. One is a hybrid CPU plus hardware, which allows you to accelerate the core and the critical components of your algorithm on Xilinx Alveo boards, but retain portions of your code on the CPU as well. It gives you that stepping stone to getting started in hardware.
“And then we can actually accelerate your entire application to give you the lowest possible latency and achieve sub-microsecond trading capabilities on hardware. This is the first time that this has been achievable in hardware from a standardised platform.”
Xilinx Accelerated Algorithmic Trading can be used in a number of different areas within the electronic and algorithmic trading landscape. Some of Xilinx’s key initial customers have been brokers who historically have only used FPGAs for very small use cases, and only for their lowest latency customers – due to the complexities and only being able to roll this out in a small number of markets.
“What we’ve announced allows these customers to take that to the next level. And instead of supporting five or ten exchanges, you can support all of the exchanges globally because you can leverage your internal software developers to make that dream become a reality,” added Richardson.
Going forward, the exchanges themselves can also standardise on Xilinx’s platform. Xilinx has seen several exchanges, like NASDAQ and others, that have developed FPGA-accelerated market data feed analysis but have never gone for wide-scale deployment.
“Partly this is because of the interaction,” said Richardson. “People are looking at moving things into the cloud, the state of scalability and the longevity of these platforms going forward. But based on the Xilinx Accelerated Algorithmic Trading platform, there is now a standardised platform to build upon which is going to work in the cloud, across the board, to enable you to have a reliable platform to build future exchanges on and really enable that next generation of low latency deterministic trading.
“You’ve got market data vendors who need to make sure that they are competitive, especially as the exchange volume starts to increase. And it’s in those small microseconds where the exchanges go wild, and that is where you make or lose money.”
The flash crash of 2010, a trillion-dollar stock market crash, is an indication of how quickly people can lose money. So, having FPGA or hardware-accelerated capabilities that allow you to react in an appropriate way is critical to making sure that you’ve got an edge and can minimise potential losses in those kinds of situations.
Richardson added: “It really creates a whole new capability that isn’t available in software. The Xilinx Accelerated Algorithmic Trading platform is available today on Alveo U50 and Alveo U250, which can be obtained through the major OEMs in our network and our vendors globally. And all you have to do is buy a board and go and download this directly from Xilinx.com. It’s open source and licence free. And it’s really the first time you’ve ever been able to take an open source library like this and start deploying and moving your application onto hardware. It’s an exciting time for us and an exciting time for the industry as a whole.”