October 17, 2016 Stacey Higginbotham
Microsoft’s embrace of programmable chips known as FPGAs is well documented. But in a paper released Monday, the software and cloud company provided a look at how these once-specialty pieces of silicon have fundamentally changed the economics of delivering hardware as a service.
Field programmable gate arrays, or FPGAs, are chips whose logic and internal routing can be reconfigured after they’ve been manufactured. They are typically larger than chips with the same function baked in, and were traditionally reserved for low-volume jobs where the performance advantage outweighed the higher engineering cost of designing for them.
But thanks to the massive scale that Microsoft and other cloud giants require in their computing operations, that tradeoff is changing. “When you go to scale like we have, with tens of thousands of servers running a service that is generating a lot of revenue, you can afford to run a really crack team of 10 hardware designers to be perpetually keeping their image up with the software,” says Doug Burger, the head of Microsoft’s Project Catapult.
Back in 2014, Microsoft created Project Catapult, an effort to rethink the 18-month to two-year delivery cycle for hardware and better match its underlying silicon to the workloads running on it. Burger says the thinking was that if Microsoft could do this well, it could create an economic and performance advantage, because it could change its underlying hardware at speeds closer to software speeds.
This thinking, by the way, is not new. When Facebook embarked on its Open Compute Project and built its own servers, it had a similar goal: to innovate on hardware at a speed closer to software. But in redesigning its very chips for a specific workload, Microsoft is taking this idea a step further.
Burger looks at this as a different type of computing. A general-purpose CPU, which processes each instruction in an in-order sequence, is doing what he calls temporal computing. FPGAs, he says, do spatial computing: the instructions are laid out in hardware on the chip, and the data is funneled through the right path.
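To make Burger’s analogy concrete, here is a rough software sketch of the difference (illustrative only; every name below is made up, and real FPGA designs are written in a hardware description language, not Python):

```python
# Illustrative sketch of Burger's temporal-vs-spatial analogy. All names are
# hypothetical; real FPGA logic is written in a hardware description language.

def temporal(data):
    # Temporal computing: a CPU runs one instruction at a time, in order,
    # so each value walks through every step sequentially.
    results = []
    for x in data:
        x = x * 2      # step 1
        x = x + 3      # step 2
        x = x & 0xFF   # step 3
        results.append(x)
    return results

def spatial(data):
    # Spatial computing: on an FPGA the three steps exist as physical stages
    # that all operate at once, and data streams through them. Chained
    # generators are a loose software stand-in for that pipeline.
    stage1 = (x * 2 for x in data)
    stage2 = (x + 3 for x in stage1)
    stage3 = (x & 0xFF for x in stage2)
    return list(stage3)

assert temporal(range(10)) == spatial(range(10))  # same answer, different shape
```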
An example of this philosophy in action is Bing, the Microsoft search engine. As Microsoft tweaks its search algorithms, it can implement the changes in silicon to make searches faster or cheaper. According to one source, the Catapult hardware adds less than 30% to the cost of all of the other server gear, consumes less than 10% of the power, and processes data twice as fast.
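Taken together, those ratios are what “changing the economics” looks like in practice. A quick back-of-the-envelope calculation (the 1.0 baselines are hypothetical; only the cited ratios come from the story):

```python
# Back-of-the-envelope math on the cited figures. Baselines are normalized to
# 1.0 (hypothetical); the 30%, 10%, and 2x ratios are the ones cited above.
base_cost = base_power = base_throughput = 1.0

catapult_cost = base_cost * 1.30           # adds less than 30% to server cost
catapult_power = base_power * 1.10         # draws less than 10% more power
catapult_throughput = base_throughput * 2  # processes data twice as fast

print(f"throughput per dollar: {catapult_throughput / catapult_cost:.2f}x")   # ~1.54x
print(f"throughput per watt:   {catapult_throughput / catapult_power:.2f}x")  # ~1.82x
```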
Bing was actually one of the first test cases for Project Catapult back in 2014. But after Burger built a server that could work for Bing, Microsoft decided it didn’t make economic sense to have an entire FPGA effort serving just one aspect of its business. So Burger started over and built an architecture that could support all of Microsoft’s scaled-out businesses, from Bing to Azure and, one day, even machine learning.
It wasn’t enough to just speed up the processing of search on a single node; Burger and his team wanted to use FPGAs to speed up functions like networking as well, so they came up with a different architecture.
Instead of having the FPGAs in a cluster of servers talk to a top-of-rack switch, the FPGAs sit between the servers’ NICs and the Ethernet network switches. Thus all of the FPGAs are linked together, and network traffic is sent through the FPGA network. Each FPGA can still be used as a local compute node because it also has an independent PCIe connection to its server’s CPU.
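A minimal sketch of that bump-in-the-wire placement, assuming nothing beyond what’s described above (the class and method names are hypothetical, not Microsoft’s):

```python
# Hypothetical model of the placement described above: each FPGA has two
# independent paths, one inline on the NIC-to-switch wire and one over PCIe
# to the local CPU. Names are illustrative only.

class CatapultFPGA:
    def on_network_packet(self, packet: bytes) -> bytes:
        # Inline path: every packet between the server's NIC and the
        # top-of-rack switch passes through the FPGA, which can transform,
        # encrypt, or simply forward it.
        return packet  # pass-through shown; a real image adds processing here

    def on_pcie_request(self, task: bytes) -> bytes:
        # Local-compute path: the host CPU offloads work over PCIe,
        # independent of whatever the network path is doing.
        return task[::-1]  # stand-in for an accelerated computation

fpga = CatapultFPGA()
fpga.on_network_packet(b"\x00\x01")   # traffic flowing toward the switch
fpga.on_pcie_request(b"score this")   # CPU-requested acceleration
```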
The CPU can send tasks to the FPGA when needed, but the FPGAs can also work together to accelerate networking. In this case, the FPGAs act as a network processor, which allows Azure to offer incredibly low latency in its cloud business. From the paper released Monday:
By enabling the FPGAs to generate and consume their own networking packets independent of the hosts, each and every FPGA in the datacenter can reach every other one (at a scale of hundreds of thousands) in a small number of microseconds, without any intervening software. This capability allows hosts to use remote FPGAs for acceleration with low latency, improving the economics of the accelerator deployment, as hosts running services that do not use their local FPGAs can donate them to a global pool and extract value which would otherwise be stranded. Moreover, this design choice essentially turns the distributed FPGA resources into an independent computer in the datacenter, at the same scale as the servers, that physically shares the network wires with software.
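One way to picture that global pool: hosts with idle FPGAs donate them, and any other host can borrow one over the network. Here’s a hedged sketch of the idea (the pool interface below is invented for illustration; the paper doesn’t describe its actual API):

```python
# Invented-for-illustration sketch of the "global pool" concept from the
# paper: idle local FPGAs get donated, and remote hosts borrow them directly
# over the network, with no software on the donating host in the data path.

class FPGAPool:
    def __init__(self):
        self.idle = set()

    def donate(self, fpga_addr: str) -> None:
        # A host whose service doesn't use its local FPGA offers it up,
        # recovering value that would otherwise be stranded.
        self.idle.add(fpga_addr)

    def borrow(self) -> str:
        # Any host claims an idle FPGA anywhere in the datacenter; since
        # FPGA-to-FPGA hops take microseconds, locality barely matters.
        return self.idle.pop()

pool = FPGAPool()
pool.donate("fpga-rack12-node07")   # hypothetical address format
accelerator = pool.borrow()         # a remote FPGA, reached directly
```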
This architecture allows some pretty powerful things to happen. Azure can add or support new networking protocols. Elements such as new encryption technology can be applied universally. And while Burger was cagey when asked how quickly the FPGAs can be reprogrammed, the sense was that it would take weeks, not months.
The biggest challenges for Microsoft as it embarks on this new strategy are poor designs and trying to apply FPGAs to workloads that aren’t big enough to reap the reward. For example, Burger says, “machine learning is not a big enough workload to go to scale yet.”
At the end of the day, Burger’s team has to ensure that every request for a design pays for the hardware cost of designing the new image. Before massive workloads that benefited from more specialized and efficient computing came along, FPGAs were an expensive luxury for defense and other rarefied applications.
Microsoft’s real insight is that FPGAs can now be economical for cloud giants, and that it has figured out how to use them in a distributed fashion.
Stacey Higginbotham has spent the last fifteen years covering technology and finance for a diverse range of publications, including Fortune, Gigaom, BusinessWeek, The Deal, and The Bond Buyer. She is currently the host of the weekly Internet of Things Podcast and writes the Stacey Knows Things newsletter, all about the internet of things. In addition to covering momentum in the Internet of Things space, Stacey also focuses on semiconductors and artificial intelligence.