With Moore’s law coming to an end, storage systems are turning to hardware accelerators such as FPGAs to offload compute-intensive tasks from the CPU. However, provisioning these accelerators comes with a hefty price tag.
Researchers at KTH Royal Institute of Technology and three other universities have found an alternative way to offload computation without such investments. As it turns out, commodity network interface cards (NICs) that support RDMA (remote direct memory access, a feature that lets a machine read and write a server's memory directly) are Turing complete. This means they are powerful enough to perform arbitrary computations rather than simply sending and receiving packets. In other words, these NICs can effectively act as small processors that take on computing tasks, reducing the burden on server CPUs. Because NICs use low-power chips, this can also cut energy consumption.
According to the paper, published at NSDI 2022 in the spring, these offloads require no hardware modifications to the NICs. To achieve this, the authors built a framework called RedN that composes RDMA operations (which perform memory reads and writes) into more sophisticated constructs, such as conditional statements and even loops.
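To make the idea of a conditional built from memory operations more concrete, here is a toy Python model. It is not RedN's code and does not use the real ibverbs API; the `WR` class, the `COPY` opcode, and the helper names are hypothetical. The sketch assumes one plausible encoding of an "if": work requests live in memory as ordinary data, so an earlier request can copy a value into a later request's operand field, and an atomic compare-and-swap (CAS, one of the RDMA atomic operations) on that field turns the later request from a no-op into a `WRITE` only when the value matches. Loops are not modeled here.

```python
from dataclasses import dataclass, field
from typing import Any


# Toy model, NOT the ibverbs API: WR stands in for an RDMA work request.
# A WR's (opcode, tag) pair is treated as one word in memory that other
# WRs in the same chain are allowed to read and overwrite.
@dataclass
class WR:
    opcode: str                     # "NOOP", "WRITE", "COPY" or "CAS"
    tag: Any = None                 # operand field a COPY can fill at run time
    args: dict = field(default_factory=dict)


def execute(chain: list, mem: dict) -> dict:
    """Process WRs strictly in order, like a NIC draining a single queue."""
    for wr in chain:
        if wr.opcode == "WRITE":
            mem[wr.args["dst"]] = wr.args["value"]
        elif wr.opcode == "COPY":
            # Hypothetical stand-in for an RDMA read/write aimed at a later WR.
            chain[wr.args["wr"]].tag = mem[wr.args["src"]]
        elif wr.opcode == "CAS":
            # Compare-and-swap on the later WR's (opcode, tag) word.
            target = chain[wr.args["wr"]]
            if (target.opcode, target.tag) == wr.args["compare"]:
                target.opcode, target.tag = wr.args["swap"]
        # "NOOP" (a request that was never flipped) does nothing.
    return mem


def if_equal_then_write(x_addr: str, y: Any, dst: str, value: Any) -> list:
    """Build a 3-WR chain computing: if mem[x_addr] == y: mem[dst] = value."""
    return [
        WR("COPY", args={"src": x_addr, "wr": 2}),      # load x into WR 2's tag
        WR("CAS", args={"wr": 2,
                        "compare": ("NOOP", y),       # matches only if x == y
                        "swap": ("WRITE", y)}),       # on a match, arm the WRITE
        WR("NOOP", args={"dst": dst, "value": value}),  # becomes a WRITE on a match
    ]
```

For example, running `execute(if_equal_then_write("x", 7, "out", 42), mem)` with `mem = {"x": 7, "out": 0}` sets `mem["out"]` to 42, while `mem = {"x": 5, "out": 0}` leaves it at 0. The point of the pattern is that the branch is resolved by the operations themselves, so in the setting the article describes no host CPU would need to inspect `x`.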
“The cool thing about this finding is that RDMA NICs are commodity, so they are much more accessible for offloads,” says Waleed Reda, the lead author of the paper and a researcher at KTH. “As such, the potential for impact is much higher since there are millions of these devices already deployed in today’s datacenters.”
Evolving the RDMA standard
“RedN should make it easier for researchers to experiment with NIC offloads and help accelerate innovation in this area,” says Waleed. “Moreover, depending on how people use RedN, I believe our framework can create enough traction to push for changes in the RDMA standard itself, to perhaps add more advanced RDMA operations that improve offload efficiency.”
The paper evaluates the benefits of RedN, showing that it can fully offload GET operations for Memcached, a popular key-value store. This frees server CPU cycles and improves latency by up to 2.6x in lightly loaded settings and up to 35x in heavily loaded settings.
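Offloading a Memcached GET amounts to a remote hash-table lookup. The sketch below models that lookup as a fixed sequence of steps built from the conditional statements described above; the bucket layout, `SLOTS_PER_BUCKET`, `NUM_BUCKETS`, the CRC32 hash, and the function names are hypothetical choices for illustration, not RedN's or Memcached's actual design. Assuming the chain of operations is set up before a request arrives, the model checks every slot in the bucket instead of stopping early, with one conditional step per slot.

```python
import zlib

# Hypothetical layout: an array of buckets, each with a fixed number of
# (key, value) slots. Not Memcached's real memory layout.
SLOTS_PER_BUCKET = 4
NUM_BUCKETS = 8


def bucket_index(key: bytes) -> int:
    """Map a key to a bucket (CRC32 is an arbitrary choice here)."""
    return zlib.crc32(key) % NUM_BUCKETS


def build_table(items):
    """Place each (key, value) pair in the first free slot of its bucket."""
    table = [[None] * SLOTS_PER_BUCKET for _ in range(NUM_BUCKETS)]
    for key, value in items:
        bucket = table[bucket_index(key)]
        bucket[bucket.index(None)] = (key, value)  # ValueError if bucket is full
    return table


def offloaded_get(table, key: bytes):
    """Mimic a fixed, pre-built step sequence for a GET:
    1. One READ fetches the whole bucket selected by the key's hash.
    2. One conditional step per slot fires only if the stored key matches.
    3. A firing step stands in for the WRITE that returns the value."""
    bucket = list(table[bucket_index(key)])   # step 1: read the bucket
    response = None
    for slot in range(SLOTS_PER_BUCKET):      # step 2: no early exit
        entry = bucket[slot]
        if entry is not None and entry[0] == key:
            response = entry[1]               # step 3: conditional "WRITE"
    return response
```

For example, after `t = build_table([(b"alpha", b"1")])`, `offloaded_get(t, b"alpha")` returns `b"1"`, and a key that was never inserted returns `None`.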
What is next?
“This work opens many opportunities for follow-up research. Our paper mainly focused on offloading common storage tasks such as accessing remote hash tables for Memcached. However, there are many other potential applications that can be targeted, including database transactions, distributed machine learning, and many others,” says Waleed.
“Beyond that, we are also looking into automating RDMA code generation to make it easier for developers to use RedN,” he adds. “Down the road, we might opt to create a compiler that converts a C-like language into executable RDMA code to further reduce development time.”
RedN has been released as open source to facilitate further research and experimentation with the framework. RedN was partly supported by the ERC project ULTRA.