Microarchitectural Vulnerabilities in Heterogeneous Computing and Cloud Systems

Weissman, Zane

Etd


Public Deposited

This dissertation brings together some of the defining trends of early 21st-century computing (cloud computing, heterogeneous computing, and the increasingly complex microarchitectures that support them), analyzes the microarchitectural threat landscapes of these systems, and presents a number of new vulnerabilities that they introduce. Microarchitectural attacks made headlines in 2018 with the disclosures of Spectre and Meltdown, two CPU vulnerabilities affecting devices as diverse as smartphones, laptops, and enterprise servers, and the wave of research interest that followed has uncovered numerous variants as well as new, unrelated microarchitectural vulnerabilities. These vulnerabilities stem from bugs (and features) of the increasingly complex microarchitectures of modern CPUs, which squeeze as much performance as possible out of every transistor. While it remains to be seen if Moore's law is truly dead, diminishing returns in CPU performance gains and increased interest in specialized computing tasks as diverse as 3D graphics rendering, cryptography, and machine learning have driven the development of GPUs, FPGAs, and other devices that work heterogeneously alongside traditional CPUs. CPUs, in turn, have gained new features to tightly integrate these peripherals with their large shared caches and main memory for heterogeneous parallelism. Cloud service providers (CSPs) have lowered the cost of entry to heterogeneous computing by leasing compute time on GPUs and FPGAs, and even customers who don't specifically rent one of these peripherals will often have their CPU workloads optimized by smart network cards and storage devices that rely on the same technologies. However, the integration of these devices broadens the attack surfaces of several known microarchitectural vulnerabilities and even introduces new ones.
This is of particular concern in cloud environments, where service providers share computational resources of all kinds between many users to maximize power and cost efficiency, often in ways that are opaque to their customers and to end users whose private data is handled in shared environments. Therefore, CSPs bear a major responsibility for ensuring that microarchitectural threats are mitigated where possible, and that client workloads are architecturally isolated where mitigations don't exist or are prohibitively expensive to implement. First, this work analyzes the architectural features of Intel's Arria 10 GX FPGA platform, a system designed specifically for heterogeneous computing, and presents the first FPGA-to-FPGA, FPGA-to-CPU, and CPU-to-FPGA cache timing side-channels. Then, we present Jackhammer, a novel, efficient, and stealthy hardware implementation of the Rowhammer attack for the Arria 10 GX FPGA. Next, we show that I/O memory management units, which are intended to ensure proper isolation of peripherals, are the source of a new attack surface between FPGAs, GPUs, or other peripherals. Turning to cloud computing, we investigate the microarchitectural security of the Firecracker virtual machine manager, which powers a significant portion of AWS's compute services. We demonstrate that Firecracker provides negligible defense against major classes of microarchitectural attacks, uncover holes in AWS's setup recommendations for the microarchitectural security of Firecracker production hosts, and even identify a variant of the Medusa side-channel attack that works in a Firecracker virtual machine but not outside of it. Finally, we investigate a new fault injection technique enabled by the latest and most sophisticated Rowhammer techniques: adjacent-bit memory faults. In combination with a recently discovered lattice attack algorithm, these faults enable a powerful key recovery attack against ECDSA signatures.
We establish the existence of such faults in modern hardware and lay out solutions to the practical problems an attacker must address to put together a real-world attack. In the course of evaluating these vulnerabilities, we also suggest and analyze a variety of countermeasures, both hypothetical and available, to be implemented in hardware, firmware, system software, or user-level software. We highlight the specific challenges of securing heterogeneous and cloud systems against microarchitectural attacks and emphasize the need for defenses at every level. We hope that this work encourages hardware designers, cloud systems engineers, and cloud service developers to reassess threat models and isolation assumptions when developing secure systems with shared and heterogeneous hardware.

Creator