Concurrent Deep Learning Workloads on NVIDIA GPUs
Deep learning GPU servers that execute latency-sensitive inference requests from clients often seek to run training tasks alongside inference when resources are idle, in order to improve overall system utilization. We empirically derive the thread block scheduler’s behavior under such concurrent workloads for NVIDIA’s Pascal, Volta, and Turing microarchitectures. In contrast to past studies that suggest the scheduler uses a round-robin policy to assign thread blocks to streaming multiprocessors (SMs), we find that the scheduler chooses the next SM based on the SM’s local resource availability. We show how this scheduling policy can lead to significant, and seemingly counter-intuitive, performance degradation; for example, a decrease of one thread per block resulted in a 3.58X increase in execution time for one kernel in our experiments. We then investigate the performance of current concurrency mechanisms on NVIDIA’s new Ampere microarchitecture under deep learning workloads and demonstrate that fluctuating resource requirements and kernel runtimes make it difficult to execute such workloads on current NVIDIA hardware while maintaining consistently high utilization and low, predictable turnaround times. Moreover, we conclude that the lack of sufficiently flexible preemption policies, robust task prioritization mechanisms, and contention-aware thread block scheduling techniques limits the effectiveness of NVIDIA’s concurrency mechanisms. We estimate that through the use of block-level, contention-aware preemption, it is possible to achieve 1.5X speedups in turnaround time with comparable utilization and improved predictability, as long as preemption overhead remains under 1-2 ms.
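The contrast between a round-robin policy and an availability-based policy can be illustrated with a toy model. The sketch below is not the thesis's methodology or NVIDIA's actual scheduler: the `SM` class, the `place_blocks` helper, the use of free thread slots as the only resource, and the "most free slots wins" rule are all assumptions made for illustration, since the abstract does not state the exact selection criterion.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SM:
    """Toy streaming multiprocessor that tracks only free thread slots (assumption)."""
    free_threads: int


def pick_round_robin(sms: List[SM], need: int, start: int) -> Optional[int]:
    """Return the first SM at or after `start` (cyclically) that can fit the block."""
    n = len(sms)
    for offset in range(n):
        idx = (start + offset) % n
        if sms[idx].free_threads >= need:
            return idx
    return None


def pick_most_available(sms: List[SM], need: int) -> Optional[int]:
    """Return the fitting SM with the most free thread slots (hypothetical rule)."""
    candidates = [i for i, sm in enumerate(sms) if sm.free_threads >= need]
    if not candidates:
        return None
    # max() returns the first maximal element, so ties go to the lowest index.
    return max(candidates, key=lambda i: sms[i].free_threads)


def place_blocks(policy: str, capacities: List[int],
                 blocks: List[int]) -> List[Optional[int]]:
    """Place each block (given as a thread count) and return the chosen SM indices."""
    sms = [SM(c) for c in capacities]
    placements: List[Optional[int]] = []
    cursor = 0  # only used by the round-robin policy
    for need in blocks:
        if policy == "round_robin":
            idx = pick_round_robin(sms, need, cursor)
            if idx is not None:
                cursor = (idx + 1) % len(sms)
        else:
            idx = pick_most_available(sms, need)
        if idx is not None:
            sms[idx].free_threads -= need
        placements.append(idx)  # None means no SM could fit the block
    return placements


if __name__ == "__main__":
    # SM 1 is assumed to be half occupied by another kernel.
    capacities = [2048, 1024, 2048]
    blocks = [512] * 4
    print(place_blocks("round_robin", capacities, blocks))     # [0, 1, 2, 0]
    print(place_blocks("most_available", capacities, blocks))  # [0, 2, 0, 2]
```

In this toy setup the round-robin policy keeps sending blocks to the partly occupied SM, while the availability-based rule avoids it. Because placement under the second rule depends on per-block resource needs, small changes in block size can shift which SMs are chosen, which is the kind of sensitivity the abstract describes.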
- Identifier: etd-21846
- Year: 2021
- Date created: 2021-05-04
- Last modified: 2023-12-05
Permanent link to this page: https://digital.wpi.edu/show/pn89d976b