Redesigning Deep Neural Networks for Resource-Efficient Inference


Deep learning allows mobile applications to provide novel and useful features. However, the current deep inference paradigm is built on server-centric deep learning models, leading to high demand for computational resources. For resource-constrained platforms such as mobile devices, models with low computational cost and memory requirements are more desirable. Beyond existing work on model compression, we address the resource-efficiency problem from the following four aspects.

1) Collaborative inference. Traditional paradigms for mobile deep inference fall into either cloud-based or on-device inference; both require access to an entire pre-trained model, so their efficacy is limited by mobile network conditions and device computational capacity. We investigate collaborative inference, which splits inference computation between mobile devices and cloud servers, to address these limitations through techniques such as image compression and model partitioning.

2) Recurrent attention model (RAM). RAM has been proposed as a computationally efficient alternative to expensive CNNs for computer vision tasks. In this thesis, we investigate the attention model for classification problems involving multiple regions of interest (ROIs). We design a double-RNN architecture to disentangle the potential conflict in RAM, and propose a reward mechanism that trains the model using the guidance information of the ROIs.

3) Dynamic inference, an emerging technique that reduces the computational cost of deep neural networks. One way to achieve dynamic inference is to leverage multi-branch neural networks, which apply different computations to input data by following different branches. We investigate the problem of designing a flexible multi-branch network and early-exiting policies that adapt resource consumption to individual inference requests. We propose a lightweight branch structure that provides fine-grained flexibility for early exiting, and we leverage the Markov decision process (MDP) to learn the early-exiting policies automatically.

4) Resource-efficient multi-task learning. Multi-task learning (MTL) is a promising paradigm for improving the test accuracy of deep learning models trained on limited datasets. We investigate resource-efficient MTL, where the goal is to design a resource-friendly model suited to resource-constrained inference environments. We propose FiShNet, a novel solution for fine-grained parameter sharing that learns how to share parameters directly from the training data. FiShNet achieves accuracy comparable to soft-sharing approaches while consuming only a constant computational and memory cost per task.
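The partition-point search behind collaborative inference (aspect 1) can be sketched as a latency minimization over candidate split layers: layers before the split run on the device, the split-point activation is uploaded, and the remaining layers run on the server. The per-layer timings and activation sizes below are illustrative placeholders, not measurements from the thesis, and the exhaustive search is only a minimal stand-in for the thesis's actual partitioning method.

```python
def best_partition(profile, input_kb, uplink_kb_s):
    """Pick the layer index k at which to hand off inference to the server.

    profile: list of (mobile_ms, server_ms, output_kb) per layer
             (hypothetical numbers, not measured).
    k = 0 means fully cloud-based (the raw input is uploaded);
    k = len(profile) means fully on-device (nothing is uploaded).
    """
    n = len(profile)
    best = (0, float("inf"))
    for k in range(n + 1):
        device_ms = sum(m for m, _, _ in profile[:k])   # layers [0, k) on device
        server_ms = sum(s for _, s, _ in profile[k:])   # layers [k, n) on server
        if k == n:
            upload_ms = 0.0  # fully on-device: nothing leaves the phone
        else:
            # Upload the activation of layer k-1, or the input itself if k == 0.
            sent_kb = profile[k - 1][2] if k > 0 else input_kb
            upload_ms = sent_kb / uplink_kb_s * 1000
        total = device_ms + server_ms + upload_ms
        if total < best[1]:
            best = (k, total)
    return best  # (split index, estimated end-to-end latency in ms)
```

With a slow uplink and a cheap early layer that shrinks the activation, the search tends to land on an intermediate split rather than either extreme, which is the effect collaborative inference exploits.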
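For dynamic inference (aspect 3), a minimal multi-branch early-exit loop looks like the following. Note the thesis learns its early-exiting policies via an MDP; the fixed confidence threshold here is only a common stand-in policy, and the stage/branch functions are hypothetical toys, not the proposed branch structure.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of raw scores.
    mx = max(logits)
    exps = [math.exp(v - mx) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def early_exit_infer(stages, x, threshold=0.9):
    """Run backbone stages in order; after each, a branch classifier emits
    logits. Exit as soon as the top softmax probability clears `threshold`
    (the last stage always exits). Returns (predicted class, exit index)."""
    for i, (backbone, branch) in enumerate(stages):
        x = backbone(x)            # feature computation for this stage
        probs = softmax(branch(x)) # lightweight branch classifier
        conf = max(probs)
        if conf >= threshold or i == len(stages) - 1:
            return probs.index(conf), i
```

Easy inputs exit at a shallow branch and pay only part of the network's cost; raising the threshold trades latency for accuracy, which is the knob a learned policy would set per request instead of globally.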

  • etd-47231
Defense date
  • 2022
Date created
  • 2022-01-26

