Rethink Lighting Estimation for 3D Vision-enabled Mobile Augmented Reality


Omnidirectional lighting provides the foundation for achieving spatially variant photorealistic 3D rendering, a desirable property of mobile augmented reality (AR) applications. However, estimating omnidirectional lighting is challenging in practice due to limitations such as the lack of access to a 360° panorama of the rendering position and the inherent lighting and user dynamics. A new opportunity has recently arisen with advances in mobile 3D vision, including built-in high-accuracy depth sensors and deep learning-powered algorithms, which provide the means to better sense and understand the physical surroundings. Centering on 3D vision, this thesis first designs a fast and accurate lighting estimator, PointAR. Leveraging PointAR, we conduct an empirical measurement that demonstrates the correlation between scene coverage and lighting estimation accuracy, a key insight that drives the design of Xihe, a 3D vision-based lighting estimation framework with a co-designed neural network, XiheNet. Together, PointAR and Xihe give mobile AR applications the ability to obtain accurate omnidirectional lighting estimates in real time.

In PointAR, we introduce an efficient lighting estimation pipeline that is suitable for running on modern mobile devices, with resource complexity comparable to state-of-the-art mobile deep learning models. The pipeline takes a single RGB-D image captured by the mobile camera and a 2D location in that image, and estimates second-order spherical harmonics (SH) coefficients. These estimated coefficients can be used directly by rendering engines to support spatially variant indoor lighting in AR. Our key insight is to formulate lighting estimation as a learning problem directly on point clouds, inspired in part by the Monte Carlo integration used in real-time spherical harmonics lighting. Whereas existing approaches estimate lighting with complex deep learning pipelines, our method focuses on reducing computational complexity. Through both quantitative and qualitative experiments, we demonstrate that PointAR achieves lower lighting estimation error than state-of-the-art methods while requiring an order of magnitude fewer resources, comparable to mobile-specific DNNs.

In Xihe, we develop a novel sampling technique that efficiently compresses the raw point cloud input generated on the mobile device. This technique is derived from our empirical analysis of a recent 3D indoor dataset and serves as a key component of our 3D vision-based lighting estimation pipeline. Building on this sampling technique, we derive an optimized model, XiheNet, based on PointAR that fits the overall system design. To achieve real-time performance, we develop a tailored GPU pipeline for on-device point cloud processing and use an encoding technique that reduces the number of bytes transmitted over the network. Finally, we present an adaptive triggering strategy that allows Xihe to skip unnecessary lighting estimations, along with a practical way to provide temporally coherent rendering when integrating with the AR ecosystem. We evaluate both the lighting estimation accuracy and latency of Xihe using an iOS application developed with Xihe's APIs. Our evaluation shows that Xihe completes a lighting estimation in as little as 18.8 ms and achieves 9.4% better estimation accuracy than a state-of-the-art neural network.
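For context, the classical Monte Carlo spherical harmonics projection that the abstract credits as inspiration can be sketched as follows. The notation here is ours, and the thesis learns this mapping with a neural network rather than evaluating the estimator directly:

```latex
% Projection of the environment radiance L(\omega) onto the SH basis Y_{lm}:
\[
  L_{lm} = \int_{S^2} L(\omega)\, Y_{lm}(\omega)\, \mathrm{d}\omega
\]
% Monte Carlo estimate with N directions sampled uniformly on the sphere
% (pdf = 1/(4\pi)):
\[
  \hat{L}_{lm} = \frac{4\pi}{N} \sum_{i=1}^{N} L(\omega_i)\, Y_{lm}(\omega_i)
\]
% A point cloud supplies exactly such samples: each point p_i with observed
% color c_i, seen from the rendering position o, contributes
\[
  \omega_i = \frac{p_i - o}{\lVert p_i - o \rVert}, \qquad L(\omega_i) \approx c_i .
\]
% Second-order SH (l <= 2) yields 9 coefficients per color channel.
```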
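The abstract does not spell out the point cloud compression scheme. Below is a minimal sketch, assuming the compression anchors points to a fixed set of unit-sphere directions around the rendering position and keeps one observation per anchor; the anchor count, Fibonacci-lattice layout, and nearest-point rule are illustrative assumptions, not the thesis's exact algorithm:

```python
import numpy as np

def fibonacci_sphere(n: int) -> np.ndarray:
    """Generate n roughly uniform unit directions (Fibonacci lattice)."""
    i = np.arange(n)
    phi = np.pi * (3.0 - np.sqrt(5.0)) * i        # golden-angle increment
    z = 1.0 - 2.0 * (i + 0.5) / n                 # uniform in z
    r = np.sqrt(1.0 - z * z)
    return np.stack([r * np.cos(phi), r * np.sin(phi), z], axis=1)

def sphere_sample(points: np.ndarray, colors: np.ndarray,
                  origin: np.ndarray, n_anchors: int = 1280):
    """Compress a point cloud: keep at most one (direction, color) pair per
    unit-sphere anchor, choosing the point closest to the origin."""
    dirs = points - origin
    dist = np.linalg.norm(dirs, axis=1)
    dirs = dirs / dist[:, None]
    anchors = fibonacci_sphere(n_anchors)
    # Assign each point to its closest anchor (maximum dot product).
    assign = np.argmax(dirs @ anchors.T, axis=1)
    keep = np.full(n_anchors, -1)
    best = np.full(n_anchors, np.inf)
    for idx, (a, d) in enumerate(zip(assign, dist)):
        if d < best[a]:
            best[a], keep[a] = d, idx
    keep = keep[keep >= 0]
    return dirs[keep], colors[keep]
```

The appeal of a scheme like this is that the output is a bounded-size set of (direction, color) pairs, which caps both the network input size and the bytes that must cross the network.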
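Similarly, the adaptive triggering strategy is only named in the abstract. One plausible minimal sketch is to re-run estimation only when the sampled observation changes enough; the change metric and threshold here are guesses for illustration:

```python
import numpy as np

class AdaptiveTrigger:
    """Skip lighting estimation when the sampled observation barely changes.
    Mean absolute color difference and the 0.05 threshold are illustrative
    placeholders, not the thesis's exact strategy."""

    def __init__(self, threshold: float = 0.05):
        self.threshold = threshold
        self._prev = None

    def should_estimate(self, sampled_colors: np.ndarray) -> bool:
        # Always estimate on the first frame or if the sample layout changes.
        if self._prev is None or sampled_colors.shape != self._prev.shape:
            self._prev = sampled_colors
            return True
        delta = float(np.mean(np.abs(sampled_colors - self._prev)))
        if delta > self.threshold:
            self._prev = sampled_colors
            return True
        return False
```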

Identifier
  • etd-18146
Year
  • 2021
Date created
  • 2021-04-26
Last modified
  • 2023-10-09


Permanent link to this page: https://digital.wpi.edu/show/mg74qp998