MiVEC: Michigan Visual and Experiential Computing


This is the website for the project IIS 1539011, VEC: Medium: Large-Scale Visual Recognition: From Cloud Data Centers to Wearable Devices, jointly sponsored by the National Science Foundation and Intel.

Project Goals

This effort seeks to advance the core capabilities of large-scale visual recognition by co-designing visual models and computing infrastructure. The goal is to enable encyclopedic, real-time visual recognition through seamless integration of visual computing on wearable devices and in the cloud. The PIs envision a wearable visual recognition system that continuously captures live video input while providing intelligent, real-time assistance through automatic or on-demand visual recognition, combining on-device computation with offloading to the cloud. Such a system is not currently feasible due to several fundamental challenges. First, the severe energy and thermal constraints of wearable devices render them incapable of performing the intensive computation necessary for visual recognition. Second, it remains an open question how to support encyclopedic recognition in terms of both visual models and data center infrastructure. In particular, it is unclear how current visual models, although highly successful at recognizing 1,000 object categories, can scale to millions or more distinct visual concepts. Moreover, such an encyclopedic visual model must be supported by data center infrastructure, yet little progress has been made on how to build such infrastructure. This project addresses these fundamental challenges through an interdisciplinary approach integrating computer vision, hardware architecture, VLSI design, and heat transfer.

The PIs will investigate three research thrusts. In Thrust 1, the PIs will develop a new class of deep neural networks that allows resource-efficient execution of network modules. This new framework provides a unified way to design, learn, and run scalable visual models that maximize the utility of recognition subject to resource constraints, such as the latency, energy, or thermal dissipation of a wearable device. In Thrust 2, the PIs will design and fabricate a visual processing chip capable of computational sprinting (bursts of extreme computation well above steady-state thermal dissipation capabilities), leveraging the new framework developed in Thrust 1. In Thrust 3, the PIs will design data center infrastructure that supports large-scale hierarchical indexing of visual concepts for encyclopedic recognition, with a focus on latency, throughput, and energy efficiency. Finally, the PIs will build a demonstration system to evaluate the proposed algorithms, software, and hardware components and to assess the overall performance of an end-to-end system.
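To illustrate the kind of budgeted inference Thrust 1 targets, one common way to realize resource-adaptive recognition is an anytime ("early-exit") model that runs progressively more expensive stages until a resource budget is exhausted. The sketch below is purely illustrative and is not the project's actual framework; the stage costs, labels, and confidences are invented for the example.

```python
# Hypothetical sketch of anytime ("early-exit") inference under a resource
# budget. Stage costs and predictions are illustrative, not from the project.

def anytime_predict(stages, budget):
    """Run classifier stages in order, stopping when the budget is spent.

    stages: list of (cost, predict_fn) pairs; each predict_fn returns a
            (label, confidence) tuple refined from earlier stages.
    budget: total resource units (e.g. latency or energy) available.
    Returns the best prediction completed within the budget, or None.
    """
    spent = 0
    best = None
    for cost, predict_fn in stages:
        if spent + cost > budget:   # next stage would exceed the budget
            break
        spent += cost
        best = predict_fn()         # later stages refine the prediction
    return best

# Illustrative three-exit model: coarse -> finer -> finest predictions.
stages = [
    (1, lambda: ("animal", 0.6)),
    (3, lambda: ("dog", 0.8)),
    (6, lambda: ("border collie", 0.95)),
]

print(anytime_predict(stages, budget=5))   # only the first two exits fit
```

Under a tight budget the model returns a coarser but still useful label; with more resources (for example, after offloading to the cloud) it runs deeper stages for a finer-grained answer.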

Principal Investigators


Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.