November 2nd, 2019 in conjunction with ICCV (Oct 27 – Nov 2)
Room 318A | COEX Convention Center | Seoul, Korea
Virtual Reality (VR) and Augmented Reality (AR) have garnered mainstream attention with products such as the Oculus Rift and Oculus Go. However, these products have yet to find broad adoption by consumers. Mass-market appeal for these products may require revolutions in comfort, utility, and performance, as well as consideration of user awareness and privacy related to eye-tracking features. These revolutions can, in part, be enabled by measuring where an individual is looking, where their pupils are, and what their eye expression is, which is colloquially known as eye tracking. For example, foveated rendering, which uses the measured gaze point to render only the region around it at full detail, greatly reduces the power required to render realistic scenes in a virtual environment.
The goal of this workshop is to engage the broader community of computer vision and machine learning scientists in a discussion of the importance of eye-tracking solutions for VR and AR that work for all individuals under all environmental conditions.
This workshop will host two challenges structured around 2D eye-image datasets that we have collected using a prototype VR head-mounted device. More information about these challenges is located here. Entries to these challenges will address some outstanding questions relevant to the application of eye tracking for VR and AR platforms. We anticipate that the dataset released as part of the challenges will also serve as a benchmark dataset for future research in eye tracking for VR and AR.
Below is the list of topics that are of particular interest for this workshop:
- Semi-supervised semantic segmentation of eye regions
- Photorealistic reconstruction and rendering of eye images
- Generative models for eye image synthesis and gaze estimation
- Transfer learning for eye tracking from simulation data to real data
- Eye feature encoding for user calibration
- Temporal models for gaze estimation
- Image-based gaze classification
- Headset slippage correction, eye-relief estimation
- Realistic avatar gazes
ICCV Workshop: Eye Tracking for VR and AR
November 2, 9:00 am to 5:00 pm
| Time | Session |
|---|---|
| 9:00 am – 9:15 am | Opening remarks, Facebook |
| 9:15 am – 10:00 am | Jeff Pelz |
| 10:00 am – 10:15 am | Break |
| 10:15 am – 10:45 am | Kaan Aksit |
| 10:45 am – 11:15 am | Matthias Kummerer |
| 11:15 am – 12:00 pm | Ming Yu Liu |
| 12:00 pm – 1:00 pm | Lunch Break |
| 1:00 pm – 1:45 pm | First poster session – focus on challenge winners |
| 1:45 pm – 2:00 pm | Winner of the Semantic Segmentation Challenge |
| 2:00 pm – 2:15 pm | Winner of the Image Generation Challenge |
| 2:15 pm – 3:00 pm | Satya Mallick |
| 3:00 pm – 3:45 pm | Break and second poster session – focus on accepted papers |
| 3:45 pm – 4:30 pm | Oleg Komogortsev |
| 4:30 pm – 4:45 pm | Award ceremony |
| 4:45 pm – 5:00 pm | Closing remarks, open discussion on ideas for follow-up workshop/challenge in 2020 |
- 10/3: Workshop talk titles and abstracts are now available (here).
- 9/30: The OpenEDS Challenge winners have been announced (listed here).
- 9/18: Workshop schedule is now available (here).
Submissions must be written in English and must be sent in PDF format. Each submitted paper must be no longer than four (4) pages, excluding references. Please refer to the ICCV submission guidelines for instructions regarding formatting, templates, and policies. The submissions will be reviewed by the program committee and selected papers will be published in ICCV Workshop proceedings.
Submit your paper using this link before the August 31st deadline.
- Paper submission deadline: August 19th, 2019
- Paper acceptance notification deadline: August 25th, 2019
- Camera-ready deadline: August 30th, 2019
- Publication Venue: (a) ICCV workshop proceedings (b) IEEE Xplore (c) CVF Open Access
- Paper submission deadline: August 31st, 2019
- Paper acceptance notification deadline: September 15th, 2019
- Camera-ready deadline: September 27th, 2019
- Publication Venue: (a) IEEE Xplore (b) CVF Open Access
Oleg Komogortsev, Texas State University
Talk Title: Eye Movement Detection Sensors, User Authentication, and Health Assessment
Abstract: The availability of eye movement detection sensors is set to explode, with billions of units available in future Virtual Reality (VR) and Augmented Reality (AR) platforms. In my talk I will discuss the past, present, and future of such sensors and their applications. I will discuss both the applications that initially necessitate the presence of such sensors in VR/AR devices, along with additional uses that would be enabled by those sensors such as eye movement driven biometrics and health assessment.
Jeff Pelz, Rochester Institute of Technology
Talk Title: The Convergence of Computer Graphics, Computer Vision, and Machine Learning in Eyetracking
Abstract: Video-based eyetracking became practical over 50 years ago with the development of analog Pupil-Corneal Reflection systems. Those systems evolved rapidly, taking advantage of the miniaturization of video cameras and the ability to perform simple video operations such as thresholding and digitization in real time on small computers. The advent of compact, efficient computer-vision modules enabled more complex eye-tracking algorithms to be implemented in the collection and analysis of eyetracking data. More recently, computer graphics has been leveraged to generate artificial images to model eyetracking systems and create images with known properties for training, simplifying and speeding the development of new systems and algorithms. The recent explosion of machine learning has brought similar advances in the difficult problems of eye-image segmentation and event detection. I will discuss how the convergence of advances in computer graphics, computer vision, and machine learning is revolutionizing eyetracking by supporting machine learning-based systems that can be trained on computer-generated ‘ground-truth’ data.
Matthias Kummerer, University of Tübingen
Talk Title: DeepGaze III: Deep Learning for predicting and understanding human free-viewing scanpaths
Abstract: Many animals gather high-resolution visual information only in the fovea; therefore, they must make eye movements to explore the visual world. How fixation locations are selected has been debated for decades in neuroscience and psychology. Because different observers fixate similar image locations, it has been proposed that fixations are driven by a spatial priority or “saliency” map. The saliency map hypothesis states that priority values are assigned locally to image locations, independent of saccade history, and are only later combined with saccade history and other constraints (e.g. task demands) to select the next fixation location. A second hypothesis is that there are interactions between saccade history and image content that cannot be summarized by a single value. For example, if after long saccades different content drives the next fixation than after short saccades, then it is impossible to assign a single saliency value to image locations. Here we discriminate between these possibilities in a data-driven manner. Using human free-viewing scanpath data, we train a new model, “DeepGaze III”. Given a prior scanpath history, the model predicts the next fixation location using either a simple saliency map or allowing for more complicated interactions via multiple saliency maps. DeepGaze III achieves state-of-the-art performance compared to previous scanpath models and reproduces key statistics of human scanpaths such as the distribution of saccade lengths and of angles between saccades. We find that using multiple saliency maps gives no advantage in scanpath prediction compared to a single saliency map. Since the number of saliency maps the network can use imposes strong qualitative constraints on what the model is able to predict, this suggests that – at least for free-viewing – a single saliency map may exist that does not depend on either current or previous gaze locations.
This provides evidence that, in VR, selective rendering might be helpful even in settings without eye tracking. In AR/VR settings with eye tracking, DeepGaze III could be used for even more applications. Selective rendering or rendering prioritization could be substantially improved by conditioning on the previous gaze path. Also, multiple models trained on different tasks could be used for task inference.
Ming Yu Liu
Talk Title: Few-Shot Unsupervised Image-to-Image Translation
Abstract: Unsupervised image-to-image translation methods learn to map images in a given class to an analogous image in a different class, drawing on unstructured (non-registered) datasets of images. While remarkably successful, current methods require access to many images in both source and destination classes at training time. We argue this greatly limits their use. Drawing inspiration from the human capability of picking up the essence of a novel object from a small number of examples and generalizing from there, we seek a few-shot, unsupervised image-to-image translation algorithm that works on previously unseen target classes that are specified, at test time, only by a few example images. Our model achieves this few-shot generation capability by coupling an adversarial training scheme with a novel network design. Through extensive experimental validation and comparisons to several baseline methods on benchmark datasets, we verify the effectiveness of the proposed framework.
Kaan Aksit
Talk Title: Eye tracking for next generation displays
Abstract: Next-generation virtual and augmented reality near-eye displays promise an immersive visual experience with the help of an eye tracker. In this talk, I will give an overview of such near-eye display architectures, with a specific focus on eye tracking, and offer guidance on the remaining challenges.
Satya Mallick
Talk Title: Gaze Estimation Overview: A Computer Vision Scientist’s perspective
Abstract: In this talk, we will give an overview of several gaze tracking algorithms and datasets. We will learn the conditions under which these algorithms and architectures can be employed, as well as their limitations. One of the challenges in gaze tracking is the availability of real datasets. We will learn how synthetic data is being used to produce state-of-the-art results. Our goal is to cover a breadth of ideas without going into extreme depth about any one algorithm. This talk will be useful for people who are interested in gaze tracking and want to get an overview before diving deep into the problem.
- Robert Cavin, Facebook Reality Labs
- Jixu Chen, Facebook
- Ilke Demir, DeepScale
- Stephan Garbin, University College London
- Oleg Komogortsev, Facebook Reality Labs (Visiting Scientist)
- Immo Schuetz, Facebook Reality Labs (Postdoctoral Research Scientist)
- Abhishek Sharma, Facebook Reality Labs
- Yiru Shen, Facebook Reality Labs
- Sachin Talathi, Facebook Reality Labs
- Kaan Aksit, NVIDIA
- Robert Cavin, Facebook Reality Labs
- Jixu Chen, Facebook Reality Labs
- Ilke Demir, DeepScale
- David Dunn, University of North Carolina
- Oleg Komogortsev, Texas State University
- Immo Schuetz, Facebook Reality Labs
- Sachin Talathi, Facebook Reality Labs
- Lei Xiao, Facebook Reality Labs
- Marina Zannoli, Facebook Reality Labs