Facebook at ICCV 2021 – OpenEDS Challenge

OpenEDS 2021 Challenge

With the advent of consumer virtual reality (VR) products, immersive technology in the form of VR and augmented reality (AR) is gaining mainstream attention. However, from a market adoption perspective, immersive technology is still in its infancy. Both users and developers are devising the right recipe for the technology to garner mass appeal. We posit that eye tracking, a technology that measures where an individual is looking and can enable inference of user attention, is a key driver of mass appeal for these immersive technologies.

To that end, we organized our first workshop and an accompanying competition on Eye Tracking for VR and AR at ICCV 2019. The purpose was to increase awareness of AR/VR eye-tracking challenges in the broader computer vision and machine learning researcher communities. Following the overwhelmingly positive feedback and challenge participation, Facebook Reality Labs (FRL) will organize a full-day workshop at ICCV 2021 in Montreal, Canada. In addition, we are virtually hosting a third iteration of the OpenEDS Eye Tracking for VR and AR Challenge, which focuses on novel sensors for eye tracking and gaze prediction for applications in AR and VR. For the challenge, we are releasing a data set of temporal sequences of synchronized eye images and gaze vectors captured using a VR headset, as well as a data set of 3D point clouds of eyes with semantic segmentations. The papers describing the datasets are available here and here.

We invite machine learning and computer vision researchers to participate in this challenge.

Challenge Winners

Winners will be chosen August 15, 2021.

Performance Tracks: Two Challenges

Track 1: 3D Point Cloud Segmentation Challenge. Point cloud segmentation is a fundamental problem in computer vision where the goal is to classify the category of each point in the 3D point cloud of a scene. Point cloud segmentation provides the geometric and semantic meaning of objects in the scene, which benefits a number of real-world applications in augmented/virtual reality and autonomous driving. However, point cloud segmentation is challenging because of the uneven sampling density of the points and the unordered structure of point cloud data, which is often corrupted with noise. Recently, deep learning models have been quite successful in improving the accuracy of point cloud segmentation by utilizing the local geometry and global semantic features of the underlying shape of the objects in the scene. However, the majority of this work has focused on segmenting synthetic rigid shapes and may not generalize to non-rigid 3D objects such as human eyes. Our goal is to develop a robust 3D point cloud segmentation solution for non-rigid objects in the scene. In particular, we are interested in highly accurate semantic segmentation of point clouds from the key eye regions: the pupil, the iris, the sclera, the eyelashes, and the skin (background). We note that the available point clouds for eyes may be noisy and can contain holes due to specularity from the cornea and the sclera. Furthermore, differences in the geometry of the eye region between individuals may be subtle and yet critical to downstream applications such as realistic eye rendering and gaze estimation. We therefore seek a robust 3D eye segmentation solution that not only provides an accurate semantic segmentation of key eye regions but also generalizes to point clouds of eyes from different identities.

As such, this challenge calls for the following:

  • Accurate and generalizable semantic segmentation on 3D point clouds from the key eye region for different identities
  • Learning the geometric representation of human eyes to enable eye segmentation on noisy point clouds
  • Leveraging the corresponding 2D eye images, given the camera information, to further help 3D eye segmentation

Track 2: Gaze Prediction Challenge. Various applications for eye tracking, such as foveated rendering (FR) and gaze-based interaction, benefit from low-latency gaze estimates. FR is a technique that presents a high-quality picture at the point where a user is looking, while reducing the quality of the picture in the periphery. FR is a critical application for AR/VR platforms because it allows for a substantial reduction in power consumption of the graphical pipeline without reducing the perceptual quality of the generated picture. However, fast eye movements present a challenge for FR due to the transmission and processing delays present in the eye tracking and graphical pipelines. If the pipelines do not compensate for the delays, fast eye movements can take the user’s gaze to areas of the image that are rendered with low quality, thus degrading the user’s experience. The ways of remedying this issue include reducing the delays, which is not always possible; predicting future gaze locations, thus compensating for the delays; or a combination of both.

Prediction of future gaze locations can be done based on previously estimated gaze-points, understanding of the content of the presented scene (i.e., visual saliency), or a combination of both. Considering real-time requirements of FR and its goal of reducing power consumption, the prediction of future gaze points based on a short sub-sequence of the already-estimated gaze locations is considered the most fruitful path of exploration. If predicting future gaze locations with high accuracy is feasible, it would allow an implementation of an FR method that would match closely with the human visual acuity function. As a result, it could encode only a very small part of the image at a high-quality resolution, providing the highest level of energy savings.

With the above-mentioned considerations, the challenge calls for predicting future gaze locations based on the previously estimated gaze vectors, and optionally, headset and hand movement.
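To make the task concrete, the following is a minimal sketch of a naive constant-velocity baseline that extrapolates a future gaze direction from the two most recent gaze vectors. It is an illustration only, not the challenge's reference method; the function names `normalize` and `predict_gaze` and the `steps_ahead` parameter are hypothetical.

```python
import math


def normalize(v):
    """Scale a 3D vector to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [c / norm for c in v]


def predict_gaze(history, steps_ahead):
    """Linearly extrapolate the last two gaze vectors `steps_ahead` frames forward.

    `history` is a list of 3D gaze direction vectors, oldest first.
    Assumption: frames are evenly spaced in time, so the change between
    the last two frames can be treated as a constant per-frame velocity.
    """
    if len(history) < 2:
        raise ValueError("need at least two past gaze vectors")
    prev, last = history[-2], history[-1]
    # Per-frame change between the two most recent estimates.
    velocity = [b - a for a, b in zip(prev, last)]
    extrapolated = [p + steps_ahead * v for p, v in zip(last, velocity)]
    # Re-normalize so the result is again a direction vector.
    return normalize(extrapolated)


# Example: the gaze is drifting slowly along +x while looking down -z.
history = [normalize([0.00, 0.0, -1.0]), normalize([0.01, 0.0, -1.0])]
prediction = predict_gaze(history, steps_ahead=5)
```

A learned model would replace the linear extrapolation step, and could additionally take the headset and hand movement as input.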

Data Set Description

The 2021 challenges each have a data set.

3D Point Cloud Data Set: Contains 600 point clouds of the eye region, with corresponding 3D annotations of pupil, iris, sclera, eyelashes, and skin; images corresponding to each point cloud, captured from a straight-on view of the eye; and camera information (intrinsic matrix). The point clouds were reconstructed from multi-camera views using multi-view stereo.

Gaze Prediction Data Set: This data set contains video sequences collected from 44 participants, totaling 2,194,865 frames. The data collected includes multiple sensor outputs, consisting of head pose, hand (controller) pose, scene content, and gaze directions. The virtual scenes that the participants interacted with included an indoor scene and an outdoor scene, each with interactive content providing the opportunity for many behaviors such as reading, throwing, object manipulation, drawing, aiming, and shooting.

  • Ground-truth 3D gaze points are provided for each frame, obtained from a user-calibrated, glint-based model.
  • Head pose is given as a unit quaternion for each frame.
  • Left and right controller poses are given as unit quaternion and translation from the headset origin for each frame.
  • Scene content is given as a video stream of RGB images. To keep data set size within reasonable amounts, the scene images were downsampled to a size of 128x71. Depth maps are also provided at the same resolution.
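As a sketch of how the pose data described above might be used, the following rotates a 3D vector by a unit quaternion and applies a controller pose (rotation plus translation) to a point. The component order (w, x, y, z), the direction of the transform (controller-local coordinates into the headset frame), and the function names `quat_rotate` and `apply_pose` are all assumptions for illustration; the data set's own documentation defines the actual conventions.

```python
import math


def quat_rotate(q, v):
    """Rotate 3D vector v by unit quaternion q.

    Assumption: q is ordered (w, x, y, z); check the data set's convention.
    """
    w, x, y, z = q
    vx, vy, vz = v
    # t = 2 * cross(q_xyz, v)
    tx = 2.0 * (y * vz - z * vy)
    ty = 2.0 * (z * vx - x * vz)
    tz = 2.0 * (x * vy - y * vx)
    # v' = v + w * t + cross(q_xyz, t)
    return (
        vx + w * tx + (y * tz - z * ty),
        vy + w * ty + (z * tx - x * tz),
        vz + w * tz + (x * ty - y * tx),
    )


def apply_pose(q, translation, point):
    """Map a point through a pose: rotate by q, then add the translation."""
    rotated = quat_rotate(q, point)
    return tuple(r + t for r, t in zip(rotated, translation))


# Example: a 90-degree rotation about the z axis takes +x to +y.
s = math.sqrt(0.5)
q_z90 = (s, 0.0, 0.0, s)
rotated = quat_rotate(q_z90, (1.0, 0.0, 0.0))
moved = apply_pose(q_z90, (0.0, 0.0, 0.5), (1.0, 0.0, 0.0))
```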

Participation

In order to access either data set and participate in either or both challenge(s), please do the following:

1. Read the Official Rules for the Challenge(s) in which you would like to compete. The Official Rules are a binding contract that governs your use of OpenEDS and are linked below:

2. Submit the following information to openeds2021@fb.com to request access to the data set (OpenEDS):

Name:
Job title:
Institution:
Contact email:
Members of your team (if applicable):

By submitting your request to access OpenEDS, you agree to the Official Rules for the challenge(s) that you are participating in.

Please allow one business day for a response when submitting requests to access the data set. We will respond to your requests as soon as possible. Thank you!

3. Create an account at evalAI.cloudcv.org* and register your team for one or both of the following two challenges:

  • Facebook Eye Tracking 3D Eye Segmentation Challenge
  • Facebook OpenEDS Gaze Prediction Challenge 2021

*Note: This is a third-party tool, not affiliated with Facebook, Inc. Please be sure to review the policies of this site before use.

4. Develop your algorithm with the help of the training data and validation data available as part of OpenEDS.

5. Generate the submission JSON file.

For the 3D Point Cloud Segmentation Challenge, generate a JSON file for the results produced by your Model as applied to the Data Set. The scripts to generate JSON files can be found in the submission_scripts folder of the Data Set. The labels are: Pupil (0), Iris (1), Sclera (2), Eyelashes (3), and Background (4). The script requires the model's result for each test point cloud to be saved as a .npy file with dtype uint8. The name of each output file should be the same as that of the corresponding test point cloud. Run the command:

python create_json_regen.py --list-file <LIST FILE> --submission-json <SUBMISSION JSON>
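Before running the script, each per-point prediction has to be written to disk. The following is a minimal sketch of that step using NumPy. The helper name `save_prediction`, the output directory, and the cloud name `sample_0001` are hypothetical, and the exact file naming expected by the script should be checked against the submission_scripts folder.

```python
import os
import tempfile

import numpy as np

# Label values as listed in the challenge description.
LABELS = {"pupil": 0, "iris": 1, "sclera": 2, "eyelashes": 3, "background": 4}


def save_prediction(labels, out_dir, cloud_name):
    """Save one label per point as a uint8 .npy file named after the point cloud."""
    arr = np.asarray(labels)
    if arr.ndim != 1 or arr.size == 0:
        raise ValueError("expected a non-empty 1D array with one label per point")
    if arr.min() < 0 or arr.max() > max(LABELS.values()):
        raise ValueError("labels must be in the range 0..4")
    path = os.path.join(out_dir, cloud_name + ".npy")
    np.save(path, arr.astype(np.uint8))
    return path


# Example: write a tiny prediction to a temporary folder and read it back.
with tempfile.TemporaryDirectory() as out_dir:
    saved_path = save_prediction([0, 1, 2, 3, 4, 4], out_dir, "sample_0001")
    reloaded = np.load(saved_path)
```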

For the Gaze Prediction Challenge, generate a JSON file for the results produced by your model as applied to the data set. The result consists of two arrays, containing the three-dimensional prediction vectors for two future time stamps. Participants are asked to create a JSON file with their model-generated gaze estimates as follows:

{
    "sequence_ID1": {
        "25": [0.3, 0.2, -0.98],
        "50": [0.2, 0.21, -0.8]
    },
    "sequence_ID2": {
        "25": [0.3, 0.2, -0.98],
        "50": [0.2, 0.1, -0.91]
    }
}
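A file with this layout can be produced with Python's standard json module, which always emits strictly valid JSON. The sketch below is illustrative: the helper name `build_submission` and the in-memory layout of `predictions` are assumptions, and the keys "25" and "50" are simply copied from the example above.

```python
import json


def build_submission(predictions):
    """Convert {sequence_id: {time_key: (x, y, z)}} into the submission layout.

    Keys are converted to strings and vector components to plain floats
    so the result serializes cleanly with json.dumps.
    """
    submission = {}
    for seq_id, by_key in predictions.items():
        entry = {}
        for time_key, vec in by_key.items():
            if len(vec) != 3:
                raise ValueError("each prediction must have three components")
            entry[str(time_key)] = [float(c) for c in vec]
        submission[str(seq_id)] = entry
    return submission


# Example using the values from the layout shown above.
predictions = {
    "sequence_ID1": {25: (0.3, 0.2, -0.98), 50: (0.2, 0.21, -0.8)},
    "sequence_ID2": {25: (0.3, 0.2, -0.98), 50: (0.2, 0.1, -0.91)},
}
submission_text = json.dumps(build_submission(predictions), indent=4)
```

The resulting string can then be written to the file that is uploaded in the next step.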

6. Upload the JSON file to the evalAI challenge portal. The scores will be made available on the leaderboard after evaluation.

Prizes

The following is a summary. Please see the Official Rules for full details.

For the OpenEDS Gaze Prediction Challenge: Eligible entries will be scored through automated software using the performance metric of prediction error (PE). Up to three eligible Models with the lowest PE according to the Criteria listed in the Official Rules will be deemed winners.

For the Eye Tracking 3D Eye Segmentation Challenge: Eligible entries will be scored through automated software using the mean intersection-over-union metric. Scores will be computed using the ground-truth semantic masks of the test set. Up to three eligible Models with the highest scores according to the Criteria listed in the Official Rules will be deemed winners.
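For reference, mean intersection-over-union over per-point labels can be sketched as follows. This is a generic formulation, not the official scoring code: how the automated scorer treats classes that are absent from a point cloud, and whether it averages per cloud or over the whole test set, is not stated here, so skipping absent classes is an assumption. The function name `mean_iou` is hypothetical.

```python
def mean_iou(pred, truth, num_classes=5):
    """Average the per-class IoU over classes present in pred or truth.

    `pred` and `truth` are equal-length sequences of integer labels,
    one per point. Classes that appear in neither are skipped (assumption).
    """
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")
    ious = []
    for c in range(num_classes):
        intersection = sum(1 for p, t in zip(pred, truth) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, truth) if p == c or t == c)
        if union > 0:
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0


# Example: one pupil point (0) is mislabeled as iris (1).
score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1])
```

In this example the pupil class has IoU 1/2 and the iris class has IoU 2/3, so the mean is 7/12.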

Subject to verification, winners will be notified on or about August 15, 2021. Winners will be notified via email, and announced in public at the OpenEyes Workshop, currently scheduled for October 10, 2021. We will coordinate with each winner after the workshop to arrange for the delivery of prize money and the winner plaque.

  • The first place winners will receive $5,000 USD, and are required to present their work virtually at the OpenEDS 2021 Workshop at ICCV.
  • The second place winners will receive $3,000 USD.
  • The third place winners will receive $2,000 USD.

All prizes are per-entry (not per-person).

*NO PURCHASE NECESSARY TO ENTER OR WIN. A PURCHASE WILL NOT INCREASE YOUR CHANCES OF WINNING. Open only to natural persons who are: not a legal resident of any jurisdiction where applicable laws do prohibit participating or receiving a prize in the Contest and excludes China, Kenya, Venezuela, Argentina, Denmark, Greece, Quebec, Cuba, Iran, North Korea, Sudan, Myanmar/Burma, Syria, Zimbabwe, Iraq, Lebanon, Liberia, Libya, Somalia, Belarus, Balkans, and any other area or country designated by the applicable agency that designates trade sanctions, and at least eighteen (18) years old and the age of majority in his or her jurisdiction of residence. Begins 12:00 AM PST on 4/31/21; ends 11:59:59 PT on 7/31/21. Void where prohibited by law. SUBJECT TO FULL OFFICIAL RULES & OFFICIAL RULES. Internet access and valid email address required. Total ARV of all prizes: $10,000 USD, each contest ($20,000 total). Limit 1 Prize per person. Sponsor: Facebook Technologies, Inc., a wholly-owned subsidiary of Facebook, Inc. 1601 Willow Rd. Menlo Park CA 94025 USA

People

  • Karsten Behrendt: Facebook Reality Labs
  • Qing Chao: Facebook Reality Labs
  • Robert Cavin: Facebook Reality Labs
  • Kara Emery: University of Nevada, Reno
  • Alexander Fix: Facebook Reality Labs
  • Oleg Komogortsev: Visiting Professor, Facebook Reality Labs
  • Kapil Krishnakumar: Facebook
  • Tarek Hefny: Facebook Reality Labs
  • Cristina Palmero: Universitat de Barcelona, Spain
  • Abhishek Sharma: Facebook Reality Labs
  • Yiru Shen: Facebook Reality Labs
  • Sachin S. Talathi: Facebook Reality Labs