December 9, 2019
Cold Case: the Lost MNIST Digits
Neural Information Processing Systems (NeurIPS)
Although the popular MNIST dataset [LeCun et al., 1994] is derived from the NIST database [Grother and Hanaoka, 1995], the precise processing steps for this derivation have been lost to time. We propose a reconstruction that is accurate enough to serve as a replacement for the MNIST dataset, with insignificant changes in accuracy. We trace each MNIST digit to its NIST source and to its rich metadata, such as the writer identifier and the partition identifier.
By: Chhavi Yadav, Leon Bottou
Facebook AI Research