Adam Harvey, partner and collaborator of KIM, speaks to the Financial Times on how his project MegaPixels tackles the dubious ways training data sets for facial recognition are created.

MegaPixels website
To the article (paywall)

Excerpt from: Madhumita Murgia, “Who’s using your face? The ugly truth about facial recognition”, Financial Times, 20 April 2019.


In February this year, Adam Harvey discovered pictures of his friend, the digital-rights activist Jillian York, in a U.S. government database used for training facial-recognition algorithms. Photos that are readily available online under Creative Commons licences are often scraped from the internet and stored in training data sets. Although Creative Commons licences technically allow anyone to copy and reuse images for academic and commercial purposes, the explicit consent of the people they depict is missing. As this has become a common practice of internet data accumulation, it raises the question of whether copyright law can properly protect the privacy of the depicted individuals or whether the issue needs to be addressed in other ways.

In its early stages, facial recognition was a technology for simple identification purposes. Now that these algorithms are being used for crowd surveillance, automated driving and emotion analysis for advertising, the industry is growing exponentially. As a result, providing “good” data, meaning “natural” or “wild” data, to improve the algorithms has become a top priority.

Adam Harvey, who has been investigating face datasets since 2010, says that it is difficult to draw a line between the academic use of such data in a non-profit research context on the one hand and its commercial exploitation by companies on the other. It becomes clear that the push for rapid growth in these areas leaves too little room to think about questions of privacy and ethics.

To address these issues, Harvey developed his new project, MegaPixels. Recently launched, MegaPixels is a searchable database of all the papers that cite one of the many datasets Harvey has investigated over the past decade. In the coming months, he plans to extend the search tool so that people can type in their names and see whether their faces have been used to train an artificial intelligence system anywhere in the world, as was the case with his friend Jillian York.