{"id":2566,"date":"2019-06-06T14:17:13","date_gmt":"2019-06-06T12:17:13","guid":{"rendered":"https:\/\/kim.hfg-karlsruhe.de\/?p=2566"},"modified":"2019-06-07T10:49:26","modified_gmt":"2019-06-07T08:49:26","slug":"microsoft-deletes-face-recognition-dataset","status":"publish","type":"post","link":"https:\/\/kim.hfg-karlsruhe.de\/microsoft-deletes-face-recognition-dataset\/","title":{"rendered":"Financial Times: Microsoft quietly deletes largest public face recognition data set"},"content":{"rendered":"

Through his research and database investigation, KIM partner and collaborator Adam Harvey<\/strong> has forced Microsoft and Duke and Stanford universities to delete their training datasets for face recognition, an unprecedented event in the recent history of (corporate) AI.<\/p>\n

<\/p>\n

See also the previous Financial Times article<\/a> about Adam Harvey’s project Megapixels<\/strong>.<\/p>\n

 <\/p>\n



 <\/p>\n

Microsoft quietly deletes largest public face recognition data set<\/strong>
\nStanford and Duke universities also remove their facial recognition data sets
\nMadhumita Murgia, The Financial Times,\u00a0<\/em>6 June 2019.<\/p>\n

\u2192 Source: FT website<\/a><\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n

Microsoft has quietly pulled from the internet its database of 10m faces, which has been used to train facial recognition systems around the world, including by military researchers and Chinese firms such as SenseTime and Megvii.<\/p>\n

The database, known as MS Celeb, was published in 2016 and described by the company as the largest publicly available facial recognition data set in the world, containing more than 10m images of nearly 100,000 individuals.<\/p>\n

The people whose photos were used were not asked for their consent; their images were scraped off the web from search engines and videos under the terms of the Creative Commons license that allows academic reuse of photos.<\/p>\n

Microsoft, which took down the database days after the FT reported on its use by companies, said: \u201cThe site was intended for academic purposes. It was run by an employee that is no longer with Microsoft and has since been removed.\u201d<\/p>\n

Two other data sets have also been taken down since the FT report was published in April, including the Duke MTMC surveillance data set built by Duke University researchers, and a Stanford University data set called Brainwash.<\/p>\n

Brainwash used footage of customers in a caf\u00e9 called Brainwash in San Francisco\u2019s Lower Haight district, taken through a livestreaming camera. Duke did not respond to requests for comment. Stanford said it had removed the data set after a request by one of the authors of a study it was used for. A spokesperson said the university is \u201ccommitted to protecting the privacy of individuals at Stanford and in the larger community\u201d.<\/p>\n

All three data sets were uncovered by Berlin-based researcher Adam Harvey, whose project Megapixels documented the details of dozens of data sets and how they are being used.<\/p>\n

Microsoft\u2019s MS Celeb data set has been used by several commercial organisations, according to citations in AI papers, including IBM, Panasonic, Alibaba, Nvidia, Hitachi, SenseTime and Megvii. Both SenseTime and Megvii are Chinese suppliers of equipment to officials in Xinjiang, where minorities of mostly Uighurs and other Muslims are being tracked and held in internment camps.<\/p>\n

Microsoft itself has used the data set to train facial recognition algorithms, Mr Harvey\u2019s investigation found.<\/p>\n

The company named the data set \u201cCeleb\u201d to indicate that the faces it had scraped were photos of public figures. But Mr Harvey found that the data set included several arguably private individuals, including security journalists such as Kim Zetter, Adrian Chen and Shoshana Zuboff, the author of Surveillance Capitalism, and Julie Brill, the former FTC commissioner responsible for protecting consumer privacy.<\/p>\n

\u201cMicrosoft has exploited the term \u2018celebrity\u2019 to include people who merely work online and have a digital identity,\u201d said Mr Harvey. \u201cMany people in the target list are even vocal critics of the very technology Microsoft is using their name and biometric information to build.\u201d<\/p>\n

When the Financial Times previously contacted people in the database, they were unaware of their inclusion. \u201cI am in no sense a public person, there is no way in which I\u2019ve ceded my right to privacy,\u201d said Adam Greenfield, a technology writer and urbanist who was included in the data set.<\/p>\n

\u201cIt\u2019s indicative of Microsoft\u2019s inability to hold their own researchers to integrity and probity that this was not torpedoed before it left the building,\u201d he said. \u201cTo me, it is indicative of a profound misunderstanding of what privacy is.\u201d<\/p>\n

Tech experts said Microsoft may have been in violation of the EU\u2019s General Data Protection Regulation (GDPR) by continuing to distribute the MS Celeb data set after the regulation came into effect last year.<\/p>\n

\u201cThey are likely to have taken it down because their lawyers expressed concern that they do not have a basis to process special category data such as faces under Article 9 of GDPR,\u201d said Michael Veale, a technology policy researcher at the Alan Turing Institute. \u201cThey may not have a get-out clause for processing biometric data for the purposes of \u2018uniquely identifying a natural person\u2019.<\/p>\n

\u201cParticularly as the use of the data set has moved from a purely research use to something that products are being built with,\u201d he added. \u201cThere is reason to believe that the people in data set cannot be considered to expressly and clearly have made their faces public.\u201d<\/p>\n

Microsoft said it was not aware of any GDPR implications and that the site had been retired \u201cbecause the research challenge is over\u201d.<\/p>\n

Although the database has been deleted by Microsoft, it is still available to researchers and companies that had previously downloaded it. Mr Harvey said it is still being shared on open source websites.<\/p>\n

\u201cYou can\u2019t make a data set disappear. Once you post it, and people download it, it exists on hard drives all over the world,\u201d he said. \u201cNow it is completely disassociated from any licensing, rules or controls that Microsoft previously had over it. People are posting it on GitHub, hosting the files on Dropbox and Baidu Cloud, so there is no way from stopping them from continuing to post it and use it for their own purposes.\u201d<\/p>\n

Copyright: The Financial Times Limited 2019. All rights reserved.<\/p>\n","protected":false},"excerpt":{"rendered":"

With his research and database investigation, KIM partner and collaborator Adam Harvey has forced Microsoft, Duke and Stanford universities to delete training datasets for face recognition in an unprecedented event in the recent history of (corporate) AI.<\/p>\n","protected":false},"author":2,"featured_media":2586,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[38],"tags":[],"_links":{"self":[{"href":"https:\/\/kim.hfg-karlsruhe.de\/wp-json\/wp\/v2\/posts\/2566"}],"collection":[{"href":"https:\/\/kim.hfg-karlsruhe.de\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/kim.hfg-karlsruhe.de\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/kim.hfg-karlsruhe.de\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/kim.hfg-karlsruhe.de\/wp-json\/wp\/v2\/comments?post=2566"}],"version-history":[{"count":10,"href":"https:\/\/kim.hfg-karlsruhe.de\/wp-json\/wp\/v2\/posts\/2566\/revisions"}],"predecessor-version":[{"id":2587,"href":"https:\/\/kim.hfg-karlsruhe.de\/wp-json\/wp\/v2\/posts\/2566\/revisions\/2587"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/kim.hfg-karlsruhe.de\/wp-json\/wp\/v2\/media\/2586"}],"wp:attachment":[{"href":"https:\/\/kim.hfg-karlsruhe.de\/wp-json\/wp\/v2\/media?parent=2566"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/kim.hfg-karlsruhe.de\/wp-json\/wp\/v2\/categories?post=2566"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/kim.hfg-karlsruhe.de\/wp-json\/wp\/v2\/tags?post=2566"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}