Three commercially released facial-analysis programs from major technology companies demonstrate both skin-type and gender biases, according to a new paper researchers from MIT and Stanford University will present later this month at the Conference on Fairness, Accountability, and Transparency. In the researchers’ experiments, the three programs’ error rates in determining the gender of light-skinned men were never worse than 0.8%.
For darker-skinned women, however, the error rates ballooned — to more than 20% in one case and more than 34% in the other two. The findings raise questions about how today’s neural networks, which learn to perform computational tasks by looking for patterns in huge data sets, are trained and evaluated.
For instance, according to the paper, researchers at a major U.S. technology company claimed an accuracy rate of more than 97% for a face-recognition system they’d designed. But the data set used to assess its performance was more than 77% male and more than 83% white.
“What’s really important here is the method and how that method applies to other applications,” says Joy Buolamwini, a researcher in the MIT Media Lab’s Civic Media group and first author on the new paper.
“The same data-centric techniques that can be used to try to determine somebody’s gender are also used to identify a person when you’re looking for a criminal suspect or to unlock your phone. And it’s not just about computer vision. I’m really hopeful that this will spur more work into looking at [other] disparities.”
Buolamwini is joined on the paper by Timnit Gebru, who was a graduate student at Stanford when the work was done and is now a postdoc at Microsoft Research.
The three programs that Buolamwini and Gebru investigated were general-purpose facial-analysis systems, which could be used to match faces in different photos as well as to assess characteristics such as gender, age, and mood.
All three systems treated gender classification as a binary decision — male or female — which made their performance on that task particularly easy to assess statistically. But the same types of bias probably afflict the programs’ performance on other tasks, too.
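The kind of disaggregated evaluation described here can be sketched in a few lines: instead of reporting one overall accuracy figure, predictions are grouped by demographic subgroup and an error rate is computed per group. The records and numbers below are invented for illustration; they are not the paper's data or the vendors' actual outputs.

```python
# Sketch of a disaggregated evaluation: per-subgroup error rates for a
# binary classifier. All records here are hypothetical examples.

def error_rate(records):
    """Fraction of records whose predicted label differs from the true label."""
    errors = sum(1 for r in records if r["predicted"] != r["actual"])
    return errors / len(records)

# Hypothetical classifier outputs, each tagged with a demographic subgroup.
results = [
    {"group": "lighter-skinned male", "predicted": "male", "actual": "male"},
    {"group": "lighter-skinned male", "predicted": "male", "actual": "male"},
    {"group": "darker-skinned female", "predicted": "male", "actual": "female"},
    {"group": "darker-skinned female", "predicted": "female", "actual": "female"},
]

# Bucket the records by subgroup, then report each group's error rate.
by_group = {}
for r in results:
    by_group.setdefault(r["group"], []).append(r)

for group, records in sorted(by_group.items()):
    print(f"{group}: {error_rate(records):.0%} error")
```

A single aggregate accuracy over all four records (75%) would hide exactly the disparity this loop exposes, which is why the paper reports results broken down by skin type and gender rather than one overall number.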
Indeed, it was the chance discovery of apparent bias in face-tracking by one of the programs that prompted Buolamwini’s investigation in the first place.
Several years ago, as a graduate student at the Media Lab, Buolamwini was working on a system she called Upbeat Walls, an interactive, multimedia art installation that allowed users to control colorful patterns projected on a reflective surface by moving their heads. To track the user’s movements, the system used a commercial facial-analysis program.
The team that Buolamwini assembled to work on the project was ethnically diverse, but the researchers found that, when it came time to present the device in public, they had to rely on one of the lighter-skinned team members to demonstrate it. The system just didn’t seem to work reliably with darker-skinned users.
Curious, Buolamwini, who is black, began submitting photos of herself to commercial facial-recognition programs. In several cases, the programs failed to recognize the photos as featuring a human face at all. When they did, they consistently misclassified Buolamwini’s gender.
Image credit: MIT/Bryce Vickmark.