Diversity Biases in AI

Anything that is built will reflect its builder, and that’s no different with AI. One of the concerns that has become apparent with artificial intelligence is that it can be created with detrimental biases built into its core.

The tech industry, though gradually changing, is currently very male and very culturally similar, despite the many locations in which developers are based. This may be down to the level of education that’s needed, or it might be because of the global online communities that developers often frequent.

Interestingly, many of our everyday AI assistants have female personas (think Alexa, Siri and Cortana). You can change Siri to male if you wish, but the default setting is female. Intricate problem-solving AI bots, like IBM’s Watson and Salesforce’s Einstein, are distinctly male. This may be indicative of the manufacturers’ views of gender roles, harking back to female PAs and male leaders. Also, when Microsoft was researching which voice to use, the results showed that a female voice was best for building a helpful, supportive, trustworthy assistant. IBM’s Watson, however, speaks with a male voice as it works beside doctors on cancer treatment, and does so in short, definitive phrases that mimic the voice patterns expected of leaders. Google Translate converts Spanish phrases into English as “he said” or “he did”, even when the subject is female, so it’s not just the personas, it’s the results as well.

It’s not just gender bias either. The blink-detection software in Nikon cameras consistently indicated that East Asian people were blinking, and, of course, AI learns, as in the case of Microsoft’s Tay, which learnt to be racist, misogynistic and generally horrid after less than 24 hours on Twitter.

So if we are using artificial intelligence to screen CVs, conduct video interviews or suggest who should be promoted, how can we trust it not to start out biased, or learn to be biased?

One of the significant causes of AI bias is the data it’s trained on. For example, image AI is often trained using the 14 million labelled images of ImageNet, while others might use scrapes from Google Images or Wikipedia. Because some groups are under-represented and others over-represented in these sources, the data is skewed. When I did a Google Image search for “business person”, of the 45 images on the first page, 38 were of men and 31 were of white men. I’m fairly sure that 70% of all business people are not white males, but if your data set for a business person has been scraped from Google Images, your AI will think this is the case.
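The arithmetic behind that rough 70% figure can be checked in a few lines. This is a minimal sketch that uses only the counts quoted above; the variable and function names are illustrative and not taken from any particular tool.

```python
# Counts from the informal Google Image search described above.
total_images = 45
male_images = 38
white_male_images = 31


def share(count, total):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


male_share = share(male_images, total_images)
white_male_share = share(white_male_images, total_images)

print(f"Male: {male_share}%")              # Male: 84.4%
print(f"White male: {white_male_share}%")  # White male: 68.9%
```

A model trained on a scrape like this would see a “business person” as a white man in roughly seven images out of ten.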

Given how long AI has been learning from skewed data sets, it could be said that as the next generation of AI matures, the human biases of its ancestor AIs are already intrinsically embedded in the system, and the logical steps that the AI takes to reach its decisions are complex and hidden.

So how can AI be freed from the bias trap?

Creating transparency standards and using open source code should allow the AI’s logical steps to be scrutinised and biases rooted out. Training data will have to be screened to remove biases and ensure representation across gender, race and much more. AI is here; let’s do our best to ensure it produces the best results, not biased results.
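As a starting point, the screening of training data mentioned above could begin with a simple count of how often each group appears. The sketch below is only an illustration under stated assumptions: the group labels, the sample data and the 10% threshold are all hypothetical, and a real audit would need far more than a head count.

```python
from collections import Counter


def representation_report(labels, min_share=0.1):
    """Summarise how often each group label appears in a data set.

    `labels` holds one group label per training example. `min_share` is an
    arbitrary, hypothetical threshold below which a group is flagged.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    report = {}
    for group, count in counts.items():
        group_share = count / total
        report[group] = {
            "count": count,
            "share": group_share,
            "under_represented": group_share < min_share,
        }
    return report


# Hypothetical labels for 100 training examples.
sample_labels = ["group_a"] * 70 + ["group_b"] * 25 + ["group_c"] * 5

report = representation_report(sample_labels, min_share=0.1)
for group, stats in report.items():
    flag = " (under-represented)" if stats["under_represented"] else ""
    print(f"{group}: {stats['share']:.0%}{flag}")
```

A report like this only shows who is missing from the data. Deciding what a fair mix looks like, and how to correct for it, remains a human judgement.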