How Google addresses ML fairness and eliminates bias in conversational AI algorithms


When it comes to algorithmic bias, one of the products that best illustrates the challenges Google faces is its popular Translate service. How the company teaches Translate to recognize gender serves as an example of just how complex such a basic problem remains.

“Google is a leader in artificial intelligence,” said Barak Turovsky, director of product for Google AI. “And with leadership comes the responsibility to address a machine learning bias that has multiple examples of results about race and sex and gender across many areas, including conversational AI.”

Turovsky spoke at VentureBeat’s Transform 2020 conference in a fireside chat with HackerU’s Noelle Silver.

The results from Translate potentially have a big global impact. Turovsky said roughly 50% of the content on the internet is in English, but only 20% of the world has English-speaking skills. Google translates 140 billion words every single day for its active users, 95% of whom are outside the U.S.

“To be able to make the world’s information accessible, we need translation,” he said.

The problem is that the algorithms that do the translation don’t recognize gender, one of the most foundational elements of many languages. Even more problematic is that the source material the company feeds into its machine learning systems is itself built on gender bias. For instance, Turovsky said one of the most important translation sources Google uses is the Bible.

“That gender bias comes from historical and societal sources because a lot of our training data is hundreds, if not thousands of years old,” he said.

As an example, historically in many cultures, doctors have tended to be primarily men and nurses primarily women. So even if an algorithm begins to grasp some aspects of gender, it is likely to return a default translation in English that says, “He’s a doctor, she’s a nurse.”

“This inherent bias happens a lot in translations,” he said.

Under the AI principles Google has adopted, the company is expected to avoid introducing or reinforcing unfair bias through its algorithms. But while there are several ways to fix this translation gender issue, none is entirely satisfying.

The algorithm could effectively flip a coin, it could decide based on what users select or how they react to a translation, or it could provide multiple responses and let users choose the best one.

Google has opted to go with that last option. Translate will show multiple options and let users select one. For instance, if someone types “nurse”, the Spanish translation will show both “enfermera” and “enfermero”.

“It sounds very simple,” he said. “But it required us to build three new machine learning models.”

Those three models detect gender-neutral queries, generate gender-specific translations, and then check for accuracy. Building the first model involved training algorithms on which words could potentially express gender and which could not. For the second model, the training data had to be tagged as male or female. The third model then filters out suggestions that could change the underlying meaning.
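To make the three-stage flow concrete, here is a minimal sketch of how such a pipeline could be wired together. It is not Google's implementation: the lookup tables stand in for trained models, and every name in it (`is_gender_neutral`, `generate_gendered`, `meaning_preserved`, `gendered_options`) is hypothetical, chosen only for illustration.

```python
from typing import Dict

# Toy stand-ins for the three models. Every table and name here is
# hypothetical; a production system would use trained models instead.

# Stage 1 data: English queries that carry no gender on their own.
GENDER_NEUTRAL_QUERIES = {"nurse", "doctor"}

# Stage 2 data: candidate Spanish translations tagged by gender.
GENDERED_CANDIDATES = {
    "nurse": {"feminine": "enfermera", "masculine": "enfermero"},
    "doctor": {"feminine": "doctora", "masculine": "doctor"},
}

# Stage 3 data: what each Spanish candidate means back in English.
BACK_TRANSLATIONS = {
    "enfermera": "nurse",
    "enfermero": "nurse",
    "doctora": "doctor",
    "doctor": "doctor",
}


def is_gender_neutral(query: str) -> bool:
    """Stage 1: decide whether the query is gender-neutral."""
    return query.strip().lower() in GENDER_NEUTRAL_QUERIES


def generate_gendered(query: str) -> Dict[str, str]:
    """Stage 2: produce gender-specific candidate translations."""
    return dict(GENDERED_CANDIDATES.get(query.strip().lower(), {}))


def meaning_preserved(query: str, candidate: str) -> bool:
    """Stage 3: reject candidates whose meaning drifts from the query."""
    return BACK_TRANSLATIONS.get(candidate) == query.strip().lower()


def gendered_options(query: str) -> Dict[str, str]:
    """Run all three stages and return the options to show the user."""
    if not is_gender_neutral(query):
        return {}
    candidates = generate_gendered(query)
    return {
        label: text
        for label, text in candidates.items()
        if meaning_preserved(query, text)
    }
```

In this sketch, calling `gendered_options("nurse")` returns both “enfermera” and “enfermero”, while a query the first stage does not recognize as gender-neutral returns no options at all.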

On the last point, Turovsky offered an example in which the results introduced gender where the original phrase had none, changing the meaning in the process.

“This is what happens when the system is laser-focused on gender,” he said. Turovsky said Google continues to fine-tune all three models and how they interact with each other to improve those results.
