In our everyday lives, uses of machine learning are ever-expanding - think Netflix, Siri or Google Maps.
Whilst this technology can simplify our lives and improve human decision making, it can have unintended consequences if we aren't careful.
Will Knight, senior editor at MIT Technology Review, recently published an article discussing the issues of using biased algorithms. He also raises a concern: organisations don't seem to care whether their algorithms are biased.
Knight makes an important point. As algorithms become more commonplace, we should all be looking more closely at whether they can introduce or perpetuate bias. He especially highlights a couple of examples from the criminal justice system and financial services industry.
Judges are using algorithms to guide their decision to grant parole. At financial institutions, algorithms help determine which applicants are eligible for loans.
Given the profound impact these decisions can have, a biased algorithm would clearly be problematic.
Without human intervention, the use of algorithms in such contexts is one of the critical issues facing society today. That is why we need to do whatever we can to minimise bias in the algorithms we use.
So, what can we do about algorithmic bias?
Well first we need to remember that algorithms themselves are not biased. They don't hold racial biases, and they don't have a personal preference for a certain gender.
Actually, algorithms are agnostic.
The problems start when we use biased data to develop an algorithm, as this leads to it making biased predictions.
Let’s consider algorithms used to support decision making in the parole process - more specifically, the COMPAS algorithm, the most widely used risk assessment tool in the US.
COMPAS uses data available in criminal records, as well as background data and the defendants' answers to a series of questions. Based on this data, COMPAS generates a risk score. The higher the risk score, the more likely the defendant is to commit further crimes in the future.
In their analysis, ProPublica found that the tool contained significant racial bias. It frequently flagged Black and Hispanic defendants as high risk when they weren't, and white defendants as low risk when they weren't.
This isn’t an issue with the algorithm as it is representing patterns in the data used to develop it. The problem lies in the data itself.
For example, one of the questions the defendants answer as part of their risk assessment is whether their parents have served jail time. This question, as well as others used, is likely to surface historic bias of the criminal justice system. Unless corrected for, this systematic historic bias will then influence the algorithm's predictions.
Beauty.AI, a beauty contest held in 2016, is another prime example of biased datasets leading to biased outcomes. This was the first beauty contest to have algorithms as judges. Six algorithms rated approximately 6,000 selfies, and finally picked 44 winners.
Out of the 44 female winners, the majority were white, a small number were Asian and only one had darker skin.
Alex Zhavoronkov, Beauty.AI's chief science officer, later told The Guardian that this was due to biased training data. White women had been over-represented in the datasets used to train the algorithms. As a result, they were over-represented among the winners too.
To make sure algorithms don’t perpetuate bias, we need to be careful when selecting our datasets. Most datasets represent a biased starting point, as they are the result of human decisions.
Even if we find it hard to admit, unconscious biases are brought to bear in all human decision making processes (sometimes there will even be conscious biases!).
The above image neatly represents many of the cognitive shortcuts we take, and when and why we use them. These shortcuts can be useful and time saving. But when we apply them to parole decisions, or candidate selection, they quickly become problematic.
- We prioritise data and details that confirm our existing beliefs (confirmation bias).
  e.g. “I’d really like to attend this high-prestige school - oh, this candidate has qualifications from there - they must be great!”
- We rely on broad stereotypes and generalities to make specific predictions and associations.
  e.g. “This person is overweight - they are going to lack discipline.” or “This person has grey hair - they must have a lot of experience.”
- Effects such as anchoring and the contrast effect show that the order in which we receive information affects our decisions too.
  e.g. "Well, compared to the person before lunch, the person we interviewed just now was exceptional!"
Minimising biases in candidate selection processes is the reason we exist. We take the risk of incorporating biases in our predictive models extremely seriously. To reduce this risk, we use a number of tactics and processes.
We focus not only on the data we collect, but also on helping our customers to recognise when the data they collect might have some form of bias.
The datasets we use to build our algorithms do not contain any information about race, age or gender. Nor do they contain educational or professional background. The only reason we collect demographic information is to identify any related biases introduced by humans through historical workforce decisions.
These biases are then corrected for in the development of our algorithms.
Before we build the algorithm, our data scientists screen all datasets to ensure there are no pre-existing biases along the dimensions of age, gender, race or ethnicity. If they identify any of these biases, we can introduce corrections to the dataset to minimise their influence in the algorithm.
Consider the following example: we’re building a model to predict who is likely to be a successful manager. Our data scientists find an imbalance between male and female managers’ performance metrics in the training dataset. By pre-processing the dataset, we can introduce a mathematical correction to even out this implicit gender imbalance, which minimises any direct influence of gender in the algorithm.
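One simple form such a pre-processing correction can take is shifting each group's scores so their group means coincide. This is a minimal sketch, not our actual pipeline: the data is synthetic, and mean-centring per group is just one of several possible corrections.

```python
import numpy as np

# Synthetic illustration: performance scores with an implicit gender gap.
rng = np.random.default_rng(0)
gender = np.array(["F"] * 50 + ["M"] * 50)
scores = np.where(gender == "M",
                  rng.normal(3.6, 0.5, 100),   # assumed means, for illustration
                  rng.normal(3.2, 0.5, 100))

# Shift each group's scores so both group means equal the overall mean,
# so that gender no longer predicts the target the model learns from.
overall_mean = scores.mean()
adjusted = scores.copy()
for g in ("F", "M"):
    mask = gender == g
    adjusted[mask] += overall_mean - scores[mask].mean()

gap = abs(adjusted[gender == "F"].mean() - adjusted[gender == "M"].mean())
print(gap)  # the group-mean gap is now numerically ~0
```

In practice the correction would be applied to the training target (or features) before model fitting, and validated on held-out data.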
Algorithms are extraordinary at picking up subtle patterns in datasets, and sometimes they can pick up on biased patterns that a human wouldn’t be able to recognise. This may result in biased predictions. Thus, we also test the predictions for bias before we deploy the algorithm. If we find any bias at this stage, we can make further corrections to minimise any remaining bias.
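One common way to test predictions for bias at this stage is to compare the rate of positive predictions across groups (the "demographic parity difference"). The sketch below uses made-up predictions and group labels; the metric and threshold you choose would depend on the context.

```python
import numpy as np

# Hypothetical model predictions (1 = recommended) and group membership.
preds = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1])
group = np.array(["A", "A", "A", "A", "A", "A",
                  "B", "B", "B", "B", "B", "B"])

# Positive-prediction rate per group.
rate_a = preds[group == "A"].mean()   # 4/6
rate_b = preds[group == "B"].mean()   # 3/6

# Demographic parity difference: 0 means identical rates across groups.
parity_gap = abs(rate_a - rate_b)
print(round(parity_gap, 3))  # → 0.167
```

If the gap exceeds an agreed tolerance, the model goes back for further correction before deployment.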
Adverse impact occurs when a process that appears neutral actually has a negative effect on certain groups.
For example, candidates might answer the question 'Do you enjoy watching live sports?' differently depending on gender, so it might have an adverse impact on women if that question was included in a predictive model.
All questions we use to develop our algorithms are thoroughly checked to make sure they are neutral and non-discriminatory. We have partnered with leading independent adverse impact consultants and perform external tests to ensure this. If a question has a statistically significant adverse impact on candidates based on age, gender, race or ethnicity, it gets removed.
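A widely used heuristic for flagging such questions, often applied alongside significance testing, is the "four-fifths rule": if any group's selection rate falls below 80% of the highest group's rate, the item is flagged. The pass rates below are assumed purely for illustration.

```python
# Hypothetical pass rates for one question, by group (assumed values).
pass_rates = {"women": 0.45, "men": 0.60}

# Four-fifths rule: flag a group whose rate is below 80% of the highest rate.
highest = max(pass_rates.values())
flagged = {g: rate / highest < 0.8 for g, rate in pass_rates.items()}

print(flagged)  # → {'women': True, 'men': False}
```

Here the ratio for women is 0.45 / 0.60 = 0.75, below the 0.8 threshold, so the question would be removed.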
In addition to removing potentially biased questions, we also conduct adverse impact studies on our algorithms, so-called ‘criterion-based studies’. This ensures they meet even the most stringent adverse impact requirements, and that we don’t include irrelevant variables.
When we collect performance data to develop our algorithms, we work with our customers to ensure we are starting off with the least possible bias. For example, we give guidance on what to consider when nominating 'cultural fit' to reduce the risk of bias. This is in addition to analysing and screening the data once collected.
Even though trained algorithms may not always be completely unbiased, it is important to recognise that they are capable of objectively taking many more factors into account than humans can. Humans make frequent use of cognitive shortcuts, and are inconsistent decision makers.
To name just one example, a study found that judges are likely to be more lenient after their lunch break. Now, that's definitely not fair!
With the speed of today's technological advancement, the impact algorithms have on our daily lives will only increase.
And that's a good thing!
When trained with quality datasets, predictive models significantly improve human decision making.
However, it is still critical that we monitor algorithms and make corrections when necessary - at least until we reach a stage where algorithms can recognise and remove bias on their own.
It is also important that we use algorithms for their intended purpose.
Going back to the COMPAS example discussed earlier, the algorithm was never meant to be a standalone decision tool. It was designed as one of many inputs into a decision, and thus only gave a score for the likelihood of someone re-offending. But in practice, judges put increasing weight on the score and use it as an absolute yes-or-no decision making tool. This can have very serious consequences, as the ProPublica analysis illustrated.
The algorithms we develop here at PredictiveHire are not intended to replace the human in the hiring process. Instead, we educate our customers on how to use the algorithms to ensure they are complementary to the existing process. Because it turns out that when humans and machines work together, the results are far superior!