An article by TheVentury’s Data Scientist Josh Kunkle. He is an expert in Natural Language Processing for bots and development of Machine Learning algorithms for classification and image processing. Josh is currently also researching dialogue and robotic management systems.
How do we ensure that Artificial Intelligence represents humanity’s best principles?
Introduction to Bias in AI
There is a pervasive problem in AI: bias. A biased AI may perform poorly, but beyond that, it can inherit humans’ prejudicial biases.
When an AI is perceived to make prejudicially biased decisions, users lose confidence in it, and it is usually abandoned.
This has happened at some of the largest tech companies, such as Amazon and Microsoft, even though humans may be just as biased. The reason is that we tend to hold AI to a higher standard than humans, as we should.
AI companies and developers need to put in the time and resources to ensure that their product best serves the needs and morals of society. It is therefore time to pay attention to this problem and try to find ways to manage it.
Is there a simple solution?
As a tech person, I like to use technology to solve problems. There are in fact many tech solutions that aim to address bias, but in a narrow fashion. As I’ll discuss in more detail below, once you start thinking critically about bias in AI and where it comes from, you are led to consider other areas of the business.
Finally, you realize that bias is everywhere and that a tech solution alone is not sufficient to solve this broader problem.
Before discussing the topic further, I would like to explain what I mean by bias in more detail because it is an overloaded word.
I do not mean bias in the mathematical sense; I mean bias in an algorithm that produces unwanted outcomes.
More specifically, I want to address prejudicial bias: outcomes that disadvantage certain groups of people.
This topic is often discussed under the name ‘fairness’ instead. Nevertheless, it is easier to say ‘bias’ than ‘lack of fairness’, so I’ll stick with ‘bias’.
Dealing with prejudicial bias is difficult because it is ingrained in human nature. People and societies must consciously work to eliminate it. While progress in the right direction continues, past and present bias persists in the data that we record.
Computers to the rescue? Not so fast!
Since bias is introduced by humans, we might hope that computers will solve our bias problems. Even though humans try to be aware of their biases, we may become less vigilant when we are tired or distracted by other problems.
Computers do not get tired, hungry, or grumpy; they just rely on the algorithm and the data (watch out for memory leaks though!). However, the algorithm and in particular the data are the routes by which bias creeps into AI.
One of the starkest and most famous examples of this was the Tay bot. Tay was developed by Microsoft with the goal of learning to talk like a Twitter user by learning from Twitter users. Within 24 hours of deployment, though, the bot was disabled because it posted racist, sexist, and generally offensive tweets after learning such language from other users. This example is quite extreme because the developers maintained no control over the training data (the bot had initial training before deployment, but then it learned freely).
Problems with AI in HR
Another, more pertinent example can be found in Human Resources, a growing target field for AI. We also see this growth through our accelerator ELEVATE, where 3 out of 5 AI startups in our current batch are HR focused. A common desire is to automatically process CVs and choose the best candidate for the position. To learn which candidates match best, such systems are trained on historical hiring data.
However, in many tech firms (which are the ones trying to adopt new AI technologies) there is a historical lack of female hires. If this bias is not corrected in the data, the AI learns that male applicants are preferred over female applicants.
This reportedly happened when Amazon tried to develop this kind of AI.
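One way to see whether historical hiring data carries this kind of imbalance is to compare hire rates across groups before any model is trained on it. The following is a minimal sketch in Python, not the method used by Amazon or anyone else; the toy records, the field names `gender` and `hired`, and the helpers `selection_rates` and `rate_ratio` are all hypothetical and invented for illustration.

```python
from collections import defaultdict

def selection_rates(records, group_key="gender", outcome_key="hired"):
    """Return the fraction of positive outcomes for each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for row in records:
        group = row[group_key]
        totals[group] += 1
        if row[outcome_key]:
            positives[group] += 1
    return {group: positives[group] / totals[group] for group in totals}

def rate_ratio(rates):
    """Ratio of the lowest to the highest group rate (1.0 means equal rates)."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # nobody was hired in any group, so the rates are equal
    return min(rates.values()) / highest

# Toy data, invented for illustration only.
history = (
    [{"gender": "male", "hired": True}] * 30
    + [{"gender": "male", "hired": False}] * 70
    + [{"gender": "female", "hired": True}] * 5
    + [{"gender": "female", "hired": False}] * 45
)

rates = selection_rates(history)
ratio = rate_ratio(rates)
print(rates, round(ratio, 2))
```

A ratio well below 1.0 suggests that a model trained on this data could learn the same preference. What threshold counts as acceptable is a judgment call that this sketch does not settle.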
Sources of Bias
In order to develop targeted solutions, I will start by breaking down the sources of bias and identifying how they can influence AI.
1. History & Society
Some biases are rooted in history and society rather than in the decisions of any one company. This does not mean that we should do nothing about them.
Businesses and individuals alike should aim to address such biases in order to advance society as a whole.
2. Business Objectives
Consider how prejudicial bias can enter through business objectives.
While individuals or companies may not have an intentional bias, it is surprisingly easy for biases to hide behind business objectives.
For example, a company wants to use sales data to direct development resources. The data analysis may lead the company to direct its resources to high-income areas. By ignoring low-income areas, its service will tend to disadvantage minorities, who, in some countries, tend to live in low-income areas.
This is already beginning to be a difficult conversation. From the business perspective this may be the desired outcome.
Why should they develop services where they will not be profitable? From the perspective of the disadvantaged group, they have lost access to a potentially beneficial service. Moreover, the severity depends on the product and business model. At one extreme, the disadvantaged group may have no interest in the service (for example, high-end designer clothing).
At the other extreme, a company that provides a public service such as internet access should aim to avoid this type of bias. When confronting this type of bias, a company must assess its own values and business model as well as the public response to its actions.
3. Learning Bias
Now consider how existing biases can influence AI.
The issue comes down to the fact that machines cannot learn on their own.
Instead, humans must design the algorithm and supply training data. The data reflect the realities of the possibly biased process that created them. We then give that data to a machine learning algorithm, imprinting the biases onto it. This is the case in the CV example that I mentioned above.
4. Sample Bias
AI developers in particular must be aware of the ways in which biases can enter their algorithms. The developers make choices about what data to collect and what to reject, and these choices may lead to a bias. Sample bias results in training data that is not fully representative of the data the AI will see when it makes predictions.
For example, in order to train an image classifier to identify cats, you supply it with many pictures of cats. However, if you forget to include training images of Sphynx cats (which have no fur; if you haven’t seen one, Google will not disappoint), your classifier will probably fail on Sphynx cats.
Based on this example, sample bias may not seem to be prejudicial, but if you consider that certain groups of people may be ignored when collecting training data the link is clear.
In fact, there have been real cases where facial recognition algorithms were trained only on faces of a certain skin color.
Not surprisingly, these fail to identify faces with skin colors that were not represented in the training data. Such a bias would have even more impact in the medical field.
For example, an AI could be used to identify skin melanoma. Such algorithms show promise to increase the chance of early detection, but they must be trained on all skin colors so that some people are not left behind.
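A simple first check for sample bias is to compare how often each group appears in the training set with how often you expect it to appear where the AI will be used. The sketch below assumes that each training example carries a group label and that you have some estimate of the expected shares; the group names, the numbers, the `tolerance` parameter, and the helper `underrepresented` are hypothetical choices made for illustration.

```python
from collections import Counter

def group_shares(labels):
    """Return each group's share of the labelled examples."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}

def underrepresented(train_labels, expected_shares, tolerance=0.5):
    """List groups whose training share is below tolerance * expected share."""
    shares = group_shares(train_labels)
    flagged = []
    for group, expected in expected_shares.items():
        actual = shares.get(group, 0.0)  # a group absent from training has share 0
        if actual < tolerance * expected:
            flagged.append(group)
    return sorted(flagged)

# Toy numbers, invented for illustration only.
train_groups = ["A"] * 90 + ["B"] * 9 + ["C"] * 1
expected = {"A": 0.5, "B": 0.3, "C": 0.15, "D": 0.05}

missing = underrepresented(train_groups, expected)
print(missing)
```

A check like this only finds groups you thought to label. It would not have caught the Sphynx cats unless someone had already listed them as a category worth counting.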
5. Measurement Bias
Another type of bias which is related to sample bias is measurement bias. This is essentially sample bias, but with a time component. This type of bias occurs when the data collection method or quality changes relative to the training data.
For example, consider an AI that classifies images. The AI is trained with clean, high-quality images, but over time the camera gets dirty. The dirty images will likely be misclassified.
Such bias can extend to prejudicial situations when using surveys, for example. If the survey questions change slightly or the survey method changes, the AI that interprets the survey may acquire a bias.
The takeaway is that in many settings the data-taking conditions are not static and must be monitored.
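Monitoring can start very simply: record a summary statistic of the inputs at training time and compare new batches against it. The sketch below uses the dirty camera example and assumes that average image brightness is a useful signal. The brightness values, the threshold of 3.0, and the helpers `drift_score` and `has_drifted` are hypothetical, and the comparison is a deliberately crude heuristic rather than a proper statistical test.

```python
from statistics import mean, stdev

def drift_score(baseline, recent):
    """How many baseline standard deviations the recent mean has moved."""
    spread = stdev(baseline)
    shift = abs(mean(recent) - mean(baseline))
    if spread == 0:
        # A constant baseline: any shift at all counts as unbounded drift.
        return 0.0 if shift == 0 else float("inf")
    return shift / spread

def has_drifted(baseline, recent, threshold=3.0):
    """Flag a batch whose mean moved more than `threshold` baseline deviations."""
    return drift_score(baseline, recent) > threshold

# Toy brightness values per image, invented for illustration only.
training_brightness = [0.70, 0.72, 0.68, 0.71, 0.69, 0.70]
clean_batch = [0.69, 0.71, 0.70]
dirty_batch = [0.52, 0.50, 0.55]

print(has_drifted(training_brightness, clean_batch))
print(has_drifted(training_brightness, dirty_batch))
```

The same pattern applies to survey data: track a statistic such as the response rate per question, and investigate when it moves away from what was seen during training.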
Biases are present in society and can be inherited or introduced by business practices or data science decisions.
Addressing bias should involve all areas of a company, from management to data science and development.
The upfront cost may be higher than expected, but weighed against the cost of a failed project, the additional cost is worth it. In my next blog post I will lay out a broad program, or at least some suggestions, for how to handle bias and ensure fairness when it comes to AI.