The ghost in the machine (is unfair)
Elizabeth Chorney-Booth - 16 December 2022
Computers and artificial intelligence (AI) are designed to make our lives easier. As much as people like to joke that various algorithms seem to know a little too much about our habits and that we’re headed toward an Orwellian dystopia, AI-based tech can help us make stress-free decisions based on data analysis. Systems that rely on datasets carry a lot of information about human behaviour, but since they’re analyzing numbers rather than making judgment calls about personal traits, many of us take comfort in the idea that those hauntingly accurate entertainment suggestions come from a place of dispassionate neutrality. Or at least that’s what we’d like to believe, as our lives become increasingly entwined with social media, online shopping and entertainment, facial recognition, and the various AI-powered programs used by our banks, governments, health-care services and educational institutions. Making these systems better and fairer is the goal of experts at work in the field, and it just so happens the U of A is a heavyweight.
“The University of Alberta’s Department of Computing Science is world-renowned for its AI and machine learning programs,” says Nidhi Hegde, ’95 BSc and an associate professor in the department, who has added her research interests to the pool of expertise. “I have been interested in privacy and ethics in AI and have felt I could contribute positively to this department. I feel very lucky and privileged to be here.”
Hegde knows that machine learning is only as effective as the data it is given. When AI analyzes patterns in data to predict outcomes and make decisions, it relies on the datasets researchers have fed it. Learning from experience sounds pretty human, and, much like a person, an algorithm trained on bad data can pick up bad habits when it comes to making choices and assessing outcomes. It takes thoughtful and intentional design by AI researchers to avoid these pitfalls.
These habits are what concern Hegde, who is also a Fellow and Canada CIFAR AI Chair at the Alberta Machine Intelligence Institute (Amii), a non-profit body that brings together academic and industry partners to support and empower researchers to create an AI and machine learning landscape that is “for good and for all.” Hegde has had a long career as a researcher in the private sector, working on machine learning problems for Bell Labs (the research arm of Nokia) and the Royal Bank of Canada’s Borealis AI research institute. Three years ago she moved back to academia at the University of Alberta, where she now leads a team focused on ethics in machine learning and AI. The group looks at how issues of privacy and fairness can affect the accuracy of algorithms. Her work is novel and still in progress; with it, she hopes to identify where and how bias occurs in machine learning, why it happens, and how to build fairness and privacy protection into algorithms to create machine learning applications that are more trustworthy and effective for everyone.
“These algorithms — AI and machine learning — are absolutely important and there’s no holding them back because there are many services and industrial processes that have come to rely on them,” Hegde says. “I don’t think it’s a question of whether these algorithms are useful since we are already down this path. As they become more deeply embedded and integrated in the ways we interact with other people and the world, it becomes more important to take care of the adverse effects that may negatively impact us.”
Hegde’s team is a piece of the puzzle when it comes to deciphering potential ethical concerns in machine learning, with scientists around the world working on making AI as equitable and benign as possible. Researchers like Bei Jiang, ’08 MSc, an associate professor in the Department of Mathematical and Statistical Sciences, have a keen interest in built-in bias. Jiang has done her own work in the field of gender bias in natural language processing and says that work like her own and Hegde’s is crucial to understanding how technology-based services are affecting modern life.
“Fairness research in AI is still somewhat under-investigated and there are lots of really interesting research challenges,” Jiang says. “Ultimately we want to use these tools to help us make better decisions and make the world a better place.”
The first step is uncovering exactly what bias and unfairness are in relation to machine learning and how they creep into what we might think of as “objective” sets of data. It’s not as if computer programmers are explicitly teaching machines to be unfair to subjects based on race, age, gender or geographical location, but Hegde says those biases can still find their way into an algorithm.
This could come in the form of an unintended bias on a programmer’s part. For example, facial recognition software often does not evaluate the faces of Black women with the same accuracy it does the faces of white men. This is likely due to a variety of factors, including fewer images of Black women being fed into databases and cameras not being optimized to properly capture darker skin tones, according to the 2018 study “Gender Shades,” in Proceedings of Machine Learning Research. Other examples of bias Hegde points to come from the choice of data used to train an AI, such as databases of people arrested by a particular police department. Since the records cover those who are arrested, rather than those who actually commit or are convicted of crimes (there can be a big discrepancy), the location of the arrests and the biases of the arresting officers skew the data being used to teach a machine about criminal activity, and thus the decisions it makes.
“Certain demographics and groups are treated unfairly if the machine’s decision about them is based on a skewed set of data that does not really reflect reality,” Hegde says. “It’s a little complicated to understand why it’s happening, but it happens when care has not been taken in making sure that data has been collected in an unbiased way, what kind of data has been collected or what kind of algorithm is used on that data. Every step that leads to an algorithm giving an outcome is important and any of those steps could lead to a biased result.”
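To make the mechanism concrete, here is a minimal, hypothetical sketch (not Hegde’s actual experiments, and with made-up rates chosen purely for illustration) of how a skewed arrest record can teach a model to rate two identically behaving groups differently:

```python
# Hypothetical illustration: two groups with the same underlying behaviour,
# but the behaviour is recorded (as an arrest) far more often for group B.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 20_000
group = rng.integers(0, 2, size=n)        # 0 = group A, 1 = group B
behaviour = rng.random(n) < 0.10          # same true 10% rate in both groups

# Assumed recording bias: 30% of incidents recorded for group A, 60% for group B
record_rate = np.where(group == 0, 0.30, 0.60)
arrested = behaviour & (rng.random(n) < record_rate)

# The model learns from the recorded label, with group membership as a feature
X = group.reshape(-1, 1).astype(float)
model = LogisticRegression().fit(X, arrested)

risk_a, risk_b = model.predict_proba([[0.0], [1.0]])[:, 1]
print(f"Predicted risk, group A: {risk_a:.3f}")  # roughly 0.03
print(f"Predicted risk, group B: {risk_b:.3f}")  # roughly 0.06, about double
```

Nothing in the simulated behaviour distinguishes the two groups; the gap in the model’s scores comes entirely from how the labels were recorded.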
On the other side of the coin, part of Hegde’s team is working specifically on privacy concerns. It’s not new for the public to be worried about digital data and privacy breaches, but in terms of machine learning, the concern is not just about data being leaked, but also the computers inferring certain things based on an individual’s data. For example, an algorithm could make a guess about a person’s health, sexual orientation or political leanings based on their movie-watching or book-buying habits, without that person ever explicitly divulging personal information. Even if the machine’s assumptions are correct, that information may not be something the person wants revealed to the public, employers, or friends and family, let alone a marketing firm or online retailer.
“Privacy is sometimes misunderstood,” Hegde says. “It doesn’t necessarily mean that you’re anonymous or your name is not associated with your data. What it really means is observers should not be able to infer personal information about you that they don’t already know.”
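One widely used, formal way of achieving the kind of guarantee Hegde describes is differential privacy, which adds carefully calibrated noise to any statistic released about the data. The sketch below is a generic textbook illustration of that idea, not a description of Hegde’s own methods:

```python
# Minimal sketch of differential privacy: release a count with Laplace noise
# so an observer learns almost nothing about any single individual.
import numpy as np

def private_count(values: np.ndarray, epsilon: float) -> float:
    """Release a noisy count under a privacy budget of epsilon.

    Adding or removing one person changes the count by at most 1 (the
    sensitivity), so Laplace noise with scale 1/epsilon makes the output
    nearly indistinguishable whether or not any individual is in the data.
    """
    true_count = float(np.sum(values))
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Example: how many users watched a politically revealing documentary?
watched = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 1])
print(private_count(watched, epsilon=0.5))  # noisy answer; it changes on each run
```

Because the noise is scaled to how much any one person can change the answer, an observer who sees the released number cannot confidently infer anything new about an individual, which is the property Hegde points to.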
Bias already exists in traditional systems that involve human subjectivity, and many of us eagerly feed our data into social media apps anyway. It can be easy to shrug off ethical questions and simply accept fairness and privacy violations as the cost of doing digital business. Many people brush aside events like the Cambridge Analytica scandal, which saw Facebook users’ data collected for use in political advertisements targeting them, deciding it’s worth a machine being able to infer our political leanings if it means streaming services can seamlessly lead us to our next favourite show. We willingly give commercial businesses personal data every day. Privacy and bias concerns can seem like an abstract threat rather than a practical one, but AI now runs through systems with far higher stakes.
“Predictive policing algorithms used in some places will decide whether someone gets bail or is jailed without bail,” Hegde says. “Research has shown that certain demographic groups have been unfairly treated because the algorithm, which is interpreting biased data it has been provided, would classify them as someone who should receive a tough assessment.” Hegde gives another example of biased machine learning in banking, which can lead to people being turned down for mortgages or business loans if the algorithm decides their race, gender or location makes them likely to default on the loan, even if all of their finances are in order.
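A first diagnostic that researchers commonly run on decision systems like these is to compare the rate of favourable outcomes across demographic groups. The sketch below is a generic illustration of that kind of audit with invented numbers, not Hegde’s specific method:

```python
# Hypothetical fairness audit: compare approval rates across groups and
# summarize the gap as a "disparate impact" ratio.
import numpy as np

def disparate_impact(decisions: np.ndarray, group: np.ndarray) -> float:
    """Ratio of favourable-decision rates: protected group over reference group."""
    rate_protected = decisions[group == 1].mean()
    rate_reference = decisions[group == 0].mean()
    return rate_protected / rate_reference

# Invented model outputs: 1 = loan approved, 0 = denied
decisions = np.array([1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0])
group     = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
print(f"Disparate impact ratio: {disparate_impact(decisions, group):.2f}")
# A ratio well below 1.0 (a common rule of thumb is 0.8) flags the system
# for closer scrutiny before anyone's mortgage or bail depends on it.
```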
But the road to fixing the problem starts with identifying it. Part of the work of Hegde and other researchers is pinpointing the instances where a fairness or privacy problem can cause serious, life-changing consequences.
Hegde says the potential benefits of unbiased AI are monumental. Her goal is to find ways to build anti-bias and privacy protection protocols into machine learning so that the algorithms, and the institutions that rely on them, serve society better. The more we can trust these tools and our ability to make informed choices, the greater the benefit. Hegde is one of a constellation of AI researchers at the U of A, including Jiang, working on everything from smart and connected vehicles to responsive prosthetics, smart homes and more precise health diagnostics. Research is underway in fields as diverse as energy, the environment, the digital economy, manufacturing, transportation, finance and more.
“As a society we should all be trying to put the responsibility less on people and how they use these services and more on the developers and how they use the data in their algorithms,” Hegde says. End users should also have a better understanding of the products they let into their lives. Law enforcement agencies, for example, shouldn’t treat AI as a bastion of objectivity just because a product is marketed as an “objective, AI-powered solution.” “People should ensure they have better knowledge about these algorithms and how you send your data out there. These are not scary things.”