This post was originally published June 8, 2020 on Towards Data Science.
In a racist society, it is not enough to be non-racist [data scientists]. We must be antiracist [data scientists] — Angela Davis
Data scientists are data stewards. We collect data, store data, transform data, visualize data, and ultimately impact how data are used. In our data-driven world, we have found ourselves with the responsibility to use data to tell stories and effect change.
But with this responsibility, it is not enough for us non-Black data scientists to simply not be racist. It is not enough for us to sit behind our computer screens to write code and feel angry but not take action after the deaths of George Floyd, Breonna Taylor, Ahmaud Arbery, and too many other Black individuals. It is not enough for us to acknowledge the racist systems that continue to exist in the United States but not actively do anything about them.
As Angela Davis said, “In a racist society, it is not enough to be non-racist. We must be antiracist.” Both non-racists and antiracists recognize that racism and white supremacy are wrong. Antiracists are those who take action to do something about it.
As non-Black data scientists, we must be antiracist data scientists. We must take responsibility for our power and privilege. We must confront the ways in which data and algorithms have been used to perpetuate racism, and eliminate racist decisions and algorithms in our own work. We must recognize that our field is lacking diversity (only ~3% of data scientists identify as Black) and contribute to pathways that change this. Being antiracist data scientists isn’t a one-time decision or something we will ever fully achieve, but instead a commitment we make each day, now and in the future, towards building a more equal world.
Here are 5 steps we can take to get started:
Step 1: Educate ourselves about becoming antiracist
To be antiracist data scientists, we must take the steps to be antiracist individuals. Being antiracist is different for white people than it is for people of color. As written in this toolkit by the National Museum of African American History and Culture: “For white people, being antiracist evolves with their racial identity development. They must acknowledge and understand their privilege, work to change their internalized racism, and interrupt racism when they see it. For people of color, it means recognizing how race and racism have been internalized, and whether it has been applied to other people of color.” This excerpt from The Racial Healing Handbook by Dr. Anneliese Singh is a great place to start as it walks through the six responsibilities that individuals can take in the ongoing process to be antiracist: Read, Reflect, Remember, Risk, Rejection, and Relationship Building.
For white readers specifically who have begun to acknowledge their privilege and are looking to Read and Reflect: before burdening Black, Indigenous, and People of Color (BIPOC) friends with requests for reading resources or conversation, start with the many resource lists currently available online, such as here and here, and turn to white friends who are also on this journey for conversation.
Step 2: Learn about how data and algorithms have been used to perpetuate racism
As data scientists, we use data to answer questions, solve problems, and (hopefully) have a positive impact. But history has repeatedly shown that good intentions are not enough. Data and algorithms have been used to perpetuate racism and racist societal structures. It is imperative that we educate ourselves about these realities and the uneven effects they have had on Black lives*. This list is meant as a starting point and is by no means exhaustive; we must continue to learn from, contribute to, and amplify research and reporting on this work in our efforts to confront these challenges.
News Articles: Racial Bias in a Medical Algorithm Favors White Patients Over Sicker Black Patients; Many Facial-Recognition Systems Are Biased, Says US Study; Machine Bias: There’s software used across the country to predict future criminals. And it’s biased against blacks; As Cameras Track Detroit’s Residents, a Debate Ensues Over Racial Bias; Facebook’s ad-serving algorithm discriminates by gender and race; How community members in Ramsey County stopped a big-data plan from flagging students as at-risk
Lectures: Big Data, Technology, and the Law; Algorithmic Justice: Race, Bias, and Big Data; Legitimizing True Safety (which includes discussion of facial recognition and how police surveillance is currently being used against Detroit residents accused of violating social distancing orders)
Books (consider purchasing from a Black bookstore): Algorithms of Oppression: How Search Engines Reinforce Racism (Safiya Noble); Artificial Unintelligence: How Computers Misunderstand the World (Meredith Broussard); Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor (Virginia Eubanks); Technically Wrong: Sexist Apps, Biased Algorithms, and Other Threats of Toxic Tech (Sara Wachter-Boettcher); Weapons of Math Destruction (Cathy O’Neil)
Experts to Follow: Nasma Ahmed (Digital Justice Lab); Alvaro Bedoya (Visiting Professor of Law at Georgetown University and Founding Director of the Center on Privacy and Technology); Meredith Broussard (Associate Professor at NYU); Joy Buolamwini (MIT Media Lab); Max Clermont (Senior Political Advisor to Holyoke Mayor Alex Morse); Teresa Hodge (Co-founder and CEO of R3 Technologies); Tamika Lewis (Fellow at Data Justice Lab); Yeshimabeit Milner (Co-founder and Executive Director, Data for Black Lives); Tawana Petty (Non-Resident Fellow at the Digital Society Lab and Director of Detroit Community Technology Project); Rashida Richardson (Director of Policy Research at AI Now); Samuel Sinyangwe (Co-founder of Campaign Zero); Latanya Sweeney (Professor of Government and Technology in Residence at Harvard University, Director of the Data Privacy Lab)
Organizations to Follow: Data & Society; AI Now; Digital Civil Society Lab; Center on Privacy and Technology; Data for Black Lives; Campaign Zero; Digital Equity Laboratory; Data Justice Lab
*While this post focuses specifically on antiracist support for Black individuals, there is also a long history of data-driven discrimination related to ethnicity, gender, sexuality, and other demographic attributes, and I encourage readers to learn more about those histories as well.
Step 3: Eliminate racist decisions and algorithms in our own work
As antiracist data scientists, we must commit to taking action every day in our own work to eliminate racist decisions and algorithms. There is no one checklist that will accomplish this, but I have found myself regularly applying a series of questions to the data science projects that I contribute to. Portions of these questions come from a 2018 lecture I attended titled “The Data You Have and the Questions You Ask It” by Logan Koepke, a Senior Policy Analyst at Upturn.
If the answers to these questions reveal underlying racism, we must speak out and challenge the status quo.
Start with the data you have. Review the data and always reach out to subject-matter experts to better understand:
- How was the data obtained?
- For whom was the data obtained?
- By whom was the data obtained?
- Was permission granted to obtain the data?
- Would individuals be comfortable if they knew this data was being obtained?
- Would individuals be comfortable if they knew how this data was being stored or shared?
- To what end was the data obtained?
- How might this data be biased?
- Explore the zine Digital Defense Playbook to consider how you might better inform and include broader communities, including Black communities, in the conversation on obtaining and using data
Consider the questions you’re hoping to answer or the problems you’re hoping to solve with your data. Ask:
- Are the communities that will be impacted by this analysis involved in the process of shaping the questions you’re hoping to answer? If not, why not?
- Do current goals repurpose historical datasets in ways that differ from how the data were originally intended to be used?
- To what extent are predicted outcomes dissimilar from the observations in the data? Is the question you’re asking trying to force a reality that isn’t grounded in truth?
- Does the very act of prediction also change the future observation space? How might behaviors change because of the predictions?
When you’re building a model, think like an adversary:
- How could this system be gamed?
- How could it be used to harm people, especially those in BIPOC communities?
- What could be the unintended consequences of this model?
- As the model “learns” from new data, how might this new data introduce new biases?
When you’re communicating the results of the model:
- Is the model communicated such that the community who contributed the data is able to view and understand the results?
- Have you clearly communicated the ways in which the model was tested to uncover racial bias? (One way to start that testing is sketched below.)
Learn the Technical Details:
There is a growing body of research on technical approaches to addressing race in algorithms in a way that considers fairness. Simply omitting race as a variable and claiming “fairness through unawareness” is unacceptable: an algorithm that does not include race as a predictor can still be biased, because other variables often act as proxies for race. Instead, data scientists should explicitly test how sensitive their algorithms are to race. This article provides an introduction to algorithmic fairness, including the concepts of Demographic Parity, Equalized Odds, and Predictive Rate Parity, along with tools that can be used to reduce disparity during pre-processing, training, and post-processing. This article illustrates how to explore Demographic Parity using SHAP, an explainable AI tool. The report Exploring Fairness in Machine Learning for International Development by the MIT D-Lab explores in considerable detail how to integrate fairness into a machine learning project. For additional learning, see this free online textbook and these videos: Google Machine Learning Crash Course Fairness in ML; 2017 Tutorial on Fairness in Machine Learning; 21 Fairness Definitions and Their Politics.
Step 4: Commit to increasing diversity in the data science field
The 2020 Harnham US Data and Analytics Report found that only 3% of Data and Analytics professionals identified as Black, with even fewer represented in leadership positions. This is unacceptable, particularly as we (non-Black data scientists) continue to use data collected from, and write algorithms that impact, Black communities.
To push the organizations we work for, and the data science community at large, to change, we must commit to:
- Confronting our own unconscious biases and how they manifest themselves in the workplace so as to make our field a more inclusive space
- Inventorying our internal company practices and making changes to advance equity, diversity, and inclusion at all levels of our organizations
- Reviewing and updating our hiring processes so they don’t reflect unconscious biases of the individuals/teams responsible for hiring
- Demanding representation on executive leadership teams, boards, and expert panels
- Developing leadership pathways to support emerging leaders from historically underrepresented backgrounds
Step 5: Contribute financially to Black-led and community-driven organizations committed to data awareness and increasing diversity in data science
It is no secret that data science is a lucrative field with a mean annual salary of approximately $100,000. Since we were not born knowing data science, many of us have likely entered this field thanks to robust educational experiences. As antiracist data scientists, we must recognize that we live in a racist society where education opportunities are distributed unequally. Since data science impacts everyone, we must commit to using the financial resources we’ve received for our work to support educational experiences that increase diversity in the data science workforce (and make this lucrative field more accessible) as well as data awareness for everyone.
Support Black-led and community-driven organizations contributing to data awareness
Set up recurring monthly donations to Black-led and community-driven organizations contributing to data awareness, data collection, and data visualization of timely issues such as police violence. Organizations to consider include:
- Black in AI
- Campaign Zero
- Data for Black Lives
Support data science and tech programs that serve Black students
Set up recurring monthly donations to support data science and tech programs that serve Black students. While it may be tempting to volunteer for teaching opportunities, it can be extremely powerful for BIPOC students to learn from BIPOC data scientists. Consider financially supporting programs such as:
- Black Girls Code
- Code the Dream
- Rewriting the Code — Black Wings
Start a scholarship at your local community college
In 2016, Google completed research highlighting the role that community colleges can play and the challenges they face in creating a pathway to increased diversity in computer science. Community colleges generally have substantially smaller financial requirements than universities for starting a scholarship, and these scholarships can go a long way. Reach out to the financial aid office at your local community college to get started today.
Start or contribute to a scholarship or data science program at a historically Black college or university (HBCU)
Many HBCUs have existing or new data science programs, including:
- The AUCC Data Science Initiative at Clark Atlanta University, Spelman College, Morehouse College, and Morehouse School of Medicine, funded by an $8.2 million gift from UnitedHealth in 2019
- The Center for excellence in Research and Education for big military Data InTelligence (CREDIT) at Prairie View A&M
- PhD in Applied Science and Technology at NC A&T with a concentration in Data Science and Analytics
- Lab for Artificial Intelligence and its Applications at Coppin State University
- Virtual Reality Laboratory at Bowie State University
Reach out to these programs directly to learn more.
Our work as non-Black data scientists requires us not only to recognize racism, but to act against it as antiracist data scientists.
We cannot stand by as the decisions we make as data stewards continue to cause irreparable harm to the Black community. I’ve committed to the steps in this post while knowing that the work will not end so long as racism continues to exist.
I hope you join me.
I welcome feedback and additional contributions.
Thanks to the friends, colleagues, and family members who provided feedback on the draft of this post, and to the many role models who have provided guidance on my journey thus far. The work continues.