In 2012, Harvard Business Review declared data scientist the “Sexiest Job of the 21st Century”. If you are familiar with the term, you may recognize that data science is behind many of the things we interact with each day, from internet search results to recommendations on what to watch on Netflix.
Data science’s “wizardry” can feel far removed from the day-to-day of non-profit work, but it has a critical role to play in the non-profit sector. For one, data science is not all about modelling and machine learning. In fact, the ideal data scientist brings a blend of statistics, domain expertise, data engineering, and communication skills. We’ll use this article to describe what these terms mean and why they are important to non-profits.
Traditionally, non-profits have used data to count things such as people and activities. This focus on outputs won't go away anytime soon, but it sets low expectations for how data can be used to improve the way non-profits operate and to support better outcomes. A data science approach goes beyond counting and recognizes that, through statistical methods, data can reveal meaningful patterns. These patterns can be used to understand important relationships (such as identifying the risk factors associated with a participant dropping out of a program) and to assist with manual tasks (such as flagging the participants who have those risk factors). In this way, data science can help organizations better understand their impact and make decisions about program design and implementation.
Of course, data scientists need to be enthusiastic about working with numbers, but it is important that they are just as passionate about understanding the meaning of the data that they are handling. In other words, developing knowledge about the subject of analysis is an important part of data science work. This domain expertise helps the data scientist to know which data is useful and allows them to design processes that support and refine the continued collection of that data. Understanding the context is also critical for interpreting results and making appropriate inferences. For example, without domain expertise, a data scientist can’t tell whether an increase or decrease in program attendance is a positive or negative outcome.
While data scientists like to showcase their advanced techniques, the reality is that more than 80% of data science workflows are processing tasks. When data goes through repeated manipulations (think multiple Excel tabs), it becomes critical to ensure that these steps are executed accurately, so that another data scientist could produce the same results independently. We call this “reproducibility”. To support it, data scientists use code or scripting tools instead of spreadsheets, so that their steps are well-documented and can be repeated at the click of a button. This reduces the likelihood of human error and allows for faster, more frequent processing even when the data changes. A result that can be reproduced, especially one that holds up when tested against new data, also gives us more confidence that the patterns we’ve identified are real.
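To make the idea of scripted, repeatable processing a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the column names (`participant_id`, `program`, `attended`), the sample records, and the function names are hypothetical and do not come from any real system. The point is only that each cleaning step is written down once and runs the same way every time.

```python
import csv
import io
from collections import Counter

# Hypothetical raw export; in practice this would be read from a file.
RAW_CSV = """participant_id,program,attended
101,Youth Mentoring,yes
102,youth mentoring ,Yes
102,youth mentoring ,Yes
103,Job Skills,no
104,Job Skills,YES
"""


def clean_rows(raw_text):
    """Standardize text fields and drop exact duplicate records."""
    seen = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        record = (
            row["participant_id"].strip(),
            row["program"].strip().title(),            # consistent program names
            row["attended"].strip().lower() == "yes",  # yes/no text -> True/False
        )
        if record not in seen:  # skip rows that repeat an earlier record
            seen.add(record)
            cleaned.append(record)
    return cleaned


def attendance_by_program(records):
    """Count the participants marked as attending, per program."""
    counts = Counter()
    for _participant_id, program, attended in records:
        if attended:
            counts[program] += 1
    return dict(counts)


records = clean_rows(RAW_CSV)
summary = attendance_by_program(records)
print(summary)
```

Because the steps live in a script rather than in a series of manual spreadsheet edits, running it again on next month's export applies exactly the same rules, and a colleague can read the code to see what was done to the data.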
Finally, data science recognizes that the best analysis has little value if the audience doesn’t understand the results. A good data scientist is a communicator, versed in translating multi-dimensional analyses into information that a general audience can understand. Here, the data scientist’s toolkit includes visualizations such as charts and infographics, as well as careful plain-language documentation of the definitions, assumptions, and decisions made over the course of the analysis.
In today’s non-profit sector, organizations require time to build up their competencies in each of these areas. One of the key differences between the non-profit sector and the private sector is that, for non-profits, data science is more of a destination than a starting point. In addition, as we progress along this path, it will be increasingly important to consider the implications of bias in data and its impact on decision-making. Following numerous headline stories, the private sector is only beginning to grapple with this issue, and it is a topic we will return to in a later blog post.
I have recently joined Purpose Analytics as a “Data Scientist”, where I see my role as a guide along this journey. I’m particularly interested in the opportunities to save staff time by relieving them of manual data processing tasks and to provide them with timely information that will assist in decision-making. This work may not make headline news the way data science does in Silicon Valley, but it will help us move towards a data-informed non-profit sector that can better meet the needs of the people it serves.