Big data in education might be the savior of our failing learning system or the cement shoes that drag the system to the bottom of the ocean, depending on who you talk to. Whatever your view of big data is, it is here and we need to pay attention to it.
My view? It is a mixture of extreme concern about the glaring problems and hope that we can correct course on those problems and do something useful for the learners with the data.
Yesterday at LINK Lab we had a peek behind the scenes at a data collection tool that UTA is implementing. The people that run the software at UTA are good people with good intentions. I also hope they are aware of the problems already hard-coded in the tool (and I suspect they are).
Big Data can definitely look scary for a lot of reasons. What we observed was mostly focused on retention (or “persistence,” the friendlier term that I believe the software uses). All of the data collected basically turns students into a collection of numbers on hundreds of continuums, and then averages those numbers out to rank them on how likely they are to drop out. To some, this is a scary prospect.
Another scary prospect is the real danger of using that data to see which students to ignore (because they are going to stick around anyway) and which students to focus time and energy on (in order to make the university more money). This would be data as surveillance more than as an educational tool.
Looking at the factors that this data tool ranks learners by led to no surprises: we have known from research for a long time what students that “persist” do and what those that don’t “persist” do (or don’t do). The lists of “at risk” students that these factors produce will probably not be much different from the older “at risk” lists that have been around for decades. The main change will be that we will offload the process of producing those lists to the machines, and wash our hands of any bias that has always existed in producing those lists in the first place.
And I don’t want to skip over the irony of spending millions of dollars on big data to find out that “financial difficulties” are the reason that a large number of learners don’t “persist.”
The biggest concern that I see is the amount of bias being programmed into the algorithms. Even the word “persistence” implies certain sociocultural values that are not the same for all learners. Even in our short time looking around in the data collection program, I saw dozens of examples of positivist white male bias hard coded in the design.
For example, when ranking learners based on grades, one measure ranked learners in relation to the class average. Those that fell too far below the class average were seen as having one risk factor for not “persisting.” This is different than looking at just grades as a whole. If the class average is a low B but a learner has a high B, they would be above the class average and in the “okay” zone for “persistence.”
But that is not how all cultures view grades. My wife is half Indian and half Australian. We have been to India and talked to many people that were under intense stress to get the highest grades possible. It is a huge pressure for many in certain parts of that culture. In this measure, even a low A might not register as a troubling signal if the class average is much lower. Yet someone that is facing intense pressure to get the best grades or else come home and work in Dad’s business… they need help.
(I am not a fan of grades myself, but this is one area that stuck out to me while poking around in the back end of the data program)
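To make the class-average problem concrete, here is a minimal sketch in Python. This is not the tool's actual code, which I have not seen. The function name, the 10-point margin, and the grades are all made up for illustration. It only shows the general shape of a "too far below the class average" check.

```python
from statistics import mean


def below_average_flag(grade, class_grades, margin=10.0):
    """Return True when a grade sits more than `margin` points
    below the class average (a hypothetical risk rule)."""
    return grade < mean(class_grades) - margin


# Hypothetical class with an average of 73.2
class_grades = [62, 70, 75, 68, 91]

# A learner with a low A is far above the average, so no flag is raised,
# even if that learner is under intense pressure to score higher.
low_a_flagged = below_average_flag(91, class_grades)

# Only the learner well below the average is flagged.
low_d_flagged = below_average_flag(62, class_grades)

print(low_a_flagged, low_d_flagged)
```

Nothing in a rule like this can see the learner's own expectations or the pressure they are under. It only sees distance from the average, which is exactly the cultural assumption I am worried about.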
This is an important issue since UTA is designated as a Hispanic-Serving Institution. We have to be careful not to get into the same traps that education has fallen into for centuries related to inequalities. But as our LINK director Lisa Berry pointed out, this is also why UTA needs to dive into Big Data. If we don’t get in there with our diverse population and start breaking the algorithms to expose where they are biased, who else will? Hopefully there are others, but the point is that we need to get in there and critically ask the hard questions, or else we run the risk of perpetuating educational inequalities (by offloading them to the machines).
For now, a good place to start is by asking the hard questions about privacy and ownership in our big data plan:
Are the students made aware that this kind of data is being collected?
If not, they need to be made aware. Everywhere that data is collected, there should be a notification.
Beyond that, are they given details on what specific data points are being collected?
If not, they need to know that as well. I would suggest a centralized ADA-compliant web page that explains every data point collected in easy-to-understand detail (with as many translations to other languages as possible).
Can students opt out of data collection? What about granular control over the data that they do allow to be collected?
Students should be able to opt out of data collection. Each class or point of collection should have permissions. Beyond that, I would say they should be able to say yes or no to specific data points if they want to. Or even beyond that, what about making data collection opt-in?
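As a rough idea of what opt-in, per-data-point permission could look like, here is a small Python sketch. Everything in it is hypothetical: the `ConsentRecord` class, the `filter_event` function, and the data point names are my own inventions, not part of any real system at UTA.

```python
from dataclasses import dataclass, field


@dataclass
class ConsentRecord:
    """Opt-in consent: nothing is allowed until the student says yes."""
    allowed_points: set = field(default_factory=set)

    def allow(self, point):
        self.allowed_points.add(point)

    def revoke(self, point):
        self.allowed_points.discard(point)


def filter_event(event, consent):
    """Keep only the data points the student has explicitly allowed."""
    return {k: v for k, v in event.items() if k in consent.allowed_points}


# Hypothetical data points a course tool might try to record
event = {"assignment_grade": 88, "login_time": "08:15", "ip_address": "10.0.0.7"}

consent = ConsentRecord()
consent.allow("assignment_grade")

stored = filter_event(event, consent)
print(stored)
```

The design choice that matters is the default. An empty set of permissions means nothing is stored, so the student has to say yes to each data point rather than hunt for a way to say no.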
Who owns the students’ data (since it is technically their actions that create the data)?
This may seem radical to some, but shouldn’t the student own their own data? If you say “no,” then they should at least have the right to access it and see what is being collected on them specifically.
Think of it this way: How will the very substantial Muslim population at UTA feel about a public school, tied to the government, collecting all of this data on them? How will our students of color feel about UTA collecting data on them while they are voicing support for Black Lives Matter? How would the child of illegal immigrants feel about each class at UTA collecting data about them that could incriminate their parents?
These issues are some of the hard things we have to wrestle with in the world of Big Data in Education. If we point it towards openness, transparency, student ownership, and helping all learners with their unique sociocultural situations, then it has potential. If not, then we run the risk of turning Big Education Data into Scary Retention Surveillance.