The following is by Jim Thatcher, an editor of Thinking Big Data in Geography: New Regimes, New Research (April 2018). Thatcher is an assistant professor of geography at the University of Washington Tacoma.
The past couple of weeks have seen a variety of media outlets cover the revelation by Nathan Ruser that Strava’s global heat map of users had inadvertently revealed a series of military installations around the world. Various outlets emphasized different aspects of the story: at Wired, Arielle Pardes offered a guide on managing the privacy of your fitness applications, Vox’s Aja Romano discussed the “new realities of data collection,” while, in an op-ed at the New York Times, Zeynep Tufekci used the situation to critique the increasingly outdated notion of informed consent. None of these articles is outright wrong in its approach; however, they all suffer from a dangerous social affliction I like to call “who could have known”-ism.
In such framings, the mistake is always inadvertent, always a surprise, something that no one could have predicted. For example, there has been something of a cottage industry of late decrying the various ways that machine learning produces sexist and racist outcomes (for example, see here and here); yet rarely do these works discuss the obviousness of those outcomes: if you train a system meant to detect patterns on an existing culture, it will evince the same biases as that culture (see, though, Safiya Umoja Noble’s forthcoming Algorithms of Oppression for an excellent counter-example). By inevitably framing these outcomes as a surprise, as unknowable until they occur, such coverage elides the long history of critical engagement with the roles of quantification and data visualization within society and, in so doing, blunts critiques and eschews the responsibilities of the tech industry itself.
Nearly a decade ago, “Please Rob Me” satirized the dangers of making location information public, and it was only a few years later that “Girls Around Me” was widely covered as enabling stalkers via a clever use of the Foursquare and Facebook APIs. More directly related to the Strava issue, four years ago Nathan Yau scraped and visualized public running routes from the service RunKeeper, describing the visualizations as “the tip of a very interesting iceberg.” And yet, the Strava maps are a revelation. If millions of data points, hundreds of thousands of runs, are all visualized, what information could that possibly reveal? How could the designers of Strava possibly have foreseen that exposing the spatiotemporal locations of thousands of individuals might have an effect upon personal privacy or state secrecy?
Data is now the “world’s most valuable resource,” and we need to recognize it in that context. The data we produce in our daily lives is fed into much larger systems of quantification, surveillance, and control. This is not a new process, but it is an accelerated one. It is our daily tweets, our use of ride-share applications, and our runs that are fed into and constitute much of these new data regimes. The promise of big data is that its analysis will reveal new, previously impossible truths, that it will let us peer ever deeper into reality’s secrets; its peril is that, in placing so much faith in data and algorithms, we forget the underlying humans involved. We do exactly that when we efface responsibility for irresponsible uses of technology. When Tufekci writes, “[e]ven if all technology companies meant well and acted in good faith, they would not be in a position to let you know what exactly you were signing up for,” we must ask, “Why not?”