Big Data and Privacy: Paradoxes and Opportunities

Last week the Future of Privacy Forum and Stanford Law School’s Center for Internet and Society hosted a highly interactive meeting of the minds to discuss the most significant challenges to, and opportunities for, harnessing big data both effectively and ethically. Those who attended took part in spirited debates about the roles of the private sector, government, consumers and privacy advocates in structuring workable standards that respect the privacy of those whose data is being collected, analyzed and, hopefully, protected. For those who were unable to make it, here are four big takeaways that capture the state of the dialogue and what we can expect as this hot topic plays out among decision makers and academics with a stake in the future of big data:

  1. No one can agree on how to produce standards for privacy and data analysis. Participants at the conference highlighted the significant disagreement that remains about the value of the data itself. Different schools of thought locate the value at different points in the data analysis process, with many placing far more value on the inferences drawn from the data than on the raw data itself. The dialogue here turns very scientific, but suffice it to say, a long road lies ahead as researchers, lawmakers, businesses and regulatory bodies map out a standardized way of discussing big data and the associated outcomes of data analysis.
  2. Disagreement notwithstanding, experts crave more rigor around privacy standards. The general lack of agreement about how to address some aspects of the big data phenomenon has not weakened participants’ conviction that standards are needed more than ever. Participants identified two especially promising areas for standards, both tied to the creation phase of big data-capture technologies: ethical standards for engineers and privacy by design. Engineers design the back-end (typically unseen) systems that collect user data and assemble macro-level profiles from which organizations extract trends. A sensible set of ethical standards for these developers could go a long way toward ensuring heightened security and reducing the risk that individuals are re-identified once data is in the hands of collectors. Relatedly, privacy by design refers to the concept that privacy protocols should be preconceived and baked into the research and development phases of a technology, rather than tossed in as an afterthought. Such a paradigm would force many developers and engineers to rethink the processes they use to create and shape technologies, which is likely why this model has been met with hesitation rather than enthusiasm.
  3. Cumulative disadvantage is a trippy concept, but a real concern among big data and privacy experts. If you’re a Netflix subscriber, you’re familiar with the service suggesting movies and shows that might appeal to you based on your established viewing behaviors. What you might not have thought about is all the content that might interest you but that Netflix never recommends, for various reasons. This is the idea of cumulative disadvantage. By nature, the service constructs a rough profile of you (based on an accumulation of details about your behaviors) and reacts to you, which in turn influences your future behaviors (a toy sketch of this feedback loop follows the list). The idea is completely rational and likely well-intentioned. But who’s to say that Netflix’s algorithm has your profile right? It’s important to remember that these systems are man-made and “man” makes mistakes. Those who create these systems must therefore take every precaution to ensure that individuals and larger populations are not misclassified. This leads us to the final takeaway.
  4. What is the impact, if any, when entire swaths of a population just aren’t classified at all? Data collection systems are ubiquitous, and this ubiquity often leads us to overlook the fact that we, as consumers, are delivering mountains of data to organizations across the world. But among us are those who make an extra effort to live “off the grid,” and individuals can typically opt out of some data collection processes voluntarily. With marketers, health organizations and others placing more and more priority on data-driven decisions, could whole segments of the population end up underserved simply because they choose not to contribute to the data pool (a second sketch below illustrates the arithmetic)? Both the social and economic impacts could be substantial.
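
To make the feedback loop in takeaway 3 concrete, here is a deliberately minimal Python sketch. Everything in it is invented for illustration (the tiny catalog, the genres, a user who accepts every suggestion); it is not Netflix’s actual algorithm, just the general pattern of a recommender that only draws from what a profile already contains:

```python
import random

# Hypothetical three-genre catalog, invented for this sketch.
CATALOG = {
    "drama": ["D1", "D2", "D3"],
    "comedy": ["C1", "C2", "C3"],
    "documentary": ["X1", "X2"],
}

def recommend(history):
    # Only consider genres the user has already watched -- the
    # profile-driven narrowing described in takeaway 3.
    seen = {genre for genre, titles in CATALOG.items()
            if any(t in history for t in titles)}
    candidates = [t for g in seen for t in CATALOG[g] if t not in history]
    return random.choice(candidates) if candidates else None

history = ["D1"]          # the user starts with a single drama
for _ in range(5):
    pick = recommend(history)
    if pick is None:
        break
    history.append(pick)  # the user accepts every suggestion

print(history)  # stays entirely within "drama"; comedy and
                # documentary titles never get a chance to surface
```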

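The sampling concern in takeaway 4 reduces to similarly simple arithmetic. In this toy calculation (all figures invented), a statistic computed only from opted-in consumers diverges sharply from the true population value, which is how entire segments can quietly fall out of data-driven decisions:

```python
# Invented population: two opted-in consumers, two living "off the grid".
population = [
    {"opted_in": True,  "need": 120},
    {"opted_in": True,  "need": 95},
    {"opted_in": False, "need": 40},   # invisible to the data collector
    {"opted_in": False, "need": 35},
]

observed = [p["need"] for p in population if p["opted_in"]]
actual = [p["need"] for p in population]

print(sum(observed) / len(observed))  # 107.5 -- what the decision maker sees
print(sum(actual) / len(actual))      # 72.5  -- the true population average
```
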
FPF’s event drummed up conversations about several other themes, but these tended to rise to the top as consistent points of discussion.

Rayid Ghani, Chief Scientist (director of research) for the Obama for America 2012 presidential campaign, delivered a keynote address that was a special treat for participants. Ghani walked us through the different processes and “experiments” the campaign undertook to optimize its outreach to donors, supporters and the voting public. He asserted that the big data playing field has not really changed in the past seven years; rather, we have grown more sophisticated about, and even dependent on, data analysis. He left us with the five things that intricate data analysis allows us to do today (hopefully better than we could seven years ago):

  1. Make more accurate predictions
  2. Make more granular predictions
  3. Make predictions earlier
  4. Reduce risk when taking an action
  5. Ultimately, become more rational about the world

Image Source: NITRD Program