Recent revelations concerning the extent of surveillance and massive personal data collection by the United States National Security Agency (NSA) and big business have sparked fierce debate about the appropriate balance between privacy and security in our society. On the one hand, any government's first priority is to protect its citizens from harm, and this means using technology and intelligence methods (e.g., data collection and analysis) to do the job effectively and efficiently. On the other hand, most people believe in some sort of inalienable right to a certain level of privacy and to protection from potential government abuse, and most assume the law constrains intelligence agencies and big business from spying on citizens.
How to strike the optimal balance between privacy and security in our society should be answered by the citizens of a democracy - not by unelected judges or agencies with little to no political accountability. Most people knew some level of surveillance and intelligence collection was taking place to protect us - yet they are shocked at the massive amount of private data being collected and stored, and at the fact that so many leading technology and communication corporations have been willing partners with government in collecting it. Many are disturbed that our political leaders did not disclose the pervasive level of data snooping and thus foreclosed public debate about the surveillance levels people are comfortable with. We need to have this debate.
In theory, we could stop 99% of terrorist attacks - yet the price paid would be a significant reduction in our quality of life. How much quality of life are we willing to sacrifice for what percentage reduction in risk? Who owns our personal data? What legal rights do we have to our data? To what secondary uses may the government put all this stored personal data in the future? These are issues to be debated and decided by the people.
I respectfully suggest we need to grow up as citizens (and as a world community) and figure this out so we can tell our political leaders what type of society we desire. What levels of risk are we willing to accept? How do we optimally allocate resources among various risks (e.g., car accidents, fires, climate change, health care, poverty, education, terrorism...)? Technology and broad surveillance of society for security purposes are expensive: is the current spending proportional to the risks? Is the spending rational considering competing risks and issues? Trade-offs are required - there are no easy answers - but we the people need to make these difficult decisions and not default to government and big business, which skew the system in favor of their own interests.
Technology and raw data are morally and ethically neutral: they can be used for good or bad purposes. Yes, technology design and data objectivity and quality matter - but humans decide purpose. Intelligence methods used by government can help protect us and can simultaneously (or at a later time) be abused to target groups and individuals for alternative purposes, both good and bad.
This is not new - we have been down this road before. The United States has a long history of government data snooping and abuse of intelligence methods (e.g., wiretaps, opened mail, foreign manipulation and assassination). See the 1976 Church Committee, which found illegal intelligence gathering on citizens by the Central Intelligence Agency, the National Security Agency and the Federal Bureau of Investigation.
While most reasonable people expect the government to use some level of surveillance to protect us, we also desire a reasonably good quality of life, and we fear that government and big business - comprised of fallible humans - may at some time abuse these awesome powers if they are not monitored and checked. It is plausible that future governments will mine personal data to control, manipulate and abuse the citizenry. And it frightens many that big tech firms and government appear to be in bed together in creating a modern Orwellian surveillance state, without full disclosure to and approval by the people.
Simply put, it comes down to trust: do we the people trust our government and big technology and communication firms not to abuse this extraordinary new power? Historical evidence creates doubts. At this time there is no evidence of personal data abuse - yet there is also no solid evidence that data snooping has protected us from specific harms. What are the checks and balances against abuse? Are they effective or flawed? What are the incentives?
It appears that at this time only "metadata" is being collected (e.g., logs of calls, and data about credit card transactions and online communications). Americans now produce about 161 exabytes of combined raw data per year, and collecting, filtering, organizing, storing and analyzing this raw data is only possible using sophisticated technology and data science techniques.
The raw data sets are massive and growing exponentially - and at this stage only machine learning algorithms can sift through them. The search is for trends, patterns, associations and networks. Once anomalous activity is identified, humans can drill down and investigate.
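To make this two-stage process concrete, here is a minimal sketch of the "machine filters, human investigates" pattern. Everything in it - the account names, the call counts, the two-standard-deviation threshold - is invented for illustration; real systems are vastly more sophisticated.

```python
import statistics

# Hypothetical daily call counts per account (toy metadata, not real data).
call_counts = {
    "acct_01": 12, "acct_02": 9, "acct_03": 14, "acct_04": 11,
    "acct_05": 10, "acct_06": 13, "acct_07": 8, "acct_08": 97,
}

mean = statistics.mean(call_counts.values())
stdev = statistics.stdev(call_counts.values())

# Flag any account more than two standard deviations above the mean
# for human review - the machine only narrows the field; it does not
# decide guilt or innocence.
flagged = [acct for acct, count in call_counts.items()
           if (count - mean) / stdev > 2]
print(flagged)
```

Only the one account with a wildly anomalous call volume survives the filter; the rest of the population is never surfaced to a human analyst. The policy questions in this essay are precisely about what happens to the data on everyone else.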
Here is the data science problem: you are attempting to find a small needle in an ever-larger haystack. You will find more "statistically significant" relationships in larger data sets - and more of those patterns and relationships will have no meaning - creating greater opportunity to mistake noise for signal. Put another way, you will find more correlations and patterns in the data, yet the number of false positives will rise significantly: more correlation without causation, leading to an illusion of reality.
The danger is twofold: (1) government will make bad policy decisions - believing noise is signal - that erode our quality of life; and (2) as the data grows exponentially, more false positives will require more and more humans to investigate further and to pull more specific personal data on a greater and greater number of both citizens and non-citizens. The secondary uses of this particularized personal data are many and offer temptation for abuse.
Data science - especially machine learning, algorithms and future artificial intelligence - will play an important role in analyzing big security data and turning this massive amount of personal data into valuable, actionable information. I respectfully suggest that, as professional data scientists, we have a moral duty to make sure personal data is used responsibly and that actionable intelligence is used for proper legal purposes (e.g., to protect us from harm) and not abused against the people.
What is needed is a kind of "Magna Carta" committing government and big business to use our personal data responsibly, and a Data Science Code of Professional Conduct to guide and protect data scientists when the temptation to abuse our personal data arises. There should also be a legal procedure for data scientists and other technology professionals to report potential data abuses to specified government authorities or watchdog agencies, creating an effective check and balance against personal data abuse.