Regulation of data science is under consideration, and Michael Walker argued that either data science becomes a profession and regulates itself, or Congress will impose draconian regulations that defeat the purpose of data science: to make life, business and government better. He has drafted a proposed "Data Science Code of Professional Conduct". See: bit.ly/YbsjXR.
The arguments in support of data science becoming a profession are as follows:
1) Data science is in the pre-industrial stage and needs to develop a "Canon" (a body of principles, rules, standards, or norms) of scientific methods, principles and best practices for practitioners. Data science incorporates a number of disciplines and is wide open for innovation; it requires guidance to ensure that it is used to make life, business and government better, and to prevent abuse. Ninety percent (90%) of the world's data has been produced in the past two years, and the volume will grow exponentially. How we extract meaning from all this data without creating an illusion of reality is important.
2) To protect both consumers of data science and data scientists from charlatans, illegal and unethical conduct, and data science malpractice. A Data Science Code of Professional Conduct is needed to protect individuals' privacy and clients' confidential data, to prevent conflicts of interest, and to ensure data scientists have a duty to the greater good of society, and not just blind loyalty to the client.
3) Self-regulation versus imposed regulation. Either data science becomes a profession and regulates itself, or Congress will impose both good and bad regulations. It is better for data scientists to architect and implement a regulatory scheme than to trust Congress to enact an appropriate one; a poorly designed regulatory structure may defeat or limit the development of data science.
4) To create a check and balance against big government and big business using data science at the expense of the majority in society. Some argue that the internet, smartphones and computers form a big spying machine that big government and big business use to collect information on people, further eroding civil liberties. The potential for abuse is significant, and the professionalization of data science can mitigate harms.
Reasons to oppose data science becoming a profession include:
1) Professions tend to create artificial barriers to entry causing artificially higher prices.
2) Professions tend to be self-serving at the expense of consumers.
3) Professions, after a period of time, tend to stifle innovation to protect vested interests.
Michael Walker argued that, on balance, the equities favor data science becoming a profession. He pointed out that in many disciplines, such as medical research, economics and psychology, data manipulation is common and the scientific method has not been honored, resulting in damaged reputations and eroding public trust. Future data scientists need to preempt this outcome not only by honoring the traditional scientific method, but also by developing new data science "canons" and scientific methods to liberate meaning from data without creating an illusion of reality.
Eric Siegel was agnostic about whether data science needs to become a profession. Mr. Siegel agreed that data science can be abused and that a code of professional conduct may be useful, and stated that a certification to establish a base level of competency may be prudent. He voiced concern over the civil liberties aspect of the use and potential abuse of data.
Gregory Piatetsky-Shapiro argued against data science becoming a profession. He noted that other established organizations are already active in this area: ACM (computing professionals) is considering The Pledge of the Computing Professional, which touches upon many themes relevant to data science, and INFORMS has Analytics Certification programs. He thinks these organizations will be adequate to develop data science.
Mr. Piatetsky-Shapiro asserted that while a code of professional conduct is a noble goal, it is meaningless without a central organization that promotes and enforces it, and data science is currently such a diverse field that a central organization is very unlikely. A look at the current data science related meetings listed on the www.kdnuggets.com/meetings page shows meetings sponsored by research societies like ACM, IEEE, INFORMS and SIAM; commercial companies like O’Reilly, GigaOM and IEG; Big Data companies like IBM, SAS and EMC; and many others. It looks very unlikely that all these diverse interests will agree on a single organization to enforce any code of conduct.
Further, a recent KDnuggets Poll (March 2013) found that a majority of data scientists voted against a pledge. Yet a majority of non-data scientists supported the pledge, suggesting that consumers of data science would welcome a data science code of professional conduct.
Mr. Walker responded that data science is a new field that encompasses a variety of skill sets from different disciplines and desperately requires a professional body to develop canons that incorporate and blend scientific methods from a myriad of disciplines. The blend of scientific methods will create something new; relying on the scientific methods of math, statistics, computer engineering and other fields alone is not sufficient. Data science requires its own professional canons.
Mr. Walker also asserted that, while a majority of data scientists may not at this time favor a "pledge", a large majority of data science consumers would likely favor hiring a data scientist who is certified and is required to honor a code of professional conduct, similar to certified public accountants, lawyers and physicians. Considering the significant damage data science malpractice can cause, Mr. Walker speculated that the market would favor certified, professionalized data scientists. Moreover, a professional code can protect data scientists from unethical and illegal client conduct.
Mr. Walker suggested that we should learn from other professions like law and medicine: adopt the good and remove the bad to mitigate the negatives of a profession. To earn and maintain trust and credibility, data science must follow traditional scientific methods, innovate new methods and follow a code of professional conduct.