TextBlob is a Python library for processing textual data. It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more.
Last week we discussed the importance of data scientists prioritizing client confidentiality and the concern of exposing high-value information to (internal or external) data scientists who may share this information with competitors.
In addition, many organizations are reluctant to build modern data analytical ecosystems because of real and perceived data security concerns.
While there are current database security solutions such as Accumulo and private hybrid cloud designs with modern security systems to mitigate potential hacks and protect valuable information, many organizations do not fully understand how vulnerable their current data systems are and do not appropriately analyze the significant risks of a data breach.
The solution is architecting and building a modern data analytical ecosystem with robust data security and designing and executing an information management strategy and knowledge processes to protect high-value information.
The Cyber Risk Report 2015 outlines critical security issues that all organizations must prioritize. Key findings include:
Well-known attacks are still commonplace: Attackers continue to leverage well-known techniques to successfully compromise systems and networks. Many vulnerabilities exploited in 2014 took advantage of code written many years back; some are even decades old. Adversaries continue to leverage these classic avenues for attack.
Misconfigurations are still a problem: The 2013 report documented that a large percentage of vulnerabilities reported were related to server misconfiguration. The trend continued in 2014, with misconfigurations being the number-one issue across all analyzed applications. Our findings show access to unnecessary files and directories dominates the list of misconfiguration-related issues.
Newer technologies introduce new avenues of attack: As new technologies are introduced into the computing ecosystem, they bring with them new attack surfaces and security challenges. This past year saw a rise in already prevalent mobile-malware levels. Even though the first malware for mobile devices was discovered a decade ago, 2014 was the year in which mobile malware stopped being considered just a novelty. Connecting existing technologies to the Internet also brings a new set of exposures. Point-of-sale (PoS) systems were a primary target of multiple pieces of malware in 2014. Physical devices are also becoming connected through the Internet of Things (IoT), a paradigm that brings ubiquitous computing and its security implications closer to the average person.
Determined adversaries are proliferating: Attackers use both old and new vulnerabilities to penetrate all traditional levels of defenses. They maintain access to victim systems by choosing attack tools that will not show on the radar of antimalware and other defense technologies. In some cases, these attacks are perpetrated by actors representing nation-states, or are at least launched in support of nation-states. Network defenders should understand how events on the global stage impact the risk to systems and networks.
Cyber-security legislation is on the horizon: Activity in European and US courts linked information security and data privacy more closely than ever. As legislative and regulatory bodies consider how to raise the general level of security in the public and private spheres, there was an avalanche of reported retail breaches in 2014. This spurred increased concern over how individuals and corporations are affected once private data is exfiltrated and misused. Companies should be aware that new legislation and regulation will affect how they monitor their assets and report on potential incidents.
Secure coding continues to pose challenges: The primary causes of commonly exploited software vulnerabilities are consistently defects, bugs, and logic flaws. Cyber security research professionals have discovered that most vulnerabilities stem from a relatively small number of common software programming errors. While much has been written on how developers can integrate secure-coding best practices into their daily development work, we continue to see old and new vulnerabilities in software.
Complementary protection technologies fill out coverage: In May 2014, a senior executive of a prominent anti-malware vendor declared antivirus dead. The industry responded with a resounding “no, it is not.” Both are right. Studies show that antimalware software catches only about half of all cyberattacks, a truly abysmal rate. In our review of the 2014 threat landscape, we find that enterprises most successful in securing their environment employ multiple layers of complementary protection technologies. These technologies work best when paired with network policies and practices that assume a breach will occur, instead of ones that only aim to prevent intrusions and compromise. By using all tools available and not relying on a single product or service, defenders place themselves in a better position to prevent, detect, and recover from attacks.
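The misconfiguration finding above, where access to unnecessary files and directories tops the list, lends itself to a simple self-audit. The following is a minimal Python sketch, not something taken from the report: it walks a directory tree and flags entries whose permission bits let any local user read or write them. The function names (classify_mode, find_world_accessible) are hypothetical, and treating the POSIX "other" permission bits as the signal is an assumption made for illustration; a real audit would also need to cover web-server configuration, access control lists, and similar settings.

```python
import os
import stat


def classify_mode(mode):
    """Label POSIX permission bits by what users outside owner/group can do."""
    if mode & stat.S_IWOTH:
        return "world-writable"
    if mode & stat.S_IROTH:
        return "world-readable"
    return None


def find_world_accessible(root):
    """Walk root and return sorted (path, label) pairs for exposed entries."""
    findings = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            info = os.lstat(path)  # lstat inspects the entry itself, not a link target
            if stat.S_ISLNK(info.st_mode):
                continue  # permission bits on symlinks are not meaningful
            label = classify_mode(info.st_mode)
            if label is not None:
                findings.append((path, label))
    return sorted(findings)
```

Calling find_world_accessible on, say, a web server's document root returns the exposed paths. Anything listed that does not need to be served or shared is a candidate for tighter permissions or removal.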
Does human intelligence have any connection to the type of music a person listens to? Can we define human intelligence? How do you measure human intelligence? Do SAT scores accurately measure human intelligence? Is there any evidence that SAT scores accurately predict educational or workplace performance?
I am skeptical 1) that we can measure human intelligence at this time; and 2) that SAT scores are an accurate measurement of anything save a very narrow form of test-taking ability that adds little if any value in the real world.
Yet Virgil Griffith purports to measure human intelligence with SAT scores and proceeded to chart musical tastes based on average SAT scores. Is this real data science using the scientific method?
Griffith’s chart shows Sufjan Stevens, Bob Dylan, The Shins, and the Counting Crows as the artists smart people prefer. Really?
The chart also asserts that Lil Wayne, Beyoncé, The Used, and gospel music are preferred by stupid people. Wow!
Dear readers, I sincerely ask you, do smart folks really favor John Mayer over Pink Floyd?
See chart below.
Free Live-stream registration here.
NOTE: If you are unable to attend in person, register and we will email you a live-stream link prior to the event.
Location: University of Colorado Denver - 1201 Larimer Street, Denver, CO 80204 - Room ACAD 1600 - Map: https://goo.gl/maps/GKIH5
Date: Saturday, November 7, 2015 - 2:00pm to 6:00pm
Jennifer Evans - Enhanced CART Algorithms - Born Again Decision Trees
Charles Clifford - Machine Learning and Google BigQuery
Lee Cole - Machine Learning for Finance & Trading
Andrew Weekley - Extracting Data from Numerical Weather Prediction Files
Chris McHenry - Machine Learning Use-cases with Azure
Machine Learning Contest - http://hackski.com
To date we have over 100 teams around the world registered!
There is still time to form a team to compete. Contestant Registration @ http://hackski.com/signup. Contestants will compete to create the best mobile phone app, to be made free to the public, that helps users make the best possible ski venue and route/time decisions for getting to the ski slopes. A panel of judges will award USD $10,000 for the best app. Additional prizes will be awarded.
Become a Sponsor: Earn a strong place in the global Data Science Community. Gain visibility, recruit talent, and learn the latest machine learning techniques. Attendees are a mix of data scientists, IT technical professionals and business decision-makers from organizations of all sizes, representing a wide range of industries.
Sponsor Registration @ http://bit.ly/1IxjXll
Award ceremony and party on Saturday, December 5, 2015 from 4:00pm to 7:00pm at Level 3 Communications in Broomfield, Colorado. Platinum sponsors will have a booth to provide information, demonstrate products and meet people. Keynote, data science presentations and networking. Livestreamed to a global audience.
Brand awareness and media visibility
Logo placement on website
Inclusion in media releases and contacts
Access to a global talent pool
Introductions to data science talent
Prominent name and logo placement onsite
Product endorsement and promotions
Onsite product sampling opportunities
Provide prizes for media promotion
Provide tech for test drive
Learn latest machine learning methods
Access to DSA Members
Access to Big Data Members
Three Levels of Sponsorship
Platinum: Premium Logo Size and Placement + Demonstration & Display Opportunities with Booth at Award Ceremony + Judge Apps + Name Recognition in all Press Releases + Logo Placement at Press Photos
Gold: Gold Logo Size and Placement + Promotion / Media Releases + Onsite Logo + Award Prizes
Silver: Silver Logo Size and Placement + Media Promotion
Web Site: http://hackski.com
Many organizations are reluctant to create data science teams (internally or externally) because of information confidentiality and privacy concerns. It is dangerous to open the kimono to competition - disclosing high-value information about inner workings of the firm may cause significant damage.
There is real fear of exposing valuable confidential information to data scientists who may leave the firm and share key knowledge with competitors. Moreover, externally hired data scientists could potentially share critical information with their other clients who may be direct or indirect competitors.
One solution is hiring only data scientists who are governed by the Data Science Code of Professional Conduct and are required to maintain strict client confidentiality.
Rule 5 of the Data Science Association Code of Professional Conduct includes the following confidentiality provision:
Rule 5 - Confidential Information
(a) Confidential information is information that the data scientist creates, develops, receives, uses or learns in the course of employment as a data scientist for a client, either working directly in-house as an employee of an organization or as an independent professional. It includes information that is not generally known by the public about the client, including client affiliates, employees, customers or other parties with whom the client has a relationship and who have an expectation of confidentiality. The data scientist has a professional duty to protect all confidential information, regardless of its form or format, from the time of its creation or receipt until its authorized disposal.
(b) Confidential information is a valuable asset. Protecting this information is critical to a data scientist's reputation for integrity and relationship with clients, and ensures compliance with laws and regulations governing the client's industry.
(c) A data scientist shall protect all confidential information, regardless of its form or format, from the time of its creation or receipt until its authorized disposal.
(d) A data scientist shall not reveal information relating to the representation of a client unless the client gives informed consent, the disclosure is impliedly authorized in order to carry out the representation or the disclosure is permitted by paragraph (e).
(e) A data scientist may reveal information relating to the representation of a client to the extent the data scientist reasonably believes necessary:
(1) to prevent reasonably certain death or substantial bodily harm;
(2) to prevent the client from committing a crime or fraud that is reasonably certain to result in substantial injury to the financial interests or property of another and in furtherance of which the client has used or is using the data scientist's services.
(f) A data scientist shall make reasonable efforts to prevent the inadvertent or unauthorized disclosure of, or unauthorized access to, information relating to the representation of a client, which means:
(1) Not displaying, reviewing or discussing confidential information in public places, in the presence of third parties or that may be overheard;
(2) Not e-mailing confidential information outside of the organization or professional practice to a personal e-mail account or otherwise removing confidential information from the client by removing hard copies or copying it to any form of recordable digital media device; and
(3) Communicating confidential information only to client employees and authorized agents (such as attorneys or external auditors) who have a legitimate business reason to know the information.
(g) A data scientist shall comply with client policies that apply to the acceptance, proper use and handling of confidential information, as well as any written agreements between the data scientist and the client relating to confidential information.
(h) A data scientist shall protect client confidential information after termination of work for the client.
(i) A data scientist shall return any and all confidential information in possession or control upon termination of the data scientist-client relationship and, if requested, execute an affidavit affirming compliance with obligations relating to confidential information.