RECOMMENDATION ENGINES - ABSTRACT
Recommendation Engines (REs) are software tools and techniques that provide item suggestions to a user. The massive growth and variety of available information can overwhelm users, leading to poor decisions. While choice is good, more choice is not always better. In recent years, REs have proved to be a valuable means of coping with this information overload problem.
In their simplest form, personalized recommendations are offered as ranked lists of items. To produce this ranking, REs try to predict which products or services are most suitable for a user, based on that user's preferences and constraints. To complete this computational task, REs collect preferences from users, which are either expressed explicitly (e.g., as ratings for products) or inferred by interpreting user actions. For instance, an RE may treat navigation to a particular product page as an implicit sign of preference for the items shown on that page.
Amazon's RE, for example, relies on a basic approach (collaborative filtering) that suggests products to you based on your viewing history, your purchase history, and the related products that other customers bought.
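As a rough illustration of the collaborative filtering idea described above, the following is a minimal sketch of item-based filtering over purchase histories. It is not Amazon's actual method: the customers, items, and the function names item_similarities and recommend are all hypothetical, and cosine similarity over co-purchases is just one common choice of similarity measure.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical purchase history: customer -> set of items bought.
purchases = {
    "ann": {"book", "lamp", "pen"},
    "bob": {"book", "pen"},
    "cat": {"lamp", "desk"},
    "dan": {"book", "pen", "desk"},
    "eve": {"book", "lamp"},
}


def item_similarities(purchases):
    """Cosine similarity between items, based on how many customers bought both."""
    buyers = defaultdict(set)
    for customer, items in purchases.items():
        for item in items:
            buyers[item].add(customer)

    sims = defaultdict(dict)
    for a in buyers:
        for b in buyers:
            if a == b:
                continue
            overlap = len(buyers[a] & buyers[b])
            if overlap:
                sims[a][b] = overlap / sqrt(len(buyers[a]) * len(buyers[b]))
    return sims


def recommend(customer, purchases, top_n=3):
    """Rank items the customer has not bought by summed similarity to items they own."""
    sims = item_similarities(purchases)
    owned = purchases[customer]
    scores = defaultdict(float)
    for item in owned:
        for other, sim in sims[item].items():
            if other not in owned:
                scores[other] += sim
    # Highest score first; ties are broken alphabetically for a stable order.
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]


if __name__ == "__main__":
    print(recommend("bob", purchases))
```

The result is a ranked list, matching the "ranked lists of items" form described earlier. A production system would typically also weigh implicit signals such as page views, which this sketch leaves out.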
Tom Rampley is a data scientist with a background in finance and psychology. He received his MBA from Indiana University’s Kelley School of Business in 2012, with concentrations in finance and business analytics. Since graduation, he has been working within the Viewer Measurement group at Dish Network LLC on customer segmentation models, the development of recommendation engines, and the implementation of big data IT platforms. He prefers R to SAS, Python to any other scripting language, and while trained as a frequentist currently considers himself Bayes-curious. Outside of work he is married with no kids (yet), a lifelong martial artist, and endlessly nostalgic for the days when he played lead guitar in his grad school rock band. This is his first Data Science meetup presentation.
ACCUMULO - SQRRL NOSQL DATABASE - ABSTRACT
Apache Accumulo is an open-source, highly secure NoSQL database created in 2008 by the National Security Agency. It integrates easily with Hadoop, can securely and cost-effectively handle massive amounts of structured and unstructured data at scale, and enables users to move beyond traditional batch processing to conduct a wide variety of real-time analyses. Accumulo is a sorted, distributed key/value store based on Google's BigTable design. It is built on top of Hadoop, ZooKeeper, and Thrift. Written in Java, Accumulo has cell-level access labels and server-side programming mechanisms.
Accumulo offers "Cell-Level Security": it extends the BigTable data model by adding a new element to the key called "Column Visibility". This element stores a logical combination of security labels that must be satisfied at query time in order for the key and value to be returned as part of a user request. This allows data with varying security requirements to be stored in the same table, and allows users to see only those keys and values for which they are authorized.
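To make the Column Visibility idea more concrete, here is a toy sketch of how a logical combination of labels might be checked against a user's authorizations at query time. It does not use Accumulo's API and is not how Accumulo implements the check: the functions is_visible and scan, the labels, and the sample cells are all hypothetical. It assumes labels are combined with & (and), | (or), and parentheses, and that an empty visibility means the cell is readable by anyone.

```python
import re


def is_visible(expression, authorizations):
    """Return True if the authorizations satisfy the visibility expression.

    Toy evaluator: operators are applied left to right, so mixed & and |
    should be grouped with parentheses. Malformed input is not validated.
    """
    tokens = re.findall(r"[A-Za-z0-9_]+|[&|()]", expression)
    pos = 0

    def parse_term():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            result = parse_expr()
            pos += 1  # skip the closing ")"
            return result
        return token in authorizations

    def parse_expr():
        nonlocal pos
        result = parse_term()
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            op = tokens[pos]
            pos += 1
            right = parse_term()
            result = (result and right) if op == "&" else (result or right)
        return result

    if not tokens:
        return True  # assumption: an empty visibility is readable by anyone
    return parse_expr()


# Hypothetical cells: (row, column, visibility expression, value).
cells = [
    ("patient1", "name", "", "Pat Doe"),
    ("patient1", "diagnosis", "doctor|nurse", "flu"),
    ("patient1", "billing", "finance&(admin|audit)", "120.00"),
]


def scan(cells, authorizations):
    """Return only the cells whose visibility the authorizations satisfy."""
    return [
        (row, column, value)
        for row, column, visibility, value in cells
        if is_visible(visibility, authorizations)
    ]


if __name__ == "__main__":
    print(scan(cells, {"nurse"}))
    print(scan(cells, {"finance", "audit"}))
```

Because the check is applied per cell, records with different security requirements can sit in the same table, and each user's scan simply omits the cells they are not authorized to see.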
Sqrrl Enterprise, developed by Sqrrl Data, is an operational data store for large amounts of structured and unstructured data. It is the only NoSQL solution that both scales elastically to tens of petabytes of data and has fine-grained security controls. Sqrrl Enterprise enables development of real-time applications on top of Big Data. Sqrrl uses HDFS for storage, Accumulo for security and speed of access, and a Thrift API for interactivity, and it works with MapReduce, visualizations, third-party software, and existing schema explored databases.
This presentation reviews Accumulo and Sqrrl Enterprise.
John Dougherty is CIO for Viriton, a consulting and systems integration organization. He is the organizer for Big Data for Business, helping to apply Big Data concepts to C-suite perspectives. He began applying technology-driven strategies in the early nineties, and has continued to incorporate blue blood technologies in forward-thinking solutions.