Neo4j is the world's leading NoSQL graph database. Open Source. Dual-licensed: GPLv3 and AGPLv3 / commercial.
Neo4j is a high-performance, NoSQL graph database with all the features of a mature and robust database. The programmer works with an object-oriented, flexible network structure rather than with strict and static tables — yet enjoys all the benefits of a fully transactional, enterprise-strength database.
For many applications, Neo4j offers performance improvements on the order of 1000x or more compared to relational DBs.
Learn more: http://neo4j.org
HailDB is a relational database that is embeddable within applications. It is not a SQL database, although you can use the library as the storage backend for one.
You embed HailDB by linking to a shared library and calling a clean and simple API. HailDB is a continuation of the Embedded InnoDB project. It is not itself a database server, but is a library implementing the storage layer. With the addition of the HailDB plugin to Drizzle you get a full SQL interface.
Open source download: https://code.launchpad.net/haildb
Secrets of Building Realtime Big Data Systems
Essentials of a data system
Robust to machine failure and error
Low latency reads and updates
Allows ad-hoc analysis
Batch layer is used for a majority of historical data
Speed layer is used for data that has not quite made it to the batch layer
Speed layer is transient data that eventually is overridden by the batch layer
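The batch/speed split above can be sketched as a query-time merge. This is a minimal illustration only: the page-view counts, the `merge_views` helper, and the dict-based views are assumptions made for the example, not part of any particular system.

```python
from collections import Counter

def merge_views(batch_view, speed_view):
    """Answer a query by combining the precomputed batch view
    with the realtime speed view (hypothetical helper)."""
    merged = Counter(batch_view)
    merged.update(speed_view)  # adds counts key by key
    return dict(merged)

# Hypothetical page-view counts, keyed by URL.
batch_view = {"/home": 1000, "/about": 40}   # historical data
speed_view = {"/home": 7, "/pricing": 2}     # not yet in the batch layer

result = merge_views(batch_view, speed_view)

# Once the batch layer has absorbed the recent data, the speed
# view for that period is discarded and the batch view wins.
batch_view_after = {"/home": 1007, "/about": 40, "/pricing": 2}
speed_view_after = {}
result_after = merge_views(batch_view_after, speed_view_after)
```

Both queries return the same totals, which is the point: the speed layer only covers the gap until the batch layer catches up.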
Storm is a distributed realtime computation system. Similar to how Hadoop provides a set of general primitives for doing batch processing, Storm provides a set of general primitives for doing realtime computation. Storm is simple, can be used with any programming language, is used by many companies, and is a lot of fun to use!
The Rationale page on the wiki explains what Storm is and why it was built. This presentation is also a good introduction to the project.
Storm has a website at storm-project.net. Follow @stormprocessor on Twitter for updates on the project.
Storm is a free and open source distributed realtime computation system. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing. Storm is simple, can be used with any programming language, and is a lot of fun to use!
Storm has many use cases: realtime analytics, online machine learning, continuous computation, distributed RPC, ETL, and more. Storm is fast: a benchmark clocked it at over a million tuples processed per second per node. It is scalable, fault-tolerant, guarantees your data will be processed, and is easy to set up and operate.
Cascalog is a fully-featured data processing and querying library for Clojure or Java.
The main use cases for Cascalog are processing "Big Data" on top of Hadoop or doing analysis on your local computer. Cascalog is a replacement for tools like Pig, Hive, and Cascading and operates at a significantly higher level of abstraction than those tools.
Simple - Functions, filters, and aggregators all use the same syntax. Joins are implicit and natural.
Expressive - Logical composition is very powerful, and you can run arbitrary Clojure code in your query with little effort.
Interactive - Run queries from the Clojure REPL.
Scalable - Cascalog queries run as a series of MapReduce jobs.
Query anything - Query HDFS data, database data, and/or local data by making use of Cascading's "Tap" abstraction
Careful handling of null values - Null values can make life difficult. Cascalog has a feature called "non-nullable variables" that makes dealing with nulls painless.
First class interoperability with Cascading - Operations defined for Cascalog can be used in a Cascading flow and vice-versa
First class interoperability with Clojure - Can use regular Clojure functions as operations or filters, and since Cascalog is a Clojure DSL, you can use it in other Clojure code.
Clojure is a dialect of Lisp, and shares with Lisp the code-as-data philosophy and a powerful macro system. Clojure is predominantly a functional programming language, and features a rich set of immutable, persistent data structures. When mutable state is needed, Clojure offers a software transactional memory system and reactive Agent system that ensure clean, correct, multithreaded designs.
Learn more @ http://en.wikibooks.org/wiki/Clojure_Programming
Thrift allows you to define data types and service interfaces in a simple definition file. Taking that file as input, the compiler generates code to be used to easily build RPC clients and servers that communicate seamlessly across programming languages. Instead of writing a load of boilerplate code to serialize and transport your objects and invoke remote methods, you can get right down to business.
Nanny is a simple dependency management system for your projects. Unlike tools like Maven, Nanny can be used for arbitrary dependencies and is easy to use.
Nanny lets you specify dependencies to your project, and Nanny will go ahead and pull in all the dependencies (and everything those dependencies are dependent on) into the _deps folder in your project. Nanny makes it easy to create dependencies and manage dependency versions.
Nanny has a minimum of configuration. We use Nanny at BackType to manage all of our jars (external and internal), distribute custom software builds (like hadoop+confs, cassandra), and manage dependencies between our python projects (instead of something like svn externals or git submodules).
Check it out @ https://github.com/nathanmarz/nanny
Dependency management in software projects is a pretty simple problem when you think about it. A tool to manage dependencies just needs to do three things:
Provide a mechanism to specify the direct dependencies to a project
Download the transitive closure of dependencies to a project
Publish packages that can be used as a dependency to other projects
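The second step, downloading the transitive closure, is a plain graph traversal. The sketch below assumes dependencies are available as an in-memory mapping; the package names and the `transitive_closure` function are made up for illustration and are not how Nanny itself is implemented.

```python
from collections import deque

def transitive_closure(direct_deps, root):
    """Return every package that root depends on, directly or indirectly."""
    seen = set()
    queue = deque(direct_deps.get(root, []))
    while queue:
        dep = queue.popleft()
        if dep in seen:
            continue  # already handled; also guards against cycles
        seen.add(dep)
        queue.extend(direct_deps.get(dep, []))
    return seen

# Hypothetical direct dependencies, keyed by package name.
deps = {
    "my-app": ["web-lib", "json-lib"],
    "web-lib": ["http-lib", "json-lib"],
    "http-lib": ["socket-lib"],
}

closure = transitive_closure(deps, "my-app")
```

Here `closure` contains `web-lib`, `json-lib`, `http-lib`, and `socket-lib`, even though `my-app` only names the first two.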
Some languages have good dependency management systems - for example, rubygems. Others, like Java, have tools like Maven which I would call a complex solution to a simple problem. You shouldn't need to buy a book to understand the solution to such a simple problem. Plus, these dependency management systems are all language specific.
I've seen companies do crazy things to manage their dependencies. One company, to manage their jar files, would put all the jars that any project might need in a special "jars" project. You would then need to set up a JARS_HOME environment variable and be sure to update the jars project if you needed any of the dependencies. If you needed an older version of something - forget about it. Plus it made deploys a huge pain, as each project had to ship with dependencies it didn't even use.
Enter Nanny. Nanny makes it really easy to set up an internal repository to manage dependencies between projects.
RabbitMQ is an open source message broker software (i.e., message-oriented middleware) that implements the Advanced Message Queuing Protocol (AMQP) standard.
The RabbitMQ server is written in Erlang and is built on the Open Telecom Platform framework for clustering and failover.
The Advanced Message Queuing Protocol (AMQP) is an open standard application layer protocol for message-oriented middleware. The defining features of AMQP are message orientation, queuing, routing (including point-to-point and publish-and-subscribe), reliability and security.
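To make the two routing styles concrete, here is a toy in-memory model. It is not the AMQP wire protocol and not a RabbitMQ client API; the `ToyBroker` class and every name in it are assumptions made for illustration.

```python
from collections import defaultdict, deque

class ToyBroker:
    """In-memory stand-in for a message broker (illustrative only)."""

    def __init__(self):
        self.queues = defaultdict(deque)   # queue name -> pending messages
        self.bindings = defaultdict(set)   # routing key -> bound queue names

    def bind(self, queue_name, routing_key):
        self.bindings[routing_key].add(queue_name)

    def publish(self, routing_key, message):
        # Deliver a copy of the message to every queue bound to the key.
        for queue_name in self.bindings[routing_key]:
            self.queues[queue_name].append(message)

    def consume(self, queue_name):
        queue = self.queues[queue_name]
        return queue.popleft() if queue else None

broker = ToyBroker()

# Point-to-point: a single queue is bound to the key.
broker.bind("orders", "order.created")
broker.publish("order.created", "order 1")
order_msg = broker.consume("orders")

# Publish-and-subscribe: several queues are bound to the same key.
broker.bind("audit", "user.signup")
broker.bind("email", "user.signup")
broker.publish("user.signup", "alice")
audit_msg = broker.consume("audit")
email_msg = broker.consume("email")
```

The same `publish` call serves both styles; what differs is how many queues are bound to the routing key.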
Scout is the easy-to-use server monitoring solution installed on thousands of servers since 2007
The Apache Jackrabbit content repository is a fully conforming implementation of the Content Repository for Java Technology API.
A content repository is a hierarchical content store with support for structured and unstructured content, full text search, versioning, transactions, observation, and more.
4QL is a rule-based database query language with negation allowed in bodies and heads of rules.
4QL is the first such language with tractable and at the same time intuitive semantics, even though the area of deductive databases is over 30 years old.
It is founded on a four-valued semantics with truth values: true, false, inconsistent and unknown.
bddbddb stands for BDD-Based Deductive DataBase. It is an implementation of Datalog, a declarative programming language similar to Prolog for talking about relations. What makes bddbddb unique is that it represents the relations using binary decision diagrams (BDDs). BDDs are a data structure that can efficiently represent large relations and provide efficient set operations. This allows bddbddb to efficiently represent and operate on extremely large relations - relations that are too large to represent explicitly.
We use bddbddb primarily as a tool for easily and efficiently specifying program analyses. We represent the entire program as database relations. Developing a program analysis becomes as simple as writing the specification for the analysis in a declarative style and then feeding that specification to bddbddb, which automatically transforms your specification into efficient BDD operations.
Using bddbddb for program analysis has a number of advantages:
First, it closes the gap between the algorithm specification and its implementation. In bddbddb, the algorithm specification is automatically translated into an implementation, so as long as your algorithm specification is correct you can be reasonably sure that your implementation will also be correct.
Second, because BDDs can efficiently handle exponential relations, it allows us to solve heretofore unsolved problems in program analysis, such as context-sensitive pointer analysis for large programs.
Third, it makes program analysis accessible, and dare I say it, actually fun. Trying out a new idea in program analysis used to be confined to the realm of experts and compiler writers, and would take weeks to months of tedious effort to implement and debug.
With bddbddb, writing a new analysis is simply a matter of writing a few straightforward inference rules. The tool takes care of most of the tedious part and helps you develop powerful program analyses easily.
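As a rough illustration of what evaluating inference rules means, the sketch below runs two Datalog-style rules to a fixpoint. It uses plain Python sets rather than BDDs, so it shows only the rule semantics, not bddbddb's representation or its input syntax; the `edge`/`path` relations and the function name are assumptions for the example.

```python
def transitive_paths(edges):
    """Naive bottom-up evaluation of the rules
         path(X, Y) :- edge(X, Y).
         path(X, Z) :- path(X, Y), edge(Y, Z).
    repeated until no new facts are derived."""
    path = set(edges)  # first rule: every edge is a path
    while True:
        # Second rule: join path with edge on the shared variable Y.
        derived = {(x, z) for (x, y) in path for (y2, z) in edges if y == y2}
        new_path = path | derived
        if new_path == path:
            return path  # fixpoint reached
        path = new_path

# Hypothetical input relation.
edges = {("a", "b"), ("b", "c"), ("c", "d")}
paths = transitive_paths(edges)
```

For this input, `paths` holds the three original edges plus `("a", "c")`, `("b", "d")`, and `("a", "d")`. Explicit sets like these grow quickly on large inputs, which is the problem a compact relation representation is meant to address.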
The Apache Lucene project develops open-source search software, including:
Lucene Core, our flagship sub-project, provides Java-based indexing and search technology, as well as spellchecking, hit highlighting and advanced analysis/tokenization capabilities.
Solr is a high performance search server built using Lucene Core, with XML/HTTP and JSON/Python/Ruby APIs, hit highlighting, faceted search, caching, replication, and a web admin interface.
Open Relevance Project is a subproject with the aim of collecting and distributing free materials for relevance testing and performance.
PyLucene is a Python port of the Core project.
Lucene can be downloaded from http://lucene.apache.org/core/mirrors-core-latest-redir.html
Solr can be downloaded from http://lucene.apache.org/solr/mirrors-solr-latest-redir.html
Oroboro is a lightweight Java RDF processing framework. Its design focuses on rule-based data extraction and integration tasks involving moderate datasets, targeting common use cases while striving to remain as simple and flexible as possible.
Compact task-oriented API and scriptable command-line shell
Simple RDF model with pluggable datatypes and sets navigation
Datalog query/update engine supporting OWL RL and XPath 2.0
Codecs for common RDF serialization formats
XML/XSLT processing tools and SAX-based data adapters
Single JAR distribution with no external dependencies
IRIS Reasoner (Integrated Rule Inference System) is an extensible reasoning engine for expressive rule-based languages.
Extensible reasoning engine for Datalog extended with function symbols, unsafe rules, negation, locally stratified or non-stratified programs, XML schema data types and a comprehensive and extensible set of built-in predicates.
ConceptBase.cc is a multi-user deductive database system with an object-centered data model. Its ability to represent information at any abstraction level (data, class, metaclass, meta-metaclass, etc.) makes it a powerful tool for metamodeling and engineering of customized modeling languages.
The system is accompanied by a highly configurable graphical user interface that builds upon the logic-based features of the ConceptBase.cc server.
LogicBlox is a cloud-delivered platform-as-a-service that enables the rapid development of adaptive and actionable Big Data enterprise-class applications.
Two Midtown Plaza
1349 West Peachtree Street NW
Suite 1880, Atlanta, GA 30309
OrientDB - The Fastest NoSQL Document-Graph DBMS
OrientDB is an open source NoSQL database management system written in Java. Even if it is a document-based database, the relationships are managed as in graph databases with direct connections between records. It supports schema-less, schema-full and schema-mixed modes.
It has a strong security profiling system based on users and roles and supports SQL as a query language. OrientDB uses a new indexing algorithm called MVRB-Tree, derived from the Red-Black Tree and from the B+Tree, which reportedly offers both fast insertions and very fast lookups.
Learn more: http://www.slideshare.net/aemadrid/orientdb
Supports ACID transactions
Data stored in JSON Documents
Support for Java and JRuby, among many other languages
SQL style queries in a NoSQL world