Hank: A Fast, Open-Source, Batch-Updatable, Distributed Key-Value Store
Hank is a new open source distributed database project from Rapleaf. Rapleaf's data pipeline is built on Hadoop and Cascading, but the team could not find an existing database that was fast, scalable, and did not degrade read performance during updates.
Other requirements included:
Random reads need to be fast – reliably on the order of a few milliseconds
Datastores need to scale to terabytes, with keys and values on the order of kilobytes
The system needs to absorb hundreds of millions of updates a day, though they don’t have to happen in realtime; most come from the Hadoop cluster
Read performance should not suffer while updates are in progress
So they built their own. Hank consists of a fast, read-only data server backed by a custom-designed batch-updatable file format, a set of tools for writing these files from Hadoop, and a special daemon process that manages the deployment of data from the Hadoop cluster to the serving machines.
Get code @ https://github.com/bryanduxbury/hank
Learn more: http://blog.rapleaf.com/dev/2011/03/15/announcing-hank-a-fast-open-source-batch-updatable-distributed-key-value-store/
Using Redis with Ruby on Rails
Redis is an extremely fast, open source key-value store with atomic operations. It allows the storage of strings, sets, sorted sets, lists and hashes. Redis keeps all the data in RAM, much like Memcached, but unlike Memcached it periodically writes to disk, giving it persistence.
Redis is often referred to as a data structure server, since keys can contain strings, hashes, lists, sets and sorted sets.
You can run atomic operations on these types, like appending to a string; incrementing the value in a hash; pushing to a list; computing set intersection, union and difference; or getting the member with highest ranking in a sorted set.
In order to achieve its outstanding performance, Redis works with an in-memory dataset. Depending on your use case, you can persist it either by dumping the dataset to disk every once in a while, or by appending each command to a log.
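A minimal sketch of the operations described above, using the Python redis-py client (a Rails application would issue the same commands through the Ruby redis gem). It assumes a Redis server on localhost:6379 and redis-py 3.x, where zadd takes a mapping; the key names are illustrative only.

```python
import redis  # redis-py client, assumed installed (pip install redis)

r = redis.Redis(host="localhost", port=6379, db=0)

# Strings: append to a value atomically
r.set("greeting", "hello")
r.append("greeting", " world")

# Hashes: increment a field atomically
r.hincrby("user:1", "visits", 1)

# Lists: push onto the head of a list
r.lpush("recent_logins", "user:1")

# Sets: intersection, union and difference are computed server-side
r.sadd("online", "alice", "bob")
r.sadd("premium", "bob", "carol")
print(r.sinter("online", "premium"))   # {b'bob'}

# Sorted sets: fetch the highest-ranked member
r.zadd("leaderboard", {"alice": 42, "bob": 17})
print(r.zrevrange("leaderboard", 0, 0, withscores=True))
```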
Pentaho and 10gen Collaborate to Integrate MongoDB in Enterprise Architectures
New Partnership Combines Pentaho’s Cutting Edge Data Integration and Visualization Tools with Leading NoSQL Technology MongoDB
Pentaho Corporation, delivering the future of business analytics, and 10gen, the company behind MongoDB, today announced a partnership to provide direct integration between Pentaho Business Analytics and MongoDB. As enterprise data architectures continue to evolve, customers are looking to address rapidly changing multi-structured data and take advantage of cloud-like architectures. This alliance brings the data integration, data discovery and visualization capabilities of Pentaho to MongoDB. The native integration between Pentaho and MongoDB helps enterprises take advantage of the flexible, scalable data storage capabilities of MongoDB while ensuring compatibility and interoperability with existing data infrastructure.
Pentaho and 10gen have developed connectors to tightly integrate MongoDB and Pentaho Business Analytics. By adding MongoDB integration to its existing vast library of connectors for relational databases, analytic databases, data warehouses, enterprise applications, and standards-based information exchange formats, Pentaho provides the richest environment for enterprise architects, developers, data scientists and analysts for both MongoDB and existing databases. Enterprise architects benefit from a scalable data integration framework functioning across MongoDB and other data stores, and developers gain access to familiar graphical interfaces for data integration and job management with full support for MongoDB. Data scientists and analysts can now visualize and explore data across multiple data sources, including MongoDB.
OrientDB - The Fastest NoSQL Document-Graph DBMS
OrientDB is an open source NoSQL database management system written in Java. Although it is a document-based database, relationships are managed as in graph databases, with direct connections between records. It supports schema-less, schema-full and schema-mixed modes.
It has a strong security profiling system based on users and roles and supports SQL as a query language (a brief query sketch follows the feature list below). OrientDB uses an indexing algorithm called MVRB-Tree, derived from the Red-Black Tree and the B+Tree, which reportedly combines fast insertions with very fast lookups.
Learn more: http://www.slideshare.net/aemadrid/orientdb
Supports ACID transactions
Data stored in JSON Documents
Support for Java and JRuby, among many other languages
SQL style queries in a NoSQL world
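A minimal sketch of what SQL-style queries against OrientDB can look like, here issued from Python through the pyorient driver. The host, credentials, class name (Person) and edge label (Knows) are illustrative assumptions, not details from the description above.

```python
import pyorient  # community Python driver for OrientDB (pip install pyorient)

client = pyorient.OrientDB("localhost", 2424)   # assumed local server
client.connect("root", "root_password")         # assumed credentials
client.db_open("demo", "admin", "admin")

# Plain SQL-style query against a document class
for record in client.command("SELECT FROM Person WHERE city = 'Berlin'"):
    print(record.oRecordData)

# Graph-flavoured SQL: follow 'Knows' edges directly from the matching records
friends = client.command(
    "SELECT expand(out('Knows')) FROM Person WHERE name = 'Alice'")
for f in friends:
    print(f.oRecordData)
```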
Oracle Cloud Platform Services
Built on a common, complete, standards-based and enterprise-grade set of infrastructure components, Oracle Cloud Platform Services enable customers to speed time to market and lower costs by quickly building, deploying and managing bespoke applications. Oracle Cloud Platform Services will include:
Database Services to manage data and build database applications with the Oracle Database.
Java Services to develop, deploy and manage Java applications with Oracle WebLogic.
Developer Services to allow application developers to collaboratively build applications.
Web Services to build Web applications rapidly using PHP, Ruby, and Python.
Mobile Services to allow developers to build cross-platform native and HTML5 mobile applications for leading smartphones and tablets.
Documents Services to allow project teams to collaborate and share documents through online workspaces and portals.
Sites Services to allow business users to develop and maintain visually engaging .com sites.
Analytics Services to allow business users to quickly build and share analytic dashboards and reports through the Cloud.
Oracle Cloud Application Services
Oracle Cloud Application Services provides customers access to the industry’s broadest range of enterprise applications available in the cloud today, with built-in business intelligence, social and mobile capabilities. Easy to set up, configure, extend, use and administer, Oracle Cloud Application Services will include:
ERP Services: A complete set of Financial Accounting, Project Management, Procurement, Sourcing, and Governance, Risk & Compliance solutions.
HCM Services: A complete Human Capital Management solution including Global HR, Workforce Lifecycle Management, Compensation, Benefits, Payroll and other solutions.
Talent Management Services: A complete Talent Management solution including Recruiting, Sourcing, Performance Management, and Learning.
Sales and Marketing Services: A complete Sales and Marketing solution including Sales Planning, Territory Management, Leads & Opportunity Management, and Forecasting.
Customer Experience Services: A complete Customer Service solution including Web Self-Service, Contact Centers, Knowledge Management, Chat, and e-mail Management.
Oracle Cloud Social Services
Oracle Cloud Social Services provides the broadest and most complete enterprise social platform available in the cloud today. With Oracle Cloud Social Services, enterprises can engage with their customers on a range of social media properties in a comprehensive and meaningful fashion, including social marketing, commerce, service and listening. The platform also provides enterprises with a rich social networking solution for their employees to collaborate effectively inside the enterprise. Oracle’s integrated social platform will include:
Oracle Social Network to enable secure enterprise collaboration and purposeful social networking for business.
Oracle Social Data Services to aggregate data from social networks and enterprise data sources to enrich business applications.
Oracle Social Marketing and Engagement Services to enable marketers to centrally create, publish, moderate, manage, measure and report on their social marketing campaigns.
Oracle Social Intelligence Services to enable marketers to analyze social media interactions and to enable customer service and sales teams to engage with customers and prospects effectively.
Benefits of Data Virtualization
Data virtualization is the process of offering data consumers a data access interface that hides the technical aspects of stored data, such as location, storage structure, API, access language, and storage technology.
Consuming applications may include business intelligence, analytics, CRM, enterprise resource planning, and more, across both cloud computing platforms and on-premises systems.
Data Virtualization Benefits:
● Decision makers gain fast access to reliable information
● Improved operational efficiency: virtual data stores can be created in short cycles without touching the underlying sources, increasing the flexibility and agility of integration
● Improved data quality due to a reduction in physical copies
● Improved usability through creation of subject-oriented, business-friendly data objects
● Increased revenues
● Lower costs
● Reduced risks
Data virtualization abstracts, transforms, federates and delivers data from a variety of sources, presenting a single access point to the consumer regardless of the physical location or nature of the various data sources.
Data virtualization is based on the premise of abstracting the data contained within a variety of data sources (databases, applications, file repositories, websites, data services vendors, etc.) for the purpose of providing single-point access to that data. Its architecture is based on a shared semantic abstraction layer, as opposed to limited-visibility semantic metadata confined to a single tool or data source.
Data Virtualization software is an enabling technology which provides the following capabilities:
• Abstraction – Abstract away the technical aspects of stored data, such as location, storage structure, API, access language, and storage technology.
• Virtualized Data Access – Connect to different data sources and make them accessible from one logical place.
• Transformation / Integration – Transform, improve the quality of, and integrate data based on need across multiple sources.
• Data Federation – Combine result sets from across multiple source systems (a minimal sketch follows this list).
• Flexible Data Delivery – Publish result sets as views and/or data services that consuming applications or users execute on request.
In delivering these capabilities, data virtualization also addresses requirements for data security, data quality, data governance, query optimization, caching, and so on. Data virtualization software includes functions for development, operation and management.
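A toy sketch of the federation idea in Python: two separate SQLite databases stand in for independent source systems, and a single "virtual view" combines their result sets at query time without copying the data into a warehouse. The table and column names are illustrative assumptions.

```python
import sqlite3

# Two independent "source systems" (in-memory SQLite stands in for real stores)
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])

billing = sqlite3.connect(":memory:")
billing.execute("CREATE TABLE invoices (customer_id INTEGER, amount REAL)")
billing.executemany("INSERT INTO invoices VALUES (?, ?)",
                    [(1, 120.0), (1, 80.0), (2, 300.0)])

def virtual_customer_revenue():
    """Federated 'view': joins rows from both sources at query time."""
    totals = {}
    for cust_id, amount in billing.execute("SELECT customer_id, amount FROM invoices"):
        totals[cust_id] = totals.get(cust_id, 0.0) + amount
    for cust_id, name in crm.execute("SELECT id, name FROM customers"):
        yield {"customer": name, "revenue": totals.get(cust_id, 0.0)}

for row in virtual_customer_revenue():
    print(row)   # e.g. {'customer': 'Acme', 'revenue': 200.0}
```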
Data virtualization is almost a requirement when building a data warehouse and BI infrastructure.
Enterprise Information Integration (EII) and data federation have been used by some vendors to describe a core element of data virtualization: the capability to create relational JOINs in a federated VIEW. Some forms of legacy data virtualization build on knowledge and concepts developed within EII and Data Federation.
Newer types of data virtualization do not always require movement of the data to construct the view. They may allow you to see the results of the relational joins before any data is moved anywhere.
This additional capability is a very significant differentiation point between legacy data virtualization vendors (older EII technology) and newer technologies based upon persistent metadata servers.
Virtual master data management (Virtual MDM) utilizes data virtualization and a persistent metadata server to implement a multi-level automated MDM hierarchy.
I suggest virtual MDM is critical to the architecture of a solid business intelligence / data analytics platform.
Recent developments in business intelligence (BI) provide more usable, higher-quality data for smarter decision making and spending.
I think data virtualization and virtual master data management (Virtual MDM) together are the right architecture for implementing a multi-level automated MDM hierarchy. MDM can be defined as a set of policies, procedures, applications and technologies for harmonizing and managing the system of record and systems of entry for the data and metadata associated with the key business entities of an organization.
Hierarchy management is defined as the ability to define and store relationships between master-data records in the MDM hub.
Relationships are a critical part of the master data: Products are sold by salesmen, employees work for managers, companies have subsidiaries, sales territories contain customers, and products are made from parts. All these relationships make your master data more useful.
Data virtualization and a multi-level automated MDM hierarchy enables one of the main objectives of an MDM system: to publish an integrated, accurate, and consistent set of master data for use by other applications and users. This integrated set of master data is called the master data system of record (SOR). The SOR is the gold copy for any given piece of master data, and is the single place in an organization that the master data is guaranteed to be accurate and up to date.
Although an MDM system publishes the master data SOR for use by the rest of the IT environment, it is not necessarily the system where master data is created and maintained. The system responsible for maintaining any given piece of master data is called the system of entry (SOE). In most organizations today, master data is maintained by multiple SOEs.
Customer data is an example. A company may, for example, have customer master data that is maintained by multiple Web store fronts, by the retail organization, and by the shipping and billing systems. Creating a single SOR for customer data in such an environment is a complex task.
The long-term goal of an enterprise MDM environment is to solve this problem by creating an MDM system that is not only the SOR for any given type of master data, but also the SOE.
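A toy sketch of the survivorship step an MDM hub performs when consolidating one customer from several systems of entry into a single system-of-record "gold copy". The source names, fields, and precedence rule (most recently updated non-null value wins) are illustrative assumptions, not a prescribed algorithm.

```python
from datetime import date

# Candidate records for the same customer from three systems of entry (SOEs)
candidates = [
    {"source": "web_store", "updated": date(2012, 3, 1), "name": "ACME Corp",
     "email": "ops@acme.example", "phone": None},
    {"source": "retail",    "updated": date(2012, 1, 15), "name": "Acme Corporation",
     "email": None, "phone": "+1-555-0100"},
    {"source": "billing",   "updated": date(2012, 2, 20), "name": "ACME CORP",
     "email": "billing@acme.example", "phone": "+1-555-0100"},
]

def build_gold_copy(records):
    """Per attribute, keep the most recently updated non-null value."""
    gold = {}
    for field in ("name", "email", "phone"):
        newest = max((r for r in records if r[field] is not None),
                     key=lambda r: r["updated"], default=None)
        gold[field] = newest[field] if newest else None
    gold["sources"] = sorted(r["source"] for r in records)
    return gold

# The resulting record is what the MDM hub would publish as the system of record (SOR)
print(build_gold_copy(candidates))
```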
The benefits of MDM include:
● Improving business agility
● Providing a single trusted view of people, processes and applications
● Allowing strategic decision making
● Enhancing customer relationships
● Reducing operational costs
● Increasing compliance with regulatory requirements
MDM helps organizations handle four key issues:
● Data redundancy
● Data inconsistency
● Business inefficiency
● Supporting business change
Organizations must retain data for a certain amount of time to adhere to compliance requirements, but keeping data longer than needed poses potential risk.
What security policies do you suggest for proper data destruction?
What tech tools and business functions do you suggest for the data destruction process?
Organizations need to think carefully about the right policy for keeping and destroying data. Legal regulations and liability risk must be weighed against the need for large amounts of data for business analytics. This is a business decision, not a tech decision; the technology can support whatever policy the business settles on.
For business analytics you want to keep as much data as possible to play around with, while liability concerns militate toward data destruction. Every organization is different and needs to strike the best compromise.
What security policies do you suggest for proper data destruction?
This is a business process decision. Most new tech can support proper data destruction. Rose or other professionals can help select the right tech and build and implement data destruction processes. Every organization has different needs.
What tech tools and business functions do you suggest for the data destruction process?
There are a number of tech tools on the market that can do the job. More important are the organization's data sanitization and data destruction processes.
Data sanitization is the process of deliberately, permanently, and irreversibly removing or destroying the data stored on a memory device. The devices include magnetic disks, flash memory devices, CDs and DVDs, and PDAs (Palm Pilots, Pocket PCs and smartphones). A device that has been sanitized has no usable residual data, and even advanced forensic tools should never be able to recover the erased data. Exceptions tend to be specialized hardware used by large government agencies to recover sanitized data under special, extreme circumstances. It is possible to sanitize a single file, a set of files, or an entire disk or device.
Sanitization processes include using a software utility that completely erases the data, a separate hardware device that connects to the device being sanitized and erases the data, and/or a mechanism that physically destroys the device so its data cannot be recovered.
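A minimal sketch of the software-utility approach: overwrite a file's contents with random bytes before unlinking it. This is illustrative only; on SSDs, journaling filesystems, and copy-on-write storage, overwriting in place does not guarantee sanitization, and real policies should rely on purpose-built tools or the physical destruction methods described above. The file path and pass count are arbitrary assumptions.

```python
import os

def overwrite_and_delete(path: str, passes: int = 3) -> None:
    """Overwrite a file with random data several times, then remove it.
    Illustrative only; not a substitute for certified sanitization tools."""
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            f.write(os.urandom(size))
            f.flush()
            os.fsync(f.fileno())   # push each pass out to the storage device
    os.remove(path)

# Example usage (hypothetical file name):
# overwrite_and_delete("/tmp/customer_export.csv")
```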
Data destruction policies must comply with the law, which is specific to each industry. Your process should guide the organization in deliberately and irreversibly removing and destroying old data stored on your systems. This destruction is intended to be permanent.
Having a consistent data destruction policy followed by everyone at all times is vital, especially when you are faced with litigation. Legally and properly destroying data prevents extensive fishing expeditions by your opponents in litigation (which is a legalized and ritualized form of warfare). A regular business process addressing data destruction should also get you some "safe harbor" protections under the Federal Rules of Civil Procedure relating to electronic evidence should litigation arise. I hate to use the word "should," but every situation is different. Be aware that the safe harbor protections exist, and work with your tech attorney to take advantage of them.
A data destruction policy is the second part of your data retention policy. Completing and implementing your data retention policy will help you determine where you store your data, which makes it easier to delete old data you no longer need. Once you have mapped out where you store your data and developed a policy on how long you need to keep it, you must formalize the destruction process.
Naturally, your data destruction policy must handle media leaving the control of your company differently than media simply being reused internally. However, even then, different procedures may apply for media used by different departments. The general rule for the disposal of any data, even when media is reused internally, is that simple deletion and overwriting of data is not enough.
When reusing media, you must create processes whereby your company wipes the old data, validates the data is gone and media can be reused, and then documents the completion of the process. Only upon completion of these steps should you release the storage media for reuse.
Things get more complex with media that leaves the control of your company. Whether you are destroying your old media or reselling it to another party, your data destruction policy must require additional processes. Your policy has to cover the purging and destruction of data and sometimes the physical destruction of media.
In developing and implementing your data destruction policy, you face the challenge of coming up with a level of destruction that is appropriate for a particular situation. Simple deletion and overwriting of data on media you retain and reuse may be appropriate in some instances. In other situations, you may require the total physical destruction of your media that may include disintegration, shredding, incineration, pulverization, or melting your media.
Whether an organization is obligated to take certain steps in destroying its data really depends on the laws, rules, or regulations that regulate it. Regulated industries have requirements in place through a variety of sources. For example, depending on your industry you may have to look to Sarbanes-Oxley, Gramm-Leach-Bliley, the Fair and Accurate Credit Transactions Act, or HIPAA for guidance. These laws may say you need to keep your data for a certain period. Check with your tech attorney, who can provide guidance on what laws, rules, and regulations apply to your situation.
If you are not heavily regulated, you can look to some of the other destruction standards out there. The U.S. Department of Defense standards and methods might be good places to start, but do not forget other sources. Look to international, national, state, and local laws, rules, and regulations for guidance. Look also to published standards such as the National Institute of Standards and Technology's "Guidelines for Media Sanitization" (NIST SP 800-88).
After your review of the applicable laws, rules, and regulations, you need to add steps to your data destruction policy. Your data destruction policy needs to address how to classify and handle each type of data residing on your media. Your policy needs a process for the review and categorization of the types of data and what kinds can be removed. Classifications and contents of your data will also play a role. Data and media containing confidential information, trade secrets, and the private information of your customers requires the strictest controls and destruction methods. Data and media containing little to no risk may have relaxed levels of control and destruction.
Do not forget to look at your contracts with other companies to ensure you are handling data destruction within the terms of those contracts. For example, non-disclosure agreements sometimes contain data destruction terms, and you must comply with those terms.
Educate your people and verify they are complying with your policy. This is particularly important with media that you are not destroying but instead are reselling or recycling. You should take samplings as appropriate to ensure you maintain the proper levels of destruction. If you are doing the data destruction in-house, you need to verify that your data sanitization and destruction tools and equipment are functioning properly and maintained appropriately.
Document the entire data destruction policy so you will know what media is sanitized and destroyed. Your documentation should allow you to quickly answer those who, what, where, when, why, and how questions.
Finally, the last step of an effective data destruction policy is to have a process in place so you can follow up with regularly scheduled testing of your process and media to ensure the effectiveness of your policy.
Application Performance Monitoring
Application Security Testing
Backup Recovery Software
Benefits Of Data Virtualization
Business Cloud Strategy
Business Improvement Priorities
Business Intelligence And Analytics Platform
Business Process Analysis Tools
Business Smartphone Selection
Business Technologies Watchlist
Client Management Tools
Cloud Assessment Framework
Cloud Business Usage Index
Cloud Deployment Model
Cloud Deployment Model Attributes
Cloud Strategies Online Collaboration
Core Technology Rankings
Corporate Learning Systems
CRM Multichannel Campaign Management
Customer Communications Management
Customer Management Contact Center BPO
Customer Relationship Management
Customer Service Contact Centers
Data Analytics Lifecycle
Database Management Systems
Data Center Database
Data Center Outsourcing
Data Center Outsourcing And Infrastructure Utility Services
Data Integration Tools
Data Loss Prevention
Data Management Stack
Data Quality Tools
Data Volume Variety Velocity
Data Volume Variety Velocity Veracity
Data Warehouse Database Management Systems
Dr. David Ferrucci
Dr. John Kelly
E-Discovery Software
Emerging Technologies And Trends
Employee-Owned Device Program
Employee Performance Management
Endpoint Protection Platforms
Enterprise Architecture Management Suites
Enterprise Architecture Tools
Enterprise Content Management
Enterprise Data Warehousing Platforms
Enterprise Mobile Application Development
Enterprise Resource Planning
Enterprise Service Bus
Enterprise Social Platforms
Global IT Infrastructure Outsourcing 2011 Leaders
Global Knowledge Networks
Global Network Service Providers
Hadoop Technology Stack
Hardware As A Service
Health Care And Big Data
Hidden Markov Models
High Performance Computing
IBM Big Data Platform
Information Capabilities Framework
Infrastructure As A Service
Infrastructure Utility Services
Integrated IT Portfolio Analysis Applications
Integrated Software Quality Suites
Internet Of Things
Internet Trends 2011
IT Innovation Wave
Key Performance Indicators
Kindle Fire Tablet
Long Term Evolution Network Infrastructure
Managed Security Providers
Marketing Resource Management
Master Data Management
Microsoft Big Data Platform
Microsoft Dynamics AX
Mobile App Internet
Mobile Application Development
Mobile Business Application Priorities
Mobile Business Intelligence
Mobile Consumer Application Platforms
Mobile Data Protection
Mobile Development Tool Selection
Mobile Device Management
Mobile Device Management Software Magic Quadrant 2011
Mobile Internet Trends
Mobile Payment System
Modular Disk Arrays
Natural Language Processing
N-gram Language Modeling
Pioneering The Science Of Information
Platform As A Service
Primary Storage Reduction Technologies
Real Time Analytics
Real-time Bidding Ad Exchange
Retail Marketing Analytics
Sales Force Automation
SAP Big Data Platform
Scenario-Based Enterprise Performance Management (EPM)
Security Information & Event Management
Self-Service Business Intelligence
Service Oriented Architecture
Software As A Service
Sony Tablet S
Survey Most Important IT Priorities
Technology Industry Report Card
Technology M&A Deals
Vendor Due Diligence
Vertical Industry It Growth
Web Content Management