Master data management (MDM) comprises the processes and tools that consistently define and manage an organization's critical data. MDM lies at the core of many organizations' operations, and the quality of that data shapes decision making. MDM helps an organization leverage trusted business information to increase profitability and reduce risk.
Master data is reference data about an organization's core business entities. These entities include people (customers, employees, suppliers), things (products, assets, ledgers), and places (countries, cities, locations). The applications and technologies used to create and maintain master data are part of a master data management (MDM) system.

Data governance encompasses the people, processes, and technology required to create consistent and proper management of an organization's data. It includes data quality, data management, data policies, business process management, and risk management. Data governance is a quality-control discipline for assessing, managing, using, improving, monitoring, maintaining, and protecting organizational information. It is a system of decision rights and accountabilities for information-related processes, executed according to agreed-upon models that describe who can take what actions with what information, when, under what circumstances, and using what methods.

A data model is a plan for building a database; to use a common analogy, the data model is the equivalent of an architect's building plans. Data modeling is the process used to define and analyze the data requirements needed to support the business processes within the scope of an organization's information systems. To be effective, a data model must be simple enough to communicate the required data structure to the end user, yet detailed enough for the database designer to create the physical structure. A data model is a conceptual representation of the data structures required by a database: the data objects, the associations between those objects, and the rules that govern operations on them. As the name implies, the data model focuses on what data is required and how it should be organized, rather than on what operations will be performed on the data.

Data modeling is the formalization and documentation of existing processes and events that occur during application software design and development. Data modeling techniques and tools capture and translate complex system designs into easily understood representations of data flows and processes, creating a blueprint for construction and/or re-engineering. A data model can be thought of as a diagram or flowchart that illustrates the relationships between data. Capturing all the possible relationships in a data model can be time-intensive, but it is an important step and should not be rushed: well-documented models allow stakeholders to identify errors and make changes before any programming code has been written. Data modeling is also used as a technique for detailing business requirements for specific databases; it is sometimes called database modeling because a data model is eventually implemented in a database.

There are three types of data models produced while progressing from requirements to the actual database used by the information system:

1) Conceptual data models. These models, sometimes called domain models, are typically used to explore domain concepts with project stakeholders. On Agile teams, high-level conceptual models are often created as part of the initial requirements-envisioning effort, because they are used to explore the high-level static business structures and concepts. On traditional teams, conceptual data models are often created as precursors to LDMs or as alternatives to them.

2) Logical data models (LDMs).
LDMs are used to explore the domain concepts of your problem domain and the relationships between them, whether for the scope of a single project or for your entire enterprise. LDMs depict the logical entity types (typically referred to simply as entity types), the data attributes describing those entities, and the relationships between the entities. LDMs are rarely used on Agile projects, although they often are on traditional projects (where they rarely seem to add much value in practice).

3) Physical data models (PDMs). PDMs are used to design the internal schema of a database, depicting the data tables, the data columns of those tables, and the relationships between the tables. PDMs often prove useful on both Agile and traditional projects, and as a result the focus of this article is on physical modeling. Although LDMs and PDMs sound very similar, and in fact they are, the level of detail they model can be significantly different, because the goal of each diagram is different: you use an LDM to explore domain concepts with your stakeholders and a PDM to define your database design. A sketch of a simple physical data model appears below.
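As a concrete, purely illustrative example, the following sketch turns two hypothetical logical entity types, Customer and Order, into a tiny physical data model using SQLite from Python. The table names, columns, and constraints are assumptions made for the example, not taken from any model described above.

```python
# Illustrative only: hypothetical Customer/Order entities.
import sqlite3

ddl = """
-- Physical data model: entity types from the logical model become tables,
-- attributes become typed columns, relationships become foreign keys.
CREATE TABLE customer (
    customer_id   INTEGER PRIMARY KEY,
    full_name     TEXT NOT NULL,
    email         TEXT UNIQUE
);

CREATE TABLE sales_order (
    order_id      INTEGER PRIMARY KEY,
    customer_id   INTEGER NOT NULL REFERENCES customer(customer_id),
    order_date    TEXT NOT NULL,          -- ISO-8601 date string
    total_amount  REAL NOT NULL CHECK (total_amount >= 0)
);
"""

conn = sqlite3.connect(":memory:")   # in-memory database for the example
conn.executescript(ddl)
print([row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")])
```

In a logical data model the same content would appear simply as the Customer and Order entity types, their attributes, and a "places" relationship; the physical model adds the storage-level decisions (keys, types, constraints).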
Data Modeling in the Context of Database Design

Database design is defined as: "design the logical and physical structure of one or more databases to accommodate the information needs of the users in an organization for a defined set of applications". The design process roughly follows five steps:

1. planning and analysis
2. conceptual design
3. logical design
4. physical design
5. implementation

The data model is one part of the conceptual design process; the other, typically, is the functional model. The data model focuses on what data should be stored in the database, while the functional model deals with how the data is processed. In the context of a relational database, the data model is used to design the relational tables, and the functional model is used to design the queries that will access and perform operations on those tables.

Data Model Components

The data model gets its inputs from the planning and analysis stage. Here the modeler, along with analysts, collects information about the requirements of the database by reviewing existing documentation and interviewing end users. The data model has two outputs. The first is an entity-relationship diagram, which represents the data structures in pictorial form; because the diagram is easily learned, it is a valuable tool for communicating the model to the end user. The second is a data document, or data dictionary, that describes in detail the data objects, relationships, and rules required by the database. The dictionary provides the detail the database developer needs to construct the physical database (a toy dictionary entry is sketched a little further below).

Managing data is challenging. Many efforts result in siloed information and fragmented views that damage competitiveness and increase costs. In the modern era of "big data," the best practice may be to create one central data repository with a uniform data governance architecture while allowing each business unit to own its data. The goal is to provide simple ways for both data scientists and non-technical users to explore, visualize, and interpret data to reveal patterns, anomalies, key variables, and potential relationships. Data governance and master data management (MDM) design are key to achieving this goal.

Recent developments in business intelligence (BI) aid in regulatory compliance and provide more usable, higher-quality data for smarter decision making and spending. Virtual master data management (Virtual MDM) utilizes data virtualization and a persistent metadata server to implement a multi-level automated MDM hierarchy.
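Returning to the data dictionary mentioned above, the sketch below shows one way such a document might be represented and used to generate schema documentation. The entries, types, and rules are hypothetical, matching the illustrative Customer entity sketched earlier.

```python
# Hypothetical data-dictionary entries; field names and rules are illustrative,
# not taken from any real system.
data_dictionary = {
    "customer": {
        "description": "A person or organization that purchases goods or services.",
        "attributes": {
            "customer_id": {"type": "INTEGER", "nullable": False,
                            "rule": "surrogate primary key"},
            "full_name":   {"type": "TEXT",    "nullable": False,
                            "rule": "legal or trading name"},
            "email":       {"type": "TEXT",    "nullable": True,
                            "rule": "must be unique when present"},
        },
        "relationships": ["one customer places zero or more sales_order rows"],
    },
}

# A developer (or a documentation generator) can read the dictionary
# to produce schema documentation or validate the physical design.
for entity, spec in data_dictionary.items():
    print(entity, "-", spec["description"])
    for attr, meta in spec["attributes"].items():
        print(f"  {attr}: {meta['type']} ({meta['rule']})")
```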
The benefits of MDM include:

● Improving business agility
● Providing a single trusted view of people, processes, and applications
● Allowing strategic decision making
● Enhancing customer relationships
● Reducing operational costs
● Increasing compliance with regulatory requirements

MDM helps organizations handle four key issues:

● Data redundancy
● Data inconsistency
● Business inefficiency
● Supporting business change
MDM provides processes for collecting, aggregating, matching, consolidating, quality-assuring, persisting, and distributing data throughout an organization to ensure consistency and control in the ongoing maintenance and application use of this information. MDM seeks to ensure that an organization does not use multiple (potentially inconsistent) versions of the same master data in different parts of its operations, and it addresses problems with data quality, consistent classification and identification of data, and data reconciliation. MDM solutions include source identification, data collection, data transformation, normalization, rule administration, error detection and correction, data consolidation, data storage, data distribution, and data governance. MDM tools include data networks, file systems, a data warehouse, data marts, an operational data store, data mining, data analysis, data virtualization, data federation, and data visualization. MDM requires an organization to implement policies and procedures for controlling how master data is created and used.

One of the main objectives of an MDM system is to publish an integrated, accurate, and consistent set of master data for use by other applications and users. This integrated set of master data is called the master data system of record (SOR). The SOR is the gold copy for any given piece of master data, and it is the single place in an organization where that master data is guaranteed to be accurate and up to date. Although an MDM system publishes the master data SOR for use by the rest of the IT environment, it is not necessarily the system where master data is created and maintained. The system responsible for maintaining any given piece of master data is called the system of entry (SOE). In most organizations today, master data is maintained by multiple SOEs. Customer data is an example: a company may have customer master data that is maintained by multiple Web storefronts, by the retail organization, and by the shipping and billing systems. Creating a single SOR for customer data in such an environment is a complex task (a simplified consolidation sketch appears below). The long-term goal of an enterprise MDM environment is to solve this problem by creating an MDM system that is not only the SOR for any given type of master data but also the SOE. MDM can then be defined as a set of policies, procedures, applications, and technologies for harmonizing and managing the system of record and systems of entry for the data and metadata associated with the key business entities of an organization.

Oracle is uniquely qualified to combine everything needed to meet the big data challenge, including software and hardware, into one engineered system. The Oracle Big Data Appliance is an engineered system that combines optimized hardware with a comprehensive software stack featuring specialized solutions developed by Oracle to deliver a complete, easy-to-deploy solution for acquiring, organizing, and loading big data into Oracle Database 11g. It is designed to deliver extreme analytics on all data types, with enterprise-class performance, availability, supportability, and security. With Big Data Connectors, the solution is tightly integrated with Oracle Exadata and Oracle Database, so you can analyze all your data together with extreme performance.
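To make the matching and consolidation step referred to above a little more concrete, the following stand-alone sketch merges customer records from several hypothetical systems of entry into a single golden record. The field names, the matching rule (a normalized e-mail address), and the survivorship rule (most recent non-null value wins) are illustrative assumptions, not the behavior of any particular MDM product.

```python
# A minimal, assumption-laden sketch: records from several systems of entry are
# matched on a normalized e-mail key and consolidated into one "golden" record
# per customer, keeping the most recently updated value for each field.
from datetime import date

source_records = [  # hypothetical extracts from three systems of entry
    {"source": "web_store", "email": "Ann@Example.com ", "name": "Ann Smith",
     "phone": None,        "updated": date(2017, 3, 1)},
    {"source": "billing",   "email": "ann@example.com",  "name": "A. Smith",
     "phone": "555-0100",  "updated": date(2017, 4, 2)},
    {"source": "retail",    "email": "ann@example.com",  "name": "Ann Smith",
     "phone": "555-0199",  "updated": date(2017, 1, 15)},
]

def match_key(record):
    """Simplistic matching rule: case-folded, trimmed e-mail address."""
    return record["email"].strip().lower()

golden = {}
for rec in sorted(source_records, key=lambda r: r["updated"]):
    key = match_key(rec)
    merged = golden.setdefault(key, {})
    for field in ("name", "phone"):
        if rec[field] is not None:      # survivorship: latest non-null value wins
            merged[field] = rec[field]
    merged["last_update"] = rec["updated"]

print(golden)   # one consolidated record, ready to publish as the system of record
```

In practice the matching and survivorship rules are governance decisions in their own right, which is one reason MDM and data governance are so tightly linked.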
In-Database Analytics

Once data has been loaded from Oracle Big Data Appliance into Oracle Database or Oracle Exadata, end users can use one of the following easy-to-use tools for in-database, advanced analytics:

● Oracle R Enterprise – Oracle's version of the widely used Project R statistical environment enables statisticians to use R on very large data sets without any modifications to the end-user experience. Examples of R usage include predicting airline delays at particular airports and the submission of clinical trial analyses and results.

● In-Database Data Mining – the ability to create complex models and deploy them on very large data volumes to drive predictive analytics. End users can leverage the results of these predictive models in their BI tools without needing to know how to build the models. For example, regression models can be used to predict customer age based on purchasing behavior and demographic data.

● In-Database Text Mining – the ability to mine text from micro-blogs, CRM system comment fields, and review sites, combining Oracle Text and Oracle Data Mining. An example of text mining is sentiment analysis based on comments: sentiment analysis tries to show how customers feel about certain companies, products, or activities.

● In-Database Semantic Analysis – the ability to create graphs and connections between various data points and data sets. Semantic analysis creates, for example, networks of relationships that determine the value of a customer's circle of friends. When looking at customer churn, a customer's value is then based on the value of his or her network rather than on the value of the individual customer alone.

● In-Database Spatial – the ability to add a spatial dimension to data and show data plotted on a map. This enables end users to understand geospatial relationships and trends much more efficiently. For example, spatial data can visualize a network of people and their geographical proximity; customers who are in close proximity can readily influence each other's purchasing behavior, an opportunity that can easily be missed if spatial visualization is left out.

● In-Database MapReduce – the ability to write procedural logic and seamlessly leverage Oracle Database parallel execution. In-database MapReduce allows data scientists to create high-performance routines with complex logic and can be exposed via SQL. Examples of leveraging in-database MapReduce are sessionization of weblogs or organization of Call Detail Records (CDRs), as sketched below.
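As a purely conceptual illustration of the sessionization use case just mentioned, the sketch below applies the map/shuffle/reduce pattern to a tiny, made-up weblog in plain Python. The log layout and the 30-minute inactivity gap are assumptions for the example; this is not Oracle's in-database MapReduce interface.

```python
# Conceptual map/shuffle/reduce over a tiny weblog: hypothetical log lines,
# with a 30-minute inactivity gap closing a session.
from collections import defaultdict

SESSION_GAP = 30 * 60  # seconds of inactivity that end a session

weblog = [                       # (user_id, unix_timestamp, url)
    ("u1", 1000, "/home"), ("u1", 1300, "/cart"),
    ("u1", 9000, "/home"),       # more than 30 minutes later: a new session
    ("u2", 1100, "/home"),
]

# Map: emit (user_id, (timestamp, url)) pairs.
mapped = [(user, (ts, url)) for user, ts, url in weblog]

# Shuffle: group values by key, as a MapReduce framework would.
groups = defaultdict(list)
for key, value in mapped:
    groups[key].append(value)

# Reduce: per user, split the time-ordered hits into sessions.
def reduce_sessions(hits):
    sessions, current, last_ts = [], [], None
    for ts, url in sorted(hits):
        if last_ts is not None and ts - last_ts > SESSION_GAP:
            sessions.append(current)
            current = []
        current.append(url)
        last_ts = ts
    sessions.append(current)
    return sessions

for user, hits in groups.items():
    print(user, reduce_sessions(hits))
# u1 -> [['/home', '/cart'], ['/home']]   u2 -> [['/home']]
```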
Hadoop (MapReduce, where code is turned into map and reduce jobs and Hadoop runs the jobs) is great at crunching data yet inefficient for analyzing data, because each time you add, change, or manipulate data you must stream over the entire data set. In most organizations data is always growing, changing, and being manipulated, so the time needed to analyze it increases significantly. As a result, for processing large and diverse data sets, ad-hoc analytics, or graph data structures, there must be better alternatives to Hadoop/MapReduce. Google, whose MapReduce work inspired Hadoop and one of the key innovators in large-scale architecture, thought so and architected a better, faster data-crunching ecosystem that includes Percolator, Dremel, and Pregel.

Percolator is a system for incrementally processing updates to large data sets. By replacing a batch-based indexing system with an indexing system based on incremental processing using Percolator, Google significantly sped up the process and reduced the time to analyze data. Percolator's architecture provides horizontal scalability and resilience, reduces indexing latency (the time between a page being crawled and its availability in the index) by a factor of 100, and simplifies the algorithm. The big advantage of Percolator is that indexing time is now proportional to the size of the page to index, no longer to the size of the whole existing index. See: http://research.google.com/pubs/pub36726.html

Dremel is for ad-hoc analytics. Dremel is a scalable, interactive ad-hoc query system for analysis of read-only nested data. By combining multi-level execution trees and a columnar data layout, it is capable of running aggregation queries over trillion-row tables in seconds, roughly 100 times faster than MapReduce. The system scales to thousands of CPUs and petabytes of data, allowing analysts to scan petabytes of data in seconds to answer queries. Dremel's architecture is similar to Pig and Hive, yet while Hive and Pig rely on MapReduce for query execution, Dremel uses a query execution engine based on aggregator trees. See: http://research.google.com/pubs/pub36632.html

Pregel is a system for large-scale graph processing and graph data analysis. Pregel is designed to execute graph algorithms faster and with simpler code; it computes over large graphs much faster than alternatives, and its application programming interface is easy to use.
Pregel is architected for efficient, scalable, and fault-tolerant implementation on clusters of thousands of commodity computers, and its implied synchronicity makes reasoning about programs easier. Distribution-related details are hidden behind an abstract API. The result is a framework for processing large graphs that is expressive and easy to program. See: http://kowshik.github.com/JPregel/pregel_paper.pdf

The Economist Intelligence Unit surveyed over 600 business leaders across the globe and across industry sectors about the use of Big Data in their organizations. The research confirms a growing appetite for data and data-driven decisions; those who harness them correctly stay ahead of the game. The report provides insight into the use of Big Data today and in the future, and highlights the advantages seen and the specific challenges Big Data poses for decision making by business leaders.

Key findings:

● 75% of respondents believe their organizations to be data-driven.
● 9 out of 10 say decisions made in the past 3 years would have been better if they had had all the relevant information.
● 42% say that unstructured content is too difficult to interpret.
● 85% say the issue is not about volume but the ability to analyze and act on the data in real time.
● More than half (54%) of respondents cite access to talent as a key impediment to making the most of Big Data, followed by the barrier of organizational silos (51%).
● Other impediments to effective decision making are lack of time to interpret data sets (46%) and difficulty managing unstructured data (39%).
● 71% say they struggle with data inaccuracies on a daily basis.
● 62% say there is an issue with data automation, and not all operational decisions have been automated yet.
● Half will increase their investments in Big Data analysis over the next three years.

The report reveals that nine out of ten business leaders believe data is now the fourth factor of production, as fundamental to business as land, labor, and capital. The study, which surveyed more than 600 C-level executives, senior managers, and IT leaders worldwide, indicates that the use of Big Data has improved businesses' performance, on average, by 26 percent and that the impact will grow to 41 percent over the next three years. The majority of companies (58 percent) claim they will make a bigger investment in Big Data over the next three years.

Approximately two-thirds of the 168 North American (NA) executives surveyed believe Big Data will be a significant issue over the next five years, and one that needs to be addressed so the organization can make informed decisions. They consider their companies "data-driven," reporting that the collection and analysis of data underpins their firm's business strategy and day-to-day decision making. Fifty-five percent are already making management decisions based on "hard analytic information." Additionally, 44 percent indicated that the increasing volume of data collected by their organization (from both internal and external sources) has slowed down decision making, but the vast majority (84 percent) feel the larger issue is being able to analyze and act on it in real time.

The exploitation of Big Data is fueling a major change in the quality of business decision making, requiring organizations to adopt new and more effective methods to obtain the most meaningful, value-generating results from their data.
Organizations that do so will be able to monitor customer behaviors and market conditions with greater certainty, and react with speed and effectiveness to differentiate themselves from their competition.