
Guest View: Old thinking does a disservice to new data hubs




October 15, 2009
Master Data Management (MDM) deals with master data: the data sets that are most widely shared across an enterprise and most critical to meeting its goals. Because master data are the essential core data of the enterprise, they have to be accurate. Inconsistent master data can expose an enterprise to significant risk.

Over the past decade, data hubs have become a popular and evolving architectural construct for MDM and other enterprise data management solutions. Yet in my travels, I’m amazed that so many IT professionals still aren’t clear in their understanding of data hubs and their capabilities. The term is often used, inaccurately, as a synonym for the more traditional operational data stores of the 1980s and 1990s. This is quite maddening, because misusing the term obscures the modern design options for Enterprise Data Management (EDM) and MDM solutions that data hubs enable.

There are some key characteristics and features of data hubs that often are underestimated or misunderstood by enterprise architects and systems integrators. Here are two of the most common misconceptions:

Misconception 1: Data must be cleansed and standardized before it is loaded into the data hub.

For many professionals brought up on the concepts of operational data stores, data warehouses and ETL (extract, transform and load), this is an indisputable truth: data must be cleansed before the inbound processes load it into the hub. Under this principle, a data hub is just another data repository or database used to store cleansed data content, oftentimes as a source for building data warehousing dimensions.

In reality, today's data hubs take a much more active approach to data than simply storing a golden record. The data hub makes the best decisions on entity and relationship resolution by arbitrating the content of data created or modified in the source systems. Expressed differently, a data hub operates as a “service” responsible for the creation and maintenance of master entities and relationships.

The data hub as the enterprise Master Data Service (MDS) applies the power of advanced algorithms and human input to resolve entities and relationships in real time. In addition, data governance policies and data stewardship rules and procedures define and mandate the behavior of MDS, including the release of reference codes and code translation semantics for enterprise use.
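To make the contrast with load-after-cleansing concrete, here is a minimal sketch of a hub that accepts raw source records and does the matching itself. Everything in it is an assumption for illustration: the `SourceRecord` fields, the `MasterDataService` class, and the toy rule that treats records with the same normalized email as one entity. Real hubs use far richer matching algorithms and stewardship workflows.

```python
from dataclasses import dataclass, field


@dataclass
class SourceRecord:
    system: str      # name of the source system (hypothetical)
    record_id: str   # identifier inside that source system
    name: str
    email: str


def match_key(rec: SourceRecord) -> str:
    """Toy matching rule: normalization happens inside the hub, not before load."""
    return rec.email.strip().lower()


@dataclass
class MasterDataService:
    # master entity id -> source records linked to that entity
    entities: dict = field(default_factory=dict)
    # match key -> master entity id
    _by_key: dict = field(default_factory=dict)

    def submit(self, rec: SourceRecord) -> str:
        """Accept an uncleansed source record and return its master entity id."""
        key = match_key(rec)
        entity_id = self._by_key.get(key)
        if entity_id is None:
            # No existing entity matches, so the hub creates a new one.
            entity_id = f"E{len(self.entities) + 1}"
            self._by_key[key] = entity_id
            self.entities[entity_id] = []
        self.entities[entity_id].append(rec)
        return entity_id


mds = MasterDataService()
a = mds.submit(SourceRecord("crm", "17", "J. Smith", " John.Smith@Example.com "))
b = mds.submit(SourceRecord("billing", "A-9", "John Smith", "john.smith@example.com"))
c = mds.submit(SourceRecord("crm", "18", "Ann Lee", "ann@example.com"))
```

In this sketch the two John Smith records arrive in different shapes and still resolve to the same entity, while the source systems keep their own content unchanged.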

The data hub as MDS provides an ideal way for managing data within a SOA environment. Using a hub-and-spoke model, the MDS serves as the integration method to communicate between all systems that produce or consume master data. The MDS is the hub, and all systems communicate directly with it using SOA principles.

Participating systems are autonomous in SOA parlance, meaning that they can stay independent of one another and do not have to know the details of how other systems manage master data. This allows disparate system-specific schemas and internal business rules to be hidden, which greatly reduces tight coupling and the overall brittleness of the ecosystem. It also helps to reduce the overall workload that participating systems must bear to manage master data.

Misconception 2: The golden record must be persisted in the data hub.

The notion of a data hub as a data repository presumes that the golden record must be persisted in the data hub. The notion of the data hub as a service makes no such presumption. As long as the master data service can deliver the golden record to the enterprise, the decision is left open: the data hub can persist the golden record or assemble it dynamically on request.

One of the arguments for a persistently stored golden record is that retrieval performance will suffer if the record is assembled dynamically on request. In practice, existing data hub solutions have demonstrated that a dynamic golden record can be assembled with practically no performance impact.

One of the advantages of dynamically assembled records is that the data hub can maintain multiple views of a golden record aligned with line-of-business and functional requirements, data visibility requirements, tolerance to false positives and negatives, and latency requirements. Mature enterprises increasingly require multiple views for the golden record, and the dynamic record assembly works better to support this need.
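Dynamic assembly with more than one view can be sketched as follows. This is an illustration under stated assumptions, not a description of any product: the linked records, the view names `marketing` and `finance`, and the per-attribute source-priority rule are all hypothetical.

```python
# Source records already linked to one master entity (hypothetical data).
LINKED = [
    {"system": "crm", "name": "J. Smith", "phone": "555-0100"},
    {"system": "billing", "name": "John Smith", "phone": None},
]

# Each view lists, per attribute, which source systems to trust, in order.
VIEWS = {
    "marketing": {"name": ["crm", "billing"], "phone": ["crm", "billing"]},
    "finance": {"name": ["billing", "crm"], "phone": ["billing", "crm"]},
}


def assemble(records, view):
    """Build a golden record on request; nothing is stored by this function."""
    by_system = {r["system"]: r for r in records}
    golden = {}
    for attr, priority in VIEWS[view].items():
        # Take the first non-missing value in the view's priority order.
        golden[attr] = next(
            (by_system[s][attr] for s in priority
             if s in by_system and by_system[s][attr] is not None),
            None,
        )
    return golden


marketing_view = assemble(LINKED, "marketing")
finance_view = assemble(LINKED, "finance")
```

Adding another view here means adding an entry to `VIEWS`; no stored golden record has to be rebuilt or duplicated.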

Another argument oftentimes brought up in favor of a persistently stored golden record comes from the need to support the history of the golden record. Indeed, history support for master data is critical. There exist two major usage patterns for the history of master data.

The first pattern is driven by audit requirements. The enterprise needs to be able to understand the origin, the time and possibly the reason for a change. These audit needs must be supported by the data hub at the attribute level. MDM solutions that maintain the golden record dynamically address this need by keeping a history of changes to the record content of the source systems.
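A minimal sketch of attribute-level history might look like the following. The `AttributeChange` and `ChangeLog` names and their fields are assumptions chosen to mirror the origin, time and reason mentioned above; they do not come from any particular MDM product.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class AttributeChange:
    system: str                    # origin: which source system made the change
    record_id: str                 # record inside that source system
    attribute: str                 # which attribute changed
    old_value: Optional[str]
    new_value: Optional[str]
    changed_at: datetime           # time of the change
    reason: Optional[str] = None   # reason, when one is known


class ChangeLog:
    """Append-only log of changes to source record content (hypothetical)."""

    def __init__(self):
        self._changes = []

    def record(self, change):
        self._changes.append(change)

    def history(self, system, record_id, attribute):
        """Return the changes to one attribute of one source record, oldest first."""
        hits = [
            c for c in self._changes
            if (c.system, c.record_id, c.attribute) == (system, record_id, attribute)
        ]
        return sorted(hits, key=lambda c: c.changed_at)


log = ChangeLog()
log.record(AttributeChange("crm", "17", "phone", "555-0100", "555-0199",
                           datetime(2009, 6, 2, 9, 30), "customer request"))
log.record(AttributeChange("crm", "17", "phone", None, "555-0100",
                           datetime(2009, 1, 15, 14, 0)))
log.record(AttributeChange("crm", "17", "name", "J Smith", "J. Smith",
                           datetime(2009, 3, 1, 8, 0)))
phone_history = log.history("crm", "17", "phone")
```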

The second usage pattern for history support results from the need to support database queries on data referring to a certain point in time or certain time range, such as what was the inventory on a certain date, or sales over the second quarter. A classic example of this type of history support is the management of slowly changing dimensions in data warehousing. In order to support this usage pattern, the golden version of the master record must be persisted. It is just a question of location. Many enterprises decide this question in favor of data warehousing dimensions while avoiding the persistently stored golden record in the data hub.
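The point-in-time pattern can be illustrated with a small sketch of a versioned dimension, in the style commonly called a type 2 slowly changing dimension. The table contents, the column names `valid_from` and `valid_to`, and the `as_of` helper are hypothetical and stand in for whatever the data warehouse actually uses.

```python
from datetime import date

# Each row is one persisted version of a master record, valid over
# [valid_from, valid_to). A valid_to of None marks the current version.
CUSTOMER_DIM = [
    {"customer": "C1", "segment": "retail",
     "valid_from": date(2008, 1, 1), "valid_to": date(2009, 4, 1)},
    {"customer": "C1", "segment": "wholesale",
     "valid_from": date(2009, 4, 1), "valid_to": None},
]


def as_of(rows, customer, when):
    """Return the version of `customer` that was valid on `when`, or None."""
    for row in rows:
        if row["customer"] != customer:
            continue
        starts_on_or_before = row["valid_from"] <= when
        still_open = row["valid_to"] is None or when < row["valid_to"]
        if starts_on_or_before and still_open:
            return row
    return None


before_change = as_of(CUSTOMER_DIM, "C1", date(2009, 3, 31))
on_change_day = as_of(CUSTOMER_DIM, "C1", date(2009, 4, 1))
too_early = as_of(CUSTOMER_DIM, "C1", date(2007, 1, 1))
```

A query of this kind needs the historical versions to exist somewhere, which is why persistence is required for this pattern even when the hub itself assembles its golden record dynamically.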

Modern data hubs function as active components of service-oriented architecture and master data services, rather than passive repositories of cleansed data. Keeping this in mind should help enterprise architects and systems integrators build sound master data management solutions.

Larry Dubov is senior director of business management consulting at Initiate Systems, and is co-author of “Master Data Management and Customer Data Integration for a Global Enterprise,” McGraw-Hill, 2007.