Wednesday, 17 November 2010

Does Microsoft SQL Server Denali put us in denial about the ODS?

Microsoft have provided a first taste of the latest version of SQL Server, codenamed "Denali". Denali focuses on mission-critical systems, scaleability, multi-site clustering and high availability, with facilities for data integration and data management tools to reliably deliver robust data. Denali also offers advanced data visualisation, supported by column-based accelerations in query performance. Given these technological advancements in database management in Microsoft (as well as other technologies), what's the future of the ODS? 

Historically, mainframes could not offer quick reporting, and the data warehouse was subject-oriented. Thus, another reporting provision mechanism was required, that was situated between the source transactional systems and the final data warehouse. Data warehouses caused issues for database management, since they were required to balance small queries with large queries involving millions of records. This reporting needed to be abstracted away from the silo-based source systems, so integration was required; thus, the ODS was born. 

There are different methodologies surrounding the architecture of an ODS. Inmon defines the ODS as an integrated source of data, which offers an 'as-is' status of the business, holding only the most recent transactions. 'Recent' does not mean data that is only a few days old; the ODS may be feeding into a summarised data warehouse, and may need a full month's data in order to come up with the final summary monthly data item required for the summary data warehouse. On the other hand, Kimball defines the ODS as the structure that stores detailed transaction data, which is not present in the data warehouse. However, as technological advances are made in database engines such as SQL Server, the transaction detail can be stored in the data warehouse, and can be interrogated by business users. This means that the place of the ODS can look uncertain.

In my experience, I've found that operational source systems are becoming more sensitive to the need for reporting. One example here is the Cisco Contact Centre call management database, which is based on SQL Server, which feeds into a dedicated reporting system. If Operational data sources are becoming more accommodating to serving reporting architectures, and technological advancements have been made in database management, what is the future for the ODS? Particularly, with the advent of more streamlined ETL applications such as SSIS offering lower-latency data delivery to users.

So where does the ODS fit in? Before the detailed transactional data can be placed into the data warehouse, it needs to be rationalised and integrated. Whilst stored in the ODS, the current operational data can be cleansed, integrated from its source systems, and steps can be taken towards redundancy resolution. Further, data in the ODS can be investigated for compliance with business rules.

However, an ODS should not, as Jill Dyche calls it, 'compost' data. In other words, the data sources have been pretty much copied into a data store with no concerted effort at integrating the data. Thus, you could have ten distinct 'customer' tables, from different sources, and have these ten tables copied directly into the data store with minimal - if any - integration taking place. This type of scenario is more akin to a staging area, than a dedicated ODS. However, here, the ODS starts to look like a data warehouse, since the ODS also has complex multiple data sources and dedicated to a number of subject areas. 

Instead, the ODS should be viewed as an interim logical step towards the data warehouse. At first glance, it may seem that the only difference between the ODS and the data warehouse is that the ODS does not store history, whereas the data warehouse does store history. This is not necessarily the case. The key to the ODS is current operational data. The ODS could potentially provide rapid access to current operational data, whereas the enterprise data warehouse may not get refreshed until overnight. How much 'history' is stored depends on user requirements, although I suspect that most business users would say 'we need the data right now, and we want to keep it forever' if they were consulted! This means that data could be interrogated for analysis and reporting purposes whilst the business operations are ongoing, and before the data is transferred to the data warehouse for longer-term reporting, data mining, or perhaps archiving.

The ODS can potentially be used to farm data back to source systems. However, it is optimised by having a Master Data Management system for as a robust library of reference data. Although Master Data Management has been around for some time, it has become more relevant with its specific inclusion with SQL Server 2008 R2. MDM is concerned with data quality and consistency, and the reconciliation of any data issues. Specific MDM implementations can depend on the environment. For example, the ODS and data warehouse can be supported by a Customer Data Integration module, or CDI, which is a 'golden' source of the enterprise customer data. The CDI can be viewed as a library or 'hub' for faciliating the consolidation and provision of good quality customer data across the enterprise, coming from various operational sources. An ODS and a CDI are complementary to one another; the CDI can have dedicated algorithms to perfect the customer data, the result of which can be served to the ODS. The CDI is not a facility for serving up reporting; this is the job of the ODS and the data warehouse. Further, the ODS should be viewed as a facility for integrating sources of different types of current operational data across the enterprise, but the CDI could be viewed as a nimble younger sibling of MDM, dedicated to customer information only.

With MDM, and potentially CDI in place, the ODS can offer data up more quickly since it has a good basis to offer up data, since the integration can happen more quickly. If the CDI and MDM are properly architected, then their offerings of good quality integrated data can result in low latency data served to the ODS, thus accelerating the process of provisioning data to users. 

To summarise, what's the future of the ODS? With the advent of Denali, it is hoped that an optimised user experience is available to all, from DBA right through to operational users supported by MDM and data warehouse, with the ODS in a clearly-defined role. The key to an ODS is in integration; a well-integrated, current operational data store, where data is cleansed, governed and made compliant before it enters a data warehouse. If it isn't integrated, its shelf-life is most likely minimal since the business users won't be served properly, and may start to export data into Excel for their own slice and dicing. The ODS needs to be carefully considered, along with user requirements; if not, it is in danger of becoming a 'political football', and the business questions and needs ultimately not being answered.
Post a Comment