The big change in data warehousing?

What is the new big change in data warehousing?

Traditional MI systems rely on data stored in a database that resides on spinning disk; this data might have come from an ERP but could also come from other sources. The data has to be collected, transformed and stored on disk using a file system. This means there is a disconnect between data creation, storage and reporting, as reports are based on snapshots of the data taken last month, last week or, in the best cases, last night.

When data is required by the MI system, the database searches through its indexes and retrieves the data from disk. As data volumes grow, the file systems become bigger and bigger, which means that searching and selecting in the database takes longer.

Therefore we can see that current MI solutions have two big drawbacks: a lack of synchronisation with reality, and the management of large data volumes.

Multi-terabyte MI systems are becoming the norm rather than the exception, and frequent data refreshes are the only way to support an ever more demanding user community. So are the current MI systems doomed?

Thankfully the answer is no. Thanks to new technologies and new architectures coming to light, we will move towards Business Analytics Engines; SAP has named its offering HANA (High-Performance Analytic Appliance). In short, HANA is a data-warehouse appliance geared to process high volumes of operational and transactional data in real time using in-memory analytics. The key words of HANA are:

  • Appliance: HANA is a bundle of several technologies on a specific hardware platform. It includes ETL (Sybase Replication Server and BusinessObjects Data Services), a database and database-level modelling tools, and reporting interfaces (SQL, MDX, and possibly bundled BusinessObjects BI reporting tools).
  • High volumes: As said above, HANA requires extra hardware, so the additional cost will only make sense for high-volume solutions where traditional storage methods struggle with performance.
  • Real-time: No more extract, load and report. Reports run against current data, not a picture of the past.
  • In-memory: This is one of the key aspects of HANA. Data is column-indexed and stored in memory. Memory capacity has doubled roughly every two years (Moore's law) while prices have more than halved at each iteration, providing large amounts of memory at an affordable cost.
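The difference column storage makes for analytics can be sketched in a few lines. This is an illustrative example, not HANA code: the same small sales table held row-wise and column-wise, with an aggregate that in the columnar layout only needs to read the one column it touches.

```python
# Illustrative sketch (not HANA internals): row store vs column store.

# Row store: each record is kept together; summing one field still
# walks over every whole record.
rows = [
    {"customer": "A", "region": "EU", "amount": 120.0},
    {"customer": "B", "region": "US", "amount": 75.5},
    {"customer": "C", "region": "EU", "amount": 200.0},
]
row_total = sum(r["amount"] for r in rows)

# Column store: each column is a contiguous array; an aggregate reads
# only the column it needs, which scans faster and compresses well,
# so more data fits in a given amount of memory.
columns = {
    "customer": ["A", "B", "C"],
    "region":   ["EU", "US", "EU"],
    "amount":   [120.0, 75.5, 200.0],
}
col_total = sum(columns["amount"])

print(row_total, col_total)  # same answer, different access pattern
```

Both layouts return the same total; what changes is how much data the engine has to touch to get it.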

The key to this change is the move to in-memory computing, which is much faster than reading from disk; the move from hard disk to memory is comparable to the move from tape to hard disk. SAP claims up to 3,600× faster reporting. This is a huge claim, but reporting hundreds of times faster should be easily achievable.
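A back-of-envelope calculation shows why claims of this scale are plausible for random access. The latency figures below are typical published ballpark numbers for spinning disks and DRAM, not SAP benchmarks:

```python
# Back-of-envelope only: typical order-of-magnitude latencies,
# not measurements from any specific system.
DISK_SEEK_S = 10e-3    # ~10 ms for a random read on a spinning disk
RAM_ACCESS_S = 100e-9  # ~100 ns for a random read from main memory

speedup = DISK_SEEK_S / RAM_ACCESS_S
print(f"A random memory access is roughly {speedup:,.0f}x faster than a disk seek")
```

Real workloads mix sequential and random access, so end-to-end report speedups land well below this raw ratio, which is consistent with "hundreds of times faster" being the realistic expectation.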

So the question is not what the new big change in data warehousing is, but how is it going to affect me?