What is a data warehouse?

Trends and buzzwords often come and go in the IT business. But a concept that’s becoming very popular and likely to stick is that of the ‘Data Warehouse’.

Business Intelligence – it’s not an oxymoron
The whole point of a data warehouse is to facilitate a very important process for upper management, the idea of gathering intelligence or information. The more intelligence or relevant information you can get into managements hands the more likely they are to make sound decisions that improve the value and revenue of the company (and hopefully improve your paycheck in the process!)

It can’t just be any information however, the key is for this information to be relevant to the kinds of decisions they’re making. As a simple example, if your company runs an online store they don’t want server logs, you want to be finding things like the average time a user spends on the store, the number of products browsed before buying, or maybe the percentage of people who abandon the checkout process after viewing the total price.

These can all lead to important business questions like ‘How use friendly is our store?’ or ‘Are our shipping prices costing us business?’

The key is that the data is aggregate and analysed. It’s not just raw data, and it’s not specific data. Instead it shows trends that are relevant and interesting to the people making business decisions.

It’s not just a base, it’s a warehouse
So management knows they want a Data Warehouse, no more making decisions in the dark. We’ll actually listen to our users. What’s the deal for technologists?

The setup tends to be that the databases continue to operate as normal, and in addition will now feed their data to the data warehouse. It’s a one way transfer, nothing useful is going to come back straight away that the application will need to operate, and in fact the applications would do just fine without any data warehousing at all.

Information can be said to move into warehousing in three key layers.
Staging – Information from all databases is stored in raw format. This is often useful for developers and application support personnel. It also saves additional potentially CPU intensive but operationally irrelevant queries from being made on the live databases.
Integration – This layer provides a level of abstraction between the Staging and Access layers.
Access – The data has been filtered, analysed and is now ready for further analysis.

Once data is at the access layer further analysis can be done by statistics, business system engineers, or it could already be broken down into easy to understand metrics like ‘sales per day’ and simply need graphing. The application used to display this information is often called a Dashboard, as if the business group are driving an organisation like they’d drive a car.

You say Tomato, I say Tomato
One issue that almost everyone likely ends up with is that the same data is stored in different formats depending on the system. For example gender might be ‘M’ of ‘F’ in one system, but ‘Male’ or ‘Female’ in another. This is definately one of the benefits of data warehousing, that the organisations information can be standardised in readiness for aggregate analysis. Believe it or not there’s also an international standard for storing data, ideally everyone would use it.

Benefits

  • The information is under control of the data warehouses so that, even if the information in the individual systems is deleted, data can still be viewed for as far back as required. This could also be very useful when dealing with compliance, for example in Australia where under the Archives Act information must be kept for a minimum of 7 years.
  • Because they are separate from the operational systems, the crunching of data in data warehouses incurs no loss of penalty to the actual applications.
  • Data Warehouses can record historical information even for tables that were not originally designed to have their history recorded.
  • Data is standardised, streamlining the analytical process.
  • If designed and analysed properly, management and executives have the data to enhance the decision making process.

And looking into the future I see:
– Super Data Warehouses (the sharing of Data Warehouses from multiple organisations).
– Automatic analysis via evolutionary algorithms and neural networks.
– Important information such as prices and specification metrics decided by algorithms on the back of data from data warehouses.

Did you like this article? Check out the free task management tool I’m developing at teamsgo.com. You’ll love the simplicity, and your colleagues will love how much more orgnised you are. http://www.teamsgo.com

Comments are closed.