The future of the operational data warehouse

In the last five years, we’ve seen the cloud data warehouse, exemplified by Snowflake and BigQuery, become the dominant tool for large and small businesses that need to combine and analyze data. The initial use cases are usually classic decision support. What is my revenue? How many customers do I have? How are these metrics changing and why?

But the iron law of databases is data attracts workloads. When you have all of your data in one place, clever people in your team will come up with unexpected uses for it. The cloud data warehouse enables these new use cases with its elasticity. As you discover new things you’d like to do with data, you can add new compute capacity, effectively without limit.

However, these new workloads often don’t look like the classic analytical queries that data warehouses are optimized for. For the last 20 years, commercial data warehouses have been optimized for handling a small number of large queries that scan entire tables and aggregate them into summary statistics. They are well-optimized for questions like:

How many new customers did I add, in each state, in each month, for the last year?

But they are less-well optimized for questions like:

What are all the interactions I have had with one particular customer?

These queries require many data sources to be in one place, but they touch only a small percentage of data from any particular source. They have both analytical and operational characteristics, and they are typical of the new workloads we see as cloud data warehouses have become ubiquitous.

The major data warehouse vendors are making changes to better support these types of queries. Snowflake recently released the search optimization service, which allows you to have indexes in your data warehouse. Indexes are ubiquitous in operational databases, but in the past most data warehouses did not support them, because they were thought to be irrelevant to analytical workloads. Meanwhile, BigQuery has released BI Engine, which allows you to store a subset of your database in-memory for faster access.

Copyright © 2021 IDG Communications, Inc.

Source link

Leave a Comment

Your email address will not be published. Required fields are marked *