How to choose a cloud data warehouse

Enterprise data warehouses, or EDWs, are unified databases for all historical data across an enterprise, optimized for analytics. These days, organizations implementing data warehouses often consider creating the data warehouse in the cloud rather than on premises. Many also consider using data lakes that support queries instead of traditional data warehouses. A third question is whether you want to combine historical data with streaming live data.

A data warehouse is an analytic, usually relational, database created from two or more data sources, typically to store historical data, which may have a scale of petabytes. Data warehouses often have significant compute and memory resources for running complicated queries and generating reports, and are often the data sources for business intelligence (BI) systems and machine learning.

The write throughput requirements of transactional operational databases limit the number and kind of indexes you can create (more indexes mean more writes and updates per record added, and more possible contention). This in turn slows down analytic queries against the operation database. Once you have exported your data into a data warehouse, you can index everything you care about in the data warehouse for good analytic query performance, without affecting the write performance of the separate OLTP (online transaction processing) database.

Data marts contain data oriented toward a specific business line. Data marts may be dependent on the data warehouse, independent of the data warehouse (i.e., drawn from an operational database or external source), or a hybrid of the two.

Data lakes, which store files of data in its native format, are essentially “schema on read,” meaning that any application that reads data from the lake will need to impose its own types and relationships on the data. Traditional data warehouses, on the other hand, are “schema on write,” meaning that data types, indexes, and relationships are imposed on the data as it is stored in the data warehouse.

Modern data warehouses can often handle structured data and semi-structured data and query them simultaneously. In addition, modern data warehouses can often query historical data and streamed recent data simultaneously.

Source link

Leave a Comment

Your email address will not be published. Required fields are marked *