Amazon Web Services
Use AWS infrastructure with Teradata Vantage
A focused repository with a unique point of view.
The data mart is a subject-oriented slice of the data warehouse logical model, serving a narrow group of users. Many only need a subset of data from the full tables in the data warehouse. For example, a mart may only have sales transactions, products, and inventory records. Most only have 5-20 tables instead of 4,000.
The number of tables in a data mart has no relevance to the size of the database. The main tables–called fact tables–can be 100s of terabytes of call detail records for a telecommunications company, for instance. The data mart itself can be huge, but it is a narrow selection of all the data available in the data warehouse.
Data marts solutions are often denormalized, capturing only summaries of data by sorting it and aggregating a result table, usually throwing away detail data. Some are completely reloaded weekly or monthly; it is relatively easy to delete all the data and refresh it so that reports only look at the last 30 days of transactions.
Star schema
Data marts and the star schema are inextricably linked. Imagine rows and columns of data in five spreadsheets. Four of the spreadsheets are connected via key fields that match the largest sheet called the fact table. Imagine the fact table is 50 million records, which really doesn’t fit in a spreadsheet, so they are stored in data mart tables. Most have 5-10 tables in this star schema design, and the small tables on the star arms are called dimension tables.
Snowflake schema
Dimensions are small tables with vital information. The fact table is where the bulk of the data is stored, maybe billions of records, and can be joined to the customer table to get actual customer name and address fields. A variation, the snowflake schema, has more than multiple fact tables linked together by key fields. Each fact table has only four or five dimension tables; a diagram of the set of tables and relationships resembles a snowflake–but is still a data mart.
The distinction between data marts and data warehouses is about subject areas and integration, separated by schema complexity, not database size. That’s why the questions that can be asked are 100 times more sophisticated with a data warehouse than with a data mart because all the data is in the warehouse.
There are many “puzzle pieces” in the data warehouse, all the integrated tables grouped into subject areas. The data warehouse doesn’t have to be huge; it can be only five terabytes of storage. Or, it could have hundreds of terabytes of records. An alternative is storing three huge tables in a data mart.
Loading data
There’s no shortage of tools for data integration, or databases to hold the data, or even tools for analyzing the data. Moving small amounts of data to business intelligence (BI) tools for reports or dashboards isn’t labor intensive. The real cost is in moving and transforming data.
Moving data
The integration step is intense in terms of labor and compute processing, but the alternative is giving business users broken, incomplete, or inaccurate data— and that sends users back to spreadsheets. That’s why data warehouse data is valuable, and why data mart proliferation is risky.
Teradata Vantage is the platform for pervasive data intelligence. Its cloud-friendly architecture is designed to tap into 100% of a company’s data no matter where it exists. The Teradata services team is well-versed in helping companies get the answers they need from the data they have.
Amazon Web Services
Use AWS infrastructure with Teradata Vantage
Microsoft Azure
Combine Azure resources with Teradata Vantage
Google Cloud
Leverage Google Cloud with Teradata Vantage