The process involves collecting and analyzing large sets of data from varied data sources: databases, supply chains, personnel records, manufacturing data, sales and marketing campaigns, and more. The data itself might be stored in internal data warehouses, private clouds or public clouds, and the engineering involved in extracting and processing the data (ETL) has given rise to a number of technologies, both proprietary and open source.As with the previous use cases outlined here, the ELK Stack comes in handy for pulling data from these varied data sources into one centralized location for analysis. For example, we might pull web server access logs to learn how our users are accessing our website, We might tap into our CRM system to learn more about our leads and users, or we might check out the data our marketing automation tool provides.
The authors begin with fundamental design recommendations and gradually progress step-by-step through increasingly complex scenarios. Clear-cut guidelines for designing dimensional models are illustrated using real-world data warehouse case studies drawn from a variety of business application areas and industries, including:
By the end of the book, you will have mastered the full range of powerful techniques for designing dimensional databases that are easy to understand and provide fast query response. You will also learn how to create an architected framework that integrates the distributed data warehouse using standardized dimensions and facts.
Ralph Kimball invented a data warehousing technique called \"dimensional modeling\" and popularized it in his first Wiley book, The Data Warehouse Toolkit. Since this book was first published in 1996, dimensional modeling has become the most widely accepted technique for data warehouse design. Over the past 5 years, Kimball has improved on his earlier techniques and created many new ones. In this second edition, he provides a comprehensive collection of all of these techniques, from basic to advanced.
This chapter explores the genesis of data growth and explains the need for a new paradigm in data architecture. This chapter starts by examining the predominant paradigm, the enterprise data warehouse, popular in the 1990s and 2000s. It explores the challenges associated with this paradigm and then covers the drivers that caused an explosion in data. It further examines the rise of a new paradigm, the data lake, and its challenges. Furthermore, this chapter ends by advocating the need for a new paradigm, the data lakehouse. It clarifies the key benefits delivered by a well-architected data lakehouse.
As shown in Figure 1.1, the pattern entailed source systems composed of databases or flat-file structures. The data sources are predominantly structured, that is, rows and columns. A process called Extract-Transform-Load (ETL) first extracts the data from the source systems. Then, the process transforms the data into a shape and form that is conducive for analysis. Once the data is transformed, it is loaded into an EDW. From there, the subsets of data are then populated to downstream data marts. Data marts can be conceived of as mini data warehouses that cater to the business requirements of a specific department.
According to the International Data Corporation (IDC), by 2025, the total data volumes generated will reach around 163 ZB (zettabytes), that is, a trillion gigabytes. In 2010, that number was approximately 0.5 ZB. This exponential growth of data is attributed to a vast improvement in internet technologies that have fueled the growth of many industries. The telecommunications industry was the major industry that was transformed. This, in turn, transformed many other industries. Data became ubiquitous and every business craved more data bandwidth. Social media platforms started to be used as well. The likes of Facebook, Twitter, and Instagram flooded the internet space with more data. Streaming services and e-commerce also generated tons of data. This generated data was used to forge and influence consumer behaviors. Last, but not least, the technological leaps in the Internet of Things (IoT) space generated loads of data. 153554b96e