Data pipelines are the foundation for success in data analytics. Moving data from numerous diverse sources and transforming it to provide context is the difference between merely having data and actually gaining value from it. This pocket reference defines data pipelines and explains how they work in the modern data stack.
You’ll learn common considerations and key decision points for implementing pipelines, such as batch versus streaming data ingestion and build versus buy. This book addresses the most common decisions made by data professionals and discusses foundational concepts that apply to open source frameworks, commercial products, and homegrown solutions.
Table of contents
1. Introduction to Data Pipelines
2. A Modern Data Infrastructure
3. Common Data Pipeline Patterns
4. Data Ingestion: Extracting Data
5. Data Ingestion: Loading Data
6. Transforming Data
7. Orchestrating Pipelines
8. Data Validation in Pipelines
9. Best Practices for Maintaining Pipelines
10. Measuring and Monitoring Pipeline Performance