Understanding Data
Chapter 3 describes the sources and nature of data in some detail: how data is generated and what affects its quality. Data may originate from transaction systems, sensors, social media sites, or crowdsourcing. It comes in different forms, such as structured and unstructured data; it may be updated in real time or periodically; and its accuracy may vary. The nature of the source, whether a person or a device, helps determine the type of processing and analysis that will be conducted. The method of collection, whether a survey, a sensor, or a third-party aggregator, also affects the completeness and representativeness of the data, which in turn affects the reliability of any insights drawn from it. Chapter 4 examines the destination of data: data pipelines, data warehousing, and modern data infrastructure. Data pipelines are the frameworks that enable raw data to flow from its source to its destination, often requiring multistage transformations, cleaning, a...