9.3. Data Blending: Behind the Scenes - Theory

Data blending combines data from different data sources to generate a single, unified data set. In this lesson, we'll focus on the left join method, which retains all the records from the left data source and combines them with matching records from the right data source based on a join key or condition.

9_page-0003.jpg

Left Join

Imagine two circles, each representing a data source. The left circle represents the left data source, and the right circle represents the right data source. Data blending using left join works by taking all the records from the left data source and combining them with matching records from the right data source based on the join key or condition.

If there is data in the right data source that doesn't match anything in the left data source, it will be left out of the blended data set.

Example:

Consider the following example: you have two tables, one with dates and clicks from an ad platform (left data source) and another with dates and offline sales from a CRM (right data source).

To blend the data, you'll use the "date" field as the join key. The blended data will retain all the records from the left data source (ad platform) and combine them with the matching records from the right data source (CRM). If a date from the left data source has no matching date in the right data source, the sales value will be unknown, but the clicks data will still be retained in the blended data set.

Key Points

Key Concepts about Data Blending

  1. Join keys and join conditions must match exactly: To successfully blend data, the join keys in both data sources must match exactly. This includes matching formats, capitalization, and the absence of trailing spaces. If the join keys do not match, the blended data will return null values for those records.
  2. All matching rows from the right data source will be returned for every row in the left data source: Data blending works by looking up the join key in both data sources and returning all matching rows. This can cause duplicate data and incorrect totals if there are multiple matches for a single join key.

Example: Data Blending with Multiple Matches

Consider the example where you have a table with colors and the number of searches for each color (left data source) and another table with translations of the colors into Italian (right data source). In the Italian table, there are two translations for "red" and "yellow" but only one for "blue".

When blending the data using the color as the join key, Looker Studio will return both translations for "red" and "yellow", resulting in duplicate rows for these colors. This will cause the total number of searches to be incorrect in the blended data set.