In this lesson, we'll walk through data modeling in BigQuery using Google Search Console and Google Analytics data as examples.
First, let's prepare our data. We have two datasets: Google Search Console and Google Analytics. We'll start by removing rows with zero clicks from the Google Search Console dataset, as they don't provide any valuable insights for our analysis.
SELECT *
FROM search_console_data
WHERE clicks > 0
Next, we need to filter the Google Analytics dataset to only include traffic from Google Organic:
SELECT *
FROM google_analytics_data
WHERE source_medium = 'google/organic'
In order to join these two tables together, we need to standardize the format of the landing pages. For example, we need to remove https://
and ensure that both formats have a trailing slash at the end.
For Google Search Console data:
SELECT
search_query AS search_query,
REPLACE(landing_page, 'https://', '') AS landing_page,
impressions AS impressions,
clicks AS clicks
FROM search_console_data
WHERE clicks > 0
For Google Analytics data:
SELECT
REGEXP_EXTRACT(landing_page_path, r'^[^?]*') AS landing_page,
source_medium AS source_medium,
sessions AS sessions,
goal_completions_all_goals AS goal_completions,
total_goal_value_all_goals AS goal_value
FROM google_analytics_data_cleaned_up
WHERE source_medium = 'google/organic'
Now that we've standardized the landing pages in both datasets, we can proceed with joining them.
We want to join these tables using a left join, with the Google Search Console dataset as the left table and the Google Analytics dataset as the right. We'll use the landing_page
field as our join key.
SELECT
search_query,
landing_page,
impressions,
clicks,
sessions,
goal_completions,
goal_value
FROM search_console_data_cleaned_up
LEFT JOIN google_analytics_data_cleaned_up
ON search_console_data_cleaned_up.landing_page = google_analytics_data_cleaned_up.landing_page