How to Get Bounce Rate per Landing Page in Bigquery for GA4?

This is almost a classic request from clients, wanting to know the bounce rate per landing page and if you can add the traffic source then that’s even better. While Google Analytics 4 does offer this ability in the GA4 interface, when it comes to pulling bounce rate per landing page from bigquery it can get a bit tricky and not so straight forward. 

You can use query generators to pull this one off (such as GA4BQ) or look for the answer in stackoverflow, I found that those solutions didn’t match to what I’m seeing on my clients accounts, so I’ve set off to explore and write it my way – hope this will come in handy for you.

Step 1 – Understanding Bounce Rate in Google Analytics 4.

In GA4 bounce rate is defined as “the percentage of sessions that were not engaged.”, which means we need to find all engaged sessions, subtract those from total and divide the outcome with the total number of sessions.

But hold on – if this is a calculated metric, in which we need to divide 2 figures, while we can do in BigQuery in most cases I don’t think we should. If the output data will be used anywhere else you’ll find yourself with a not so small problem – how to get bounce rate per landing page dynamically. If we’ll just formulate the bounce rate it will produce a numeric value per the number of dimensions we’ll put in our query – and there is no why to take those numbers and turn it into an actual bounce rate that we can use, this is why the calculation itself should be done in our final destination – be it a Looker Studio dashboard or any other output.
Understanding this we now need to pull a “raw data” table for our bounce rate calculation. 

SELECT
event_date,
(select value.string_value FROM unnest(event_params) WHERE event_name = 'session_start' and key = 'page_location') AS page_location,
  if((SELECT value.int_value FROM unnest(event_params) WHERE key="session_engaged")=1, CONCAT(user_pseudo_id, "-", (SELECT value.int_value FROM unnest(event_params) WHERE key="ga_session_id")), null) AS engaged_sessions,
  user_pseudo_id AS users,
CONCAT(user_pseudo_id, "-", (SELECT value.int_value FROM unnest(event_params) WHERE key="ga_session_id")) AS sessions,
    FROM `your-project-and-table-*`// modify to your dataset
where _table_suffix BETWEEN FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
  AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) // modify to your desired timeframe
and event_name = 'session_start'

Here I’m pulling several keys that I will use later on, such as the session id, user pseudo id, page location (note this is not yet a landing page) and date. The outcome of this query will look like this:

Step 2 – Pulling Landing Page (per Source).

We want to get a list of landing pages with the users who landed in them with their respective sources, to be used as our next building block, so we may, at a later step, join them all together. Pulling landing pages is much like pulling page_loaction, with a small edition where event_param.key = 'entrances‘ has a value of  ‘1’, this is the indication that a specific page was also the landing page. My query pulls landing page per source, but you can modify it and add any additional dimensions you wish to use such as hostname or device.   

SELECT event_date,CONCAT (
  CASE
WHEN (select value.string_value FROM UNNEST (event_params) where key='gclid') IS NOT NULL THEN 'google'
ELSE traffic_source.source
END, ' / ',
CASE
WHEN (select value.string_value FROM UNNEST (event_params) WHERE key='gclid') IS NOT NULL THEN 'cpc'
else traffic_source.medium
end) AS source_medium,
CASE
    WHEN (select value.int_value from unnest(event_params) WHERE event_name = 'page_view' and key = 'entrances') = 1 THEN (select value.string_value from unnest(event_params) WHERE event_name = 'page_view' and key = 'page_location')
    end AS landing_page_location,
    REGEXP_EXTRACT(case
    WHEN (select value.int_value from unnest(event_params) WHERE event_name = 'page_view' and key = 'entrances') = 1 THEN (select value.string_value from unnest(event_params) WHERE event_name = 'page_view' and key = 'page_location') end , r'(?:https?://[^/]+)?([^?#]+)') AS landing_page_path,
user_pseudo_id AS users,
CONCAT(user_pseudo_id, "-", (SELECT value.int_value FROM unnest(event_params) WHERE key="ga_session_id")) AS sessions,
    countif(event_name='page_view') AS pageviews
    FROM `your-project-and-table-*`// modify to your dataset
where _table_suffix BETWEEN FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
  AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) // modify to your desired timeframe
and event_name = 'page_view'
group by all
having landing_page IS NOT NULL

Note that this table is based on page_view and our bounce table is based on session_start. This approach is different from other queries you may find out there and the premise here is that sessions based metrics such as bounce rate should be based on session_start event. While you can in principle base your bounce on the page_view event (potentially creating a single query), In several clients of myn I’ve noticed this isn’t accurate and only when splitting the query I got a more accurate result. 

The outcome of this query will look like this:

Notice I’ve unified source & medium into a single column (and the CASE deals with the issue of organic source with gclid present) and I also have both the full page_location and the page_path. Page location is more ideally suited to be used as a join key and page path is more for display purpose.

Step 3 – Joining Bounce Rate with Landing Page (per Source).

Now that we have all our building blocks together we just need to join them in the right way, which looks something like this:

WITH landing AS (SELECT CONCAT (
  CASE
WHEN (select value.string_value FROM UNNEST (event_params) where key='gclid') IS NOT NULL THEN 'google'
ELSE traffic_source.source
END, ' / ',
CASE
WHEN (select value.string_value FROM UNNEST (event_params) WHERE key='gclid') IS NOT NULL THEN 'cpc'
else traffic_source.medium
end) AS source_medium,
CASE
    WHEN (select value.int_value from unnest(event_params) WHERE event_name = 'page_view' and key = 'entrances') = 1 THEN (select value.string_value from unnest(event_params) WHERE event_name = 'page_view' and key = 'page_location')
    end AS landing_page_location,
    REGEXP_EXTRACT(case
    WHEN (select value.int_value from unnest(event_params) WHERE event_name = 'page_view' and key = 'entrances') = 1 THEN (select value.string_value from unnest(event_params) WHERE event_name = 'page_view' and key = 'page_location') end , r'(?:https?://[^/]+)?([^?#]+)') AS landing_page_path,
user_pseudo_id AS users,
CONCAT(user_pseudo_id, "-", (SELECT value.int_value FROM unnest(event_params) WHERE key="ga_session_id")) AS sessions,
    countif(event_name='page_view') AS pageviews
    FROM `your-project-and-table-*`// modify to your dataset
where _table_suffix BETWEEN FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
  AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) // modify to your desired timeframe
and event_name = 'page_view'
group by all
having landing_page IS NOT NULL),
bounce AS (SELECT
Event_date,
(select value.string_value FROM unnest(event_params) WHERE event_name = 'session_start' and key = 'page_location') AS page_location,
  if((SELECT value.int_value FROM unnest(event_params) WHERE key="session_engaged")=1, CONCAT(user_pseudo_id, "-", (SELECT value.int_value FROM unnest(event_params) WHERE key="ga_session_id")), null) AS engaged_sessions,
  user_pseudo_id AS users,
CONCAT(user_pseudo_id, "-", (SELECT value.int_value FROM unnest(event_params) WHERE key="ga_session_id")) AS sessions,
    FROM `your-project-and-table-*`// modify to your dataset
where _table_suffix BETWEEN FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
  AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) // modify to your desired timeframe
and event_name = 'session_start')
SELECT landing.date AS date,
source_medium,
landing_page_location
landing_page_path
landing.users,
pageviews,
bounce.sessions,
engaged_sessions
 FROM Landing 
LEFT JOIN bounce ON landing.date=bounce.date AND landing.landing_page_location=bounce.page_location AND landing.users=bounce.users AND landing.sessions=bounce.sessions

The outcome should look like this:

We are still missing our bounce rate, but we do have sessions and engaged_sessions, which is all we need since the formula to calculate bounce rate is: 1-(count_DISTINCT(engaged_sessions)  / COUNT_DISTINCT(sessions))

While you can use this inside BigQuery the output will likely be too fragmented across pages and this is why I prefer to calculate bounce rate “outside”, in this case Looker Studio. So After saving the results of my query to a table and connecting that table as a source in Looker Studio you simply need to add the above formula as a new calculated field and that’s it – you can now build a report with bounce rate per landing page from bigquery, with the final result looking like this

Leave a Reply

Your email address will not be published. Required fields are marked *

Related Posts