{"id":5829,"date":"2024-05-30T13:45:44","date_gmt":"2024-05-30T10:45:44","guid":{"rendered":"https:\/\/trackingchef.com\/?p=5829"},"modified":"2024-06-04T21:27:41","modified_gmt":"2024-06-04T18:27:41","slug":"how-to-get-bounce-rate-per-landing-page-in-bigquery-for-ga4","status":"publish","type":"post","link":"https:\/\/trackingchef.com\/google-analytics\/how-to-get-bounce-rate-per-landing-page-in-bigquery-for-ga4\/","title":{"rendered":"How to Get Bounce Rate per Landing Page in Bigquery for GA4?"},"content":{"rendered":"\n

<\/h2>\n\n\n\n

This is almost a classic request from clients, wanting to know the bounce rate per landing page and if you can add the traffic source then that\u2019s even better. While Google Analytics 4 does offer this ability in the GA4 interface, when it comes to pulling bounce rate per landing page from bigquery it can get a bit tricky and not so straight forward. <\/p>\n\n\n\n

You can use query generators to pull this one off (such as GA4BQ<\/a>) or look for the answer in stackoverflow, I found that those solutions didn\u2019t match to what I\u2019m seeing on my clients accounts, so I\u2019ve set off to explore and write it my way – hope this will come in handy for you.<\/p>\n\n\n\n

Step 1 – Understanding Bounce Rate in Google Analytics 4.<\/h2>\n\n\n\n

In GA4 bounce rate is defined as \u201cthe percentage of sessions that were not engaged.\u201d, which means we need to find all engaged sessions, subtract those from total and divide the outcome with the total number of sessions.<\/p>\n\n\n\n

But hold on – if this is a calculated metric, in which we need to divide 2 figures, while we can do in BigQuery in most cases I don\u2019t think we should. If the output data will be used anywhere else you\u2019ll find yourself with a not so small problem – how to get bounce rate per landing page dynamically<\/em>. If we\u2019ll just formulate the bounce rate it will produce a numeric value per the number of dimensions we\u2019ll put in our query – and there is no why to take those numbers and turn it into an actual bounce rate that we can use, this is why the calculation itself should be done in our final destination – be it a Looker Studio dashboard or any other output.
Understanding this we now need to pull a \u201craw data\u201d table for our bounce rate calculation. <\/p>\n\n\n\n

SELECT\nevent_date,\n(select value.string_value FROM unnest(event_params) WHERE event_name = 'session_start' and key = 'page_location') AS page_location,\n  if((SELECT value.int_value FROM unnest(event_params) WHERE key=\"session_engaged\")=1, CONCAT(user_pseudo_id, \"-\", (SELECT value.int_value FROM unnest(event_params) WHERE key=\"ga_session_id\")), null) AS engaged_sessions,\n  user_pseudo_id AS users,\nCONCAT(user_pseudo_id, \"-\", (SELECT value.int_value FROM unnest(event_params) WHERE key=\"ga_session_id\")) AS sessions,\n    FROM `your-project-and-table-*`\/\/ modify to your dataset\nwhere _table_suffix BETWEEN FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))\n  AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) \/\/ modify to your desired timeframe\nand event_name = 'session_start'<\/code><\/code><\/code><\/pre>\n\n\n\n

Here I\u2019m pulling several keys that I will use later on, such as the session id, user pseudo id, page location (note this is not yet a landing page) and date. The outcome of this query will look like this:<\/p>\n\n\n\n

\"\"<\/figure>\n\n\n\n

Step 2 – Pulling Landing Page (per Source).<\/h2>\n\n\n\n

We want to get a list of landing pages with the users who landed in them with their respective sources, to be used as our next building block, so we may, at a later step, join them all together. Pulling landing pages is much like pulling page_loaction<\/code>, with a small edition where event_param.key = 'entrances<\/code>‘ has a value of  \u20181\u2019, this is the indication that a specific page was also the landing page. My query pulls landing page per source, but you can modify it and add any additional dimensions you wish to use such as hostname <\/code>or device<\/code>.   <\/p>\n\n\n\n

SELECT event_date,CONCAT (\n  CASE\nWHEN (select value.string_value FROM UNNEST (event_params) where key='gclid') IS NOT NULL THEN 'google'\nELSE traffic_source.source\nEND, ' \/ ',\nCASE\nWHEN (select value.string_value FROM UNNEST (event_params) WHERE key='gclid') IS NOT NULL THEN 'cpc'\nelse traffic_source.medium\nend) AS source_medium,\nCASE\n    WHEN (select value.int_value from unnest(event_params) WHERE event_name = 'page_view' and key = 'entrances') = 1 THEN (select value.string_value from unnest(event_params) WHERE event_name = 'page_view' and key = 'page_location')\n    end AS landing_page_location,\n    REGEXP_EXTRACT(case\n    WHEN (select value.int_value from unnest(event_params) WHERE event_name = 'page_view' and key = 'entrances') = 1 THEN (select value.string_value from unnest(event_params) WHERE event_name = 'page_view' and key = 'page_location') end , r'(?:https?:\/\/[^\/]+)?([^?#]+)') AS landing_page_path,\nuser_pseudo_id AS users,\nCONCAT(user_pseudo_id, \"-\", (SELECT value.int_value FROM unnest(event_params) WHERE key=\"ga_session_id\")) AS sessions,\n    countif(event_name='page_view') AS pageviews\n    FROM `your-project-and-table-*`\/\/ modify to your dataset\nwhere _table_suffix BETWEEN FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))\n  AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) \/\/ modify to your desired timeframe\nand event_name = 'page_view'\ngroup by all\nhaving landing_page IS NOT NULL<\/code><\/pre>\n\n\n\n

Note that this table is based on page_view<\/code> and our bounce table is based on session_start<\/code>. This approach is different from other queries you may find out there and the premise here is that sessions based metrics such as bounce rate should be based on session_start<\/code> event. While you can in principle base your bounce on the page_view<\/code> event (potentially creating a single query), In several clients of myn I\u2019ve noticed this isn\u2019t accurate and only when splitting the query I got a more accurate result. <\/p>\n\n\n\n

The outcome of this query will look like this:<\/p>\n\n\n\n

\"\"<\/figure>\n\n\n\n

Notice I\u2019ve unified source & medium into a single column (and the CASE deals with the issue of organic source with gclid present) and I also have both the full page_location<\/code> and the page_path<\/code>. Page location is more ideally suited to be used as a join key and page path is more for display purpose.<\/p>\n\n\n\n

Step 3 – Joining Bounce Rate with Landing Page (per Source).<\/h2>\n\n\n\n

Now that we have all our building blocks together we just need to join them in the right way, which looks something like this:<\/p>\n\n\n\n

WITH landing AS (SELECT\u00a0CONCAT (\n\u00a0 CASE\nWHEN (select value.string_value FROM UNNEST (event_params) where key='gclid') IS NOT NULL THEN 'google'\nELSE traffic_source.source\nEND, ' \/ ',\nCASE\nWHEN (select value.string_value FROM UNNEST (event_params) WHERE key='gclid') IS NOT NULL THEN 'cpc'\nelse traffic_source.medium\nend) AS source_medium,\nCASE\n\u00a0 \u00a0 WHEN (select value.int_value from unnest(event_params) WHERE event_name = 'page_view' and key = 'entrances') = 1 THEN (select value.string_value from unnest(event_params) WHERE event_name = 'page_view' and key = 'page_location')\n\u00a0 \u00a0 end AS landing_page_location,\n\u00a0 \u00a0 REGEXP_EXTRACT(case\n\u00a0 \u00a0 WHEN (select value.int_value from unnest(event_params) WHERE event_name = 'page_view' and key = 'entrances') = 1 THEN (select value.string_value from unnest(event_params) WHERE event_name = 'page_view' and key = 'page_location') end , r'(?:https?:\/\/[^\/]+)?([^?#]+)') AS landing_page_path,\nuser_pseudo_id AS users,\nCONCAT(user_pseudo_id, \"-\", (SELECT value.int_value FROM unnest(event_params) WHERE key=\"ga_session_id\")) AS sessions,\n\u00a0 \u00a0 countif(event_name='page_view') AS pageviews\n\u00a0 \u00a0 FROM `your-project-and-table-*`\/\/ modify to your dataset\nwhere _table_suffix BETWEEN FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))\n\u00a0 AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) \/\/ modify to your desired timeframe\nand event_name = 'page_view'\ngroup by all\nhaving landing_page IS NOT NULL),\nbounce AS (SELECT\nEvent_date,\n(select value.string_value FROM unnest(event_params) WHERE event_name = 'session_start' and key = 'page_location') AS page_location,\n\u00a0 if((SELECT value.int_value FROM unnest(event_params) WHERE key=\"session_engaged\")=1, CONCAT(user_pseudo_id, \"-\", (SELECT value.int_value FROM unnest(event_params) WHERE key=\"ga_session_id\")), null) AS engaged_sessions,\n\u00a0 user_pseudo_id AS users,\nCONCAT(user_pseudo_id, \"-\", (SELECT value.int_value FROM unnest(event_params) WHERE key=\"ga_session_id\")) AS sessions,\n\u00a0 \u00a0 FROM `your-project-and-table-*`\/\/ modify to your dataset\nwhere _table_suffix BETWEEN FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))\n\u00a0 AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) \/\/ modify to your desired timeframe\nand event_name = 'session_start')\nSELECT\u00a0landing.date AS date,\nsource_medium,\nlanding_page_location\nlanding_page_path\nlanding.users,\npageviews,\nbounce.sessions,\nengaged_sessions\n\u00a0FROM Landing \nLEFT JOIN bounce ON landing.date=bounce.date AND landing.landing_page_location=bounce.page_location AND landing.users=bounce.users AND landing.sessions=bounce.sessions<\/code><\/pre>\n\n\n\n

The outcome should look like this:<\/p>\n\n\n\n

\"\"<\/figure>\n\n\n\n

We are still missing our bounce rate, but we do have sessions and engaged_sessions, which is all we need since the formula to calculate bounce rate is: 1-(count_DISTINCT(engaged_sessions)  \/ COUNT_DISTINCT(sessions))<\/code><\/p>\n\n\n\n

While you can use this inside BigQuery the output will likely be too fragmented across pages and this is why I prefer to calculate bounce rate \u201coutside\u201d, in this case Looker Studio. So After saving the results of my query to a table and connecting that table as a source in Looker Studio you simply need to add the above formula as a new calculated field and that\u2019s it – you can now build a report with bounce rate per landing page from bigquery, with the final result looking like this<\/p>\n\n\n\n

\"\"<\/figure>\n","protected":false},"excerpt":{"rendered":"

This is almost a classic request from clients, wanting to know the bounce rate per landing page and if you can add the traffic source then that\u2019s even better. While Google Analytics 4 does offer this ability in the GA4 interface, when it comes to pulling bounce rate per landing page from bigquery it can […]<\/p>\n","protected":false},"author":6,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[],"class_list":["post-5829","post","type-post","status-publish","format-standard","hentry","category-google-analytics"],"_links":{"self":[{"href":"https:\/\/trackingchef.com\/wp-json\/wp\/v2\/posts\/5829","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/trackingchef.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/trackingchef.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/trackingchef.com\/wp-json\/wp\/v2\/users\/6"}],"replies":[{"embeddable":true,"href":"https:\/\/trackingchef.com\/wp-json\/wp\/v2\/comments?post=5829"}],"version-history":[{"count":8,"href":"https:\/\/trackingchef.com\/wp-json\/wp\/v2\/posts\/5829\/revisions"}],"predecessor-version":[{"id":5843,"href":"https:\/\/trackingchef.com\/wp-json\/wp\/v2\/posts\/5829\/revisions\/5843"}],"wp:attachment":[{"href":"https:\/\/trackingchef.com\/wp-json\/wp\/v2\/media?parent=5829"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/trackingchef.com\/wp-json\/wp\/v2\/categories?post=5829"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/trackingchef.com\/wp-json\/wp\/v2\/tags?post=5829"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}