How to Stitch GA4 BigQuery Sessions Manually

Intermediate

How do I reconstruct GA4 sessions in BigQuery?

GA4 doesn't store sessions as records in BigQuery exports, only individual events with session identifiers. To reconstruct sessions: join on `user_pseudo_id` + `(SELECT value.int_value FROM UNNEST(event_params) WHERE key='ga_session_id')` as the unique session key, then aggregate with MIN(event_timestamp) for session start, MAX(event_timestamp) for session end, COUNTIF(event_name='page_view') for pageviews, and SUM(engagement_time_msec) for engagement time.

The session start traffic source comes from traffic_source.source/medium/name at the user's first event in that session.

The full SQL pattern is below.

Why GA4 doesn't store sessions

In Universal Analytics, sessions were a first-class entity, every session had its own row with start/end times and aggregate metrics. GA4 fundamentally changed this: everything is an event. Sessions exist only as a tag (ga_session_id) attached to events.

The implication: if you want session-level metrics in BigQuery, you build them yourself. The standard reports in the GA4 UI do this aggregation behind the scenes; BigQuery exports give you raw events and expect you to do the work.

The benefit: full control. You can define sessions however your business actually thinks about them, not just GA4's default 30-minute timeout.

The session reconstruction pattern

Here's the canonical SQL pattern for reconstructing sessions from raw GA4 BigQuery events:

Run this with the appropriate date range in _TABLE_SUFFIX and you have one row per session with everything you need.

Why user_pseudo_id + ga_session_id

Both fields are needed because:

  • user_pseudo_id alone doesn't isolate sessions, a single user has many sessions
  • ga_session_id alone isn't unique across users, different users can have the same session_id (the integer is just an incrementing counter per user, not globally unique)

The combination is the unique session key. Some implementations also include ga_session_number (the sequential number of the session for that user) for additional context, but it's not needed for uniqueness.

For cross-device or logged-in user analysis, also include user_id if your implementation sets it:

This unifies the user across devices when login is captured.

Engagement time, the GA4 UI match

The engagement time calculation needs care to match GA4's UI numbers:

engagement_time_msec is sent as an event parameter on most events. Sum across all events in the session, divide by 1000 for seconds. This matches GA4's "Average engagement time per session" calculation in standard reports.

A common gotcha: some implementations don't capture engagement_time_msec on every event. Verify by checking event_params for missing values, if more than 5% of events lack engagement_time_msec, your engagement metrics will under-report vs the GA4 UI.

The 'engaged session' definition

GA4 marks a session as "engaged" if any of these are true:

  • Engagement time >= 10 seconds, OR
  • 2+ pageviews/screen_views, OR
  • 1+ conversion event

Want to see which hidden implementation gaps are affecting your GA4 data quality?

The CASE statement in the SQL above replicates this exactly. It's how you reproduce GA4's "Engaged sessions" metric in your custom reporting.

Session start traffic source

Traffic source data lives on multiple events but the session-start source is what you typically want. Two approaches:

Approach 1: Use traffic_source on session_start event

Approach 2: Use ANY_VALUE() in aggregation The pattern above uses ANY_VALUE(source) which works because traffic_source is consistent within a session, every event in a session has the same traffic_source values. ANY_VALUE() is faster than the explicit join to session_start.

For most reporting, ANY_VALUE() is sufficient and more efficient.

Custom session definitions

GA4's default session timeout is 30 minutes of inactivity. To analyse with a different timeout (e.g., 15 minutes for short-cycle e-commerce, 4 hours for long-form content), you build the sessionisation in SQL:

This sessionises with a 15-minute timeout instead of the default 30. Adjust the 900 threshold (in seconds) for any custom timeout.

Common mistakes

1. Forgetting NULL session_ids. Some events (especially when the GA4 SDK initialises) don't have ga_session_id. Filter IS NOT NULL early or you'll get phantom sessions with NULL keys.

2. Treating ga_session_id as globally unique. It's not. Always join with user_pseudo_id.

3. Using traffic_source on every event. Some events don't have traffic_source populated. Use ANY_VALUE() to handle gaps gracefully.

4. Calculating session duration as MAX-MIN. This works for sessions with multiple events but gives 0 for single-event sessions. Either accept that or set a default minimum (e.g., 1 second for single-event sessions).

5. Not partitioning the output. Session reconstruction tables can grow large. Partition by event_date (or session_start_date) and cluster by source/medium for efficient downstream queries.

How to use this in a GA4 audit

Use this topic to support a BigQuery export audit and reporting-governance review. This article is about validating exported analytics data without assuming raw tables reproduce every GA4 reporting behavior exactly. Where possible, separate API-verified findings, browser-verified findings, and findings that depend on access to linked platforms.

What to verify

  • Use stable daily exports before drawing conclusions from intraday or best-effort tables.
  • Compare raw event and transaction slices before comparing attribution or modeled reporting outputs.
  • Confirm which fields are available only after processing delays or only in specific export modes.
  • State clearly when warehouse logic is an approximation of the GA4 interface rather than a perfect recreation.

Known limitations

  • BigQuery export, the Data API, and the GA4 interface can differ for legitimate processing reasons.
  • Attribution, identity, thresholding, and modeled features may not map one-to-one to raw export tables.

Before acting on the result

Use the visible evidence behind the finding before changing reporting, bidding, privacy controls, or executive dashboards. GA4 Audits findings should be reviewed by a qualified analyst before major business decisions are made.

Audit findings should be reviewed by a qualified analyst before they are used for major reporting, media, or implementation decisions. Review your findings

GA4 Audits Team

GA4 Audits Team

Analytics Engineering

Specialising in GA4 architecture, consent mode implementation, and multi-layer audit frameworks.

Share