How do I reduce BigQuery costs on GA4 exports?
The biggest cost wins come from nine SQL patterns: (1) partition pruning via `_TABLE_SUFFIX BETWEEN` (10 to 50x cost difference vs derived filters), (2) clustering on source/medium/event_name (30 to 60% reduction on top of partitioning), (3) explicit column selection (never SELECT *), (4) materialised views for repeated aggregations, (5) `APPROX_COUNT_DISTINCT` instead of COUNT(DISTINCT), (6) dry runs before ad-hoc queries, (7) physical storage billing (up to 70% reduction for compressible data), (8) summary tables that pre-aggregate raw events, and (9) BI Engine for dashboard queries. BigQuery on-demand pricing is $6.25/TB scanned as of 2026, with compute representing 85 to 90% of total BigQuery cost.
Combined, these patterns can drop a $5,000/month BigQuery bill to under $500.
The 2026 pricing model
Before optimising, understand what you're being charged for:
- On-demand compute: $6.25/TB scanned (up from the earlier $5/TB rate)
- Storage (active): $0.02/GB/month for tables modified in last 90 days
- Storage (long-term): $0.01/GB/month, automatic 50% discount after 90 unmodified days
- Streaming inserts: $0.01/200MB
- BigQuery sandbox: Free, 1TB queries/month, 10GB storage, no streaming
Compute is 85 to 90% of total cost, so query patterns, not storage, are the real optimisation target. The numbers below assume on-demand pricing; Editions (flat-rate) has different mechanics, covered later.
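As a starting point, you can estimate your own on-demand spend from the jobs log. A minimal sketch, assuming the US multi-region and the $6.25/TB rate above; swap in your own region qualifier and rate:

```sql
-- Rough last-30-days on-demand compute spend for this project
SELECT
  ROUND(SUM(total_bytes_billed) / POW(1024, 4), 2) AS tib_billed,
  ROUND(SUM(total_bytes_billed) / POW(1024, 4) * 6.25, 2) AS est_cost_usd
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
  AND job_type = 'QUERY'
  AND statement_type != 'SCRIPT';  -- skip script parent jobs to avoid double-counting
```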
The 9 patterns
Pattern 1: Partition pruning via _TABLE_SUFFIX
The single highest-impact optimisation. GA4 exports are sharded into daily tables (events_20260501, events_20260502...). Use _TABLE_SUFFIX BETWEEN to limit scans:
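A minimal sketch of both versions, with `my-project.analytics_123456` standing in for your export dataset:

```sql
-- GOOD: literal date strings on _TABLE_SUFFIX let BigQuery prune shards
SELECT event_name, COUNT(*) AS events
FROM `my-project.analytics_123456.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20260501' AND '20260531'
GROUP BY event_name;

-- BAD: the derived filter defeats pruning, so every daily shard is scanned
SELECT event_name, COUNT(*) AS events
FROM `my-project.analytics_123456.events_*`
WHERE PARSE_DATE('%Y%m%d', _TABLE_SUFFIX)
      BETWEEN DATE '2026-05-01' AND DATE '2026-05-31'
GROUP BY event_name;
```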
The bad version looks equivalent but BigQuery cannot use partition pruning on a derived filter. This single mistake can 10 to 50x your query cost. Always use _TABLE_SUFFIX directly with literal date strings.
Pattern 2: Clustering on summary tables
For your flattened summary tables, cluster by frequently filtered columns:
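A sketch of what that DDL can look like; the `reporting.events_flat` table name and the clustering columns are illustrative choices:

```sql
-- Flattened summary table: partitioned by day, clustered on the
-- columns dashboards filter by most often
CREATE TABLE `my-project.reporting.events_flat`
PARTITION BY event_date
CLUSTER BY source, medium, event_name AS
SELECT
  PARSE_DATE('%Y%m%d', event_date) AS event_date,
  traffic_source.source AS source,
  traffic_source.medium AS medium,
  event_name,
  user_pseudo_id
FROM `my-project.analytics_123456.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20260101' AND '20260531';
```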
When you filter by WHERE source='google' AND medium='organic', BigQuery only reads the blocks containing organic Google traffic rather than the entire partition. Clustering reduces bytes scanned by 30 to 60% on top of partitioning; it is the simplest change with the biggest impact after partition pruning.
Combined partitioning + clustering: 90 to 99% reduction for well-targeted queries. On a 1TB table, that's scanning 1 to 10GB instead of the full terabyte.
Pattern 3: Never SELECT *
BigQuery uses columnar storage and charges only for the columns you actually read. SELECT * scans every column whether you need them or not.
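For example, with the same placeholder dataset as before:

```sql
-- BAD: scans every column, including the wide event_params and
-- user_properties repeated records
SELECT *
FROM `my-project.analytics_123456.events_*`
WHERE _TABLE_SUFFIX = '20260501';

-- GOOD: billed only for the three columns actually read
SELECT event_name, event_timestamp, user_pseudo_id
FROM `my-project.analytics_123456.events_*`
WHERE _TABLE_SUFFIX = '20260501';
```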
Same data. 17x cost difference based purely on column selection. Always list explicit columns.
Pattern 4: Materialised views for repeated aggregations
If you query the same daily aggregation repeatedly (revenue by channel, sessions by country), materialise it:
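A sketch over the flattened table from Pattern 2 (materialised views cannot be defined directly over sharded events_* tables, so a flattened base table is assumed):

```sql
CREATE MATERIALIZED VIEW `my-project.reporting.mv_daily_channel` AS
SELECT
  event_date,
  source,
  medium,
  COUNT(*) AS events,
  APPROX_COUNT_DISTINCT(user_pseudo_id) AS approx_users
FROM `my-project.reporting.events_flat`
GROUP BY event_date, source, medium;
```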
Materialised views update incrementally as new data arrives. Querying the view costs cents instead of dollars per query. For dashboards refreshing 100+ times per day, this changes the economics entirely.
Pattern 5: APPROX_COUNT_DISTINCT vs COUNT(DISTINCT)
For unique-user counts where exact precision isn't required:
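Both forms below read the same column; the approximate version trades roughly 1 to 2% accuracy for far less compute:

```sql
-- Exact: shuffles every distinct user_pseudo_id
SELECT COUNT(DISTINCT user_pseudo_id) AS users
FROM `my-project.analytics_123456.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20260501' AND '20260531';

-- Approximate: HyperLogLog++ sketch, much cheaper to compute
SELECT APPROX_COUNT_DISTINCT(user_pseudo_id) AS approx_users
FROM `my-project.analytics_123456.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20260501' AND '20260531';
```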
For dashboards and exploratory analysis where ~98% accuracy is fine, APPROX_COUNT_DISTINCT runs dramatically faster and uses far fewer slot resources. Note that under on-demand pricing the bytes billed for the column are the same; the saving shows up as speed, and as lower slot spend under Editions.
Pattern 6: Dry run before ad-hoc queries
Before running a query you suspect might be expensive, dry-run it:
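One way to do this from the command line, using the bq CLI's --dry_run flag (the query itself is a placeholder):

```bash
# Validates the query and reports the bytes it would process,
# without executing it or incurring cost
bq query --dry_run --use_legacy_sql=false '
SELECT event_name, COUNT(*) AS events
FROM `my-project.analytics_123456.events_*`
WHERE _TABLE_SUFFIX BETWEEN "20260501" AND "20260531"
GROUP BY event_name'
```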
The console also shows estimated bytes scanned in the top-right corner before you run any query. Dry runs are free and tell you the cost before you commit. For exploratory work, the BigQuery Console's "Preview" tab is also free (it reads directly from storage without executing a query).
Pattern 7: Physical storage billing
BigQuery defaults to logical storage billing (uncompressed size). For compressible data (JSON logs, repetitive event params), physical storage billing can reduce costs up to 70%:
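The switch is a per-dataset DDL statement; the dataset name is a placeholder:

```sql
ALTER SCHEMA `my-project.analytics_123456`
SET OPTIONS (storage_billing_model = 'PHYSICAL');
```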
Caveats:
- You're also billed for the 7-day Time Travel window
- Fail-safe storage costs apply
- Best for datasets with high compression ratios (text-heavy, repetitive)
For GA4 exports specifically, physical billing can reduce storage cost meaningfully (often in the 30 to 60% range for highly repetitive event_params payloads); test on your own dataset by comparing the logical and physical bytes shown in INFORMATION_SCHEMA before switching, as in the sketch below.
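A sketch of that comparison using the INFORMATION_SCHEMA.TABLE_STORAGE view (region qualifier and dataset name are placeholders; note that total_physical_bytes includes time-travel and fail-safe storage):

```sql
SELECT
  table_name,
  ROUND(total_logical_bytes / POW(1024, 3), 2) AS logical_gb,
  ROUND(total_physical_bytes / POW(1024, 3), 2) AS physical_gb,
  ROUND(SAFE_DIVIDE(total_logical_bytes, total_physical_bytes), 1) AS compression_ratio
FROM `region-us`.INFORMATION_SCHEMA.TABLE_STORAGE
WHERE table_schema = 'analytics_123456'
ORDER BY total_logical_bytes DESC;
```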
Pattern 8: Summary tables that pre-aggregate
Instead of querying raw event tables for every dashboard query, build daily summary tables:
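A sketch of such a rollup, assuming the scheduled-query @run_date parameter and placeholder table and field names:

```sql
-- Rebuild yesterday's slice idempotently, then append it
DELETE FROM `my-project.reporting.sessions_daily`
WHERE event_date = DATE_SUB(@run_date, INTERVAL 1 DAY);

INSERT INTO `my-project.reporting.sessions_daily`
SELECT
  PARSE_DATE('%Y%m%d', event_date) AS event_date,
  traffic_source.source AS source,
  traffic_source.medium AS medium,
  APPROX_COUNT_DISTINCT(user_pseudo_id) AS approx_users,
  COUNTIF(event_name = 'purchase') AS purchases,
  SUM(ecommerce.purchase_revenue) AS revenue
FROM `my-project.analytics_123456.events_*`
WHERE _TABLE_SUFFIX = FORMAT_DATE('%Y%m%d', DATE_SUB(@run_date, INTERVAL 1 DAY))
GROUP BY 1, 2, 3;
```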
Run this once per day via a scheduled query. Dashboards then query sessions_daily (small, partitioned, and clustered) instead of the raw events tables (huge). Cost drops by orders of magnitude.
Pattern 9: BI Engine for dashboard queries
For Looker / Looker Studio dashboards making frequent small queries, BI Engine caches frequently-accessed data in memory:
- Sub-second dashboard performance
- Reduced slot consumption
- Lower compute spend
Cost: BI Engine capacity is reserved and billed per GiB of memory. It's worthwhile for dashboards with high query volume, typically paying back when dashboard queries exceed ~100 per day.
When to switch from On-Demand to Editions
The threshold rule: if your on-demand BigQuery spend consistently exceeds $5,000/month, evaluate Editions (flat-rate slot pricing).
Editions tiers (2026):
- Standard: basic SQL, no Time Travel beyond 2 days
- Enterprise: Time Travel up to 7 days, materialised views, BI Engine
- Enterprise Plus: full feature set
You buy slots (compute capacity): you pay continuously for a baseline and autoscale above it for bursts. Properties with predictable workloads above $5,000/month often save meaningfully when switching to Editions; the actual saving depends on workload concurrency and whether you can right-size baseline slots, so model both options in the BigQuery slot estimator before committing. Below $5,000/month, on-demand is more economical.
What doesn't save money
Three common misconceptions:
1. LIMIT does NOT save money. LIMIT 10 only limits returned rows. BigQuery still scans the same data. Use WHERE clauses with partition columns to actually reduce scan.
2. Storage tier doesn't help when compute dominates. Switching all your storage to long-term storage saves pennies if your real cost is compute. Optimise the queries first, storage second.
3. Sharding tables manually doesn't help. Creating events_20260101, events_20260102 as separate tables (instead of using BigQuery's built-in partitioning) creates metadata overhead and complicates permissions without performance benefit. Use partitioning, not sharding.
How to use this in a GA4 audit
Use this section to support a BigQuery export audit and reporting-governance review. The aim is to validate exported analytics data without assuming raw tables reproduce every GA4 reporting behaviour exactly. Where possible, separate API-verified findings, browser-verified findings, and findings that depend on access to linked platforms.
What to verify
- Use stable daily exports before drawing conclusions from intraday or best-effort tables.
- Compare raw event and transaction slices before comparing attribution or modeled reporting outputs.
- Confirm which fields are available only after processing delays or only in specific export modes.
- State clearly when warehouse logic is an approximation of the GA4 interface rather than a perfect recreation.
Known limitations
- BigQuery export, the Data API, and the GA4 interface can differ for legitimate processing reasons.
- Attribution, identity, thresholding, and modeled features may not map one-to-one to raw export tables.
Before acting on the result
Review the visible evidence behind a finding before changing reporting, bidding, privacy controls, or executive dashboards. GA4 Audits findings should be reviewed by a qualified analyst before major business decisions are made.