Jun 03, 2025

Optimizing Apache Iceberg tables for real-time analytics

Learn how to use Iceberg's partitioning, sorting, and compaction features to build high-performance real-time analytics systems
Scalable Analytics Architecture
Alberto Romeu, Software Engineer

Apache Iceberg has all the features needed for high-performance analytics, but success depends on how you use them.

While Iceberg excels at analytical workloads, adapting it for real-time analytics requires understanding the performance characteristics and trade-offs.

The most common mistakes engineers make are:

  • Ignoring the fundamentals: Proper partitioning and sorting can make queries 1000x faster
  • Optimizing before understanding: Always analyze query patterns first, then optimize
  • Assuming more features = better performance: Each optimization has trade-offs
  • Not understanding the streaming challenges: Small files, metadata explosion, and compaction overhead

Let's explore Iceberg's partitioning, sorting, and compaction features to build high-performance real-time analytics systems.

1. Partitioning

Partitioning is a way to organize your data into logical groups. Think of partitioning as "Which folder should I look in?"

  • ❌ Without partitioning, every query scans all files.
  • ✅ Partition by your most common query filters.

Some practical examples:

Basic time-based partitioning

-- Partition by day for time-based queries
CREATE TABLE ecommerce_events (
  event_id BIGINT,
  user_id BIGINT,
  event_time TIMESTAMP,
  event_type STRING,
  product_id BIGINT,
  region STRING,
  session_id STRING,
  revenue DECIMAL(10,2)
) USING ICEBERG
PARTITIONED BY (days(event_time));

Directory structure

ecommerce_events/
├── event_time_day=2024-01-15/
│   └── data files
├── event_time_day=2024-01-16/
│   └── data files
└── event_time_day=2024-01-17/
    └── data files

Sample query

SELECT COUNT(*) FROM ecommerce_events
WHERE event_time >= '2024-01-15' AND event_time < '2024-01-16';
-- Files read: Only event_time_day=2024-01-15 ✅

Query plan

-- Query plan shows partition filters:
== Physical Plan ==
... PartitionFilters: [isnotnull(event_time_day), (event_time_day = 2024-01-15)]

Multi-dimensional partitioning

-- Partition by time AND region for better pruning
CREATE TABLE ecommerce_events (
  event_id BIGINT,
  user_id BIGINT,
  event_time TIMESTAMP,
  event_type STRING,
  product_id BIGINT,
  region STRING,
  session_id STRING,
  revenue DECIMAL(10,2)
) USING ICEBERG
PARTITIONED BY (
  days(event_time),
  region,
  bucket(32, user_id)  -- Hash bucketing for load distribution
);

Directory structure

ecommerce_events/
├── event_time_day=2024-01-15/
│   ├── region=US/
│   │   ├── user_id_bucket=0/
│   │   │   ├── part-00000.parquet
│   │   │   └── part-00001.parquet
│   │   ├── user_id_bucket=1/
│   │   │   └── part-00002.parquet
│   │   └── ... (buckets 2-31)
│   ├── region=EU/
│   │   ├── user_id_bucket=0/
│   │   └── ... (buckets 1-31)
├── event_time_day=2024-01-16/
│   ├── region=US/
│   │   ├── user_id_bucket=0/
│   │   └── ... (buckets 1-31)
│   ├── region=EU/
│   └── region=APAC/
└── event_time_day=2024-01-17/
    ├── region=US/
    ├── region=EU/
    └── region=APAC/

Sample query

SELECT COUNT(*) FROM ecommerce_events
WHERE event_time >= '2024-01-15' AND event_time < '2024-01-16' AND region = 'US';
-- Files read: Only event_time_day=2024-01-15/region=US/ ✅

Query plan

...
PartitionFilters: [
  isnotnull(event_time_day#999),
  (event_time_day#999 = 2024-01-15),     ← Time partition elimination
  isnotnull(region#789),
  (region#789 = US)                      ← Region partition elimination
]
...
PartitionsRead: 32 (out of 15,360 total partitions)
FilesRead: 64 (out of 245,760 total files)

Always define partitions based on actual query patterns. Some examples:

  • For time-series queries:
PARTITIONED BY (
  days(event_time),        -- Primary: time filtering (days or months depending on volume)
  bucket(8, user_id)       -- Secondary: load balancing only
)
  • For multi-tenant queries:
PARTITIONED BY (
  tenant_id,               -- Primary: perfect isolation
  days(event_time),        -- Secondary: time pruning
  bucket(4, user_id)       -- Tertiary: small buckets
)
  • For region-heavy queries:
PARTITIONED BY (
  region,                  -- Primary: geographic filtering
  days(event_time),        -- Secondary: time pruning
  bucket(16, user_id)      -- Tertiary: parallelism
)

Rules of thumb when partitioning Iceberg tables

  • Maximum 2-3 partition columns
  • 10-100 files per partition
  • 1GB-100GB total size per partition
  • Monitor and evolve partitions when required (a quick check follows this list)
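
A quick way to sanity-check these numbers is Iceberg's partitions metadata table. A sketch, using the example table from above; exact column names vary by Iceberg version:

-- Files and rows per partition, biggest offenders first
SELECT
  partition,
  file_count,
  record_count
FROM ecommerce_events.partitions
ORDER BY file_count DESC
LIMIT 20;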

Anti-patterns

Too many small partitions:

-- BAD: Creates tiny partitions
PARTITIONED BY (
  hours(event_time),       -- 24 partitions per day
  region,                  -- × 10 regions
  event_type,              -- × 20 event types
  user_segment             -- × 5 segments
)
-- Result: 24,000 tiny partitions per day

High cardinality partitions:

-- BAD: Partition explosion
PARTITIONED BY (
  user_id,                 -- Millions of partitions
  session_id               -- Even more partitions
)
-- Result: Metadata larger than data

Ignoring query patterns:

-- BAD: Partitioned by write pattern, not read pattern
PARTITIONED BY (
  ingestion_batch_id       -- How data arrives
)
-- But queries filter by:
WHERE event_time > '...' AND region = '...'  -- Different columns!

When to repartition

Inspect your table metadata and query patterns frequently to evolve partitioning.

-- Partition health check
WITH partition_stats AS (
  SELECT
    partition,
    COUNT(*) as file_count,
    SUM(file_size_in_bytes) as partition_bytes,
    AVG(file_size_in_bytes) as avg_file_bytes
  FROM table_name.files
  GROUP BY partition
)
SELECT
  COUNT(*) as total_partitions,
  AVG(file_count) as avg_files_per_partition,
  MAX(file_count) as max_files_per_partition,
  AVG(partition_bytes) / (1024*1024*1024) as avg_partition_gb,
  COUNT(CASE WHEN file_count > 1000 THEN 1 END) as problematic_partitions
FROM partition_stats;
  • Too many files per partition: > 1,000 files → Add bucketing
  • Too few files per partition: < 5 files → Reduce time granularity
  • Skewed partitions: 1 partition > 10× average → Add sub-partitioning
  • Query performance degraded: → Align partitioning with query patterns (see the partition evolution sketch below)
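
Because partition specs in Iceberg are metadata-only, you can evolve them in place: existing data keeps its old layout, and new writes use the new spec. A minimal sketch using Iceberg's Spark SQL extensions (table and field names follow the examples above):

-- Add hash bucketing to spread out hot partitions
ALTER TABLE db.ecommerce_events ADD PARTITION FIELD bucket(16, user_id);

-- Coarsen time granularity if daily partitions are too small
ALTER TABLE db.ecommerce_events REPLACE PARTITION FIELD event_time_day WITH months(event_time);

-- Drop a field that no longer matches query patterns
ALTER TABLE db.ecommerce_events DROP PARTITION FIELD region;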

2. Sorting

Sorting is a way to organize your data into a specific order. Think of sorting as a way to "skip files within partitions, and blocks within files."

  • ❌ Even within partitions, data might be randomly distributed across files.
  • ✅ Sort data within files to create useful min/max statistics (see the write-order sketch after this list).
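
Besides rewriting existing files (shown next), you can also declare a table-level write order so that engines honoring it write new data pre-sorted. A sketch using the Spark SQL extensions, with the table name assumed from the examples above:

-- New data files get sorted by user_id, then event_time, at write time
ALTER TABLE db.ecommerce_events WRITE ORDERED BY user_id, event_time;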

Regular sorting

-- Sort by primary access pattern
-- This creates a strict ordering: first by user_id, then by event_time
CALL catalog.system.rewrite_data_files('db.ecommerce_events',
  strategy => 'sort',
  sort_order => 'user_id, event_time'
);

File layout after regular sorting:

user_id | event_time
1       | 10:00
1       | 10:10
1       | 10:30
2       | 10:15
2       | 10:25
2       | 10:35
3       | 10:05
3       | 10:20

Records for each user_id are grouped together and ordered by event_time within each user.

  • Primary sort key: [■■■■■■■■■■] ← Excellent clustering
  • Secondary sort key: [■ ■ ■ ■ ■] ← Scattered across file

When to use regular sorting:

  • Single primary access pattern (e.g., 90% of queries filter by event_time)
SELECT * FROM ecommerce_events WHERE event_time BETWEEN '2024-01-01' AND '2024-01-02';
  • Hierarchical access (tenant → date → user)
SELECT * FROM multi_tenant_events
WHERE tenant_id = 'company_a' AND date >= '2024-01-01' AND user_id = 12345;
  • Time-series with mostly recent data access
sort_order => 'timestamp DESC'  -- Most recent data first

Z-Ordering for multi-dimensional queries

Use z-ordering for queries that filter on multiple dimensions. Data gets interleaved so that locality is preserved across all sort dimensions.

CALL catalog.system.rewrite_data_files('db.ecommerce_events',
  strategy => 'sort',
  sort_order => 'zorder(user_id, event_time)'
);

File layout after z-ordering:

user_id | event_time
1       | 10:00
2       | 10:15
1       | 10:10
3       | 10:05
1       | 10:30
2       | 10:25
3       | 10:20
2       | 10:35

All sort keys: [■■■ ■■■ ■■■] ← Balanced clustering

When to use z-ordering:

  • Queries filter on 2-4 columns together
SELECT * FROM ecommerce_events WHERE user_id = 12345 AND event_time > '2024-01-01';
SELECT * FROM ecommerce_events WHERE user_id BETWEEN 1000 AND 2000 AND event_time BETWEEN '2024-01-01' AND '2024-01-31';
  • Range queries on multiple columns (such as geospatial data)
SELECT * FROM locations WHERE latitude BETWEEN 40.0 AND 41.0 AND longitude BETWEEN -74.0 AND -73.0;
  • High-cardinality combinations
SELECT * FROM purchases
WHERE user_id IN (1,2,3) AND product_id IN (100,200) AND timestamp > '2024-01-01';

Z-ordering trades some single-dimension performance for better multi-dimensional performance, while regular sorting optimizes for the primary sort column at the expense of secondary columns.

File-level statistics impact

-- Before sorting: user_id randomly distributed
-- File 1: user_id range [1-1000, 5000-6000] (fragmented)
-- File 2: user_id range [2000-3000, 500-800] (fragmented)

-- After sorting: clean ranges
-- File 1: user_id range [1-1000] ← min=1, max=1000
-- File 2: user_id range [1001-2000] ← min=1001, max=2000

SELECT * FROM ecommerce_events
WHERE user_id = 150 AND event_time >= '2024-01-15';
-- Only reads File 1 (150 is between 1-1000) ✅

3. Compaction

Iceberg partitions are composed of immutable files, so frequent writes can create many small files, hurting query performance.

Compaction is a process, typically run in the background, that merges several small files into larger ones.

The process may look simple: just run a compaction job:

-- Compact all small files
CALL catalog.system.rewrite_data_files('db.ecommerce_events',
  strategy => 'binpack',
  options => map('target-file-size-bytes', '268435456') -- 256MB
);

But in reality, this is costly. Instead, you want to run compaction only over the files that need it (unless you actually want to rewrite the full table, for example after a schema change):

-- Compact only files from the last hour
CALL catalog.system.rewrite_data_files(
  table => 'db.ecommerce_events',
  strategy => 'binpack',
  where => 'event_time >= current_timestamp() - INTERVAL 1 HOUR',
  options => map(
    'target-file-size-bytes', '268435456',
    'min-input-files', '5',  -- Only compact if at least 5 files
    'max-concurrent-file-group-rewrites', '5' -- Parallelism level
  )
);

In real-world workloads you need to be smarter and customize your compaction logic so it runs only when needed:

-- Check if compaction is needed
WITH small_files_check AS (
  SELECT
    COUNT(*) as small_file_count,
    AVG(file_size_in_bytes) as avg_size_bytes
  FROM ecommerce_events.files
  WHERE file_size_in_bytes < 67108864  -- < 64MB
    AND partition LIKE '%2024-01-15%'  -- Today's data
)
SELECT
  CASE
    WHEN small_file_count >= 10 THEN 'COMPACT_RECOMMENDED'
    ELSE 'NO_ACTION_NEEDED'
  END as recommendation
FROM small_files_check;

Metadata optimization

An explosion of metadata files can hurt performance. Two good practices here:

  1. Expire snapshots, and
  2. Compact manifest files
-- Set snapshot retention defaults used when snapshots are expired
ALTER TABLE ecommerce_events SET TBLPROPERTIES (
  'history.expire.min-snapshots-to-keep' = '5',
  'history.expire.max-snapshot-age-ms' = '86400000' -- 1 day
);

-- Compact manifests regularly
CALL catalog.system.rewrite_manifests('db.ecommerce_events');
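
Note that these properties only define the retention policy; snapshots are typically removed when an expiration job runs. A sketch using Iceberg's Spark procedure (the cutoff timestamp is a placeholder):

-- Drop snapshots older than the cutoff, keeping at least the last 5
CALL catalog.system.expire_snapshots(
  table => 'db.ecommerce_events',
  older_than => TIMESTAMP '2024-01-15 00:00:00',
  retain_last => 5
);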

The 6-Level Data Elimination Stack

When you combine partitioning, sorting, and compaction, Iceberg creates a powerful 6-level data elimination system.

Let's trace through how a single query gets optimized:

SELECT user_id, event_type
FROM ecommerce_events
WHERE event_time >= '2024-01-15' AND event_time < '2024-01-16'
  AND user_id = 150;

Level 1: Partition pruning

  • Iceberg skips entire directories based on partition columns
  • Only reads the event_time_day=2024-01-15 partition
-- Table structure:
ecommerce_events/
├── event_time_day=2024-01-14/  ← SKIP (date < 2024-01-15)
├── event_time_day=2024-01-15/  ← READ (matches filter)
├── event_time_day=2024-01-16/  ← SKIP (not needed)
└── event_time_day=2024-01-17/  ← SKIP (not needed)

-- Result: Skip entire directories based on partition columns
-- Data eliminated: 75% (3 out of 4 days)

Level 2: File elimination

  • Within event_time_day=2024-01-15/, skip files where user_id range doesn't include 150
  • Uses file-level statistics (min/max values) to skip files
-- Within event_time_day=2024-01-15/:
├── file_001.parquet  ← user_id range [1-1000]     ← READ (150 is in range)
├── file_002.parquet  ← user_id range [1001-2000]  ← SKIP (150 < 1001)
├── file_003.parquet  ← user_id range [2001-3000]  ← SKIP (150 < 2001)
└── file_004.parquet  ← user_id range [3001-4000]  ← SKIP (150 < 3001)

-- Result: Skip files using min/max statistics
-- Data eliminated: 75% more (3 out of 4 files)

Level 3: Row group elimination (inside parquet)

  • Within the chosen file, skip row groups where user_id min/max doesn't include 150
  • Uses Parquet column statistics
-- Within file_001.parquet:
├── Row Group 1  ← user_id range [1-500]    ← READ (150 is in range)
├── Row Group 2  ← user_id range [501-1000] ← SKIP (150 < 501)
└── Row Group 3  ← user_id range [1001-1500] ← SKIP (150 < 1001)

-- Result: Skip row groups using Parquet statistics
-- Data eliminated: 66% more (2 out of 3 row groups)

Level 4: Column elimination (projection pushdown)

  • Only read user_id and event_type column chunks, skip others
-- Row Group 1 contains columns:
├── user_id column chunk     ← READ (needed for SELECT)
├── event_type column chunk  ← READ (needed for SELECT)
├── event_time column chunk  ← SKIP (only used in WHERE, already filtered)
├── product_id column chunk  ← SKIP (not needed)
├── revenue column chunk     ← SKIP (not needed)
└── session_id column chunk  ← SKIP (not needed)

-- Result: Only deserialize needed columns
-- Data eliminated: 66% more (4 out of 6 columns)

Level 5: Page elimination (within row groups)

  • Within row groups, skip pages where user_id range doesn't include 150
-- Within user_id column chunk:
├── Page 1  ← user_id range [1-100]   ← SKIP (150 > 100)
├── Page 2  ← user_id range [101-200] ← READ (150 is in range)
├── Page 3  ← user_id range [201-300] ← SKIP (150 < 201)
└── Page 4  ← user_id range [301-400] ← SKIP (150 < 301)

-- Result: Skip pages within column chunks
-- Data eliminated: 75% more (3 out of 4 pages)

Level 6: Row filtering (post-deserialization)

Important: Unlike the previous 5 levels, this is filtering, not elimination.

  • The entire page must be read and deserialized from storage
  • Row-level filtering happens in memory after reading the page
-- Within Page 2 (user_id range [101-200]):
├── Read entire page from storage ← ALL rows [101-200] deserialized
├── Apply filter in memory: user_id = 150
└── Return only matching row(s)

-- Result: I/O saved = 0% (entire page must be read)
-- Processing saved = ~99% (only matching rows processed further)

Performance impact

Here's the cumulative effect of the 6-level optimization:

Level            | Elimination Target | Data Remaining | Cumulative I/O Savings
Original Dataset | -                  | 1TB (100%)     | -
1. Partition     | Skip directories   | 250GB (25%)    | 75% saved
2. File          | Skip files         | 62GB (6.25%)   | 94% saved
3. Row Group     | Skip row groups    | 21GB (2.1%)    | 98% saved
4. Column        | Skip columns       | 7GB (0.7%)     | 99.3% saved
5. Page          | Skip pages         | 1.7GB (0.17%)  | 99.8% saved
6. Row           | Filter rows        | 1.7GB (0.17%)  | 99.8% I/O saved

Levels 1-5: True data elimination

  • These levels skip reading data entirely from storage
  • Provide both I/O savings and processing savings
  • This is where the massive performance gains come from

Level 6: In-memory filtering

  • Still reads the data from storage (no additional I/O savings)
  • Filters out unwanted rows in memory after deserialization
  • Provides CPU/memory savings but not I/O savings
  • This is traditional SQL row-level filtering

Final Result: Read 1.7GB instead of 1TB - that's 99.8% less I/O!

When Iceberg isn't enough

While the optimizations above work well for analytical workloads, streaming and real-time analytics expose fundamental limitations in Iceberg's architecture.

Streaming writes to Iceberg create several fundamental challenges due to its architecture based on immutable snapshots.

Challenge 1: Small file explosion

See this sample code:

# Streaming scenario: New ecommerce_events every second
stream.writeStream \
  .format("iceberg") \
  .trigger(processingTime="1 second") \
  .start()

# Results in:
# ecommerce_events/
# ├── file_001.parquet (1MB - 1 second of data)
# ├── file_002.parquet (1MB - next second)
# ├── file_003.parquet (1MB - next second)
# └── ... (86,400 tiny files per day!)

Since Iceberg files are immutable, each micro-batch creates new files instead of appending to existing ones.

Optimal file size is 128MB-1GB, but streaming creates 1-10MB files - 100x more files than optimal.

Challenge 2: Metadata explosion

New metadata files are written with every commit.

// After 1 hour of streaming (3,600 commits):
{
  "snapshots": [
    {"snapshot-id": 1, "manifest-list": "snap-001.avro"},
    {"snapshot-id": 2, "manifest-list": "snap-002.avro"},
    // ... 3,598 more snapshots
    {"snapshot-id": 3600, "manifest-list": "snap-3600.avro"}
  ]
}

Table metadata grows linearly with the number of commits, and query planning can become slower than the query itself as it scans thousands of manifests.

After one hour of streaming, even a simple query like this must open thousands of small metadata and data files:

SELECT COUNT(*) FROM ecommerce_events
WHERE event_time >= '2024-01-15' AND event_time < '2024-01-16';
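
You can quantify this growth directly from Iceberg's metadata tables before deciding how aggressively to expire snapshots and rewrite manifests. A sketch, with the table name as in the examples above:

-- How much metadata does the planner have to walk?
SELECT COUNT(*) AS snapshot_count FROM ecommerce_events.snapshots;
SELECT COUNT(*) AS manifest_count FROM ecommerce_events.manifests;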

Challenge 3: The compaction performance window

Compaction addresses the small-file and metadata-explosion problems, but in streaming scenarios it introduces new ones, such as windows of degraded performance.

-- Timeline of degraded performance:
-- 10:00 - Stream starts writing small files
-- 10:30 - 1,800 small files created (30 min × 1/sec)
-- 10:30 - Background compaction starts
-- 10:35 - Compaction completes, creates 2 large files

-- Query at 10:29: Reads 1,800 small files (SLOW - 45 seconds)
-- Query at 10:36: Reads 2 large files (FAST - 3 seconds)
-- Problem: 30-minute window of degraded performance!

Challenge 4: Concurrent writer conflicts

Scaling writes means several concurrent workers writing to the same Iceberg table.

Because Iceberg uses optimistic concurrency control, multiple streaming jobs writing to the same table may conflict, leading to exponential-backoff retries and delays.

# Job 1: Processing user ecommerce_events
# Job 2: Processing system ecommerce_events
# Job 3: Processing audit ecommerce_events

# All trying to commit simultaneously:
# Commit 1: snapshot-1234 → SUCCESS
# Commit 2: snapshot-1234 → CONFLICT! (optimistic concurrency)
# Commit 3: snapshot-1234 → CONFLICT!

# Jobs 2 & 3 must retry, causing delays and wasted work
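
Two common mitigations are giving each job its own partition range so commits touch disjoint files, and tuning Iceberg's commit retry properties so that transient conflicts resolve themselves. A sketch; the values are illustrative:

-- Allow more, and more patient, retries on commit conflicts
ALTER TABLE ecommerce_events SET TBLPROPERTIES (
  'commit.retry.num-retries' = '10',   -- default is 4
  'commit.retry.min-wait-ms' = '250',
  'commit.retry.max-wait-ms' = '60000'
);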

Challenge 5: Real-time aggregations and multiple indices

Real applications have multiple query patterns that need different sort orders:

-- Query Pattern 1: User analytics (90% of queries)
SELECT * FROM ecommerce_events WHERE user_id = 12345 AND event_time > '2024-01-01';
-- Needs: ORDER BY user_id, event_time

-- Query Pattern 2: Product analytics (5% of queries)
SELECT * FROM ecommerce_events WHERE product_id = 789 AND event_time > '2024-01-01';
-- Needs: ORDER BY product_id, event_time

-- Query Pattern 3: Geographic analysis (5% of queries)
SELECT * FROM ecommerce_events WHERE region = 'US' AND event_type = 'purchase';
-- Needs: ORDER BY region, event_type

and pre-computed aggregations:

-- Users want real-time dashboards showing:
SELECT
  region,
  event_type,
  COUNT(*) as ecommerce_events,
  SUM(revenue) as total_revenue,
  COUNT(DISTINCT user_id) as unique_users
FROM ecommerce_events
WHERE event_time >= current_timestamp() - INTERVAL 1 HOUR
GROUP BY region, event_type;

With plain Iceberg, you either run the aggregation query on every request or maintain a pre-aggregated table yourself:

-- Manual pre-aggregation with Iceberg
CREATE TABLE hourly_region_stats AS
SELECT
  date_trunc('hour', event_time) as hour,
  region,
  event_type,
  COUNT(*) as ecommerce_events,
  SUM(revenue) as total_revenue
FROM ecommerce_events
GROUP BY date_trunc('hour', event_time), region, event_type;

-- But now you need to:
-- 1. Keep it in sync with source table
-- 2. Handle late-arriving data
-- 3. Manage incremental updates
-- 4. Deal with duplicate processing
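
Keeping such a rollup in sync usually means periodically re-aggregating a recent window and merging it into the target table. A hedged sketch, where the 3-hour look-back window for late-arriving data is an assumption:

-- Recompute the last few hours and upsert them into the rollup
MERGE INTO hourly_region_stats t
USING (
  SELECT
    date_trunc('hour', event_time) AS hour,
    region,
    event_type,
    COUNT(*) AS ecommerce_events,
    SUM(revenue) AS total_revenue
  FROM ecommerce_events
  WHERE event_time >= current_timestamp() - INTERVAL 3 HOUR
  GROUP BY 1, 2, 3
) s
ON t.hour = s.hour AND t.region = s.region AND t.event_type = s.event_type
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;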

Specialized Real-Time Analytics Platforms

Apache Iceberg provides foundational features for high-performance and scalable analytics workloads, but real-time, user-facing analytics require specialized tools.

Use Iceberg when

  • Batch analytics with infrequent writes
  • Data lake scenarios with complex ETL
  • Schema evolution and time travel are critical
  • Query latency of 5-30 seconds is acceptable

Use specialized platforms when

  • Real-time apps need high concurrency and consistent sub-second queries
  • High-frequency streaming writes (>1,000 events/sec)
  • Multiple query patterns need different indexes
  • Pre-aggregations must update incrementally
  • Developer-friendly workflows matter
┌─────────────┐    ┌────────────────────────┐    ┌─────────────────────┐
│             │    │                        │    │                     │
│ Events      │    │ Tinybird               │    │ Iceberg             │
│ stream      │--->│ (Real-time analytics)  │<---│ (Long-term storage) │
│             │  │ │                        │  │ │                     │
└─────────────┘  │ └────────────────────────┘  │ └─────────────────────┘
           Kafka ┘             │               └ iceberg() function
                               │ real-time API
                               │
                   ┌────────────────────────┐
                   │                        │
                   │ Real-time              │
                   │ application            │
                   │                        │
                   └────────────────────────┘

Tinybird is a real-time analytics platform that can ingest both event streams and Iceberg tables. To learn more about this pattern, read this Iceberg + Redpanda + Tinybird blog post, which explains how to leverage native Kafka streaming ingestion and incremental materialized views for sub-second public APIs, while using Apache Iceberg as durable storage for analytical workloads.
