DuckDB
This feature is in alpha
Expect rapid changes, limited features, and possible breaking updates. Share feedback as we refine the experience and expand access.
DuckDB is a high-performance SQL database system optimized for analytical workloads. It can directly query Iceberg tables stored in your analytics buckets, making it ideal for data exploration and complex analytical queries.
Installation
Install DuckDB, along with pandas for the `fetchdf()` examples below. The Iceberg extension is not a pip package — it is installed from inside DuckDB through its extension system, as shown in the connection example:

```bash
pip install duckdb pandas
```

Connecting to Analytics buckets
Here's a complete example of connecting to your Supabase analytics bucket and querying Iceberg tables:
```python
import duckdb

# Configuration
PROJECT_REF = "your-project-ref"
WAREHOUSE = "your-analytics-bucket-name"
SERVICE_KEY = "your-service-key"

# S3 credentials
S3_ACCESS_KEY = "your-access-key"
S3_SECRET_KEY = "your-secret-key"
S3_REGION = "us-east-1"

# Construct endpoints
S3_ENDPOINT = f"https://{PROJECT_REF}.supabase.co/storage/v1/s3"
CATALOG_URI = f"https://{PROJECT_REF}.supabase.co/storage/v1/iceberg"

# Initialize DuckDB connection
conn = duckdb.connect(":memory:")

# Install and load the Iceberg extension
conn.install_extension("iceberg")
conn.load_extension("iceberg")

# Configure Iceberg catalog with Supabase credentials
conn.execute(f"""
    CREATE SECRET (
        TYPE S3,
        KEY_ID '{S3_ACCESS_KEY}',
        SECRET '{S3_SECRET_KEY}',
        REGION '{S3_REGION}',
        ENDPOINT '{S3_ENDPOINT}',
        URL_STYLE 'virtual'
    );
""")

# Configure the REST catalog
conn.execute(f"""
    ATTACH 'iceberg://{CATALOG_URI}' AS iceberg_catalog
    (
        TYPE ICEBERG_REST,
        WAREHOUSE '{WAREHOUSE}',
        TOKEN '{SERVICE_KEY}'
    );
""")

# Query your Iceberg tables
result = conn.execute("""
    SELECT *
    FROM iceberg_catalog.default.events
    LIMIT 10
""").fetchall()

for row in result:
    print(row)

# Complex aggregation example
analytics = conn.execute("""
    SELECT
        event_name,
        COUNT(*) as event_count,
        COUNT(DISTINCT user_id) as unique_users
    FROM iceberg_catalog.default.events
    GROUP BY event_name
    ORDER BY event_count DESC
""").fetchdf()

print(analytics)
```

Key features with DuckDB
Efficient data exploration
DuckDB pushes column projections and filters down to the Iceberg scan, so queries read only the data they actually need:
```python
# This only reads the columns you select
events = conn.execute("""
    SELECT event_id, event_name, event_timestamp
    FROM iceberg_catalog.default.events
    WHERE event_timestamp > NOW() - INTERVAL '7 days'
""").fetchdf()
```

Converting to Pandas
Convert results to Pandas DataFrames for further analysis:
```python
df = conn.execute("""
    SELECT *
    FROM iceberg_catalog.default.events
""").fetchdf()

# Use pandas for visualization or further processing
print(df.describe())
```

Exporting results
Save your analytical results to various formats:
```python
# Export to Parquet
conn.execute("""
    COPY (
        SELECT * FROM iceberg_catalog.default.events
    ) TO 'results.parquet'
""")

# Export to CSV
conn.execute("""
    COPY (
        SELECT event_name, COUNT(*) as count
        FROM iceberg_catalog.default.events
        GROUP BY event_name
    ) TO 'summary.csv' (FORMAT CSV, HEADER true)
""")
```

Best practices
- Connection pooling - Reuse connections for multiple queries
- Partition pruning - Filter by partition columns to improve query performance
- Column selection - Only select columns you need to reduce I/O
- Limit results - Use LIMIT during exploration to avoid processing large datasets