July 8, 2025

S3 Managed Tables, Unmanaged Costs: The 20x Surprise with AWS S3 Tables

AWS S3 Tables promises to simplify table maintenance for Apache Iceberg, but our hands-on experience revealed long compaction delays, minimal observability, and costs up to 20–30x higher than alternative approaches. Compaction delays of 2.5–3 hours cause persistent query performance degradation. The S3 Tables design also suffers from a broken abstraction: it accepts no input from writers to influence compaction behavior, so it cannot be tuned to match workload performance needs.

Introduction to S3 Tables

For AWS users working with Apache Iceberg, S3 Tables is a fully managed service that provides built-in Apache Iceberg support for storing tabular data at scale. S3 Tables introduces a "table bucket" abstraction on top of Amazon S3, enabling users to create tables that use Apache Iceberg for metadata and Apache Parquet for efficient, columnar data storage. In addition, S3 Tables provides fully managed table optimizations and integrates with other AWS analytics services such as Amazon Athena and Amazon SageMaker.

S3 Tables is an exciting advancement for the data lakehouse community: it validates the architecture pattern and highlights the need for an easier way to manage Iceberg tables. We heard the community voice concerns about S3 Tables costs, so we decided to investigate ourselves. In this blog we share technical data and results from our hands-on experience and benchmarks.

Small Files Problem and Why Compaction Matters

After releasing S3 Tables, the AWS team highlighted that many businesses managing petabyte-scale data must optimize both storage and processing to deliver timely insights cost-effectively. These workloads often involve frequent, granular updates that generate many small files. As the number of data files grows, query performance for downstream applications degrades. To address this, a process known as compaction consolidates small Parquet files into larger ones, enabling query engines to scan data more efficiently with fewer I/O operations.

For table formats like Iceberg, where issues such as the "small files problem" and table optimizations like "compaction" aren't automatically managed, efficiently handling them with minimal resources can be a significant operational challenge. AWS cites in a recent blog that: “When talking with our customers, we learned that the most challenging aspect is the compaction of individual small files produced by each transactional write on tables into a few large files”. 
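For teams that manage this themselves today, compaction is typically triggered with Iceberg's built-in rewrite_data_files procedure from a Spark job (for example on EMR, as discussed later). Below is a minimal sketch of what that looks like; it assumes an Iceberg-enabled Spark session with a catalog named my_catalog already configured, and the table name is a placeholder. The 512 MB target mirrors the S3 Tables default discussed below.

# Minimal sketch: manually compacting an Iceberg table with Iceberg's
# rewrite_data_files Spark procedure. Catalog and table names are placeholders,
# and the cluster is assumed to have the Iceberg Spark extensions configured.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("iceberg-manual-compaction")
    .getOrCreate()
)

# Rewrite small data files into ~512 MB files (536870912 bytes).
spark.sql("""
    CALL my_catalog.system.rewrite_data_files(
        table => 'benchmarks.compaction_test_table',
        options => map('target-file-size-bytes', '536870912')
    )
""")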

S3 Tables automatically performs compaction along with other maintenance tasks, such as snapshot management and unreferenced file removal. If you are already operating within the AWS ecosystem, S3 Tables appears on the surface to be a compelling managed option to simplify your data lake operations. But how well does it hold up in real-world usage?

Hands-on with S3 Tables 

Poor observability

To evaluate S3 Tables in action, I created an Iceberg table using the default configuration without any custom overrides, and set up an Iceberg writer that generates synthetic insert records into the table.
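For context, here is a minimal sketch of this kind of synthetic insert writer. The schema, catalog name, and batch size are illustrative placeholders; the partition count and per-minute cadence match the workload described next.

# Sketch of a synthetic Iceberg insert writer. Schema, batch size, and the
# catalog/table names are placeholders; the real writer targeted roughly
# 1 GB per minute across 100 partitions.
import time
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("synthetic-iceberg-writer").getOrCreate()

while True:
    batch = (
        spark.range(0, 5_000_000)                          # synthetic rows
        .withColumn("partition_key", F.col("id") % 100)    # spread across 100 partitions
        .withColumn("payload", F.sha2(F.col("id").cast("string"), 256))
        .withColumn("event_time", F.current_timestamp())
    )
    # Append to the Iceberg table backed by the S3 table bucket.
    batch.writeTo("s3tablescatalog.vinish_benchmarks.compaction_test_table").append()
    time.sleep(60)  # roughly one batch per minute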

The writer produced a nominal 1 GB of data per minute across 100 partitions, resulting in approximately 100 files per minute, each ranging between 7 MB and 15 MB. To my surprise, S3 Tables compaction didn’t trigger, even after the total number of files exceeded 10,000 and the dataset size surpassed 100 GB. I checked the documentation to see if any compaction metrics or dashboards were available, but AWS only provides a CLI command (get-table-maintenance-job-status) to retrieve the last maintenance job status. Even that still reflected only the initial table creation timestamp, with no indication that any background optimization had run.

aws s3tables get-table-maintenance-job-status --region us-west-2  --table-bucket-arn arn:aws:s3tables:us-west-2:****:bucket/**-benchmarks --namespace vinish_benchmarks --name compaction_test_table
...
{
    "tableARN": "arn:aws:s3tables:us-west-2:***:bucket/benchmarks/table/fc7ef97e-9ece-4fce-b408-4e017f164d84",
    "status": {
        "icebergCompaction": {
            "status": "Successful",
            "lastRunTimestamp": "2025-05-05T00:52:26.908000+00:00"
        },
 ....
}

Puzzled by the lack of compaction activity, I stopped the Iceberg writer to see how S3 Tables would respond. With no observability dashboards available to monitor compaction, I turned to the metadata directly, downloading the Iceberg table metadata JSON using the get-table-metadata-location CLI command for further analysis.
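For readers who want to reproduce this kind of analysis, the sketch below shows one way to pull the current metadata file and count compaction (replace) snapshots. It assumes the boto3 s3tables and s3 clients; the ARN, namespace, and table name are placeholders, and the exact response field names should be verified against the boto3 documentation.

# Sketch: inspect Iceberg metadata to look for compaction ("replace") snapshots.
# ARN/namespace/table names are placeholders; verify response field names
# against the boto3 s3tables documentation.
import json
from urllib.parse import urlparse
import boto3

s3tables = boto3.client("s3tables", region_name="us-west-2")
s3 = boto3.client("s3", region_name="us-west-2")

resp = s3tables.get_table_metadata_location(
    tableBucketARN="arn:aws:s3tables:us-west-2:123456789012:bucket/my-benchmarks",
    namespace="vinish_benchmarks",
    name="compaction_test_table",
)

# metadataLocation points at the current Iceberg metadata JSON in S3.
location = urlparse(resp["metadataLocation"])
metadata = json.loads(
    s3.get_object(Bucket=location.netloc, Key=location.path.lstrip("/"))["Body"].read()
)

# Iceberg records compaction rewrites as snapshots with operation "replace".
snapshots = metadata.get("snapshots", [])
replaces = [s for s in snapshots if s.get("summary", {}).get("operation") == "replace"]
print(f"total snapshots: {len(snapshots)}, replace (compaction) commits: {len(replaces)}")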

Roughly 3 hours after the table was created, S3 Tables finally triggered compaction, executing 10 replace operations and compacting approximately 100 GB of data over the course of an hour. In the meantime, I reviewed the AWS documentation to determine whether users have any control over when compaction is triggered. As it turns out, the only tunable parameter is the target file size, which can be set between 64 MB and 512 MB (512 MB is the default) using the AWS CLI via the put-table-maintenance-configuration command. Following this, I resumed the writer with a slightly adjusted ingestion rate, now writing 1 GB every 2.5 minutes, while keeping all other parameters the same. The pattern repeated: no compaction activity for the first 2.5 hours, after which S3 Tables issued 10 replace commits, compacting roughly 60 GB during the next hour.
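For completeness, here is a sketch of adjusting that single knob programmatically. The boto3 request shape shown below (maintenance type and settings keys) is our best understanding of the put-table-maintenance-configuration API and should be verified against the current AWS documentation; the ARN and names are placeholders.

# Sketch: set the compaction target file size, the only tunable S3 Tables exposes.
# The request shape is an assumption inferred from the CLI command of the same
# name; verify against the boto3/AWS docs before use.
import boto3

s3tables = boto3.client("s3tables", region_name="us-west-2")

s3tables.put_table_maintenance_configuration(
    tableBucketARN="arn:aws:s3tables:us-west-2:123456789012:bucket/my-benchmarks",
    namespace="vinish_benchmarks",
    name="compaction_test_table",
    type="icebergCompaction",
    value={
        "status": "enabled",
        "settings": {"icebergCompaction": {"targetFileSizeMB": 512}},
    },
)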

The following diagram visualizes the timeline conceptually, at approximate scale:

Flawed Approach to Compaction

This exposes a deeper flaw in the S3 Tables approach: it does not recognize that ideal compaction configurations are specific to different types of readers and writers. A storage service cannot opaquely optimize this without runtime inputs from writers and readers, since compaction frequency is often determined by a user's SLA requirements or data freshness needs. For example, users with streaming workloads typically need compaction to run frequently, far more often than the 2.5–3 hour delays observed with S3 Tables. In contrast, for batch workloads that write every 12 hours, triggering compaction every 3 hours may be unnecessary and even counterproductive, as it can produce sub-512 MB files that need to be compacted again in subsequent rounds.

Even in its current form, S3 Tables offers very limited control here: as noted above, the only tunable parameter is the target file size for compaction (64 MB to 512 MB, with 512 MB as the default), set via the put-table-maintenance-configuration command. During our testing we left the default 512 MB target in place. With long delays before compaction and long compaction durations, we never reached the target file size until we stopped our write operations. In actual production workloads this limitation means that queries on recent data will be perpetually slow and will not benefit from the performance improvements of compaction. The following diagram shows the average file size measured over a ~20-hour period, reflecting long fluctuations in which compaction improves the average file size only infrequently, while writes drag it back down immediately afterward.

More broadly, this raises the question of whether S3 Tables breaks abstractions by leaking database-like functionality into a file system/object storage layer, an approach that has historically not yielded good results.

Breaking down costs

AWS data services provide the "Amazon Basics" versions of the core data services you need to run your data platform. S3 Tables can add significant additional costs to your S3 bill that quickly add up, and you should be prepared for them.

Charge | S3 Standard | S3 Tables
Storage pricing | $0.023 per GB | $0.0265 per GB
Request pricing (PUT, POST, LIST) | $0.005 per 1,000 requests | $0.005 per 1,000 requests
Request pricing (GET and others) | $0.0004 per 1,000 requests | $0.0004 per 1,000 requests
Maintenance pricing | n/a | $0.05 per GB processed + $0.004 per 1,000 objects
Monitoring pricing | n/a | $0.025 per 1,000 objects

From an experiment compacting 100 GB across 10K files, our S3 Tables maintenance cost came out matching the expected pricing estimate at $5.04. Budgeting from list prices that are fractions of a penny is deceptively comforting until you realize how quickly they add up at scale; S3 Tables can quietly become a very expensive part of your stack. For comparison, let's do some napkin math on what this same compaction operation would cost on another AWS service like EMR, the common alternative way for users to perform compaction today.

From recent benchmark activity we measured that it takes EMR ~25 seconds of compute time to compact 1 GB of data on an m5.xlarge instance. The EMR list price for m5.xlarge is $0.048/h and the EC2 list price is $0.192/h. So the calculation is: 25 seconds x (($0.192 + $0.048) / 3600 seconds) = $0.0017/GB, or $0.17 for 100 GB.
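Here is the same napkin math in code, using only the list prices quoted above (a sketch for illustration; prices are as published at the time of writing):

# Napkin math: S3 Tables maintenance cost vs. self-managed compaction on EMR,
# using the list prices quoted above.
GB_COMPACTED = 100
OBJECTS_COMPACTED = 10_000

# S3 Tables maintenance pricing: $0.05/GB processed + $0.004 per 1,000 objects.
s3_tables_cost = GB_COMPACTED * 0.05 + (OBJECTS_COMPACTED / 1_000) * 0.004

# EMR: ~25 s of m5.xlarge compute per GB, at $0.192/h (EC2) + $0.048/h (EMR uplift).
seconds_per_gb = 25
hourly_rate = 0.192 + 0.048
emr_cost = GB_COMPACTED * seconds_per_gb * hourly_rate / 3600

print(f"S3 Tables: ${s3_tables_cost:.2f}")             # ~$5.04
print(f"EMR:       ${emr_cost:.2f}")                   # ~$0.17
print(f"Ratio:     {s3_tables_cost / emr_cost:.0f}x")  # ~29-30x depending on rounding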

Charge | AWS EMR | S3 Tables
$/GB compaction | $0.0017/GB | $0.05/GB processed + $0.004 per 1,000 objects
Total data processed | 100 GB | 100 GB
Total cost | $0.17 (~29x cheaper) | $5.04

Running Iceberg compaction with AWS EMR comes out to be ~29x cheaper than if you let AWS S3 Tables do it for you. Okay, okay, I know… cue the pitchforks: “but I don’t want to use EMR, it’s way too hard, I need the easy fully managed route”. What if there was a way to have your cake and eat it too?

What if you could have it all: Onehouse Table Optimizer

Onehouse offers a fully managed Lakehouse Table Optimizer, delivering automated services such as compaction, clustering, cleaning, and multi-catalog synchronization for Apache Hudi, Apache Iceberg, and Delta Lake tables. We also run these operations at a fraction of the cost of AWS S3 Tables with the Onehouse Compute Runtime (OCR), which optimizes workloads and queries across all major engines. In this section, we’ll explore how Onehouse delivers a more powerful and flexible optimization experience compared to other managed Iceberg services like AWS S3 Tables.

S3 Tables vs Onehouse Cost Comparison

Let’s start with a benchmark comparing performance and cost. S3 Tables is a black-box service that hides which compute resources are used, so creating an exact replica of the setup is not possible. However, it was easy to set up a comparison by measuring the single pricing metric S3 Tables exposes: dollars per GB compacted.

For the benchmark we ran Onehouse Table Optimizer on a single AWS m8g.xlarge instance (4 vCPU, 16 GB). Smaller compute could be used, which would further reduce the cost while still matching or outperforming the S3 Tables job. We used the same dataset and the same writer script to produce the exact same environment in S3 for both S3 Tables and Onehouse. Below are the results, showing that Onehouse was ~5x faster at 20–30x lower cost:

Details | S3 Tables | Onehouse Table Optimizer
Setup details (same for both) | Total data compacted: 953.7 GB; writer batch size: 101.95 GB; writer file size: 7–15 MB; target file size: 512 MB
Compaction frequency | ~3h | ~30min (6x more frequent)
Avg compaction duration | 56.2 min | 11.3 min (~5x faster)
Total cost | $47.685 | $2.293 (20.8x cheaper); $1.587 with spot nodes (30.1x cheaper)

These cost and performance advantages can be attributed to new compaction techniques we pioneered in the Onehouse Compute Runtime. One example we highlighted in our detailed launch blog is our high-performance Lakehouse I/O, which includes an advanced vectorized columnar merging technique that accelerates compaction operations by up to 4x.

Granular control and visibility

Onehouse Table Optimizer lets you control the frequency of compaction, the target file size, a budget for how much data to compact per execution, sort keys, the layout strategy when rewriting files, and much more. You can enable Table Optimizer to run automatically and intelligently based on storage events, schedule compaction frequency relative to the frequency of your writer commits, or use our APIs for custom orchestration, time-based intervals, or on-demand execution.

Iceberg compaction functionality | S3 Tables | Onehouse Table Optimizer
Orchestration | Arbitrary ~3h delay, cannot customize | Full control for commit-based frequency and even on-demand execution
Configurations | Only target file size | File size, frequency, budgets, layout strategy
Monitoring | No built-in monitoring available | Full web UI + advanced Grafana log delivery
File-sizing effectiveness | Compaction not effective for low-latency workloads | Compaction can keep up within 1 min of write operations
The Onehouse web UI also lets you monitor the status and progress of compaction activities across all your tables and drill down into the timeline and details of specific compaction operations.

Conclusion

AWS embracing an open table format like Apache Iceberg as a first-class citizen in their backbone storage service of S3 is a major milestone for the lakehouse ecosystem and community. It validates the growing shift toward open lakehouse architectures and signals that the future of analytics will be built on open, modular storage decoupled from compute engines.

While S3 Tables lowers the barrier to entry for Iceberg on AWS, it also exposes key challenges in designing for performance-critical lakehouse workloads. From limited compaction control and poor observability to unpredictable performance and non-trivial cost overheads, our experience shows that S3 Tables may not be ready for data-intensive workloads at scale. It may be an acceptable solution for batch jobs that write infrequently to static tables on S3.

If you're looking for an easy button for Iceberg table maintenance, but you care deeply about performance, cost efficiency, or operational clarity, Onehouse offers a more robust alternative, designed from the ground up for modern Iceberg operations with a high-performance compute runtime and a fully managed optimization layer.

Onehouse Table Optimizer runs asynchronously and blends into the background with your standard S3 buckets. With zero changes to your writer pipelines, you can put your Iceberg, Hudi, or Delta tables on autopilot across all your buckets and clouds. Sign up here for a personalized demo.

Authors
Vinish Reddy Pannala
Software Engineer

Vinish Reddy is a software engineer at Onehouse, building cutting-edge data infrastructure. He's a PPMC member for Apache XTable (Incubating) and an active contributor to open-source data projects.

Kyle Weller
Head of Product

Head of Product. Experience includes Azure Databricks, Azure ML, Cortana AI Agent, global scale data and experimentation platforms for Bing Search, and Office 365. Onehouse author and speaker.
