Powering Amazon Unit Economics at Scale Using Apache Hudi™
Understanding and improving unit-level profitability at Amazon's scale is a massive challenge, one that requires flexibility, precision, and operational efficiency. It's not only about the sheer amount of data we ingest and produce, but also the need to support our ever-growing businesses within Amazon. In this talk, we'll walk through how we built a scalable, configuration-driven platform called Nexus, and how Apache Hudi™ became the cornerstone of its data lake architecture.
Transcript
AI-generated, accuracy is not 100% guaranteed.
Speaker 1 00:00:07
I've got not one, but two speakers coming up in this next session. Let's bring them onto the stage right about now. Let's grab you, Abha, and you, Jason. Yay. Did you guys win some headphones?
Speaker 2 00:00:21
<laugh>? Nah, <laugh>. We, uh, felt,
Speaker 1 00:00:24
Felt a little bit like cheating
Speaker 2 00:00:25
<laugh>.
Speaker 1 00:00:30
Oh man, I gotta go figure that out. While you all are giving a talk, I'm gonna have to go sort through a whole lot of chat data. Alright, I'll be back very soon. I'll see you all in a bit.
Speaker 2 00:00:42
Thanks everyone for joining in. This is Abha and this is Jason. We are both senior engineers here at Amazon. Today we are going to be talking about how we power Amazon unit economics using Hudi and a config-driven framework called Nexus. Some intro on where we belong within Amazon: we come under the worldwide Amazon Stores division, and under that we are part of an organization called Profit Intelligence. The goal of Profit Intelligence is to provide accurate, timely, and granular profitability data for Amazon stores globally. One of those metrics is contribution profit. Contribution profit is a very standard business metric that is computed across different companies. One simple example: if you ever bought a speaker from Amazon, our team pretty much computes the exact profit Amazon made on it, including the various cost and revenue segments like shipping cost, fees, and so on.
Speaker 2 00:01:53
This profitability data allows a large number of automated systems to make a billion-plus decisions every day; this includes pricing and forecasting, and it's also used by finance teams. That's our intro on where we belong. Now, some history of our team with data lakes. We've been computing contribution profit for more than 15 years, which in essence means we've been dealing with big data processing pretty much since the beginning. We initially just did the processing and published the data into a central data warehouse, but as the business requirements evolved, we eventually moved to owning our own data and maintaining our own data lakes, for better control and to solve business requirements more easily. We have gone through a lot of iterations of data lakes: a custom S3-based data lake with a custom query language, unstructured Redshift ETLs on top of that data lake, and more recently we moved to consuming streaming sources as well via Flink. The latest iteration is Nexus, which I will talk about in more detail. Across each of these iterations, the main thing is that changing business requirements drove them. Given that we are the owners of all the retail profitability logic, to keep up with the business requirements and Amazon's growth, we have to make sure our system is able to handle those requirements.
Speaker 2 00:03:37
Here I'm just showing a quick preview of how we started ingesting streaming sources. The main thing to look at is that we are using Flink for ingestion, both in streaming mode and batch mode. We have our own in-house upsert data lake, which we built using Spark and Glue, and we still have the Redshift ETLs. In this iteration we started isolating the business logic on the input side, and we had a little bit more control. But because we were still using a custom solution for our data lake, we had scaling issues, and some logic was still in the Redshift ETLs, so it was hard to make business changes.
Speaker 2 00:04:32
Our latest iteration, of not just the data lake but the overall system, is Nexus. Nexus is a config-driven data processing platform that allows customers to express their business logic. Customers here are the different business owners within Amazon retail; they are able to express their business logic as simple declarative configurations. The expectation is that they only interact with this configuration layer, and Nexus as a system is able to go from this configuration to producing the outputs by generating the relevant workflows and jobs. This brings us closer to a self-service world where business owners own their own business logic and are able to make changes on their own without relying on the engineering team, which addresses Amazon's growth and changing business requirements. At a high level, there are four main modules; I'll go over each of them briefly.
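To make the configuration idea concrete, here is a minimal, hypothetical sketch of what such a declarative business-logic config could look like, expressed as a Python dict. The field names (dataset, inputs, metrics, output) and values are illustrative assumptions, not Nexus's actual schema.

```python
# Hypothetical declarative business-logic config (illustrative only).
contribution_profit_config = {
    "dataset": "contribution_profit",
    "inputs": ["shipments", "fees", "shipping_costs"],   # upstream datasets
    "metrics": [
        # each metric is a simple declarative expression over the inputs
        {"name": "revenue", "expr": "item_price + fees_collected"},
        {"name": "cost", "expr": "cogs + shipping_cost"},
        {"name": "contribution_profit", "expr": "revenue - cost"},
    ],
    "output": {"table": "cp_by_order_item", "keys": ["order_id", "item_id"]},
}
```

The point of a layer like this is that a business owner edits only the expressions and input lists, while the platform derives the workflows, jobs, and tables from them.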
Speaker 2 00:05:37
We'll start with Nexus Flow. The main responsibility of Nexus Flow is handling all the orchestration responsibilities. Each of the Nexus modules operates at its own abstraction level, and Nexus Flow operates at what we call the workflow abstraction. A workflow here is basically a declarative configuration representing the workflow. Users are able to define it by hand, or, in most of our cases, it is generated from the higher-order config that the business owners define in the configuration layer. Nexus Flow handles these workflows in multiple layers: a logical layer and a physical layer. The logical layer is where the majority of the magic happens, in terms of building a logical DAG, doing dependency inference, and attaching runtime information. The physical layer is just a lightweight abstraction over Step Functions, where the actual jobs execute.
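As a rough illustration of the workflow abstraction and dependency inference described above (the field names are made up, not the real Nexus Flow schema), a declarative workflow might list tasks and let the orchestrator infer the DAG edges by matching outputs to inputs:

```python
# Hypothetical workflow declaration; the DAG is inferred rather than hand-wired.
workflow = {
    "name": "contribution_profit_daily",
    "tasks": [
        {"id": "ingest_fees", "type": "spark_job", "outputs": ["fees"]},
        {"id": "ingest_shipments", "type": "spark_job", "outputs": ["shipments"]},
        {"id": "compute_cp", "type": "spark_job",
         "inputs": ["fees", "shipments"], "outputs": ["contribution_profit"]},
    ],
}

def infer_edges(wf):
    """Build (upstream, downstream) edges by matching task outputs to task inputs."""
    producers = {out: task["id"] for task in wf["tasks"] for out in task.get("outputs", [])}
    return [(producers[inp], task["id"])
            for task in wf["tasks"]
            for inp in task.get("inputs", [])
            if inp in producers]

print(infer_edges(workflow))
# [('ingest_fees', 'compute_cp'), ('ingest_shipments', 'compute_cp')]
```

In the real system, a logical DAG like this would then be compiled down to the physical layer, i.e. Step Functions state machines that run the actual jobs.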
Speaker 2 00:06:35
The overall theme of all the Nexus modules is that they are very extensible. Let's say we have to add a new task type: Nexus Flow uses a federated model where you just add a new implementation and everything works out of the box. Next we'll go to Nexus ETL. The Nexus ETL module represents the compute, or ETL data processing, aspect of the Nexus platform. We can treat it as a library that goes from a job abstraction, where a job is defined as configuration, to an executable Spark job. In the slide, I'm showing an example of a job. A job is defined as a list of operators, which should be fairly intuitive from what we see here. The operators can be either built-in Spark transforms or custom user-defined functions. Currently Nexus ETL uses Spark primarily, but the concept should apply to other data processing engines as well. Next, we'll go to Nexus Data Lake. This is the storage layer. Most of our tables are Hudi tables. This is where the user-defined config at the configuration layer drives even the catalog management, including table creation, schema inference, and schema evolution. Sometimes we even provide configuration that gives hints on what distribution keys to use. All of these configurations go hand in hand with the Nexus Flow config and the Nexus ETL config; they all work together.
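As a hedged sketch of the job-as-a-list-of-operators idea (assuming a PySpark runtime; the operator names, config fields, and paths are illustrative, not the actual Nexus ETL spec):

```python
from pyspark.sql import SparkSession, functions as F

# Hypothetical job config: a job is an ordered list of operators over a source.
job = {
    "source": "s3://example-bucket/shipments/",            # illustrative path
    "operators": [
        {"op": "filter", "condition": "marketplace_id = 1"},
        {"op": "with_column", "name": "net_revenue",
         "expr": "item_price - promotion_discount"},
        {"op": "select", "columns": ["order_id", "net_revenue"]},
    ],
}

spark = SparkSession.builder.appName("nexus-etl-sketch").getOrCreate()
df = spark.read.parquet(job["source"])

# Interpret each operator as a built-in Spark transform; custom UDFs would be
# registered and dispatched the same way.
for op in job["operators"]:
    if op["op"] == "filter":
        df = df.filter(op["condition"])
    elif op["op"] == "with_column":
        df = df.withColumn(op["name"], F.expr(op["expr"]))
    elif op["op"] == "select":
        df = df.select(*op["columns"])
```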
Speaker 2 00:08:23
All of this is accessed through Nexus Flow, and there are plans to add a control plane and APIs to query the data. Yeah, that's it on the data lake. Next, Jason will go over our learnings from using Hudi for the data lake.
Speaker 4 00:08:46
Okay, cool. Thanks, Abha, for going through Nexus and the data lake. So, why Hudi? A lot of you will want to know, since it's in the title. We built the Nexus data lake on Hudi specifically. For those of you who aren't familiar, Hudi is similar to Iceberg, which we've talked about before, but compared to Iceberg it has a lot more functionality that we are utilizing. However, that also comes along with some learnings, which we'll go through here. The first one is concurrency. Originally, when we were designing our table structure, we thought we would have a couple of jobs writing to the same Hudi table. This way, when a schema update happens in one of the jobs, for example when we want to move a column from one job to another, the transition just happens seamlessly because we have a monolithic Hudi table. Which is true.
Speaker 4 00:09:57
However, when we actually ran this in a shadow environment, we discovered failures, because Hudi uses optimistic concurrency control for multi-writer scenarios: it optimistically assumes the write will go through, and if there are two jobs writing to the same file, one of them fails. In our shadow runs we saw failures because there were a lot of cases of two writers hitting the same file. So we pivoted to a new table structure with a separate Hudi table per writer, and, using the aforementioned Nexus Flow, we handle it at the orchestration layer: we determine when each of these tables has been completely upserted, then run a join job that reads each table's output via an incremental query and produces the combined Hudi table.
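For reference, a minimal sketch of the Hudi options involved in the two approaches, assuming Spark DataSource reads and writes with the Hudi bundle on the classpath; the lock provider, paths, and instant time are illustrative, not our production settings:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # assumes Hudi jars are available

# Approach 1: multiple writers on one table need optimistic concurrency control,
# a lock provider, and lazy cleanup of failed writes (illustrative values).
occ_opts = {
    "hoodie.write.concurrency.mode": "optimistic_concurrency_control",
    "hoodie.cleaner.policy.failed.writes": "LAZY",
    "hoodie.write.lock.provider":
        "org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider",
}

# Approach 2 (what we moved to): one table per writer, then a join job that
# reads each input table incrementally, i.e. only commits after a given instant.
incremental_df = (
    spark.read.format("hudi")
    .option("hoodie.datasource.query.type", "incremental")
    .option("hoodie.datasource.read.begin.instanttime", "20240101000000")
    .load("s3://example-bucket/hudi/fees_table/")           # illustrative path
)
```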
Speaker 4 00:11:09
Another learning was about how Hudi manages the metadata table, which is enabled via the hoodie.metadata.enable config. The metadata table is basically a smaller table that sits inside your Hudi table and keeps track of things like indexes and file listings, essentially metadata information. Originally, when we started running the jobs, we used synchronous cleaning, and we realized cleaning was taking a fair bit of time, so we thought we would just do it asynchronously: we had another workflow running that kept cleaning the table. Later on, we discovered that despite the data being cleaned, the metadata table was not being cleaned. We filed a GitHub issue for that, and the Hudi folks were very keen on responding. We got to know it was fixed in a later version, while we were on an older version, so to ensure the metadata table gets properly cleaned, we switched our Hudi tables back to the synchronous cleaning process. We also discovered a bug on our end that explained why the file listing during cleaning took so long.
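A minimal sketch of the configs in play here; the values are illustrative, not our exact production settings:

```python
# Hudi metadata table and cleaning configuration (illustrative values),
# passed to the writer via .options(**cleaning_opts).
cleaning_opts = {
    "hoodie.metadata.enable": "true",         # enable the internal metadata table
    "hoodie.clean.automatic": "true",         # clean as part of each write...
    "hoodie.clean.async": "false",            # ...synchronously (what we switched back to)
    "hoodie.cleaner.commits.retained": "10",  # how much commit history to keep
}
```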
Speaker 4 00:12:33
Another section of learning was related to Hudi costs. Most of our tables are copy-on-write tables. Our update pattern mostly touches the most recent 30 to 60 partitions, with the updates
Speaker 4 00:12:46
spread fairly evenly across roughly 90 partitions. With that access pattern on copy-on-write tables, we discovered about 70% of the costs were PUT requests, and PUT and GET requests combined account for about 80% of the costs. So even though, due to customer requirements, we don't really delete our data, we found that most of our Hudi-related costs are basically S3 requests rather than S3 storage. We do expect that to change eventually as we store more data. So we looked at a couple of saving strategies, one of them being S3 Intelligent-Tiering. It's a feature from AWS where the backend automatically figures out whether your data is frequently accessed, infrequently accessed, or very infrequently accessed. We also tried more aggressive table cleaning, the Hudi feature we mentioned before that cleans up old file versions for you. And we also tried EMR autoscaling to optimize our compute. As part of this migration project, we migrated a lot of workflows into Nexus, probably about 300, and a lot of them basically do the same thing. We maintain about 1,200 tables, a number that keeps increasing based on business needs, and each table gets updated about five to 15 times a day. The total data size of our data lake is about four petabytes, with about one petabyte added and roughly as much cleaned up every month.
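Coming back to the S3 Intelligent-Tiering strategy mentioned above: one generic way to apply it is an S3 lifecycle rule on the bucket that holds the Hudi tables. This is a sketch with a made-up bucket name and prefix, not our actual setup:

```python
import boto3

# Transition objects under a Hudi prefix to S3 Intelligent-Tiering so that S3
# moves them between access tiers automatically based on access patterns.
s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="example-data-lake-bucket",            # illustrative bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "hudi-intelligent-tiering",
                "Status": "Enabled",
                "Filter": {"Prefix": "hudi/"},    # illustrative prefix
                "Transitions": [
                    {"Days": 0, "StorageClass": "INTELLIGENT_TIERING"}
                ],
            }
        ]
    },
)
```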
Speaker 4 00:14:43
As for our daily data size, every day we ingest a few hundred <inaudible>, and there are days where it's even higher. And as part of using Apache Hudi rather than an in-house-built data lake, we saved about a year of developer time, because Hudi is really something we can just pick up and start using. Okay, that's probably about it for the talk today. We have a full version of this talk that we did for a Hudi Community Sync; you can scan the QR code for that. The content of the talk has also been converted to a blog, thanks to the wonderful Onehouse folks, which is also in the QR code.
Speaker 1 00:15:34
You guys were awesome. I gotta keep things moving. For anyone that has questions, drop 'em in the chat and I am assuming you guys are gonna be hanging around there and you can answer some questions that come through. Thank you fellas.