Cross paradigm compute engine for AI/ML data

May 21, 2025
Speaker
Nikhil Simha
CTO
Zipline

AI/ML systems require real-time information derived from many data sources. This context is needed to create prompts and features, and the most successful models draw rich context from a vast number of sources.

To power this, engineers need to manually split their logic and place it in various data processing “paradigms” - stream processing, batch processing, embedding generation and inference services.

Today, practitioners spend tremendous effort stitching together disparate technologies to power *each* piece of context.

While at Airbnb, we created a system to automate the data and systems engineering required to power AI models both for training / fine-tuning and for online inference.

It is deployed in critical ML pathways and actively developed by Stripe, Uber, OpenAI and Roku (in addition to Airbnb).

In this talk I will go over use cases, the Chronon project overview, and future directions.

Transcript

AI-generated, accuracy is not 100% guaranteed.

Adam - 00:00:06  

The next talk for us is Nikhil. Nikhil, are you around? Can you hear us?  

Nikhil Simha - 00:00:13  

Hey Adam, I can hear you. Can you guys hear?  

Adam - 00:00:15  

How you doing? And where are you dialing in from?  

Nikhil Simha - 00:00:19  

I'm dialing in from East Bay, Fremont.  

Adam - 00:00:22  

From Fremont. Nice. We'll be there in just a few days for another big conference coming up. So Nikhil, I'm gonna clear up the stage and I'm gonna step down. I'll be back in 10 minutes. Nikhil, the floor is yours.  

Nikhil Simha - 00:00:37  

Hello everyone. Thanks for taking the time today. I'm going to talk to you about Chronon. Chronon is a data platform we built for ML and AI use cases while at Airbnb, jointly with Stripe, and we open-sourced it last year.  

So a little bit about me. I'm Nikhil Simha. I was working on ML infrastructure at Airbnb for a long time. Before that, I was doing stream processing and data stuff at Facebook. We built our own stream processing engines while there. And before that I was doing ML infrastructure again, back at Amazon and Walmart Labs.  

Currently we have founded this company called Zipline AI to help people use Chronon.  

So what is Chronon? Chronon serves two main purposes. It turns raw data into training data and it also helps serve features. Because there is a lot of commonality in how people compute metrics, it also gets used accidentally for generating offline and online metrics.  

We have contributors, evaluators, and adopters from a bunch of companies. It's been only a year now, so it's really early adoption. We work with a few of them to get their deployments into production.  

What do people use Chronon for? The main one is predictive machine learning use cases: search indexing and ranking, ads ranking, feed personalization, fraud and abuse prevention—both monetary and like hate speech and that sort of stuff. Also to personalize marketing material and to do pricing. There's a wide variety of use cases.  

This is what it's traditionally been used for, but in the last two, three years, we have seen a growing number of use cases that are more using LLMs and generative AI. Customer support, which used to be traditionally predictive ML, but now more generative AI oriented.  

It's also used for creating virtual assistants both for shopping, travel, etc. It's used for rule engines. Sometimes you don't have enough time to retrain a model when new fraud patterns occur. In that case, you want people to come up with heuristics for a day or two. These heuristics run and filter traffic out, and that's where rule engines come into play. You give your heuristic to this rule engine and it filters traffic out.  

It's also used in user-facing metrics, like I mentioned. High traffic landing pages, essentially like listing ratings or item ratings. You can imagine the amount of traffic these pages see is quite high and you want these ratings to be updated in real time, and just regular offline business metrics like customer 360, listings 360.  

Why do people use Chronon? The crux of the Chronon engine is this thing called incrementalization. To give you a vague intuition of what incrementalization is like, let's say you want to compute the average rating of a listing over the last 90 days. You're going to compute it today and then again tomorrow, so you have about 89 days' worth of overlap. If you can effectively incrementalize this computation (only folding in the new day and evicting the day that fell out of the window), you save roughly 45x in computation, or up to 90x in extreme cases.  
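The sliding-window intuition above can be sketched in a few lines of plain Python. This is a toy illustration of the idea, not Chronon's actual engine: instead of re-summing all 90 days every day, keep a running total, add the incoming day, and evict the day that fell out of the window.

```python
from collections import deque

def naive_window_avg(daily_ratings, window=90):
    """Recompute the window average from scratch: ~90 additions per day."""
    tail = daily_ratings[-window:]
    return sum(tail) / len(tail)

class IncrementalWindowAvg:
    """Maintain a sliding-window average with O(1) work per new day."""
    def __init__(self, window=90):
        self.window = window
        self.buf = deque()   # the days currently inside the window
        self.total = 0.0

    def update(self, day_value):
        self.buf.append(day_value)
        self.total += day_value
        if len(self.buf) > self.window:
            self.total -= self.buf.popleft()  # evict the expired day
        return self.total / len(self.buf)

avg = IncrementalWindowAvg(window=90)
for v in [4.0, 5.0, 3.0]:
    latest = avg.update(v)
print(latest)  # 4.0: same answer as the naive version, ~2 ops/day instead of ~90
```

The full recompute touches ~90 days of data while the incremental update touches 2 (one added, one removed), which is where the rough 45x figure in the talk comes from.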

That's the crux of it. That makes the Chronon engine scale and generate training data and serve features at large scale. It's unified, meaning you write the feature definition once and it creates both training data from that definition and creates an online serving endpoint.  

It makes sure that the data that's being served is fresh. If you have event streams, it incorporates data from the event stream into the endpoints. It's very pluggable. You can plug in your event streams, either Kafka, Pub/Sub, or you can plug in your warehouse, which could be on Iceberg, traditional Hive, Hoodie. It could be on Google BigQuery and we'll be able to pull data from that, transform that into training data and features for serving both.  

Under the hood, we are built on a bunch of technologies, some open source, some not. We connect to these things and pull data from them. Chronon basically implements its incrementalization algorithms over two engines: Apache Spark for batch processing and Flink for stream processing. It stores online data and indexes it; the connector layer is open, so anyone can implement a connector to any database, but we have implemented connectors for these three.  

How is it related to Zipline? Zipline is a bring-your-own-cloud offering of the Chronon platform. We add things that make it production grade, like built-in ML observability and governance. I'm going to talk a bit more about these today, and other things like experiment isolation.  

In ML you want to have a very fast experimentation loop when you come up with new features and you want to isolate these experiments without affecting production. The other thing is compute sharing. A lot of the times these experiments do need to share compute. We want isolation but also want to share as much as we can. I'll talk a bit more about this later as well.  

We are integrated with some of the ML platforms that exist out there like Vertex AI and SageMaker, and we do native embedding generation. That's the delta between Chronon and Zipline.  

About governance. The first thing is compliance, where the main question is: which models depend on a given input column? Another aspect of governance is lifecycle management: which models depend on a given column that I'm about to deprecate or change?  

These questions are similar and what is needed to answer them is a global column lineage graph. This basically says this column's data flows across all of these pipelines and ends up in a model which goes from the source of the data all the way into the endpoint that is serving the model, regardless of how many stages there are in between.  

One thing that makes Chronon really effective at doing this is that it's declarative. There is no black box Python code or Java code allowed. It's more data frame-like in every stage. Because it's data frame-like, we can extract the lineage statically, so users don't have to tell us the lineage. We can pull it out from the feature definitions that you already have written.  

This is roughly what gets pulled out. We have tables and columns in between and we know how columns are connected and what filters get applied, etc. We can create this graph globally across every source of data and across every feature and every model.  
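As a toy illustration of the reachability queries this graph enables (the column and model names here are hypothetical, and this is not Chronon's internal representation), lineage is just edges from upstream columns to downstream columns and models:

```python
# Hypothetical lineage edges, as if extracted statically from declarative
# feature definitions: upstream column -> downstream columns / models.
EDGES = {
    "events.purchase_amount":  ["features.avg_spend_90d"],
    "features.avg_spend_90d":  ["model.fraud_v2", "model.ranking_v7"],
    "events.listing_rating":   ["features.avg_rating_90d"],
    "features.avg_rating_90d": ["model.ranking_v7"],
}

def downstream(column):
    """All nodes reachable from `column`, e.g. every model affected
    if this column were deprecated or changed."""
    seen, stack = set(), [column]
    while stack:
        node = stack.pop()
        for nxt in EDGES.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

models = sorted(n for n in downstream("events.purchase_amount")
                if n.startswith("model."))
print(models)  # ['model.fraud_v2', 'model.ranking_v7']
```

Both governance questions from the talk reduce to this kind of traversal: compliance asks which models a column reaches, and lifecycle management asks the same before a deprecation.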

The other one is observability. ML observability and AI observability is pretty tricky. It's a large area, much larger than the core of building features and prompts themselves. There are many metrics here. One is prediction drift, which measures how predictions have drifted relative to the past. Similarly, there's label drift and feature drift.  

The point of observing these things is that we know if there is something wrong going on with the model. Then there is feature importance, which determines the influence of a given feature on a decision. If you want to analyze why a model predicted a certain way, this is very useful. So when someone calls and says, "Hey, I'm banned, why is that?" people can look at the model and say, "Okay, this is probably why."  

Another one is model decay: as real-world preferences change, labels and predictions no longer match. This is what we call model decay. Then there is consistency, which means the difference between training data and serving data. We need to measure all of this to know whether an ML system is healthy or not.  

The difficulty here is that there are thousands of features per model, and each feature has many data quality metrics of its own. Multiply that out and that's a lot of data quality metrics. This is usually harder than building the feature pipelines themselves, so most often these things don't get built.  

Another advantage Chronon has here is that it's declarative, which means we can derive what needs to be measured and auto compute these metrics and automate the observability of all these pipelines.  

This is roughly what it looks like. You just write your feature definitions and we generate these drift metrics and all the other metrics I showed you before.  
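One common way to quantify feature drift is the Population Stability Index (PSI), which compares a feature's current distribution against a baseline. The talk doesn't say which drift metric Chronon computes, so this is just a self-contained sketch of the general idea:

```python
import math

def psi(baseline, current, bins=10):
    """Population Stability Index between two samples of one feature.
    A rule of thumb: PSI > 0.2 is often treated as significant drift."""
    lo = min(min(baseline), min(current))
    hi = max(max(baseline), max(current))
    width = (hi - lo) / bins or 1.0  # guard against a zero-width range

    def distribution(sample):
        counts = [0] * bins
        for x in sample:
            idx = min(int((x - lo) / width), bins - 1)
            counts[idx] += 1
        # Smooth empty bins so the log below is defined.
        return [(c + 1e-4) / len(sample) for c in counts]

    p, q = distribution(baseline), distribution(current)
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))

stable  = [i % 10 for i in range(1000)]
shifted = [(i % 10) + 3 for i in range(1000)]
print(round(psi(stable, stable), 4))  # 0.0: identical distributions, no drift
print(psi(stable, shifted) > 0.2)     # True: the shift is flagged as drift
```

The same comparison works for prediction drift and label drift: the inputs are just samples of predictions or labels instead of one feature's values.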

Another cool advantage is that we can put lineage and drift side by side and say, "Hey, this feature is drifting, so let's see what other pipeline is producing this," then go upstream to see where the drift is originating from and talk to the owner of that data.  

The other area I'll quickly cover is experimentation. We have safety and isolation as a requirement. We don't want to impact any production data and we want to use all existing data.  

This means we can read from production but must not overwrite it. Let's say this is a production set of pipelines and I want to make a change to one of the nodes. I want to make that change, make a copy of everything downstream, keep the production pipelines untouched, and at the same time reuse dependencies from production.  

This is what needs to happen and this is a very complex orchestration problem. This is something the Zipline engine solves for.  
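As a rough sketch of the orchestration idea (node names are hypothetical, and Zipline's actual mechanism isn't described in the talk): fork the changed node and everything downstream of it into an experiment namespace, leaving production nodes untouched so the experiment copies can still read production outputs upstream.

```python
# node -> its downstream consumers, e.g. a production pipeline DAG
PROD_DAG = {
    "raw_events":   ["features_a"],
    "features_a":   ["training_set"],
    "training_set": [],
}

def fork(dag, changed, namespace):
    """Copy `changed` and everything downstream of it under `namespace`.
    Production nodes stay exactly as they are; the copies' upstream
    reads fall through to the untouched production outputs."""
    to_copy, stack = set(), [changed]
    while stack:
        node = stack.pop()
        if node not in to_copy:
            to_copy.add(node)
            stack.extend(dag[node])
    forked = dict(dag)  # production pipelines are left untouched
    for node in to_copy:
        forked[f"{namespace}.{node}"] = [
            f"{namespace}.{d}" if d in to_copy else d for d in dag[node]
        ]
    return forked

exp = fork(PROD_DAG, "features_a", "exp1")
print(sorted(exp))
# ['exp1.features_a', 'exp1.training_set', 'features_a', 'raw_events', 'training_set']
```

Note that `raw_events` is not copied: the experiment reuses it from production, which is the compute-sharing half of the requirement, while the namespaced copies give the isolation half.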

To recap, Chronon is good for powering ML systems: generating training data, serving features that applications can consume, and computing metrics. Zipline adds observability, governance, and experimentation on top of that.  

Things I didn't talk about are embeddings and model integration. If you want to learn more, please join the Slack channel. You can visit Chronon.ai, join the Slack channel, and start using the open source project. If you're interested in a managed offering, you can reach out to us at hello@zipline.ai.  

Thank you for your time.  

Adam - 00:12:16  

Awesome Nikhil, that was excellent. It felt like it went by really quickly. That reframing slide was sick. I feel like we could just spend an hour zooming into how you guys are actually pulling this off. It does not sound trivial.