An Intro to Trajectory Data for AI Agents

calendar icon
May 21, 2025
Speaker
Shelby Heinecke
Senior AI Research Manager
Salesforce

AI agents need more than just language - they need to act. This talk introduces trajectory data - an emerging class of data used to training LLM agents. These are sequences of observations, actions, and outcomes that drive agent learning. Whether you're training agents or building the data pipelines behind them, this is your guide to the data powering the next generation of AI agents.

Transcript

AI-generated, accuracy is not 100% guaranteed.

Demetrios - 00:00:06  

Hey, there you are. How's it going?  

Shelby Heinecke - 00:00:09  

Good, good. Happy to be here.  

Demetrios - 00:00:11  

Yes. I'm very excited for your talk. I know that the last time that we had a little virtual encounter was at one of the ML ops community events. Was awesome. So, the floor is yours, and I have one job, which is to keep us on time, and I will see you in 10 minutes.  

Shelby Heinecke - 00:00:33  

Thank you so much. Yeah. Super excited to be here. So we're all here today for a similar reason. We know that data is the key for all of our enterprises, but more importantly, data is the key for AI, right? There's not going to be any good AI without strong, strong data. And so I'm really excited to talk to you about an emerging type of data that you're going to be seeing more and more today. It's called trajectory data. So not sure if you've heard of this, but this trajectory data is something my team at Salesforce spends a ton of time on. We spend so much time synthetically generating this specific type of data so that we can build really strong, really capable, really fast agents. So, yeah, let's just dive in. So before I dive in, just a little bit of background about myself as I just mentioned, I lead an AI research team at Salesforce.  

Shelby Heinecke - 00:01:21  

We build AI, the next generation of AI for our products, ultimately for our customers, right? And some of the key areas my team focuses on are AI agents, which is going to be the focus of today, but also similar topics, small language models that we'll need for agents, on-device AI, and maybe there's a future for on-device agents. And we publish a lot of our research. We open source a lot of our models and data sets. So everything I'm going to talk about today, it's going to be completely free and open source for you to get started with to try to experience. So I'll give you links at the end for you to be able to see what trajectory data really looks like and how you can even use it to train and uplevel your own agents. And always happy to connect. Feel free to scan the QR code to chat offline about these things.  

Shelby Heinecke - 00:02:09  

So, yeah, what is trajectory data? Let's go to something that we all know, we all know about LLMs at this point, right? We all know we can give an LLM a great prompt like this, draft an email, we send it to the LLM, it's going to generate a great email. We know LLMs at this point, but let's think about what we need for agents. Agents are more than just generating texts. Agents are able to take action. So if you look at this prompt here, which is saying, what meetings are on my calendar this afternoon, a plain LLM, just an LLM is not going to be able to generate the right answer for this alone, right? An LLM is trained on historical old data, essentially. There's no way an LLM will know what's going to be on your calendar this afternoon unless we empower that LLM to be able to take action.  

Shelby Heinecke - 00:02:59  

Maybe that LLM is able to call your calendar app and extract information from the calendar app or call any other apps. So the ability to take action is absolutely key for an AI agent, right? So we need our agents to be able to take action using external tools, apps, databases, and so on, so that it can actually answer prompts like this. So what exactly do we mean by an action? So it's still going to be the LLM generating this action. If we go back to that slide, the LLM is still generating an action. So, but what does an action look like? Well, it's still text, it's still text, it's text that specifies one, the correct tool or app to use, okay? And then two, that's not enough, right? Just knowing what app is relevant here is not enough to actually take action.  

Shelby Heinecke - 00:03:49  

We also need to know what functions or how to use that tool or app. So going in our running example, if we look at this prompt again, what meetings are on my calendar this afternoon, the correct action to make it executable would be to specify the calendar app and also specify which functions to use in that calendar app. As you can imagine, lots of things to do in a calendar app. We could add events, remove events. The key here is that given that prompt, the right function would be selected. So another key here is that that action is completely executable. So how do we get our models to be able to take action to generate actions? Well, a lot of LLMs, a lot of great foundation models are already pretty good at this, but for specific use cases, for more complex use cases, we need further training of our models.  

Shelby Heinecke - 00:04:45  

And the key here is that models learn to act by seeing actions. So this is what training data is all about, right? And this is exactly what trajectory data is. Trajectory data is step-by-step action demonstrations. So let's take a look at this quick example here. Imagine we have a task, maybe we have an agent that's able to help us on our phone, for example. And a sample task would be, what is the phone number and email of my friend Astro? Maybe that's something we would need the agent to help us with. The correct actions to take, or that action trajectory would be to navigate contacts and get the phone number of Astro. So that would be, as you can see here, this is a very simple API type call.  

Shelby Heinecke - 00:05:33  

The second step would be to navigate contacts and get the email of Astro. So this right here, what you see here, this task and action trajectory is one data point only. One data point. So imagine if we can generate hundreds or thousands of the data points, we can train our agents to be amazing at taking actions in the right situations. So here's some things to think about when we're developing trajectory training data, we want to have simple trajectories all the way to complex trajectories, right? That's going to really empower the abilities of our agents. So one thing we talk about is single step trajectories. Single step trajectories are just tasks that require only a single step to complete. You'll definitely want some of those in the training data. Now, as we all get comfortable with AI agents, we're using them more often.  

Shelby Heinecke - 00:06:27  

We're expecting more of them. We want our agents to be able to help us with more and more complex tasks. In that case, we'll want to have multiple steps or multiple turn trajectory data. So what I mean by that are tasks that are going to require multiple tool calls or multiple conversation calls, right? Agents are often going to need to talk to the user to get more information. So we'll want trajectory data that includes that. And finally, we're going to want domain-specific trajectory data. So imagine agents that are functioning in marketing tools or agents that are functioning on the web, or agents that are functioning among different databases, depending on where your agent is functioning, you're going to want to have trajectory data that represents that domain, that represents that use case. So this all sounds great, but how do we get this type of trajectory data?  

Shelby Heinecke - 00:07:22  

So I want to give you a couple ideas to help you get started today. The first is, and the main thing my team works on is synthetically generating this trajectory data. Again, remember when we talk about that trajectory data, it is task and correct actions to take to execute that task. So the correct API call, we synthetically generate that. And so what I have here today, what I've listed here are two papers that we recently released called API Gen and API Gen Multiterm. Check out these two papers. We've outlined our entire pipeline on how we've synthetically generated these types of trajectories. So we'll use LLMs to help us generate, but then we'll have systems for actually checking, is that trajectory correct? Is the quality good? Is it diverse enough? Is it domain specific? We've made progress in all those areas.  

Shelby Heinecke - 00:08:14  

Links will be at the end of this presentation too. Now, besides synthetically generating trajectory data, I want to share that there's a lot of trajectory data already open source as well. So we've released a couple of open source data sets on Hugging Face, and you can take a look at that at the end. And of course, human annotation. This is always a great option, right? If you can have a team of human annotators who can help you create those trajectories who maybe can actually use the systems. Maybe you have a team that is actually using your system. You could actually execute tasks right within your system. You can track the API calls they're making, the clicks, whatever it takes to get that task done. That's also a very valid way to get trajectory data.  

Shelby Heinecke - 00:08:57  

Now, I want to share, wrapping this up, I talked to you about what this data does and the power of it. But I want to show you, if you generate the synthetic data, how your models can become stronger. So we've done just that at Salesforce. We've synthetically generated lots of trajectories, we've trained a model and we call it XA. So this is completely open source everyone, it's completely available on Hugging Face. I'll share a link at the end. These are what we call our large action models. So what we did is we've taken several free trained base models, open source, we've synthetically generated thousands of trajectories, as I mentioned. We fine-tune those models and as a result, we have a set of five models ranging from 1 billion parameters all the way up to 7 billion parameters.  

Shelby Heinecke - 00:09:47  

That is most importantly ranking number one on function calling leaderboards. So our models, even being a fraction of the size, are beating GPT-4, Claude, and other industry leaders. And the key was that we synthetically generated these trajectories. So our models are just amazing at taking action. So again, all open source. And if you look at the Berkeley function calling leaderboard, this is one of the main leaderboards for function calling for agents. You can see that our models, sometimes being a fraction of the size, rank towards the top of that leaderboard. I'll specifically call out number four here. Our 8 billion parameter model is ranking above even GPT-4. And again, most of that was because of our training data. Data is the key to getting any of these AI models to perform.  

Shelby Heinecke - 00:10:34  

You can see here our 3 billion parameter model, really small model ranking 13, and even the 1 billion parameter model is still beating a lot of the industry leaders. Awesome. So hopefully that gave you a preview of what trajectory data is. Keep an eye out for it. You'll see more and more about trajectory data. If you're interested in checking out our completely open source models and that data that we open source, that trajectory data, you can scan the QR code here. Also, we started a YouTube series about this. So check out our YouTube series. Thank you all so much.