An Intro to Trajectory Data for AI Agents

May 21, 2025

Speaker

Shelby Heinecke

Senior AI Research Manager

AI agents need more than just language - they need to act. This talk introduces trajectory data - an emerging class of data used to training LLM agents. These are sequences of observations, actions, and outcomes that drive agent learning. Whether you're training agents or building the data pipelines behind them, this is your guide to the data powering the next generation of AI agents.

Transcript

AI-generated, accuracy is not 100% guaranteed.

Speaker 0 00:00:00
<silence>

Speaker 1 00:00:06
Hey, there you are. How's it going?

Speaker 2 00:00:09
Good, good. Happy to be here.

Speaker 1 00:00:11
Yes. I'm very excited for your talk. Uh oh. Yeah. I know that the last time that we had a little virtual encounter was at one of the ML ops community event. Yeah. Was awesome. So, <laugh>, the floor is yours, and I have one job, which is to keep us on time, and I will see you in 10 minutes.

Speaker 2 00:00:33
Thank you so much. Yeah. Super excited to be here. So we're all here today for, you know, a similar reason. We know that data is the key for all of our enterprises, but more importantly, data is the key for ai, right? There's not gonna be any good AI without strong, strong data. And so I'm really excited to talk to you about an emerging type of data that you're gonna be seeing more and more today. It's called trajectory data. So not sure if you've heard of this, but this trajectory data is something my team at Salesforce spends a ton of time on. We spend so much time synthetically generating this specific type of data so that we can build really strong, really capable of really fast agents. So, yeah, let's just dive in. So before I dive in, just a little bit of background about myself as I'm, as I just mentioned, I lead an AI research team at Salesforce.

Speaker 2 00:01:21
We build ai, the next generation of AI for our products, ultimately for our customers, right? And some of the key areas, my team focuses on our AI agents, which is gonna be the focus of today, but also similar topics, small language models that we'll need for agents on device ai, and maybe there's a future for on device agents. And we publish a lot of our research. We open source a lot of our models and data sets. So everything I'm gonna talk about today, it's gonna be completely free and open source for you to get started with to try to experience. So you'll, so I'll give you links at the end for you to be able to see what trajectory data really looks like and how you can even use it to tr to train and uplevel your own agents. And always happy to connect. Feel free to scan the QR code to chat offline about these things.

Speaker 2 00:02:09
So, yeah, what is trajectory data? Let's go to something that we all know, we all know about LLMs at this point, right? We all know we can give an LLMA great prompt like this, draft an email, we send it to the LLM, it's gonna generate a great email. We know LLMs at this point, but let's think about what we need for agents. Agents are more than just generating texts. Agents are able to take action. So if you look at this prompt here, which is saying, what meetings are on my calendar this afternoon, a plain LLM, just an LLM is not going to be able to generate the right answer for this, for this alone, right? An LLM is trained on historical old data, essentially. There's no way an LLM will know what's gonna be on your calendar this afternoon unless we empower that LLM to be able to take action.

Speaker 2 00:02:59
Maybe that LLM is able to call your calendar app and extract information from the calendar app or call any other apps. So the ability to, to take action is absolutely key for an AI agent, right? So we need our agents to be able to take action to using external tools, apps, databases, and so on, so that it can actually, you could actually answer prompts like this. So what exactly what do we mean by an action? So it's still going to be the LLM generating this action. If we go back to that slide, the LLM is still generating an action. So, but what does an action look like? Well, it's, it's still text, it's still text, it's it's text that specifies one, the correct tool or app to use, okay? And then two, that's not enough, right? Just, just knowing what app is relevant here is not enough to actually take action.

Speaker 2 00:03:49
We also need to know what functions or which, um, or how to use that tool or app. So going in our running example, if we look at this prompt, again, what meetings are on my calendar this afternoon, the, the correct action to make it executable would be to select, would be to specify the calendar app and also specify which functions to use in that calendar app. As you can imagine, lots of things to do in a calendar app. We could add events, remove events. The key here is that given that prompt, the right function would be ch would be selected. So another key here is that, that that action is completely executable. So how do we get our models to be able to take action to generate actions? Well, well, a lot of LLMs, you know, a lot of great foundation models are already pretty good at this, but for, for specific use cases, for more complex use cases, we need further training of our models.

Speaker 2 00:04:45
And the key here is that models learn to act by seeing actions. So they're gonna, this is what training data is all about, right? And this is exactly what trajectory data is. Trajectory data is step by step action demonstrations. So let's take a look at this, this quick example here. Imagine we have a task, maybe, maybe we have an agent that's able to help us with maybe this agent's able to help us on our phone, for example. And a sample task would be, what is the phone number and email of my friend astro? Maybe that's something we would need the agent to help us with the correct actions to take, or that, that action trajectory would be to navigate contacts and get the phone number of astro. So that would be, so as you can see here, this is a very simple API type call.

Speaker 2 00:05:33
The second step would be to navigate contacts and get the email of Astro. So this right here, what you see here, this task, an action trajectory is one data point only. One data point. So imagine if we can generate hundreds or thousands of the data points, we can train our agents to be amazing at taking actions in the right situations. So here's some things to think about when you're, when we're developing trajectory training data, we want to have simple trajectories all the way to complex trajectories, right? That's gonna really empower the abilities of our agents. So one thing we talk about is single step trajectories. Single step trajectories are just, uh, are just tasks that require only a single step, um, to complete. You'll definitely want a lot of want some of those in the training data. Now, we also, as we, you know, as we all get comfortable with AI agents, we're using them more often.

Speaker 2 00:06:27
We're expecting more of them. We want our agents to be able to help us with more and more complex tasks. In that case, we'll want to have multiple steps or multiple turn trajectory data. So what I mean by that are tasks that are gonna require multiple tool calls, mult or multiple conversation calls, right? Agents are not just going to, are often gonna need to talk to the user to get more information. So we'll want trajectory data that includes that. And finally, we're gonna want domain specific trajectory data. So imagine agents that are functioning in marketing tools or agents that's functioning on the web, or agents that are functioning, uh, among different databases, depending on where your agent is functioning, you're gonna wanna have trajectory data that represents that domain, that represents that use case. So this all sounds great, but how do we get this type of trajectory data?

Speaker 2 00:07:22
So I wanna give you a couple ideas to get to help you get started today. The first is, and the, and the, the main thing my team works on is synthetically generating this trajectory data. Again, remember again, when we talk about that trajectory data, it is task and correct actions to take to execute that task. So the correct API cycle, we synthetically generate that. And so what I have here today, what I've listed here are two papers that we recently released called API Gen and API, gen Multiterm, mt. Check out these two papers. We've outlined our entire pipeline on how we've synthetically generated, uh, these types of trajectories. So we'll use LLMs to help us generate, but then we'll have systems for actually checking, is that trajectory correct? Is the quality good? Is it diverse enough? Is it domain specific? We've got, we've, uh, we've made progress in all those areas.

Speaker 2 00:08:14
Links will be at the end of this presentation too. Now, so besides synthetically generating trajectory data, I wanna share that you can, there's a lot of tr data already open source as well. So we've released a couple of open source data sets on hugging face, and you can take a look at that at the end. And of course, human annotation. This is always a great option, right? If you can have a team of human annotators who can help you create those trajectories who maybe can actually use the systems. Maybe you have, maybe you have a team that is actually using your system. Uh, you could actually executing tasks right within your system. You can track the API calls, they're making the clicks, whatever it takes to get that task done. That, that's also very valid way to get trajectory data.

Speaker 2 00:08:57
Now, I want to share, uh, uh, you know, wrapping this up, I talked to you about what this data does in the, and the power of it. But I wanna sh I wanna show you, if you generate the synthetic data, how your models can become stronger. So we've done just that at Salesforce. We've, we've synthetically generated loss of trajectories, we've trained a model and we call it XA. So this is completely open source everyone, it's completely available on hugging face. I'll share a link at the end. These are what we call our large action models. So what we did is we've taken a, we've taken several free trained base models, open source, we've synthetically generated thousands of trajectories, as I mentioned. We fine tune those models and as a result, we have a set of five models ranging from one B 1 billion parameters, all the way up to 7 billion parameters.

Speaker 2 00:09:47
That is most importantly ranking number one on function calling leaderboards. So our models even being a fraction of the size are beating GPT-4, oh Claude and other industry leaders. And the key was that we synthetically generated these trajectories. So our models are just amazing at taking action. So again, all open source. And, uh, if you look at the bunk Berkeley function calling leaderboard, this is the one of the main leaderboards for school calling for agentic, for for agents. You can see that our models being a fract sometimes being a fraction of the size rank towards the top of that leaderboard. I'll specifically call out number four here. Our eight B model, 8 billion parameters you can see is ranking above even GPT-4 oh. And again, it all, you know, most of that was because of our training data. Data is the key to getting any of these AI models to perform. Uh, yeah. So I'll quickly, you can see here the our three B model, really small model ranking 13, and even the one B model, if you look at that's only 1 billion parameters, it's still beating a lot of the industry leaders. Awesome. So hopefully that gave you a preview, uh, of what trajectory data is. Keep an eye out for it. You'll see more and more about trajectory data. If you're interested in checking out our completely open source models and that data that we open source that trajectory data, you can scan the QR code here. Also, we started a YouTube series about this. So check out our YouTube series. Thank you all so much.