Introduction

Hello, and welcome back to the Cognitive Revolution!

Today my guest is Emily Sands, Head of Data and AI at Stripe, the programmable financial infrastructure company that, in 2024, processed $1.4 trillion dollars in payments – or roughly 1.3% of global GDP – for everyone from solo entrepreneurs to the Fortune 100, and which continues to grow at a blistering pace.

We begin by discussing the many fascinating details of Stripe's new foundation model for payments and how Stripe is using this model to deliver improved performance across their broad suite of products.

While it might seem unassuming at first glance, I would argue that the payments foundation model has several important lessons to teach us.  

First, while payments are represented in text, the Payments Foundation Model is not a language model in the familiar sense – on the contrary, payments are treated as a distinct modality, and importantly, no payment is an island. Properly understanding a single payment requires Stripe to assemble extensive context, including recent activity associated with multiple entities: the buyer, the card, the device used to make the purchase, and the merchant.

So much context quickly becomes overwhelming to humans, but it is exactly where neural networks can shine – and indeed, when Stripe first deployed this model to detect card testing – the process fraudsters use to determine which stolen credit cards actually work – their detection rate jumped from 59% to 97% – obviously a massive win not just for Stripe but for the entire ecommerce ecosystem that collectively bears the cost of fraud.

If you've listened to this show for a while, you know that one of my pet theories is that the surest path to superintelligence is to integrate today's reasoning models with models trained on other modalities that humans aren't well-adapted to understand.

It's safe to say that the Payments Foundation Model is superhuman when it comes to understanding payments, and this conversation left me wondering how many other businesses are training foundation models on their own modalities, and how many interesting modalities might currently be hiding in plain text.  I can imagine this proprietary modality strategy working for any number of domains, including health, cybersecurity, logistics, energy, and insurance – and to be honest I haven't found too many other examples of this strategy being used today.  If you happen to know of other foundation models being trained on interesting proprietary modalities, please do ping me and let me know, as I'd love to do more episodes exploring this theme.

The next lesson: perhaps as important to Stripe's success as the model itself is the way they are using it.  Rather than trying to design the foundation model to support all use cases directly, they are exposing payment foundation model representations and thus allowing engineers to use them as additional input to the many classification and other ML systems they've already developed.  The richness of the foundation model signal makes everything else work better, but doesn't require a major re-thinking of existing systems.  Again, outside of social network companies, who I do believe make their user and content representations available this way, I have not heard of other companies taking this approach, and it seems to me that more should consider it.

Finally, the most important lesson from a societal standpoint might be that AI strongly favors the incumbent platforms that have proprietary data at the scale required to train such differentiated models.  The flywheel that Stripe has created here, which translates their incredible scale to commercial advantage, is allowing them to reduce the cost of fraud for their customers even as fraud is rising across the broader ecosystem – this makes them the obvious choice going forward, which further strengthens their data advantage and product lead.  It is genuinely hard to imagine how anyone – aside from a few of the world's largest tech companies – could ever compete with them, meaning that even as history unfolds at a dizzying pace in many respects, competition in many markets may effectively come to an end.  This isn't necessarily a problem, assuming companies like Stripe continue to do a great job – I've never supported punishing companies for their excellence and have never been convinced we should break up American tech companies – but it does seem like something that policy makers will need to think hard about as they envision the AI future and hopefully begin to imagine a new social contract.  

There's a lot more in this episode beyond these strategic insights as well.

All in all, as you might expect from Stripe, it's a high-alpha episode, with practical lessons for rank-and-file AI engineers and big-picture implications for executive-level AI strategists. So, without further ado, I hope you enjoy this deep dive into how smart use of AI is transforming one of the world's most critical financial infrastructure companies, with Emily Sands, Head of Data and AI at Stripe.


Main Episode

speaker_0: Emily Sands, Head of Data and AI at Stripe, welcome to The Cognitive Revolution.

speaker_1: Thanks for having me.

speaker_0: I'm excited for this conversation. Stripe is a globally recognized leader in payments, is doing interesting things in AI with high standards, and shares DNA with some of the big frontier AI developers. So there's a lot to get into today. For those who want a deeper dive into Stripe and the payments ecosystem, you did a podcast about six months ago with our sister podcast, Complex Systems, with patio11.

speaker_1: patio11.

speaker_0: And I would definitely recommend that for those who want a deeper primer on the payments world, which is a fascinating and byzantine one, with many rabbit holes to explore. We won't do nearly as much of that today. We'll stay more focused on the cool new AI developments you're working on. But for a quick primer, how would you describe Stripe's role in the economy? Then we'll use that as a jumping-off point for the AI topics.

speaker_1: You said payments infrastructure. We started as payments infrastructure, which is absolutely true. We now build broader programmable financial infrastructure. In plain terms, we give any business — it could be a teenager selling a Figma template or any one of now more than half of the Fortune 100 companies that run on Stripe — the rails and intelligence to move money online and to grow faster. Last year, companies processed $1.4 trillion through Stripe. We'll talk about AI today, and every one of those charges becomes training data for the AI systems we'll discuss. That flywheel also means that we're no longer just the payments API; we optimize the entire payments lifecycle. Yes, the gory details are covered with patio11, but it includes checkout UX, fraud prevention, bank routing, retries, and even things like dispute paperwork so that businesses can truly keep more of every hard-won dollar and scale up with very small teams. We think of the tools we're building as structural growth tailwinds, and we're already seeing it in the data. Businesses on Stripe grew 7X faster than the S&P 500 last year.

speaker_0: Wow, okay, a lot of good nuggets there. I've been a customer actually, for what it's worth, since maybe not the earliest days, but pretty early days, at least 10 years that I've been a Stripe customer with my company Waymark. So we've seen a lot of the evolution from the customer side. The biggest thing that has caught my attention in terms of what Stripe is doing with AI is the payments foundation model. I'd love to spend a good chunk of time really going into detail on that because one of the things that I have been fascinated with and trying to understand better is to what degree we will get a form of superintelligence via AIs that become natively capable of understanding a huge range of different modalities. People are familiar now with image generation; we had text-to-image models, and now those have come together in a tightly coupled, deeply integrated way with Nano Banana and other recent innovations in that space. I have this theory that one thing people in general really underappreciate is the degree to which training on these other modalities of data is just going to create superhuman capability in these domains that are familiar to us, but also in many ways very alien. So maybe for starters, what can you tell us about the fundamentals of the payments foundation model? What does the data look like? Obviously, there's transaction data, but give us more detail on that. What is transaction data when you really get into the weeds of it?

speaker_1: Yes, and it's a good point. There's been a lot of coverage of large-scale traditional LLMs and less coverage of domain-specific foundation models, of which the payments foundation model is one. For us, it has been a step-function change in the speed and quality with which we can deliver all those optimization solutions I talked about in authorization, fraud, and disputes. At its core, it's a transformer model that turns every payment — the tens of billions of transactions that run through Stripe — into a compact vector. It's like giving each transaction its own latitude and longitude. Once you have that map, you can use it for all sorts of downstream tasks, like figuring out what's fraud, how to authenticate, or what's a valid versus invalid dispute, without having to train a new model from scratch every time. What makes it work, the reason you can build a domain-specific foundation model in the payments context, is Stripe's scale. We process about 50,000 new transactions every minute, and at that density, payments start to look in many ways, not all, like language. So there's a syntax to a payment: the card BINs, merchant codes, and amounts. And then there's an analogue to semantics, like how a device or card gets reused over time. In the same way that language transformers learn embeddings in which words with similar meanings cluster together, the premise of the payments foundation model is, what if every charge or sequence of charges had its own vector in a similar space? So, the inputs are, you're right, just the raw payment signals as they come in: the card details, merchant categories, IPs, but also those sequences. What a given card or device or merchant BIN or customer has been doing in the last few minutes or the last K transactions. That history actually turns out to be a huge unlock. And then from those inputs...

speaker_1: The model produces an output, which is just a reusable embedding. It's a dense vector for each payment or short sequence, and then we can layer lightweight classifiers on top for real-time detection. We also have a slower, higher-latency variant that generates explanations through a text decoder, and I think we'll get to a stage where that can be almost real-time as well, but we're not there just yet.
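
To make the "reusable embedding plus lightweight heads" pattern concrete, here is a minimal PyTorch sketch. It illustrates the general idea only, not Stripe's actual architecture; the model sizes, pooling choice, and class names are all assumptions.

```python
# Hypothetical sketch: a frozen sequence encoder that maps a tokenized payment
# sequence to a dense embedding, plus a tiny task-specific head on top.
import torch
import torch.nn as nn

class PaymentEncoder(nn.Module):
    """Toy transformer encoder over tokenized payment fields/sequences."""
    def __init__(self, vocab_size=50_000, d_model=256, n_layers=4, n_heads=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, token_ids):                 # (batch, seq_len)
        h = self.encoder(self.embed(token_ids))   # (batch, seq_len, d_model)
        return h.mean(dim=1)                      # pooled payment embedding

class FraudHead(nn.Module):
    """Lightweight classifier layered on top of the frozen embedding."""
    def __init__(self, d_model=256):
        super().__init__()
        self.linear = nn.Linear(d_model, 1)

    def forward(self, embedding):
        return torch.sigmoid(self.linear(embedding))  # fraud probability

encoder = PaymentEncoder().eval()   # pretrained weights would be loaded here
head = FraudHead()                  # only this small head is trained per task
with torch.no_grad():
    emb = encoder(torch.randint(0, 50_000, (32, 64)))  # 32 sequences of 64 tokens
score = head(emb)
```

The point of the pattern is that the expensive encoder is trained once, while each new downstream task only trains a small head.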

speaker_0: That's already a number of interesting things there. In terms of scale, the blog post that introduced the Payments Foundation Model said tens of billions of transactions, and it also indicated hundreds of subtle signals. Could you provide a couple of examples of the long tail of these signals that illustrate how much information the model is able to ultimately take in, which might be hard for a person to grasp? Because humans can classically handle about seven items in working memory. So, what are we missing with our feeble human working memories that the model is able to take in?

speaker_0: From there, I'm interested in the overall scale of data; it sounds like it's getting into the trillions of tokens, which would not be at the high end of text foundation models, but not too far off, maybe one order of magnitude less. So I wanted to sanity check my estimates with you on that.

speaker_1: Your math is legit. I'll answer the second question first, then the first question second. The data is very different from the freeform text you'd use to train a model to write like Shakespeare. Payments data is highly structured and dense, so we actually built a custom tokenizer that compresses the numeric and categorical signals really efficiently. Yes, the dataset is big, but it's also packed with purpose-built information that's incredibly rich for the set of tasks we care about in our context. You asked what's hard for a human to eyeball. I think the thing that's hardest for a human to eyeball is looking across those dimensions, not within any one payment, but within any combinatorial sequence. If you think about it, what you need to look at to figure out if, say, a fraud attack is happening or how to get a payment authenticated, has very little to do with that particular transaction and everything to do with where that transaction sits in relation to the transactions that have come around it. So you're not looking at a single frame; you're looking at a clip of a movie. But there are a lot of different clips that include that frame that are relevant to look at. You want to know what I was doing, what the merchant was doing, what my card was doing, what my IP was doing. That's really where the model excels: it makes it efficient to look beyond the individual payment. It's hard to do at the scale of 50,000 a minute, but a human could, I suppose, if you had enough humans. It's really the sequences that make the problem intractable for humans, but also very hard for traditional ML approaches where you have to hand-engineer features to capture what's happening in each of a range of different sequences. In our context, the foundation model pays off dramatically because it really changes three things. First, how much data we can learn from: we can learn from literally all of Stripe's history, not just a task-specific subset of history. Second, how richly we can learn: these dense embeddings capture very subtle interactions that manual feature lists, like counter features, wouldn't capture. Third, which is more about how we work internally, how efficiently we can build: once you have a shared embedding, spinning up a new model becomes a weekend project, not a quarter project, and that means we can open the aperture for the types of ML-powered solutions we can build.
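
As an illustration of what a custom tokenizer over structured payment fields could look like, here is a toy sketch. The field names, hashing scheme, and bucket sizes are invented for the example; this is not Stripe's tokenizer.

```python
# Hypothetical "structured fields -> compact tokens" tokenizer: categorical fields
# are hashed into a fixed vocabulary and numeric amounts are bucketed, so each
# payment becomes a short token sequence.
import hashlib
import math

VOCAB_SIZE = 50_000

def categorical_token(field: str, value: str) -> int:
    digest = hashlib.sha256(f"{field}={value}".encode()).hexdigest()
    return int(digest, 16) % VOCAB_SIZE

def amount_token(amount_cents: int) -> int:
    bucket = int(math.log10(max(amount_cents, 1)) * 10)  # coarse log-scale buckets
    return categorical_token("amount_bucket", str(bucket))

def tokenize_payment(payment: dict) -> list[int]:
    return [
        categorical_token("card_bin", payment["card_bin"]),
        categorical_token("merchant_category", payment["mcc"]),
        categorical_token("country", payment["country"]),
        amount_token(payment["amount_cents"]),
    ]

tokens = tokenize_payment(
    {"card_bin": "424242", "mcc": "5734", "country": "US", "amount_cents": 1999}
)
```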

speaker_0: That's cool. I really like the idea of multiple clips. And I take it that just basically reflects the reality that there are multiple parties to any transaction, and I'm inferring that the pattern of behavior of each of those different parties is really where the strong signal is. If you looked at this particular transaction in isolation, you might not get much. But when you combine recent history for all of the parties to a single transaction, the combination of those recent histories is really what tells you what you need to know. Do I have that right?

speaker_1: There's nothing about me using my card in Boston that tells you it's fraudulent. But if I just used my card on my device at my home IP, which is, by the way, in Palo Alto, not in Boston, and typically buy things that are totally different from what you suddenly see someone doing in Boston, that's a red flag that it's actually fraudulent use of my card. Or conversely, if you see someone rotating across a small number of cards to buy thousands of accounts from a given AI provider, maybe the card is truly theirs, but you're almost certainly going to see some sort of reseller refund abuse happening, where they're trying to steal your compute. And it becomes more complicated when you add more entities like the merchant, where there can actually be internal collusion happening. So you're exactly right. It's not about how any one entity acts in isolation. An entity is an individual or a card or a merchant; that's like the node. Then the edges are the transactions, and it's about how much sense these edges make in relation to each other and in relation to the combination of edges that we've seen in the past.
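
A rough sketch of the nodes-and-edges framing: keep a short rolling history per entity and assemble those histories as the context for each new charge. The data structures and field names below are assumptions for illustration, not Stripe's implementation.

```python
# "No payment is an island": per-entity rolling histories (card, device, IP,
# merchant) that get stitched together into the context for a single charge.
from collections import defaultdict, deque

HISTORY_LEN = 50  # last K transactions per entity

histories: dict[str, deque] = defaultdict(lambda: deque(maxlen=HISTORY_LEN))

def record(txn: dict) -> None:
    """Append this transaction to the history of every entity it touches."""
    for key in ("card_id", "device_id", "ip", "merchant_id"):
        histories[f"{key}:{txn[key]}"].append(txn["txn_id"])

def context_for(txn: dict) -> dict[str, list]:
    """Recent activity for every entity touching this transaction."""
    return {
        key: list(histories[f"{key}:{txn[key]}"])
        for key in ("card_id", "device_id", "ip", "merchant_id")
    }
```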

speaker_0: Could that web extend farther? You could imagine including the rendered judgment on previous transactions. For example, if I'm trying to buy something from you and the model is looking for a signal of fraud, you could also consider all the transactions you've recently done as a seller, perhaps with a determination of whether they were fraudulent or not. Or you could even look at who those buyers are and what their history is. So, how far out does this path through the graph need to go? What is the shape of the curve in terms of scale versus diminishing returns?

speaker_1: Yes, totally. I talked about the scale of the Stripe network; it's a $1.4 trillion, very dense network. For example, 92% of cards a merchant sees for the first time, Stripe has seen before on another merchant. In those cases, you don't need many hops, although you do want to validate that nothing has changed about the card or how it's being used since then. Fraud and conversion are somewhat tail events. If you can get 1% more conversion or 1-2% less fraud, that makes a big difference. So, you benefit greatly from the dense network, but you also want to be able to traverse widely for more novel traffic.

speaker_0: Architecturally, this reminds me a little bit of some of the work Meta has done with their joint embedding video models. I'm not sure if that's the right intuition for me to have, but there seems to be a clear difference here where you're not trying to predict the next token, right? It seems like it would be more of a dedicated channel, like a masked situation where you could imagine a training setup that masks out something randomly and has the model learn to fill in the details.

speaker_1: Yes, exactly. Our V1 used a BERT-style masked modeling setup, which we paired with a second stage of explicit similarity fine-tuning. Most of the heavy lifting there involved curating the right sequences to learn from, building the right encodings, and then post-training with that similarity objective. This makes near neighbors in the payment space cluster together, while oddballs separate, allowing us to reason about those oddball clusters. The big unlock here was modeling short histories: what a card, device, or merchant bin is doing over a few minutes or the last K transactions, rather than just isolated payments. In what we call V1.5, we're moving towards encoder-decoder setups and compressed memory sequences, like a few vectors together. This makes it easier to catch subtle abuse in real time because you're not averaging across noise; you're distilling the full story into compact representations. So, the mental model for V1 was masked modeling plus similarity training. V1.5 is compression first with a tight sequence embedding, and then we can add lightweight, task-specific heads on top. These are crucial for charge path use cases, which require getting the job done in tens of milliseconds at most, making those lightweight task-specific heads important for latency and speed.
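
For readers who want the two-stage recipe spelled out, here is a hedged sketch of a BERT-style masked objective followed by a similarity (contrastive) objective, written with generic PyTorch pieces. The specifics are assumptions, not Stripe's training code.

```python
# Stage 1: mask random payment tokens and predict them back.
# Stage 2: pull related payments together, push oddballs apart (InfoNCE-style).
import torch
import torch.nn.functional as F

MASK_ID = 0

def masked_modeling_loss(model, token_ids, mask_prob=0.15):
    """Masked modeling: hide tokens and reconstruct them from context."""
    mask = torch.rand_like(token_ids, dtype=torch.float) < mask_prob
    corrupted = token_ids.masked_fill(mask, MASK_ID)
    logits = model(corrupted)                        # (batch, seq, vocab)
    return F.cross_entropy(logits[mask], token_ids[mask])

def similarity_loss(anchor_emb, positive_emb, temperature=0.1):
    """Similarity fine-tuning over curated pairs of related payments."""
    a = F.normalize(anchor_emb, dim=-1)
    p = F.normalize(positive_emb, dim=-1)
    logits = a @ p.T / temperature                   # (batch, batch)
    labels = torch.arange(len(a))                    # each anchor matches its own positive
    return F.cross_entropy(logits, labels)
```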

speaker_0: Can you say how big the model is? I assume 10 milliseconds doesn't allow it to be that big, although you don't have to do many steps in the forward pass.

speaker_1: The task-specific heads on top are small. When you consider it, all you need to do is place new charges and sequences into this dense embedding space as they come in, which is a much easier problem than the upfront training.

speaker_0: Yes, and it's also just one forward pass of the model, as opposed to having to generate a whole sequence. So that single-pass nature definitely helps with latency. That's really interesting. It also reminds me of one of the first language-vision models I studied deeply, the BLIP family of models. I remember they had amazing success with a frozen language model and a frozen vision model, and they just trained a connector of a few million parameters between the two to bridge from one latent space to the other. Of course, we've gone far beyond that in vision-language now, but this was an early 2023 thing. You could get quite good captions from that setup, even though neither of the foundation models used had anticipated that use case.

speaker_1: Yeah.

speaker_0: So it sounds like you're-

speaker_1: I haven't thought about BLIP in a while, but yeah, that's an interesting analog.

speaker_0: But it sounds like you've created a similar situation, where people internally at Stripe can say, "Okay, I have a new use case idea for this. I can train something really small." You mentioned it can be a weekend project instead of a multi-month project. If I understand correctly, the idea is that because you've got the foundation work done, you can train a few-million-parameter connector or classifier head, whatever you want to call it, very quickly. That step becomes a rapid iteration step.

speaker_1: Yeah, exactly. And actually, it can be even simpler. Where most modelers start is just taking the embeddings themselves, which are stored in Shepherd, our feature engineering platform, and literally adding them as a feature to existing models, whatever those existing models are-

speaker_0: Hmm.

speaker_1: ... and asking, "Is there added signal from these embeddings?" I would say you get some false negatives there, where obviously just shoving the raw embedding into a SOTA model that has been iterated on over the last six years isn't going to produce uplift. But in other cases, where it's a lower-priority model that's only in its V1 state, you actually do get something straight out of the gate, and you can start to reason about it. We've been talking a bunch about the payments embeddings, but how much signal do I get from understanding the payment better? How much from understanding the customer better? How much from understanding the merchant better, for each of these use cases? And that has also motivated which applications folks have leaned in harder on.
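
The low-lift experiment described here can be as simple as the following sketch: append the embedding columns to an existing feature matrix and compare cross-validated scores. Everything below (synthetic data, model choice, dimensions) is illustrative, not Stripe's pipeline.

```python
# Does the foundation-model embedding add signal on top of hand-engineered features?
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 5_000
hand_features = rng.normal(size=(n, 20))   # existing engineered features
embeddings = rng.normal(size=(n, 64))      # foundation-model embedding per payment
labels = rng.integers(0, 2, size=n)        # stand-in labels

baseline = cross_val_score(GradientBoostingClassifier(), hand_features, labels, cv=3)
augmented = cross_val_score(
    GradientBoostingClassifier(),
    np.hstack([hand_features, embeddings]),  # just add the embedding as extra columns
    labels,
    cv=3,
)
print(baseline.mean(), augmented.mean())     # did the embedding add signal?
```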

speaker_0: Yeah, that's really interesting. So just to make sure I understand that correctly, you've got, obviously, Stripe's been around for a number of years. There have been many types of problems that you've brought machine learning to over time, typically with a more classical feature engineering type of approach.

speaker_1: Yeah.

speaker_0: And-

speaker_1: Hundreds of production models, right? Like point solutions.

speaker_0: And so now, the foundation model embeddings can just be tacked on as additional features. You run that training, immediately backtest against your data set, and then you're like, "Okay, cool. We just made this better basically for free because we were able to get additional signal." That's-

speaker_1: Exactly.

speaker_0: ... yeah, really interesting.

speaker_1: I mean, our standout application wasn't that, right? It was card testing, where the foundation model enabled a whole new approach. Card testing is fraudsters trying hundreds of tiny authorizations, iterating across stolen cards or literally just doing raw enumeration, trying a bunch of card numbers. And they bury those attempts inside floods of legitimate traffic, right? A big retailer can have hundreds of thousands of charges come through, and then there's a couple hundred or maybe a thousand fraudster charges of 30 or 50 cents peppered in. Classic models couldn't really pick up those needles in the haystack. So our first application of the foundation model was to treat those sequences, again, like frames in a movie. And suddenly these 200 nearly identical requests – same low-entropy user agent, rotating across proxies, coming in about every 40 seconds – light up as an island in the embedding space, and they get blocked. The impact of that one was huge: our detection rate of card testing at large merchants went from 59%, which is not bad but not great, to 97% from that change. But then, as we started reasoning about where else it could be useful, yes, just exposing the embeddings and letting them be added as features to these traditional single-task models was the next step. That was never intended to be the final state, but it's a way to get signal, with very little lift, on where there's incremental value from these embeddings.
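
One simple way to picture how a flood of near-identical attempts "lights up as an island" in embedding space is density-based clustering. The sketch below uses off-the-shelf DBSCAN on synthetic data and is illustrative only, not Stripe's detection method.

```python
# Tight, unusually large clusters of near-identical payment embeddings get flagged.
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(1)
legit = rng.normal(0, 1.0, size=(1000, 32))    # spread-out legitimate traffic
attack = rng.normal(3, 0.05, size=(200, 32))   # 200 nearly identical attempts
embeddings = np.vstack([legit, attack])

cluster_ids = DBSCAN(eps=0.5, min_samples=25).fit_predict(embeddings)
suspicious_clusters = [
    c for c in set(cluster_ids) if c != -1 and (cluster_ids == c).sum() >= 100
]
# Transactions in an unusually dense, unusually large cluster go to blocking/review.
```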

speaker_0: Yeah, fascinating. That's a very modular approach to AI deployment, and I can't recall hearing of another organization with a similarly modular structure. Maybe Meta comes to mind as one that might have a sort of user model that could then be bridged over to any other space or problem you might want to apply it to. But this is a fairly uncommon setup, I would say, right? Are there others in the-

speaker_1: I don't know exactly how it worked, but I know, because I know the former leader, that there was a Cortex org at Twitter that was basically doing horizontal models. Again, I don't know the details of the architecture. We've been talking about this in the context of payments, but we've actually done the same thing over the last year in the merchant space. We have what's called the merchant intelligence team, which basically runs a merchant intelligence service that can go out and find anything on the web about a merchant, generate embeddings, and be used to answer questions. Those merchant embeddings are also features in downstream models, for example, merchant risk models. And it's a service where a model owner can ask the merchant intelligence agent to come up with a more custom embedding or insight. Maybe you want to know what payment methods the merchant offers, or whether they have anything that's counterfeit. That's been another horizontal layer that's provided a ton of leverage for Stripe, because historically you've got a lot of use cases. You want to know things about a merchant to understand supportability – whether they meet the requirements of the card networks and the issuers and the banks. You want to know whether they're fraudulent. You want to know whether they've had an account takeover. You want to know if they're creditworthy, whether we should give them Stripe Capital, like a loan, and whether you should be going to market with them – all sorts of things you want to know about a merchant.

speaker_0: Yeah.

speaker_1: Historically, teams at Stripe, when LLMs hit the scene, were building their own custom versions of this. What we realized is there's actually one service now that does that much more efficiently than everyone rolling their own.

speaker_0: The bitter lesson strikes again. I have many different directions I want to go, but where does ground truth come from on some of these questions, and how long does that take? I imagine, especially in a fraud detection situation, fraudsters, I always assume, are going to be some of the most clever people in the world, diabolically so. Nevertheless, you have to respect the smarts of some of these folks. So I assume that they are very savvy to real-world events. You mentioned, I think in the conversation with Patrick, that somebody might have a flash sale and a spike like that. Obviously, you don't want to turn them off when they're having a flash sale because that's a horrible experience and a loss of business for the company running the flash sale. But at the same time, that's potentially a really good target for a card tester to come in and try to do whatever it is they want to do. I imagine you're in an eternal arms race between fraud and fraud detection. And what little I know from my experience as a consumer and as a business owner is that to actually close the whole loop and get to the point where whether this was fraudulent or not has actually been set in stone, that's a long process. So...

speaker_1: Well, it's a long process.

speaker_0: A long one.

speaker_1: If you even get to a definitive answer. But take something like card testing. I said the first thing we did with the foundation model was deploy it for card testing. Actually, the first thing we did was deploy it internally for card testing, pass those labels to internal expert humans, have them go and validate the labels, then feed the validated labels into our traditional ML model for card testing. Suddenly our traditional ML model for card testing started doing much better, because it finally had a more comprehensive source of truth for the labels. So that was actually the first version, although I hadn't revealed that fun fact before. Attackers iterate, and so do their models. So our job is just to iterate faster, and we are. I'll talk about some of the ways we get around the late-arriving labels or the missing labels altogether. But just to give you a sense of how we're comparing to the fraudsters: industry-wide, e-commerce fraud is up, I think 15% year on year, but the dispute rate for the businesses running on Stripe is down 17% year on year. And that's because we, in a bunch of different ways, and I can give a couple of my favorite recent examples, are consistently shortening the loop between when a new tactic shows up and when defenses adapt. That loop shortening is happening in production and, in some cases, in real time. One example that our users are getting a ton of value from, which we recently released, is dynamic risk thresholds. Basically, Radar users have their threshold score and block stuff above the threshold, but when an attack starts, Radar learns that an attack is starting and tightens the defenses. So it throttles: revenue flows freely when you're not under attack, but we block much more aggressively when an attack arises, because again, an attack is almost never a single event. It's almost always a true cluster. And in that case, the model is learning the policy of how to act. Now, it's not learning that policy online just yet, but it's learning the policy of how to act. Another powerful tool – and I think in payments it's easy to think, I put in my credit card and then an objective decision is made to block me or not, but that's not actually true – is what we call a soft block, and we've been leaning in harder on that. Adaptive 3DS is an example here. It applies 3DS authentication, so if you're in the US, most of the time you don't get 3DS. But we can...

speaker_0: Can you tell me what that is? Because I definitely need the definition. You might have covered it on Complex Systems, but even so, I think it's a real question.

speaker_1: It's a second factor, a two-factor-auth-esque experience where you're verifying to the credit card network or the card issuer that it is, in fact, you. This was very common in Europe, and it's very uncommon outside of Europe. And by the way, when it does happen, it often creates unnecessary friction. So part of what we do at Stripe is figure out when we need to authenticate and when we don't. But also, with Adaptive 3DS, we push for authentication selectively in cases where we have a sense that the charge may not be good. So instead of just having this binary decision of block or don't block, you have this other arm you can go down, which is to hit them with 3DS. What ends up happening is the good guys get through the 3DS because they're excited to buy the thing and they're legit users, and the bad guys do not. A lot of the AI companies are using this. AI companies being hit with fraud is extra painful because their marginal costs are high, unlike SaaS companies, who care a lot less. So early adopters of Adaptive 3DS were ElevenLabs and Character.AI, and they're able to dramatically cut down fraudulent disputes without any effect on conversion, because 3DS isn't super heavyweight for the end user. In fact, US checkout users – it's a little different in Europe because a lot of those folks already use 3DS – saw a 30% average drop in fraud, and they just turned this on with a single click in the dashboard. Then it lets us basically learn the policy of who is worth 3DSing, to balance conversion and fraud and maximize their profits.
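
The soft-block idea boils down to adding a third arm to the decision. A minimal sketch, with invented thresholds:

```python
# Instead of a binary allow/block, add a middle arm that challenges with 3DS.
def decide(risk_score: float) -> str:
    BLOCK_THRESHOLD = 0.90       # illustrative values, not real Radar thresholds
    CHALLENGE_THRESHOLD = 0.60
    if risk_score >= BLOCK_THRESHOLD:
        return "block"           # confident enough to refuse outright
    if risk_score >= CHALLENGE_THRESHOLD:
        return "challenge_3ds"   # legit buyers complete it, fraudsters drop off
    return "allow"
```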

speaker_0: One way I frame these conversations is in terms of practical lessons people can apply in their AI pursuits. One takeaway is to add middle ground outcomes to your classifiers to avoid binary outcomes. Find a middle ground where something other than the model can help resolve challenging cases. For example, Claude can sometimes end a conversation if necessary.

speaker_1: If supplemental information could help and you can get it from users affordably, don't limit yourself to being a modeler. Be a product thinker and figure out how to acquire that information. The model is good at deciding; it's not brute force. You don't require additional information from everyone. Let the model decide where it needs more information and where it doesn't.

speaker_0: The adaptive threshold concept suggests a global state that is being fed into the model. I assume the model itself isn't calculating that on the fly. This would be a more global variable that the model would receive, or is it different?

speaker_1: No, it's actually like, hey, this merchant is starting to see clusters of scores creep up. Isn't that interesting? When we look at the subset of transactions that have those higher scores, maybe they're still below the block threshold but they're looking elevated. Is there anything about those that looks collusive, or coming from a small number of attackers, or rotating across IPs, or coming from a geography they haven't seen before? Once we get a signal that, hmm, there's a slice that's an attack, you can actually start to lower the threshold for that subset for what it takes to block.
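
A toy version of that slice-level tightening might look like the following; the slice definition, window size, and thresholds are invented for illustration.

```python
# When a slice of traffic (e.g. one geography + BIN range) shows an anomalous run
# of elevated scores, lower the block threshold for that slice only.
from collections import defaultdict, deque

BASE_THRESHOLD = 0.90
TIGHTENED_THRESHOLD = 0.70
recent_scores = defaultdict(lambda: deque(maxlen=500))

def block_threshold(slice_key: str, score: float) -> float:
    scores = recent_scores[slice_key]
    scores.append(score)
    elevated = sum(s > 0.5 for s in scores)
    under_attack = len(scores) >= 100 and elevated / len(scores) > 0.3
    return TIGHTENED_THRESHOLD if under_attack else BASE_THRESHOLD
```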

speaker_0: Is that all happening with the same short histories you described previously? Or is there a longer history coming in to inform that kind of decision? Maybe not, I guess the short history could be enough.

speaker_1: There's a longer history, but less at the individual transaction level. It's basically detecting anomalies in slices of traffic. For example, this geography, these bins, this cart size, something anomalous is happening here. That anomalous thing has elevated risk scores. It reads a bit like an attack. What's interesting is rules are really good in a lot of ways. Another general lesson is rules are good, but they're also blunt, so figure out where you can blend rules with models. You asked earlier when disputes actually come in. Disputes are super lagged. They can take days, they can take months. As the cardholder, I have to see my bill, notice I didn't buy the thing, tell my bank, and my bank has to file with the network. So those labels, for sure, arrive late, but we don't wait. We use proxy signals and those weak labels show up much earlier, all the way to real-time issuer feedback. Real-time issuer feedback would be like a CVC mismatch, or the zip code doesn't match. It would be easy to write a blunt rule that said if a CVC doesn't match or if the zip code doesn't match, block it. But you'd be blocking a bunch of good revenue because people sometimes mistype their CVC or zip code in a hurry, or on their phone, or whatever. So we have these risk-based radar rules which are like, okay, take the model score, combine it with the issuer's real-time responses, and make a decision based on that intersection. For example, if it's looking marginally risky and the CVC is wrong, for sure block. But if it's a pretty known good user and they mistyped something, let them through. I think that blend of rules and models is important. It's easy for modelers to dismiss rules, and easy for rule makers to dismiss models that aren't fully explainable, but in plenty of contexts, blending the two actually does far better.
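
Here is a hedged sketch of what blending a model score with real-time issuer responses can look like; the thresholds and field names are illustrative, not the actual Radar rules.

```python
# Combine the risk score with issuer signals rather than applying a blunt rule.
def risk_decision(model_score: float, cvc_match: bool, zip_match: bool) -> str:
    issuer_mismatch = (not cvc_match) or (not zip_match)
    if model_score >= 0.9:
        return "block"   # high model risk blocks regardless of issuer signals
    if model_score >= 0.5 and issuer_mismatch:
        return "block"   # marginally risky AND the issuer flags a mismatch
    return "allow"       # known-good user who likely just mistyped something
```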

speaker_0: So you actually do let transactions go with a wrong three digits? What was that called? CVC?

speaker_1: "Nathan, I know you're good. You've bought from this person before. Maybe you even use the same credit card, you're coming from a legitimate IP, and I feel good about you in a lot of ways." The issuer comes back and says, "Hey, there's a mismatch," and we say, "Hey, let it through." When we let it through, we also have to get the issuer to let it through. We have data sharing with the issuers where we pass them our risk scores so they can understand why we passed it through, which motivates them to also pass it through when they see our signals. So it's a two-step process.

speaker_0: Very interesting. Let's return to how you're tightening the iteration loop. I believe this is something everyone developing AI products could improve upon. So, what have you found to be effective in shortening your cycle time?

speaker_1: This isn't one for us where there's some magical reinforcement learning we need to implement online for every single use case. I think it's actually been quite context-dependent for us. The things that matter are having enough labels, good labels, and getting those labels fast enough. You can get pretty creative about what the label is. We talked about some examples, including human-generated labels. Another thing we've been leaning into is using LLMs as a judge, especially for contexts where there is no source of truth. A simple example: we've been discussing fraudulent disputes, but there are many suspicious payments that never result in a fraudulent dispute. Perhaps the person starts a free trial and then cancels, or asks for a refund, or just spins up a bot account but never even reaches the checkout page. This type of friendly fraud is actually very costly to businesses. I think 47% of businesses say friendly fraud, which is a total misnomer because it's not friendly, hurts their business more than stolen card credentials or what most people think of as fraud. This cost of friendly fraud is particularly true for AI companies running on Stripe. It's very different from SaaS. They have inference costs, compute costs, and therefore very high marginal costs. So when someone engages in free trial abuse, reseller abuse, or refund abuse, it's super expensive to their unit economics. Anyway, built on the foundation model, we now identify these suspicious payments. These are fraudulent-like activities, but not in the traditional sense of leading to a fraudulent dispute. When we pass these over to AI companies, for example, we want to be able to describe to them why they're flagged as suspicious. For instance, it has an enumerated email, or it's cycling through a small number of IP addresses. These explainers are generated by the foundation model. But then the question, of course, is, 'How do you know if they're right?' So we have an LLM as a judge that sits on top, looking at every transaction-label combination and asking, 'Given everything you know about this transaction and the cluster to which it belongs, how do you feel about the quality of the label?' What ends up happening is that a large share of labels are good enough, trustworthy enough, that we pass them over to AI companies to make decisions on. And there's a small number that are too noisy, and we think, 'Okay, we need to work a little bit to make that label stronger.' I highlight that example because there's no source of truth here. Nobody can manually go through these transactions, and at that level, we're not going to. So it's been really helpful to have LLMs as a judge where there's no clear North Star.
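
The LLM-as-a-judge pass described here has a simple general shape: score each transaction-cluster-label triple for confidence and route low-confidence labels back for strengthening. In the sketch below, call_llm is a placeholder for whatever model API you use, and the prompt and threshold are assumptions.

```python
# Generic LLM-as-a-judge pass over (transaction, cluster, label) triples.
import json

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def judge_label(transaction: dict, cluster_summary: str, label: str) -> float:
    prompt = (
        "You are auditing fraud labels.\n"
        f"Transaction: {json.dumps(transaction)}\n"
        f"Cluster summary: {cluster_summary}\n"
        f"Proposed label: {label}\n"
        "On a scale of 0 to 1, how confident are you that the label is correct? "
        "Reply with only the number."
    )
    return float(call_llm(prompt))

def route(confidence: float) -> str:
    # High-confidence labels ship; noisy ones go back for label strengthening.
    return "ship_to_user" if confidence >= 0.8 else "needs_stronger_label"
```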

speaker_0: That's fascinating, but I'm still a bit confused about one thing. When I advise people on AI broadly, one of the things I tell them is that AI models are not very adversarially robust. They are quite good these days at the happy path. If you dial in the performance and control the inputs, you can often achieve superhuman performance on routine tasks. However, if you don't control the inputs and expose your AI system to the world, you must be mindful that the systems are not adversarially robust. People can usually find some weakness. We even did an episode once on superhuman Go-playing AIs that were beaten by very simple attacks that no human would ever fall for. Yet, the AI, even though it was superhuman when playing Go in the normal way against high-quality Go players, was totally blind to this certain class of attack found through adversarial optimization. So it seems like you would be in an environment where you're constantly on hard mode. Anyone can test the system from any position. You can't deny people the ability to try a payment. So they can 'gray box' you; they can test from many different angles and try to see what gets through and what doesn't. Presumably, there's always some vulnerability you're not aware of that they can systematically attack or try to find through these types of attacks. My guess would then be that the only way to really deal with that is to constantly identify and iterate. But that still sounds hard, despite everything you've told me, to be as responsive as you would need to be, especially given that the actual ground truth is so lagging. So how do we not bleed a ton of money in one incident after another as attackers figure out a gap and then exploit it as much as they can until it's closed? How does that not end up being a huge problem?

speaker_1: A couple of thoughts. First, we expose capabilities to our users through products and APIs, not through raw weights. Anyone can try a payment and test for workarounds. But to be clear, we are not exposing the model for them to have an attack surface against. I think products and APIs better meet user needs and narrow the potential attack surface. That's clarification one. It's interesting to think about the relevant alternative. What is a fraudster's job? A fraudster's job is to find loopholes and exploit the system; that's how they make their money. So, the relevant alternative isn't perfectly airtight; it's baseline approaches. When you start to think about foundation models or LLMs, the payments foundation model, for example, is actually a lot more nuanced in the type of information it uses to make decisions. Compare that to an early transaction fraud model that uses last seven-day counters, where the fraudster might figure out, "As long as I'm eight days out, I'm safe. I'll do everything on day eight, hit them hard, and then go seven days back." Traditional ML is easier to get around, whereas the foundation model is more comprehensive. The other thing we have certainly long done and continue to do is a layered approach. There isn't just a single set of defenses. There are model defenses, rule-based defenses, the soft blocks I mentioned, and the user's own defense set, which can also vary, all the way to how they treat you at sign-up or how they block bots at sign-up. Fortunately for us, and unfortunately for the fraudsters, they are not fighting against one model. They are fighting against a whole system that, until I just said it, is opaque to them.

speaker_0: Would you like to tell us about any other big use cases? You mentioned off-fraud disputes. There's some interesting data and product experiences in Stripe. What stands out to you as the most interesting applications, not necessarily what moved the most money, but what would be most interesting to the AI engineers in the audience regarding interesting implementation details or surprises, or quirky stuff you've learned along the way?

speaker_1: We've talked a lot about transaction-level understanding and the path there. If you think about modality, it's mostly payments plus text. You align structured payment signals with language using contrastive learning, and you get the text decoder and so on. But payments plus text is only the start. The system is designed so that new modalities are treated as tools that the router on top can invoke. So, if you wanted to add another encoder, perhaps for financial time series, which I'm very interested in but don't have anything to share yet, or for images, it doesn't require a whole rewrite of the system. It's a modular expansion. I'm starting to see multimodality truly shine at the merchant level. Yesterday, I was testing two lightweight agents. Neither of these is in production, just full disclosure, but the team has them in shadow. One crawls merchant sites to assess fraud, and it's incredibly relentless. The other spots counterfeit products, and it does so orders of magnitude better than our trained human reviewers at Stripe doing the same thing. For example, it will find a print shop with thousands of items, and the agent will patiently zero in on, say, a Spider-Gwen sticker I was looking at yesterday with no sign of official licensing. That's a problem. But then it also knows, "Oh, this other site's Canada Goose with tags is being sold secondhand," so it's actually fair game. This kind of multimodal roadmap is interesting, not for multimodality or the technology itself, but for where it's going to unlock real value.

speaker_0: On the 'talk to your data' feature in particular, I think many people have tried to build it themselves or use a product for it. It strikes me that where most people have gotten stuck is realizing, "I was able to get GPT or Claude to be pretty good, but it still made some mistakes. I didn't feel confident giving this tool to someone who wasn't a proper data analyst and being confident they would get good insights from it."

speaker_1: Mm-hmm.

speaker_0: So, you have that problem, perhaps on the biggest scale in the world. How did you think about the right threshold of accuracy for a talk-to-your-data model? I assume you didn't achieve 100% accuracy on this, but what was the threshold you felt you had to reach, and what was needed to keep dialing it in until you reached that threshold to deploy?

speaker_1: Yeah. So, one of the reasons this talk-to-your-data approach was interesting to us in Stripe's context is that much of what a business wants to know is captured in Stripe data. For example, who's selling what, for how much, to whom? Who's retaining and churning their subscriptions, et cetera? That's point one. Point two is the data is actually very well structured because it has to be, right? It's generated from transactions flowing through Stripe that are incredibly robust and well-documented, and the downstream schemas are also well-documented and make sense. A lot of this talk-to-your-data stuff often runs into the garbage in, garbage out problem, where my tables aren't well-labeled, my fields aren't well-labeled, or the underlying data isn't de-duped. So, you can't really tell if the issue was the text-to-SQL or if the underlying data was bad, or if the data structure was not understandable. We were able to leapfrog that, which is great. But still, LLMs... you're referring to our Sigma system. LLMs do make mistakes, so our approach is that if we have reasonable confidence, we'll provide it. But we, I don't know if you've ever used it, overlay a natural language explanation of what we're doing. So, if we thought you wanted to know, for example, "How did Black Friday this year compare to Black Friday the last two years?" We'll say, "Okay, these are the dates we used for Black Friday. These are the timestamps we used," because most things on Stripe happen in UTC, and many people aren't reasoning about their business only in UTC. "We looked at Black Friday over the last three years. Here's how we computed the percentage growth." You might think, "That's boring. Doesn't everyone compute percentage growth the same way?" But that actually allows someone who's not a data analyst to build comfort in the output, instead of either YOLOing it, just taking it and running with it, or throwing their hands up and saying, "I can't trust anything because I don't know what's happening under the hood. You just wrote a SQL query for me, but I have no idea how to interpret it." So, when I think about talk-to-your-data, it's about asking if your data is interesting to talk to. If yes, make sure it is well-structured and well-documented. If not, invest in that before you invest in a natural language interface on top. Then, just make sure the LLM is explaining what it's doing, which they are very good at doing now. That allows you to open the aperture a bit in terms of answering less certain questions because you know anyone can read the natural language and decide whether it was the right approach.
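
The "explain what you did" pattern generalizes well beyond Sigma: return the query together with the assumptions baked into it. A minimal sketch follows, with a hypothetical schema and a hard-coded example rather than an actual text-to-SQL model.

```python
# Return the SQL plus a plain-language list of the assumptions behind it, so a
# non-analyst can sanity-check the answer.
from dataclasses import dataclass, field

@dataclass
class AnalyticsAnswer:
    sql: str
    assumptions: list[str] = field(default_factory=list)

def black_friday_growth() -> AnalyticsAnswer:
    sql = """
        SELECT DATE_TRUNC('year', created) AS yr, SUM(amount) AS volume
        FROM charges
        WHERE created::date IN ('2023-11-24', '2024-11-29', '2025-11-28')
        GROUP BY 1 ORDER BY 1
    """
    assumptions = [
        "Black Friday dates used: 2023-11-24, 2024-11-29, 2025-11-28.",
        "Timestamps interpreted in UTC, matching how charges are recorded.",
        "Growth computed as (this year - last year) / last year.",
    ]
    return AnalyticsAnswer(sql=sql, assumptions=assumptions)
```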

speaker_0: For folks who want to double-click on the process of getting the data into shape, the episode with the CEO of Illumix was really good on that. And just for what it's worth for you, they have built basically canonical structures of enterprises across a bunch of different categories, like a drug company, for example. They've built out a vast representation of data that, in their studied opinion, represents the canonical drug company. Then, when an actual drug company comes to them, they do this painstaking process of mapping all of their actual data with all of its idiosyncrasies onto the canonical version that they've made work well. That mapping becomes the cleanup process that gives them the reliability that customers obviously ultimately want. Pretty interesting.

speaker_1: Yeah, I was...

speaker_0: Explainability. Yes, please.

speaker_1: Well, I was just going to say, there's also an interesting feedback loop here with users, right? If you're a usage-based billing company, the types of metrics you want to know to reason about your business or to share with your investors are generally very similar to the types of questions that all the other usage-based billing AI startups also want to know. That means we can truly perfect the subset of questions that matter in a given domain. But also, forget the natural language to SQL interface or talk to your data. We can just push those commonly asked questions onto the dashboard and even benchmark you. We have smart benchmarking now. We can benchmark you on those metrics versus a peer group. By the way, that smart benchmarking is one of the applications of the merchant intelligence service, which figures out which websites are like this website in terms of being good comps because they have similar user bases and are at a similar stage of development.

speaker_0: Yeah. Cool. There's a good pattern there as well, for sure. I've been thinking about that in the context of agents lately. There's the 'choose your own adventure' agent, where you give it a bunch of tools, "Here's some MCPs, whatever. Have at it." And then there's what's sometimes better described as a workflow, maybe with a couple of forking decision points that people also call agents in many cases.

speaker_0: I'm starting to see the emerging pattern is to have that choose-your-own-adventure agent at the top level of user interaction. But then for the things it's choosing, make those pretty detailed workflows in many cases. This way, you know that as long as it makes the right choice at a high level, the process that will be kicked off is one you've deeply understood, dialed in for accuracy, and confirmed will work reliably. I think that's another point. You're talking about a push model instead of a pull, but there's a sort of isomorphism, I think, between those structures. Explainability is obviously huge. One thing I was interested in asking is, are you doing any mechanistic interpretability? Are there sparse autoencoder-type things now happening on the foundation model so you can learn in a semantic way what new features the model is learning?

speaker_1: Not literally. We're not dissecting individual neurons in the way some research groups are. I think our focus is really on making the outputs self-explaining in a way that we and our users, where it's user-facing, can actually trust. Mechanistic interpretability is important when you're releasing the full open-ended model into the wild. In our case, we control both the application and the environment, so our priority is practical explainability tailored to the payments application. For example, when the foundation model flags a transaction, it doesn't just say high risk. It says, gibberish email and enumerated name pattern and device concentration. That's the layer of explanation that allows the fraud analyst or even another system, like a follow-on agent, to act confidently. As we discussed, in many of those cases, there's no ground truth for that explainer. This comes up more and more as we expand to new domains, like detecting fraud earlier in your customer funnel, when someone is creating an account, well before they're entering credit card details. That's where things like the LLM-as-a-judge framework are really helpful. It will look at that and the cluster of similar events and the tag definitions, then output how confident the LLM is, essentially judging: How confident is it that the cluster really matches the label? So no, we are not peering in neuron by neuron, but we are focused on interpretability at the output level, which is valuable for us. For example, if you're in your dashboard and you're seeing a bunch of suspicious users, you want to know exactly why we flagged them as suspicious so you can decide how to take action. In our setting, that's what matters most.

speaker_0: Got it. You mentioned usage-based billing, and this led me to a very practical question about how you would recommend people build on top of Stripe today. Ten years ago or so, when I first became a Stripe customer, it was already a respected company, but not as foundational a part of the economy as it's become. So we thought, "Well, we might want to switch off of this one day," or who knows, we'll have our own database of all the transactions. Stripe will, of course, have their view of it, but we'll maintain our view. That was a lot of work back then. With usage-based billing, it sounds like an even more challenging project now, especially for your proverbial couple of people doing a hackathon who want to get something started.

speaker_0: Do you recommend that people—the alternative I have in mind, which I wonder if you ultimately recommend, is: could I just leave all of that to Stripe and basically make nothing but API calls, trust Stripe to be the real-time ground truth across the board, and not even have a financial side to my database, but just purely do real-time API calls?

speaker_1: Absolutely.

speaker_0: Definitely?

speaker_1: Absolutely. Don't even have it. Stripe's APIs run at six nines of uptime, so they are safe to use as your system of record for critical flows. Our usage-based billing APIs process 100,000 events per second, and they have all the built-in monitoring, alerting, and invoicing. The alternative is also quite painful. Building your own mirror of Stripe's data is complex. You have to sync all the events, build your own monitoring systems, and keep everything reconciled. Especially for a startup, that's a lot of work that doesn't create any differentiated value. Many companies that are taking off have five, ten, twenty employees. They shouldn't be spending any of that limited capacity on this. On the flip side, if you treat Stripe as your source of truth, you get real-time signals that you can act on within the Stripe ecosystem, such as a billing threshold being exceeded, without needing a whole parallel system. We talked about Sigma's system earlier, but with products like Sigma and Stripe Data Pipeline, you can still run all your analytics and reporting without building your own warehouse. You might be wondering about the downsides. Historically, the biggest downside of not mirroring and just leaning into Stripe was, "What about when I want to join Stripe data to my own business objects?" Now you actually can. You can extend Stripe's objects with metadata. Many users will attach their own order ID or shipment ID to an invoice, which closes most of the gap for most companies. It's different if you're a large enterprise with many other follow-on systems and data sources off Stripe, but for most startups, it's both simpler and much safer to let Stripe be the system of record.
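
The metadata pattern she describes is a standard part of the Stripe API: attach your own identifiers to the Stripe object and read them back later, rather than mirroring payments in a separate database. A short sketch with the official stripe-python library; the keys and IDs are placeholders.

```python
import stripe

stripe.api_key = "sk_test_..."  # your secret key

# Attach your own business identifiers to the Stripe object instead of keeping a
# parallel payments table.
intent = stripe.PaymentIntent.create(
    amount=1999,
    currency="usd",
    metadata={"order_id": "order_12345", "shipment_id": "ship_67890"},
)

# Later, Stripe remains the system of record: look the payment up and read your
# identifiers back off the object.
fetched = stripe.PaymentIntent.retrieve(intent.id)
print(fetched.metadata["order_id"])
```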

speaker_0: Cool. I imagine some companies have achieved significant revenue growth over the last several months while still doing just that. I don't know if you'd want to highlight any by name, or if that's too secret. But when I see the curves from folks like Lovable, Bolt, Replit recently, and obviously things like Cursor, one starts to wonder about their headcounts. I bet a lot of them are doing exactly that and just trusting Stripe, which six nines gives you good reason to do.

speaker_1: Yes, and that's just for the system of record. Lovable is a great example. They hit a hundred million in ARR in their first eight months, and their stack is basically a case study in going all-in on Stripe. Before they monetized, they incorporated the business with Stripe Atlas. From the very beginning, the front-end, customer-facing surface was our Optimized Checkout suite, which allowed them to localize payments in over 100 countries and get about 150 payment methods out of the box. They relied on Billing for subscriptions, so they didn't build their own billing system or have to contract with another third party. They relied on Link, our one-click consumer checkout, for fast checkout. By the way, nerds love to buy from nerds, so the concentration of AI buyers on Link is very, very high. They relied on Radar for fraud prevention. They relied on Sigma for analytics. So really, Stripe took care of the financial plumbing, allowing Lovable to focus that small team on product and growth, which of course they nailed. But there are smaller ones, too. Retail.AI, they built, have you used them? They're like-

speaker_0: Co-agents, yep.

speaker_1: Yes, for customer support. They launched last year and I think they have over 10 million in ARR in their first year. We mentioned Link concentration. Link actually powers 38% of their payments. So 38% of their payments run through our consumer network, where the individual has an identity and their payment methods are saved on file, making it a one-click checkout for them. They use us for smart retries. For instance, when a transaction, usually a recurring bill, fails, we retry it at the optimal time, which allows them to recover about 60% of their failed charges. They use us for Stripe Tax, which keeps them compliant in a hundred countries. So it's just a great example of how these AI companies, with very lean teams, are growing fast, going global, and really scaling up to look like a much bigger company with Stripe behind them.

speaker_0: I know we don't have too much more time, so let's hit on a couple of last topics. I've heard you talk a couple of times about the time you spend getting new clothes for your kids. That's mostly something, in all honesty, my wife does in our home.

speaker_1: Lucky for you.

speaker_0: But yes, I would flatter myself that I do my share in other ways, but she's definitely better suited to pick out what will make the kids look cute. Folks who listen to this podcast will know the basics: Perplexity has a shopping feature, and we know what MCPs are. Are there any recent developments? Is this really happening, or is it still, from what you've seen, in the "wouldn't it be cool if one day this were real" phase of agentic commerce?

speaker_1: It's both. It's definitely still early. There's a lot we're still sorting out about how this will actually work and how quickly it will take off. But we are seeing meaningful traction. You mentioned Perplexity, where you can discover and book hotels directly inside the app. But it's not just big players like Perplexity. Hipcamp is a smaller site that uses agents with virtual cards to book campsites off-platform. I'm from Montana. It's impossible to get into Yellowstone National Park. I hate using their website, although I love the park and appreciate that they're not spending a ton investing in tech. But Hipcamp is actually solving that. And I think it's easy when people think about agentic commerce to think about buying kids' clothes as a consumer. But on the developer side, we're seeing the same trend. Developers in Cursor can now buy or sell services right inside their editor. That's a brand new channel, deeply embedded commerce directly in the workflow, and Stripe powers those transactions too. So we're not entirely new to this: we launched our agent toolkit last November, and it still sees thousands of downloads each week. And just looking at the pace of adoption and who's testing, I think agentic commerce will be a major channel far sooner than most people think.
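
For the "agents with virtual cards" pattern mentioned above, a minimal sketch using Stripe Issuing via the standard Python SDK might look like the following. The cardholder ID, spending limit, and API key are placeholders, and this is not a description of Hipcamp's actual integration; a real agent setup would layer its own guardrails on top.

```python
# Minimal sketch of giving an agent a constrained virtual card via Stripe
# Issuing (standard `stripe` Python SDK assumed). IDs and limits are
# placeholders chosen for illustration.
import stripe

stripe.api_key = "sk_test_..."  # placeholder API key

card = stripe.issuing.Card.create(
    cardholder="ich_example123",  # hypothetical, already-verified cardholder
    currency="usd",
    type="virtual",
    spending_controls={
        # Cap what the agent can spend before a human has to get involved.
        "spending_limits": [{"amount": 20000, "interval": "daily"}],  # $200/day
    },
)

# The agent can then check out on a third-party site with this card while
# every authorization still runs through the spending controls above.
print(card.id, card.last4)
```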

speaker_0: With that embedded stuff, like the selling inside Cursor,

speaker_1: Yes.

speaker_0: it seems like that's much more about the connective tissue of the user having an intent, and it's a question of how it's going to be executed, as opposed to any autonomous decision-making by the agent or any meaningful delegated discretion to the agent. Have you seen anything truly interesting where an agent is trusted to figure out what to buy and execute on it at this stage, or has that not really materialized yet?

speaker_1: I can't name names, but there's this idea of a business in a box. I want to build this business, and I don't know what third-party tools and services I need; I just want the business in a box to spin all of it up. That's not just the payment provider, the front-end service, bot protection, or the HR system; it's "give me my whole business in a box." I think that could be an interesting direction. Getting that right for the whole world of businesses that might be created is hard. Getting it right for the focused wave of AI businesses coming online isn't a crazy thing to reason about. I agree with you that the option set in consumer is broader, so there's more work for the agent in selecting from it. But SaaS procurement is also very inefficient. Maybe we underestimate how inefficient it is, and that's not just in the selection of vendors; it's also in the pricing and negotiation with vendors. So I don't think it will happen tomorrow, but I think it will happen.

speaker_0: Cool. We'll keep watching out for that. Last question: about the future of platforms, scale, and market power. I often think back to the leaked Anthropic deck from two years ago where Anthropic claimed, "We believe that the companies that train the best models in 2025, 2026 may have such an advantage that nobody will be able to catch them." Why? Because, presumably, the models will help train their successors with all these data filtering, synthetic, and constitutional AI processes. Once you have Claude 4 contributing to the training of Claude 5, anyone who doesn't have Claude 4 and is still sourcing everything through Scale AI or similar is at a massive disadvantage. It seems that this basically applies to Stripe as well. Is there any hope for anyone to ever compete with Stripe given the 1.3% of global GDP flowing through the system and the massive data advantage that already exists? Or are we now in a future where we just need to rely on the Collison brothers to continue to be good actors? It seems like this position is almost unassailable.

speaker_1: I think financial services is a big, broad space, and there are many services one can provide in that space. In the context of data, $1.3 trillion a year is a lot, and volume is growing 38% year over year. That's a massive, growing dataset. The real advantage, I think, isn't the raw size, though. As you commented earlier, it's the compounding loop. In our context, that loop is: the more data we process, the better our models get. The better our models get, the more value we deliver to businesses. Incentives are super aligned. The more value we deliver to businesses, the more those businesses grow, which means the more transactions they run through Stripe, and that loop compounds year after year. Honestly, we talked earlier about why it's hard to make horizontal bets, but that loop is why we can make them. Not just because we have scale, but because we're in a position to harness that scale to create even better products, which then feeds the feedback loop. And we're pushing this further. You may have heard at Sessions last year, we announced a big push for modularity. So now products like Radar, our fraud prevention product, our billing product, and the optimized checkout suite for your users are available multi-processor. They don't just work on Stripe transactions; they work on transactions, billing plans, and checkouts that happen outside of Stripe too, which gives us a window into an even bigger data network and further reinforces that loop. So, I think there's a lot to be done in the financial infrastructure space, and I think there will be plenty of players playing important roles there. But I think we are quite differentiated in the intelligence that we can serve to users, and it's just really fun to see how that intelligence in turn helps them grow more profitably.

speaker_0: On the other end of that, do you ever think about trying to compete at the foundation model level? This is something that not many companies are able to do, but given the depth of ML experience, the unique dataset that does exist, and the reputation of the company, I expect that if there were a special fundraising round to raise $10 billion to train a Stripe One to compete with Claude 5 and GPT-whatever, the money would be there. How do you think about going that hard, or calibrating just how ambitious to be with AI investments?

speaker_1: Stripe has always leaned into new technology waves. When we were founded, it was the platforms and marketplaces wave that got us a lot of the way here. Today, it's the AI wave, and our mission is to build the economic infrastructure for AI. That shows up in four big bets. First, being the best partner for AI companies: helping them monetize effectively, scale globally, and manage billing, tax, and fraud. Two-thirds of the Forbes AI 50 already run on Stripe, and we're very focused on co-building with them, whether it's usage-based billing or whatever the next wave is. The second is enabling agentic commerce. Agents will be buying on your behalf, and we want that to work really well for the whole ecosystem—for the consumer, the seller, and the platform or commerce facilitator. The third area we're focused on is making Stripe native inside the AI-enabled tools that developers already use, whether that's Vercel, Replit, Cursor, or Mistral's Le Chat. Payments should show up right where the work is happening. The fourth is what we talked about today: deploying our foundation model across the network to improve fraud detection, boost authorization rates, and expand the intelligence layer we provide to every user. Those are the four big investments. Today we're hyper-focused on the economic infrastructure for AI, not on being an AI model shop directly.

speaker_0: Gotcha. Cool. This has been excellent. I really appreciate the time and the depth. Is there anything we didn't touch on that you would want to leave people with, or any concluding thoughts?

speaker_1: No, this was super fun. Thanks so much for having me.

speaker_0: Emily Sands, Head of Data and AI at Stripe, thank you for being part of the Cognitive Revolution.

speaker_1: Thanks so much.