All right. Thanks for joining us, everybody. Jim Fish with the Software team. We have the pleasure of having SoundHound with us today.
Thanks for having me.
Nitesh from SoundHound. How have you been? How was the flight in?
All great. Yeah, thanks for having me here.
Good, good. So SoundHound came public about two and a half years ago with an AI voice platform, initially geared toward the B2C use case, but now you're really more in the B2B world. So how do we think about that evolution of SoundHound? You guys talk about three different pillars. Help us out — kind of give us an intro on that.
Yeah, if it's all right, let me do a little bit of an extended "who are we?" The B2C aspect — we've been around for almost 20 years. We are in what I'll call the hard AI space. There are a lot of different aspects and elements of AI, but we do voice AI primarily. And voice AI is one of these things where understanding the complexity of language, and so forth, takes a long time. So our company had been designing the voice AI ecosystem in stealth for 10-plus years, and concurrently, to help fund the company, we had a consumer app, particularly around music recognition. That was the B2C element. It funded us, and we got a lot of notoriety. And from a tech standpoint, we were actually ahead of the curve relative to the competition.
But really, around mid-2015, 2016, we launched our voice AI platform, and our first entry was in automotive. It was really about how do you voice-enable driving? It's not safe to be texting or pressing buttons, so the question was: through voice interaction, how can we get traction? And actually, a lot of the early investment, including from a strategic partner standpoint, came from automotive companies, because they were looking for a replacement for the incumbent — for a long time that was primarily Nuance, the big voice AI player. So we grew in that area, and we've been scaling. Fundamentally, the vision was always about voice-enabling the world with conversational intelligence.
We believe that if you think about how humans have interacted over our whole history, tens of thousands of years, it's primarily through voice. And yet, if you look at our interaction and interfaces with technology, it's been keyboard input, touch, type, swipe, GUI interfaces. We believe that voice will be the unlock of completely new ways in which humans interact with technology, and that feeds the strategy: we really want to voice-enable the world. That maps to what you called our three-pillar strategy. Pillar one is voice-enabling products. Cars are one application there, but we also voice-power TVs like VIZIO, and we do more and more smart appliances. There are strategic reasons why we believe voice is a better way than traditional computer interfaces.
Voice-powering products was historically — last year and before — the biggest chunk of our business, over 90%. Pillar two is voice-enabling services, and we're getting a lot of traction on the restaurant side, in particular food ordering. But we're growing further and further. We just announced an acquisition not too long ago that extends our capabilities into other industries like healthcare, insurance, and financial services. And then pillar three is really our vision of a monetization ecosystem, where you can connect voice-enabled services with voice-enabled products and build new monetization streams. To give you one small example, imagine people here local in Nashville, driving into the conference, who want to pick up coffee on the way because it's early.
Traditionally, you're probably fumbling with your phone, maybe trying to type on your screen. But through voice interaction, you could just seamlessly say, "Hey, Kia..." — one of the partners we work with is Hyundai; we also work with Stellantis and others — "order me coffee" or "find me coffee." And it could tell you, "Actually, there's a Starbucks at the next exit; it'd be a five-minute detour. There's a Peet's Coffee closer to the conference. Do you want me to order your cappuccino for pickup?" That creates new economic streams for us. So our revenue model is diversified, and there are a lot of applications across both B2C and B2B. But to get back to your question, we really look at ourselves as a B2B2C company.
We interface with OEMs, we interface with QSRs on the restaurant side, and we're interfacing with bigger and bigger money center banks, insurance companies, and so forth. And again, the long-term vision is changing the way people interact with technology. Every 15 years or so in tech land, there's a new manifestation of how technology interfaces with humans, and we're trying to bring the next great change.
On that pillar one — we'll go in order, I guess, just to make sense — auto is still a large part of your business, understanding there's TVs within that voice-enabled products piece. How do we think about the penetration today in autos? What will it take to get to pretty much 100% AI attach?
Yeah, so there are two pieces: the penetration of voice in auto, and then our penetration within that voice landscape. More and more, what you're finding is cars are connected and have voice enablement. Traditionally, voice enablement was embedded — you could talk to the car, but it was in-car controls without internet connectivity: turn up the volume, turn on the AC, open and close windows, command controls like that. Over the last five to seven years, there's really been more and more of an inflection in the growth of connected cars, so now you obviously have traffic patterns, many people plug in CarPlay, and you can get mapping or connect to music and so forth.
That capability is growing a lot, and that was really the differentiation we came in with. So in the market today, there's probably 75% penetration of voice into vehicles, and every year it's growing faster and faster. Nuance was the major incumbent player, and we've been stealing share. Our penetration in that market is still in the single digits, but the brands we partner with represent about 20% of the global market. So just growing with existing customers is a massive growth opportunity for us — we comment on bookings, backlog, and metrics of those contracts we have. There's a lot of growth opportunity just scaling with existing customers, but we're continuously adding more. And our differentiation was always on the basis of tech.
We were founded by PhDs from Stanford, and we built a team that, like I said, worked in stealth to build the voice AI ecosystem. We believe we are globally one of just a handful of companies — counting big tech in there — that have all the pieces and parts of the voice AI ecosystem: speech recognition, natural language understanding, generation of text, and then speaking it back. And we've come in as the more flexible startup, bringing change and speed, and that's how we've been growing and getting greater and greater penetration.
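[Editor's note: to make the four-stage "full voice AI ecosystem" described here concrete, this is a minimal illustrative Python sketch of a speech-recognition → natural-language-understanding → response-generation → text-to-speech pipeline. Every function, class, and string here is invented for illustration; this is not SoundHound's actual API.]

```python
from dataclasses import dataclass

# Hypothetical four-stage voice AI pipeline:
# ASR -> NLU -> response generation -> TTS. Each stage is stubbed.

@dataclass
class Intent:
    name: str
    slots: dict

def recognize_speech(audio: bytes) -> str:
    """Automatic speech recognition: audio in, transcript out (stubbed)."""
    return "find me a coffee shop on my route"

def understand(transcript: str) -> Intent:
    """Natural language understanding: transcript -> structured intent (stubbed)."""
    return Intent(name="find_poi", slots={"category": "coffee", "constraint": "on_route"})

def generate_response(intent: Intent) -> str:
    """Turn the fulfilled intent into natural language (stubbed)."""
    return f"There is a {intent.slots['category']} shop five minutes ahead."

def synthesize(text: str) -> bytes:
    """Text-to-speech: response text -> audio (stubbed as UTF-8 bytes)."""
    return text.encode("utf-8")

def handle_utterance(audio: bytes) -> bytes:
    # The four stages composed end to end.
    return synthesize(generate_response(understand(recognize_speech(audio))))

print(handle_utterance(b"...").decode("utf-8"))
# -> There is a coffee shop five minutes ahead.
```

The point of the sketch is the composition: a vendor that owns only one stage (say, transcription) can't control end-to-end behavior the way a vendor that owns all four can.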
So is it fair then to say we've moved beyond sort of the educational phase for that auto vertical and now more into the competitive, prove-it-out kind of phase?
Yeah, I think so, but I'd add that there are behavioral things. This is why we're also very confident that now is the right time for voice AI to penetrate more. Think of traditional interfaces in the car: people either don't use the technology, or they press the speaker button on the steering wheel, say, "Call home," and it starts calling your mom. It hasn't worked. Or think of other voice applications — traditional interactive voice response systems, where you press one for this, press two for that. People get very frustrated and end up screaming, "Operator!" That's the kind of historical technology behind people's mental map of, "Eesh, is this stuff really working?"
If you look at some of the demos on our website — things that launched nearly a decade ago — and compare them to your interactions with Siri, Google, or Alexa: even the Alexa in your kitchen is pretty much limited to setting a timer or playing music, very limited utility commands. Whereas our technology actually works in the context of how we're interacting — complex, compound queries, even something as simple as negation: "I want this, not this." Traditional architectures are keyword-bound. So there's a bit of a consumer journey that's migrating, and I do think it's more and more here, to your question. But I'd also argue we're doing so much innovation around it that there are a lot of new capabilities.
Back to the car example: recently, we built our own language models that can quickly and seamlessly integrate the operating manual of a car. So now, if you're driving and your brakes make a weird noise, or a weird light shows up on the dashboard, or you get a flat tire, you don't have to thumb around in your glove compartment, pull out the car manual, and go, "Which page? 195. Okay, here's how I do this." You just talk to your car: "Why is this light on my dashboard?" And it can interface with you and tell you, "Okay, maybe you've got to take it to the dealership." That's the level of advancement we're making. So I think there's still some learning and growth.
And the last one I'll add is that we're integrating other LLMs that are out there. For example, we were the first to go live in production with Stellantis in Europe with a connection and interface to OpenAI. So now think of the random things you can do. You could always do in-car controls; we've added a lot of cloud capabilities; and on top of that, if your daughter's in the back, you can say, "Can you tell her a story about the cow jumping over the moon?" You can do something as random as that.
Or you're in Paris, driving around, about to go to the Louvre, and you say, "Tell me more about Leonardo da Vinci, and what inspired him to make the Mona Lisa?" Those are the types of things you can enhance. So these capabilities, these feature sets, are growing, and consumers will continue to benefit from that.
Yeah, my daughter would be all about Bluey and Ms. Rachel. I'm sure.
Yeah.
So maybe moving on to the sort of next pillar. You guys had a little bit of an announcement this past earnings around Amelia.
Yes.
Why Amelia? What does it bring to SoundHound? And what makes it so that Amelia gives you guys the right to win now in that space?
Yeah. As I mentioned, our voice-enabled products business was historically the biggest chunk of our business, and we've been growing aggressively in voice-enabled services, in the restaurant space in particular. But we always saw voice-enabled services as a massive disruption opportunity. In fact, if you look at what generative AI and large language models have shown they can do over the last couple of years, it generally comes down to a couple of main use cases, and one of those is the shift in conversational AI in particular — the ability to have conversations at a much richer level, as I mentioned, compared to the utility voice commands of the past or even chatbot interfaces. I think ChatGPT has done a lot to show people, like, wow, you can actually engage at completely different levels.
And so we really started to grow, and we think there's massive disruption that will happen across every industry over the next five, ten years. We were growing organically in a particular area, and we had this opportunity to partner with and get to know Amelia, which has been a leader in this space for a long time and has deep enterprise-grade integrations with large money center banks, insurance companies, healthcare providers, retail, and hospitality. That, for us, is tremendously attractive because, especially in those highly regulated industries, it probably would have taken us five to seven years on our own to understand customer requirements, integrate with their complex systems, and so forth. So now we can accelerate the customer journey. In full frankness, we weren't really looking for this — we had so much opportunity in the restaurant space.
We weren't saying, "Wait, the next big thing to get on right now is healthcare." We certainly weren't thinking of growing there organically. But when we met the team and got to know them, we knew where they were. It really was an accelerant, and most importantly, it's really about what you can do together. So, for example, many of their customers see this opportunity to use voice as a catalyst, and they had been third-partying the voice...
Yeah
... capabilities. What we also saw with some restaurant opportunities is that we can integrate our own best-of-breed — frankly, better — technology to bring new innovation to their customer base, and there's margin benefit because it's our own proprietary tech, built up with hundreds of patents around it. So there's that application of accelerating their journeys, and I'd say the same thing the other way: the capabilities they bring — interfacing with IT operations, or employee-facing interactions within big enterprises — we can bring to our automotive partners and our restaurant and QSR partners. There's a lot of cross-pollination opportunity here, so we're just getting started. We announced it recently, as you mentioned, so we're in integration mode. And for us, it also just gives scale.
We have an amazing tech stack, and being able to get that out and show the differentiation and innovation to more and more customers is something we're trying to pursue.
Yeah.
Definitely.
And you brought up the restaurant side of things. I think you're up to five of the top fifty QSRs at this point. How long does that sort of deployment take across some of these large QSRs? And how much of a driver is labor arbitrage versus tech investment in fueling that?
Yeah. So we're actually at five of the top twenty currently announced, and there are several other conversations going on. So there's a ton of action here. The pacing and scaling really depend on the construct, because there are a few different things we're doing with restaurants. Number one, we do phone ordering, like with Jersey Mike's, where you call in, and that can scale much more quickly: once you get menu ingestion and integration with the point of sale done, you can scale rapidly. With Jersey Mike's, for example, we went out of the gate with 50 locations right away, and then we're scaling across their full franchise fleet. We have other phone-based ordering deployments with, like, Papa Johns and some others, and we'll keep scaling.
On the drive-thru side, which is a huge area — nobody goes through a drive-thru and says at the end, "Wow, that was the best experience I ever had." Oftentimes operators have labor shortage challenges and so forth, and they're trying to prepare the food in the background. But now with automation, there's a real complement there, and we're seeing multiple benefits: there's the speed and cost efficiency you mentioned, but we're also seeing ticket upsells. One major chicken fast-food chain — I won't mention their name — is super excited because they see $20 tickets already going to $22, because the AI never hesitates to upsell. If you program it the right way, it's consistent in service.
Ultimately, though, to your question, the pacing on drive-thru is sometimes gated by hardware requirements: you need a headset, a microphone, display boards, all those things. So that might gate it. For some of these, we're able to go quickly; some are multi-year journeys. Some of the chains we're now talking to have thousands and thousands of locations, and they sometimes have different constructs — it's a very fragmented restaurant industry. One way we're accelerating is through channel partners, so we integrate with, like, Square, Toast, Olo, and Oracle MICROS Simphony. And the bigger players have their own custom point of sale, so we can create interfaces that make that more seamless.
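[Editor's note: the "menu ingestion plus point-of-sale integration" step described above can be sketched, under heavy simplification, as indexing a POS menu export and matching an order utterance against it. The data shapes, item names, and matching logic below are all invented for illustration — a production system would use NLU, not substring matching.]

```python
# Illustrative menu ingestion for phone ordering: normalize a hypothetical
# point-of-sale menu export into a lookup table, then match order phrases
# against it with a deliberately naive keyword scan.

def ingest_menu(pos_export: list[dict]) -> dict[str, dict]:
    """Index menu items by lowercase name and any known aliases."""
    index = {}
    for item in pos_export:
        for name in [item["name"], *item.get("aliases", [])]:
            index[name.lower()] = item
    return index

def parse_order(utterance: str, menu: dict[str, dict]) -> list[dict]:
    """Naive keyword match of a caller's utterance against the ingested menu."""
    text = utterance.lower()
    return [item for name, item in menu.items() if name in text]

# Invented sample data in the shape a POS export might take.
pos_export = [
    {"sku": "SUB-13", "name": "turkey sub", "aliases": ["number thirteen"], "price": 9.49},
    {"sku": "DRK-01", "name": "lemonade", "price": 2.99},
]
menu = ingest_menu(pos_export)
order = parse_order("can I get a turkey sub and a lemonade", menu)
print([item["sku"] for item in order])
# -> ['SUB-13', 'DRK-01']
```

Once ingestion and the POS hook are in place, scaling to new locations is mostly configuration rather than new engineering, which is consistent with the phone channel scaling faster than hardware-gated drive-thru.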
Yeah, makes sense. And maybe on the customer service side — I know it's very early on that journey for you guys — what are you actually seeing at a high level for conversational AI replacing human seats at this point?
Well, I think two things are happening. The conversation with enterprises across all industries is only accelerating. People are curious — with the momentum and the narrative out there about AI transforming businesses, both for productivity purposes and for doing different and bigger things, every enterprise across industries is curious about it. And what they're looking for are partners who are true AI providers with real differentiation. Historically, SoundHound has been able to come in and say: look, we are differentiated, and we provide voice capabilities. We can extend that conversation with Amelia, which has deep integrations with major enterprises. I'll take the restaurant example to answer that, but I think this applies elsewhere.
Eighteen months, two years ago, we were pushing to get in front of more and more QSRs. Right now, we have more than we can handle, in fact. So we're trying to calibrate investment toward our goal of reaching profitability and so forth, while actually delivering. The inflow of demand is just accelerating, and we're trying to pace it — if a massive QSR comes in and says, "I'd love to do X, Y, or Z with you," we don't want to put them on hold for six months. That's been more of the conversation in the most recent six months. So there's a ton of demand, but most people are also trying to understand: what part of AI do I play in?
They understand generative AI and integration with LLMs. Everybody has been buying, like, GPUs and now cloud services, and they're trying to find applied AI use cases — and that's where we play: who can provide really integrated AI to benefit their customer base or their employee population. We're seeing a ton of demand for that.
Got it.
That's represented in our growth curve, by the way. We've been growing consistently over 50%, and our expectations and outlook for future years are in that range too. Frankly, over a five-year horizon, we think continued 50% CAGRs would understate our potential. We really see a lot of opportunity.
Understood. And maybe just to level-set it all: you brought up a few competitors here — Nuance, and I think Cerence does some stuff on the auto side — and you guys have talked about the cloud guys that have sort of internal capabilities. What does that competitive landscape look like in terms of who you're seeing most? And specifically, one of the questions we get is: what does SoundHound have as a kind of secret sauce that makes it so the big cloud guys can't do what you can do?
Yeah. Industry by industry, there are slightly different competitive sets. There are some people who say they play in the voice space but only have one component of the engine — they'll do speech recognition, or speech-to-text transcription, those types of capabilities. Whereas we are, like I said, one of only a handful of companies that have the full engine: speech recognition, natural language understanding, generation, text-to-speech, content domain partners, and agnostic integration with different LLMs. So we partner with OpenAI, and we partner with Perplexity for real-time domains. And by the way, in the LLM world, there are fiefdoms being created, right? You've got the Microsoft-OpenAI ecosystem. Google is not going to partner with them; they're going to do their own Gemini, Bard thing. You've got Anthropic. So we're agnostic.
We have an interface that sits on top. Number one, we believe our technology is better in terms of how humans interact — if we show our demos, you'll see that the complexity, speed, and accuracy with which we work is a step function different. We actually have benchmark data on our investor website about our in-car performance, measured against a big tech player, covering sentence accuracy — how accurately the conversation worked. It stands up at higher and higher speeds in noisier and noisier environments, and we're constantly innovating. For example, a lot of the LLM conversation has been about natural language understanding: when you text back and forth with GPT, it can understand what you're saying and hold a conversation — even though it might be hallucinating, it can keep it going.
We're doing a lot of similar innovation on speech recognition. We have a global footprint, so we're already in a lot of languages, accents, and acoustic domains, and we're doing a lot on acoustic, text-based prompting. One of the complexities of human language is just the variety of it and the different ways people infer things — in person, with body language, you kind of pick up on stuff. Those are things we're constantly innovating on, so our technology ends up being better. So, getting to your question: that's why we think we're differentiated even against the big tech, who have unlimited resources.
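[Editor's note: the "agnostic interface that sits on top" of multiple LLM providers can be sketched as a simple routing layer. The provider names come from the conversation (OpenAI, Perplexity); the class, method names, and domain-to-provider policy below are entirely hypothetical.]

```python
from typing import Callable

# Sketch of a provider-agnostic routing layer: domain-specific queries stay
# on an in-house model, while real-time or open-ended queries are forwarded
# to partner LLMs. Handlers are stubbed as lambdas for illustration.

class LLMRouter:
    def __init__(self) -> None:
        self._providers: dict[str, Callable[[str], str]] = {}

    def register(self, name: str, handler: Callable[[str], str]) -> None:
        self._providers[name] = handler

    def route(self, domain: str, query: str) -> str:
        # Hypothetical domain-to-provider policy; unknown domains
        # fall back to the in-house model.
        policy = {"realtime": "perplexity", "open_ended": "openai"}
        provider = policy.get(domain, "in_house")
        return self._providers[provider](query)

router = LLMRouter()
router.register("in_house", lambda q: f"[in-house] {q}")
router.register("openai", lambda q: f"[openai] {q}")
router.register("perplexity", lambda q: f"[perplexity] {q}")

print(router.route("car_controls", "turn on the AC"))  # stays in-house
print(router.route("open_ended", "tell me a story"))   # forwarded to a partner LLM
```

The design point is that the application layer never hard-codes a single provider, so swapping or adding backends — one "fiefdom" or another — is a registry change, not a rewrite.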
Also, in the QSR space, we work on the integration side to make it effective for the restaurant, whereas in a lot of cases the big tech don't provide full-suite solutions. On the car side, they don't provide edge solutions, right? And a lot of times they don't white-label. We work in service of the brand, constantly. If you're Hyundai, you want your consumer to say, "Hey, Hyundai," not "Hey, Alexa." And secondarily, you need to custom-develop commands, because if you're in a Stellantis Jeep and you ask an LLM, "What's the best car out there?" you don't want it to say BMW, right? So you need to custom-develop those in service of the customer, and that's been the DNA of this company.
It's what works for the customer. And by the way, I'll clarify: the incumbent player was Nuance, which spun off Cerence as its independent provider, and that's who we've been stealing share from. On the restaurant side, it's a lot more private companies, smaller companies, and that's more greenfield. And Amelia has been a tech differentiator — top right on Magic Quadrants, on IDC charts — and is well positioned across the industries they play in. That's, again, the core essence of what makes us different and what we believe makes us better: technology. It's ultimately the value we can create for customers.
Got it. We've got about five minutes left. Are there any questions from the audience at this time? Okay.
So I think you touched on this a little earlier, but you also have your own Polaris model alongside multiple different partners. Can you help us understand when you use your own Polaris model versus when you partner, and how you're thinking about why you need to build your own model rather than just fine-tuning the foundation models that are out there?
Yeah. I think we can do the best of both, in our view. Language models are very helpful, and they're getting bigger and bigger; the capabilities are expanding, and the acumen is improving continuously. But they don't do everything. For example, you can't apply one in real time and have it understand a specific car's manual — which oil you should use, and so forth. We already integrate that, and by the way, there are speed and cost efficiencies to utilizing your own domain models. And like I said with the Polaris example, we're innovating in areas the LLMs are not addressing: they're addressing the understanding part, and we're doing the speech recognition part as well. So I think there's a lot that will sit on top of LLMs.
We believe the LLM is transformative. We believe it's the next generational shift in technology and how humans will interact. But that alone will not serve all needs. There's going to be a lot of transformation in five years, but just taking a model off the shelf and saying, "This can do everything I want" — especially when you get into enterprise applications — is just not going to work. What we're finding is there's the right application for each. And again, we're not going to compete on creating the next three-trillion-parameter model and spending hundreds of millions of dollars. That's not our game.
But we can absolutely differentiate in building better use cases for customers and integrating LLMs where they make sense — and frankly, you can do a lot without LLMs and still enhance the customer journey.
Does that change the sort of required CapEx for the business?
We leverage cloud partnerships to do a lot of our innovation. There is a lot of sensitivity there as we grow — we have a ton of data, so maintaining that data, where it's stored, file storage versus object storage, those are things we're constantly scrutinizing. But we have a great partnership with Oracle and Oracle Cloud, and that's who we've been scaling with; they give us access to GPUs and so forth to do our own training of models. Again, we're not really competing on the trillion-parameter basis. We do have language models in the billions, tens of billions of parameters that can be very usable in applied use cases, and I think that's the shift that will start to happen, as I mentioned earlier.
Again, you need a lot of hardware, you need GPUs, you need cloud — then what do you do with it? It's going to be enterprise-level applications that are specific to a domain. For example, in a drive-thru, people generally don't ask who's going to win the next presidential election; they say, "I want a cheeseburger and French fries." So you can create use cases that are limited in scope, efficient in cost, and create real value for a customer. You don't need AGI — the sort of thing that can answer anything and everything, solve a math problem one minute and talk about the theory of relativity the next. Which has its use cases, by the way, and the world is shifting that way, so I'm not trying to minimize that.
It's just that in these consumer journeys, you don't need the genius of everything to address the need.
Yeah. A few moments ago, you talked about 50% sustainable growth. One of the lead metrics you've given is cumulative bookings backlog, which was north of $700 million, growing north of 100%. Walk us through that, because it's not a typical metric a lot of SaaS companies give. What does it actually mean underneath? How do we think about the duration of that backlog? And is that what's helping drive the confidence, beyond just each individual pillar opportunity?
Yeah. So we are a diversified business in the sense that we have a royalty business on the product side — like cars, where we get paid a royalty stream. We are growing our SaaS architecture, especially with restaurants and now more so with Amelia. And monetization is a different revenue stream again. This metric is intended to aggregate the total book of business we have. We have a massive TAM we're going after and a massive pipeline, but this represents signed customer contracts, and they come in slightly different flavors depending on the industry. Automotive tends to have long-term contracts — five years, seven years, some north of nine years — and part of that is the upfront engineering work that's required.
Some of it is royalty streams we expect as volume grows and units are shipped. And then with restaurants, it's more about the subscription revenue we have in place with them. So all of that together is growing. We continue to get more and more demand, and we're at a small scale but growing aggressively. That's what gives us confidence in the roadmap: these are signed contracts with customers who are trying to expand with us, and that gives us good runway and, you know, foreshadows the revenue we feel confident about. But that's not all — we're always adding, and we're also trying to displace other players out there.
But that does give us confidence in sort of what we can guide in terms of outlooks for next year and so forth.
Awesome. Well, you have a lot to tackle here and a lot of opportunity and, you know, really some interesting stuff underneath. So really appreciate you coming out to Nashville and telling us the SoundHound story. Thank you very much.
Thank you for having me, Jim. Yeah.
Thank you, everybody.
Thank you.
Thank you very much.