So let's start, Keyvan. It's the year 2000. You're getting your Ph.D. at Stanford. You're thinking about opportunities. What was the genesis of SoundHound? Where did this all come from? Walk us through that, from then to today, 'cause I think it's helpful for people to understand.
Yeah. Thank you for having me. Fun fact: I actually went to VCs in 2004, 20 years ago, and pitched them that mass adoption of voice AI would happen in 20 years. Most of them declined to fund us because 20 years is too long. But the fact that it's happening now is so incredible. To answer your question, I did my undergrad in engineering. I finished top of my class, but I also started three companies when I was 19, 20, and 21. So by the time I graduated from undergrad, I had this itch to be an entrepreneur. And I didn't want to be a serial entrepreneur and start little things. I wanted to be the technical founder of a technology company that would make a big impact in the world and spend decades of my life on it.
So with that thinking, I decided to go to grad school. I got my PhD at Stanford, also in engineering, and when I started my PhD, I was looking for what would be the next big change that I could work on, and I turned to science fiction. What do they have in Star Trek, for example, that we don't have? There are great product ideas in Star Trek — good advice for entrepreneurs — but some of them were a little far-fetched, like spaceships that go faster than light, teleportation devices, holodecks, and replicators. The one that was less obvious was voice AI. They talked to robots and computers, had conversations with them, and asked them to get things done and to get information. And I thought, this is going to happen for sure, and it's going to happen in my lifetime.
And I wanted to be a part of that transformation. So with that thinking, I chose my PhD thesis to be in speech recognition and machine learning, then founded SoundHound in a dorm room at Stanford with my co-founders, spent 10 years in stealth R&D to build the voice AI technology, unveiled it in 2015. And today, we power millions of cars and TVs, and we are in thousands of businesses and hundreds of large enterprise brands. And, again, we are so fortunate that we are here.
Yeah, and what's so important about voice and why is that the killer app for, you know, for information and AI? What is it about voice that made it so appealing to you?
Our business strategy is built on top of two predictions. One is that speaking will be the preferred way to interact with devices. The other is that AI customer service will be as necessary for every business as Wi-Fi and electricity. So you sign up for Wi-Fi, you sign up for electricity, and you sign up for AI customer service. These are the two pillars of our business. In pillar one, we power devices like cars and TVs and IoT devices. In pillar two, we power AI customer service for businesses. And voice really unlocks the power of generative AI in these two pillars. And generative AI unlocks the power of voice also. So the timing is so perfect.
Yeah.
And it creates this intersection that is very rare: the intersection of adoption demand and technology readiness, right? We have the opposite examples. In autonomous driving, the demand is there — people want autonomous driving — but the technology is still catching up to that demand. And then you have virtual reality, where the technology's amazing. I mean, Oculus — I tried it many years ago, and it was amazing. But still, you don't see the mass adoption. But with voice and conversational AI, you have the intersection: finally, the technology is keeping up with that science-fiction promise.
Mm-hmm.
People are adopting it.
So what about when we think of other big natural language processing agents — Siri, Alexa? How are you different? What's your advantage when you think about that?
So, several advantages. One is the core technology. Our core DNA is technology innovation. We have the core technology in-house. There are very few companies that have this full stack of technology — your own speech recognition, your own natural language understanding, and so on. And we've always been, in our opinion, ahead of others. We were the first, for example, back in 2015, to show very complex and compound queries that really delighted our audience. And we continue to innovate. Owning the technology also allows us to make our products better, as opposed to being an API user of another platform. Second is data.
Mm-hmm.
There are data companies that are now worth over $10 billion, and we are probably in the top 5 or top 10 companies in terms of data. We've been live in production for almost 20 years, with conversations and voices from millions of users in dozens of languages. And that data is priceless, in my opinion. Then I would say the ecosystem of our partners and customers and channel partners and investors that really want us to succeed. We are in millions of devices. We are in thousands of businesses. And those integrations are really hard to achieve. And the last thing I would say is our business strategy. We are in an area, a focus, that is too hard for new players to catch up in.
It just takes a long time and a lot of effort and resources. And it's unwise for Big Tech to focus on, right?
Yeah.
That gives us another advantage.
And, by the way, just to add to your question on why voice is different.
Yeah.
If you just think of human history — tens of thousands of years — we have naturally communicated via voice.
Right.
You have to learn how to type on a keyboard. You have to learn how to text really fast with your thumb.
Mm-hmm.
But a kid learns how to speak very early on. So as the technology has now become ubiquitous and pervasive, and it's easy to communicate through natural conversations, it just unleashes whole new workloads of what you can get done through the human-technology interface.
So, Nitesh, when you think about the verticals that you're going after — one of the big ones is obviously restaurants, quick-serve, and you guys are constantly coming out with press releases around that. What's the advantage from a financial perspective to the businesses? What's the advantage of using voice AI versus their traditional ways of doing business?
Yeah. It's multifold. We've gotten traction over the last couple of years. At first, just post-pandemic, it was labor shortage — they just didn't have the staff, and even now there are challenges getting the resourcing. We had an example with a restaurant we kicked off first in the Midwest and then scaled toward the Southwest. Last year, when Taylor Swift was touring — I guess her Eras Tour just ended this week — she played St. Louis, and after the concert, there were just hordes of customers that came to White Castle, one of our earliest drive-through partners. The staff only had two or three people working there, plus the AI system they called Julia.
Mm-hmm.
They were just raving about how Julia was such a great resource to help take the orders so they could do food preparation or handle other customer service inquiries. So it really was a dependable worker that never gets tired, always on 24/7. You can train it to upsell, and it'll never forget to upsell. So there's mitigating labor shortage. There's obviously a cost benefit — you don't have retraining and so forth, and you don't have the cost of acquiring new talent. And ultimately, we're shifting — we're seeing more and more data on revenue upsell, so now it's becoming a strategic pillar. We have a large major QSR on the chicken sandwich side that's seeing ticket prices uplift 10% to 20%.
$20 ticket prices going to $22 to $23, and that completely changes the architecture for a restaurant to say, "Okay. Now, not only is this good, consistent acquisition of talent and cost arbitrage, but it's also revenue-generating as well." This is one of those things — it's early days still. We're getting a lot of traction, as you noted. A lot of new customers are signing up. This is one of those things I think you just don't turn back from. There's a little bit of upfront work, but then it scales very well. That's what we're doing.
And you talk about the large deployments — Torchy's Tacos or White Castle. What about smaller restaurants with two, three, four locations? Can they deploy your solution, or is it only something you can do at scale?
Go ahead. Yeah, so we've actually built AI solutions where you can be a single-location plumber or barbershop and have your own AI customer service in a matter of minutes. We call that Smart Answering.
Mm-hmm.
It's self-service. You go sign up, you enter your website and your phone number, answer a few questions, and you can customize it as much as you want. It learns as people use it. We really believe that's a $100 billion TAM that is not addressed. Then we go all the way up to the top 50 enterprise brands, where, if you automate even a small percentage of their volume, the saving is in the millions. So we actually cover that whole range.
One of the ways we got traction early on was through integration with point-of-sale partners. We integrated with Square, Toast, Olo, and Oracle MICROS Simphony. That allowed us to penetrate into those smaller restaurant franchises, at least. When you get to larger QSRs, they oftentimes have custom point-of-sale systems, so we have our own gateway to integrate quickly.
Got it. And as we think about the transition from pure language or voice processing into what you did with Amelia, can you talk a little bit about that and how it extended and expanded your breadth?
Sure. Yeah. Maybe it's a good moment to just talk through our business model and architecture.
Mm-hmm.
As Keyvan noted, we voice-power products like cars, TVs, and IoT devices, and it's a royalty business for us — as Keyvan mentioned, we're in millions and millions of cars. We saw a lot of opportunity tied to the second key prediction Keyvan had, about voice-enabling customer service, and so restaurants were our first foray. We absolutely saw this as similar to how Amazon entered with books but had a long-term vision of e-commerce. In a very similar vein, we wanted to enter with restaurants. It suited the features and facets of our technology very well, where you need precision. You don't just order any old pizza — you want pepperoni and sausage — and our technology worked for that type of precision.
So restaurants made a lot of sense as the first foray, but we absolutely saw it as a disruption across all customer service. And Amelia allowed us to accelerate that — to have enterprise customers now: money center banks, financial services, healthcare, insurance, retail, hospitality. They had those customer bases. And there was a real complement on the technology side, where a lot of those customers, as we found in diligence, really wanted voice AI as the next horizon. So that complement has been really beneficial. So yes, we've now extended into an array of different industries. And we're just at the cusp of the massive disruption that we're driving here.
And when you talk about your presence in automotive and you talk about your presence in restaurants as an example, is there a marriage between those two? Is there an opportunity? And what does that look like?
Yeah. So that's our third pillar.
Okay.
And so in pillar one, we power devices like cars and TVs and IoT devices. In pillar two, we power AI customer service. And then we had this vision years ago that we should connect them, right? So if you're driving your car on the road, and you're already talking to your car to control the air conditioning and radio and navigation, and then you want to order food at a drive-through that we also power — why do you actually have to go to the drive-through, get in line, and wait to get to the AI kiosk? You can talk to your car, and this agent can talk to that agent. So while you're talking to your car, you know, 20 minutes before you get there, you can discover restaurants that take orders, and you can place an order.
And we finally have the scale to make this happen. It's a vision we've communicated for years, and it was a matter of having enough coverage — brands nationally — and also enough cars and TVs. And we finally have that. We expect that will generate the third pillar of our business, which will also create a flywheel effect, because, first of all, it creates value for all parties involved, right? The most important one is the driver — the convenience of ordering in advance. Beyond just food, you can make appointments, make reservations, buy groceries, and so on. The merchants, like restaurants, get new leads. We monetize these lead-generation moments, and then we share that revenue with the carmakers. So they also generate revenue on a recurring basis — in most cases, for the first time.
So that should create a flywheel effect where more merchants will sign up because of the lead-generation opportunity, more devices will use us because we monetize for them, and more users will use it because there's more they can do.
And has that already come to fruition, or is that still in its initial stages as you think about the lead gen model and all of that?
We have announced that we will unveil how it works actually with a live demo at CES this year in January.
Oh, so right around the corner.
Yeah.
That's great. And then Polaris, your large language model — can you talk about that a little bit, or how we should think about it in the broader picture? I think you announced it in Q3. Just how that relates to the business and where that's going.
SoundHound is a technology company. We build a lot of tech in-house. We use technology innovation from outside our four walls, but there is a lot of innovation that happens in-house. We have a lot of data. We have a lot of know-how. Polaris is our own foundation model, and it is multimodal and multilingual: audio input, text input, audio output, text output, visual output in some cases. We are already seeing it massively beat other models in terms of accuracy and speed and cost.
Mm-hmm.
And we have gone live now in one-third of our restaurant AI agents. It's markedly better than what we had used before, and it gives us the opportunity to do a lot more. One key difference I wanted to explain is between what SoundHound does and what companies like OpenAI do. OpenAI is a great company — we use some of their models, and they have really pioneered. But when they put something out there, it can be amazing 70% of the time, and it can fail maybe 30% of the time. And that's okay for their audience, because their audience is there to see a glimpse of the future, right? So when it doesn't work, it's okay — it's gonna get fixed eventually. But our audience is not forgiving, right?
You can't be the AI customer service of a large enterprise brand and be right 70% of the time and hallucinate, or produce a really bad outcome, 30% of the time. I think a lot of other players are suffering because of that. They see AI customer service as one of the biggest opportunities of generative AI, but it's hard to go from those demos to actually going live. That's something we do really well. We can reduce hallucinations by orders of magnitude, make them negligible. Having our own models, our own data, and decades of experience really helps us.
Got it. And just out of curiosity, are we thinking of this as a domestic business, or is there multi-language capability here? And how are you growing internationally?
Oh, we support dozens of languages. We are already live with customers, mostly in pillar one, in dozens of languages. We power cars — for example, Stellantis brands in Europe. We are in India in multiple languages. We are in Asia. Pillar two — a big concentration of that is U.S., but you can see how the TAM will increase as we go to other languages.
And, I imagine, a lot of the restaurant side is U.S., because of how available and commonplace drive-throughs are. Can you go beyond that?
We are actually live on three continents in drive-through.
Okay. Great.
We are live with customers. We are working with 30% of the top 20 QSR brands — a lot of them are live already; some of them are live quietly, just to test and pilot. We are in three of the top four pizza brands. We are expanding with Chipotle and Casey's, and we just announced Church's Chicken and Torchy's Tacos. And we are in over 20 car brands and over 200 enterprise brands, including 7 of the top 10 financial institutions.
Nitesh, the financials are starting to catch up with the opportunity. What are you most excited about, from what you can say, about the opportunity ahead? Is it the monetization that could come off of the lead-generation business? What is it that you think about?
Yeah. I mean, I'll start by extending off the last question. The TAM is massive — we talk about a $140 billion TAM, but that might be understating it in terms of the ultimate commerce opportunities that we can address. Even though it is a global opportunity, if you look at just the U.S. and just take drive-throughs — although we're also doing phone ordering with, you know, Chipotle, Jersey Mike's, etc. — just on the drive-through side, there are nearly 300,000 drive-throughs. And by the way, strategically, restaurants want to offer more convenience and pick-up-and-go; that's actually a growing part of that sector. If you do the math on scale times price, you'd easily get a billion dollars plus of revenue per year for us. We have a running start competitively. We do believe there are some other players.
There are some legacy players that we don't believe have the tech to compete. And then there are some other, larger players that are trying to come in, but it's just hard. So we have a real strong running start and a deep competitive moat in a massive TAM. Just that one vector is a massive opportunity. We're a smaller, early-stage public company. We want to catalyze and do successful things and then grow to the next successful thing. But in the grand scheme of things, I'll go back to an earlier point and start with the vision, right? We do believe voice and natural conversations will unlock completely new pathways of how humans interact with technology. This is the next major horizon. In the Gen AI era, natural language conversation is the primary interface, and we believe voice AI is the killer app.
But we do believe in customer service in particular, going beyond restaurants into financial services and healthcare. Think about the accessibility opportunities, or the pervasiveness of a personal tutor, and so on. This broader — what we're calling agentic AI — revolution is absolutely here, so we're super bullish. For us, we need to stay disciplined and focused on the key things we can address in the near term and then move on to the next. That's what we've been trying to do. And we know the opportunity's so massive and growing so fast that we have to be agile, aggressive, and thoughtful. So we've been doing this organically — we've historically been growing at over a 50% CAGR consistently.
And we've amplified that with some M&A as well this year, which we think allows us to accelerate that customer journey and customer adoption.
You know, I read an interesting article about the use of AI in training. Can you talk about that a little bit, especially with training people up — seasonal employees, things like that? It seems to be an extra interesting little facet of what's going on inside of Amelia, where people are really getting a lot of value out of that. So it'd be interesting if you could talk about that a little bit.
Yeah. I'll actually touch on one of the products I think we briefly mentioned, Employee Assist.
Yeah.
There's actually a similar thing in Amelia — they call it Amelia Answers. Think of this more broadly as the ingestion of any content that you can quickly train on and then be either a very helpful assistant or, even better, ultimately more autonomous. One application I'll go back to is the restaurant application, where you can ingest an employee manual and train an employee how to fix a machine or clean a machine or make a beverage — all these types of things. We have a similar application where you can ingest an operating manual into the automobile. And the distinction between us and big tech is that we're integrated with the mechanical and electrical systems in the automobile.
Now, if you have a light that shows up on a display, or you have a flat tire or something, you can just communicate with your car to get that advice, rather than having to thumb through your glove compartment to find the right page of the manual or call the service line. So these types of complementary capabilities — whether it's in training, onboarding, or just enabling a user to do all sorts of things — the technology's there, and we're seeing more and more use cases.
And how close are you to the end customer, Keyvan? Do you have regular conversations with your customers and hear from their customers' customers? How are you working that feedback loop, and how are you adjusting the technology to be better for the customer experience?
Yeah. So it's a wide range. For small businesses, a lot of it is automated.
Mm-hmm.
It does learn from interactions. Then, for our automotive partners, there are constant surveys. We do have visibility into the interactions, and we can automatically detect if something can be improved, and we make those improvements. That's how our models have become better over time.
Mm-hmm.
But we also run surveys and get the results and react to it and share them with our customers and so on.
And you talk a lot about accuracy being important. What is the speed dynamic like? I spent some time on TikTok looking at a bunch of White Castle drive-throughs — you guys are TikTok famous — watching the experience and the time it takes. Is it meaningfully quicker? Is it about the same as a human? How do we think about that?
So speed is one of the items we pitch to our customers — that the throughput is gonna go up. And there are side-by-side demos, I think, that others have created. One of the technologies we made is Dynamic Interaction. We pioneered that concept: this idea of turn-taking — it doesn't always have to be like that when you talk to an AI, right? For example, with some brands like Alexa: you say "Alexa," you wait for it to acknowledge you, then you ask, "How's the weather?" You wait for it to respond, and then to follow up, you have to say "Alexa" again, and then, "How about tomorrow?"
Mm-hmm.
That turn-taking is slow and tedious and not natural. Our first version of drive-through was similar to that: the driver had to ask a question, we had to acknowledge what they said and show them what we think they said, and then they had to acknowledge it. We had to wait for each other, and sometimes we interrupted each other. We came up with a solution to that called Dynamic Interaction. The impact of that, I think, is almost like Apple's multi-touch. There were touchscreens before, and then Apple did multi-touch, and it just made everything else obsolete. That's how I think about Dynamic Interaction: there is no turn-taking anymore. You just talk. You can talk to the person next to you.
You talk to the AI. The AI listens to everything and decides when you're talking to it and when it needs to take action.
Mm-hmm.
And when it takes action, sometimes it just updates the screen without speaking back to you. Sometimes it speaks back to you. It just tries to be smart about making you more efficient and natural. And that is Dynamic Interaction. There are really good demos online that you can watch.
And to give you a sharp example in drive-throughs, we've seen data now with some of our partners where a typical 72-second order is going down below 60 seconds. So on the throughput point, on speed, we're already seeing commercial benefits. And then think of the visual point Keyvan just raised: instead of having to say, "Would you like to add some dessert with that?" you can just flash up, say, a sundae — and it's just human behavior. You're like, "Oh, yeah. Maybe I'll get that sundae." And it doesn't take any increased time. We're seeing some early signs of ticket price increases, like I mentioned earlier.
That's awesome. And when you think about what's gonna happen in the next 10 years — we won't go 20 years, we'll go 10 — what does SoundHound look like in 10 years? How is it being used?
With any of our products, I always ask myself, "What will happen for sure?" Right? Let's just talk about drive-through. Do you think in three years you will go to a drive-through and talk to that busy, distracted, tired person who's also making your food and doesn't wanna talk to you? Do you really think that's gonna happen? And when was the last time you actually had that experience and it was so good that you thought, "I'm gonna go write a Yelp review for this restaurant because I just had a really great customer experience"? So that kind of interaction will be automated by AI — hopefully a lot faster than three years, but within three years I think it's all gonna be automated.
We like to be in places where the puck is going, and that's one of the areas I really believe in. I think the same thing about AI customer service. In the past, when you called a customer service number and got an automated system, you got frustrated, pressed zero to talk to a human, and immediately wanted to switch. That's gonna do a 180, where you actually prefer to talk to the AI, right? First of all, there's no wait time for AI — you don't get put on hold for 45 minutes; it always picks up the phone. AI's gonna be more knowledgeable, more patient, more polite in many cases, and can do a lot more things.
So I think, again, AI for every business is gonna be like Wi-Fi and electricity. And we think SoundHound is gonna be one of the dominant, prominent leaders of that.
Fantastic. We have a couple minutes. I wanted to open up to any questions if anybody had any questions for Keyvan or Nitesh. Yeah. Go ahead.
You mentioned the data you ingested — sorry, I'm trying to clarify. The data that you're ingesting through the QSRs — is that your proprietary data, or does it become the QSR's proprietary data? How does that data get stored and translated?
Every deal is different, but our strategy is to be friendly to all parties involved while we extract reasonable value from that data. My opinion is that the data is the end user's data. If you speak to a device, your voice is your data, right? These products get a license to use your data, first of all, to serve you — for what you just asked for — and then to improve the models in a generic way to serve you and others better. That's how we've structured it, and it works well. We give a lot of control to our customers. If you wanna delete your data, if you wanna let your users delete their data, if you wanna delete the data by default right away — all of that is available.
But because we are in so many devices and with so many customers, the amount of data we get is massive, and we are able to use it to improve our models. We get voices, for example, from millions of users in noisy environments, in lots of languages. One example that is very clear in my mind: the audio you get at a drive-through is so noisy, right? It's next to the freeway, the cars are passing, it's raining — even for a human, it's hard to understand. But that's where AI is actually better than a human. Because we get that data, our models learn from it, and we can actually handle audio like that.
I'd add that I think this is one of our differentiating factors, even compared to big tech, because we are brand-centric. We customize data, we customize privacy, we customize security, so we can do things that are flexible and in service of the brand and their end customers. And that's just increasingly important as we move forward.
Great. Any other questions?
Is there any duration associated with the backlog numbers that you provided?
As I said on the last call, it's generally about six to seven years in range, depending on the industry and the OEM — the auto side tends to run that long. Some of our contracts on the restaurant side are sometimes month to month, sometimes one year, sometimes three years. It really depends on the industry. When you get into enterprise — financial services, healthcare — those tend to be multi-year contracts.
Is the backlog contracted? Like, they have to pay it?
There's a mix. Generally, the backlog metric we convey is signed customer contracts — so it's a total contract value. But depending on the construct of the customer, sometimes there's projected volume. We often have upfront work that's part of it — we might have even received the cash, which just has to amortize per revenue recognition rules. There can be minimum commitments where, no matter what, it's bulletproof: we'll get the collection. But sometimes there are also volume projections baked in there as well.
And obviously, if those volume projections don't happen?
We would, yeah. You would adjust.
Absolutely. Pretty typical.
Great. I think we're out of time here. Oh, we got one more. Go ahead. Yeah.
Yeah, a question in terms of your IP: do you license it to big tech today, or do you have plans to do that in the future? By big tech, I mean companies like Meta, Amazon, Google, Apple — that sort of thing.
Well, yeah. We have patents, and we have IP that can be licensed. In our history, we have licensed our technology to big tech players over time. Whether it's something we're able to announce with specific names — probably not at the moment.
Great. Keyvan, Nitesh, thank you so much for being here. It's been an honor.
Thank you. Thank you.
And, thanks everybody for attending.
Thanks.