Good afternoon, everyone, and thank you for joining H.C. Wainwright's 26th Annual Global Investment Conference. My name is Max Mahr. I'm an analyst here on the Corporate Access team. At HCW, we have a total of 24 publishing senior analysts and over 600 companies covered across all sectors. Please visit hcwco.com for more information. With that said, have a productive and enjoyable day. I'd like to introduce our presenter and welcome the SoundHound AI team.
Thank you. Well, thank you for having me. Thanks, everybody, for joining. My name is Nitesh Sharan. I'm the CFO at SoundHound AI. We're a conversational AI company that focuses on voice. We believe that voice as the interface through which humans have interacted for thousands, if not tens of thousands of years, has been elusive in our interaction with technology, and we're here to really change that. We believe that the next major shift in how humans interact with technology is gonna be increasingly through voice, in particular, building more natural conversations. I'll walk you through a little bit of our journey. I'll walk you through our technology.
I think the best way to understand what we do is to see it live in action, so I'm gonna show you a couple of demos of how we apply our technology and what we're doing today in the real world. Then, at the end, we'll have time for some questions. Just a disclaimer: we're a public company, so I'll keep everything aligned to forward-looking statements and things we say in the public domain. This first demo is actually technology that launched about eight years ago, and when I play it, I hope one thing you'll see is how advanced it was, even at that juncture, in terms of speed and interaction, and in its ability to understand conversations, even compared to today's typical voice assistants and so forth.
So it's a quick 50-second demo. The person speaking is our co-founder and CEO, Keyvan Mohajer. I'll talk more about his journey in a second, but I just want to play this to give you a flavor of what we do and what our technology is. This is fully home-built, proprietary technology.
How many days are there between the day after tomorrow and three days before the second Thursday of November of two thousand and twenty-two?
Two thousand seven hundred and fourteen days between the day after tomorrow and Monday, 7 November 2022.
What is the population and capital for Japan and China, and their areas in square miles and square kilometers? And also tell me, how many people live in India, and what is the area code for Germany, France, and Italy?
The population is 127,288,000 for Japan, and 1,330,044,000 for China, and the capital is Tokyo for Japan and Beijing for China. The area is 145,882 sq mi and 377,834 sq km for Japan, and 3,705,386 sq mi and 9,596,959 sq km for China. The population of India is 1,173,108,018. The area code is 49 for Germany, 33 for France, and 39 for Italy.
Thank you.
You're quite welcome.
This was technology that launched in, actually, 2015, and it was tech we had built in-house, in stealth, for the greater part of a decade. Keyvan and one of his co-founders were doing their PhDs at Stanford, where they met the third co-founder, who was doing his Bachelor of Science, and they really had this vision of voice-enabling the world. Keyvan was inspired by science fiction at the time. He looked at, you know, what do they have in science fiction that we don't have in real life? In particular, he was inspired by Star Trek, and he noticed how you could talk to anything and everything.
You could talk to robots, the elevator, your coffee machine, and so forth, just have natural conversations, and things would happen, and he said, "Well, you know, that absolutely will happen in my lifetime." When he compared it to other things, like traveling faster than the speed of light or replicators or transporters, he didn't think those might happen, but definitely voice AI was something he could build. So he started on that journey, and ten-plus years later, he was able to launch our voice AI platform. It got a lot of traction. The day we launched, a grainy video that a journalist had captured went viral on Reddit, with 2 million views that day.
Most of the time, when people see this demo, they go, "Wow! That's pretty impressive. Even better than human capabilities," because most people don't remember the population of every country out there, or the area codes or zip codes and so forth. But the technology can quickly scrape the web, quickly access the data domains the internet makes accessible, and in real time have a conversation and get you back any information you want. That was the vision that was built. So, with that, our ultimate mission is about voice-enabling the world with conversational intelligence.
Again, we think there are many, many use cases, so our goal is to build technology that exceeds human capabilities. We're trying to create ecosystems and interactions where we can provide value and delight to consumers all over the world, and we're doing this by integrating an ecosystem of billions of products. I'll unpack a bit more of what those are. Ultimately, we're trying to create new monetization streams, and I'll talk more about our business model as well. Our journey: again, I mentioned our founding in the mid-2000s, pre-iPhone days. We were working in stealth for many years. We got some notoriety around our music recognition app, which was, at the time, state-of-the-art, ahead of the competition.
If you had a song in your head, you could just hum it to query and identify it. That was technology that even big tech didn't really unlock until a decade later. So we always had this technological angle, a core engineering foundation and deep innovation, and we built up an IP stack in the hundreds that is a real differentiator and competitive moat for us. When we launched the platform in that video you saw, we were able to get traction, and you can see that in the middle of the page with major brands, particularly around voice-enabling products, and most specifically in automotive.
The automotive vendors, the OEMs, were looking for a replacement for the incumbent player and wanted somebody with better innovation, better technology, more flexibility, more speed, all those things. We got traction with a number of brands, to where today we have over 20 of the global brands. You can see some of those on the page. One of the benefits of starting with automotive was that its global footprint required us to build out languages. On the bottom of the slide, you can see the languages, or at least the flags. We're in over 25 languages, with a roadmap of another dozen or so, and we handle many acoustic variations, meaning a lot of background noise, a lot of different environments, and also different accents.
You know, English in the U.S. is very different from English in the U.K., in parts of Asia, in Australia, and so forth. We've built that out so the technology can be global in footprint. We went public a couple of years ago. We've been scaling, growing at a CAGR of over 50% for the last several years, and our outlook is to continue to grow and even inflect faster. We have historically been in investment mode, and we are targeting getting to break even next year. The latest inflections in our roadmap have been to expand into voice-enabled services, and you can see on the right side of this page, in particular, food ordering.
So we're getting a lot of traction with major QSRs. Think of it this way: when you go into a drive-thru, for example, most people don't go, "Wow, that was the best experience ever!" Automation can really improve that. Restaurants, particularly post-pandemic, have seen major labor shortages, cost pressures, inflation on both labor and commodity costs, and inconsistency of service. One benchmark we use to measure the technology's effectiveness is order completion rate. If you, as a consumer, go into a drive-thru or call in for pizza, what's the percentage of time that you get exactly what you wanted and paid for? What we found post-pandemic is that humans themselves are not perfect.
I don't know about everybody's personal experiences, but oftentimes you'll order a cheeseburger without pickles and still get the pickles, or you'll say you wanted large French fries and get onion rings. Those things happen all the time. We're now launching and scaling with restaurants where, day one out of the gate, we're comparable to human performance, and in many cases even exceeding it. The tech never gets tired; it runs twenty-four seven, and some of our partners are twenty-four-seven restaurants. Once you train it, you don't have to retrain it, like, you know, your summer hires and so forth. And by the way, you can program it for upsells. One of the benefits is not only cost arbitrage but also ticket upsells.
So we're seeing, in some cases, a 10, 15, 20% uplift in ticket prices, which all of a sudden changes the equation for QSRs into an actual revenue generation opportunity. We're scaling; you can see some of the brands we've been working with. To give you a little more flavor, and being cognizant of time, there were two other quick demos I wanted to show. One is the application on the restaurant side, to show you its use case. We also believe, by the way, that sometimes the most efficient mechanism isn't necessarily voice-to-voice. When you speak in and say, "I want a cheeseburger, French fries, and Coke," you don't necessarily want to wait for a spoken reply; sometimes it's better to have a visual confirmation board, and that's what you'll see in this next demo. So I'll just play this really quickly.
... Dynamic Interaction from SoundHound. That means natural language understanding that happens as fast as you speak. The system is already listening. Watch the cart update in real time as we order. Can I get a pulled pork burger with a deli bun and mustard and pickles? I'd also like a kids' mac and cheese with milk and yogurt and a large Coke and french fries. We can edit the cart with voice. Make the Coke a small, I don't want the french fries, and change that to apple juice for the kids' mac and cheese. Notice how with Dynamic Interaction, we also get real-time suggestions triggered by what you say as fast as you need them. Clear cart. Classic cheeseburger, Swiss cheese, avocado, lettuce, and no mayo, and the suggestions pop up right when we need them. With a deli bun.
On the side, I'll have mozzarella sticks, and to drink, a vanilla milkshake, and for dessert, oatmeal raisin cookies. I want to modify a couple things. Make that a chocolate milkshake and remove tomatoes from the first one. Dynamic Interaction also supports multimodal interaction, such as touch and voice input at the same time. We can touch to choose an item, make that chocolate chip cookies, and we can swipe to delete, and sometimes you need to ask a question like: How much is a brownie?
The price of a brownie is $3.
Okay, add a brownie. Let's go faster. Start over. Pulled pork burger, lettuce wrapped, brownie, onion rings, spring water, vanilla cupcake. Delete the first item. French fries. Get rid of the spring water. Strawberry milkshake. Scratch the onion rings. Side salad with blue cheese. Delete the brownie. Chocolate chip cookies. Remove the French fries. Classic cheeseburger with American cheese, bacon, avocado, lettuce, tomatoes, ketchup, mustard, and pickles, and no onions, with a double patty on a pretzel roll. This is amazing!
So there are a few technical things to unpack here, but in the interest of time, I'm going to move forward. Actually, to make sure there's time for questions, I'm probably gonna skip this next video. Just imagine it's the best video you ever saw; we do have it on our website, and we're happy to send it to you. It shows the application in the auto space and talks about how we're integrating our technology to do in-car controls, to integrate with cloud services like directions, navigation, weather, what happened in the game last night, what's going on with a stock, but also, importantly, integration with LLMs. One of our differentiators, we think, from a tech stack perspective is that we've built the voice engine ourselves.
I mentioned the deep IP stack. We can also integrate with and arbitrate between the different LLMs being built, and there's a whole scale of them. We believe different fiefdoms are being created, between the Microsoft OpenAI ecosystem, the Google Gemini Bard ecosystem, et cetera, but we are going to arbitrate amongst and across all of them, and we think that's a differentiating capability. You'll see in this video, if you get a chance to take a look, somebody saying, "Hey, can I get directions here?" Or, "You know what? There's this weird sound from the brakes. What's the problem?" We can ingest the operating manual of a car and actually have a voice-enabled conversation with the driver, rather than the driver having to go to the glove compartment and thumb through an operating manual.
If you have your daughter in the back and you want to put her to sleep, you can say, "Could you please read a story to put her to bed about the cow jumping over the moon?" You can do those types of things now, which allows a major expansion of use cases, and we're seeing tremendous growth in usage. One measure of consumer engagement we talk about is queries, and we now have a run rate of over five billion queries that's been growing well north of 50% for several years running. So this is really happening, it's expanding, and we're making it live. Two really quick final ones.
To give you a sense of how we make money, our operating model, our revenue generation: we have three pillars. On the left, we voice-enable products, so think of cars, TVs, and other IoT, like smart appliances and so forth. One of the real insights is that not a lot of products, particularly with the growth in IoT, have the capacity, scale, or economics to have a keyboard, a GUI interface, et cetera, to unlock the power of the internet. But all you need to unlock the power of voice is a small, inexpensive microphone. So there are hundreds of billions of products produced every year waiting to be unlocked, and for us, this is a royalty stream. In the automotive application, think of it as a royalty per car as cars are shipped. We get economics on that.
Historically, in 2023 and prior, that was the bulk of our business, over 90%. But we're really growing in pillar two, which is services. The food application I showed you is one vector of that, but you can think more broadly about customer service, appointments, reservations, and other purchasing opportunities. This is more of a subscription SaaS model; with a restaurant application like a White Castle or a Jersey Mike's, it's more of a fixed fee per month per location on a recurring basis. And then, ultimately, our vision and what we're trying to grow is monetization, and that's where you bring together voice-enabled services with voice-enabled products and can add on new economic streams: voice commerce, ads, and so forth. For example...
I'll just use this one 'cause it's the time of year. You're watching football on a Sunday, and you want to order pizza. You can talk to your TV and say, "Hey, Vizio, order me Papa John's," and it can deliver your pepperoni pizza right in time for halftime, so you can get back for the second half. That type of interaction, or a car interaction with coffee on the way to work, allows for both lead-gen economics and transactional economics, and our model is to share those with the product creator, which is really enticing. It's getting a lot of traction, particularly with automotive and TV manufacturers, because they're now seeing whole new revenue streams that they're very excited about.
Last thing: we believe fundamentally that there's a tremendous organic growth opportunity. Our growth historically, the 50% CAGR, has been fully organic. But we have started to do acquisitions; in fact, we just recently announced our third. We believe growth is gonna happen both organically and inorganically, and we're trying to be very thoughtful fiduciary stewards, but we think having a programmatic M&A profile makes a lot of sense. This most recent one, Amelia, is actually headquartered here in New York. We're very excited about it. They have a long legacy of building very deep, enterprise-grade integrations, with expansion into use cases around financial services, healthcare, insurance, retail, and hospitality. What's also exciting for us is the amplification across channels.
So they have very heavy text and chatbot-type interactions, and we can bring our voice capabilities on top of them. So very excited about this acquisition. Just in the interest of time, I think I will pause there and open it up for questions. Thank you. Thanks for the clap.
... We appreciate the team from SoundHound AI, and for everyone who is curious or has a question, just raise your hand, and we'll go around.
Yeah, sorry. Yeah. Can you talk a little bit about the integration and what kind of lift some of these acquisitions are and how that impacts-
Yeah
Timeline to profitability?
Yeah. I mean, the three we did are very different flavors. One of them was in the restaurant tech space: an integration of a team with deep restaurant roots and expertise and some integrations with really good customers, like Chipotle, et cetera. We embedded that deeply in our team, and it was complementary, you know, a bunch of, not me, but bright Stanford engineers coming together with restaurant people and accelerating the journey. So that was more of a quick integration. The one we did in the summer was more of an acquihire, a team dedicated to building out that monetization pillar. Now, Amelia is running a bit more as its own standalone.
There are deep opportunities for integration across the voice stack and in the cloud back end, but it will run a little more independently; the go-to-market motion with enterprise is different. So they all come in different flavors, with different time horizons. Ultimately, we look at which ones can be generative and accelerative in terms of our consumer journeys.
Nitesh, can you touch on the balance sheet real fast, and where are you-
Yeah
with the cash and
Yeah, we were excited. We paid down some debt earlier this summer, and even after doing that, we had north of $160 million of cash with no debt. So we're very healthy relative to our profile of getting to break even next year, and we're in a very strong position, with a lot of, call it, liquidity to operate and deal with the ebbs and flows of the markets. We did this acquisition, which was predominantly stock-based, and we did pay off some of their debt. So generally, we're in a very good spot overall and feel very strong. It gives us the opportunity to scale.
Great. Thank you.
Go ahead.
Yeah, on your acquisition, is there a strategy that you are working within?
Yeah.
Are they more tuck-in acquisitions?
Yeah.
Or are you looking for something random?
Yeah, so our strategy starts with the mission. We want to voice-enable the world with conversational intelligence, and we want to do that thoughtfully and programmatically. Then we think through the three revenue pillars as some of the architecture: we want to voice-enable products, voice-enable services, and then monetize around that. A lot of that is threaded based on consumer journeys. So we're landscaping all the time, looking at any partnership and asking, "Who's a new competitive threat? Who's a good partner to work with? Where would an acquisition accelerate the journey?" And they are generally things that we can absorb and integrate, so we're not looking to do massive transformational deals.
We really see massive organic opportunity, but if there's a way we can accelerate, you know, maybe do in one year what would've taken us five years on our own, that makes a ton of sense. I think of it in terms of cost, too. Ultimately, the value proposition is: can we get a return on capital well in excess of our risk-adjusted cost of capital? That's the financial framework. And then: how do we feel about integration, our capabilities intersecting with theirs? Sorry, I'm speaking too quickly. I know I do that, so I'll slow down.
We have our speech recognition engines, and a lot of times these partners, Amelia is a good example, and SYNQ3 was another, are paying third parties like Microsoft and Google for speech recognition or natural language understanding. We can bring our own proprietary tech, with a much better margin profile and actually better performance, and we do benchmarks against them, so we know that's an amplification. And because we can develop on our own for their customers, who are thirsting for more voice, we can accelerate their journey. So there are real revenue synergy opportunities. On the back end, there are cost synergy opportunities with, for example, cloud capabilities and our own support functions. So we think of those across multiple dimensions.
We have a multi-pillar framework for when an acquisition may or may not make sense. I actually have my head of strategy and corporate development in the room, who's always landscaping, but that's generally how we think about acquisitions.
Can you speak to the trajectory of gross margins? I think they were 67%-
Yeah.
-adjusting for acquisitions, versus 79% year over year.
Yeah, so we're a software business, and we like being a software business. At scale, we should be at 70-plus-percent gross margins and actually 30-plus-percent EBIT margins. That's the profile we're building over the long term, and we absolutely see that. We're still in disruptive hypergrowth, and we believe we can be one of the next major players in this new landscape of generative AI.
To the question of what's happening now: when we acquired SYNQ3, the restaurant acquisition, they had a call center capability that was a lower-margin business. In some cases, where a customer said, "I want to be call center forever," we thought that wasn't the right vision and moved away from those contracts. But where a call center provides great data that we can use to improve our models, and the customer is willing to go on a journey to migrate to AI, we think that's actually value accretive. So in the near term, as I've guided publicly, there will be some margin pressure. You're right, we're in the sixties rather than the seventies, but all on a pathway back north of seventy.
Now, I will say, even with Amelia, some parts of their business are professional services that provide really deep integrations, almost like competitive moats we're establishing, that allow for more software upsell at that 85-plus-percent gross margin basis. So long term, 70%-plus and north of where we used to be. In the near term, to get more data, customer traction, and deep competitive moats, we're willing to trade off a little bit of a lower margin profile. I know, I see flashing lights. Any last questions, or I think we're...? Well, thank you all for your time. Really appreciate it.