Hello, everyone, and good morning. Thank you for joining us at Needham's 28th Annual Growth Conference. My name is Quinn Bolton. I'm the Semiconductor Analyst for Needham & Company. It's my pleasure to host this presentation from GSI Technology, headquartered in my hometown of Sunnyvale, California. GSI is at the forefront of the AI revolution with its groundbreaking APU technology, designed for unparalleled efficiency in billion-item database searches and high-performance computing. GSI's innovations, Gemini-I and Gemini-II, offer scalable, low-power, high-capacity computing solutions that redefine edge computing capabilities. GSI is also a leader in SRAM technology and develops radiation-hardened memory solutions for space and military use cases. Joining me from the company this morning are Didier Lasserre, VP of Sales, and Kim Rogers, External Investor Relations.
Before I hand the call over to Didier for the presentation, I'd like to remind investors watching the webcast that if you would like to submit a question during the Q&A, please do so through the dialog box at the bottom of your screen. Didier, over to you.
Thank you, Quinn. As Quinn just mentioned, my name is Didier Lasserre. I'm Vice President of Worldwide Sales and Investor Relations. So thank you for joining us for the presentation this morning. I will go through this as quickly as I can so we leave some time for Q&A. We will be making some forward-looking statements, so obviously, we will include the safe harbor statement here. A little background on GSI: the company was founded 30 years ago in Silicon Valley, as Quinn said, in Sunnyvale, California. We went public in 2007. We actually do not own a fab. We use TSMC, and we've had the relationship with TSMC since day one, a very strong relationship with them. We have been the technology leader with them on one of their process nodes and certainly have been working closely with them all these years.
As Quinn mentioned, we started the company as an SRAM company, and I'll talk in a little detail on that. We have the highest density, highest performance product line in the industry. We made an acquisition about 10 years ago for some AI technology. The name of that family is the APU, which stands for Associative Processing Unit, and I'll be spending the majority of this presentation talking about that. The SRAM division has actually funded our R&D for the APU. To date, we've spent about $175 million on that R&D and the bring-up of Gemini-I and Gemini-II, and that's been funded by our SRAM business. Back in October, we raised $47 million in net proceeds to help propel both Gemini-II and our new roadmap device called Plato, which I'll talk about.
The revenues for last fiscal year, which ended March 31st, were just over $20 million. We're on a run rate this fiscal year to grow about 20%. We have 122 employees worldwide, the majority of whom are engineers. We outsource the high-headcount functions, things like the fab; as I mentioned, we use TSMC. We also outsource our assembly, and so we're able to keep headcount very low. We have a unique technology with this APU, and we want to make sure that we protect it, so we've been very active with our patent submissions. We have 144 patents total in the company, of which 85 are specific to the APU, and we continue to submit more applications going forward to make sure that, again, we protect this technology.
As of the end of September, we had $25 million in cash and cash equivalents. As I mentioned, that does not include the $47 million, which was raised after that quarter closed. Our market cap is roughly $270 million, and we have fairly significant insider ownership of 21%. Just very quickly on the workhorse product line, the SRAMs: as I mentioned, we have the highest density, highest performance memories in the market. The majority of our revenues come from our SigmaQuad family. The second-generation SigmaQuad does have some competition on parts of the family. Our third and fourth generations have no competition; we're sole-sourced. And what we've seen is that a lot of the new design wins over the last couple of years have come with that third and fourth generation.
And so the majority of our business going forward actually has no competition. That's reflected in the fact that, over the years, our ASPs have grown and our gross margins have grown, again because we're sole-sourced. We've used this family to extend into a new product line, which is, as Quinn mentioned, the rad-hard, or radiation-hardened and tolerant, SRAM. This opens up new markets for us in space. Essentially, what we've done is taken our commercial product line and hardened it to withstand some of the harsh conditions in space, like radiation from the sun and other sources. And so what this has allowed us to do is, again, go after aerospace applications, which we couldn't before with our commercial product.
This is a nice market in that the ASPs and the gross margins are extremely high. The ASPs we've sold at to date have varied anywhere from as low as $10,000 to as high as $30,000, and the gross margins are all north of 90%. It's a difficult market to get into; the design cycles are long. But once you get designed in, the life cycles are very long, and there's no competition. Another added feature is that it's a growing market. As you can see, a few years ago it was a $2 billion market, and by 2032 it's expected to more than double, to nearly $5 billion. For the rest of this presentation, I'm going to focus on the APU.
The APU is, again, an AI chip, and we are focused on the edge; we'll talk later about why that's our strategy. The edge semiconductor market is looking at tremendous growth: this year, we're looking at just under $7 billion, while five years from now it will be over $16 billion. Now, we have something very unique, and I mentioned that when I was talking about the patents: a true compute-in-memory architecture. I say "true" because there are folks who talk about compute-in-memory, but what they have is really near-memory compute, where they bring their memory closer to their compute elements, but the two are still separate. We'll talk about why that distinction is important.
The other advantage is the fact that we have millions of bit processors on our devices, on both our Gemini-I and our Gemini-II families. That's important because it allows us to do massive parallel processing; we're able to have those millions of bit processors work simultaneously. And as you know, in the AI market, parallel processing is key. Now, if you look at a CPU, it has dozens of computing cores. If you look at a GPU, it has thousands, 2,000 or so, ALU cores. And then, as I mentioned, we have a million. So we can really take parallel processing to the extreme. This slide shows how differently the APU is architected compared to a GPU. On the left is the APU: you can see that the bit processors, the compute elements, are tied directly to the memory.
I mean, they're really one. On the right-hand side is what a typical GPU looks like. You have the data stored in DRAM. In order for the compute element to do something, it needs to fetch that data, and in a GPU it has to go through L2 cache, then L1 cache and the registers, before it gets to the compute element. Obviously, this takes time, and it takes power. And if you notice, the arrows are bidirectional, because once the compute element has used the data, it needs to write the result back to memory, back through those same caches. So it pays for that time and power in both directions.
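To make that round trip concrete, here is a toy back-of-the-envelope sketch in Python. Every constant is an illustrative assumption, not a GSI or NVIDIA figure; the point is only that the fetch/compute/write-back loop pays for data movement twice per operation, while a true compute-in-memory design pays nothing to move operands.

```python
# Toy model of operand-movement energy. All constants are assumptions for illustration.
PJ_PER_BYTE_DRAM = 20.0   # assumed energy to move one byte to/from DRAM
PJ_PER_BYTE_CACHE = 1.0   # assumed energy per byte per cache level
CACHE_LEVELS = 2          # L2 and L1, as in the description above

def fetch_compute_writeback_energy(bytes_per_op: int, n_ops: int) -> float:
    """Picojoules spent just moving operands in a fetch/compute/write-back loop."""
    per_byte = PJ_PER_BYTE_DRAM + CACHE_LEVELS * PJ_PER_BYTE_CACHE
    return 2 * per_byte * bytes_per_op * n_ops  # x2: fetch, then write the result back

def in_memory_energy(bytes_per_op: int, n_ops: int) -> float:
    """Compute-in-memory: operands stay on the bit lines, so no transfer energy."""
    return 0.0

print(fetch_compute_writeback_energy(64, 1_000_000) / 1e6, "uJ of pure data movement")
print(in_memory_energy(64, 1_000_000), "uJ for the in-place case")
```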
Now, with our bit processor, we also have another advantage: we're not predetermined on what our resolution, or bit width, is. A GPU is pre-wired; the elements inside make it either a 32-bit, 16-bit, or 8-bit resolution, and that's what you have to use. We are a bit engine; we don't care how you build it. You can use 1-bit, you can use 10-bit, any width you like, and that's the resolution you get, so that's important. Oh, and by the way, that can change from cycle to cycle. Let's say you've decided to use an 8-bit model; the next cycle, you can change to 6-bit or 16-bit, so it's not hardwired at that point.
This is important because, as researchers optimize models, they're finding that different bit widths are optimal, and some of them don't fit the GPUs. Some might be, say, 12 bits, so you would use a 16-bit GPU data path, use 12 of the bits, and basically throw away four. It's not efficient. With us, you can build exactly the resolution you want. There are also a lot of new use cases, like quantization, that use lower bit resolution to deal with these enormous large language models; the databases are getting very large. They're finding that one-, two-, or three-bit resolution is important, and again, we can do that today.
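As an illustration of what arbitrary bit widths buy you, here is a minimal, generic quantization sketch in Python. The quantizer is a textbook symmetric-uniform scheme, not GSI's implementation; note that the bit width, including the 12-bit case discussed above, is just a parameter.

```python
import numpy as np

def quantize(x: np.ndarray, bits: int):
    """Symmetric uniform quantization of x onto a signed `bits`-bit integer grid."""
    levels = 2 ** (bits - 1) - 1          # e.g. 127 for 8 bits, 3 for 3 bits
    scale = np.max(np.abs(x)) / levels
    q = np.clip(np.round(x / scale), -levels, levels).astype(np.int32)
    return q, scale

x = np.random.randn(8).astype(np.float32)
for bits in (12, 8, 3):                   # arbitrary widths, including a 12-bit one
    q, scale = quantize(x, bits)
    err = np.abs(x - q * scale).mean()    # coarser grids lose more precision
    print(f"{bits}-bit mean error: {err:.4f}")
```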
We don't have to wait for a future generation of the device to hit that resolution; we have it today. So, as we mentioned, right now we have two families. The first family, Gemini-I, was really made to showcase our unique technology. We have made some models for certain applications, though, including SAR, which stands for Synthetic Aperture Radar. If you watch CNN and see images of something going on in Ukraine, there's a good chance that image was created using a SAR device. And so we have done that. The market for Gemini-I really needs to be ground-based, because we do have to attach it to a board that has an FPGA on it. So again, Gemini-I was really used to showcase the technology.
A couple of months ago, there was a paper written by Cornell University that compared our APU to a GPU from NVIDIA. It was this first-generation family, Gemini-I, that they used for that RAG application, and it showed that our architecture used 98% less power than the GPU from NVIDIA. I'll focus more on Gemini-II. Gemini-II is unique in that we're not tied to the board, as I mentioned we are with Gemini-I. So we can sell it as a chip if we desire; it could also be put on a board, or multiple boards in a server. But now that it can be in chip form, we're able to get closer to the edge.
Some of the early markets we're going after, and this has a lot to do with some of the successes we've had with military defense, are military drones and UAVs. There are a lot of different applications we can do with this device. We can do, as I mentioned, SAR for image creation; we can also do object detection. And because of those two features, it allows us to do autonomous drones in a GPS-denied environment. If GPS is jammed, you can use our device to create images of what you're flying over, and then, with object detection, you can identify where you are and navigate using those two features.
We announced yesterday that we have a partnership with a company called G2 Tech, in conjunction with a POC that we're doing for the DoD, as it was formerly known, for a drone application. And it's actually more than a drone application; it's a multimodal VLM model that uses both cameras and drones. That POC is being done with Gemini-II, and I'll get into a bit more detail on that. Because we have an extremely low power budget, it allows us to extend mission times for a lot of these drones. Our next-generation part is Plato. Gemini-I and Gemini-II are very, very good for vector search: the way we've architected the parts, the internal bandwidth is very high, and the bandwidth going out to memory is lower.
The reason we've done that is that we generally would fit a whole model on our device. Now, for addressing LLMs, if you're familiar with those, the database sizes are enormous, and you cannot fit those on a device. So what we've done with Plato is dramatically increase the bandwidth going out to memory. Plato is really going to be addressing LLMs, but not LLMs in the data centers; LLMs on the edge, and we'll talk about why that's important. As we mentioned, the edge is where we're focusing, and a lot of applications are transitioning from the cloud to the edge; they need to move those workloads closer to where they're happening.
What we're finding also is that cloud compute costs are becoming expensive, especially with some of these larger models; having to go back and forth to the cloud becomes expensive. There are also a lot of applications, especially in military and defense, where the mission doesn't allow data to leave the local device. They do not want data going over the cloud, so it has to stay in place. And many of these applications require real-time decisions or identification; there's no time for the latency of a round trip to the cloud. So what we find is that these edge applications really require an architecture with very high compute per watt, and that's exactly what our APUs deliver.
If you look at what we're addressing, as I mentioned, we're really going after the edge, and specifically military defense early on, just because that's what we consider low-hanging fruit. If you look at the way our roadmap is lining up right now: we have Gemini-II now. The hardware is here, it's finalized, it's our production hardware. We are still working on some aspects of the software side, which was one of the reasons we raised the $47 million we talked about. Then, as I mentioned, Plato. The tape-out will be done at the beginning of 2027, about a year from now, and we anticipate production kicking in at the end of 2027 at best, but realistically in 2028.
And then we already have partners talking to us about the next-generation device after that, and they really want to get into aggressive process nodes. So one of the areas I mentioned that we've been focusing on is mil defense, and that's because we've had a lot of traction in those markets. We've already been awarded $3.4 million in what are called SBIRs; think of them as government grants. And that does not include the POC that we announced yesterday; I'll talk about that separately. We've had phase two wins with the Space Development Agency, a phase two win with the Air Force, and a phase one win with the US Army last year. We recently announced an extension to one of our phase twos, an extra $715,000.
That was granted for us to do beam testing on Gemini-II, just to see how our commercial device performs without any enhancement. Going back to the POC with G2 Tech: as we mentioned, it's with the DoD and another foreign defense agency that would like to stay unnamed at this point. We'll receive just over $1 million in POC dollars. We're in the process right now of optimizing the TTFT, the time-to-first-token that we talked about on the last slide, to be able to have a full demo by summertime of this year. As far as future SBIRs, we've already submitted a few; we have about $6 million to $10 million in submissions that we're waiting for answers on.
We've also submitted a proposal for what's called a Broad Agency Announcement. Think of these as SBIRs, but at a much higher level; these are values that could be up to $40 million. We're also pursuing a STRATFI program, which essentially would have funding from a private party and a government agency, matched by AFWERX; there we're talking anywhere between $10 million and $20 million of funding. And lastly, on the funding side, we've been having active conversations with some of the prime contractors and defense contractors about strategic relationships to build some product, and that would also involve some R&D dollars as well. So if you look at the financial review, the quarterly revenues have been increasing.
As I mentioned, we're on a run rate now to grow 20% year over year. Our operating expenses have come down during this time, and, as I mentioned, cash and cash equivalents as of September were $25 million, which does not include the $47 million net that we raised in October. To finish the presentation: as I mentioned, we're really focused on the edge, both because that's where we've seen a lot of the interest in our device and because our unique low-power architecture allows us to be successful there. We're also seeing, as we mentioned, a real move in that direction. I know a lot of folks are really focused on the data centers, but the edge is where a lot of the volume will be in the future.
As I mentioned, again, we've had good traction in the defense and aerospace area, specifically through SBIRs and some of these POCs. Now, what's interesting, as we've had discussions with some of these customers and partners: you have NVIDIA, you have AMD, and then there's a big drop-off in AI. Most of the other folks in AI are startups. And one of the concerns with AI startups is, can you get to volume? Especially with things like drones, it's critical that you can get to volume. What people are realizing with GSI is that we may be a startup in AI, but we are not a startup at manufacturing semiconductors. We've manufactured and shipped over 100 million SRAMs over the course of our history.
And so the same manufacturing model that we use for SRAM will be used for the APU: the same fab, TSMC, and the same assembly house. And we'll be doing our own testing, just like we do for SRAMs. So we have a proven manufacturing model for volume. And as I mentioned, with the $47 million net raise we did a couple of months back, we're using those dollars to build out the software required to make Gemini-II successful, and we've also used some of those funds to start Plato. To start that design, we needed to purchase some IP, and we were able to do that, so design on the Plato family has started. As I mentioned, we'll be finishing it in very early 2027.
And so we'll see some first silicon sometime summer of 2027. Quinn, at this point, we can open it up for Q&A.
Perfect. Didier, thank you for the presentation. I guess I wanted to start: there's been sort of increasing focus, I think, on inferencing, and admittedly, this is probably more inferencing in the data center. But if I look at some of the names, NVIDIA just bought Groq. Groq, I think a lot of folks will say, "Oh, they do processing, or they store the data, in SRAM on chip, and so it's much, much more power efficient." You talked about how many competitors do near-memory compute using SRAM; you do in-memory compute with SRAM. And I was hoping you could further clarify for folks listening to the webcast, because I think it's an important difference. Near-memory compute is not what you're doing.
You're fundamentally doing something different. I just want to kind of drive that point home for folks listening to the webcast.
Right. Yeah. So you actually had two points there, and I want to hit both of them. The first one is SRAM, right? As you said, they're focusing on SRAM because, on performance versus power, that is the best technology to use. And as I mentioned, we've been designing the highest performance SRAMs in the market for 30 years now, so we're SRAM experts. That's why this technology was really a good synergy with our history. So that's number one. But going back to your near-memory versus compute-in-memory question. With near-memory, before, you had your memory way over here and your processing elements way over here, and a big transfer between them. So what these folks do is bring them closer together, but they're still separate, right?
And so what's happening is the distance the data travels is less, but there's still that von Neumann bottleneck: you still have to transfer the data. With our architecture, we do the search or the processing on the memory bit line itself, so the data resides where the compute is happening. What's nice is we don't need to go fetch the data; it's already there. And once we've used it, we don't need to write it back to memory; it remains where it is. In fact, the APU used to have a longer name. Right now, it's the Associative Processing Unit; it used to be called the In-Place Associative Processing Unit, because the data is in place. You don't need to fetch it, and you don't need to rewrite it.
So the fundamental difference is that the data resides where the processing elements are. We don't have to fetch it.
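For readers who want the "in-place" idea concretely, here is a minimal numpy emulation of an associative (content-addressable) match, where one vectorized compare plays the role of every stored row testing itself against the query simultaneously. This illustrates the concept only; it is not GSI's bit-line implementation.

```python
# Emulation of an associative match: all rows compare against the query at once,
# and the stored data never moves. Illustrative sketch, not GSI's design.
import numpy as np

memory = np.random.randint(0, 2, size=(1024, 64), dtype=np.uint8)  # 1024 stored bit rows
query = memory[371].copy()                                          # a pattern we know is stored

# One vectorized compare stands in for every row matching simultaneously,
# the way a content-addressable search works on the bit lines.
matches = np.all(memory == query, axis=1)
print("matching rows:", np.flatnonzero(matches))
```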
Excellent. Thank you for that. And then just looking at some of the applications where Gemini-I, Gemini-II, and Plato going forward will be very effective. You mentioned vector search. What are some of the applications that take advantage of vector search? Are there some sort of real-world examples you can provide to investors where you could see your technology really being applied in some of these edge AI applications?
Yeah. So vector search is used for a lot of applications. It can be used for e-commerce. Again, if you type in a search in Amazon, right, and you get a match there, that's a vector search. Vector search is also facial recognition. It's also object detection. It's also looking for new molecules for drug discovery. Those all fall under vector search.
Is sort of content or context processing also an application of vector search where you're trying to find items sort of in a database that are similar to each other?
Absolutely. Yeah, and I mean, that's exactly what it is. When you say vector search, it's similarity search, right? In fact, that's why we named our first-generation parts Gemini-I and Gemini-II. Gemini, if you go back to your mythology, is the twins, right? So that's exactly what it is: we're doing a similarity search, looking for the closest match. That's exactly right.
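For investors unfamiliar with the operation being described, here is a small worked example of a vector similarity search: embed the stored items, embed the query, and take the highest cosine score. The data and dimensions are made up for illustration.

```python
# Minimal vector similarity search: find the stored item closest to a query.
import numpy as np

rng = np.random.default_rng(0)
database = rng.standard_normal((10_000, 128)).astype(np.float32)  # item embeddings
query = rng.standard_normal(128).astype(np.float32)               # e.g. a search phrase or a face

# Cosine similarity: normalize everything, then one dot product per stored item.
db_norm = database / np.linalg.norm(database, axis=1, keepdims=True)
q_norm = query / np.linalg.norm(query)
scores = db_norm @ q_norm
print("closest item:", int(np.argmax(scores)), "score:", float(scores.max()))
```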
Okay. Perfect. The press release yesterday on the partnership with G2 Tech and the two defense entities, one being the U.S. DoD: maybe, again, just to drive the point home for investors, how important is that POC to establishing the capabilities of Gemini-II in a real-world application?
Yeah, that's a great question. So let's look at the application a little more closely first. It's an autonomous security and response system that incorporates both cameras and drones. You can have a perimeter set up with cameras, but there are always going to be blind spots, right? A bad guy can hide behind a truck or behind something else. And so that's why it also incorporates drones that can go around and eliminate the blind spots. And it can work autonomously, so even if the network goes down, it can still operate and do what it's doing. It's essentially looking to do detection, and then it's also identifying whether there's a threat.
In the past, this has always been done with manpower: you have people in front of monitors checking everything out, right? In this case, there's not a man in the loop; it's man on the loop. What I mean by that is, instead of a person doing all the monitoring and trying to understand whether there's a threat, this autonomous system does that. And then there's something called TTFT, time-to-first-token. You get an image or text, and that's a good point: this is multimodal. We can take computer vision or we can take text, which is kind of unique at this point. And then we can determine if there's a threat. The time-to-first-token is basically how quickly you can identify that you have a threat.
And then you need to also be able to react quickly. How quickly can you react to that? And so our time-to-first token with Gemini-II is superior to the GPUs out there. If you look at the GPUs, the best-case scenario is about six seconds to get that time-to-first token, that first response. We are at two and a half seconds. And so it's significantly faster. So obviously, if you have a drone that takes six seconds to identify an issue versus two and a half, I can tell you which one's going to win.
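For clarity on the metric, here is a minimal sketch of how time-to-first-token is typically measured: the clock runs from when the input arrives until the model emits its first output token. The streaming model below is a hypothetical stand-in, not the actual POC system.

```python
# Sketch of a TTFT measurement against a streaming model.
import time

def generate_tokens(prompt):
    """Hypothetical streaming model: yields output tokens one at a time."""
    for word in ["threat", "detected", "at", "north", "gate"]:
        time.sleep(0.1)  # stand-in for per-token compute
        yield word

start = time.perf_counter()
stream = generate_tokens("camera frame + alert text")
first_token = next(stream)                  # TTFT clock stops at the first token
ttft = time.perf_counter() - start
print(f"first token {first_token!r} after {ttft:.2f}s")
```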
All right. And so you're working on the proof of concept, trying to get the system ready for demonstration this summer. Talk a little bit about, I think you said you were porting the LLM onto Gemini-II and sort of doing that work to get the LLM onto Gemini-II. Is that most of the work you're doing between now and when that demonstration will be ready?
Yeah. So that work has actually already started; we've been working on this for several months, about six months. In fact, we did a preliminary demo for the foreign entity in the equation in early October, well before the algorithm was optimized, and when they saw the demonstration, they walked away impressed. The other interesting aspect is that originally this was set up, as I mentioned, for the drone market. They looked at the demo and said, "This can be used for any kind of vehicle, not just drones." So they're anticipating the use cases could spread further.
Maybe for folks on the line, what would your dollar content be for the Gemini-II system and the model that you provide to G2 Tech or other drones? Once you've got this system, I guess it could generate interest for other drone manufacturers or other vehicle manufacturers, as you said. To the extent that this goes into production, what's your dollar content per system in this application?
That's a great question. So it's twofold. We have the hardware itself. And so in this case, they'll be buying a chip from us. So they'll buy a Gemini-II chip. I won't talk about ASPs, but let's just say round number, about $1,000 would probably be a good number to use. And then we also have the software content. And so they will need to license the software, this algorithm that we're doing, whether that algorithm is for the time-to-first token or if it's for the VLM or if it's for SAR or whatever that particular drone is doing. So they will have to license the software. And then there will also be a subscription fee, an ongoing subscription fee for that software algorithm.
Got it. We have a question from an investor. Your SRAM chip seems to offer superiority to the near-memory SRAM out there. Yet GSI's revenue has only ranged from $21 million to $30 million in the last three years. What has been the impediment to consistently ramping revenue? What has to happen to get to profitability, and what's the timing of profitability? This is someone who is newer to the GSI story, not necessarily a semiconductor specialist.
So there were a couple of questions combined there. On the SRAM side, the SRAM market hasn't been a growing market. I'm not saying there won't be some application in the future that might drive the discrete SRAM market, but right now it's been a flat market, and that explains that part. As far as the ramping, the ramping would come from the APU, and right now that's about getting these POCs to translate into orders. Some of that is software support. As I mentioned, on the hardware we're in fantastic shape; we have the production-ready Gemini-II chip now, but we still need to work on the software. We have lots of different functional libraries that we've already done.
We've done some algorithms like SAR, we're doing this TTFT work, and there are a few other algorithms we're working on. But we need to let the masses do their own, so we need to work on the compiler stack that allows folks to write their own algorithms in a high-level language like Python or PyTorch and have them translated into a language our chip understands. Really, a lot of the software effort will be continuing to write more of these functional libraries, along with bringing the compiler stack to the masses so they can do their own algorithms.
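As an illustration of what "high-level language in, chip language out" means, here is the kind of ordinary PyTorch a user might write. The PyTorch calls are real; the APU backend that would lower this computation is the piece GSI's compiler stack would supply, and this sketch makes no assumption about that stack's actual API.

```python
import torch

def top_k_matches(queries: torch.Tensor, database: torch.Tensor, k: int = 5):
    """Score every query against every stored vector and keep the best k."""
    scores = queries @ database.T   # one similarity score per (query, item) pair
    return torch.topk(scores, k)    # best k items per query

# Ordinary user code like this is what a compiler stack would need to translate
# into the APU's bit-level parallel operations instead of running it on CPU/GPU.
queries = torch.randn(4, 128)
database = torch.randn(10_000, 128)
values, indices = top_k_matches(queries, database)
print(indices.shape)  # torch.Size([4, 5])
```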
Maybe as a follow-on to that last question, it sounds like this POC for Gemini-II is probably your lead vehicle, right? So let's assume the proof of concept and the demonstration this summer go well. How long would it take until you could ramp that particular design win? And since you're spending money now to develop the compiler and the software solution set, let's say that's done, I don't know, at the end of the year, what would be a reasonable expectation for additional Gemini design wins beyond this POC that you've talked about with G2 Tech?
Right. So first of all, let's keep the POC and the compiler stack separate, right? For the POC, we are actually writing that algorithm, so we'll have it optimized; they won't need to do any of that work. As far as that goes, summertime is the expectation for the finalized demo. At that point, they will start looking at it, maybe making a few changes, but it would be sometime at the end of the year when we might have some prototypes shipped in, and then, assuming everything goes well, it would be a 2027 production model. Now, the other areas, as I mentioned, are some of the SBIRs we've been doing with the government. Some of those are specific; for example, we did YOLOv3 and YOLOv5 models.
And there are a few other areas we're looking at, so those SBIRs are laying the groundwork for future design wins as well. Going back to the other half you mentioned, the compiler stack: that's really more for other folks, not the people we're working with directly and writing the algorithms for, but for folks who want to do their own work. And that's going to be ongoing, because, as I mentioned, it's a stack, so we'll be releasing different parts of it over time. It's an ever-evolving kind of project.
Got it. Well, Didier, we are at the end of our session. So I just want to say thank you for joining us at the Needham Conference. Really appreciate you being with us today.
Thanks, Quinn.
Thanks, everybody.