My name is Alex Henderson. I'm the networking and security analyst at Needham. We have two awesome guys from Arista here, John McCool, Chief Platform Officer, and Martin Hull, VP of Product Management. We're gonna do a fireside chat for about 35 minutes. If you wanna ask a question, there's a dialogue box. You can write, type it in, and I will relay it as I see it. And you can also email me at ahenderson@needhamco.com, and I'll be happy to use that as an alternative if you prefer. And with that, thanks everybody for dialing in, and welcome, guys.
Thanks, Alex, appreciate you having us today.
So you just had your Analyst Day, lots of fresh content to talk about. Clearly, people are very excited about the AI opportunity. They're very excited about new products, the front end, the back end. There's so much stuff here. So maybe you could just give us, you know, a start. Over the last three years or so, you've seen a very exceptional response. Can you talk about what's transitioned during those time frames, what's been driving the business, and where we are as we're rolling into 2024?
Yeah, maybe we gotta start way back, before we entered the COVID period. I think we talked a lot about, you know, how we felt coming into it, that some of those cloud-based networks were a little bit under-provisioned, right? And then we saw the phenomenal growth of traffic in the cloud and the constraints around the supply chain. Cloud providers early on got in line, or queue, if you will, to be able to upgrade those networks, with the enterprise joining a little bit later but appreciating the extended lead times. And it's been, you know, really exciting. And then coming out of the tail end of this, the semiconductor crunch kept those lead times pretty high.
But, you know, with ChatGPT and the emergence of that last year, we saw a lot of our customers kind of review where they were in their AI development, and really a punctuation of what they were gonna do moving forward. And, you know, we talked a lot at the Analyst Day about the AI opportunity and how we define it for Arista. And we went through some of the trial activity and architectures that are being looked at by our customers, and I think that's a great opportunity for us as we look ahead.
Well, as I look in the rearview mirror, you've had three years of exceptional growth, 36%-56% product sales growth over the last three years. I think most people looking at that assume that all three of those years were driven by cloud AI, and I don't think that's the case. In fact, if I look at the growth in 2023, it actually looks like it's more driven by enterprise growth. Didn't your enterprise outgrow your cloud in 2023?
Yeah. So yeah, we talked about the enterprise growth, you know, accelerating in 2023 versus the cloud. Still growth, obviously, in the cloud. The cloud growth over the last couple of years was, you know, driven by general purpose networking, right? The 400 gig cycle. Some of that was, you know, front-end connection to AI networks, but predominantly just building out cloud networks. As I mentioned, the cloud guys got in line early on and realized the supply chain crunch. The enterprise folks were probably six, nine months behind that in recognizing the challenge, you know. And as we've worked through those deployments with our cloud providers, we've had an opportunity here really to kind of accelerate the enterprise folks who have been waiting in line, in effect, to build out their networks.
In addition, you know, we've continued to see our organic growth in enterprises in general. As we entered the campus, we had a broader toolkit for our enterprise sales team to go after. And not just campus; now we're, you know, into routing. We announced products on the routing edge. We have network visibility products with our Awake acquisition and NDR capability. So they have a broad portfolio to look for any opportunity in a Fortune 2000 account that comes their way, and they're doing a good job of inserting Arista, which, we've seen time and again, kind of leads to a land-and-expand opportunity for the next set of RFPs that come along.
So looking at the cloud customers, it seems pretty clear that over the last year, there's been a massive deceleration in growth in the broader cloud, from 30%-50% kind of growth rates down to, in many cases, teens. The year of efficiency is cleaning up a lot of, you know, programming that was thrown out there and just left there, not running anything of value, just sloppy coding, the DevOps people forgetting to turn things off. Now that that's all getting cleaned up, how do you see that dynamic turning? Do you expect that the year of efficiency will eventually turn into a year of re-accelerating application growth? You know, if I were to think about this as a curve, and we're kind of doing a sine wave around that curve, we ought to be at the trough of that sine wave.
Yeah, I mean, you know, I think the way that we view it is less from maybe the year of efficiency, but more of this transition from a CPU-dominated architecture to, "How do I accommodate this new growth of GPUs in my data centers, and what does that mean for my infrastructure, all the way from the GPU up to the network?" The power density of this new model is extreme, so the physical aspects of deployment come into effect. You know, the traditional leaf that was a top of rack: how do I deploy that physically when I have a lower density of GPUs per rack?
So, there's been this shift over the last year, probably starting, you know, almost a year ago, of, "What does this mean for me when I'm investing more in AI?" A lot of projects that were in trial activity at these customers got a fresh look, in terms of their deployment, how they were gonna monetize that, and, you know, that's really changed a lot of the conversations that we're having with cloud customers.
But going back to the point I'm driving at here, will the number of applications being run start to re-accelerate growth in traditional CPU-based cloud? Before we talk about the AI side of it, let's separate the two into logical
Yeah, sure
pieces.
Yeah, and I think that answer of what applications is very customer-dependent, right? What's mission-critical to them? What are they gonna deploy? And each of our customers have different levels of applications that they're trying to drive or ones that they're trying to optimize or combine. So I don't think we have a one-size-fits-all to that, to that question.
Sure.
Each of these customers are thinking about the new applications on AI.
Okay, well, so when you guys talked about AI on the analyst day, I think you indicated that your target long term is $750 million worth of AI sales. That's a very narrow description of AI, I think. You're only talking about, you know, board-to-board, GPU-to-GPU, within the back end of the network. If I were to think about the front end of that, and I get it that up front, we're putting clusters in. The clusters are, you know, probably the first cluster, you don't need a lot of front end off of it. But over time, won't the front end increase as a percentage of the spend for the networking piece and ultimately end up outgrowing the back end side?
Yeah, I mean, there's definitely a front-end element that we're participating in today, right? And it's very hard for us to distinguish, when a 7800 is bought by one of these cloud companies, whether it's being interconnected for data center interconnect, or it's being used for general purpose GPU, or it's connecting to these new clusters. And to try to parse that and distinguish where they go would be a tremendous effort, right? What we see is, you know, a new opportunity for a new network that's not Ethernet today, and that's why we defined it as the back-end network. And there are technologies that we're developing to optimize those back-end networks, so there's an R&D component to that. So that's, you know, kind of how we're communicating it to the investor community.
That's really, you know, the shiny new thing for us to go after. We feel really confident on where we are with front-end networks today, and, you know, we'll continue to service them. But that's why we had this more
Yeah, but what I was trying to get at is, if I'm out building a multi-cluster AI platform, I'm gonna need a lot more front end than I would have bought had I been just in the CPU world, right?
Right.
So what's the connect ratio, do you think, between this back-end investment up front and the eventual need for connectivity
Yeah
to that CPU-based infrastructure and GPU-to-GPU clusters? So isn't it as large or larger than what you're defining-
It's
as the back end?
I think it's pretty hard to define that. You know, the ratios on compute became really super standardized: you know, you've got 50 gig NICs, you have two, and they're redundant.
Yep.
And it became almost an industry-wide, you know, step-and-repeat type of operation. Folks are still trying to figure out, as we speak, how to optimize their designs and those ratios for their particular AI application. I can't overemphasize the amount of diversity we're seeing in this customer base around how they're going after AI from a physical deployment standpoint, and those are definitely in a very early phase. And I'm sure, as we look back five years from now, there'll start to be certain patterns that are replicable and more standardized across the industry, but there's still a lot of unknowns around some of these things.
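The port-count contrast behind this exchange, the standardized CPU-era ratio versus a still-unsettled AI back-end ratio, can be sketched roughly. All the sizes below are illustrative assumptions for the sketch, not figures from the conversation:

```python
# Rough sketch of the port-count contrast between the standardized CPU-era
# front end and an AI back end. All sizes here are illustrative assumptions.

def frontend_ports(servers: int, nics_per_server: int = 2) -> int:
    """CPU-era pattern: e.g. two redundant 50G NICs per server."""
    return servers * nics_per_server

def backend_ports(gpus: int, nics_per_gpu: int = 1) -> int:
    """A common, but still unsettled, AI pattern: one high-speed NIC per GPU."""
    return gpus * nics_per_gpu

# Hypothetical pod: 1,000 servers, 8 GPUs each.
servers, gpus_per_server = 1000, 8
print(frontend_ports(servers))                   # 2000 front-end ports
print(backend_ports(servers * gpus_per_server))  # 8000 back-end ports
```

The point of John's answer is that `nics_per_gpu` and the boundary values are exactly the parameters customers are still tuning per workload, so there is no single industry-wide ratio yet.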
If I may, Alex?
Please.
So the front-end data center network has been there for, let's say, a decade, right? Eight years, ten years. To try to say there's a net incremental step function there, I think it's very difficult to count that. And so, yeah, there will be growth in that Ethernet TAM, and that's already modeled into most of the analysts' numbers and certainly our numbers, right? So the growth in that Ethernet TAM is already in there. I don't think there's gonna be a step function in the front-end network. When you look at the analysis about how much traffic stays inside a data center versus how much leaves, a lot of traffic stays in those GPU clusters. Yes, some leaves, but it's not the same amount of traffic that has to go into and out of the GPU cluster.
It's a fraction of that, a very small percentage. So, yeah, there'll be a transition on that side, but that can be accommodated within the normal transition from 100 gig to 200 gig to 400 gig to 800 gig, and the ASP doesn't double every time you do that, so the ASPs start to smooth out. And so I think as you go from the lowest speeds to the highest speeds, there may not be a significant step function in the revenue on that front-end network.
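Martin's ASP point can be illustrated with a quick back-of-the-envelope calculation. The 1.5x per-generation ASP multiplier below is an assumption made purely for the sketch, not an Arista figure:

```python
# If the per-port price (ASP) grows by less than 2x at each speed doubling,
# cost per gigabit keeps falling, which is why ASPs "smooth out" rather than
# producing a revenue step function. The 1.5x multiplier is assumed.

speeds = [100, 200, 400, 800]   # Gb/s generations
asp = 1.0                       # normalized port price at 100G
asp_multiplier = 1.5            # assumed ASP growth per speed doubling (< 2x)

for speed in speeds:
    print(f"{speed}G: ASP {asp:.2f}, cost per Gb {asp / speed:.5f}")
    asp *= asp_multiplier
```

With any multiplier under 2x, each generation delivers more bandwidth per dollar, so revenue per port rises far more slowly than bandwidth does.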
Well, so, the other obvious question, relative to the back-end network, is: today, AI is booming, and NVIDIA is shipping the vast majority of it. Most of the clusters that they're selling (and they don't sell, you know, single GPUs, they don't sell boards, they don't even sell racks; they sell the entire cluster) are architected with NVLink for the short
Yep
reach stuff, you know, kind of GPU to GPU, which is a very tight mesh, and then, you know, the longer stuff is InfiniBand. How does that impact your ability to win in the back end? And how do we anticipate the emergence of other vendors, such as AMD, you know, coming out with chips, altering the ratio of InfiniBand-dominated builds versus Ethernet-dominated builds?
That is a great question. You know, I think, at the lowest end, right, of dozens, multi-dozens, that NVLink technology is gonna be the dominant interconnect, and maybe appropriate for some enterprise use. At the other extreme, you have the large cloud providers that see that incremental cluster size can yield better results and don't understand where the knee of the curve is yet, right? So they're the ones on that side that are gonna drive this move to Ethernet. They're also interested in multi-vendor capability, so if you're not NVIDIA and don't have access to InfiniBand technology, you're gonna be very interested in partnering with a company that's doing, you know, Ethernet. The cloud customers want diversity in their networks to put multiple different kinds of endpoints on.
So the Ethernet is gonna be driven from the high end probably down, and where that meets in the middle, I think we've got to sort out how that goes over time, but there's definitely a push from the cloud folks. And Martin, I don't know if you have anything else you wanna add in.
Yeah, I mean, Alex made a good point, right? NVIDIA is shipping the majority of GPUs, but I don't think that's gonna remain the status quo. So when you do get a second vendor and a third vendor and maybe a fourth vendor, they're not gonna have an InfiniBand offering. It's gonna be Ethernet, and then there's an open market.
Mm-hmm.
And then if you look at the NVIDIA Mellanox side of it, they continue to say they offer both Ethernet and InfiniBand, recognizing there is a need for an Ethernet solution here. Once it's Ethernet, it's an open market, and these large customers, as you said, John, they want multi-vendor, right? They don't want single vendor, but where they are in the cycle, a solution is what they need, and then they're going back and they're re-engineering and they're benchmarking the different technologies, benchmarking the different network designs to make sure they get that biggest bang for the buck. And that almost goes back to that question you had on efficiency, right? Benchmarking to make sure that I'm getting the best value out of my GPU cluster. You know, we all know the cost of the GPU cluster is significantly higher than the cost of any networking interconnect.
So to optimize that GPU, putting in the best Ethernet network is the way you're gonna get the best value out of that GPU investment.
So just to level set everybody's understanding of what we're talking about here: NVLink is essentially active optical cables. They're VCSEL-based transceivers, very short reach in nature, a lot lower cost. Whereas InfiniBand is either CW- or EML-based lasers that have longer reach and can go across a data center. So in a CPU world, I think active optical cable was, what, 20% of the connections? In this GPU world, I think they're up to, what, 40%-50% of the connections, is that right?
I don't have a good lens into that.
Yeah, we don't, we don't track the interconnects on that one. A lot of that gets sold with the GPU clusters.
Right. Well, I guess where the question ends up, and the reason I ask that is to get everybody to understand what the distance
Yeah
the difference is, but the question is, when we go from selling a single cluster to multi-clusters, and I would assume that happens pretty quickly. You know, you get the first cluster in there, and you learn how to work it, and then you go to the next. Aren't cluster-to-cluster communications almost always gonna go Ethernet?
Yeah, on the front end. If they're connected to different back-end networks, that's how we would define separate clusters; they're connected with Ethernet on the front end. I think the way I might frame that is, you know, you could think of NVLink maybe being optimal up to hundreds of GPUs. Then there's this place around 1,000, where maybe the InfiniBand-versus-Ethernet debate will linger on, and then a set that's going to multiple thousands or pushing towards tens of thousands is kind of an Ethernet LAN. That's a rough cut of how you might think about this.
Now, you know, I think NVLink, certainly in that small cluster, being a GPU-to-GPU-to-memory type of technology, probably isn't a great application for InfiniBand or Ethernet, but the rest of it, I think, is kind of up for grabs.
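John's rough tiering can be encoded as a simple heuristic. The boundary values below (500 and 2,000 GPUs) are illustrative assumptions, since he stresses these cut-offs are still fluid and early:

```python
# Heuristic version of the tiering described above: NVLink dominates small
# clusters, ~1,000 GPUs is contested, and multi-thousand scale trends to
# Ethernet. The boundary values (500, 2000) are assumed for illustration.

def likely_interconnect(gpus: int) -> str:
    if gpus <= 500:
        return "NVLink-dominated"                   # "dozens to hundreds"
    if gpus <= 2000:
        return "contested: InfiniBand vs Ethernet"  # "around 1,000"
    return "Ethernet"                               # "multiple thousands and up"

for size in (64, 1000, 16000):
    print(size, likely_interconnect(size))
```

The interesting part of his answer is not the boundaries themselves but the claim that the Ethernet tier is being pulled downward from the high end as cloud providers scale up.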
Okay, so how do you see the world shifting? What's the timeline for when the world shifts to other vendors? And as that happens, how long do you think it'll take for Ethernet to become a demanded technology within the clusters sold by NVIDIA?
Let me reframe that a little bit in terms of what we see from our perspective, right?
Mm-hmm.
We talked about, you know, us being in trials today, 2024 being proof of concept, so folks building a cluster, benchmarking, starting to bring it operational, and 2025 being, you know, more of a production type of environment for Ethernet-based clusters. Now, that's what we have line of sight into, our visibility. I think you have a broader question of, you know, when does that influence the market and change the whole mix, right? And I think, you know, what we've seen before with some of these technologies is that it starts with the large cloud providers and then is adopted into other segments, like enterprise or service provider, and so on. But we see the large cloud titan-type customers really leading this charge to Ethernet-based clusters.
Yeah, I think, Alex, you're going to see offerings of AI as a service, whether that's on-prem or cloud-based, and it's going to depend on how big that cluster is. Is that cluster scalable using NVLink or a small InfiniBand network? Or, with AI as a service, does it need to be extended across a whole data center floor? It's still AI as a service, so it's going to depend on how the market adoption of these products goes, effectively. Because AI as a technology, as a workload, doesn't answer the question. You need to have that solution, whether it's a vertically integrated product or a horizontally integrated product.
So the reason I hesitate on this definition, the way it's being phrased, is it strikes me that Ethernet won as a result of the fact that it's the most efficient technology for handling network traffic. InfiniBand, originally designed around large block transfer, has really been kind of nudged into this architecture, and the only reason people are choosing to buy it is because the sole vendor with very high prices is able to force them to take the entire architecture if they want to get the shipment. It strikes me that over time, as that becomes more democratized and disaggregated, that Ethernet will win out again, and therefore, Ethernet should gain share versus InfiniBand.
One would think that at some point, NVLink, which is not InfiniBand and, you know, not an Ethernet-compatible protocol, would also shift. So why wouldn't we end up, in the fullness of time, with a purely Ethernet, a fully Ethernet-articulated network?
I think it's that fullness of time piece. So I mean, you know, we're definitely driving this charge to move to Ethernet, no doubt about it. There's a technological piece. I'd say the other piece that's not appreciated by people is the adaptability of the community around Ethernet to drive new technologies and standards. The Ethernet we're shipping today is not the Ethernet we shipped in 2000 or 1995.
Mm-hmm.
Interesting piece of history here: InfiniBand led the charge to 10 Gb Ethernet, as well as RDMA. On both of those, the Ethernet community looked over their shoulders and said, "Well, we can do that too," and adopted high-speed ports and took RDMA and adopted a standard called RoCE, RDMA over Converged Ethernet. So it basically leveraged a lot of those capabilities and integrated them into Ethernet. And what you see happening today is, you know, the UEC taking a fresh look at what it means to run GPU traffic and defining a set of capabilities that will be interoperable amongst multiple vendors to make Ethernet work better for GPU traffic than InfiniBand does today. So it's that multi-vendor coopetition and cooperation that's really driven Ethernet through multiple generations, all the way back to voice over Ethernet.
Well, so I guess my point would be, if 2024 seems to be that lull between the very rapid rise of AI with a heavily concentrated supplier and a world that is more Ethernet-driven, doesn't that imply that in 2025, 2026, 2027, growth rates should accelerate as your penetration of the marketplace increases?
I think the puts and takes around that are more GPU vendors. There are a lot of in-house efforts where people are developing GPUs that would naturally go to Ethernet. And as that endpoint community broadens, as well as the successful deployments in these large cloud customers, I think that's how this thing starts to take shape and move.
So you changed the definition of the Cloud Titan group to add Oracle and remove Apple. That, I thought, was pretty interesting. Where are you in terms of, you know, broadening your customer base from the Microsoft and Meta, you know, dominance in your revenue streams to the second-tier, third-tier cloud customers and, you know, the Oracles of the world that add to that Cloud Titan group?
Sure. Our original, and continued, definition of that is really people who have 1 million servers plus, and according to analysts, you know, Oracle moved into that category, and another customer moved out of that category, so we
Mm
we just kind of adhered to that. I think what people need to appreciate is there's just a certain set of customers whose networks are enormous, and it takes, you know, multiple cloud specialty providers to add up to the same TAM, effectively, as those very large million-plus customers, right? We do really well with the cloud specialty group. They kind of have the same design principles, the same care about operational efficiency that the cloud folks do. There just need to be a lot more of them to equal that spend. In terms of diversification, I think the enterprise piece has been an ongoing and continued push for us, both through acquisitions, broadening the portfolio, and then also our sales and marketing coverage.
So Cisco claims it's got orders for $500 million in AI, and I know you don't want to talk about Cisco, but I kind of can't resist. I don't see them in any of the reference designs out there, yet I see Arista in every reference design I come across. Can you talk about where you are versus where they are within the AI market?
Not really. I mean, I don't know how they define that category. I think we're being very specific about what's AI to us, and, you know, you pushed on this call, Alex, on the front-end piece. You didn't pin us down on that, but it's hard to count, right? I mean, what's a front-end port that's going in with a GPU, and what's going to the front end of a cluster? So we're just looking at that back end, because that represents, you know, a new opportunity and technological differentiation.
So that $500 million probably includes everything from ports to front end to back end, and I wouldn't be surprised if it's got some optics and other stuff in there.
I don't know.
Yeah, but you don't see them in almost any reference designs. Am I right on that?
I
Not that I know of, but, I mean, how they're defining their TAM or their segment is really a question for them.
Okay.
But we've defined what we're doing. And also, you know, they say they've got orders. We're talking about our target for revenue, not our target for orders.
Right. Let's shift to enterprise. So clearly, enterprise grew faster than cloud in 2023. You know, how does this relate to the demand, and how much of that's, you know, pent-up demand that couldn't be shipped in CY 2022 as a result of biasing to the cloud and the cloud guys getting into the queue earlier?
It's a mix. So yeah, we've been very direct about the fact that we've watched the cloud deployments and the shipments to them as they're, you know, kind of doing their build-out. They got in line first, and then it was an opportunity for the enterprise customers and their shipments to move forward. At the same time, you know, I think we're really pleased with the deployments that we've seen in the enterprise and the subsequent wins. You know, somewhere in the last two or three years, we've started to break into some new verticals. We've always been strong in the financial area, media, entertainment, but, you know, with the campus offering and some of those designs, healthcare has become really interesting, as well as general purpose industrial manufacturing areas.
So new wins, and I would also say that we still feel fairly under-penetrated, even in our large accounts. So, you know, in those original large accounts, we were known as a data center company, very strong there, and we've branched out into routing, but there are still opportunities for share gains within accounts we're in.
So when do the supply chain issues normalize and your lead times get back to normal? And what's the slope of that?
Sure. There's definitely a new normal, and I'm not sure we completely understand what the new normal is. We're, you know, well past the days of not being able to get people into factories, the shortages, getting a call that things aren't coming in next week. Things have become more predictable. We have seen a shift from hard-to-get components, those little small analog devices, to supply constraints around, you know, large chip capacity, 7 nm and below, and substrates, and that's all driven by the demand on AI and, subsequently down the supply chain, for those process nodes and substrates. So our lead times for those large chips remain stubbornly high. They've reduced a bit, but they're still probably 2x from where they were pre-COVID. So that drives, you know, a continued extension of our lead times.
But we have cut the lead times in half from where we started the year, and I think we're comfortable with that. We look at inventories, both inventory on hand as well as purchase commitments. So if you add those two, and I think Ita showed a nice chart at Analyst Day
Yeah.
You know, we have driven that down from a peak of about $6 billion to a combined roughly $4 billion between the two. But we're still, you know
That was mostly purchasing commitments that came down, as opposed to inventory, though.
Inventories actually went up, so we're starting to see some of that inventory we purchased come in; the mix is about 50/50. We have a team that manages, you know, not only incoming inventory but those purchase commitments, to make sure that we have the right mix coming in as we look at future forecasts.
Oh, man, I'm running out of time here. I've only got a couple more minutes left. I can't resist asking about ZTNA, but before I do: is inventory obsolescence becoming a problem? You know, is there a nut of inventory carrying costs or write-downs in each quarter?
So, I mean, when we went out and made those purchase commitments, the fortunate thing is we were early in the cycle with our 400 gig products. You know, we've managed that carefully, and we're looking at inventory mix and making sure we're optimizing what we're bringing in-house and where we are in terms of those commitments coming in.
With the last three minutes here: I really love the ZTNA partnership that you've announced with Zscaler, and, you know, probably integrating with CrowdStrike as well. And I know this is really early days, and I know Liz didn't want me to ask the question, but it strikes me that we're in a world that has been client-server for 35-40 years, perimeter-defense architected, firewalls, you know, all of that kind of stuff, and we're moving to what you guys early on called points in the cloud, where the user is taken off the enterprise network and is a point in the cloud, the application is an API gateway that's a point in the cloud, and we're just simply connecting across the cloud to all of these points.
It strikes me that you're extremely well-positioned to participate in this new world, and that it opens up significant avenues of revenue to you from legacy-architected products, like firewalls, that wouldn't be needed under this network, and it's all driven by this single, you know, cloud-native, microservice-based kernel and operating environment that you've built. Can you talk a little bit about that broader vision?
Yeah, I think, you know, the firewall was constructed with a vision that the enterprise was physically secure. Nothing bad happens inside the enterprise; all the bad guys are outside, so if I put the firewall between the internet and my data center, everything's gonna be good, right? You know, what we found is a lot of bad things happen with somebody bringing in a laptop that's infected, or, you know, end users being affected. So what's happening is you need some level of security for east-west traffic in the data center, and the bandwidth is enormous, right? Think about all the traffic going from server to server, or GPU to server, or to the end user. So there's two things that happen.
Very security-centric data centers are just adding tons of firewalls, and then there's an associated policy that has to be managed and administered to deal with that east-west traffic. It's not the right solution. And some people just don't do anything, so they don't really have any concept of east-west traffic protection, because it's too expensive and too hard. So the network has always been good at segmenting traffic. You know, we've done VLANs, we've done other technologies to keep things in different segments, and security products have always been good at, you know, asserting policy on how things traverse, or also detecting, you know, malicious websites, et cetera.
If you can utilize the network for enforcement of traffic movement and combine that with some policy aspect from a best-of-breed provider like Zscaler, we think you can build, you know, very exciting architectures that solve this fundamental problem.
With that, we've run out of time. I'm not great with managing time when I've got such an interesting subject to cover. Martin, John, thanks so much for joining us. Operators in the background, Zach, Olivia, Libby, thank you so much. For everybody who's Zoomed in, I hope that was constructive for you, and thanks for joining. With that, it's a wrap.
Thank you, Alex.
Thanks, Alex. Bye-bye.