Advanced Micro Devices, Inc. (AMD)

Product Launch

Nov 8, 2021

Lisa Su
Chair and CEO, AMD

Welcome everyone, and thank you all for joining us. Today is all about the data center, and I'm looking forward to showing you the next generation of AMD products that will extend our technology leadership over the coming years. We're in a high performance computing mega cycle, driven by the growing need to deploy additional compute performance delivered more efficiently and at ever larger scale to power the services and devices that define modern life. At AMD, we're focused on pushing the envelope in high performance computing every day. We have made significant investments in multi-generational roadmaps to deliver leadership compute, networking, and cloud solutions. In the data center, workloads are diversifying and becoming even more compute intensive, and each class of workload requires a specialized approach to address its unique needs. We see data center compute evolving into four distinct categories.

General purpose computing covers the broadest set of mainstream workloads, both on-prem and in the cloud. Socket level performance is an important consideration for these workloads. Technical computing includes some of the most demanding workloads in the data center, and here per core performance matters the most for these workloads. Accelerated computing is focused on the forefront of human understanding, addressing scientific fields like climate change, materials research, and genomics. Highly parallel and massive computational capability is really the key. With cloud-native computing, maximum core and thread density are needed to support hyperscale applications. To deliver leadership compute across all these workloads, we must take a tailored approach focused on innovations in hardware, software, and system design. Today we're going to talk about our next generation data center CPUs and GPUs specifically designed for these workloads. This includes new cores, new packaging and process technologies, and new products.

We'll share with you how we bring it all together to power the exascale computing era. Now let's start with data center CPUs. We've made tremendous progress in the data center over the last four years. EPYC processors set a new trajectory for the industry in both performance and scalability while delivering new levels of data security. As a result, we're seeing growing customer preference for EPYC. In fact, to date, we have shipped more than 200 million AMD EPYC cores, powering the daily computing experience for billions of people across the cloud, enterprise, and HPC. We've seen tremendous cloud adoption and have built a deep partnership with many of the largest cloud companies in the world, including AWS, Azure, Google Cloud, Tencent, and Oracle Cloud.

Earlier this year, we introduced 3rd Gen EPYC with leadership performance, core density, and power efficiency, enabling the largest cloud companies to deploy at scale with the best total cost of ownership. Today, I'm pleased to announce that Facebook, now called Meta, is the newest cloud partner adopting AMD EPYC processors. We've been working together to jointly define an open cloud scale single socket server designed for world-class performance and power efficiency. We optimized 3rd Gen EPYC for their newest North Dome systems to deliver leadership performance per watt across major workloads. We'll announce more details together at the OCP Global Summit later this week. I want to thank Facebook for the partnership and strong engineering collaboration. We're extremely excited to work with them to support their future data center expansions. This means EPYC is now designed into the data centers of 10 of the world's largest hyperscalers.

The performance, features, and efficiency of 3rd Gen EPYC are also driving strong adoption in the enterprise market. Some of the best known Fortune Global 500 companies have deployed EPYC in their data centers to run their most important workloads. SAP is one of the leading producers of software for managing business processes. Many of the largest companies in the world use solutions based on SAP S/4HANA. Today, we're very excited to announce a new step in our partnership with SAP, focused on EPYC-powered infrastructure as part of the RISE with SAP offering, anchored by SAP S/4HANA Cloud. Together, we'll improve the TCO for our joint customers while also reducing the carbon footprint of the platform. AMD expects to be one of the first adopters of S/4HANA Cloud solutions hosted on our AMD EPYC-powered infrastructure. I'm really thrilled to expand our partnership.

While the momentum we've achieved with EPYC is great, we're always pushing the envelope on CPU performance and performance per watt. So let's get to our first major product innovation update today. Our investments and innovations in packaging have been a multi-year, multi-technology journey. We introduced HBM and silicon interposer technology in our GPUs in 2015, delivering industry-leading memory bandwidth in a small form factor. We set a new performance trajectory for compute in the data center and PC markets in 2017 with multi-chip modules. In 2019, we introduced chiplet technology, combining chips built using different process nodes in the same package, which really enabled significantly higher performance and capabilities. At Computex earlier this year, I showed you the next big step for the industry, one we developed in close collaboration with TSMC based on their 3D Fabric technology.

It combines chiplets with die stacking to create a 3D chiplet architecture for high-performance computing products. The first demonstration of our 3D chiplet technology was stacking cache memory directly on top of a Ryzen desktop prototype to deliver a significant increase in gaming performance. Today, I'm excited to announce that we're bringing 3D chiplet technology to the data center and our EPYC CPUs, adding a new 3D cache design to the leadership Milan product family. We're using an industry-first hybrid bonding plus through-silicon vias approach that provides over 200 times the interconnect density of 2D chiplets and more than 15 times the density of existing 3D stacking solutions. This enables a much more efficient and denser integration of our IP. The die-to-die interface uses a direct copper-to-copper bond with no solder bumps of any kind.

This approach improves thermals, transistor density, and interconnect pitch over other 3D approaches and is the most flexible active-on-active silicon stacking technology in the world, consuming less than one-third the energy per signal of micro-bump approaches. Our first server CPUs with 3D V-Cache technology are code-named Milan-X. These processors have 3 times the L3 cache of standard Milan processors. At the top of the stack, that adds up to 804 MB of total cache per socket. This additional L3 cache relieves memory bandwidth pressure and reduces latency, which in turn speeds up application performance dramatically. Milan-X is built on the same Zen 3 cores as our general purpose Milan processors, with up to 64 total cores.
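As a back-of-the-envelope check, the 804 MB per-socket figure and the 3x L3 claim can be reconstructed from per-CCD numbers. This is a sketch only; the per-CCD split (32 MB native L3 plus a 64 MB stacked V-Cache die) and the Zen 3 L1/L2 sizes are assumed from AMD's public Milan and 3D V-Cache materials, not stated in this presentation:

```python
# Back-of-the-envelope check of the Milan-X cache figures quoted above.
# Assumption (from AMD's public Milan / 3D V-Cache materials): each of the
# 8 CCDs pairs 32 MB of native L3 with a 64 MB stacked V-Cache die.
CCDS_PER_SOCKET = 8
NATIVE_L3_PER_CCD_MB = 32
VCACHE_PER_CCD_MB = 64
CORES_PER_SOCKET = 64
L2_PER_CORE_KB = 512          # Zen 3: 512 KB of L2 per core
L1_PER_CORE_KB = 32 + 32      # Zen 3: 32 KB I-cache + 32 KB D-cache per core

milan_l3_mb = CCDS_PER_SOCKET * NATIVE_L3_PER_CCD_MB                         # 256 MB
milanx_l3_mb = CCDS_PER_SOCKET * (NATIVE_L3_PER_CCD_MB + VCACHE_PER_CCD_MB)  # 768 MB
l2_mb = CORES_PER_SOCKET * L2_PER_CORE_KB // 1024                            # 32 MB
l1_mb = CORES_PER_SOCKET * L1_PER_CORE_KB // 1024                            # 4 MB

l3_ratio = milanx_l3_mb / milan_l3_mb          # 3x the L3 of standard Milan
total_cache_mb = milanx_l3_mb + l2_mb + l1_mb  # 804 MB of total cache per socket
```

Under these assumptions the L1 + L2 + L3 sum lands exactly on the 804 MB total quoted in the presentation.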

It's the fastest server processor for technical computing workloads, with more than a 50% uplift over standard Milan processors, which are already the fastest in the market today. Milan-X processors are fully compatible with 3rd Gen EPYC platforms. With a simple BIOS upgrade, our customers can drop Milan-X into existing platforms. This accelerates customer qualification and enables faster deployments. These CPUs also run existing software as is, with no changes required. Now let me show it to you for the first time. This is the 3rd Gen AMD EPYC processor with 3D V-Cache, Milan-X. We have removed the lid from this package so that you can see the 6 mm by 6 mm SRAM dies hybrid-bonded to each of the eight Zen 3 CCDs. To tell us more about the breakthrough per-core performance that 3D V-Cache technology brings to EPYC, let me welcome Dan McNamara.

Dan McNamara
SVP and General Manager of Server Business Unit, AMD

Thank you, Lisa. The market traction with EPYC continues to accelerate, and we are pleased with our customer partner adoption since our Milan launch. The next step in our journey is to deliver more differentiation and value with a focus on performance per core. We are really excited about bringing 3D V-Cache to market with Milan-X. As a design target for Milan-X, we zeroed in on technical computing applications. These are some of the most complex and demanding workloads in the data center. These applications are typically enablers of product design. Finite element analysis and structural analysis tools are used to simulate and improve the design of physical systems. Computational fluid dynamics is used to simulate physical interactions across a broad range of applications, from consumer product designs to aerospace engineering.

Just as these software solutions are used to simulate the physical world around us, EDA tools are used to simulate and optimize chip design. While we were architecting Milan-X, we looked deeply into how these applications behave and found that a large cache was critical to attaining better performance. More L3 cache ensures that critical data is closer to the cores, and that reduces latency in the system. We saw a great opportunity to apply our innovative AMD 3D V-Cache to these applications and deliver a new level of performance to our customers. Before I show you what Milan-X can do, let me first refresh you on the Milan processors currently in the market. Today's Milan processors deliver clear performance leadership across a wide range of technical computing workloads.

Here are benchmark results comparing our 32-core EPYC 75F3 against the 32-core Xeon 8362 on key technical computing workloads. As you can see, Milan delivers distinct advantages over 3rd Gen Xeon Scalable processors, with anywhere from a 33% to 40% uplift in performance. With this as a backdrop, I will show you how we are extending our leadership even further with Milan-X. Let's take a look at EDA. Chip design is an iconic technical computing workload. It is highly compute intensive and complex. One of the most important tasks in SoC design is verification. Verification proves that each structure in the design does what it's supposed to do. It also catches defects early in the development process, before a chip is committed to silicon. Today, we are showing a demo of the Synopsys VCS tool.

VCS is the primary verification solution used by many of the world's top semiconductor companies. On the left side, you see our leading 3rd Gen EPYC server CPU, and on the right side, you see our Milan-X CPU with AMD 3D V-Cache. Both are running Synopsys VCS. Each server is simulating an AMD RDNA 2 graphics core. VCS generates a model of this chip from source code and then uses that model to simulate the design by running various tests. You'll see individual tests for the design change color as each is completed. As you can see, Milan-X completes more tests in an hour, getting to full coverage in a shorter amount of time.

These results show that the Milan-X based verification completes 66% more jobs than Milan. If you consider the competitive analysis that I started with, you can see that this new solution will bring the next level of value and performance to our customers. Users can finish their verification and get to market faster or add more tests to further improve the quality and robustness of their design. Either way, Milan-X delivers 66% more performance, and that will translate directly to the efficiency and quality of product development. This step function in performance will be delivered out of the box with existing applications when Milan-X launches. These applications are developed by leading ISVs for some of the simulations I just covered and many others. We have deep engineering engagements with key market-leading software vendors, Altair, Ansys, Cadence, Siemens, and Synopsys, just to name a few.
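To make the 66% figure concrete: a 66% higher jobs-per-hour rate means a fixed verification suite finishes in roughly 60% of the baseline wall-clock time. The suite size and baseline rate below are hypothetical, purely to illustrate the arithmetic:

```python
# What a 66% jobs-per-hour uplift means for wall-clock time to full
# coverage. The suite size and baseline throughput are hypothetical.
milan_jobs_per_hour = 100      # hypothetical baseline throughput
milanx_jobs_per_hour = 166     # 66% more jobs completed per hour

suite_jobs = 10_000            # hypothetical regression-suite size
milan_hours = suite_jobs / milan_jobs_per_hour
milanx_hours = suite_jobs / milanx_jobs_per_hour

# 1.66x throughput shrinks wall-clock time to 1/1.66 ≈ 60% of baseline
time_saved_pct = 100 * (1 - milanx_hours / milan_hours)
```

Either reading of the claim follows from the same ratio: finish the same suite about 40% sooner, or run 66% more tests in the same window.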

They're all very excited about the capability and performance of Milan-X. We're working closely with them to bring the combined hardware and software solution to market. As we continue tuning and optimizing these applications, we expect even more benefit for our customers. Our partners will be ready with certified and highly performant applications running on day one at launch. While we see tremendous value across technical computing with Milan-X, we also see that a broader set of applications can benefit from a larger L3 cache. In today's data-driven economy, real-time decision-making is a must. In applications like data mining, risk analysis, and anomaly detection, getting to insights faster is extremely important. With Milan-X, more data can be kept closer to the processor, driving faster outcomes.

For media and entertainment, an industry that is transforming to deliver high fidelity in real time, large L3 cache will translate to more parallel live streams per server. With AI, fitting more model weights and activations into a larger L3 cache can enable real-time inferencing. We are engaging today with ecosystem partners across these domains to develop turnkey solutions with increased performance and scalability. As you can see, we're excited about the impact Milan-X will drive across the technical computing landscape, delivering value in three important ways, increased designer productivity, higher quality products, and faster product design cycles, leading to faster time to market. There is tremendous enthusiasm among our partners to bring Milan-X solutions to market. Now, let me hand it back to Lisa to talk more about our partner plans.

Lisa Su
Chair and CEO, AMD

Thank you, Dan. Now let's talk about Milan-X availability. One of our premier cloud partners, Microsoft Azure, is first to take advantage of the benefits of Milan-X. To tell us more, here is Executive Vice President of Microsoft Azure, Jason Zander.

Jason Zander
EVP of Microsoft Azure, Microsoft

Thank you, Lisa. Microsoft and AMD share a vision for a new era of high-performance computing in the cloud. One defined by continuous improvements to the critical research and business workloads that matter most to our customers. We've partnered with AMD to make this vision a reality in Azure with our HB-series of virtual machines, which offer up to 12 times the performance of other clouds and rival some of the most powerful supercomputers in the world. It's a fantastic platform for our customers to solve their HPC challenges radically faster and with greater cost-effectiveness. Today, we're excited to announce the latest enhancements to the Azure HPC platform. Milan-X processors are coming soon to third-generation Azure HB-series virtual machines. We're also announcing today a preview program for customers to get early access to Milan-X processors in Azure.

We're most excited about how these performance gains will help our customers and partners do their work better, as the significant improvements to memory latency and bandwidth with Milan-X are a big win. For example, Ansys Cloud is an integrated suite of engineering simulation tools and services all hosted on Azure. In early testing of HB-series with Milan-X, Ansys saw up to an 80% increase in the performance of their customers' aerospace simulations using Fluent. For other customer workloads, such as automotive crash test modeling, we're seeing up to 50% higher performance. That doesn't even begin to tell the story of the manifold increase customers can experience over most on-premises hardware in use today. Finally, we're extremely excited about the ability of Milan-X to advance the performance and total cost of RTL simulations in Azure.

This is the key HPC workload for digital and mixed-signal silicon companies. Milan-X brings some of the largest performance enhancements to RTL simulation in the modern history of silicon design. It's a giant leap forward for Azure to becoming the best platform in the world for silicon design, both now and far into the future. Our ongoing partnership with AMD and the innovations that we've seen along the way continue to move us forward and empower our customers to achieve more.

Lisa Su
Chair and CEO, AMD

Thank you, Jason. We're so appreciative of the partnership between AMD and Azure, and we're excited about the preview of Azure HPC powered by Milan-X. We're also working with the world's leading OEMs on Milan-X. Milan-X platforms will be broadly available from Cisco, Dell, HPE, Lenovo, and Supermicro. I'm excited to announce we're on track to launch Milan-X in the first quarter of 2022. Okay, now let's turn our attention to accelerated computing in the data center. Here, the demand from scientists and researchers for compute power to analyze and make sense of incredible amounts of data at the highest speeds has never been greater. GPUs are the accelerator of choice for these ultra-demanding workloads. We've been on a journey to build a leadership compute GPU architecture and roadmap.

Last year, we introduced CDNA, our first GPU architecture optimized specifically for the data center. With it, we delivered up to 11.5 teraflops of FP64 performance with our MI100 products. Today, we're introducing our first CDNA 2 architecture-based products. CDNA 2 was designed specifically to enable exascale computing. I'm very excited to introduce the AMD Instinct MI200 GPU, built with CDNA 2 architecture. The MI200 series delivers up to a 4.9x increase in HPC performance over the competition. It's just a massive step. With this leap in capability, MI200 will set new performance records across a broad set of HPC applications. MI200 delivers up to 1.2x higher peak flops of mixed precision performance for leadership AI training, helping to fuel the convergence of HPC and AI.

With MI200 and ROCm, the world's most powerful high-performance computing and AI platform, we're shortening the time between initial hypothesis and discovery. For example, drug interaction simulations that would take days to run can now provide researchers with results overnight. Now let me show you the top of the stack MI200 for the first time. It contains two CDNA 2 GPU dies for a total of 58 billion transistors in 6 nanometers. This allows for up to 220 compute units and 880 matrix cores, which is 1.8x more than MI100. It also contains up to eight stacks of HBM2E memory, making it the world's first GPU available with 128 GB of HBM2E. That's 4x more capacity and 2.7x more bandwidth than MI100.
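The generational ratios quoted above can be sanity-checked against published specs. This is a sketch; the MI100 baseline figures (480 matrix cores, 32 GB HBM2, roughly 1.23 TB/s of bandwidth) are assumed from AMD's public MI100 specifications, not from this presentation:

```python
# Sanity check of the MI200-vs-MI100 ratios quoted above.
# MI100 baseline figures are assumed from AMD's public MI100 specs.
mi100 = {"matrix_cores": 480, "hbm_gb": 32, "bw_gb_s": 1228.8}
mi200 = {"matrix_cores": 880, "hbm_gb": 128, "bw_gb_s": 3276.8}

matrix_ratio = mi200["matrix_cores"] / mi100["matrix_cores"]  # ≈ 1.8x the cores
capacity_ratio = mi200["hbm_gb"] / mi100["hbm_gb"]            # 4x the capacity
bandwidth_ratio = mi200["bw_gb_s"] / mi100["bw_gb_s"]         # ≈ 2.7x the bandwidth
```

Under these assumptions, all three ratios line up with the "1.8x more", "4x more capacity", and "2.7x more bandwidth" claims.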

Now to tell us more about the MI200 series and to see it in action, here's Forrest Norrod.

Forrest Norrod
EVP and General Manager of Data Center Solutions Business Group, AMD

Thanks, Lisa. Today, we're announcing two members of the MI200 family. The MI200 OAM, in production today, is a compact module that powers some of the world's most powerful supercomputers. The MI200 PCIe card will be available soon for a broad set of platforms and customers beyond supercomputing. The MI200 is amazing. Let's look at the three pillars that make it unique. The first is our AMD CDNA 2 architecture, which is designed to do one thing extremely well: run compute-intensive HPC and AI workloads. The second is our innovative packaging technology, which makes the MI200 the world's first multi-chip GPU. Finally, our third-gen Infinity Architecture delivers unified compute at exascale, with high-speed links and CPU, GPU, and memory coherence maximizing system throughput. Lifting the lid, you'll see the multi-die construction of the MI200.

Dual AMD CDNA 2 dies, four ultra-high-bandwidth, low-latency interconnects between them, eight stacks of HBM2E memory, and another eight Infinity Fabric links to connect to EPYC CPUs and other GPUs in the node. We put all of this together by continuing our packaging innovation. Today, we are introducing AMD EFB, Elevated Fanout Bridge, a silicon bridge technology. Unlike substrate-embedded silicon bridge architectures, EFB enables the use of standard substrates and assembly techniques, providing better precision, scalability, and yields while maintaining high performance. With all of this, the MI200 OAM is shattering performance barriers and delivers a multi-generational leap in performance. The MI200 OAM is 4.9x faster than NVIDIA's A100 GPU in peak FP64 performance. This is critical for HPC workloads requiring the highest level of precision, like weather forecasting. The MI200's peak FP32 vector performance is about 2.5x faster.

These are the types of vector operations used for vaccine simulations. MI200 matrix cores deliver 95.7 teraflops of FP32 matrix operations, great for high-precision machine learning and training. It also produces over 380 teraflops of peak FP16 and bfloat16 performance, 20% more than the A100. For data-intensive applications, the MI200 OAM has an industry-leading 128 GB of HBM2E memory that, as Lisa said before, has a staggering 3.2 TB/s of total bandwidth. Put it all together, and the MI200 is showing incredible performance on HPC benchmarks and science applications: about 3x higher than the competition on the AMG and HPL benchmarks, and about twice the performance across a range of HPC research applications like OpenMM, HACC, and LSMS. The MI200 is delivering the fastest application performance ever seen.
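The headline speedups can be reconstructed from peak-throughput numbers. This is a sketch under stated assumptions: the MI200 FP64/FP32 vector peak (47.9 TF) is taken from AMD's published MI250X specs, and the A100 peaks (FP64 9.7 TF, FP32 19.5 TF, FP16 tensor 312 TF) from NVIDIA's published A100 specs; neither set of baselines is quoted in this presentation:

```python
# Reconstructing the stated speedups from published peak-throughput numbers.
# MI200 vector peak and A100 baselines are assumptions from public spec sheets.
mi200_tf = {"fp64_vector": 47.9, "fp32_vector": 47.9, "fp16": 383.0}
a100_tf = {"fp64": 9.7, "fp32": 19.5, "fp16": 312.0}

fp64_speedup = mi200_tf["fp64_vector"] / a100_tf["fp64"]          # ≈ 4.9x
fp32_speedup = mi200_tf["fp32_vector"] / a100_tf["fp32"]          # ≈ 2.5x
fp16_uplift_pct = 100 * (mi200_tf["fp16"] / a100_tf["fp16"] - 1)  # ≈ 23%
```

The ratios fall out at about 4.9x FP64, about 2.5x FP32, and roughly a 20% FP16/bfloat16 uplift, matching the figures in the talk.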

Now let's look at another important research application, this time in molecular dynamics. Climate change brought on by greenhouse gas emissions is one of today's most pressing problems. To create more efficient combustion engines and fuels, scientists use high-performance computing to run simulations at the molecular level. To demonstrate the performance of the MI200, let's look at a combustion simulation of a hydrocarbon molecule using LAMMPS. LAMMPS is an open source molecular dynamics code widely used by researchers all over the world. On your left, we're running LAMMPS on four NVIDIA A100 SXMs. On the right, the same simulation is running on four MI200 OAMs. This is a simulation of a fuel rapidly expanding after detonation. It covers about 20 million atoms and captures the first nanosecond as the chemical bonds begin to break. This typically takes days to complete. Obviously, we've time-lapsed the simulation here.

The MI200 completes the simulation before the A100 has completed half of it. What does that mean? Scientists typically run hundreds or even thousands of these simulations to gain insights on new fuel alternatives or engine designs. With the MI200, the time to analyze new compounds is cut by more than half, potentially reducing characterization time from months down to weeks. This will dramatically accelerate the discoveries that reduce our emissions and carbon footprint globally. Now, we need to scale that performance to exascale. Our third-generation Infinity Architecture is the key foundational building block. The Infinity Architecture provides high-speed interconnects, unifying the CPU and all the GPUs in the node to deliver up to 800 GB/s of aggregate bandwidth. It also unifies the CPU and GPU memory with coherent connectivity, reducing data movement and simplifying memory management.
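The months-to-weeks claim follows from the demo's greater-than-2x per-simulation speedup applied across a whole screening campaign. The campaign size, per-simulation time, and exact speedup below are hypothetical, chosen only to illustrate the scaling:

```python
# Illustrating the months-to-weeks claim. The >2x per-simulation speedup
# follows from the demo (MI200 finished before the A100 reached halfway);
# the campaign numbers below are hypothetical.
speedup = 2.2                  # assumed, consistent with ">2x" from the demo
simulations = 600              # hypothetical screening campaign
a100_hours_each = 8.0          # hypothetical wall time per simulation

a100_weeks = simulations * a100_hours_each / (24 * 7)   # ≈ 28.6 weeks, i.e. months
mi200_weeks = a100_weeks / speedup                      # ≈ 13 weeks
```

Any campaign large enough to take months at the baseline rate drops below half that with the demonstrated speedup, which is the point being made on stage.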

This dramatically increases developer productivity and streamlines the programmability of GPUs. Finally, this unified architecture provides a critical leap in making it easier to accelerate legacy CPU and GPU codes to more quickly tap into the power of the MI200. Going even further, scientists are incorporating AI techniques now into HPC workloads to further accelerate data-driven research. AMD provides an open source GPU compute platform, ROCm, that supports all of the major machine learning frameworks. That means developers can use the most popular AI frameworks on all of our Instinct accelerators, including the MI200. With ROCm 5.0, MI200 will have the key ML models optimized, including ResNet, BERT, and DLRM. Now that we've showcased MI200 performance and capabilities, let's hear from one of our first customers on what it all means, and not just any customer.

I would like to invite Thomas Zacharia from Oak Ridge National Laboratory to share more about the beginnings of the exascale era with the MI200 powered Frontier.

Thomas Zacharia
Director, Oak Ridge National Laboratory

I'm Thomas Zacharia, the Director of Oak Ridge National Laboratory. Thank you for letting me be part of this amazing announcement of your GPUs and CPUs, which are driving and powering America's first exascale supercomputer, Frontier. Now, Frontier is going to be this amazing machine, an amazing scientific tool that is going to allow the dreams of many scientists from the world over to be realized, because they have this powerful tool that will allow them to calculate and simulate important challenges. As we think about the most compelling challenges facing our generation, it's about energy transitions, it's about climate change, and it's about the issues we are currently facing as a society, like tackling the pandemic.

Frontier is going to allow us to tackle these important challenges, using the capability of the machine, driven and powered by the AMD processors, which make the MI200 the most powerful processor ever made available to scientists. A single GPU is more powerful than an entire node of Summit, which is currently the fastest supercomputer in the United States. AMD has gone out of their way to make this a very efficient processor, and that makes Frontier a very efficient supercomputer. The MI200 is the culmination of a deep partnership between AMD and Oak Ridge National Laboratory, and Frontier is a partnership between AMD, Oak Ridge National Laboratory, and HPE. These important national challenges could not be met without commitment at the top.

We are installing the supercomputer as we speak, and we are excited to make this supercomputer available to our scientists and engineers early next year. What Frontier is going to do is to accelerate science and scientific discovery so that we can continue to tackle the important challenges facing humanity.

Forrest Norrod
EVP and General Manager of Data Center Solutions Business Group, AMD

Thanks, Thomas, for sharing your journey with us. All of us at AMD are extremely proud to be part of the efforts to bring Frontier, to bring the exascale era to life. On behalf of the engineering and support teams across AMD, as well as our strategic partners, the Cray team at Hewlett Packard Enterprise, I'd now like to show you Frontier. Here is America's first exascale system, Frontier, powered by AMD EPYC processors and MI200s. Frontier is currently being installed at Oak Ridge National Laboratory and will be coming online very soon and open to scientists next year. At AMD, we're proud to power the largest supercomputers, and with a growing list of partners supporting the MI200, customers at all scales will be able to choose from a range of platforms and solutions to fit their unique needs. AMD is on a journey in accelerated computing.

We will make the right engines available to accelerate targeted workloads. We will make them easier to use, and we will help solve some of the world's most challenging problems faster. We look forward to sharing more with you as we continue to push the boundaries in data center computing, making the best even better. Now let me welcome Lisa back, please.

Lisa Su
Chair and CEO, AMD

Thank you, Forrest. I have a couple more updates for you today. Now, you've seen this roadmap before. This is our server CPU roadmap that we've shown you for the last couple of years. We have executed very well to this roadmap, delivering Naples, Rome, and Milan to market on time and exceeding product expectations. The adoption of Milan has significantly outpaced Rome as our momentum builds. Today, I'm happy to provide an update on Genoa, which will be our flagship fourth-gen EPYC server processor. The engine driving Genoa is our next generation high-performance core, called Zen 4, built in industry-leading five-nanometer process technology. Five-nanometer is doing extremely well. We've worked with TSMC to optimize five-nanometer for high-performance computing, and it offers twice the density, twice the power efficiency, and 1.25x the performance of the seven-nanometer process we're using in today's products.

Genoa will be our first server CPU using that Zen 4 core in 5-nanometer. When introduced, we expect Genoa will be the world's highest performance processor for general purpose computing. It's designed to excel across a broad range of data center workloads, from enterprise to HPC to the public cloud. Genoa will extend our performance leadership at both the socket level and the per-core level with up to 96 Zen 4 cores. It supports next-generation DDR5 memory and PCIe Gen 5 IO, platform capabilities that fully complement the new Zen 4 core. The Genoa platform also includes support for the new CXL interface and will have breakthrough memory expansion capabilities for data center applications. I'm happy to tell you that Genoa looks great. We're now sampling to customers and on track for 2022 production and launch.

Finally, let's turn to the cloud. Cloud native workloads are a fast-growing class of applications that are developed, deployed, and updated rapidly. These applications are typically very throughput-oriented, and they can take advantage of a high number of threads. We have created a new version of Zen 4 specifically for cloud native computing, and we call that core Zen 4c. It's fully software compatible with Zen 4, with specific cloud enhancements, including a new density-optimized cache hierarchy that enables higher core-count configurations for cloud native workloads that benefit from maximum thread density. It also delivers significantly improved power efficiency and breakthrough performance per socket. We're bringing the Zen 4c core to market with Bergamo, our new cloud native server processor. Bergamo is a high core count, power-efficient CPU purpose-built for cloud native applications.

It offers up to 128 high performance Zen 4c cores to deliver breakthrough performance and power efficiency for cloud native workloads. It comes with all the same features as Genoa, including DDR5, PCIe Gen 5, CXL, and the full suite of Infinity Guard security features. Bergamo is also socket compatible with Genoa, sharing the same Zen 4 instruction set, and can be deployed on the same platforms that our customers and partners are qualifying now. We're on track to ship Bergamo in the first half of 2023. Now you see the new and expanded AMD EPYC CPU roadmap. Our investment in multi-generational CPU core roadmaps, combined with advanced process and packaging technology, enables us to deliver leadership across general purpose, technical computing, and cloud workloads.

We're extremely excited about the value that our next generation EPYC processors will deliver and look forward to bringing them to market. To wrap up, our CPU, GPU, and process and packaging innovations are enabling AMD to deliver leadership performance across the data center. As we come to the end of our time together today, I hope you now see why we're so excited about our vision and plans for the accelerated data center. You can count on us to continue to push the envelope in high-performance computing. Thank you for joining us today.
