Accelerating AI/ML Inferencing With GDDR6 DRAM – SemiEngineering

  • Lauren
  • June 10, 2021

The origins of graphics double data rate (GDDR) memory can be traced to the rise of 3D gaming on PCs and consoles. The first graphics processing units (GPUs) were paired with single data rate (SDR) and double data rate (DDR) DRAM, the same solution used for CPU main memory. As gaming evolved, the demand for higher frame rates at ever higher resolutions drove the need for a memory solution designed specifically for graphics workloads.
Ultimately, the success of gaming PCs and consoles made an application-specific graphics memory commercially viable. In 2003, NVIDIA launched the GeForce FX 5700 Ultra, which was built on 130 nm silicon process technology and paired with 256 MBytes of GDDR2 DRAM. It should be noted that most of the 5000 series (the fifth generation of GeForce GPUs) still used DDR DRAM. It was not until late 2018 that GDDR was deployed across the entire GPU product lineup. In those 15 years, GDDR capacity had increased over 40X to 11 GBytes of memory for the high-end GeForce RTX 2080 Ti. With per-device bandwidth up by a factor of 8 from GDDR2 to GDDR6, aggregate memory bandwidth rose from 29 gigabytes per second (GB/s) in the FX 5700 Ultra to 616 GB/s in the RTX 2080 Ti.
Today, GDDR6 is a state-of-the-art graphics memory solution with performance demonstrated to 18 gigabits per second (Gbps) per pin, for a per-device bandwidth of 72 GB/s. GDDR6 DRAM employs a 32-bit wide interface composed of two fully independent 16-bit channels. For each channel, a write or read memory access is 256 bits, or 32 bytes. A parallel-to-serial converter translates each 256-bit data packet into sixteen 16-bit data words that are transmitted sequentially over the 16-bit channel data bus. Because of this 16n prefetch, each pin transfers 16 bits per internal array cycle, so an array cycle time of 1 ns corresponds to a data rate of 16 Gbps per pin.
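The 16n prefetch described above can be illustrated with a short Python sketch. The function names `serialize_access` and `pin_data_rate_gbps` are hypothetical, and the word ordering (lowest-order bytes first) is an assumption made for illustration only; it is not taken from the JEDEC specification.

```python
PREFETCH = 16        # 16n prefetch: 16 bits per pin for each array access
CHANNEL_WIDTH = 16   # width of one channel's data bus, in bits
ACCESS_BITS = PREFETCH * CHANNEL_WIDTH  # 256 bits = 32 bytes per access


def serialize_access(packet: bytes) -> list:
    """Split one 32-byte access into sixteen 16-bit words.

    The little-endian word order used here is an illustrative assumption.
    """
    if len(packet) * 8 != ACCESS_BITS:
        raise ValueError("a channel access must be exactly 32 bytes")
    value = int.from_bytes(packet, "little")
    mask = (1 << CHANNEL_WIDTH) - 1
    # Each word is what the 16-bit bus carries in one transfer slot.
    return [(value >> (i * CHANNEL_WIDTH)) & mask for i in range(PREFETCH)]


def pin_data_rate_gbps(array_cycle_ns: float) -> float:
    """Per-pin data rate: PREFETCH bits leave each pin every array cycle."""
    return PREFETCH / array_cycle_ns


words = serialize_access(bytes(range(32)))
print(len(words))               # 16
print(pin_data_rate_gbps(1.0))  # 16.0
```

The point of the sketch is that the internal array only has to cycle once per 256-bit access, while the I/O runs sixteen times faster to drain it.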
Artificial intelligence and machine learning (AI/ML) applications share the same raw hunger for bandwidth as 3D gaming. For training, memory bandwidth and capacity are especially critical because the size and complexity of neural networks continue to increase by an average of 10X per year. Neural network accuracy depends on the quality and quantity of examples in large training data sets. GDDR6 performance and cost benefits will enable the migration of training to the edge of the network.
For inference, memory throughput and low latency are critical, especially when real-time action is needed. This is because an inference engine may need to handle a broad array of simultaneous inputs. For example, an autonomous vehicle must process visual, LIDAR, radar, ultrasonic, inertial, and satellite navigation data. As inferencing moves increasingly to AI-powered endpoints, the need for a memory solution that is manufacturing-proven is paramount. With reliability demonstrated across millions of devices, cost efficiency, and outstanding bandwidth and latency performance, GDDR6 memory is an excellent choice for AI/ML inferencing.
Designed for performance and power efficiency, the Rambus GDDR6 memory subsystem supports the high-bandwidth, low-latency requirements of AI/ML for both training and inference. It consists of a co-verified PHY and digital controller, which together provide a complete GDDR6 memory subsystem. The Rambus GDDR6 interface is fully compliant with the JEDEC GDDR6 JESD250 standard, supporting up to 16 Gbps per pin. The GDDR6 interface supports two channels, each with 16 bits, for a total data width of 32 bits. At 16 Gbps per pin, the Rambus GDDR6 interface offers a bandwidth of 64 GB/s.
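The bandwidth figures quoted in this article follow from a simple relation: per-pin data rate times data width, divided by 8 bits per byte. A minimal Python sketch, using a hypothetical helper name `interface_bandwidth_gbytes` and the channel layout stated above as defaults:

```python
def interface_bandwidth_gbytes(pin_rate_gbps: float,
                               channels: int = 2,
                               bits_per_channel: int = 16) -> float:
    """Return interface bandwidth in GB/s for a given per-pin rate in Gbps."""
    data_width_bits = channels * bits_per_channel  # 32 bits for GDDR6
    return pin_rate_gbps * data_width_bits / 8     # 8 bits per byte


print(interface_bandwidth_gbytes(16))  # 64.0
print(interface_bandwidth_gbytes(18))  # 72.0
```

The first call reproduces the 64 GB/s figure at 16 Gbps per pin, and the second reproduces the 72 GB/s per-device figure cited earlier for 18 Gbps.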
Rambus works directly with customers to create an optimized chip layout by providing full-system signal and power integrity (SI/PI) analysis. Customers receive a hard macro solution with a full suite of test software for quick turn-on, characterization, and debug. Rambus also supports customers with both board and package design, which is critical when designing systems that run at 16 Gbps.
AI/ML applications continue to evolve at a lightning pace. Training capabilities, which are increasing at a rate of 10X per year, are driving the development of specialized AI accelerators. Meanwhile, AI inference capabilities are deploying across the network edge and in a broad spectrum of IoT devices, as well as automotive/ADAS applications. As discussed above, training and inference have unique application requirements, with GDDR6 being an excellent choice for AI/ML inference. Designers can harness all the benefits of GDDR6 memory for their AI accelerator SoCs with high-performance memory interface solutions from Rambus.

Frank Ferro

Frank Ferro is senior director of product marketing for IP cores at Rambus.