Chip shortages are forcing fabs and OSATs to maximize capacity and assess how much benefit AI and machine learning can provide.
This is particularly important in light of growth projections from market analysts. The chip manufacturing industry is expected to double in size over the next five years, and collective improvements in factories, AI, databases, and tools will be essential to achieving that level of productivity.
“We’re not going to fail on this digital transformation, because there’s no option,” said John Behnke, general manager in charge of smart manufacturing at Inficon. “All the fabs are collectively going to make 20% to 40% more product, but they can’t get a new tool right now for 18 to 36 months. To leverage all this potential, we’re going to overcome the historical human fear of change.”
Moreover, that change needs to have a clear ROI. “For me, everything comes down to cost,” said Abeer Singhal, CEO of Sentient, a start-up provider of AI-driven APC software. “Why should we move data to the cloud? Because we want it to be accessible, and we want to be able to compute it. There are downloading, storage, and computing costs, but engineers want to be freed from having to call IT for everything. They want to collect high frequency data and make smart decisions at the same time.”
One of the big challenges is a highly risk-averse manufacturing sector, which has seen significant gains through mostly incremental improvements. “The semiconductor industry has lots of technology-type advancements, but we’re usually very slow in making business changes,” said Bill Pierson, vice president of Semiconductor and Manufacturing at KX Systems, a database supplier. “Part of that is just because you’re in a factory that has been built, it’s running and getting high yields, so why change it? But we’re seeing top-down management with a strategy of trying to break down the data silos, making sure that the data being collected is going to be available to the engineers across all the necessary areas.”
Others point to similar trends. “Humans don’t do change well,” said Scott Tenenbaum, head of nanobioscience and professor at SUNY Polytechnic Institute. “COVID was a great example of where people tried things they would have never, ever tried unless they had to. A lot of our technology is like that. An old technology goes away and you have no choice but to use the new technology.”
During a panel discussion at SEMI’s Advanced Semiconductor Manufacturing Conference last week, participants pointed to 10 trends/recommendations involving AI/ML and fabs worldwide:
1. AI/ML in semiconductors is expected to generate $100B in value by 2025;
2. Engineers pick low-hanging fruit in scheduling and defect classification;
3. Digital twins and analysis are enabling predictive maintenance;
4. Non-value-adding steps may be skipped, shortened, and/or moved;
5. Fabs are now hiring data engineers;
6. Big data is good, but the right data is better;
7. The tool states standard (SEMI E10) helps transparency;
8. ML in test balances yield, defects, test cost;
9. Industry reluctance is overcome by the promise of ROI; and
10. Database security must be built in.
Panelists (L-R): John Behnke of Inficon, Scott Tenenbaum of SUNY Polytechnic, Bill Pierson of KX, and Abeer Singhal of Sentient. Moderator: Laura Peters/Semiconductor Engineering. Source: SEMI
SE: A recent report from McKinsey indicated that AI and machine learning generated about $5 billion to $8 billion in chip revenues, or around 10% of total device revenues. This is expected to grow to around $100 billion by 2025 (see Fig. 1). Do you agree with that estimate?
Behnke: I certainly see the ability of semiconductor manufacturing to generate 15% more value in the semiconductor industry using AI, ML and what I call advanced math. I don’t think that means it generates another industry where the fabs are giving folks like us $95 billion, but the fabs will be able to leverage the advanced capabilities to generate at least another $95 billion in five years.
Fig. 1: AI/ML in semiconductors generated $7B in value in 2021, or 10% of chip revenues; that is expected to rise to 20% of device revenues, perhaps $90B, by 2025. Source: McKinsey & Co.
Pierson: We’re trying to become much more productive and reduce costs for the fab largely by increasing the productivity of these engineers in the workforce. And the workforce is a critical part of it.
SE: Regarding that, where is the low-hanging fruit for AI/ML implementation in fabs?
Singhal: Big data and AI algorithms represent a paradigm shift for an APC engineer. Complex process models can be built in minutes rather than days. For instance, an AI-assisted run-to-run controller can pull inline SPC data, marry it with 100+ FDC and yield indicators to provide insight into the health of the system, and recommend improvements.
Another AI use case would be to build adaptive tool-state models that prevent excessive send-ahead runs following a scheduled or unscheduled tool event. The potential is endless.
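The run-to-run loop Singhal describes often boils down to a feedback update on each lot’s metrology result. The sketch below is a minimal illustration using a textbook EWMA controller; the process model, gain, and parameter names are assumptions for illustration, not any vendor’s implementation.

```python
# Minimal exponentially weighted moving average (EWMA) run-to-run
# controller sketch. The process model, gain, and names are
# illustrative assumptions, not an actual APC product's internals.

class EwmaR2RController:
    """Adjusts a recipe setpoint so the measured output tracks a target."""

    def __init__(self, target, gain, lam=0.3, setpoint=0.0):
        self.target = target      # desired inline measurement (e.g. film thickness)
        self.gain = gain          # assumed process gain: output units per setpoint unit
        self.lam = lam            # EWMA weight on the newest disturbance estimate
        self.setpoint = setpoint  # current recipe setpoint
        self.disturbance = 0.0    # EWMA estimate of the process offset

    def update(self, measurement):
        """Feed back one run's metrology result; return the next setpoint."""
        # Estimate the offset between prediction and measurement.
        predicted = self.gain * self.setpoint + self.disturbance
        error = measurement - predicted
        self.disturbance += self.lam * error
        # Pick the setpoint expected to hit the target given the new estimate.
        self.setpoint = (self.target - self.disturbance) / self.gain
        return self.setpoint


ctrl = EwmaR2RController(target=100.0, gain=2.0)
sp = ctrl.update(95.0)  # measured high offset -> controller compensates on the next run
```

In a fab setting, the `measurement` would come from inline metrology, and the disturbance estimate could additionally be conditioned on the FDC indicators Singhal mentions.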
Behnke: By far, the application with the highest ROI for fabs is scheduling. And tools have to deliver value within six months or less, whether in ROI, cycle-time improvements, or other KPIs. Smart manufacturing largely means taking these historical, data-rich environments that engineers have been working in for decades and upscaling them to create a digital twin, which is kind of like a simulator on steroids, using metrology, sensors, and other data sources, such that the twin actually has a much higher-fidelity understanding of the factory. That digital representation looks at the current events in front of you and the options, leverages the value of its historical ML-based learning, and quickly determines what should be done next. The obvious one is scheduling. What lot should I put on what tool, and in what order? More importantly, what should the tools be set up to do? This applies to APC, FDC 2.0 (fault detection and classification), and so on. Chip manufacturers have the tools to do this at the fab level today, and what will be really exciting is when, in five years or so, all factories within a company take advantage of these tools, from fab to assembly and packaging.
Singhal: My customers want to be freed from having to call IT for everything. That’s a big driver in pushing everything to the cloud. But they also want to be able to download at 100 gigs a second. So we’re talking about bandwidth, and there’s a cost to that, and to storing the data and making it available. Then you have the chips that are coming out now which are being purposely designed to be able to crunch that kind of information on chip, instead of being in different systems that are working together.
Fig. 2: Advanced algorithms can crunch historical data with actionable data to enable real-time decisions, where value of the decisions is highest. Source: Gartner
SE: How did the semiconductor shortage change chip manufacturing?
Behnke: The world is completely different today than it was, so people now understand that we need to start doing things smarter. And the executives are on board, and their boards are on board, and they’re under tremendous pressure. Now, in board meetings, they’re being asked, ‘What’s your smart manufacturing strategy?’ That was unheard of two years ago, before COVID.
Pierson: It’s also important to apply context. We’re moving to a data-centric world, and one of the changes I see happening is that Tier 1 chip companies all have these groups of what are called data engineers. These are not data scientists, and they are not subject matter experts. They are data engineers who go through the organization and prep the data for people throughout that organization to use. As this data explosion occurs, there needs to be a recognition that this is a defined role; someone has to format the data and get it into the hands of the subject matter experts, the engineers who need to use it. Almost all the data has a timestamp, so maybe we could index off the time series.
SE: What does the learning curve look like for legacy fabs, in particular?
Pierson: This is a journey, and some companies are more advanced than others. Some engineers might be using a pencil and pad of paper, and they just need to be able to store the data and want a dashboard. That is an early part of the journey. Some are talking about doing digital twins and expanding throughout the whole factory. This journey will continue, and grading how we’re doing will take 5 to 10 years. Every fab is different, and you have to meet them where they are.
Fig. 3: All aspects of factory operation can take advantage of data crunching and analytics performed using the digital twin. Source: Inficon
Behnke: One of the great things about AI is you can explore so many possibilities. It doesn’t just say, ‘Give me that parameter.’ It says, ‘Give me every parameter and every combination.’ That’s a really powerful thing. There is no human I’m aware of who could do that. And remember, you are adding new sensors, so you’re going to add more intelligence to your system, and AI is the best technology we’ve come across for seeing the things you should be looking at that you never would have thought of.
Tenenbaum: The counterpoint is that AI is over-hyped. It is quite good at identifying patterns that are only identifiable when looking at really large datasets. It’s not so good at identifying random events. A good example of that is AI technology for predicting the stock market. It worked great when it was based on professional traders. But now we’ve got a bunch of day traders on Robinhood, and there are wildcards like Bitcoin, which analytics isn’t very good with. So when you’re talking about how to interject AI into product manufacturing, it will prove very useful at predicting things that are predictable. But one-off events that can be just as costly will prove very challenging to detect.
SE: When the industry went to 300mm wafers, there was collaboration among companies and SEMI standards. What do you see happening here?
Behnke: There is a new update to the historical E10/E79 standard (SEMI E10 specification for equipment reliability, availability and maintainability, which enables better tracking of tool chambers in modular systems for better utilization), which includes many more state levels. Perhaps it hasn’t been that widely publicized. People are starting to adopt it and it will be a big help as these solutions move forward.
The intense focus on good wafers puts a lot of pressure on reducing the non-value-added steps, specifically metrology and testing of wafers and devices. In test, however, system-level test is being added to ensure detection of latent defects, which can impact reliability.
“The leaders like Nvidia, AMD, and Intel have been doing it for years, but recently more companies have been doing system-level test sooner to perform functional test on memory and other blocks,” said Dave Armstrong of Advantest. “And they’re finding that two instructional tests are not sufficient, so they need to do high-speed testing to ensure a known good chiplet, for example.”
The need for additional testing is largely driven by new knowledge of latent defect problems, information gained through learning platforms. “Advanced data analytics is providing insight into outlier detection and guiding us in tester and test program design,” Armstrong said.
Semiconductor testing also benefits greatly from the ability to cut through the masses of test data. A large test facility can generate up to 4TB of data per day, which is used in feedback processes to improve yield and quality.
Ken Lanier, director of strategic business development at Teradyne, noted that multiple sensors on testers also monitor the die’s voltage, temperature, and other parameters, which together with machine learning enables real-time modification of test processes.
“Since testing is the last step prior to releasing to production, there’s intense pressure to debug enormous software programs and test pattern data in weeks, because a programming error could cause an IC producer to throw away millions of dollars of good devices — or worse, ship bad devices. Tradeoffs among yield, defect rates, and test cost drive intense investment in design simulation, test programs, and machine learning tools that identify the slightest anomaly. They flag issues on test equipment and shorten the debug cycle time,” Lanier said.
Going forward, data security is another issue that needs to be addressed, because it serves as a barrier to data sharing. “These databases need to be very secure environments because nobody wants to share data for IP reasons,” said Inficon’s Behnke. “Nobody wants to share data at all, because they’re additionally worried about the security aspects.”
The semiconductor industry is working hard to protect its own on-chip data, but it also is extending that effort to include electronic systems and databases in its factories. This may take time, but it’s viewed as a necessary step as silos are broken down and data crosses traditional demarcation lines.