Making Sure AI/ML Works In Test Systems

Artificial intelligence/machine learning is increasingly being used to find patterns and outlier data in chip manufacturing and test, improving the overall yield and reliability of end devices. But there are too many variables and unknowns to reliably predict how a chip will behave in the field using AI alone.

Today, every AI use case, whether a self-driving car or an industrial sorting machine, solves a specific problem, which makes it unique. Likewise, AI-based systems for semiconductor manufacturing and test are geared to solving specific problems in the fab or packaging house. The problem is keeping the ML algorithms, which are the core of the AI system, up to date as conditions change over time. The algorithms and models need to adapt to changes in both the equipment and the devices being manufactured and packaged.

“We will likely have to establish a continuous training and monitoring process,” said Keith Schaub, vice president of technology and strategy at Advantest America. “Processes drift, which means the data drifts, which means you need to continuously monitor your data and trigger retraining as the process drifts. We know how to do this. The challenge is to know how much to train and how often to re-train. How much drift before I trigger retraining?”
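The monitoring loop Schaub describes can be illustrated with a minimal Python sketch. It assumes a single numeric parameter, a fixed reference window captured when the model was trained, and a simple mean-shift score. The function names and the threshold of 3.0 are hypothetical choices for illustration, not values from Advantest, and choosing that threshold is exactly the open question he raises.

```python
from statistics import mean, stdev


def drift_score(reference, recent):
    """Shift of the recent mean from the reference mean,
    expressed in reference standard deviations."""
    ref_std = stdev(reference)
    if ref_std == 0:
        raise ValueError("reference window has no variation")
    return abs(mean(recent) - mean(reference)) / ref_std


def needs_retraining(reference, recent, threshold=3.0):
    """True when drift exceeds the (assumed) retraining threshold."""
    return drift_score(reference, recent) > threshold


# Hypothetical measurements of one test parameter.
reference = [1.00, 1.02, 0.98, 1.01, 0.99, 1.00, 1.03, 0.97]
stable = [1.01, 0.99, 1.00, 1.02]
drifted = [1.10, 1.12, 1.09, 1.11]

print(needs_retraining(reference, stable))   # False
print(needs_retraining(reference, drifted))  # True
```

A production monitor would likely track many parameters at once and use a proper statistical test, but the structure is the same: compare recent data against a baseline and retrain when the gap crosses a limit.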

This whole process becomes significantly more complex when AI/ML systems are used to create AI/ML chips, and it will become still more complex as machines are used to train other machines. Measurements in AI are reported in probabilities and distributions rather than fixed numbers, and any inherent variation in a number of process steps, including packaging, can be incremental and additive.

Testing ML algorithms and models used in fabs
Basic techniques for checking ML algorithms and AI systems do exist, and there are numerous ways to validate how effective an ML algorithm is for manufacturing and test flows. But even with those established techniques, a successful AI implementation has to take into account what happens over time, such as changes in the fab or the assembly house.

“When you’re testing your model in the fab, you want to do a real-life production simulation that’s cognizant of time,” said Jeff David, vice president of AI solutions for PDF Solutions. “There’s a number of ways that we test it in the fab. For example, you have blind holdout datasets. Basically that means you have a validation data set that is completely separate from data that’s used to train or choose the models. That dataset isn’t exposed at all during the training phase. There are a number of different ways you can do that validation. One of the most famous ways is called k-fold cross-validation, where k stands for any number of integers.”

In practice, k is a single integer, the number of folds, so there may be 8-fold or 10-fold cross-validation. A 10-fold cross-validation means breaking up the dataset into 10 chunks.

“Let’s say you break up the dataset that’s either completely random or stratified in some way across lot boundaries that you want to choose,” David said. “You don’t want to train and test on data from the same lot. That would be cheating, because in the real world you’re never going to have a situation where you can do that. So the idea is you break up the dataset into 10% chunks. Chunk A is 10% of the data, and that can be randomly selected, of course, stratified across lot boundaries. And then you have 10% chunks of that to make up the whole dataset. Ten-fold cross validation means you would basically rotate through all 10 of those chunks, train 90% of the data, and then test on the other steps. That’s done on the remaining 10%. And then you rotate to the next chunk, and the next chunk, and the next chunk. You do that 10 times. You train and test 10 times. Then you’re getting a good feel for that data set or how robust your model is across all that different data.”
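The rotation David describes, with the added constraint that a lot never appears in both the training and test sets, can be sketched in plain Python. This is an illustration under stated assumptions, not PDF Solutions’ implementation: `lot_grouped_folds`, `cross_validate` and the `train_and_score` callback are hypothetical names, and whole lots are dealt to folds round-robin after a seeded shuffle.

```python
import random
from collections import defaultdict


def lot_grouped_folds(lot_ids, k=10, seed=0):
    """Split sample indices into k folds so that no lot spans two folds."""
    by_lot = defaultdict(list)
    for index, lot in enumerate(lot_ids):
        by_lot[lot].append(index)
    lots = sorted(by_lot)
    if len(lots) < k:
        raise ValueError("need at least k distinct lots")
    random.Random(seed).shuffle(lots)
    folds = [[] for _ in range(k)]
    for position, lot in enumerate(lots):
        folds[position % k].extend(by_lot[lot])
    return folds


def cross_validate(lot_ids, train_and_score, k=10, seed=0):
    """Rotate through the folds: train on k-1 of them, test on the one held out."""
    folds = lot_grouped_folds(lot_ids, k, seed)
    scores = []
    for held_out in range(k):
        test_idx = folds[held_out]
        train_idx = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        scores.append(train_and_score(train_idx, test_idx))
    return scores


# Hypothetical data: 100 dies from 20 lots of 5 dies each.
lot_ids = [f"LOT{n // 5:02d}" for n in range(100)]

# Placeholder scorer that reports the test fraction; a real one would
# fit a model on train_idx and evaluate it on test_idx.
scores = cross_validate(lot_ids, lambda train, test: len(test) / (len(train) + len(test)))
print(len(scores))
```

Folds come out at exactly 10% each only when lots are the same size. Teams using scikit-learn may prefer its `GroupKFold` splitter, which serves the same purpose.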

The big problem is that conditions in the fab are in motion, both physically and temporally. Sensors in tools drift, and equipment is constantly re-calibrated. In addition, many of the algorithms themselves are updated, while not all of the equipment in a fab or assembly house is identical. Simulations need to be overlaid across these models to incorporate all of these changes, and it’s not trivial to build that kind of precision into the models.

“There’s going to be drift and shift in the data that’s coming out of your tools,” said David. Over time, the tool settings may change. An operator may make a change to the tester that affects the data collected from sensors. One way to add a temporal component to algorithm testing is to take data you already have, for instance a year’s worth, break it into time periods, simulate training period by period, and then compare the predictions to ground truth. “You basically move through time as you’re constantly training and testing your model, as if you’re in a real-life production simulation. And then you see how it holds up, because in that case you have the ground truth,” said David.
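A time-aware simulation of this kind is often called walk-forward validation. The sketch below is one minimal way to generate the splits, assuming each sample carries a sortable period label such as a month number or an ISO date string. The function name and the `min_train_periods` parameter are hypothetical.

```python
def walk_forward_splits(period_labels, min_train_periods=3):
    """Yield (train_idx, test_idx) pairs that move forward through time:
    train on every earlier period, test on the next one."""
    periods = sorted(set(period_labels))
    for cut in range(min_train_periods, len(periods)):
        train_periods = set(periods[:cut])
        test_period = periods[cut]
        train_idx = [i for i, p in enumerate(period_labels) if p in train_periods]
        test_idx = [i for i, p in enumerate(period_labels) if p == test_period]
        yield train_idx, test_idx


# Hypothetical year of data: 4 samples per month, months numbered 1 to 12.
months = [m for m in range(1, 13) for _ in range(4)]
splits = list(walk_forward_splits(months, min_train_periods=3))

first_train, first_test = splits[0]
print(len(splits), len(first_train), len(first_test))  # 9 12 4
```

Because the test period always follows the training periods, the model is never scored on data older than what it was trained on, which mirrors how it would be used in production.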

In the past, these kinds of issues could be dealt with by adding margin into manufacturing processes. But particularly at advanced nodes, and in heterogeneous packages where some of the chips are developed at leading-edge nodes, tolerances are increasingly tight and precision is required. Add too much margin, and reliability can suffer. Add too little, and yield can suffer. While AI/ML can help identify some of these problems, the data generated by these systems has to be processed in the context of a lot of moving pieces. So now, rather than taking snapshot-in-time measurements, those measurements need to incorporate simulations across various periods of time.

On one level, all of this can be broken into manageable parts. “The industry uses training data and verification datasets,” said Advantest’s Schaub. “The verification datasets are used to sanity-check that the ML is working properly.”

Fig. 1: A basic fab-based AI system. Machine learning is an algorithm that learns from the data to create a model. Once trained and deployed, the model can make predictions. The AI system is built around it. Source: PDF Solutions/Semiconductor Engineering

On another level, there are now a lot of unknowns in this process, so it doesn’t hurt to have a backup strategy against which to assess all of these changes. Adding more sensitivity into manufacturing and test equipment can help significantly in this regard. In fact, the less sensitive the tools, the less chance of success for ML models, said Yudong Hao, senior director of marketing at Onto Innovation.

“First, in metrology, the number one thing is that you need to have sensitivity,” Hao said. “Your tool must have the sensitivity to the dimensional change that is happening in the process. Without any sensitivity, no machine learning or any other technology will help you. Secondly, because of the low sensitivity and the complexity of the device we’re measuring, using classic physics-based modeling technology is no longer sufficient. That’s where machine learning comes into play. On the other hand, machine learning itself may not be the sole solution. Physics is still important.”

AI chips vs. AI equipment
With AI/ML, things get confusing very quickly because AI/ML technology increasingly is being used in the manufacturing of AI/ML chips.

“To test chips in the fab, inspection and metrology processes are employed for defect detection,” said Tim Skunes, vice president of R&D at CyberOptics. “AI chips can be inspected in similar ways to other chips during the manufacturing process.”

But these chips also can look and behave quite differently from other chips. “In some aspects, an AI chip as yet is just a really complicated SoC,” said Randy Fish, director of marketing for Silicon Lifecycle Management at Synopsys. “However, the architectures or the micro-architectures are fundamentally different from what we’re used to in SoCs, historically.”

For one thing, AI is bifurcated by training and inference. “Those two environments are very different in their constraints. But as far as how it is tested, it comes out of the fab and you get some information from the fab, some test information, some wafer test stuff. And then you go into wafer-level testing, so you’re at your OSAT, and they’re doing either logic BiST or memory BiST, or your DFT,” said Fish. “We work with a number of AI chips. For us, in a lot of cases it’s yet another test challenge. It’s very hierarchical. An interesting side is that a lot of these chips are arrayed structures. And so there are ways that you can attack the testing problem that way.”

In an AI chip, the arrayed structures are used to create the network. But instead of producing assembly code or mapping to a binary, which happens in a standard processor, an AI chip maps to a network.

“You go through the training phase, and it creates a network where there are weights,” he explained. “Then those get mapped onto the chip that has no personality until you provide this network. So that’s the first kind of programming. And then you flow data across that network, and it’s doing its inferencing. It’s inferring things from that. We don’t test at that level, right, but that’s similar to if we’re working with an application processor on a phone, we’re not testing all those functions. Structural tests and system-level testing is a whole area unto itself.”

AI chips can spend more time on the tester, and repair is part of the picture. “We’re involved with some that are very large, reticle-limited designs, particularly on the training side more than anything,” said Fish. “Test time is sensitive because they’re going to be on a tester for an extended period. And there’s also repair. In these larger array structures, you’re not just doing memory repair. You can actually repair processors, as in you may leave a processing element out during tests. You can individually test the processing units, and if one is bad, it just gets mapped out. And so there is more at a macro level of test and repair at this point.”

Making such changes may then require going back and checking the software compiler. “With this redundancy, or leaving out a processor and remapping, the compiler of course needs to understand,” said Johannes Stahl, senior director of product marketing for emulation and prototyping at Synopsys. “Hence, the need to test this compiler capability again through silicon.”
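The map-out idea can be pictured as a table from logical to physical processing elements that the compiler consults. The Python sketch below is purely illustrative and assumes a flat array with spare elements. The function name and data layout are hypothetical and do not describe any vendor’s repair flow.

```python
def map_out_bad_elements(test_passed, required):
    """Build a logical-to-physical map that skips failed processing elements.

    test_passed: per-element test results, indexed by physical position.
    required:    number of working elements the compiler expects.
    Returns None when too few elements passed to repair the die.
    """
    good = [physical for physical, ok in enumerate(test_passed) if ok]
    if len(good) < required:
        return None
    return dict(enumerate(good[:required]))


# Hypothetical die: 8 physical elements, 7 needed, element 2 failed test.
results = [True, True, False, True, True, True, True, True]
print(map_out_bad_elements(results, required=7))
# {0: 0, 1: 1, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7}
```

A compiler that targets logical element 2 would land on physical element 3 on this die, which is why the remapping path has to be verified on silicon as well.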

Put in perspective, the chip is changing, the algorithms both on-chip and in the test equipment are changing, and the manufacturing and packaging equipment sensors that utilize AI are drifting. So in addition to looking at all of this globally, the different pieces have to be addressed individually.

“An AI-enabled chip is built on top of a CPU and/or GPU,” said Schaub. “Thus, testing at the transistor level remains largely unchanged. The challenge becomes once there is an embedded AI algorithm, where the algorithm could be a ‘black box’. We need to come up with a reliable method to ensure the black box is performing properly.”

That requires a way of assessing the accuracy of those tests, and machine learning is being applied here. “Not all machine learning systems are equal,” said CyberOptics’ Skunes. “You want your machine learning algorithm to be effective. You want to get good performance really quickly. For example, machine learning algorithms such as AI2, where you teach by showing good/defect-free images, or images of defects, can improve processes and yields. The operator can quickly teach, then monitor, learn from the results, and improve and adapt by updating the training sets if required. We design our machine learning algorithms to be biased with the goal of no escapes, so no bad product leaves the factory.”
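Biasing a classifier toward no escapes usually means accepting some overkill, that is, good parts flagged as defective. The sketch below shows one simple way to see that trade-off on a labeled validation set. It is not CyberOptics’ algorithm; the function names, scores and labels are hypothetical.

```python
def zero_escape_threshold(scores, is_defective):
    """Highest threshold that still flags every known defective part,
    where a part is flagged when its score >= threshold."""
    defect_scores = [s for s, bad in zip(scores, is_defective) if bad]
    if not defect_scores:
        raise ValueError("need at least one known defective part")
    return min(defect_scores)


def overkill_rate(scores, is_defective, threshold):
    """Fraction of good parts rejected at this threshold."""
    good_scores = [s for s, bad in zip(scores, is_defective) if not bad]
    if not good_scores:
        return 0.0
    return sum(1 for s in good_scores if s >= threshold) / len(good_scores)


# Hypothetical defect scores from a model, with known ground truth.
scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.60, 0.80, 0.95]
is_defective = [False, False, False, True, False, False, True, True]

threshold = zero_escape_threshold(scores, is_defective)
print(threshold, overkill_rate(scores, is_defective, threshold))  # 0.35 0.4
```

The threshold guarantees zero escapes only on the data used to set it, so it needs to be rechecked as training sets are updated.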

The final step in the fab is making sure the AI chip or system functions as expected, which is the job of system-level test. “From fab to wafer to package (FT), the test will still be at the transistor level, so not much changes there. It will be at system-level test, where the software is loaded with the AI algorithms, that things get interesting,” said Schaub. “As long as the AI/ML is static, which is where things are, this shouldn’t be much of a problem in the near term. Things will get interesting once we start deploying self-learning systems. With self-learning systems, we will likely see specific AI calibration and diagnostics deployed in parallel that continuously monitor and check the AI itself.”

PDF’s David agrees. “You should constantly be validating your system, constantly. You can get confident that works on some past data. But going forward, am I really confident enough I’m going to release my production to this thing and trust that the system is going to work?” The answer is that trust is rarely, if ever, 100%.

AI testing before fab — emulation
While all of this needs to be understood in the fab, it also needs to be fed back into the design process, where it can be simulated and incorporated into design-for-test plans. Much work needs to be done on this front. AI software is not yet ready for emulation in the early design phases of AI systems.

“In the last 5 to 10 years, we had the canonical architecture of the CPU and the GPUs, and the memories and peripherals,” said Synopsys’ Stahl. “The whole design community knows how to do this, and they have the software to run on these chips already available, such as an Android or an iOS, or whatever needs to run on these silicon chips. So the problem was mostly to actually bring up the software on the hardware and try to do this pre-silicon to make sure that later on in the field there were no surprises — or later on in the bring-up, just after manufacturing, there were no surprises. The industry understood that over the last 10 years. We came in over the last 5 years with very fast emulation technology to allow this software bring-up process prior to silicon. So that was all done for normal CPU or processor-based chips.”

AI is different, and the compiler with AI software can be an issue. “AI increased that problem by a whole level, and here’s why,” said Stahl. “In AI, the software doesn’t exist as a standard software. It’s all application-specific. And not only does the software not exist for every one of these AI architectures, these companies have to develop a new software stack — a compiler that takes any of the AI applications and can compile them to run on their target architecture. And because all of these compilers are new, they can be buggy and inefficient. All of these AI companies create a software model of their silicon and then develop the compiler’s software stack-based software model, but at the end that’s not enough. So when we engaged a few years back with the first AI company, what was our target? They needed to run all these different versions of the software compiler on the actual hardware and figure out how it works. And then, over the several generations of customers, we had one customer that actually worked with emulation. Over the course of one year they optimized their software stack so that the performance of the chip was 30 times higher than when they started. You can see the potential of what they need to do for success in the market, but they need to all come up with the best performance for these chips so they can function in real life. That’s what we have done over the last several years.”

It’s still early days for AI/ML in semiconductor manufacturing and test. So while AI/ML holds great promise for ferreting out potential corner cases and finding latent defects, it cannot be trusted 100%.

PDF’s David recommends keeping a tight leash on AI/ML, rather than relying on layers of algorithms to monitor other algorithms. “If you’re creating machine learning algorithms to make predictions to fix other machine learning algorithms, it gets computationally very expensive,” said David.

And there’s always safe mode. If a customer’s confidence in the AI system’s predictions is low, the fab or OSAT can always go back to the way it did things without the AI system while the ML algorithms and models are improved.

“Physical models and machine learning models are both predictive models,” said Onto’s Hao. “We found out that by combining physics and machine learning together, we can get the best performance. Machine learning is complementary to physics. It can help physics, but it is not going to replace physics.”
