Pressure to use AI/ML techniques in design and verification is growing as the amount of data generated from complex chips continues to explode, but how to begin building those capabilities into tools, flows and methodologies isn’t always obvious.
For starters, there is debate about whether the data needs to be better understood before those techniques are used, or whether it’s best to figure out where to use these capabilities before developing the techniques. This may seem like nitpicking, but it’s an important discussion for the chip industry. Applying AI/ML can affect time to market, reliability, and the overall cost of developing increasingly complex systems. And if the industry ultimately decides that the solution is a combination of both approaches, it likely will require more iterations and standards.
“This is both yin and yang,” said Paul Cunningham, corporate vice president and general manager of the System & Verification Group at Cadence. “AI is only empowered by the data on which it’s based, so which comes first? The AI or the data? Data in verification is very central. It’s important to not overlook that. In fact, if you imagine you already have billions of gates in your design, but now every one of those gates can actually have a waveform. It’s going to behave and insert waves based on an actual use case. You’ve got an infinite number of possible use cases, and then you can record the whole waveform. If you saved a full trace, a simulation trace, or a waveform dump for an entire SoC verification and tried to keep the totality of all that data, it would be absolutely off the charts. It may be in the petabytes, but hundreds of terabytes at least. You then have to answer the question of what to look at, what is good to keep, and how to be efficient about that.”
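The scale Cunningham describes is easy to sanity-check with back-of-envelope arithmetic. The sketch below uses invented but plausible figures for gate count, toggle rate, and bytes per recorded event; none of these numbers come from a real tool or design.

```python
# Back-of-envelope estimate of a full-waveform dump for a large SoC.
# All figures below are illustrative assumptions, not tool measurements.

GATES = 2e9             # gates in a large SoC design
SIGNALS_PER_GATE = 1.5  # assume ~1.5 recorded nets per gate
CYCLES = 1e9            # one billion simulated cycles (~1 s at 1 GHz)
TOGGLE_RATE = 0.02      # assume 2% of signals change value each cycle
BYTES_PER_EVENT = 8     # timestamp + signal id + value, roughly

events = GATES * SIGNALS_PER_GATE * CYCLES * TOGGLE_RATE
total_bytes = events * BYTES_PER_EVENT
print(f"{total_bytes / 1e15:.0f} PB")  # prints: 480 PB
```

Even with conservative assumptions, a full trace lands in the hundreds of petabytes, which is why deciding what to keep matters more than how to store it.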
Put simply, automation has allowed design teams to create more complex chips, which in turn require more automation to hit market windows. This is particularly evident with verification management systems.
“It’s not just a glorified spreadsheet,” Cunningham said. “You’re talking about a big data platform. You’re talking about something that has to run across multiple sites and multiple live projects. You have to have redundancy and scalability. It needs APIs. As more people are querying, the APIs have to scale. You have to have a proxy server architecture.”
The vision is to have multiple engines running — simulators, formal tools, emulation, prototyping — each of which generates data that goes into the verification management system. But all of that data has to be structured and analyzed in order to find bugs — irregularities that may or may not cause problems — and all of it has to be put into the context of how a chip will be used over time. Moreover, all of this has to happen faster than in the past, and with a better understanding of how the application of different tools at different times and under different conditions can impact coverage, yield, and reliability.
“The whole history of EDA has always been around the single event,” Cunningham said. “I will place and route a design. I will run a test. Nobody’s looking at the campaign level. You might run place-and-route thousands of times in lots of different experiments. You’re going to run verification lots of different ways, and lots of different tests, so it’s about the logistics of chip design and verification. That’s the next era and that’s where AI can play a huge role. It’s about learning from what I did yesterday and being better today.”
This is easier said than done, even with the best of tools.
“If you take the literal translation of machine learning and artificial intelligence, there is the learning part of it,” said Neil Hand, director of marketing for the Design Verification Technology Division at Mentor, a Siemens Business. “But over time things are changing. You give it different sets of data. It’s going to evolve over time. For core verification, you often can’t do that. You can’t take the set of data that is from 10 different customers, learn from that, and build a model that can help others, because customers don’t like you sharing that data, even if it is extracted, abstracted, and put into a model. Even though it’s impossible to get anywhere near the original information, the resistance is there. That doesn’t mean we’re not doing it in other ways. There is a role for data analytics to, for example, identify coverage holes, to say, ‘You’ve got all this data. Here’s where we think you should be looking.’ And it’s not strictly AI and ML. It’s using some of those algorithms, but mostly just deep data analytics.”
Customers have been asking for more efficient ways to utilize data for some time. “They want that data to do AI and ML over a period of time over multiple designs, but they need the training data,” said Hand. “So we did that. There is still a place for AI and ML in core EDA, because you want it to replace the heuristics methodology we have used over decades. You can never do all of the verification.”
The key is being able to focus on the right areas. With AI and ML, data from internal datasets can replace some of the heuristics in the engines, but knowing when to switch from heuristics to AI/ML is a balancing act. That requires an in-depth understanding of the data set you’re working with, how it applies over time, where that data is coming from, and what the challenges in coverage closure are.
Hand pointed to three challenges in coverage closure. The first is whether the coverage you defined is sufficient. The second is gaining sufficient insight into a design to understand what is being simulated. The third is deciding where resources and energy should be focused.
“If you’re looking at data analytics, for the first challenge, there’s not a lot to play there,” he said. “With coverage closure, you’re going to throw AI/ML at it. It’s going to be better than heuristics. We’ve been doing that with graph-based learning. We’re looking at upgrading those algorithms to a new class of algorithms. In formal, we’ve been doing reinforcement learning and will be using that to hit coverage points. Those together will give a set of data, to which deep data analytics can be applied to ask where to put the effort based on the test time, based on requirements that can even be pushed down from the PLM level. Where should I spend my effort?”
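Replacing a hand-tuned heuristic with learning from regression history can start very simply: rank tests by how many still-open coverage bins they have historically closed. The greedy sketch below is illustrative only — the test names, coverage bins, and history are invented, and production flows use far richer models (graph-based learning, reinforcement learning) than this.

```python
# Sketch: data-driven test ordering for coverage closure.
# All test names and coverage bins are hypothetical; a real flow
# would mine them from a regression database.
from typing import Dict, List, Set

def prioritize(tests: Dict[str, Set[str]], goal: Set[str]) -> List[str]:
    """Greedy ordering: each step picks the test whose historical
    coverage closes the most still-open bins."""
    order, remaining = [], set(goal)
    while remaining:
        best = max(tests, key=lambda t: len(tests[t] & remaining))
        if not tests[best] & remaining:
            break  # no remaining test can close any open bin
        order.append(best)
        remaining -= tests[best]
    return order

history = {  # per-test coverage bins observed in past runs (made up)
    "smoke":    {"reset", "init"},
    "dma_rnd":  {"dma_burst", "dma_abort", "init"},
    "irq_fuzz": {"irq_nest", "dma_abort"},
}
print(prioritize(history, {"reset", "init", "dma_burst", "irq_nest"}))
```

The point of the sketch is the shift in inputs: the ordering comes from recorded results rather than from a fixed rule of thumb, which is the substitution Hand describes.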
Work is also underway with users on a combination of AI and data analytics. “This is where you look at what you tested, and you feel comfortable there,” he said. “You’ve looked at how you identify the holes in what you’ve tested. But what about the other stuff like bug hunting? If you have a jar full of black jelly beans, with only a few colored ones in there, that’s the verification stage you’re hitting. You’re looking for the colored jelly beans, not the black ones. How do you identify those? That’s where a lot of the energy needs to go — to understand what we have and haven’t tested, and how we should go on that bug hunting route. It’s not going to be just data analytics. It’s not going to be just ML/AI. It’s going to be a combination of all of those algorithms together.”
Fig. 1: Why there is so much data. Mobile chip designs with more than 40 million gates. Source: Mentor/Wilson Research Group Functional Study, Aug. 2020
Chicken and egg
So which comes first? It depends on who you ask, where they are in the flow, and what they’re trying to accomplish.
Darko Tomusilovic, lead verification engineer at Vtool, believes the correct approach is to start with the data.
“Look at the data,” Tomusilovic said. “Focus on the problem you have. Decide if there is a right AI algorithm that is suitable for resolving your question. Again, AI is not always the answer in verification. Whatever problem we have, do not think AI will magically resolve it. We have to remember verification is a process that humans solve with the help of additional tools. Being an engineer means we choose in life to look at the problems and try to find the best solution. AI will always be some kind of helping tool. It will always require quite a lot of human intelligence, as opposed to some other areas in which it is just the click of a button and you get an answer.”
Over the past 18 to 24 months, perceptions about AI/ML have changed significantly. Initially, the idea was that AI would replace experts entirely, but that has shifted to AI being a useful tool for those with domain expertise. That has a big impact on discussions about where to start with AI/ML in verification.
“Let’s say that you have some kind of Google app in which you show an interest in visiting Prague and Berlin,” said Tomusilovic. “The AI will help you to also choose some other destination, which is similar. Basically, a huge amount of data is processed, and then you get some answer based on it. In verification, it will be much different, because the key will be understanding how to utilize the AI to provide meaningful data, and then asking the right questions to get meaningful data out of it. This means some human intelligence is needed to process the results that the AI provides. In general, AI can be seen as a right hand to application engineers, as opposed to the team that sorts out the problems. Human intelligence must provide meaningful data to the AI. It must be processed and well structured, not just throwing a lot of garbage data in and expecting the AI to do wonders. Just by the fact that we must structure it, and that we must provide meaningful data, it will require us to think in a way that is suitable for the AI. This is a good first step toward some kind of automation. The next step is the AI processes this information using the algorithms. Then the next critical phase, where humans should be involved, is the analysis of the results. It can be very helpful even if they put in some data, and the AI runs some algorithms over them and does not provide any results. The fact that you understand there is no correlation between the inputs you provided and the outcome provided by the AI can be helpful, because you will not waste your time reaching that conclusion.”
Hagai Arbel, CEO of Vtool, said this will be true for the next decade or two. But after that, he said AI likely will be used to design chips. When that happens, the current design/verification paradigm collapses. “We see some of it today. If you have parts of a chip being generated automatically by a script, and if you validate the script, then you don’t need to do verification.”
Arbel stressed that it’s important to understand what humans are good at, and what computers are good at. “After we understand that, then we can apply all kinds of techniques. Some of them are machine learning. Some of them are AI, supervised or unsupervised. But first, we must understand humans are very good with visuals, but very bad at remembering things. Humans are very bad in processing a lot of text or crunching numbers. Verification is a lot of things, but if you take the debug process, for example, one reason it takes a lot of time is simply because people cannot remember where they were. They cannot remember the previous step that led them to this step of debug. They cannot remember that yesterday they debugged the exact same thing and they already found the bug. They simply cannot hold all the information in their heads. And here is where computers in general, and machine learning and AI specifically, can help a lot.”
This is easier said than done, however. Verification is still a new application of AI/ML, and there are unresolved issues.
“If you have supervised machine learning, you need a good set of labels in order to train software to recognize issues,” said Roddy Urquhart, senior marketing director at Codasip. “It is not clear how to get good label sets. With unsupervised machine learning you do not have labels to begin with, but you aim to get insights by finding patterns in the data. With SoC verification, there are a lot of variables — the functionality your design is supposed to have and the DUT itself can be very varied. Getting a generic approach to improving verification seems very tough indeed.”
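Urquhart’s point about unsupervised learning can be made concrete: even without labels, failures can be grouped by pattern. The sketch below clusters hypothetical regression failures by the word-level similarity of their log messages; the logs, threshold, and similarity measure are all invented for illustration, not taken from any tool.

```python
# Sketch: unsupervised grouping of regression failures by log signature.
# No labels are needed; the log excerpts below are hypothetical.

def tokens(msg: str) -> set:
    return set(msg.lower().split())

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b)

def cluster(failures: dict, threshold: float = 0.5) -> list:
    """Greedy single-pass clustering: a failure joins the first
    cluster whose representative message is similar enough."""
    clusters = []  # list of (representative tokens, [test names])
    for name, msg in failures.items():
        sig = tokens(msg)
        for rep, members in clusters:
            if jaccard(sig, rep) >= threshold:
                members.append(name)
                break
        else:
            clusters.append((sig, [name]))
    return [members for _, members in clusters]

logs = {
    "t1": "assertion failed: fifo overflow at cycle 10234",
    "t2": "assertion failed: fifo overflow at cycle 99881",
    "t3": "timeout waiting for dma done",
}
print(cluster(logs))  # the two fifo-overflow failures group together
```

The clusters carry no meaning by themselves — a human still has to decide whether a group is one bug or several — which is exactly the gap between finding patterns and having labels.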
Another often-overlooked problem is data privacy. “The classic supervised machine learning approach is to take big datasets, train them offline, and then deploy classification systems,” Urquhart said. “At a recent conference, some speakers pointed out that in manufacturing, the reliability and performance of machines in a plant was very sensitive. They could not imagine sharing data with a third-party AI/ML company, which then potentially could deploy the results of the training on their competitor. Some companies have addressed the problem by installing ML units on machines at customer sites, where they learn in situ.”
Some applications of AI/ML work better than others. “Applying AI/ML or predictive algorithms to smaller problems, such as identifying useful seeds in a regression, is more likely to yield results than trying to tackle coarse-grained verification challenges,” he said. “This approach also can work locally (specific design, specific customer), rather than requiring huge amounts of data to be shared with an EDA vendor for training.”
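As one illustration of the seed-ranking idea, the sketch below scores each random seed by a smoothed estimate of how often its runs have hit new coverage, so a barely-tried seed isn’t written off after one unproductive run. The seed values and counts are hypothetical, and the scoring rule is just one simple choice among many.

```python
# Sketch: scoring random-test seeds by historical yield.
# Seeds and counts are invented; a real flow would pull them
# from regression history.

def seed_score(new_bins_hit: int, runs: int,
               alpha: float = 1.0, beta: float = 1.0) -> float:
    """Smoothed estimate of the chance a seed's next run hits new
    coverage (a Beta(alpha, beta) prior over the per-run hit rate)."""
    return (new_bins_hit + alpha) / (runs + alpha + beta)

# seed -> (runs that hit new coverage, total runs), all made up
history = {0xBEEF: (9, 10), 0xCAFE: (1, 10), 0xF00D: (0, 1)}
ranked = sorted(history, key=lambda s: seed_score(*history[s]), reverse=True)
print([hex(s) for s in ranked])
```

Note the effect of the smoothing: the seed run only once ranks above the seed with ten mostly fruitless runs, which keeps the regression exploring rather than locking onto early winners.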
Understanding the data
Regardless of which approach is taken, the goal is to understand the data, and that can vary greatly from project to project.
Sergio Marchese, technical marketing manager at OneSpin Solutions, noted there are two main sources of data in verification. The first is the data that EDA companies have. The second is generated by an organization from a specific project. “You need both to develop robust solutions. Many modern tools have already started to leverage both sets of data,” he said.
Along with other EDA providers, OneSpin has accumulated an enormous number of test cases over the past 20 years, beginning from its genesis in Siemens R&D. “If you consider the slow evolution of RTL coding, these test cases cover a considerable number of modern hardware coding patterns. OneSpin uses ML algorithms to analyze a large number of design characteristics and automatically configure proof strategies. This reduces the need for users to learn and fiddle with the details of the proof engines’ parameters, and speeds up proofs automatically,” Marchese explained.
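One way to read “analyze design characteristics and automatically configure proof strategies” is as a nearest-neighbor lookup over previously solved designs. The sketch below is a deliberately tiny stand-in: the feature vectors and configuration names are invented for illustration, and this is not OneSpin’s actual method.

```python
# Sketch: picking a formal-proof engine configuration from design
# features, nearest-neighbor style. Features and config names are
# invented for illustration.
import math

# (log10 flop count, arithmetic density, FSM depth) -> known-good config
KNOWN = {
    (3.0, 0.1, 4):  "bdd_heavy",
    (5.5, 0.6, 2):  "sat_portfolio",
    (4.2, 0.3, 12): "k_induction",
}

def pick_config(features):
    """Return the config used on the closest previously solved design."""
    return KNOWN[min(KNOWN, key=lambda ref: math.dist(features, ref))]

print(pick_config((5.2, 0.5, 3)))  # closest to the SAT-portfolio profile
```

The value of such a scheme grows with the test-case archive, which is why two decades of accumulated designs matter.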
ML-type algorithms seem to be well suited for this task. What is missing, he said, is data on RTL bugs, and associated fixes. “Many bugs ultimately fit into a well-defined category, and could be identified and perhaps fixed automatically. Most companies use formal tools to find bugs early during RTL development. However, these tools only detect a limited number of bug types, and much more could be done by having large and high-quality data sets. In more general terms, while it is clear that verification generates a lot of data, I am not sure it produces the most valuable data. This discussion in a sense comes before deciding whether AI is the right algorithm or not.”
There’s also the question of how to analyze verification data, and how much data collected at an earlier stage can be applied.
“In terms of the flow, verification is one step even before we get into a fast synthesis mode,” said Arvind Narayanan, director of product marketing at Synopsys. “You’re coding RTL, you’re running verification, you’re making sure that all the requirements needed for verification are being satisfied before you even start the synthesis run. In terms of the data that’s being collected, and the models that are being built, that is common. That data can be used by any engine in the platform, whereby you have this humongous data set that can be collected, that can be harvested, and regardless of where you are in the design cycle — whether it is implementation, whether it is verification — the goal is to learn from the existing data set and build models so you can predict what’s going to come next. The same is applicable for the verification cycle. The same is applicable for the design cycle. It helps improve runtime, for sure. And in a lot of cases, it also improves your PPA. So it helps in both ways.”
What all of this comes down to is a need to think about design and verification data from the standpoint of leveraging advanced algorithms in data analytics and AI/ML. There is no shortage of data. The challenge is how to harness it.
“The whole world is grappling with a data explosion, so we really need to not reinvent the wheel,” said Cadence’s Cunningham. “The information flow, from the end application down to the underlying layer — all of it is on the same bare metal, the same cloud, and we’re going to buy the same kind of disks or storage. At some point in that stack you really need to get a horizontal sharing of insights.”
Others agree. “If you look across EDA, whether it be data analytics or AI and ML, whatever terminology you use, it all comes down to the fact that we can generate a huge amount of data in verification. We expose today a fraction of that data. The amount of data we expose from a vendor perspective is tiny compared with the data that exists there,” Hand noted.
That data can be utilized more effectively in a collaborative effort between design teams and EDA vendors, as well. “We’ve always been generating more data than anyone looks at, along with transitions in methodology that say, ‘We want to expose this data,’” Hand said. “Coverage is no different. Coverage-driven verification came in, the simulation stayed the same. It was simply expressing the data users wanted to expose by creating the APIs to expose that data. But as we move into this new realm where things get more and more complex, you’re dealing with orders of magnitude more data. We cannot brute force it. It will need multiple engines. There will be mixed simulations for more hardware assist. There will be a need to mix different types of coverage data.”
Finally, if there are a few months before tape-out, users could get help to find bugs they didn’t anticipate, and that are not expressed in coverage goals. “A lot of that is going to be data in the engines. It is going to be new data analytics platforms. But it’s also going to be creating new linkages such as time requirements into test planning, into verification management, and linking those together for full visibility. If you’re building a system of systems, or even just a large system, you suddenly can see an unacceptable risk because three or four levels down in the requirements linked to a complex test plan, linked to a set of coverage goals, there’s a discrepancy. That kind of data becomes really interesting and incredibly valuable.”