AI/ML workloads in containers: 6 things to know

Two of today’s big IT trends, AI/ML and containers, have become part of the same conversation at many organizations. They’re increasingly paired as teams look for better ways to manage their artificial intelligence and machine learning workloads, helped by a growing menu of commercial and open source technologies for doing so.

“The best news for IT leaders is that tooling and processes for running machine learning at scale in containers has improved significantly over the past few years,” says Blair Hanley Frank, enterprise technology analyst at ISG. “There is no shortage of available open source tooling, commercial products, and tutorials to help data scientists and IT teams get these systems up and running.”

Running AI/ML workloads in containers: 6 key facts

Before IT leaders and their teams begin to dig into the nitty-gritty technical aspects of containerizing AI/ML workloads, some principles are worth thinking about up front. Here are six essentials to consider.

1. AI/ML workloads represent workflows

Like many other workload types, AI/ML workloads can be described as workflows, according to Red Hat technology evangelist Gordon Haff. Thinking in terms of workflow can help illuminate some basic concepts about running AI/ML in containers.

With AI/ML, the workflow starts with the gathering and preparation of data: Your models won’t get very far without it.

“Data gets gathered, cleaned, and processed,” Haff says. Then, the work continues: “Now it’s time to train a model, tuning parameters based on a set of training data. After model training, the next step of the workflow is [deploying to] production. Finally, data scientists need to monitor the performance of models in production, tracking prediction, and performance metrics.”

Haff describes this workflow in straightforward terms, but doesn’t discount the amount of effort that can be involved in terms of people, processes, and environments. Containerization can simplify that effort by bringing greater consistency and repeatability.
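To make the four stages concrete, here is a minimal Python sketch of the workflow Haff describes: prepare data, train, deploy, monitor. Everything in it is an illustrative assumption rather than anything from Haff or Red Hat: the function names (`prepare`, `train`, `deploy`, `monitor`), the toy data, and the choice of a one-variable least-squares fit as the "model." A real pipeline would use a proper ML library and real data sources.

```python
import statistics

def prepare(raw):
    """Gather, clean, and process: drop records with missing values."""
    return [(x, y) for x, y in raw if x is not None and y is not None]

def train(rows):
    """Fit y = slope * x + intercept by ordinary least squares."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return {"slope": slope, "intercept": mean_y - slope * mean_x}

def deploy(model):
    """Wrap the trained parameters in a prediction function."""
    def predict(x):
        return model["slope"] * x + model["intercept"]
    return predict

def monitor(predict, live_rows):
    """Track mean absolute error on data seen in production."""
    return statistics.fmean([abs(predict(x) - y) for x, y in live_rows])

# Hypothetical raw data with two incomplete records.
raw = [(1.0, 3.0), (2.0, 5.0), (None, 4.0), (3.0, 7.0), (4.0, None)]
model = train(prepare(raw))
predict = deploy(model)
error = monitor(predict, [(5.0, 11.0), (6.0, 13.0)])
print(model, error)
```

In a containerized setup, each of these functions could plausibly run as its own step in its own image, which is where the consistency and repeatability come from: every stage runs in the same environment each time.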

“Traditionally, this workflow might have involved two or three handoffs to different individuals using different environments,” Haff says. “However, a container platform-based workflow enables the sort of self-service that increasingly allows data scientists to take responsibility for both developing models and integrating into applications.”

2. The benefits are similar to other containerized workloads

Nauman Mustafa, head of AI & ML at Autify, sees three overarching benefits of containerization in the context of AI/ML workflows:

Modularity: It makes important components of the workflow – such as model training and deployment – more modular. This is similar to how containerization can enable more modular architectures, namely microservices, in the broader world of software development.
Speed: Containerization “accelerates the development/deployment and release cycle,” Mustafa says. (We’ll get back to speed in a moment.)
People management: Containerization also makes it “[easier] to manage teams by reducing cross-team dependencies,” Mustafa says. As in other IT arenas, containerization can help cut down on the “hand off and forget” mindset as work moves from one functional group to another.
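The modularity point can be sketched in Python. In this hedged example, a training component and a serving component share nothing except a saved model file, which is roughly how two separate containers might communicate through a shared volume. The names `train_step` and `serve_step`, the `model.json` file, and the toy threshold "model" are all hypothetical and chosen only for illustration.

```python
import json
import tempfile
from pathlib import Path

def train_step(artifact_path: Path) -> None:
    """Training component: learn a decision threshold and save it."""
    samples = [(0.2, 0), (0.4, 0), (0.6, 1), (0.9, 1)]  # (feature, label)
    highest_negative = max(x for x, label in samples if label == 0)
    lowest_positive = min(x for x, label in samples if label == 1)
    model = {"threshold": (highest_negative + lowest_positive) / 2}
    artifact_path.write_text(json.dumps(model))

def serve_step(artifact_path: Path, value: float) -> int:
    """Serving component: load the saved model and classify one value."""
    model = json.loads(artifact_path.read_text())
    return int(value >= model["threshold"])

# The temporary directory stands in for a volume shared by two containers.
with tempfile.TemporaryDirectory() as shared_volume:
    artifact = Path(shared_volume) / "model.json"
    train_step(artifact)
    predictions = [serve_step(artifact, v) for v in (0.3, 0.7)]

print(predictions)
```

Because the only contract between the two steps is the artifact format, either side can be rebuilt, rescheduled, or handed to a different team without touching the other.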

While a machine learning model may have different technical requirements and considerations from another application or service, the potential benefits of containerization are quite similar.

Audrey Reznik, data scientist at Red Hat, points to increased portability and scalability of your AI/ML workloads or solutions – think hybrid cloud – as one example. Reznik lists less overhead as another.

“Containers use less system resources than bare metal or VM systems,” Reznik says.

Lower overhead, in turn, contributes to faster deployments. “I like to use the phrase ‘how fast can you code,’ because as soon as you finish coding we can deploy your solution with a container,” Reznik says.

3. Teams need to be aligned

Making the workflow more modular doesn’t remove the need for everything, and everyone, to work well together.

“Make sure everyone involved in building and operating machine learning workloads in a containerized environment is on the same page,” says Frank from ISG. “Operations engineers may be familiar with running Kubernetes, but may not understand the specific needs of data science workloads. At the same time, data scientists are familiar with the process of building and deploying machine learning models, but may require additional help when moving them to containers or operating them going forward.”

Containerization should improve alignment and collaboration (thanks to consistency, repeatability, and other characteristics), but don’t take this benefit as a given.

“In a world where repeatability of results is critical, organizations can use containers to democratize access to AI/ML technology and allow data scientists to share and replicate experiments with ease, all while being compliant with the latest IT and InfoSec standards,” says Sherard Griffin, director of global software engineering at Red Hat.

Let’s look at three additional principles to consider carefully:
