An AI-focused neural network software engineer walks into a data shop and says hello to the shopkeeper. “I’ll have two data preparation functions, one testing and debugging toolset, a couple of application log tracking systems and a bag of potatoes,” says the engineer.
Okay, it’s not a great joke (there’s no punchline and the potatoes part is definitely just a ruse), but the way we might build the Artificial Intelligence (AI) functions of tomorrow has a kind of composable, package-able feel. If it’s not quite off-the-shelf AI, then it’s composable AI that brings together some of the core functions that smart systems use regularly. It’s still down to our neural network engineer to know the recipe and peel the spuds, but we can already shop for many of the individual components we need.
Manual AI development is so last year
It might sound almost counter-intuitive to say out loud, but AI is getting smarter at AI. This is good, because AI is also getting far more complex. These are data science software systems with extremely convoluted structures that need to juggle a myriad of algorithms, data manipulation strategies and ‘data pipeline’ steps (such as Extract, Transform & Load (ETL) tasks) alongside feature selection, training, testing, deployment and monitoring functions. We have, perhaps unsurprisingly, gone beyond the point where we can handle all of those things manually.
But (and this is more good news) many of those individual component tasks, functions and jobs are now well understood and documented.
Yaron Haviv is co-founder and CTO of Tel Aviv-based data science platform company Iguazio. Explaining that his firm is currently driving its technology towards delivering real-time AI functions, Haviv says that what we need now is a way to create, test, debug and publish reusable functions. Then, AI engineers can compose Machine Learning (ML) pipelines from those functions. He suggests that we can even store a simple or repetitive pipeline as a reusable component and ultimately build a bigger pipeline from it.
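The idea Haviv describes, composing a pipeline from small functions and then storing a repetitive sub-pipeline as a reusable component of its own, can be sketched in a few lines of plain Python. This is a minimal illustration of the composition pattern, not any particular vendor's API; the function names are invented for the example.

```python
# A minimal sketch of composable pipeline functions. All names here
# (compose, drop_nulls, scale, prepare, pipeline) are illustrative
# assumptions, not part of any real ML library.
from functools import reduce
from typing import Callable

Step = Callable[[list], list]

def compose(*steps: Step) -> Step:
    """Chain pipeline steps left to right into one reusable step."""
    return lambda data: reduce(lambda acc, step: step(acc), steps, data)

# Two simple data-preparation functions a data scientist might publish.
def drop_nulls(rows: list) -> list:
    return [r for r in rows if r is not None]

def scale(rows: list) -> list:
    top = max(rows)
    return [r / top for r in rows]

# A repetitive prep sequence stored as a single reusable component...
prepare = compose(drop_nulls, scale)

# ...which can itself be composed into a bigger pipeline later.
pipeline = compose(prepare, sorted)

print(pipeline([4, None, 2, 8]))  # → [0.25, 0.5, 1.0]
```

Because `prepare` has the same shape as any individual step, bigger pipelines can nest smaller ones arbitrarily, which is the essence of the composability Haviv is arguing for.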
“One of the main challenges in Machine Learning deployments today is the transfer time it takes to get from prototyping and research to production. The various functions on an ML team (everyone from data scientists, data engineers, DevOps, application developers) often work in silos with minimal collaboration. Moving to a composable ML/AI architecture is an inevitable step as the complexity of ML pipelines increases. We must move to an architecture that enables collaboration and reuse: It’s time to transition to a composable ML architecture,” argues Haviv.
Creating ‘Lego’ building blocks out of AI functions
So what are we talking about in practical terms? When a data scientist writes an AI training function or a piece of data preparation logic, it is then packaged up and made available to other software application developers. The next developer can grab it, tune it, strengthen its robustness and security (if needed, as per the level of mission-critical business need it serves)… and variously tweak it depending on the nuances of the use case and their own preferences.
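One way a downstream developer might harden a shared function without rewriting it is to wrap it, adding validation or logging around the original logic while tuning its parameters for the new use case. The sketch below assumes a hypothetical catalog function `clip_outliers`; both it and the `hardened` wrapper are invented for illustration.

```python
# Hypothetical sketch: a downstream developer takes a shared data-prep
# function and strengthens its robustness for their own use case.
from functools import wraps

def clip_outliers(rows, low=0.0, high=1.0):
    """Shared data-prep function, as published in a team catalog."""
    return [min(max(r, low), high) for r in rows]

def hardened(fn):
    """Wrap a catalog function with input validation and basic logging."""
    @wraps(fn)
    def wrapper(rows, **params):
        if not rows:
            raise ValueError(f"{fn.__name__}: empty input")
        result = fn(rows, **params)
        print(f"{fn.__name__} processed {len(rows)} rows")
        return result
    return wrapper

# Tune the parameters and add robustness, without touching the original.
safe_clip = hardened(clip_outliers)
print(safe_clip([0.2, 1.7, -0.3], low=0.0, high=1.5))  # → [0.2, 1.5, 0.0]
```

The original component stays untouched in the catalog, so other teams can keep using it as-is while this team layers on the extra guarantees it needs.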
A shared AI function catalog is created for others to use, and it can be shared across teams, across departments or across the wider community through open source repositories if appropriate. It’s all good, says Haviv, but there are some guiding principles we all need to observe when creating and cataloging AI & ML components.
“All components must be documented, including details about their usage and parameters – and they should be ‘discoverable’ and so easy to look up from the catalog. Components must be versioned – and we must record the exact version we used on every execution. It should be easy to plug in different data sources and their security credentials without modifying the function code, so we can use it in different scenarios. We need a common mechanism to collect the outputs, logs and metrics of the function. Also here, we should be able to run the same functions in different locations: on our laptops for development and testing, on a local or remote cluster and on any cloud,” explained Haviv.
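Haviv's principles (documented, versioned, discoverable components with pluggable data sources and a common mechanism for collecting outputs) can be illustrated with a toy registry. The registry below and its schema are assumptions made for this sketch, not the structure of any real catalog.

```python
# Hypothetical sketch of the cataloging principles above: components are
# documented, versioned, discoverable, and decoupled from their data
# sources. The registry and its schema are illustrative, not a real API.
catalog = {}

def register(name, version, doc, fn):
    """Publish a documented, versioned component to the shared catalog."""
    catalog[(name, version)] = {"doc": doc, "fn": fn}

def lookup(name, version):
    """Discover a component; callers pin the exact version they use."""
    return catalog[(name, version)]

def run(name, version, source, **params):
    """Execute a component with a pluggable data source, and record
    the version and basic metrics alongside the output."""
    entry = lookup(name, version)
    data = source()  # the data source is injected, not hard-coded
    output = entry["fn"](data, **params)
    return {"component": name, "version": version,
            "rows_in": len(data), "output": output}

register("normalize", "1.0.0", "Scale values into [0, 1].",
         lambda rows: [r / max(rows) for r in rows])

result = run("normalize", "1.0.0", source=lambda: [2, 4, 8])
print(result["version"], result["output"])  # → 1.0.0 [0.25, 0.5, 1.0]
```

Because the data source is passed in as a callable, the same `normalize` component can run against a laptop test file, a local cluster or a cloud store without any change to the function code, which is exactly the portability Haviv calls for.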
As a clarifying example, Iguazio’s Haviv points to MLRun as an open source project that implements composable ML. He notes that MLRun has elastic serverless functions that can run anywhere and batch or real-time pipelines can be built by composing multiple functions and executing them. Functions can be selected from a public or local marketplace, or custom functions can be written and published.
“AI and ML engineers should be able to compose a system on demand from a shared pool of resources, allocated as and when needed. This means fully automating and intelligently augmenting developer actions when building data-intensive applications. Data teams need tools for processing, orchestrating and analyzing complex data models. Composable ML/AI is a model that breaks down the silos between functions, so there’s no effort duplication, and everyone on the team can leverage the same components to build a new solution,” concluded Haviv.
The AI-engineer-walks-into-a-data-shop analogy (a joke, of sorts) holds water to an extent. The shopkeeper could have replied, “I knew you were going to ask for those items; the last seven AI engineers who walked in before you had exactly the same shopping list!” At least we can all agree that refactoring code and reinventing the same wheel for AI is not going to get the potatoes peeled any quicker.