Whether you are a start-up or a large corporate, you will know if your AI platform is on fire or not! On the downside, being “on fire” could mean you are struggling to achieve more than 60% model accuracy, or that you can only handle data velocity in the realm of a few thousand transactions per second (tps). On the positive side, being “on fire” could mean your platform easily handles a sustained throughput of billions of data rows per second with >90% model accuracy.
Of course, in reality performance is always relative to the problem you are trying to solve – 90% accuracy may be great for a recommender engine or a fraud detection scenario, but terrible if you are predicting stress-induced mechanical failures in aircraft engines!
There are many challenges in tuning the performance of your AI/ML platform that extend well beyond physical infrastructure characteristics such as CPUs/GPUs, memory size and network bandwidth. Putting these to one side, if you don’t approach Data Preparation and Model Tuning in the right way, you will likely find that both performance and budget are quickly wiped out.
Here are some key takeaways:
1. Performance Profiles – Start with a detailed and accurate performance profile to understand where your pain points are and where you are on fire! (See the timing sketch after this list.)
2. Data Pipelines – How effectively you can optimise your data pipelines depends heavily on factors such as the underlying tech stack (e.g. instance types, cluster sizing, stream processing engine, ML framework), the data types involved (geospatial vs. timeseries), data volumes, and your approach to handling missing data values. Take the time to understand which engines/frameworks are best suited to your problem domain, the nature and quality of the data sources you plan to use, and how much data cleansing/merging/enriching is needed versus its potential impact on model accuracy (see the missing-data sketch below).
3. Model Trade-offs – Have you struck the right balance between model size, complexity, latency and accuracy? Adjusting the mix between these can have a significant impact on model performance (see the trade-off measurement sketch below).
4. Model Tuning – Profile, test and optimise your hyperparameters and feature selection/engineering to minimise under- and overfitting. While there are many potential sources of poor model accuracy, ensemble methods (bagging, random forests, boosting) can often provide an alternative route to better accuracy – just make sure your data “house” is in order first! (A hyperparameter-search sketch closes the examples below.)
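To make takeaway 1 concrete, here is a minimal profiling sketch in Python. The stage functions (load_data, preprocess, train_model) are hypothetical placeholders rather than real platform code – the point is simply to measure where the time goes before tuning anything:

```python
# Minimal sketch: wall-clock timing for the major stages of a
# hypothetical ML pipeline, to locate hot spots before tuning.
import time
from contextlib import contextmanager

@contextmanager
def timed(stage):
    """Report wall-clock time spent in a pipeline stage."""
    start = time.perf_counter()
    yield
    print(f"{stage:<12} {time.perf_counter() - start:8.3f}s")

# Placeholder stages - replace with your actual pipeline steps.
def load_data(): ...        # e.g. read from object storage
def preprocess(df): ...     # e.g. cleanse / merge / enrich
def train_model(df): ...    # e.g. fit an estimator

with timed("load"):
    df = load_data()
with timed("preprocess"):
    df = preprocess(df)
with timed("train"):
    model = train_model(df)
```

Even a crude breakdown like this tells you whether to spend your tuning budget on I/O, data preparation or training.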
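For takeaway 2, here is a small pandas sketch (pandas/NumPy assumed available) contrasting two common ways of handling missing values; the sensor columns and numbers are invented purely for illustration:

```python
# Sketch: two common missing-value strategies on a toy frame.
# Which one is right depends on your data and its downstream
# effect on model accuracy - measure both before committing.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "sensor_a": [1.0, np.nan, 3.0, 4.0],
    "sensor_b": [10.0, 12.0, np.nan, 11.0],
})

dropped = df.dropna()                              # strategy 1: discard incomplete rows
imputed = df.fillna(df.median(numeric_only=True))  # strategy 2: median imputation

print(f"rows kept after dropna: {len(dropped)}/{len(df)}")
print(imputed)
```

Dropping rows is safe but can starve the model of data when missingness is high; imputation keeps the volume but can bias the distribution.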
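For takeaway 3, one way to put numbers on the size/latency/accuracy trade-off is to sweep a single complexity knob and measure. This sketch uses scikit-learn (assumed available) on synthetic data, with the tree count of a random forest standing in for model size:

```python
# Sketch: accuracy vs. inference latency as model size grows.
import time
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for n_trees in (10, 100, 500):  # arbitrary proxy for model size/complexity
    clf = RandomForestClassifier(n_estimators=n_trees, random_state=0)
    clf.fit(X_tr, y_tr)
    start = time.perf_counter()
    acc = clf.score(X_te, y_te)  # prediction + accuracy over the test set
    per_row = (time.perf_counter() - start) / len(X_te)
    print(f"trees={n_trees:4d}  accuracy={acc:.3f}  latency/row={per_row * 1e6:.1f}µs")
```

The biggest model is rarely the best operational choice; often a much smaller one gives up little accuracy while cutting latency and cost.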
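Finally, for takeaway 4, a sketch of hyperparameter search with cross-validation; gradient boosting stands in for the ensemble methods mentioned above, and the grid values are illustrative rather than recommended:

```python
# Sketch: cross-validated hyperparameter search to balance
# under- and overfitting on a boosted ensemble.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

param_grid = {
    "max_depth": [2, 4, 8],        # deeper trees raise the overfitting risk
    "learning_rate": [0.05, 0.1],
    "n_estimators": [100, 300],
}
search = GridSearchCV(GradientBoostingClassifier(random_state=0),
                      param_grid, cv=5, scoring="accuracy")
search.fit(X, y)
print("best params:", search.best_params_)
print(f"best CV accuracy: {search.best_score_:.3f}")
```

If the best cross-validated score sits at the edge of the grid, widen the grid – and if it never improves, look back at the data “house” before reaching for bigger models.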
Read the full article on Medium @ https://bit.ly/33DjjWQ.