The range of machine learning techniques available to the enterprise for solving any particular business problem is so expansive that it can easily overwhelm organizations.
The array of options includes advanced and classic machine learning approaches. There’s the familiar supervised learning and unsupervised learning dichotomy, as well as more emergent varieties pertaining to contrastive learning, reinforcement learning, and self-supervised learning.
Additionally, there are choices involving graph analytics, deep neural networks, segmentation, behavioral analysis, and numerous other techniques. When trying to solve an intricate business problem at scale, such as fortifying Anti-Money Laundering (AML) measures to counteract financial crimes, how are organizations to decide which approach has the greatest efficacy?
With ensemble modeling, that question (and all the painstaking research that goes into answering it) becomes much less relevant. This machine learning approach enables organizations to utilize an assortment of models and combine them, and their predictive accuracies, to get the best result.
The diversification of this method is critical for providing the full context necessary for high-dimensionality data regularly used in financial services, fraud detection, and cyber security. The hodgepodge of models involved in ensemble modeling allows organizations to “keep them, by design, very diverse,” acknowledged Martin Rehak, Resistant AI CEO. “We don’t want a single kind of model to be overwhelming.”
That diversity enables organizations to employ an array of different algorithms to assess respective aspects of a business problem for a fully informed, consensus approach to decision-making that also happens to be explainable.
Consensus-Based Model Decisions
The ensemble modeling rationale is irrefutable. Instead of data scientists taking exorbitant amounts of time to devise perfect models for business cases, they can simply combine imperfect ones to aggregate their predictive prowess. “When you look at machine learning in the ensemble approach, you build the decision from small algorithms,” Rehak noted. “And, those algorithms are combined dynamically, in our case, for each transaction so as to build the optimal decision.” More importantly, perhaps, each of those models can home in on a specific aspect of the task at hand, such as identifying instances of money laundering.
For example, one model might focus on the size of transactions. Another might pertain to a transaction’s location. A different model could examine which specific actors are involved in the transaction. The goal is a situation in which, “we don’t have any peaks,” Rehak explained. “We have very flat distributions, and these correspond to very weak evidence. And, by combining many elements of weak evidence, we actually reach a more robust decision.” An added benefit is that by using classic machine learning and simpler models, there’s less training data (and annotations) required to put models into production. They’re also more explainable than deep neural networks, which necessitate greater amounts of training data.
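One way to picture how many pieces of weak evidence add up to a robust decision is to pool each model’s probability in log-odds space. The following is a minimal sketch under stated assumptions, not a description of Resistant AI’s system: the function name `combine_weak_evidence`, the example scores, and the assumption that the models are conditionally independent are all hypothetical.

```python
import math

def combine_weak_evidence(probabilities, prior=0.5):
    """Pool per-model probabilities by summing their log-odds shifts from the prior.

    Assumes the models are conditionally independent, which is an
    illustrative simplification. Inputs must lie strictly between 0 and 1.
    """
    def logit(p):
        return math.log(p / (1.0 - p))

    prior_logit = logit(prior)
    total = prior_logit + sum(logit(p) - prior_logit for p in probabilities)
    return 1.0 / (1.0 + math.exp(-total))

# Hypothetical outputs from eight simple models (size, location, actors, ...),
# each only mildly suspicious on its own.
weak_scores = [0.6] * 8
combined = combine_weak_evidence(weak_scores)
print(f"single model: {weak_scores[0]:.2f}, pooled: {combined:.2f}")
```

No single score here would justify an alert, yet the pooled value is far more decisive. Real models are rarely fully independent, so a production system would need a more careful combination rule.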
It’s important to distinguish the flatter distribution of the ensembling approach Rehak described from other ensemble modeling techniques. The most prevalent examples of ensemble modeling involve either bagging or boosting (the latter of which can entail Extreme Gradient Boosting, or XGBoost). Random Forest is an example of bagging that’s predicated on the combination of different decision trees, each trained independently on a random sample of the data. With boosting, by contrast, “you build ensembles, one by one, based on the previous version in the set,” Rehak commented. Although this approach is a rapid way to build models with commendable predictive accuracy, it runs the risk of overfitting (in which models fit their training data too closely, noise included, and become less applicable to production data).
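The difference between bagging and boosting is easiest to see side by side. The sketch below is a toy illustration on hypothetical one-dimensional data, using decision stumps as the weak learners and AdaBoost as a simple stand-in for boosting (it is not gradient boosting or XGBoost). All function names are made up for this example.

```python
import math
import random

def stump_predict(stump, x):
    """Decision stump: predict +polarity at or above the threshold, else -polarity."""
    threshold, polarity = stump
    return polarity if x >= threshold else -polarity

def fit_stump(xs, ys, weights):
    """Pick the (threshold, polarity) with the lowest weighted error; labels are +1/-1."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            stump = (threshold, polarity)
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(stump, x) != y)
            if best is None or err < best[0]:
                best = (err, stump)
    return best[1]

def bagging(xs, ys, n_models=15, seed=0):
    """Bagging: stumps are fit independently on bootstrap resamples, then vote."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx], [1.0] * n))
    return lambda x: 1 if sum(stump_predict(s, x) for s in stumps) >= 0 else -1

def boosting(xs, ys, n_rounds=5):
    """AdaBoost: each stump is fit on weights produced by the previous rounds."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        stump = fit_stump(xs, ys, weights)
        err = sum(w for x, y, w in zip(xs, ys, weights) if stump_predict(stump, x) != y)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Misclassified points gain weight, so the next stump focuses on them.
        weights = [w * math.exp(-alpha * y * stump_predict(stump, x))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return lambda x: 1 if sum(a * stump_predict(s, x) for a, s in ensemble) >= 0 else -1

# Hypothetical data: values up to 5 are labeled -1, larger values +1.
xs = list(range(1, 11))
ys = [-1 if x <= 5 else 1 for x in xs]
bagged = bagging(xs, ys)
boosted = boosting(xs, ys)
```

The structural point is in the loops: the bagged stumps never see one another’s results, whereas each boosted stump depends on the reweighting left behind by its predecessors.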
Rehak’s ensemble approach is preferable for AML use cases because it’s based on the broader context impacting these occurrences. “If you ask a money laundering expert if a transaction is malicious or not, the first thing they do is look at the history of the account and how the person behaved in the past,” Rehak mentioned. With his approach, the sundry contextual factors pertaining to geographic location, time, respective parties and financial institutions, and more are examined by individual machine learning models. Only by forming a composite of each of the results of those models does the underlying AI system determine there may be criminal transactions. Consequently, there is a marked reduction in false positives. “We can explain away lots of the outliers that would otherwise flood the Anti-Money Laundering teams,” Rehak remarked.
When applying ensemble modeling to this use case, it’s not unusual to have upwards of 60 models analyzing different aspects of transactions. The real-time ramifications of this approach are ideal for this application. “One of these 60 algorithms would segment everything into segments and then model average transaction size per second,” Rehak disclosed. “We could have thousands of segments, and this is all dynamically updated at the same time.”
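A per-segment model of average transaction size that updates with every transaction can be sketched with running statistics. The example below is an assumption-laden illustration, not the vendor’s implementation: it uses Welford’s algorithm for the running mean and variance, and the segment key `"retail"`, the class `SegmentStats`, and the helper `score_and_update` are hypothetical.

```python
import math
from collections import defaultdict

class SegmentStats:
    """Running mean and variance for one segment (Welford's algorithm)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, value):
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)

    def zscore(self, value):
        """How many standard deviations the value sits from the segment mean."""
        if self.count < 2:
            return 0.0  # not enough history to judge
        std = math.sqrt(self.m2 / (self.count - 1))
        return 0.0 if std == 0 else (value - self.mean) / std

# One statistics object per segment, created on first use.
segments = defaultdict(SegmentStats)

def score_and_update(segment_key, amount):
    """Score a transaction against its segment's history, then learn from it."""
    stats = segments[segment_key]
    z = stats.zscore(amount)
    stats.update(amount)
    return z

# Hypothetical transaction amounts for a single segment.
for amount in [100, 110, 90, 105, 95]:
    score_and_update("retail", amount)
z_big = score_and_update("retail", 5000)
```

Because each segment keeps only three numbers, thousands of segments can be updated as transactions arrive without storing the full history.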
With the plenitude of models incorporated into such an ensemble, each of which is evaluating different facets of a transaction for potential criminal behavior, it’s difficult to create a more comprehensive approach. “We look at you from so many angles that shaping your behavior to avoid all of them at once becomes very hard,” Rehak revealed. “Because, [malefactors] are essentially trying to avoid not one decision boundary, but plenty of dynamically moving and shaking decision boundaries. Each of those algorithms is learning independently and then we combine them together.”
There are numerous dimensions to how these ensembles reinforce explainability and interpretability. Firstly, they don’t have an undue reliance on advanced machine learning and may consist solely of simple, more explainable algorithms (involving traditional machine learning). Those models become the building blocks for evaluating individual considerations for why certain transactions may be criminal. “When we say something is important we can tell you why,” Rehak said. “We can tell you which indicators point to this. We can essentially write a report for each finding that says this should be a high risk because of these factors.” Although each of the respective algorithms concentrates on a particular feature, not all of them are given equal weight in the model. It’s not uncommon for algorithms involving graph analytics (which excel at examining relationships) to have a greater weight than those of other model types.
Models can explain instances of suspicious behavior and why certain events are outliers. “Typically we have four or five algorithms in the ensemble that are dominant, that say I believe this is an outlier and someone else concurs, someone meaning algorithm,” Rehak specified. “And, we have four or five that trigger and kind of skew the outcome towards being an anomaly.” Because individual models assess just one factor in a transaction, they provide interpretability for their scores and full explainability with words. “Because we know it’s this ensemble, we know what is the micro-segment, and we know it’s the size of the transaction, we can easily have a textual output next to the score saying the size of the transaction is way too high for the economic sector of this company,” Rehak added.
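The pairing of a weighted score with a plain-language reason can be sketched as follows. This is a hedged illustration only: the `Finding` structure, the detector names, the weights, and the weighted-average rule are hypothetical and are not taken from Resistant AI’s product.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    detector: str   # hypothetical detector name
    score: float    # anomaly score in [0, 1]
    weight: float   # relative weight of this detector in the ensemble
    reason: str     # human-readable explanation for the score

def summarize(findings, top_k=2):
    """Return the weighted risk and a short report naming the dominant detectors."""
    total_weight = sum(f.weight for f in findings)
    risk = sum(f.score * f.weight for f in findings) / total_weight
    # The dominant detectors are those contributing most to the weighted sum.
    dominant = sorted(findings, key=lambda f: f.score * f.weight, reverse=True)[:top_k]
    reasons = "; ".join(f.reason for f in dominant)
    return risk, f"risk={risk:.2f} because: {reasons}"

# Hypothetical findings for one transaction; the graph detector gets extra weight.
findings = [
    Finding("graph", 0.9, 2.0, "counterparty is linked to previously flagged accounts"),
    Finding("size", 0.8, 1.0, "transaction size is far above the norm for this sector"),
    Finding("geo", 0.2, 1.0, "location is typical for this account"),
]
risk, report = summarize(findings)
print(report)
```

Because every detector looks at a single factor, the report can simply quote the reasons attached to the few detectors that drove the score.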
Ultimately, the use of ensemble modeling exceeds any one application, its enormous helpfulness for AML activities notwithstanding. When properly applied, this technique can boost explainability and interpretability, while diminishing the amounts of training data and annotations necessary for solving business-critical problems.
It’s also a means of utilizing a variety of data science techniques for addressing these issues, instead of limiting them to just one or two. Consequently, this democratic approach will likely continue to typify some of the most meaningful statistical AI deployments in the future.
About the Author
Jelani Harper is an editorial consultant servicing the information technology market. He specializes in data-driven applications focused on semantic technologies, data governance and analytics.