Application service providers manage huge and complex infrastructures. As with any complex system, things can go wrong from time to time for various reasons (for example, network connectivity problems, infrastructure resource limitations, software malfunctions, and so on). As a result, the question of how to quickly resolve issues when they occur becomes critical to improving customer satisfaction and retention.
Recently, the fast advancement of natural language processing (NLP) algorithms has helped solve many practical problems by analyzing text information. Powerful algorithms have been developed to interpret human language and derive predictions. Ensemble models are also well suited to further improving performance, as they explore the latent space more thoroughly by taking advantage of the features and weights discovered by a group of trained models.
In this blog post, we introduce an effective machine learning approach to help application service providers manage and automatically resolve trouble tickets, significantly improving user experience and operational efficiency. We present a hybrid approach that uses both an unsupervised clustering approach and supervised deep learning embeddings to maximize feature exploration and learning efficacy. Multiple optimized models are then ensembled to build the recommendation engine. This approach incorporates the most relevant information, explores the corpus more thoroughly for a given problem, and results in more consistent and robust predictions.
Conventional ticket handling
Trouble ticket handling is a time-consuming and tedious process. Traditionally, it relies heavily on human knowledge and is very process driven and labor intensive. Figure 1, for example, shows the life cycle of a typical trouble ticket resolution process. From the time an incident is reported to the time the issue is successfully closed, many repeated steps occur.
First, the ticket is categorized and allocated to a specific team. Then the triaging process starts, in which multiple people go through many steps to gather additional information and carry out troubleshooting procedures before a resolution is derived. During this laborious process, additional work log information may be entered into the system, providing further input for troubleshooting. Even with intensive training, an individual on the service team assigned to resolve a ticket cannot cover the full variety of issues, problems, and resolutions, let alone resolve them efficiently.
Incorrect triaging is common, and it may lead to the issue being reopened and an extended time to resolution. Ticket reopening and long lead times to resolution significantly impact customer satisfaction, which can cause unwanted churn. Moreover, the human knowledge gained from extensive trial and error and years of experience can easily be lost through personnel turnover.
Figure 1: Trouble ticket processing process (manual vs. machine learning) – Courtesy of Shailesh Shrivastava.
We can replace the tedious and time-consuming triaging process with intelligent recommendations and an AI-assisted approach. By avoiding repeated ticket reopening and long triaging, the time to resolution is expected to be reduced significantly (by up to 75 percent).
As indicated in Figure 1, a model trained on historical data and past resolutions can map and encode tickets, with the most relevant resolutions ranked according to the similarity between a new incoming ticket and the existing tickets in the system. Service personnel then have an accurate, ranked list of the most likely resolutions based on historical data, which they can use for fast triaging and assignment.
How to implement natural language models to process trouble tickets
In order to process the trouble tickets, the tickets need to be converted into a format that a machine can understand. Natural language processing (NLP) is a set of processes for translating natural language (words and sentences) into something a machine can understand and interpret. A series of preprocessing steps has to happen before the natural language can be converted to numerical representations, for example stemming, stop word removal, lemmatization, and tokenization. The language then needs to be grouped and encoded to derive interpretations: we can encode word frequency (count vectorizer, TF-IDF), inherent word meaning (word embeddings), or the inherent meaning of a sequence of words (Long Short-Term Memory (LSTM) networks, or LSTM with attention).
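As a minimal sketch of these preprocessing and frequency-encoding steps, the following uses scikit-learn's TF-IDF vectorizer on made-up ticket text (the cleaning rules and example tickets are illustrative assumptions, not our production pipeline):

```python
# Illustrative sketch: clean raw ticket text, then encode it with TF-IDF.
import re
from sklearn.feature_extraction.text import TfidfVectorizer

def clean(text):
    text = text.lower()
    text = re.sub(r"<[^>]+>", " ", text)      # strip HTML tags
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # drop punctuation
    return text

tickets = [
    "Database connection timeout on node-3",
    "User cannot log in: connection refused",
]
vectorizer = TfidfVectorizer(preprocessor=clean, stop_words="english")
X = vectorizer.fit_transform(tickets)  # one TF-IDF row per ticket
```

The resulting sparse matrix `X` is the kind of numerical representation that downstream clustering or classification models consume.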
The following sections describe our approaches to encoding the trouble tickets.
Topic modeling with LDA
Topic modeling is a type of unsupervised machine learning technique for discovering the abstract “topics” that occur in a collection of documents. Latent Dirichlet Allocation (LDA) is an example of topic modeling and is used to cluster the text in a document into particular topics.
We first applied LDA to cluster the text corpus into a small number of topics. As an unsupervised approach, it requires frequent retraining to reflect data drift; however, the discovered topics provide a good initial mapping to clusters. At inference time, each new ticket is mapped to one of the trained clusters, and the most frequent resolutions associated with that cluster's tickets can then be recommended. Figure 2 shows the result of topic modeling.
Figure 2-1: Topic clustering.
Figure 2-2: Topic clustering: top relevant topics after training.
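The training and inference flow for topic clustering can be sketched with scikit-learn's LDA implementation; the tickets, topic count, and cluster assignment below are toy assumptions for illustration:

```python
# Sketch: cluster tickets into topics with LDA, then map a new ticket
# to its most likely cluster at inference time.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

tickets = [
    "database connection timeout during backup",
    "backup job failed with database error",
    "login page returns authentication error",
    "user authentication fails after password reset",
]
vec = CountVectorizer(stop_words="english")
counts = vec.fit_transform(tickets)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)  # per-ticket topic distribution

# Inference: assign a new ticket to the cluster with highest probability.
new_ticket = vec.transform(["database backup timeout"])
cluster = lda.transform(new_ticket).argmax()
```

In practice, the number of topics (`n_components`) has to be tuned, and the model retrained as the corpus drifts.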
Word embedding with deep neural networks
In our implementation of language modeling, we have used multiple deep neural networks to help encode the text input in the trouble tickets (Figure 3). Leveraging the learning capability of neural networks, we can capture not just the frequency of words but also the sequence and deep inherent information of the text in the latent feature space.
Ideally, a well-designed neural network can map the text, represented as features, into a latent space to create better similarity measures for downstream classifier or recommender. For example, in our case, we’d like the neural network to learn the mapping so that similar issues are mapped to the same correct resolutions. This mapping is called embedding (Figure 3). In the latent space, similar issues should be closer to each other, so that we can separate them from other issues easily using a classifier or a recommender.
Figure 3: Using deep neural networks to map text into latent space to acquire better similarity measurement.
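As a toy illustration of this latent-space idea, consider cosine similarity between hypothetical embedding vectors (the vectors below are invented, standing in for the output of a trained encoder):

```python
# Illustration: in a good latent space, similar issues have a higher
# cosine similarity than dissimilar ones.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings produced by a trained encoder.
ticket_a = np.array([0.9, 0.1, 0.2])   # "db timeout"
ticket_b = np.array([0.8, 0.2, 0.1])   # "db connection lost" (similar issue)
ticket_c = np.array([0.0, 0.9, 0.4])   # "login failure" (different issue)

sim_ab = cosine(ticket_a, ticket_b)
sim_ac = cosine(ticket_a, ticket_c)
```

A downstream classifier or recommender then only has to separate points that the embedding has already pulled apart.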
To tackle the challenge of an evolving corpus, resulting from variations in problem description styles and expressions, or even changes in the system caused by software upgrades, we built our customized embedding by enriching a pretrained embedding with accumulated text, work logs, and the information and knowledge captured over time.
The model is trained with three different mapping techniques applied to connect a trouble ticket to a known resolution using historical data (Figure 4): index encoding, LSTM encoding, and one-shot encoding. For each of these approaches, positive pairs (a ticket correctly mapped to its resolution) and negative pairs (a ticket wrongly mapped to a resolution) are generated from the existing corpus to train the model.
The index model is designed to encode matching pairs based on previously resolved issues and learn a latent encoding that represents the match. During training, this is treated as a binary classification problem, where positive pairs receive one label (for example, 1) and negative pairs the other (for example, 0). LSTM encoding takes the word sequence in the given text into consideration and tries to represent it; a similar training approach is used, again framing the task as binary classification. Once trained, the latent encoding obtained from the neural network can represent the patterns present in the word sequences of long sentences.
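The positive/negative pair generation that feeds these binary classifiers can be sketched as follows (the historical tickets and resolutions are invented for illustration):

```python
# Sketch: build labeled (ticket, resolution) pairs from a historical corpus.
# Positive pairs keep the true resolution; negative pairs swap in a
# resolution from a different incident.
import random

history = [
    ("db timeout on node-3", "restart db connection pool"),
    ("login fails for user", "reset auth token cache"),
    ("disk full on /var", "rotate and purge old logs"),
]

def make_pairs(history, seed=0):
    rng = random.Random(seed)
    pairs = []
    for i, (ticket, resolution) in enumerate(history):
        pairs.append((ticket, resolution, 1))  # positive pair, label 1
        j = rng.choice([k for k in range(len(history)) if k != i])
        pairs.append((ticket, history[j][1], 0))  # negative pair, label 0
    return pairs

pairs = make_pairs(history)  # 3 positives and 3 negatives
```

The encoder network is then trained to predict the label from the (ticket, resolution) pair, and its internal representation becomes the latent encoding.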
In the case of one-shot encoding, for a given ticket we use a triplet formed by the positive match (the correct resolution) and a negative match (a wrong resolution randomly drawn from the data). A neural network minimizes the distance within the positive pair, while a network with the same architecture maximizes the distance within the negative pair; a customized loss function is designed to achieve both at the same time. In this way, we learn a neural network that generates embeddings representing the tickets in a way that differentiates correct from incorrect resolution mappings.
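The objective described above is in the spirit of a standard triplet loss; a minimal numerical sketch (the margin value and toy embeddings are assumptions, not our actual loss):

```python
# Sketch of a triplet objective: pull the anchor ticket toward the correct
# resolution's embedding, push it away from a wrong one, up to a margin.
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    d_pos = np.sum((anchor - positive) ** 2)  # distance to correct resolution
    d_neg = np.sum((anchor - negative) ** 2)  # distance to wrong resolution
    return max(0.0, d_pos - d_neg + margin)

anchor   = np.array([1.0, 0.0])   # ticket embedding
positive = np.array([0.9, 0.1])   # correct resolution embedding (close)
negative = np.array([0.0, 1.0])   # wrong resolution embedding (far)
loss = triplet_loss(anchor, positive, negative)  # 0.0: already well separated
```

When the anchor is already much closer to the correct resolution than to the wrong one (beyond the margin), the loss is zero and gradients vanish, so training focuses on triplets that are still confused.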
Figure 4: Deep neural network-based language models.
How does the ensemble recommender work?
In essence, a recommender is a ranking mechanism that ranks and selects the items with the highest probability. The probability can be defined based on similarity, a distance measure, or a combination of many factors (for example, frequency, recency, and so on). From the four models described in the previous sections, we obtain each model's top recommendations by choosing the top-ranked resolutions based on a distance metric (cosine similarity, for example). We then use a stacking mechanism to create new features from all the recommendations of the four models (Figure 5). A suitable machine learning model can then take these features and improve the recommendations using a simple multiclass classifier, as the number of candidate resolutions has been significantly reduced after the first round of the recommending process.
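The stacking step can be sketched as follows; the feature layout (top candidate and score from each base model) and the tiny training set are illustrative assumptions:

```python
# Sketch of stacking: the top-ranked resolution ID and its score from each
# of the four base models become features for a small multiclass classifier
# over the reduced candidate set.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Columns: [lda_top, lda_score, index_top, index_score,
#           lstm_top, lstm_score, oneshot_top, oneshot_score]
X = np.array([
    [3, 0.91, 3, 0.88, 3, 0.75, 7, 0.60],
    [5, 0.80, 2, 0.55, 5, 0.82, 5, 0.79],
    [3, 0.85, 3, 0.90, 7, 0.40, 3, 0.88],
])
y = np.array([3, 5, 3])  # true resolution IDs

meta = LogisticRegression(max_iter=1000).fit(X, y)
pred = meta.predict(X[:1])  # final recommendation for the first ticket
```

Because the base models have already narrowed the candidates, the meta-classifier only has to arbitrate among a handful of resolutions rather than the full catalog.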
Figure 5: Training and inference for a recommender.
The experiment result using the Bitext dataset has shown the benefit of combining different approaches, both unsupervised and supervised learning, to produce more consistent and robust predictions from the recommender engine (Table 1).
Table 1: Experiment result.
Topics and patterns discovered by topic modeling, using an unsupervised learning approach, provide a very stable initial baseline when the number of topics is properly tuned. The various deep learning approaches illustrated in this article are able to address different aspects of language interpretation problems in the context of trouble ticket resolution. Specifically, an index-based neural network is able to learn the latent encoding based on the matched ticket-resolution pairs with relatively low compute resource requirements.
LSTM is able to encode more detailed information from the word sequences in a description, using the positive and negative pairs of ticket-resolution matches. However, the compute resources consumed by LSTM are huge compared to the other approaches. The hybrid approach combining these algorithms gives a good performance boost in terms of accuracy (or error) beyond what each individual model can achieve on its own.
Given the compute resource requirement for LSTM, for a lightweight resource implementation, the ensemble with LDA, index encoding, and one-shot encoding can provide very good performance. For any implementation that can afford longer training/retraining time, adding LSTM in the mix for ensemble could provide superb performance.
We have also observed, in practical scenarios when applying the learning algorithms, that domain knowledge (represented by how the description is curated and organized) plays an important role. This inherent knowledge, if handled properly and curated consistently over time, gives the learning algorithms great potential to continue to improve.
As a result of this design, ensembling multiple models that each capture a specific aspect of the data we want to encode (for example, specific topic patterns, differentiation between correctly and wrongly mapped resolutions, and the word sequence in a sentence), our recommender system is able to automatically resolve trouble tickets and reduce mean time to repair (MTTR) by more than 40 percent.
What does the complete solution look like?
Our recommender engine is designed to map a new ticket to the correct resolution using our pretrained models. The recommender engine is integrated with the client application using a REST (Representational State Transfer) API. When a new ticket is created, the raw information is passed to our engine through the prediction endpoint. The ticket first goes through a pre-processor, whose focus is to ensure that only relevant information is carried to the next step. The issue-related information in a trouble ticket is often drowned out by a large amount of irrelevant content; any content that adds no value to the task is considered noise, and the noise mixed in with the data needs to be eliminated. In NLP, this is done using methods such as regex matching, removing HTML tags, spell correction, keyword mapping, and so on.
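A minimal pre-processor along these lines might look like the following; the noise patterns and keyword map are assumptions for illustration, not our production rules:

```python
# Sketch of a ticket pre-processor: strip HTML tags, drop email-style
# header lines, normalize whitespace and case, and map known shorthand
# keywords to canonical terms.
import re

KEYWORD_MAP = {"db": "database", "auth": "authentication"}  # hypothetical map

def preprocess(raw):
    text = re.sub(r"<[^>]+>", " ", raw)                    # remove HTML tags
    text = re.sub(r"(?m)^(From|Sent|To):.*$", " ", text)   # drop header noise
    text = re.sub(r"\s+", " ", text).strip().lower()       # normalize
    words = [KEYWORD_MAP.get(w, w) for w in text.split()]  # keyword mapping
    return " ".join(words)

cleaned = preprocess("<p>DB timeout</p>\nFrom: noc@example.com")
# cleaned == "database timeout"
```

Only this cleansed text, rather than the raw ticket payload, is handed to the embedding models.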
The cleansed data then flows to the recommender engine, where the trained deep learning model embeds the text information and maps it to a resolution based on probability. This recommendation for the ticket is fed back to the client application. A support staff member who works on the ticket will receive a recommended solution enabling them to resolve the issue faster. For well-defined resolution steps, the recommendation can be linked to a set of automated scripts, where a certain percentage of tickets can even be automatically resolved without any human intervention.
The predictions of our recommender are maintained in an in-memory cache. This integration ensures that the resolution can be easily retrieved without going through the recommender for previously predicted tickets if the contents of the ticket remain unchanged. If the ticket gets updated with more data, the recommender is called again, as we now have more information that can potentially help map the ticket to a more accurate resolution.
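The caching behavior can be sketched by keying predictions on a hash of the ticket content, so that an unchanged ticket never re-invokes the engine (the function names and fake engine are hypothetical):

```python
# Sketch of the prediction cache: results are keyed by a hash of the
# ticket content; unchanged tickets are served from the cache, while any
# content change produces a new key and re-runs the recommender.
import hashlib

cache = {}

def recommend(ticket_text, engine):
    key = hashlib.sha256(ticket_text.encode()).hexdigest()
    if key not in cache:              # only call the engine on new content
        cache[key] = engine(ticket_text)
    return cache[key]

calls = []
def fake_engine(text):                # stand-in for the real recommender
    calls.append(text)
    return "restart service"

recommend("db down", fake_engine)
recommend("db down", fake_engine)     # cache hit; engine called only once
```

An updated ticket body hashes to a different key, which naturally triggers a fresh recommendation, matching the behavior described above.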
Figure 6: End-to-end recommender solution.
The quality of predictions of any machine learning model depends on the data it has been trained on. Each trouble ticket is mapped to its correct resolution (label) by training the model on historical tickets and their corresponding resolution status. The algorithm learns the inherent relationship between the attributes of the ticket and the labels, and applies this learned logic to new unseen tickets. However, this learning is an iterative process. As we expose our deep neural network model to more data, the algorithm will be able to provide recommendations with higher certainty.
To achieve this, we feed the human-verified predictions of each ticket back to the model using a feedback loop. When a support engineer closes a ticket, they update the root cause (label) of the ticket. This information along with the ticket, is fed back to our recommender engine through the feedback endpoint. We are in effect giving the model a chance to learn from what it already knows by reinforcing its training and tuning its parameters for a better recommendation.
The feedback loop updates the historical database with tickets along with their labels and retrains the deep neural network on this additional data. This continuous cycle of training the model, predicting resolution, and retraining the model with additional data, ensures that the model automatically learns from any change in the nature of the trouble ticket, thus continuously improving over time.
In the future, we plan to extend the existing framework to connect multiple content sources and ticketing systems to enrich the telecommunication specific corpus. This will enable better embedding, and better mapping, therefore further generalizing the current models and improving performance.
This article is the result of contributions from the following team members in various organizations within Ericsson: Lule Yu, Xuancheng Fan, Nicolas Ferland, Nikita Butakov, Jieneng Yang, Yi Li, Zhaoji Huang, Ashish Singh M, Shailesh Shrivastava, Sneha Wadhwa, Gregory Dutrieux, Yue Ju, Manoj Nambiar, Salman Memon.
Bitext, “Bitext’s customer support dataset,” https://blog.bitext.com/freecustomer-support-dataset, 2015.
G. Koch, R. Zemel, and R. Salakhutdinov, “Siamese neural networks for one-shot image recognition,” 2015.
Read our blog post on How to build robust anomaly detectors with machine learning.
Dive into our introduction to data-driven network architecture.
Connect with Alka and Wenting on LinkedIn.