The Potential of Machine Learning in CX

18 Mar, 2021


Using Machine Learning Algorithms to Predict Customer Satisfaction for Non-Respondents

Predicting likelihood to recommend or an overall satisfaction score from survey responses is relatively simple, especially when the survey is well built. A multiple linear regression or another standard statistical method will usually do, and in most cases it explains more than 80% of the variance. For most analytical purposes, this is good enough.

That is, of course, assuming the customer answered a survey and rated different attributes of their interaction with your company. But what do we do with non-respondents? Can we predict customer satisfaction from just the operational data we get from a transaction, without the customer’s answers to the survey? You have probably heard about machine learning; it is THE BIG THING nowadays. It is a powerful tool that can accomplish a wide variety of tasks, which explains its wide usage.

In this short article, I will discuss machine learning applications in predicting customer experience by using an example from an electronics company call center.


The Data:

Let’s start with the data we have. Every time someone contacts the company’s customer service call center, certain operational data is gathered: the customer’s identity, their wait time, their product, the issue they called to resolve, and so on.

I used a dataset of 100,000 observations that had the following transactional factors:

Product category customer called about
Agent tenure
Interaction duration
Call Center name
Agent role (agent, trainer, supervisor, etc.)
Issue category

Call centers collect other data points that would certainly have helped our model: wait time, how many times a customer contacted the call center in general and about the reported issue specifically, whether they tried to solve the issue via chat before calling (if available), etc.

I specifically tried not to use any survey data, but this doesn’t mean you shouldn’t. For example, average agent satisfaction scores or the customer-reported resolution rate for each issue category would also have been good predictors of customer satisfaction. One of the reasons I used so few operational factors was missing values: the fewer, the better. Of course, they cannot always be avoided, and some assumptions and “intelligent guessing” are needed for a model to work. I had to impute the ‘Interaction Duration’ factor, and I used each agent’s average interaction duration per product, assuming that some agents are faster than others and that issues with certain products take longer to diagnose and resolve.
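The imputation described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the field names (agent, product, duration) and the sample calls are hypothetical, and the fallback to the overall average when an agent has no observed calls for a product is my own addition, not part of the original analysis.

```python
from collections import defaultdict
from statistics import mean


def impute_duration(calls):
    """Fill missing durations with the mean for the same (agent, product) pair.

    Assumption: if that pair has no observed durations at all, fall back to
    the overall mean across all observed calls.
    """
    groups = defaultdict(list)
    for call in calls:
        if call["duration"] is not None:
            groups[(call["agent"], call["product"])].append(call["duration"])

    observed = [d for durations in groups.values() for d in durations]
    overall = mean(observed)

    filled = []
    for call in calls:
        if call["duration"] is None:
            durations = groups.get((call["agent"], call["product"]))
            value = mean(durations) if durations else overall
            filled.append({**call, "duration": value})
        else:
            filled.append(dict(call))  # copy so the input stays untouched
    return filled


# Hypothetical sample data: durations in seconds, None marks a missing value.
calls = [
    {"agent": "A1", "product": "TV", "duration": 300},
    {"agent": "A1", "product": "TV", "duration": 500},
    {"agent": "A1", "product": "TV", "duration": None},
    {"agent": "A2", "product": "Phone", "duration": 200},
    {"agent": "A2", "product": "TV", "duration": None},
]
filled = impute_duration(calls)
```

Here the third call receives agent A1’s average for TVs, while the last call falls back to the overall average because agent A2 has no observed TV calls.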

The Method:

I chose to focus on a simple classification problem: predicting whether or not a customer was unsatisfied with the interaction. This is based on the overall satisfaction question, which customers rate on a scale of 1 to 5, where 1 is not satisfied at all and 5 is very satisfied. In our case, “unsatisfied” means that a customer gave a score of 1 or 2 to the interaction they had with the call center agent.

I used three of the most common machine learning algorithms for classification problems:
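As a small illustration, the target variable defined above can be derived from the 1 to 5 score as follows. The function name and the sample scores are hypothetical.

```python
def is_unsatisfied(score):
    """Return 1 if the 1-5 satisfaction score counts as unsatisfied (1 or 2), else 0."""
    if score not in (1, 2, 3, 4, 5):
        raise ValueError(f"score must be an integer from 1 to 5, got {score!r}")
    return 1 if score <= 2 else 0


# Hypothetical survey scores from respondents, used to build training labels.
scores = [5, 1, 3, 2, 4]
labels = [is_unsatisfied(s) for s in scores]
```

Only respondents have a score, so these labels exist for the training data only; the trained model is what produces predictions for non-respondents.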

Logistic regression
Random forest
Extreme Gradient Boosting (XGBoost)

XGBoost performed better than the other two algorithms, although the difference between all three was minimal.
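A comparison like this can be sketched with scikit-learn. Everything in this example is an assumption made for illustration: the data is synthetic, the two features and the rule that generates the labels are invented, and scikit-learn’s GradientBoostingClassifier stands in for XGBoost so that the sketch needs only one library. It is not the model or the data from the analysis above.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): call duration in seconds, agent tenure in months.
rng = np.random.default_rng(0)
n = 2000
duration = rng.gamma(shape=2.0, scale=300.0, size=n)
tenure = rng.uniform(0, 60, size=n)

# Invented rule: longer calls and newer agents make "unsatisfied" more likely.
logit = 0.004 * (duration - 600) - 0.03 * (tenure - 30)
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)
X = np.column_stack([duration, tenure])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

models = {
    # Scaling helps the logistic regression solver converge.
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression()),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    # Stand-in for XGBoost (assumption), to avoid a second dependency.
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

test_accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    test_accuracy[name] = model.score(X_test, y_test)  # share of correct predictions

for name, acc in test_accuracy.items():
    print(f"{name}: {acc:.3f}")
```

Fitting all three on the same split keeps the comparison fair, since every model is scored on the same unseen rows.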

The Result:

There are several measures of a machine learning algorithm’s performance, but for simplicity’s sake I will use “Accuracy”: the share of cases the algorithm predicted correctly.

We split our dataset of 100,000 observations into two: the training set (on which to train the algorithm) and the test set (used to test how well the algorithm performs on data it has not “seen” before).

The resulting model was accurate 84% of the time on the test set. There is room for improvement, given the limited data available, but these results should encourage you to develop your own machine learning algorithm to predict customer satisfaction.
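The split and the accuracy measure described above can be sketched in plain Python. The 20% test share is an assumption for illustration (the split proportion used in the analysis is not stated), and the function names and toy predictions are hypothetical.

```python
import random


def train_test_split_rows(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and hold out a fraction as the test set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


rows = list(range(100))  # stand-in for 100 observations
train, test = train_test_split_rows(rows, test_fraction=0.2)

# Toy labels and predictions: four of the five predictions are correct.
acc = accuracy([1, 0, 0, 1, 0], [1, 0, 1, 1, 0])
```

The model is fitted on the training rows only, and accuracy is then computed on the held-out rows.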

The Applications:

Being able to predict a non-respondent’s satisfaction has multiple applications. A few examples are listed below.

I split the possible applications into two categories: damage control and improvement of customer satisfaction in the future. For example, if we are able to predict who is unsatisfied with the interaction they had based on their operational or transactional data, we can proactively reach out and try to mitigate the damage and prevent negative word of mouth. Closing the loop with dissatisfied customers is one of the best ways to retain customers and avoid churn.

Although machine learning algorithms do not give you the impact of each variable on the outcome as a linear regression would, they do rank the importance of each factor in the model. In our case, the most important factor was interaction duration. This connection needs to be investigated further using other analysis methods, but importance tells you which operational factors are affecting customer satisfaction. Improving these operational factors is under your control, unlike subjective customer perception.

Although I developed a very simple classification model of customer satisfaction, more complex predictions are possible, such as predicting whether a customer is a promoter, passive or detractor. This will give you more visibility into NPS and customer loyalty, rather than satisfaction with a single interaction. With more customer personal data available, we could even match the customer to an experience or a product that they are most likely to perceive favourably.
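For illustration, this is roughly how a factor-importance ranking can be read from a tree-based model in scikit-learn. The data is synthetic and deliberately built so that duration drives the outcome, and the feature names are hypothetical; it shows the mechanics only and does not reproduce the result reported above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (assumption): only duration influences the label.
rng = np.random.default_rng(1)
n = 1500
feature_names = ["interaction_duration", "agent_tenure", "noise"]
X = np.column_stack([
    rng.gamma(2.0, 300.0, n),   # call duration in seconds
    rng.uniform(0, 60, n),      # agent tenure in months
    rng.normal(size=n),         # pure noise column
])
y = (X[:, 0] + rng.normal(0, 100, n) > 700).astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# feature_importances_ sums to 1; higher means the factor was used more
# effectively in the trees' splits. It carries no sign or direction.
ranked = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranked:
    print(f"{name}: {importance:.3f}")
```

The ranking says which factors the model relied on, not whether a longer call raises or lowers satisfaction, which is why a follow-up analysis is still needed.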
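For reference, the three classes come from the standard NPS convention on a 0 to 10 likelihood-to-recommend scale: promoters score 9 or 10, passives 7 or 8, and detractors 0 to 6. A minimal sketch of building those labels and the resulting score, with hypothetical function names and sample scores:

```python
def nps_category(score):
    """Map a 0-10 likelihood-to-recommend score to its NPS category."""
    if not 0 <= score <= 10:
        raise ValueError(f"score must be between 0 and 10, got {score!r}")
    if score >= 9:
        return "promoter"
    if score >= 7:
        return "passive"
    return "detractor"


def net_promoter_score(scores):
    """NPS = percentage of promoters minus percentage of detractors."""
    categories = [nps_category(s) for s in scores]
    promoters = categories.count("promoter")
    detractors = categories.count("detractor")
    return 100 * (promoters - detractors) / len(categories)


# Hypothetical responses: 4 promoters, 3 passives, 3 detractors.
scores = [10, 9, 8, 7, 6, 3, 10, 9, 5, 8]
categories = [nps_category(s) for s in scores]
nps = net_promoter_score(scores)
```

These categories would replace the unsatisfied or not label as the target of a three-class model.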

There is no question of whether we can predict the satisfaction of non-respondents: we certainly can. Ultimately, it is a question of data availability and your willingness to invest in developing your own machine learning model to predict customer satisfaction.

Janna Lemster, Data Scientist

