What is the next step I need to take to close this deal? What will this customer ask for next, and how can I deliver it to them? What is the shortest path to closing a deal?
How much do all my marketing and sales activities really cost? How much does a single action or marketing channel cost?
These are common questions that marketing and sales teams face on a daily basis.
Luckily, there is an answer.
Here I present the advantages of using machine learning models to produce multichannel attribution models. I strongly recommend that you read it.
Knowing how much you spend on each marketing and sales channel and activity is of essential importance for your business success.
Channel attribution is an important and useful concept in interactive marketing. It is helpful for many managerial problems. The most obvious question is the budgeting of marketing expenditures for customer acquisition. Also, proper attribution can help to allocate spending across media (mail vs. telephone vs. television), vehicles (list A vs. list B), and programs (gift vs. special price), as well as to inform decisions concerning retaining existing customers. In many cases, companies implementing proper attribution models can achieve a strategic competitive advantage.
Besides channel attribution, modeling customer relationships is essential. Once you have successfully modeled channel attribution and your customer relationships, you gain a unique understanding of your customers and of how each of them interacts with your offers. Most importantly, you can foresee possible future interactions and their outcomes.
Marketing multichannel attribution in practice
I have been experimenting with a couple of different techniques to set up a good customer lifetime value (LTV) model and to model customer relationships successfully. From all the experiments, I concluded that Markov Chain Models (MCM) are the most appropriate for modeling customer relationships and calculating LTV.
Here is why:
The most significant advantage of the Markov chain model is its flexibility. Markov Chain Models can handle both customer migration and customer retention situations. It is important to point out that they can apply to either a customer or a prospect.
Thanks to its flexibility, the Markov chain model can be used in many situations that were not covered by previous models.
Another advantage of Markov Chain Models is that they account for the uncertainty surrounding customer relationships. This is possible because a Markov chain model is a probabilistic model.
As an outcome of using Markov chains, we get a probability and an expected value. In that way, the model allows one to talk about the company’s future relationship with an individual customer.
As direct marketers move toward true one-to-one marketing, their approach will also change. Instead of talking about groups or cohorts of customers, direct marketers will talk about a specific person, John Smith, for example. Instead of talking about retention rates, marketing and sales will speak about the probability that John Smith will be retained.
Instead of discussing average profits from a segment of customers, we can now have conversations about the expected benefit from the company’s relationship with John Smith.
Because Markov chain models give the probability and the expected value of a specific action, they are ideally suited to facilitating true one-to-one marketing. Moreover, they let you introduce the right personalization into each of your marketing campaigns.
How does the Markov chain model work?
A Markov chain is a stochastic model that describes a sequence of events in which the probability of each event depends only on the state attained in the previous event.
The contents of the sequences are determined by the Markov order, which ranges from 0 to 4:
 Order 0: Doesn’t know where the user came from or what step the user is on, only the probability of going to any page. For example, you have a set of products. A Markov chain of order 0 gives you only the likelihood of an outcome for a specific market, without using any other information (such as buyer history).
 Order 1 – “Memoryless”: Looks back zero steps. You are currently at Step A. The probability of going anywhere is based on being at that step.
 Order 2: Looks back one step. You came from Step A (Sequence A) and are currently at Step B. The probability of going anywhere is based on where you were and where you are.
 Order 3: Looks back two steps. You came from Step A > B (Sequence A) and are currently at Step C. The probability of going anywhere is based on where you were and where you are.
 Order 4: Looks back three steps. You came from Step A > B > C (Sequence A) and are currently at Step D. The probability of going anywhere is based on where you were and where you are.
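One way to picture the Markov order is as the number of most recent steps that make up the "state" used to predict the next step. Below is a minimal Python sketch under that reading. The function name `transition_counts` and the sample journeys are made up for illustration and are not part of any library.

```python
from collections import Counter, defaultdict

def transition_counts(journeys, order):
    """Count next steps, keyed by the last `order` steps seen so far.

    order=0 uses an empty state, order=1 only the current step,
    order=2 the previous and the current step, and so on.
    """
    counts = defaultdict(Counter)
    for journey in journeys:
        for i in range(1, len(journey)):
            # The state is the window of up to `order` steps before step i.
            state = tuple(journey[max(0, i - order):i])
            counts[state][journey[i]] += 1
    return counts

# Hypothetical journeys, for illustration only.
journeys = [["A", "B", "C"], ["A", "B", "D"], ["B", "C"]]

for order in (0, 1, 2):
    print(f"order {order}:")
    for state, next_steps in transition_counts(journeys, order).items():
        print("  ", state, dict(next_steps))
```

With order 1, step B is followed by C in two of three cases. With order 2, knowing that the user arrived at B from A changes the picture to an even split between C and D.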
Example of Markov Chains for multichannel marketing
Let’s look at an example of a first-order Markov chain. It is called “memory-free” because the probability of reaching a state depends only on the previous state visited.
For instance, suppose customer journeys contain three unique channels: C1, C2, and C3. Additionally, we need to add three special states to each Markov chain graph:
 start – representing a starting point
 conversion – purchase or conversion
 null – unsuccessful conversion.
Transitions between identical channels are possible (e.g., C1 > C1) but can be omitted for different reasons.
From the example above, we can see all the interactions our users have had so far. We have the following:
 start > C1 > C3 > unsuccessful conversion
 start > C1 > C2 > conversion
 start > C1 > unsuccessful conversion
It’s easy to conclude that Channel 2 is essential for us to close a deal. But how important is it?
Calculating the probabilities for conversion from one state to another
To calculate the probabilities for conversion from one state to another, it is helpful to create a table of all states.
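| From  | To         | Probability |
|-------|------------|-------------|
| start | C1         | 3/4         |
| start | C3         | 1/4         |
| C1    | C2         | 1/3         |
| C1    | C3         | 1/3         |
| C1    | null       | 1/3         |
| C2    | conversion | 1           |
| C3    | null       | 1           |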
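The table can be reproduced by counting transitions between consecutive states and dividing by the number of times each state was left. The sketch below is only an illustration: the first three journeys are the ones listed in the text, while the fourth (start > C3 > null) is an assumption added here so that the counts match the 3/4 and 1/4 probabilities in the table. The function name `transition_probabilities` is likewise made up.

```python
from collections import Counter, defaultdict
from fractions import Fraction

journeys = [
    ["start", "C1", "C3", "null"],
    ["start", "C1", "C2", "conversion"],
    ["start", "C1", "null"],
    ["start", "C3", "null"],  # ASSUMED journey, not listed in the text
]

def transition_probabilities(journeys):
    """Estimate first-order transition probabilities from observed journeys."""
    counts = defaultdict(Counter)
    for journey in journeys:
        # Pair every state with the state that directly follows it.
        for src, dst in zip(journey, journey[1:]):
            counts[src][dst] += 1
    return {
        src: {dst: Fraction(n, sum(dsts.values())) for dst, n in dsts.items()}
        for src, dsts in counts.items()
    }

probs = transition_probabilities(journeys)
for src, dsts in probs.items():
    for dst, p in dsts.items():
        print(f"{src} -> {dst}: {p}")
```

Using `Fraction` keeps the results as exact ratios such as 3/4 and 1/3 instead of rounded decimals.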
Now our graph will look like this:
Now that we know what the probability of conversion from one channel to another is, we can calculate how important each channel is.
Calculating the importance of a channel
To estimate the importance of each channel, we will introduce the term removal effect.
The principle of the removal effect is to remove each channel from the graph in turn and measure how many conversions could still be made without that channel.
The logic is the following: if we obtain Ni conversions without channel i, compared to T total conversions in the complete model, then the difference (T – Ni) reflects the change in total conversions (or value) caused by that channel. Once all channels are estimated, we have to weight them, because the total sum of (T – Ni) is usually bigger than T.
For a start, let’s remove one channel: C1.
Without C1, the only remaining path is start > C3 > null, so the probability of conversion is 0. This means we can’t remove Channel 1 in any case. In this situation, it’s a good idea to ask your data science team to check why C3 isn’t performing.
Let’s take a look at another example:
In this example, there is a 50% probability that C3 will lead to another channel. Here we aren’t dependent only on C1 for success; C3 can also bring us some profit.
Let’s calculate how important each of the channels is.
What is the probability of conversion?
First, what is the probability of conversion for the complete model?
We need to calculate the probability of each of the conversion paths and sum them up together.
 start > C1 > C2 > conversion = 0.5 * 0.33 * 1 = 0.165
 start > C1 > C3 > C2 > conversion = 0.5 * 0.33 * 0.5 * 1 = 0.0825
 start > C3 > C2 > conversion = 0.5 * 0.5 * 1 = 0.25
From the above, the probability of conversion for the complete model is 0.165 + 0.0825 + 0.25 = 0.4975, or 49.75%.
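The same sum can be obtained by walking the graph and listing every path that ends in a conversion. The sketch below is an illustration only. The transition probabilities are read off the path calculations above, and two values are assumptions added so that each state's outgoing probabilities sum to 1: C1 > null = 0.34 and C3 > null = 0.5. The function name `conversion_paths` is made up.

```python
# Transition probabilities of the second example.
# ASSUMED values: C1 -> null = 0.34 and C3 -> null = 0.5.
graph = {
    "start": {"C1": 0.5, "C3": 0.5},
    "C1": {"C2": 0.33, "C3": 0.33, "null": 0.34},
    "C2": {"conversion": 1.0},
    "C3": {"C2": 0.5, "null": 0.5},
}

def conversion_paths(graph, state="start", path=None, prob=1.0):
    """Yield (path, probability) for every path that ends in 'conversion'."""
    path = (path or []) + [state]
    if state == "conversion":
        yield path, prob
        return
    # 'null' has no outgoing transitions, so those paths simply end here.
    for nxt, p in graph.get(state, {}).items():
        if nxt not in path:  # skip loops; this example graph has none
            yield from conversion_paths(graph, nxt, path, prob * p)

total = 0.0
for path, prob in conversion_paths(graph):
    print(" > ".join(path), "=", round(prob, 4))
    total += prob
print("total:", round(total, 4))
```

Listing paths explicitly is only practical for small graphs without loops. It is used here because it mirrors the manual calculation step by step.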
How important is each channel?
For simplicity’s sake, let’s assume that we have only one conversion.
Let’s calculate the probability of conversion without C1:
start > C3 > C2 > conversion = 0.25
The removal effect of C1 is 1 - 0.25 / 0.4975 = 0.497.
That means we would lose 49.7% of all opportunities if C1 were removed as a channel.
The same goes for C2 and C3.
The probability of conversion without C2 is 0, so the removal effect of C2 is 100%, meaning we would lose all of our opportunities if we removed C2.
The probability of conversion without C3 is 0.165, giving a removal effect for C3 of 1 - 0.165 / 0.4975 = 0.67, which makes it a more important channel than C1.
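The removal effects can also be computed programmatically by treating every visit to the removed channel as a lost journey. This is a minimal sketch, not the implementation used by any particular package. It assumes the graph has no loops, and the values C1 > null = 0.34 and C3 > null = 0.5 are assumptions added so that outgoing probabilities sum to 1. The function name `conversion_probability` is made up.

```python
# Transition probabilities of the second example.
# ASSUMED values: C1 -> null = 0.34 and C3 -> null = 0.5.
graph = {
    "start": {"C1": 0.5, "C3": 0.5},
    "C1": {"C2": 0.33, "C3": 0.33, "null": 0.34},
    "C2": {"conversion": 1.0},
    "C3": {"C2": 0.5, "null": 0.5},
}

def conversion_probability(graph, removed=None, state="start"):
    """Probability of eventually reaching 'conversion' from `state`.

    Reaching the removed channel counts as a lost journey (probability 0).
    Assumes the graph contains no cycles.
    """
    if state == "conversion":
        return 1.0
    if state == "null" or state == removed:
        return 0.0
    return sum(
        p * conversion_probability(graph, removed, nxt)
        for nxt, p in graph[state].items()
    )

baseline = conversion_probability(graph)
removal_effects = {
    channel: 1 - conversion_probability(graph, removed=channel) / baseline
    for channel in ("C1", "C2", "C3")
}

for channel, effect in removal_effects.items():
    print(channel, round(effect, 3))
```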
How much should you invest in each of the channels?
With the above calculations, we can show how important each of the channels is.
But how can we divide our budget among the channels?
To do that, we need to weight the removal effects and multiply them by the total number of conversions (1 in our case).
This is the formula: share of Cn = removal effect of Cn / sum of all removal effects
 C1: 0.497 / (0.497 + 1 + 0.67) = 0.23 * 1 conversion = 0.23 > 23%
 C2: 1 / (0.497 + 1 + 0.67) = 0.46 * 1 conversion = 0.46 > 46%
 C3: 0.67 / (0.497 + 1 + 0.67) = 0.31 * 1 conversion = 0.31 > 31%
Therefore, we have distributed 1 conversion across all channels.
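The weighting step is a plain normalization, which can be sketched in a few lines of Python. The removal effects below are the rounded values from the worked example, and the function name `attribute_conversions` is made up for illustration.

```python
def attribute_conversions(removal_effects, total_conversions=1):
    """Split conversions across channels in proportion to their removal effects."""
    total_effect = sum(removal_effects.values())
    return {
        channel: effect / total_effect * total_conversions
        for channel, effect in removal_effects.items()
    }

# Rounded removal effects from the worked example.
removal_effects = {"C1": 0.497, "C2": 1.0, "C3": 0.67}

shares = attribute_conversions(removal_effects, total_conversions=1)
for channel, share in shares.items():
    print(f"{channel}: {share:.2f}")
```

With a real number of conversions (or revenue) passed as `total_conversions`, the same function would return the conversions credited to each channel, which can then guide how the budget is split.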
Scaling the multichannel model
Once we understand the concept behind the model, we can start implementing it in a real-life example.
Luckily, there is already an R package (ChannelAttribution) that can help us with most of the hard work.
In the next article, I’ll guide you through the technical implementation.