Effective product recommendation systems are key to the profitability of any e-commerce store. Traditionally, these are based on simple rules, but AI is increasingly used to improve e-commerce recommendations, leading to more conversions and larger order values.
In the traditional approach to building recommendation engines supported by AI (or, more specifically, by its Machine Learning branch), we gather historical data, build a model, and then apply that model to generate recommendations for new customers.
In this process, we can choose from a range of models such as Collaborative Filtering or Content-Based Filtering. Still, whichever we pick, we follow the same fundamental, underlying assumption: that the future will be, to some extent, similar to the past. This is the paradigm of all models that fall into the category of Machine Learning.
Although this assumption is never fully true, it generally works satisfactorily well. Amazon’s implementation of Collaborative Filtering paved the way for broad adoption of the approach and made it a market standard. In a relatively stable environment, we can benefit from rich long-term historical data and the simplicity of the traditional approach.
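To make the traditional paradigm concrete, below is a minimal item-item Collaborative Filtering sketch. The interaction matrix is an illustrative assumption; a production system (such as Amazon’s) would work with vastly larger, sparser data.

```python
# Minimal item-item Collaborative Filtering sketch (illustrative only).
# In practice the interaction matrix would come from historical
# purchase or click data.
import numpy as np

# Rows = users, columns = products; 1 = purchased, 0 = not purchased.
interactions = np.array([
    [1, 0, 1, 0],
    [1, 1, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
])

def item_similarity(matrix):
    """Cosine similarity between product columns."""
    norms = np.linalg.norm(matrix, axis=0, keepdims=True)
    normalised = matrix / np.maximum(norms, 1e-12)
    return normalised.T @ normalised

def recommend(user_row, similarity, k=2):
    """Score unseen products by similarity to what the user already bought."""
    scores = similarity @ user_row
    scores[user_row > 0] = -np.inf  # do not re-recommend owned items
    return np.argsort(scores)[::-1][:k]

sim = item_similarity(interactions)
print(recommend(interactions[0], sim))  # top-2 product indices for user 0
```

Note how the model is built entirely from past behaviour – exactly the ‘future resembles the past’ assumption discussed above.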
However, will it always work well? Unfortunately not. The main motto of the Shopify 2022 Ecommerce conference was:
“The only constant in commerce is change.”
In this post, we will discuss a more potent emerging approach to building recommendation engines, drawing on a subfield of Reinforcement Learning – Multi-Armed Bandits and Contextual Bandits – to handle the challenges above.
The challenges of recommendations in e-commerce
Quite often, we face a shifting environment or even abrupt changes, including but not limited to:
- Dynamic changes in the product offer – some products might not be available for sale or might have low priority in the current sales targets.
- The introduction of new products (we face a ‘cold start problem’ – we don’t have data for new products to assess their popularity).
- Changes in the website content, for example when publishing blog posts or personalised content. One way of generating leads and boosting sales is to publish catchy articles that attract customers. As the content of the articles differs, they will attract different groups of clients – we need to adapt our recommendations to the content.
- Seasonality – we could have products sold only within a particular season, products that convert better in certain seasons, and seasonally invariant products. Dynamic recommendations can smoothly adapt to seasonal changes within the first two categories.
- Changes in product popularity, including new trends and the emergence of market hits – similarly to changes in products offered, we need to adapt to a new market setting.
- Changes in customer preferences and behaviour.
In the above cases, the traditional assumption that the “future will be the same as the past” won’t hold even approximately.
Our models won’t work as expected.
Introducing new products for online sale
Let’s discuss an example. Say we operate an online e-commerce platform and work with many merchant partners. Our product offer changes daily, or even hourly, and our task is to choose the best products to promote at any given time (and to a given customer). For simplicity, let’s also assume we deal with a new, unidentified customer, so we can’t rely on the customer’s individual characteristics. We will relax this assumption later.
Displaying all of the items frequently enough to establish reliable conversion rates, and then choosing the best-performing ones, is not a viable solution in this case. It’s too costly, and we have too many products to show them all.
This is like looking for a needle in a haystack
So how could we approach the challenge?
Based on our existing knowledge, we could offer one or a few of the best-selling products, but what if other products sell better? We will not know unless we try.
This is the classic exploit vs explore problem.
How can we try? We could use a rule of thumb – checking the alternatives from time to time – or we could always show new products for a certain amount of time and then stick to the best performers.
So, what strategy should you choose to get more revenue? There are a host of scientific methods that answer this question.
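The rules of thumb above correspond to classic bandit strategies. The first one is essentially epsilon-greedy: exploit the best-known product most of the time, but explore a random one with a small probability. A minimal sketch, with illustrative conversion and display counters:

```python
# Epsilon-greedy sketch: with probability epsilon we explore a random
# product, otherwise we exploit the currently best-converting one.
import random

def epsilon_greedy(conversions, displays, epsilon=0.1):
    """Pick the index of the product to display next."""
    n_products = len(conversions)
    if random.random() < epsilon:
        return random.randrange(n_products)  # explore
    rates = [c / d if d > 0 else 0.0 for c, d in zip(conversions, displays)]
    return max(range(n_products), key=lambda i: rates[i])  # exploit

# Illustrative counts: product 1 converts best so far,
# but we still try the others 10% of the time.
print(epsilon_greedy(conversions=[3, 12, 1], displays=[50, 60, 20]))
```

Simple strategies like this can already beat ‘always show the best-seller’, and the methods discussed below refine them further.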
Reinforcement Learning and AlphaGo
Do you remember DeepMind’s AlphaGo program beating the Go champion Lee Sedol in 2016? Or have you seen Boston Dynamics robots in action?
These are among the most astonishing technological achievements of Reinforcement Learning: programs that act in a dynamically changing environment and quickly adapt to changes.
Reinforcement Learning is a part of Artificial Intelligence where we actively interact with the environment and learn how to choose the best action by trial and error.
Wouldn’t it be great to implement the same approach in e-commerce to handle changes in our environment?
Yes, it would. There is an issue, though.
Why was AlphaGo so effective?
AlphaGo was so successful in large part because it could produce the data itself and quickly test it in an endless, fast-flowing loop. With almost limitless simulations, it created different board configurations and played against itself.
Could we do the same? Unfortunately, no. We are winning the satisfaction of our clients, not an abstract game. In Go, we play an adversarial game with a well-defined reward function: the reward is determined by winning or losing the game. In customer interaction, customers do not act as adversaries, and instead of a well-defined reward, we have rewards depending on the unknown preferences of our customers – preferences that we need to estimate from the data.
What we can do is put constraints on the problem formulation. More constraints mean a less complex problem representation; a less complex representation requires fewer parameters, and fewer parameters mean less data is needed to estimate them.
The natural assumption we can make is that our interaction with an individual customer does not change the behaviour of the customer population as a whole – customers’ preferences do not change due to individual interactions.
In the case of customers arriving on our website, we assume they come from the same population no matter how we interacted with previous customers. We can, though, observe the behaviour of our customers and use this knowledge to take the best actions and provide adequate recommendations for future visitors.
To simplify the problem representation even further, we usually assume a single-move perspective as opposed to a sequence of decisions and interactions. This greatly reduces the number of parameters required for estimation – the parameter count explodes exponentially with the number of steps in the sequence – while maintaining high accuracy and good overall performance of the predictive model.
Multi-Armed Bandits
Multi-Armed Bandits (MABs) employ a trial-and-error approach. MABs can be thought of as dynamic, sequential A/B testing that gives a scientifically sound answer to the question: “What products should you display to maximise your revenue?”
We try to find a strategy that balances exploitation and exploration. In Reinforcement Learning terminology, monetising our knowledge by choosing the most beneficial action at the time is called exploitation. When we display our best-selling products, it’s exploitation; when we experiment with products of unknown performance, it’s exploration. We can also look at this as a trade-off between long- and short-term benefits: exploration means investing now to harvest benefits in the future.
In the traditional approach to building a recommender, we do not influence the process of gathering data. We collect data and then build or update a model when new data arrives; we do not plan ahead for how to collect the data.
In the MAB approach, every product display event tries to balance two aims: getting as many ‘clicks’ as possible now (short-term benefit) and collecting the best possible data to re-estimate the model (long-term benefit).
The models are trained (re-estimated) constantly with every ‘click’ and ‘view’ (for most deployments, batch processing of ‘clicks’ and ‘views’ is sufficient).
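One popular algorithm that implements this per-event balancing is Thompson Sampling. A minimal sketch for a click / no-click setting follows; the product names are illustrative, and the Beta-Bernoulli model is one common choice rather than the only one:

```python
# Thompson Sampling sketch for a Bernoulli (click / no-click) bandit.
# Each product keeps a Beta posterior over its click-through rate; every
# 'view' and 'click' is a simple counter update, so the model can also be
# refreshed in small batches.
import random

class ThompsonSampler:
    def __init__(self, products):
        # Beta(1, 1) prior = uniform belief over click-through rates.
        self.alpha = {p: 1.0 for p in products}  # 1 + clicks
        self.beta = {p: 1.0 for p in products}   # 1 + views without a click

    def choose(self):
        """Sample a plausible CTR per product; display the highest draw."""
        draws = {p: random.betavariate(self.alpha[p], self.beta[p])
                 for p in self.alpha}
        return max(draws, key=draws.get)

    def update(self, product, clicked):
        if clicked:
            self.alpha[product] += 1
        else:
            self.beta[product] += 1

sampler = ThompsonSampler(["mug", "poster", "t-shirt"])
product = sampler.choose()              # product to display now
sampler.update(product, clicked=False)  # feed back the observed outcome
```

Uncertain products get optimistic draws more often, so exploration happens automatically and fades as data accumulates.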
Prominent applications of MABs include Yahoo’s news article recommendation and Netflix’s artwork personalisation.
Contextual Bandits
Multi-Armed Bandits work amazingly well in many business applications, but we can further improve their effectiveness by including information about product and customer characteristics.
Contextual Bandits combine two areas of AI: Reinforcement Learning and Machine Learning. We use Reinforcement Learning to make the most of the trial-and-error (exploration) approach and to balance long-term and short-term benefits. We use a Machine Learning model to condition our actions on additional information (product and customer characteristics).
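A well-known algorithm of this kind is LinUCB, introduced in the Yahoo news recommendation research. Below is a minimal sketch of its disjoint variant; the context features and product names are illustrative assumptions:

```python
# Minimal LinUCB sketch (disjoint variant): each product keeps a linear
# model of expected reward given a context vector (e.g. customer segment,
# time of day, page features), plus an optimism bonus for uncertainty.
import numpy as np

class LinUCB:
    def __init__(self, products, dim, alpha=1.0):
        self.alpha = alpha  # exploration strength
        self.A = {p: np.eye(dim) for p in products}    # X^T X + I per product
        self.b = {p: np.zeros(dim) for p in products}  # X^T rewards per product

    def choose(self, context):
        """Pick the product with the highest upper confidence bound."""
        best, best_ucb = None, -np.inf
        for p in self.A:
            A_inv = np.linalg.inv(self.A[p])
            theta = A_inv @ self.b[p]  # ridge-regression estimate
            ucb = theta @ context + self.alpha * np.sqrt(context @ A_inv @ context)
            if ucb > best_ucb:
                best, best_ucb = p, ucb
        return best

    def update(self, product, context, reward):
        self.A[product] += np.outer(context, context)
        self.b[product] += reward * context

bandit = LinUCB(["mug", "poster"], dim=3)
ctx = np.array([1.0, 0.0, 1.0])         # illustrative context features
choice = bandit.choose(ctx)
bandit.update(choice, ctx, reward=1.0)  # observed click
```

The same product can now be scored differently in different contexts – exactly the adaptation to content, season, or customer segment discussed earlier.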
‘Full’ Reinforcement Learning in e-commerce
Are there cases in e-commerce where we could relax the previously discussed assumption – that the visitor is anonymous on their first visit, so we have no information about their characteristics – and go for a full Reinforcement Learning problem formulation?
Yes, there are.
One example is modelling and optimising customer paths: interacting with a customer as they traverse our webpage, trying to influence their trajectory in a way that brings more revenue.
In this case, broadly speaking, our sequence of moves on the Go board transforms into a path of customer interactions. We know the algorithms that solve this problem, but gathering the data will be a challenge, and it will take time to bear full fruit.
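As a toy illustration of this sequential formulation, here is a minimal tabular Q-learning sketch; the page states, actions, and the single simulated transition are illustrative assumptions, not a real customer model:

```python
# Toy tabular Q-learning sketch for the customer-path view: states are
# stages of the journey, actions are interventions we can show, and the
# reward arrives when a purchase happens.
import random
from collections import defaultdict

states = ["landing", "category", "product", "checkout"]
actions = ["show_bestsellers", "show_discount", "show_reviews"]

q = defaultdict(float)  # Q[(state, action)] -> estimated long-term value
alpha, gamma, epsilon = 0.1, 0.9, 0.1

def choose_action(state):
    if random.random() < epsilon:
        return random.choice(actions)                 # explore
    return max(actions, key=lambda a: q[(state, a)])  # exploit

def learn(state, action, reward, next_state):
    """Standard one-step Q-learning update."""
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])

# One illustrative transition: the visitor moves from the category page to
# a product page after seeing reviews, with no immediate reward yet.
learn("category", "show_reviews", reward=0.0, next_state="product")
```

Unlike the single-move bandit setting, the value of an action here depends on what it enables later in the journey – which is precisely why much more data is needed.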
Closing remark
Recommendation systems took a long time to evolve from their first applications in e-commerce to the current state-of-the-art solutions.
We have a wide range of fundamentally different approaches, and even more hybrid ones. Recommendation systems have become ubiquitous in e-commerce, and the best practice is to look at your actual needs and the specifics of your data to choose the most effective solution.
This post has given a brief introduction to bandit-based recommendation systems, which in many cases strongly outperform alternative approaches.
The effort is considerable, but the rewards are well worth it. Every increase in your conversions translates into pure profits, and – with the costs of bringing a visitor to your site already committed – the maths on this is exceptionally favourable.
A good recommendation system can be worth a fortune and redefine your business performance
And the effort, cost and time to deployment have never been lower than with our dedicated AdBrain Profitable E-Commerce AI Platform.
If you care about maximising your profits and want to explore how you can use our AdBrain Profitable E-Commerce AI Platform to your advantage, visit our E-Commerce AI Solutions page to learn more, or Contact us to discuss your business with our friendly and professional team.
Do not wait.
Bigger profits are just a few clicks away!