In the last two years, the rate at which data is generated has increased significantly. Our options to learn, create, and explore on the internet are nearly limitless. However, anything in abundance creates a problem, and data is no exception: it creates the paradox of choice. Because we have so many choices, chances are we will end up choosing the wrong one. That’s where recommendation systems come in.
What if the internet could suggest things you may like, and be accurate about it? Sounds amazing, right? In this recommendation systems Python tutorial, you will learn how they work and what the different approaches are.
Formally, a recommendation system is a subclass of information filtering system that seeks to predict the “rating” or “preference” a user would give to an item. Recommendation systems have become increasingly popular in recent years and are used in a variety of areas including movies, music, news, books, research articles, search queries, social tags, and products in general.
Two Different Approaches
Recommendation systems generally produce recommendations in one of two ways: collaborative filtering and content-based filtering. Both approaches have their own advantages and disadvantages, but first let’s understand how they work.
Collaborative filtering approaches build a model from a user’s past behavior (items previously purchased or selected and/or numerical ratings given to those items) as well as similar decisions made by other users. This model is then used to predict items (or ratings for items) that the user may have an interest in.
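To make the idea concrete, here is a minimal sketch of user-based collaborative filtering in plain Python. The users, movies, and ratings are made up for illustration, and cosine similarity over co-rated items is only one of several possible similarity measures.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: rating}
ratings = {
    "alice": {"Inception": 5, "Titanic": 1, "Alien": 4},
    "bob":   {"Inception": 4, "Titanic": 2, "Alien": 5, "Heat": 5},
    "carol": {"Inception": 1, "Titanic": 5, "Heat": 1},
}

def cosine(a, b):
    """Cosine similarity between two users over the items both have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)

def predict(user, item, ratings):
    """Similarity-weighted average of other users' ratings for `item`."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine(ratings[user], their)
        num += sim * their[item]
        den += sim
    # None signals that nobody comparable has rated the item
    return num / den if den else None

print(predict("alice", "Heat", ratings))
```

Because alice’s tastes are much closer to bob’s than to carol’s, the predicted rating for "Heat" is pulled toward bob’s rating rather than carol’s.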
To illustrate, take Last.fm, a popular music recommender system. It creates a station of recommended songs by observing which artists and bands the user listens to on a regular basis and comparing that with the listening behavior of other users. Because this approach leverages the behavior of other users, it is a good example of a collaborative filtering technique.
However, each system has its own strengths and weaknesses. In this example, Last.fm requires a large amount of information about a user to make accurate recommendations. This is known as the cold start problem, and it is common in collaborative filtering.
Content-based filtering approaches utilize a series of discrete characteristics of an item in order to recommend additional items with similar properties. To do this, keywords are used to describe the items and a user profile is built to indicate the type of item this user likes. In simple words, these algorithms try to recommend items that are similar to those that a user liked in the past.
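The following is a minimal sketch of this idea, assuming a made-up catalogue in which every song is described by a handful of keywords. The user profile is simply a count of the keywords in the songs the user liked, and candidates are ranked by cosine similarity to that profile. Real systems typically use richer features and weighting schemes.

```python
from math import sqrt

# Hypothetical catalogue: each item is described by a set of keywords.
items = {
    "Song A": {"rock", "guitar", "fast"},
    "Song B": {"rock", "guitar", "slow"},
    "Song C": {"jazz", "piano", "slow"},
    "Song D": {"rock", "drums", "fast"},
}

def build_profile(liked, items):
    """Count how often each keyword appears among the liked items."""
    profile = {}
    for name in liked:
        for kw in items[name]:
            profile[kw] = profile.get(kw, 0) + 1
    return profile

def score(profile, keywords):
    """Cosine similarity between the profile and a binary keyword vector."""
    dot = sum(profile.get(kw, 0) for kw in keywords)
    norm_p = sqrt(sum(v * v for v in profile.values()))
    norm_k = sqrt(len(keywords))
    return dot / (norm_p * norm_k) if norm_p and norm_k else 0.0

def recommend(liked, items, k=2):
    """Rank the items the user has not liked yet by similarity to the profile."""
    profile = build_profile(liked, items)
    candidates = [name for name in items if name not in liked]
    candidates.sort(key=lambda name: score(profile, items[name]), reverse=True)
    return candidates[:k]

print(recommend(["Song A", "Song B"], items))
```

A user who liked the two guitar-driven rock songs gets the other rock song ranked ahead of the jazz song, since it shares more keywords with the profile.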
Pandora, another music recommendation platform, is an example. It uses only the characteristics of a song or artist to create a station that plays music with similar properties. User feedback is also used to refine the results. When a user dislikes a particular song, the attributes of that song are de-emphasized. Conversely, when a user likes a particular song, its attributes are emphasized and more songs like it are shown in the future.
However, just like collaborative filtering, content-based filtering has some limitations too. Although it needs very little information to get started, its scope is limited: it tends to recommend only items similar to those the user already likes.
A New Approach: Hybrid Recommendation Systems
Recent research has demonstrated that a hybrid approach, combining collaborative and content-based filtering, can be more effective in some cases. There are several ways to do this:
- Making both predictions separately and then combining them
- Adding content-based capabilities to a collaborative approach
- Adding collaborative capabilities to a content-based approach
- Unifying approaches into one model
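As an illustration of the first option, making both predictions separately and then combining them, here is a small sketch that blends two sets of scores with a weighted sum. The scores, the min-max normalization, and the `alpha` weight are all assumptions chosen for the example, not a prescribed method.

```python
def normalize(scores):
    """Rescale scores to [0, 1] so the two models are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 0.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}

def blend(collab, content, alpha=0.5):
    """Weighted sum of normalized scores; alpha weights the collaborative model."""
    c1, c2 = normalize(collab), normalize(content)
    names = set(c1) | set(c2)
    return {n: alpha * c1.get(n, 0.0) + (1 - alpha) * c2.get(n, 0.0) for n in names}

# Hypothetical scores from two separately built models
collab_scores = {"Movie A": 4.5, "Movie B": 3.0, "Movie C": 1.0}
content_scores = {"Movie A": 0.2, "Movie B": 0.9, "Movie C": 0.4}

hybrid = blend(collab_scores, content_scores, alpha=0.6)
ranking = sorted(hybrid, key=hybrid.get, reverse=True)
print(ranking)
```

Setting `alpha` closer to 1 trusts the collaborative scores more, while a value closer to 0 favors the content-based scores. In practice the weight would be tuned on held-out data.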
Netflix is a well-known example of a hybrid recommendation system. It makes recommendations by comparing the watching and searching habits of similar users, which is collaborative filtering. It also recommends movies that share characteristics with those the user has rated highly, which is content-based filtering.
Recommendation System Using Python
Now that you have a basic idea of what a recommendation system is and how it works, the next step is to build one. Let’s create our own basic movie recommender system using Python.
First, the recommender code needs a few dependencies, so we start by importing them. NumPy will help us do some math, while LightFM is a Python recommender system library that implements several popular recommendation algorithms. LightFM is a large library, so we import only what we need: fetch_movielens will get the data for us (as SciPy sparse matrices), and the LightFM class will later create our model.
import numpy as np
from lightfm.datasets import fetch_movielens
from lightfm import LightFM
Now we will fetch the data from the MovieLens dataset, which contains around 100K movie ratings from about 1K users. As an optional parameter, we pass min_rating, which means we keep only ratings of 4.0 or higher. The function then builds interaction matrices from the data and returns them in a dictionary, which we store in our data variable. A dictionary is a way to store data as key-value pairs; here the keys are strings. fetch_movielens also splits the data into training and test sets, which we can retrieve with the keys ‘train’ and ‘test’.
data = fetch_movielens(min_rating=4.0)
print(repr(data['train']))
print(repr(data['test']))
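To show what an interaction matrix with a minimum rating looks like, here is a tiny dense version built by hand. The triples and the helper name `build_interactions` are hypothetical, and this is only an illustration of the idea, not a description of what fetch_movielens does internally (the library returns sparse matrices).

```python
# Hypothetical raw ratings as (user_id, item_id, rating) triples.
raw = [(0, 0, 5.0), (0, 1, 3.0), (1, 1, 4.0), (1, 2, 2.0), (2, 0, 4.5)]

def build_interactions(triples, n_users, n_items, min_rating=4.0):
    """Users x items matrix holding the rating where it meets min_rating, else 0."""
    matrix = [[0.0] * n_items for _ in range(n_users)]
    for user, item, rating in triples:
        if rating >= min_rating:
            matrix[user][item] = rating
    return matrix

interactions = build_interactions(raw, n_users=3, n_items=3)
for row in interactions:
    print(row)
```

Each row is a user and each column is a movie. Ratings below the threshold are dropped, so only the movies a user clearly liked remain as non-zero entries.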
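Next, when creating the model, we set the loss parameter to ‘warp’, which stands for Weighted Approximate-Rank Pairwise. It uses stochastic gradient descent to learn weights that rank the items a user liked above the items they did not, which improves our recommendations. LightFM is a hybrid model when user or item features are supplied; since we pass only the interaction matrix here, it behaves as a collaborative filtering model.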
# Create the model
model = LightFM(loss='warp')
# Train the model
model.fit(data['train'], epochs=30, num_threads=2)
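For some intuition about the WARP loss, the sketch below shows only its sampling step, under simplifying assumptions: a single positive item per user, a margin of 1, and a harmonic-sum weight. The function name `warp_sample` and its parameters are hypothetical, and this is not LightFM’s internal implementation. The idea is to draw negative items at random until one scores within the margin of the positive item; the fewer draws this takes, the lower the positive item is probably ranked, and the larger the update weight.

```python
import random

def warp_sample(scores, positive, rng, margin=1.0, max_trials=None):
    """Sample negatives until one violates the margin against the positive item.

    Returns (negative_item, weight), or None if no violating negative was found.
    """
    n_items = len(scores)
    negatives = [i for i in range(n_items) if i != positive]
    max_trials = max_trials or len(negatives)
    for trials in range(1, max_trials + 1):
        neg = rng.choice(negatives)
        if scores[neg] > scores[positive] - margin:
            # Few trials needed => the positive item is probably ranked low.
            est_rank = max(1, (n_items - 1) // trials)
            # Harmonic-sum weight: one common choice, assumed here.
            weight = sum(1.0 / k for k in range(1, est_rank + 1))
            return neg, weight
    return None

rng = random.Random(0)
scores = [0.2, 2.5, 0.1, 1.9, 0.3]  # hypothetical model scores for 5 items
print(warp_sample(scores, positive=3, rng=rng))
```

In a full training loop, the returned weight would scale the gradient step that pushes the positive item’s score up and the violating negative item’s score down.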
The following function is our recommendation function. It scores every movie for a user and then, with the help of NumPy, sorts the movies in descending order of score to see which ones are the top recommendations for that user.
def sample_reco(model, data, user_ids):
    # number of users and movies in the training data
    n_users, n_items = data['train'].shape
    for user_id in user_ids:
        # movies the user already likes
        known_positive = data['item_labels'][data['train'].tocsr()[user_id].indices]
        # scores the model predicts for every movie
        scores = model.predict(user_id, np.arange(n_items))
        # rank movies from highest to lowest score
        top_items = data['item_labels'][np.argsort(-scores)]
        # results
        print('\n')
        print("User %s" % user_id)
        print("known positives:")
        for x in known_positive[:3]:
            print('\n %s' % x)
        print("Recommended:")
        for x in top_items[:3]:
            print('\n %s' % x)

sample_reco(model, data, [1,2,3])
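Note that sample_reco ranks every movie, including those the user already liked in the training data. One possible variation, sketched here in plain Python with made-up scores, skips the already-known items before taking the top results. The helper name `top_k_unseen` is hypothetical.

```python
def top_k_unseen(scores, known, k=3):
    """Indices of the k highest-scoring items the user has not interacted with."""
    candidates = [i for i in range(len(scores)) if i not in known]
    # Sort by descending score, the same ordering np.argsort(-scores) produces
    candidates.sort(key=lambda i: scores[i], reverse=True)
    return candidates[:k]

scores = [0.1, 2.3, 1.7, 0.9, 3.0]  # hypothetical predicted scores
known = {1, 4}                      # items already in the training data
print(top_k_unseen(scores, known, k=2))  # prints [2, 3]
```

The returned indices could then be used to look up titles in data['item_labels'], just as sample_reco does.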
In a nutshell, recommendation systems help us make decisions by learning our preferences. With newer approaches like hybrid recommendations, we are getting better at this game every day. You now have everything you need to get started with recommendation systems, so grab some data and build one yourself.