
Introduction to Recommendation Systems in Python


Introduction

Over the last two years, the rate at which data is generated has risen sharply. Our options to learn, create, and explore on the internet are nearly limitless. However, anything in abundance creates a problem, and data is no exception: it creates the paradox of choice. With so many options, chances are we will end up choosing the wrong one. That’s where recommendation systems come in. What if the internet could suggest things you may like, and be accurate about it? Sounds amazing, right? In this recommendation system Python tutorial, you will learn how these systems work and what the different approaches are.

Formally, a recommendation system is a subclass of information filtering system that seeks to predict the “rating” or “preference” a user would give to an item. Recommendation systems have become increasingly popular in recent years and are used in a variety of areas, including movies, music, news, books, research articles, search queries, social tags, and products in general.

Two different Approaches

Recommendation systems generally produce recommendations in one of two ways: collaborative filtering and content-based filtering. Each approach has its own advantages and disadvantages, but first let’s understand how each one works.

Collaborative Filtering

Collaborative filtering approaches build a model from a user’s past behavior (items previously purchased or selected and/or numerical ratings given to those items) as well as similar decisions made by other users. This model is then used to predict items (or ratings for items) that the user may have an interest in.
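To make the idea concrete, here is a minimal user-based collaborative filtering sketch in NumPy. The rating matrix is made up, and cosine similarity combined with a similarity-weighted average is one common choice among several, not the method any particular service uses. It predicts the target user’s ratings for unrated items from the ratings of users with similar histories.

```python
import numpy as np

# Toy user-item rating matrix (rows: users, columns: items); 0 means "not rated".
# The values are made up for illustration only.
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [5, 4, 0, 0],   # target user: has not rated items 2 and 3
], dtype=float)


def cosine_similarity_rows(M):
    """Cosine similarity between every pair of rows in M."""
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    norms[norms == 0] = 1.0           # avoid division by zero for empty rows
    unit = M / norms
    return unit @ unit.T


def predict_user_based(R, user):
    """Predict the target user's rating of each item as a
    similarity-weighted average of other users' ratings."""
    sims = cosine_similarity_rows(R)[user]
    sims[user] = 0.0                  # ignore the user's similarity to themselves
    rated = (R != 0).astype(float)    # mask of observed ratings
    weighted_sum = sims @ R           # sum over users v of sim(u, v) * r_vi
    norm = sims @ rated               # total similarity of users who rated item i
    with np.errstate(invalid="ignore", divide="ignore"):
        preds = np.where(norm > 0, weighted_sum / norm, 0.0)
    return preds


preds = predict_user_based(R, user=3)
unrated = np.where(R[3] == 0)[0]
best = unrated[np.argmax(preds[unrated])]
print("predicted ratings:", np.round(preds, 2))
print("recommend item", best)
```

Users 0 and 1 rate items much like user 3 does, so their ratings dominate the prediction, while user 2’s very different taste carries little weight.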

For example, take last.fm, a popular music recommender. It creates a station of recommended songs by observing which artists and bands the user has listened to on a day-to-day basis and comparing that with the listening behavior of other users. Because this approach relies on the behavior of other users, it is a classic example of collaborative filtering.

However, each system has its own strengths and weaknesses. In this example, last.fm needs a large amount of information about a user before it can make accurate recommendations. This is called the cold start problem, and it is common in collaborative filtering.
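A small sketch, on made-up data, of why cold start hurts collaborative filtering: a brand-new user has an all-zero history, so their similarity to every existing user is zero and there is nothing to learn from. Falling back to the most popular items, as shown here, is one simple and common workaround; it is an illustrative choice, not a description of how last.fm handles it.

```python
import numpy as np

# Toy implicit-feedback matrix: 1 = user interacted with item, 0 = no interaction.
interactions = np.array([
    [1, 1, 0, 0, 1],
    [0, 1, 1, 0, 1],
    [1, 1, 0, 1, 0],
], dtype=float)

new_user = np.zeros(5)  # a brand-new user with no history


def cosine_to_users(vec, M):
    """Cosine similarity between one user vector and every row of M (0 if undefined)."""
    norms = np.linalg.norm(M, axis=1) * np.linalg.norm(vec)
    dots = M @ vec
    return np.divide(dots, norms, out=np.zeros_like(dots), where=norms > 0)


sims = cosine_to_users(new_user, interactions)
print("similarities:", sims)  # all zero: nothing to compare against

if not sims.any():
    # Fallback: recommend the most popular items overall.
    popularity = interactions.sum(axis=0)
    top_k = np.argsort(-popularity, kind="stable")[:2]
    print("fallback recommendations:", top_k)
```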

Content-Based Filtering

Content-based filtering approaches utilize a series of discrete characteristics of an item in order to recommend additional items with similar properties. To do this, keywords are used to describe the items and a user profile is built to indicate the type of item this user likes. In simple words, these algorithms try to recommend items that are similar to those that a user liked in the past.
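Here is a minimal content-based sketch, assuming each item is described by a hypothetical set of keywords. The user profile is the average keyword vector of the items the user liked, and unseen items are ranked by cosine similarity to that profile. Averaging is one simple way to build a profile; real systems often use weighted schemes such as TF-IDF.

```python
import numpy as np

# Hypothetical keyword vocabulary and item descriptions (made up for illustration).
vocab = ["rock", "jazz", "acoustic", "electronic", "vocals"]
items = {
    "song_a": {"rock", "vocals"},
    "song_b": {"rock", "electronic"},
    "song_c": {"jazz", "acoustic"},
    "song_d": {"jazz", "acoustic", "vocals"},
    "song_e": {"rock", "acoustic", "vocals"},
}
names = list(items)

# Binary item-feature matrix: one row per item, one column per keyword.
X = np.array([[1.0 if w in items[n] else 0.0 for w in vocab] for n in names])

liked = ["song_a", "song_b"]  # items the user liked in the past
profile = X[[names.index(n) for n in liked]].mean(axis=0)  # user profile

# Cosine similarity between the profile and every item.
scores = X @ profile / (np.linalg.norm(X, axis=1) * np.linalg.norm(profile))

# Rank the items the user has not seen yet.
candidates = [n for n in names if n not in liked]
ranked = sorted(candidates, key=lambda n: scores[names.index(n)], reverse=True)
print(ranked)
```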

Pandora, another music recommendation platform, is an example. It uses only the characteristics of a song or artist to create a station that plays music with similar properties. User feedback is also used to refine the results: when a user dislikes a particular song, the system places less emphasis on that song’s attributes, and when a user likes a song, it places more emphasis on that song’s attributes and plays more songs like it in the future.
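The feedback loop described above can be sketched as per-attribute weights that move up on a like and down on a dislike. The attribute names, song vectors, and step size are hypothetical, and this is not Pandora’s actual algorithm, which the post does not describe.

```python
import numpy as np

attributes = ["distorted_guitar", "female_vocals", "fast_tempo", "piano"]
# Hypothetical attribute vectors for a few songs (1 = song has the attribute).
songs = {
    "s1": np.array([1, 0, 1, 0], dtype=float),
    "s2": np.array([0, 1, 0, 1], dtype=float),
    "s3": np.array([1, 1, 0, 0], dtype=float),
    "s4": np.array([0, 0, 1, 1], dtype=float),
}

weights = np.ones(len(attributes))  # start with equal emphasis on every attribute
LEARNING_RATE = 0.5                 # illustrative step size, not a tuned value


def feedback(weights, song_vec, liked):
    """Raise the weights of a liked song's attributes, lower them for a disliked one."""
    step = LEARNING_RATE if liked else -LEARNING_RATE
    return np.maximum(weights + step * song_vec, 0.0)  # keep weights non-negative


def rank(weights):
    """Order songs by the weighted sum of their attributes (ties keep insertion order)."""
    return sorted(songs, key=lambda s: weights @ songs[s], reverse=True)


print("before:", rank(weights))
weights = feedback(weights, songs["s1"], liked=False)  # thumbs down on s1
weights = feedback(weights, songs["s2"], liked=True)   # thumbs up on s2
print("weights:", dict(zip(attributes, weights)))
print("after:", rank(weights))
```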

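However, just like collaborative filtering, content-based filtering has limitations. Although it needs very little information to start with, its scope is limited: it can only recommend items similar to those the user already liked, so it rarely surfaces anything outside the user’s existing tastes.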

A new approach: Hybrid Recommendation Systems

Recent research has shown that a hybrid approach, combining collaborative and content-based filtering, can be more effective in some cases. There are several ways to build one:

  • Making both predictions separately and then combining them
  • Adding content-based to collaborative approach
  • Adding collaborative to content-based approach
  • Unifying approaches into one model
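The first strategy in the list, making both predictions separately and then combining them, can be sketched as a weighted blend. The scores are made up, min-max scaling is one assumed way of putting the two recommenders on the same scale, and the mixing weight alpha is a hypothetical parameter you would tune on held-out data.

```python
import numpy as np


def min_max(x):
    """Rescale scores to [0, 1] so the two recommenders are comparable."""
    span = x.max() - x.min()
    return np.zeros_like(x) if span == 0 else (x - x.min()) / span


def weighted_hybrid(cf_scores, cb_scores, alpha=0.5):
    """Blend collaborative (cf) and content-based (cb) scores for the same items.

    alpha is a hypothetical mixing weight: 1.0 = collaborative only,
    0.0 = content-based only.
    """
    return alpha * min_max(cf_scores) + (1 - alpha) * min_max(cb_scores)


# Made-up scores for four items from the two separate recommenders.
cf = np.array([4.2, 3.1, 1.0, 2.5])
cb = np.array([0.10, 0.90, 0.80, 0.20])

for alpha in (1.0, 0.5, 0.0):
    blended = weighted_hybrid(cf, cb, alpha)
    print(alpha, np.argsort(-blended))  # item indices, best first
```

With alpha=1.0 the ranking is purely collaborative, with alpha=0.0 it is purely content-based, and values in between trade the two off.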

Netflix is a well-known example of a hybrid recommendation system. It makes recommendations by comparing the watching and searching habits of similar users, which is collaborative filtering. It also recommends movies that share characteristics with films the user has rated highly, which is content-based filtering.

Recommendation system using python

Now that you have a basic idea of what a recommendation system is and how it works, the next step is to build one in Python. Let’s create our own basic movie recommender.

First, we import the dependencies. NumPy will help us with the math, while LightFM is a Python library that implements a number of popular recommendation algorithms. LightFM is a large library, so we import only the pieces we need: fetch_movielens will download the data for us, and the LightFM class will later create our model.

import numpy as np                              # numerical helpers (arange, argsort)
from lightfm.datasets import fetch_movielens    # downloads the MovieLens 100K data
from lightfm import LightFM                     # the recommendation model


Now we fetch the data from the MovieLens dataset, which contains 100K movie ratings from about 1K users. We pass the optional parameter min_rating=4.0, which keeps only ratings of 4.0 or higher, so the model learns from movies users actually liked. The method builds interaction matrices (users by movies) from these ratings and stores them in our data variable, which is a dictionary. A dictionary maps keys to values; here the keys are strings. fetch_movielens already splits the data into a training set and a test set, which we retrieve with the keys 'train' and 'test'; the movie titles are available under 'item_labels'.

data = fetch_movielens(min_rating=4.0)

print(repr(data['train']))

print(repr(data['test']))
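To see what an interaction matrix looks like without downloading MovieLens, here is a small sketch that builds one from made-up (user, item, rating) triples with SciPy’s sparse COO format, keeping only ratings of 4.0 or higher in the spirit of min_rating=4.0. It illustrates the idea rather than reproducing LightFM’s own loader. The last line uses the same .tocsr()[user].indices pattern as the recommendation function in this post to list one user’s positive items.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Made-up (user, item, rating) triples in the spirit of MovieLens.
users = np.array([0, 0, 1, 1, 2, 2, 2])
items = np.array([0, 2, 1, 2, 0, 1, 3])
ratings = np.array([5.0, 3.0, 4.0, 5.0, 2.0, 4.5, 4.0])

MIN_RATING = 4.0
keep = ratings >= MIN_RATING  # same idea as min_rating=4.0

# Sparse users x items matrix holding only the kept ratings.
interactions = coo_matrix(
    (ratings[keep], (users[keep], items[keep])),
    shape=(3, 4),
)

data = {"train": interactions}  # string keys map to the stored objects
print(repr(data["train"]))
print(data["train"].toarray())
# Column indices (items) that user 2 rated 4.0 or higher:
print(data["train"].tocsr()[2].indices)
```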

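Next, we create the model with the loss parameter set to 'warp', which stands for Weighted Approximate-Rank Pairwise. WARP is a ranking loss: during training, LightFM uses stochastic gradient descent to learn weights that push movies a user liked above movies the user did not interact with, which improves the top of the recommendation list. LightFM itself is a hybrid model: it can combine user and item features (content) with the interaction data (collaborative).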
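To give a feel for the “approximate rank” part of WARP, here is a NumPy sketch of one sampling step for a single user, following the original WARP formulation: draw negative items until one scores within a margin of 1 of a positive item, estimate the positive item’s rank from how many draws that took, and turn the rank into a weight. The function name and arguments are hypothetical, the scores are made up, and LightFM’s Cython implementation differs in details, so treat this as an illustration only.

```python
import numpy as np


def warp_weight(scores, pos_item, positives, rng, max_sampled=10):
    """Sketch of one WARP sampling step for a single user (hypothetical helper).

    scores: model scores for every item for this user.
    pos_item: index of an item the user interacted with.
    positives: set of all items the user interacted with.
    Returns (violating_item, weight), or (None, 0.0) if no violator is found
    within max_sampled draws.
    """
    n_items = len(scores)
    negatives = np.array([i for i in range(n_items) if i not in positives])
    for trials in range(1, max_sampled + 1):
        neg = rng.choice(negatives)
        # A negative "violates" the ranking if it scores within a margin of 1
        # of the positive item, or above it.
        if scores[neg] > scores[pos_item] - 1.0:
            # Few draws needed => many violators => positive ranked low => big weight.
            est_rank = (n_items - 1) // trials
            weight = sum(1.0 / k for k in range(1, est_rank + 1))
            return int(neg), weight
    return None, 0.0


rng = np.random.default_rng(0)
# Item 0 is the positive; every negative scores higher, so the first draw violates.
bad = np.array([0.0, 2.0, 2.5, 3.0, 1.5])
print(warp_weight(bad, 0, {0}, rng))
# Now the positive is far ahead of every negative: no violation, no update.
good = np.array([10.0, 0.0, 0.5, 1.0, 0.2])
print(warp_weight(good, 0, {0}, rng))
```

If a violator turns up on the first draw, the positive item is probably ranked near the bottom, so the update it triggers gets a large weight; if none turns up, the item is already ranked well and no update is made.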

# Create the model with the WARP ranking loss
model = LightFM(loss='warp')

# Train the model for 30 epochs using 2 threads
model.fit(data['train'], epochs=30, num_threads=2)
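The test split fetched earlier is never used above. LightFM provides lightfm.evaluation.precision_at_k for this, which returns a per-user precision array. The NumPy sketch below, with a hypothetical function of the same name and made-up data, shows what the metric computes: the fraction of each user’s top-k recommendations that appear in their held-out test positives, averaged over users, with training positives excluded from the ranking.

```python
import numpy as np


def precision_at_k(scores, test, train=None, k=3):
    """Mean precision@k over users that have at least one test positive.

    scores: (n_users, n_items) predicted scores.
    test, train: (n_users, n_items) 0/1 arrays of held-out / training positives.
    Training positives are pushed to the bottom so they are not counted.
    """
    scores = np.array(scores, dtype=float)  # work on a float copy
    if train is not None:
        scores[train > 0] = -np.inf
    top_k = np.argsort(-scores, axis=1)[:, :k]
    hits = np.take_along_axis(test, top_k, axis=1).sum(axis=1)
    has_test = test.sum(axis=1) > 0
    return (hits[has_test] / k).mean()


# Made-up scores and interactions for 2 users and 5 items.
scores = np.array([[0.9, 0.1, 0.8, 0.3, 0.7],
                   [0.2, 0.6, 0.1, 0.9, 0.4]])
train_pos = np.array([[1, 0, 0, 0, 0],
                      [0, 0, 0, 1, 0]])
test_pos = np.array([[0, 0, 1, 0, 0],
                     [0, 1, 0, 0, 1]])
print(precision_at_k(scores, test_pos, train_pos, k=2))
```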

The function below is our recommendation function. For each user, it scores every movie with the trained model and uses NumPy’s argsort to order the movies from highest to lowest score, so we can see which movies are the top recommendations. For comparison, it also prints a few movies the user already liked.

def sample_reco(model, data, user_ids):
    # number of users and movies in the training data
    n_users, n_items = data['train'].shape

    for user_id in user_ids:
        # movies the user already rated 4.0 or higher in the training set
        known_positive = data['item_labels'][data['train'].tocsr()[user_id].indices]

        # the model's score for every movie, for this user
        scores = model.predict(user_id, np.arange(n_items))

        # movie titles ranked from highest to lowest predicted score
        top_items = data['item_labels'][np.argsort(-scores)]

        # print the results
        print("\nUser %s" % user_id)
        print("Known positives:")
        for x in known_positive[:3]:
            print("    %s" % x)

        print("Recommended:")
        for x in top_items[:3]:
            print("    %s" % x)


sample_reco(model, data, [1, 2, 3])
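Because the function ranks every movie, the “Recommended” list can include movies the user already likes. Here is a small NumPy sketch of how you might filter those out before taking the top N, using made-up titles and scores; in sample_reco, the known item ids would be data['train'].tocsr()[user_id].indices.

```python
import numpy as np

# Made-up titles, predicted scores, and one user's known positives.
item_labels = np.array(["Movie A", "Movie B", "Movie C", "Movie D", "Movie E"])
scores = np.array([0.2, 1.5, -0.3, 0.9, 1.1])
known_positive_ids = np.array([1])        # the user already liked "Movie B"

ranked_ids = np.argsort(-scores)          # highest score first
print(item_labels[ranked_ids])            # still includes the already-liked movie

# Drop items the user already interacted with before taking the top N.
unseen = ranked_ids[~np.isin(ranked_ids, known_positive_ids)]
print(item_labels[unseen[:3]])
```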

End notes

In a nutshell, recommendation systems help us make decisions by learning our preferences. With approaches like hybrid recommendation, they keep getting better every day. You now have everything you need to get started with recommendation systems, so grab some data and build one yourself.

Happy learning.

Guest Blogger (Data Science) at Digital Vidya. A data enthusiast who loves reading about and diving deeper into machine learning and data science. Always eager to learn about new research and new ways to solve problems using ML and AI.
