

Python for Data Science Cheat Sheet – A Beginner’s Guide


I am pretty sure that whenever you have tried to teach yourself Data Science, you have felt completely exhausted. There is nothing wrong with that: Data Science is an ever-growing field, with thousands of packages and hundreds of programming functions out there!

But don’t you worry, data science enthusiast. I have got a plan for you 😉.

I am going to help you get started with Data Science using Python in this beginner’s guide. Python is an awesome, powerful, high-level, object-oriented programming language preferred by data scientists. In this cheat sheet, we’ll summarize some of the most common and useful functionality.

Python for Data Science Cheat Sheet

1.) Getting Started With Python

Why Do Data Scientists Love Coding in Python? 

Many of you must be wondering why most data scientists love coding in Python. Snake lovers, maybe? 😀

Okay, before you start judging my sense of humour, let’s go through the actual reasons that justify their love for Python:

  • Large active online community.
  • Open-Source.
  • A larger variety of data analytics and data science libraries than most other languages.
  • Python is easy to learn.
  • IPython Notebook (now Jupyter Notebook): an awesome interactive computational environment. You can run code in separate cells, see the results appear underneath each cell, and document your work alongside the code.

How to Install Python?

For a hassle-free installation, I recommend downloading Anaconda (supports macOS, Windows and Linux), which comes with the most commonly used Python libraries pre-installed. After downloading and installing it, open Anaconda Navigator, where you will see several IDEs. I personally prefer Jupyter Notebook (IPython Notebook), and you already know the reasons.

We will be using Jupyter Notebook throughout this tutorial.

Tips for Getting Along with Jupyter:

  • You can also start Jupyter Notebook by typing “jupyter notebook” in your terminal or command line, depending on the OS you are working on.
  • In Jupyter Notebook, In [n] marks an input cell and Out[n] marks its output, where n is the execution count.
  • To execute a cell, press “Shift + Enter”.

2.) Python Basic Cheatsheet

This basic Python cheat sheet will guide you through variables, data types, comparison operators, range and other things you should know before proceeding to the next steps.

Basics and Getting Help

name_of_variable = 2 | Variable names cannot start with a number or contain special characters (underscores are allowed)

x = 2 | Assigns the number 2 to a variable named x

print(x) | Prints the value of x

type(x) | Returns the data type of variable x (int, i.e. integer, in this case)

help(str) | Shows documentation for the str data type

help(print) | Shows documentation for the print() function

Mathematical Operation
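As a minimal sketch of the arithmetic operators you will use most often (the variable names and values here are my own, chosen only for illustration):

```python
# Basic arithmetic operators in Python 3 (illustrative values)
a, b = 7, 2

total = a + b        # addition
difference = a - b   # subtraction
product = a * b      # multiplication
quotient = a / b     # true division, always returns a float
floor_div = a // b   # floor division, drops the fractional part
remainder = a % b    # modulo, the remainder after division
power = a ** b       # exponentiation

print(total, difference, product, quotient, floor_div, remainder, power)
```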


A string is usually a bit of text you want to display to someone, or “export” out of the program you are writing. Python knows you want something to be a string when you put either " (double quotes) or ' (single quotes) around the text.
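Here is a small sketch of creating and working with strings. The text values are made up for illustration:

```python
# Strings can use double or single quotes
greeting = "Hello"
name = 'data science'

# Concatenate strings with +
message = greeting + ", " + name + "!"
print(message)

print(len(message))      # number of characters
print(message.upper())   # upper-case copy; the original is unchanged
print(message[0:5])      # slicing: characters at positions 0 to 4

# f-strings embed expressions directly in the text
formatted = f"{greeting}, {name.title()}!"
print(formatted)
```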


The list is one of the most versatile data types in Python. It is written as comma-separated values (items) between square brackets.

An important thing about a list is that its items need not be of the same type.
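A quick sketch of a list holding mixed types, with the usual indexing, slicing and updating (the values are illustrative only):

```python
# A list can mix types: int, str, float, bool
items = [1, "two", 3.0, True]

items.append("five")   # add an item at the end

first = items[0]       # indexing starts at 0
last = items[-1]       # negative indices count from the end
middle = items[1:3]    # slice: items at positions 1 and 2

items[1] = 2           # lists are mutable, so items can be replaced

print(items, first, last, middle)
```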


A dictionary stores key-value pairs. Each key is separated from its value by a colon (:), the items are separated by commas, and the whole thing is enclosed in curly braces. An empty dictionary without any items is written with just two curly braces, like this: {}.

Keys are unique within a dictionary, while values need not be. The values of a dictionary can be of any type, but the keys must be of an immutable data type such as strings, numbers, or tuples.
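The rules above can be sketched as follows. The names and values are made up for illustration:

```python
# Keys and values are separated by colons, items by commas
person = {"name": "Asha", "age": 30}
empty = {}                      # an empty dictionary

person["city"] = "Pune"         # add a new key
person["age"] = 31              # keys are unique, so this overwrites the old value

age = person["age"]                          # look up a value by key
country = person.get("country", "unknown")   # default for a missing key

# Immutable types such as tuples can be used as keys
point_labels = {(0, 0): "origin"}

for key, value in person.items():
    print(key, value)
```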


A tuple is an immutable sequence of Python objects. Tuples are sequences, just like lists. The differences are that tuples cannot be changed after creation, unlike lists, and that tuples use parentheses, whereas lists use square brackets. Creating a tuple is as simple as writing comma-separated values, optionally inside parentheses.
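A short sketch of creating tuples and of what happens when you try to change one (values are illustrative):

```python
point = (3, 4)         # a tuple with two items
single = (5,)          # a one-item tuple needs a trailing comma
no_parens = 1, 2, 3    # parentheses are optional

x, y = point           # unpack a tuple into variables

# Tuples are immutable: assigning to an item raises TypeError
try:
    point[0] = 10
except TypeError as err:
    error_message = str(err)
    print("Cannot modify a tuple:", error_message)
```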

Boolean Comparisons
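A minimal sketch of the comparison and logical operators, using made-up values. Each expression evaluates to either True or False:

```python
x, y = 5, 10

is_equal = x == y          # equal to
is_not_equal = x != y      # not equal to
is_less = x < y            # less than
is_greater_eq = x >= y     # greater than or equal to

both = (x < y) and (y < 20)     # True only if both conditions hold
either = (x > y) or (y == 10)   # True if at least one condition holds
negated = not (x == y)          # flips True/False
in_range = 1 < x < 10           # comparisons can be chained

print(is_equal, is_not_equal, is_less, is_greater_eq)
print(both, either, negated, in_range)
```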

3.) Data Importing, Munging & Exploratory Data Analysis

Pandas, NumPy and Scikit-Learn are among the most popular libraries for data science and analysis with Python. NumPy is used for lower-level scientific computation. Pandas is built on top of NumPy and designed for practical data analysis in Python. Scikit-Learn comes with many machine learning models, with different algorithms for different types of learning and problem statements. Machine learning is a vast, intermediate-level topic, and it is difficult to cover every algorithm in a single blog post. Hence I will point you to a template for it, and you can explore the rest yourself in the Scikit-Learn documentation.

Importing Data

The first step, before any data analysis or prediction/classification, is importing your data into Jupyter Notebook. Let’s now import the data.
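As a rough sketch of importing data with Pandas: the example below reads CSV text from an in-memory buffer so that it runs without any file on disk. The column names and rows are made up. With a real file you would pass its path to pd.read_csv instead.

```python
import io

import pandas as pd

# Made-up CSV content standing in for a real file
csv_text = """name,department,salary
Asha,Sales,50000
Ravi,IT,65000
Meera,IT,72000
"""

# pd.read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))
print(df)

# Other readers follow the same pattern, for example
# pd.read_excel(...) for spreadsheets and pd.read_json(...) for JSON
```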

Data Exploration

Once you have imported data into a Pandas DataFrame, your next step is to understand the nature of the data and make sense of it. This process is called EDA (Exploratory Data Analysis). Below are different methods for making sense of data.
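A minimal sketch of common first-look methods on a DataFrame. The small dataset is invented for illustration, with one missing salary so the missing-value check has something to find:

```python
import pandas as pd

# Made-up data with one missing value
df = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meera", "Kiran"],
    "department": ["Sales", "IT", "IT", "HR"],
    "salary": [50000, 65000, 72000, None],
})

print(df.head(3))    # first 3 rows
print(df.shape)      # (rows, columns)
print(df.dtypes)     # data type of each column
df.info()            # column types and non-null counts

salary_stats = df["salary"].describe()          # count, mean, std, min, quartiles, max
dept_counts = df["department"].value_counts()   # frequency of each category
missing = df.isnull().sum()                     # missing values per column

print(salary_stats)
print(dept_counts)
print(missing)
```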

Data Munging

Data munging (sometimes referred to as data wrangling) is the process of transforming and mapping data from one “raw” form into another format, with the intent of making it more appropriate and valuable for analysis.

Data scientists are often said to spend about 80% of their time on data munging and wrangling, moving data and transforming it from one format to another. Even if you’re fortunate enough to know where to find the data, it is almost never in the nicely organized format you need for your analysis.

a) Merging, Joining and Concatenating

There are 3 main ways of combining DataFrames: merging, joining and concatenating. Let’s go through the 3 methods with examples.
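The three approaches can be sketched like this. The employee tables are made up for illustration:

```python
import pandas as pd

employees = pd.DataFrame({"emp_id": [1, 2, 3],
                          "name": ["Asha", "Ravi", "Meera"]})
salaries = pd.DataFrame({"emp_id": [1, 2, 4],
                         "salary": [50000, 65000, 58000]})

# Merging: combine on a shared column, like a SQL join.
# An inner merge keeps only emp_id values present in both tables.
merged = pd.merge(employees, salaries, on="emp_id", how="inner")

# Joining: combine on the index.
# A left join keeps every row of the left table.
joined = employees.set_index("emp_id").join(
    salaries.set_index("emp_id"), how="left"
)

# Concatenating: stack DataFrames on top of each other
new_hires = pd.DataFrame({"emp_id": [5], "name": ["Kiran"]})
stacked = pd.concat([employees, new_hires], ignore_index=True)

print(merged)
print(joined)
print(stacked)
```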

b) Filter, Sort and Groupby

You can filter, sort and group data as needed for your analysis.
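A short sketch of all three operations on a made-up DataFrame:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meera", "Kiran"],
    "department": ["Sales", "IT", "IT", "Sales"],
    "salary": [50000, 65000, 72000, 48000],
})

# Filter: keep rows matching a condition
high_paid = df[df["salary"] > 60000]

# Combine conditions with & (and) or | (or), each wrapped in parentheses
it_high = df[(df["department"] == "IT") & (df["salary"] > 70000)]

# Sort: order rows by a column, highest salary first
sorted_df = df.sort_values("salary", ascending=False)

# Group by: aggregate within each group
avg_by_dept = df.groupby("department")["salary"].mean()

print(high_paid)
print(it_high)
print(sorted_df)
print(avg_by_dept)
```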

Exporting Data

Once your analysis has produced results, the following are the steps to export your data.
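As a minimal sketch of exporting a DataFrame: the file names are hypothetical, and the example writes into a temporary folder so it does not clutter your working directory. In practice you would pass your own path.

```python
import os
import tempfile

import pandas as pd

# Made-up results to export
df = pd.DataFrame({"name": ["Asha", "Ravi"], "salary": [50000, 65000]})

out_dir = tempfile.mkdtemp()   # temporary folder for this example
csv_path = os.path.join(out_dir, "results.csv")
json_path = os.path.join(out_dir, "results.json")

df.to_csv(csv_path, index=False)          # index=False skips the row index column
df.to_json(json_path, orient="records")   # a list of {column: value} records

# Read the CSV back to check that the export worked
round_trip = pd.read_csv(csv_path)
print(round_trip)
```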

4.) Machine Learning


The Scikit-Learn library contains useful methods for training and applying machine learning models. Using sklearn, you can do preprocessing and dimensionality reduction, which are often initial steps in creating machine learning models, and you can also solve classification, regression and clustering problems.
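A rough sketch of the two initial steps mentioned above, using StandardScaler for preprocessing and PCA for dimensionality reduction. The data is random and exists only to make the example runnable. These two classes are my choice of illustration, not the only options sklearn offers.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Random data: 20 samples, 4 features on very different scales
rng = np.random.RandomState(0)
X = rng.rand(20, 4) * np.array([1, 10, 100, 1000])

# Preprocessing: rescale each feature to mean 0 and standard deviation 1
X_scaled = StandardScaler().fit_transform(X)

# Dimensionality reduction: project the 4 features onto 2 components
X_reduced = PCA(n_components=2).fit_transform(X_scaled)

print(X_scaled.mean(axis=0).round(2))
print(X_reduced.shape)
```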


Types of ML

ML tasks can be classified into several categories. The main ones are:

  • Supervised ML
  • Unsupervised ML

Supervised ML relies on data where the true label/class (target variable) is known. Imagine that we want to teach our model to differentiate pictures of cats and dogs. We need to feed our model labelled pictures of cats and dogs. Because we know the true labels of the pictures, we can use them to supervise our algorithm in learning the right way to classify images. Once our model has learned how to classify images, we can use it on new data and predict labels (‘cat’ or ‘dog’ in this case) for unseen images.
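Here is a minimal sketch of the supervised workflow. Instead of cat and dog pictures, it assumes the small labelled iris flower dataset that ships with sklearn, and a k-nearest-neighbours classifier picked only for illustration. The idea is the same: learn from labelled examples, then predict labels for unseen ones.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Labelled data: X holds the features, y holds the true classes
X, y = load_iris(return_X_y=True)

# Hold back 25% of the data to act as "unseen" examples
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Learn from the labelled training data
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)

# Predict labels for the unseen data and compare with the true labels
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy on unseen data: {accuracy:.2f}")
```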

Unsupervised ML means that we don’t have labelled responses as we had in supervised learning. We just feed data to our model without any target/response variable. Suppose you don’t have labelled images of cats and dogs, but you still want to split the data into 2 categories. In such cases, you can employ unsupervised ML (here, a technique called clustering) to separate your images into two groups based on some inherent features of the pictures.

To learn more about Supervised Learning, Unsupervised Learning, Dataset Transformation and Model Evaluation, see the Scikit-Learn user guide.
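A minimal sketch of clustering without labels. It assumes synthetic 2-D points from make_blobs in place of images, and uses KMeans, one of several clustering algorithms in sklearn. The true groups are kept only to check the result afterwards and are never shown to the model.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic points around two well-separated centres.
# true_groups is used only for checking, not for training.
X, true_groups = make_blobs(
    n_samples=100, centers=[[-5, -5], [5, 5]], cluster_std=1.0, random_state=0
)

# Ask for 2 clusters; the model sees only X, never the labels
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# Cluster ids are arbitrary (0/1 may be swapped), so check both ways
agreement = max(np.mean(labels == true_groups), np.mean(labels != true_groups))
print(f"Agreement with the true groups: {agreement:.2f}")
```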

Machine Learning Template

You can get the template for machine learning models from SuperDataScience

Data Source

To practice on different types of datasets, visit the UCI Machine Learning Repository.


The sklearn tutorials are a good way to learn machine learning from scratch.

Often the hardest part of solving a machine learning problem is finding the right estimator/algorithm for the job. Different estimators are better suited to different types and sizes of data and to different problems. So if you are ever unsure which estimator/algorithm suits a particular data size and problem, the cheat sheet from the sklearn documentation gives a rough guide on which estimators to try on your data.


We’ve covered the basics of Python and the libraries you should know for Data Science, but there is a lot more we can do with Python and Data Science. I hope this article has given you enough to get started with data science. Always remember: all data enthusiasts must practice.


Sanjay is a Data Scientist at Camelport Logistics. He is a firm believer in ‘In God we trust, all others must bring data’. He strongly believes that MOOCs are a basic right.

He is a curious person, and his very nature led him to this field. Previously he did case studies of various startups such as Oyo, Snapdeal, Ola, Redbus, Lenskart and Shopclues, and even learned digital marketing during an entrepreneurship program at Upgrad. He loves narrating stories and disagrees with anyone who believes Elon Musk is not God.
