Python for Data Science Cheat Sheet – A Beginner’s Guide


If you have ever tried to teach yourself data science, you have probably felt completely overwhelmed at some point. There is nothing wrong with that: data science is an ever-growing field with thousands of packages and a huge number of functions to learn.

But don’t you worry, data science enthusiast. I have got a plan for you ;)

In this beginner’s guide, I am going to help you get started with data science using Python, a powerful, high-level, object-oriented programming language preferred by many data scientists. In this cheat sheet, we’ll summarize some of the most common and useful functionality.

Python for Data Science Cheat Sheet

1.) Getting Started With Python

Why Do Data Scientists Love Coding in Python? 

Many of you must be wondering why most data scientists love coding in Python. Snake lovers, maybe? 😀


Okay, before you start judging my sense of humour, let’s go through the actual reasons that justify their love for Python:

  • Large active online community.
  • Open-Source.
  • A larger variety of data analytics and data science libraries than most other languages.
  • Python is easy to learn.
  • IPython Notebook: an interactive computational environment in which you can run multiple lines of code in separate cells and see the results appear right underneath each cell. It also provides good features for documenting your work as you code.

How to Install Python?

For a hassle-free installation, I recommend downloading Anaconda (available for macOS, Windows and Linux), which comes with many of the libraries used in data science pre-installed. After downloading and installing it, open Anaconda Navigator, where you will see several IDEs. I personally prefer Jupyter Notebook (IPython Notebook), and you already know the reasons.

We will be using Jupyter Notebook throughout this tutorial.

Tips for Getting Along with Jupyter:

  • You can also start Jupyter Notebook by typing “jupyter notebook” in your terminal or command line, depending on the OS you are working on.
  • In Jupyter Notebook, In [ ] marks an input cell and Out[ ] marks its output. In [*] means the cell is still running.
  • To execute a cell, press “Shift + Enter”.

2.) Python Basic Cheatsheet

This Python basic cheat sheet will guide you through variables, data types, comparison operators, range and other things which you should know before proceeding to the next steps.

Basics and Getting Help

name_of_variable = 2 | Variable names cannot start with a number or contain special characters other than the underscore

x = 2 | Assigns the number 2 to a variable named x

print(x) | Prints the value of x

type(x) | Returns the data type of x (int, i.e. integer, in this case)

help(str) | Shows documentation for the str data type

help(print) | Shows documentation for the print() function

Mathematical Operation
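As a minimal sketch, the built-in arithmetic operators look like this. The variable names a and b are only illustrative.

```python
a = 7
b = 2

addition = a + b          # 9
subtraction = a - b       # 5
multiplication = a * b    # 14
division = a / b          # 3.5 (true division always returns a float)
floor_division = a // b   # 3 (rounds down to a whole number)
remainder = a % b         # 1 (modulo)
power = a ** b            # 49 (a raised to the power b)

print(addition, subtraction, multiplication, division,
      floor_division, remainder, power)
```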


A string is usually a bit of text you want to display to someone, or “export” out of the program you are writing. Python knows you want something to be a string when you put either double quotes (") or single quotes (') around the text.
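A short sketch of creating and working with strings follows. The values are made up for illustration.

```python
greeting = "Hello"          # double quotes
name = 'Data Scientist'     # single quotes work the same way

# Concatenate strings with +
message = greeting + ", " + name + "!"
print(message)

# Common operations
length = len(message)       # number of characters
shouted = message.upper()   # upper-case copy
first_word = message[0:5]   # slicing: characters 0 to 4

# f-strings embed values directly in the text
formatted = f"{greeting}, {name}!"
print(length, shouted, first_word, formatted)
```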


The list is one of the most versatile data types available in Python. It is written as comma-separated values (items) between square brackets.

An important thing about a list is that items in a list need not be of the same type.
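The points above can be sketched as follows. The list contents are arbitrary examples.

```python
# Items in a list do not have to share a type
items = [1, "two", 3.0, [4, 5]]

items.append("six")     # add an item to the end

first = items[0]        # indexing starts at 0
last = items[-1]        # negative indexes count from the end
middle = items[1:3]     # slicing returns a new list

items[0] = 100          # lists are mutable, so items can be replaced

print(items, first, last, middle)
```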


In a dictionary, each key is separated from its value by a colon (:), the items are separated by commas, and the whole thing is enclosed in curly braces. An empty dictionary without any items is written with just two curly braces, like this: {}.

Keys are unique within a dictionary, while values need not be. The values of a dictionary can be of any type, but the keys must be of an immutable data type such as strings, numbers, or tuples.
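Here is a small sketch of these rules in practice. The keys and values are invented for illustration.

```python
empty = {}                                # an empty dictionary

person = {"name": "Asha", "age": 30}      # key: value pairs
person["city"] = "Pune"                   # add a new key
person["age"] = 31                        # keys are unique, so this overwrites

# Tuples are immutable, so they can be used as keys
coords = {(0, 0): "origin"}

age = person["age"]                           # look up a value by key
country = person.get("country", "unknown")    # default if the key is missing
keys = list(person.keys())

print(person, age, country, keys)
```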


A tuple is an immutable sequence of Python objects. Tuples are sequences, just like lists. The differences are that tuples cannot be changed after creation, unlike lists, and that tuples are written with parentheses whereas lists use square brackets. Creating a tuple is as simple as writing comma-separated values, optionally enclosed in parentheses.
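A minimal sketch of creating tuples and of what immutability means in practice (the values are arbitrary):

```python
point = (3, 4)          # parentheses and commas
packed = 1, "a", 2.5    # the commas alone are enough
single = (5,)           # a one-item tuple needs a trailing comma

x, y = point            # unpacking into separate variables

# Trying to change a tuple raises a TypeError
immutable = False
try:
    point[0] = 10
except TypeError:
    immutable = True

print(point, packed, single, x, y, immutable)
```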

Boolean Comparisons
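As a quick sketch, comparison operators return True or False and can be combined with and, or and not. The values of x and y are only illustrative.

```python
x = 5
y = 10

is_equal = x == y                 # False
is_not_equal = x != y             # True
is_less = x < y                   # True
is_greater_or_equal = x >= y      # False

both = x < y and y < 20           # True: both conditions hold
either = x > y or y == 10         # True: the second condition holds
negated = not (x == y)            # True

print(is_equal, is_not_equal, is_less, is_greater_or_equal,
      both, either, negated)
```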

3.) Data Importing, Munging & Exploratory Data Analysis

Pandas, NumPy and Scikit-Learn are among the most popular libraries for data science and analysis with Python. NumPy is used for lower-level scientific computation. Pandas is built on top of NumPy and designed for practical data analysis in Python. Scikit-Learn comes with many machine learning models, with different algorithms for different types of learning and problem statements. Machine learning is a vast topic, and it is difficult to cover every algorithm in a single blog post, so I will point you to a template for it, and you can explore the rest yourself in the Scikit-Learn documentation.

Importing Data

The first step, before any data analysis or prediction/classification, is to import your data into Jupyter Notebook. Let’s now import the data.
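A minimal sketch of importing data with Pandas, assuming the data is in CSV form. To keep the example self-contained, the CSV text below is made up and read from memory; in practice you would pass a file path to pd.read_csv instead.

```python
import io

import pandas as pd

# Hypothetical CSV content, standing in for a file on disk
csv_text = """name,age,city
Asha,30,Pune
Ravi,25,Delhi
Meera,35,Mumbai
"""

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))
print(df)

# Pandas has similar readers for other formats,
# for example pd.read_excel and pd.read_json.
```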

Data Exploration

Once you have imported the data into a Pandas DataFrame, the next step is to understand the nature of the data and make sense of it. This process is called EDA (Exploratory Data Analysis). Pandas offers several methods for making sense of the data.
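Some commonly used exploration methods are sketched below on a small, made-up DataFrame. The column names and values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical data with one missing age
df = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meera", "John"],
    "age": [30, 25, 35, None],
    "city": ["Pune", "Delhi", "Mumbai", "Delhi"],
})

print(df.head(2))       # first rows
print(df.shape)         # (rows, columns)
print(df.dtypes)        # data type of each column
df.info()               # column types and non-null counts
print(df.describe())    # summary statistics for numeric columns

missing = df.isnull().sum()                # missing values per column
city_counts = df["city"].value_counts()    # frequency of each city
print(missing)
print(city_counts)
```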

Data Munging

Data Munging (sometimes referred to as data wrangling) is the process of transforming and mapping data from one “raw” data form into another format with the intent of making it more appropriate and valuable for analysis.

Data scientists are often said to spend around 80% of their time on data munging and wrangling, moving and transforming data from one format to another. Even if you’re fortunate enough to know where to find the data, it is almost never in the nicely organized format you need for your analysis.

a) Merging, Joining and Concatenating

There are three main ways of combining DataFrames: merging, joining and concatenating. Let’s go through the three methods.
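The three approaches can be sketched as follows. The DataFrames and their column names are hypothetical.

```python
import pandas as pd

customers = pd.DataFrame({"customer_id": [1, 2, 3],
                          "name": ["Asha", "Ravi", "Meera"]})
orders = pd.DataFrame({"customer_id": [1, 1, 3],
                       "amount": [250, 100, 400]})

# Merging: SQL-style combination on a shared column
merged = pd.merge(customers, orders, on="customer_id", how="inner")

# Joining: combination on the index
left = customers.set_index("customer_id")
right = pd.DataFrame({"city": ["Pune", "Delhi"]}, index=[1, 2])
joined = left.join(right, how="left")   # customer 3 gets NaN for city

# Concatenating: stacking DataFrames on top of each other
more_customers = pd.DataFrame({"customer_id": [4], "name": ["John"]})
stacked = pd.concat([customers, more_customers], ignore_index=True)

print(merged)
print(joined)
print(stacked)
```

As a rough rule of thumb, merge is the usual choice when the key is a column, join when the key is the index, and concat when the frames simply need to be stacked.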

b) Filter, Sort and Groupby

You can filter, sort and group your data as needed for your analysis.
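A brief sketch of each operation, using an invented sales table:

```python
import pandas as pd

# Hypothetical sales data
sales = pd.DataFrame({
    "region": ["North", "South", "North", "East", "South"],
    "units": [10, 4, 7, 12, 9],
})

# Filter: keep rows that satisfy a condition
big_sales = sales[sales["units"] > 8]

# Sort: order rows by a column, largest first
sorted_sales = sales.sort_values("units", ascending=False)

# Group by: aggregate values within each group
units_by_region = sales.groupby("region")["units"].sum()

print(big_sales)
print(sorted_sales)
print(units_by_region)
```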

Exporting Data

Once your analysis has produced results, the final step is to export your data.
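A minimal sketch of exporting a DataFrame to CSV. The file name results.csv and the temporary directory are assumptions made so the example can run anywhere; you would normally write to a path of your choice.

```python
import os
import tempfile

import pandas as pd

# Hypothetical results to export
results = pd.DataFrame({"region": ["North", "South"], "units": [17, 13]})

# Write to a temporary directory so the example leaves no files behind
out_dir = tempfile.mkdtemp()
csv_path = os.path.join(out_dir, "results.csv")

# index=False leaves the row numbers out of the file
results.to_csv(csv_path, index=False)

# Read the file back to confirm what was written
round_trip = pd.read_csv(csv_path)
print(round_trip)

# DataFrames also have to_excel and to_json for other formats.
```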

4.) Machine Learning


The Scikit-Learn library contains useful methods for training and applying machine learning models. With sklearn you can do preprocessing and dimensionality reduction, which are common initial steps in building machine learning models, and you can also solve classification, regression and clustering problems.
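As a rough sketch of the preprocessing and dimensionality reduction steps, the example below standardizes the iris dataset that ships with sklearn and reduces it to two components with PCA. The choice of dataset, scaler and number of components is an assumption for illustration, not a recommendation.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Built-in example dataset: 150 flowers, 4 numeric features
X, y = load_iris(return_X_y=True)

# Preprocessing: rescale each feature to zero mean and unit variance
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Dimensionality reduction: project the 4 features onto 2 components
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_scaled)

print(X.shape, X_reduced.shape)
print(pca.explained_variance_ratio_)   # share of variance per component
```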


Types of ML

ML tasks can be classified into several categories. The main ones are:

  • Supervised ML
  • Unsupervised ML

Supervised ML relies on data where the true label/class (target variable) is indicated. Imagine that we want to teach our model to differentiate pictures of cats and dogs. We need to feed our model with labelled pictures of cats and dogs. Because we know the true labels of the pictures, we can use them to supervise our algorithm in learning the right way to classify images. Once our model has learned how to classify images, we can use it on new data and predict labels (‘cat’ or ‘dog’ in this case) for unseen images.

Unsupervised ML means that we don’t have labelled responses as we had in supervised learning. We just feed data to our model without any target/response variables. Suppose you don’t have labelled images of cats and dogs, but you still want to cluster this data into 2 categories. In such cases, you can employ unsupervised ML (in this case a technique called clustering) to separate your images into two groups based on some inherent features of the pictures.
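The difference can be sketched with sklearn. Since image data is beyond the scope of a short example, the built-in iris dataset (three flower species) stands in for the cat and dog pictures, and the particular models used here (logistic regression and k-means) are assumptions chosen for brevity.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Supervised: the labels y are used during training
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
accuracy = accuracy_score(y_test, clf.predict(X_test))
print("Accuracy on unseen data:", accuracy)

# Unsupervised: only X is used, and the model groups similar rows
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X)
print("Cluster assigned to first 10 rows:", cluster_labels[:10])
```

Note that the cluster numbers are arbitrary group identifiers and do not correspond to the true species labels.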

To learn more about supervised learning, unsupervised learning, dataset transformation and model evaluation, see the Scikit-Learn user guide.

Machine Learning Template

You can get a template for machine learning models from SuperDataScience.

Data Source

To practice on different types of datasets, visit the UCI Machine Learning Repository.


The sklearn tutorials are a good way to learn machine learning from scratch.

Often the hardest part of solving a machine learning problem is finding the right estimator/algorithm for the job. Different estimators are better suited to different types and sizes of data and to different problems. If you are ever unsure which estimator suits a particular dataset size and problem, the cheat sheet in the sklearn documentation is designed to give users a rough guide on which estimators to try on their data.


We’ve covered the basics of Python and the libraries you should know for data science, but there is still a lot more you can do with Python. I hope this article has given you enough to get started with data science. Always remember: every data enthusiast must practice.



