8 Popular Types of Data Visualizations in Python

Introduction

Data Visualization turns data into images that nearly anyone can understand, making them invaluable for explaining the significance of digits to people who are more visually oriented.

~Jonsen Carmack

 

Raw numbers are not always meaningful to the people working with data. This is where Data Visualization comes in. It is a technique for encoding those numbers as images, which makes it much easier to gain meaningful insights. It is one of the essential steps in every Data Science process.

But do not worry if Data Visualization is a new term for you. We’ll talk about Data Visualization in Python throughout this blog.

If you are a beginner in Python, I recommend that you refer to this blog before proceeding further, in case you haven’t :).

Why Python for Data Visualization?

Though there are lots of tools available for Data Visualization, Python has a few excellent libraries that make visualization easy for any dataset, large or small. There are several courses available on the internet that focus solely on Data Visualization with Python, and especially with Matplotlib. Matplotlib is very useful for creating and presenting Python visualizations.

Popular Libraries For Data Visualization in Python:

Some of the most popular Libraries for Python Data Visualizations are:

  1. Matplotlib
  2. Seaborn
  3. Pandas
  4. Plotly
  • and many more

Further, we’ll create different types of Python visualizations using these libraries.

Types of Python Visualization:

Let us explore different techniques for Python visualization. We’ll use a Jupyter notebook with Python for writing all the code.

First, we’ll import the Python visualization libraries using the following code.
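
A minimal sketch of the imports, assuming the conventional aliases (np, pd, plt, sns); the exact set used in the original notebook is an assumption.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# In a Jupyter notebook, additionally run the magic command below in a cell
# so that figures render inline (it is not valid in a plain .py script):
# %matplotlib inline
```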

Import all necessary libraries

Remember, %matplotlib inline is only for Jupyter notebooks. If you are using another editor, call plt.show() at the end of all your plotting commands to have the figure pop up in another window.

Now, we’ll import the built-in iris dataset from the Seaborn library, which will be used to create various Python visualizations.
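
Loading the dataset could look like the sketch below. Note that sns.load_dataset downloads the file from seaborn's online example-data repository the first time it is called, so it needs an internet connection; the variable name iris is our choice.

```python
import seaborn as sns

# Fetch the iris example dataset as a pandas DataFrame
# (downloaded on first use, then cached locally).
iris = sns.load_dataset("iris")

print(iris.shape)   # rows, columns
print(iris.head())  # first five rows
```

The resulting DataFrame has four numeric columns (sepal_length, sepal_width, petal_length, petal_width) and one text column (species).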

Iris Dataset from Seaborn library

Now, we’ll use this dataset to create various Python visualizations.

1.) Scatterplot:

A scatterplot is used to find a relationship in bivariate data. It is most commonly used to find correlations between two continuous variables. Here, we’ll see a scatter plot for Petal Length and Petal Width using Matplotlib.
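
As a rough sketch, the scatter plot can be drawn as follows. To keep the snippet runnable on its own, it uses a handful of iris measurements typed in by hand (an illustrative subset, not the full dataset); in the notebook you would pass the petal_length and petal_width columns of the loaded DataFrame instead.

```python
import matplotlib.pyplot as plt

# Illustrative subset of iris measurements (hand-typed, not the full dataset).
petal_length = [1.4, 1.4, 1.3, 4.7, 4.5, 4.9, 6.0, 5.1, 5.9]
petal_width = [0.2, 0.2, 0.2, 1.4, 1.5, 1.5, 2.5, 1.9, 2.1]

plt.figure()
plt.scatter(petal_length, petal_width)    # one point per flower
plt.title("Petal Length vs Petal Width")  # title of the plot
plt.xlabel("Petal Length")                # x-axis label
plt.ylabel("Petal Width")                 # y-axis label
```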

 

Scatterplot using Matplotlib

We can notice that the relationship between the two variables is linear and positive.

We used plt.title to add a title to our plot, plt.xlabel to add a label for the x-axis, and similarly plt.ylabel to add a label for the y-axis. There are plenty of such options for adding to or modifying plots. You can refer to the Matplotlib documentation for a complete guide.

2.) Histogram:

A histogram shows the distribution of a continuous variable. It reveals the frequency distribution of a single variable in a univariate analysis.

Here we’ll plot a histogram for sepal width to check its frequency distribution.
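
A minimal sketch of the histogram call, again on a few hand-typed sepal widths (illustrative values only); the choice of bins=5 is an assumption made for this small sample.

```python
import matplotlib.pyplot as plt

# Illustrative subset of iris sepal widths (hand-typed).
sepal_width = [3.5, 3.0, 3.2, 3.2, 3.2, 3.1, 3.3, 2.7, 3.0]

plt.figure()
# bins=5 splits the range of values into five equal-width intervals;
# plt.hist returns the count per bin and the bin edges.
counts, bin_edges, _ = plt.hist(sepal_width, bins=5)
plt.title("Sepal Width")
plt.xlabel("Sepal Width")
plt.ylabel("Frequency")
```

With only a few values the shape is not informative; the bell-like shape appears when the full column is plotted.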

Histogram using Matplotlib

We observe that the distribution is approximately normal. The bins argument is used to divide the entire range of values into a series of intervals.

3.) Bar Chart:

A Bar Chart or Bar Plot is used to represent categorical data with vertical or horizontal bars. It is a general plot that allows you to aggregate the categorical data based on some function, by default the mean.

Here we’ll plot a Bar Chart for the three Species with Sepal Length using Seaborn.
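
One way to sketch this with Seaborn is shown below. The DataFrame iris_sample is a hypothetical stand-in built from a few hand-typed rows so the snippet runs by itself; in the notebook you would pass the full iris DataFrame.

```python
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the iris DataFrame (a few hand-typed rows).
iris_sample = pd.DataFrame({
    "species": ["setosa"] * 3 + ["versicolor"] * 3 + ["virginica"] * 3,
    "sepal_length": [5.1, 4.9, 4.7, 7.0, 6.4, 6.9, 6.3, 5.8, 7.1],
})

# barplot aggregates sepal_length per species; the default estimator is the mean.
ax = sns.barplot(x="species", y="sepal_length", data=iris_sample)
ax.set_title("Mean Sepal Length by Species")
```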

Bar Chart using Seaborn

We can notice that the y-axis shows the mean of Sepal Length for the three classes of Species, namely Setosa, Versicolor, and Virginica. Also, the three bars have different colors, which represent each of the species uniquely.

4.) Pie Chart:

A Pie Chart is a type of plot used to represent the proportion of each category in categorical data. The whole pie is divided into as many slices as there are categories.
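
A small sketch of such a pie chart, assuming the per-species counts are already known (the iris dataset has 50 samples of each species; in the notebook they could be computed with value_counts() on the species column). The explode offsets and the percentage format are our choices.

```python
import matplotlib.pyplot as plt

# Number of samples per species in the iris dataset.
species_counts = {"setosa": 50, "versicolor": 50, "virginica": 50}

plt.figure()
wedges, labels, percents = plt.pie(
    list(species_counts.values()),
    labels=list(species_counts.keys()),
    explode=(0.1, 0.1, 0.1),  # push every slice slightly away from the centre
    autopct="%1.1f%%",        # print each slice's share as a percentage
)
plt.title("Species")
```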

Pie Chart using Matplotlib

The three slices in the above chart represent the three categories of species. We have used explode to separate the three slices. As in the bar chart, the three slices have different colors, which represent each of the categories uniquely.

5.) Countplot:

Countplot is similar to a bar plot, except that we pass only the x-axis variable; the y-axis shows the number of occurrences. Each bar represents the count for one category of species.

Here, we’ll plot Countplot for three categories of species using Seaborn.
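
A minimal sketch, using a hypothetical nine-row iris_sample frame in place of the full dataset:

```python
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the iris DataFrame: three rows per species.
iris_sample = pd.DataFrame({
    "species": ["setosa"] * 3 + ["versicolor"] * 3 + ["virginica"] * 3,
})

# Only x is given; seaborn counts the rows in each category for the y-axis.
ax = sns.countplot(x="species", data=iris_sample)
```

On the full dataset every bar would reach 50, since iris contains 50 samples of each species.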

Countplot using SNS

We can observe that the three bars represent the count for the three categories of species.

6.) Boxplot:

Boxplot is used to show the distribution of a variable. The box plot is a standardized way of displaying the distribution of data based on the five-number summary: minimum, first quartile, median, third quartile, and maximum.

Here, we’ll plot a Boxplot for checking the distribution of Sepal Length.

Also, a box plot shows the distribution of quantitative data in a way that facilitates comparisons between variables or across levels of a categorical variable.

Here, we’ll plot Boxplot to compare the distribution of Sepal Length for each level of Species.

We can also plot a Boxplot for the entire dataset with Horizontal orientation.

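So, we can observe that all the plots summarize the distribution using its quartiles: the box spans the first to the third quartile, with a line at the median. The whiskers extend to the smallest and largest values that are not treated as outliers, while the dots outside the whiskers represent outliers.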

7.) Heatmap:

Heatmap is a type of matrix plot that allows you to plot data as color-encoded matrices. It is mostly used to find multicollinearity in a dataset.

To plot a heatmap, your data should already be in matrix form; the heatmap basically just colors it in for you.

Here, we’ll plot a heatmap to find the correlation between the variables of the iris dataset. First, we’ll create a correlation matrix for the iris dataset.

Correlation Matrix for iris dataset

Now, we’ll plot the heatmap for the above correlation matrix.
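
Both steps, building the correlation matrix and coloring it in, might look like the sketch below. It runs on a hypothetical hand-typed iris_sample frame, and the coolwarm colormap is an assumed choice. The text column is dropped before calling corr() because recent pandas versions raise an error when asked to correlate strings.

```python
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the iris DataFrame (a few hand-typed rows).
iris_sample = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7, 7.0, 6.4, 6.9, 6.3, 5.8, 7.1],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.2, 3.1, 3.3, 2.7, 3.0],
    "petal_length": [1.4, 1.4, 1.3, 4.7, 4.5, 4.9, 6.0, 5.1, 5.9],
    "petal_width": [0.2, 0.2, 0.2, 1.4, 1.5, 1.5, 2.5, 1.9, 2.1],
    "species": ["setosa"] * 3 + ["versicolor"] * 3 + ["virginica"] * 3,
})

# Step 1: pairwise Pearson correlations of the numeric columns (a 4 x 4 matrix).
corr = iris_sample.drop(columns="species").corr()

# Step 2: color-encode the matrix; annot=True writes each value into its cell.
ax = sns.heatmap(corr, cmap="coolwarm", annot=True)
```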

Here, we can observe that the correlation is shown as a color-coded matrix. A correlation value can range from -1 to 1. cmap is used to change the color coding, and annot is used to display the value of each correlation in the plot.

8.) Distplot:

The Distplot shows the distribution of univariate data.

Here, we’ll use Distplot to check distribution for Sepal Width.

So, we can observe that the distribution is approximately normal. Also, to remove the density curve (the KDE layer), we can use kde=False.

9.) Jointplot:

Jointplot is used to match up the distribution of one variable with the distribution of another. To be more specific, Jointplot allows you to basically match up two Distplots for bivariate data.

Here, we’ll plot a Jointplot for petal length and sepal length.

Grids, Style, and Color

Grids are general types of plots that allow you to map plot types to the rows and columns of a grid. This helps you create similar plots separated by features.

First, we’ll create a subplot grid for plotting pairwise relationships in a dataset using PairGrid. Then we’ll map the pairwise relationships to those grids.

PairGrid

Grids_map

Here, sns.PairGrid() will create a pairwise grid of variables in a dataset and the map function will map the relationship among variables to those grids.

Also, we can use map_upper, map_lower, and map_diag to map different types of relationships to the upper, lower, and diagonal pairs.

Now, we’ll briefly see how to control figure aesthetics in seaborn, including how to change the grid style or color.

There are five preset seaborn themes: darkgrid, whitegrid, dark, white, and ticks. darkgrid is the default for Seaborn. For all the plots above, we have used the whitegrid style. Set the defaults using sns.set().

 

Default Grid Style (darkgrid)

 

Also, we can change the grid style in seaborn using sns.set_style().
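
A short sketch comparing the default theme with whitegrid. The plotted numbers are hypothetical placeholders; a style only affects axes created after it is set, which is why a new figure is made after each call.

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Apply the seaborn defaults, i.e. the darkgrid theme.
sns.set()
fig_dark, ax_dark = plt.subplots()
ax_dark.plot([1, 2, 3], [2, 4, 3])  # placeholder data

# Switch to another preset; try "dark", "white" or "ticks" as well.
sns.set_style("whitegrid")
fig_white, ax_white = plt.subplots()
ax_white.plot([1, 2, 3], [2, 4, 3])  # placeholder data
```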

 

You should try the different grid options available in Seaborn and notice the changes in the grid style.

Similar to grids, we have control over spines, i.e. the borders of a plot, using Seaborn. We can remove spines from a graph as per our requirement.

So, sns.despine() will remove the borders from the top and right side of the figure. Further, we can also remove the borders from the left and bottom using the arguments left=True and bottom=True.
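
Both variants can be sketched on two placeholder plots (the data and the side-by-side layout are hypothetical):

```python
import matplotlib.pyplot as plt
import seaborn as sns

fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.plot([1, 2, 3], [2, 4, 3])  # placeholder data
ax2.plot([1, 2, 3], [2, 4, 3])  # placeholder data

# Default call: hides only the top and right borders.
sns.despine(ax=ax1)

# Additionally hide the left and bottom borders.
sns.despine(ax=ax2, left=True, bottom=True)
```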

We can use matplotlib’s plt.figure(figsize=(width, height)) to change the size of most seaborn plots. Also, we can control the size and aspect ratio of some seaborn plots by passing in parameters: size (renamed to height in seaborn 0.9) and aspect.

Now, let’s have a look at an example.
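
A minimal sketch, assuming a wide 12 x 3 inch figure and a countplot on a hypothetical hand-typed iris_sample frame; the plot type used in the original screenshot is not known.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the iris DataFrame: three rows per species.
iris_sample = pd.DataFrame({
    "species": ["setosa"] * 3 + ["versicolor"] * 3 + ["virginica"] * 3,
})

# Create the figure first: 12 inches wide, 3 inches high.
fig = plt.figure(figsize=(12, 3))

# Axes-level seaborn functions draw into the current figure.
ax = sns.countplot(x="species", data=iris_sample)
```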

Changing size of a plot

So, we can see that the width and height of the plot have changed according to the parameters passed. For some of the plots, we can also pass these parameters directly to the seaborn function.

For example:
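
The sketch below uses sns.lmplot as one figure-level function that accepts these parameters; the choice of lmplot, the values height=3 and aspect=2, and the hand-typed iris_sample frame are all assumptions. On seaborn versions before 0.9 the height parameter is called size.

```python
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the iris DataFrame (a few hand-typed rows).
iris_sample = pd.DataFrame({
    "petal_length": [1.4, 1.4, 1.3, 4.7, 4.5, 4.9, 6.0, 5.1, 5.9],
    "petal_width": [0.2, 0.2, 0.2, 1.4, 1.5, 1.5, 2.5, 1.9, 2.1],
})

# height is the facet height in inches; width = height * aspect.
g = sns.lmplot(x="petal_length", y="petal_width", data=iris_sample,
               height=3, aspect=2)
```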

set_context() allows you to override default parameters in order to scale the plot:
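
As a sketch, assuming the poster context with a custom font scale and line width (seaborn's preset contexts are paper, notebook, talk, and poster):

```python
import matplotlib.pyplot as plt
import seaborn as sns

# "notebook" is the baseline context.
sns.set_context("notebook")
notebook_font = plt.rcParams["font.size"]

# "poster" enlarges plot elements; font_scale and rc fine-tune the result.
sns.set_context("poster", font_scale=1.2, rc={"lines.linewidth": 3})
poster_font = plt.rcParams["font.size"]

print(notebook_font, poster_font)
```

Like set_style, the context applies to plots created after the call.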


Conclusions:

Hence, we have covered most of the basics of Python visualization using seaborn and matplotlib. I hope this article gives you a head start for diving into Python visualization. You can also refer to the official documentation for Matplotlib and Seaborn for further reference and deeper understanding.
