Don’t you think descriptive statistics is now a universal language?
Karl Pearson once said, "Statistics is the grammar of science."
Researchers, scientists, and statisticians use statistical analysis to answer a wide range of questions, such as:
What proportion of the population of animal X in area Y falls below the acceptable health threshold?
Or did fertilizer compound X, Y, or Z have a positive effect on crop A?
Answering questions like these is only possible with the help of statistics, both descriptive and inferential.
However, to get accurate answers and estimates, we need a random sample of quality data collected from the relevant population. This data is then summarized and organized with the help of descriptive statistics, for example in Excel.
Inferential statistics then takes these results and extends them to the whole population.
Without descriptive analysis, inferential analysis has nothing to build on. Hence, in this article, we will discuss what descriptive statistics is, look at examples, and cover its main measures.
What is Descriptive Statistics?
Descriptive statistics summarize a given set of data. The data may represent a sample or an entire population.
With this type of statistics, you can understand and present the features of a data set in the form of summaries.
To understand what descriptive statistics works with, consider the examples of data sets below:
1. All the rivers in area X.
2. 100 fish sampled in Y lake.
3. 50 A-type animals in the Z area.
4. All A-type animals in the Z area.
Examples 1 and 4 represent entire populations, while examples 2 and 3 represent samples drawn from a population.
The characteristics of a population are described with the help of parameters, for example, the population mean or variance.
The characteristics that are measured on each member of a population or sample are called variables. For instance:
1. pH value of the rivers in area X.
2. Weight of A-type animals in the Z area.
We can divide variables into two categories: quantitative and qualitative. As the name suggests, quantitative variables are numeric and are obtained through measurement or counting, like height, weight, age, length, etc. Qualitative variables, on the other hand, are attributes that are not numeric, such as gender, color, race, etc.
You can further divide quantitative variables into two categories, continuous and discrete. Continuous variables can take any value within a range, such as a volume of milk. Discrete variables can take only distinct, countable values, such as a number of eggs.
Descriptive statistics are widely used to simplify the quantitative analysis of a set of data. For example, you may wish to know the average number of passes a player makes in a football match. A match contains many other events, and descriptive statistics makes it simple to extract summary figures like this one.
Another great descriptive statistics example is the Grade Point Average. It summarizes the overall performance of a student. While this statistic cannot tell you how the student performed in individual subjects or courses, it gives a summarized overview, which is often all that is required.
Let’s move on to the types of descriptive statistics.
Types of Descriptive Statistics
Distribution
A distribution is a summary of how frequently each value of a variable occurs. In a simple distribution table, you will find a list of values placed against the number of units or individuals that have each value. For instance, the percentage marks of each college student, or the total count of students in each subject stream.
In the cases described above, the distinct values are few, which means you can list out the distribution manually and form a table. However, in many cases there are too many distinct values, which makes listing them a tedious task. To make it simpler, the scores may be grouped into value ranges.
Check the descriptive statistics distribution example below:
Score | Grade
80-100 | A
60-80 | B
50-60 | C
40-50 | D
Below 40 | F
A distribution can also be expressed as frequencies or percentages. For instance, the percentage of females in a village.
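As a rough illustration, the grouping described above can be sketched in Python. The list of marks is hypothetical sample data, and the rule that a boundary score (such as exactly 80) goes to the higher grade is an assumption, since the table does not say how boundaries are handled.

```python
from collections import Counter

# Hypothetical marks for ten students (illustrative data only).
marks = [58, 59, 46, 48, 60, 45, 55, 59, 43, 53]


def grade_for(score):
    """Map a score to a grade band.

    Assumption: a score exactly on a boundary (e.g. 80) gets the higher grade.
    """
    if score >= 80:
        return "A"
    if score >= 60:
        return "B"
    if score >= 50:
        return "C"
    if score >= 40:
        return "D"
    return "F"


# Frequency distribution: number of students in each grade band.
frequency = Counter(grade_for(m) for m in marks)

# The same distribution expressed as percentages of all students.
percentages = {g: 100 * c / len(marks) for g, c in frequency.items()}

for g in sorted(frequency):
    print(g, frequency[g], f"{percentages[g]:.0f}%")
```

With this sample, one student falls in band B, five in band C, and four in band D.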
Central Tendency
Central tendency is the idea of using one number to summarize the full data set. Simply put, it is a number that lies near the center of the set.
Here are the three main measures of central tendency used in descriptive statistics.
1. Mean
The mean is the most commonly used measure in descriptive statistics. To find it, add up every value in the sample and divide the sum by the number of values. For example, if you have 10 values, sum all the values and divide by 10 to find the mean.
If you have 10 students in a class and you need to find the mean of the marks obtained by these students, then you can proceed like this:
58, 59, 46, 48, 60, 45, 55, 59, 43, 53
The values add up to 526.
526 divided by 10
52.6 is the mean.
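The same calculation can be sketched in Python, both by hand and with the standard library's statistics module. The variable names are illustrative.

```python
from statistics import mean

marks = [58, 59, 46, 48, 60, 45, 55, 59, 43, 53]

total = sum(marks)                # add up all the values
manual_mean = total / len(marks)  # divide the sum by the number of values
library_mean = mean(marks)        # the standard library gives the same figure

print(total, manual_mean, library_mean)
```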
2. Median
As the name suggests, the median lies in the middle of the data. First, arrange the values in numerical order, and then pick the exact central value as the median.
In the above example, the number of values is even, so there are two values at the exact center. What do you do now?
Let’s arrange in numerical order first:
43, 45, 46, 48, 53, 55, 58, 59, 59, 60
53 and 55 are the middle values.
Find the mean of these two values (add them and divide by 2) to get the median in this case.
(53 + 55) divided by 2
54 is the median in the above example.
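A minimal Python sketch of these steps, handling both an odd and an even number of values, could look like this. The standard library's statistics.median is shown alongside for comparison.

```python
from statistics import median

marks = [58, 59, 46, 48, 60, 45, 55, 59, 43, 53]

ordered = sorted(marks)  # arrange the values in numerical order
n = len(ordered)
middle = n // 2

if n % 2 == 0:
    # Even count: average the two central values.
    manual_median = (ordered[middle - 1] + ordered[middle]) / 2
else:
    # Odd count: take the single central value.
    manual_median = ordered[middle]

library_median = median(marks)

print(ordered)
print(manual_median, library_median)
```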
3. Mode
The mode is the value that occurs most often in the data. Arranging the numbers in ascending order places equal values next to each other, which makes the most frequent value easy to spot.
If we again use the above example as reference, then the mode is 59.
43, 45, 46, 48, 53, 55, 58, 59, 59, 60
This is because 59 appears twice and all the other values appear only once. Hence, the mode is 59.
A data set can also have two modes. For example:
43, 43, 46, 48, 53, 55, 58, 59, 59, 60
Here, both 43 and 59 are modes, because each appears twice.
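One way to find every mode, including the two-mode case, is to count the occurrences of each value. The helper name modes below is hypothetical; statistics.multimode (available from Python 3.8) does the same job.

```python
from collections import Counter
from statistics import multimode


def modes(values):
    """Return every value that shares the highest count, in ascending order."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


one_mode = modes([43, 45, 46, 48, 53, 55, 58, 59, 59, 60])
two_modes = modes([43, 43, 46, 48, 53, 55, 58, 59, 59, 60])

print(one_mode)   # single mode
print(two_modes)  # two modes
print(multimode([43, 43, 46, 48, 53, 55, 58, 59, 59, 60]))
```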
Central tendency chart for the above descriptive statistics examples:
Mean | 52.6
Median | 54
Mode | 59
Dispersion
Dispersion describes how widely the values of a distribution are spread around the central tendency.
Here are the measures used to describe dispersion in descriptive statistics.
1. Range
The range is the simplest measure of dispersion in descriptive statistics. You can find the range by subtracting the minimum value from the maximum value.
In the above example, we have the values:
43, 45, 46, 48, 53, 55, 58, 59, 59, 60
60-43 will give you the range.
It is 17 in this case.
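In Python, this is a one-line calculation using the built-in max and min functions. The variable name value_range is illustrative.

```python
marks = [43, 45, 46, 48, 53, 55, 58, 59, 59, 60]

# Range = maximum value minus minimum value.
value_range = max(marks) - min(marks)

print(value_range)
```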
2. Variance
The variance is calculated by subtracting the mean from each value, squaring these differences, adding the squares, and dividing the sum by (n-1). Here, n is the total number of values.
Check the below example to understand how variance is calculated in descriptive statistics.
43, 45, 46 are the values
The mean is (43 + 45 + 46) divided by 3, which is 44.67.
Calculate (43 - 44.67), (45 - 44.67), and (46 - 44.67).
You will get -1.67, 0.33, and 1.33.
Add their squares, 2.78, 0.11, and 1.78, to get 4.67, and divide by 2.
2.33 is the variance in the above example of descriptive statistics.
3. Standard Deviation
Standard deviation is one of the most widely used measures in descriptive statistics because it describes how far the values typically lie from the mean.
Let’s understand standard deviation with an example:
43, 45, 46, 48, 53, 55, 58, 59, 59, 60
Start by subtracting the mean (52.6) from each value.
43 – 52.6 = -9.6
45 – 52.6 = -7.6
46 – 52.6 = -6.6
48 – 52.6 = -4.6
53 – 52.6 = 0.4
55 – 52.6 = 2.4
58 – 52.6 = 5.4
59 – 52.6 = 6.4
59 – 52.6 = 6.4
60 – 52.6 = 7.4
Here, every value greater than the mean gives a positive result and every value smaller than the mean gives a negative result.
Now, find the square of each of the above results.
-9.6 x -9.6 = 92.16
-7.6 x -7.6 = 57.76
-6.6 x -6.6 = 43.56
-4.6 x -4.6 = 21.16
0.4 x 0.4 = 0.16
2.4 x 2.4 = 5.76
5.4 x 5.4 = 29.16
6.4 x 6.4 = 40.96
6.4 x 6.4 = 40.96
7.4 x 7.4 = 54.76
Calculate the variance by adding these squares and dividing the sum by (n-1), which is 9 here.
92.16 + 57.76 + 43.56 + 21.16 + 0.16 + 5.76 + 29.16 + 40.96 + 40.96 + 54.76 = 386.40
386.40 divided by 9
42.93 is the variance.
The standard deviation is the square root of the variance.
Standard deviation = 6.55
The standard deviation is useful because it is expressed in the same units as the data, which makes it easy to judge how far a given value lies from the mean.
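The steps above can be sketched in Python as follows. The manual calculation mirrors the worked example, and statistics.variance and statistics.stdev from the standard library, which also divide by (n-1), are shown as a cross-check.

```python
from math import sqrt
from statistics import stdev, variance

marks = [43, 45, 46, 48, 53, 55, 58, 59, 59, 60]
n = len(marks)
mean_marks = sum(marks) / n

# Step 1: subtract the mean from each value.
deviations = [x - mean_marks for x in marks]

# Step 2: square each deviation.
squares = [d * d for d in deviations]

# Step 3: sample variance = sum of squares divided by (n - 1).
manual_variance = sum(squares) / (n - 1)

# Step 4: standard deviation = square root of the variance.
manual_stdev = sqrt(manual_variance)

print(round(manual_variance, 2), round(manual_stdev, 2))
print(round(variance(marks), 2), round(stdev(marks), 2))
```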
4. Standard Errors
A sample mean calculated from a population is a random variable. Here’s how:
Suppose you need to find the height of trees that are 100 years old. You randomly select 100 such trees and calculate their mean height. This sample mean is then used as the estimate in other calculations of descriptive statistics.
However, it is necessary to understand that the sample mean we have obtained is just one of many possibilities. If we draw another random sample of the same size, it will contain different values and will generally give a different mean.
This is why the sample mean is always a random variable. This random variable has a probability distribution, commonly referred to as the sampling distribution. The standard deviation of this distribution is called the standard error, and it can be calculated through different methods.
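One common way to estimate the standard error of the mean from a single sample is to divide the sample standard deviation by the square root of the sample size. The sketch below applies that formula to the ten marks used earlier, treated here as a sample purely for illustration. It is one method among several, not the only one.

```python
from math import sqrt
from statistics import stdev

# Treated as one random sample for illustration.
sample = [43, 45, 46, 48, 53, 55, 58, 59, 59, 60]

# Standard error of the mean = sample standard deviation / sqrt(sample size).
standard_error = stdev(sample) / sqrt(len(sample))

print(round(standard_error, 2))
```

A larger sample size shrinks the standard error, which is why the mean of a large sample is a more stable estimate than the mean of a small one.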
5. Variability
Variability can be measured in multiple ways.
(i) Standard deviation measures variability from point to point within a sample. This means it measures the variation among the sampling units.
(ii) The coefficient of variation, which is the standard deviation divided by the mean, also measures variability from point to point. However, it does so on a relative basis, so it is not affected by the units of measurement.
(iii) Standard error measures variability from sample to sample. This means it describes how much a statistic varies across repeated samples.
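To illustrate why the coefficient of variation is unit-free, the sketch below computes it for the same set of heights expressed in metres and in centimetres. The heights are hypothetical values chosen only for this example, and the function name coefficient_of_variation is illustrative.

```python
from math import isclose
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Standard deviation expressed as a fraction of the mean."""
    return stdev(values) / mean(values)


# Hypothetical heights of five people (illustrative data only).
heights_m = [1.52, 1.60, 1.68, 1.75, 1.83]
heights_cm = [h * 100 for h in heights_m]

cv_m = coefficient_of_variation(heights_m)
cv_cm = coefficient_of_variation(heights_cm)

# The standard deviation changes with the unit, the coefficient does not.
print(stdev(heights_m), stdev(heights_cm))
print(isclose(cv_m, cv_cm))
```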
6. Interquartile Range
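One of the important measures of dispersion in descriptive statistics is the interquartile range.
In this method, the ordered data is divided into four quarters, so that one-fourth of the values lie in each of the 1st, 2nd, 3rd, and 4th quarters. The numbers that mark the boundaries between the quarters are called quartiles. The first quartile divides the 1st and 2nd quarters, and the third quartile divides the 3rd and 4th quarters. The number that lies between the 2nd and 3rd quarters is the second quartile, also called the median. The interquartile range is the difference between the third and first quartiles, so it covers the middle half of the data.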
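A minimal sketch of this idea in Python is shown below. It uses one common convention, taking the first and third quartiles as the medians of the lower and upper halves of the ordered data. Other conventions exist and can give slightly different quartiles, so treat the helper quartiles as an assumption rather than the only definition.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves convention."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # For an odd count, leave the middle value out of both halves.
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return median(lower), median(ordered), median(upper)


marks = [43, 45, 46, 48, 53, 55, 58, 59, 59, 60]
q1, q2, q3 = quartiles(marks)
interquartile_range = q3 - q1

print(q1, q2, q3, interquartile_range)
```

For these marks, the first quartile is 46, the median is 54, and the third quartile is 59, which gives an interquartile range of 13 under this convention.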
The Normal Curve
You can plot the distribution of a variable in the form of a curve. For many variables, it is a bell-shaped curve, known as the normal curve, that contains most values in the middle and few values at the extremes.
For example, height. Most people have a height between 5 and 6 feet, while some are below 5 feet and some are above 6 feet.
In a normal curve, the mean, median, and mode all lie on the same line at the center of the curve.
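The height example can be sketched with statistics.NormalDist from the Python standard library (Python 3.8 or later). The mean of 5.5 feet and standard deviation of 0.25 feet are hypothetical parameters chosen only to illustrate the shape of the curve; they are not measured values.

```python
from statistics import NormalDist

# Hypothetical parameters: mean 5.5 ft, standard deviation 0.25 ft.
heights = NormalDist(mu=5.5, sigma=0.25)

# For a normal curve, mean, median, and mode coincide.
print(heights.mean, heights.median, heights.mode)

# Share of people between 5 and 6 feet under these assumed parameters.
share_5_to_6 = heights.cdf(6.0) - heights.cdf(5.0)
print(round(share_5_to_6, 3))

# The curve is highest at the center and lower toward the extremes.
print(heights.pdf(5.5) > heights.pdf(5.0))
```

Under these assumed parameters, roughly 95% of heights fall between 5 and 6 feet, because that interval spans two standard deviations on either side of the mean.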
Conclusion
Descriptive statistics are necessary to represent data in a way that individuals can analyze and use. This type of statistics helps individuals and statisticians evaluate large sets of data in a simple manner.
For example, if you need to see how your students performed in a test, you can use the measures of central tendency. If you have 60 students in the class, then the mean, median, and mode give you a consolidated view of the data.
Another example of descriptive statistics is finding the group of students who received the best marks or judging the average marks of all the students in the class.
In all, descriptive statistics is an optimum and valuable method to extract value from a large set of data and present it in a meaningful and understandable format.