One of the essential skills a Data Scientist needs is statistics, both descriptive and inferential. With statistics, it is possible to predict likely outcomes and judge current market trends.
Descriptive statistics help describe data, while inferential statistics help make predictions from that data.
It’s a mark of a truly intelligent person to be moved by statistics – George Bernard Shaw
Suppose you are in a mall asking 100 people whether they like to shop at brand X. If you make a yes/no bar graph or chart from this data, that is descriptive statistics. But if you use this data to make predictions about the wider population, such as "A% of people like to shop at brand X", that is inferential statistics.
In this article, we will discuss the types of inferential statistics, with examples, in detail.
What is Inferential Statistics?
Inferential statistics are used when you need to draw conclusions about a population from sample data. For example, to judge who was the better president, A or B, you can't talk to the whole population. Instead, you use sample data to judge the situation.
There are two major types of inferential statistics.
Hypothesis Tests
Hypothesis tests use the collected sample data to answer research questions. For example, a test could check whether eating breakfast leads to a more productive day for both children and adults.
Estimating Parameters
Parameters are numerical features of a population, such as its mean. Parameter estimation uses a statistic calculated from the sample, such as the sample mean, to estimate the corresponding population parameter and make predictions about the population.
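As a small illustration of parameter estimation, the sketch below returns to the mall survey from the introduction: the share of "yes" answers in the sample is used as an estimate of the share in the whole population. The count of 62 "yes" answers out of 100 is hypothetical, and the standard error uses the usual large-sample approximation for a proportion.

```python
import math

def estimate_proportion(yes_count: int, sample_size: int) -> tuple[float, float]:
    """Point estimate of a population proportion and its approximate standard error."""
    p_hat = yes_count / sample_size  # sample proportion: the statistic
    # Large-sample standard error of a proportion: sqrt(p(1 - p) / n)
    std_error = math.sqrt(p_hat * (1 - p_hat) / sample_size)
    return p_hat, std_error

# Hypothetical survey result: 62 of 100 shoppers said they like brand X.
p_hat, se = estimate_proportion(62, 100)
print(f"Estimated share of all shoppers: {p_hat:.2f} (standard error {se:.3f})")
```

The sample proportion is the point estimate; the standard error indicates how much that estimate would typically vary from one random sample to another.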
What is the difference between descriptive and inferential statistics?
| Inferential Statistics | Descriptive Statistics |
| --- | --- |
| Takes a sample and infers conclusions about the population. A random sample should be selected to reduce errors in the prediction. | Describes the data at hand, for example, the most frequently occurring value (the mode). |
| Results such as estimates, probabilities and test statistics are harder to present. | Uses graphs and numerical summaries, so the results are simple to present. |
| Relies on test statistics such as the F-ratio, Z score and T score. | Relies on measures such as the mean, median, mode and standard deviation. |
Both inferential and descriptive statistics use the same type of data. Inferential statistics use it to draw conclusions about a larger group, while descriptive statistics use it to summarize the data itself with a specific value or statistic.
Inferential Statistics Types
Z Statistics
Z statistics are built around the Z score, which is used to make inferences or predictions about the population.
The Z score, also known as a standard score, tells you how many standard deviations a data point lies above or below the mean. For normally distributed data, about 99.7% of Z scores fall between -3 and +3.
A positive Z score means the data point is above the mean, and a negative Z score means it is below the mean.
Why do we use the Z score in inferential statistics?
The Z score shows the relative position of a value, for instance, whether a score lies in the top 10% or not. Knowing where a value sits within the entire data set helps reveal details about the population.
It also lets you compare two values, even values from different data sets.
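Turning a Z score into a relative position, such as "top 10%", requires an assumption about the shape of the data. The sketch below assumes the data are approximately normal and uses `NormalDist` from Python's standard `statistics` module (Python 3.8+); the function names are illustrative, not from any library.

```python
from statistics import NormalDist

def percentile_from_z(z: float) -> float:
    """Share of a normal population lying below a given Z score."""
    return NormalDist().cdf(z)  # standard normal: mean 0, standard deviation 1

def in_top_fraction(z: float, fraction: float = 0.10) -> bool:
    """True if the Z score falls in the top `fraction` of a normal population."""
    return percentile_from_z(z) >= 1 - fraction

for z in (0.0, 1.0, 1.5):
    print(f"z = {z:+.1f}: {percentile_from_z(z):.1%} below, top 10%? {in_top_fraction(z)}")
```

If the data are strongly skewed, these percentages can be misleading, and the empirical rank of the value within the data set is a safer guide.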
Inferential statistics example of Z score:
If Jack and John took two different exams but scored the same, how can you compare their performance?
Through the Z score.
John took an English test and scored 50%, and Jack took a History test and got the same score. The Z score lets you compare each student's performance relative to everyone else who took the same test.
The Formula of Z Score
Z score = (data point - mean) / standard deviation
However, note that a Z test is generally used only when the sample has more than 30 values or the population standard deviation is known. For smaller samples, you should use the T score instead, which is explained in the following sections.
Let's return to the example of Jack and John to see how the Z score helps compare the performance of the two students.
For the English test, the standard deviation was 10, and the mean was 40.
For the History test, the standard deviation was 10, and the mean was 60.
Z score for History (Jack) = (50 - 60)/10 = -1
Z score for English (John) = (50 - 40)/10 = +1
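This shows that John scored one standard deviation above the average English test-taker, while Jack scored one standard deviation below the average History test-taker. Relative to their classmates, John performed better.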
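The same calculation in Python, as a minimal sketch that uses only the means and standard deviations given in the example; `z_score` is an illustrative helper name.

```python
def z_score(data_point: float, mean: float, std_dev: float) -> float:
    """Number of standard deviations a data point lies above (+) or below (-) the mean."""
    return (data_point - mean) / std_dev

# Figures from the Jack and John example.
john_english = z_score(50, mean=40, std_dev=10)
jack_history = z_score(50, mean=60, std_dev=10)
print(f"John (English): {john_english:+.1f}")  # +1.0
print(f"Jack (History): {jack_history:+.1f}")  # -1.0
```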
Hypothesis Testing
What is hypothesis testing in inferential statistics?
In simple words, a hypothesis is an educated guess about a population or event that you then check against data. Hypothesis testing is used to check whether a study's results are valid. It relies on a random sample, which allows the data scientist to analyze whether the results were achieved by chance or are likely to be repeatable.
For instance, if you need to find out who is the better president, A or B, you might hypothesize that B is better than A. Based on the sample data, you then either find enough evidence to support this hypothesis or you do not.
Steps for Hypothesis Testing
(i) First, define the null hypothesis. This is the default position, usually a widely accepted view or a statement of no effect.
(ii) Then, define the alternative hypothesis. This is the claim you are looking for evidence to support.
(iii) Next, choose the significance level (a), which is usually 0.05, 0.02, or 0.01, depending on the test.
(iv) Select the test statistic that best suits the situation, such as the T score or Z score, calculate it from the sample, and use it to find the p-value (p).
(v) Compare the (p) and (a) values to decide whether to reject the null hypothesis.
It helps to write this decision as an if-else statement: if p is less than or equal to a, reject the null hypothesis; otherwise, fail to reject it.
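The steps above can be sketched as a two-sided one-sample Z test in Python. This is one illustration under stated assumptions, not the only way to run a test: the population standard deviation is assumed to be known (0.8 hours), and the sample figures (40 children, sample mean 2.3 hours, null hypothesis mean 2 hours) are hypothetical. `NormalDist` comes from Python's standard `statistics` module.

```python
import math
from statistics import NormalDist

def one_sample_z_test(sample_mean, null_mean, population_sd, n, alpha=0.05):
    """Two-sided one-sample Z test; assumes the population SD is known."""
    # Step (iv): compute the test statistic, then convert it to a two-sided p-value.
    z = (sample_mean - null_mean) / (population_sd / math.sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Step (v): the if-else decision rule.
    if p_value <= alpha:
        decision = "reject the null hypothesis"
    else:
        decision = "fail to reject the null hypothesis"
    return z, p_value, decision

# Hypothetical data. H0: the mean is 2 hours. H1: the mean is not 2 hours.
z, p, decision = one_sample_z_test(sample_mean=2.3, null_mean=2.0,
                                   population_sd=0.8, n=40, alpha=0.05)
print(f"z = {z:.2f}, p = {p:.4f}: {decision}")
```

With an unknown population standard deviation or a small sample, the same structure applies, but the T score and the t-distribution replace the Z score and the normal distribution.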
Null Hypothesis
The null hypothesis is the default assumption, often a widely accepted position or a statement of no effect. For instance, that presidents A and B are rated equally, or that the independent and dependent variables have no relationship. In your hypothesis test, you will either reject the null hypothesis or fail to reject it.
Alternative Hypothesis
The alternative hypothesis is the claim we want to find evidence for. While the null hypothesis uses an equality (=), the alternative hypothesis can use less than (<), not equal to (≠), or greater than (>).
Remember that the alternative hypothesis and the null hypothesis are always mutually exclusive: they cannot both be true.
Confidence Level
The confidence level describes how often a method captures the true population value if you repeat it on many random samples. For example, at a 95% confidence level, which is the most common choice, about 95% of the intervals built from repeated random samples would contain the true population parameter.
Significance Level
The significance level is the probability of rejecting the null hypothesis when the null hypothesis is actually true (a Type I error).
A = 1 – C
Here, C is the confidence level and A is the significance level. For example, a 95% confidence level gives A = 1 - 0.95 = 0.05.
Rejection or Acceptance of the Given Null Hypothesis
The p-value is the probability of getting results at least as extreme as the ones observed, assuming the null hypothesis is true. Hence, to decide whether to reject the null hypothesis, we compare the (a) and (p) values.
T Statistics
T statistics, also known as Student's T, work much like Z statistics. The key difference is that the T score uses the sample's standard deviation, because the population standard deviation is unknown.
Generally, T statistics are used when the sample has fewer than 30 units or the population standard deviation is not known. With samples larger than about 30, the t-distribution becomes very close to the normal (Z) distribution, so the results are nearly the same as with Z statistics.
Here, the degrees of freedom are important. They are the number of values in the data set that are free to vary.
Degrees of freedom (df) = n - 1, where n is the sample size
Note: Because the t-distribution has heavier tails than the normal distribution, a T test needs a more extreme test statistic to reject the null hypothesis at the same significance level, especially with small samples.
The Formula of T Score
T score = (x - u) / (S / √n)
Here, x is the sample mean, u is the population mean stated by the null hypothesis, S is the sample's standard deviation, and n is the sample size. The result is compared against a t-distribution with df = n - 1 degrees of freedom.
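A minimal sketch of the T score in Python, using the standard library's `statistics.stdev` (which divides by n - 1) for the sample standard deviation. The eight scores and the claimed mean of 60 are hypothetical, and the sketch follows the usual one-sample definition with √n in the denominator, reporting df = n - 1 separately for looking up the t-distribution.

```python
import math
import statistics

def t_score(sample, population_mean):
    """One-sample T score: (sample mean - population mean) / (S / sqrt(n))."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
    t = (x_bar - population_mean) / (s / math.sqrt(n))
    df = n - 1                    # degrees of freedom
    return t, df

# Hypothetical sample of 8 test scores, compared with a claimed mean of 60.
scores = [52, 58, 61, 49, 55, 63, 50, 56]
t, df = t_score(scores, population_mean=60)
print(f"t = {t:.2f} with df = {df}")
```

To turn the T score into a p-value, it is compared with a t-distribution table for the given df, or with a statistics library such as SciPy.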
Central Limit Theorem
Z scores are usually interpreted against the normal distribution, with each Z score expressed in units of standard deviation.
The central limit theorem is what makes the normal distribution so important in inferential statistics. It says that as the sample size increases, the sampling distribution of the sample mean approaches a normal distribution, regardless of the shape of the population distribution.
For instance, the number of Facebook posts per person may be skewed if most people post rarely while a few post a lot, and surveying more people will not change that shape. But if you repeatedly draw random samples of 200 people and record each sample's average, those averages form an approximately bell-shaped curve, and the curve gets closer to a normal distribution with larger samples, such as 2000 people.
Some properties of the central limit theorem are:
(i) The mean of the sampling distribution of the sample mean will be approximately equal to the population mean.
(ii) The standard deviation of the sampling distribution, known as the standard error, will be approximately the population's standard deviation divided by the square root of the sample size.
(iii) Even if the population distribution is bimodal or skewed, the sampling distribution of the mean will be approximately normal when the sample size is large. (We have already covered this in the example above.)
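A quick simulation can illustrate these properties. The sketch below is an illustration, not a proof: it draws 2,000 samples of 50 values each from a right-skewed exponential distribution (population mean 1, standard deviation 1) and checks properties (i) and (ii). A histogram of `means` would show the bell shape described in (iii). The sample sizes and seed are arbitrary choices.

```python
import random
import statistics

def sample_means(num_samples, sample_size, rng):
    """Draw repeated samples from a skewed population and return each sample's mean."""
    # Exponential distribution with rate 1: strongly right-skewed, mean 1, SD 1.
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(sample_size))
            for _ in range(num_samples)]

rng = random.Random(42)  # fixed seed so the run is repeatable
n = 50
means = sample_means(num_samples=2000, sample_size=n, rng=rng)

print(f"mean of sample means: {statistics.fmean(means):.3f} (population mean 1.0)")
print(f"SD of sample means:   {statistics.stdev(means):.3f} "
      f"(sigma / sqrt(n) = {1.0 / n ** 0.5:.3f})")
```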
Confidence Interval
In inferential statistics, we use the sample mean to estimate the mean of the population. But it is hard to know how accurately a single sample reflects the population. Hence, we use a confidence interval.
A confidence interval gives a range of values that is likely to contain the population parameter, at a stated confidence level.
(i) For a one-sided 95% confidence interval, the whole 5% is placed in one tail, either to the right or the left of the distribution.
(ii) For a two-sided 95% confidence interval, 2.5% is placed in each tail, on the right and left sides.
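A minimal sketch of both kinds of interval for a population mean, assuming the population standard deviation is known so that the normal (Z) critical value applies; with an unknown standard deviation and a small sample, a t critical value would be used instead. The figures (40 children, mean 2.3 hours, standard deviation 0.8) are hypothetical, and the one-sided version returns a lower bound only.

```python
import math
from statistics import NormalDist

def z_confidence_interval(sample_mean, population_sd, n, confidence=0.95, two_sided=True):
    """Z-based confidence interval for a population mean (population SD assumed known)."""
    alpha = 1 - confidence
    # Two-sided: alpha / 2 in each tail. One-sided: all of alpha in one tail.
    tail = alpha / 2 if two_sided else alpha
    z_crit = NormalDist().inv_cdf(1 - tail)  # about 1.96 two-sided, 1.645 one-sided, at 95%
    margin = z_crit * population_sd / math.sqrt(n)
    if two_sided:
        return sample_mean - margin, sample_mean + margin
    return sample_mean - margin, math.inf    # one-sided: lower bound only

# Hypothetical sample: 40 children, mean 2.3 hours, known population SD 0.8 hours.
print(z_confidence_interval(2.3, 0.8, 40))
print(z_confidence_interval(2.3, 0.8, 40, two_sided=False))
```

Putting all 5% in one tail gives a smaller critical value, so the one-sided bound sits closer to the sample mean than the lower end of the two-sided interval.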
Need for Inferential Statistics
While the need for inferential statistics is clear from the examples above, this section discusses it in detail. Consider school children aged 10-12. You need to find out the average number of hours that children of this age watch television.
To start with, suppose you found that the average in your locality is 2 hours. But you don't yet know the average television watching hours of all children aged 10-12.
There are two methods which can help you find this:
(i) Use the data from your locality to estimate the overall average.
(ii) Measure the television watching hours of every child in the age group.
While (i) is achievable, (ii) is not. Even if you could somehow accomplish it, it would take far too much time, money and other resources.
Now add another factor to the data. Suppose the children in your locality are more inclined towards playing outside than children elsewhere, so your local average may understate the overall average. Inferential statistics also lets you estimate the average television watching hours while accounting for this factor.
This is how inferential statistics reduce the time, resources and energy spent on a study, by letting you make estimated predictions about a larger population.
Some examples of inferential statistics in use:
(i) Making a prediction about the whole population just by using a random sample.
(ii) Understanding how a random sample differs from the whole population, such as the outdoor-play factor above.
(iii) Understanding the impact of a feature on the hypothesis or result.
Conclusion
Inferential statistics let you judge the features of an entire population without collecting opinions or data from every member of it.
You can include additional factors in your hypotheses and still get a valuable result. This is why inferential statistics is considered one of the most important branches of statistics.
With data now used in almost every field, the demand for Data Science professionals has increased considerably. If you are looking to build a career in Data Science, enrol in the Data Science Course today.