Starter’s Guide to Natural Language Processing with Python

by | Nov 16, 2019 | Python Programming

In the early days, we used to say that the computer works on data and instructions. We fed data and the required instructions to the computer to get the desired output. But in today’s era of computation and digitization, we communicate with computers using human languages. In order to produce significant and actionable insights from text data, it is important to get acquainted with the techniques and principles of Natural Language Processing with Python.

Around 57.8 million adults use smart speakers, and 27% of the online population worldwide uses voice search on mobile.

We ask Alexa questions in natural language and get the right answers. Natural Language Processing enables machines to do exactly that, and this tutorial shows you how to get started with it in Python.

What will you learn in this Natural Language Processing with Python tutorial?

This tutorial focuses on the following:

  • Fundamentals of Natural Language Processing
  • Applications of Natural Language Processing
  • Various stages of Natural Language Processing

By the end of this tutorial, you will have a solid grasp of the fundamentals of Natural Language Processing and will be able to independently take up Natural Language Processing with Python projects on new topics in this field.

Natural Language Processing with Python

What is a natural language? A natural language is any of the human languages that we use to express ourselves. To be more specific, it is a set of mutually agreed conventions for communicating with each other. So does Alexa, a non-living object, understand our language? If yes, then how?

Consider another example. You probably have a Gmail account, and almost every day you notice that some emails in your mailbox are automatically labeled as spam.

Does Google’s spam filter analyze the content of the email and then decide whether these emails need to be labeled as spam or not? The answer is yes, and this can be accomplished by Text Analytics and Natural Language Processing with Python.

Fundamentals of Natural Language Processing

Text Analytics is a scientific process of extracting significant and useful information from text. The text does not necessarily have to be human language.

Natural Language Processing is not restricted to text; it deals with human language in general. Speech recognition and analysis, Natural Language Understanding (NLU), and Natural Language Generation (NLG) also fall within the sphere of Natural Language Processing.

NLU is the process by which a non-living object (like Alexa) understands human language. NLG is the process by which a non-living object expresses information by phrasing meaningful sentences and sharing them with humans.

These tasks are not taken care of by Text Analytics. However, Text Analytics and Natural Language Processing go hand in hand.

Natural Language Processing has emerged from the development of various disciplines such as linguistics, formal languages, computation, and artificial intelligence.

With substantial advances in computing technology and the growing availability of unstructured data generated from our speech, our posts on social networking sites, and the messages we send by SMS or over WhatsApp, machine learning and deep learning techniques are now used to process natural language.

Applications of Natural Language Processing

One of the most important applications of Natural Language Processing is sentiment analysis. Another is the chatbots used by many organizations. Similar processing runs behind voice assistants such as Siri, Google Assistant, and others.

Google Translate, which translates text from one language to another, is another application of Natural Language Processing. Text suggestions, keyword search, and advertisement matching are also Natural Language Processing applications.

Here are some ideas for Natural Language Processing with Python projects:

1. Next-Word Prediction

By using this method, you can predict the next word while the user is typing, in real time.

2. Topic Classification

You can use this method to classify users' emails, status updates, tweets, feedback, reviews, and questions into various topic categories.

3. Sentiment Analysis

You can use this method to classify the sentiment expressed in users' statuses, messages, reviews, etc. as positive or negative.

These are only some of the possibilities for Natural Language Processing with Python projects. Working through this guide will help you come up with many more ideas of your own.
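As a starting point for the first idea, next-word prediction can be approximated with a bigram model: count which word follows which in some training text, then suggest the most frequent followers. The following is a minimal pure-Python sketch under that assumption; the function names build_bigram_model and predict_next and the tiny corpus are hypothetical, and real keyboards use far larger corpora and more sophisticated models.

```python
from collections import Counter, defaultdict


def build_bigram_model(text):
    """Count, for every word, which words follow it in the training text."""
    words = text.lower().split()
    followers = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        followers[current][nxt] += 1
    return followers


def predict_next(model, word, k=3):
    """Return up to k most frequent followers of `word` (empty list if unseen)."""
    return [w for w, _ in model[word.lower()].most_common(k)]


# Hypothetical toy corpus; punctuation is pre-split into separate tokens
corpus = ("i am learning python . i am learning nlp . "
          "i am reading a book . python is fun .")
model = build_bigram_model(corpus)
print(predict_next(model, "am"))
```

Here "learning" follows "am" twice and "reading" once, so "learning" is suggested first.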

Various Stages of Natural Language Processing

NLU is generally harder than NLG, because understanding human language, with all its ambiguity, is difficult, especially when the recipient is a non-living object. A number of preprocessing stages are needed to carry out Natural Language Processing tasks and get meaningful information from text.

Now, where will you execute these stages? This is where Python comes into the picture, with strong support from a number of Natural Language Processing libraries. The latest version of Python can be downloaded and installed from https://www.python.org/downloads/.

Moreover, Python offers a comprehensive toolkit for Natural Language Processing called NLTK (Natural Language Toolkit). NLTK contains packages that enable machines to recognize human language and respond to it. Therefore, after installing Python, install NLTK and download all of its data packages (for example, by running nltk.download('all')).

Let us understand the various stages of Natural Language Processing with Python.

1. Tokenization

Tokenization refers to the technique of splitting text into its constituent units, such as words or sentences. You can perform both word tokenization and sentence tokenization by using Python. The following Natural Language Processing with Python source code snippet shows an example of word tokenization:

from nltk.tokenize import word_tokenize
var = "I am learning Python. Natural Language Processing with Python is fun."
print(word_tokenize(var))

In the preceding code snippet, we import the word_tokenize() function from the NLTK library. Next, we declare a variable named var that stores two sentences. Then, we call word_tokenize() with var as its argument and print the result.

The preceding code snippet generates the following output:

['I', 'am', 'learning', 'Python', '.', 'Natural', 'Language', 'Processing', 'with', 'Python', 'is', 'fun', '.']

You can see that word_tokenize() splits the text into words and treats each punctuation mark as a separate token.
The following Natural Language Processing with Python code snippet shows the difference between sentence tokenization and word tokenization:

from nltk.tokenize import sent_tokenize
var = "I am learning Python. Natural Language Processing with Python is fun."
print(sent_tokenize(var))

The preceding code snippet calls the sent_tokenize() function, which splits the text into sentences, and generates the following output:

['I am learning Python.', 'Natural Language Processing with Python is fun.']
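NLTK's tokenizers handle many cases that simple rules miss (abbreviations such as “Dr.”, contractions, and so on), but the core idea can be sketched with regular expressions. The following is a simplified illustration, not how NLTK implements word_tokenize() or sent_tokenize(); simple_sent_tokenize and simple_word_tokenize are hypothetical names.

```python
import re


def simple_sent_tokenize(text):
    # Split after ., ! or ? when followed by whitespace (naive: also splits "Dr. Smith")
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def simple_word_tokenize(text):
    # \w+ matches runs of letters/digits; [^\w\s] matches single punctuation marks
    return re.findall(r"\w+|[^\w\s]", text)


var = "I am learning Python. Natural Language Processing with Python is fun."
print(simple_sent_tokenize(var))
print(simple_word_tokenize(var))
```

For this simple input, the sketch produces the same sentences and tokens as NLTK, but it breaks on text that NLTK's trained sentence tokenizer handles correctly.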

2. Stemming

In languages like English, a single word can take various forms. For example, the word product has forms such as products, production, productive, and productivity.

The process of reducing these variations to a common base form (the stem) is called stemming, which is an important task in Natural Language Processing with Python. The nltk.stem module in NLTK enables us to perform stemming.

Let us consider the following Natural Language Processing with Python source code snippet:

from nltk.stem import PorterStemmer
word_list = ["Products", "Production", "Productive", "Productivity"]
pstemmer = PorterStemmer()
for word in word_list:
    # Stem each variation individually (not the whole list)
    base_word = pstemmer.stem(word)
    print(base_word)

The preceding code snippet displays the word product four times (in lowercase, because NLTK's PorterStemmer lowercases its output by default).

In the preceding code snippet, we import only the PorterStemmer class from the nltk.stem module, because that is all we need.

The word_list variable stores a dummy list of variations of the word, Product.

Then we create an object named pstemmer, which is an instance of the nltk.stem.porter.PorterStemmer class.

Inside the for loop, we call the stem() method on each variation in turn. Finally, we get the base word of each variation in the list as the output.
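To see how rule-based stemming works, and why it can produce fragments that are not real words, here is a toy suffix stripper. It is not the Porter algorithm (which applies several ordered steps with extra conditions); the SUFFIXES list and the naive_stem function are hypothetical and chosen only for this example.

```python
# Hypothetical suffix list, checked longest-first; this is NOT the Porter algorithm
SUFFIXES = ["ivity", "ion", "ive", "ing", "s"]


def naive_stem(word, min_stem=3):
    """Strip the first matching suffix, keeping at least `min_stem` characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Only strip if a reasonably long stem remains
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


for w in ["Products", "Production", "Productive", "Productivity", "battling"]:
    print(w, "->", naive_stem(w))
```

The four product variants all reduce to "product", but "battling" becomes "battl": the rules cut characters without checking whether the result is a real word.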

3. Lemmatization

The stemming algorithm has some limitations. Stemming simply chops suffixes off the end of a word by following fixed rules, which can produce incorrect output in some cases. Let us understand this by using the following Natural Language Processing with Python source code snippet:

from nltk.stem import PorterStemmer
pstemmer =PorterStemmer()
baseWord= pstemmer.stem("battling")
print(baseWord)

The preceding code snippet displays “battl”, which has no meaning.

Lemmatization overcomes these problems with stemming. A lemmatization algorithm performs morphological analysis of a word and checks a dictionary to find its base or dictionary form, which is called the lemma.

The additional checking slows down the process but returns the appropriate result. Let us understand this with the following Natural Language Processing with Python source code example:

from nltk.stem.wordnet import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()
# pos='v' tells the lemmatizer to treat the word as a verb (the default is noun)
print(lemmatizer.lemmatize('battling', pos='v'))

In the preceding example, we first import WordNetLemmatizer from the wordnet module. Then we create an object of the WordNetLemmatizer class and reduce the word to its base form by calling the lemmatize() method. Because “battling” is a verb form here, we pass pos='v'; by default, the lemmatizer treats every word as a noun. With this input, the lemmatize() method returns the base form “battle”.
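The key difference from stemming is the dictionary check: a lemmatizer only accepts a candidate base form if it is a real word. The sketch below illustrates this with a hypothetical five-word LEXICON standing in for WordNet and a few suffix-replacement rules similar in spirit to those WordNet uses for verbs. It is a simplified illustration, not NLTK's implementation, and toy_lemmatize is a hypothetical name.

```python
# A tiny, hypothetical dictionary of base forms (WordNet plays this role in NLTK)
LEXICON = {"battle", "product", "run", "study", "learn"}

# (suffix to remove, replacement) rules, tried in order; an assumption for illustration
RULES = [("s", ""), ("ies", "y"), ("es", "e"), ("es", ""),
         ("ed", "e"), ("ed", ""), ("ing", "e"), ("ing", "")]


def toy_lemmatize(word):
    word = word.lower()
    if word in LEXICON:
        return word
    for suffix, replacement in RULES:
        if word.endswith(suffix):
            candidate = word[: -len(suffix)] + replacement
            # Unlike a stemmer, accept a candidate only if it is a dictionary word
            if candidate in LEXICON:
                return candidate
    return word  # unknown words are returned unchanged


for w in ["battling", "studies", "battled", "learning"]:
    print(w, "->", toy_lemmatize(w))
```

For "battling", the rule ("ing", "e") yields "battle", which is in the dictionary, so it is accepted; a stemmer would have stopped at "battl".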

4. PoS Tagging

PoS stands for parts of speech. PoS tagging refers to the process of labeling each word in a sentence with its part-of-speech tag. Let us consider the following Natural Language Processing with Python source code snippet:

import nltk
from nltk import word_tokenize
words = word_tokenize("I am reading Natural Language Processing fundamentals")
print(nltk.pos_tag(words))

In the preceding code snippet, we use the word_tokenize() function to extract the tokens in the sentence. Then we use the pos_tag() function, which assigns a PoS tag to each token, and get output similar to the following:

[('I', 'PRP'), ('am', 'VBP'), ('reading', 'VBG'), ('Natural', 'NNP'), ('Language', 'NNP'), ('Processing', 'NNP'), ('fundamentals', 'NNS')]

In the preceding output, you can see that a PoS tag is assigned to each token. Here, PRP represents a personal pronoun, VBP a verb in the non-third-person singular present tense, VBG a verb gerund or present participle, NNP a singular proper noun, and NNS a plural noun.
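NLTK's pos_tag() uses a trained statistical model (an averaged perceptron) that also considers the surrounding words. A much simpler baseline, sketched below, assigns each word the tag it received most often in a tagged training corpus. The training data here is a tiny hypothetical sample using the same Penn Treebank tags, and train_unigram_tagger and tag are illustrative names.

```python
from collections import Counter, defaultdict

# Hypothetical hand-made training data: (word, Penn Treebank tag) pairs
training = [("I", "PRP"), ("am", "VBP"), ("reading", "VBG"), ("a", "DT"), ("book", "NN"),
            ("I", "PRP"), ("book", "VB"), ("a", "DT"), ("flight", "NN"),
            ("the", "DT"), ("book", "NN")]


def train_unigram_tagger(tagged_words):
    counts = defaultdict(Counter)
    for word, tag_ in tagged_words:
        counts[word.lower()][tag_] += 1
    # Keep only the most frequent tag for each word
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def tag(model, tokens, default="NN"):
    # Unknown words fall back to a default tag
    return [(t, model.get(t.lower(), default)) for t in tokens]


model = train_unigram_tagger(training)
print(tag(model, ["I", "am", "reading", "a", "book"]))
print(tag(model, ["I", "book", "a", "flight"]))
```

Because this baseline ignores context, it tags “book” in “I book a flight” as NN (noun), which is exactly the kind of error a context-aware tagger is designed to avoid.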

5. Named Entity Recognition (NER)

Named entities are proper nouns, such as names of persons, organizations, and locations, that are usually not present in dictionaries. Therefore, we need to handle these entities separately. NER is one of the main tasks of Natural Language Processing: it recognizes named entities and maps them to predefined categories.

In order to implement NER by using Python NLTK, first import the necessary libraries by using the following Natural Language Processing with Python source code snippet:

import nltk
from nltk import word_tokenize
nltk.download('maxent_ne_chunker')
nltk.download('words')

Next, declare a variable named line and assign a string to it.

line = "John lives in Birmingham."

To find the named entities in the string, call the nltk.ne_chunk() function on the PoS-tagged tokens.

ner = nltk.ne_chunk(nltk.pos_tag(word_tokenize(line)), binary=True)
# Named-entity chunks are Tree objects; ordinary tokens are (word, tag) tuples
print([a for a in ner if isinstance(a, nltk.Tree)])

The following output will be generated:

[Tree('NE', [('John', 'NNP')]), Tree('NE', [('Birmingham', 'NNP')])]

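You can see that the named entities “John” and “Birmingham” are identified. Because binary=True is passed, both are labeled with the generic category NE; NNP is the PoS tag (singular proper noun) of each token. Without binary=True, ne_chunk() assigns more specific categories such as PERSON or GPE.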
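A simpler alternative to a trained chunker is gazetteer lookup: match tokens against a hand-made list of known entities. The sketch below uses a hypothetical GAZETTEER dictionary and a hypothetical gazetteer_ner function; it also shows the main weakness of this approach, which is that it can only find entities it already knows.

```python
import re

# Hypothetical gazetteer: known entity names (lowercase) mapped to categories
GAZETTEER = {"john": "PERSON", "birmingham": "GPE", "google": "ORGANIZATION"}


def gazetteer_ner(text):
    """Return (token, category) pairs for tokens found in the gazetteer."""
    tokens = re.findall(r"\w+", text)
    return [(tok, GAZETTEER[tok.lower()]) for tok in tokens if tok.lower() in GAZETTEER]


print(gazetteer_ner("John lives in Birmingham."))
print(gazetteer_ner("Mary lives in Leeds."))  # unknown entities are missed
```

A statistical chunker such as ne_chunk() learns patterns (capitalization, PoS tags, surrounding words) and can therefore recognize names that never appear in any list.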

6. Stop Words Removal

Stop words are frequently used words, such as “I”, “you”, “is”, “am”, “the”, and many more that are used to provide support for constructing sentences.

However, the presence of these words does not have much impact on understanding the meaning of the sentences in which they are present. Therefore, we can remove these stop words while analyzing the texts.

If we want to check the list of English stop words provided by NLTK, first we need to import the necessary libraries and download the stop word list.

import nltk
nltk.download('stopwords')
from nltk import word_tokenize
from nltk.corpus import stopwords

Then, we call the words() function, passing “english” as its parameter.

stop_words_list = stopwords.words('english')

In the preceding code snippet, the list of English stop words is stored in the stop_words_list variable.

Now use the print() function to display the list.
print(stop_words_list)

The following Natural Language Processing with Python source code snippet can be used to remove the stop words from a sentence:

line = "I am learning Python. It is one of the most popular programming languages for Natural Language Processing."
line_words = word_tokenize(line)
# NLTK's stop word list is lowercase, so compare each token in lowercase
stop_words_removed = ' '.join([word for word in line_words if word.lower() not in stop_words_list])
print(stop_words_removed)

Here, we first assign a string to the line variable and split it into words by using the word_tokenize() method. A list comprehension then keeps only the words that are not in the stop word list, and join() combines the remaining words into a single string.
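Stop words are worth removing because they dominate word counts. The following sketch counts word frequencies in a short sample text before and after filtering. It uses plain Python and a small hypothetical STOP_WORDS set instead of NLTK's much longer list.

```python
import re
from collections import Counter

# A small hypothetical stop list; NLTK's English list is much longer
STOP_WORDS = {"i", "am", "is", "it", "the", "of", "for", "a", "and", "to"}

text = ("I am learning Python. It is one of the most popular programming languages "
        "for Natural Language Processing. The language is easy to learn and the "
        "community is large.")

tokens = re.findall(r"\w+", text.lower())
before = Counter(tokens)
content = [t for t in tokens if t not in STOP_WORDS]
after = Counter(content)

print(before.most_common(3))
print(after.most_common(1))
```

Before filtering, the most frequent words are “is” and “the”, which say nothing about the topic; after filtering, “language” comes out on top.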

7. Text Normalization

Some words are used in different forms but represent the same thing. For example: UK and United Kingdom, US and United States, Bangalore and Bengaluru, or the two-digit year 19 and 2019. These are different words, but they carry the same meaning.

Text normalization is a process of converting the different variations of words into a standard form. For this, the replace() function can be executed, as shown in the following code snippet:

text = "I will visit the US on 22-10-19"
# Chain replace() calls to map each variant to its standard form
normalized_text = text.replace("US", "United States").replace("-19", "-2019")
print(normalized_text)

The preceding code snippet displays the following output:

I will visit the United States on 22-10-2019

Please note that stemming and lemmatization are also forms of text normalization.
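Chained replace() calls work for a single sentence, but they replace substrings anywhere, so "US" would also change inside a word such as "USB". A slightly more robust sketch, assuming a hypothetical NORMALIZATION_MAP lookup table and dd-mm-yy dates in the 2000s, matches whole words only and rewrites two-digit years with a regular expression. The normalize function is a hypothetical name.

```python
import re

# Hypothetical normalization table: variant -> standard form
NORMALIZATION_MAP = {"US": "United States", "UK": "United Kingdom", "Bangalore": "Bengaluru"}


def normalize(text, mapping=NORMALIZATION_MAP):
    # \b word boundaries stop "US" from matching inside words like "USB"
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\b")
    text = pattern.sub(lambda m: mapping[m.group(1)], text)
    # Expand two-digit years in dd-mm-yy dates (assumes the year is 20xx)
    return re.sub(r"\b(\d{2})-(\d{2})-(\d{2})\b", r"\1-\2-20\3", text)


print(normalize("I will visit the US on 22-10-19 and buy a USB cable"))
```

Keeping the variants in a table also makes it easy to add new ones without touching the code.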

8. Word Sense Disambiguation

Sometimes words with the same spelling convey different meanings depending on how they are associated with the other words in a sentence. Let us consider the following sentences:

  • I know how to play guitar
  • We play only cricket
  • Please play the next song

In the preceding three sentences, the word “play” conveys a different meaning in each context.
Therefore, we need to map each occurrence of a word to the sense it actually carries. This process is known as word sense disambiguation, and it ensures that words are treated as different entities according to their contexts.

The nltk.wsd module of Python NLTK provides the lesk() function, an implementation of the Lesk algorithm, which identifies the sense of a word by comparing its context with the WordNet definitions of its possible senses.
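The idea behind the Lesk algorithm can be sketched in a few lines: choose the sense whose dictionary definition (gloss) shares the most words with the sentence. The sense names, glosses, and stop list below are hypothetical stand-ins for WordNet data, and simplified_lesk is a hypothetical name; NLTK's lesk() performs a similar overlap comparison against real WordNet definitions.

```python
import re

# Hypothetical sense inventory for "play": sense name -> short gloss
SENSES = {
    "play.music": "perform music on a musical instrument such as a guitar or piano",
    "play.sport": "participate in a game or sport such as cricket or football",
    "play.media": "cause a recording such as a song or video to be heard or shown",
}

# Hypothetical stop list so that function words do not count as overlap
STOP = {"a", "an", "the", "on", "or", "to", "as", "in", "such", "be", "of", "i", "we", "how"}


def content_words(text):
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP}


def simplified_lesk(context, senses=SENSES):
    context_words = content_words(context)
    # Pick the sense whose gloss shares the most content words with the context
    return max(senses, key=lambda s: len(content_words(senses[s]) & context_words))


for sentence in ["I know how to play guitar", "We play only cricket", "Please play the next song"]:
    print(sentence, "->", simplified_lesk(sentence))
```

Each sentence overlaps with exactly one gloss (“guitar”, “cricket”, or “song”), so each occurrence of “play” is mapped to a different sense. With real dictionaries, overlaps are often small or tied, which is a known weakness of the Lesk approach.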

Closing Thoughts

Through this Natural Language Processing with Python tutorial, you have learned that Natural Language Processing uses machine learning techniques to make computers understand human language.

You have also been introduced to the various preprocessing stages of Natural Language Processing with Python, and you have seen that Natural Language Processing with Python is not as complicated as it may seem.

Python also provides flexibility in terms of career opportunities. One can start as a developer or programmer and then later turn to the role of a data scientist. You can also become a qualified teacher in Python by taking a Python Programming Course.
