
A Complete Guide To Data Structures And Algorithms In Python


Introduction

Programming is an intrinsically difficult activity. Just as "there is no royal road to geometry," there is no royal road to programming. Though each programming language is different (not as different as their designers would have us believe), there are some dimensions along which they can be related and differentiated: low-level versus high-level, general-purpose versus specific task domain, interpreted versus compiled, and so on. Python is a general-purpose programming language that is relatively simple and easy to learn. In addition, it provides the kind of run-time feedback that is helpful to novice programmers.

Just as in any other programming language, data structures and algorithms in Python are its basic building blocks. Let us understand the basic data structures first, then look at what abstract data types are, before ending with algorithms and why they are important.

Note: if you really want to learn from this post, reading it will not be enough. At the very least, try running some of the snippets in the post. And if you are completely new to the language, make sure you know how to install Python first.

Basic Data Structures

Data structures and algorithms in Python are its building blocks, and in fact this goes for any programming language, much as grammar does for a spoken language. I urge you to follow the comments in the snippets and you will understand each example easily. If, even after reading the comments, you don't understand something, let us know in the comments section. We would love to answer your queries. Let's dive into programming data structures and algorithms in Python.

List

A list in Python is similar to an 'array' in other languages. Lists are mutable, which means you can edit the contents of a list by applying methods to it. In addition, Python lists are heterogeneous: you can add any kind of value to a list, whether a string, a number, a boolean, or even another list.

listA = [1,2,3,4,5]                 # Create a list using square brackets []

listA.extend([6,7])                 # extend() adds each element of the iterable to the end of the list
print(listA)

listA.append([8,9])                 # append() adds its argument as a single element, so this appends the list [8, 9]
print(listA)

listB = [10,11,12]                  # Create another list

listoflist = [listA, listB]         # Nest the two lists inside a new list
print(listoflist)                   # Print the list of lists

Tuples

# Indexing starts from 0
# Tuples are the same as lists except that they are immutable, which means you cannot change the contents once the tuple is created
# Use () instead of []

tupleA = (2, 4, 6, 7)

# Print the length of the tuple
print(len(tupleA))

# Print the element at index 2
print(tupleA[2])

# Just like lists, tuples can be merged (concatenated) with +
a = (1, 2, 3)
b = (4, 5, 6)
merged = a + b
print(merged)

Dictionary

The dictionary needs a little introduction. A dictionary in Python is essentially a table that stores values under unique keys. Whenever you need a piece of data, you look it up by the key it was assigned. In the example below, the keys are 'ship1', 'ship2', and so on, and the values are the names of the captains.

#create the dictionary
captains = {}
 
#add values into it
captains["ship1"] = "Kirk"
captains["ship2"] = "Marcus"
captains["ship3"] = "Phillips"
captains["ship4"] = "Mike"
 
#fetch the data
#two ways to fetch a value from a dictionary:
#indexing raises a KeyError for a missing key, while .get() returns None instead
print(captains["ship3"])
print(captains.get("ship4"))

Abstract Data Types

Those we saw above were built-in, pre-defined data types. So now you must be wondering, what is an abstract data type? An Abstract Data Type (or ADT) is a programmer-defined data type that specifies a set of data values and a collection of well-defined operations that can be performed on those values.

Abstract data types are defined independent of their implementation, allowing us to focus on the use of the new data type instead of how it’s implemented. This separation is typically enforced by requiring interaction with the abstract data type through an interface or defined set of operations. This is known as information hiding. By hiding the implementation details and requiring ADTs to be accessed through an interface, we can work with an abstraction and focus on what functionality the ADT provides instead of how that functionality is implemented.

Abstract data types can be viewed as black boxes. User programs interact with instances of the ADT by invoking one of the several operations defined by its interface. The set of operations can be grouped into four categories:

Constructors: Create and initialize new instances of the ADT.

Accessors: Return data contained in an instance without modifying it.

Mutators: Modify the contents of an ADT instance.

Iterators: Process individual data components sequentially.
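
To make this concrete, here is a minimal sketch of a stack as an ADT (the class and method names below are our own illustration, not from any particular library). __init__ is the constructor, is_empty and peek are accessors, push and pop are mutators, and __iter__ provides the iterator; user code interacts only through these operations and never touches the underlying list directly.

class Stack:
    """A simple last-in, first-out (LIFO) stack ADT."""

    def __init__(self):                 # Constructor: create and initialize a new instance
        self._items = []                # Implementation detail hidden from the user

    def is_empty(self):                 # Accessor: report state without modifying it
        return len(self._items) == 0

    def peek(self):                     # Accessor: return the top item without removing it
        return self._items[-1]

    def push(self, item):               # Mutator: add an item to the top of the stack
        self._items.append(item)

    def pop(self):                      # Mutator: remove and return the top item
        return self._items.pop()

    def __iter__(self):                 # Iterator: process the stored items one by one
        return iter(self._items)

s = Stack()
s.push(1)
s.push(2)
print(s.peek())      # 2
print(s.pop())       # 2
print(s.is_empty())  # False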

To know more about ADTs, you can refer to Data Structures and Algorithms Using Python by Rance Necaise; everything explained there is worth reading.

Algorithms: Why Are They So Important?

The most important thing to think about when designing and implementing a program is that it should produce results that can be relied upon. We want our bank balances to be calculated correctly. We want the fuel injectors in our automobiles to inject appropriate amounts of fuel. We would prefer that neither airplanes nor operating systems crash. That is why algorithms and data structures in Python are important.

Thinking about Computational Complexity

How should one go about answering the question "How long will a given function take to run?" We could run the program on some input and time it. But that would not be particularly informative, because the result depends on, first, the speed of the computer on which it is run, second, the efficiency of the Python implementation on that machine, and finally, the value of the input.

For simplicity, and to get around the first two issues, we use a more abstract measure of time. Instead of measuring time in milliseconds, we measure it in terms of the number of basic steps the program executes. As for the value of the input: the actual running time of an algorithm depends not only on the size of the input but also on its value. Let us take an example to understand this better. Consider, for example, the linear search algorithm implemented by

def linearSearch(L, x):
    for e in L:
        if e == x:
            return True
    return False
  • Suppose that L is a list containing a million elements, and consider the call linearSearch(L, 3).
  • If the first element in L is 3, linearSearch will return True almost immediately.
  • On the other hand, if 3 is not in L, linearSearch will have to examine all one million elements before returning False.

So, in general, there are three broad cases to think about:

The best-case running time is the running time of the algorithm when the inputs are as favourable as possible. Similarly, the worst-case running time is the maximum running time over all possible inputs of a given size. And finally, by analogy with the definitions of the best-case and worst-case running times, the average-case (also called expected-case) running time is the average running time over all possible inputs of a given size.
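
To see the best and worst cases side by side, here is a small sketch (the instrumented helper linearSearchSteps is our own addition) that counts how many elements a linear search has to examine for two different inputs.

def linearSearchSteps(L, x):
    """Linear search that also reports how many elements it examined."""
    steps = 0
    for e in L:
        steps += 1
        if e == x:
            return True, steps
    return False, steps

L = list(range(1000000))          # a list containing a million elements

print(linearSearchSteps(L, 0))    # best case: (True, 1), the very first element matches
print(linearSearchSteps(L, -1))   # worst case: (False, 1000000), every element is examined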

That was a walk in the park, the easy stuff. But sit tight, because what we are looking at now is also a walk in a park, only this time the park is Jurassic Park.


Asymptotic Notation

Analysis of the relation between the running time of an algorithm and the size of its inputs leads us to use the following rules of thumb in describing the asymptotic complexity of an algorithm.

If the running time is the sum of multiple terms, keep the one with the largest growth rate, and drop the others.

If the remaining term is a product, drop any constants.
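
For example, if an algorithm takes 3n^2 + 100n + 7 basic steps on an input of size n, the 3n^2 term has the largest growth rate, so we drop 100n + 7; then we drop the constant factor 3 and say the algorithm runs in O(n^2).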

The most commonly used asymptotic notation is called "Big O" notation, which is used to give an upper bound on the asymptotic growth of a function. We, like many computer scientists, will often abuse Big O notation by making statements like, "the complexity of f(x) is O(x^2)." By this we mean that in the worst case f will take O(x^2) steps to run.

However, the difference between a function being "in O(x^2)" and "being O(x^2)" is subtle but important. Saying that f(x) is in O(x^2) leaves open the possibility that f(x) is also in, say, O(x). When we say that f(x) is O(x^2), we are implying that x^2 is both an upper bound and a lower bound on the asymptotic worst-case running time. This is often called a tight bound.

Some Important Complexity Classes

Some of the most common instances of Big O are listed below. In each case, n is a measure of the size of the inputs to the function.

O(1) denotes constant running time

O(log n) denotes logarithmic running time

O(n) denotes linear running time

O(n log n) denotes log-linear running time

O(n^k) denotes polynomial running time, where k is a constant.

O(c^n) denotes exponential running time, where c is a constant.
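
As a rough illustration of what a few of these classes look like in code, here is a sketch with three toy functions (the names are ours); the comment next to each notes how the number of basic steps grows with the size n of the input list.

def constant(L):          # O(1): one step regardless of the size of L
    return L[0]

def linear(L):            # O(n): one pass over the n elements of L
    total = 0
    for x in L:
        total += x
    return total

def quadratic(L):         # O(n^2): a pass over L for every element of L
    pairs = 0
    for x in L:
        for y in L:
            pairs += 1
    return pairs

L = [1, 2, 3, 4]
print(constant(L), linear(L), quadratic(L))   # 1 10 16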

Big-O complexity chart (source: bigocheatsheet.com)

Complexity of operations on basic data structures (source: bigocheatsheet.com)

Complexity of sorting algorithms (source: bigocheatsheet.com)

A Simple Algorithm for Illustration

Data structures and algorithms with object-oriented design patterns in Python are very important. Let's take a look at a simple search algorithm to give you a better understanding of how this works. A search algorithm is a method for finding an item or group of items with specific properties within a collection of items. Here we use binary search, which searches a sorted array by repeatedly dividing the search interval in half, discarding half of the remaining elements after each comparison. The procedure is simple:

  • Compare x with the middle element.
  • If x matches the middle element, return the mid index.
  • Else, if x is greater than the middle element, then x can only lie in the right half of the subarray after the mid element, so recur on the right half.
  • Else (x is smaller), recur on the left half.

Below is a Python implementation of recursive binary search.

# Python program for recursive binary search.

# Returns the index of x in arr if present, else -1
def binarySearch(arr, l, r, x):

    # Check the base case
    if r >= l:

        mid = l + (r - l) // 2          # Integer division gives the middle index

        # If the element is present at the middle itself
        if arr[mid] == x:
            return mid

        # If the element is smaller than the middle element, then it
        # can only be present in the left subarray
        elif arr[mid] > x:
            return binarySearch(arr, l, mid - 1, x)

        # Else the element can only be present
        # in the right subarray
        else:
            return binarySearch(arr, mid + 1, r, x)

    else:
        # Element is not present in the array
        return -1

# Test array
arr = [2, 3, 4, 10, 40]
x = 10

# Function call
result = binarySearch(arr, 0, len(arr) - 1, x)
if result != -1:
    print("Element is present at index %d" % result)
else:
    print("Element is not present in array")

Conclusion

Efficient algorithms are hard to invent. A successful professional computer scientist might invent one algorithm during their whole career, if they are lucky. Most of us never try to invent a novel algorithm. Instead, what we do is learn how to reduce the most complex aspects of the problems we face to previously solved problems. I hope you at least try to invent a new algorithm.

Guest Blogger (Data Science) at Digital Vidya. A data enthusiast who loves reading and diving deeper into the Machine Learning and Data Science arts. Always eager to learn about new research and new ways to solve problems using ML and AI.
