
A Complete Guide To Data Structures And Algorithms In Python


Introduction

Programming is an intrinsically difficult activity. Just as "there is no royal road to geometry," there is no royal road to programming. Though each programming language is different (not as different as their designers would have us believe), there are some dimensions along which they can be related and differentiated: low-level versus high-level, general-purpose versus specific task domain, interpreted versus compiled, and so on. Python is a general-purpose programming language that is relatively simple and easy to learn. In addition, it provides the kind of run-time feedback that is helpful to novice programmers.

Just as in any other programming language, data structures and algorithms in Python are its basic building blocks. Let us understand the basic data structures first, then look at what abstract data types are, before ending with algorithms and why they are important.

Note: if you really want to learn from this post, reading it will not be enough. At the very least, try running some of the snippets in the post. And if you are completely new to the language, make sure you know how to install Python first.

Basic Data Structures

Data structures and algorithms in Python are its building blocks, and in fact this goes for any programming language, much as grammar does for a spoken language. I urge you to follow the comments in the snippets and you will understand each example easily. If, even after reading the comments, you don't understand something, let us know in the comments section. We would love to answer your queries. Let's dive into programming data structures and algorithms in Python.

List

A list in Python is similar to an 'array' in other languages. Lists are mutable, which means you can edit the contents of a list by applying methods to it. In addition, Python lists are heterogeneous: you can add any kind of value to a list, whether a string, a number, a boolean, or even another list.

listA = [1,2,3,4,5]                 # Create a list using square brackets []

listA.extend([6,7])                 # extend() adds each element of the iterable to the end of the list
print(listA)

listA.append([8,9])                 # append() adds its argument as a single element, so this appends the list [8, 9]
print(listA)

listB = [10,11,12]                  # Create another list

listoflist = [listA, listB]         # Nest the two lists inside a new list
print(listoflist)                   # Print the list of lists

Tuples

# Indexing starts from 0
# Tuples are the same as lists except that they are immutable, which means you cannot change the contents once the tuple is created
# Use () instead of []

tupleA = (2, 4, 6, 7)

# Print the length of the tuple
print(len(tupleA))

# Print the element at index 2
print(tupleA[2])

# Just like lists, tuples can be merged (concatenated) with +
a = (1, 2, 3)
b = (4, 5, 6)
merged = a + b
print(merged)

Dictionary

The dictionary needs a little introduction. A dictionary in Python is essentially a table that stores values under unique keys. Whenever you need a piece of data, you look it up by the key it was assigned. In the example below, the keys are 'ship1', 'ship2', and so on, and the values are the names of the captains.

#create the dictionary
captains = {}
 
#add values into it
captains["ship1"] = "Kirk"
captains["ship2"] = "Marcus"
captains["ship3"] = "Phillips"
captains["ship4"] = "Mike"
 
#fetch the data
#two ways to fetch a value from a dictionary:
#indexing raises a KeyError for a missing key, while .get() returns None instead
print(captains["ship3"])
print(captains.get("ship4"))

Abstract Data Types

Those we saw above were built-in, pre-defined data types. So now you must be wondering, what is an abstract data type? An Abstract Data Type (or ADT) is a programmer-defined data type that specifies a set of data values and a collection of well-defined operations that can be performed on those values.

Abstract data types are defined independent of their implementation, allowing us to focus on the use of the new data type instead of how it’s implemented. This separation is typically enforced by requiring interaction with the abstract data type through an interface or defined set of operations. This is known as information hiding. By hiding the implementation details and requiring ADTs to be accessed through an interface, we can work with an abstraction and focus on what functionality the ADT provides instead of how that functionality is implemented.

Abstract data types can be viewed as black boxes. User programs interact with instances of the ADT by invoking one of the several operations defined by its interface. The set of operations can be grouped into four categories:

Constructors: Create and initialize new instances of the ADT.

Accessors: Return data contained in an instance without modifying it.

Mutators: Modify the contents of an ADT instance.

Iterators: Process individual data components sequentially.
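
To make this concrete, here is a minimal sketch of a stack as an ADT (the class and method names below are our own illustration, not from any particular library). __init__ is the constructor, is_empty and peek are accessors, push and pop are mutators, and __iter__ provides the iterator; user code interacts only through these operations and never touches the underlying list directly.

class Stack:
    """A simple last-in, first-out (LIFO) stack ADT."""

    def __init__(self):                 # Constructor: create and initialize a new instance
        self._items = []                # Implementation detail hidden from the user

    def is_empty(self):                 # Accessor: report state without modifying it
        return len(self._items) == 0

    def peek(self):                     # Accessor: return the top item without removing it
        return self._items[-1]

    def push(self, item):               # Mutator: add an item to the top of the stack
        self._items.append(item)

    def pop(self):                      # Mutator: remove and return the top item
        return self._items.pop()

    def __iter__(self):                 # Iterator: process the stored items one by one
        return iter(self._items)

s = Stack()
s.push(1)
s.push(2)
print(s.peek())      # 2
print(s.pop())       # 2
print(s.is_empty())  # False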

To know more about ADTs, you can refer to Data Structures and Algorithms Using Python by Rance Necaise; everything explained there is worth reading.

Algorithms: Why Are They So Important?

The most important thing to think about when designing and implementing a program is that it should produce results that can be relied upon. We want our bank balances to be calculated correctly. We want the fuel injectors in our automobiles to inject appropriate amounts of fuel. We would prefer that neither airplanes nor operating systems crash. That is why algorithms and data structures in Python are important.

Thinking about Computational Complexity

How should one go about answering the question "How long will a given function take to run?" We could run the program on some input and time it. But that would not be particularly informative, because the result depends on, first, the speed of the computer on which it is run, second, the efficiency of the Python implementation on that machine, and finally, the value of the input.

For simplicity, and to get around the first two issues, we use a more abstract measure of time. Instead of measuring time in milliseconds, we measure it in terms of the number of basic steps the program executes. As for the value of the input: the actual running time of an algorithm depends not only on the size of the input but also on its value. Let us take an example to understand this better. Consider, for example, the linear search algorithm implemented by

def linearSearch(L, x):
    for e in L:
        if e == x:
            return True
    return False
  • Suppose that L is a list containing a million elements, and consider the call linearSearch(L, 3).
  • If the first element in L is 3, linearSearch will return True almost immediately.
  • On the other hand, if 3 is not in L, linearSearch will have to examine all one million elements before returning False.

So, in general, there are three broad cases to think about:

The best-case running time is the running time of the algorithm when the inputs are as favourable as possible. Similarly, the worst-case running time is the maximum running time over all possible inputs of a given size. And finally, by analogy with the definitions of the best-case and worst-case running times, the average-case (also called expected-case) running time is the average running time over all possible inputs of a given size.
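
To see the best and worst cases side by side, here is a small sketch (the instrumented helper linearSearchSteps is our own addition) that counts how many elements a linear search has to examine for two different inputs.

def linearSearchSteps(L, x):
    """Linear search that also reports how many elements it examined."""
    steps = 0
    for e in L:
        steps += 1
        if e == x:
            return True, steps
    return False, steps

L = list(range(1000000))          # a list containing a million elements

print(linearSearchSteps(L, 0))    # best case: (True, 1), the very first element matches
print(linearSearchSteps(L, -1))   # worst case: (False, 1000000), every element is examined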

That was a walk in the park, the easy stuff. But sit tight, because what we are looking at now is also a walk in a park, only this time the park is Jurassic Park.


Asymptotic Notation

Analysis of the relation between the running time of an algorithm and the size of its inputs leads us to use the following rules of thumb in describing the asymptotic complexity of an algorithm.

If the running time is the sum of multiple terms, keep the one with the largest growth rate, and drop the others.

If the remaining term is a product, drop any constants.
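
For example, if an algorithm takes 3n^2 + 100n + 7 basic steps on an input of size n, the 3n^2 term has the largest growth rate, so we drop 100n + 7; then we drop the constant factor 3 and say the algorithm runs in O(n^2).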

The most commonly used asymptotic notation is called "Big O" notation, which is used to give an upper bound on the asymptotic growth of a function. We, like many computer scientists, will often abuse Big O notation by making statements like, "the complexity of f(x) is O(x^2)." By this we mean that in the worst case f will take O(x^2) steps to run.

However, the difference between a function being "in O(x^2)" and "being O(x^2)" is subtle but important. Saying that f(x) is in O(x^2) leaves open the possibility that f(x) is also in, say, O(x). When we say that f(x) is O(x^2), we are implying that x^2 is both an upper bound and a lower bound on the asymptotic worst-case running time. This is often called a tight bound.

Some Important Complexity Classes

Some of the most common instances of Big O are listed below. In each case, n is a measure of the size of the inputs to the function.

O(1) denotes constant running time

O(log n) denotes logarithmic running time

O(n) denotes linear running time

O(n log n) denotes log-linear running time

O(n^k) denotes polynomial running time, where k is a constant.

O(c^n) denotes exponential running time, where c is a constant.
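
As a rough illustration of what a few of these classes look like in code, here is a sketch with three toy functions (the names are ours); the comment next to each notes how the number of basic steps grows with the size n of the input list.

def constant(L):          # O(1): one step regardless of the size of L
    return L[0]

def linear(L):            # O(n): one pass over the n elements of L
    total = 0
    for x in L:
        total += x
    return total

def quadratic(L):         # O(n^2): a pass over L for every element of L
    pairs = 0
    for x in L:
        for y in L:
            pairs += 1
    return pairs

L = [1, 2, 3, 4]
print(constant(L), linear(L), quadratic(L))   # 1 10 16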

Big-O complexity chart (source: bigocheatsheet.com)

Complexity of operations on basic data structures (source: bigocheatsheet.com)

Complexity of sorting algorithms (source: bigocheatsheet.com)

A Simple Algorithm for Illustration

Data structures and algorithms with object-oriented design patterns in Python are very important. Let's take a look at a simple search algorithm to give you a better understanding of how this works. A search algorithm is a method for finding an item or group of items with specific properties within a collection of items. Here we use binary search, which searches a sorted array by repeatedly dividing the search interval in half, discarding half of the remaining elements after each comparison. The procedure is simple:

  • Compare x with the middle element.
  • If x matches the middle element, return the mid index.
  • Else, if x is greater than the middle element, then x can only lie in the right half of the subarray after the mid element, so recur on the right half.
  • Else (x is smaller), recur on the left half.

Below is a Python implementation of recursive binary search.

# Python program for recursive binary search.

# Returns the index of x in arr if present, else -1
def binarySearch(arr, l, r, x):

    # Check the base case
    if r >= l:

        mid = l + (r - l) // 2          # Integer division gives the middle index

        # If the element is present at the middle itself
        if arr[mid] == x:
            return mid

        # If the element is smaller than the middle element, then it
        # can only be present in the left subarray
        elif arr[mid] > x:
            return binarySearch(arr, l, mid - 1, x)

        # Else the element can only be present
        # in the right subarray
        else:
            return binarySearch(arr, mid + 1, r, x)

    else:
        # Element is not present in the array
        return -1

# Test array
arr = [2, 3, 4, 10, 40]
x = 10

# Function call
result = binarySearch(arr, 0, len(arr) - 1, x)
if result != -1:
    print("Element is present at index %d" % result)
else:
    print("Element is not present in array")

Conclusion

Efficient algorithms are hard to invent. A successful professional computer scientist might invent one algorithm during their whole career, if they are lucky. Most of us never try to invent a novel algorithm. Instead, what we do is learn how to reduce the most complex aspects of the problems we face to previously solved problems. I hope you at least try to invent a new algorithm.

Guest Blogger (Data Science) at Digital Vidya. A data enthusiast who loves reading and diving deeper into the Machine Learning and Data Science arts. Always eager to learn about new research and new ways to solve problems using ML and AI.
