Getting Started With Big Data

by | Jan 12, 2018 | Big Data



We live in an era where a substantial amount of data is generated every moment. Don't know something? Google it. Going somewhere new? Maps. Want to make new friends? Social networking. Want to cook delicious food for your foodie friends? Videos. And how do Machine Learning and Artificial Intelligence work? Data and logic. Practically every platform, and arguably every single thing on the planet, generates data. As of 2017, a whopping 2,500,000,000,000,000,000 (2.5 quintillion) bytes of data were being generated every day. That's right, every day.

Big data is not about the data – Gary King

Definition of Big Data

Just like opinions in everyday life, its definition varies from person to person, but the best one I have found written in simple language is:

Big Data describes the collection of large and complex data sets that are difficult to capture, process, store, search, and analyze using conventional database systems, and that can offer more qualitative insights into our everyday life.

However, if you are not much of a reader and more of an 'Oh crap, four lines just for a definition?' type of person (believe me, I belong to the same category), just take a look at the image below and you'll probably get the idea.

Figure: General idea about Big Data

Big Data in Action

By now, you're probably wondering how this data is managed and, even before that, where it is used, right? Let's look at some of the applications where big data can really help.

Healthcare

  1. It might just cure cancer: Before the end of his second term, President Obama launched the Cancer Moonshot program, which aimed to achieve 10 years of progress towards curing cancer in half that time. It sounds a little infeasible, but this is exactly the kind of goal big data is meant to help with.
  2. There are over 100,000 health apps for smartphones that let us track our own health stats. Just imagine the number of users and the amount of data they generate.
  3. Google launched 'Google Flu Trends' in 2008, claiming at the time that, with the help of its massive data, it could serve humanity faster, more accurately, and more easily. Even though it ultimately failed, it showed what big data might make possible.

To err is human, and we can't outsmart nature, but anything that promises better living for humans is always worth a shot. At least, I believe so.

Economic Development

  1. The United Nations started an initiative called Global Pulse to leverage big data for global development. One technique it uses is sentiment analysis (determining whether a piece of text expresses a positive or negative attitude) of social network messages, to predict job losses, spending reductions, or disease outbreaks in a given region.
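To make the idea of sentiment analysis concrete, here is a minimal lexicon-based sketch. The word lists, messages, and function names are hypothetical and not Global Pulse's actual method; production systems typically use far larger lexicons or trained models.

```python
import re

# Hypothetical, tiny word lists for illustration only.
POSITIVE = {"good", "great", "hired", "happy", "growth", "better"}
NEGATIVE = {"bad", "laid", "fired", "sick", "fever", "worse", "unemployed"}


def sentiment_score(text: str) -> int:
    """Return (#positive words - #negative words) found in a message."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def label(text: str) -> str:
    """Map the numeric score to a positive/negative/neutral label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


messages = [
    "Just got hired, feeling happy!",
    "Got laid off today, things are getting worse",
    "Heading to the market",
]

for m in messages:
    print(label(m), "|", m)
```

Aggregating these labels per region and per day (for example, the share of negative messages) is one way such signals could be turned into an early-warning indicator.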

Understanding and Predicting Crime

  1. The Los Angeles Police Department (LAPD) collected data on more than 130 million crimes from the past 80 years and continues to update its software as new crimes are committed. The results: a 33% reduction in burglaries, a 21% reduction in violent crime, and a 12% reduction in property crime. The software is now being used in cities across the UK, USA, Canada, Australia, France, Italy, and China.

Waste Management

  1. Sensors in waste containers detect how full they are, and combining these readings with historical data and usage trends makes it possible to deliver efficient, effective, and environmentally responsible waste collection and recycling services.
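A minimal sketch of how fill-level readings and historical trends might drive collection decisions, assuming one sensor reading per bin per day. The bin IDs, readings, 80% threshold, and two-day horizon are hypothetical, and the linear fill-rate estimate is a deliberately simple stand-in for whatever forecasting a real service uses.

```python
# Hypothetical daily fill-level readings (percent) per bin.
readings = {
    "bin-A": [10, 30, 50, 70],  # fills ~20 points/day
    "bin-B": [40, 45, 50, 55],  # fills ~5 points/day
    "bin-C": [60, 85, 5, 30],   # emptied between day 2 and day 3
    "bin-D": [50, 70, 90],      # already above the threshold
}


def daily_fill_rate(history: list[float]) -> float:
    """Average day-over-day increase, skipping drops caused by emptying."""
    increases = [b - a for a, b in zip(history, history[1:]) if b >= a]
    return sum(increases) / len(increases) if increases else 0.0


def days_until_full(history: list[float], capacity: float = 100.0) -> float:
    """Linear estimate of days left before the bin reaches capacity."""
    rate = daily_fill_rate(history)
    if rate <= 0:
        return float("inf")
    return (capacity - history[-1]) / rate


def bins_to_collect(data, horizon_days=2.0, threshold=80.0):
    """Bins that are already nearly full or predicted to fill within the horizon."""
    return sorted(
        bin_id
        for bin_id, history in data.items()
        if history[-1] >= threshold or days_until_full(history) <= horizon_days
    )


print(bins_to_collect(readings))
```

Collection routes then only need to visit the flagged bins instead of every container on a fixed schedule.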

Urban Transport

  1. Self-driving cars. Top tech giants are racing to build them, and how do they do it? Sensors, image processing, and programming. From acquiring data from the sensors to delivering the end result, these systems deal with an enormous amount of data.
  2. Moreover, real-time data capture for traffic analysis and route optimization for finding better and shorter routes are further applications that deal with an abundant amount of data.
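Route optimization usually comes down to a shortest-path problem over a road network. The sketch below uses Dijkstra's algorithm on a tiny hypothetical graph whose edge weights are travel times in minutes; in a real-time system those weights would be refreshed from live traffic data, which is assumed here rather than shown.

```python
import heapq

# Hypothetical one-way road segments with travel times in minutes.
roads = {
    "Home": {"A": 5, "B": 10},
    "A": {"B": 3, "Office": 15},
    "B": {"Office": 4},
}


def shortest_route(graph, start, goal):
    """Dijkstra's algorithm: return (total_cost, path), or (inf, []) if unreachable."""
    queue = [(0.0, start, [start])]
    best = {start: 0.0}
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry; a cheaper route was already found
        for neighbor, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor, path + [neighbor]))
    return float("inf"), []


print(shortest_route(roads, "Home", "Office"))
```

Here the direct-looking route via B alone takes 14 minutes, while going through A and then B takes 12, which is the kind of non-obvious choice route optimization is meant to surface.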

Retail

  1. Tailoring the products of individual stores to their customers' preferences, based on past data, is one of the most important factors in becoming a successful retailer.
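As a minimal illustration of tailoring products per store from past data, the sketch below counts each store's historical sales and returns its best sellers. The store names and transactions are hypothetical; real retailers combine far more signals than raw sales counts.

```python
from collections import Counter, defaultdict

# Hypothetical transaction log of (store, product) pairs.
sales = [
    ("store_north", "tea"), ("store_north", "tea"), ("store_north", "biscuits"),
    ("store_south", "coffee"), ("store_south", "coffee"),
    ("store_south", "tea"), ("store_south", "coffee"),
]


def top_products_per_store(records, n=2):
    """Return each store's n best-selling products, most sold first."""
    counts = defaultdict(Counter)
    for store, product in records:
        counts[store][product] += 1
    return {store: [p for p, _ in c.most_common(n)] for store, c in counts.items()}


print(top_products_per_store(sales))
```

Each store could then stock more of its own best sellers instead of a single chain-wide assortment.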

Interesting, isn't it? Almost every business will need a data analytics team in the near future, that's for sure.


Job roles for Big Data

Big data is a broad field with plenty of job opportunities. Job roles you can aim for include:

  1. Business Analyst
  2. Data Analyst
  3. Statistician
  4. Data Scientist
  5. Data Engineer/Architect
  6. Machine Learning Engineer

Now that you know the what, why, and where, let's talk a little about the 'how': how to get started, the basic skills you need, blogs you should follow, online MOOCs (Massive Open Online Courses) to take, and Data Scientists to follow today. In short, your very first step towards an advanced Data Science career. In the upcoming sections, I cover the Data Science field in general rather than big data specifically, because the core skills largely overlap. All you need to do is write down exactly what you want to go for and work towards it.

Basic skills you need

  1. Statistics: Since statistics is the science of collecting, analyzing, and drawing inferences from data, you'll need it the most. Start with the basics.
  2. Hacker's approach: As I mentioned above, big data is not about the data; it's the art of generating insights from that data. You must develop a problem-solving mindset, or what some call the hacker's approach. It will help you analyze and solve problems more easily. Try solving puzzles like the Rubik's Cube; it helps.
  3. Programming language: You need a programming language to implement what you've learned in theory. From my personal experience, I'd suggest Python. Once you have the basics down, turn to the official Python documentation rather than wandering around for help.
  4. Open-source frameworks: Get familiar with the Hadoop ecosystem, which includes the Hadoop Distributed File System (HDFS), Yet Another Resource Negotiator (YARN), etc. Even an overview will give you a push start.
  5. Optional: Data science is a vast field, and even a basic knowledge of linear algebra will give you an edge (for machine learning and AI).
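Hadoop's classic processing model, MapReduce, is easiest to grasp through the canonical word-count example. The sketch below simulates the map, shuffle, and reduce phases in a single Python process; on a real cluster these steps run in parallel across machines over data stored in HDFS, with YARN allocating the resources. The function names are illustrative and are not Hadoop's API.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word, 1) for word in line.lower().split()]


def shuffle_phase(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


print(word_count(["big data is big", "data is everywhere"]))
```

Because each mapper only sees its own lines and each reducer only sees one key's values, both phases can be spread across many machines, which is what lets the same pattern scale to very large data sets.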

Blogs to follow for regular updates in the field

  1. Digital Vidya – This should be your go-to place to learn about the latest trends, opportunities, news, and developments in the Analytics industry in India. And yes, the webinars Digital Vidya conducts with leading influencers and rising stars in the industry are not to be missed.
  2. KDnuggets – KDnuggets is one of the most popular data science blogs, with articles that cover Business Analytics, Statistics, and Machine Learning.
  3. Data Science Central – Data Science Central is the industry's online resource for its practitioners.
  4. Analytics Vidhya – Analytics Vidhya features articles on data science, machine learning, R programming, Python for analytics, and more.
  5. No Free Hunch (the official Kaggle blog) – Regular “how I did it” posts from machine learning competition winners, as well as more general data science tips from a practitioner's perspective.
  6. R-bloggers – More than 750 R enthusiasts and experts contribute to the blog, making it one of the most informative data science blogs on the web.
  7. Big Data Made Simple – A content resource that curates and generates content for almost 25 verticals and technologies in the big data landscape.
  8. – Launch your career in Data Science.
  9. Simply Statistics – A statistics blog by three biostatistics professors – Rafa Irizarry, Roger Peng, and Jeff Leek.
  10. PyImageSearch – This OpenCV, deep learning, and Python blog is written by Adrian Rosebrock.
  11. The Open-Source Data Science Masters – An open-source curriculum for learning data science, available on GitHub.

Courses you can take

Digital Vidya Courses for Data Science:
  1. Big Data and Hadoop Courses: Several Big Data courses are available at Digital Vidya. You can enroll in a free webinar to learn more about them. The instructors and advisors for these courses are highly qualified professionals who work or have worked at top organizations like PayPal and the Royal Bank of Scotland.
  2. Data Science Course: This course again uses Python as its base language and covers everything from the introduction to the capstone project.
  3. Data Analytics Courses: Data analysis using four different tools/languages: R, Python, SAS, and Excel/Power BI.

The only way to make sure you are ready for the world out there is to take part in competitions and solve real-world problems. Kaggle is the place to go: all the instructions are given there, so try to get familiar with the iconic 'Titanic' challenge. You can also follow today's leading Data Scientists to learn how they help make the world a better place to live.

Still not sure where to start or what to do? Let me know in the comments, and I'll be happy to answer your questions.

Happy Learning.


