Getting Started With Big Data

Introduction to Big Data

We live in an era where a substantial amount of data is being generated every moment. Don’t know something? Google it. Going to new places? Maps. Want to make new friends? Social networking. Want to cook delicious food for your foodie friends? Videos. Machine Learning and Artificial Intelligence, how do they function? Data and logic. Every single platform, or we can say almost every single thing on the planet, is able to generate data. As of 2017, a whopping 2,500,000,000,000,000,000 (2.5 quintillion) bytes of data were reportedly being generated every day. That’s right, every day. That’s where ‘Big Data’ comes in.

Big data is not about the data – Gary King

What is Big Data exactly?

Just like opinions in everyday life, the definition of Big Data varies from person to person, but the best one I have found in simple language is:

Big Data describes collections of data sets so large and complex that they are difficult to capture, process, store, search and analyze using conventional database systems, but which offer more qualitative insights into our everyday life.

However, if you are not much of a reader and more of an ‘Oh crap, 4 lines just for a definition?’ type of person (believe me, I belong to the same category), just take a look at the image below and you’ll probably get the idea.

Figure: General idea about Big Data

Big Data in Action

By now, you’ll probably be wondering how data is managed and even before that, where it is used, right? Let’s look at some of the applications where Big Data can really be helpful.

Healthcare

  1. Big data might just cure cancer: As described in the article, before the end of his second term, President Obama launched the Cancer Moonshot program, which aimed to accomplish 10 years of progress towards curing cancer in half that time. It sounds a little infeasible, but that is exactly the kind of goal Big Data is meant to help with.
  2. There are over 100,000 health applications available for smartphones to track our own health stats. You can imagine the number of users and the data generated by them.
  3. Even though it ultimately failed, Google launched ‘Google Flu Trends’ in 2008, claiming at the time that its massive search data could help track flu outbreaks faster, more accurately and more easily.

To err is human, we can’t outsmart nature, but if it is something that promises better living for humans, it is always worth a shot. At least I believe that.

Economic development

  1. The United Nations started an initiative called Global Pulse to leverage Big Data for global development. It runs sentiment analysis (deciding whether a piece of text expresses a positive or negative tone) on messages from social networks to predict job losses, spending reductions or disease outbreaks in a given region.
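
To make the idea of sentiment analysis a bit more concrete, here is a minimal sketch in Python. It is only a toy: the word lists and the sample messages are made up for illustration, and real systems (including whatever Global Pulse actually uses, which the article does not describe) rely on far richer models than counting words.

```python
# Toy lexicon-based sentiment scoring.
# POSITIVE, NEGATIVE and the sample messages are hypothetical examples,
# not data or methods from Global Pulse.
POSITIVE = {"hired", "raise", "hopeful", "great", "recovered"}
NEGATIVE = {"laid", "fired", "unemployed", "sick", "worried"}


def sentiment_score(message):
    """Return (number of positive words) - (number of negative words)."""
    words = [w.strip(".,!?").lower() for w in message.split()]
    positives = sum(w in POSITIVE for w in words)
    negatives = sum(w in NEGATIVE for w in words)
    return positives - negatives


def label(message):
    """Map the numeric score to a coarse label."""
    score = sentiment_score(message)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


messages = [
    "Just got hired, feeling hopeful!",
    "Half my team was laid off today, really worried.",
    "Going to the market.",
]
labels = [label(m) for m in messages]
print(labels)
```

Counting how the share of negative messages changes over time in a region is, very roughly, the kind of signal such an analysis would look for.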

Understanding and Predicting Crime

  1. The Los Angeles Police Department (LAPD) collected data on more than 130 million crimes from the past 80 years, and continues to update the software as crimes are committed. The reported result: a 33% reduction in burglaries, a 21% reduction in violent crime and a 12% reduction in property crime. The software is now being used in cities across the UK, USA, Canada, Australia, France, Italy and China.

Waste Management

  1. Sensors in waste containers detect the filling level. Combined with historical data and usage trends, this helps deliver efficient, effective and environmentally responsible waste collection and recycling services.
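
As a rough sketch of how fill-level readings could drive collection, the snippet below picks the containers that are due for pickup. The container ids, the readings and the 75% threshold are all assumptions made up for this example.

```python
# Hypothetical sensor readings: container id -> fill level in percent.
readings = {"bin-A": 92, "bin-B": 35, "bin-C": 78, "bin-D": 81}

FULL_THRESHOLD = 75  # assumed cut-off, in percent


def bins_to_collect(levels, threshold=FULL_THRESHOLD):
    """Return ids of containers at or above the threshold, fullest first."""
    due = [bin_id for bin_id, level in levels.items() if level >= threshold]
    return sorted(due, key=lambda bin_id: levels[bin_id], reverse=True)


pickup_order = bins_to_collect(readings)
print(pickup_order)
```

A real system would also use the historical data and usage trends mentioned above, for example to predict when a container will fill up instead of waiting for it to cross the threshold.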

Urban Transport

  1. The revolutionary self-driving car. The top tech giants are racing to build self-driving cars. How do they do it? Sensors, image processing and programming. From acquiring data from the sensors to delivering the end result, they deal with an enormous amount of data.
  2. Moreover, real-time data capture for traffic analysis and route optimization techniques for finding better and shorter routes are applications that deal with an abundant amount of data.
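
Route optimization can be illustrated with a classic shortest-path search. The sketch below uses Dijkstra's algorithm on a tiny, made-up road network (the junction names and travel times are hypothetical). Production routing engines are far more elaborate, but the core idea is similar.

```python
import heapq

# Hypothetical road network: travel time in minutes between junctions.
ROADS = {
    "home": {"a": 4, "b": 2},
    "a": {"office": 5},
    "b": {"a": 1, "office": 8},
    "office": {},
}


def shortest_time(graph, start, goal):
    """Dijkstra's algorithm: smallest total travel time from start to goal."""
    best = {start: 0}
    queue = [(0, start)]
    while queue:
        time, node = heapq.heappop(queue)
        if node == goal:
            return time
        if time > best.get(node, float("inf")):
            continue  # stale queue entry, a faster path was already found
        for neighbour, cost in graph[node].items():
            new_time = time + cost
            if new_time < best.get(neighbour, float("inf")):
                best[neighbour] = new_time
                heapq.heappush(queue, (new_time, neighbour))
    return None  # goal not reachable from start


print(shortest_time(ROADS, "home", "office"))
```

With real-time traffic data, the travel times on each road would be updated continuously and the search re-run as conditions change.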

Retail

  1. Tailoring products for individual stores according to their customers’ preferences, based on past data, is one of the most crucial factors in becoming a successful retailer.

Interesting, isn’t it? Almost every business will need a Data Analytics/Big Data team in the near future, that is for sure.


Job roles in Big Data

Big data is a broad field with plentiful job opportunities. There are several job roles you can aim for, including:

  1. Business Analyst
  2. Data Analyst
  3. Statistician
  4. Data Scientist
  5. Data Engineer/Big Data Architect
  6. Machine Learning Engineer

Now that you know the what, why and where of Big Data, let’s talk a little about the ‘how’: how to start with Big Data, the basic skills you need, blogs you should watch, online MOOCs (Massive Open Online Courses) to take and Data Scientists to follow. In short, your very first step towards a Data Science career. In the upcoming sections, I will cover the Data Science field in general rather than Big Data alone, because the required skills are the same up to a certain level. All you need to do is write down exactly what you want to go for and work towards it.

Skills:

  1. Statistics: Since it’s the science of collecting, analyzing and making inferences from data, you’ll need it the most. Start with the basics.
  2. Hacker’s approach: As I mentioned above, Big Data is not about the data, it’s the art of generating insights from that data. You must develop a ‘problem-solving mind’ or, as some say, a hacker’s approach. It will help you analyze and solve problems more easily. Try solving puzzles like the Rubik’s cube, it will help.
  3. Programming language: You need to start with a programming language in order to implement what you’ve learned theoretically. From my personal experience, I’d suggest Python. Once you have the basics figured out, go to the official Python documentation rather than wandering around for help.
  4. Open Source Framework: Get yourself familiarized with the Hadoop ecosystem, which includes the Hadoop Distributed File System (HDFS), Yet Another Resource Negotiator (YARN) etc. Just an overview will give you a push start.
  5. Optional: Since Big Data is a vast field, a little knowledge of linear algebra will give you an edge (for machine learning and AI).
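
To give a taste of how the first and third skills combine, here is a small sketch that uses Python’s built-in `statistics` module on a made-up list of daily order counts (the numbers are purely illustrative). It shows one of the first things basic statistics teaches: a single unusual value pulls the mean much more than the median.

```python
import statistics

# Hypothetical daily order counts; 48 is a deliberately unusual day.
daily_orders = [12, 15, 11, 14, 48, 13, 12]

mean = statistics.mean(daily_orders)      # average, sensitive to outliers
median = statistics.median(daily_orders)  # middle value, robust to outliers
stdev = statistics.stdev(daily_orders)    # sample standard deviation

print(f"mean={mean:.2f} median={median} stdev={stdev:.2f}")
```

Here the median stays at a typical day’s value while the mean is dragged upwards by the one busy day, which is why it pays to look at more than one summary number.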

Blogs to follow for regular updates in the field

  1. Digital Vidya – This should be your go-to place for the latest trends, opportunities, news and developments in the Big Data and Analytics industry in India. And yes, the webinars that Digital Vidya conducts with leading influencers and rising stars in the industry are not to be missed.
  2. KDnuggets – KDnuggets is one of the most popular data science blogs, with articles that cover Business Analytics, Statistics, and Machine Learning.
  3. Data Science Central – Data Science Central is the industry’s online resource for big data practitioners.
  4. Analytics Vidhya – Analytics Vidhya features articles on data science, machine learning, R programming, Python for analytics and more.
  5. No Free Hunch (the official Kaggle.com blog) – Regular “how I did it” posts from machine learning competition winners, as well as more general data science tips from a practitioner’s perspective.
  6. R-bloggers – More than 750 R enthusiasts and experts contribute to the blog, making it one of the most informative data science blogs on the web.
  7. Big Data Made Simple – Big Data Made Simple is a content resource that curates and generates content for almost 25 verticals and technologies in the big data landscape.
  8. DataSchool.io – Launch your career in Data Science.
  9. Simply Statistics – A statistics blog by three biostatistics professors – Rafa Irizarry, Roger Peng, and Jeff Leek.
  10. PyImageSearch – This OpenCV, deep learning and Python blog is written by Adrian Rosebrock.
  11. The Open-Source Data Science Masters – An open-source curriculum for learning Data Science, hosted on GitHub.

Courses you can take

Digital Vidya Courses for Data Science:
  1. Big Data and Hadoop Courses: There are several Big Data courses available at Digital Vidya. You can enrol for a free webinar to know more about the course and about Big Data. The instructors and advisors for these courses are highly qualified professionals who work or have worked at top organizations like PayPal and Royal Bank of Scotland.
  2. Python for Data Science: Built around Python, the base language recommended above, this course covers everything from the introduction to the capstone project.
  3. Data Analytics Courses: Data analysis using 4 different tools/languages: R, Python, SAS and Excel/Power BI.

The last stage: Competitions

The only way to ensure that you are ready for the world out there is to take part in competitions and solve real-world problems. Kaggle is the place to go. All the instructions are given there, so try it out and get yourself familiarized with the iconic ‘Titanic’ challenge. Moreover, you can follow today’s leading Data Scientists to learn how they help make the world a better place to live.

Still not sure where to start or what to do? Let me know in the comments, I’ll be happy to answer your questions.

Happy Learning.

Data Science aspirant
