What is Data Science?
Data Science comprises several disciplines, including Statistics, Machine Learning, Data Analysis, Computer Science, and Research. If you want to pursue a Data Science career, you must be wondering how and where to learn all these fields in order to advance, right? Don’t worry. This may seem daunting if you are completely new to the field and trying to figure out how to start a Data Science career. But not all job roles require all the skills. Different job roles and companies emphasize some skills over others, so you don’t have to learn and become an expert in everything. We will cover this later as we go through the different career paths in Data Science.
After this introduction, I will discuss four main things to make sure you are heading in the right direction. First, I will talk about the skills you need, followed by some certifications you can earn to show that you possess those skills. Then, I will talk about the kinds of job roles that exist in the industry and some of the major employers. Lastly, I will look at the growth of Data Science in India, followed by a conclusion. Ready to start? Let’s roll.
“It is possible to fly without motors,” Wilbur Wright says, “but not without knowledge and skill.” Even if you have the best tools out there for Data Analysis or Predictive Modelling, there is nothing you can do without skills and knowledge. Data Science requires hard skills such as Analysis, Machine Learning, and Statistics, supported by tools like Hadoop and programming languages like Python and R. However, you will also excel in your career if you are good at critical thinking and pattern recognition (basically an eye for detail), and if you are a great listener and problem solver. I will walk you through the general skills you need to start with and then the more in-depth skills that will help you excel. These are great to start with, but the list is not exhaustive.
Statistics basically means dealing with the collection, interpretation, organization, analysis, and presentation of data. Regardless of what anyone says, a good knowledge of statistics is essential for a Data Scientist. You should be familiar with concepts like experimental design, regression modeling, and Bayesian thinking. Statistics is vital for all types of companies, but most of all for data-driven ones, because it helps you make decisions based on the data.
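To make Bayesian thinking a little more concrete, here is a minimal sketch of a single Bayes' theorem update. The fraud-detection scenario and all the numbers in it are hypothetical, chosen only for illustration.

```python
def bayes_posterior(prior, true_positive_rate, false_positive_rate):
    """Return P(hypothesis | positive signal) using Bayes' theorem."""
    # Total probability of seeing a positive signal at all.
    evidence = true_positive_rate * prior + false_positive_rate * (1 - prior)
    return true_positive_rate * prior / evidence


# Hypothetical numbers: 1% of transactions are fraudulent, and the detector
# flags 95% of fraudulent and 5% of legitimate transactions.
posterior = bayes_posterior(prior=0.01, true_positive_rate=0.95, false_positive_rate=0.05)
print(f"P(fraud | flagged) = {posterior:.3f}")  # prints "P(fraud | flagged) = 0.161"
```

Even with a fairly accurate detector, most flagged transactions here are still legitimate because fraud is rare to begin with. That kind of intuition is what statistics gives you before you make a decision from the data.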
I would like to quote Steve Jobs here: “Everyone should know how to program a computer because it teaches you how to think!” I totally agree with that. No matter what job role or type of company you are interviewing for, you will most likely be expected to know how to code. Beyond the logic, you should also pay attention to the structure of the code, the way you name and declare variables, the relationships between data structures, and all those little things that help someone else understand your code in one go. You can choose the programming language you prefer. However, if you don’t have a preference, a language widely used for statistics such as Python or R, together with a database querying language like SQL, is a good place to start.
If you have been digging around this topic for a while, chances are you have seen linear algebra mentioned everywhere. It is not mandatory to learn linear algebra before you start with machine learning, but at some point you may wish to dive deeper. The question remains: why? Why would a Data Scientist need to understand this when there are plenty of out-of-the-box implementations in Python or R? The answer is relatively simple. At a certain point, it becomes crucial for your Data Science team to build its own implementations in-house, and that is where linear algebra comes in handy. It also lends itself well to MapReduce implementations.
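As a small illustration of what an in-house implementation can look like, here is a minimal sketch that fits a straight line by solving the least-squares normal equations by hand, using only plain Python. The function name `fit_line` and the sample data are hypothetical; a real team would usually reach for a tested library first.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x.

    Solves the 2x2 normal equations (X^T X) b = X^T y, where each row of X
    is [1, x]. Hypothetical helper written for illustration only.
    """
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))

    # X^T X = [[n, sx], [sx, sxx]] and X^T y = [sy, sxy].
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("x values must not all be identical")

    # Multiply X^T y by the explicit inverse of the 2x2 matrix X^T X.
    intercept = (sxx * sy - sx * sxy) / det
    slope = (n * sxy - sx * sy) / det
    return intercept, slope


intercept, slope = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(intercept, slope)  # prints "1.0 2.0"
```

The sums `sx`, `sxx`, and `sxy` can each be computed on separate chunks of data and then added together, which is one reason this kind of computation fits a MapReduce style well.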
If you are applying to companies that are mainly data-driven, you must be familiar with machine learning techniques such as classification, regression, and clustering. Many of these techniques can be implemented using R or Python libraries, so you do not need to be an expert on how the algorithms work internally, but you should have a basic idea of each. Most importantly, you must understand when it is appropriate to use each of them.
Most people don’t realize this, but before deriving decisions from data, you must clean your data. Easy as it may sound, this is often said to take up as much as 80% of your time. Data is often messy and daunting to work with, and if you don’t know how to deal with such imperfect data, cleaning it becomes a tedious task. Some examples include missing values, inconsistent string formatting (e.g., new york vs. New York vs. ny), and inconsistent date formatting. This is the most important skill when you are one of the early data hires at a small company.
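The three problems mentioned above (missing values, inconsistent strings, inconsistent dates) can be sketched in a few lines of plain Python. This is only a minimal illustration: the alias table, the accepted date formats, and the sample records are all assumptions made up for the example.

```python
from datetime import datetime

# Hypothetical lookup table mapping known spellings to one canonical name.
CITY_ALIASES = {"ny": "New York", "nyc": "New York", "new york": "New York"}

# Hypothetical set of date formats we expect to see in the raw data.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%m-%d-%Y")


def clean_city(raw):
    """Normalize a city name; return None for missing or blank values."""
    if raw is None or not raw.strip():
        return None
    key = raw.strip().lower()
    return CITY_ALIASES.get(key, raw.strip().title())


def clean_date(raw):
    """Parse a date in any known format into ISO form; None if unparseable."""
    if raw is None:
        return None
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


records = [
    {"city": "new york", "signup": "2021-03-05"},
    {"city": " NY ", "signup": "05/03/2021"},
    {"city": "New York", "signup": None},
    {"city": "", "signup": "not a date"},
]
cleaned = [
    {"city": clean_city(r["city"]), "signup": clean_date(r["signup"])}
    for r in records
]
for row in cleaned:
    print(row)
```

After cleaning, the first three records all share the city "New York", and values that could not be interpreted become `None` instead of silently staying in the data. In practice you would likely do the same thing with a library such as Pandas, but the idea is the same.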
As I mentioned in an earlier blog post dedicated to Data Visualization, data visualization is an Art as well as a Science. It is an Art because one must know how to represent the data, and a Science because one must collect and process correct and pertinent data for visualization. We humans tend to understand things better when we see them visualized. This is incredibly important, especially at young companies making data-driven decisions at an early stage. Your visualizations should communicate: by that, I mean they should describe your findings to both technical and non-technical audiences. Familiarity with data visualization tools is a plus.
Certifications are the best way of showing that you know something in a particular field. Here I will not talk about where you should learn all these things; rather, I will list some important professional certifications that you should take once you possess the relevant skills. Data Science and Analysis career paths and certifications go hand in hand. Note that ‘Courses’ and ‘Certifications’ are two different things. The former are useful when you need to learn something, while the latter show that you have learned those skills and are ready to jump into the industry. These certificates are in high demand in the industry. They will definitely add a striking star to your profile and will certainly pay off. Let’s get started.
Data Science Using Python: Digital Vidya
“Digital Vidya is doing a great job at bringing data to the rest of the world,” says Akshay Sehgal, General Manager at Reliance. You should take this training to gain a deep understanding of different Python packages such as NumPy, SciPy, Pandas, and Scikit-Learn. You will be able to implement Machine Learning and NLP models by the end of the training.
Location- Online Live Sessions
Duration- 18 weeks
Expiration- Does not expire
Cloudera Certified Associate: Data Analyst
The CCA exam demonstrates your fundamental knowledge as a data analyst, developer, and administrator. Passing this exam and earning the certification tells your employer that you possess the basic skills required to be a Data Scientist. It can be a great help if you are just starting out.
Cost- $295 per attempt
Duration- Self-paced
Expiration- Valid for two years from issuance
Cloudera Certified Professional: Data Engineer
Once you pass the CCA, you can move on to the CCP exam, which Cloudera describes as one of the most rigorous and “demanding performance-based certifications.” According to Cloudera, those who are looking to earn their CCP need to bring “in-depth experience developing data engineering solutions” to the table, as well as a “high-level of mastery” of common Data Science skills.
Cost- $600 per attempt; each attempt includes three exams
Expiration- Valid for three years from issuance
Microsoft Certified Solutions Expert: Machine Learning
Microsoft provides a variety of certifications covering IT specialties and skills, including Data Science. However, the one I find most relevant is the Machine Learning one. Make sure to check the requirements first if you are going for this.
Cost- $165 per attempt
Location- Online via Pearson VUE
Expiration- Valid for two years from issuance
Dell EMC Data Science Associate
Dell EMC provides an associate certification that promises a hands-on, practitioner approach in what it describes as the “industry’s most comprehensive learning and certification program.” Once you pass the exam, you’re considered a “Proven Professional,” as they say. The Data Science certification path covers an advanced level as well.
Cost- $200 per exam. You also need to purchase the related book or study material
Location- Online via Pearson VUE
Duration- Self-paced
Expiration- Valid for two years from issuance
Certified Analytics Professional
CAP offers a vendor-neutral certification and promises to help you “transform complex data into valuable insights and actions,” which is exactly what businesses are looking for in a data scientist: someone who not only understands the data but can draw rational conclusions and then explain to key stakeholders why those data points are noteworthy.
Cost- $495 for members, $695 for non-members; team pricing is also available upon request
Location- In person at allotted test centers
Expiration- Valid for three years from issuance
So, these are the certifications you should go for in order to kick off your journey or to switch tracks to Data Science mid-career. Now, let’s take a look at the job roles currently available in the industry.
Before starting this section, one key piece of advice I want to give you is to read the job description carefully, rather than just the job title. Let me tell you why. Nowadays ‘Data Scientist’ is often used by hiring professionals as a blanket title for jobs that are completely different from one another. Reading the job description will help you find jobs you are already qualified for and, furthermore, develop a specific skill set to match the job roles you want to pursue. A career in Data Science in India comprises five types of job roles. Let’s look at them.
You may find companies where being a Data Analyst is synonymous with being a Data Scientist. This job role mostly consists of pulling data out of SQL databases, cleaning the data, becoming an expert in Excel or Tableau, and producing basic visualizations. You may also take the lead on the company’s Google Analytics account. A company like this is the best place for an aspiring Data Scientist to learn the basics. Once you are familiar with the day-to-day responsibilities, you can explore new things to expand your skill set as well as the company’s earnings.
Some companies have a lot of traffic and, most importantly, a large amount of data, so they start looking for someone (typically a Data Engineer) to set up the data infrastructure on which the company will move forward. Since you’d be one of the first data hires, expertise in heavy statistics and machine learning is less important than strong software engineering skills. In a nutshell, you will have great opportunities to shine, but there will be less guidance, and you may face a greater risk of failing.
Machine Learning Engineer
There are companies for whom their data, or their data analysis platform, is the product. That’s where Machine Learning Engineers come in. This is probably the ideal situation for you if you have a strong background in statistics, mathematics, and ML. Machine Learning Engineers focus on creating data-driven products with the help of existing tools and algorithms rather than developing new algorithms.
Business Analyst is not a mainstream Data Science job role, but it shares many responsibilities with the Data Analyst role. A strong command of use cases and diagrams will help you secure this post. The main task of a Business Analyst is understanding the clients’ requirements and conveying them efficiently to the team. Expertise in preparing reports is an added plus.
A Data Scientist is a combination of all of the above. You perform data cleaning, analysis, predictive modeling, visualization, and in some cases software engineering tasks as well. A Data Scientist is the go-to hire for any consumer-facing company with massive amounts of data, or for companies that offer data-based services.
Growth in India and some Essential Tips
If we talk about India in particular, according to a recent Indian jobs study, Data Science is the topmost and fastest-growing field in India, and its relevance is increasing in almost every sector. Analytics jobs in India grew by 200 percent from April 2016 to April 2017, compared to 52 percent from April 2015 to April 2016 and 40 percent from April 2014 to April 2015. So you get the idea: Data Science career growth in India is on the rise.
About the tips: to begin with, choose the right role. Talk to people from the industry; taking mentorship from them will help you grow. Secondly, choose a tool and stick to it. After that, focus on practical applications and not just theory. Following the right resources and people will also help you. Communication is key. Grow your network, but do not spend too much time on it.
Data is here to stay. The world is facing a shortage of skilled professionals, so having the right skill at the right time will give you an advantage in the industry. Always shoot for the moon; even if you miss, you’ll land among the stars. Moreover, I would recommend reading blogs, taking part in competitions, and exploring a little by yourself. If you still have doubts about how to build a career in Data Science after reading this, feel free to ask us in the comments. We will be more than happy to guide you through it.