Shweta Gupta, VP – Technology at Digital Vidya: Machine Learning has been constantly evolving in the Data Science landscape. It is growing at a rapid pace in both research and industry verticals. As a result, Machine Learning Engineer has become one of the most sought-after roles in Data Science. In terms of skill set, Machine Learning Engineers are advanced programmers who do sophisticated programming and work with complex data sets.
Recently, I got a chance to speak at length with Sahil Bali, Machine Learning Engineer/Data Scientist at Network Intelligence, about his work as a Machine Learning Engineer. It was a refreshing conversation that gave me more insight into how a Machine Learning Engineer actually works. Let’s dive into the details of the interview right away.
Sahil is not from an IT/CS background; he completed his B.Tech in Electronics & Communication Engineering. He started his career as an HR recruiter and, while working in the recruitment industry, began preparing for Machine Learning with freely available resources; he later completed one paid course as well. He is proficient in all stages of a data science project, from identifying the problem through data understanding and preparation, modeling, and evaluation to deployment. He is also an adaptable Machine Learning engineer who builds Machine Learning models from scratch and deploys them through an API in Python.
How did you get into Data Analytics? What interested you in learning Data Analytics?
Sahil: After practising on hands-on side projects in data science, I was lucky enough to teach the same to data science enthusiasts, which helped me in many different ways.
I believe that understanding and analysing a problem and coming up with an optimized solution is one of the core skills that comes with an engineering background. Yes, I have worked specifically on understanding data and, on a side note, on “how to torture the data to get maximum insights out of it ☺”
What was the first data set you remember working with? What did you do with it?
Sahil: Technically, the Iris dataset, where I built a logistic regression model on top of it. But I strongly feel that in daily life we build datasets in our minds (from day-to-day experience), on the basis of which we predict things like traffic jams while travelling, the presence of traffic police (especially when we aren’t wearing a helmet) and so forth.
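For readers who want to try the same first project, here is a minimal sketch of a logistic regression model on the Iris dataset. It assumes scikit-learn is installed. The function name, the 75/25 split, and the scaling step are illustrative choices, not a description of Sahil's own code.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def train_iris_logistic(test_size=0.25, seed=42):
    """Fit a logistic regression on Iris and return (model, held-out accuracy)."""
    X, y = load_iris(return_X_y=True)

    # Hold out part of the data so the score reflects unseen flowers.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )

    # Scaling is an assumed extra step; it helps the solver converge.
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)
    return model, model.score(X_test, y_test)


if __name__ == "__main__":
    model, accuracy = train_iris_logistic()
    print(f"Held-out accuracy: {accuracy:.2f}")
```

Calling `model.predict` on a new row of four measurements (sepal length, sepal width, petal length, petal width, in cm) returns the predicted species index.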
Was there a specific “aha” moment when you realized the power of data?
Sahil: A lot of them. Most of the time we don’t recognize them, because it is what we have been doing for years, like traffic jam prediction (and the like), which saves time, the most precious thing. Officially, it was when I built the ML model that we integrated into our product, which detects malicious/phishing cases in cybersecurity.
What is your typical day-in-a-life in your current job? Where do you spend most of your time?
Sahil: In a startup environment, every day is normally a new opportunity to learn. The day starts with catching up on emails, social media, and news while travelling to work (I then ignore social media and news while at work).
At work, we normally build things from scratch, which is good for learning, and we diligently follow a research and development lifecycle for building products. There are also 3-4 meetings a day to discuss ideas, problem statements, and available optimized solutions with other members of the ML team as well as the other product development team(s).
How do you stay updated on the latest trends in Data Analytics? Which are the Data Analytics resources (i.e. blogs/websites/apps) you visit regularly?
Sahil: I diligently follow Twitter feeds from different blogs, Machine Learning advocates, experienced data scientists and ML researchers. That way I don’t need to check every other website, since everyone is pretty active on Twitter, whether professionals or famous blogs. I also wrote a Python script that saves all the new tweets, as I can’t afford to miss any of them.
Share the names of 3 people that you follow in the field of Data Science or Big Data Analytics.
Sahil: I personally follow many people on Twitter, mostly data scientists, ML researchers and many leaders. I would also suggest being active on Twitter and LinkedIn.
(i) John Myles White – Engineering Manager at Facebook
(ii) Rachel Thomas – co-founder of fast.ai and professor
(iii) Rajat Monga – Engineering Lead for TensorFlow at Google Brain
Team, Skills and Tools
What are your favourite Data Analytics tools that you use in your job, and what other tools are widely used in your team?
Sahil: I have automated most of my work, such as looking for research papers and related data science tasks.
Specifically,
(i) Python – including Pandas, scikit-learn, etc. for data science tasks.
(ii) Frameworks – TensorFlow, Keras, PyTorch
(iii) AWS Cloud, Docker
The rest of my team uses different tools, but one thing we all have in common is executing tasks from the command shell.
What are the different roles and skills within your data team?
Sahil: The team includes a variety of roles, such as Data Engineer, ML Engineer, Elasticsearch Engineer and DevOps Engineer, each with their respective skill sets.
Can you describe some examples of the kinds of problems your team is solving this year?
Sahil: We are in the cybersecurity market, building products that solve a variety of problems using raw logs, and we use ML/DL models to automate and build intelligent systems that solve typical domain-related issues.
How do you measure the performance of your team?
Sahil: By productive and optimized solutions to a number of problems. That said, it also includes personal growth, which extends to time management, learning new skills, offering training to other office members and so forth.
Industry Readiness for Data Science
Are industries looking to understand what they can do with data? Do they have the required data in place?
Sahil: I would say absolutely. In the cybersecurity sector, professionals have already implemented many different applications and are also coming up with new ideas for ML/DL use cases.
As we all know, data is available in huge amounts, and now, as data scientists, we need to decide which dataset within the Big Data market is the one required to solve the problem.
What are the top 3 problems in Data Science, either by industry or by technology area?
Sahil: In cybersecurity,
(i) Phishing/Malicious detection
(ii) Data Protection
(iii) Anomaly detection
Advice to Aspiring Data Scientists
In your view, what are the top skills, both technical and soft, that Data Analysts and Data Scientists need?
Sahil: Joy in learning new skills at a good speed is the only factor that is required; all other skills, like Deep Learning, statistics, communication, visualization, big data and other core concepts, can be learned over time. Why is speed important? Data science is an ever-changing field; every day we can find lots of new research papers on http://www.arxiv-sanity.com/, so speed is always good to have.
How much focus should aspiring data practitioners place on working with messy, noisy data? What are the other areas in which they must build their expertise?
Sahil: Totally! Dealing with messy data is an important skill, as 80% of the time in data science is spent structuring the data set for modelling, which may include normalizing the fields, removing outliers, formatting the data and so on.
In addition to that, domain knowledge comes in handy while solving problems, and data visualization is an important tool for understanding data or explaining results to stakeholders/customers.
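The cleaning steps Sahil lists (formatting, removing outliers, normalizing) can be sketched with pandas, one of the tools he mentions. This is a rough illustration, not his pipeline. The helper names, the 1.5 × IQR outlier rule and min-max scaling are assumptions chosen for the example; other rules may suit other data better.

```python
import pandas as pd


def tidy_text_column(df, column):
    """Formatting step: trim whitespace and lower-case a text field."""
    out = df.copy()
    out[column] = out[column].str.strip().str.lower()
    return out


def clean_numeric_column(df, column, iqr_factor=1.5):
    """Drop missing values and IQR outliers, then min-max scale to [0, 1]."""
    out = df.dropna(subset=[column])

    # Outlier removal: keep values within iqr_factor * IQR of the quartiles.
    q1 = out[column].quantile(0.25)
    q3 = out[column].quantile(0.75)
    iqr = q3 - q1
    in_range = out[column].between(q1 - iqr_factor * iqr, q3 + iqr_factor * iqr)
    out = out.loc[in_range].copy()

    # Normalization: rescale the surviving values to the [0, 1] range.
    low, high = out[column].min(), out[column].max()
    span = high - low
    out[column] = (out[column] - low) / span if span > 0 else 0.0
    return out.reset_index(drop=True)


if __name__ == "__main__":
    # Hypothetical log-like records, invented for illustration only.
    raw = pd.DataFrame({
        "host": [" Web-01", "web-02 ", "DB-01", "db-02", "web-03", "web-04"],
        "bytes_sent": [120, 130, None, 125, 140, 9000],
    })
    print(clean_numeric_column(tidy_text_column(raw, "host"), "bytes_sent"))
```

In the sample data, the row with the missing value and the row with the extreme value of 9000 are dropped before the remaining values are rescaled.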
What is your advice for newbies, Data Science students or practitioners who are looking to build a career in the Data Analytics industry?
Sahil: If you can understand the following at a good level, getting a job is not a problem. Let’s dig into it.
(i) Programming & Software Skills – R, Python, SAS or Excel – I would say start with the tool that solves your problem; I’ll try to shed some light on each. Python is a language that is good for almost everything, be it web development, data science tasks, scripting, developing APIs and the like. The R language is always the first choice for statistical tasks. SAS and Excel are, again, dedicated to data analysis.
(ii) Visualization Tools – Here we have many different options. If we talk about just Python, start with seaborn, matplotlib and so on. What matters is your problem statement.
(iii) Statistical Foundation & Applied Knowledge – This is a must-have skill to really be a data scientist. Start with basic concepts like probability, differentiation and so on to get the idea, so that understanding Deep Learning or Machine Learning won’t be hectic.
(iv) Machine Learning – Knowledge of supervised learning, narrowed down to classification models, would be a good start. These include Logistic Regression, Gradient Boosting, Decision Tree, Random Forest, XGBoost and so on.
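As a starting exercise for point (iv), the sketch below compares several of the classifiers named above using 5-fold cross-validation in scikit-learn. The dataset (scikit-learn's bundled breast cancer data), the function name and the default hyperparameters are assumptions made for illustration. XGBoost lives in a separate library and is left out to keep the example self-contained.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def compare_classifiers(X, y, folds=5, seed=0):
    """Return the mean cross-validated accuracy for each candidate model."""
    candidates = {
        # Logistic regression benefits from scaled inputs; trees do not need it.
        "logistic_regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "decision_tree": DecisionTreeClassifier(random_state=seed),
        "random_forest": RandomForestClassifier(random_state=seed),
        "gradient_boosting": GradientBoostingClassifier(random_state=seed),
    }
    return {
        name: cross_val_score(model, X, y, cv=folds).mean()
        for name, model in candidates.items()
    }


if __name__ == "__main__":
    X, y = load_breast_cancer(return_X_y=True)
    for name, score in compare_classifiers(X, y).items():
        print(f"{name}: {score:.3f}")
```

Cross-validation is used here rather than a single train/test split so that each model is scored on every sample once, which makes the comparison less dependent on one lucky split.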
What are the changing trends that you foresee in the field of Data Science and what do you recommend the current crop of data analysts do to keep pace?
Sahil: Learning a new tool is normal in data science, but what matters is having a strong foundation in:
(i) Statistics and Probability
(ii) Programming, which includes OOP concepts, SDLC, algorithms and data structures and so on
(iii) Machine Learning Algorithms
(iv) Deep Learning
These 4 fields will make the difference between good, better and best. That said, there are many other factors as well, like communication, time management, etc., but those are non-technical factors that play an important role in almost every job.
Would you like to share a few words about the work we are doing at Digital Vidya in developing Data Analytics Talent for the industry?
Sahil: Digital Vidya is guiding data science enthusiasts and talent onto the right track with high-quality online course content, articles and blogs from experienced professionals, which is very useful. Data Science is a field that will be applied in almost every domain to build intelligent systems, automate processes and so on, which results in high demand, and so the Digital Vidya platform is helping to bridge the gap between talent and demand. Cheers!