Naresh Mehta’s journey started more than a decade ago, when he joined ZS Associates as a consultant after completing a dual degree in mechanical engineering at IIT Madras in 2007. There he became an advanced user of SAS and Excel and realized the power of data.
He moved to consumer behaviour analytics in 2010 when he joined dunnhumby, where he got into Big Data and advanced analytics (forecasting, classification modelling, unsupervised segmentation etc.), helping clients in the retail and banking sectors grow their business by driving higher customer engagement. Around 2014, he started working with the rapidly growing R and a little bit of Python!
Naresh spent more than 5 years at dunnhumby, working across different projects from the Gurgaon, London and Edinburgh offices. He eventually decided to return to India in 2015 and join the fast-moving and challenging startup ecosystem.
He spent close to 20 months at Limeroad setting up the data sciences vertical for the startup. This was his first brush with real-time clickstream data: the volume, velocity and complexity that define ‘Big Data’. While the team touched the entire ecosystem, he focused on customer retention through product personalization and relevant communications.
Naresh moved to Zomato in March 2017, where he now leads the ML vertical of the organization, working on challenging projects such as the recommendation engine, payment fraud prediction, user-generated image classification, search/listing algorithms etc.
How did you get into Data Analytics? What interested you in learning Data Analytics?
Naresh Mehta: I was fortunate to have started on a very data-centric project at ZS Associates, which required me to pick up SAS and advanced Excel (VBA etc.). I really liked how we could drive major business decisions, at times even contradicting the intuitions of business leaders, just by harnessing the power of data. It made me feel empowered, and I knew I was going to stick to data sciences for some more time to come!
What was the first data set you remember working with? What did you do with it?
Naresh Mehta: Wow – nostalgia, great question 🙂
The first data set I ever worked with was the ‘calling log’ of a major US pharma company, essentially the timestamps of all the visits a sales rep made to a physician to promote a given drug.
This information (along with physician-level prescription volume data) fed into the modelling exercise (call planning), where we would try to optimize the number of times a rep should visit a physician to maximize prescriptions (sales) for the given cost of making those visits: a classic regression problem with optimization.
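A minimal sketch of that "regression plus optimization" idea, under loud assumptions: the visit and prescription numbers are toy values invented for illustration, the log-shaped response curve is an assumed functional form, and `value_per_rx` and `cost_per_visit` are hypothetical parameters. It is not the model used on the actual project.

```python
import numpy as np

# Toy data, invented for illustration: visits made to a physician
# and prescriptions written over the same period.
visits = np.array([1, 2, 3, 4, 6, 8, 10, 12], dtype=float)
prescriptions = np.array([5.1, 8.2, 10.0, 11.4, 13.2, 14.6, 15.5, 16.4])

# Step 1 (regression): fit prescriptions ~ a + b * log(1 + visits)
# by ordinary least squares. The log term is an assumed way to model
# diminishing returns from additional visits.
X = np.column_stack([np.ones_like(visits), np.log1p(visits)])
(a, b), *_ = np.linalg.lstsq(X, prescriptions, rcond=None)


def expected_profit(n_visits, value_per_rx=50.0, cost_per_visit=40.0):
    """Predicted prescription value minus the cost of the visits (hypothetical units)."""
    predicted_rx = a + b * np.log1p(n_visits)
    return value_per_rx * predicted_rx - cost_per_visit * n_visits


# Step 2 (optimization): search a small grid of candidate visit counts
# and keep the one with the highest expected profit.
candidates = np.arange(0, 31)
profits = expected_profit(candidates)
best_visits = int(candidates[np.argmax(profits)])

print(f"fitted slope b = {b:.2f}, best number of visits = {best_visits}")
```

Because the fitted curve flattens out while the cost grows linearly, the best visit count lands somewhere in the middle of the grid rather than at either extreme.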
Was there a specific “aha” moment when you realized the power of data?
Naresh Mehta: While I got a first-hand feel of how useful and powerful data can be at ZS, the biggest ‘aha’ moments happened at dunnhumby when I built my first classification model predicting the likelihood of a customer taking home insurance, leveraging grocery data for prediction! Imagine my wonder when I started seeing trends like customers who all of a sudden started shopping in baby aisles having a significantly higher propensity to purchase home insurance… data is indeed the oil (and a lot more) of this century!
What is your typical day-in-a-life in your current job? Where do you spend most of your time?
Naresh Mehta: Tough to have a typical day in a start-up, but let me try!
I continue to be fairly hands-on and spend a major fraction (~40%) of my time working with data: analysing the impact of ongoing experiments or finding gaps in the customer journey (clickstream funnel) where we could do a better job through superior algorithms or new product features, which then feed into the next series of experiments!
Another 30% of the time is spent in meetings with team members: clearing roadblocks and ensuring alignment between the data scientists/statisticians and the ML engineers/developers who are putting models in production.
About 20% of the time goes into meeting stakeholders from other teams. The work we do directly impacts different business verticals and product features, so alignment with business heads and product managers is critical.
Roughly 10% of the time goes into learning and development. The data science domain is evolving very fast and it is critical to stay abreast of recent developments. For example, if we just look at tree-based models, basic CART got replaced by RF, which got replaced by GBMs, which are now being replaced by ensemble models, all in the last 3 years!
How do you stay updated on the latest trends in Data Analytics? Which are the Data Analytics resources (i.e. blogs/websites/apps) you visit regularly?
Naresh Mehta: A lot of great stuff on the internet!
- I spend a good amount of time on university research blogs and lecture notes/slides if I am looking for the hardcore maths behind algorithms (stanford.edu, math.ethz.ch/sfs, ccs.neu.edu)
- Platforms like Medium and KDnuggets are great for keeping abreast of the latest developments in the field.
- And of course forums like stats.stackexchange.com, Quora, Kaggle etc. to get more specific technical/implementation know-how!
Share the names of 3 people that you follow in the field of Data Science.
Naresh Mehta: I am not consciously following specific people as such, but of course some big names crop up frequently as references: Geoffrey Hinton, Andrew Ng, Prof Sethu Vijayakumar and a few more.
Team, Skills and Tools
Which are your favourite Data Analytics tools that you use in your job, and what other tools are widely used in your team?
Naresh Mehta: While I personally am a bit old school, still using R, SAS and good old Excel for most of the analyses, the team works heavily in Python, with all the powerful libraries that come with it, for most of the modelling projects.
What are the different roles and skills within your data team?
Naresh Mehta: It’s a small full-stack team comprising analysts (prelim insights, EDA etc. to help prioritize projects), an infra engineer (Scala/Spark pipelines), data scientists (maths/stats of the algorithms) and ML engineers (model training at scale, deployment, real-time prediction etc.).
Can you describe some examples of the kinds of problems your team is solving this year?
Naresh Mehta: We are working across many problems; a few of them are listed below:
- Personalized home page and listings in the app (recommendation engine)
- Payment fraud prediction
- Predicting food delivery time by accounting for meal preparation time and travel time
- Classification and quality assessment of images uploaded by users
- NLP on reviews to extract key information from text
How do you measure the performance of your team?
Naresh Mehta: There is no set template for performance evaluation in this space. What’s critical is that the individual is trying his/her best to push the envelope and raise the bar within the team and organization.
Advice to Aspiring Data Scientists
According to you, what are the top skills, both technical and soft, that are needed for Data Analysts and Data Scientists?
Naresh Mehta: I notice a lot of young ML/data science professionals having a strong bias for execution over understanding the underlying theory/maths, which I don’t agree with.
In my view, the single biggest requirement to be a good data scientist is a strong grasp of the maths/theory of the ML algos, followed by programming/data handling skills to effectively leverage open source packages/libraries, and finally the ability to align the cost functions/target variables of the algos with business priorities. Building something very stimulating intellectually but not really solving the problem at hand is not very useful!
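One way to picture "aligning cost functions with business priorities" is to pick a classifier's decision threshold by business cost rather than by raw accuracy. The sketch below is illustrative only: the labels and scores are toy values, and `cost_fn` and `cost_fp` (the cost of a missed fraud and of a wrongly blocked payment) are hypothetical numbers, not figures from any real system.

```python
import numpy as np


def business_cost(y_true, scores, threshold, cost_fn=100.0, cost_fp=5.0):
    """Total cost of the mistakes made when flagging scores >= threshold."""
    flagged = scores >= threshold
    false_negatives = np.sum((y_true == 1) & ~flagged)  # missed fraud
    false_positives = np.sum((y_true == 0) & flagged)   # good payment blocked
    return cost_fn * false_negatives + cost_fp * false_positives


def best_threshold(y_true, scores, **costs):
    """Grid-search the threshold that minimizes the business cost."""
    grid = np.linspace(0.0, 1.0, 101)
    totals = [business_cost(y_true, scores, t, **costs) for t in grid]
    return float(grid[int(np.argmin(totals))])


# Toy labels (1 = fraud) and model scores, invented for illustration.
y_true = np.array([0, 0, 0, 0, 1, 0, 1, 1])
scores = np.array([0.05, 0.10, 0.20, 0.35, 0.40, 0.60, 0.70, 0.90])

# When a missed fraud is expensive, the chosen threshold is low.
t_fraud_costly = best_threshold(y_true, scores, cost_fn=100.0, cost_fp=5.0)
# When blocking a good payment is expensive, the chosen threshold is higher.
t_block_costly = best_threshold(y_true, scores, cost_fn=5.0, cost_fp=100.0)

print(t_fraud_costly, t_block_costly)
```

The same model and the same scores lead to different operating points once the business costs change, which is the sense in which the objective has to follow the business priority.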
How much focus should aspiring data practitioners put on working with messy, noisy data? What are the other areas that they must build their expertise in?
Naresh Mehta: One can never overstate the importance of the ability to handle messy/noisy data in data science, because that is precisely how real-world data is!
In fact, anyone who has built a successful model would agree that 70% of the effort in modelling goes into feature engineering: ingesting raw data, cleaning it to address outliers/missing data etc., transforming variables to create strong, impactful features, and doing all this while keeping the business objective in mind.
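The cleaning and transformation steps mentioned above can be sketched in a few lines. This is a hedged illustration, not a recipe: the `clean_feature` helper, the percentile cut-offs and the toy order values are all assumptions made for the example, and it assumes a non-negative numeric feature so that the log transform is valid.

```python
import numpy as np


def clean_feature(values, lower_pct=5.0, upper_pct=95.0):
    """Impute missing values, clip outliers, then log-transform one numeric column."""
    x = np.asarray(values, dtype=float)

    # Missing data: replace NaN with the median of the observed values.
    median = np.nanmedian(x)
    filled = np.where(np.isnan(x), median, x)

    # Outliers: clip to the chosen percentile range (winsorizing).
    low, high = np.percentile(filled, [lower_pct, upper_pct])
    clipped = np.clip(filled, low, high)

    # Transformation: log1p compresses the long right tail.
    return np.log1p(clipped)


# Toy order values with two missing entries and one extreme outlier.
order_values = [120, 150, np.nan, 90, 110, 5000, 130, np.nan, 100]
features = clean_feature(order_values)
print(np.round(features, 2))
```

Which imputation, clipping range or transformation is appropriate depends on the business objective; median imputation and percentile clipping are just common defaults.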
What is your advice for newbies, Data Science students or practitioners who are looking to build a career in the Data Analytics industry?
- Programming and software skills – R, Python, SAS or Excel
Best to go with Python (especially for a newbie). It is more intuitive from a coding point of view and well supported by a lot of the powerful libraries needed in advanced ML.
- Visualization Tools
While plotly, seaborn and shiny are being used extensively these days, it is important to realize that knowing what exactly to plot matters more than the tool. I often get a LOT done just through Excel graphs.
- Statistical foundation and applied knowledge
Read – a lot.
The best thing about having learnt the maths is that while technology will keep evolving to ingest more data at a faster pace, the core underlying maths will always stay the same. That’s very comforting in this fast-evolving domain!
- Machine Learning
Frankly speaking, if you know the underlying maths and have the programming/data handling skills, you already know ML… the 3 sections above essentially are the ingredients of ML!
What are the changing trends that you foresee in the field of Data Science and what do you recommend the current crop of data analysts do to keep pace?
Naresh Mehta: With the explosion of computation and storage power, a lot of new techniques are becoming increasingly powerful, e.g. deep learning (mostly based on neural networks), ensembles of boosted trees, SVMs with complex kernels etc. It is important to be aware of how the world is changing around you so that you can make the most of it. However, do not let the noise drown out the real deal: the underlying maths of it!
Would you like to share a few words about the work we are doing at Digital Vidya in developing Data Analytics talent for the industry?
It is good to see Digital Vidya becoming increasingly involved in covering the data science vertical. I look forward to collaborating with DV to help shape this industry.