Manu is an engineer from IIT Delhi and has a management degree from MDI, Gurgaon. During his MBA days, financial engineering caught his interest, and that is where his fondness for number-crunching began. From campus, he joined the American Express Risk Analytics Center in India, followed by the Accenture Digital practice, before co-founding FN MathLogic Consulting Services Pvt Ltd.
What was the first data set you remember working with? What did you do with it?
Manu Chandra: I remember working with a fraud detection dataset. I was tasked with creating business rules based on the dataset to identify potentially fraudulent credit card transactions.
Was there a specific “aha” moment when you realized the power of data?
Manu Chandra: Yes, my “aha” moment came when an analytical solution that I had developed was put into production, leading to savings of more than $500K per annum. That’s when I realized the power of data and its potential impact on real-life business decisions.
How do you stay updated on the latest trends in Data Analytics? Which are the Data Analytics resources (i.e. blogs/websites/apps) you visit regularly?
Manu Chandra: I diligently follow analytics news on the web for Deep Learning (my area of interest). One of the key places where I look for the latest research in my areas of interest is arXiv.org.
Share the names of 3 people/publications/research that you follow in the field of Data Science or Big Data Analytics.
(iv) Yoshua Bengio
Team, Skills and Tools
Which are your favourite Data Analytics Tools that you use to perform in your job, and what are the other tools used widely in your team?
Manu Chandra: My number one choice of tool is Spark for data munging and Python (TensorFlow/PyTorch) for modelling. Our team also leverages the same.
What are the different roles and skills within your data team?
Manu Chandra: We have only data scientists in our team with a combination of quantitative, data engineering, modelling and business analytical skills.
Could you describe some examples of the kinds of problems your team is solving this year?
Manu Chandra: We are doing very exciting projects this year leveraging Spark, Machine Learning and Deep Learning. Some of the projects where we have used advanced techniques to derive additional value for clients include text classification, ATM cash dispense forecasting, credit scorecards, predicting health claim outcomes and optimizing marketing spend. We have also developed a methodology and tool to convert SAS code to Spark (and successfully implemented it with a client, achieving a 100X improvement in execution speed).
How do you measure the performance of your team?
Manu Chandra: Apart from client projects, we also expect our team members to spend significant time (up to 40%) studying the latest research and finding ways to apply it to real-life datasets. This is an important activity that allows us to add value to our clients. We measure team performance based on how much positive impact our projects had on the client’s business, as well as on team members’ ability to work with the latest research (i.e. being able to understand research papers and adapt them to real-life datasets). We do not follow any bell curve, so all our team members can be high performers.
Big Data Team, Skills and Tools
In the huge Big Data landscape, skills are changing swiftly. Which technology do you see dominating the ETL and real-time data space?
Manu Chandra: In my assessment, Spark is going to dominate the ETL space in the next few years. Given that it is open source, it is expected to be integrated into the back end of more and more tools and analytical ecosystems. At MathLogic, we use Spark for most of our data-engineering work. The race among real-time analytics tools is still on, and there is no clear winner yet. Apache Kafka along with Spark Streaming seems a good bet at this point, but there are other contenders such as Pulsar, NiFi and Flink that are good for specific use cases.
How do aspiring Data Engineers demonstrate their capability in handling tools, technology, data and domain? Is a certificate (Cloudera/Hortonworks) a clear differentiator?
Manu Chandra: The ability to use big data tools is currently important for Data Engineers because, among other things, this skill can be assessed objectively. In my opinion, a good data engineer needs to be able to work with a variety of tools. An understanding of data and domain will remain important even as the ability to use a particular tool becomes less important, since tools will start to come with more advanced GUIs, drag-and-drop options and automation. The above certificates are good for system administrators, not necessarily for data engineers.
Are analytical skills, statistics and machine learning must-have or good-to-have skills for Data Engineers?
Manu Chandra: I believe a Data Engineer should invest in developing analytical, statistical and machine learning skills over time. This will help them in developing engineering solutions which can be integrated quickly with analytical solutions and can also help them graduate to a data scientist role.
Industry Readiness for Data Science
Are the industries looking to understand what they can do with data? Do they have the required data in place?
Manu Chandra: In my opinion, firms in almost all industries have been bombarded with AI/ML hype and are looking to adopt, or have a mandate to adopt, AI and data in decision making. However, most are still figuring out better ways to get the right data for the appropriate business problem. A number of them don’t realize that in order to leverage cutting-edge techniques, they need a data strategy along with a data collection and validation mechanism in place. Having properly labelled data is also a big challenge for most firms in India.
Industry Readiness for Big Data
Is Big Data becoming a reality in the industry beyond the social giants like Facebook, Google, Yahoo? If yes, which industries are actually moving towards the power of Big Data Analytics? If no, what is the outlook for adoption?
Manu Chandra: Companies that were already storing a lot of structured customer and internal data, e.g. financial services, telecom and organized retail, are the ones moving first towards big data analytics. Having consumed the traditional data sources, these industries are moving towards utilizing alternate data sources (satellite, social media and sensor/device data). New-age players and first movers in manufacturing and agriculture have also started leveraging sensor and satellite data, though we are far from industry-wide adoption. Compared with its global counterparts, the healthcare industry in India is lagging far behind in adoption. Similarly, government and utilities have a long way to go.
Name 3 Industries and the kind of problems that they are solving using Big Data.
(i) The unsecured lending industry is trying to solve the problem of assessing first-time credit seekers’ creditworthiness using alternate big data sources.
(ii) Startups are trying to address the issue of unpredictability in agriculture through high-resolution satellite/drone images.
(iii) Globally, the healthcare industry is using big data/radiology images for diagnosis.
Who in the Industry is your typical client for Big Data? Is it the CTO, CIO, CMO or special data leaders?
Manu Chandra: We prefer to work with business stakeholders, as we strive to provide an end-to-end solution to a business problem. So we work with the CMO/CRO/CEO/CIO/CAO as appropriate.
Advice to Aspiring Data Scientists
According to you, what are the top skills, both technical and soft skills that are needed for Data Analysts and Data Scientists?
(i) Ability to understand the business problem and convert it to an analytical solution
(ii) Coding skills
(iii) Deep understanding of Algorithms
(iv) Ability to present the analysis
How much should aspiring data practitioners focus on working with messy, noisy data? What are the other areas in which they must build their expertise?
Manu Chandra: It is very important for any aspiring data practitioner to work with messy data as real-life data is always messy and noisy.
The two other areas to focus on are the ability to understand and formulate the business problem and the ability to apply the right analytical technique.
What are the changing trends that you foresee in the field of Data Science and what do you recommend the current crop of data analysts do to keep pace?
Manu Chandra: I see a lot of productivity benefits coming from the use of sophisticated data engineering platforms (for example, Spark) and, in data modelling, from the use of AutoML techniques. Another thing that will continue is the rapid pace of change in Deep Learning algorithms. A data scientist should develop the ability of “How to Learn”, meaning being able to follow new research and apply it to their own use case.
Big Data Solution Space
What is the kind of structured and unstructured data companies have? What is the size that we are talking about?
Manu Chandra: In terms of structured data, the credit bureaus in India have several years of historical data on a billion-plus financial trades; telecom companies’ user bases and CDRs (Call Detail Records) also run into billions. Large e-commerce companies also have big data. It is easily petabytes of data that we are talking about. On the unstructured front, one of the largest datasets would be satellite image data. Device/sensor data volumes can also be very big. Of course, social media feeds are gigantic, but companies are typically filtering and processing a very small portion.
Are there legacy systems that are being replaced? If yes, which legacy skills are being replaced?
Manu Chandra: A number of legacy systems are being replaced by more agile web/cloud-based counterparts. Sticking to analytical systems, there is a trend to move towards open source systems (Spark/R/Python).
What is the size of clusters/environments that are being deployed for the clients? What are the production challenges?
Manu Chandra: The size of an environment depends upon the scale and also on whether a common environment is leveraged globally. I have worked with clients with a global cluster having ~1000 nodes, more than 3000 cores and 20TB of RAM. One of the production challenges is that modern software has a much more frequent release cycle (3-6 months), at times without backward compatibility. This is a nightmare scenario for IT support teams. Another production challenge is that the need for computing power, RAM and storage is growing rapidly. In a way, systems are always in flux and do not settle down.
Would you like to share a few words about the work we are doing at Digital Vidya in developing Data Analytics talent for the industry?
Manu Chandra: The analytics industry in India is undergoing tremendous change, and players like Digital Vidya are playing an important role in the transition, directly and indirectly. Given that the industry in India is lagging behind its global counterparts (especially China), we need more such forums educating the industry, hand-holding it and supporting its talent requirements.