Vijay Srinivas Agneeswaran started writing software to parallelize and distribute scientific jobs across a cluster of nodes, in the classical cluster/grid/peer-to-peer computing mode, during his doctoral work at IIT Madras. His doctoral thesis turned out to be a precursor to Cassandra-style large-scale, fault-tolerant distributed data stores: it combined the best of unstructured P2P systems and Distributed Hash Tables (DHTs). He subsequently conceptualized and developed software to distribute data across data centres under constraints such as data centre layouts, networking/power limits and business SLAs, including the Recovery Point Objective (RPO), which specifies the point in time up to which data can be recovered, and the Recovery Time Objective (RTO), which specifies how quickly a system can be up and running again. Vijay then moved into research on Big Data and cloud software and headed teams at Cognizant/Impetus, where his teams:
- Created software for managing and searching through large documents.
- Parallelized and implemented machine learning algorithms on top of Apache Spark and Storm.
- Built fine-grained role-based access control inside the Hadoop Distributed File System (HDFS).
- Built (what was then) the first distributed deep learning network on top of Spark.
Subsequently, at SapientRazorfish, where Vijay heads the Data Science team out of India, he is responsible for technically spearheading all Data Science work, including talking to businesses and solving their problems using Machine Learning/Data Science/AI as necessary.
What interested you in learning Data Analytics?
Vijay Srinivas Agneeswaran: I started looking into data almost two decades ago, beginning with my doctoral work and continuing through building distributed storage systems for Oracle, building large-scale document search systems, analysing large data sets using algorithms, and creating value from the data our clients have. It was clear that data was going to be central to many businesses and that the ability to pose and answer various questions on data was important. This is what made me learn data science.
What was the first dataset you remember working with? What did you do with it?
Vijay Srinivas Agneeswaran: It was a document categorization problem we were solving. The training data set was a large collection of web pages and a label to indicate the category they belonged to – a supervised learning problem.
Was there a specific “aha” moment when you realized the power of data?
Vijay Srinivas Agneeswaran: It gave me great pleasure to see that the machine was able to categorize even new pages, which it had not seen as part of the training data set.
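To make the supervised setup described above concrete, here is a minimal sketch of page categorization. The interview does not say which algorithm or data was used, so everything here is an assumption for illustration: the toy pages and labels are invented, and multinomial Naive Bayes is simply one common choice for text classification. It uses only the Python standard library.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()

def train(labelled_pages):
    """Collect per-category word counts from (text, label) pairs."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    label_counts = Counter()            # label -> number of training pages
    vocabulary = set()
    for text, label in labelled_pages:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocabulary.update(tokens)
    return word_counts, label_counts, vocabulary

def predict(model, text):
    """Return the category with the highest Naive Bayes log-score."""
    word_counts, label_counts, vocabulary = model
    total_pages = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        # Log prior: fraction of training pages carrying this label.
        score = math.log(label_counts[label] / total_pages)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            # Laplace (add-one) smoothing keeps unseen words from zeroing the score.
            numerator = word_counts[label][token] + 1
            denominator = total_words + len(vocabulary)
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training set: page text plus its category label.
TRAINING_PAGES = [
    ("football match score goal league", "sports"),
    ("tennis player wins final match", "sports"),
    ("stock market shares fall bank", "finance"),
    ("bank raises interest rates loan", "finance"),
]

model = train(TRAINING_PAGES)
# A page that is not part of the training set.
new_page = "league final goal"
predicted = predict(model, new_page)
print(new_page, "->", predicted)
```

The point of the sketch is the last three lines: the model assigns a category to a page it never saw during training, which is the behaviour described in the answer above.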
What is your typical day-in-a-life in your current job? Where do you spend most of your time?
Vijay Srinivas Agneeswaran: As the Senior Director of Technology heading the Data Science team out of India, I am responsible for technically spearheading all Data Science work, which can be summarized as understanding businesses and solving business problems using Machine Learning/Data Science/AI as necessary. This includes initiatives internally code-named “cognitive commerce”, “cognitive content marketing” etc. In addition, as I have a strong engineering background, I help modernize the engineering teams to make sure they use state-of-the-art technology/architectures/algorithms. The last hat I wear is to spearhead all next-generation work on data. This includes exploratory work such as handling sparsity in recommendation systems and creating editable blockchains for, say, ensuring GDPR compliance or other applications.
How do you stay updated on the latest trends in Data Analytics? Which are the Data Analytics resources (i.e. blogs/websites/apps) you visit regularly?
Vijay Srinivas Agneeswaran: Data Science Central is one place that has some useful blogs/links. I also follow some of the top people in the area. Since I am on the program committee (PC) of Strata and other top conferences, I get to see the best ideas coming up, and this helps me keep up with the latest trends. I also attend the Strata conferences (those where I am giving tech talks). Strata is the best place to keep track of trends.
Share the names of 3 people that you follow in the field of Data Science.
Vijay Srinivas Agneeswaran:
- Ben Lorica, head of Data Sciences for O’Reilly publications
- Kirk Borne, head of Data at Booz Allen
- Hilary Mason, VP of Data at Cloudera
Team, Skills and Tools
Which are your favourite Data Analytics Tools that you use to perform in your job, and what are the other tools used widely in your team?
Vijay Srinivas Agneeswaran: Python, Spark/Hadoop, TensorFlow, Kubernetes, Jupyter etc. We have also used and are familiar with R, SAS, Keras and Caffe, plus cloud tools such as Google ML or Azure ML, along with H2O.ai etc.
What are the different roles and skills within your data team?
Vijay Srinivas Agneeswaran: We have Data Scientist roles at various levels starting from Associate DS, Senior Associate or Senior Data Scientist, Master Data Scientist, Principal Data Scientist, Chief Data Scientist etc.
Could you describe some examples of the kinds of problems your team is solving this year?
Vijay Srinivas Agneeswaran: Recommendation systems, product classification automation and demand forecasting for commerce or retail clients, goal-based financial product recommendations for financial institutions, credit risk prediction for banks, email classification/bucketing, sales forecasting and churn analysis for telecom clients, etc. We are also exploring next-generation work on training deep learning networks with insufficient data, creating editable blockchains, etc.
How do you measure the performance of your team?
Vijay Srinivas Agneeswaran: Client impact is one important measure. Functional measures, such as the number of data science models built and deployed in production and the business impact of the data science work, are also tracked.
Advice to Aspiring Data Scientists
In your view, what are the top skills, both technical and soft, that Data Analysts and Data Scientists need?
Vijay Srinivas Agneeswaran: Sound fundamentals and the urge to learn are the foundational soft skills that I consider mandatory. Math fundamentals and strong programming skills (preferably in Python) are the top technical skills I would focus on.
How much focus should aspiring data practitioners place on working with messy, noisy data? What are the other areas in which they must build their expertise?
Vijay Srinivas Agneeswaran: Data Munging/Wrangling is very important. Most client data or actual data is messy and noisy. So, it is extremely important to look at Data Munging, which is why we prefer to use Python: its libraries are awesome for Data Munging.
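As a rough illustration of what munging messy data looks like in Python, the sketch below cleans a small CSV extract. The records, column names and missing-value markers are all invented for the example, and the interview does not name specific libraries. In practice a library such as pandas is a common choice, but this version sticks to the standard library so that it runs anywhere.

```python
import csv
import io

# Hypothetical raw extract with typical problems: stray whitespace,
# inconsistent casing, missing values, a non-numeric age, a duplicate row,
# and a number formatted with a thousands separator.
RAW_CSV = '''customer_id,age,city,monthly_spend
 101 ,34,Chennai,1200.50
102,,bangalore ,N/A
103,forty,MUMBAI,980
101,34,Chennai,1200.50
104,29,  Pune,"1,450.00"
'''

# Assumed set of strings that should be treated as "no value".
MISSING_MARKERS = {"", "n/a", "na", "null", "-"}

def parse_number(value, cast):
    """Convert a raw string to a number, returning None when it is missing or invalid."""
    cleaned = value.strip().replace(",", "")
    if cleaned.lower() in MISSING_MARKERS:
        return None
    try:
        return cast(cleaned)
    except ValueError:
        return None

def clean_rows(raw_text):
    """Trim, normalize and de-duplicate rows, keyed on customer_id."""
    seen_ids = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        customer_id = row["customer_id"].strip()
        if customer_id in seen_ids:
            continue  # keep only the first occurrence of each customer
        seen_ids.add(customer_id)
        cleaned.append({
            "customer_id": customer_id,
            "age": parse_number(row["age"], int),
            "city": row["city"].strip().title(),
            "monthly_spend": parse_number(row["monthly_spend"], float),
        })
    return cleaned

rows = clean_rows(RAW_CSV)
for row in rows:
    print(row)
```

Representing unusable values as `None` rather than guessing a replacement is a deliberate choice here: it keeps the cleaning step honest and leaves imputation as a separate, explicit decision.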
What is your advice for newbies, Data Science students or practitioners who are looking to build a career in the Data Analytics industry?
Vijay Srinivas Agneeswaran: Go learn the fundamentals. Start programming. Start participating in Kaggle-style competitions. Start following the right data scientists. Be motivated.
What are the changing trends that you foresee in the field of Data Science and what do you recommend the current crop of data analysts do to keep pace?
Vijay Srinivas Agneeswaran: The following are my top areas to focus on for the next year or so:
- Model management, including things like hyper-parameter tuning, AutoML, model lineage and model dashboards.
- Model interpretability or explainability: the ability to reason about and explain why a certain result has been arrived at. For instance, see Skater.
- Continuous integration in DS. For instance, see frameworks such as Jarvis.
- Parallelization frameworks such as Ray.
Would you like to share a few words about the work we are doing at Digital Vidya in developing Data Analytics talent for the industry?
Vijay Srinivas Agneeswaran: I think the idea of looking at some of the top data scientists and having them answer a detailed questionnaire is quite useful and could inspire many people.
To know more about Vijay Srinivas Agneeswaran, you can check out his LinkedIn.