Shweta Gupta, VP – Technology at Digital Vidya
Data Science is not only interesting and enjoyable but important as well. From healthcare all the way to agriculture, the future will depend on our ability to use data to our advantage. It is the cornerstone of modern-day research, and without it, innovation itself will slow down. We recently caught up with Avik Sarkar, a passionate Data Professional who is eager to see changes in the Indian Data Science space.
Let’s dive deeper to know more about his thoughts and experiences in working with Data on a daily basis.
Avik Sarkar has been interested in Mathematics and numbers since his school days. This led him to study the related fields of Statistics, Applied Statistics with Informatics and the application of data in Computing – through which Avik could explore the various aspects of data and its interaction with Mathematics. This helped him explore roles ranging from a programmer coding algorithms to his current work on data/artificial intelligence policies. It is important to continue learning new things, to be passionate about numbers and to make a positive impact on society.
How did you get into Data Analytics? What interested you in learning Data Analytics?
Avik Sarkar: I decided to study Mathematics/Statistics way back in the mid-90s, when the fields of data analytics, data science, big data, etc. were unknown and not at all popular. I would have worked along the same lines in any case, possibly in research, as the commercial side of data analytics is very recent. There are many other fields of basic science which do not get their due importance, and people move to different fields. I have been fortunate to work in the same fields that I studied.
What was the first dataset you remember working with? What did you do with it?
Avik Sarkar: I was introduced to the world of data mining and machine learning by my professors at IIT Bombay. The first large dataset that I worked with was the Reuters dataset of news articles. I used this for my master’s thesis on multi-topic text classification. I was again fortunate that my introduction to advanced text mining techniques occurred before the term “big data” was coined – without realizing it, I worked for several years on large text datasets, manually writing code to distribute a large task across multiple nodes. Later, during my PhD, I worked on the TREC (Text Retrieval Conference) dataset and data obtained by crawling the internet/intranet.
Was there a specific “aha” moment when you realized the power of data?
Avik Sarkar: There were several across the corporate positions I have held. But one piece of work particularly stands out – this was when the Uttarakhand flood occurred in 2013 and I was part of the team from IBM helping in the relief work. We looked at all the mobile phones located in the area where the flood occurred on the day of the incident. We then tracked the locations of the phones that had moved out to safe zones in the following days – indicating that their owners were possibly safe. As the owner details of these phones were known, calls were made to these people to verify their safety. This helped in bringing down the list of missing persons and saved a huge amount of time for the relief workers. Though simple, it demonstrated the huge power of using data for positive social impact.
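The filtering step described in this answer can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method the IBM team actually used: the record layout `(phone_id, day, zone)`, the zone labels and the sample values are all hypothetical, and days are counted as plain integers with day 0 as the day of the incident.

```python
# Hypothetical location records: (phone_id, day, zone). Day 0 is the incident day.
# All identifiers and values here are made up for illustration.
pings = [
    ("A", 0, "flood_zone"),
    ("A", 2, "safe_zone"),
    ("B", 0, "flood_zone"),
    ("C", 0, "safe_zone"),
    ("C", 2, "safe_zone"),
]


def split_by_movement(pings, incident_day, affected_zone):
    """Return (likely_safe, still_unaccounted) sets of phone ids."""
    # Phones seen inside the affected zone on the day of the incident.
    in_area = {p for p, day, zone in pings
               if day == incident_day and zone == affected_zone}
    # Phones seen outside the affected zone on any later day.
    seen_outside_later = {p for p, day, zone in pings
                          if day > incident_day and zone != affected_zone}
    likely_safe = in_area & seen_outside_later
    still_unaccounted = in_area - seen_outside_later
    return likely_safe, still_unaccounted


likely_safe, still_unaccounted = split_by_movement(pings, 0, "flood_zone")
print(sorted(likely_safe))        # phones whose owners can be called to confirm safety
print(sorted(still_unaccounted))  # phones that remain on the missing list
```

Phones in the first set are only candidates: as the answer notes, safety was confirmed by calling the owners, so the data narrows the list rather than replacing verification.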
What is your typical day-in-a-life in your current job? Where do you spend most of your time?
Avik Sarkar: I have worked across several roles, and the time spent depends a lot on the task at hand. A large amount of time is usually spent designing algorithms to handle a particular type of data. When dealing with noisy data from the source, a lot of time is spent cleaning the data and making it suitable for analysis. In my current role with the Government of India, I spend a lot of time envisioning the role data can play in efficient policy making and governance. We are in the process of coming up with a “National Data Analytics Portal”, which would be a one-stop destination for all data about India, including the various ministries and states – with a facility for self-service analytics for the end-user.
How do you stay updated on the latest trends in Data Analytics? Which are the Data Analytics resources (i.e. blogs/websites/apps) you visit regularly?
Avik Sarkar: I firmly believe that the latest trends can best be understood by visiting the field and interacting with citizens/customers on site. The challenges related to India are quite different, and analytics firms in India should focus their efforts on them. There is hardly any blog or website that addresses these challenges.
For example, in India, primary healthcare workers in the villages are busy attending to patients and often do not get time to enter the records online – and most online systems are available only in English, adding to their plight. Medical monitoring devices that can write measurements directly into the database would be of immense help. Healthcare workers would also benefit from a system they can talk to in a regional language to get recommendations about a patient’s condition. Similarly, these techniques can be used for translating books into regional languages.
Share the names of 3 people that you follow in the field of Data Science.
Avik Sarkar: A few researchers across premier Indian institutes are engaged in some of this research, but a lot more needs to be done. There is a need to find data science heroes in India who are focusing on India-specific problems.
Team, Skills and Tools
Which are your favourite Data Analytics Tools that you use to perform in your job, and what are the other tools used widely in your team?
Avik Sarkar: Often we do not have the luxury of choosing the tool we are most comfortable with and have to work with whichever tool is available. Commercial tools like SPSS/SAS are quite user-friendly and can be picked up easily without much prior experience. In their absence, one has to rely upon tools like R and Python for tasks related to data manipulation and data cleaning.
What are the different roles and skills within your data team?
Avik Sarkar: My current role is in a policy think tank, working with people from various verticals (e.g. healthcare, industry/SME, etc.) on the role data can play in India’s economic growth and in improved governance through real-time monitoring of states/districts on various indicators. This is enabled through a mix of satellite imagery combined with on-ground surveys. We work a lot with external partners for analysing the satellite images or conducting the surveys.
Can you describe some examples of the kind of problems your team is solving this year?
Avik Sarkar: We are working towards greater use of technology to improve governance. These technologies include artificial intelligence, machine learning, data analysis, blockchain, etc.
How do you measure the performance of your team?
Avik Sarkar: NITI Aayog works with “Team India”, which comprises the entire cabinet at the centre and the leadership in the states. The combined effort of these parties can lead to real transformation for India.
Advice to Aspiring Data Scientists
According to you, what are the top skills, both technical and soft-skills that are needed for Data Analysts and Data Scientists?
Avik Sarkar: I often meet data analysts/scientists who get completely focussed on the technology aspect of the solution. It is very important to understand the end objective that the underlying analysis is aimed at. An interesting algorithm or a piece of optimized code is not of much value unless it successfully solves the problem on the ground.
Data Scientists need to master the art of storytelling with data. This is a very effective way to communicate the findings from the data, starting from the high level and moving to the finer insights – this helps in getting a larger group interested in the findings. Intuitive, interactive visualization can often help in the process of storytelling.
How much should aspiring data practitioners focus on working with messy, noisy data? What are the other areas in which they must build their expertise?
Avik Sarkar: Messy data will remain messy, and there is no easy answer to it. This is primarily because the noise in the data comes from many different sources, which are hard to know in advance. Over time, a data scientist gains experience in dealing with messy data, and this experience helps in reducing the processing time in future.
Special characters in text fields often lead to a large amount of noise. This type of noise is often due to differences between the text encodings used across different systems.
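A small Python sketch shows how such encoding noise arises and how one common case can be undone. It assumes a specific and frequent mismatch, text written as UTF-8 but read as Latin-1; real data may involve other encoding pairs, and the helper name `repair_utf8_read_as_latin1` is made up for this illustration.

```python
def repair_utf8_read_as_latin1(text):
    """Undo the common case where UTF-8 bytes were decoded as Latin-1.

    If the text does not fit that pattern, it is returned unchanged.
    """
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


original = "café"
# One system writes UTF-8, another reads the same bytes as Latin-1.
garbled = original.encode("utf-8").decode("latin-1")

print(garbled)                              # cafÃ©
print(repair_utf8_read_as_latin1(garbled))  # café
```

The `try`/`except` keeps the helper conservative: clean text, or text in scripts that Latin-1 cannot represent, passes through untouched instead of being damaged further.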
What is your advice for newbies, Data Science students or practitioners who are looking at building a career in Data Analytics industry?
- Programming and software skills – R, Python, SAS or Excel
- Visualization Tools
- Statistical foundation and applied knowledge
- Machine Learning
It is important to learn multiple tools, as one’s favourite tool might not always be available. Simple data manipulation using Python, Perl, R or shell scripting is often quite powerful for dealing with large amounts of data. Good knowledge of various data visualization tools, including the interactive ones, helps in delivering the data story. Once the data is cleaned and the visualization done, there are a large number of both open-source and commercial tools to run the algorithms. Depending on the complexity of the task in hand, a range of techniques can be adopted such that they capture the underlying relationships in the data.
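As one illustration of the simple data manipulation mentioned above, the sketch below totals a numeric column per group while reading a CSV one row at a time, so the file never has to fit in memory. The column names, the sample rows and the function name `total_by_key` are hypothetical, chosen only for this example.

```python
import csv
import io
from collections import defaultdict


def total_by_key(lines, key, value):
    """Sum the `value` column per distinct `key`, streaming row by row."""
    totals = defaultdict(float)
    for row in csv.DictReader(lines):
        raw = row[value].strip()
        if not raw:
            # Skip rows where the measurement is missing.
            continue
        totals[row[key].strip()] += float(raw)
    return dict(totals)


# A small in-memory stand-in for a large file; with real data,
# pass an open file object instead.
sample = io.StringIO("state,amount\nGoa,10\nKerala,5.5\nGoa, 2\nKerala,\n")
print(total_by_key(sample, "state", "amount"))
```

Only the standard library is used here, which fits the point that one's favourite tool might not always be available.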
What are the changing trends that you foresee in the field of Data Science and what do you recommend the current crop of data analysts do to keep pace?
Avik Sarkar: As the field is becoming very popular, we see a large number of people from different domains coming into data science. This provides an enormous opportunity for practitioners to cross-learn from the techniques used across these various domains and enrich the field. Data science is in its initial phases and will evolve a lot based on this cross-domain interaction.
Would you like to share a few words about the work we are doing at Digital Vidya in developing Data Analytics talent for the industry?
Data Science is more of an art with a few technology components at the core. Art has no boundaries, so the field of Data Science should be without boundaries too, learning from different domains, while the tools will evolve over time. The most important skill for a Data Scientist is to pick up new ways to analyse data and gain insights from it – this attitude of constant learning is very important.
To know more about Avik Sarkar, you can check out his LinkedIn profile.