Ramasubramanian Sundararajan graduated with an undergraduate degree in Information Systems and then enrolled in a doctoral programme at IIM Calcutta. At the time, his interest was in the area of computer networking; however, he found himself gravitating more towards data science after taking a course on neural networks and their applications to finance. That was back in 1998. Twenty years on, Ramasubramanian is still a neural net junkie!
His professional career started in 2003, with GE Global Research. He spent 11 years with GE, working on data science problems across multiple industry domains — finance, healthcare, energy, aviation etc. After that, he spent three years doing travel analytics at Sabre Airline Solutions. For the past year, Ramasubramanian Sundararajan has been working with Cartesian Consulting, heading up their AI practice.
What was the first data set you remember working with? What did you do with it?
Ramasubramanian Sundararajan: My first vivid memory is that of predicting US Dollar to Swiss Franc exchange rates using a combination of two neural networks. My classmate and I had read a paper on combining multiple predictive models using something called the Dempster-Shafer Theory of Evidence. It sounded quite fascinating, so we used it to build an exchange rate predictor. It was, more or less as expected, an utter and complete failure. Honestly, I don’t think we knew what we were doing. But the experience of coding a neural network from scratch was very educational.
Was there a specific “aha” moment when you realized the power of data?
Ramasubramanian Sundararajan: My “aha” moment came when I read a book, actually. It was called An Introduction to Computational Learning Theory, and the preface began with the line, “Computational learning theory is a tentative attempt to build a mathematical model of the human cognitive process”. The book laid out some of the theoretical foundations of machine learning, or learning from data, the concept that underpins much of what we refer to today as AI. It was while reading the book that it really struck home that so much of what we understand as human beings is a matter of learning patterns from data.
How do you stay updated on the latest trends in Data Analytics? Which Data Analytics resources (e.g. blogs, websites, apps) do you visit regularly?
Ramasubramanian Sundararajan: I subscribe to some professional groups and follow some accounts on LinkedIn and Twitter — this is often where I come across articles and links to recent work across the world. A lot of the rest comes from discussions with peers, attending conferences etc.
Share the names of 3 people/publications/research that you follow in the field of Data Science or Big Data Analytics.
Ramasubramanian Sundararajan: Too many to list, actually! It used to be easier back when there were only a few major avenues where people published their research. Just read a couple of academic journals and browse through a few major conference proceedings and you got a sense of what people were working on. Now, everywhere you turn, you find someone doing something interesting.
Team, Skills and Tools
Which are your favorite Data Analytics tools that you use in your job, and which other tools are widely used in your team?
Ramasubramanian Sundararajan: Much of what we do today, by way of data manipulation and model building, happens in either R or Python. The choice depends on the nature of the problem, availability of libraries and contributed code, and the comfort level of the programmer. SQL for basic data access and comfort with some visualization library or software are key, and are more or less taken as a given.
What are the different roles and skills within your data team?
Ramasubramanian Sundararajan: I head a lab whose principal mandate is to help build AI-enabled products and solutions, and my team is made up of people with great skills in building machine learning models for a variety of problems and data types (text, images etc). There are also a number of people within the organization with a very good background in AI/ML/statistics, so we work together on a number of problems. We also collaborate with other teams within the organization that have strong engineering skills, because good engineering is a must-have for complex models to be deployed at scale.
Could you describe some examples of the kinds of problems your team is solving this year?
Ramasubramanian Sundararajan: Our big challenge this year is in the CRM space — helping organizations communicate with their customers in a relevant and timely manner. This includes building effective recommender systems, designing good CRM governance systems, and optimizing the content of marketing communications.
How do you measure the performance of your team?
Ramasubramanian Sundararajan: Their ability to do the right thing, and do the thing right. Doing the right thing means identifying the right problem to solve, understanding its business impact and knowing how to measure success. Doing the thing right means picking the right tools for the job, knowing the difference between cool and useful, seeing roadblocks ahead of time, and building things that will last.
Industry Readiness for Data Science
Are industries looking to understand what they can do with data? Do they have the required data in place?
Ramasubramanian Sundararajan: More and more organizations are getting the data infrastructure in place. There is also an increasing awareness of the value of unstructured, hitherto unexplored data sources (free text, images etc). The extent to which the power of this data is harnessed through analytics varies across organizations. There is considerable curiosity about things like AI, but marrying complexity with business value is still a challenge in most places.
Which are the top 3 problems that are top of mind in Data Science, whether by industry or by technology area?
Ramasubramanian Sundararajan: There are so many of them that it’s tough to pick just three. But here are a few, off the top of my head:
- Combining structured and unstructured data to get insights. It’s not just organizations that have silos; it’s also models. Insights gleaned from text repositories within organizations can be contextualized using structured data residing in databases, and vice-versa. However, this happens a lot less than one might imagine.
- Improving the quality of healthcare through machine learning. This includes computer-aided diagnostics, better algorithms to sift through patient data of various kinds to derive useful insights etc.
- Analytics for the Internet of Things. Pretty soon, sensor data is going to dominate all others in terms of sheer volume and velocity. There’s a lot to learn from it, and we’ve barely scratched the surface.
Advice to Aspiring Data Scientists
In your view, what are the top skills, both technical and soft, that Data Analysts and Data Scientists need?
Ramasubramanian Sundararajan: It is said that data scientists require three sets of skills: modeling, programming and domain knowledge. All three are equally important in order to be effective. The key, in my opinion, is getting the basics right in all three: a sound knowledge of basic statistics, a good understanding of algorithms, the ability to translate a business problem into the right technical problem, and above all, the ability to ask critical questions to understand the value of analytics in a given business setting. A lot of data scientists tend to try and cram as many tools as possible into their toolkit, but the true test of their effectiveness is really their ability to pull out the right tool for the right occasion.
As for soft skills, there are many. They include listening well, learning to ask why one should solve a problem before answering how to solve it, learning to communicate well so as to convince decision makers about complexity, and so on.
How much focus should aspiring data practitioners put on working with messy, noisy data? What are the other areas that they must build their expertise in?
Ramasubramanian Sundararajan: The knee-jerk answer is that aspiring data scientists should spend a lot of time with messy data, so that they learn how to deal with real world problems effectively. It is indeed required. However, it is a good idea to learn the various modeling techniques first using simple, clean datasets so that one gets the hang of the techniques and their intuition, and then graduate to messier datasets where things are not so obvious.
What are the changing trends that you foresee in the field of Data Science and what do you recommend the current crop of data analysts do to keep pace?
Ramasubramanian Sundararajan: In many ways, it has become easier to introduce complexity in the kind of models we build. We have easy access to libraries that automate a lot of what we used to have to code manually, pre-trained models and transfer learning mechanisms that allow us to learn implicitly from problems and datasets other than our own, and mechanisms to parallelize large problems with relative ease. But complexity always comes with a big bill: the ease of introducing complexity in models only serves to emphasize the hidden costs of complex systems.
For data scientists, this poses some interesting challenges. There’s a term one finds in the neural network literature: the stability-plasticity dilemma. Simply put, intelligent systems need to be plastic enough to adapt to new information but stable enough not to forget the old information entirely. This is the dilemma data scientists face too. They need to stay abreast of recent developments and yet assimilate the new knowledge organically and not get carried away, to use automation tools wherever possible and yet find inventive ways to use domain knowledge and tradecraft to improve upon their results.
Would you like to share a few words about the work we are doing at Digital Vidya in developing Data Analytics Talent for the industry?
Ramasubramanian Sundararajan: As the availability of data, and of tools to mine actionable insights from it, increases, having data scientists as well as data science-aware decision makers will become the norm at all organizations. But this whole edifice works only if the talent pipeline is strong, and this is where I feel organizations like Digital Vidya have an important role to play.
To know more about Ramasubramanian Sundararajan, you can check out his LinkedIn.