Below is a transcript of a wonderful discussion between Shweta Gupta, the Vice President of Technology Vertical, and Kunal Jain, the Founder of a famous blog called Analytics Vidhya.
Shweta Gupta: Kunal, what I know of you is from your profile really: you did aerospace engineering, then somewhere you dived into data analytics, and you founded Analytics Vidhya. So, can you help me bridge this journey, from aerospace engineering to data science?
Kunal: Yeah, to be honest, when you look at it in the rear-view, it looks disjointed but to be honest, computer science and coding was always enticing. I remember making a software for my dad back when I was in class 11. I was studying C & C++ back then. So, computer coding and general computer nerdiness were always there at least to some extent. But as a part of my degree, I studied aerospace engineering. So, my bachelor’s and master’s both were in aerospace engineering.
And then a change happened when I was passing out of college. The career options which I had in aerospace at that time were kind of limiting, especially in India, which essentially meant that to pursue my career in aerospace I would need to go out of India and do further studies, which I wasn’t prepared for. So, I was looking for options and then Capital One came to the campus at that time. When I inquired about the job profile, it looked interesting. When I talked to people, they discussed some of the things which I had done as part of my master’s thesis, some of the underlying algorithms. Within aerospace, my specialization was in computational fluid dynamics, and some of the algorithms which I was working on were being used at Capital One as well. It looked interesting that you’re not only using your mathematical skills, but you’re also making a business impact. That’s how I joined Capital One back in 2006, and that was when I got exposed to analytics and data science as we call it today. So, I spent about 3 and 1/2 years working in Capital One across roles from risk management and customer management to customer acquisition.
Then in 2010, I decided to come back to India because I wanted to be in India in the long term. At that time, I came across an opportunity with Aviva Life Insurance. So, I joined Aviva. We founded the team, which then scaled from a single person to a 20-member team by the time I left Aviva. But what happened during this period was that the need for a data science ecosystem kept coming back again and again. Just to give you an example, I was building models for, let’s say, customer retention. One of my friends called me one day and asked if I had included any macroeconomic variables in persistency modelling, and I said no, but that sounds like an interesting idea. People were actually having these discussions in their micro circles; I would call up my batch mates when I needed any help. And that need kept coming back again and again in various shapes and forms. Then in 2013, I said that while I don’t know how to solve the larger problem, let me start sharing my perspective on a blog, and that’s how Analytics Vidhya started.
In April 2013, I booked the domain and I started the blog based on how I was seeing analytics evolving at that time. This was a very early stage for analytics and data science as we know it today. At that time, machine learning was not as big a focus area as it is today, but I could still see a lot of impact. I continued Analytics Vidhya along with my job for about a year before I quit Aviva and started working on Analytics Vidhya full-time.
Shweta Gupta: I think what I really caught very well in your answer was that the business requirements and the needs were driving people to discuss and you could see that potential. So, it was in McKinsey’s report that data science is hard.
Kunal: It had nothing to do with McKinsey’s report and to be honest, till about a year, or a year and a half into Analytics Vidhya, I wasn’t even sure how big the market is or how big the community is. I very clearly remember my last day at Aviva and the first time we got some 8,000 visits a month on the blog. One of my colleagues said that it’s good to have a personal blog, but how big do you think this can become? At that time my answer was that if this becomes 4x of what it is today, I will be happy. So, I did not know how big it would be or how much impact it would make. But what I could see was that people who were following us were really liking what we were doing. The kind of content we were putting out there was really helpful to people. So, I think that was the main reason. I mean it was more of God’s decision.
Shweta Gupta: God’s decision for a data scientist. That’s a good statement to start with. Considering that it was initially focused on sharing and content, how do you see Analytics Vidhya today? What are the main offerings? What are your top three, if you could help summarize that?
Kunal: What we are building is basically a knowledge platform for analytics and data science professionals. The idea is to create a community where people come for sharing knowledge, and while they share knowledge or engage on the platform, they create knowledge or gain knowledge. In terms of top priorities, clearly building community is the core of whatever we are doing, and the community today has multiple initiatives and multiple engagement points. We have a blog which gets very viral, we have discussion portals, we have meetups, we have seminars, we have workshops, and later this year we are doing a conference. So, really building a community of people and addressing their needs across the board is what Analytics Vidhya stands for. Now within this, there are some initiatives which we see really working well. Hackathons, for example, get a lot of attention from the community. I think it’s a brilliant way to apply and test yourself and learn in the process.
Hackathons get a lot of attention, and the conference which is coming up later this year is going to be a unique conference in itself in India. So, that’s how the portal is shaping up, but the way I would look at it in the long term is basically helping build a professional identity for every data scientist, so tomorrow you can go out and say this is how I look on Analytics Vidhya. These are the competitions I have participated in, these are the articles I know or I’ve read, and these are the webinars or events which I have been part of. So, that starts telling people who you are as a data scientist, and then it should also help us make things a lot more personalized for the community.
Shweta Gupta: You have a much broader ecosystem as your objective, because clearly, having something based out of our country and something more local is inspiring for some people in the industry. I think it makes knowledge sharing and a contextual knowledge base much easier. Otherwise, if everybody is only focusing on the Kaggles of the world, then all the competitions there, all the data sets there come from the mature markets, from the Western world, so to say.
Kunal: That’s cool. In fact, the way we look at competitions is actually very different compared to how Kaggle looks at them. If you look at a typical Kaggle competition, there would be a months-long competition with huge prize money and a really difficult research problem out there. This is great for the top data scientists, which is great for a mature market, but in India or in developing markets, things are very different. Most of the companies in India would be happy to have a quick model in a few days and then go out and implement that. That’s why our hackathons tend to be of short duration. Hackathons get a lot more intense, with a lot more activity going on around them.
Shweta Gupta: If you can talk about one of your hackathons and particularly maybe if you’re okay with using the example of Fractal Analytics because the job descriptions were listed out, the hackathon was one.
Kunal: Let me start with how the hackathon as a product stands in the industry and what some of the really unique aspects of that product are. If you look at the way companies used to interact with people traditionally, for example, if a company wants to hire people, they would go through traditional recruitment channels. So, they’ll have their own consultants finding people. That’s a really expensive and time-consuming process. Then a lot of times the people who are searching don’t really understand data science; they don’t really appreciate the difference between various technologies. It becomes really difficult to hire people through the traditional channels, especially in markets like analytics and data science. What hackathons do is basically turn this model completely around. Hiring is just one use case, but hackathons have multiple use cases, so you can use them for crowdsourcing solutions, for example. What they do is open up a platform at a huge scale, so thousands of people can simultaneously solve a problem or compete for a recruitment position, and all of that happens within a few days. For example, in the case of Fractal’s hiring and some of the previous hiring hackathons, we have seen thousands of people competing for various positions and then getting recruited within a month, compared to the normal hiring cycles. So, that’s where you say you’re doing skill-based hiring, you’re doing a lot of hiring in a very condensed form, or if you’re solving problems, you get the best solution out of the best data scientists across the globe. So as a product, hackathons turn a lot of things completely around and it’s a unique proposition. For people who are participating, they can have much more meaningful discussions. The companies with whom they are interacting are convinced about their skill set by the time they go and interact.
They can spend their time understanding what the role is, what the team does, what are some of the problems they are facing rather than going through the traditional technical rounds of interviews.
So, that’s where we see a lot of traction from our clients, from our community. In terms of how hackathons typically happen, for example in the case of Fractal, we understood what the requirements were. They posted on the website the positions for which the company was hiring, and interested people could then register their interest. On the day of the competition, they get a problem statement, and all of these problem statements are from live industry problems, so all of these are real-life use cases which have been faced. Actual data scientists would be working on these problems in their day-to-day roles. That’s where the beauty of the thing comes in: when people look at the solutions, they can relate to what the person has done. Once the competition starts, people can download the data and read the problem statement, and then they start uploading their solutions and trying out various tools and techniques. In terms of tools and techniques, the stage is wide open. People can use any tool, be it R, be it Python, be it SAS. People can use any technique. People use a range of techniques, starting from simple linear and logistic regression models all the way up to boosting algorithms like XGBoost and CatBoost, or now even neural networks and deep learning. You see a whole range of solutions coming out, and in some cases there are solutions which are very specific to industry problems.
This kind of exposure, even if you are someone new to the industry, is something which is really unique. If you think about it as an end user, I’m working on the same problem on which thousands of other data scientists are working, and the beauty of the community setup is that people share their solutions very openly. For example, if I build a solution and I rank 100 on the leaderboard, I would post it on the Slack channel and say that this is a script which will take you to rank 100, go on and improve on this further. You get this pooling. I have seen people becoming better data scientists just by the fact that they religiously come to these hackathons. People come, and in a period of two or three days they learn so much which they wouldn’t have learned in their jobs or through a course. In that sense, hackathons provide this really unique opportunity: irrespective of whether you win a competition or not, there is a tremendous amount of learning which happens in a hackathon. Over a period of two days, typically, we see the community exchanging anywhere between ten and fifteen thousand messages among themselves. We get on average eight to ten solutions per participant. People try out various models, various tools and techniques, and by the end of the hackathon they mark what they think is their best solution. The platform then automatically scores the marked solutions based on their accuracy or some criteria which are defined up front. It puts out a leaderboard in real time as soon as they finish, showing where you stand, and thereafter the recruitment process can start immediately. That’s how a typical hackathon works.
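To make the scoring step concrete, here is a minimal sketch of how a platform could score each participant's marked final solution against a held-out answer key and rank the results. It is only an illustration: the function names, the participant names and the choice of plain accuracy as the metric are assumptions, not a description of how the Analytics Vidhya platform is actually built.

```python
from typing import Dict, List, Tuple


def accuracy(predicted: List[int], actual: List[int]) -> float:
    """Fraction of predictions that match the held-out labels."""
    if len(predicted) != len(actual):
        raise ValueError("submission length does not match the answer key")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


def build_leaderboard(
    final_submissions: Dict[str, List[int]], answer_key: List[int]
) -> List[Tuple[int, str, float]]:
    """Score each marked final submission and rank by accuracy, best first."""
    scores = {
        name: accuracy(preds, answer_key)
        for name, preds in final_submissions.items()
    }
    # Sort by descending score; break ties alphabetically by name.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, name, score) for rank, (name, score) in enumerate(ranked, start=1)]


# Hypothetical answer key and submissions, for illustration only.
answer_key = [1, 0, 1, 1, 0]
final_submissions = {
    "asha": [1, 0, 1, 0, 0],   # 4 of 5 correct
    "ravi": [1, 0, 1, 1, 0],   # 5 of 5 correct
    "meera": [0, 0, 0, 1, 1],  # 2 of 5 correct
}
leaderboard = build_leaderboard(final_submissions, answer_key)
for rank, name, score in leaderboard:
    print(rank, name, round(score, 2))
```

A real platform would typically use whatever metric the problem statement defines up front, which need not be accuracy.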
If you want to participate in a hackathon, I think the best thing to do is to start with a few practice problems which are there on Analytics Vidhya. They give you a flavour of these same problems but in a non-competitive environment. You’ll find tutorials for these problems published on the blog. You’ll find a benchmark script. That’s a good place to start, and a good place to understand how you improve your models or what some of the tools and techniques are which can be used to make your models better, irrespective of whether you use Python or SAS or any other tool. It’s a good exercise to practice on a few known problems or practice problems before you get into a competitive or prize-money hackathon. Once you’ve done that, if you are new, you should again try and see what other people are doing on the discussion portals; people typically share the tools and techniques they used. Try and learn as much as you can from your peers or the best participants out there.
Another thing which we do is, post a competition, we release the solutions to the community, so that the toppers’ solutions are made available and you can see how the best data scientists approached the same problem. And that’s again a huge learning. You not only see where you stand, you also see where some of the best people stand and what kind of feature engineering they do, what kind of transformations they do, how they went about missing value imputation, and what kind of ensemble modelling techniques they used to make their solutions better. That’s the best way to learn, and a lot of times people also come across techniques which they were not aware of. But you can always go back and search for algorithms today. You can always go back and see what that algorithm means and where it can be applied. That’s where the entire hackathon model comes together brilliantly. The kind of results which I’ve seen, the kind of impact I see on people who participate regularly, it’s just tremendous.
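As one small illustration of the imputation step mentioned above, the sketch below fills missing entries in a numeric column with the mean of the observed values. This is just one common, simple choice and not necessarily what any winning solution used; the function name and the sample column are made up for the example.

```python
from statistics import mean
from typing import List, Optional


def impute_with_mean(values: List[Optional[float]]) -> List[float]:
    """Replace None entries with the mean of the observed (non-None) values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical column with two missing entries.
ages = [25, None, 35, 30, None]
imputed_ages = impute_with_mean(ages)
print(imputed_ages)
```

In practice participants often compare several strategies (median, per-segment means, or a model-based fill) and keep whichever improves their validation score.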
Shweta Gupta: It seems that there is a great potential to create heroes out of these hackathons.
Kunal: Yes, in fact, because we had seen so many, we actually published a few stories on Analytics Vidhya where people started by following us or were doing some courses in parallel, and through the experience they had across multiple hackathons, the way their careers transformed was indeed phenomenal.
Shweta Gupta: Okay, well I think persistence is what you’re bringing out, as the hackathons are about collaborative learning. Reading all those people’s stuff, writing your own; it’s okay that you’re not being competitive right up front, but participate instead of holding back and waiting for one day to start writing code. Then, post hackathon, pick up the solutions, pick up those best practices, read, and have a certain kind of goal for when you are going to be more active in hackathons. That’s something which people can plan for their individual careers based on the pace they want.
Kunal: That’s correct. At times, initially it might feel intimidating when you come to a hackathon for the first time, but if you come for two to three hackathons, you start getting the knack of things. You understand how people quickly come up with solutions and what some of the simple ways to create effective solutions are. For example, in a lot of problems, you can come out with a very simple solution by just simple averaging or averaging over segments, and that’s a really smart solution to come out with in 5 or 10 minutes. Then you can always improve on that going further. I think typically it takes two to three hackathons for people to get the hang of it, but once they cross that stage, they start enjoying hackathons. They start enjoying and looking for these pieces of insight or nuggets, and a lot of times there are really interesting outcomes. For example, one of the participants had created more than a thousand features in a period of seven days. There were more than a thousand variables which the person was considering in his predictive model. The person who won got a better solution by just using 14 variables. So that’s the beauty.
And it also tells you it’s not just about applying more machine power, it’s not just about putting brute power to work. It’s a lot about how you go about thinking about the process, how you develop new features, and how much of the business you understand while participating in or working on a particular project.
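The "averaging over segments" baseline mentioned above can be sketched in a few lines. This is a minimal illustration assuming a regression-style target; the column names ("city", "sales") and the data are hypothetical. The prediction for a row is the mean target of its segment in the training data, falling back to the overall mean for segments not seen in training.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def fit_segment_means(
    rows: List[dict], segment_key: str, target_key: str
) -> Tuple[Dict[str, float], float]:
    """Return the mean target per segment and the overall mean target."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    grand_total = 0.0
    for row in rows:
        segment = row[segment_key]
        totals[segment] += row[target_key]
        counts[segment] += 1
        grand_total += row[target_key]
    segment_means = {seg: totals[seg] / counts[seg] for seg in totals}
    return segment_means, grand_total / len(rows)


def predict(row: dict, segment_means: Dict[str, float], overall: float, segment_key: str) -> float:
    """Predict the segment mean, or the overall mean for an unseen segment."""
    return segment_means.get(row[segment_key], overall)


# Hypothetical training data, for illustration only.
train = [
    {"city": "Delhi", "sales": 100.0},
    {"city": "Delhi", "sales": 120.0},
    {"city": "Mumbai", "sales": 200.0},
    {"city": "Mumbai", "sales": 180.0},
]
segment_means, overall_mean = fit_segment_means(train, "city", "sales")
print(predict({"city": "Delhi"}, segment_means, overall_mean, "city"))
print(predict({"city": "Pune"}, segment_means, overall_mean, "city"))
```

A baseline like this gives a first leaderboard score to improve on, which is the point being made about starting simple.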
Shweta Gupta: I think the key message for you is, when you come out of the first one, don’t kick yourself. It’s expected that it’s going to be hard for you, it is going to be intimidating, but don’t let it pull you down. Instead, you move on, that’s how it is, and be persistent. Go for a second one, the third, keep reading, and that’s when you will start seeing yourself participating. I think that’s a very important message because I interact with a lot of folks who are learning or listening somewhere, and it matters for them to have this confidence that one needs to keep learning and start participating in hackathons. Do not wait to go and learn all your advanced machine learning, finish your deep learning, finish everything, and only then think you will participate in a hackathon. That clearly, Kunal, is not what we ask people to do.
Kunal: Yes, in fact, to be honest, you can come and participate in a hackathon if you know linear regression, logistic regression and decision trees. If you know those three techniques, you are good to compete in any competition, have a few solutions out there and then see how the process works.
Shweta Gupta: Wow, that’s a very succinct summary of the kind of skills you can bring and be confident with. I’ll shift gears here, because let’s get a few more insights from you since you work with the industry. Over the years, things have been stabilizing: businesses now know what the space of big data and analytics is and want to solve those problems, whereas in the earlier years they were still trying to figure out what to do with analytics. Now, how do you see the space? What are the different kinds of companies? Where is the actual work happening?
Kunal: Broadly as I see it, I can look at the industry in four different parts. There are four kinds of companies which you can find. I went back to my clustering days. The first set of companies are the captive units. These are traditional analytics units where there is a product, which may or may not be an analytics product, but they are applying analytics to improve their product. For example, a Flipkart or an Amazon has a product which is a marketplace, but there is a lot of analytics which goes into making the platform efficient. How do you enable discovery of new products? How do you make the right recommendations to the right customers? Similarly, take a bank: which customers are risky, which customers are less risky, what is the right interest rate to lend at for various customers? Those are questions which again involve a lot of analytics. For insurance, for example at Aviva, the biggest question was which customers are likely to give us a second premium or a third premium, so that our products become more profitable. So there are these companies with captive units where people work, and these companies have access to a lot of customer data which they can use to make their products more meaningful.
The second set of companies today are the service companies, which offer analytics services. Some of the companies which fall into category one, which I mentioned, do not have the skills and the expertise to solve all the problems themselves. They, in turn, rely on these category two companies which specialize in analytics. These are companies like, for example, a Mu Sigma or a Fractal, where there are people with that specialized skill set who can solve the specific problem and then help the companies make their products better. So there are these analytics services, and these companies are focused on analytics. For example, Fractal started back in 2001 and Mu Sigma started back in 2006. They have been there for some time. Again, the spectrum there varies from companies with a few people, two or three people on the lower side, all the way up to, let’s say, three thousand to four thousand people. That’s the second group of companies.
The third one is the established large IT companies which traditionally offered IT services, but now, with analytics coming in or demand coming in from their clients, they are reskilling their workforces or looking at analytics as their focus area. Here you would find the likes of TCS and Wipro, all the way to management consulting. I was talking to someone at McKinsey a few days back and they were telling me how McKinsey has started looking at itself more as an analytics company than as a management consulting company. So there are these companies who had a different focus previously but now look at analytics as a focus area. And then finally, the fourth category, I would say, is startups or people who are building data products or data services. For example, there are companies who are building APIs for solving very specific problems. I was again talking to companies which are providing really interesting healthcare solutions or solutions to logistics problems based on some proprietary algorithms they’ve created. This healthcare company has created small pads which can go on your body and start capturing a lot of data about your heart rate and various other parameters, and then they use that to provide you recommendations about your health.
These are broadly the four kinds of companies into which I would classify the industry. As for the career opportunities in them, I personally like the startup space a lot because you really are changing a lot of things. There is a lot of exciting work happening and I think it’s a great time to be in an analytics startup at this stage. But the same holds even if you are in an MNC trying to change its products. We have seen companies like Google, Facebook and Amazon come out and say that they are rethinking their entire lines of business around AI and machine learning. There is again a huge demand in the industry with scarce supply.