Shweta Gupta: This brings us to the next segment, because I think we understand the industry, and we also understand how the Analytics Vidhya platform is enabling this ecosystem. Now I would like us to focus on the industry I am in, the education industry, because we have the requirements: we have an industry trying to solve problems, and we have the ecosystem. There are a lot of people wanting to upskill themselves, whether they are still undergraduates, pursuing their master's, or anywhere from 5, 10, 15, even 20 years of experience, because it's never a bad time to learn and skill yourself in a hot area, especially if you enjoy working with data. If that's something you have enjoyed but just haven't looked at as your pure-play career, here's an opportunity. And then there are these companies looking at how to really get people to do that, to bridge the gap between their aspirations and actually solving the problems. I want to hear your more objective view on this: how do you see this industry, and how do you see it differentiating for this talent?
Kunal: Sure. As I look at the analytics training or reskilling industry today, I again go back to segments. I see two broad kinds of courses, or two ways people can look at reskilling. There are long-term courses offered by really reputed institutes. You see courses from ISB which are year-long, which require a significant investment but at the same time run over a long duration. Depending on the course, you typically attend classes one weekend every month, but generally you'll work over the weekends. That's one format, and you have to be really sure about making that kind of investment into reskilling; given the investment it requires, you need to be able to arrange for it before going in that direction. That's one option. The other option is relatively short-duration courses, anywhere between, let's say, three months on the lower side and six months on the higher side, with a lot of focus on solving specific needs or specific areas.
For example, a course offering data science using, let's say, R or Excel or Python, where you have taken a call that there is a particular language you want to use, and a set of areas you will focus on. There you see multiple players coming up. Some of them, for example Digital Vidya, bring a lot of expertise from their experience in digital marketing trainings, and would look at making the course very hands-on and focused on industry needs. There are various people focusing on this area, and again the idea is to run short-term courses against very focused, identified needs, with upskilling of people as the end goal. Broadly, for the end user, the way I recommend choosing is this: if you are very early in your decision making, whether you want to, let's say, test out analytics, or you're not sure it's one of the areas you want to explore, you might look at some of the openly available courses or, let's say, YouTube channels, and then see whether it is what you actually want.
If you've crossed that stage, I think the short-duration courses can be a good fit, where you can start identifying your needs or career path. You can say, let me start with a particular language, R or Python or SAS, and then build my expertise around that. If you have done that, or are absolutely sure that analytics is something you need, you can also look at some of the longer-term investments. But I typically advise people against making long-term investments without being sure that they want to make a career in analytics. That's broadly the way I look at the industry. Overall, if you just take a step back and look at the industry, there is still a lot of unmet demand, especially so if you have a high-quality offering; that's something which is always up for grabs. A lot of the clients we work with on the jobs portal typically have hiring cycles anywhere between six and twelve months: if a position becomes vacant today, it takes them anywhere between six and twelve months to fill it through the traditional recruitment channels. If you are really good at what you do, and you're continuously learning, re-learning and reskilling yourself, there is a huge unmet demand that you can look to.
Shweta Gupta: Let me follow this up with a question which, if I don't ask you, is nevertheless going to come up. When it comes to the core languages, SAS, R and Python, which one should people pick up, especially if they're focusing on the programming aspect of it? There's this huge chunk of people, as you'd mentioned, who have the data and need to solve problems; I tell them there is nothing as powerful as knowing how to leverage Excel, advanced Excel, and basic statistics on your data. But once somebody decides to go deeper, into machine learning at some point, they need to pick up one of these. Everybody asks which one. Let's hear it from you. I have been answering it in a lot of different ways, I believe. You tell us.
Kunal: For them, the question of a new language typically doesn't come up. The language question typically comes when you want to go on and become a data scientist or a business analyst in the domain. There again, what I say to people is: no company will hire you because you know multiple languages. Companies hire you because they want to solve problems using one of these tools. So there is no point in trying to learn everything. The first thing you should do is take one and become good at it. Once you've done that, you should then try learning more techniques rather than more tools. That's the background. Now within this, if your target jobs, or the areas where you want to be present, are, let's say, MNCs, the banking industry, or the clinical research industry, these are industries where SAS has traditionally been the market leader, and it will probably stay the tool they use for the near future. None of the banks is going to put SAS out of production in, let's say, their risk-scoring mechanisms. You will have SAS there in these areas, and SAS is really efficient.
Then there are R and Python. Both of them have a really strong community, a strong ecosystem, and a lot of functionality to offer; a new modelling technique will typically come out first in R and Python and probably only after a few years in SAS. Going forward, R and Python are clearly the tools of the future, because they provide a rich ecosystem, a lot of flexibility and, more importantly, you don't need to invest significantly in procuring them. If you're starting afresh, unless you're dead sure that you want a role in banking, or you are in a bank and want to upskill yourself into the analytics team, I would say R and Python are the way to go; choose SAS only if you are very clear about the sectors and companies you want to go to. Now, within R and Python: if you're from a statistical background, R obviously becomes the natural way to go, because a lot of research has traditionally been done there; if you are from a coding background, Python becomes the easier way, because it's faster to learn if you come from an object-oriented programming background and can pick it up easily.
My personal bias leans slightly towards Python, because I see that it provides a production-ready environment. It provides integration from a web development platform to a machine learning platform, especially if you're building products on the web. And lately a lot of deep learning frameworks and work have happened on Python rather than on R. My personal preference is Python, but both of them provide a very rich ecosystem in terms of functionality. That's the brief answer.
Shweta Gupta: Finally, if you can talk about the conference, and then we'll move to the questions, because I've seen some very interesting topics at the conference you're hosting in November. If you can tell us a little bit about it.
Kunal: Yes, we are really excited to be doing this conference in November. For those of you who don't know, it's happening from the 9th to the 11th of November in Bangalore, at the MLR Convention Center. The idea behind doing this conference was that I've been to multiple conferences myself, and a lot of the time these conferences end up being very business-focused; as a practitioner, I find very little value. Typically, if by the end of a conference you have found three sessions where you learned something new, that's a great outcome. That's how I've seen most conferences: they talk about analytics at 50,000 feet, and you see similar use cases being shared. So we said we should create a conference which is very practitioner-focused, focused on people who do data science and who want to learn about various tools, techniques and hands-on things in data science. You'll see that the entire conference is shaped in that manner. Anyone who is taking a session is a practising data scientist himself, or is leading a team of data scientists, and has been doing data science for a long time. You'll find top data scientists from India and from across the globe there.
Dr. Toby Boone, one of the biggest influencers in analytics and data science, is coming to the conference, and the top data scientists and top performers from the Analytics Vidhya platform are all coming, either as speakers or as attendees. As part of this conference, we are doing five workshops, and we are also doing ten hack sessions. A hack session is basically a live demonstration of problem solving: someone takes a problem on stage and shows how that problem gets solved in real life by a data scientist. Really hands-on. It's a conference for people who want to learn; I don't think there can be a better way to come and learn multiple industry applications, see what people are doing, and develop a network of people who are doing data science. Tomorrow, when you come across a question, when you want to solve a particular problem or wonder what people do when they face a problem like this, you can leverage that network and see some of those practices getting shared. That's the thinking behind it, and it's panning out really well. I can see the vision with which we started getting fulfilled, and I can almost visualize how things are shaping up. We've got some really great partners and sponsors as part of the summit, so I'm really looking forward to it.
Shweta Gupta: What we'll do now is shift to some questions from our patrons. I asked some of our folks to put down their questions if they had something for you, and I'm going to read maybe three of them. Let's start. Prasad asks: what statistics are usually used in data science?
Kunal: Statistics is one of the key areas you should learn and understand, especially in your early days in analytics. As a bare requirement, I would say you should be very comfortable with descriptive statistics: the mean, median and mode, their definitions, and how to compute them. And you should be comfortable with basic inferential stats: how do you prove or disprove a particular hypothesis, what are the various kinds of tests you need to run to prove or disprove those hypotheses, and what are the errors? What does a type 1 error mean, what does a type 2 error mean? Why are there scenarios where one type of error is more costly than the other, and what are some of the trade-offs? Those are the requirements you need to have, more so from the practical application side of things.
You may not know the exact derivation or the exact theory behind it; you can always find the formulas on the internet nowadays, but you should understand what to apply, when, and in what scenarios. If you know this much, that's good enough to start, and in fact good enough to understand most things. If you need anything more specific, you can always come back and look at the texts and lectures. For advanced work you'll probably need survival modelling, or you might need Hidden Markov models, but not at the start. As long as you are comfortable with descriptive stats and basic inferential stats, you're good to start.
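To make the "descriptive plus basic inferential stats" baseline concrete, here is a minimal Python sketch. The sample numbers and the hypothesised mean of 50 are purely illustrative (they are not from the conversation): it computes the descriptive summaries mentioned above, then runs a simple one-sample z-test as an example of proving or disproving a hypothesis.

```python
import statistics as st

# Hypothetical sample, e.g. daily sales figures (made-up data)
sample = [52, 48, 55, 60, 47, 53, 58, 50, 49, 54]

# Descriptive statistics: summarize the data
mean = st.mean(sample)        # central tendency
median = st.median(sample)
stdev = st.stdev(sample)      # spread (sample standard deviation)

# Basic inferential statistics: one-sample z-test (normal approximation).
# H0: the true mean is 50. A small p-value lets us reject H0; rejecting
# a true H0 is a type 1 error (false positive), while failing to reject
# a false H0 is a type 2 error (false negative).
mu0 = 50
z = (mean - mu0) / (stdev / len(sample) ** 0.5)
p_value = 2 * (1 - st.NormalDist().cdf(abs(z)))   # two-sided test

print(mean, median, round(p_value, 3))
```

The same logic extends to the tests mentioned in the answer: pick the statistic appropriate to the data, compute a p-value, and weigh the cost of each error type when choosing the significance threshold.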
Shweta Gupta: This is from Nandeep. What is the difference between working at a merchandising company which has a data analytics unit vis-a-vis working at a company which totally deals with data analytics?
Kunal: Sure, let me explain this. Let's say you are working for Tesco, which is basically the equivalent of Big Bazaar in the UK, the biggest supermarket. Tesco would have its own data about how customers shop: what the customers are buying, where they are picking things from, which things people buy together, how frequently they buy things. If I have bought a 5 kg pack today, when is the next time I buy? That gives you some indication of my consumption patterns. You could be in a company which is using all that information, and it could be any field, not just merchandising, to solve specific problems. In this case, you would ask: what should store placement be like? Which products should I put near each other so that they sell more together? If you own that data and you are trying to answer those questions yourself, that's category one, as I mentioned. But a lot of times, because traditionally these companies were set up by people who didn't have analytics capabilities or the right expertise, there are companies who come in with that expertise and solve these problems. They come in as a consultancy or a consultant and give you very specific recommendations which can be applied in the field, and that's the second role. In the second set of companies, you start with a client problem, build a solution around it, hand the solution over to the client, and then typically move on to the next project.
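The "which products do people buy together" question above can be sketched in a few lines of plain Python. The baskets below are invented for illustration (this is not Tesco's actual data or method); counting how often pairs of products co-occur in the same basket is the basic signal behind placement decisions like these.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transaction data: each set is one customer's basket
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]

# Count how often each pair of products appears in the same basket
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# The most frequent pairs are candidates for shelf placement together
top_pair, count = pair_counts.most_common(1)[0]
print(top_pair, count)
```

In practice a retailer would work with millions of baskets and use association-rule measures (support, confidence, lift) rather than raw counts, but the idea is the same: the data owner can run this analysis in-house (category one), or a consultancy can run it and hand back placement recommendations (category two).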
Shweta Gupta: The way I would put it: in the first, the data is yours, you own the data and you act on it; in the second, you have somebody else's data, and you deliver insights, outcomes and predictions on it. I'll move to one of the questions, from Tata; sorry if I don't get your pronunciation right. He asks: with options like hackathons, why do businesses look at competitions for solutions instead of taking on full-time employees?
Kunal: Yes, that's actually true. Companies are using hackathons for both: finding the right talent and finding solutions. I don't see it as an either-or; it's actually an "and". There are companies who need people to solve problems but cannot give the data out in the open; you need the talent to be inside the company to solve the problem. And there are areas where you can open-source the problem and get the best solutions out. The second thing to keep in mind is that it's not a one-time exercise: even if you build a predictive model today, or get a solution today, you need to continuously improve it, you need to find ways to improve it regularly. For example, Google updates its search algorithm every few days, and there is a dedicated team of people working on that algorithm. You will need talent irrespective; you can use crowd-sourced solutions to accelerate that journey. I don't see it as an either/or question, to be honest.
Shweta Gupta: Aviravasal asks: I have completed the data analytics course at Digital Vidya. How can I practice or train myself on an everyday basis to master this analytics field? What is expected to enter the industry?
Kunal: Sure. If you are looking to enter the industry in a hands-on role, you should try to get at least two to three projects on your CV before you apply for various roles. These projects can come from hackathons, from competitions, from open datasets, or from your own business problems: if there is a problem at the company you are working in which can be solved using data, you can use that. Once you've worked on a problem, create a GitHub profile, put your code on GitHub, show people what you have done on social media, and take feedback from people in the industry. You can write about your approach on LinkedIn. The point is, once you have worked on something, showcase it to the world, put it on your profile, share it with people on social media, and take feedback on how that particular problem could be solved in a better manner. Talk to people in the industry, and once you've done three to four problems, that's the right time to apply for various roles in the industry. If you're applying for a hands-on role, they'll typically ask about your experience, and that experience should come from some of these competitions and datasets. That's what I would recommend starting with. And in the early years, I would say keep trying and keep learning new stuff; the domain is changing very quickly, very dynamically. A few years back, deep learning was not there as a domain; today, there is a lot of work happening there. Get your basics right, don't jump to advanced topics straight away. Build your base of statistics and common machine learning algorithms, but once you've built that, keep learning and keep improving yourself.
Shweta Gupta: With this answer, I think you've also covered what people should be learning next after getting that foundation ready, which is deep learning. And I think our last question, about some of the next-generation tools in data science, is connected, so we can wrap up with the same answer.
Kunal: Thanks for having me over. It was great to talk to people and help them with any analytics questions they might have. Thanks a lot for having me.