Ajay Ohri earned his MBA from IIML and his engineering degree from DCE, both among the top ten institutes of their kind in India. After a year in car sales at Tata Motors, he moved into analytics with GE as a Business Analyst. He then earned a triple promotion to Manager at WNS, where he led the first team to successfully take a POC (proof of concept) to contract for their Knowledge Services. After that, he spent a year in domestic analytics at Citi, followed by ten years running his own start-up and freelancing. Ajay has done many projects and written three books on Data Science. He currently works as a Senior Data Scientist in Hyderabad.
How did you get into Data Analytics? What interested you in learning Data Analytics?
Ajay Ohri: I got into Data Analytics because I am an analytical thinker and I like both numbers and technology. In Data Science you have to keep learning new stuff, and I like that. I also like spreading what I know, so I wrote three books on Data Science, with more to come.
What was the first data set you remember working with? What did you do with it?
Ajay Ohri: I think I remember working with the entire GE Consumer Finance (GECF) dataset. Senior management had an interesting question: how many accounts does GECF have, and how many customers? With a dataset of 150 GB+ (this was 2004, remember), I ended up crashing the server every time I ran a count query in PROC SQL. Finally, my manager convinced me to use random sampling. The result was called a business snapshot report.
Was there a specific “aha” moment when you realized the power of data?
Ajay Ohri: There were many "aha" moments. One such moment was when we sold an extra 50,000 credit cards to customers in the CitiFinancial closed personal loan database by applying some filters.
What does a typical day look like in your current job? Where do you spend most of your time?
Ajay Ohri: My typical day begins at 9 AM with a scrum call. Our project methodology divides tasks into two-week goals, or sprints. This is basically the agile method of software development, and it is different from data mining methodologies such as CRISP-DM or KDD.
A bit of context is necessary to explain why we do so. I am a data scientist on a team implementing Big Data Analytics at a Southeast Asian bank. The team has data engineers, admin/infrastructure people, data scientists and, of course, customer engagement managers, each catering to a specific need of the project. My current organization is an AI startup named Kogentix, which offers not only Big Data services but also a Big Data product named AMP, which acts as a GUI on top of PySpark and tries to automate Big Data work. AMP is quite cool and I will come to it soon. As a result, my startup is focused on getting as many clients as possible while testing and implementing our Big Data product. That means demonstrating success in our client engagements; one of our clients was shortlisted for an award last month. Am I sounding too marketing-oriented? You bet I am. The work a data scientist does is usually of strategic consequence to the client.
What do I do on a daily basis? It could be many things, not just emails and meetings. I could be using Hive (or Impala) to pull and merge data, or using PySpark (MLlib) to build churn models or run k-means clustering. I could be pulling data into an Excel file to make summaries, or I could be making data visualizations. Some days I prototype in R using some machine learning packages. I also help test AMP, our Big Data Analytics product, and work with that team on feature enhancements to the product (if you forgive the pun, since the product is used for feature enhancement). When I write Big Data code, I could be using HUE, the GUI for Hadoop, or working on the command line, including submitting code in batch.
Before this, when I was working for Wipro, India's third-biggest software company, my role was quite the opposite. Our client was India's Ministry of Finance (the arm that deals with taxes). Junior data scientists pulled data using SQL from an RDBMS (due to legacy issues), and I validated the results. The reports were then sent to the various clients. On an ad-hoc basis, we also used SAS Enterprise Miner as a proof of concept to show time-series forecasts of India's imports and exports. Timelines are much slower and more bureaucratic when working for a federal government than when working for the private sector. I remember one presentation where the bureaucrat in charge was astonished that we were doing machine learning, and asked why the government had not used it earlier. That said, SAS/VA (for dashboards), SAS Fraud Analytics (which I trained on and which was in the process of implementation) and Base SAS (the analytics workhorse) are amazing software, and I doubt that anything resembling SAS's domain-specific bundles can be built soon.
Before that, for ten years, I ran Decisionstats.com. I blogged, sold ads (not very successfully), wrote three books on data science and scores of articles for Programmable Web and StatisticsViews, and did some data consulting. I even wrote a few articles for KDnuggets. You can see my Wikipedia profile.
I have no typical day. Sometimes I am training, sometimes I am coding, and many times I am in meetings or calls. From a data science perspective, I work through a stage of CRISP-DM: I pull data using Impala, run batch scripts, do some analysis using PySpark, do some machine learning in R or Python, and present the results in Excel or PowerPoint.
How do you stay updated on the latest trends in Data Analytics? Which are the Data Analytics resources (i.e. blogs/websites/apps) you visit regularly?
Ajay Ohri: I read r-bloggers.com and kdnuggets.com, and I follow my LinkedIn news feed a lot.
Share the names of 3 people that you follow in the field of Data Science.
Ajay Ohri: Hadley Wickham, Gregory Piatetsky-Shapiro, Hilary Mason
Team, Skills and Tools
Which are your favourite Data Analytics tools that you use in your job, and what other tools are used widely in your team?
Ajay Ohri: I like R a lot, and Python is next. I also like SAS. Basically, I am impatient with tools that slow the analytics pipeline down. The team uses the Hadoop stack, especially Hive/Impala and Spark on the Cloudera stack.
What are the different roles and skills within your data team?
Ajay Ohri: Three roles: data scientists (MLlib on PySpark, R and Python), data engineers (Sqoop, Hive) and infrastructure admins (Java, Linux, shell scripting).
Could you describe some examples of the kinds of problems your team is solving this year?
Ajay Ohri: Well, we are solving the perennial problem of cross-selling more products to the customer base of a Southeast Asian bank. Most of the work is confidential; however, we use CDH and PySpark a lot.
Advice to Aspiring Data Scientists
According to you, what are the top skills, both technical and soft skills, that are needed for Data Analysts and Data Scientists?
Ajay Ohri: Okay, great question!
Please learn a lot of SQL, including complex nested queries; work with JSON data; learn R and Python, but also Big Data tools, including PySpark. LEARN how to turn a data science story into a business story in a crisp presentation. Learn Git.
Soft skills:
Try to work in a team, notice how your seniors write emails, and learn diplomacy in dealing with difficult questions.
How much should aspiring data practitioners focus on working with messy, noisy data? What are the other areas in which they must build expertise?
Ajay Ohri: Messy data is a reality I have seen throughout my past 15 years of work. It is tough to learn to handle it on your own, but try to learn from public datasets. As for other areas: R, Python, SAS; learn any two.
What is your advice for newbies, Data Science students or practitioners who are looking at building a career in Data Analytics industry?
- Programming and software skills: R, Python, SAS and Excel (ALL of them)
- Visualization tools: Python, R AND Tableau
- Statistical foundation and applied knowledge: A/B testing, hypothesis tests
- Machine learning: everything in scikit-learn (Python), caret (R) and MLlib (Spark)
What are the changing trends that you foresee in the field of Data Science and what do you recommend the current crop of data analysts do to keep pace?
Ajay Ohri: Deep learning and automation are some of the trends to keep pace with.
Would you like to share a few words about the work we are doing at Digital Vidya in developing Data Analytics talent for the industry?
Yes, I like the huge investment you are making to create the next generation of talent. As someone who helped create the analytics course from scratch for R and Python, I hope you do well. Initial feedback suggests Digital Vidya produces high-quality Data Analysts.