The term “Data Science”, as we understand it now, was first coined by C. F. Jeff Wu in 1997. Coincidentally, in the same year Dr. Arijit Laha became interested in intelligent pattern-based computing and opted for advanced optional courses in “pattern recognition”, “image processing” and “computer vision” during his M.Tech. at the Indian Statistical Institute, Kolkata. That was followed by a Ph.D. there, focused on inventing a few ML algorithms for pattern recognition and building a couple of expert systems for DRDO. Since then Dr. Arijit has mostly been working in this diverse and amorphous field, which has gone through repeated relabeling: “data mining”, “data science”, “data analytics” and now “AI”. To keep his head straight, he calls it “pattern computing”.
How did you get into Data Analytics? What interested you in learning Data Analytics?
Dr. Arijit Laha: One could say I was already there before data analytics caught on commercially. For me, the real interest is in extracting actionable intelligence and knowledge from large volumes of data.
What was the first data set you remember working with? What did you do with it?
Dr. Arijit Laha: Other than the standard datasets used by the research community, the first interesting datasets I worked with were two multi-channel satellite images of ground cover. I used them to build land-cover-type classifiers, using neuro-fuzzy ML algorithms of my own invention.
Was there a specific “aha” moment when you realized the power of data?
Dr. Arijit Laha: Before I studied Computer Science, I was a research fellow in theoretical physics. During that time I was exposed to the computational aspects of the subject, such as modeling and simulation, and the role of data therein. This, in turn, got me interested in Computer Science and ultimately in computational intelligence.
What is your typical day-in-a-life in your current job? Where do you spend most of your time?
Dr. Arijit Laha: In my current job I wear a large number of hats, ranging from strategic leadership and helping marketing efforts to executing and delivering client projects, and sometimes even hands-on solution architecture and implementation in critical and complex projects. Naturally, how I spend my time depends on the priorities of the day.
How do you stay updated on the latest trends in Data Analytics? Which are the Data Analytics resources (i.e. blogs/websites/apps) you visit regularly?
Dr. Arijit Laha: Given the rapid pace of development in the field, especially in NLP, my current focus, I follow research trends for technical knowledge. For emerging business problems, client interactions are a great source of information in the service industry.
Share the names of 3 people that you follow in the field of Data Science.
Dr. Arijit Laha: Again, there are many people doing significant work, and new people join their ranks almost every day. So, instead of following people, I prefer to follow their work and publications.
Team, Skills and Tools
Which are your favourite Data Analytics tools that you use in your job, and what other tools are widely used in your team?
Dr. Arijit Laha: At the current stage of my career, I mostly deal with atypical, non-standard problems requiring multi-part system solutions. They often require multiple tools and technologies. But the heart of these systems involves pattern computing, for which my current favorite is Python and its packages such as Scikit-Learn, NumPy, SciPy, Pandas and Seaborn, as well as NLTK, Pattern, Stanford CoreNLP and Gensim for NLP work. Also, when the situation calls for deep learning, TensorFlow is my current favorite.
What are the different roles and skills within your data team?
Dr. Arijit Laha: My team is currently focusing on creating advanced and reusable NLP system building blocks and is operating in R&D mode. So, each team member assumes the roles of experimenter, developer, tester and engineer. They are divided informally into two groups. Each group is responsible for testing and evaluating the other group’s work.
Each of the team members possesses (to different extents, of course) skills and knowledge in programming (mostly Python), NLP and linguistics.
Can you describe some examples of the kinds of problems your team is solving this year?
Dr. Arijit Laha: Some of the recent solutions my team has built include service ticket routing and profiling, customer-care conversation summarization, and aspect-oriented sentiment analysis of financial media articles.
How do you measure the performance of your team?
Dr. Arijit Laha: Both by quantity and quality, subject to the complexity of the tasks.
Industry Readiness for Data Science
Are the industries looking to understand what they can do with data? Do they have the required data in place?
Dr. Arijit Laha: Industries are aware of the potential and are looking to understand it. Unfortunately, many are following a somewhat wrong process for that.
Industries have a lot of operational and transactional data by absolute volume, but it is scattered across databases. Seldom is there a comprehensive and up-to-date data inventory, which is essential for getting the data ready for analytics.
What are the top 3 problems facing Data Science, based either on industry or on technology area?
Dr. Arijit Laha:
- Lack of data preparedness
- Lack of rigor and clarity in business problem formulation
- Lack of adequately skilled data scientists
Advice to Aspiring Data Scientists
According to you, what are the top skills, both technical and soft, that Data Analysts and Data Scientists need?
Dr. Arijit Laha:
- Knowing that Data Science is not about applying algorithms but about providing the best possible solution to the business problem
- Knowing how to map a business problem into the technical domain
- Engaging the client in useful discussion and communication
How much should aspiring data practitioners focus on working with messy, noisy data? What other areas must they build expertise in?
Dr. Arijit Laha: Real-life data is always messy and noisy.
What is your advice for newbies, Data Science students or practitioners who are looking to build a career in the Data Analytics industry?
Dr. Arijit Laha:
- Programming and software skills – R, Python, SAS or Excel
  - A good grasp of many tools is required, but
  - one needs to develop mastery of at least one Turing-complete programming language with good analytics library support – Python and R are great candidates
- Visualization tools
  - There are many tools; my personal favorite is Tableau
  - But one must also master programming-language-specific graph and visualization packages/libraries
- Statistical foundation and applied knowledge
  - A must-have, including Linear Algebra
- Machine Learning
  - Obviously a must-have; additionally,
  - one needs not only to understand what an ML technique/algorithm can do, but also to know (by heart) the kinds of conditions/situations in which it does not work well
What are the changing trends that you foresee in the field of Data Science and what do you recommend the current crop of data analysts do to keep pace?
Dr. Arijit Laha: Change and expansion (of the scope of application) are the only constants in this field. Only continuous hard work and updating of knowledge and skills will save one from obsolescence. However, given the amount of commercial interest and clatter in the area, one’s effort and resources should be invested judiciously, based on inclination, specialization and prioritization.
To know more about Arijit Laha, you can check out his LinkedIn.