Rohit started his career as a software developer at TCS, where he worked for 6.5 years, first as a Java developer and then as a Java architect. Inspired to deepen his knowledge of big data and data science, he then moved to Europe to pursue a PhD in big data, specializing in graph data analytics and management. Current trends in information technology, and the importance of these skills to the industry, were his main motivation for choosing a career in data analytics.
What was the first data set you remember working with? What did you do with it?
Rohit Kumar: I worked with Twitter data. We used it to mine information flow patterns between users on Twitter and to find the most influential users in the network.
Was there a specific “aha” moment when you realized the power of data?
Rohit Kumar: There have been many “aha” moments since I started working in data analytics. We found interesting patterns and trends. For example, using graph motif analysis on Twitter and other kinds of network data, such as SMS or Facebook message data, we found that communication patterns between users differ greatly from one platform to another.
How do you stay updated on the latest trends in Data Analytics? Which are the Data Analytics resources (i.e. blogs/websites/apps) you visit regularly?
Rohit Kumar: I regularly follow KDnuggets, and Kaggle for new competitions.
Share the names of 3 people/publications/research that you follow in the field of Data Science or Big Data Analytics.
Rohit Kumar: I usually follow the main conferences, such as VLDB, KDD, WSDM and CIKM, for the latest research in data science and big data.
Team, Skills and Tools
Which are your favourite Data Analytics tools that you use in your job, and which other tools are widely used in your team?
Rohit Kumar: I use a wide variety of tools, but the main ones are Spark and Python for data analytics.
What are the different roles and skills within your data team?
Rohit Kumar: Apart from data engineers, we have people with expertise in ML/DL, geolocation data analytics, statistics and psychology. And of course there are some project managers on the team who know the basics of data science but mainly handle client presentations and project management.
Can you describe some examples of the kinds of problems your team is solving this year?
Rohit Kumar: We work on multiple European Union funded research projects and also on projects for the Spanish government. For example, in one project for the city of Barcelona, we are helping the City Council set up a big data analytics platform to analyse data from different public transport systems, with the goal of designing a smart public transport system.
In another project for the European Union, we are designing machine learning algorithms that help identify sensitive data, so that companies can use them to comply with GDPR and other European privacy requirements.
How do you measure the performance of your team?
Rohit Kumar: Different KPIs are used for different projects to measure the team's performance. For example, for ML/DL-based projects we look at how well our models perform, how well the project timelines were handled, and so on.
Big Data Teams, Skills and Tools
In the huge Big Data landscape, skills are changing swiftly. Which technologies do you see dominating the ETL and real-time data space?
Rohit Kumar: Spark seems to be becoming a very popular tool in the ETL space of the big data landscape. NoSQL systems like MongoDB and Cassandra, and ETL workflow managers such as Airflow, are also quite popular.
How do aspiring Data Engineers demonstrate their capability to handle the tools, technology, data and domain? Is a certificate (Cloudera/Hortonworks) a clear differentiator?
Rohit Kumar: A certificate from a well-known vendor such as Cloudera/Hortonworks is obviously very useful, but these certificates limit your scope to the tools in the Cloudera/Hortonworks ecosystem. Real work experience on a project is still valued more than certification.
Are Analytical skills, Statistics and Machine Learning must-have or good-to-have skills for Data Engineers?
Rohit Kumar: A basic understanding of ML and analytical skills is very important for data engineers. Statistics is not needed.
Industry Readiness for Data Science
Are the industries looking to understand what they can do with data? Do they have the required data in place?
Rohit Kumar: Yes, they have their private data sources.
Which are the top 3 problems facing Data Science, based either on industry or on technology area?
(i) Developing and maintaining the infrastructure and data platform.
(ii) Privacy and security concerns.
(iii) Maintaining communication between data science teams and the business side.
Industry Readiness for Big Data
Is Big Data becoming a reality in the industry beyond the social giants like Facebook, Google, Yahoo? If yes, which industries are actually moving towards the power of Big Data Analytics? If no, what is the outlook for adoption?
Rohit Kumar: Yes, it is very much a reality. Industries such as e-commerce, finance, energy and telecom, as well as the public sector working towards smart cities, are all using or moving towards big data analytics.
Name 3 Industries and the kind of problems that they are solving using Big Data.
(i) Telecom companies: How to utilize the data for mobility pattern analysis.
(ii) Public sector, such as city councils: Smart city projects.
(iii) E-commerce: Real-time recommendation using big data.
Who in the Industry is your typical client for Big Data? Is it the CTO, CIO, CMO or special data leaders?
Rohit Kumar: CTOs
Advice to Aspiring Data Scientists
According to you, what are the top skills, both technical and soft skills that are needed for Data Analysts and Data Scientists?
Rohit Kumar: Good communication skills are a must for a data scientist, followed of course by a very good statistical and mathematical background, good programming skills in either Python or R, and data visualization skills. Domain expertise is also a big plus for specific domains such as medical or smart city projects.
How much focus should aspiring data practitioners put on working with messy, noisy data? What are the other areas that they must build their expertise in?
Rohit Kumar: A lot of focus and time goes into understanding the data, cleaning it properly and then doing exploratory analysis.
What is your advice for newbies, Data Science students or practitioners who are looking at building a career in the Data Analytics industry?
(i) Programming and software skills – R, Python, SAS or Excel
(ii) Visualization Tools
(iii) Statistical foundation and applied knowledge
(iv) Machine Learning
I would say (i) and (iii) are a must, while (ii) and (iv) are good to have.
What are the changing trends that you foresee in the field of Data Science and what do you recommend the current crop of data analysts do to keep pace?
Rohit Kumar: Keep up to date with new tools, especially those in the big data field, as sooner or later you will be working with lots of data.
Big Data Solution Space
What is the kind of structured and unstructured data companies have? What is the size that we are talking about?
Rohit Kumar: Navigation data from websites, blogs and comments from social media, IoT device data and mobile connection data. I have seen up to 2000 TB of data at telecom companies for a single year.
Are there legacy systems that are being replaced? If yes, which legacy skills are being replaced?
Rohit Kumar: Not being replaced as of now.
What is the size of clusters/environments that are being deployed for the clients? What are the production challenges?
Rohit Kumar: It depends! We have clusters ranging from 3-4 systems with 16 GB RAM to 20-30 systems with 2 TB RAM. The main production challenges are deploying, upgrading and maintaining the systems.
Would you like to share a few words about the work we are doing at Digital Vidya in developing Data Analytics Talent for the industry?
Rohit Kumar: I think Digital Vidya is doing some really good work in developing data analytics talent for the industry. Python is still one of the most popular languages in industry data science teams, and Digital Vidya has a really comprehensive course in this area. Traditional tools like Excel and Power BI are also actively used in legacy systems, and I see Digital Vidya is conducting courses in these areas as well. There are also Big Data courses on offer for data engineers. The courses are really up to date and include nice projects that give real-world experience, which I think is a big plus.