After high school, Charles Jansen enrolled in medical school. He hated it, and after a few years he left both medicine and France. Charles studied Hotel Management and went to work in China. Before long, his interest in programming, which dated back to high school, kept growing until he decided to study Computer Science. (As you can see, it took him a while to find something he liked.)
Soon after, Charles started working at S&P Global in Argentina, on data extraction and automation, building rule-based parsers. At some point, his current boss said that what his previous team was doing could and should be done with a Machine Learning approach. Out of curiosity, and intending to prove that it could not be done, he began teaching himself Machine Learning. Charles quickly realised that he was wrong and that an ML approach to automating data extraction was indeed feasible. Impressed by what could be done with boosted trees and Deep Learning, he earned some online certifications, took part in Kaggle competitions, and then completed a postgraduate degree in Data Science.
What was the first data set you remember working with? What did you do with it?
Charles Jansen: I guess it was probably MNIST, or something similar, during a training course, for some image-recognition exercises. Later on, I trained a GAN on it to generate new samples.
From a professional standpoint, the first datasets I worked with were webpages (HTML) and PDF documents from which data had to be extracted automatically.
Was there a specific “aha” moment when you realized the power of data?
Charles Jansen: For the power of Machine Learning, there was definitely a very strange moment when I realised that something I had been convinced Machine Learning could not do could, in fact, probably be done by a complex workflow of specialized models.
How do you stay updated on the latest trends in Data Analytics? Which Data Analytics resources (e.g. blogs/websites/apps) do you visit regularly?
Charles Jansen: I have thousands of Data Scientist connections on LinkedIn. Thanks to them, I have a very interesting feed of what is happening in the field.
Kaggle is one of my favourite websites for staying up to date on state-of-the-art publications and tools. Many people there explore ideas and share their code.
And finally, http://www.arxiv-sanity.com, which helps you cherry-pick papers.
Share the names of 3 people that you follow in the field of Data Science.
Charles Jansen:
- Andrew Ng
- Andrej Karpathy
- Andrew Trask
Team, Skills and Tools
Which are your favourite Data Analytics tools for your job, and what other tools are widely used in your team?
Charles Jansen: Anaconda, Spyder, Jupyter Notebook, Python, scikit-learn, Pandas, NumPy, Dask and Keras are the ones I use the most.
Could you describe some examples of the kinds of problems your team is solving this year?
Charles Jansen: Data extraction is my main problem, but there are other, smaller projects around speech-to-text, categorization, etc.
Advice to Aspiring Data Scientists
In your view, what are the top skills, both technical and soft, that Data Analysts and Data Scientists need?
Charles Jansen: You need logical thinking, problem understanding and problem-solving skills. Logical thinking because you need it for programming, problem understanding because the problem you are trying to solve is not always obvious, and problem-solving because you often have to find more than one solution to a specific problem (sometimes the first approach doesn't work out because of results, time or resources, and you need to rethink the whole approach).
How much should aspiring data practitioners focus on working with messy, noisy data? What other areas must they build expertise in?
Charles Jansen: A lot! Clean datasets are something you don't see very often in practice, yet most training datasets are pretty clean. Even Kaggle datasets tend to be far too clean to give a real-life Data Science experience (there are some exceptions).
I would recommend learning to build web crawlers and creating your own datasets from the web. Then practice with that data.
Data cleaning and data manipulation in general are skills you will use all the time.
What is your advice for newbies, Data Science students or practitioners who are looking to build a career in the Data Analytics industry?
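To make the crawler advice concrete, here is a minimal sketch using only Python's standard library (`html.parser`, `urllib.parse`, `urllib.request`). The names `LinkAndTextParser`, `parse_page`, `is_crawlable` and `crawl` are hypothetical, chosen for illustration. It does a breadth-first crawl that stays on the starting site and stores the visible text of each page; it assumes UTF-8 pages and does not check robots.txt or add rate limiting, both of which a real crawler should do.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.error import URLError
from urllib.parse import urljoin, urlparse
from urllib.request import Request, urlopen


class LinkAndTextParser(HTMLParser):
    """Collect absolute links and visible text snippets from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.texts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links such as "/about" against the page URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.texts.append(text)


def parse_page(html, base_url):
    """Return (links, texts) extracted from an HTML string."""
    parser = LinkAndTextParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links, parser.texts


def is_crawlable(url, start_url):
    """Keep only http(s) links on the same host as the start page."""
    target, origin = urlparse(url), urlparse(start_url)
    return target.scheme in ("http", "https") and target.netloc == origin.netloc


def crawl(start_url, max_pages=10):
    """Breadth-first crawl from start_url, returning {url: [text, ...]}."""
    seen, queue, dataset = set(), deque([start_url]), {}
    while queue and len(dataset) < max_pages:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        try:
            req = Request(url, headers={"User-Agent": "dataset-builder-sketch"})
            with urlopen(req, timeout=10) as resp:
                html = resp.read().decode("utf-8", errors="replace")
        except (URLError, ValueError):
            continue  # skip pages that fail to load
        links, texts = parse_page(html, url)
        dataset[url] = texts
        queue.extend(l for l in links if l not in seen and is_crawlable(l, start_url))
    return dataset
```

Separating `parse_page` from `crawl` keeps the extraction logic testable on saved HTML without touching the network; the resulting `{url: texts}` dictionary is the raw, messy dataset you would then clean and practice on.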
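As an illustration of the everyday cleaning Charles refers to, here is a minimal sketch in plain Python, assuming scraped records arrive as dictionaries of strings. The field names `name` and `price` and the helpers `parse_amount` and `clean_records` are hypothetical. It collapses stray whitespace, parses price strings, drops rows missing required fields, and removes case-insensitive duplicates. It also assumes US-style number formatting, so a European value like "1.234,50" would be misread.

```python
import re


def parse_amount(raw):
    """Turn strings like ' $1,234.50 ' into 1234.5; return None if unparseable.

    Assumes US-style formatting (comma thousands separator, dot decimal).
    """
    if raw is None:
        return None
    cleaned = re.sub(r"[^0-9.\-]", "", raw)  # drop currency symbols, commas, spaces
    try:
        return float(cleaned)
    except ValueError:
        return None


def clean_records(records):
    """Normalize whitespace, parse prices, drop incomplete rows and duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        name = " ".join((rec.get("name") or "").split())  # collapse runs of whitespace
        price = parse_amount(rec.get("price"))
        if not name or price is None:
            continue  # skip rows missing a required field
        key = (name.lower(), price)
        if key in seen:
            continue  # skip case-insensitive duplicates
        seen.add(key)
        cleaned.append({"name": name, "price": price})
    return cleaned
```

In day-to-day work the same steps are usually written with Pandas, which Charles lists among his tools; the plain-Python version just makes each cleaning decision explicit.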
Charles Jansen:
- Programming and software skills – R, Python, SAS or Excel
The best language for you is the one you know best. Personally, from this list I prefer Python.
- Visualization Tools
There are great tools like Tableau, but there are also a lot of free libraries. You can learn how to use many of them by studying Kaggle Kernels.
- Statistical foundation and applied knowledge
It depends on what you want to achieve. For data analytics I don't feel a statistical foundation is strictly needed, but for machine learning it is.
- Machine Learning
Data Analytics is a subfield of Data Science where ML isn't necessarily required. My interest is in ML, and specifically DL; whether those will be useful to you depends entirely on what your job will be. For data analytics, SQL and visualization tools are probably what you will use the most.
What are the changing trends that you foresee in the field of Data Science and what do you recommend the current crop of data analysts do to keep pace?
Charles Jansen: I don’t do predictions. I let my models do them for me.
Would you like to share a few words about the work we are doing at Digital Vidya in developing Data Analytics talent for the industry?
Charles Jansen: Most, if not all, companies are looking for Data Science talent, but there are still only a few such people available in the market, and filling new positions can be challenging. Meanwhile, the amount of data available globally continues to grow, and so does the need.
Global organizations like Digital Vidya that provide online courses and insights are essential to fill this gap. I personally find much more value in online courses than in traditional ones (you can pause, go back, speed up, review the topic a few days later, etc.).
To learn more about Charles Jansen, you can check out his LinkedIn profile.