Interview with Hemant Rathore
Senior Data Scientist, ERM India
Mumbai, Maharashtra, India
Q1. Please share your inspirational journey as a Data Scientist with us?
I am currently working at ERM – Environmental Resources Management, India as a Senior Data Scientist, where my role is to explore and apply ML/AI-based approaches to solve various business problems, mainly in the EHS domain. I am also pursuing my Master's in Data Science from the Indian Institute of Technology, Hyderabad.
I started my career as a Data Engineer in 2010 with TCS, where I worked on a variety of analytics projects, mainly around data engineering, ETL and data visualization. I played different roles there, including Lead Data Engineer, BI Lead and Data Analyst. Later I joined Teradata as a Data Scientist, where I worked on some large-scale ML solutions using big data platforms.
Before joining ERM I was working as Lead Data Scientist at AccionLabs, India, where I led some projects around cutting-edge image processing techniques on 3D medical images.
I was also associated with Cognixia as a corporate trainer, where I conducted many training and certification programs around Data Science and Machine Learning.
Q2. What is Data Analytics, and how is it related to Machine Learning & Data Science?
Data Analytics is the umbrella term used for all sorts of data analysis and business intelligence work. Broadly, it can be categorized into Descriptive, Diagnostic, Predictive, and Prescriptive Analytics. The traditional data analytics and BI techniques come under descriptive analytics, where we try to understand what has happened.
On the other side, we use different ML-based approaches to answer more advanced questions: why it happened (Diagnostic), what will happen in the future (Predictive), and what can be done to achieve the required outcome (Prescriptive).
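To make the descriptive/predictive distinction concrete, here is a minimal sketch using only the Python standard library. The `monthly_sales` figures and both function names are hypothetical, and the forecast uses a plain least-squares trend line as a stand-in for whatever ML model a real project would choose.

```python
from statistics import mean

# Hypothetical data, invented purely for illustration.
monthly_sales = [100, 110, 120, 130]

def describe(values):
    """Descriptive analytics: summarize what has already happened."""
    return {"mean": mean(values), "min": min(values), "max": max(values)}

def forecast_next(values):
    """Predictive analytics: fit a least-squares trend line and
    extrapolate one step ahead. Needs at least two observations."""
    xs = list(range(len(values)))
    x_bar, y_bar = mean(xs), mean(values)
    slope = (
        sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
        / sum((x - x_bar) ** 2 for x in xs)
    )
    intercept = y_bar - slope * x_bar
    return intercept + slope * len(values)

summary = describe(monthly_sales)          # what happened
next_month = forecast_next(monthly_sales)  # what is likely to happen next
```

The first function only looks backward at the data, while the second uses the pattern in the history to say something about a period that has not been observed yet.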
The majority of data science work is about extracting insights from data, where the data can be anything: structured DB tables, CSV files, unstructured text, log files, images, or audio/video files. Ultimately, we want to get some insights out of it.
Q3. Why is Data Science growing rapidly in 2021? How is it useful for business solutions?
Data is the new oil; it is an immensely valuable asset for any organization nowadays. People often ask why it is so important to analyze historical data. The answer is patterns. Data science enables organizations to understand the important patterns lying within historical data, which can further help them understand or control future outcomes, and this is why it has so much potential.
With the latest advancements in data storage and processing hardware, organizations are producing data at a scale never seen before. With this much data, they need resources and techniques to process it and make sense of it, and data science is the solution here. With the help of advanced data mining algorithms we can process large volumes of data at scale and thus help organizations uncover business-critical patterns and take timely action.
Q4. Which is the hardest part of any Data Science project?
The hardest part of any data science project is mapping the business problem to the technical design, which we call 'Design Thinking'. Not all business problems can be solved using the same approach; one should know what fits where. One very common mistake I often see among newbies is jumping directly to the implementation without spending enough time on research or design. Every data science problem is unique in some aspects, and sufficient time should be spent on design thinking.
Q5. What is the best specific path to become a Full Stack Data Scientist for any student? Please suggest important languages, tools and libraries for a Full Stack Data Scientist.
Firstly, we should understand that there is no shortcut to excelling at anything, and that applies to data science as well. Data science is a research-based field, and one should never skip its foundations, mainly statistics and data engineering concepts. One should invest enough time to fully understand the basic building blocks before jumping into coding. I would recommend the following learning path for beginners –
Learning Path – Theoretical Concepts
- Probability and Statistics
- Matrix Algebra
- SQL & ETL Concepts
- Data Wrangling & Cleaning
- Data mining & Statistical Learning (Machine Learning)
Learning Path – Tools & Practical Implementation
- Data Engineering – Databricks, Spark, Airflow, ADF
- Data Visualization – MS PowerBI, Tableau
- ML programming – Python, R, Keras, TensorFlow, PyTorch
- Model Deployment – Flask, Django
- Cloud Platforms – Azure/AWS
Q6. Which is the best platform to deploy any ML Model/Application on the web?
There are many options available nowadays to deploy your ML models. Firstly, you can go for traditional REST API based model deployment, which makes your model a call away for any application. You can use either Flask or Django for this if you are from a Python background.
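As a minimal sketch of the REST approach, the example below exposes a prediction function behind a `POST /predict` endpoint. It makes several assumptions: the "model" is a hypothetical fixed linear scorer standing in for a real trained model, the endpoint path and JSON shape are invented for illustration, and it uses the standard library's WSGI interface instead of Flask or Django so that it runs without extra packages. In Flask or Django the same logic would live in a route or view function.

```python
import json
from wsgiref.simple_server import make_server

# Hypothetical stand-in for a trained model: a fixed linear scorer.
WEIGHTS = [0.5, -0.25]
BIAS = 1.0

def predict(features):
    """Return the linear score for one feature vector."""
    if len(features) != len(WEIGHTS):
        raise ValueError(f"expected {len(WEIGHTS)} features")
    return BIAS + sum(w * x for w, x in zip(WEIGHTS, features))

def app(environ, start_response):
    """WSGI app: POST /predict with a JSON body like {"features": [2, 4]}."""
    method = environ.get("REQUEST_METHOD")
    path = environ.get("PATH_INFO")
    if method != "POST" or path != "/predict":
        status, payload = "404 Not Found", {"error": "use POST /predict"}
    else:
        try:
            length = int(environ.get("CONTENT_LENGTH") or 0)
            body = json.loads(environ["wsgi.input"].read(length))
            status, payload = "200 OK", {"prediction": predict(body["features"])}
        except (ValueError, KeyError, TypeError) as exc:
            # Malformed JSON, missing key, or wrong feature types/count.
            status, payload = "400 Bad Request", {"error": str(exc)}
    data = json.dumps(payload).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(data)))])
    return [data]

def serve(port=8000):
    """Run a development server (not suitable for production traffic)."""
    with make_server("", port, app) as httpd:
        httpd.serve_forever()
```

Calling `serve()` starts a local development server. Any client application can then send a JSON request to `/predict` and read the prediction from the response, which is what makes the model "a call away".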
If you are using a cloud platform like Azure, you can go for the Azure Endpoint service, which provides a no-code deployment option to host your ML models.
Containerization services like Docker are another option, giving you great flexibility to customize your model environment.
TensorFlow Serving & Kubeflow are options for deploying heavy models with GPU support.
Q7. Which type of Machine Learning is most widely used to solve real-world problems?
Natural Language Processing (NLP) is a subfield within machine learning, and nowadays it is one of the most widely used domains. Text data is present everywhere and it is growing exponentially. The biggest challenge with text data is that we cannot summarize it like numeric data, so it requires its own way of preprocessing and modeling.
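One common way to give text a numeric form is a bag-of-words representation, sketched below with only the Python standard library. The function names and sample sentences are hypothetical, and real projects typically use richer preprocessing (stop-word removal, stemming, embeddings) than this simple tokenizer.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def bag_of_words(docs):
    """Return (vocabulary, vectors): one term-count vector per document."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    # The vocabulary is every distinct token across all documents, sorted.
    vocab = sorted(set().union(*counts))
    vectors = [[c[term] for term in vocab] for c in counts]
    return vocab, vectors

# Hypothetical sample documents.
docs = ["Data is the new oil", "The data is growing"]
vocab, vectors = bag_of_words(docs)
```

Once each document is a vector of counts, standard numeric models can be applied to tasks such as document classification or sentiment analysis.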
NLP has a wide range of applications and it is growing rapidly. Some of the common NLP applications are as follows –
- Frequent Pattern mining
- Sentiment Analysis
- Language Translation
- Topic modeling
- Document classification
- Auto correction/ Auto completion
- Text Summarization
- Question Answering
- Named Entity Recognition
LinkedIn – linkedin.com/in/hemant–rathore