Interview with Srivatsan Srinivasan
Chief Data Scientist
Phoenix, Arizona, United States
Q1. Please share with us the inspirational journey that led you to become a Chief Data Scientist.
I would call myself a self-taught data person (except for the Andrew Ng courses I have taken on the ML side). I have played multiple roles throughout my career, from Engineer to Data Architect to Data Scientist to a leadership role managing a multi-million-dollar data portfolio. Throughout this transition, I have stayed close to technology, solving complex business problems for the customers I work with and, above all, generating tangible business value that can further accelerate new investment into IT projects within these businesses.
I have worked closely with CxOs and IT executives of Fortune 100 firms who have trusted me to deliver complex data science engagements. Constant research and learning are what keep me going in this rapidly changing field.
Coming specifically to data science, I have been in this space since before the hype started around it. I have learned from and worked with some fantastic data scientists who were realistic in their expectations of what this technology space can offer. On top of that, working on the banking side, explainability and ethics have always been core to the outcomes I deliver.
Even when wearing my sales hat, I sell data solutions that I am confident will make the customer successful, instead of treating delivery as an afterthought of the sales cycle.
That’s all about my journey. Not sure if there is any inspiration to draw from my journey though.
Q2. You did your engineering in Electronics and Communication; what attracted you to the Data Science domain?
I was always passionate about electronics, and I still am. I did both my Diploma and Degree in electronics. To be realistic, when I finished college there were not many electronics jobs in India, and I was not sure I wanted the few that existed.
I took a Java course after college and landed my first job as a Java developer. I soon realized it was not the space I wanted to build my future in, and started focusing more on the backend side, writing SQL, PL/SQL, and ETL, and performing an Oracle DBA role. From that time on, data has been in my DNA and is something that excites me. I have tried my best to ride along with data trends: I got aboard during the Big Data hype era, building data engineering systems, and then moved on to data science and machine learning.
Q3. You run a YouTube channel named “AIEngineering”, which is very successful in the Data Science and AI domain. Why did you create it, and what was your main motivation behind this online venture?
While we have never been short of good data science and ML content, one area where I saw little information available was engineering machine learning systems. Most colleges and courses focus on algorithms, which are essential, but 70 to 80% of the work lies outside them. Framing a data problem, data collection, data engineering, model deployment, scaling, and monitoring are hardly covered, yet they are critical for enterprise-scale systems.
I intended to bridge the gap between academia and real-world systems through my AIEngineering channel. Academia is extremely good at teaching the foundations and intuition behind machine learning algorithms; my focus is on helping individuals build enterprise-scale systems that deliver business outcomes and value.
Q4. Your “Time Series Modelling and Analysis” video series is very popular on YouTube. Where is it used, and what are the benefits of Time Series in the real world? Please share something about it.
Time Series has always been critical for businesses, be it sales or product demand forecasting in retail, demand forecasting in energy, or call volume and workforce forecasting across industries. In today’s world of IoT, AIOps, and predictive maintenance, time series models have even more relevance.
Most institutes and courses focus on traditional machine learning and deep learning, and less on time series. There has been a sea change in time series modeling over the years, with newer tools like Facebook Prophet, DeepAR, and LSTMs showing state-of-the-art results in some cases. I think my Time Series course is popular because it covers both statistical and advanced techniques and demonstrates them on useful real-world datasets. A few aspects that, in my view, differentiate the course are its coverage of scaling time series with Spark, time series anomaly detection, and multivariate and multiple time series, beyond the regular time series modeling.
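To make the statistical side concrete, here is a minimal, pure-Python sketch of simple exponential smoothing, one of the classical techniques such a course typically starts from. The sales figures and the smoothing factor are made up for illustration; in practice a library such as statsmodels or Prophet would be the better choice.

```python
def exponential_smoothing(series, alpha):
    """Smooth a series; the last smoothed value serves as a one-step forecast."""
    smoothed = [series[0]]  # initialise the level with the first observation
    for value in series[1:]:
        # new level = alpha * latest observation + (1 - alpha) * previous level
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed

sales = [112, 118, 132, 129, 121, 135, 148, 148]  # hypothetical monthly sales
level = exponential_smoothing(sales, alpha=0.3)
forecast = level[-1]  # naive one-step-ahead forecast
```

A higher `alpha` weights recent observations more heavily, which is the usual trade-off between responsiveness and noise suppression.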
Q5. Which cloud platform is best to deploy any Machine Learning model in real business world?
In my personal opinion, the top three cloud platforms are almost equally placed in ML and AI capability. Even if one provider lags on some capability, it is usually not far from catching up. Selection of a cloud platform for ML depends purely on the organization’s cloud data strategy. It is not ideal to have data on one cloud and ML on another (even though it is possible). Cloud selection has to be looked at from the application, data, and ML capability angles.
Coming to capability, I feel every cloud today has something unique: GCP with its AutoML suite and strong Cloud API accuracy, Azure with its nice drag-and-drop UI for model creation and deployment, and AWS with similar capabilities.
Q6. What is the main use of Apache Spark in Machine Learning and Data Science Domain?
Apache Spark is used to build scalable machine learning on large datasets. If you have a dataset that has outgrown what a Python or R dataframe can handle, Spark becomes an obvious choice for many reasons, even though there are frameworks like Dask and others. Some reasons that I feel work in Spark’s favor are:
- Support for Multiple Languages – Python, R, Scala and Others
- Built-in scalable algorithms, as well as support for most Python ML packages like TensorFlow, XGBoost, LightGBM, Prophet, and many others
- Seamlessly run the same job on-premise in your data center and on any cloud, either with cloud-native services like AWS EMR, GCP Dataproc, and Kubernetes, or with Databricks. This way an enterprise is always cloud-ready
- Connectivity to all major data sources – HDFS, RDBMS, cloud-based storage, cloud data warehouses, file systems, and many more
- Another factor that works in Spark’s favor is the existing investment across the many enterprises that have embarked on their Big Data journey
Q7. Which Machine Learning model have you used the most in your career, and which one is your favorite?
Even though I have used XGBoost and other GBM frameworks the most, I do not have a favorite algorithm in particular. The selection of an algorithm depends on the task and data at hand. I typically follow a simple-to-complex path in algorithm selection. In circumstances where explainability is vital and models are regulated, the simpler the model, the better. Even where performance is vital, I try to find a middle ground that balances performance and explainability instead of choosing models on performance metrics alone.
One thing I would also like to say to aspiring data scientists is not to chase algorithms or assume that your favorite algorithm will solve every data problem you work on. Curiosity is the best algorithm for any data scientist, not XGBoost or BERT or any other. Curiosity to understand the problem, the business process, and the data, and to drive business value, is the key to successful ML projects.
Read more about Srivatsan Srinivasan @
YouTube Channel – https://www.youtube.com/c/AIEngineeringLife
LinkedIn Profile – https://www.linkedin.com/in/srivatsan-srinivasan-b8131b/
Links to some of my courses:
Mastering Apache Spark – https://www.youtube.com/playlist?list=PL3N9eeOlCrP5PfpYrP6YxMNtt5Hw27ZlO
End to End Time Series – https://www.youtube.com/playlist?list=PL3N9eeOlCrP5cK0QRQxeJd6GrQvhAtpBK
Model Deployment – https://www.youtube.com/playlist?list=PL3N9eeOlCrP5PlN1jwOB3jVZE6nYTVswk