I have recently moved from a data analyst role to a data scientist role. This might not sound like a big change to you, but in reality these two roles can be quite different.
Disclaimer: This is my experience and it might be very different from yours! I just want to share my personal experience here with those of you who want to make this transition and don’t know where to start.
What is the difference between data science and data analytics?
Depends on the company!
Different companies use the title of data scientist in different ways. For example, a product data scientist in one company might have exactly the same responsibilities as a product data analyst in another company. I’m emphasizing the word “product” because I want to highlight that both of these roles are embedded in product teams and both focus on making better products for users.
At Wattpad, my current company, product data analysts focus on analyzing the performance of the company’s products. They collaborate closely with product managers to understand each product and its mission. Product managers often want to make changes to existing products (for that, they usually run an A/B experiment) or introduce new products to users. Analysts help product managers evaluate the changes they make. They also work closely with engineers to implement new tracking events so that they can check the user flow. To summarize what analysts do:
- Define health metrics to check the performance of the products
- Build dashboards to monitor the health metrics closely
- Investigate changes in health metrics to better understand user engagement with the products
- Help product managers and engineers design A/B tests
- Analyze the results of an A/B test and interpret them
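As a rough illustration of the last bullet, here is a minimal sketch of how the result of an A/B test on a conversion rate could be checked with a two-proportion z-test. The numbers are made up, the function name is my own, and a z-test is only one of several reasonable choices; this is not necessarily how it is done at Wattpad.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis of "no difference".
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF expressed through the error function.
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - cdf)
    return z, p_value

# Made-up experiment: A is the control group, B sees the product change.
z, p = two_proportion_z_test(conv_a=200, n_a=2000, conv_b=260, n_b=2000)
print(f"z = {z:.2f}, p-value = {p:.4f}")
```

A small p-value suggests the difference between the two groups is unlikely to be noise, but interpreting it still requires knowing the product and how the experiment was set up.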
A data analyst usually doesn’t build and deploy machine learning models. That falls under the responsibilities of a data scientist.
The product data scientists at Wattpad also work closely with the product team to serve users better. Building recommender systems and predictive models are two examples of their responsibilities.
The recommender system helps users find the right content on Wattpad with one click! It recommends stories that users will most likely enjoy, based on the stories they have previously read on the platform. The product team then builds a product on the platform where users can find this recommended content.
Predictive models are machine learning models that can tell, for example, whether a user has a high probability of becoming a paying user. Many entertainment companies have both a free and a paid version of their product. Subscribing to the paid version has a lot of advantages for users, such as personalized offers and an ad-free experience. The product team wants to know how to target users with the right offer for the paid version. Imagine that the product team targets users who have no interest in becoming paying users. These users might get tired of seeing the targeted messages and offers all the time, and they might lose interest in coming back to the platform!
So it is very important to understand users’ needs and offer them a product that matches those needs.
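To make the idea of a predictive model a bit more concrete, here is a toy sketch of a logistic regression that scores how likely a user is to become a paying user, and then targets only the users above a cut-off. Everything in it is an assumption for illustration: the two features, the training data, the 0.5 threshold, and the hand-written training loop (in practice you would use a library).

```python
import math

def sigmoid(x):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

def predict_proba(weights, bias, features):
    """Predicted probability that a user becomes a paying user."""
    return sigmoid(bias + sum(w * f for w, f in zip(weights, features)))

def train_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit a logistic regression with plain stochastic gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(rows, labels):
            error = predict_proba(weights, bias, features) - label
            weights = [w - lr * error * f for w, f in zip(weights, features)]
            bias -= lr * error
    return weights, bias

# Made-up training data: [reading_activity, session_activity], scaled to 0..1.
rows = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.7, 0.8], [0.8, 0.6], [0.9, 0.9]]
labels = [0, 0, 0, 1, 1, 1]  # 1 = the user became a paying user

weights, bias = train_logistic(rows, labels)

# Hypothetical new users to score.
candidates = {"user_a": [0.9, 0.8], "user_b": [0.1, 0.1]}
THRESHOLD = 0.5  # assumed cut-off; a real one would be tuned with the product team

targeted = [
    name
    for name, features in candidates.items()
    if predict_proba(weights, bias, features) >= THRESHOLD
]
print(targeted)
```

Only users above the cut-off receive the offer, so users who are unlikely to convert are not shown messages they don’t care about.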
But…How did I make this transition?
Obviously, I needed to build certain skills before moving into the data science world. As an analyst, I naturally had some of the skills a data scientist needs. For example, creating datasets for analysis, performing exploratory data analysis and finding correlations between different parameters in a dataset are usually part of an analyst’s daily responsibilities. However, these weren’t enough, and I needed to work with numerical libraries that I had never worked with before. I needed to gain a good understanding of these libraries and make sure I understood what their outputs meant. You might think this part is easy, but to be honest, sometimes the documentation and example code are not clear enough. In that case, you need to either reach out to colleagues who have used those libraries before or post your questions on forums such as Stack Overflow. When I was working as an analyst, I was fortunate enough to get a chance to work on a data science project and get mentorship from our senior data scientists. Through that experience, I gained the basic skills a data scientist needs.
One thing I realized is that the data science projects you find on Kaggle or in Towards Data Science blog posts are very different from the ones you actually work on at your company. The aim might be similar or even the same, like building a classifier that separates different groups of users or building a predictive model that predicts the probability of each user becoming a purchaser. However, on Kaggle or Towards Data Science you have a dataset ready that includes predefined features. Of course you need to do data exploration to understand the correlation between your target values and your features, and also do some feature engineering and drop some features, but you don’t need to create your dataset from scratch! You don’t need to think about the features in advance…Everything is ready for you to explore and play around with! This doesn’t happen in reality…It’s often quite hard to think of the features you need for your project. Oftentimes you need to modify your dataset by adding new features or removing existing ones, so you end up with several iterations of your dataset.
Most of the time you need to transform your features. The transformations might be more complicated than just removing NULL values or cutting outliers! It could be combining two features to get a new feature, which might result in a higher accuracy score (or whatever other metric you’re using to measure the performance of your model). You need good domain knowledge to come up with these transformations. Sometimes you end up using two or even more models in one project. The result of one model could be used as a feature in another model.
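Here is a small sketch of both ideas: combining two raw features into a new one, and using the output of one model as a feature for another. The field names (`stories_read`, `sessions`) and the `engagement_score` function are hypothetical; the latter is only a stand-in for a real first-stage model.

```python
def add_combined_features(user):
    """Return a copy of the user record with one extra, combined feature."""
    enriched = dict(user)
    sessions = user["sessions"]
    # Guard against division by zero for users with no sessions.
    enriched["reads_per_session"] = (
        user["stories_read"] / sessions if sessions else 0.0
    )
    return enriched

def engagement_score(user):
    """Stand-in for a first model; a real project would call a trained model."""
    return min(1.0, user["reads_per_session"] / 5.0)

# Made-up raw records.
users = [
    {"user_id": 1, "stories_read": 40, "sessions": 20},
    {"user_id": 2, "stories_read": 0, "sessions": 0},
]

dataset = []
for user in users:
    row = add_combined_features(user)
    # The first model's output becomes an input feature for the next model.
    row["engagement_score"] = engagement_score(row)
    dataset.append(row)

for row in dataset:
    print(row)
```

Whether a ratio like this actually helps depends on the domain, which is why domain knowledge matters so much at this step.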
When it comes to testing your model, you need to go beyond the tests you find in online tutorials. You need to test the model on real data. For example, if you want to know whether a model that predicts user churn is working well, you need to compare your model’s predictions with real churn data and see whether the predictions made sense.
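One simple way to do that comparison, once the real outcomes are known, is to compute precision and recall of the churn predictions. This is just a sketch with made-up labels; the function name is my own and other metrics may suit your project better.

```python
def precision_recall(predicted, actual):
    """Compare predicted churn flags with what actually happened."""
    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p and a)
    false_pos = sum(1 for p, a in pairs if p and not a)
    false_neg = sum(1 for p, a in pairs if not p and a)
    flagged = true_pos + false_pos
    churned = true_pos + false_neg
    precision = true_pos / flagged if flagged else 0.0
    recall = true_pos / churned if churned else 0.0
    return precision, recall

# Made-up data: 1 = churned, 0 = stayed. One entry per user, same order.
predicted_churn = [1, 1, 0, 0, 1, 0, 1, 0]
actual_churn = [1, 0, 0, 0, 1, 1, 1, 0]

precision, recall = precision_recall(predicted_churn, actual_churn)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

Precision tells you how many of the users you flagged really churned, while recall tells you how many of the users who churned you managed to flag.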
There is also the productionization part, where you put the model into production and test it live. These steps are rarely part of online blog posts because they are hard to show outside a company’s own systems.
With all that said, it is perfectly fine to prepare yourself for becoming a data scientist by doing the projects you find online. As a matter of fact, it is necessary! But don’t be surprised if the data science projects at your company turn out to be more complex.
If you would like to make the transition to data science, you should advocate for it! Ask your manager to give you opportunities to work on data science projects. Seek out mentorship! The best approach is to pair with other data scientists on a project and shadow them. Then you can slowly take on some small responsibilities in that project. If you think that is not possible at your current company, prepare for the role and apply to another company. You can upload some of the data science projects you have worked on to GitHub so that others can see your skills.