
Applied machine learning
This project applies machine learning to the online traffic data of an official account on WeChat, one of the biggest social media platforms in China, to predict the account's net increase in followers and to better understand user behavior.
My approach
1. Data preparation
Download and prepare the raw data, splitting it into three parts: cross-validation data (70%), development data (20%), and test data (10%).
2. Baseline experiment
Run the linear regression algorithm with default settings on the dataset to get the baseline performance.
3. Data exploration
Run a k-means clustering experiment to get a better sense of the data.
4. Feature engineering
Run several rounds of feature engineering and error analysis in LightSide to adjust the feature space.
5. Tuning
Tune the SVM regression algorithm to get the final model.
The model's performance improved after feature engineering.
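
The 70/20/10 split from step 1 can be sketched in a few lines of Python. This is only an illustration: the function name `split_dataset`, the fixed seed, and the random shuffle are assumptions, since the write-up does not say whether the split was random or chronological. For traffic data ordered by date, a chronological split (no shuffle) may be the safer choice.

```python
import random


def split_dataset(rows, fractions=(0.7, 0.2, 0.1), seed=42):
    """Split rows into cross-validation, development, and test parts.

    Hypothetical helper: shuffles with a fixed seed so the split is
    reproducible, then cuts the list at the 70% and 90% marks.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")

    shuffled = list(rows)  # copy so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)

    n = len(shuffled)
    n_cv = round(n * fractions[0])
    n_dev = round(n * fractions[1])

    cv = shuffled[:n_cv]
    dev = shuffled[n_cv:n_cv + n_dev]
    test = shuffled[n_cv + n_dev:]  # whatever remains goes to the test set
    return cv, dev, test


# Toy example: 100 placeholder rows standing in for daily traffic records.
rows = list(range(100))
cv, dev, test = split_dataset(rows)
print(len(cv), len(dev), len(test))
```

Keeping the test part untouched until the very end means the final numbers are not influenced by choices made during feature engineering and tuning.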
Result
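The final model improves on the baseline in correlation coefficient, mean absolute error, and relative absolute error, while root mean squared error and root relative squared error are slightly worse.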
Metric | Baseline | Final model
---|---|---
Correlation coefficient | 0.2841 | 0.3009 |
Mean absolute error | 0.0148 | 0.0143 |
Root mean squared error | 0.0376 | 0.0379 |
Relative absolute error | 81.0229% | 78.6965% |
Root relative squared error | 95.931% | 96.6811% |
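
For readers unfamiliar with the five metrics in the table, here is a minimal sketch of how they are commonly computed. It assumes the standard definitions, in which the two relative metrics compare the model's error with that of always predicting the mean of the actual values. The tool that produced the table may compute that reference mean differently (for example, from the training data), and the function name and toy numbers below are made up for illustration. Under this reading, a relative absolute error of 78.6965% means the model's total absolute error is about 79% of the mean predictor's.

```python
import math


def regression_metrics(actual, predicted):
    """Return the five metrics from the table for paired value lists."""
    n = len(actual)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equally long lists with at least 2 values")

    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n

    # Errors of the model.
    abs_err = sum(abs(p - a) for a, p in zip(actual, predicted))
    sq_err = sum((p - a) ** 2 for a, p in zip(actual, predicted))

    # Errors of the reference predictor that always outputs mean_a.
    base_abs = sum(abs(a - mean_a) for a in actual)
    base_sq = sum((a - mean_a) ** 2 for a in actual)

    # Pearson correlation between actual and predicted values.
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_p = sum((p - mean_p) ** 2 for p in predicted)

    return {
        "correlation": cov / math.sqrt(base_sq * var_p),
        "mae": abs_err / n,
        "rmse": math.sqrt(sq_err / n),
        "rae_percent": 100 * abs_err / base_abs,
        "rrse_percent": 100 * math.sqrt(sq_err / base_sq),
    }


# Toy values, not the project's data.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because the squared-error metrics weight large misses more heavily than the absolute-error metrics, a model can improve on one family while getting slightly worse on the other.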
Reflection
From this project, I learned how to do error analysis and expand the feature space systematically. I also found that high-quality models depend on informative data and meaningful features. This knowledge will help me design better-grounded products in the future.