Online News Popularity Data Blog Post
Rina Deka 2023-07-09
What would you do differently?
I definitely would have given each of us a separate branch off the main branch. We ran into lots of merge conflicts, and after several hours of trying I still wasn’t able to debug my git setup in time (the hours added up partly because of how long the code took to run). In the end I had to send my code via email and have the project owner run it and push the commits that I was trying to put in. I would also have automated the rendering little by little, so that we could debug render-time issues as they came up.
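Here is a minimal sketch of what automating "little by little" could look like. It is not the code we actually ran. The file name `project.Rmd`, the `channel` parameter, the output file names, and the channel list are all assumptions for illustration, and the template would need to declare `channel` under `params` in its YAML header.

```r
# Hypothetical channel names; replace with the ones your project uses.
channels <- c("lifestyle", "entertainment", "bus", "socmed", "tech", "world")

# One row of render settings per channel (output names are made up).
render_plan <- data.frame(
  channel     = channels,
  output_file = paste0(channels, "_analysis.md"),
  stringsAsFactors = FALSE
)

# Render a single report and return how many seconds it took.
# `input` is an assumed template name.
render_one <- function(channel, output_file, input = "project.Rmd") {
  system.time(
    rmarkdown::render(
      input       = input,
      output_file = output_file,
      params      = list(channel = channel),
      quiet       = TRUE
    )
  )[["elapsed"]]
}

# Step 1: render only the first channel and fix whatever breaks.
# first_time <- render_one(render_plan$channel[1], render_plan$output_file[1])

# Step 2: once that works, render the rest and keep the timings.
# timings <- mapply(render_one, render_plan$channel, render_plan$output_file)
```

Rendering one report first means a failure costs one render instead of the whole batch, and the recorded timings show which report is slow.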
What was the most difficult part for you?
I think that the most difficult part of the project for me was honestly the automation and merging, and googling through obscure GitHub issues. Figuring out the automation was a little bit difficult since the notes given were a little sparse, but we eventually got it working, although not necessarily in the most efficient way!
What are your big take-aways from this project?
The biggest take-aways I have from this project are:
- It is sometimes more efficient to automate R Markdown when you need to create several documents with a similar structure.
- Always remember to preprocess your data! Centering and scaling is a must.
- Make sure that you’re calling your predictors effectively, and removing unnecessary channels.
- Concise code is better for readability.
- Once you have a working solution, try to find a way to reduce the run time if you can. Is there an easier way to do the same thing?
- While boosted trees are typically among the most effective models, that does not seem to hold for every data set: different models perform better on different data. However, I wonder if this could be due to an overfitting issue that was overlooked?
- Make sure you name your chunks!
- Merge carefully!
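
To make the centering and scaling take-away concrete, here is a small sketch in base R. The numbers and the column names `n_tokens` and `num_imgs` are made up for illustration and are not results from the project. The means and standard deviations are learned from the training set only and then reused on the test set.

```r
# Toy training and test sets (made-up values, not project data).
train <- data.frame(n_tokens = c(100, 200, 300, 400),
                    num_imgs = c(1, 3, 5, 7))
test  <- data.frame(n_tokens = c(250, 500),
                    num_imgs = c(4, 0))

# Learn the centering and scaling values from the training data only.
centers <- colMeans(train)
scales  <- apply(train, 2, sd)

# Apply the same transformation to both sets.
train_scaled <- scale(train, center = centers, scale = scales)
test_scaled  <- scale(test,  center = centers, scale = scales)

# Each training column now has mean 0 and standard deviation 1.
round(colMeans(train_scaled), 10)
apply(train_scaled, 2, sd)
```

If you fit models with caret, `preProcess()` with `method = c("center", "scale")` handles the same bookkeeping. I mention it as one option, not as a record of exactly what we did.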