Frequently Asked Questions
I’ve gotten many emails from people interested in the project, and I’m truly flattered. Since I haven’t been able to respond to all of them, I’ve distilled the most common questions below. I hope this helps with your research, coursework, or whatever you’re working on.
Which specific ML models are you using? What are the hyperparameters?
For now, I’m not sharing the specific models, training configurations, or hyperparameters. Why? Mostly because I don’t want to steer anyone down the wrong path if I end up being wrong. At the same time, if I happen to be onto something, I want time to test, revise, learn, and re-run the experiments before sharing details.
What I will say: leverage the thousands of research papers out there on this topic and experiment on your own. I challenge you to try things independently. That’s how you’ll truly learn. New studies are published every week that you can draw from.
How much time did you spend building the models?
I truly wish I had kept track. Hundreds of hours reading scholarly papers, mixed with hundreds more experimenting. Training, tuning, failing (oh, so many failures), chasing red herrings, dealing with false positives, and banging my head against the wall. So many times I thought I had something, only to realize a bug was inflating my metrics.
Is this your full-time job?
No. Sometimes I wish it were, and sometimes I’m glad it’s not. This is a labor of love. I find the problem genuinely interesting, interesting enough that it will continue to be a part-time pursuit I come back to, especially as the research evolves or if I have a breakthrough of my own.
Is your GitHub repo public?
At this time, no. I don’t plan to share the full repository, but I may open-source parts of it at some point.
Are you planning to go beyond the S&P 500?
Short answer: yes. I plan to expand my research to cover the broader Nasdaq, the Dow Jones, and potentially cryptocurrency—which is an entirely different class of problem. But for now, I’ve set my sights on what I believe is one of the most difficult tasks in quantitative finance: consistently outperforming the S&P 500. It’s not a challenge for the faint of heart, and it will consume most of the spare time I can dedicate to this project.
Are you planning to invest real money?
At this time? No. Hard stop.
Eventually? Maybe, and I would even say probably. But I would have to convince myself that the alpha is real and that my models provide at least a slight edge over the rest of the market. That kind of conviction takes time: years of research, experimentation, and failure before, eventually, success.
I won’t lie: my ambitious end goal is to build an ensemble of models that consistently outperforms the S&P 500. I don’t know what I’ll do if and when I reach that goal. To me, retirement means doing what I love instead of doing something just to be paid. I’m hoping this project helps me get there.
Did you vibe-code parts of this project?
I didn’t set out to, but necessity won out—and the results genuinely surprised me. I used Claude to handle portions of this project that fell outside my core expertise, which freed me to focus on what actually matters: researching models, running endless rounds of tuning, trials, and backtesting. I’ve become a firm believer that AI is a meaningful productivity multiplier, especially in domains you don’t know well and don’t want to spend months learning from scratch.
The models, however, I researched and built on my own. For tuning and backtesting, I used Claude to automate my workflows so I could make efficient use of Google Colab for GPU-accelerated training and evaluation. That said, it was still time-consuming—I ended up building a library of helper functions specifically to guide Claude toward proper leakage protection for time-series data. AI-assisted coding has come a long way, but it still has further to go before I fully trust it.
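To give a sense of what “leakage protection for time-series data” means in practice (my actual helper library isn’t shown here, and the names below are illustrative, not from my codebase): the core idea is a walk-forward split where every training sample strictly precedes every test sample, with an optional embargo gap so features built from recent history can’t leak label information into training.

```python
def walk_forward_splits(n_samples, n_folds, embargo=0):
    """Yield (train_indices, test_indices) pairs in chronological order.

    Each fold trains only on data that precedes the test window, minus an
    `embargo` of samples immediately before it, so lookahead bias cannot
    creep in through overlapping feature windows.
    """
    fold_size = n_samples // (n_folds + 1)
    for k in range(1, n_folds + 1):
        test_start = k * fold_size
        test_end = min(test_start + fold_size, n_samples)
        # Drop the `embargo` samples just before the test window from training.
        train_end = max(test_start - embargo, 0)
        yield list(range(0, train_end)), list(range(test_start, test_end))


if __name__ == "__main__":
    for train, test in walk_forward_splits(n_samples=10, n_folds=3, embargo=1):
        # Every training index precedes every test index: no lookahead.
        assert max(train) < min(test)
```

This is the kind of invariant I wanted enforced automatically rather than trusting any one model run (or any one AI-generated script) to get right; scikit-learn’s `TimeSeriesSplit` offers a similar ready-made version.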
Have a question that isn’t covered here?
Send me a message and I’ll do my best to respond.