I was thinking about this the other day, wouldn't it be feasible to make fine-tune or something like that into every git change, mailist, etc, the linux kernel has ever hard?
Wouldn't such an LLM be the closer -synth- version of a person who has worked on a codebase for years, learnt all its quirks etc.
There's so much you can fit on a high context, some codebases are already 200k Tokens just for the code as is, so idk
I'd be willing to bet the sum of all code submitted via patches, ideas discussed via lists, etc doesn't come close to the true amount of knowledge collected by the average kernel developer's tinkering, experimenting, etc that never leaves their computer. I also wonder if that would lead to overfitting: the same bugs being perpetuated because they were in the training data.