This was probably already done at Google, Meta, X, and OpenAI before they trained their LLMs.
There's actually a section in the Wikipedia page that explicitly says DeepSeek was trained on it.