golergka 2 days ago

What area is this problem from? What areas in general did you find useful for creating such benchmarks?

Maybe instead of sharing (and leaking) these prompts, we can share methods for creating them.

mobilejdral 2 days ago

Think questions where there is a ton of existing medical research, but no clear answer yet. There are a dozen Alzheimer's questions you could ask, for example, each of which would require it to pull half a dozen contradictory sources into a plausible hypothesis. If you have studied Alzheimer's extensively, it is trivial to evaluate the responses. One question about Alzheimer's is one of my go-to questions. I am testing its ability to reason.

henryway 2 days ago

Can God create something so heavy that he can’t lift it?

viraptor 2 days ago

There's so much text on this already that it's unlikely to engage any reasoning at all. Or, more specifically, if you got a few existing answers from philosophy mashed together, you wouldn't be able to tell that apart from reasoning anyway.