simianwords 6 days ago

Interesting, but I'm a bit lost. You are optimising, but how do you know the ground truth for "good" and "bad"? Do you run the workflow manually and then judge the result against a predefined metric?

Or do you rely on generic benchmarks?

viraptor 6 days ago

https://github.com/datarobot/syftr/blob/main/docs/datasets.m...

You need custom QA pairs for custom scenarios.
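
To make that concrete, here is a rough sketch of what scoring a workflow against your own QA pairs could look like. This is not syftr's API: `QA_PAIRS`, `accuracy`, and `toy_workflow` are hypothetical names I made up for illustration, and the substring match is just a stand-in for whatever metric you would actually use.

```python
from typing import Callable

# Hypothetical QA pairs for a custom scenario (made-up data).
QA_PAIRS = [
    {"question": "What is the refund window?", "answer": "30 days"},
    {"question": "Which plan includes SSO?", "answer": "Enterprise"},
]


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences don't count."""
    return " ".join(text.lower().split())


def accuracy(workflow: Callable[[str], str], qa_pairs: list[dict]) -> float:
    """Fraction of questions whose expected answer appears in the output."""
    if not qa_pairs:
        raise ValueError("need at least one QA pair")
    correct = sum(
        normalize(pair["answer"]) in normalize(workflow(pair["question"]))
        for pair in qa_pairs
    )
    return correct / len(qa_pairs)


def toy_workflow(question: str) -> str:
    """Stand-in for the real workflow under test; it only knows one answer."""
    canned = {"What is the refund window?": "Refunds are accepted within 30 days."}
    return canned.get(question, "I don't know.")


score = accuracy(toy_workflow, QA_PAIRS)
print(f"accuracy: {score:.2f}")
```

The optimiser then only needs that one number per candidate configuration, so "good" and "bad" are defined by your QA pairs rather than by eyeballing outputs.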