CMay 1 day ago

One immediate observation I have about this model is that it seems to do a better job of filtering out or toning down some ideological disinformation that other models regurgitate from activist-controlled Wikipedia articles, at least for the few I've checked. Previously you had to write your own sanity-check prompts to get a model to do extra up-front work validating the logical and historical accuracy of things before it spit out what it thinks is the most popular answer.

With this one, it seems like at least some of that work was done up front, or the thinking is tuned to avoid those issues, because it's giving me conclusions similar to what I get from a sanity-checked prompt. Heck, even Google Gemini and ChatGPT were spitting that stuff out, whereas this one is giving me a reasonable response. So in that regard, big thumbs up to the Mistral team if they did any specific work in that area. It's something I care about, and I was getting concerned that nobody else cared enough to fix it.

andsoitis 1 day ago

What’s an example prompt and sanity-check prompt you use to evaluate?

CMay 1 day ago

Not going to leak my tests, but here's how you can create your own.

- Think up a topic that's interesting to you, yet maybe controversial.

- Look up primary sources and empirical information about it.

- Then look at the relevant Wikipedia article to see whether its framing is honestly and faithfully justified by those primary sources and empirical data.

If the article seems to have a strong bias or critically misrepresents reality, even if it does so by stating only true things, you have a juicy nugget on your hands.

Ask any given LLM about that topic and see if it regurgitates the opinion in the Wikipedia article. If it does, then develop your own prompt that requires the LLM to go down a checklist of steps that help untangle warped logic, without specifically trying to shape the output toward your own preference. Now find other articles and see how well your checklist generalizes.

How well this works depends on how good the model you're using is at instruction following.
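
To make the mechanics concrete, here's a rough Python sketch of the kind of comparison harness I mean. `query_llm` is just a placeholder for whatever client or local model you use, and the checklist items are illustrative only, not my actual tests.

```python
# Rough sketch of the comparison harness described above.
# `query_llm` is a stand-in for whatever LLM client you use;
# the checklist items below are illustrative, not my actual tests.

CHECKLIST = """Before answering, work through these steps:
1. Identify the primary sources and empirical data relevant to the question.
2. Note where secondary summaries (e.g. Wikipedia) go beyond what those sources support.
3. Flag claims that are technically true but framed to imply something stronger.
4. Do not settle on a conclusion until the steps above are complete.
Then answer the question."""


def query_llm(prompt: str) -> str:
    """Placeholder: replace with a call to your LLM client of choice."""
    return f"[model response to {len(prompt)} chars of prompt]"


def compare(topic_question: str) -> dict[str, str]:
    """Ask the same question naively and with the sanity-check preamble."""
    return {
        "naive": query_llm(topic_question),
        "checked": query_llm(f"{CHECKLIST}\n\nQuestion: {topic_question}"),
    }


if __name__ == "__main__":
    results = compare("How is <your controversial topic> best characterized?")
    for label, answer in results.items():
        print(f"--- {label} ---\n{answer}\n")
```

The useful part is diffing the two answers across several topics: if the "checked" answer only shifts on the articles you already flagged as warped, the checklist is probably generalizing rather than just steering the model toward your preference.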

A lot of what thinking models do is expand the context around a topic in the hope of improving the final prediction. To assist that, you have to encourage the LLM to hold off on forming an opinion or deciding on a conclusion until the end; otherwise it can start with a conclusion and spend the rest of its reasoning propping up a weak one rather than arriving at a stronger one after new information emerges.

The danger is that a reasoning model will state some ideological claim early in its reasoning with the same confidence it would say, "well, I know that 1+1=2, so that means X," when in reality that claim doesn't stand up to scrutiny. It then gets stuck in a loop of thinking ideologically, which helps propagate these ideas through language models, and that's dangerous.

Ideally, all ingested Wikipedia content would be evaluated against some level of ground truth before being trained on in the first place, but that makes it harder to keep up to date. Until then, we have to help LLMs handle these cases better.

staticman2 4 hours ago

I don't get it.

If I think that Rabbits and Hares are classified by Wikipedia incorrectly and Ideological Wikipedia Editors are hiding the truth with Disinformation, why would I give the model any credit if it gives the correct answer only when I develop a custom 22-point mammalian-biology reasoning checklist that leads it to the Real Truth about Rabbits and Hares?

It certainly doesn't inspire confidence that any other particular question would be answered correctly?