lvl155 2 days ago

Google has been catching up. Funny how fast this space is evolving. Just a few months ago, it was all about DeepSeek.

bitpush 2 days ago

Many would say Google's Gemini models are SOTA, although Claude seems to be doing well with coding tasks.

snarf21 2 days ago

Gemini has been better than Claude for me on a coding project. Claude kept telling me it had updated some code, but the update wasn't in the output. I had to re-prompt just to get the updated output 5 times in a row.

jacob019 2 days ago

I break out Gemini 2.5 Pro when Claude gets stuck; it's just so slow and verbose. Claude follows instructions better and seems to better understand its role in agentic workflows. Gemini does something different with the context: it has a deeper understanding of the control flow and can uncover edge-case bugs that Claude misses. o3 seems better at high-level thinking and planning, questioning whether something should be done at all and whether the challenge actually matches the need. They're kind of like colleagues with unique strengths. o3 does well with a lot of things; I just haven't used it as much because of the cost. Will probably use it more now.

ookdatnog 2 days ago

If the competition boils down to who has access to the largest amount of high quality data, it's hard to see how anyone but Google could win in the end: through Google Books they have scans of tens of millions of books, and published books are the highest quality texts there are.

itake 2 days ago

I've been learning Vietnamese. Unfortunately, a lot of social media (Reddit, FB, etc.) uses a new generation of language. The younger generation uses so many abbreviations and acronyms that ChatGPT and Google Translate can't keep up.

I think if your goal is properly written language in older writing styles, then you're correct.

ookdatnog 2 days ago

I don't think it's simply a stylistic matter: it seems reasonable to assume that text in books tends to have higher information density, and contains longer and more complicated arguments (when compared to text obtained from social media posts, blogs, shorter articles, etc). If you want models that appear more intelligent, I think you need them to train on this kind of high-quality content.

The fact that these tend to be written in an older writing style is, to me, incidental. You could rewrite all your college textbooks in contemporary social media slang and I would still consider them high-quality texts.

johan914 2 days ago

I have been using Google’s models for the past couple of months, and was surprised to see how sycophantic ChatGPT is now. It’s not just at the start or end of responses; it’s interspersed within the markdown, with little substance. Asking it to change its style makes it overuse technical terms.

malshe 1 day ago

I have observed that DeepSeek hallucinates a lot more than others for the same task. Anyone else experienced it?

resource_waste 2 days ago

DeepSeek was exciting because you could download their model. They seem to be in 3rd place, and have been since Gemini 2.5.

Squarex 2 days ago

I would put them fourth, after Google, OpenAI and Anthropic. Still the best open-weight LLM.