hacklas 2 days ago

DeepSeek V3 has 671 billion parameters, of which 37 billion are active per token.

Magistral Small is a 24 billion parameter model.

Pretty impressive in terms of efficiency for Mistral.
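
For a rough sense of scale, here is a back-of-the-envelope sketch using only the parameter counts quoted above. It assumes Magistral Small is a dense model, so all 24 billion parameters are active; the variable names are just illustrative. Parameter count is only a proxy for cost, so treat the ratios as a rough comparison.

```python
# Parameter counts in billions, taken from the figures quoted above.
DEEPSEEK_V3_TOTAL = 671
DEEPSEEK_V3_ACTIVE = 37   # parameters active per token
MAGISTRAL_SMALL = 24      # assumption: dense model, so all parameters are active

# How many times larger DeepSeek V3 is, by total and by active parameters.
total_ratio = DEEPSEEK_V3_TOTAL / MAGISTRAL_SMALL
active_ratio = DEEPSEEK_V3_ACTIVE / MAGISTRAL_SMALL

print(f"{total_ratio:.1f}x total, {active_ratio:.1f}x active")
# prints "28.0x total, 1.5x active"
```

So the gap is large in terms of weights to store (about 28x) but much smaller in terms of parameters used per token (about 1.5x).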

The size of the Magistral Medium is not publicly available, so it is difficult to compare efficiency there.

kouteiheika 2 days ago

> The size of the Magistral Medium is not publicly available, so it is difficult to compare efficiency there.

FWIW one of their 70B models has leaked in the past (search for "miqu") and rumors at the time were that it was their medium model.