Those are not LLMs. They use the same foundational technology (pick what you like, but I'd say transformers) to accomplish tasks that require entirely different training data and architectures.
I was specifically asking about LLMs because the comment I replied to only talked about LLMs - Large Language Models.
At this point, calling a multimodal LLM an LLM is pretty uncontroversial. Most of the differences lie in the encoders and the embedding projections. If anything, I'd argue an MoE model differs more from a basic dense LLM than a multimodal LLM does from a text-only one.
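For concreteness, here's a minimal PyTorch sketch of that claim, roughly LLaVA-shaped. Everything here is illustrative (class names, dimensions, the stubbed-out vision encoder), not any particular model's implementation; the point is that the only "multimodal" pieces are the vision encoder and the projection into the LM's embedding space, while the transformer backbone is the same as in a text-only LLM.

```python
import torch
import torch.nn as nn

class TinyMultimodalLM(nn.Module):
    """Illustrative sketch: a plain text LM with a vision encoder bolted on
    via a linear projection. All names and sizes are made up."""

    def __init__(self, vocab_size=32000, d_model=512, d_vision=768):
        super().__init__()
        # The "regular LLM" part: token embeddings + a transformer stack.
        # (Causal masking is omitted here for brevity.)
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=4)
        self.lm_head = nn.Linear(d_model, vocab_size)

        # The multimodal additions: an image encoder (a stand-in Linear here,
        # where a real model would use a ViT) and a projection that maps
        # vision features into the LM's embedding space.
        self.vision_encoder = nn.Linear(d_vision, d_vision)
        self.vision_proj = nn.Linear(d_vision, d_model)

    def forward(self, image_feats, input_ids):
        # image_feats: (batch, n_patches, d_vision), e.g. ViT patch embeddings
        img_tokens = self.vision_proj(self.vision_encoder(image_feats))
        txt_tokens = self.tok_emb(input_ids)
        # Prepend the projected image "tokens" to the text tokens; from here
        # on the backbone treats everything as one ordinary token sequence.
        seq = torch.cat([img_tokens, txt_tokens], dim=1)
        return self.lm_head(self.backbone(seq))

model = TinyMultimodalLM()
logits = model(torch.randn(1, 196, 768), torch.randint(0, 32000, (1, 16)))
print(logits.shape)  # (1, 196 + 16, 32000)
```

Strip out the two vision modules and you're back to an ordinary decoder-style LM, which is why lumping the two together usually isn't a stretch.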
Bottom line: when folks talk about LLM applications, multimodal LLMs, MoE LLMs, and even agents all fall under the same general umbrella.