Tag

Multimodal Models

Multimodal models combine text, vision, code, and sometimes speech in one inference stack, making them relevant to agentic workflows, visual understanding, and human-computer interaction. This tag covers model design, long-context handling, fine-tuning, and deployment trade-offs, from Qwen3.5 vision tuning to Kimi K2.5 and MiMo.

0 articles

No articles yet