Tag
Multimodal Models
Multimodal models combine text, vision, code, and sometimes speech in one inference stack, making them relevant to agentic workflows, visual understanding, and human-computer interaction. This tag covers model design, long-context handling, fine-tuning, and deployment trade-offs, from Qwen3.5 vision tuning to Kimi K2.5 and MiMo.
0 articles