Tag
multimodal AI
Multimodal AI combines text, images, audio, and video in one model or workflow, so systems can understand, generate, and edit across formats. It matters for long-context assistants, image editing, speech interfaces, video analysis, and agentic software.
15 articles

Xiaomi MiMo-V2-Omni turns perception into action
5 takeaways from Xiaomi MiMo-V2-Omni, a multimodal agent model that combines visual, audio, video, and browser-action skills.

Gemma 4 brings 256K context to open models
Google’s Gemma 4 adds text, image, and audio input, a context window of up to 256K tokens, and five model sizes for local or server use.

MiniMax M3 adds 1M-token coding power
MiniMax M3 brings coding and agent features, a 1 million-token context window, and multimodal input to the company’s flagship model.

Google Gemini 3.5 Pro Targets June With 2M Tokens
Google plans June availability for Gemini 3.5 Pro, with a 2 million-token context window, Deep Think reasoning, and first access on paid plans.

ScoreDetect details AI moderation rollout, 99% matching
ScoreDetect outlines a multimodal moderation stack, 99% matching, blockchain proof, and a 90-day rollout for enforcement teams.

Gemini 1.5 Pro-002, Flash-002 and 2.0 Flash update Google AI
Google released Gemini-1.5-Pro-002 and Gemini-1.5-Flash-002 on Sept. 24, 2024, then previewed Gemini 2.0 Flash with live multimodal input and agentic tools.

MemDreamer tackles long-video overload
MemDreamer separates perception from reasoning so that understanding hours-long video fits within a small context window.

MiniMax M3: China’s first three-in-one open-source model
MiniMax M3 combines coding, 1M context, and native multimodal support, while MiniMax Code adds an agentic coding layer.

Why MiniMax M3 matters more than another long-context model
MiniMax M3 is a real step forward because it pairs long context with multimodal and agentic control.

What We Know About GPT-5.6's Release Date
OpenAI has not announced GPT-5.6, but hiring, infrastructure work, and model rumors point to a late-2024 or early-2025 window.

Why Geminigen AI Is Just Another Generative AI Wrapper
Geminigen AI is presented as a broad generative AI concept, but it adds no clear technical edge or product identity.

Why AI infrastructure is now the real moat
AI leadership now depends more on compute, distribution, and product limits than on model demos.

Kimi K2.6 Brings 256K Context to API Users
Kimi K2.6 adds 256K context, multimodal input, and stronger coding for developers using the Kimi API Platform.

OpenAI’s ChatGPT Images 2.0 lands with sharper edits
OpenAI quietly shipped ChatGPT Images 2.0, and early tests show stronger edits, cleaner text, and faster image workflows for creators.

Xiaomi’s MiMo AI Push Targets Agentic Software
Xiaomi’s MiMo-V2-Pro, Omni, and TTS models pair 1T+ parameters with low pricing, aiming squarely at agentic AI workloads.