Tag

multimodal AI

Multimodal AI combines text, images, audio, and video in one model or workflow, so systems can understand, generate, and edit across formats. It matters for long-context assistants, image editing, speech interfaces, video analysis, and agentic software.

15 articles

Xiaomi MiMo-V2-Omni turns perception into action
Industry News/Jun 26

5 takeaways from Xiaomi MiMo-V2-Omni, a multimodal agent model that combines visual, audio, video, and browser-action skills.

Gemma 4 brings 256K context to open models
Model Releases/Jun 17

Google’s Gemma 4 adds text, image, and audio input, plus up to 256K context and five model sizes for local or server use.

MiniMax M3 adds 1M-token coding power
Model Releases/Jun 13

MiniMax M3 brings coding and agent features, a 1-million-token context window, and multimodal input to the company’s flagship model.

Google Gemini 3.5 Pro Targets June With 2M Tokens
Model Releases/Jun 12

Google plans June availability for Gemini 3.5 Pro, with a 2-million-token context window, Deep Think reasoning, and first access on paid plans.

ScoreDetect details AI moderation rollout, 99% matching
Tools & Apps/Jun 10

ScoreDetect outlines a multimodal moderation stack, 99% matching, blockchain proof, and a 90-day rollout for enforcement teams.

Gemini 1.5 Pro-002, Flash-002 and 2.0 Flash update Google AI
Model Releases/Jun 9

Google released Gemini-1.5-Pro-002 and Flash-002 on Sept. 24, 2024, then previewed Gemini 2.0 Flash with live multimodal and agent tools.

MemDreamer tackles long-video overload
Research/Jun 8

MemDreamer splits perception from reasoning to make hours-long video understanding fit in a tiny context window.

MiniMax M3: China’s first three-in-one open-source model
Model Releases/Jun 6

MiniMax M3 combines coding, 1M context, and native multimodal support, while MiniMax Code adds an agentic coding layer.

Why MiniMax M3 matters more than another long-context model
Model Releases/Jun 6

MiniMax M3 is a real step forward because it pairs long context with multimodal and agentic control.

What We Know About GPT-5.6's Release Date
Model Releases/Jun 4

OpenAI has not announced GPT-5.6, but hiring, infrastructure work, and model rumors point to a late-2024 or early-2025 window.

Why Geminigen AI Is Just Another Generative AI Wrapper
Industry News/Jun 2

Geminigen AI is presented as a broad generative AI concept, but it adds no clear technical edge or product identity.

Why AI infrastructure is now the real moat
Industry News/May 16

AI leadership now depends more on compute, distribution, and product limits than on model demos.

Kimi K2.6 Brings 256K Context to API Users
Model Releases/May 4

Kimi K2.6 adds 256K context, multimodal input, and stronger coding for developers using the Kimi API Platform.

OpenAI’s ChatGPT Images 2.0 lands with sharper edits
Model Releases/Apr 24

OpenAI quietly shipped ChatGPT Images 2.0, and early tests show stronger edits, cleaner text, and faster image workflows for creators.

Xiaomi’s MiMo AI Push Targets Agentic Software
Industry News/Mar 28

Xiaomi’s MiMo-V2-Pro, Omni, and TTS models pair 1T+ parameters with low pricing, aiming squarely at agentic AI workloads.