Tag

multimodal AI

Multimodal AI combines text, images, audio, and video in one model or workflow, so systems can understand, generate, and edit across formats. It matters for long-context assistants, image editing, speech interfaces, video analysis, and agentic software.

15 articles

Industry News/Jun 26

Xiaomi MiMo-V2-Omni turns perception into action

Five takeaways from Xiaomi MiMo-V2-Omni, a multimodal agent model that combines visual, audio, video, and browser-action skills.

Model Releases/Jun 17

Gemma 4 brings 256K context to open models

Google’s Gemma 4 adds text, image, and audio input, plus up to 256K context and five model sizes for local or server use.

Model Releases/Jun 13

MiniMax M3 adds 1M-token coding power

MiniMax M3 brings coding and agent features, a 1 million-token context window, and multimodal input to the company’s flagship model.

Model Releases/Jun 12

Google Gemini 3.5 Pro Targets June With 2M Tokens

Google plans June availability for Gemini 3.5 Pro, with a 2 million token window, Deep Think reasoning, and first access on paid plans.

Tools & Apps/Jun 10

ScoreDetect details AI moderation rollout, 99% matching

ScoreDetect outlines a multimodal moderation stack, 99% matching, blockchain proof, and a 90-day rollout for enforcement teams.

Model Releases/Jun 9

Gemini 1.5 Pro-002, Flash-002 and 2.0 Flash update Google AI

Google released Gemini-1.5-Pro-002 and Flash-002 on Sept. 24, 2024, then previewed Gemini 2.0 Flash with live multimodal and agent tools.

Research/Jun 8

MemDreamer tackles long-video overload

MemDreamer splits perception from reasoning to make hours-long video understanding fit in a tiny context window.

Model Releases/Jun 6

MiniMax M3: China’s first three-in-one open-source model

MiniMax M3 combines coding, 1M context, and native multimodal support, while MiniMax Code adds an agentic coding layer.

Model Releases/Jun 6

Why MiniMax M3 matters more than another long-context model

MiniMax M3 is a real step forward because it pairs long context with multimodal and agentic control.

Model Releases/Jun 4

What We Know About GPT-5.6's Release Date

OpenAI has not announced GPT-5.6, but hiring, infrastructure work, and model rumors point to a late-2024 or early-2025 window.

Industry News/Jun 2

Why Geminigen AI Is Just Another Generative AI Wrapper

Geminigen AI is presented as a broad generative AI concept, but it adds no clear technical edge or product identity.

Industry News/May 16

Why AI infrastructure is now the real moat

AI leadership now depends more on compute, distribution, and product limits than on model demos.

Model Releases/May 4

Kimi K2.6 Brings 256K Context to API Users

Kimi K2.6 adds 256K context, multimodal input, and stronger coding for developers using the Kimi API Platform.

Model Releases/Apr 24

OpenAI’s ChatGPT Images 2.0 lands with sharper edits

OpenAI quietly shipped ChatGPT Images 2.0, and early tests show stronger edits, cleaner text, and faster image workflows for creators.

Industry News/Mar 28

Xiaomi’s MiMo AI Push Targets Agentic Software

Xiaomi’s MiMo-V2-Pro, Omni, and TTS models pair 1T+ parameters with low pricing, aiming squarely at agentic AI workloads.