Research
AI research papers, breakthroughs, and technical deep dives. From academic publications to lab findings shaping the future of AI.

OPD lets you distill skills without brute-force RL
I break down On-Policy Distillation and turn the idea into a copy-ready post-training template.

Google DeepMind turns science into tools
Google DeepMind’s science tools show how Google is packaging AI for researchers who want precision, not hype.

Measuring when LLM behavior actually transfers
A new framework tests whether an LLM’s behavior transfers across payoff-equivalent decision environments.

Prompt injection is now an AI security problem
Prompt injection lets hidden text steer LLMs, and recent tests show models like DeepSeek-R1 can be tricked at worrying rates.

Solver choice changes which Nash equilibrium wins
Different zero-sum game solvers can converge to different Nash equilibria, and the choice is algorithm-dependent.

Proper positive-only learning gets a full characterization
A new result characterizes when proper learning from positive-only samples is possible.

DexCompose Reuses Dexterous Policies Across Tasks
DexCompose composes pretrained hand policies into multi-task manipulation by assigning finger-level action ownership.

HaWoR turns hand motion into MANO params
HaWoR’s hand reconstruction setup boils down to predicting MANO parameters, not raw meshes.

NVIDIA’s $30,000 grant targets USC health AI
USC is advertising NVIDIA’s $30,000 academic grant for health and AI research, with applications due June 30, 2026.

CUDA Toolkit 13.3 fixes a nested-divergence bug
CUDA Toolkit 13.3 fixes a compiler bug from 12.8 that could corrupt registers in deeply divergent GPU kernels.

EAGLE3 is the real speedup for Kimi-K2.5 on MI325X
EAGLE3, not kernel tweaks, is the main reason Kimi-K2.5-W4A8 decodes faster on AMD MI325X.

LLM fine-tuning turns generic models into domain tools
A practical breakdown of enterprise LLM fine-tuning, from data prep to model choice, plus a copy-ready template.

Rust learners need permission to clone first, optimize later
Rust learners should clone freely at first, then optimize once they understand the problem.

Mistral OCR 4 brings structure to document AI
Mistral OCR 4 adds boxes, block labels, and confidence scores to OCR, with API pricing from $4 per 1,000 pages.

Autoregressive Boltzmann Generators ditch flows
ArBG replaces flow-based Boltzmann generators with autoregressive modeling for faster, more scalable equilibrium sampling.

RiVER trains LLMs without ground-truth answers
RiVER shows LLMs can improve from score-based tasks without ground-truth answers by calibrating rewards from execution feedback.

DanceOPD distills image-editing skills into one model
DanceOPD trains flow-matching image models to combine text-to-image and editing skills without them fighting each other.

Microsoft funds AI research on team collaboration
Microsoft Research opened a Spring 2026 CFP for AI that helps teams work better, with awards around $50K to $75K.

3 AI papers on code, music, and diagnosis
A Zhihu roundup highlights three AI papers from June 24, 2026 on code generation, real-time music, and rare-disease diagnosis.

New NLP papers map agent memory and tool use
A June 24 arXiv roundup highlights agent memory, tool-use signals, and conversational search papers that push practical NLP forward.

Self-Distillation Can Shrink Model Diversity
Self-distillation can boost pass@1 while quietly reducing rollout diversity and hurting out-of-distribution robustness.

RevengeBench tests reverse-engineering game policies
RevengeBench tests whether LLMs can reconstruct hidden game policies from behavior and improve with custom probes.

Learning Action Priors for Cross-Embodiment Manipulation
A two-stage training scheme gives VLA robots an explicit motion prior before cross-modal alignment.

OPSD lets you turn user clicks into training
I break down OPSD into a copyable loop for turning implicit user feedback into targeted correction and continual training.

UltraQuant: 4-bit KV caching for long agents
UltraQuant shows 4-bit KV caching can speed long, multi-turn agent serving while keeping more context resident.

FLUX3D fixes 3DGS detail loss from images
FLUX3D improves image-to-3D Gaussian generation by aligning sparse 3D latents with dense 2D image tokens.

Stochastic Subgradient Last Iterate Gets Tight Bounds
The paper tightens last-iterate bounds for stochastic subgradient descent in 1D and shows variance alone is not enough.

InSight lets VLAs learn new skills on their own
InSight makes vision-language-action policies learn new manipulation skills without human demos of those target tasks.

Anthropic is right to sound the alarm on recursive self-improvement
Anthropic’s warning is justified, but the bigger problem is that AI control is already slipping beyond easy governance.

OpenAI’s bug hunt rattled Chrome, Safari, Firefox
OpenAI researchers found multiple exploitable browser bugs in Chrome, Safari, and Firefox within a week.