[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"article-stride-training-data-attribution-sparse-recovery-en":3,"article-related-stride-training-data-attribution-sparse-recovery-en":30,"series-research-b940c037-352c-4c68-8e44-62748fafa560":83},{"id":4,"slug":5,"title":6,"content":7,"summary":8,"source":9,"source_url":10,"author":11,"image_url":12,"cover_image":12,"category":13,"language":14,"translated_content":11,"related_article_id":15,"keywords":16,"key_takeaways":22,"views":26,"created_at":27,"published_at":28,"topic_cluster_id":29},"b940c037-352c-4c68-8e44-62748fafa560","stride-training-data-attribution-sparse-recovery-en","STRIDE tracks training data influence faster","\u003Cp data-speakable=\"summary\">STRIDE turns training data attribution into sparse recovery from subset perturbations and cuts attribution cost by 13×.\u003C\u002Fp>\u003Cul>\u003Cli>\u003Cstrong>Research org\u003C\u002Fstrong>: Unspecified in arXiv abstract\u003C\u002Fli>\u003Cli>\u003Cstrong>Core data\u003C\u002Fstrong>: 13× faster than previous art\u003C\u002Fli>\u003Cli>\u003Cstrong>Breakthrough\u003C\u002Fstrong>: Learns steering operators and recovers influences with sparse linear decomposition\u003C\u002Fli>\u003C\u002Ful>\u003Cp>\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2606.05165\">STRIDE: Training Data Attribution via Sparse Recovery from Subset Perturbations\u003C\u002Fa> tackles a practical problem that keeps getting harder as models scale: figuring out which training examples actually shaped a model’s prediction. That matters for debugging, dataset curation, contamination checks, and any workflow where you need to explain why a model behaves the way it does.\u003C\u002Fp>\u003Cp>The paper’s core point is simple: the usual way to do training data attribution for large language models is too expensive. If you try to estimate influence by repeatedly retraining or by tracking gradients across billions of parameters, you either pay a huge compute bill or lean on approximations that only capture local effects. STRIDE proposes a different angle by looking at the model’s behavior in activation space instead of trying to follow every parameter change.\u003C\u002Fp>\u003Ch2>What problem this paper is trying to fix\u003C\u002Fh2>\u003Cp>Training Data Attribution, or TDA, is about tracing a model’s predictions back to the training data that influenced them. In the ideal version, you run causal interventions: add or remove data and observe how predictions change. That gives you a much cleaner signal than post-hoc guesswork, but it is brutal to scale for \u003Ca href=\"\u002Ftag\u002Fllms\">LLMs\u003C\u002Fa> because the obvious method involves repeated retraining.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780552977778-4t7h.png\" alt=\"STRIDE tracks training data influence faster\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>Most practical methods therefore approximate training influence in parameter space, usually with gradients. The abstract is blunt about the downside: gradients across billions of parameters are prohibitively expensive, and they only give local approximations. In other words, they can be useful, but they are not the same thing as actually measuring how training data changes the model’s behavior.\u003C\u002Fp>\u003Cp>STRIDE is trying to close that gap without paying the full retraining cost. The paper’s move is to treat influence as something you can infer from how the model’s outputs shift under controlled perturbations, rather than something you must reconstruct from the full parameter update path.\u003C\u002Fp>\u003Ch2>How the method works in plain English\u003C\u002Fh2>\u003Cp>STRIDE stands for Steering-based Training Data Influence Decomposition. The name gives away the basic idea: instead of asking, “How did the weights move?”, it asks, “How can we steer the model’s behavior to mimic what a subset of training data would have done?”\u003C\u002Fp>\u003Cp>The framework learns lightweight steering operators. These operators are meant to mimic the behavioral shift caused by training on data subsets. Once you have those operators, you can measure how they perturb test predictions and then use sparse linear decomposition to recover the influence of individual training examples.\u003C\u002Fp>\u003Cp>The sparse recovery framing matters. Rather than assuming every training example contributes equally or trying to reconstruct a dense, exact explanation, STRIDE assumes the explanation can be recovered from a sparse mixture of signals. That is the compressive-sensing-style intuition the abstract points to: if only some examples matter strongly for a prediction, you can recover them from a smaller set of perturbation observations.\u003C\u002Fp>\u003Cp>For engineers, the practical appeal is obvious. If the method works as described, you are trading a giant retraining or gradient-tracking problem for a much lighter-weight perturbation-and-reconstruction pipeline. That is the kind of shift that can make attribution feasible in real workflows instead of just being a research idea.\u003C\u002Fp>\u003Ch2>What the paper actually shows\u003C\u002Fh2>\u003Cp>The abstract makes two concrete claims about results. First, STRIDE achieves state-of-the-art performance for \u003Ca href=\"\u002Ftag\u002Fllm\">LLM\u003C\u002Fa> pre-training attribution. Second, it is 13× faster than previous art. Those are the only \u003Ca href=\"\u002Ftag\u002Fbenchmark\">benchmark\u003C\u002Fa>-style numbers provided in the source, so there is no detailed leaderboard, dataset breakdown, or task-by-task score in the abstract itself.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780552973765-wamq.png\" alt=\"STRIDE tracks training data influence faster\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>The paper also says it validates practical utility through downstream applications. The examples named are data selection, data contamination, and qualitative analysis. That is a useful signal: the method is not presented as a purely theoretical attribution tool, but as something intended to support real dataset and model-forensics workflows.\u003C\u002Fp>\u003Cp>Still, the abstract leaves important evaluation details unspecified. We do not get the exact attribution metrics, the size or type of models tested, or the specific datasets used for the state-of-the-art comparison. So while the headline is strong, readers should treat the result as a summary claim until they inspect the full paper.\u003C\u002Fp>\u003Ch2>Why developers should care\u003C\u002Fh2>\u003Cp>If you build or fine-tune LLMs, training data attribution is one of those capabilities that becomes more valuable as models get larger and datasets get messier. It helps answer questions like: which samples are driving a weird output, which data should be removed, and whether a model is reacting to contamination rather than learning the underlying task.\u003C\u002Fp>\u003Cp>STRIDE suggests a path toward making that analysis cheaper. A 13× speedup is not just an academic improvement if the bottleneck is repeated attribution runs during dataset cleaning or model auditing. Faster attribution could mean tighter iteration loops for dataset selection, more practical contamination checks, and better postmortems when a model behaves unexpectedly.\u003C\u002Fp>\u003Cp>There is also a broader methodological point here. The paper challenges the assumption that influence must be approximated in parameter space. By moving to activation-space behavior and sparse recovery, STRIDE is betting that the right abstraction for attribution is what the model does, not just how its weights change. That is a useful design pattern to watch, even beyond this specific paper.\u003C\u002Fp>\u003Ch2>What is still unclear\u003C\u002Fh2>\u003Cp>The abstract gives a strong direction, but not the full operational picture. It does not include benchmark tables, error bars, or the exact evaluation setup, so you cannot yet tell how robust the 13× speedup is across different model sizes or attribution settings.\u003C\u002Fp>\u003Cp>It also does not spell out the cost of learning the steering operators, whether the method needs special access to internal activations, or how it behaves when the attribution signal is not sparse. Those are the kinds of implementation details that matter if you want to move from a paper result to a production pipeline.\u003C\u002Fp>\u003Cp>Even with those gaps, STRIDE is a notable entry in the TDA space because it reframes the problem in a way that is computationally more plausible for LLMs. For teams doing dataset governance, model debugging, or contamination analysis, that is exactly the kind of shift worth watching.\u003C\u002Fp>\u003Cul>\u003Cli>STRIDE reframes training data attribution as sparse recovery over activation-space perturbations.\u003C\u002Fli>\u003Cli>The abstract claims state-of-the-art LLM pre-training attribution and a 13× speedup.\u003C\u002Fli>\u003Cli>The source does not provide detailed benchmark tables, so the full evaluation scope is still unclear.\u003C\u002Fli>\u003C\u002Ful>","STRIDE turns training data attribution into sparse recovery from subset perturbations and cuts attribution cost by 13×.","arxiv.org","https:\u002F\u002Farxiv.org\u002Fabs\u002F2606.05165",null,"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780552977778-4t7h.png","research","en","447ac6c9-477b-45c8-bec2-ff94dc4cf5d4",[17,18,19,20,21],"training data attribution","LLMs","sparse recovery","activation space","data contamination",[23,24,25],"Recasts attribution away from parameter gradients and into activation-space perturbations.","Claims state-of-the-art LLM pre-training attribution with a 13× speedup.","Useful for dataset selection, contamination checks, and model debugging, but benchmark details are limited in the abstract.",0,"2026-06-04T06:02:29.766655+00:00","2026-06-04T06:02:29.758+00:00","3103988e-c4fe-45e3-98ab-846500c9d507",{"tags":31,"relatedLang":42,"relatedPosts":46},[32,34,36,38,40],{"name":18,"slug":33},"llms",{"name":21,"slug":35},"data-contamination",{"name":19,"slug":37},"sparse-recovery",{"name":17,"slug":39},"training-data-attribution",{"name":20,"slug":41},"activation-space",{"id":15,"slug":43,"title":44,"language":45},"stride-training-data-attribution-sparse-recovery-zh","STRIDE 讓訓練資料歸因快 13 倍","zh",[47,53,59,65,71,77],{"id":48,"slug":49,"title":50,"cover_image":51,"image_url":51,"created_at":52,"category":13},"9426a6bd-912e-444b-893d-ef9a0434d0ae","streamma-multi-agent-reasoning-latency-en","StreamMA cuts multi-agent reasoning latency","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780554790437-pffi.png","2026-06-04T06:32:33.361195+00:00",{"id":54,"slug":55,"title":56,"cover_image":57,"image_url":57,"created_at":58,"category":13},"dfcbc7e1-aadb-4fe2-b572-c2e0372a3022","audio-language-models-arbitration-reversals-en","How audio-language models lose to text","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780553874831-f2dl.png","2026-06-04T06:17:28.510747+00:00",{"id":60,"slug":61,"title":62,"cover_image":63,"image_url":63,"created_at":64,"category":13},"c9c264b1-3a0d-4f5b-ada3-02687c9ab795","mathematicians-warn-ai-could-distort-math-en","Mathematicians Warn AI Could Distort Math","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780504385180-uln0.png","2026-06-03T16:32:29.94161+00:00",{"id":66,"slug":67,"title":68,"cover_image":69,"image_url":69,"created_at":70,"category":13},"50db75e4-31d8-4222-9f32-476b682a3848","humanoid-gpt-zero-shot-motion-tracking-en","Humanoid-GPT scales motion tracking with a GPT-style model","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780469286641-cfel.png","2026-06-03T06:47:34.975723+00:00",{"id":72,"slug":73,"title":74,"cover_image":75,"image_url":75,"created_at":76,"category":13},"a65ad2e8-de08-4108-82cb-c3737a17ac6f","ipt-vlms-hidden-space-reasoning-en","IPT helps VLMs reason about hidden space","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780468449119-aqbt.png","2026-06-03T06:32:47.048757+00:00",{"id":78,"slug":79,"title":80,"cover_image":81,"image_url":81,"created_at":82,"category":13},"4515ce72-a5c8-4559-a345-f24f50d89d09","neuron-selectivity-changes-with-scale-en","How neuron selectivity changes as models scale","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780467495396-q75j.png","2026-06-03T06:17:44.638423+00:00",[84,89,94,99,104,109,114,119,124,129],{"id":85,"slug":86,"title":87,"created_at":88},"a2715e72-1fe8-41b3-abb1-d0cf1f710189","ai-predictions-2026-big-changes-en","AI Predictions for 2026: Brace for Big Changes","2026-03-26T01:25:07.788356+00:00",{"id":90,"slug":91,"title":92,"created_at":93},"8404bd7b-4c2f-4109-9ec4-baf29d88af2b","ml-papers-of-the-week-github-research-desk-en","ML Papers of the Week Turns GitHub Into a Research Desk","2026-03-27T01:11:39.480259+00:00",{"id":95,"slug":96,"title":97,"created_at":98},"87897a94-8065-4464-a016-1f23e89e17cc","ai-ml-conferences-to-watch-in-2026-en","AI\u002FML Conferences to Watch in 2026","2026-03-27T01:51:54.184108+00:00",{"id":100,"slug":101,"title":102,"created_at":103},"6f1987cf-25f3-47a4-b3e6-db0997695be8","openclaw-agents-manipulated-self-sabotage-en","OpenClaw Agents Can Be Manipulated Into Failure","2026-03-28T03:03:18.899465+00:00",{"id":105,"slug":106,"title":107,"created_at":108},"a53571ad-735a-4178-9f93-cb09b699d99c","vega-driving-language-instructions-en","Vega: Driving with Natural Language Instructions","2026-03-28T14:54:04.698882+00:00",{"id":110,"slug":111,"title":112,"created_at":113},"a34581d6-f36e-46da-88bb-582fb3e7425c","personalizing-autonomous-driving-styles-en","Drive My Way: Personalizing Autonomous Driving Styles","2026-03-28T14:54:26.148181+00:00",{"id":115,"slug":116,"title":117,"created_at":118},"2bc1ad7f-26ce-4f02-9885-803b35fd229d","training-knowledge-bases-writeback-rag-en","Training Knowledge Bases with WriteBack-RAG","2026-03-28T14:54:45.643433+00:00",{"id":120,"slug":121,"title":122,"created_at":123},"71adc507-3c54-4605-bbe2-c966acd6187e","packforcing-long-video-generation-en","PackForcing: Efficient Long-Video Generation Method","2026-03-28T14:55:02.646943+00:00",{"id":125,"slug":126,"title":127,"created_at":128},"675942ef-b9ec-4c5f-a997-381250b6eacb","pixelsmile-facial-expression-editing-en","PixelSmile Framework Enhances Facial Expression Editing","2026-03-28T14:55:20.633463+00:00",{"id":130,"slug":131,"title":132,"created_at":133},"6954fa2b-8b66-4839-884b-e46f89fa1bc3","adaptive-block-scaled-data-types-en","IF4: Smarter 4-Bit Quantization That Adapts to Your Data","2026-03-31T06:00:36.65963+00:00"]