[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"article-skill-to-lora-cuts-agent-token-overhead-en":3,"article-related-skill-to-lora-cuts-agent-token-overhead-en":30,"series-research-40d7a637-c770-47b5-8813-fd56a798b332":73},{"id":4,"slug":5,"title":6,"content":7,"summary":8,"source":9,"source_url":10,"author":11,"image_url":12,"cover_image":12,"category":13,"language":14,"translated_content":11,"related_article_id":15,"keywords":16,"key_takeaways":22,"views":26,"created_at":27,"published_at":28,"topic_cluster_id":29},"40d7a637-c770-47b5-8813-fd56a798b332","skill-to-lora-cuts-agent-token-overhead-en","Skill-to-LoRA cuts agent token overhead","\u003Cp data-speakable=\"summary\">Skill-to-LoRA turns SKILL.md skill files into LoRA adapters and cuts prompt overhead.\u003C\u002Fp>\u003Cp>\u003Ca href=\"\u002Ftag\u002Fagent\">Agent\u003C\u002Fa> frameworks often ship \u003Ca href=\"\u002Ftag\u002Fskills\">skills\u003C\u002Fa> as \u003Ccode>SKILL.md\u003C\u002Fcode> files, which means every run pays the cost of re-reading the same instructions. The paper behind \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2606.16769\" target=\"_blank\" rel=\"noopener\">arXiv:2606.16769\u003C\u002Fa> proposes a cleaner path: distill the skill text into a skill-specific \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2106.09685\" target=\"_blank\" rel=\"noopener\">LoRA\u003C\u002Fa> adapter and load that at \u003Ca href=\"\u002Ftag\u002Finference\">inference\u003C\u002Fa> time instead of stuffing the whole document into the prompt.\u003C\u002Fp>\u003Cp>That matters because long skill docs are expensive in token budget, slow down context assembly, and can crowd out the actual task. 
The idea is simple enough to explain at a coffee shop: read the skill once, train an adapter offline, then swap the adapter in when the agent needs that skill.\u003C\u002Fp>\u003Ctable>\u003Cthead>\u003Ctr>\u003Cth>Item\u003C\u002Fth>\u003Cth>Value\u003C\u002Fth>\u003Cth>Why it matters\u003C\u002Fth>\u003C\u002Ftr>\u003C\u002Fthead>\u003Ctbody>\u003Ctr>\u003Ctd>Paper\u003C\u002Ftd>\u003Ctd>arXiv:2606.16769\u003C\u002Ftd>\u003Ctd>Identifies the method and source\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>Method\u003C\u002Ftd>\u003Ctd>Skill-to-LoRA (S2L)\u003C\u002Ftd>\u003Ctd>Replaces text injection with adapters\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>Skill format\u003C\u002Ftd>\u003Ctd>SKILL.md\u003C\u002Ftd>\u003Ctd>Common file format in agent stacks\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>Training mode\u003C\u002Ftd>\u003Ctd>Offline synthesis from full skill docs\u003C\u002Ftd>\u003Ctd>Moves heavy lifting out of the request path\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>Runtime mode\u003C\u002Ftd>\u003Ctd>Dynamic adapter loading\u003C\u002Ftd>\u003Ctd>Reduces prompt length during inference\u003C\u002Ftd>\u003C\u002Ftr>\u003C\u002Ftbody>\u003C\u002Ftable>\u003Ch2>What S2L is trying to fix\u003C\u002Fh2>\u003Cp>The core complaint is familiar to anyone who has built agent workflows: the model does not need a 2,000-word instruction file every single time if the skill is stable. 
The paper argues that \u003Ccode>SKILL.md\u003C\u002Fcode> distribution is convenient for humans but wasteful for inference, because the model keeps paying for the same context on every call.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781993875444-u3le.png\" alt=\"Skill-to-LoRA cuts agent token overhead\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpeft\" target=\"_blank\" rel=\"noopener\">LoRA\u003C\u002Fa> fits this problem well because it encodes task behavior in a compact set of low-rank updates rather than in repeated text. S2L takes the skill document, uses it to synthesize demonstration data, and trains a skill adapter that can be loaded \u003Ca href=\"\u002Fnews\u002Fai-coding-assistant-roi-measured-en\">only when\u003C\u002Fa> needed.\u003C\u002Fp>\u003Cp>That shifts the cost curve. Instead of spending tokens on instructions every time, you spend compute once during preparation and keep the runtime request smaller.\u003C\u002Fp>\u003Cul>\u003Cli>Less prompt stuffing during inference\u003C\u002Fli>\u003Cli>More room in context for user input and task state\u003C\u002Fli>\u003Cli>Cleaner separation between skill knowledge and live conversation\u003C\u002Fli>\u003Cli>Potentially higher pass rate if the adapter captures the skill better than raw text\u003C\u002Fli>\u003C\u002Ful>\u003Ch2>Why the offline step matters\u003C\u002Fh2>\u003Cp>The offline stage is where S2L does the heavy work. The full skill document is used to create skill-guided demonstrations, and those examples train a dedicated adapter. 
In practice, that means the model learns the behavior behind the instructions instead of re-parsing the instructions every time.\u003C\u002Fp>\u003Cp>This is a meaningful design choice because agent skills are often repetitive. A browser automation skill, a \u003Ca href=\"\u002Ftag\u002Fcode-review\">code review\u003C\u002Fa> skill, or a file-editing skill tends to carry stable patterns. If those patterns can be distilled into weights, the runtime path gets simpler.\u003C\u002Fp>\u003Cblockquote>\"LoRA performs on-par or better than fine-tuning in model quality on RoBERTa, DeBERTa, GPT-2, and GPT-3, despite having fewer trainable parameters, a higher training throughput, and, unlike adapters, no additional inference latency.\" From Edward J. Hu et al., \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2106.09685\" target=\"_blank\" rel=\"noopener\">LoRA: Low-Rank Adaptation of Large Language Models\u003C\u002Fa>\u003C\u002Fblockquote>\u003Cp>That quote matters here because it explains the technical bet. S2L is banking on adapters storing useful behavior without adding much runtime friction, although the zero-latency result in the LoRA paper applies when the low-rank update is merged into the base weights, and swapping adapters per request still adds some loading cost. The paper is less about a new model architecture and more about moving skill delivery from text to parameters.\u003C\u002Fp>\u003Cp>For broader context on agent design, this sits in the same conversation as tool-use systems and structured prompting. We covered related agent patterns in \u003Ca href=\"\u002Fnews\u002Fagent-tool-use-patterns\" target=\"_blank\" rel=\"noopener\">our recent agent tooling roundup\u003C\u002Fa>.\u003C\u002Fp>\u003Ch2>What the numbers imply for builders\u003C\u002Fh2>\u003Cp>The source summary does not publish a full \u003Ca href=\"\u002Ftag\u002Fbenchmark\">benchmark\u003C\u002Fa> table, but it does give two signals that matter: the idea targets token reduction, and it claims a higher pass rate. 
Those are the right metrics for this kind of system, because a skill system only wins if it is cheaper and at least as reliable as injecting the skill text into the prompt.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781993873208-2qzs.png\" alt=\"Skill-to-LoRA cuts agent token overhead\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>For builders, the comparison is straightforward. A text-based skill file is easy to inspect and edit, while an adapter-based skill is harder to read but cheaper to run. The tradeoff is between transparency and runtime efficiency.\u003C\u002Fp>\u003Cul>\u003Cli>\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fopenai\u002Fopenai-cookbook\" target=\"_blank\" rel=\"noopener\">Text injection\u003C\u002Fa> keeps skills human-readable but adds the same skill tokens to the context on every call\u003C\u002Fli>\u003Cli>Adapter loading keeps the prompt shorter but requires a training pipeline\u003C\u002Fli>\u003Cli>Text skills are easy to version in Git, while adapters need model artifact management\u003C\u002Fli>\u003Cli>Adapters can be swapped dynamically, which is useful when an agent needs many specialized behaviors\u003C\u002Fli>\u003C\u002Ful>\u003Cp>That last point is where S2L gets interesting for production teams. If you have dozens of skills, each with its own document, the prompt tax adds up fast. A library of adapters could be easier to load on demand than a wall of instructions appended to every request.\u003C\u002Fp>\u003Cp>The catch is operational complexity. You now need a process for generating examples, training adapters, validating them, and matching the right adapter to the right task. 
That is a better fit for teams already running model infrastructure than for hobby projects that just want a quick agent demo.\u003C\u002Fp>\u003Ch2>Where this fits in the agent stack\u003C\u002Fh2>\u003Cp>S2L is part of a broader move to make agents less dependent on giant prompts and more dependent on reusable artifacts. The same pressure shows up in \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fstanfordnlp\u002Fdspy\" target=\"_blank\" rel=\"noopener\">DSPy\u003C\u002Fa>, in function-calling workflows, and in systems that separate planning from execution. The common thread is that prompt text is a brittle place to store everything.\u003C\u002Fp>\u003Cp>There is also a nice practical angle here for teams shipping products. If a skill is stable enough to be distilled, it may belong in weights rather than in a markdown file. If a skill changes every week, the markdown file still wins because it is easier to edit and audit.\u003C\u002Fp>\u003Cp>That split suggests a hybrid architecture: keep fast-changing policy in text, and move stable procedural skills into adapters. For agent builders, that may be the most realistic way to reduce token waste without turning every update into a model retraining job.\u003C\u002Fp>\u003Cp>The paper’s idea is small, but the implication is large enough to matter: agent systems may start treating skill docs as training material, not runtime baggage. 
If S2L holds up in broader tests, the next question is practical, not theoretical: which skills belong in text, and which ones are worth turning into adapters?\u003C\u002Fp>","Skill-to-LoRA turns SKILL.md files into LoRA adapters so agents can load skills without stuffing long docs into context.","zhuanlan.zhihu.com","https:\u002F\u002Fzhuanlan.zhihu.com\u002Fp\u002F2050299452569785949",null,"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781993875444-u3le.png","research","en","411c8eae-4b17-49a6-bc60-a72749c85a3d",[17,18,19,20,21],"Skill-to-LoRA","LoRA","agent skills","SKILL.md","token efficiency",[23,24,25],"S2L distills SKILL.md files into LoRA adapters for runtime loading.","The method aims to reduce token use and improve agent pass rate.","It trades human-readable prompts for a more operational training pipeline.",0,"2026-06-20T22:17:31.1477+00:00","2026-06-20T22:17:31.139+00:00","3103988e-c4fe-45e3-98ab-846500c9d507",{"tags":31,"relatedLang":32,"relatedPosts":36},[],{"id":15,"slug":33,"title":34,"language":35},"skill-to-lora-cuts-agent-token-overhead-zh","Skill-to-LoRA 讓技能別再吃 Token","zh",[37,43,49,55,61,67],{"id":38,"slug":39,"title":40,"cover_image":41,"image_url":41,"created_at":42,"category":13},"405de39d-cfc5-43bf-b47b-ff9ce7be96a9","turboquant-does-not-hurt-search-quality-equal-bytes-en","TurboQuant does not hurt search quality at equal byte budgets","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781857967113-2xax.png","2026-06-19T08:32:22.235692+00:00",{"id":44,"slug":45,"title":46,"cover_image":47,"image_url":47,"created_at":48,"category":13},"66286461-18c3-42a2-a053-16a87b9a0dd0","deterministic-multicalibration-optimal-sample-use-en","Deterministic multicalibration finally hits optimal sample 
use","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781850768283-gcmj.png","2026-06-19T06:32:28.768728+00:00",{"id":50,"slug":51,"title":52,"cover_image":53,"image_url":53,"created_at":54,"category":13},"6dc0410b-c9ec-4148-974b-0b5f7a14975c","uniego-proxy-teachers-egocentric-video-en","UNIEGO unifies egocentric video with proxy teachers","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781849887430-g735.png","2026-06-19T06:17:32.327109+00:00",{"id":56,"slug":57,"title":58,"cover_image":59,"image_url":59,"created_at":60,"category":13},"b398938d-f651-4d91-bfee-d888ba44fe6f","diffusiongemma-transparency-measured-en","DiffusionGemma’s transparency problem, measured","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781848969642-b497.png","2026-06-19T06:02:30.672396+00:00",{"id":62,"slug":63,"title":64,"cover_image":65,"image_url":65,"created_at":66,"category":13},"8abdf0aa-3fa8-4123-adec-4b0d3cd6b7de","nitro-split-kernel-isolation-math-en","Nitro’s split kernel turns isolation into math","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781843602176-04ij.png","2026-06-19T04:32:58.564142+00:00",{"id":68,"slug":69,"title":70,"cover_image":71,"image_url":71,"created_at":72,"category":13},"39d1ecdc-5ce6-45b7-af63-f1b74337311d","blackwell-wins-agentic-ai-infrastructure-benchmark-en","Blackwell wins because agentic AI needs full-stack infrastructure","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781803966380-s5kc.png","2026-06-18T17:32:18.823071+00:00",[74,79,84,89,94,99,104,109,114,119],{"id":75,"slug":76,"title":77,"created_at":78},"a2715e72-1fe8-41b3-abb1-d0cf1f710189","ai-predictions-2026-big-changes-en","AI 
Predictions for 2026: Brace for Big Changes","2026-03-26T01:25:07.788356+00:00",{"id":80,"slug":81,"title":82,"created_at":83},"8404bd7b-4c2f-4109-9ec4-baf29d88af2b","ml-papers-of-the-week-github-research-desk-en","ML Papers of the Week Turns GitHub Into a Research Desk","2026-03-27T01:11:39.480259+00:00",{"id":85,"slug":86,"title":87,"created_at":88},"87897a94-8065-4464-a016-1f23e89e17cc","ai-ml-conferences-to-watch-in-2026-en","AI\u002FML Conferences to Watch in 2026","2026-03-27T01:51:54.184108+00:00",{"id":90,"slug":91,"title":92,"created_at":93},"6f1987cf-25f3-47a4-b3e6-db0997695be8","openclaw-agents-manipulated-self-sabotage-en","OpenClaw Agents Can Be Manipulated Into Failure","2026-03-28T03:03:18.899465+00:00",{"id":95,"slug":96,"title":97,"created_at":98},"a53571ad-735a-4178-9f93-cb09b699d99c","vega-driving-language-instructions-en","Vega: Driving with Natural Language Instructions","2026-03-28T14:54:04.698882+00:00",{"id":100,"slug":101,"title":102,"created_at":103},"a34581d6-f36e-46da-88bb-582fb3e7425c","personalizing-autonomous-driving-styles-en","Drive My Way: Personalizing Autonomous Driving Styles","2026-03-28T14:54:26.148181+00:00",{"id":105,"slug":106,"title":107,"created_at":108},"2bc1ad7f-26ce-4f02-9885-803b35fd229d","training-knowledge-bases-writeback-rag-en","Training Knowledge Bases with WriteBack-RAG","2026-03-28T14:54:45.643433+00:00",{"id":110,"slug":111,"title":112,"created_at":113},"71adc507-3c54-4605-bbe2-c966acd6187e","packforcing-long-video-generation-en","PackForcing: Efficient Long-Video Generation Method","2026-03-28T14:55:02.646943+00:00",{"id":115,"slug":116,"title":117,"created_at":118},"675942ef-b9ec-4c5f-a997-381250b6eacb","pixelsmile-facial-expression-editing-en","PixelSmile Framework Enhances Facial Expression Editing","2026-03-28T14:55:20.633463+00:00",{"id":120,"slug":121,"title":122,"created_at":123},"6954fa2b-8b66-4839-884b-e46f89fa1bc3","adaptive-block-scaled-data-types-en","IF4: Smarter 4-Bit Quantization That Adapts to 
Your Data","2026-03-31T06:00:36.65963+00:00"]