[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"tag-llm-fine-tuning":3},{"tag":4,"articles":11,"peer_article_count":137},{"id":5,"name":6,"slug":7,"article_count":8,"description_zh":9,"description_en":10},"93aa15ea-c3f0-4f7d-a7c2-b22a81051ec1","LLM fine-tuning","llm-fine-tuning",3,"LLM 微調指的是在既有基礎模型上，透過監督式資料或強化學習調整模型行為，讓它更貼近特定任務與領域。這個主題涵蓋資料準備、訓練穩定性、評估與部署，例如 PPO 的替代方法、BPO\u002FGBPO，以及用 S3、SageMaker 和 MLflow 加速實作。","LLM fine-tuning covers the methods used to adapt a base model to a specific task or domain, from supervised training to RL-based alignment. It matters because stability, data pipelines, and tooling shape real outcomes; examples include BPO\u002FGBPO as PPO alternatives and AWS workflows with S3, SageMaker, and MLflow.",[12,21,29,36,44,52,59,66,73,81,88,95,102,109,116,123,130],{"id":13,"slug":14,"title":15,"summary":16,"category":17,"image_url":18,"cover_image":18,"language":19,"created_at":20},"35368bfc-0dbe-45dc-b422-87b1bd350ac0","google-openrl-llm-fine-tuning-kubernetes-en","Google OpenRL brings RL fine-tuning to Kubernetes","Google’s OpenRL lets teams run LLM post-training and fine-tuning on their own Kubernetes clusters.","model-release","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782572578249-jlty.png","en","2026-06-27T15:02:27.543012+00:00",{"id":22,"slug":23,"title":24,"summary":25,"category":26,"image_url":27,"cover_image":27,"language":19,"created_at":28},"772c0694-0e86-465d-b676-012a2240eaf7","llm-fine-tuning-turns-generic-models-into-domain-tools-en","LLM fine-tuning turns generic models into domain tools","A practical breakdown of enterprise LLM fine-tuning, from data prep to model choice, plus a copy-ready template.","research","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782569906260-hdga.png","2026-06-27T14:17:57.190952+00:00",{"id":30,"slug":31,"title":32,"summary":33,"category":26,"image_url":34,"cover_image":34,"language":19,"created_at":35},"f1d47b23-1f30-42d8-8d19-b261da877408","llm-fine-tuning-production-2026-en","LLM Fine-Tuning for Production in 2026","AgamiSoft’s guide maps the 2026 fine-tuning choices for production LLMs, from open models to data prep, evaluation, and deployment.","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782252180192-5xbc.png","2026-06-23T22:02:33.702857+00:00",{"id":37,"slug":38,"title":39,"summary":40,"category":41,"image_url":42,"cover_image":42,"language":19,"created_at":43},"cb08c71e-096a-4508-b172-4698b9a607cc","fine-tuning-llms-locally-sft-lora-dpo-en","Fine-Tuning LLMs Locally: SFT, LoRA, DPO","LLM Configurator’s Guide 13 explains when to fine-tune, how SFT, LoRA, and DPO differ, and how to prepare and evaluate datasets.","tools","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781839068257-3o35.png","2026-06-19T03:17:22.225063+00:00",{"id":45,"slug":46,"title":47,"summary":48,"category":49,"image_url":50,"cover_image":50,"language":19,"created_at":51},"4d6fc0c2-481a-48c6-9743-2f3f77945134","peft-llm-fine-tuning-without-full-retraining-en","PEFT for LLM Fine-Tuning Without Full Retraining","PEFT lets developers fine-tune LLMs by training small adapter layers instead of all weights.","ai-agent","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781403469215-8tu4.png","2026-06-14T02:17:26.696413+00:00",{"id":53,"slug":54,"title":55,"summary":56,"category":49,"image_url":57,"cover_image":57,"language":19,"created_at":58},"39f54361-7d76-4dfe-be99-dcae84f18a07","llm-research-engineers-post-training-services-en","LLM research engineers turn post-training into services","A practical breakdown of Codersarts’ on-demand LLM training work, with a copy-ready template for evals, SFT, RLHF, and alignment.","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781402606334-iyoh.png","2026-06-14T02:02:47.274885+00:00",{"id":60,"slug":61,"title":62,"summary":63,"category":26,"image_url":64,"cover_image":64,"language":19,"created_at":65},"5cf69bca-6c4c-46e0-a4b7-b0a59835c548","prevent-catastrophic-forgetting-llm-fine-tuning-en","How to Prevent Catastrophic Forgetting in LLM Fine-Tuning","Use Anchored Weight Decay to reduce prior-task drift during LLM fine-tuning.","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780730282480-iwp2.png","2026-06-06T07:17:32.623791+00:00",{"id":67,"slug":68,"title":69,"summary":70,"category":26,"image_url":71,"cover_image":71,"language":19,"created_at":72},"9383f93b-9272-4bd3-81b9-1b3e84f4663e","fixing-llm-forgetting-es-fine-tuning-en","Fixing LLM forgetting in ES fine-tuning","This paper shows LLM fine-tuning with evolution strategies can drift, and anchored weight decay can curb it.","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780604273180-xa1x.png","2026-06-04T20:17:26.230817+00:00",{"id":74,"slug":75,"title":76,"summary":77,"category":78,"image_url":79,"cover_image":79,"language":19,"created_at":80},"2a33bea3-0362-4c05-90c8-181ad6ff11b9","peft-vs-full-fine-tuning-en","PEFT vs Full Fine-Tuning","PEFT is the default for most LLM fine-tuning, while full fine-tuning fits edge cases needing deeper model change.","industry","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780603380563-5547.png","2026-06-04T20:02:32.377539+00:00",{"id":82,"slug":83,"title":84,"summary":85,"category":41,"image_url":86,"cover_image":86,"language":19,"created_at":87},"006102d8-46b9-4d87-ae50-d97f992ea1ea","lora-fine-tuning-llms-practical-en","LoRA Makes Fine-Tuning LLMs Practical","LoRA cuts LLM fine-tuning to a small adapter layer, reducing VRAM, training time, and cost for teams with modest GPUs.","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780220885960-6xi7.png","2026-05-31T09:47:34.486127+00:00",{"id":89,"slug":90,"title":91,"summary":92,"category":26,"image_url":93,"cover_image":93,"language":19,"created_at":94},"a7495002-c056-4f43-a567-2b844f4ba52d","how-to-fine-tune-llms-with-sft-lora-and-rlhf-en","How to Fine-Tune LLMs with SFT, LoRA, and RLHF","Learn how to fine-tune a large language model with supervised training, LoRA, and alignment methods like RLHF and DPO.","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780121881469-ao1d.png","2026-05-30T06:17:24.967007+00:00",{"id":96,"slug":97,"title":98,"summary":99,"category":49,"image_url":100,"cover_image":100,"language":19,"created_at":101},"224d9d33-0943-460b-80f8-14daa49fc7f0","how-to-fine-tune-an-llm-for-enterprise-en","How to Fine-Tune an LLM for Enterprise","A practical guide to choosing, training, and evaluating an enterprise LLM fine-tune.","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1779326634905-g9gx.png","2026-05-21T01:23:31.814244+00:00",{"id":103,"slug":104,"title":105,"summary":106,"category":26,"image_url":107,"cover_image":107,"language":19,"created_at":108},"d3d5812b-849a-4a6e-8c8c-d859618bd4b2","why-fine-tuning-llms-domain-tasks-right-default-en","Why fine-tuning LLMs for domain tasks is the right default","Fine-tuning is the best default when an LLM must be accurate in a narrow domain.","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778916227001-iu04.png","2026-05-16T07:23:33.047894+00:00",{"id":110,"slug":111,"title":112,"summary":113,"category":78,"image_url":114,"cover_image":114,"language":19,"created_at":115},"aec8ac9b-8df2-4403-bf57-53f34783e3a0","lora-vs-qlora-vs-full-fine-tuning-en","LoRA vs QLoRA vs Full Fine-Tuning","A practical comparison of LoRA, QLoRA, and full fine-tuning for 2026 LLM projects.","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778915640692-lzwf.png","2026-05-16T07:13:34.373862+00:00",{"id":117,"slug":118,"title":119,"summary":120,"category":26,"image_url":121,"cover_image":121,"language":19,"created_at":122},"346a0a80-82ae-4b5a-90fe-552ba3791de7","why-latent-agents-proves-internalized-debate-en","Why Latent Agents Proves Multi-Agent Debate Should Be Internalized","Latent Agents shows multi-agent debate works best when a single model internalizes it.","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1777944654721-4ftq.png","2026-05-05T01:30:23.124229+00:00",{"id":124,"slug":125,"title":126,"summary":127,"category":26,"image_url":128,"cover_image":128,"language":19,"created_at":129},"19f116fd-02dd-4a7d-9638-75a3bb70cae2","bounded-ratio-reinforcement-learning-ppo-en","Why Bounded Ratio RL Replaces PPO's Clipped Objective","BRRL gives PPO a cleaner theory, with BPO and GBPO aiming for more stable policy updates in control and LLM fine-tuning.","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1776751796218-p4in.png","2026-04-21T06:09:40.318224+00:00",{"id":131,"slug":132,"title":133,"summary":134,"category":17,"image_url":135,"cover_image":135,"language":19,"created_at":136},"4a3e15ba-07e8-4e4d-b5c8-d9a46deea8bd","aws-s3-sagemaker-unified-studio-fine-tuning-en","AWS uses S3 to speed LLM fine-tuning","AWS shows how SageMaker Unified Studio, S3, and MLflow can fine-tune Llama 3.2 11B Vision Instruct on DocVQA data.","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1775139362238-r31j.png","2026-04-02T14:15:38.340988+00:00",5]