[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"article-rlmf-teaches-llms-express-uncertainty-better-en":3,"article-related-rlmf-teaches-llms-express-uncertainty-better-en":30,"series-research-4987870f-92aa-4f80-8eb7-aa8f0109337e":75},{"id":4,"slug":5,"title":6,"content":7,"summary":8,"source":9,"source_url":10,"author":11,"image_url":12,"cover_image":12,"category":13,"language":14,"translated_content":11,"related_article_id":15,"keywords":16,"key_takeaways":22,"views":26,"created_at":27,"published_at":28,"topic_cluster_id":29},"4987870f-92aa-4f80-8eb7-aa8f0109337e","rlmf-teaches-llms-express-uncertainty-better-en","RLMF teaches LLMs to express uncertainty better","\u003Cp data-speakable=\"summary\">A new RL method uses metacognitive feedback to make \u003Ca href=\"\u002Ftag\u002Fllms\">LLMs\u003C\u002Fa> express uncertainty more faithfully.\u003C\u002Fp>\u003Cul>\u003Cli>\u003Cstrong>Research org\u003C\u002Fstrong>: Unspecified in arXiv abstract\u003C\u002Fli>\u003Cli>\u003Cstrong>Core data\u003C\u002Fstrong>: Up to 63% better than standard RL\u003C\u002Fli>\u003Cli>\u003Cstrong>Breakthrough\u003C\u002Fstrong>: Reinforcement learning with metacognitive feedback plus metacognitive data selection\u003C\u002Fli>\u003C\u002Ful>\u003Cp>This paper is about a problem engineers run into all the time: models that sound confident even when they should not. The authors argue that if a model can judge its own performance more accurately, that signal can be used to improve how it expresses uncertainty, not just how often it gets answers right.\u003C\u002Fp>\u003Cp>That matters because confidence is part of the product surface. If a chatbot, \u003Ca href=\"\u002Ftag\u002Fagent\">agent\u003C\u002Fa>, or assistant overstates what it knows, users make bad decisions. 
The paper’s core idea is to treat metacognition as a training signal, then use it to shape both calibration and the language a model uses to communicate uncertainty.\u003C\u002Fp>\u003Ch2>What problem this paper is trying to fix\u003C\u002Fh2>\u003Cp>The abstract frames \u003Ca href=\"\u002Ftag\u002Fllm\">LLM\u003C\u002Fa> metacognition as a weak spot. Current models can hallucinate with high confidence, miss their own knowledge boundaries, and misrepresent internal uncertainty. In practice, that means a model may be right often enough to seem useful, but still be unreliable in the moments that matter most.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782887573710-gn6d.png\" alt=\"RLMF teaches LLMs to express uncertainty better\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>The specific target here is faithful calibration, or FC. In plain terms, FC means the model’s expressed uncertainty should match its intrinsic uncertainty. The authors call this fundamentally metacognitive, because it is not just about producing an answer; it is about knowing how sure the model actually is and saying so honestly.\u003C\u002Fp>\u003Cp>That distinction is important for developers. Many systems already have some kind of confidence score, refusal policy, or uncertainty phrasing. But if those signals are not aligned with the model’s real internal state, they can become decorative rather than useful.\u003C\u002Fp>\u003Ch2>How the method works in plain English\u003C\u002Fh2>\u003Cp>The paper introduces two linked ideas. The first is \u003Ca href=\"\u002Ftag\u002Freinforcement-learning\">reinforcement learning\u003C\u002Fa> with metacognitive feedback, or RLMF. 
Instead of using only the usual preference signal during optimization, RLMF refines completion rankings based on how good the model’s self-judgments of performance are.\u003C\u002Fp>\u003Cp>The second idea is metacognitive data selection. Here, similar self-judgments are used to choose high-value training examples. The authors say this outperforms naive active learning, which suggests that the model’s own sense of where it is weak can help pick better training data than simpler selection strategies.\u003C\u002Fp>\u003Cp>The training setup is two-stage and decoupled. First, the methods calibrate the faithfulness of the model’s self-reported confidence scores. Then the system maps those calibrated scores to natural, context-adaptable linguistic uncertainty using targeted output editing.\u003C\u002Fp>\u003Cp>That two-step design is practical. It separates “how sure is the model internally?” from “how should it phrase that uncertainty to a user?” For product teams, that separation is useful because internal scoring and outward wording often need different controls.\u003C\u002Fp>\u003Ch2>What the paper actually shows\u003C\u002Fh2>\u003Cp>The abstract says the authors ran extensive experiments and found that RLMF achieves generalizable, state-of-the-art faithful calibration on diverse tasks while preserving accuracy. It also says the method improves the model’s ability to assess and express its own capability limits.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782887567409-wwhs.png\" alt=\"RLMF teaches LLMs to express uncertainty better\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>There are only a few concrete numbers in the abstract. The clearest one is that RLMF surpasses standard RL by up to 63%. 
The abstract does not include the task-by-task \u003Ca href=\"\u002Ftag\u002Fbenchmark\">benchmark\u003C\u002Fa> table, so the full list of scores and evaluation metrics is not available from the abstract alone.\u003C\u002Fp>\u003Cp>What we can say from the abstract is narrower but still useful: the method is not just making models more cautious by lowering confidence across the board. The claim is that it improves faithfulness while preserving accuracy, which is the key tradeoff most teams care about.\u003C\u002Fp>\u003Cp>The paper also makes a broader claim about RL signals. The authors suggest that metacognitive performance is an effective reinforcement learning signal and may overcome limits of prior intrinsic feedback methods. That positions the work as a training strategy, not just a calibration trick.\u003C\u002Fp>\u003Ch2>Why developers should care\u003C\u002Fh2>\u003Cp>If you are building an assistant, agent, or any workflow that depends on self-reported certainty, this paper points to a better way to train that behavior. A model that can better estimate its own limits is easier to trust, easier to route, and less likely to confidently mislead users.\u003C\u002Fp>\u003Cp>It also suggests a useful design pattern: separate internal uncertainty estimation from user-facing wording. That \u003Ca href=\"\u002Fnews\u002Fopencode-free-model-agnostic-ai-agent-en\">gives teams\u003C\u002Fa> more control over when to surface uncertainty, how to phrase it, and how to keep the model from sounding more certain than it really is.\u003C\u002Fp>\u003Cp>For retrieval systems, support bots, or decision tools, this could translate into better refusal behavior, better escalation triggers, and more honest answers when the model is outside its comfort zone. 
The paper does not claim that metacognitive feedback solves hallucinations outright, but it does argue that it improves the model’s ability to recognize and express uncertainty faithfully.\u003C\u002Fp>\u003Ch2>Limitations and open questions\u003C\u002Fh2>\u003Cp>The abstract leaves several practical questions unanswered. It does not provide the full benchmark breakdown, the exact datasets, or the detailed evaluation protocol. It also does not say how expensive the method is to train or whether the gains hold across deployment settings.\u003C\u002Fp>\u003Cul>\u003Cli>The source does not include benchmark tables or task-level metrics.\u003C\u002Fli>\u003Cli>The source does not specify compute cost, training time, or inference overhead.\u003C\u002Fli>\u003Cli>The abstract does not say how well the method transfers to production-style user interactions.\u003C\u002Fli>\u003C\u002Ful>\u003Cp>There is also an implementation question around the two-stage design. Calibrating self-reported confidence and then editing outputs into natural-language uncertainty sounds sensible, but teams will still need to decide how much of that pipeline should be learned, how much should be rule-based, and how to audit failures.\u003C\u002Fp>\u003Cp>Even with those gaps, the paper’s direction is clear: use the model’s own metacognitive judgments as a training signal and as a guide for data selection. For engineers working on trustworthy AI, that is a promising lever because it targets the gap between what a model knows, what it says it knows, and what users think it knows.\u003C\u002Fp>\u003Cp>In short, this is a calibration paper with a product angle. 
It is less about making LLMs smarter in the abstract and more about making them more honest about uncertainty in ways that can be trained, measured, and potentially deployed.\u003C\u002Fp>","A new RL method uses metacognitive feedback to make LLMs express uncertainty more faithfully.","arxiv.org","https:\u002F\u002Farxiv.org\u002Fabs\u002F2606.32032",null,"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782887573710-gn6d.png","research","en","0ee8cc51-c309-4477-8914-82f7824161e3",[17,18,19,20,21],"LLM calibration","uncertainty expression","reinforcement learning","metacognition","faithful calibration",[23,24,25],"RLMF uses metacognitive feedback to improve how LLMs express uncertainty.","The method is two-stage: calibrate confidence first, then edit outputs into natural uncertainty.","The abstract reports up to 63% improvement over standard RL while preserving accuracy.",0,"2026-07-01T06:32:29.360612+00:00","2026-07-01T06:32:29.351+00:00","3103988e-c4fe-45e3-98ab-846500c9d507",{"tags":31,"relatedLang":34,"relatedPosts":38},[32],{"name":19,"slug":33},"reinforcement-learning",{"id":15,"slug":35,"title":36,"language":37},"rlmf-teaches-llms-express-uncertainty-better-zh","RLMF 讓 LLM 更會表達不確定","zh",[39,45,51,57,63,69],{"id":40,"slug":41,"title":42,"cover_image":43,"image_url":43,"created_at":44,"category":13},"c31a1ae3-05aa-445e-a8c4-efafed7fbc2d","qval-dense-supervision-testbed-long-horizon-agents-en","QVal tests dense supervision before training","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782886678947-rwaj.png","2026-07-01T06:17:34.353581+00:00",{"id":46,"slug":47,"title":48,"cover_image":49,"image_url":49,"created_at":50,"category":13},"28e23e1d-1463-4129-9d01-f0aa4e3578e6","self-explanation-training-tracks-model-behavior-en","Self-Explanation Training Still Tracks Model 
Behavior","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782885775255-0o56.png","2026-07-01T06:02:31.014016+00:00",{"id":52,"slug":53,"title":54,"cover_image":55,"image_url":55,"created_at":56,"category":13},"c6744f0f-9be6-4da8-8bab-3b4fbfe127ba","worldevolver-self-evolving-world-models-llm-planning-en","WorldEvolver lets LLM agents revise foresight","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782801184442-vqwa.png","2026-06-30T06:32:29.368198+00:00",{"id":58,"slug":59,"title":60,"cover_image":61,"image_url":61,"created_at":62,"category":13},"d1c3c523-563b-4044-8071-3d9eddbe1fb5","levo-2-full-length-song-generation-en","LeVo 2 tackles full-length song generation","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782800281314-56al.png","2026-06-30T06:17:32.527415+00:00",{"id":64,"slug":65,"title":66,"cover_image":67,"image_url":67,"created_at":68,"category":13},"a59be5b9-166f-4ef9-af4d-37b1d39874f6","vlk-synthetic-humanoid-loco-manipulation-en","VLK trains humanoid motion from synthetic scenes","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782799374018-zmv6.png","2026-06-30T06:02:30.235591+00:00",{"id":70,"slug":71,"title":72,"cover_image":73,"image_url":73,"created_at":74,"category":13},"e4fcd8f3-1391-4ef3-b44d-1aab77b30fca","claude-sonnet-46-sre-benchmark-rootly-en","Claude Sonnet 4.6 narrows the SRE gap","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782750772754-cmvk.png","2026-06-29T16:32:28.970805+00:00",[76,81,86,91,96,101,106,111,116,121],{"id":77,"slug":78,"title":79,"created_at":80},"a2715e72-1fe8-41b3-abb1-d0cf1f710189","ai-predictions-2026-big-changes-en","AI Predictions for 2026: 
Brace for Big Changes","2026-03-26T01:25:07.788356+00:00",{"id":82,"slug":83,"title":84,"created_at":85},"8404bd7b-4c2f-4109-9ec4-baf29d88af2b","ml-papers-of-the-week-github-research-desk-en","ML Papers of the Week Turns GitHub Into a Research Desk","2026-03-27T01:11:39.480259+00:00",{"id":87,"slug":88,"title":89,"created_at":90},"87897a94-8065-4464-a016-1f23e89e17cc","ai-ml-conferences-to-watch-in-2026-en","AI\u002FML Conferences to Watch in 2026","2026-03-27T01:51:54.184108+00:00",{"id":92,"slug":93,"title":94,"created_at":95},"6f1987cf-25f3-47a4-b3e6-db0997695be8","openclaw-agents-manipulated-self-sabotage-en","OpenClaw Agents Can Be Manipulated Into Failure","2026-03-28T03:03:18.899465+00:00",{"id":97,"slug":98,"title":99,"created_at":100},"a53571ad-735a-4178-9f93-cb09b699d99c","vega-driving-language-instructions-en","Vega: Driving with Natural Language Instructions","2026-03-28T14:54:04.698882+00:00",{"id":102,"slug":103,"title":104,"created_at":105},"a34581d6-f36e-46da-88bb-582fb3e7425c","personalizing-autonomous-driving-styles-en","Drive My Way: Personalizing Autonomous Driving Styles","2026-03-28T14:54:26.148181+00:00",{"id":107,"slug":108,"title":109,"created_at":110},"2bc1ad7f-26ce-4f02-9885-803b35fd229d","training-knowledge-bases-writeback-rag-en","Training Knowledge Bases with WriteBack-RAG","2026-03-28T14:54:45.643433+00:00",{"id":112,"slug":113,"title":114,"created_at":115},"71adc507-3c54-4605-bbe2-c966acd6187e","packforcing-long-video-generation-en","PackForcing: Efficient Long-Video Generation Method","2026-03-28T14:55:02.646943+00:00",{"id":117,"slug":118,"title":119,"created_at":120},"675942ef-b9ec-4c5f-a997-381250b6eacb","pixelsmile-facial-expression-editing-en","PixelSmile Framework Enhances Facial Expression Editing","2026-03-28T14:55:20.633463+00:00",{"id":122,"slug":123,"title":124,"created_at":125},"6954fa2b-8b66-4839-884b-e46f89fa1bc3","adaptive-block-scaled-data-types-en","IF4: Smarter 4-Bit Quantization That Adapts to Your 
Data","2026-03-31T06:00:36.65963+00:00"]