[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"article-avise-ai-security-evaluation-framework-en":3,"tags-avise-ai-security-evaluation-framework-en":31,"related-lang-avise-ai-security-evaluation-framework-en":32,"related-posts-avise-ai-security-evaluation-framework-en":36,"series-research-b712257f-129d-400a-bc73-5e1c3ab200a4":73},{"id":4,"title":5,"content":6,"summary":7,"source":8,"source_url":9,"author":10,"image_url":11,"keywords":12,"language":18,"translated_content":10,"views":19,"is_premium":20,"created_at":21,"updated_at":21,"cover_image":11,"published_at":22,"rewrite_status":23,"rewrite_error":10,"rewritten_from_id":24,"slug":25,"category":26,"related_article_id":27,"status":28,"google_indexed_at":29,"x_posted_at":30},"b712257f-129d-400a-bc73-5e1c3ab200a4","AVISE tests AI security with modular jailbreak evals","\u003Cp>AI systems are moving into critical workflows, but the tooling for checking whether they can be broken is still lagging behind. \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2604.20833\">AVISE: Framework for Evaluating the Security of AI Systems\u003C\u002Fa> tries to close that gap with a modular, open-source way to identify vulnerabilities and run security evaluations against models.\u003C\u002Fp>\u003Cp>The practical angle is straightforward: if you are shipping or integrating language models, you need a repeatable way to test whether a prompt strategy can jailbreak them. This paper proposes one such path, and then demonstrates it with an automated test designed around a multi-turn Red Queen attack.\u003C\u002Fp>\u003Ch2>What problem this paper is trying to fix\u003C\u002Fh2>\u003Cp>The paper starts from a simple but important observation: AI systems are being deployed in high-stakes settings, yet systematic security evaluation is still underdeveloped. 
That matters because vulnerabilities in these systems can lead to high-profile exploits and consequential failures, not just model weirdness in a demo.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1776924767358-ocir.png\" alt=\"AVISE tests AI security with modular jailbreak evals\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>For engineers, the gap is not abstract. If security testing is ad hoc, it is hard to compare model behavior across releases, reproduce findings, or build an internal process that catches weak spots before deployment. AVISE is aimed at making that process more structured.\u003C\u002Fp>\u003Cp>The authors frame AVISE as a framework for both identifying vulnerabilities and evaluating security in AI systems and models. In other words, it is not just a single benchmark or one-off attack script; it is meant to be a foundation for building automated security tests.\u003C\u002Fp>\u003Ch2>How AVISE works in plain English\u003C\u002Fh2>\u003Cp>AVISE stands for AI Vulnerability Identification and Security Evaluation. The paper describes it as a modular open-source framework, which suggests the main design goal is extensibility: researchers and practitioners can plug in different tests and evaluation components rather than being locked into one fixed setup.\u003C\u002Fp>\u003Cp>The paper’s demonstration focuses on a specific attack path. The authors extend a theory-of-mind-based multi-turn Red Queen attack into an Adversarial Language Model, or ALM, augmented attack. The point of that extension is to make the attack more automated and more suited to security evaluation of language models.\u003C\u002Fp>\u003Cp>On top of that attack, they build an automated Security Evaluation Test, or SET, for discovering jailbreak vulnerabilities. 
The SET includes 25 test cases, and an Evaluation Language Model, or ELM, that decides whether a given test case successfully jailbroke the target model.\u003C\u002Fp>\u003Cp>That setup is important because it separates the attack generation from the judging step. In practice, this kind of split can make evaluation pipelines easier to automate, easier to repeat, and easier to adapt to different models or threat assumptions.\u003C\u002Fp>\u003Ch2>What the paper actually shows\u003C\u002Fh2>\u003Cp>The concrete results in the abstract are limited to the evaluation setup and the performance of the ELM, so there is no broad benchmark table or long list of attack success rates here. What the paper does report is that the ELM achieved 92% accuracy, an F1-score of 0.91, and a Matthews correlation coefficient of 0.83.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1776924768153-dvky.png\" alt=\"AVISE tests AI security with modular jailbreak evals\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>Those numbers matter because the evaluation model is doing a classification job: deciding whether each test case successfully produced a jailbreak. Accuracy alone can be misleading when classes are imbalanced, so reporting F1 and the Matthews correlation coefficient alongside it gives a more rounded view of performance.\u003C\u002Fp>\u003Cp>The paper also says AVISE was used to evaluate nine recently released language models of diverse sizes. According to the abstract, all nine were vulnerable to the augmented Red Queen attack, though to varying degrees.\u003C\u002Fp>\u003Cp>That is the main empirical takeaway: the attack worked across the set of models tested. 
The abstract does not provide the individual model names, the exact success rates per model, or the specific failure modes, so those details cannot be established from the source text alone.\u003C\u002Fp>\u003Cul>\u003Cli>AVISE is modular and open-source.\u003C\u002Fli>\u003Cli>The demo attack is an ALM-augmented version of a multi-turn Red Queen attack.\u003C\u002Fli>\u003Cli>The SET has 25 test cases.\u003C\u002Fli>\u003Cli>The ELM scored 92% accuracy, 0.91 F1, and 0.83 MCC.\u003C\u002Fli>\u003Cli>All nine evaluated models were vulnerable to the attack, to varying degrees.\u003C\u002Fli>\u003C\u002Ful>\u003Ch2>Why developers should care\u003C\u002Fh2>\u003Cp>If you build with language models, the biggest value here is not the specific attack name. It is the idea of turning AI security evaluation into something you can automate, version, and repeat. That is the difference between a one-time red-team exercise and a usable engineering practice.\u003C\u002Fp>\u003Cp>AVISE also points to a broader operational need: reproducible security testing for AI systems. A modular framework makes it easier to compare results across model versions, swap in new tests, and standardize how jailbreak findings are measured.\u003C\u002Fp>\u003Cp>For teams shipping models or model-based products, a framework like this could fit into pre-release validation, regression testing, or internal red-teaming workflows. The paper does not claim AVISE is a complete solution, but it does present a concrete step toward more rigorous AI security evaluation.\u003C\u002Fp>\u003Ch2>Limits and open questions\u003C\u002Fh2>\u003Cp>The abstract is clear about the framework and the demo, but it leaves a lot unsaid. We do not get a full breakdown of the nine models, the exact prompts in the 25 test cases, or how the attack performs under different deployment conditions. 
We also do not see evidence here that the framework covers threats beyond jailbreak discovery in language models.\u003C\u002Fp>\u003Cp>Another limitation is that the reported metrics are for the ELM, not necessarily for the overall security framework across all settings. A strong evaluator does not automatically mean a strong real-world defense; it means the test harness can classify outcomes with decent reliability.\u003C\u002Fp>\u003Cp>So the safest reading is this: AVISE looks like an infrastructure piece for AI security evaluation, not a finished answer to model safety. Its value is in making testing more systematic and easier to reproduce, especially for jailbreak-style attacks.\u003C\u002Fp>\u003Cp>For practitioners, that is still useful. \u003Ca href=\"\u002Fnews\u002Fopenai-gpt-54-cyber-security-access-en\">Security work\u003C\u002Fa> tends to fail when it is manual, inconsistent, or hard to rerun. AVISE is trying to make AI vulnerability testing feel more like normal engineering: modular, automated, and measurable.\u003C\u002Fp>","AVISE is an open-source framework for finding AI vulnerabilities, with a 25-case jailbreak test that flagged all nine models as vulnerable.","arxiv.org","https:\u002F\u002Farxiv.org\u002Fabs\u002F2604.20833",null,"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1776924767358-ocir.png",[13,14,15,16,17],"AI security","jailbreaks","language models","red teaming","evaluation framework","en",0,false,"2026-04-23T06:12:31.125572+00:00","2026-04-23T06:12:31.076+00:00","done","239fa648-66eb-48bc-9690-1bd6ac69d2ca","avise-ai-security-evaluation-framework-en","research","7ec4baa4-f0af-441e-a97d-56f81a2ca854","published","2026-04-23T09:00:08.536+00:00","2026-04-23T10:00:03.789+00:00",[],{"id":27,"slug":33,"title":34,"language":35},"avise-ai-security-evaluation-framework-zh","AVISE 模組化測 AI 
安全漏洞","zh",[37,43,49,55,61,67],{"id":38,"slug":39,"title":40,"cover_image":41,"image_url":41,"created_at":42,"category":26},"0e7d8f32-289f-4117-861c-6feb9bd2eb29","parallel-sft-code-rl-cross-language-transfer-en","Parallel-SFT aims to make code RL transfer better","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1776924587865-otqv.png","2026-04-23T06:09:32.496091+00:00",{"id":44,"slug":45,"title":46,"cover_image":47,"image_url":47,"created_at":48,"category":26},"2a6b0902-8cf2-42c9-9b38-59e6ed0294c9","speechparaling-bench-paralinguistic-speech-generation-en","SpeechParaling-Bench tests speech models on nuance","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1776924234257-ns8c.png","2026-04-23T06:03:39.315548+00:00",{"id":50,"slug":51,"title":52,"cover_image":53,"image_url":53,"created_at":54,"category":26},"89d74343-03a7-4325-88e0-14029dab320d","safe-continual-rl-changing-environments-en","Safe Continual RL for Changing Real-World Systems","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1776838195882-6v8v.png","2026-04-22T06:09:33.432376+00:00",{"id":56,"slug":57,"title":58,"cover_image":59,"image_url":59,"created_at":60,"category":26},"ee3a99cb-0f1f-42b8-9bcf-9ac32ecc6770","random-neural-nets-fluctuations-phase-transitions-en","Random Neural Nets Show Phase-Shifted Fluctuations","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1776838027807-14qw.png","2026-04-22T06:06:36.679543+00:00",{"id":62,"slug":63,"title":64,"cover_image":65,"image_url":65,"created_at":66,"category":26},"7fb8a4e6-2e67-41e8-8631-a9b482935aea","edge-of-stability-generalization-en","Why “edge of stability” can help 
generalization","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1776837837398-ubbj.png","2026-04-22T06:03:36.883776+00:00",{"id":68,"slug":69,"title":70,"cover_image":71,"image_url":71,"created_at":72,"category":26},"19f116fd-02dd-4a7d-9638-75a3bb70cae2","bounded-ratio-reinforcement-learning-ppo-en","Bounded Ratio RL Reframes PPO's Clipped Objective","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1776751796218-p4in.png","2026-04-21T06:09:40.318224+00:00",[74,79,84,89,94,99,104,109,114,119],{"id":75,"slug":76,"title":77,"created_at":78},"a2715e72-1fe8-41b3-abb1-d0cf1f710189","ai-predictions-2026-big-changes-en","AI Predictions for 2026: Brace for Big Changes","2026-03-26T01:25:07.788356+00:00",{"id":80,"slug":81,"title":82,"created_at":83},"8404bd7b-4c2f-4109-9ec4-baf29d88af2b","ml-papers-of-the-week-github-research-desk-en","ML Papers of the Week Turns GitHub Into a Research Desk","2026-03-27T01:11:39.480259+00:00",{"id":85,"slug":86,"title":87,"created_at":88},"87897a94-8065-4464-a016-1f23e89e17cc","ai-ml-conferences-to-watch-in-2026-en","AI\u002FML Conferences to Watch in 2026","2026-03-27T01:51:54.184108+00:00",{"id":90,"slug":91,"title":92,"created_at":93},"6f1987cf-25f3-47a4-b3e6-db0997695be8","openclaw-agents-manipulated-self-sabotage-en","OpenClaw Agents Can Be Manipulated Into Failure","2026-03-28T03:03:18.899465+00:00",{"id":95,"slug":96,"title":97,"created_at":98},"a53571ad-735a-4178-9f93-cb09b699d99c","vega-driving-language-instructions-en","Vega: Driving with Natural Language Instructions","2026-03-28T14:54:04.698882+00:00",{"id":100,"slug":101,"title":102,"created_at":103},"a34581d6-f36e-46da-88bb-582fb3e7425c","personalizing-autonomous-driving-styles-en","Drive My Way: Personalizing Autonomous Driving 
Styles","2026-03-28T14:54:26.148181+00:00",{"id":105,"slug":106,"title":107,"created_at":108},"2bc1ad7f-26ce-4f02-9885-803b35fd229d","training-knowledge-bases-writeback-rag-en","Training Knowledge Bases with WriteBack-RAG","2026-03-28T14:54:45.643433+00:00",{"id":110,"slug":111,"title":112,"created_at":113},"71adc507-3c54-4605-bbe2-c966acd6187e","packforcing-long-video-generation-en","PackForcing: Efficient Long-Video Generation Method","2026-03-28T14:55:02.646943+00:00",{"id":115,"slug":116,"title":117,"created_at":118},"675942ef-b9ec-4c5f-a997-381250b6eacb","pixelsmile-facial-expression-editing-en","PixelSmile Framework Enhances Facial Expression Editing","2026-03-28T14:55:20.633463+00:00",{"id":120,"slug":121,"title":122,"created_at":123},"6954fa2b-8b66-4839-884b-e46f89fa1bc3","adaptive-block-scaled-data-types-en","IF4: Smarter 4-Bit Quantization That Adapts to Your Data","2026-03-31T06:00:36.65963+00:00"]