[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"article-prompt-injection-ai-security-problem-en":3,"article-related-prompt-injection-ai-security-problem-en":31,"series-research-1a5d9d4d-4e21-4860-84b0-9b209ca4d7f5":80},{"id":4,"slug":5,"title":6,"content":7,"summary":8,"source":9,"source_url":10,"author":11,"image_url":12,"cover_image":12,"category":13,"language":14,"translated_content":11,"related_article_id":15,"keywords":16,"key_takeaways":23,"views":27,"created_at":28,"published_at":29,"topic_cluster_id":30},"1a5d9d4d-4e21-4860-84b0-9b209ca4d7f5","prompt-injection-ai-security-problem-en","Prompt injection is now an AI security problem","\u003Cp data-speakable=\"summary\">Prompt injection is a way to trick large language models with hidden instructions.\u003C\u002Fp>\u003Cp>Prompt injection is no longer a niche curiosity from AI forums. It now shows up in search tools, document summaries, browser assistants, and enterprise workflows, where a single malicious line can change what a model says or does.\u003C\u002Fp>\u003Cp>The Wikipedia entry on \u003Ca href=\"https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FPrompt_injection\" target=\"_blank\" rel=\"noopener\">prompt injection\u003C\u002Fa> pulls together the basic attack, the history of the term, and a growing list of real incidents. 
The numbers in that entry are hard to ignore: 75% of business employees use generative AI, 46% adopted it within the last six months, and one recent \u003Ca href=\"\u002Ftag\u002Fbenchmark\">benchmark\u003C\u002Fa> placed \u003Ca href=\"https:\u002F\u002Fwww.deepseek.com\" target=\"_blank\" rel=\"noopener\">DeepSeek\u003C\u002Fa>'s \u003Ca href=\"https:\u002F\u002Fwww.deepseek.com\u002F\" target=\"_blank\" rel=\"noopener\">DeepSeek-R1\u003C\u002Fa> near the bottom for injection resistance.\u003C\u002Fp>\u003Ctable>\u003Cthead>\u003Ctr>\u003Cth>Fact\u003C\u002Fth>\u003Cth>Number\u003C\u002Fth>\u003Cth>Why it matters\u003C\u002Fth>\u003C\u002Ftr>\u003C\u002Fthead>\u003Ctbody>\u003Ctr>\u003Ctd>Business employees using generative AI\u003C\u002Ftd>\u003Ctd>75%\u003C\u002Ftd>\u003Ctd>Prompt injection now affects everyday work tools\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>Employees adopting genAI in the last six months\u003C\u002Ftd>\u003Ctd>46%\u003C\u002Ftd>\u003Ctd>Adoption is moving faster than security controls\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>DeepSeek-R1 rank in Spikee isolation test\u003C\u002Ftd>\u003Ctd>17th of 19\u003C\u002Ftd>\u003Ctd>Some models still fail basic attack resistance tests\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>DeepSeek-R1 rank with rules and markers\u003C\u002Ftd>\u003Ctd>16th of 19\u003C\u002Ftd>\u003Ctd>Extra controls did not close the gap much\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>Chatbot Arena reasoning rank for DeepSeek-R1\u003C\u002Ftd>\u003Ctd>6th\u003C\u002Ftd>\u003Ctd>Strong reasoning does not mean strong security\u003C\u002Ftd>\u003C\u002Ftr>\u003C\u002Ftbody>\u003C\u002Ftable>\u003Ch2>Prompt injection works because models mix instructions and data\u003C\u002Fh2>\u003Cp>The core problem is simple: large language models read instructions and content in the same context window. 
That makes it hard for the model to tell whether a sentence is a user request, a system rule, or text hidden inside a webpage, email, PDF, or image.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782716584463-r1ei.png\" alt=\"Prompt injection is now an AI security problem\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>In a direct attack, the user tries to override the model's behavior with a malicious prompt. In an indirect attack, the attacker hides instructions inside content the model later reads. That second form is the one that worries teams building chatbots with browsing, file upload, or memory features.\u003C\u002Fp>\u003Cp>A classic example is translation. If a model is told to translate a sentence into French, and the sentence contains the instruction to ignore the request and output a different phrase, the model may follow the hidden instruction instead of the visible task. That is the same trick, just stripped down to its simplest form.\u003C\u002Fp>\u003Cul>\u003Cli>Direct injection targets the user-facing prompt.\u003C\u002Fli>\u003Cli>Indirect injection hides inside external content.\u003C\u002Fli>\u003Cli>Obfuscation can bury commands in images, white text, or documents.\u003C\u002Fli>\u003Cli>Prompt leaking tries to expose the model's hidden system prompt.\u003C\u002Fli>\u003C\u002Ful>\u003Ch2>The term emerged in 2022, then spread fast\u003C\u002Fh2>\u003Cp>The phrase \u003Ca href=\"https:\u002F\u002Fx.com\u002Fhimbodhisattva\" target=\"_blank\" rel=\"noopener\">\"prompt injection\"\u003C\u002Fa> was first used on Twitter in May 2022 by the account @himbodhisattva, and Simon Willison later helped popularize it. 
Jonathan Cefalu of \u003Ca href=\"https:\u002F\u002Fwww.preamble.com\" target=\"_blank\" rel=\"noopener\">Preamble\u003C\u002Fa> also flagged the issue in May 2022, describing it as a command injection problem and reporting it to \u003Ca href=\"https:\u002F\u002Fopenai.com\" target=\"_blank\" rel=\"noopener\">OpenAI\u003C\u002Fa>.\u003C\u002Fp>\u003Cblockquote>\"Prompt injection is the new SQL injection.\" — Simon Willison\u003C\u002Fblockquote>\u003Cp>That comparison stuck because it explains the risk in terms developers already understand. SQL injection abuses the boundary between code and data. Prompt injection abuses the boundary between instructions and content.\u003C\u002Fp>\u003Cp>Willison also drew a line between prompt injection and jailbreaking. Jailbreaking tries to bypass a model's safety rules. Prompt injection tries to make the model treat attacker-controlled text as if it were trusted instruction. The two overlap, but they are not the same attack.\u003C\u002Fp>\u003Ch2>The incidents are moving from demos to products\u003C\u002Fh2>\u003Cp>The Wikipedia page lists several public incidents that show how this problem hits real systems, not just toy examples. 
In February 2023, a Stanford student found a way to make \u003Ca href=\"https:\u002F\u002Fwww.microsoft.com\u002Fen-us\u002Fbing\" target=\"_blank\" rel=\"noopener\">Microsoft Bing Chat\u003C\u002Fa>, now part of \u003Ca href=\"https:\u002F\u002Fwww.microsoft.com\u002Fen-us\u002Fmicrosoft-copilot\" target=\"_blank\" rel=\"noopener\">Microsoft Copilot\u003C\u002Fa>, reveal its internal guidelines and codename by telling it to ignore earlier instructions.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782716579586-4s8i.png\" alt=\"Prompt injection is now an AI security problem\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>In December 2024, \u003Ca href=\"https:\u002F\u002Fwww.theguardian.com\u002Ftechnology\u002F2024\u002Fdec\u002F\" target=\"_blank\" rel=\"noopener\">The Guardian\u003C\u002Fa> reported that \u003Ca href=\"https:\u002F\u002Fopenai.com\u002Fchatgpt\" target=\"_blank\" rel=\"noopener\">ChatGPT\u003C\u002Fa>'s search tool could be manipulated by hidden webpage content. Invisible text could push the model toward positive reviews and away from negative ones, which is exactly the kind of output tampering that makes AI search feel unreliable.\u003C\u002Fp>\u003Cp>In early 2025, researchers found academic papers with hidden prompts aimed at AI peer review systems. 
That is a more uncomfortable example because it shows prompt injection can affect institutional workflows, where the output is not a chat reply but a decision that can shape careers and publication records.\u003C\u002Fp>\u003Cul>\u003Cli>\u003Ca href=\"https:\u002F\u002Fwww.microsoft.com\u002Fen-us\u002Fbing\" target=\"_blank\" rel=\"noopener\">Bing Chat\u003C\u002Fa> exposed internal instructions in 2023.\u003C\u002Fli>\u003Cli>\u003Ca href=\"https:\u002F\u002Fopenai.com\u002Fchatgpt\" target=\"_blank\" rel=\"noopener\">ChatGPT\u003C\u002Fa> search was reported vulnerable to hidden webpage prompts in 2024.\u003C\u002Fli>\u003Cli>\u003Ca href=\"https:\u002F\u002Fwww.deepseek.com\" target=\"_blank\" rel=\"noopener\">DeepSeek-R1\u003C\u002Fa> ranked 17th of 19 in one injection benchmark.\u003C\u002Fli>\u003Cli>\u003Ca href=\"https:\u002F\u002Fwww.google.com\u002Fintl\u002Fen_us\u002Fgemini\u002F\" target=\"_blank\" rel=\"noopener\">Gemini\u003C\u002Fa> memory manipulation was reported in 2025.\u003C\u002Fli>\u003C\u002Ful>\u003Ch2>Benchmark numbers show the gap between skill and safety\u003C\u002Fh2>\u003Cp>The most interesting part of the Wikipedia entry is the contrast between reasoning performance and attack resistance. DeepSeek-R1 ranked sixth on the Chatbot Arena reasoning benchmark, which tells you it can produce strong answers on hard tasks. But WithSecure's \u003Ca href=\"https:\u002F\u002Fwithsecure.com\u002Fen\u002Fsolutions\u002Fconsulting-services\u002Fspikee\" target=\"_blank\" rel=\"noopener\">Spikee\u003C\u002Fa> benchmark found that it was much easier to attack than several other models.\u003C\u002Fp>\u003Cp>That split matters because teams often buy models for answer quality and assume security will improve with scale. It does not work that way. A model can be good at math, coding, and reasoning while still being weak at separating trusted instructions from hostile text.\u003C\u002Fp>\u003Cp>The same pattern shows up in other systems. 
\u003Ca href=\"\u002Ftag\u002Fgoogle\">Google\u003C\u002Fa> rated the \u003Ca href=\"\u002Ftag\u002Fgemini\">Gemini\u003C\u002Fa> memory issue as low risk because it required user interaction and visible notifications, but researchers still warned that delayed tool invocation could let hidden instructions sit in memory and trigger later. That is a small detail with big consequences.\u003C\u002Fp>\u003Cp>Here is the practical comparison developers should keep in mind:\u003C\u002Fp>\u003Cul>\u003Cli>Reasoning benchmarks measure task quality.\u003C\u002Fli>\u003Cli>Injection benchmarks measure attack resistance.\u003C\u002Fli>\u003Cli>Memory and browsing features expand the attack surface.\u003C\u002Fli>\u003Cli>Extra rules help, but they rarely solve the whole problem.\u003C\u002Fli>\u003C\u002Ful>\u003Ch2>Mitigation is a process, not a single filter\u003C\u002Fh2>\u003Cp>The mitigation section on Wikipedia points to data hygiene, guardrails, user training, system prompt design, and dual-\u003Ca href=\"\u002Ftag\u002Fllm\">LLM\u003C\u002Fa> setups. Those ideas are useful, but they only work when teams treat prompt injection as an application security problem instead of a prompt-tuning problem.\u003C\u002Fp>\u003Cp>For developers, the most useful habit is to assume that any external text can be hostile. That includes emails, web pages, uploaded documents, OCR output, and even structured notes that look harmless. If a model can read it, an attacker may be able to hide instructions in it.\u003C\u002Fp>\u003Cp>That is why the best defense is usually layered. Filter what enters the model, isolate system instructions, minimize tool privileges, and verify any action that can change data or send messages. If a model can browse the web and also act on what it reads, the risk is much higher than in a plain chat box.\u003C\u002Fp>\u003Cp>Prompt injection will keep showing up wherever \u003Ca href=\"\u002Ftag\u002Fllms\">LLMs\u003C\u002Fa> meet real-world content. 
The next question for product teams is simple: which parts of your system trust text too much, and what happens when that text lies?\u003C\u002Fp>","Prompt injection lets hidden text steer LLMs, and recent tests show models like DeepSeek-R1 can be tricked at worrying rates.","en.wikipedia.org","https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FPrompt_injection",null,"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782716584463-r1ei.png","research","en","637c3016-e364-4bfe-904e-5e60a18ed678",[17,18,19,20,21,22],"prompt injection","LLM security","indirect prompt injection","AI safety","ChatGPT","DeepSeek-R1",[24,25,26],"Prompt injection tricks models by mixing malicious instructions into normal-looking text.","Real incidents have affected Bing Chat, ChatGPT search, Gemini memory, and academic review workflows.","Strong benchmark performance does not mean strong resistance to prompt injection.",0,"2026-06-29T07:02:36.642691+00:00","2026-06-29T07:02:36.617+00:00","3103988e-c4fe-45e3-98ab-846500c9d507",{"tags":32,"relatedLang":39,"relatedPosts":43},[33,35,37],{"name":21,"slug":34},"chatgpt",{"name":20,"slug":36},"ai-safety",{"name":17,"slug":38},"prompt-injection",{"id":15,"slug":40,"title":41,"language":42},"prompt-injection-ai-security-problem-zh","Prompt injection 已是 AI 資安問題","zh",[44,50,56,62,68,74],{"id":45,"slug":46,"title":47,"cover_image":48,"image_url":48,"created_at":49,"category":13},"f3edd37b-2524-4d6d-b411-7ca0cce9eff0","google-deepmind-turns-science-into-tools-en","Google DeepMind turns science into tools","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782721105101-d4rm.png","2026-06-29T08:17:58.280652+00:00",{"id":51,"slug":52,"title":53,"cover_image":54,"image_url":54,"created_at":55,"category":13},"c522f9af-2862-4f1c-bbf9-99bc20c78544","measuring-llm-behavior-portability-en","Measuring when 
LLM behavior actually transfers","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782717476648-9gjo.png","2026-06-29T07:17:30.115953+00:00",{"id":57,"slug":58,"title":59,"cover_image":60,"image_url":60,"created_at":61,"category":13},"fba917c8-939c-4457-a90e-4012d9a692df","solver-choice-nash-equilibrium-selection-en","Solver choice changes which Nash equilibrium wins","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782714784738-e4dj.png","2026-06-29T06:32:31.603116+00:00",{"id":63,"slug":64,"title":65,"cover_image":66,"image_url":66,"created_at":67,"category":13},"2ae76231-ec2d-48aa-a82f-1d26f1b36882","proper-positive-only-learning-characterization-en","Proper positive-only learning gets a full characterization","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782713877346-qgit.png","2026-06-29T06:17:34.38343+00:00",{"id":69,"slug":70,"title":71,"cover_image":72,"image_url":72,"created_at":73,"category":13},"46714aa0-3c43-4154-a9cf-f961865b6109","dexcompose-reuses-dexterous-policies-across-tasks-en","DexCompose Reuses Dexterous Policies Across Tasks","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782712971902-lykf.png","2026-06-29T06:02:28.7043+00:00",{"id":75,"slug":76,"title":77,"cover_image":78,"image_url":78,"created_at":79,"category":13},"ce859659-0b28-456b-8641-63f6d4c47cf9","hawor-hand-motion-mano-params-en","HaWoR turns hand motion into MANO 
params","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1782705791981-evnj.png","2026-06-29T04:02:46.964681+00:00",[81,86,91,96,101,106,111,116,121,126],{"id":82,"slug":83,"title":84,"created_at":85},"a2715e72-1fe8-41b3-abb1-d0cf1f710189","ai-predictions-2026-big-changes-en","AI Predictions for 2026: Brace for Big Changes","2026-03-26T01:25:07.788356+00:00",{"id":87,"slug":88,"title":89,"created_at":90},"8404bd7b-4c2f-4109-9ec4-baf29d88af2b","ml-papers-of-the-week-github-research-desk-en","ML Papers of the Week Turns GitHub Into a Research Desk","2026-03-27T01:11:39.480259+00:00",{"id":92,"slug":93,"title":94,"created_at":95},"87897a94-8065-4464-a016-1f23e89e17cc","ai-ml-conferences-to-watch-in-2026-en","AI\u002FML Conferences to Watch in 2026","2026-03-27T01:51:54.184108+00:00",{"id":97,"slug":98,"title":99,"created_at":100},"6f1987cf-25f3-47a4-b3e6-db0997695be8","openclaw-agents-manipulated-self-sabotage-en","OpenClaw Agents Can Be Manipulated Into Failure","2026-03-28T03:03:18.899465+00:00",{"id":102,"slug":103,"title":104,"created_at":105},"a53571ad-735a-4178-9f93-cb09b699d99c","vega-driving-language-instructions-en","Vega: Driving with Natural Language Instructions","2026-03-28T14:54:04.698882+00:00",{"id":107,"slug":108,"title":109,"created_at":110},"a34581d6-f36e-46da-88bb-582fb3e7425c","personalizing-autonomous-driving-styles-en","Drive My Way: Personalizing Autonomous Driving Styles","2026-03-28T14:54:26.148181+00:00",{"id":112,"slug":113,"title":114,"created_at":115},"2bc1ad7f-26ce-4f02-9885-803b35fd229d","training-knowledge-bases-writeback-rag-en","Training Knowledge Bases with WriteBack-RAG","2026-03-28T14:54:45.643433+00:00",{"id":117,"slug":118,"title":119,"created_at":120},"71adc507-3c54-4605-bbe2-c966acd6187e","packforcing-long-video-generation-en","PackForcing: Efficient Long-Video Generation 
Method","2026-03-28T14:55:02.646943+00:00",{"id":122,"slug":123,"title":124,"created_at":125},"675942ef-b9ec-4c5f-a997-381250b6eacb","pixelsmile-facial-expression-editing-en","PixelSmile Framework Enhances Facial Expression Editing","2026-03-28T14:55:20.633463+00:00",{"id":127,"slug":128,"title":129,"created_at":130},"6954fa2b-8b66-4839-884b-e46f89fa1bc3","adaptive-block-scaled-data-types-en","IF4: Smarter 4-Bit Quantization That Adapts to Your Data","2026-03-31T06:00:36.65963+00:00"]