OpenClaw skills - an example of a prompt injection attack

p.kaczmarek2 261 4

Treść została przetłumaczona

Zobacz oryginalną wersję tematu

Report a violation of the law

Reply Cool? Ranking DIY | New topic

Notify about new articles

📢 Listen (AI):

» | Topic author Helpful post? (+3)

Post #1
21836470 12 Feb 2026 09:29

Zack Korman in his GitHub repository skills presents a simple 'prompt injection' attack to which modern AI agent systems such as OpenClaw are vulnerable. "Prompt injection", as the name suggests, involves 'injecting' a malicious command into the data processed by the model. LLM models cannot natively distinguish between the prompt (command) and the data, so they can potentially perform unwanted actions commanded to them by the attacker. This is particularly dangerous for agents such as OpenClaw, as they often have access to large amounts of sensitive systems and personal data.

The attack shown here hides the malicious command in a Skill file, which is a skill for use by the agent. The skills system is simply a system of text files (prompts) that the agent loads to 'acquire' new skills. Skill files are often in Markdown format, further hiding the command from the user by putting it in HTML comment format. This is what a user visiting GitHub sees:

This, on the other hand, is seen by the agent (the source of the document):

The attack shown downloads the file via curl and executes it locally as a bash script.
As you can see, the GitHub mechanisms themselves effectively hide the malicious command, which is not visible in any way, even with a meticulous analysis of the rendered skill description. Only checking the source of the file can protect us from the attack.

The number of attacks of this type is increasing with the popularity of OpenClaw. Security researchers warn that 28 malicious 'skils' appeared between 27 and 29 January 2026, with an increase of 386 between 31 January and 2 February. Malicious 'skils' often impersonate cryptocurrency-related tools and distribute malware on Windows and macOS platforms. Their aim is to steal passwords and keys.

In summary, agent systems offer great opportunities, but at the same time introduce new threats. Prompt injection in skills files shows how thin the line between functionality and vulnerability is. User awareness and the development of defence mechanisms are key to the safe development of this technology and excessive euphoria and the introduction of untested solutions pose a serious risk to the sensitive data stored on our computers.

Sources:
https://opensourcemalware.com/blog/clawdbot-skills-ganked-your-crypto
https://github.com/ZackKorman/skills

Do you use a skills system for AI agents? Do you manually verify the source of every document your AI processes?

Cool? Ranking DIY
Helpful post? Buy me a coffee.
About Author
p.kaczmarek2 p.kaczmarek2

Moderator Smart Home
Offline

Joined: 26 Dec 2014

Posts: 14003

Help: 634

Posts rating: 11805

Points: 135224
p.kaczmarek2 wrote 14003 posts with rating 11805, helped 634 times. Been with us since 2014 year.
ADVERTISEMENT
#2 21836727 12 Feb 2026 13:59

kolor kolor

Level 13

» | Helpful post? (+1)

Post #2
21836727 12 Feb 2026 13:59

Total surveillance, and uncontrolled interception of activities e.g. banking, logging etc. by AI is real
The solution could be a local bot-antibot, like an antivirus because normal antiviruses will be defeated.
Maybe use the system behind https://www.qubes-os.org/ for now, it is little known advertised as secure,
it's a modified Linux/Fedora, and is distinguished by the fact that it divides running programs into quote "isolated virtual machines".
Review: https://www.reddit.com/r/linux/comments/tjr0qx/qubes_os_review/?tl=pl.
ADVERTISEMENT
#3 21837388 13 Feb 2026 10:02

gulson gulson

System Administrator

» | Helpful post? (+1)

Post #3
21837388 13 Feb 2026 10:02

What are they doing? They give the bot access to everything and then the cry-bot got "infected" with the prompt and made transfers, sent out spam, scammed people.
I installed this toy on an isolated VPS and when I saw that this thing was sending 100k tokens (i.e. whole books of instructions) to companies, I disconnected.
If I send 100k descriptions and tools myself, every model will pick something, adjust it and do the job.
Zero optimisation, zero security.

But but... this is the seed for making something safer and smaller - specialised, so it can't be ignored and ridiculed like this.
ADVERTISEMENT
#4 21837398 13 Feb 2026 10:13

p.kaczmarek2 p.kaczmarek2

Moderator Smart Home

» | Topic author Helpful post? (0)

Post #4
21837398 13 Feb 2026 10:13

At this point, the very nature of LLMs is the source of the trouble. I wonder how this will develop further. Maybe they'll come up with a new mode of operation for LLMs - separately the prompt - I don't know, some other weight - and separately the data? Such a modification of the architecture?

Or maybe another LLM-supervisor, evaluating separately a given piece of text, whether it is malicious?

Or maybe you just need more compute and a properly trained LLM will be less susceptible?

Looking at the rate of AI development it's probably within a few dozen years we'll find out....

kolor wrote:

Maybe use the system behind https://www.qubes-os.org/ for now, it is little known advertised as secure,
is a modified Linux/Fedora, and is distinguished by the fact that it divides running programs into quote "isolated virtual machines".
Review: https://www.reddit.com/r/linux/comments/tjr0qx/qubes_os_review/?tl=pl.

Do you want a test of such a system (in the context of electronics and working with electronics) to appear on Elektroda? E.g. can you run CAD programs, for PCBs, etc. there?

I am creating multiplatform open source firmware (Tasmota replacement), right now supporting BK7231T, BK7231N, XR809, BL602, W800, W600, LN882H and soon supporting RTL and W701:
https://github.com/openshwprojects/OpenBK7231T
If you like my work, support me at: https://paypal.me/openshwprojects

Helpful post? Buy me a coffee.
#5 21837502 13 Feb 2026 12:12

kolor kolor

Level 13

» | Helpful post? (0)

Post #5
21837502 13 Feb 2026 12:12

Regarding the nature of LLMs , it is worth reviewing this code from github, programmers will surely understand roughly what it is about.
https://github.com/ggml-org/llama.cpp/blob/master/examples/training/finetune.cpp.
Create an account, log in here. You will receive points by participating in discussions.
Join this discussion.

Install Elektroda application

Didn't find an answer? Ask Artificial Intelligence

*I agree to send the question to OpenAI, Anthropic PBC, Perplexity AI, Inc., Kagi Inc., Google LLC - owners of language models in order to prepare the best response. The companies may monitor and log information entered into the form.

*I agree to publicly display my question and answer. The question and answer will be publicly available to everyone. The process may take a few minutes. Upon completion, you will be redirected to the page with the answer.

Wait...(2min)

Reply Cool? Ranking DIY | New topic

Notify about new articles

📢 Listen (AI):

Report a violation of the law

Home page
/
Forum
/
Artificial Intelligence (AI)
/
AI News
/
OpenClaw skills - an example of a prompt injection attack

FAQ

TL;DR: 28 malicious “skills” appeared on Jan 27–29, 2026; “LLM models cannot natively distinguish between the prompt and the data.” [Elektroda, p.kaczmarek2, post #21836470]

Why it matters: If you load unvetted agent skills, a hidden prompt can run commands and steal credentials.

Who this is for: OpenClaw users, AI-agent builders, security engineers, and crypto holders seeking practical defenses.

Quick Facts

- Malicious skills spiked by +386 between Jan 31 and Feb 2, 2026, targeting Windows and macOS for credential theft. [Elektroda, p.kaczmarek2, post #21836470]
- Attackers hide commands in Markdown HTML comments; the rendered page looks safe, but the source runs curl|bash. [Elektroda, p.kaczmarek2, post #21836470]
- Only viewing the raw source reliably reveals the injected command; rendered views can conceal it. [Elektroda, p.kaczmarek2, post #21836470]
- Isolation-first workflow: compartmentalize apps into separate VMs to limit blast radius (e.g., Qubes OS model). [Elektroda, kolor, post #21836727]
- Typical goal of these skills: exfiltrate passwords and crypto keys via downloaded scripts executed locally. [Elektroda, p.kaczmarek2, post #21836470]

What is a prompt injection attack in AI agents?

A prompt injection embeds attacker instructions inside data the model processes. Because agents treat prompts and data similarly, hidden commands can trigger actions, like downloading and executing scripts. In skills-based agents, the injected text may live inside the skill file itself, not the user prompt. “LLM models cannot natively distinguish between the prompt and the data,” which enables this class of abuse. [Elektroda, p.kaczmarek2, post #21836470]

How did the OpenClaw skills attack work?

The attacker placed a malicious command inside a Skill file. The command was hidden in Markdown as an HTML comment, invisible in the rendered view. When loaded, the agent read the source and executed a curl command piped to bash, running code locally. The deception relies on the difference between rendered Markdown and its raw source. [Elektroda, p.kaczmarek2, post #21836470]

Why are Markdown HTML comments risky here?

Markdown comments are not displayed in rendered views, so users miss the payload during visual inspection. Agents, however, read the raw text and parse the hidden instructions. This mismatch lets attackers smuggle commands past human reviewers while still influencing the agent’s behavior on load. [Elektroda, p.kaczmarek2, post #21836470]

What is a Skill file in OpenClaw-style agents?

A Skill file is plain text, often Markdown, that instructs the agent how to perform a capability. Loading it extends the agent with new behaviors. Because it is just text, hidden prompts or shell commands can be embedded, so every Skill must be treated like executable input. [Elektroda, p.kaczmarek2, post #21836470]

What platforms and data do malicious skills target?

Researchers observed Windows and macOS payloads distributed through malicious skills. Their objective includes stealing passwords and private keys. Crypto-themed skills are common lures. Between Jan 31 and Feb 2, 2026, malicious skills increased by 386, indicating rapid weaponization. [Elektroda, p.kaczmarek2, post #21836470]

How can I safely install a Skill? (3-step How-To)

Download the Skill as raw text and review every line, including comments.
Block any network or shell execution strings (curl, wget, bash, PowerShell).
Test inside an isolated VM or container with no secrets or wallet access. [Elektroda, p.kaczmarek2, post #21836470]

How do I manually verify the source of a Skill file?

Always open the raw source, not the rendered page. Search for shell invocations, obfuscated URLs, base64 blobs, or suspicious HTML comments. Treat any instruction that downloads and executes code as hostile. Only proceed after removing or disabling these sections and retesting in isolation. [Elektroda, p.kaczmarek2, post #21836470]

Will traditional antivirus stop these agent-skill attacks?

Not reliably. Skills trigger actions through the agent, which may bypass classic detection patterns. One proposed approach is a local bot-antibot layer and strong OS-level isolation. “Total surveillance and uncontrolled interception” threats require compartmentalization to reduce impact if one Skill misbehaves. [Elektroda, kolor, post #21836727]

What is Qubes OS and why is it recommended here?

Qubes OS is a security-focused system that splits tasks into isolated virtual machines. This compartmentalization limits what a compromised process can access. Running agents and testing Skills in separate VMs reduces lateral movement and protects credentials and wallets. [Elektroda, kolor, post #21836727]

What simple red flags reveal a malicious Skill?

Watch for crypto-themed branding, urges to paste API keys early, or hidden sections in comments. Any instruction to run curl|bash or PowerShell from a remote URL is a high-confidence indicator. If a Skill requires admin rights during setup, halt and inspect. [Elektroda, p.kaczmarek2, post #21836470]

What is curl|bash and why is it dangerous in Skills?

It downloads a remote script with curl and immediately executes it with bash. This pattern gives attackers code execution on your machine without review. In Skills, it can be hidden in comments or templates and triggered when the agent reads the file. [Elektroda, p.kaczmarek2, post #21836470]

Can careful reading of a rendered Markdown page catch hidden commands?

No. Rendered Markdown may omit HTML comments and other hidden text. The forum example shows that even meticulous visual review misses the payload. Only reviewing the raw source reliably reveals embedded instructions or scripts. This is a critical edge-case failure. [Elektroda, p.kaczmarek2, post #21836470]

What is OpenClaw in this context?

OpenClaw is an AI agent system that loads Skills to gain new capabilities. Because agents can access sensitive systems and personal data, a single malicious Skill can cause significant harm, including data theft and system compromise. [Elektroda, p.kaczmarek2, post #21836470]

Should I verify every document my AI processes?

Yes. Treat all external text as untrusted code. Verify the origin, inspect the raw source, and strip or sandbox anything that can execute commands or call networks. The thread repeatedly emphasizes manual verification and defense-in-depth for safe operation. [Elektroda, p.kaczmarek2, post #21836470]

How do I sandbox agent activity to protect credentials and wallets?

Place the agent in an isolated VM with no password stores or wallets attached. Use separate VMs for browsing, development, and crypto. If the agent is compromised, the isolation prevents immediate access to secrets and limits lateral movement. [Elektroda, kolor, post #21836727]

OpenClaw skills - an example of a prompt injection attack

Didn't find an answer? Ask Artificial Intelligence