Snyk's ToxicSkills Audit: 13.4% of Public Skills Are Vulnerable
On 5 February 2026, Snyk's security research team published ToxicSkills — an ecosystem-wide audit of public Claude Code skills. The numbers are the kind that should make any platform team running an unmanaged skills install policy stop and re-read the report twice:
- 3,984 public Claude Code skills audited.
- 534 (13.4%) contained at least one critical-level security issue.
- 76 were confirmed malicious payloads, validated with a human-in-the-loop review.
Public skill marketplaces are dirty by default. If your developers can install arbitrary skills from a public source, you have an unmanaged supply-chain attack surface — and npm install lessons from the last decade tell you exactly how this ends.
Read the numbers honestly
The headline of Snyk's report mentions "1,467 malicious payloads," but the body draws a distinction the headline doesn't. 1,467 is the count of skills with security flaws at any severity, including dependency CVEs, weak permissions, and over-broad capability declarations. 76 is the count of skills the team verified as deliberately malicious through manual review. These are very different categories and should not be blurred. Use the 76 figure when you mean "intentionally hostile artefact." Use the 13.4% / 534 figure when you mean "a skill with at least one critical-severity issue."
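To keep the three figures apart, it helps to put them on the same denominator. The snippet below is a small arithmetic sketch using only the counts quoted in this post. It assumes the 1,467 figure is measured against the same 3,984 audited skills, and the names (`AUDITED`, `share`, `rates`) are illustrative.

```python
# Counts as quoted in this post from the ToxicSkills report.
AUDITED = 3984
FLAWED_ANY_SEVERITY = 1467   # assumed to share the 3,984 denominator
CRITICAL = 534
CONFIRMED_MALICIOUS = 76


def share(count: int, total: int = AUDITED) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


rates = {
    "any-severity flaw": share(FLAWED_ANY_SEVERITY),
    "critical issue": share(CRITICAL),
    "confirmed malicious": share(CONFIRMED_MALICIOUS),
}

for label, pct in rates.items():
    print(f"{label}: {pct}%")
```

Roughly one skill in three has a flaw of some kind, about one in seven has a critical one, and about one in fifty is confirmed hostile. Those are three different risk statements, and a policy document should say which one it is relying on.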
The audit also documented an earlier precursor campaign — around 30 directly malicious skills bundled together — which is distinct from the much larger, named campaign that emerged a few weeks later.
ClawHavoc is the separate, scarier story
The campaign worth losing sleep over isn't in the Snyk report. Koi Security coined "ClawHavoc" for a coordinated supply-chain attack on the OpenClaw skill marketplace — initially 335+ malicious skills, with Antiy CERT subsequently counting 1,184 variants classified as Trojan/OpenClaw.PolySkill. This is typosquatting, dependency confusion, and backdoored utility skills, all running with whatever permissions a developer happened to grant when they hit "yes" on the install prompt.
The Snyk audit was a census. ClawHavoc is an active campaign. Both stories matter. They don't share an attribution.
Two adjacent incidents to keep on the same threat board:
- MedusaLocker via Claude Code skills. Cato Networks documented ransomware operators packaging MedusaLocker payloads as helpful-looking utility skills.
- Silent codebase exfiltration. Mitiga and Datadog Security Labs have both published variants where the skill behaves correctly on the visible task while exfiltrating source to an attacker endpoint in the background.
None of this is theoretical. All of it is in the last six months of advisory write-ups.
What this means for your deployment
Three controls actually matter. Pick all three.
1. strictKnownMarketplaces. This is the load-bearing setting in Claude Code's plugin marketplace documentation. It accepts a boolean true (block any marketplace not in your allow-list) or an array of explicit marketplace descriptors. Critically, this is an enterprise-managed setting — individual developers cannot bypass it from their own settings.json. That's a feature, not a bug.
The defensible posture for a regulated team:
{
  "strictKnownMarketplaces": [
    { "name": "internal", "source": "git@github.com:yourorg/skills-marketplace.git" }
  ]
}
That config says: skills install only from your internal mirror. Public marketplaces, even the official ones, are off the menu unless you explicitly add them. Pair this with the private marketplace pattern and you have something a regulator can audit.
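A policy like this is only useful if nobody quietly widens it. As a minimal sketch, the check below lints a managed-settings document before it is rolled out. It assumes the settings shape shown above (a `strictKnownMarketplaces` key holding either `true` or a list of `name`/`source` entries). `INTERNAL_PREFIX` and `lint_marketplace_policy` are hypothetical names for illustration and are not part of Claude Code.

```python
import json

# Hypothetical policy: every allowed marketplace must live under this prefix.
INTERNAL_PREFIX = "git@github.com:yourorg/"


def lint_marketplace_policy(settings_text: str, prefix: str = INTERNAL_PREFIX) -> list[str]:
    """Return a list of problems found in a managed-settings JSON document."""
    settings = json.loads(settings_text)
    value = settings.get("strictKnownMarketplaces")
    if value is None:
        return ["strictKnownMarketplaces is not set"]
    if value is True:
        return []  # boolean form: no per-source entries to inspect
    if not isinstance(value, list):
        return ["strictKnownMarketplaces must be true or a list"]

    problems = []
    for entry in value:
        # Entries are assumed to be objects with a "source" string.
        source = entry.get("source", "") if isinstance(entry, dict) else ""
        if not source.startswith(prefix):
            problems.append(f"non-internal marketplace source: {source!r}")
    return problems


example = json.dumps({
    "strictKnownMarketplaces": [
        {"name": "internal", "source": "git@github.com:yourorg/skills-marketplace.git"}
    ]
})
print(lint_marketplace_policy(example))
```

Running a check like this in the pipeline that publishes managed settings turns "public marketplaces are off the menu" from a convention into something that fails a build.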
2. Pin commit SHAs, not @latest. Every skill in your internal mirror should be pinned to a specific commit hash, not a tag and certainly not a moving reference. This is the same supply-chain hygiene that held up through the colors.js, faker.js, and ua-parser-js incidents on npm: with a moving reference, the latest version is whatever was pushed last night, whether by a hijacked account or by a maintainer acting in bad faith. @latest is a CVE waiting for an opportunity. A pinned SHA is a contract.
3. Quarterly AppSec re-vetting. Pinning a SHA is the right starting position, but it doesn't make the skill safe forever. New CVEs in transitive dependencies land monthly. The skill's own behaviour can be re-classified as malicious when threat intelligence catches up. The honest cadence is quarterly: re-run your scanner over every pinned skill, re-vet anything that's been updated upstream, and re-pin only after the rescan passes.
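Controls 2 and 3 can be checked mechanically. The sketch below assumes a hypothetical inventory of pinned skills (name, git ref, date last vetted) and flags anything that is not pinned to a full commit SHA or has gone more than 90 days without a re-vet. The `PinnedSkill` record, the 90-day stand-in for "quarterly", and the sample entries are all illustrative assumptions.

```python
import re
from dataclasses import dataclass
from datetime import date, timedelta

# Full 40-character SHA-1 commit hash (repositories using SHA-256 would need 64).
FULL_SHA = re.compile(r"[0-9a-f]{40}")
REVET_INTERVAL = timedelta(days=90)  # assumed stand-in for "quarterly"


@dataclass
class PinnedSkill:
    name: str
    ref: str            # what the mirror pins the skill to
    last_vetted: date   # date of the last passing scan


def audit_pins(skills: list[PinnedSkill], today: date) -> list[tuple[str, str]]:
    """Return (skill name, finding) pairs for unpinned or overdue skills."""
    findings = []
    for skill in skills:
        if not FULL_SHA.fullmatch(skill.ref):
            findings.append((skill.name, "not pinned to a full commit SHA"))
        if today - skill.last_vetted > REVET_INTERVAL:
            findings.append((skill.name, "re-vet overdue"))
    return findings


# Hypothetical inventory for illustration only.
inventory = [
    PinnedSkill("formatter", "0123456789abcdef0123456789abcdef01234567", date(2026, 1, 15)),
    PinnedSkill("deploy-helper", "v1.2.0", date(2025, 11, 1)),
]

for name, finding in audit_pins(inventory, today=date(2026, 4, 1)):
    print(f"{name}: {finding}")
```

Tags and abbreviated hashes are rejected on purpose: a tag can be moved, and a short hash is not a unique identifier.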
The CI pattern
The scanner to wire into CI is snyk-agent-scan, originally Invariant Labs' mcp-scan and now maintained under Snyk after the acquisition. It explicitly supports Claude Code skills in its current agent matrix. A minimal CI job:
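# .github/workflows/skills-revet.yml
name: Quarterly Skills Re-vet
on:
  schedule:
    - cron: "0 9 1 */3 *" # 09:00 UTC on the 1st of Jan, Apr, Jul, Oct
  workflow_dispatch:
jobs:
  scan:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      security-events: write # required by upload-sarif
    steps:
      - uses: actions/checkout@v4
      - name: Install snyk-agent-scan
        # Pin an exact release here rather than a floating version
        run: pipx install snyk-agent-scan
      - name: Scan all pinned skills
        # Flags as given in this post; confirm them against the tool's --help
        run: |
          snyk-agent-scan --skills \
            --fail-on-severity critical \
            --output-format sarif > skills-scan.sarif
      - uses: github/codeql-action/upload-sarif@v3
        if: always() # upload findings even when the scan step fails
        with: { sarif_file: skills-scan.sarif }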
The job fires on the first day of each quarter, scans every skill in the mirror, fails the workflow if anything critical is found, and uploads SARIF so GitHub's code-scanning dashboard becomes your audit trail.
This is also the part to show your auditor. Quarterly automated re-vetting, scanner output in SARIF, retained in your CI history. That is what supply-chain hygiene looks like as evidence.
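If you want a one-line summary alongside the raw SARIF, the file is plain JSON and easy to tally. This is a minimal sketch that counts results per SARIF `level` across all runs. It relies only on the standard SARIF 2.1.0 layout (`runs[].results[].level`). How the scanner maps its own severities onto those levels is an assumption you would need to confirm, and the sample log is fabricated purely for illustration.

```python
import json
from collections import Counter


def count_sarif_levels(sarif_text: str) -> Counter:
    """Count results per SARIF level across every run in a SARIF log."""
    log = json.loads(sarif_text)
    levels = Counter()
    for run in log.get("runs", []):
        for result in run.get("results", []):
            # Simplification: a result without an explicit level is
            # counted as "warning", the SARIF fallback default.
            levels[result.get("level", "warning")] += 1
    return levels


# Fabricated sample log, not real scanner output.
sample = json.dumps({
    "version": "2.1.0",
    "runs": [{
        "tool": {"driver": {"name": "example-scanner"}},
        "results": [
            {"ruleId": "R1", "level": "error", "message": {"text": "example"}},
            {"ruleId": "R2", "level": "note", "message": {"text": "example"}},
            {"ruleId": "R3", "message": {"text": "example"}},
        ],
    }],
})

summary = count_sarif_levels(sample)
print(dict(summary))
```

A per-quarter count of `error`-level results, stored next to the SARIF file, gives an auditor a trend line without having to open the dashboard.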
The regulatory framing
For regulated teams in Singapore, supply-chain risk is squarely within scope of the MAS Technology Risk Management Guidelines — software dependencies, third-party components, and ongoing assurance are explicit obligations. The MAS Information Paper on AI Model Risk Management (December 2024) extends the same principles to AI components: independent validation, ongoing monitoring, change-management discipline. Agent skills are software dependencies. Treating them as anything less is a finding waiting to happen.
For everyone else: this is the moment to ship the boring controls before the marketplace ecosystem matures further. OWASP has drafted an Agentic Skills Top 10, which is a sign of where the industry expects standards to land. Better to have strictKnownMarketplaces, pinned SHAs, and a quarterly CI job in place before that becomes an audit checklist than after.
Skills make agents useful. They also make them attackable. Public marketplaces are exactly as dirty as you'd predict from a decade of npm and PyPI experience. Snyk just put numbers on it.
The fix is unglamorous. The fix also works.