Rules reference

The extension ships 39 rules, each independently toggleable from the extension’s options page. Rules marked default: on are active on fresh install; default: off rules must be enabled manually.

Rules marked top frame only never run inside iframes — useful for page-wide targets (footers, cookie overlays, URL recipes) so they don’t fire pointlessly in every embedded frame.

This page groups rules by the threat or pattern they defend against, the same grouping the options page uses; rule IDs here match the filenames in extension/src/rules/, which is the authoritative source for behavior. Initial enabled/disabled state for each rule lives in extension/src/rules/rule-metadata.ts. If this page disagrees with either, trust the source. The Install page covers how to override defaults at build time without forking the repo.

Numbered citations like [1] link to the References section at the bottom.

Coverage scope

All rules run against the page’s light DOM and any open shadow roots the page builds, regardless of which attachment path created the shadow:

Imperative attachment — element.attachShadow({ mode: "open" }). Covers the way most chat widgets, consent banners, ad SDKs, and custom elements ship UI today.
Declarative shadow DOM at parse time — <template shadowrootmode="open"> in the initial HTML. The browser materializes the shadow before the content script runs; the extension’s startup walk finds it.
Declarative shadow DOM post-parse — Element.setHTMLUnsafe and ShadowRoot.setHTMLUnsafe, the modern hydration path used by SSR-style component frameworks. The extension wraps both so a host that gains a shadow via this path is added to the registry.

Closed shadow roots ({ mode: "closed" }) are not reached, regardless of how they were attached (imperative, parse-time DSD with shadowrootmode="closed", or any other path). The Web Components spec defines closed mode as an opt-out from external JavaScript access: host.shadowRoot is null, document.adoptedStyleSheets and MutationObserver do not cross the boundary, and no supported API undoes that. Any content a page renders inside a closed shadow root (ads, chat widgets, hidden text, or prompt-injection payloads) is invisible to every rule and is passed through to the agent untouched. Closed shadow roots are uncommon outside browser UA shadows and a handful of hardened embeds, but they are a known gap. The optional Flag Closed Shadow Roots rule can heuristically warn the agent at read-time when this gap is in use.

Extension presence is observable. The rules leave rendered artifacts on the page — click-to-reveal placeholders, screen-reader-only landmarks, inline annotation chips, neutralized button labels — so a sophisticated site that fingerprints for those artifacts can detect ABS and serve a different DOM under that fingerprint than it would to an unprotected browser. The rule engine only sees what the page renders, so any content shaped by such adaptation is read by the rules as legitimate page content. Counter-cloaking from a content script is structurally out of scope.

Indirect prompt injection

Remove or neutralize content that could carry attacker-controlled instructions to a browser-use agent. The threat model — attacker text reaches the model via the page the agent reads, not via the user’s prompt — is introduced by Greshake et al. [1] and extended specifically to LLM-driven web agents by Wu et al. (WIPI) [2]. Each rule below targets a delivery vector documented in those threat models.

Rendered text and user-generated content

Hide Prompt Injection

ID: prompt-injection-redact
Default: on

Hide page sections matching known prompt-injection patterns. The pattern set is intentionally not reproduced in docs — see the project README for how patterns are sourced and shipped.

The pattern bundle is a finite, curated catalog of phrasings observed in the literature and in fixtures. Payloads outside the catalog (novel framings, instruction shapes drawn from agent APIs the bundle does not yet cover, role markers from chat formats outside the set) pass through. The same bundle backs meta-injection-strip, attribute-injection-sanitize, json-ld-sanitize, html-comment-strip, and svg-text-strip, so coverage in those rules is bounded by the same catalog.

Redact Encoded Payloads

ID: encoded-payload-redact
Default: on

Redact long base64, hex, or percent-encoded runs in page text whose decoded bytes are mostly printable ASCII. Defends against the “decode this and follow it” carrier — encoded text a human skims past as noise but an agent may helpfully decode and treat as content or as an instruction. Length floors sit above common hash sizes (SHA-256, SHA-512, Git commit SHAs), and a decoded printable-ratio filter discards hashes, fingerprints, and binary blobs whose bytes are not readable text. JWTs are left alone so secrets-redact can flag them with its more specific label. Encoded content is a non-rendered carrier in the same class as HTML comments and hidden text in Greshake et al. [1].
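
The length floor and printable-ratio filter described above can be sketched for the hex case alone. This is a minimal illustration, not the shipped rule: the thresholds MIN_HEX_LENGTH and MIN_PRINTABLE_RATIO and every function name below are assumptions, and base64 and percent-encoding would need their own decoders.

```typescript
// Hypothetical thresholds: the real floors and ratio live in the rule source.
const MIN_HEX_LENGTH = 130; // just above SHA-512's 128 hex characters
const MIN_PRINTABLE_RATIO = 0.9;

// Decode a run of hex digits into bytes; returns null if the run is not valid hex.
function decodeHex(run: string): number[] | null {
  if (run.length % 2 !== 0 || !/^[0-9a-fA-F]+$/.test(run)) return null;
  const bytes: number[] = [];
  for (let i = 0; i < run.length; i += 2) {
    bytes.push(parseInt(run.slice(i, i + 2), 16));
  }
  return bytes;
}

// Fraction of bytes that are printable ASCII (plus tab, LF, CR).
function printableRatio(bytes: number[]): number {
  if (bytes.length === 0) return 0;
  let printable = 0;
  for (let i = 0; i < bytes.length; i++) {
    const b = bytes[i];
    if ((b >= 0x20 && b <= 0x7e) || b === 0x09 || b === 0x0a || b === 0x0d) {
      printable++;
    }
  }
  return printable / bytes.length;
}

// A run is redacted only if it clears the length floor AND decodes to readable text.
function shouldRedactHexRun(run: string): boolean {
  if (run.length < MIN_HEX_LENGTH) return false;
  const bytes = decodeHex(run);
  return bytes !== null && printableRatio(bytes) >= MIN_PRINTABLE_RATIO;
}

// Helper for trying the sketch: hex-encode an ASCII string.
function asciiToHex(text: string): string {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    out += ("0" + text.charCodeAt(i).toString(16)).slice(-2);
  }
  return out;
}
```

Under these assumed thresholds, a 128-character SHA-512 digest falls below the length floor, and a long binary blob clears the floor but fails the printable-ratio check, so only encoded readable text is redacted.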

Hide Comments

ID: comments-redact
Default: on

Hide user-generated comment threads so agents aren’t exposed to potential prompt injection from commenters. Covers common platforms (Disqus, Facebook) plus Reddit, YouTube, and Hacker News.

User-generated text as a prompt-injection delivery vector is core to the WIPI threat model [2].

Hide Reviews

ID: reviews-redact
Default: on

Hide user-generated review text so agents aren’t exposed to potential prompt injection from reviewers. Covers schema.org microdata and supported sites (Amazon, Walmart); aggregate star ratings are kept visible.

Detection relies on the schema.org Review microdata vocabulary where sites expose it.

Hide Social Embeds

ID: social-embed-redact
Default: on

Hide embedded social-media widgets (Twitter/X, YouTube, Facebook, Instagram, TikTok, LinkedIn, Reddit, Spotify, SoundCloud). Each embed is replaced with a placeholder so the agent knows an embed was there. The rule is skipped on the embed providers’ own domains, where embeds are the page content. Social embeds are a third-party content surface whose text the host page does not control.

Non-rendered DOM

Strip HTML Comments

ID: html-comment-strip
Default: on

Walk every HTML comment in the page and blank its data when the value matches the prompt-injection pattern set (the same regex bundle used by prompt-injection-redact). Comments are invisible to humans but readable by agents and can carry prompt-injection payloads; the scrub neutralizes the matching carrier while leaving the Comment node attached. Comments inside <script>/<style>/<noscript> are preserved verbatim. Off-pattern comments — license headers, build stamps, dev notes — are left alone, as are framework-marker comments that SPA renderers use as Suspense or hydration boundaries. The scrub is not reversible within the current page load. HTML comments are explicitly enumerated as a non-rendered carrier in Greshake et al. [1].

Strip Noscript

ID: noscript-strip
Default: on

Walk every <noscript> element in the page and blank its children. A browser-use agent runs in a browser at all precisely because the site requires JavaScript — an operator who could read the same data from the server directly would do that and skip the browser entirely. With JS enabled, <noscript> content is, by definition, never rendered to a human, but the markup still sits in the DOM and is still walked by accessibility-tree and innerText consumers. That makes it a clean carrier for prompt-injection payloads, fabricated authority claims, or fallback chrome the agent may treat as load-bearing. The <noscript> element itself stays attached so SPA frameworks (React 19 native head metadata, Vue Teleport-to-head, etc.) that hold a live reference to the node can still reconcile it on route change. html-comment-strip previously preserved Comment nodes inside <noscript> so that SSR hydration markers and conditional-CSS fragments survived; with this rule on, the surrounding noscript’s contents are blanked, taking those comments with them.

This is the same non-rendered-carrier class described in Greshake et al. [1]; the “renderer-and-reader disagree on what’s visible” asymmetry is the one formalized for zero-width characters in Boucher et al. [5] and CSS-hidden DOM in Liao et al. (EIA) [3].

Strip Hidden Text

ID: hidden-text-strip
Default: on

Walk every element matching a hidden-CSS trigger (foreground matching background, visibility:hidden, opacity:0, font-size:0, off-screen positioning, zero-area clipping) and blank every text node inside it. Defends against “unseeable” prompt injection. The element and its descendant element nodes stay attached so SPA frameworks (React, Vue, Svelte, Astro) that hold live references to the rendered nodes can still reconcile them on route change. Screen-reader-only text is preserved (via .sr-only, .visually-hidden, .a-offscreen, .aok-offscreen, MUI visuallyHidden, and the 1×1 + overflow:hidden + position:absolute envelope) so a11y-tree affordances like Amazon SERP prices stay intact. display:none is left alone so collapsed menus and tab panels keep working.

Liao et al. (EIA) [3] demonstrates that web elements made invisible via CSS — opacity, off-screen positioning, zero-area clipping — are read by web agents but unseen by humans, the exact asymmetry this rule closes.

Strip Unicode Invisibles

ID: unicode-invisibles-strip
Default: on

Remove Unicode code points that have no visible glyph but are still read by agents walking the DOM or accessibility tree: the Unicode Tags block (U+E0000–U+E007F), bidi override and isolate characters (U+202A–U+202E, U+2066–U+2069), and the zero-width family (U+200B, U+2060–U+2064, U+FEFF, U+180E). Applied to text nodes and to every attribute value, so the rule also closes the aria-label / alt / title / placeholder surface. Code points with legitimate script-shaping use are preserved: ZWJ (U+200D, emoji and Indic joining), ZWNJ (U+200C, Persian/Hindi ligature control), and the directional marks LRM/RLM (U+200E/U+200F).
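
The strip set above maps naturally onto a single regular expression. The sketch below is one possible encoding, with a hypothetical function name; it covers only the string transformation, not the DOM walk over text nodes and attributes.

```typescript
// Code points stripped per the rule description. The Tags block
// (U+E0000-U+E007F) is written as surrogate pairs (high surrogate U+DB40,
// low surrogates U+DC00-U+DC7F) so the pattern works without the `u` flag.
// ZWNJ (U+200C), ZWJ (U+200D), LRM (U+200E), and RLM (U+200F) are
// deliberately absent from the class, so they are preserved.
const INVISIBLES =
  /\uDB40[\uDC00-\uDC7F]|[\u202A-\u202E\u2066-\u2069\u200B\u2060-\u2064\uFEFF\u180E]/g;

// Remove invisible carriers from a text-node value or an attribute value.
function stripInvisibles(value: string): string {
  return value.replace(INVISIBLES, "");
}
```

The same function can be applied to text-node data and to each attribute value, which is how one regex would close both surfaces mentioned above.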

The bidi-override attack class — invisible reordering chars that make text render one way to humans and parse another way to compilers, interpreters, or LLMs — comes from Boucher & Anderson (Trojan Source) [4]. Boucher et al. (Bad Characters) [5] extends the same family — zero-width insertions, homoglyph swaps, bidi reordering — to NLP systems and shows comparable degradation in sentiment, translation, and toxicity classifiers. The Unicode-tag-block variant against LLM-integrated browsers (the U+E0000–U+E007F carrier that encodes arbitrary ASCII as invisible tag characters) was popularized by Goodside (2024) and is now a standard test case in the indirect-injection benchmarks.

HTML metadata and attributes

Strip Meta Injection

ID: meta-injection-strip
Default: on

Walk every <meta> element with a content attribute and every <title> element. When the value matches the prompt-injection pattern set (the same regex bundle as prompt-injection-redact), blank the <meta> element’s content attribute and blank the <title> text. Both elements stay attached so SPA frameworks (React 19 native head metadata, react-helmet, Vue Teleport-to-head, Astro view transitions) that hold a live reference to the hoisted node can still reconcile it on route change. The rule does not gate on specific name= / property= values — any meta whose content carries instruction-shaped text is scrubbed, covering name="description", name="keywords", property="og:title", property="og:description", name="twitter:title", name="twitter:description", name="twitter:image:alt", and the article:* family. Meta tags without a content attribute are left alone. The rule scans document.head in addition to the engine’s apply root, since meta and title normally live in <head> and SPA frameworks mutate <head> on route changes.

Page metadata is invisible to a sighted human (it surfaces in the browser tab, social-share unfurls, and search-result snippets, not in the rendered article body), but agents that summarize a page frequently pull description / og:description / <title> first as a compact “what is this page” answer. A poisoned description reaches the agent without ever appearing in the page content the user reviews.

The metadata vocabularies themselves are Open Graph (Facebook, 2010 — og:*) and Twitter Cards (Twitter / X — twitter:*); the underlying <meta name="description"> is in the HTML Living Standard. HTML metadata is enumerated among the non-rendered carriers in Greshake et al. [1].

Scrub Attribute Injection

ID: attribute-injection-sanitize
Default: on

Walk every element and, for a small allowlist of agent-readable attributes — aria-label, aria-description, aria-roledescription, aria-placeholder, aria-valuetext, aria-keyshortcuts, alt, title, placeholder, data-tooltip, and value on <input> elements the user cannot reach (disabled or type="hidden") — remove the attribute outright when its value matches the prompt-injection pattern set (the same regex bundle used by prompt-injection-redact). Clean attributes are preserved. Attributes outside the allowlist are not inspected. We remove the whole attribute rather than blank it because an empty aria-label actively hides an element from accessibility-tree consumers, whereas a missing aria-label lets fallback name computation (visible text, alt, associated label) proceed normally.
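
The remove-rather-than-blank behavior can be illustrated on a plain attribute map. In this sketch the attribute list is copied from the description above, while STAND_IN_PATTERN is a placeholder and not the shipped pattern bundle; the conditional value handling for unreachable inputs is omitted, and the function name is hypothetical.

```typescript
// Allowlist copied from the rule description (the conditional `value`
// case for disabled or hidden inputs is left out of this sketch).
const AGENT_READABLE_ATTRIBUTES = [
  "aria-label", "aria-description", "aria-roledescription", "aria-placeholder",
  "aria-valuetext", "aria-keyshortcuts", "alt", "title", "placeholder", "data-tooltip",
];

// Stand-in pattern for illustration only; NOT the shipped pattern bundle.
const STAND_IN_PATTERN = /ignore (all )?previous instructions/i;

type Attributes = { [name: string]: string };

// Returns a copy of `attrs` in which allowlisted attributes whose value
// matches the pattern are dropped entirely instead of being set to "".
function sanitizeAttributes(attrs: Attributes, pattern: RegExp = STAND_IN_PATTERN): Attributes {
  const result: Attributes = {};
  const names = Object.keys(attrs);
  for (let i = 0; i < names.length; i++) {
    const name = names[i];
    const value = attrs[name];
    const inspected = AGENT_READABLE_ATTRIBUTES.indexOf(name) !== -1;
    if (inspected && pattern.test(value)) continue; // drop, do not blank
    result[name] = value;
  }
  return result;
}
```

Dropping the key mirrors the reasoning above: a consumer computing the accessible name sees a missing aria-label and falls through to the next source, rather than stopping at an empty one.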

These attributes are almost never the main visible label sighted users read — they surface in screen readers, hover popups, and empty-state hints. Browser-use agents, on the other hand, read the accessibility tree where they are first-class names and descriptions, so an attribute is a quiet carrier for instruction-shaped text the operator never has to render.

HTML attribute values are enumerated as non-rendered carriers in Greshake et al. [1]; Liao et al. (EIA) [3] demonstrates that web agents act on accessibility-tree content that has no visible counterpart. The accessibility-tree surface itself is documented by the W3C ARIA Accessible Name and Description Computation 1.2 spec and Mozilla’s A11y Tree explainer.

Structured data

Sanitize JSON-LD

ID: json-ld-sanitize
Default: on

Walk every <script type="application/ld+json"> block, parse it, recursively replace any string field whose value matches the prompt-injection pattern set (the same regex bundle used by prompt-injection-redact) with an empty string, and re-serialize. Structural fields useful to the agent — price, priceCurrency, availability, sku, identifier, ratingValue, reviewCount, position — are preserved exactly. Malformed JSON-LD is left alone; non-application/ld+json <script> blocks are not touched.
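
The parse, scrub, and re-serialize steps can be sketched as a recursive walk over the parsed JSON. This is a minimal illustration under stated assumptions: STAND_IN_PATTERN is a placeholder rather than the shipped bundle, the function names are hypothetical, and treating the preserved fields as a key-level skip list is one reading of "preserved exactly".

```typescript
// Stand-in pattern for illustration only; NOT the shipped pattern bundle.
const STAND_IN_PATTERN = /ignore (all )?previous instructions/i;

// Structural fields named in the rule description; assumed here to be
// skipped by key so their values are copied through untouched.
const PRESERVED_KEYS = [
  "price", "priceCurrency", "availability", "sku",
  "identifier", "ratingValue", "reviewCount", "position",
];

type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Recursively replace matching string values with an empty string.
function scrub(node: Json, pattern: RegExp): Json {
  if (typeof node === "string") return pattern.test(node) ? "" : node;
  if (Array.isArray(node)) return node.map((item) => scrub(item, pattern));
  if (node !== null && typeof node === "object") {
    const out: { [key: string]: Json } = {};
    const keys = Object.keys(node);
    for (let i = 0; i < keys.length; i++) {
      const key = keys[i];
      out[key] = PRESERVED_KEYS.indexOf(key) !== -1 ? node[key] : scrub(node[key], pattern);
    }
    return out;
  }
  return node; // numbers, booleans, and null pass through unchanged
}

// Sanitize the text of one JSON-LD block; malformed input is returned as-is.
function sanitizeJsonLd(source: string, pattern: RegExp = STAND_IN_PATTERN): string {
  let parsed: Json;
  try {
    parsed = JSON.parse(source) as Json;
  } catch {
    return source; // malformed JSON-LD is left alone
  }
  return JSON.stringify(scrub(parsed, pattern));
}
```

A content script would read each block's textContent, pass it through a function like sanitizeJsonLd, and write the result back only when it changed.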

Structured data is invisible to a sighted human reviewing the page but is increasingly cited by browser-use agents as a “trusted summary” of what the page is: schema.org/Product gives them name / brand / SKU / price, schema.org/Article gives them author / publisher / datePublished, and schema.org/Review gives them rating context. A site (or a third-party fragment writing into the page) can poison description, articleBody, name, or author.name without changing what a human sees.

JSON-LD is the JSON serialization of the schema.org vocabulary (W3C JSON-LD 1.1 Recommendation, 2020) — the same vocabulary reviews-redact reads to find user-generated reviews. The non-rendered-but-agent-read carrier model comes from Greshake et al. [1]; Liao et al. (EIA) [3] and Wu et al. (WIPI) [2] both demonstrate agents acting on page metadata an end user never sees.

Strip SVG Injection

ID: svg-text-strip
Default: on

Walk every <title>, <desc>, and <text> element that lives inside an <svg> and blank its text content when it matches the prompt-injection pattern set (the same regex bundle used by prompt-injection-redact). The element shell is preserved: <text> belongs to the visible drawing and removing it can shift surrounding geometry, while <title> and <desc> are anchored to specific shapes for accessibility-tree consumers — keeping the element keeps the structural mapping intact while the payload is gone. The companion svg-sprite-strip rule only removes hidden, unreferenced sprite containers; this rule handles SVGs that render visually (logos, infographics, charts, inline icons).

SVG <title> and <desc> are the SVG-namespace equivalents of HTML’s accessible-name and accessible-description: screen readers surface them, and browser-use agents reading the accessibility tree pull them as “what is this image?” without the operator having to render any visible text. SVG <text> content does render, but inside an <svg> it lives outside the regular flow-text walkers that drive several other rules. Either surface can be authored without touching surrounding HTML — for example, swapping the SVG asset behind an <img src=…svg> reference on a CDN.

SVG accessibility text is the SVG-namespace instance of the non-rendered-carrier class in Greshake et al. [1]; the surface is documented by the W3C SVG Accessibility API Mappings and the rendered-but-isolated <text> case is covered by Liao et al. (EIA) [3].

Sanitize Schema Trust Claims (Experimental)

ID: schema-trust-sanitize
Default: off

Walk JSON-LD blocks and microdata items for schema.org Organization-typed claims — Article.publisher, Article.sourceOrganization, ClaimReview.author, and top-level brand assertions — and blank the name, url, and @id fields when the claim’s url resolves to a different registrable domain than the page asserting it. Structural fields (@type, logo, datePublished, price, ratingValue) are preserved exactly, so an agent still gets the article’s body data; it just loses the impersonating identity strings. Name-only claims with no url to anchor against are left alone. Off by default while we gather real-world signal on false positives; the rule short-circuits entirely on known syndicators (Google News, Yahoo News, MSN, Apple News, Flipboard, SmartNews, Feedly, Pocket), web archives, AMP cache, and Google Translate proxies, where mismatched publisher claims are expected.

Person-typed claims get a weaker treatment than Organization. When a Person is nested under an authority-context property (author, editor, publisher, creator, contributor, reviewedBy, funder, sponsor, similar) and its url is on a different registrable domain than the page, the rule annotates the markup with abs:unverified-authority: true (JSON-LD) or data-abs-schema-trust-unverified="true" (microdata) and leaves the identity fields intact. Sanitizing those would erase legitimate guest-author and academic bylines, which routinely link off-domain. The annotation surfaces the same domain-binding gap a blanked Organization would, without damaging real metadata. A standalone @type: Person (a personal homepage) is not borrowing anyone’s authority and is left alone regardless of URL.

Schema.org has no native provenance mechanism — every claim is self-asserted, which is structurally why a page can list itself as published by The New York Times without any binding to that organization (Iliadis & Pedersen [14]; Google’s own Structured Data General Guidelines treat publisher impersonation as a policy violation enforced manually after crawl, not a markup-level check). The unearned-authority surface for agents is the same one already established for json-ld-sanitize and meta-injection-strip — structured data the human reviewer does not see but the agent ingests as a “trusted summary.” Wu et al. (WIPI) [2] and Liao et al. (EIA) [3] both document agents acting on page metadata that has no visible counterpart.

Cross-origin surface

Hide Cross-Origin Frames (Experimental)

ID: cross-origin-frame-redact
Default: off

Replace cross-origin embedded frame-like elements with a click-to-reveal placeholder so a browser-use agent reading the parent page doesn’t ingest the embedded-origin content. Three carriers are covered:

<iframe> whose src resolves to a different web origin,
<object data="…"> pointing at a different web origin,
<embed src="…"> pointing at a different web origin.

Same-origin iframes/objects/embeds, srcdoc iframes, and inert about:/javascript:/data:/blob: resources are left alone. Each frame in the page processes its own direct children, so a cross-origin frame nested inside a same-origin frame is also caught. Off by default because legitimate cross-origin embeds (payment widgets, OAuth pop-ins, video, third-party comments, PDF viewers) are common and removing them will break those flows until the user reveals.
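
The origin test behind those inclusions and exclusions can be sketched with the standard URL parser. The function name is hypothetical, and srcdoc iframes are not modeled because they are identified by an attribute rather than by a resource URL.

```typescript
// Schemes the rule description lists as inert and left alone.
const INERT_SCHEMES = ["about:", "javascript:", "data:", "blob:"];

// Decide whether a frame-like element's resource should be treated as
// cross-origin. `resource` is the iframe `src`, object `data`, or embed `src`;
// `pageUrl` is the URL of the document that contains the element.
function isCrossOriginResource(resource: string, pageUrl: string): boolean {
  let target: URL;
  try {
    target = new URL(resource, pageUrl); // resolves relative URLs against the page
  } catch {
    return false; // unparseable URL: leave the element alone
  }
  if (INERT_SCHEMES.indexOf(target.protocol) !== -1) return false;
  return target.origin !== new URL(pageUrl).origin;
}
```

Because an origin is the scheme, host, and port together, a frame on the same host but a different port counts as cross-origin in this sketch.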

Motivated by Roesner & Kohlbrenner [15], which shows that agents willing to read cross-origin frame content turn the same-origin policy from a hard guarantee into a soft one. <object> and <embed> carry the same SOP-bypass shape when they load a cross-origin resource. srcdoc inherits the embedding origin (no SOP crossing) and the content script runs inside the srcdoc frame on its own, so existing rules already apply to its body.

Flag navigator.webdriver Reads (Experimental)

ID: webdriver-probe-annotate
Default: off
Top frame only

Inject a main-world probe that wraps navigator.webdriver’s getter on the top-level document and listens for reads. If the page reads the property, the rule prepends a screen-reader-only landmark to the document noting that the site can distinguish AI-agent traffic from human traffic and may serve different content to agents than to people.

Content scripts run in the extension’s isolated JavaScript world and cannot observe page-world property accesses directly. Two complementary delivery paths run the same wrap-and-dispatch logic in the page world, so the rule fires regardless of when the user toggled it on:

Primary, document_start. When the rule becomes enabled, the background service worker registers a standalone main-world bundle (webdriver-probe.js) via chrome.scripting.registerContentScripts with world: "MAIN" and runAt: "document_start". Subsequent navigations run the probe before the page’s first script, so reads issued during initial HTML parse are caught.
Fallback, document_idle. The rule’s own apply inline-injects the same probe via <script> textContent. Covers the tab the user was already viewing when they toggled the rule on (dynamic registrations only apply to future navigations). Misses early-parse reads on that tab but catches DOMContentLoaded / load handlers, polled fingerprinters, and interaction-driven checks. Pages with a strict script-src CSP block the inline <script>; future navigations are still covered by the registered bundle.

Either path dispatches the same DOM CustomEvent on the document; the isolated-world content script listens and stamps the landmark on first detection. The wrapped getter persists for the lifetime of the document — disabling the rule stops new landmarks from being added and unregisters the main-world script for future navigations, but the wrap on the already-loaded page is left in place.
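
The wrap-and-dispatch logic both paths share comes down to replacing an accessor property with one that reports each read before delegating to the original getter. The sketch below shows that step on an arbitrary object; the function name and callback shape are assumptions, not the contents of webdriver-probe.js.

```typescript
// Wrap an accessor property so each read calls `onRead` and then returns the
// original value. Returns false when the property is missing, is not an
// accessor, or is not configurable (in which case nothing is changed).
function wrapGetter(target: object, property: string, onRead: () => void): boolean {
  const descriptor = Object.getOwnPropertyDescriptor(target, property);
  if (!descriptor || !descriptor.get || !descriptor.configurable) return false;
  const originalGet = descriptor.get;
  Object.defineProperty(target, property, {
    configurable: true,
    enumerable: descriptor.enumerable,
    get: function (this: unknown) {
      onRead(); // signal the read first
      return originalGet.call(this); // then behave exactly like the original
    },
  });
  return true;
}
```

In the page world the target would be Navigator.prototype and the property "webdriver", with the callback dispatching the CustomEvent that the isolated-world content script listens for. Because the wrapper stays installed after the callback stops mattering, this also shows why the wrap outlives the rule being disabled on an already-loaded page.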

The annotation flags capability, not measured cloaking — a navigator.webdriver read by itself is also consistent with legitimate anti-fraud fingerprinting on banking, payments, and checkout flows. The landmark text accordingly never uses the unqualified word “cloaking”. Off by default while the false-positive rate is characterized.

Motivated by the AI-targeted cloaking threat model: Caspi & Tugendhaft [18] demonstrate that a site identifying inbound requests as agent traffic can serve a poisoned, attacker-controlled version of a page that human reviewers never see, turning the page itself into an indirect-prompt-injection delivery surface. Search-engine cloaking has a long lineage [19]; the same primitive aimed at LLM crawlers is the new threat.

Flag Closed Shadow Roots (Experimental)

ID: closed-shadow-root-annotate
Default: off
Top frame only

Detect pages that attach closed shadow roots and prepend a screen-reader-only landmark noting that the extension cannot see inside those shadow trees. Complements the open-shadow-root coverage described in Coverage scope above by giving the agent a positive signal at read-time that a known blind spot is in use on this page.

Two detection paths feed the landmark:

Main-world probe (primary). When the rule is enabled, a page-world wrap over Element.prototype.attachShadow runs at document_start. Any call with mode: "closed" dispatches a binary signal that the landmark listens for. The wrap runs in the page’s own JavaScript world because page-script attachments hit the page’s prototype copy, which is a distinct object from the isolated-world copy the rule engine sees. No shadow contents are exposed by the probe — only the binary “attachment happened” signal crosses worlds, preserving the spec-mandated encapsulation of closed mode.
Structural heuristic (fallback). Looks for the shape strongly correlated with a closed shadow host: an upgraded custom element (hyphenated tag name, defined in customElements) with no light-DOM children, no host.shadowRoot, and a non-zero rendered box. Built-in elements with UA shadow roots (<input>, <details>, <video>) are filtered out for free — their tag names contain no hyphen. The heuristic covers the active tab between toggle-on and the moment the probe injects into it; on subsequent navigations the probe runs at document_start and the heuristic becomes redundant.

The heuristic path has a known false positive: a custom element that renders via canvas, WebGL, or ::before background-image with no actual shadow root will trip it. The landmark text reads “may contain content ABS cannot see,” not “this is definitely a closed shadow root.” The main-world probe is definitive when both signals are available.
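
The structural heuristic reduces to a conjunction of cheap checks. The sketch below runs them against a plain snapshot object so the logic can be read in isolation; the interface, its field names, and the reading of "non-zero rendered box" as positive width and height are all assumptions.

```typescript
// Plain snapshot of the DOM facts the heuristic needs. A content script would
// fill this from the live element; the field names are illustrative.
interface HostSnapshot {
  tagName: string;            // lowercase tag name, e.g. "chat-widget"
  isDefined: boolean;         // the tag is registered in customElements
  lightDomChildCount: number; // number of light-DOM child nodes
  hasOpenShadowRoot: boolean; // host.shadowRoot is non-null
  width: number;              // rendered box width in CSS pixels
  height: number;             // rendered box height in CSS pixels
}

// True when the element has the shape correlated with a closed shadow host.
function looksLikeClosedShadowHost(host: HostSnapshot): boolean {
  const hasCustomElementName = host.tagName.indexOf("-") !== -1;
  return (
    hasCustomElementName &&
    host.isDefined &&
    host.lightDomChildCount === 0 &&
    !host.hasOpenShadowRoot &&
    host.width > 0 &&
    host.height > 0
  );
}
```

The hyphen check is what excludes built-in elements such as input and video, and the canvas-rendering false positive described above would still return true here, which is why the landmark wording stays tentative.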

Declarative shadow DOM with shadowrootmode="closed" is not surfaced by either path. The parser materializes the shadow without going through attachShadow, so the probe doesn’t see it; and the materialized closed root is indistinguishable from “no shadow” from outside JS, so the heuristic can’t catch it reliably either. The open variant of declarative shadow DOM is covered by the regular open-shadow plumbing described in Coverage scope.

Off by default while the false-positive rate of the heuristic path is characterized.

Visual identity spoofing

Flag Spoofed Links

ID: link-spoof-annotate
Default: on

Annotate <a> elements whose visible text is visually spoofed relative to the link’s actual destination. Three checks, all signalled with a visible inline chip appended next to the anchor:

1. The visible text contains a word that mixes Latin letters with letters from Greek (U+0370–03FF), Cyrillic (U+0400–04FF), Armenian (U+0530–058F), or Cherokee (U+13A0–13FF), the script blocks that supply the Latin confusables used in homoglyph attacks. A pure-Cyrillic word adjacent to a pure-Latin word does not match; this test requires within-word script mixing.
2. The visible text contains a domain whose letters are drawn entirely from one non-Latin script but whose visual skeleton (via a curated subset of the Unicode TR39 confusables table) collapses to a pure-Latin string (e.g. a fully-Cyrillic letter sequence visually mimicking a Latin brand). This catches single-script homograph attacks the within-word check (#1) misses. The chip surfaces the Latin form the domain mimics.
3. The visible text contains a fully-formed domain whose registrable identity (PSL, ICANN section) doesn’t match the link’s actual host. Visible candidate and href are both normalized to their punycode form before the comparison, so legitimate IDN links (visible Unicode ↔ xn-- href) don’t surface while attacker-redirect cases do. Gated to http(s): hrefs so mailto:, tel:, and fragment anchors don’t get spurious comparisons.
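
The within-word mixing check, the first of the three, can be sketched with two character classes. This is a simplified illustration: "Latin" is narrowed to ASCII letters, words are split on whitespace, dots, and slashes, and the function name is hypothetical. The skeleton and registrable-domain checks need the confusables table and the PSL and are not shown.

```typescript
// Simplification: only ASCII letters are treated as Latin in this sketch.
const LATIN_LETTER = /[A-Za-z]/;

// Greek, Cyrillic, Armenian, and Cherokee blocks named in the rule description.
const CONFUSABLE_SCRIPT_LETTER = /[\u0370-\u03FF\u0400-\u04FF\u0530-\u058F\u13A0-\u13FF]/;

// True when any single word contains both a Latin letter and a letter from
// one of the confusable script blocks.
function hasMixedScriptWord(visibleText: string): boolean {
  const words = visibleText.split(/[\s.\/]+/);
  for (let i = 0; i < words.length; i++) {
    const word = words[i];
    if (LATIN_LETTER.test(word) && CONFUSABLE_SCRIPT_LETTER.test(word)) return true;
  }
  return false;
}
```

Testing each word separately is what keeps a pure-Cyrillic word next to a pure-Latin word from matching, as the description requires.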

The chip is rendered as visible markup — not just a data-* attribute — because the rule’s threat model is the asymmetry where a vision-based agent (or a sighted user) reads the rendered glyphs and acts on the displayed domain, while the real navigation target is hidden in the unrendered href. DOM-walking agents see the raw code points and the raw href and can perform the same comparisons themselves; this rule mainly exists to close the gap for accessibility-tree and screenshot consumers.

The homograph attack class is named by Gabrilovich & Gontmakher [6]; Holgers et al. [7] measures real-world prevalence and confusable coverage. The canonical confusable mapping browsers and TLDs use to refuse mixed-script IDN labels comes from Unicode TR #36 and TR #39. Boucher et al. (Bad Characters) [5] shows homoglyph substitution degrades modern NLP classifiers at rates comparable to zero-width insertions and bidi reordering. For the href / text-domain mismatch check, Dhamija et al. (Why Phishing Works) [8] is the foundational user study showing that link-text / link-target mismatch is the single most reliable cue users fail to check — making it the cue best worth re-presenting visibly to the agent.

Flag Trust Badges (Experimental)

ID: trust-badge-annotate
Default: off

Annotate image-shaped trust badges — Norton Secured, McAfee SECURE, BBB Accredited, TrustPilot, Verified Seller, and similar — whose accessible name asserts third-party endorsement that no content-script-accessible signal backs. The chip notes the claim is not verifiable from page content; the badge itself is left in place so the visual layout the page operator chose is preserved. Off by default while we gather real-world signal on false positives.

Detection is intentionally narrow. Only <img>, <svg>, and elements with role="img" are considered, so plain text labels (e.g., the “Verified Purchase” line on a review, which reviews-redact already owns) are out of scope. The accessible name is read in standard precedence (aria-label → aria-labelledby → SVG <title> → alt → title), capped at a short length, and matched against a curated phrase set with word boundaries; bare single words like “verified” or “trusted” do not match. Badges on the issuer’s own registrable domain — a Norton page showing its own logo, BBB.org showing its accreditation seal — are exempted as first-party.
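
The name-precedence and phrase-matching steps can be sketched on a plain record of the candidate's name sources. Everything specific here is an assumption: the interface, the MAX_NAME_LENGTH value, the choice to reject over-long names rather than truncate them, and the three phrases, which are taken from the examples above rather than from the shipped set.

```typescript
// Name sources for one badge candidate, already read from the element.
interface BadgeCandidate {
  ariaLabel?: string;
  ariaLabelledByText?: string; // text resolved from the aria-labelledby targets
  svgTitle?: string;
  alt?: string;
  title?: string;
}

const MAX_NAME_LENGTH = 80; // hypothetical cap

// Illustrative multi-word phrases; NOT the shipped phrase set.
const BADGE_PHRASES = [/\bnorton secured\b/i, /\bmcafee secure\b/i, /\bbbb accredited\b/i];

// First non-empty source wins, in the precedence order given above.
function accessibleName(candidate: BadgeCandidate): string {
  const sources = [
    candidate.ariaLabel, candidate.ariaLabelledByText,
    candidate.svgTitle, candidate.alt, candidate.title,
  ];
  for (let i = 0; i < sources.length; i++) {
    const source = sources[i];
    if (source !== undefined && source.trim() !== "") return source.trim();
  }
  return "";
}

function isTrustBadgeClaim(candidate: BadgeCandidate): boolean {
  const name = accessibleName(candidate);
  if (name === "" || name.length > MAX_NAME_LENGTH) return false;
  return BADGE_PHRASES.some((phrase) => phrase.test(name));
}
```

Because precedence is resolved before matching, an element whose aria-label is benign does not match on its alt text, and bare words such as "verified" never match a multi-word phrase.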

A page operator can drop <img alt="Norton Secured"> onto any page; the browser-chrome TLS UI, EV certificate organization name, and other trust signals a human would use to verify the claim are not reachable from a content script. The asymmetry the rule closes is the same one link-spoof-annotate closes for visible-text-vs-href: a vision-based or accessibility-tree-driven agent sees the badge as evidence of trustworthiness, with no way to check it.

SusBench [16] and DECEPTICON [17] both include trust-badge spoofing in their benchmark suites and document that current computer-use agents over-weight these badges as proof of legitimacy. The unverifiable-claim framing is the same one applied by schema-trust-sanitize to JSON-LD Organization claims and by link-spoof-annotate to anchor text — page-asserted authority that has no binding to the entity it names.

Dark patterns

Block manipulative UI patterns that work on humans and can mislead agents the same way. Current computer-use agents are highly susceptible to these patterns — sometimes more so than humans — per SusBench [16] and DECEPTICON [17].

The pattern taxonomy itself traces to Brignull’s deceptive.design catalog (originally darkpatterns.org, 2010) [10] and the empirical study by Mathur et al. [9], which enumerates Scarcity, Sneaking (sneak-into-basket), Preselection, Urgency (countdown timers), Confirmshaming (under Misdirection), and Nagging — the families the rules below target. Bösch et al. [11] gives the parallel privacy-side taxonomy.

Urgency

Hide Countdown Timers

ID: countdown-timer-redact
Default: on

Hide running countdown timers so agents aren’t pressured by the artificial time-sensitivity dark pattern. The rule snapshots timer-shaped text, confirms that the value decreased after 1.5 s, and re-scans on subtree mutations to catch lazy-loaded sections. The snapshot-and-confirm approach follows Mathur et al. [9], who detected countdown timers by capturing DOM mutations over time and comparing successive snapshots to confirm a ticking value.

Timers that reset to their starting value on each tick — and timers rendered via <canvas> or WebGL rather than DOM text — are not detected: the snapshot comparison requires a parseable text representation whose value strictly decreases across the 1.5s window.
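
The confirm step can be sketched as a comparison of two parsed snapshots. The accepted formats (HH:MM:SS and MM:SS) and the function names are assumptions; the live rule would take the two snapshots 1.5 s apart, which is not modeled here.

```typescript
// Parse "HH:MM:SS" or "MM:SS" into total seconds; null if not timer-shaped.
function parseTimerSeconds(text: string): number | null {
  const match = /^\s*(?:(\d{1,2}):)?(\d{1,2}):(\d{2})\s*$/.exec(text);
  if (!match) return null;
  const hours = match[1] ? parseInt(match[1], 10) : 0;
  const minutes = parseInt(match[2], 10);
  const seconds = parseInt(match[3], 10);
  return hours * 3600 + minutes * 60 + seconds;
}

// Confirm step: both snapshots must parse and the value must strictly decrease.
function isTickingDown(before: string, after: string): boolean {
  const first = parseTimerSeconds(before);
  const second = parseTimerSeconds(after);
  return first !== null && second !== null && second < first;
}
```

The strict comparison is what produces the stated limitation: a timer that snaps back to its starting value between snapshots yields a second value that is not smaller, so it is not confirmed.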

Scarcity

Hide Scarcity Warnings

ID: scarcity-redact
Default: on

Hide scarcity- and activity-based urgency messages (“Only 3 left”, “Selling fast”, “12 viewing now”) so agents aren’t pressured by manufactured scarcity. Out-of-stock indicators and bestseller badges are kept visible because they convey real purchaseability or preference information. Cataloged as Scarcity (low-stock and high-demand subtypes) in Mathur et al. [9], whose crawl found dark patterns on about 11% of the 11K shopping sites examined, with low-stock messages among the most frequent types.

Sneaking

Flag Cart Add-Ons

ID: cart-addon-annotate
Default: on

On checkout-like URLs, prepend a visible [abs: likely cart add-on] annotation to line items matching common sneak-into-basket patterns (protection plans, extended warranties, AppleCare/SquareTrade/Asurion, insurance, donation/round-up, gift wrap, carbon offset, shipping/package protection, Route, Seel, Navidium, driver tips). The line item is not removed — the agent reads the annotation and decides whether to click the line’s remove control.

Brignull’s original 2010 Sneak into Basket pattern [10], generalized to the Sneaking family in Mathur et al. [9].
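The annotate-don’t-remove behavior can be illustrated with a small label matcher. This is a sketch only: ADDON_PATTERNS covers a subset of the phrases listed above, and annotateLineItem is a hypothetical name, not the rule’s real API.

```typescript
// Illustrative subset of sneak-into-basket phrases (assumed, not the shipped list).
const ADDON_PATTERNS: RegExp[] = [
  /\bprotection plan\b/i,
  /\bextended warranty\b/i,
  /\b(?:applecare|squaretrade|asurion)\b/i,
  /\binsurance\b/i,
  /\b(?:donation|round[- ]?up)\b/i,
  /\bgift wrap\b/i,
  /\bcarbon offset\b/i,
  /\b(?:shipping|package) protection\b/i,
];

// Prepend the annotation when a label matches; the label itself is kept,
// so the line item stays readable and removable.
function annotateLineItem(label: string): string {
  const matched = ADDON_PATTERNS.some((p) => p.test(label));
  return matched ? `[abs: likely cart add-on] ${label}` : label;
}
```

Because the original label is preserved after the prefix, the agent still sees which line the annotation refers to before deciding whether to remove it.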

Annotate Drip-Pricing Fees (Experimental)

ID: hidden-fee-annotate
Default: on

On checkout-like URLs, prepend a visible [abs: drip-pricing fee] annotation to order-summary line items whose label matches a curated mandatory-fee phrase set and that sit beside a currency amount. The row is not removed — many of these charges are legally required to be surfaced, and silently hiding the line would both desync the displayed total and risk wiping a disclosure the operator must show. Match precision is layered: a whole-string regex on a small leaf-ish label, an order-summary ancestor (<table>, [role="region"] with order-summary labelling, <aside>/<section> with cart-shaped class or id, or schema.org Order microdata), an adjacent currency amount, and a single-item-cart skip so that flows where the fee itself is the product (utility-bill portals, court e-filing, DMV) are not annotated. The action is annotation-only, so a misfire is at worst an extra chip on a row; per-rule activity counts (in the extension popup) and an inline per-host denylist let us react quickly if live signal surfaces a false-positive cluster.

Motivated by the FTC Trade Regulation Rule on Unfair or Deceptive Fees (effective 2025-05-12), which bans drip pricing for live-event tickets and short-term lodging. Cataloged as Hidden Costs in Brignull’s deceptive.design [10] and part of the Sneaking family in Mathur et al. [9].
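The layered precision check for hidden-fee-annotate can be pictured as a conjunction of independent signals. The sketch below assumes the DOM inspection has already been reduced to a plain record; FeeRowSignals, FEE_LABEL, and the example fee phrases are hypothetical, and treating a cart of one item as the single-item skip is an assumption about how that guard is defined.

```typescript
// Hypothetical pre-computed signals for one order-summary row.
interface FeeRowSignals {
  label: string;            // text of the small leaf-ish label element
  inOrderSummary: boolean;  // an order-summary ancestor was found
  adjacentText: string;     // text of the neighbouring cell or element
  cartItemCount: number;    // number of line items in the cart
}

// Whole-string match against an assumed mandatory-fee phrase set.
const FEE_LABEL = /^(?:service|convenience|processing|booking|resort|handling) fee$/i;
// Assumed currency shape: symbol followed by an amount.
const CURRENCY_AMOUNT = /[$€£]\s?\d+(?:[.,]\d{2})?/;

// Every layer must pass; any single miss suppresses the annotation.
function shouldAnnotateFee(row: FeeRowSignals): boolean {
  return (
    row.cartItemCount > 1 &&
    row.inOrderSummary &&
    FEE_LABEL.test(row.label.trim()) &&
    CURRENCY_AMOUNT.test(row.adjacentText)
  );
}
```

Requiring all layers at once is what keeps a sentence that merely mentions a fee from being annotated.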

Preselection

Clear Checkout Checkboxes

ID: checkout-checkbox-sanitize
Default: on

On checkout-like URLs (/cart, /checkout, /basket, /bag, /payment, /order), uncheck every pre-checked checkbox so the agent inherits no silently selected add-ons (insurance, warranty, gift wrap, donations, marketing opt-ins). The agent is then expected to re-check anything it actually wants to opt into, including required agreements. role="checkbox" widgets and radio groups are out of scope.

The cleared state is held against framework re-renders that would otherwise silently restore pre-selected values from component state. A genuine user (or WebDriver-driven) click on the box releases that lock and the toggle sticks normally; only programmatic re-checks issued by the page itself are reverted.

Pre-checked opt-ins are Preselection in Mathur et al. [9] and Brignull’s deceptive.design catalog [10].
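The checkout-like URL gate used by this and neighbouring rules can be sketched as a path-segment test. Whether the real rule matches whole segments, prefixes, or substrings is not stated here; whole-segment matching and the name isCheckoutLikePath are assumptions for this example.

```typescript
// Path segments taken from the list above.
const CHECKOUT_SEGMENTS = new Set(["cart", "checkout", "basket", "bag", "payment", "order"]);

// True when any path segment equals one of the checkout keywords
// (case-insensitive). Substrings such as "ordering" do not match.
function isCheckoutLikePath(pathname: string): boolean {
  return pathname
    .toLowerCase()
    .split("/")
    .some((segment) => CHECKOUT_SEGMENTS.has(segment));
}
```

In a content script the argument would typically be location.pathname.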

Annotate Form Prefills (Experimental)

ID: form-prefill-annotate
Default: on

On checkout-like URLs, prepend a visible [abs: pre-populated …] annotation to form controls that ship with a server-rendered default the agent might silently inherit. Covers three shapes: text / email / tel / number / url inputs whose value attribute is non-empty, <select> elements whose explicitly-selected option is not the first one, and radio groups with an initially-checked option. The control’s value is not changed — annotation preserves agency, so the agent reads the chip in its DOM snapshot and decides whether to overwrite. Required-selection radio groups stay submittable because nothing is unchecked.

False-positive control is layered: text inputs with a recognized browser-autofill autocomplete token (name, email, tel, street-address, postal-code, etc.) are treated as legitimate autofill targets and skipped; selects whose name, id, or label suggests a geo field (country / state / province / region / currency) are skipped because geo-IP defaults are legitimate; controls the user has focused since the page loaded are skipped; disabled / readonly controls are skipped; per-form chip count is capped so a long form (saved-card manager, admin panel) doesn’t accumulate clutter. The live property (not the HTML attribute) is read so framework-rendered defaults (React defaultValue, Vue v-model initialState, jQuery .val()) are flagged the same way the agent’s DOM snapshot would see them; the focused-control skip above keeps user-entered values out of the flag set.
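The skip conditions above can be summarized as a single decision function over a plain description of the control. This is an illustrative sketch: PrefillControl and shouldAnnotatePrefill are hypothetical names, the token and geo lists are abbreviated, and the per-form chip cap is left out.

```typescript
// Hypothetical flattened view of a form control (live properties, not attributes).
interface PrefillControl {
  kind: "text" | "select" | "radio";
  value: string;          // live value, for text-like inputs
  selectedIndex: number;  // for selects; -1 otherwise
  groupChecked: boolean;  // for radios: the group has an initially checked option
  autocomplete: string;   // autocomplete token, "" if absent
  fieldHint: string;      // name, id, or label text
  disabled: boolean;
  readOnly: boolean;
  userFocused: boolean;   // focused since page load
}

const AUTOFILL_TOKENS = new Set(["name", "email", "tel", "street-address", "postal-code"]);
const GEO_HINT = /country|state|province|region|currency/i;

function shouldAnnotatePrefill(c: PrefillControl): boolean {
  // Global skips: nothing the agent can change, or something the user touched.
  if (c.disabled || c.readOnly || c.userFocused) return false;
  switch (c.kind) {
    case "text":
      return c.value !== "" && !AUTOFILL_TOKENS.has(c.autocomplete.trim().toLowerCase());
    case "select":
      return c.selectedIndex > 0 && !GEO_HINT.test(c.fieldHint);
    case "radio":
      return c.groupChecked;
  }
}
```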

Hidden inputs are out of scope for this rule — the value is submitted regardless of any chip and the agent never reads hidden inputs into a decision, so annotation would not change behavior. The hidden-input arm is handled by Scrub Hidden Affiliate Metadata below.

Future work tracked in issue #121:

Optionally include the prefilled value in the chip text. The chip flags presence only today, to avoid leaking remembered PII into a logging snapshot.
Synthetic blur / change event after annotation so sites that recalc totals on input events can re-render. Today the rule never touches values.
Sanitize-mode toggle for <select> defaults on sneaking-prone fields (shipping speed, tip percent, donation amount, insurance plan). Annotation is the default action because the FP profile across visible defaults is harder to bound than for hidden-input metadata.

Pre-populated form fields are Preselection in Mathur et al. [9] and Brignull’s deceptive.design catalog [10].

Scrub Hidden Affiliate Metadata

ID: hidden-affiliate-sanitize
Default: on

On checkout-like URLs, clear value on <input type="hidden"> whose name matches a curated affiliate / UTM / referral attribution allowlist (utm_source / utm_medium / utm_campaign / utm_term / utm_content, aff / aff_id / affiliate_id, ref / ref_id / referrer / referral_code, source_id / campaign_id / partner_code, click_id, gclid / fbclid / msclkid). The input is preserved — only its value is cleared — so the form’s structure and the page’s own scripts that read the input’s existence still work. Annotation is the wrong tool here because hidden inputs are submitted regardless of any chip and never reach the agent’s snapshot; sanitize-and-forget is the only useful action.

Scope is attribution only — promo / coupon / discount names (promo, promo_code, promotion_id, coupon, coupon_code, coupon_id, discount_code, discount_id) are intentionally not in the allowlist. Hidden promo-code inputs commonly carry a legitimate user-acquired discount (email promo link, sticky session promo, “apply coupon” UI that writes to a hidden field at submit time). Clearing them would silently strip the user’s discount with no visible recourse. Attribution has the opposite asymmetry — clearing it is invisible to the user and only costs the marketing trail.

A hard CSRF / session / cart / order / nonce / state / signature denylist takes precedence: any name containing csrf, nonce, signature, hmac, secret, session, antiforgery, etc. is preserved even if it also overlaps the allowlist by name shape. The failure mode for those is a silently rejected submit, which is strictly worse than the original dark pattern. The input must live inside an enclosing <form> (either as a descendant or via a form attribute reference); free-floating hidden inputs are JS-only data carriers and are left alone. A per-host kill-switch (empty at launch) covers loyalty / Apple Pay / 1-Click flows where saved attribution is the user’s intent; it is populated via PR review as live signal surfaces hosts where the affiliate id is load-bearing.

value is cleared via the prototype’s native setter so React / Vue value trackers observe the change. No input or change event is dispatched — hidden inputs aren’t expected to fire those, and firing one could trip totals-recalculation handlers that re-fetch attribution from the same source.
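The ordering of the checks (form membership, then denylist, then allowlist) is the part most worth seeing in code. The sketch below is an assumption-laden illustration: shouldClearHiddenInput is a hypothetical name, the denylist is the abbreviated set named above, and the loose ref_* shape is added here only to show how a name can overlap both lists.

```typescript
// Substrings that always protect an input, checked before the allowlist.
const PROTECTED_SUBSTRINGS = ["csrf", "nonce", "signature", "hmac", "secret", "session", "antiforgery"];

// Attribution name shapes. The "ref_<word>" arm is deliberately loose
// (an assumption for this example) so an overlap with the denylist is possible.
const ATTRIBUTION_SHAPE =
  /^(?:utm_(?:source|medium|campaign|term|content)|aff(?:_id)?|affiliate_id|ref(?:_[a-z]+)?|referrer|referral_code|source_id|campaign_id|partner_code|click_id|gclid|fbclid|msclkid)$/;

function shouldClearHiddenInput(name: string, insideForm: boolean): boolean {
  const n = name.toLowerCase();
  if (!insideForm) return false; // free-floating inputs are left alone
  if (PROTECTED_SUBSTRINGS.some((s) => n.includes(s))) return false; // denylist wins
  return ATTRIBUTION_SHAPE.test(n);
}
```

With this ordering, a name such as ref_signature matches the attribution shape but is still preserved, and promo or coupon names never match at all.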

Affiliate / UTM attribution metadata is part of the Sneaking family in Mathur et al. [9] and the Hidden costs / hidden information catalog in Brignull’s deceptive.design [10].

Confirmshaming

Neutralize Confirmshame Buttons

ID: confirmshame-sanitize
Default: on

Rewrite guilt-tripping decline buttons to a neutral No thanks so an agent reading the DOM or accessibility tree isn’t pushed away from the decline option by manipulative copy. Coverage spans monetary confirmshame (“No, I’d rather pay full price”, “I don’t want to save money”, “I hate discounts”), health and safety guilt (“I don’t care about my family’s safety”, “I’m fine being unprotected”), loyalty downgrades (“Downgrade to basic”, “Forfeit my Gold status”), gamified progress loss (“Lose my streak”, “Sacrifice my XP”), imperative self-commands (“Charge me extra”, “Stop helping me save”), sarcastic acceptance (“Whatever, take my money”), and the reverse-positive “Yes, [bad outcome]” framing common on confirmation dialogs (“Yes, skip my savings”, “Confirm: pay full price”).

The underlying control is preserved — only its visible label and any matching aria-label / title are rewritten — so the agent can still click it normally. Plain decline labels like “No thanks”, “Decline”, “Maybe later”, “Skip”, and “Continue as guest” are left untouched.

Coverage is English-only: the phrase set ships in English and does not cover localized variants of the same buttons. Buttons whose decline action is conveyed entirely by an icon, with no text label and no accessible name on aria-label / title, are out of scope: the rule matches against textual labels and has nothing to rewrite when there is no text.
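The rewrite step can be sketched as a phrase match followed by a fixed replacement. The patterns below are a small illustrative subset drawn from the examples above, and neutralizeLabel is a hypothetical name; the shipped phrase set is larger.

```typescript
// Plain decline labels that must be left untouched.
const NEUTRAL_DECLINES = new Set(["no thanks", "decline", "maybe later", "skip", "continue as guest"]);

// Illustrative confirmshame phrases; both straight and curly apostrophes are accepted.
const CONFIRMSHAME_PATTERNS: RegExp[] = [
  /\bi(?:'|’)d rather pay full price\b/i,
  /\bi don(?:'|’)t want to save\b/i,
  /\bi hate discounts\b/i,
  /\blose my streak\b/i,
  /\btake my money\b/i,
];

// Returns the neutral label for guilt-tripping copy, otherwise the input unchanged.
function neutralizeLabel(label: string): string {
  const trimmed = label.trim();
  if (NEUTRAL_DECLINES.has(trimmed.toLowerCase())) return label;
  return CONFIRMSHAME_PATTERNS.some((p) => p.test(trimmed)) ? "No thanks" : label;
}
```

In a real rule the same function would presumably be applied to the visible text and to any matching aria-label or title, so all three stay consistent.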

Cataloged as Confirmshaming in Brignull’s deceptive.design [10] and as part of the Misdirection family in Mathur et al. [9].

Roach Motel

Flag Roach-Motel Sign-Ups

ID: roach-motel-annotate
Default: on
Scope: top frame only

On signup, subscription, and checkout pages of sites documented to make cancellation difficult, embed a screen-reader-only landmark carrying a normalized cancellation-difficulty grade (hard, very-hard, impossible), the canonical cancel/delete URL when known, and a short note. Agents reading the accessibility tree see the warning before completing signup; sighted users see nothing.

Two data sources back the rule:

A hand-curated list under extension/data/sites/ for FTC-defendant cases (Amazon Prime, Care.com, Match.com, Cleo AI, LA Fitness, Adobe, Vonage) and well-documented cancellation-friction cases (NYTimes, Washington Post, WSJ, Planet Fitness, Equinox), each with its own signup/subscription pathnames. Curated entries take precedence on URL match.
A vendored snapshot of JustDeleteMe’s account-deletion directory (MIT License, Robb Lewis & contributors), filtered to entries graded hard or impossible. Used as a fallback when the curated list misses, gated to signup-shaped pathnames (/signup, /subscribe, /join, /membership, /checkout, /plans, /pricing, /billing, /cart, /upgrade, /register). JustDeleteMe attribution is included in the landmark text so the agent can cite the source back to the user. Refresh with bun run fetch-justdeleteme.
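The precedence between the two data sources can be sketched as a two-step lookup. Everything named here (CancelInfo, CuratedSite, lookupCancelInfo, and the record shapes) is hypothetical; falling through to the fallback when a curated host exists but its pathnames do not match is an assumption about what “misses” means.

```typescript
type CancelGrade = "hard" | "very-hard" | "impossible";

interface CancelInfo {
  grade: CancelGrade;
  cancelUrl?: string;
  source: string; // e.g. "curated" or "JustDeleteMe"
}

interface CuratedSite extends CancelInfo {
  paths: string[]; // per-site signup/subscription pathnames
}

// Signup-shaped pathnames that gate the fallback, from the list above.
const SIGNUP_PATHS = ["/signup", "/subscribe", "/join", "/membership", "/checkout",
  "/plans", "/pricing", "/billing", "/cart", "/upgrade", "/register"];

function pathMatches(pathname: string, prefixes: string[]): boolean {
  return prefixes.some((p) => pathname === p || pathname.startsWith(p + "/"));
}

function lookupCancelInfo(
  host: string,
  pathname: string,
  curated: Record<string, CuratedSite>,
  fallback: Record<string, CancelInfo>,
): CancelInfo | null {
  const site = curated[host];
  if (site && pathMatches(pathname, site.paths)) return site; // curated wins
  const entry = fallback[host];
  if (entry && pathMatches(pathname, SIGNUP_PATHS)) return entry; // gated fallback
  return null;
}
```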

Brignull’s original 2010 Roach Motel pattern, renamed Hard to cancel in the current deceptive.design taxonomy [10]. Vasudevan et al. [12] gives the empirical basis: cancellation flows asymmetric to signup flows on a significant share of subscription sites across the US, EU, and UK. The legal “good” standard converges on signup/cancel symmetry — the FTC’s 2024 Click-to-Cancel rule, California AB-2863, and EU Digital Services Act Art. 25.

Nagging

ID: newsletter-modal-hide
Default: on
Scope: top frame only

Remove interstitial newsletter signup modals that cover the page. Detects fixed-position dialogs containing signup language and an email input. Standard login modals, paywalls, and small toasts are kept visible.

Interstitial signup modals are categorized as Nagging in Mathur et al. [9].

Sensitive-data masking

Replace credentials and personal identifiers with placeholders before they reach the model. Both rules walk text nodes and substitute in place — the page still renders normally for humans.

Mask PII

ID: pii-redact
Default: on

Hide credit card numbers (Luhn-validated), phone numbers, and SSNs. Microsoft’s open-source Presidio framework uses the same mix of regex patterns, checksum validation (e.g., Luhn for credit cards), and named-entity recognition to detect and redact PII in text.
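Luhn validation is what separates a card number from any other long digit run. The following is a standard Luhn check plus a minimal masking pass; the candidate regex, the 13 to 19 digit bounds, and the "[card redacted]" placeholder are assumptions for the example, not the rule’s actual output format.

```typescript
// Standard Luhn checksum: double every second digit from the right,
// subtract 9 from results above 9, and require the total to end in 0.
function passesLuhn(candidate: string): boolean {
  const digits = candidate.replace(/[\s-]/g, "");
  if (!/^\d{13,19}$/.test(digits)) return false;
  let sum = 0;
  for (let i = 0; i < digits.length; i++) {
    let d = digits.charCodeAt(digits.length - 1 - i) - 48; // "0" is char code 48
    if (i % 2 === 1) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
  }
  return sum % 10 === 0;
}

// Replace digit runs (optionally grouped by spaces or hyphens) that pass Luhn.
function maskCardNumbers(text: string): string {
  return text.replace(/\b\d(?:[ -]?\d){12,18}\b/g, (match) =>
    passesLuhn(match) ? "[card redacted]" : match,
  );
}
```

Digit runs that fail the checksum, such as order numbers, are left in place, which keeps false positives down.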

Mask Secrets

ID: secrets-redact
Default: on

Hide API keys, tokens, JWTs, private keys, and other high-entropy credentials. Repository secret-scanning tools — gitleaks, trufflehog, and Yelp’s detect-secrets — use comparable regex and entropy heuristics to surface API keys, tokens, and private keys in source repositories. This rule applies the same approach to live page text instead of files on disk.
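The entropy half of that heuristic can be sketched with a Shannon-entropy score over a token’s characters. The thresholds (20 characters, 3.5 bits per character) and the names shannonEntropy and looksHighEntropy are illustrative assumptions; real scanners tune these per character set and combine them with provider-specific regexes.

```typescript
// Shannon entropy in bits per character, over UTF-16 code units.
function shannonEntropy(s: string): number {
  if (s.length === 0) return 0;
  const counts: Record<string, number> = {};
  for (let i = 0; i < s.length; i++) {
    const ch = s.charAt(i);
    counts[ch] = (counts[ch] || 0) + 1;
  }
  let bits = 0;
  for (const ch of Object.keys(counts)) {
    const p = counts[ch] / s.length;
    bits -= p * Math.log2(p);
  }
  return bits;
}

// A token is suspicious when it is long, has no whitespace,
// and its characters are close to uniformly distributed.
function looksHighEntropy(token: string, minLength = 20, minBits = 3.5): boolean {
  return token.length >= minLength && !/\s/.test(token) && shannonEntropy(token) >= minBits;
}
```

Ordinary long words and repeated characters score low on this measure, so they pass through unmasked.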

Context pollution

Remove page chrome and irrelevant regions that cost tokens without helping the agent complete its task — footers, cookie banners, chat widgets, ads, engagement rails, and dead SVG sprite definitions.

Content-vs-boilerplate separation has a long line of prior art, starting with Kohlschütter et al. [13] — the basis for the Boilerpipe library — and Mozilla’s Readability.js, the algorithm behind Firefox Reader View. The rules below are the agent-facing analogue of those heuristics, targeted at specific chrome categories instead of running a single generic article extractor.

ID: footer-redact
Default: on

Hide the page footer (legal links, sitemap, social icons, marketing copy) to save tokens. Per-section footers inside articles or asides are left visible. Footers are a canonical boilerplate region in Kohlschütter et al. [13] and are stripped by Readability.js.

ID: cookie-banner-hide
Default: on
Scope: top frame only

Remove GDPR/CCPA cookie consent banners (OneTrust, Cookiebot, TrustArc, Sourcepoint, Quantcast, Osano, Didomi, and generic patterns). These overlays float above the page, so they’re removed entirely rather than replaced with an in-flow placeholder.

Aarhus University’s Consent-O-Matic maintains the canonical open ruleset for matching CMPs (Consent Management Platforms) like OneTrust, Cookiebot, and TrustArc — the same CMP coverage this rule targets, though Consent-O-Matic auto-fills banners while this rule removes them outright.

Remove Chat Widgets

ID: chat-widget-hide
Default: on
Scope: top frame only

Remove live-chat widgets (Intercom, Drift, Zendesk, Crisp, Tawk.to, HubSpot, Olark, LiveChat, Freshchat, Zopim). These bubbles float above the page, so they’re removed entirely rather than replaced with an in-flow placeholder. Chat bubbles are floating chrome that Readability-style extractors discard.

Hide Ads and Sponsored Results

ID: ads-hide
Default: on

Remove display ads and paid/sponsored search results. Well-known surfaces (AdSense, GAM, Outbrain, Taboola, Google/Bing/Amazon sponsored results) are stripped from the DOM so the agent never sees them. ~13k additional ad selectors from EasyList are injected as a display:none stylesheet for broader coverage of third-party ad networks.

Selectors come directly from EasyList, the filter list that powers uBlock Origin, Adblock Plus, and most other consumer ad blockers — over a decade of community-maintained ad and tracker selector patterns.
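Turning a long selector list into a display:none stylesheet is mostly string assembly. The sketch below is one plausible way to do it, not the extension’s code: buildHideStylesheet and the chunk size are assumptions. Chunking is used because an invalid selector inside a CSS selector list invalidates the whole rule, so smaller rules limit how much coverage one bad selector can take out.

```typescript
// Build CSS text that hides every selector, grouped into rules of
// at most `chunkSize` selectors each.
function buildHideStylesheet(selectors: string[], chunkSize = 1000): string {
  const rules: string[] = [];
  for (let i = 0; i < selectors.length; i += chunkSize) {
    const chunk = selectors.slice(i, i + chunkSize);
    rules.push(`${chunk.join(",\n")} { display: none !important; }`);
  }
  return rules.join("\n");
}
```

The resulting text could then be placed in a <style> element or a constructed stylesheet by the content script.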

Hide Disguised Ads (Native Advertorials)

ID: disguised-ad-flag
Default: on

Hide article-shaped blocks that carry a visible disclosure label — “Sponsored”, “Promoted”, “Advertorial”, “Paid Post”, “Partner Content”, “Featured Listing”, “From our Advertisers”, “Marketing Partner”, “In partnership with <Brand>”, or the bracketed variants ([Ad], (promoted), (sponsored)) common in social feeds — but are rendered by the publisher’s own CMS rather than served from an ad network. Native advertorials bypass the infrastructure-level selectors that power ads-hide because they share class names and DOM shape with editorial articles; the only signal that distinguishes them is the disclosure label itself, which the FTC’s .com Disclosures require publishers to render prominently.

Detection works on the visible label, not network selectors, and only fires when the label sits inside an article-shaped container (heading, image or outbound link, body prose). The heading carrier is recognized as any of <h1>–<h6>, [role="heading"], or [class~="headline"], so design systems that emit a styled <div class="headline"> or an ARIA-heading <div> instead of a real <h*> are still detected. Filter chips, navigation links, and editorial paragraphs that mention sponsorship in passing are excluded by that shape check, by an interactive-ancestor guard, and by a whole-string regex on the label element. Matching cards are replaced with a click-to-reveal placeholder in the same style as ads-hide and irrelevant-sections-redact.
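The combination of a whole-string label match, a shape check, and an interactive-ancestor guard can be sketched over pre-computed facts about a card. CardShape, DISCLOSURE_LABEL, and isDisguisedAd are hypothetical names, and the regex covers only the labels quoted earlier in this section.

```typescript
// Whole-string match: the label element must contain only the disclosure.
const DISCLOSURE_LABEL =
  /^\s*(?:[\[(]\s*(?:ad|promoted|sponsored)\s*[\])]|sponsored|promoted|advertorial|paid post|partner content|featured listing|from our advertisers|marketing partner|in partnership with .+)\s*$/i;

// Hypothetical summary of the container around the label.
interface CardShape {
  hasHeading: boolean;             // <h1>-<h6>, role="heading", or class "headline"
  hasImageOrOutboundLink: boolean;
  hasBodyProse: boolean;
  labelInsideInteractive: boolean; // label sits in a button, link chip, or similar
}

function isDisguisedAd(labelText: string, card: CardShape): boolean {
  if (card.labelInsideInteractive) return false; // filter chips, nav links
  if (!DISCLOSURE_LABEL.test(labelText)) return false;
  return card.hasHeading && card.hasImageOrOutboundLink && card.hasBodyProse;
}
```

Anchoring the regex at both ends is what excludes editorial sentences that mention sponsorship in passing.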

The label-only approach is the boilerplate-detection counterpart to Kohlschütter et al. [13] and Readability.js, narrowed to the disclosure signal that paid content must carry by law.

Remove Unused SVG Sprites

ID: svg-sprite-strip
Default: on

Remove hidden SVG sprite containers (those holding only <symbol>/<defs> definitions) when none of their symbols are referenced by any <use> element on the page. Referenced sprites are preserved so icons keep working. This is dead-code elimination, the bundler optimization of dropping code that nothing live references, applied to SVG <symbol> definitions at runtime.
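The reachability test reduces to a set comparison between the symbol ids a sprite defines and the fragment references collected from <use> elements. In this sketch the ids and hrefs are assumed to have been gathered from the DOM already, and isSpriteUnused is a hypothetical name.

```typescript
// symbolIds: ids of <symbol> elements inside one sprite container.
// useHrefs: href / xlink:href values from every <use> on the page.
function isSpriteUnused(symbolIds: string[], useHrefs: string[]): boolean {
  // Only same-document references ("#id") can point at an inline sprite.
  const referenced = new Set(
    useHrefs.filter((href) => href.startsWith("#")).map((href) => href.slice(1)),
  );
  // An empty container is not treated as a sprite in this sketch.
  return symbolIds.length > 0 && symbolIds.every((id) => !referenced.has(id));
}
```

A single live reference keeps the whole container, which matches the rule’s choice to preserve referenced sprites rather than prune individual symbols.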

Hide Irrelevant Sections (AI)

ID: irrelevant-sections-redact
Default: off
Scope: top frame only
Availability: requires an OpenAI API key — either bundled at build time via OPENAI_API_KEY, or saved on the extension’s options page. Until a key is configured the rule shows as Unavailable on the options page.

Use a small LLM to identify engagement/exploration rails (related products, “you might also like”, recommended articles, trending now, etc.) and replace them with click-to-reveal placeholders. Sends a compressed page tree with stable refs so the LLM can choose the right granularity; interactive elements (search, cart, checkout, login) are labeled as protected. Re-scans on scroll to catch lazy-loaded content.

An LLM-driven generalization of the boilerplate-detection heuristics in Kohlschütter et al. [13] and Readability.js, targeted at engagement and recommendation rails instead of running a generic article extractor.

Agent shortcuts

Inject hints that let agents reach what they need without navigating the human-facing UI — currently URL recipes for searches, filters, and direct lookups on covered hosts.

Embed Search URL Recipes

ID: search-url-helper
Default: on
Scope: top frame only

On covered hosts (Amazon, Best Buy, Etsy, IKEA, Home Depot, REI, GitHub, Wikipedia, Hacker News, MDN, npm, weather.gov, arXiv, Python docs, BBC), embed a screen-reader-only landmark at the top of the page describing how to run searches, filters, sorts, and direct lookups via URL. Lets agents navigate by URL instead of typing into search boxes and clicking facets. No visible affordance — the landmark is preserved by hidden-text-strip via the sr-only class allowlist.

Same goal as the llms.txt proposal (Howard, Answer.AI, 2024) — give LLMs a compact, machine-readable hint about how to use a site — but injected client-side as a hidden landmark instead of relying on the site to publish a top-level file. The hidden-but-readable delivery mechanism reuses the long-established sr-only / visually-hidden convention from screen-reader accessibility practice.
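A URL recipe is essentially a per-host template that turns a query into a link. The sketch below illustrates the idea with three publicly known search URL patterns; the SEARCH_RECIPES table and searchUrlFor are hypothetical and are not the recipes the extension embeds, which also cover filters, sorts, and direct lookups.

```typescript
// Illustrative per-host search templates (assumed, not the shipped recipes).
const SEARCH_RECIPES: Record<string, (query: string) => string> = {
  "github.com": (q) => `https://github.com/search?q=${encodeURIComponent(q)}`,
  "en.wikipedia.org": (q) => `https://en.wikipedia.org/w/index.php?search=${encodeURIComponent(q)}`,
  "www.npmjs.com": (q) => `https://www.npmjs.com/search?q=${encodeURIComponent(q)}`,
};

// Returns a search URL for a covered host, or null when no recipe exists.
function searchUrlFor(host: string, query: string): string | null {
  if (!Object.prototype.hasOwnProperty.call(SEARCH_RECIPES, host)) return null;
  return SEARCH_RECIPES[host](query);
}
```

An agent that reads such a recipe can jump straight to the results page instead of locating the search box, typing, and submitting.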

References

[1] Greshake et al. (2023). Not what you’ve signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection. AISec 2023. arxiv:2302.12173. Introduces the indirect prompt injection threat model — attacker text reaches the model via the page or document the LLM reads, not via the user’s prompt — and enumerates non-rendered DOM regions (HTML comments, hidden text, alt and metadata attributes) as carriers.

[2] Wu et al. (2024). WIPI: A New Web Threat for LLM-Driven Web Agents. arxiv:2402.16965. Extends the indirect prompt injection threat model specifically to LLM-driven web agents.

[3] Liao et al. (2025). EIA: Environmental Injection Attack on Generalist Web Agents for Privacy Leakage. ICLR 2025. arxiv:2409.11295. Demonstrates that web elements made invisible via CSS — opacity, off-screen positioning, zero-area clipping — and accessibility-tree content without a visible counterpart are read by web agents but unseen by humans.

[4] Boucher & Anderson (2023). Trojan Source: Invisible Vulnerabilities. USENIX Security 2023; CVE-2021-42574. trojansource.codes. Introduces the bidi-override attack class.

[5] Boucher, Pajola, Brookes, Anderson (2022). Bad Characters: Imperceptible NLP Attacks. IEEE S&P 2022. arxiv:2106.09898. Zero-width insertions, homoglyph swaps, and bidi reordering against NLP systems with comparable degradation in sentiment, translation, and toxicity classifiers.

[6] Gabrilovich & Gontmakher (2002). The Homograph Attack. CACM 2002. gabrilovich.com. Names the attack class and demonstrates the microsoft.com-with-Cyrillic-o proof of concept.

[7] Holgers, Watson, Gribble (2006). Cutting Through the Confusion: A Measurement Study of Homograph Attacks. USENIX ATC 2006. usenix.org. Measures real-world prevalence and confusable coverage.

[8] Dhamija, Tygar, Hearst (2006). Why Phishing Works. CHI 2006. berkeley.edu. Foundational user study showing that link-text / link-target mismatch is the single most reliable cue users fail to check.

[9] Mathur et al. (2019). Dark Patterns at Scale: Findings from a Crawl of 11K Shopping Websites. CSCW 2019. princeton.edu. Enumerates Scarcity, Sneaking, Preselection, Urgency, Misdirection (including confirmshaming), and Nagging.

[10] Brignull (2010–). deceptive.design (originally darkpatterns.org). The pattern taxonomy this section’s categories follow.

[11] Bösch et al. (2016). Tales from the Dark Side: Privacy Dark Strategies and Privacy Dark Patterns. PoPETs 2016. petsymposium.org. Parallel privacy-side taxonomy.

[12] Vasudevan et al. (2024). Staying at the Roach Motel: Cross-Country Analysis of Manipulative Subscription and Cancellation UXes. CHI 2024. arxiv:2309.17145. Cancellation flows asymmetric to signup flows across the US, EU, and UK.

[13] Kohlschütter, Fankhauser, Nejdl (2010). Boilerplate Detection using Shallow Text Features. WSDM 2010. dl.acm.org. The basis for the Boilerpipe library.

[14] Iliadis & Pedersen (2025). One schema to rule them all. JASIST 2025. wiley.com. Schema.org has no native provenance mechanism — every claim is self-asserted.

[15] Roesner & Kohlbrenner (2026). Agentic Browsers and the Same-Origin Policy. ICLR 2026 Workshop. franziroesner.com. Agents willing to read cross-origin frame content turn the same-origin policy from a hard guarantee into a soft one.

[16] Guo et al. (2025). SusBench. arxiv:2510.11035. Benchmark for computer-use-agent susceptibility to manipulative UI.

[17] Cuvin et al. (2025). DECEPTICON. arxiv:2512.22894. Companion benchmark for agent susceptibility to deceptive interface patterns.

[18] Caspi & Tugendhaft (2025). A Whole New World: Creating a Parallel-Poisoned Web Only AI-Agents Can See. arxiv:2509.00124. Demonstrates that a site identifying inbound requests as agent traffic — via UA, IP range, or automation telltales like navigator.webdriver — can serve a different, attacker-controlled version of a page to AI agents than to human reviewers, turning the cloaked page into an indirect-prompt-injection carrier.

[19] Wu & Davison (2005). Cloaking and Redirection: A Preliminary Study. AIRWeb 2005. lehigh.edu. Names server-side cloaking against search-engine crawlers and characterizes the agent-fingerprinting techniques operators use to decide which version of a page to serve. The AI-targeted variant is the same primitive aimed at LLM crawlers.