
[toby-bridges/api-relay-audit] CLAUDE.md



# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Overview

Security audit tool for third-party AI API relay/proxy services. Detects hidden prompt injection, prompt leakage, instruction override with non-Claude identity substitution, context truncation, tool-call package substitution (AC-1.a), error response header leakage (AC-2 adjacent), SSE-level stream integrity anomalies (AC-1 SSE-layer), Web3 prompt injection (SlowMist signature isolation, profile-gated), relay-framework fingerprinting, and latency-variance fingerprinting.

Threat taxonomy follows Liu et al., *Your Agent Is Mine*, arXiv:2604.08407 — AC-1 (payload injection), AC-1.a (dependency-targeted injection), AC-1.b (conditional delivery), AC-2 (secret exfiltration). Infrastructure fingerprint (Step 12) and latency variance (Step 13) are sourced from Zhang et al., *Real Money, Fake Models*, arXiv:2603.01919. AC-1 full tool_call support and AC-1.b beyond warm-up mitigation remain on the backlog (see FOR_JOHN.md).

## Scope / Constraints

**Editable without asking**: `scripts/`, `api_relay_audit/`, `tests/`, `audit.py` (root standalone), `ROADMAP.md`, `CLAUDE.md`, `FOR_JOHN.md`.

**Ask before touching**: `web/`, `.github/workflows/`, `.github/voice-samples/`, `docs/`, `deploy/`, any root-level config files.

**Why**: `web/` is under a frontend colleague handoff (post-2026-04-20). `.github/workflows/` contains Claude Code action configuration — changes there have external side effects.

## Contribution Philosophy

**User-feedback-driven, not speculative.** Do not add features because they might be useful — add them when a real user need has been reported. This applies to code changes, new detection steps, and incoming PRs.

**Permanently out of scope** (evaluated and deliberately dropped — do not re-open without new information):

- **Claude Code CLI header impersonation** (ROADMAP §14): brittle version-pinning, and impersonating CC headers removes audit differentiation value
- **Hosted web dashboard** (ROADMAP "Explicitly NOT doing"): requires API backend + auth, which changes the product from a one-curl-download tool to a hosted service

**PR evaluation heuristics**: Does the change address a reported user problem? Does it preserve the dual-distribution invariant? Does it add complexity for a use case with zero user reports?

## Commands

```bash
# Install dependencies
pip install httpx pytest

# Run full audit
python scripts/audit.py --key <KEY> --url <BASE_URL> --model claude-opus-4-6

# Context length test only
python scripts/context-test.py --key <KEY> --url <BASE_URL>

# Extract report data to JSON (for dashboard)
python scripts/extract-data.py --reports-dir ./reports --output data.json

# Run all tests
python -m pytest tests/ -v

# Run a single test file
python -m pytest tests/test_client.py -v

# Run a single test case
python -m pytest tests/test_client.py::TestAutoDetection::test_format_cached -v

# Doc-drift prevention — run before publishing any external comparison/blog/X long-form
python scripts/collect-metrics.py
```

## Doc-drift prevention

Before publishing any external comparison / blog / X long-form post that quotes step counts, test counts, version numbers, or Codex review tallies, run `python scripts/collect-metrics.py`. The script regenerates `docs/_metrics.md` (committed, GitHub-readable) and `docs/_metrics.json` (gitignored, machine-readable). Verify every numeric claim in your draft against the table.

Coverage contract: ~70% of typical drift (structured metrics — version, step count, test count, CLI flag count, Codex review tallies, ROADMAP progression, dual-distribution version parity).
The remaining ~30% (external competitor intel, narrative completeness, framing) is documented as the human-review boundary inside `_metrics.md` itself.

Origin: 21-day drift in `docs/comparison-api-relay-audit-vs-hvoy-vs-cctest.md` caught while preparing X publication, 2026-05-05. Pareto-frontier selection — introspective generation chosen over template rendering (overkill for a single-author monthly-release project) and over manual discipline (already failed once).

## Architecture

### Dual Distribution Model

There are **two parallel versions**:

- `audit.py` (root) — standalone, zero-dependency version (~2500 lines, curl-only). Users can `curl` this file and run it without installing anything.
- `api_relay_audit/` + `scripts/` — modular version with `httpx`, used for development and testing.

When making changes to audit logic, `audit.py` (root) must be updated to stay in sync. `tests/test_dual_distribution_parity.py::test_risk_matrix_character_identical` enforces byte-level parity on the risk-matrix block; `tests/test_web3_injection.py::TestWeb3MarkerParity` enforces it on Web3 markers; `tests/test_refusal_detector.py::TestRefusalMarkerParity` enforces it on the Step 4/6 refusal vocabulary.

### Module Responsibilities

- `api_relay_audit/client.py` — All API calls go through `APIClient`. Auto-detects Anthropic vs OpenAI format (tries Anthropic first, caches on success). On SSL errors switches httpx → subprocess curl (`-sk`).
- `api_relay_audit/context.py` — Canary-marker + binary search context truncation. Coarse scan → binary → fine, ~12 requests vs ~75 naive.
- `api_relay_audit/reporter.py` — Builder-pattern Markdown report. `flag(level, msg)` appends to both body and risk summary.
- `api_relay_audit/tool_substitution.py` — AC-1.a via text-echo of pinned package commands (`pip install requests==2.31.0`, etc.). Text surrogate only: does NOT catch rewrites targeting structured `tool_call` payloads.
- `api_relay_audit/error_leakage.py` — AC-2 adjacent.
  7-8 deterministic broken requests (malformed JSON, invalid model, wrong content-type, missing fields, unknown endpoint, `max_tokens=99999999` force-upstream, fake Bearer auth probe). Three scan paths: literal key match, LiteLLM-ported regex, LiteLLM issue-sourced markers (#5762, #8075, #12152, #13705, #15799, #20419).
- `api_relay_audit/identity_patterns.py` — 26 non-Claude keywords. ASCII uses word-bounded regex (`Qwen2.5` matches, `laws` doesn't); CJK uses substring.
- `api_relay_audit/stream_integrity.py` — SSE whitelist + usage monotonicity + thinking signature + stream model identity check. Tri-state verdict (`clean`/`anomaly`/`inconclusive`). Clean-room reimplementation of hvoy.ai concept, not a port.
- `api_relay_audit/transparent_log.py` — Append-only JSONL forensic log. Hash-only (no body), entries ≤1.5 KB. Hooks into all 4 `APIClient` public methods (`call`, `get_models`, `raw_request`, `stream_call`).
- `api_relay_audit/web3/injection_probes.py` — 3 SlowMist-derived probes; safe-priority aggregation with `HARD_INJECTED_MARKERS` override for contradictory responses. Profile-gated (`--profile web3|full`).
- `api_relay_audit/infra_fingerprint.py` — 3 unauthenticated GET probes; signature DB covers 7 frameworks; majority vote → `confirmed`/`tentative`/`unknown`. Informational only, does not feed the risk matrix.
- `api_relay_audit/latency_variance.py` — N identical `max_tokens=8` requests timed with `time.perf_counter` (not `time.time` — monotonicity, v1.8.1 fix). `ensure_format()` is called before the timing loop to prevent the first sample silently including a failed Anthropic probe. Bimodality is the strong signal for silent A/B model substitution. Informational only.
- `scripts/audit.py` — 13-step orchestration. **6D risk matrix**: D1=token injection, D2=instruction override, D3=tool-call substitution, D4=error leakage, D5=stream anomaly, D6=Web3 injection (profile-gated). Steps 12/13 informational only.
  `--profile` gates step set at runtime — rejected branch-forking to preserve the dual-distribution invariant.

### APIClient Return Format

```python
{"text": str, "input_tokens": int, "output_tokens": int, "raw": dict, "time": float}
# or on error:
{"error": str}
```

## CLI Flags for `scripts/audit.py`

`--key`, `--url`, `--model`, `--output`, `--profile {general,web3,full}`, `--skip-infra`, `--skip-context`, `--skip-tool-substitution`, `--skip-error-leakage`, `--aggressive-error-probes`, `--skip-stream-integrity`, `--skip-web3-injection`, `--skip-infra-fingerprint`, `--skip-latency-variance`, `--latency-probe-count N`, `--warmup N`, `--timeout`

## Dual-distribution invariant

Whenever `scripts/audit.py` or any `api_relay_audit/*.py` module changes, the standalone `audit.py` at the repo root must be updated to match. The standalone version is a character-copy of the modular code with curl subprocess replacing httpx. New helper modules (e.g. `tool_substitution.py`) get inlined as a new `Section` block in `audit.py`.

## Reference Documents

- `FOR_JOHN.md` — architecture decisions, design pitfalls, and "why we didn't do X" reasoning. Read before making structural changes.
- `ROADMAP.md` — near-term candidates (§2), deferred (§2.4/2.45), and permanently out-of-scope (§2.6 "Explicitly NOT doing"). Check before implementing any new feature.
- `.github/voice-samples/` — tone and structure for issue replies (`pr-reply-sample.md`) and PR reviews (`pr-review-sample.md`). The automated `claude-issue-triage.yml` and `claude-pr-review.yml` workflows read these files.
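
The coarse-scan-then-binary-search idea noted for `api_relay_audit/context.py` can be illustrated with a minimal sketch. This is not the module's actual code: `find_context_limit`, the `canary_survives` callback, the `coarse_step` and `tolerance` parameters, and the assumption that truncation is monotonic in request size are all hypothetical, and the final fine pass is left out.

```python
from typing import Callable, Optional, Tuple


def find_context_limit(
    canary_survives: Callable[[int], bool],
    max_size: int,
    coarse_step: int,
    tolerance: int,
) -> Tuple[int, int]:
    """Return (largest size known to pass, number of probes issued).

    `canary_survives(n)` is a hypothetical callback: it would send a request
    of size n containing a canary marker and report whether the marker came
    back. It is assumed to be monotonic (once a size fails, larger ones fail).
    """
    probes = 0
    lo = 0                       # largest size observed to pass
    hi: Optional[int] = None     # smallest size observed to fail

    # Coarse scan: step upward until the first failure brackets the limit.
    size = coarse_step
    while size <= max_size:
        probes += 1
        if canary_survives(size):
            lo = size
        else:
            hi = size
            break
        size += coarse_step

    if hi is None:
        # No truncation observed up to the last probed size.
        return lo, probes

    # Binary search inside the bracket until it is no wider than tolerance.
    while hi - lo > tolerance:
        mid = (lo + hi) // 2
        probes += 1
        if canary_survives(mid):
            lo = mid
        else:
            hi = mid

    return lo, probes


# Simulated relay that silently truncates anything above 37,500 units.
limit, probes = find_context_limit(lambda n: n <= 37_500, 200_000, 10_000, 500)
print(limit, probes)
```

Bracketing first keeps the binary phase short: the number of probes grows with the logarithm of the bracket width rather than with the full size range.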

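
A caller of `APIClient` has to distinguish the two return shapes listed under "APIClient Return Format". The sketch below shows one way to do that. `describe_result` is a hypothetical helper, not part of the repository, and it assumes `time` is expressed in seconds.

```python
from typing import Any, Dict


def describe_result(result: Dict[str, Any]) -> str:
    """Summarize an APIClient-style result dict in one line (hypothetical helper)."""
    # Error results carry only an "error" key, so check for it first.
    if "error" in result:
        return f"request failed: {result['error']}"

    total = result["input_tokens"] + result["output_tokens"]
    # Assumption: "time" is the elapsed time in seconds.
    preview = result["text"][:40]
    return f"{total} tokens in {result['time']:.2f}s: {preview!r}"


ok = {"text": "hello", "input_tokens": 3, "output_tokens": 2, "raw": {}, "time": 0.5}
print(describe_result(ok))
print(describe_result({"error": "timeout"}))
```

Checking for the `error` key before touching any other field avoids a `KeyError` on failed calls, since the error shape carries none of the success fields.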