Below is a **high-level (language-agnostic)** design for a **client-side DNS leak detector** aimed at *censorship-resistance threat models*, i.e.:

> “Censor/ISP can observe/log DNS intent or infer proxy usage; we want to detect when DNS behavior escapes the intended protection path.”

I’ll cover: **definitions**, **detection standards**, **workflow**, **modules**, **passive + active detection**, **outputs**, and **test methodology**.

---

# 1) Scope and goals

## Goals

Your detector should answer, with evidence:

1. **Did any DNS query leave the device outside the intended safe path?**
2. **Which domains leaked?** (when visible)
3. **Which transport leaked?** (UDP/53, TCP/53, DoT/853, DoH)
4. **Which interface leaked?** (Wi-Fi/Ethernet vs tunnel)
5. **Which process/app triggered it?** (if your OS allows attribution)

And in your censorship model, it should also detect:

6. **Split-policy intent leakage**: “unknown/sensitive domains were resolved using domestic/ISP-facing DNS.”

## Non-goals (be explicit)

* Not a censorship circumvention tool itself
* Not a full firewall manager (can suggest fixes, but detection is the core)
* Not perfect attribution on every OS (process mapping may be partial)

---

# 2) Define “DNS leak” precisely (your program’s standard)

You need a **formal definition** because “DNS leak” is overloaded.

## Standard definition A (classic VPN / tunnel bypass)

A leak occurs if:

> **An unencrypted DNS query is sent outside the secure tunnel path.**

This is essentially how popular leak test sites define it (“unencrypted DNS query sent OUTSIDE the established VPN tunnel”). ([IP Leak][1])

Your detector should implement it in a machine-checkable way:

**Leak-A condition**

* DNS over **UDP/53 or TCP/53**
* Destination is **not** a “trusted resolver path” (e.g., not the tunnel interface, not the loopback stub, not the proxy channel)
* Interface is **not** the intended egress

✅ Strong for censorship: plaintext DNS exposes intent.

---

## Standard definition B (split-policy intent leak)

A leak occurs if:

> **A domain that should be “proxied / remote-resolved” was queried via local/ISP-facing DNS.**

This is the “proxy split rules still leak intent” case.

**Leak-B condition**

* Query name matches either:
  * a “proxy-required set” (sensitive list, non-allowlist, unknown), or
  * a policy rule (“everything except the allowlist must resolve via proxy DNS”)
* And the query was observed going to:
  * ISP resolver(s) / domestic resolver(s) / non-tunnel interface

✅ This is the leak most users in censorship settings care about.

---

## Standard definition C (encrypted DNS escape / bypass)

A leak occurs if:

> DNS was encrypted, but escaped the intended channel (e.g., an app uses its own DoH directly to the Internet).

This matters because DoH hides the QNAME but still creates **observable behavior** and breaks your “DNS must follow the proxy” invariant.

**Leak-C condition**

* A DoH (RFC 8484) ([IETF Datatracker][2]) or DoT (RFC 7858) ([IETF Datatracker][3]) flow exists
* And it does **not** go through your approved egress path (tunnel/proxy)

✅ Detects “Firefox/Chrome built-in DoH bypass” style cases.

---

## Standard definition D (mismatch risk indicator)

Not a “leak” by itself, but a **proxy inference amplifier**:

> DNS egress region/path differs from traffic egress region/path.

This is a *censorship-resistance hygiene metric*, not a binary leak.

**Mismatch condition**

* The same domain produces:
  * DNS resolution via path X
  * a TCP/TLS connection via path Y
* Where X ≠ Y (interface, ASN region, etc.)

✅ Helps catch “DNS direct, traffic proxy” or “DNS proxy, traffic direct” weirdness.
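To make definitions A–D machine-checkable, it helps to phrase them as predicates over a normalized event. Below is a minimal Python sketch; `DnsEvent`, `Policy`, and their field names are illustrative assumptions rather than a fixed schema, and Leak-D is omitted because it needs cross-flow correlation rather than a single-event rule.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class DnsEvent:
    """Normalized view of a single observed DNS-like flow."""
    transport: str                   # "udp53" | "tcp53" | "dot" | "doh"
    interface: str                   # e.g. "wlan0", "tun0", "lo"
    dst_ip: str
    qname: Optional[str] = None      # only known for plaintext DNS


@dataclass
class Policy:
    safe_interfaces: Set[str]        # tunnel interface(s) + loopback
    safe_resolver_ips: Set[str]      # proxy / internal resolver destinations
    proxy_required: Set[str] = field(default_factory=set)
    allowlist: Set[str] = field(default_factory=set)


def on_safe_path(ev: DnsEvent, pol: Policy) -> bool:
    """True if the query stayed on the trusted resolver path."""
    return ev.interface in pol.safe_interfaces or ev.dst_ip in pol.safe_resolver_ips


def leak_a(ev: DnsEvent, pol: Policy) -> bool:
    """Leak-A: plaintext DNS (UDP/53 or TCP/53) escaping the safe path."""
    return ev.transport in ("udp53", "tcp53") and not on_safe_path(ev, pol)


def leak_b(ev: DnsEvent, pol: Policy) -> bool:
    """Leak-B: a proxy-required (or non-allowlisted) name resolved via a local/ISP-facing path."""
    if ev.qname is None or on_safe_path(ev, pol):
        return False
    return ev.qname in pol.proxy_required or ev.qname not in pol.allowlist


def leak_c(ev: DnsEvent, pol: Policy) -> bool:
    """Leak-C: encrypted DNS (DoT/DoH) bypassing the approved egress."""
    return ev.transport in ("dot", "doh") and not on_safe_path(ev, pol)

# Leak-D is not a single-event predicate: it needs the Flow Tracker to pair the
# DNS path with the later TCP/TLS connection path for the same domain.
```

Keeping the rules as pure functions of (event, policy) makes them easy to unit-test against the ground-truth scenarios in section 9.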
---

# 3) High-level architecture

## Core components

1. **Policy & Configuration**
   * What counts as “safe DNS path”
   * Which interfaces are “protected” (tunnel) vs “physical”
   * Allowlist / proxy-required sets (optional)
   * Known resolver lists (optional)
   * Severity thresholds
2. **Traffic Sensor (Passive Monitor)**
   * Captures outbound traffic metadata (and optionally payload for DNS parsing)
   * Must cover:
     * UDP/53, TCP/53
     * TCP/853 (DoT)
     * HTTPS flows that look like DoH (see below)
   * Emits normalized events into a pipeline
3. **Classifier**
   * Recognize DNS protocol types:
     * Plain DNS
     * DoT
     * DoH
   * Attach confidence scores (especially for DoH)
4. **DNS Parser (for plaintext DNS only)**
   * Extract: QNAME, QTYPE, transaction IDs, response codes (optional)
   * Store minimally (privacy-aware)
5. **Flow Tracker**
   * Correlate packets into “flows”
   * Map flow → interface → destination → process (if possible)
   * Track timing correlation: DNS → connection attempts
6. **Leak Detector (Rules Engine)**
   * Apply Leak-A/B/C/D definitions
   * Produce leak events + severity + evidence chain
7. **Active Prober**
   * Generates controlled DNS lookups to test behavior
   * Can test fail-closed behavior, bypasses, multi-interface behavior, etc.
8. **Report Generator**
   * Human-readable summary
   * Machine-readable logs (JSON)
   * Recommendations (non-invasive)

---

# 4) Workflow (end-to-end)

## Workflow 0: Setup & baseline

1. Enumerate interfaces and routes
   * Identify physical NICs
   * Identify the tunnel / proxy interface (or “expected egress destinations”)
2. Identify system DNS configuration
   * Default resolvers per interface
   * Local stub presence (127.0.0.1, etc.)
3. Load policy profile
   * Full-tunnel, split-tunnel, or proxy-based
4. Start the passive monitor

**Output:** “Current state snapshot” (useful even before testing).

---

## Workflow 1: Passive detection loop (always-on)

Continuously:

1. Capture outbound packets/flows
2. Classify as DNS-like (plain DNS / DoT / DoH / unknown)
3. If plaintext DNS → parse QNAME/QTYPE
4. Assign metadata:
   * interface
   * dst IP/port
   * process (if possible)
   * timestamp
5. Evaluate leak rules:
   * Leak-A/B/C/D
6. Write event log + optional real-time alert

**Key design point:** passive mode should be able to detect leaks **without requiring any special test domain**.

---

## Workflow 2: Active test suite (on-demand)

Active tests exist because some leaks are intermittent or only happen under stress.

### Active Test A: “No plaintext DNS escape”

* Trigger a set of DNS queries (unique random domains)
* Verify that **zero UDP/53 & TCP/53 traffic** leaves physical interfaces

### Active Test B: “Fail-closed test”

* Temporarily disrupt the “protected path” (e.g., tunnel down)
* Trigger lookups again
* Expected: DNS fails (no fallback to ISP DNS)

### Active Test C: “App bypass test”

* Launch test scenarios that mimic real apps
* Confirm no direct DoH/DoT flows go to the public Internet outside the proxy path

### Active Test D: “Split-policy correctness”

* Query domains that should be:
  * direct-allowed
  * proxy-required
  * unknown
* Confirm the resolution path matches policy

---

# 5) How to recognize DNS transports (detection mechanics)

## Plain DNS (strongest signal)

**Match conditions**

* UDP dst port 53 OR TCP dst port 53
* Parse DNS header
* Extract QNAME/QTYPE

**Evidence strength:** high
**Intent visibility:** yes (domain visible)
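As a concrete illustration of the “parse DNS header / extract QNAME” step, here is a minimal, standard-library-only Python sketch for the UDP/53 case. The function name and error handling are illustrative; TCP/53 adds a 2-byte length prefix, and responses with compression pointers are out of scope.

```python
import struct
from typing import Optional, Tuple


def parse_dns_question(payload: bytes) -> Optional[Tuple[str, int]]:
    """Extract (QNAME, QTYPE) from the first question of a plaintext DNS message.

    Returns None if the payload does not look like a well-formed DNS query.
    For TCP/53, strip the leading 2-byte length prefix before calling this.
    """
    if len(payload) < 12:                      # fixed 12-byte DNS header
        return None
    (qdcount,) = struct.unpack("!H", payload[4:6])
    if qdcount < 1:
        return None

    labels = []
    pos = 12                                   # question section starts after the header
    while True:
        if pos >= len(payload):
            return None
        length = payload[pos]
        if length == 0:                        # root label terminates the name
            pos += 1
            break
        if length > 63:                        # compression pointers never start a question name
            return None
        label = payload[pos + 1 : pos + 1 + length]
        if len(label) != length:
            return None
        labels.append(label.decode("ascii", errors="replace"))
        pos += 1 + length

    if pos + 4 > len(payload):                 # QTYPE (2 bytes) + QCLASS (2 bytes)
        return None
    qtype, _qclass = struct.unpack("!2H", payload[pos : pos + 4])
    return ".".join(labels), qtype
```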
---

## DoT (port-based, easy)

DoT is defined over TLS and typically uses port **853**. ([IETF Datatracker][3])

**Match conditions**

* TCP dst port 853
* Optionally confirm a TLS handshake exists

**Evidence strength:** high
**Intent visibility:** no (domain hidden)

---

## DoH (harder; heuristic + optional allowlists)

DoH is DNS over HTTPS (RFC 8484). ([IETF Datatracker][2])

**Recognizers (from strongest to weakest):**

1. HTTP request with `Content-Type: application/dns-message`
2. Path/pattern common to DoH endpoints (optional list)
3. SNI matches known DoH providers (optional list)
4. Traffic resembles the frequent small HTTPS POST/GET bursts typical of DoH (weak)

**Evidence strength:** medium
**Intent visibility:** no (domain hidden)

**Important for your use-case:** you may not need to *prove* it’s DoH; you mostly need to detect “DNS-like encrypted resolver traffic bypassing the proxy channel.”

---

# 6) Policy model: define “safe DNS path”

You need a simple abstraction users can configure:

### Safe DNS path can be defined by one or more of:

* **Allowed interfaces**
  * loopback (local stub)
  * tunnel interface
* **Allowed destination set**
  * proxy server IP(s)
  * internal resolver IP(s)
* **Allowed process**
  * only your local stub + proxy allowed to resolve externally
* **Allowed port set**
  * maybe only permit 443 to the proxy server (if DNS rides inside it)

Then implement:

**A DNS event is a “leak” if it violates safe-path constraints.**

---

# 7) Leak severity model (useful for real-world debugging)

### Severity P0 (critical)

* Plaintext DNS (UDP/TCP 53) on a physical interface to an ISP/public resolver
* Especially if the QNAME matches the proxy-required/sensitive list

### Severity P1 (high)

* DoH/DoT bypassing the proxy channel directly to the public Internet

### Severity P2 (medium)

* Policy mismatch: domain resolved locally but connection later proxied (or vice versa)

### Severity P3 (low / info)

* Authoritative-side “resolver egress exposure” (less relevant for a client-side leak detector)
* CDN performance mismatch indicators

---

# 8) Outputs and reporting

## Real-time console output (for debugging)

* “DNS leak detected: Plain DNS”
* domain (if visible)
* destination resolver IP
* interface
* process name (if available)
* policy rule violated
* suggested fix category (e.g., “force stub + block port 53”)

## Forensics log (machine-readable)

A single **LeakEvent** record could include (see the sketch after this section):

* timestamp
* leak_type (A/B/C/D)
* transport (UDP53, TCP53, DoT, DoH)
* qname/qtype (nullable)
* src_iface / dst_ip / dst_port
* process_id/process_name (nullable)
* correlation_id (link DNS → subsequent connection attempt)
* confidence score (esp. DoH)
* raw evidence pointers (pcap offsets / event IDs)

## Summary report

* Leak counts by type
* Top leaking processes
* Top leaking resolver destinations
* Timeline view (bursts often indicate OS fallback behavior)
* “Pass/Fail” per policy definition
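Here is a minimal sketch of the forensics-log record described above, serialized as one JSON object per event. The class and field names simply mirror the bullet list and are not meant as a fixed schema.

```python
import json
import time
import uuid
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class LeakEvent:
    """One forensics-log record; field names mirror the bullet list above."""
    leak_type: str                        # "A" | "B" | "C" | "D"
    severity: str                         # "P0" .. "P3"
    transport: str                        # "udp53" | "tcp53" | "dot" | "doh"
    src_iface: str
    dst_ip: str
    dst_port: int
    qname: Optional[str] = None
    qtype: Optional[int] = None
    process_id: Optional[int] = None
    process_name: Optional[str] = None
    correlation_id: Optional[str] = None  # links the query to a later connection attempt
    confidence: float = 1.0               # classification confidence (lower for DoH heuristics)
    evidence: Optional[str] = None        # e.g. pcap filename + offset, or sensor event IDs
    event_id: str = ""
    timestamp: float = 0.0

    def to_json(self) -> str:
        record = asdict(self)
        record["event_id"] = self.event_id or str(uuid.uuid4())
        record["timestamp"] = self.timestamp or time.time()
        return json.dumps(record)


# Example: a plaintext-DNS leak seen on Wi-Fi, not yet attributed to a process.
if __name__ == "__main__":
    print(LeakEvent("A", "P0", "udp53", "wlan0", "203.0.113.53", 53,
                    qname="example-sensitive.com", qtype=1).to_json())
```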
---

# 9) Validation strategy (“how do I know my detector is correct?”)

## Ground truth tests

1. **Known-leak scenario**
   * intentionally set OS DNS to ISP DNS, no tunnel
   * detector must catch plaintext DNS
2. **Known-safe scenario**
   * local stub only + blocked outbound 53/853
   * detector should show zero leaks
3. **Bypass scenario**
   * enable browser built-in DoH directly
   * detector should catch encrypted resolver bypass (Leak-C)
4. **Split-policy scenario**
   * allowlist CN direct, everything else proxy-resolve
   * detector should show:
     * allowlist resolved direct
     * unknown resolved via proxy path

---

# 10) Recommended “profiles” (makes tool usable)

Provide built-in presets:

### Profile 1: Full-tunnel VPN

* allow DNS only via tunnel interface or loopback stub
* any UDP/TCP 53 on a physical NIC = leak

### Profile 2: Proxy + local stub (your case)

* allow DNS only to the loopback stub
* allow stub upstream only via proxy server destinations
* flag any direct DoH/DoT to public endpoints

### Profile 3: Split tunnel (geoip + allowlist)

* allow plaintext DNS **only** for allowlisted domains (if the user accepts the risk)
* enforce “unknown → proxy-resolve”
* emphasize Leak-B correctness

---

Below is an updated **high-level design** (still language-agnostic) that integrates **process attribution** cleanly, including how it fits into the workflow and what to log.

---

# 1) New component: Process Attribution Engine (PAE)

## Purpose

When a DNS-like event is observed, the PAE tries to attach:

* **PID**
* **PPID**
* **process name**
* *(optional but extremely useful)* full command line, executable path, user, container/app package, etc.

This lets your logs answer:

> “Which program generated the leaked DNS request?”
> “Was it a browser, OS service, updater, antivirus, the proxy itself, or some library?”

## Position in the pipeline

It sits between **Traffic Sensor** and **Leak Detector** as an “event enricher”:

**Traffic Event → (Classifier) → (Process Attribution) → Enriched Event → Leak Rules → Report**

---

# 2) Updated architecture (with process attribution)

### Existing modules (from earlier design)

1. Policy & Configuration
2. Traffic Sensor (packet/flow monitor)
3. Classifier (Plain DNS / DoT / DoH / Unknown)
4. DNS Parser (plaintext only)
5. Flow Tracker
6. Leak Detector (rules engine)
7. Active Prober
8. Report Generator

### New module

9. **Process Attribution Engine (PAE)**
   * resolves “who owns this flow / packet”
   * emits PID/PPID/name
   * handles platform-specific differences and fallbacks

---

# 3) Workflow changes (what happens when a potential leak is seen)

## Passive detection loop (updated)

1. Capture outbound traffic event
2. Classify transport type:
   * UDP/53, TCP/53 → plaintext DNS
   * TCP/853 → DoT
   * HTTPS patterns → DoH (heuristic)
3. Extract the **5-tuple**
   * src IP:port, dst IP:port, protocol
4. **PAE lookup**
   * resolve the owner process for this traffic
   * attach PID/PPID/name (+ optional metadata)
5. Apply leak rules (A/B/C/D)
6. Emit:
   * realtime log line (human readable)
   * structured record (JSON/event log)

---

# 4) Process attribution: what to detect and how (high-level)

Process attribution always works on one core concept:

> **Map observed traffic (socket/flow) → owning process**

### Inputs PAE needs

* protocol (UDP/TCP)
* local src port
* local address
* timestamp
* optionally: connection state / flow ID

### Output from PAE

* `pid`, `ppid`, `process_name`
* optional enrichment:
  * `exe_path`
  * `cmdline`
  * `user`
  * “process tree chain” (for debugging: parent → child → …)
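As one possible shape for the “socket table owner lookup” flavor of attribution (Provider B in the next section), here is a best-effort Python sketch. It assumes the `psutil` library is available, and it deliberately degrades to LOW/NONE confidence instead of ever blocking detection.

```python
import time
from dataclasses import dataclass
from typing import Optional

import psutil  # assumption: this sketch delegates the socket-table walk to psutil


@dataclass
class Attribution:
    pid: Optional[int]
    ppid: Optional[int]
    process_name: Optional[str]
    confidence: str                       # "HIGH" | "MEDIUM" | "LOW" | "NONE"
    failure_reason: Optional[str] = None


def attribute_flow(src_ip: str, src_port: int, event_ts: float) -> Attribution:
    """Best-effort lookup: find the process owning the local endpoint of a flow.

    A lookup done after the packet was seen can race with short-lived processes,
    so results found noticeably later are downgraded to MEDIUM confidence.
    Failures never raise; the detector still logs the leak with pid=None.
    """
    try:
        connections = psutil.net_connections(kind="inet")
    except psutil.AccessDenied:
        return Attribution(None, None, None, "NONE", "permission denied")

    for conn in connections:
        if conn.laddr and conn.laddr.ip == src_ip and conn.laddr.port == src_port:
            if conn.pid is None:
                return Attribution(None, None, None, "LOW", "ambiguous mapping")
            try:
                proc = psutil.Process(conn.pid)
                confidence = "HIGH" if time.time() - event_ts < 1.0 else "MEDIUM"
                return Attribution(conn.pid, proc.ppid(), proc.name(), confidence)
            except psutil.NoSuchProcess:
                return Attribution(conn.pid, None, None, "LOW", "flow already gone")

    return Attribution(None, None, None, "NONE", "flow already gone")
```

On platforms where even this lookup is unavailable, the provider should simply return the NONE result so the leak is still logged.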
---

# 5) Platform support strategy (without implementation detail)

Process attribution is **OS-specific**, so structure it as:

## “Attribution Provider” interface

* Provider A: “kernel-level flow owner”
* Provider B: “socket table owner lookup”
* Provider C: “event tracing feed”
* Provider D: fallback “unknown / not supported”

Your main design goal is:

### Design rule

**Attribution must be best-effort + gracefully degrading**, never blocking detection.

So you always log the leak even if the PID is unavailable:

* `pid=null, attribution_confidence=LOW`

---

# 6) Attribution confidence + race handling (important!)

Attribution can be tricky because:

* a process may exit quickly (“short-lived resolver helper”)
* ports can be reused
* NAT or local proxies may obscure the real origin

So log **confidence**:

* **HIGH**: direct mapping from kernel/socket owner at the time of the event
* **MEDIUM**: mapping by lookup shortly after the event (possible race)
* **LOW**: inferred / uncertain
* **NONE**: not resolved

Also record *why* attribution failed:

* “permission denied”
* “flow already gone”
* “unsupported transport”
* “ambiguous mapping”

This makes debugging much easier.

---

# 7) What PID/PPID adds to your leak definitions

### Leak-A (plaintext DNS outside safe path)

Now you can say:

> “`svchost.exe (PID 1234)` sent UDP/53 to ISP resolver on Wi-Fi interface”

### Leak-B (split-policy intent leak)

You can catch:

* “game launcher looked up blocked domain”
* “system service triggered a sensitive name unexpectedly”
* “your proxy itself isn’t actually resolving via its own channel”

### Leak-C (encrypted DNS bypass)

This becomes *very actionable*:

> “`firefox.exe` started direct DoH to resolver outside tunnel”

### Leak-D (mismatch indicator)

You can also correlate:

* DNS resolved by one process
* connection made by another process (e.g., local stub vs app)

---

# 8) Reporting / realtime logging format (updated)

## Realtime log line (human readable)

Example (conceptual):

* **[P0][Leak-A] Plain DNS leaked**
  * Domain: `example-sensitive.com` (A)
  * From: `Wi-Fi` → To: `1.2.3.4:53`
  * Process: `browser.exe` **PID=4321 PPID=1200**
  * Policy violated: “No UDP/53 on physical NIC”

## Structured event (JSON-style fields)

Minimum recommended fields:

### Event identity

* `event_id`
* `timestamp`

### DNS identity

* `transport` (udp53/tcp53/dot/doh/unknown)
* `qname` (nullable)
* `qtype` (nullable)

### Network path

* `interface_name`
* `src_ip`, `src_port`
* `dst_ip`, `dst_port`
* `route_class` (tunnel / physical / loopback)

### Process identity (your requested additions)

* `pid`
* `ppid`
* `process_name`
* optional:
  * `exe_path`
  * `cmdline`
  * `user`

### Detection result

* `leak_type` (A/B/C/D)
* `severity` (P0..P3)
* `policy_rule_id`
* `attribution_confidence`

---

# 9) Privacy and safety notes (important in a DNS tool)

Because you’re logging **domains** and **process command lines**, this becomes sensitive.

Add a “privacy mode” policy (a small redaction sketch appears at the end of this document):

* **Full**: store full domain + cmdline
* **Redacted**: hash domain; keep TLD only; truncate cmdline
* **Minimal**: only keep leak counts + resolver IPs + process name

Also allow a “capture window” (rotate logs, avoid giant histories).

---

# 10) UX feature: “Show me the process tree”

When a leak happens, a good debugger view is:

* `PID: foo (pid 1000)`
* `PPID: bar (pid 900)`
* `PPID: systemd/svchost/etc`

This is extremely useful to identify:

* browsers spawning helpers
* OS DNS services
* containerized processes
* update agents / telemetry daemons

So your report generator should support:

✅ **Process chain rendering** (where possible)

---

# 11) Practical edge cases you should detect (with PID helping)

1. **Local stub is fine, upstream isn’t**
   * your local resolver process leaks upstream plaintext DNS
2. **Browser uses its own DoH**
   * process attribution immediately reveals it
3. **Multiple interfaces**
   * a leak only happens on Wi-Fi but not Ethernet
4. **Kill-switch failure**
   * when the tunnel drops, the PID shows which app starts leaking first

---
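To make the “privacy mode” policy from the privacy and safety notes above concrete, here is a minimal redaction sketch. The mode names mirror the three levels described there, the field names mirror the structured event, and the TLD split is deliberately naive (it ignores multi-label public suffixes).

```python
import hashlib

# Fields kept in "minimal" mode: enough for leak counts, resolver IPs, and process names.
MINIMAL_FIELDS = {"event_id", "timestamp", "leak_type", "severity", "dst_ip", "process_name"}


def redact_event(event: dict, privacy_mode: str) -> dict:
    """Apply a privacy mode ("full" | "redacted" | "minimal") before persisting a record."""
    if privacy_mode == "full":
        return dict(event)

    if privacy_mode == "minimal":
        return {k: v for k, v in event.items() if k in MINIMAL_FIELDS}

    # "redacted": hash the domain, keep only its last label, truncate the command line.
    out = dict(event)
    qname = out.get("qname")
    if qname:
        out["qname_hash"] = hashlib.sha256(qname.encode()).hexdigest()[:16]
        out["qname"] = qname.rsplit(".", 1)[-1]   # naive "TLD only"
    if out.get("cmdline"):
        out["cmdline"] = out["cmdline"][:64]
    return out
```

In redacted mode the log keeps only the last label of the domain plus a short SHA-256 prefix of the full name, which still lets repeated leaks of the same domain be correlated without storing it.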