Below is a high-level (language-agnostic) design for a client-side DNS leak detector aimed at censorship-resistance threat models, i.e.:

“Censor/ISP can observe/log DNS intent or infer proxy usage; we want to detect when DNS behavior escapes the intended protection path.”

I'll cover: definitions, detection standards, workflow, modules, passive and active detection, outputs, and test methodology.


1) Scope and goals

Goals

Your detector should answer, with evidence:

  1. Did any DNS query leave the device outside the intended safe path?
  2. Which domains leaked? (when visible)
  3. Which transport leaked? (UDP/53, TCP/53, DoT/853, DoH)
  4. Which interface leaked? (Wi-Fi/Ethernet vs tunnel)
  5. Which process/app triggered it? (if your OS allows attribution)

And in your censorship model, it should also detect:

  1. Split-policy intent leakage: “unknown/sensitive domains were resolved using domestic/ISP-facing DNS.”

Non-goals (be explicit)

  • Not a censorship circumvention tool itself
  • Not a full firewall manager (can suggest fixes, but detection is the core)
  • Not perfect attribution on every OS (process mapping may be partial)

2) Define “DNS leak” precisely (your program's standard)

You need a formal definition because “DNS leak” is overloaded.

Standard definition A (classic VPN / tunnel bypass)

A leak occurs if:

An unencrypted DNS query is sent outside the secure tunnel path. This is essentially how popular leak-test sites define it (“unencrypted DNS query sent OUTSIDE the established VPN tunnel”). ([IP Leak][1])

Your detector should implement it in a machine-checkable way (a combined sketch for definitions A–D appears at the end of this section):

Leak-A condition

  • DNS over UDP/53 or TCP/53
  • Destination is not a “trusted resolver path” (e.g., not the tunnel interface, not loopback stub, not proxy channel)
  • Interface is not the intended egress

This matters most in the censorship model: plaintext DNS exposes intent.


Standard definition B (split-policy intent leak)

A leak occurs if:

A domain that should be “proxied / remote-resolved” was queried via local/ISP-facing DNS.

This is the “proxy split rules still leak intent” case.

Leak-B condition

  • Query name matches either:

    • a “proxy-required set” (sensitive list, non-allowlist, unknown), or
    • a policy rule (“everything except allowlist must resolve via proxy DNS”)
  • And the query was observed going to:

    • ISP resolver(s) / domestic resolver(s) / non-tunnel interface

This is the leak most users in censorship settings care about.


Standard definition C (encrypted DNS escape / bypass)

A leak occurs if:

DNS was encrypted, but escaped the intended channel (e.g., app uses its own DoH directly to the Internet).

This matters because DoH hides the QNAME but still creates observable behavior and breaks your “DNS must follow proxy” invariant.

Leak-C condition

  • DoH (RFC 8484) ([IETF Datatracker][2]) or DoT (RFC 7858) ([IETF Datatracker][3]) flow exists
  • And it does not go through your approved egress path (tunnel/proxy)

Detects “Firefox/Chrome built-in DoH bypass” style cases.


Standard definition D (mismatch risk indicator)

Not a “leak” by itself, but a proxy inference amplifier:

DNS egress region/path differs from traffic egress region/path.

This is a censorship-resistance hygiene metric, not a binary leak.

Mismatch condition

  • Same domain produces:

    • DNS resolution via path X
    • TCP/TLS connection via path Y
  • Where X ≠ Y (interface, ASN region, etc.)

Helps catch “DNS direct, traffic proxy” or “DNS proxy, traffic direct” weirdness.
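
To make the four definitions machine-checkable, here is a minimal rules-engine sketch in Python. The event and policy field names (`transport`, `iface`, `proxy_required`, and so on) are illustrative assumptions rather than a fixed schema; the predicates simply mirror the Leak-A/B/C/D conditions above.

```python
from dataclasses import dataclass, field

@dataclass
class Policy:
    # Illustrative policy shape; names are assumptions, not a fixed schema.
    safe_ifaces: set = field(default_factory=lambda: {"tun0", "lo"})
    safe_resolver_ips: set = field(default_factory=set)   # proxy / internal resolvers
    proxy_required: set = field(default_factory=set)      # names that must resolve remotely
    direct_allowlist: set = field(default_factory=set)    # names allowed to resolve locally

@dataclass
class DnsEvent:
    transport: str                     # "udp53" | "tcp53" | "dot" | "doh"
    iface: str                         # e.g. "wlan0", "tun0", "lo"
    dst_ip: str
    qname: str | None = None           # only visible for plaintext DNS
    traffic_iface: str | None = None   # egress iface of the follow-up connection (Leak-D)

def on_safe_path(ev: DnsEvent, pol: Policy) -> bool:
    return ev.iface in pol.safe_ifaces or ev.dst_ip in pol.safe_resolver_ips

def leak_a(ev, pol):   # plaintext DNS outside the safe path
    return ev.transport in ("udp53", "tcp53") and not on_safe_path(ev, pol)

def leak_b(ev, pol):   # proxy-required or unknown name resolved via the local/ISP path
    if ev.qname is None or on_safe_path(ev, pol):
        return False
    return ev.qname in pol.proxy_required or ev.qname not in pol.direct_allowlist

def leak_c(ev, pol):   # encrypted DNS (DoT/DoH) escaping the approved egress
    return ev.transport in ("dot", "doh") and not on_safe_path(ev, pol)

def leak_d(ev, pol):   # mismatch indicator: DNS path differs from traffic path
    return ev.traffic_iface is not None and ev.traffic_iface != ev.iface

def classify(ev: DnsEvent, pol: Policy) -> list[str]:
    checks = {"A": leak_a, "B": leak_b, "C": leak_c, "D": leak_d}
    return [name for name, fn in checks.items() if fn(ev, pol)]
```

A real engine would also attach severity and evidence pointers; this only shows the decision shape.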


3) High-level architecture

Core components

  1. Policy & Configuration

    • What counts as “safe DNS path”
    • Which interfaces are “protected” (tunnel) vs “physical”
    • Allowlist / proxy-required sets (optional)
    • Known resolver lists (optional)
    • Severity thresholds
  2. Traffic Sensor (Passive Monitor)

    • Captures outbound traffic metadata (and optionally payload for DNS parsing)

    • Must cover:

      • UDP/53, TCP/53
      • TCP/853 (DoT)
      • HTTPS flows that look like DoH (see below)
    • Emits normalized events into a pipeline

  3. Classifier

    • Recognize DNS protocol types:

      • Plain DNS
      • DoT
      • DoH
    • Attach confidence scores (especially for DoH)

  4. DNS Parser (for plaintext DNS only)

    • Extract: QNAME, QTYPE, transaction IDs, response codes (optional)
    • Store minimally (privacy-aware)
  5. Flow Tracker

    • Correlate packets into “flows”
    • Map flow → interface → destination → process (if possible)
    • Track timing correlation: DNS → connection attempts
  6. Leak Detector (Rules Engine)

    • Apply Leak-A/B/C/D definitions
    • Produce leak events + severity + evidence chain
  7. Active Prober

    • Generates controlled DNS lookups to test behavior
    • Can test fail-closed, bypasses, multi-interface behavior, etc.
  8. Report Generator

    • Human-readable summary
    • Machine-readable logs (JSON)
    • Recommendations (non-invasive)
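
One way to tie these modules together is a single normalized event record that flows through the pipeline and gets enriched at each stage. A sketch (field names are assumptions):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TrafficEvent:
    """Normalized record emitted by the Traffic Sensor and enriched downstream."""
    timestamp: float
    iface: str                       # capturing interface, e.g. "wlan0" or "tun0"
    proto: str                       # "udp" | "tcp"
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    payload: Optional[bytes] = None  # kept only long enough for DNS parsing
    # Filled in by later stages:
    transport: Optional[str] = None  # "plain_dns" | "dot" | "doh" | "unknown" (Classifier)
    qname: Optional[str] = None      # plaintext DNS only (DNS Parser)
    pid: Optional[int] = None        # owning process, when attribution succeeds
```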

4) Workflow (end-to-end)

Workflow 0: Setup & baseline

  1. Enumerate interfaces and routes

    • Identify physical NICs
    • Identify tunnel / proxy interface (or “expected egress destinations”)
  2. Identify system DNS configuration

    • Default resolvers per interface
    • Local stub presence (127.0.0.1, etc.)
  3. Load policy profile

    • Full-tunnel, split-tunnel, or proxy-based
  4. Start passive monitor

Output: “Current state snapshot” (useful even before testing).


Workflow 1: Passive detection loop (always-on)

Continuously:

  1. Capture outbound packets/flows

  2. Classify as DNS-like (plain DNS / DoT / DoH / unknown)

  3. If plaintext DNS → parse QNAME/QTYPE

  4. Assign metadata:

    • interface
    • dst IP/port
    • process (if possible)
    • timestamp
  5. Evaluate leak rules:

    • Leak-A/B/C/D
  6. Write event log + optional real-time alert

Key design point: passive mode should be able to detect leaks without requiring any special test domain.


Workflow 2: Active test suite (on-demand)

Active tests exist because some leaks are intermittent or only happen under stress.

Active Test A: “No plaintext DNS escape”

  • Trigger a set of DNS queries (unique random domains)
  • Verify that no UDP/53 or TCP/53 traffic leaves the physical interfaces
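
Test A only needs unique, never-before-seen names so that any resolution observed on a physical interface is unambiguously caused by the probe. A minimal sketch; the base domain is a placeholder for a zone the operator controls:

```python
import secrets
import socket

def probe_names(n: int, base: str = "probe.example.net") -> list[str]:
    # Random labels avoid cache hits and make probe traffic easy to spot
    # in the passive monitor's logs. "probe.example.net" is a placeholder.
    return [f"{secrets.token_hex(8)}.{base}" for _ in range(n)]

def run_probe(names: list[str]) -> None:
    for name in names:
        try:
            socket.getaddrinfo(name, None)  # resolve via the system's normal path
        except socket.gaierror:
            pass  # NXDOMAIN or failure is fine; we only care where the query went
```

While the probe runs, the passive monitor asserts that no UDP/53 or TCP/53 flow carrying these labels appears on a physical interface.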

Active Test B: “Fail-closed test”

  • Temporarily disrupt the “protected path” (e.g., tunnel down)
  • Trigger lookups again
  • Expected: DNS fails (no fallback to ISP DNS)

Active Test C: “App bypass test”

  • Launch test scenarios that mimic real apps
  • Confirm no direct DoH/DoT flows go to public Internet outside the proxy path

Active Test D: “Split-policy correctness”

  • Query domains that should be:

    • direct-allowed
    • proxy-required
    • unknown
  • Confirm resolution path matches policy


5) How to recognize DNS transports (detection mechanics)

Plain DNS (strongest signal)

Match conditions

  • UDP dst port 53 OR TCP dst port 53
  • Parse DNS header
  • Extract QNAME/QTYPE

Evidence strength: high
Intent visibility: yes (domain visible)
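
A minimal parser sketch for pulling QNAME/QTYPE out of a plaintext DNS query payload (UDP shown; for TCP/53 the message is prefixed with a 2-byte length):

```python
import struct

def parse_dns_query(payload: bytes):
    """Return (qname, qtype) from a plaintext DNS query, or None if it doesn't parse."""
    if len(payload) < 12:
        return None
    _txid, _flags, qdcount, *_ = struct.unpack("!6H", payload[:12])
    if qdcount < 1:
        return None
    labels, pos = [], 12
    while pos < len(payload):
        length = payload[pos]
        if length == 0:            # end of QNAME
            pos += 1
            break
        if length & 0xC0:          # compression pointer; not handled in this sketch
            return None
        labels.append(payload[pos + 1:pos + 1 + length].decode("ascii", "replace"))
        pos += 1 + length
    else:
        return None                # ran off the end without a terminating label
    if pos + 4 > len(payload):
        return None
    qtype, _qclass = struct.unpack("!2H", payload[pos:pos + 4])
    return ".".join(labels), qtype
```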


DoT (port-based, easy)

DoT is defined over TLS, typically port 853. ([IETF Datatracker][3])

Match conditions

  • TCP dst port 853
  • Optionally confirm TLS handshake exists

Evidence strength: high
Intent visibility: no (domain hidden)


DoH (harder; heuristic + optional allowlists)

DoH is DNS over HTTPS (RFC 8484). ([IETF Datatracker][2])

Recognizers (from strongest to weakest):

  1. HTTP request with Content-Type: application/dns-message
  2. Path/pattern common to DoH endpoints (optional list)
  3. SNI matches known DoH providers (optional list)
  4. Traffic resembles frequent small HTTPS POST/GET bursts typical of DoH (weak)

Evidence strength: medium
Intent visibility: no (domain hidden)

Important for your use case: you may not need to prove it's DoH; you mostly need to detect “DNS-like encrypted resolver traffic bypassing the proxy channel.”
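
A sketch of the layered recognizer, scoring confidence from strongest to weakest signal. The flow fields and the known-endpoint lists are assumptions an operator would supply:

```python
# Operator-supplied lists (placeholders only).
KNOWN_DOH_SNI = {"dns.example-provider.net"}
KNOWN_DOH_IPS = {"192.0.2.53"}

def doh_confidence(flow) -> tuple[float, str]:
    """Score how likely an HTTPS flow is DoH. Assumed flow fields:
    content_type (if payload is visible), sni, dst_ip, looks_like_dns_cadence."""
    if getattr(flow, "content_type", None) == "application/dns-message":
        return 0.95, "Content-Type: application/dns-message"
    if getattr(flow, "sni", None) in KNOWN_DOH_SNI:
        return 0.8, "SNI matches known DoH endpoint"
    if flow.dst_ip in KNOWN_DOH_IPS:
        return 0.6, "destination IP on DoH resolver list"
    if getattr(flow, "looks_like_dns_cadence", False):
        return 0.3, "traffic shape resembles DoH (weak)"
    return 0.0, "no DoH indicators"
```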


6) Policy model: define “safe DNS path”

You need a simple abstraction users can configure:

Safe DNS path can be defined by one or more of:

  • Allowed interfaces

    • loopback (local stub)
    • tunnel interface
  • Allowed destination set

    • proxy server IP(s)
    • internal resolver IP(s)
  • Allowed process

    • only your local stub + proxy allowed to resolve externally
  • Allowed port set

    • maybe only permit 443 to proxy server (if DNS rides inside it)

Then implement:

A DNS event is a “leak” if it violates safe-path constraints.


7) Leak severity model (useful for real-world debugging)

Severity P0 (critical)

  • Plaintext DNS (UDP/TCP 53) on physical interface to ISP/public resolver
  • Especially if QNAME matches proxy-required/sensitive list

Severity P1 (high)

  • DoH/DoT bypassing proxy channel directly to public Internet

Severity P2 (medium)

  • Policy mismatch: domain resolved locally but connection later proxied (or vice versa)

Severity P3 (low / info)

  • Authoritative-side “resolver egress exposure” (less relevant for client-side leak detector)
  • CDN performance mismatch indicators
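
One possible mapping from leak type to the P0–P3 scale, consistent with the list above (a sketch, not a fixed API; Leak-B is treated as P0 because it is plaintext intent exposure):

```python
def severity(leak_type: str) -> str:
    """Map a leak finding onto the P0–P3 scale defined above."""
    return {
        "A": "P0",  # plaintext DNS (UDP/TCP 53) outside the safe path
        "B": "P0",  # sensitive / proxy-required name resolved via ISP-facing DNS
        "C": "P1",  # DoH/DoT bypassing the proxy channel
        "D": "P2",  # DNS/traffic path mismatch indicator
    }.get(leak_type, "P3")
```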

8) Outputs and reporting

Real-time console output (for debugging)

  • “DNS leak detected: Plain DNS”
  • domain (if visible)
  • destination resolver IP
  • interface
  • process name (if available)
  • policy rule violated
  • suggested fix category (e.g., “force stub + block port 53”)

Forensics log (machine-readable)

A single LeakEvent record could include:

  • timestamp
  • leak_type (A/B/C/D)
  • transport (UDP53, TCP53, DoT, DoH)
  • qname/qtype (nullable)
  • src_iface / dst_ip / dst_port
  • process_id/process_name (nullable)
  • correlation_id (link DNS → subsequent connection attempt)
  • confidence score (esp. DoH)
  • raw evidence pointers (pcap offsets / event IDs)

Summary report

  • Leak counts by type
  • Top leaking processes
  • Top leaking resolver destinations
  • Timeline view (bursts often indicate OS fallback behavior)
  • “Pass/Fail” per policy definition

9) Validation strategy (“how do I know my detector is correct?”)

Ground truth tests

  1. Known-leak scenario

    • intentionally set OS DNS to ISP DNS, no tunnel
    • detector must catch plaintext DNS
  2. Known-safe scenario

    • local stub only + blocked outbound 53/853
    • detector should show zero leaks
  3. Bypass scenario

    • enable browser built-in DoH directly
    • detector should catch encrypted resolver bypass (Leak-C)
  4. Split-policy scenario

    • allowlist CN direct, everything else proxy-resolve

    • detector should show:

      • allowlist resolved direct
      • unknown resolved via proxy path

10) Recommended “profiles” (makes tool usable)

Provide built-in presets:

Profile 1: Full-tunnel VPN

  • allow DNS only via tunnel interface or loopback stub
  • any UDP/TCP 53 on physical NIC = leak

Profile 2: Proxy + local stub (your case)

  • allow DNS only to loopback stub
  • allow stub upstream only via proxy server destinations
  • flag any direct DoH/DoT to public endpoints

Profile 3: Split tunnel (geoip + allowlist)

  • allow plaintext DNS only for allowlisted domains (if user accepts risk)
  • enforce “unknown → proxy-resolve”
  • emphasize Leak-B correctness
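
As an illustration, Profile 2 (proxy + local stub) could be shipped as a preset along these lines; the addresses, names, and keys are placeholders, only the shape matters:

```python
PROFILE_PROXY_LOCAL_STUB = {
    "name": "proxy-plus-local-stub",
    "safe_dns_path": {
        "allowed_interfaces": ["lo"],               # clients may only talk to the local stub
        "allowed_destinations": ["198.51.100.10"],  # stub upstream: the proxy server (placeholder IP)
        "allowed_ports": [443],                     # DNS rides inside the proxy channel
        "allowed_processes": ["local-stub", "proxy-client"],  # placeholder process names
    },
    "flags": {
        "flag_direct_doh_dot": True,        # Leak-C: any direct DoH/DoT to public endpoints
        "plaintext_53_on_physical": "P0",   # Leak-A severity for this profile
    },
}
```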

Below is an updated high-level design (still language-agnostic) that integrates process attribution cleanly, including how it fits into the workflow and what to log.


1) New component: Process Attribution Engine (PAE)

Purpose

When a DNS-like event is observed, the PAE tries to attach:

  • PID
  • PPID
  • process name
  • (optional but extremely useful) full command line, executable path, user, container/app package, etc.

This lets your logs answer:

“Which program generated the leaked DNS request?” “Was it a browser, OS service, updater, antivirus, proxy itself, or some library?”

Position in the pipeline

It sits between Traffic Sensor and Leak Detector as an “event enricher”:

Traffic Event → (Classifier) → (Process Attribution) → Enriched Event → Leak Rules → Report


2) Updated architecture (with process attribution)

Existing modules (from earlier design)

  1. Policy & Configuration
  2. Traffic Sensor (packet/flow monitor)
  3. Classifier (Plain DNS / DoT / DoH / Unknown)
  4. DNS Parser (plaintext only)
  5. Flow Tracker
  6. Leak Detector (rules engine)
  7. Active Prober
  8. Report Generator

New module

  1. Process Attribution Engine (PAE)

    • resolves “who owns this flow / packet”
    • emits PID/PPID/name
    • handles platform-specific differences and fallbacks

3) Workflow changes (what happens when a potential leak is seen)

Passive detection loop (updated)

  1. Capture outbound traffic event

  2. Classify transport type:

    • UDP/53, TCP/53 → plaintext DNS
    • TCP/853 → DoT
    • HTTPS patterns → DoH (heuristic)
  3. Extract the 5-tuple

    • src IP:port, dst IP:port, protocol
  4. PAE lookup

    • resolve the owner process for this traffic
    • attach PID/PPID/name (+ optional metadata)
  5. Apply leak rules (A/B/C/D)

  6. Emit:

    • realtime log line (human readable)
    • structured record (JSON/event log)

4) Process attribution: what to detect and how (high-level)

Process attribution always works on one core concept:

Map observed traffic (socket/flow) → owning process

Inputs PAE needs

  • protocol (UDP/TCP)
  • local src port
  • local address
  • timestamp
  • optionally: connection state / flow ID

Output from PAE

  • pid, ppid, process_name

  • optional enrichment:

    • exe_path
    • cmdline
    • user
    • “process tree chain” (for debugging: parent → child → …)
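
As one concrete example of a socket-table lookup (Provider B in the next section), this sketch uses the third-party psutil library to map a local source port back to its owning process. Elevated privileges may be required on some platforms, and the lookup can race with short-lived processes:

```python
import socket
import psutil  # third-party, cross-platform socket-table access

def lookup_owner(local_port: int, proto: str = "udp"):
    """Best-effort map of (proto, local_port) -> (pid, ppid, name); None if unresolved."""
    want_type = socket.SOCK_DGRAM if proto == "udp" else socket.SOCK_STREAM
    for conn in psutil.net_connections(kind="inet"):
        if conn.type == want_type and conn.laddr and conn.laddr.port == local_port:
            if conn.pid is None:   # kernel-owned or permission-limited entry
                return None
            try:
                proc = psutil.Process(conn.pid)
                return conn.pid, proc.ppid(), proc.name()
            except psutil.Error:   # process exited or access denied mid-lookup
                return None
    return None
```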

5) Platform support strategy (without implementation detail)

Process attribution is OS-specific, so structure it as:

“Attribution Provider” interface

  • Provider A: “kernel-level flow owner”
  • Provider B: “socket table owner lookup”
  • Provider C: “event tracing feed”
  • Provider D: fallback “unknown / not supported”

Your main design goal is:

Design rule

Attribution must be best-effort + gracefully degrading, never blocking detection.

So you always log the leak even if PID is unavailable:

  • pid=null, attribution_confidence=LOW
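
A sketch of the provider chain honoring that design rule: providers are tried strongest-first, and any failure degrades to an unattributed event instead of blocking detection. The Attribution record and provider signature are illustrative:

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Attribution:
    pid: Optional[int]
    ppid: Optional[int]
    process_name: Optional[str]
    confidence: str                        # "HIGH" | "MEDIUM" | "LOW" | "NONE"
    failure_reason: Optional[str] = None

# Each provider maps a flow to an Attribution, or returns None if it cannot help.
Provider = Callable[[object], Optional[Attribution]]

def attribute(flow, providers: list[Provider]) -> Attribution:
    last_error = None
    for provider in providers:             # ordered: kernel owner, socket table, tracing, ...
        try:
            result = provider(flow)
            if result is not None:
                return result
        except PermissionError:
            last_error = "permission denied"
        except Exception as exc:           # never let attribution break detection
            last_error = str(exc)
    return Attribution(pid=None, ppid=None, process_name=None,
                       confidence="NONE",
                       failure_reason=last_error or "flow already gone")
```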

6) Attribution confidence + race handling (important!)

Attribution can be tricky because:

  • a process may exit quickly (“short-lived resolver helper”)
  • ports can be reused
  • NAT or local proxies may obscure the real origin

So log confidence:

  • HIGH: direct mapping from kernel/socket owner at time of event
  • MEDIUM: mapping by lookup shortly after event (possible race)
  • LOW: inferred / uncertain
  • NONE: not resolved

Also record why attribution failed:

  • “permission denied”
  • “flow already gone”
  • “unsupported transport”
  • “ambiguous mapping”

This makes debugging much easier.


7) What PID/PPID adds to your leak definitions

Leak-A (plaintext DNS outside safe path)

Now you can say:

“svchost.exe (PID 1234) sent UDP/53 to ISP resolver on Wi-Fi interface”

Leak-B (split-policy intent leak)

You can catch:

  • “game launcher looked up blocked domain”
  • “system service triggered a sensitive name unexpectedly”
  • “your proxy itself isn't actually resolving via its own channel”

Leak-C (encrypted DNS bypass)

This becomes very actionable:

“firefox.exe started direct DoH to a resolver outside the tunnel”

Leak-D (mismatch indicator)

You can also correlate:

  • DNS resolved by one process
  • connection made by another process (e.g., local stub vs app)

8) Reporting / realtime logging format (updated)

Realtime log line (human readable)

Example (conceptual):

  • [P0][Leak-A] Plain DNS leaked

    • Domain: example-sensitive.com (A)
    • From: Wi-Fi → To: 1.2.3.4:53
    • Process: browser.exe PID=4321 PPID=1200
    • Policy violated: “No UDP/53 on physical NIC”

Structured event (JSON-style fields)

Minimum recommended fields:

Event identity

  • event_id
  • timestamp

DNS identity

  • transport (udp53/tcp53/dot/doh/unknown)
  • qname (nullable)
  • qtype (nullable)

Network path

  • interface_name
  • src_ip, src_port
  • dst_ip, dst_port
  • route_class (tunnel / physical / loopback)

Process identity (your requested additions)

  • pid

  • ppid

  • process_name

  • optional:

    • exe_path
    • cmdline
    • user

Detection result

  • leak_type (A/B/C/D)
  • severity (P0..P3)
  • policy_rule_id
  • attribution_confidence
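
Pulling the field groups above together, one structured record could look like the following literal (values are illustrative):

```python
leak_event = {
    # Event identity
    "event_id": "evt-000123",
    "timestamp": "2026-01-17T10:32:05Z",
    # DNS identity
    "transport": "udp53",
    "qname": "example-sensitive.com",     # null for DoT/DoH
    "qtype": "A",
    # Network path
    "interface_name": "wlan0",
    "src_ip": "192.168.1.20", "src_port": 51512,
    "dst_ip": "203.0.113.53", "dst_port": 53,
    "route_class": "physical",
    # Process identity
    "pid": 4321, "ppid": 1200, "process_name": "browser.exe",
    "exe_path": None, "cmdline": None, "user": None,   # optional enrichment
    # Detection result
    "leak_type": "A",
    "severity": "P0",
    "policy_rule_id": "no-udp53-on-physical",
    "attribution_confidence": "HIGH",
}
```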

9) Privacy and safety notes (important in a DNS tool)

Because you're logging domains and process command lines, this becomes sensitive.

Add a “privacy mode” policy:

  • Full: store full domain + cmdline
  • Redacted: hash domain; keep TLD only; truncate cmdline
  • Minimal: only keep leak counts + resolver IPs + process name

Also allow “capture window” (rotate logs, avoid giant histories).
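
A minimal redaction sketch for the three modes; the salted-hash approach is one option, not a requirement:

```python
import hashlib

def redact_qname(qname: str, mode: str, salt: bytes = b"rotate-me") -> str | None:
    if mode == "full":
        return qname
    if mode == "redacted":
        tld = qname.rsplit(".", 1)[-1]
        digest = hashlib.sha256(salt + qname.encode()).hexdigest()[:12]
        return f"{digest}.{tld}"   # keep TLD only, hash the rest
    return None                    # "minimal": drop the name entirely
```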


10) UX feature: “Show me the process tree”

When a leak happens, a good debugger view is:

  • foo (pid 1000)

    • parent: bar (pid 900)

      • grandparent: systemd / svchost / etc.

This is extremely useful to identify:

  • browsers spawning helpers
  • OS DNS services
  • containerized processes
  • update agents / telemetry daemons

So your report generator should support:

Process chain rendering (where possible)
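
A sketch of process-chain rendering using psutil, walking parent() until the chain ends (entries can vanish mid-walk, so errors are tolerated):

```python
import psutil

def process_chain(pid: int) -> list[str]:
    """Return ["name (pid N)", ...] from the leaking process up through its ancestors."""
    chain = []
    try:
        proc = psutil.Process(pid)
        while proc is not None:
            chain.append(f"{proc.name()} (pid {proc.pid})")
            proc = proc.parent()
    except psutil.Error:
        pass   # parent may have exited; render whatever was collected
    return chain
```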


11) Practical edge cases you should detect (with PID helping)

  1. Local stub is fine, upstream isn't

    • Your local resolver process leaks upstream plaintext DNS
  2. Browser uses its own DoH

    • process attribution immediately reveals it
  3. Multiple interfaces

    • a leak only happens on Wi-Fi but not Ethernet
  4. Kill-switch failure

    • when tunnel drops, PID shows which app starts leaking first