WTFnet/docs/dns_poisoning_design.md
2026-01-16 13:27:07 +08:00

# DNS Poisoning Detection Design
This document summarizes the current active-probing implementation for detecting DNS poisoning, and the planned design for passive detection methods.
## Active probing (current implementation)
### Overview
- Active probing compares answers from multiple resolvers for the same domain and record type.
- The current CLI command is `dns detect <domain>`.
- The current implementation focuses on deterministic, best-effort heuristics and avoids OS-specific parsing.
### Inputs
- Domain name.
- Resolver list: either user-provided via `--servers` or default public resolvers.
- Transport: UDP/TCP/DoT/DoH.
- Optional SOCKS5 proxy for DoH queries (`--socks5`).
- Repeat count: `--repeat` (>= 1).
- Timeout: `--timeout-ms`.
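The inputs above can be sketched as an options struct with best-effort validation. This is an illustrative shape, not the actual CLI types; all names here are assumptions mirroring the documented flags.

```rust
// Hypothetical sketch of the detect command's inputs; field and type names
// are illustrative, not the real implementation.
#[derive(Debug, Clone)]
pub enum Transport {
    Udp,
    Tcp,
    Dot,
    Doh,
}

#[derive(Debug, Clone)]
pub struct DetectOptions {
    pub domain: String,
    pub servers: Vec<String>,   // from --servers, or default public resolvers
    pub transport: Transport,
    pub socks5: Option<String>, // optional SOCKS5 proxy for DoH queries
    pub repeat: u32,            // --repeat, must be >= 1
    pub timeout_ms: u64,        // --timeout-ms
}

impl DetectOptions {
    /// Best-effort validation mirroring the documented constraints.
    pub fn validate(&self) -> Result<(), String> {
        if self.domain.is_empty() {
            return Err("domain is required".into());
        }
        if self.repeat < 1 {
            return Err("--repeat must be >= 1".into());
        }
        Ok(())
    }
}
```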
### Query flow
1. For each resolver and each repeat, issue a DNS A query using `hickory-resolver`.
2. Collect a `DnsQueryReport` that includes:
- `domain`, `record_type`, `transport`, `server`, `server_name`, `rcode`, `answers`, `duration_ms`.
3. Enrich results in the CLI with GeoIP:
- `server_geoip` based on the resolver IP.
- Per-answer GeoIP when answer data is an IP (A/AAAA).
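A minimal sketch of the per-query report shape described above. Field names follow the document; the types and the `DnsAnswer` helper are assumptions for illustration only.

```rust
// Illustrative shape of the per-resolver query report; not the actual
// implementation types.
#[derive(Debug, Clone)]
pub struct DnsAnswer {
    pub data: String,          // e.g. an IPv4/IPv6 address for A/AAAA records
    pub ttl: u32,
    pub geoip: Option<String>, // filled in by CLI-side GeoIP enrichment
}

#[derive(Debug, Clone)]
pub struct DnsQueryReport {
    pub domain: String,
    pub record_type: String,         // "A", "AAAA", ...
    pub transport: String,           // "udp" | "tcp" | "dot" | "doh"
    pub server: String,              // resolver address
    pub server_name: Option<String>, // TLS server name for DoT/DoH
    pub server_geoip: Option<String>,
    pub rcode: String,               // e.g. "NOERROR", "NXDOMAIN"
    pub answers: Vec<DnsAnswer>,
    pub duration_ms: u64,
}
```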
### Current heuristics
The detect verdict is derived from the following checks across all results:
- **RCODE divergence**: mismatch in response code across resolvers.
- **Answer divergence**: different answer sets across resolvers.
- **Private/reserved answers**: any A/AAAA in private/reserved space.
- **TTL variance**: wide TTL span (currently > 3600s).
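The four checks above can be sketched as pure functions over a minimal result type. This is a hedged sketch of the documented logic, not the actual code; `ResolverResult` and the private-range test are simplified assumptions.

```rust
use std::collections::BTreeSet;
use std::net::IpAddr;

// Simplified per-resolver result used only for this sketch.
pub struct ResolverResult {
    pub rcode: String,
    pub answers: Vec<IpAddr>,
    pub ttls: Vec<u32>,
}

/// RCODE divergence: more than one distinct response code across resolvers.
pub fn rcode_divergence(results: &[ResolverResult]) -> bool {
    let codes: BTreeSet<&str> = results.iter().map(|r| r.rcode.as_str()).collect();
    codes.len() > 1
}

/// Answer divergence: more than one distinct answer set across resolvers.
pub fn answer_divergence(results: &[ResolverResult]) -> bool {
    let sets: BTreeSet<BTreeSet<IpAddr>> = results
        .iter()
        .map(|r| r.answers.iter().copied().collect())
        .collect();
    sets.len() > 1
}

/// Private/reserved answers (simplified: private, loopback, link-local).
pub fn has_private_answers(results: &[ResolverResult]) -> bool {
    results.iter().flat_map(|r| &r.answers).any(|ip| match ip {
        IpAddr::V4(v4) => v4.is_private() || v4.is_loopback() || v4.is_link_local(),
        IpAddr::V6(v6) => v6.is_loopback(),
    })
}

/// TTL variance: span between min and max TTL exceeds the threshold
/// (the document currently uses 3600s).
pub fn ttl_variance(results: &[ResolverResult], max_span: u32) -> bool {
    let ttls: Vec<u32> = results.iter().flat_map(|r| r.ttls.iter().copied()).collect();
    match (ttls.iter().min(), ttls.iter().max()) {
        (Some(min), Some(max)) => max - min > max_span,
        _ => false,
    }
}
```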
### Verdict mapping
- `clean`: no evidence found.
- `inconclusive`: only one evidence signal or no results.
- `suspicious`: two or more evidence signals.
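The mapping above is small enough to state directly. A sketch, assuming `evidence_count` is the number of heuristic signals that fired and `has_results` indicates at least one resolver responded:

```rust
/// Verdict mapping as documented: no evidence -> clean, one signal or no
/// results -> inconclusive, two or more signals -> suspicious.
pub fn verdict(evidence_count: usize, has_results: bool) -> &'static str {
    if !has_results {
        "inconclusive"
    } else if evidence_count >= 2 {
        "suspicious"
    } else if evidence_count == 1 {
        "inconclusive"
    } else {
        "clean"
    }
}
```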
### Output
- JSON output returns a list of per-resolver reports plus evidence.
- Human output shows verdict, evidence, and per-resolver summaries with GeoIP.
- Reports also include transport, server name (for DoT/DoH), and proxy (if used).
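An illustrative (not authoritative) shape of the JSON output, combining the fields listed in the query-flow and output sections; exact field names and evidence labels are assumptions.

```json
{
  "verdict": "suspicious",
  "evidence": ["answer_divergence", "private_answers"],
  "reports": [
    {
      "domain": "example.com",
      "record_type": "A",
      "transport": "doh",
      "server": "1.1.1.1",
      "server_name": "cloudflare-dns.com",
      "server_geoip": "US",
      "rcode": "NOERROR",
      "answers": [{ "data": "93.184.216.34", "ttl": 300, "geoip": "US" }],
      "duration_ms": 42
    }
  ]
}
```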
### Rationale and limitations
- This approach is deterministic and does not rely on parsing OS tools.
- False positives may occur due to legitimate geo-load balancing or CDN behavior.
- DNSSEC validation is not currently used in detection logic.
## Passive methods (planned design)
### Goals
- Observe DNS responses and correlate with active results.
- Identify anomalies without injecting traffic.
### Passive data sources (feature gated)
- Packet capture via `pcap` or `pnet` (requires root/admin privileges).
- Optional system resolver logs if available (platform-specific; best-effort).
### Planned pipeline
1. Capture DNS responses (UDP/TCP, port 53; optionally DoH/DoT if visible).
2. Parse responses into normalized records:
- `domain`, `record_type`, `rcode`, `answers`, `ttl`, `server_ip`.
3. Maintain time-bounded rolling windows to:
- detect sudden shifts in answers
- detect private/reserved answers for public domains
- detect TTL anomalies compared to historical baseline
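The rolling-window step could look like the following minimal sketch. This is an assumed design for the planned pipeline, not existing code; `RollingWindow` and its API are hypothetical.

```rust
use std::collections::VecDeque;

// Hypothetical time-bounded rolling window over observed answer sets for a
// single domain (sketch of the planned design, not an implementation).
pub struct RollingWindow {
    window_secs: u64,
    // (observation timestamp in seconds, sorted answer set)
    entries: VecDeque<(u64, Vec<String>)>,
}

impl RollingWindow {
    pub fn new(window_secs: u64) -> Self {
        Self { window_secs, entries: VecDeque::new() }
    }

    /// Record an observation and evict entries older than the window.
    pub fn push(&mut self, now_secs: u64, mut answers: Vec<String>) {
        answers.sort();
        self.entries.push_back((now_secs, answers));
        while let Some((t, _)) = self.entries.front() {
            if now_secs.saturating_sub(*t) > self.window_secs {
                self.entries.pop_front();
            } else {
                break;
            }
        }
    }

    /// Number of times the answer set changed between consecutive
    /// observations inside the window (input to shift/churn detection).
    pub fn answer_shifts(&self) -> usize {
        self.entries
            .iter()
            .zip(self.entries.iter().skip(1))
            .filter(|(a, b)| a.1 != b.1)
            .count()
    }
}
```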
### Planned heuristics
- **Answer churn**: frequent changes in answer sets beyond normal CDN variance.
- **Resolver mismatch**: passive answers conflict with known public resolver responses.
- **Suspicious IP ranges**: private/reserved or local ISP blocks where not expected.
- **Low TTL bursts**: sudden TTL drops that persist for short windows.
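As one example, the resolver-mismatch heuristic could be sketched as a set comparison between passively observed answers and a trusted active baseline. The function and its overlap rule are assumptions about the planned design, not settled behavior.

```rust
use std::collections::BTreeSet;

/// Hypothetical "resolver mismatch" check: flag passive answers that share
/// no IPs with answers seen from known public resolvers. Partial overlap is
/// treated as normal CDN rotation rather than evidence (an assumed policy).
pub fn resolver_mismatch(passive: &[&str], active_reference: &[&str]) -> bool {
    let reference: BTreeSet<&str> = active_reference.iter().copied().collect();
    !passive.is_empty() && passive.iter().all(|ip| !reference.contains(ip))
}
```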
### Output (planned)
- Passive summaries include:
- top domains observed
- divergence counts
- suspicious answer summaries
- optional GeoIP enrichment for answer IPs and resolver IPs
### Privacy and safety notes
- Passive capture should be explicit and opt-in.
- Store minimal metadata and avoid payload logging beyond DNS fields.