Conversation summary (so far)
- You’re building a Modbus/TCP traffic generation system for security use with this pipeline: feature extraction → hybrid diffusion generates features → generator turns features into raw packets → checker validates traffic.
- I proposed a checker that validates traffic at multiple layers:
  - frame/IP/TCP parseability,
  - optional TCP conversation sanity,
  - Modbus/TCP structural + semantic validity (MBAP + PDU),
  - optional “feature-fidelity” checks against generator sidecar metadata.
- I recommended a generator → checker contract that is easy to debug and automate: PCAP/PCAPNG + JSONL sidecar (one metadata line per packet/event).
- You then shared your existing descriptor-driven Modbus parser (Rust) and a short doc. The public API/type model is:
  - Config → FunctionDescriptor → FieldDescriptor (FieldType, length/length_from/scale/enum_map)
  - parse_sawp_message(...) returns JSON containing unit, function, optional exception, and fields.
- I said: the descriptor engine is very reusable for the checker, but the checker’s entry point should ideally work on raw Modbus/TCP bytes (MBAP + PDU) rather than require sawp_modbus::Message. Also, for checker ergonomics, prefer structured error types over Result<Value, String> (but you can keep JSON output for the MVP).
Implementation document: Modbus/TCP Checker (Rust)
This document is a practical, “not too complex” plan to implement the checker while still following good practices where they don’t add much difficulty.
1) What the checker does (goals / non-goals)
Goals
The checker verifies that generator output is:
- Parsable as TCP/IP traffic,
- Modbus/TCP-valid at the application level (MBAP + PDU rules),
- Consistent in request/response pairing (Transaction ID matching),
- Optionally matches the expected features (function code, unit id, quantities, payload size, timing tolerances, etc.).
Non-goals (to keep it simple)
To avoid turning this into a full Wireshark, we deliberately do not implement:
- full TCP stream reassembly (segments split/merged),
- full TCP state machine with retransmits/out-of-order handling,
- IP/TCP checksum verification by default.
Instead, we enforce a generator constraint: one Modbus ADU per TCP payload (no segmentation, no coalescing). This single constraint dramatically reduces checker complexity and is realistic for generated traces.
Trade-off: best practice would handle segmentation/coalescing and reassembly; difficulty rises a lot. The “one ADU per TCP payload” rule is the best complexity/benefit lever for this project.
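The "one ADU per TCP payload" constraint can be verified cheaply from the MBAP length field alone. The sketch below is one possible way to do it, not a required design; the names Framing and check_framing are hypothetical. It relies only on the MBAP layout (a big-endian length field at bytes 4..6 that counts the unit id plus the PDU).

```rust
use std::cmp::Ordering;

/// Outcome of checking the "one ADU per TCP payload" generator constraint.
#[derive(Debug, PartialEq, Eq)]
pub enum Framing {
    /// Payload holds exactly one complete ADU.
    Single,
    /// Payload ends before the ADU announced by the MBAP length field does.
    Truncated { have: usize, need: usize },
    /// Payload continues after the first ADU (coalescing or stray bytes).
    Trailing { extra: usize },
}

/// MBAP bytes that precede the span counted by the length field:
/// transaction id (2) + protocol id (2) + length (2).
const MBAP_PREFIX: usize = 6;

/// Hypothetical helper: classify a TCP payload against the constraint.
pub fn check_framing(payload: &[u8]) -> Framing {
    if payload.len() < MBAP_PREFIX {
        return Framing::Truncated { have: payload.len(), need: MBAP_PREFIX };
    }
    // The length field is big-endian and counts unit id + PDU bytes.
    let declared = u16::from_be_bytes([payload[4], payload[5]]) as usize;
    let need = MBAP_PREFIX + declared;
    match payload.len().cmp(&need) {
        Ordering::Equal => Framing::Single,
        Ordering::Less => Framing::Truncated { have: payload.len(), need },
        Ordering::Greater => Framing::Trailing { extra: payload.len() - need },
    }
}
```

Anything other than Framing::Single would then be reported as a violation of the generator constraint rather than handed to a reassembly layer.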
2) Generator output contract (what the checker consumes)
Recommended output (MVP-friendly and debuggable)
(A) PCAP or PCAPNG file
- trace.pcapng (or .pcap) containing the raw generated packets
(B) Sidecar JSONL metadata file
- trace.meta.jsonl, where each line describes the corresponding packet/event (same order)
This is the easiest way to:
- reproduce failures,
- correlate packet index with expected semantic fields,
- produce actionable reports.
JSONL schema (minimal + optional)
Minimal fields (recommended):
- trace_id (string/uuid)
- event_id (monotonic integer)
- pcap_index (or implicit by line number)
- ts_ns timestamp
- direction ("c2s" or "s2c")
- flow (src/dst ip/port)
Optional expected block (for feature-fidelity checks):
- expected.modbus.transaction_id, unit_id, function_code, and expected.fields (names matching your descriptor JSON).
Example line:
{
  "trace_id": "c7f1...",
  "event_id": 42,
  "pcap_index": 42,
  "ts_ns": 1736451234567890123,
  "direction": "c2s",
  "flow": {"src_ip":"10.0.0.10","src_port":51012,"dst_ip":"10.0.0.20","dst_port":502},
  "expected": {
    "modbus": {"transaction_id": 513, "unit_id": 1, "function_code": 3},
    "fields": {"starting_address": 0, "quantity": 10}
  }
}
Trade-off: best practice is “self-describing PCAP” (pcapng custom blocks, or embedding metadata); difficulty higher. JSONL sidecar is dead simple and works well.
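On the checker side, the sidecar schema could map onto plain Rust types along these lines. This is a std-only sketch: the type names (MetaLine, Flow, ExpectedModbus, Direction) and the involves_port helper are hypothetical. A real implementation would most likely add serde derives to deserialize each JSONL line, and expected.fields (free-form JSON) is left out here for that reason.

```rust
use std::net::IpAddr;
use std::str::FromStr;

/// Packet direction as written in the sidecar ("c2s" or "s2c").
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Direction {
    C2s,
    S2c,
}

impl FromStr for Direction {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "c2s" => Ok(Direction::C2s),
            "s2c" => Ok(Direction::S2c),
            other => Err(format!("unknown direction: {other}")),
        }
    }
}

/// The "flow" object of a sidecar line.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Flow {
    pub src_ip: IpAddr,
    pub src_port: u16,
    pub dst_ip: IpAddr,
    pub dst_port: u16,
}

impl Flow {
    /// True when either endpoint uses `port` (502 for standard Modbus/TCP).
    pub fn involves_port(&self, port: u16) -> bool {
        self.src_port == port || self.dst_port == port
    }
}

/// The optional "expected.modbus" block; every field may be absent.
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct ExpectedModbus {
    pub transaction_id: Option<u16>,
    pub unit_id: Option<u8>,
    pub function_code: Option<u8>,
}

/// One sidecar line (hypothetical shape mirroring the JSONL schema).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct MetaLine {
    pub trace_id: String,
    pub event_id: u64,
    pub pcap_index: u64,
    pub ts_ns: u64,
    pub direction: Direction,
    pub flow: Flow,
    pub expected: Option<ExpectedModbus>,
}
```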
3) Workflow (starting from generator output)
Step 0 — Load inputs
- Read trace.meta.jsonl into a lightweight iterator (don’t load everything if the trace is huge).
- Open trace.pcapng and stream packets in order.
Step 1 — Align packets and metadata
For each packet index i:
- read packet i from PCAP
- read metadata line i from JSONL
If there is a mismatch (missing line/packet), record a Fatal alignment error and stop (or continue with “best effort”, your call).
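The lockstep walk in Step 1 can be written generically over any two iterators, so it does not depend on which PCAP or JSON crate is chosen. The following is a sketch with hypothetical names (for_each_aligned, AlignError); it stops at the first misalignment, which corresponds to the fail-fast option.

```rust
/// Why the packet stream and the sidecar stream stopped lining up.
#[derive(Debug, PartialEq, Eq)]
pub enum AlignError {
    /// The PCAP has a packet at `index` but the sidecar ran out of lines.
    MissingMeta { index: usize },
    /// The sidecar has a line at `index` but the PCAP ran out of packets.
    MissingPacket { index: usize },
}

/// Walk both streams in lockstep and call `on_pair(index, packet, meta)`
/// for every aligned pair. Returns the number of pairs on success.
pub fn for_each_aligned<P, M>(
    packets: impl IntoIterator<Item = P>,
    metas: impl IntoIterator<Item = M>,
    mut on_pair: impl FnMut(usize, P, M),
) -> Result<usize, AlignError> {
    let mut packets = packets.into_iter();
    let mut metas = metas.into_iter();
    let mut index = 0;
    loop {
        match (packets.next(), metas.next()) {
            (Some(p), Some(m)) => on_pair(index, p, m),
            // Both streams ended together: fully aligned.
            (None, None) => return Ok(index),
            (Some(_), None) => return Err(AlignError::MissingMeta { index }),
            (None, Some(_)) => return Err(AlignError::MissingPacket { index }),
        }
        index += 1;
    }
}
```

Because both inputs are consumed lazily, neither file has to be loaded into memory, which matches the streaming approach of Step 0.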
Step 2 — Decode packet and extract TCP payload
Decode:
- link layer (Ethernet/SLL/RAW depending on PCAP linktype),
- IPv4/IPv6,
- TCP,
- extract TCP payload bytes.
Minimal checks:
- packet parses,
- TCP payload length > 0 when direction indicates Modbus message,
- port 502 is present on either side (configurable if you generate non-502).
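To make Step 2 concrete, here is a hand-rolled, std-only sketch for the simplest case only: an untagged Ethernet II frame carrying IPv4 and TCP. It is an illustration of the header arithmetic, not a replacement for pnet_packet or etherparse; VLAN tags, IPv6, SLL/RAW link types and IP fragmentation are deliberately ignored, and the names (tcp_payload_eth_ipv4, TcpSegment, DecodeError) are hypothetical.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum DecodeError {
    TooShort(&'static str),
    NotIpv4,
    NotTcp,
    BadHeaderLen(&'static str),
}

#[derive(Debug, PartialEq, Eq)]
pub struct TcpSegment<'a> {
    pub src_port: u16,
    pub dst_port: u16,
    pub payload: &'a [u8],
}

/// Extract the TCP payload from an untagged Ethernet II + IPv4 frame.
pub fn tcp_payload_eth_ipv4(frame: &[u8]) -> Result<TcpSegment<'_>, DecodeError> {
    // Ethernet II header: dst MAC (6) + src MAC (6) + EtherType (2).
    if frame.len() < 14 {
        return Err(DecodeError::TooShort("ethernet"));
    }
    if u16::from_be_bytes([frame[12], frame[13]]) != 0x0800 {
        return Err(DecodeError::NotIpv4);
    }
    let ip = &frame[14..];
    if ip.len() < 20 {
        return Err(DecodeError::TooShort("ipv4"));
    }
    if (ip[0] >> 4) != 4 {
        return Err(DecodeError::NotIpv4);
    }
    // IHL is in 32-bit words; total length covers IP header + TCP segment.
    let ihl = usize::from(ip[0] & 0x0F) * 4;
    let total_len = usize::from(u16::from_be_bytes([ip[2], ip[3]]));
    if ihl < 20 || total_len < ihl || ip.len() < total_len {
        return Err(DecodeError::BadHeaderLen("ipv4"));
    }
    if ip[9] != 6 {
        return Err(DecodeError::NotTcp);
    }
    // Slice to total_len so Ethernet padding is not mistaken for payload.
    let tcp = &ip[ihl..total_len];
    if tcp.len() < 20 {
        return Err(DecodeError::TooShort("tcp"));
    }
    // Data offset (upper nibble of byte 12) is also in 32-bit words.
    let data_off = usize::from(tcp[12] >> 4) * 4;
    if data_off < 20 || data_off > tcp.len() {
        return Err(DecodeError::BadHeaderLen("tcp"));
    }
    Ok(TcpSegment {
        src_port: u16::from_be_bytes([tcp[0], tcp[1]]),
        dst_port: u16::from_be_bytes([tcp[2], tcp[3]]),
        payload: &tcp[data_off..],
    })
}
```

Trimming to the IPv4 total length matters for short Modbus frames, since Ethernet pads frames to a minimum size and that padding would otherwise look like trailing bytes after the ADU.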
Step 3 — Parse Modbus/TCP ADU
Assuming payload contains exactly one ADU:
- parse MBAP (7 bytes) + PDU
- validate basic MBAP invariants
- parse function code and PDU data
- decide request vs response based on direction
- parse PDU data using descriptor map (your reusable part)
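The MBAP part of Step 3 is small enough to write by hand. A minimal sketch follows, assuming the payload is supposed to hold exactly one ADU. The names Adu, MbapError and parse_adu are hypothetical; the field layout (2-byte transaction id, 2-byte protocol id, 2-byte length, 1-byte unit id, all big-endian, with protocol id 0 for Modbus) is the standard MBAP header.

```rust
/// One decoded Modbus/TCP ADU, borrowing the PDU data from the payload.
#[derive(Debug, PartialEq, Eq)]
pub struct Adu<'a> {
    pub transaction_id: u16,
    pub protocol_id: u16,
    pub length: u16,
    pub unit_id: u8,
    pub function_code: u8,
    /// Bytes after the function code.
    pub pdu_data: &'a [u8],
}

#[derive(Debug, PartialEq, Eq)]
pub enum MbapError {
    /// Fewer than 8 bytes: 7-byte MBAP header plus the function code.
    TooShort { len: usize },
    /// Protocol id must be 0 for Modbus.
    BadProtocolId(u16),
    /// The length field must count exactly the bytes after it.
    LengthMismatch { declared: u16, actual: usize },
}

pub fn parse_adu(payload: &[u8]) -> Result<Adu<'_>, MbapError> {
    if payload.len() < 8 {
        return Err(MbapError::TooShort { len: payload.len() });
    }
    let transaction_id = u16::from_be_bytes([payload[0], payload[1]]);
    let protocol_id = u16::from_be_bytes([payload[2], payload[3]]);
    let length = u16::from_be_bytes([payload[4], payload[5]]);
    if protocol_id != 0 {
        return Err(MbapError::BadProtocolId(protocol_id));
    }
    // The length field covers unit id + function code + PDU data.
    let actual = payload.len() - 6;
    if usize::from(length) != actual {
        return Err(MbapError::LengthMismatch { declared: length, actual });
    }
    Ok(Adu {
        transaction_id,
        protocol_id,
        length,
        unit_id: payload[6],
        function_code: payload[7],
        pdu_data: &payload[8..],
    })
}
```

For the Read Holding Registers request 00 01 00 00 00 06 01 03 00 00 00 0A, this yields transaction id 1, unit id 1, function code 3, and the four pdu_data bytes that a descriptor-based decoder would consume.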
Step 4 — Stateful consistency checks
Maintain per-flow state:
- request/response pairing by (transaction_id, unit_id)
- outstanding request table with timeout/window limits
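The outstanding-request table can be as small as a HashMap keyed by (transaction_id, unit_id). The sketch below covers a single flow; a per-flow version would keep one such table per flow key. Outstanding, PairingIssue and the method names are hypothetical, and the timeout is passed in by the caller rather than read from any configuration.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq, Eq)]
pub enum PairingIssue {
    /// A request reused a (transaction_id, unit_id) that was still pending.
    DuplicateRequest,
    /// A response arrived with no pending request to match.
    UnmatchedResponse,
    /// A response matched, but later than the configured window.
    LateResponse { elapsed_ns: u64 },
}

/// Pending requests for one flow: (transaction_id, unit_id) -> request time.
#[derive(Debug, Default)]
pub struct Outstanding {
    pending: HashMap<(u16, u8), u64>,
}

impl Outstanding {
    pub fn on_request(&mut self, tid: u16, unit: u8, ts_ns: u64) -> Option<PairingIssue> {
        // `insert` returns the previous value when the key already existed;
        // the newer request replaces the older one in the table.
        self.pending
            .insert((tid, unit), ts_ns)
            .map(|_| PairingIssue::DuplicateRequest)
    }

    pub fn on_response(
        &mut self,
        tid: u16,
        unit: u8,
        ts_ns: u64,
        timeout_ns: u64,
    ) -> Option<PairingIssue> {
        match self.pending.remove(&(tid, unit)) {
            None => Some(PairingIssue::UnmatchedResponse),
            Some(req_ts) => {
                let elapsed_ns = ts_ns.saturating_sub(req_ts);
                (elapsed_ns > timeout_ns).then_some(PairingIssue::LateResponse { elapsed_ns })
            }
        }
    }

    /// Requests that never received a response (check at end of trace).
    pub fn unanswered(&self) -> usize {
        self.pending.len()
    }
}
```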
Step 5 — Feature-fidelity checks (optional)
If expected exists in JSONL:
- compare decoded modbus header + parsed fields with expected values
- compare sizes and (optionally) timing with tolerances
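A feature-fidelity comparison mostly reduces to "compare only what the sidecar specifies". The sketch below does this for the header values and payload size, plus a symmetric timing tolerance. Expected, Observed, diff_expected and within_tolerance are hypothetical names; comparing expected.fields against the parsed JSON would follow the same pattern but is omitted to keep the example std-only.

```rust
/// Values the sidecar may specify; `None` means "do not check".
#[derive(Debug, Default)]
pub struct Expected {
    pub transaction_id: Option<u16>,
    pub unit_id: Option<u8>,
    pub function_code: Option<u8>,
    pub payload_len: Option<usize>,
}

/// Values decoded from the packet.
#[derive(Debug)]
pub struct Observed {
    pub transaction_id: u16,
    pub unit_id: u8,
    pub function_code: u8,
    pub payload_len: usize,
}

/// Return one human-readable line per mismatching field.
pub fn diff_expected(exp: &Expected, obs: &Observed) -> Vec<String> {
    let mut out = Vec::new();
    // Widen everything to u64 so one closure handles all fields.
    let mut check = |name: &str, e: Option<u64>, o: u64| {
        if let Some(e) = e {
            if e != o {
                out.push(format!("{name}: expected {e}, observed {o}"));
            }
        }
    };
    check("transaction_id", exp.transaction_id.map(u64::from), u64::from(obs.transaction_id));
    check("unit_id", exp.unit_id.map(u64::from), u64::from(obs.unit_id));
    check("function_code", exp.function_code.map(u64::from), u64::from(obs.function_code));
    check("payload_len", exp.payload_len.map(|v| v as u64), obs.payload_len as u64);
    out
}

/// Timing check: observed timestamp within +/- `tol_ns` of the expected one.
pub fn within_tolerance(expected_ns: u64, observed_ns: u64, tol_ns: u64) -> bool {
    expected_ns.abs_diff(observed_ns) <= tol_ns
}
```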
Step 6 — Emit report
Output:
- report.json with summary + per-finding samples (packet indices, flow key, reason, extracted fields)
- optional report.txt for quick reading
4) Reusing your existing parser (what to keep, what to adjust)
You already have:
- A descriptor model (Config / FunctionDescriptor / FieldDescriptor / FieldType)
- A function that returns a JSON representation with the shape the checker wants (unit, function, optional exception, fields)
4.1 What is immediately reusable
Highly reusable for the checker:
- Descriptor loading (serde)
- Field decoding logic (length/length_from, scale, enum_map)
- The “JSON output” idea for reporting and debugging
4.2 Small design adjustment to make reuse clean (recommended)
Your checker will naturally see raw TCP payload bytes. So the lowest-friction integration is:
- Implement a tiny MBAP parser in the checker:
  - returns (transaction_id, protocol_id, length, unit_id, function_code, pdu_data)
- Then call your descriptor-based decoder on pdu_data (bytes after function code)
Your doc shows the parser conceptually returns JSON with fields and supports request vs response descriptors, which maps perfectly to direction.
Suggested public entrypoint to expose from your parser module:
parse_with_descriptor(pdu_data: &[u8], unit: u8, function: u8, fields: &Vec<FieldDescriptor>) -> Result<Value, String>
If it’s currently private, just make it pub(crate) or pub and reuse it. This avoids binding the checker to sawp_modbus::Message and keeps implementation simple.
Trade-off: best practice would be to return a typed struct + typed errors; easier to maintain long term but more refactor work. For your “don’t make it hard” requirement, keeping JSON output + simple error types is totally fine for the first version.
4.3 How the checker chooses which descriptor to use
- If direction == c2s → request descriptor
- If direction == s2c → response descriptor
This matches the intent of having request and response descriptor vectors in your model.
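The selection rule is a lookup by function code followed by a match on direction. The types in this sketch are hypothetical stand-ins, not your actual definitions: the only thing assumed about FunctionDescriptor is that it carries a function code plus separate request and response field vectors, and FieldDescriptor is reduced to a name.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Direction {
    C2s,
    S2c,
}

// Hypothetical stand-in: the real FieldDescriptor also carries the field
// type, length/length_from, scale and enum_map.
#[derive(Debug, PartialEq, Eq)]
pub struct FieldDescriptor {
    pub name: String,
}

// Hypothetical stand-in: only the request/response split is assumed.
#[derive(Debug)]
pub struct FunctionDescriptor {
    pub function: u8,
    pub request: Vec<FieldDescriptor>,
    pub response: Vec<FieldDescriptor>,
}

/// Pick the field list to decode with, or `None` for an unknown function.
pub fn select_fields(
    functions: &[FunctionDescriptor],
    function_code: u8,
    direction: Direction,
) -> Option<&[FieldDescriptor]> {
    let desc = functions.iter().find(|d| d.function == function_code)?;
    Some(match direction {
        Direction::C2s => desc.request.as_slice(),
        Direction::S2c => desc.response.as_slice(),
    })
}
```

Exception responses (function code with the high bit set) should be handled before this lookup, because their PDU carries an exception code rather than the response fields.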
5) Checker internal design (simple but extensible)
5.1 Core data structures
- FlowKey { src_ip, src_port, dst_ip, dst_port, ip_version }
- PacketCtx { trace_id, event_id, pcap_index, ts_ns, direction, flow }
- DecodedModbus { transaction_id, protocol_id, length, unit_id, function_code, is_exception, exception_code?, pdu_data, parsed_fields_json? }
5.2 “Rules” model (optional, but keeps code tidy)
Instead of huge if/else blocks, implement a few rules that return findings:
- RuleMbapValid
- RuleFunctionPduWellFormed (basic length sanity)
- RuleTxIdPairing
- RuleExpectedMatch (only if sidecar has expected)
If you don’t want a formal trait system initially, just implement these as functions that append to a Vec<Finding>.
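The "plain functions that append to a Vec<Finding>" variant could look like the sketch below. Everything here is illustrative: the Finding struct is trimmed to four fields, the finding codes (MBAP_PROTOCOL_ID, MBAP_LENGTH, PORT_NOT_MODBUS) are made up, and rule_modbus_port is a hypothetical extra rule added only to show a Warn-level finding. Declaring Severity variants from least to most severe lets the derived Ord pick the worst finding.

```rust
/// Ordered from least to most severe so that derived `Ord` can rank them.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Severity {
    Info,
    Warn,
    Error,
    Fatal,
}

#[derive(Debug, PartialEq, Eq)]
pub struct Finding {
    pub pcap_index: u64,
    pub severity: Severity,
    pub code: &'static str,
    pub message: String,
}

/// MBAP rule: protocol id must be 0 and the length field must match.
pub fn rule_mbap_valid(
    pcap_index: u64,
    protocol_id: u16,
    declared_len: u16,
    payload_len: usize,
    out: &mut Vec<Finding>,
) {
    if protocol_id != 0 {
        out.push(Finding {
            pcap_index,
            severity: Severity::Error,
            code: "MBAP_PROTOCOL_ID",
            message: format!("protocol_id is {protocol_id}, expected 0"),
        });
    }
    // The length field counts everything after the first 6 MBAP bytes.
    if usize::from(declared_len) + 6 != payload_len {
        out.push(Finding {
            pcap_index,
            severity: Severity::Error,
            code: "MBAP_LENGTH",
            message: format!("length field {declared_len} vs payload of {payload_len} bytes"),
        });
    }
}

/// Port rule: neither endpoint uses the configured Modbus port.
pub fn rule_modbus_port(
    pcap_index: u64,
    src_port: u16,
    dst_port: u16,
    modbus_port: u16,
    out: &mut Vec<Finding>,
) {
    if src_port != modbus_port && dst_port != modbus_port {
        out.push(Finding {
            pcap_index,
            severity: Severity::Warn,
            code: "PORT_NOT_MODBUS",
            message: format!("ports {src_port}/{dst_port}, expected {modbus_port} on one side"),
        });
    }
}

/// Highest severity seen so far; useful for a fail-fast decision.
pub fn worst(findings: &[Finding]) -> Option<Severity> {
    findings.iter().map(|f| f.severity).max()
}
```

Moving to a trait later would only require wrapping each function in a type, so starting with functions does not close off the more formal design.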
5.3 Findings + severity
Use a compact severity scale:
- Fatal: cannot parse / cannot continue reliably
- Error: protocol invalid
- Warn: unusual but maybe acceptable
- Info: stats
A finding should include:
- pcap_index, event_id, flow, severity, code, message
- optional observed and expected snippets
6) What the checker validates (MVP vs stricter)
MVP validations (recommended first milestone)
- PCAP + JSONL aligned
- Parse Ethernet/IP/TCP and extract payload
- MBAP:
  - payload length ≥ 7
  - length field consistency (basic)
- PDU:
  - function code exists
  - exception handling if fc & 0x80 != 0
- Descriptor parse success (request/response based on direction)
- Transaction pairing:
  - every response matches an outstanding request by transaction_id/unit_id
  - no duplicate outstanding txid unless you allow it
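The exception check in the PDU item deserves a short illustration, because an exception response has a fixed shape: the original function code with bit 0x80 set, followed by exactly one exception-code byte. The sketch below classifies a PDU accordingly; PduKind, PduError and classify_pdu are hypothetical names. Treating a request with the high bit set as an error is an assumption of this sketch, based on valid Modbus function codes being in the range 1 to 127.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum PduKind<'a> {
    Normal { function_code: u8, data: &'a [u8] },
    /// `function_code` is the original code with the 0x80 bit cleared.
    Exception { function_code: u8, exception_code: u8 },
}

#[derive(Debug, PartialEq, Eq)]
pub enum PduError {
    /// No function code byte at all.
    Empty,
    /// Exception PDUs carry exactly one byte after the function code.
    BadExceptionLength(usize),
    /// Only responses may set the exception bit.
    RequestWithExceptionBit,
}

/// Classify a PDU (function code + data) as normal or exception.
pub fn classify_pdu(pdu: &[u8], is_response: bool) -> Result<PduKind<'_>, PduError> {
    let (&fc, data) = pdu.split_first().ok_or(PduError::Empty)?;
    if (fc & 0x80) == 0 {
        return Ok(PduKind::Normal { function_code: fc, data });
    }
    if !is_response {
        return Err(PduError::RequestWithExceptionBit);
    }
    match data {
        [code] => Ok(PduKind::Exception {
            function_code: fc & 0x7F,
            exception_code: *code,
        }),
        _ => Err(PduError::BadExceptionLength(data.len())),
    }
}
```

A PduKind::Exception result would skip the descriptor parse and go straight to transaction pairing, since it still answers an outstanding request.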
“Strict mode” additions (still reasonable)
- enforce unit_id range (if you want)
- enforce function-code-specific invariants using parsed fields
  - e.g., byte_count == 2 * quantity for register reads/writes (if present in descriptor)
- timeouts:
  - response must arrive within configured window
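As an example of a function-code-specific invariant, here is a sketch for register reads (function codes 0x03 and 0x04). It takes the quantity from the request and the byte_count plus the number of register bytes from the matching response. StrictIssue and check_read_registers are hypothetical names; the 1 to 125 quantity limit is the one the Modbus application protocol specification gives for these two function codes.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum StrictIssue {
    /// Register reads allow a quantity of 1..=125.
    QuantityOutOfRange(u16),
    /// The response byte_count must be twice the requested quantity.
    ByteCountMismatch { expected: usize, observed: u8 },
    /// The response must carry exactly byte_count register bytes.
    DataLenMismatch { byte_count: u8, data_len: usize },
}

/// Strict checks for a Read Holding/Input Registers request/response pair.
///
/// * `quantity`: number of registers asked for in the request
/// * `byte_count`: byte_count field of the response
/// * `data_len`: number of register bytes actually present in the response
pub fn check_read_registers(quantity: u16, byte_count: u8, data_len: usize) -> Vec<StrictIssue> {
    let mut issues = Vec::new();
    if !(1..=125).contains(&quantity) {
        issues.push(StrictIssue::QuantityOutOfRange(quantity));
    }
    // Each register is 16 bits, so two bytes per requested register.
    let expected = 2 * usize::from(quantity);
    if usize::from(byte_count) != expected {
        issues.push(StrictIssue::ByteCountMismatch { expected, observed: byte_count });
    }
    if usize::from(byte_count) != data_len {
        issues.push(StrictIssue::DataLenMismatch { byte_count, data_len });
    }
    issues
}
```

Because the quantity comes from the request, this check fits naturally at the point where a response is paired with its outstanding request; the table entry would then need to store the parsed quantity alongside the timestamp.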
Heavy features (avoid unless needed)
- TCP reassembly and multi-ADU per segment
- checksum verification
- handling retransmits/out-of-order robustly
7) Dependencies (crates) for the checker
Minimal set (keeps implementation easy)
- PCAP reading
  - pcap (libpcap-backed; you already use it in your codebase)
- Packet decoding
  - pnet_packet (you already use pnet patterns)
- Config + sidecar + report
  - serde, serde_json
- Errors + logging
  - anyhow (fast to integrate) and/or thiserror (nicer structured errors)
  - tracing, tracing-subscriber
- Utilities
  - hashbrown (optional; std HashMap is fine)
  - hex (useful for debug/trailing bytes like your parser does)
If you want to reduce external requirements (optional alternative)
- Replace pcap with pcap-file (pure Rust; no libpcap dependency)
- Replace pnet with etherparse (often simpler APIs)
Trade-off: “best practice” for portability is pure Rust (pcap-file + etherparse). “Best practice” for least effort given your current code is reusing pcap + pnet.
8) Suggested project layout (simple)
checker/
  src/
    main.rs          # CLI entry
    config.rs        # descriptor loading
    meta.rs          # JSONL reader structs
    pcap_in.rs       # pcap streaming
    decode.rs        # ethernet/ip/tcp extract payload
    mbap.rs          # Modbus/TCP MBAP parsing
    modbus_desc.rs   # reuse your parse_with_descriptor + types
    state.rs         # outstanding tx table
    validate.rs      # main validation pipeline
    report.rs        # report structs + JSON output
9) Practical implementation tips (to keep it from getting “hard”)
- Enforce generator constraints:
  - one ADU per TCP payload
  - no splitting/coalescing
  This keeps checker complexity low and makes failure reasons obvious.
- Keep JSON output for parsed fields at first:
  - You already have a clean JSON shape (unit, function, fields)
  - Great for debugging mismatches with expected.fields
- Add strictness as “modes”:
  - --mode=mvp | strict
  - or config file toggles
- Fail-fast vs best-effort:
  - For CI or batch filtering, fail-fast on Fatal is fine.
  - For research/debugging, best-effort (continue and collect findings) is more useful.