diff --git a/arxiv-style/main.tex b/arxiv-style/main.tex
index d7906b6..8f2f127 100644
--- a/arxiv-style/main.tex
+++ b/arxiv-style/main.tex
@@ -387,25 +387,12 @@ The routing ablation supplies the most instructive counterexample. Disabling typ
 
 Taken together, the benchmark now supports a sharper claim than a plain KS/JSD table could offer. Mask-DDPM already provides stable mixed-type fidelity, perfect discrete legality, and a meaningful amount of continuous realism. The remaining error is concentrated in a small subset of ICS-specific channels whose realism depends on rare switching, long dwell intervals, constrained occupancy, and persistent local dynamics. The ablation study clarifies why: temporal staging protects dynamical realism, quantile-based shaping protects continuous fidelity and downstream utility, and type-aware routing protects coordinated mechanism-level behavior even when simpler metrics do not fully reveal its value.
 
-% 5. Future Work
-\section{Future Work}
-\label{sec:future}
-Future work will further expand from "generating legal ICS feature sequences" to "data construction and adversarial evaluation for security tasks". The core contribution of this paper focuses on generating feature sequences that are temporally consistent, have credible distributions, and have legal discrete values under mixed types and multi-scale dynamics. However, in the actual research of intrusion detection and anomaly detection, the more critical bottleneck is often the lack of "illegal data/anomaly data" with clear attack semantics and sufficient coverage. Therefore, a direct and important extension direction is to use the legal sequences generated in this paper as a controllable and reproducible "base line operation flow", and then, on the premise of maintaining sequence-level legality and engineering constraints, inject or mix illegal behaviors according to specified attack patterns, thereby systematically constructing a dataset for training and evaluating the recognition of illegal data packets.
-
-Specifically, attack injection can be upgraded from "simple perturbation" to "semantically consistent patterned rewriting": on continuous channels, implement bias injection, covert manipulation near thresholds, instantaneous mutations, and intermittent bursts, etc., so that it can both mimic the temporal characteristics pursued by attackers for concealment and not violate the basic boundary conditions of process dynamics; on discrete channels, implement illegal state transitions, alarm suppression/delayed triggering, pattern camouflage, etc., so that it reflects the trajectory morphology of "unreachable but forcibly created" under real control logic. Furthermore, the attack injection process itself can be coordinated with the type routing and constraint layer in this paper: for deterministically derived variables, illegal behaviors should be transmitted through the modification of upstream variables to maintain consistency; for supervised variables constrained by finite-state machines, interpretable illegal transitions should be generated through the "minimum violation path" or "controlled violation intensity", and violation points and violation types should be explicitly marked to facilitate downstream detection tasks to learn more fine-grained discrimination criteria.
-
-In terms of method morphology, this direction also naturally supports stronger controllability and measurability: attack patterns can be regarded as conditional variables to uniformly conditionally orchestrate legitimate generation and illegal injection, generating control samples of "different attack strategies under the same legitimate framework", thereby transforming dataset construction into a repeatable scenario generation process; meanwhile, by controlling the injection location, duration, amplitude, and coupling range, the performance degradation curves of detectors under different threat intensities and different operating condition stages can be systematically scanned, forming a more stable benchmark than "single acquisition/single script". Ultimately, this approach will transform the legitimate data generation capabilities presented in this paper into the infrastructure for security research: first providing a shareable and reproducible legitimate operation distribution, then injecting illegal patterns with clear semantics in a controllable manner, producing a dataset with sufficient coverage and consistent annotation for training and evaluating models that identify illegal packets/abnormal sequences, and promoting the improvement of reproducibility and engineering credibility in this direction.
-
-% 6. Conclusion
-\section{Conclusion}
+% 5. Conclusion and Future Work
+\section{Conclusion and Future Work}
 \label{sec:conclusion}
-This paper addresses the data scarcity and shareability barriers that limit machine-learning research for industrial control system (ICS) security by proposing a practical synthetic telemetry generation framework at the protocol feature level. We introduced Mask-DDPM, a hybrid generator designed explicitly for the mixed-type and multi-scale nature of ICS data, where continuous process dynamics must remain temporally coherent while discrete supervisory variables must remain categorically legal by construction.
+This paper addresses the data scarcity and shareability barriers that limit machine-learning research for industrial control system (ICS) security by proposing Mask-DDPM, a hybrid synthetic telemetry generator operating at the protocol-feature level. By combining a causal Transformer trend module, a trend-conditioned residual DDPM, a masked (absorbing) diffusion branch for discrete variables, and a type-aware routing layer, the framework preserves long-horizon temporal structure, improves local distributional fidelity, and guarantees by construction that discrete variables take only semantically legal values. On windows derived from the HAI Security Dataset, the model achieves stable mixed-type fidelity across seeds, with a mean KS statistic of 0.3311 $\pm$ 0.0079 on continuous features, a mean JSD of 0.0284 $\pm$ 0.0073 on discrete features, and a mean absolute lag-1 autocorrelation difference of 0.2684 $\pm$ 0.0027.
 
-Our main contributions are: (i) a causal Transformer trend module that provides a stable long-horizon temporal scaffold for continuous channels; (ii) a trend-conditioned residual DDPM that focuses modeling capacity on local stochastic detail and marginal fidelity without destabilizing global structure; (iii) a masked (absorbing) diffusion branch for discrete variables that guarantees in-vocabulary outputs and supports semantics-aware conditioning on continuous context; and (iv) a type-aware decomposition/routing layer that aligns model mechanisms with heterogeneous ICS variable origins (e.g., process inertia, step-and-dwell setpoints, deterministic derived tags), enabling deterministic enforcement where appropriate and improving capacity allocation.
-
-We evaluated the approach on windows derived from the HAI Security Dataset and reported mixed-type, protocol-relevant metrics rather than a single aggregate score. Across seeds, the model achieves stable fidelity with mean KS = 0.3311 $\pm$ 0.0079 on continuous features, mean JSD = 0.0284 $\pm$ 0.0073 on discrete features, and mean absolute lag-1 autocorrelation difference 0.2684 $\pm$ 0.0027, indicating that Mask-DDPM preserves both marginal distributions and short-horizon dynamics while maintaining discrete legality.
-
-Overall, Mask-DDPM provides a reproducible foundation for generating shareable, semantically valid ICS feature sequences suitable for data augmentation, benchmarking, and downstream packet/trace reconstruction workflows. Building on this capability, a natural next step is to move from purely legal synthesis toward controllable scenario construction, including structured attack/violation injection under engineering constraints to support adversarial evaluation and more comprehensive security benchmarks.
+Overall, Mask-DDPM provides a reproducible foundation for generating shareable, semantically valid ICS feature sequences for data augmentation, benchmarking, and downstream packet/trace reconstruction workflows. Future work will proceed in two complementary directions. Vertically, we will strengthen the theoretical foundation of the framework by introducing more explicit control-theoretic constraints, structured state-space or causal priors, and formal transition models for supervisory logic, so that legality, stability, and cross-channel coupling can be characterized more rigorously. Horizontally, we will extend the framework beyond the current HAI-based setting to additional industrial control protocols such as Modbus/TCP, DNP3, IEC 60870-5-104, and OPC UA, and investigate analogous adaptations to automotive communication protocols such as CAN/CAN FD and automotive Ethernet. A related extension is controllable attack or violation injection on top of legal base traces, which would enable reproducible adversarial benchmarks for anomaly-detection and intrusion-detection studies.
 
 \bibliographystyle{unsrtnat}
 \bibliography{references}