diff --git a/arxiv-style/main.tex b/arxiv-style/main.tex
index 51c0480..875fbb7 100644
--- a/arxiv-style/main.tex
+++ b/arxiv-style/main.tex
@@ -24,7 +24,7 @@
 \usepackage{caption} % Better caption spacing
 % Title
-\title{Your Paper Title: A Deep Learning Approach for Something}
+\title{Mask-DDPM: Transformer-Conditioned Mixed-Type Diffusion for Semantically Valid ICS Telemetry Synthesis}
 % If no date is needed, uncomment the line below
 \date{}
@@ -34,23 +34,23 @@
 \ifuniqueAffiliation % Standard author block
 \author{
- David S.~Hippocampus \\
- Department of Computer Science\\
- Cranberry-Lemon University\\
- Pittsburgh, PA 15213 \\
- \texttt{hippo@cs.cranberry-lemon.edu} \\
+ Zhenglan Chen \\
+ Aberdeen Institute of Data Science and Artificial Intelligence\\
+ South China Normal University\\
+ Foshan, Guangdong 528225, China \\
+ \texttt{20223803054@m.scnu.edu.cn} \\
 \And
- Elias D.~Striatum \\
- Department of Electrical Engineering\\
- Mount-Sheikh University\\
- Santa Narimana, Levand \\
- \texttt{stariate@ee.mount-sheikh.edu} \\
+ Mingzhe Yang \\
+ Aberdeen Institute of Data Science and Artificial Intelligence\\
+ South China Normal University\\
+ Foshan, Guangdong 528225, China \\
+ \texttt{20223803063@m.scnu.edu.cn} \\
 \And
- John Q.~Doe \\
- Department of Mathematics\\
- University of California, Berkeley\\
- Berkeley, CA 94720 \\
- \texttt{johndoe@math.berkeley.edu}
+ Hongyu Yan \\
+ Aberdeen Institute of Data Science and Artificial Intelligence\\
+ South China Normal University\\
+ Foshan, Guangdong 528225, China \\
+ \texttt{20223803065@m.scnu.edu.cn}
 }
 \fi
@@ -69,11 +69,11 @@ pdfkeywords={Keyword1, Keyword2, Keyword3},
 \maketitle
 \begin{abstract}
- Here is the abstract of your paper. Here is the abstract of your paper. Here is the abstract of your paper. Here is the abstract of your paper. Here is the abstract of your paper. Here is the abstract of your paper. Here is the abstract of your paper. Here is the abstract of your paper. Here is the abstract of your paper. Here is the abstract of your paper. Here is the abstract of your paper. Here is the abstract of your paper. Here is the abstract of your paper. Here is the abstract of your paper.
+Industrial control systems (ICS) security research is increasingly constrained by the scarcity and non-shareability of realistic traffic and telemetry, especially for attack scenarios. To mitigate this bottleneck, we study synthetic generation at the protocol feature/telemetry level, where samples must simultaneously preserve temporal coherence, match continuous marginal distributions, and keep discrete supervisory variables strictly within valid vocabularies. We propose Mask-DDPM, a hybrid framework tailored to mixed-type, multi-scale ICS sequences. Mask-DDPM factorizes generation into (i) a causal Transformer trend module that rolls out a stable long-horizon temporal scaffold for continuous channels, (ii) a trend-conditioned residual DDPM that refines local stochastic structure and heavy-tailed fluctuations without degrading global dynamics, (iii) a masked (absorbing) diffusion branch for discrete variables that guarantees categorical legality by construction, and (iv) a type-aware decomposition/routing layer that aligns modeling mechanisms with heterogeneous ICS variable origins and enforces deterministic reconstruction where appropriate. Evaluated on fixed-length windows (L=96) derived from the HAI Security Dataset, Mask-DDPM achieves stable fidelity across seeds with mean KS = 0.3311 ± 0.0079 (continuous), mean JSD = 0.0284 ± 0.0073 (discrete), and mean absolute lag-1 autocorrelation difference = 0.2684 ± 0.0027, indicating faithful marginals, preserved short-horizon dynamics, and valid discrete semantics. The resulting generator provides a reproducible basis for data augmentation, benchmarking, and downstream ICS protocol reconstruction workflows.
 \end{abstract}
 % Keywords
-\keywords{Machine Learning \and Cyber Defense \and Benchmark \and Methodology}
+\keywords{Machine Learning \and Cyber Defense \and ICS}
 % 1. Introduction
 \section{Introduction}
@@ -294,8 +294,13 @@ In terms of method morphology, this direction also naturally supports stronger c
 % 6. Conclusion
 \section{Conclusion}
 \label{sec:conclusion}
-In this section, we summarize our contributions and future directions.
+This paper addresses the data scarcity and shareability barriers that limit machine-learning research for industrial control system (ICS) security by proposing a practical synthetic telemetry generation framework at the protocol feature level. We introduced Mask-DDPM, a hybrid generator designed explicitly for the mixed-type and multi-scale nature of ICS data, where continuous process dynamics must remain temporally coherent while discrete supervisory variables must remain categorically legal by construction.
+Our main contributions are: (i) a causal Transformer trend module that provides a stable long-horizon temporal scaffold for continuous channels; (ii) a trend-conditioned residual DDPM that focuses modeling capacity on local stochastic detail and marginal fidelity without destabilizing global structure; (iii) a masked (absorbing) diffusion branch for discrete variables that guarantees in-vocabulary outputs and supports semantics-aware conditioning on continuous context; and (iv) a type-aware decomposition/routing layer that aligns model mechanisms with heterogeneous ICS variable origins (e.g., process inertia, step-and-dwell setpoints, deterministic derived tags), enabling deterministic enforcement where appropriate and improving capacity allocation.
+
+We evaluated the approach on windows derived from the HAI Security Dataset and reported mixed-type, protocol-relevant metrics rather than a single aggregate score. Across seeds, the model achieves stable fidelity with mean KS = 0.3311 ± 0.0079 on continuous features, mean JSD = 0.0284 ± 0.0073 on discrete features, and mean absolute lag-1 autocorrelation difference = 0.2684 ± 0.0027, indicating that Mask-DDPM preserves both marginal distributions and short-horizon dynamics while maintaining discrete legality.
+
+Overall, Mask-DDPM provides a reproducible foundation for generating shareable, semantically valid ICS feature sequences suitable for data augmentation, benchmarking, and downstream packet/trace reconstruction workflows. Building on this capability, a natural next step is to move from purely legal synthesis toward controllable scenario construction, including structured attack/violation injection under engineering constraints to support adversarial evaluation and more comprehensive security benchmarks.
 % References
 \bibliographystyle{unsrtnat}
 \bibliography{references}
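For readers of this diff, the three metrics quoted in the new abstract and conclusion (KS, JSD, and lag-1 autocorrelation difference) could be computed per feature roughly as in the sketch below. This is a minimal illustration and not the authors' evaluation code. The function names, the natural-log base for JSD, the integer encoding of discrete tokens in `[0, vocab_size)`, and the averaging of autocorrelation over windows are all assumptions; the paper's exact aggregation over features and seeds is not shown in this diff.

```python
import numpy as np


def ks_statistic(real, synth):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    real = np.sort(np.asarray(real, dtype=float))
    synth = np.sort(np.asarray(synth, dtype=float))
    # Evaluate both empirical CDFs at every observed value.
    grid = np.concatenate([real, synth])
    cdf_real = np.searchsorted(real, grid, side="right") / real.size
    cdf_synth = np.searchsorted(synth, grid, side="right") / synth.size
    return float(np.max(np.abs(cdf_real - cdf_synth)))


def js_divergence(real, synth, vocab_size):
    """Jensen-Shannon divergence (natural log, an assumption) between
    the categorical histograms of two integer-coded samples."""
    p = np.bincount(np.asarray(real), minlength=vocab_size) / len(real)
    q = np.bincount(np.asarray(synth), minlength=vocab_size) / len(synth)
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # terms with a == 0 contribute nothing
        return np.sum(a[mask] * np.log(a[mask] / b[mask]))

    return float(0.5 * kl(p, m) + 0.5 * kl(q, m))


def lag1_autocorr(x):
    """Lag-1 autocorrelation of one window; 0.0 for a constant window."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.sum(x * x)
    if denom == 0:
        return 0.0
    return float(np.sum(x[:-1] * x[1:]) / denom)


def lag1_diff(real_windows, synth_windows):
    """Absolute difference between the window-averaged lag-1
    autocorrelations of real and synthetic data for one feature."""
    r = np.mean([lag1_autocorr(w) for w in real_windows])
    s = np.mean([lag1_autocorr(w) for w in synth_windows])
    return float(abs(r - s))
```

Under these assumptions, a "mean KS" figure would be the average of `ks_statistic` over the continuous features, and likewise for the other two metrics over their respective feature sets. Lower is better for all three.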