\documentclass{article}

\usepackage{arxiv}

\usepackage[utf8]{inputenc} % allow utf-8 input
\usepackage[T1]{fontenc} % use 8-bit T1 fonts
\usepackage{hyperref} % hyperlinks
\usepackage{url} % simple URL typesetting
\usepackage{booktabs} % professional-quality tables
\usepackage{amsfonts} % blackboard math symbols
\usepackage{nicefrac} % compact symbols for 1/2, etc.
\usepackage{microtype} % microtypography
\usepackage{cleveref} % smart cross-referencing
\usepackage{lipsum} % Can be removed after putting your text content
\usepackage{graphicx}
\usepackage{natbib}
\usepackage{doi}

% Title
\title{Your Paper Title: A Deep Learning Approach for Something}

% Uncomment the line below if no date is needed
%\date{}

\newif\ifuniqueAffiliation
\uniqueAffiliationtrue

\ifuniqueAffiliation % Standard author block
\author{
David S.~Hippocampus \\
Department of Computer Science\\
Cranberry-Lemon University\\
Pittsburgh, PA 15213 \\
\texttt{hippo@cs.cranberry-lemon.edu} \\
\And
Elias D.~Striatum \\
Department of Electrical Engineering\\
Mount-Sheikh University\\
Santa Narimana, Levand \\
\texttt{stariate@ee.mount-sheikh.edu} \\
\And
John Q.~Doe \\
Department of Mathematics\\
University of California, Berkeley\\
Berkeley, CA 94720 \\
\texttt{johndoe@math.berkeley.edu}
}
\fi

% Header settings
\renewcommand{\shorttitle}{\textit{arXiv} Template}

%%% PDF metadata
\hypersetup{
pdftitle={Your Paper Title},
pdfsubject={cs.LG, cs.CR},
pdfauthor={David S.~Hippocampus, Elias D.~Striatum},
pdfkeywords={Keyword1, Keyword2, Keyword3},
}

\begin{document}
\maketitle

\begin{abstract}
Here is the abstract of your paper.
\end{abstract}

% Keywords
\keywords{Machine Learning \and Cyber Defense \and Benchmark \and Methodology}

% 1. Introduction
\section{Introduction}
\label{sec:intro}
This section introduces the background, problem statement, and contributions.

% 2. Related Work
\section{Related Work}
\label{sec:related}
Early work on generating ``realistic'' network data mostly operated at the packet/flow-header level, relying either on replay or on statistical synthesis from single-point observations. Swing extracts user, application, and network distributions from single-point observations and uses them in a closed-loop, network-responsive manner to reproduce burstiness and correlation across multiple time scales \citep{10.1145/1151659.1159928,10.1145/1159913.1159928}. Subsequent work moved header synthesis toward learning-based generation: a WGAN-based method added explicit checks of protocol-field consistency for NetFlow/IPFIX records \citep{Ring_2019}; NetShare recast header modeling as flow-level time-series generation and improved fidelity and scalability through domain-specific encoding and parallel fine-tuning \citep{10.1145/3544216.3544251}; and DoppelGANger preserved the long-range structure of networked time series, together with downstream ranking consistency, by decoupling attributes from sequences \citep{Lin_2020}. In industrial control system (ICS) settings, however, the original PCAPs are usually not shareable. Public testbeds such as SWaT and WADI mostly provide process/monitoring telemetry and protocol interactions for security assessment, and their public datasets emphasize operational variables rather than packet-level traces \citep{7469060,10.1145/3055366.3055375}. This makes protocol- and semantics-aware synthesis at the feature/telemetry level both more feasible and more necessary in practice: the goal is to reproduce high-level distributions and multi-scale temporal patterns that respect operational semantics and physical constraints, without relying on the original packets. From this perspective, the generation paradigm shifts from ``packet syntax reproduction'' to ``modeling of high-level spatio-temporal distributions and uncertainties'', which calls for stable training, strong distribution fitting, and interpretable uncertainty characterization.

Diffusion models fit this setting well. DDPM achieves high-quality sampling and stable optimization through an efficient $\epsilon$-parameterization and a weighted variational objective \citep{NEURIPS2020_4c5bcfec}, and the SDE perspective unifies score-based and diffusion models, providing likelihood evaluation through the probability flow ODE as well as predictor-corrector sampling \citep{song2021scorebasedgenerativemodelingstochastic}. For time series, TimeGrad replaces a constrained output distribution with conditional denoising, capturing high-dimensional correlations at each step \citep{rasul2021autoregressivedenoisingdiffusionmodels}; CSDI performs explicitly conditional diffusion and uses two-dimensional attention to exploit temporal and cross-feature dependencies simultaneously, which suits conditional generation and missing-value imputation \citep{tashiro2021csdiconditionalscorebaseddiffusion}. For more general spatio-temporal structure, DiffSTG extends diffusion to spatio-temporal graphs, combining TCN/GCN components with a denoising U-Net to improve CRPS and inference efficiency in a non-autoregressive manner \citep{wen2024diffstgprobabilisticspatiotemporalgraph}, and PriSTI further enhances conditional features and geographical relationships, remaining robust under high missing rates and sensor failures \citep{liu2023pristiconditionaldiffusionframework}. For long sequences in continuous domains, DiffWave shows that diffusion can match the quality of strong vocoders with fast non-autoregressive synthesis \citep{kong2021diffwaveversatilediffusionmodel}, and studies of cellular communication traffic show that diffusion can recover spatio-temporal patterns and characterize uncertainty at the urban scale \citep{11087622}.
Taken together, these results suggest that when the research focus is on telemetry and high-level features rather than raw messages, diffusion models provide stable, fine-grained distribution fitting and uncertainty quantification, which matches the requirements of ICS telemetry synthesis. At the same time, delegating all structure to a single ``monolithic'' diffusion model is not advisable: long-range temporal skeletons and fine-grained marginal distributions often pull the optimization in different directions, so the two should be explicitly decoupled in the model.

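As a brief reminder of what the $\epsilon$-parameterization refers to, the standard DDPM formulation can be sketched as follows. The symbols $\beta_t$, $\alpha_t$, $\bar{\alpha}_t$, $x_t$, and $\epsilon_\theta$ follow the usual DDPM convention and are introduced here only for illustration; they are not notation defined elsewhere in this paper. With a variance schedule $\beta_1,\dots,\beta_T$, $\alpha_t = 1-\beta_t$, and $\bar{\alpha}_t = \prod_{s=1}^{t}\alpha_s$, the forward process has the closed form
\begin{equation}
q(x_t \mid x_0) = \mathcal{N}\bigl(x_t;\ \sqrt{\bar{\alpha}_t}\,x_0,\ (1-\bar{\alpha}_t)\mathbf{I}\bigr),
\end{equation}
and the network $\epsilon_\theta$ is trained to predict the injected noise with the simplified objective
\begin{equation}
L_{\mathrm{simple}} = \mathbb{E}_{t,\,x_0,\,\epsilon}\Bigl[\bigl\| \epsilon - \epsilon_\theta\bigl(\sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon,\ t\bigr) \bigr\|^2\Bigr],
\qquad \epsilon \sim \mathcal{N}(\mathbf{0}, \mathbf{I}).
\end{equation}
In the conditional time-series variants discussed above, $\epsilon_\theta$ additionally receives a conditioning input, for example an encoded history or the observed entries of a partially missing window.
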
ICS adds further complexity: its channels are inherently of mixed type, comprising both continuous process trajectories and discrete supervisory/status variables, and the discrete channels must remain ``legal'' under operational constraints. The time-series diffusion work above operates mainly in continuous spaces, but discrete diffusion has also matured: D3PM improves sampling quality and likelihood through absorbing (masking) states and structured transitions in discrete state spaces \citep{austin2023structureddenoisingdiffusionmodels}; subsequent masked diffusion provides stable reconstruction of categorical data in a simpler form \citep{Lin_2020}; multinomial diffusion, introduced alongside argmax flows, defines diffusion directly on a finite vocabulary \citep{hoogeboom2021argmaxflowsmultinomialdiffusion}; and Diffusion-LM demonstrates an effective path to controllable text generation by applying gradient-based control in a continuous latent space \citep{li2022diffusionlmimprovescontrollabletext}. From the perspective of protocols and finite-state machines, coverage-guided fuzz testing emphasizes the importance of sequence legality and state coverage \citep{meng2025aflnetyearslatercoverageguided,godefroid2017learnfuzzmachinelearninginput,she2019neuzzefficientfuzzingneural}. This echoes the idea of ``legality by construction'' in discrete diffusion: prefer absorbing/masking diffusion on discrete channels, supplemented by type-aware conditioning and sampling constraints, to avoid the semantic invalidity and marginal distortion caused by post hoc thresholding.

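To make ``legality by construction'' concrete, the absorbing-state forward process of D3PM can be sketched as follows. The symbols $K$, $m$, $\beta_t$, and $\mathbf{Q}_t$ are illustrative notation and are not defined elsewhere in this paper. For a discrete channel with $K$ legal values plus one mask state $m$, let $x_t$ be a one-hot row vector over these $K+1$ states. The forward transition is
\begin{equation}
q(x_t \mid x_{t-1}) = \mathrm{Cat}\bigl(x_t;\ p = x_{t-1}\mathbf{Q}_t\bigr),
\qquad
\mathbf{Q}_t = (1-\beta_t)\,\mathbf{I} + \beta_t\,\mathbf{1}\,e_m^{\top},
\end{equation}
where $\mathbf{1}$ is the all-ones column vector and $e_m$ is the one-hot column vector of the mask state. Each value is therefore kept with probability $1-\beta_t$ or replaced by the mask with probability $\beta_t$, and the mask state is absorbing because row $m$ of $\mathbf{Q}_t$ equals $e_m^{\top}$. If the reverse model is restricted to output a categorical distribution over the $K$ legal values at each masked position (an assumption of this sketch), every sampled symbol is a valid vocabulary element and no post hoc thresholding is required; sequence-level constraints still have to come from conditioning and sampling constraints.
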
From the perspective of high-level synthesis, temporal structure is equally indispensable: ICS control often involves delay effects, phased operating conditions, and cross-channel coupling, so a model must capture low-frequency, long-range dependencies while overlaying multi-modal, fine-grained fluctuations on top of them. The Transformer family offers ample evidence on long-sequence time-series tasks: Transformer-XL overcomes the fixed-length context limit through a reusable memory mechanism and substantially strengthens long-range dependency modeling \citep{dai2019transformerxlattentivelanguagemodels}; Informer uses ProbSparse attention and efficient decoding to balance prediction span and efficiency in long-sequence prediction \citep{zhou2021informerefficienttransformerlong}; Autoformer robustly models long-term seasonality and trends through auto-correlation and decomposition mechanisms \citep{wu2022autoformerdecompositiontransformersautocorrelation}; FEDformer further improves long-horizon prediction through frequency-domain enhancement and decomposition \citep{zhou2022fedformerfrequencyenhanceddecomposed}; and PatchTST improves the stability and generalization of long-sequence multivariate prediction through patch-based local representations and channel-independent modeling \citep{2023}. Combined with our positioning of diffusion above, this evidence suggests a natural division of labor: an attention-based sequence model first extracts stable low-frequency trends and operating conditions (the long-range skeleton), and diffusion then focuses on marginal distributions and fine detail in the residual space; meanwhile, discrete masking/absorbing diffusion is applied to supervisory/mode variables to ensure vocabulary legality by construction.
This design inherits the strengths of time-series diffusion in distribution fitting and uncertainty characterization \citep{rasul2021autoregressivedenoisingdiffusionmodels,tashiro2021csdiconditionalscorebaseddiffusion,wen2024diffstgprobabilisticspatiotemporalgraph,liu2023pristiconditionaldiffusionframework,kong2021diffwaveversatilediffusionmodel,11087622}, and it stabilizes the macroscopic temporal structure through the long-range attention of the Transformer, yielding a practical, integrated generation pipeline for the mixed types and multi-scale dynamics of ICS.

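One possible way to write this division of labor down is sketched below. It is an illustrative formalization under stated assumptions, not a definitive specification of the method: the symbols $c$, $f_\phi$, $\hat{s}$, $r$, $p_\theta$, $p_\psi$, $x^{\mathrm{c}}$, and $x^{\mathrm{d}}$ are hypothetical notation introduced only here. Let $x^{\mathrm{c}}_{1:T}$ and $x^{\mathrm{d}}_{1:T}$ denote the continuous and discrete channels of a window of length $T$, and let $c$ denote the available conditioning information (for example, an operating-condition label or past observations). An attention-based sequence model $f_\phi$ produces the low-frequency skeleton, and the continuous channels are the skeleton plus a residual:
\begin{equation}
\hat{s}_{1:T} = f_\phi(c),
\qquad
x^{\mathrm{c}}_{1:T} = \hat{s}_{1:T} + r_{1:T}.
\end{equation}
The residual is then drawn from a continuous diffusion model $p_\theta$ and the discrete channels from a masking/absorbing diffusion model $p_\psi$, both conditioned on the skeleton:
\begin{equation}
r_{1:T} \sim p_\theta\bigl(r_{1:T} \mid \hat{s}_{1:T}, c\bigr),
\qquad
x^{\mathrm{d}}_{1:T} \sim p_\psi\bigl(x^{\mathrm{d}}_{1:T} \mid \hat{s}_{1:T}, c\bigr).
\end{equation}
Under this reading, the trend model carries the long-range dependencies, while the two diffusion models only have to fit the residual marginals and the legal discrete vocabulary.
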
% 3. Methodology
\section{Methodology}
\label{sec:method}
Here we describe our proposed method in detail.

% 4. Benchmark
\section{Benchmark}
\label{sec:benchmark}
In this section, we present the experimental setup and results.

% 5. Future Work
\section{Future Work}
\label{sec:future}
In this section, we present the future work.

% 6. Conclusion
\section{Conclusion}
\label{sec:conclusion}
In this section, we summarize our contributions and future directions.

% References
\bibliographystyle{unsrtnat}
\bibliography{references}

\end{document}