Compare commits: master...ce2cabf505 (87 commits)

Commit SHA1s (the Author and Date columns of the compare table are empty in this view):

ce2cabf505, d85fc5df08, 5f4458f77b, 012f34ac45, 6f908a493b, 9a4c42772a,
be54120218, 6b5dafe778, 6e406bf6f8, 98a67a25a7, 8af3a60542, 6adcbfb0c0,
d2e1ba5f36, b67e7ffb0a, 4a6dcb77a5, 025d0c2632, 750351f1b4, 1e167188f6,
6790f2ad40, 269187438b, 3f5be77a7b, b1b0c83944, d8a805c162, 22be1aba74,
d51e31e53f, 043d264125, fb6c0ee368, 6f6a4b6a20, 0ba59c131c, 566e251743,
0a74d67f19, 096af1bcb6, c146138443, 0b3e79841b, 67466f8e3d, 250428b176,
6f1e7a9994, b3280dcc19, de445963b5, 5ede1a11f1, 21053b4f13, a7c8250d1a,
03640302db, 3a9836d15d, 1e3eb39dea, 9f8af2c67c, 272e159df1, 81625b5c4e,
4815d05127, 5fbfd1068f, dc5db4f3ec, 87ba7a5516, 6ae0f24af3, c95fca657b,
704277f011, 25bc355717, 89997f1125, 4ec1b0b7a8, 327fc36bf7, 72b653f35d,
0e0f951e12, ace1130179, 0613edcf61, 1ee85b97bc, 2c1e211504, 735ca8ab51,
b88a9d39da, b3bbef795d, 16a3085d14, d5c8ace183, e94d4aa91f, 1cbfc6d53d,
1b86bc7927, dac3a2e545, 7cfdeb7503, 284b2dfe8f, 3219222204, 4af2d73cc8,
4ef0266a71, 2208d85d80, 86852cbbbe, 079a138c2a, 0730864c95, 112e5ae85e,
9511e6271f, baf511a81d, 1728a88f4e
.gitignore (vendored, new file, 9 additions)
@@ -0,0 +1,9 @@
papers/Topic8 Network Traffic Generation/NetDiffusion Network Data Augmentation Through Protocol-Constrained Traffic Generation/NetDiffusion Network Data Augmentation Through Protocol-Constrained Traffic Gener_1_33_translate_20260130214700.pdf
arxiv-style/*.pdf
arxiv-style/*.aux
arxiv-style/*.log
arxiv-style/*.blg
arxiv-style/*.bbl
arxiv-style/*.out
.DS_Store
paper/
API/whatsneeded.md (new file, 32 additions)
@@ -0,0 +1,32 @@
# What's needed for development & research?

Group members should describe their requirements and dependencies in detail in this document.

Current group members:

- HY YAN
- MZ YANG
- ZL CHEN

Each of the following subheadings is a section for one member's requirements.

## ZL CHEN

{placeholder}

## MZ YANG

{placeholder}

## HY YAN

{placeholder}
[image added, 181 KiB]
[image added, 163 KiB]

LaTeX2e+Proceedings+Templates+download/fig1.eps (binary, new file)
LaTeX2e+Proceedings+Templates+download/history.txt (new file, 157 additions)
@@ -0,0 +1,157 @@
Version history for the LLNCS LaTeX2e class

date      filename        version  action/reason/acknowledgements
----------------------------------------------------------------------------
29.5.96   letter.txt      beta     naming problems (subject index file),
                                   thanks to Dr. Martin Held, Salzburg, AT
          subjindx.ind             renamed to subjidx.ind as required
                                   by llncs.dem
          history.txt              introducing this file

30.5.96   llncs.cls                incompatibility with new article.cls of
                                   1995/12/20 v1.3q Standard LaTeX document class,
                                   \if@openbib is no longer defined,
                                   reported by Ralf Heckmann and Graham Gough,
                                   solution by David Carlisle

10.6.96   llncs.cls                problems with fragile commands in \author field,
                                   reported by Michael Gschwind, TU Wien

25.7.96   llncs.cls                revision a corrects:
                                   wrong size of text area, floats not \small,
                                   some LaTeX generated texts;
                                   reported by Michael Sperber, Uni Tuebingen

16.4.97   all files       2.1      leaving beta state,
                                   raising version counter to 2.1

8.6.97    llncs.cls       2.1a     revision a corrects:
                                   unbreakable citation lists, reported by
                                   Sergio Antoy of Portland State University

11.12.97  llncs.cls       2.2      "general" headings centered; two new elements
                                   for the article header: \email and \homedir;
                                   complete revision of special environments:
                                   \newtheorem replaced with \spnewtheorem,
                                   introduced the theopargself environment;
                                   two-column parts made with the multicol package;
                                   add-ons to work with the hyperref package

07.01.98  llncs.cls       2.2      changed \email to simply switch to \tt

25.03.98  llncs.cls       2.3      new class option "oribibl" to suppress
                                   changes to the thebibliography environment
                                   and retain pure LaTeX codes - useful
                                   for most BibTeX applications

16.04.98  llncs.cls       2.3      if option "oribibl" is given, extend the
                                   thebibliography hook with "\small", suggested
                                   by Clemens Ballarin, University of Cambridge

20.11.98  llncs.cls       2.4      pagestyle "titlepage" - useful for
                                   compilation of whole LNCS volumes

12.01.99  llncs.cls       2.5      counters of orthogonal numbered special
                                   environments are reset each new contribution

27.04.99  llncs.cls       2.6      new command \thisbottomragged for the
                                   actual page; indention of the footnote
                                   made variable with \fnindent (default 1em);
                                   new command \url that copies its argument

2.03.00   llncs.cls       2.7      \figurename and \tablename made compatible
                                   with babel, suggested by Jo Hereth, TU Darmstadt;
                                   definition of \url moved to \AtBeginDocument
                                   (allows for the url package of Donald Arseneau),
                                   suggested by Manfred Hauswirth, TU of Vienna;
                                   \large for part entries in the TOC

16.04.00  llncs.cls       2.8      new option "orivec" to preserve the original
                                   vector definition, read "arrow" accent

17.01.01  llncs.cls       2.9      hardwired texts made polyglot,
                                   available languages: english (default),
                                   french, german - all are "babel-proof"

20.06.01  splncs.bst               public release of a BibTeX style for LNCS,
                                   nobly provided by Jason Noble

14.08.01  llncs.cls       2.10     TOC: authors flushleft,
                                   entries without hyphenation; suggested
                                   by Wiro Niessen, Imaging Center - Utrecht

23.01.02  llncs.cls       2.11     fixed footnote number confusion with
                                   \thanks, numbered institutes, and normal
                                   footnote entries; error reported by
                                   Saverio Cittadini, Istituto Tecnico
                                   Industriale "Tito Sarrocchi" - Siena

28.01.02  llncs.cls       2.12     fixed footnote fix; error reported by
                                   Chris Mesterharm, CS Dept. Rutgers - NJ

28.01.02  llncs.cls       2.13     fixed the fix (programmer needs vacation)

17.08.04  llncs.cls       2.14     TOC: authors indented, smart \and handling
                                   for the TOC suggested by Thomas Gabel,
                                   University of Osnabrueck

07.03.06  splncs.bst               fix for BibTeX entries without year; patch
                                   provided by Jerry James, Utah State University

14.06.06  splncs_srt.bst           a sorting BibTeX style for LNCS, feature
                                   provided by Tobias Heindel, FMI Uni-Stuttgart

16.10.06  llncs.dem       2.3      removed affiliations from \tocauthor demo

11.12.07  llncs.doc                note on online visibility of given e-mail address

15.06.09  splncs03.bst             new BibTeX style compliant with the current
                                   requirements, provided by Maurizio "Titto"
                                   Patrignani of Universita' Roma Tre

30.03.10  llncs.cls       2.15     fixed broken hyperref interoperability;
                                   patch provided by Sven Koehler,
                                   Hamburg University of Technology

15.04.10  llncs.cls       2.16     fixed hyperref warning for informatory TOC entries;
                                   introduced \keywords command - finally;
                                   blank removed from \keywordname, flaw reported
                                   by Armin B. Wagner, IGW TU Vienna

15.04.10  llncs.cls       2.17     fixed missing switch "openright" used by \backmatter;
                                   flaw reported by Tobias Pape, University of Potsdam

27.09.13  llncs.cls       2.18     fixed "ngerman" incompatibility; solution provided
                                   by Bastian Pfleging, University of Stuttgart

04.09.17  llncs.cls       2.19     introduced \orcidID command

10.03.18  llncs.cls       2.20     adjusted \doi according to CrossRef requirements;
                                   TOC: removed affiliation numbers
          splncs04.bst             added doi field;
                                   bold journal numbers
          samplepaper.tex          new sample paper
          llncsdoc.pdf             new LaTeX class documentation

12.01.22  llncs.cls       2.21     fixed German and French \maketitle, bug reported by
                                   Alexander Malkis, Technical University of Munich;
                                   use detokenized argument in the definition of \doi
                                   to allow underscores in DOIs

05.09.22  llncs.cls       2.22     robust redefinition of \vec (bold italics), bug
                                   reported by Alexander Malkis, TUM

02.11.23  llncs.cls       2.23     \ackname changed from "Acknowledgements" (BE) to
                                   "Acknowledgments" (AE).
                                   \discintname introduced for the new, mandatory
                                   section "Disclosure of Interests".
                                   New "credits" environment introduced to provide
                                   small run-in headings for "Acknowledgments" and
                                   the "Disclosure of Interests".

29.01.24  llncs.cls       2.24     bugfixes for options envcountsame and envcountsect
LaTeX2e+Proceedings+Templates+download/llncs.cls (new file, 1244 additions)

LaTeX2e+Proceedings+Templates+download/llncsdoc.pdf (binary, new file)
LaTeX2e+Proceedings+Templates+download/main.aux (new file, 161 additions)
@@ -0,0 +1,161 @@
\relax
\citation{10.1007/s10844-022-00753-1,Nankya2023-gp}
\@writefile{toc}{\contentsline {title}{Mask-DDPM: Transformer-Conditioned Mixed-Type Diffusion for Semantically Valid ICS Telemetry Synthesis}{1}{}\protected@file@percent }
\@writefile{toc}{\authcount {4}}
\@writefile{toc}{\contentsline {author}{Zhenglan Chen \and Mingzhe Yang \and Hongyu Yan \and Huan Yang}{1}{}\protected@file@percent }
\@writefile{toc}{\contentsline {section}{\numberline {1}Introduction}{1}{}\protected@file@percent }
\newlabel{sec:intro}{{1}{1}{}{section.1}{}}
\citation{shin}
\citation{info16100910}
\citation{pmlr-v202-kotelnikov23a,rasul2021autoregressivedenoisingdiffusionmodels}
\citation{jiang2023netdiffusionnetworkdataaugmentation}
\citation{pmlr-v202-kotelnikov23a}
\citation{10.1145/1151659.1159928}
\citation{Ring_2019}
\citation{10.1145/3544216.3544251}
\citation{Lin_2020}
\citation{7469060,10.1145/3055366.3055375}
\citation{ho2020denoising}
\citation{song2021score}
\citation{rasul2021autoregressivedenoisingdiffusionmodels}
\citation{tashiro2021csdiconditionalscorebaseddiffusion}
\citation{wen2024diffstgprobabilisticspatiotemporalgraph}
\citation{liu2023pristiconditionaldiffusionframework}
\citation{kong2021diffwaveversatilediffusionmodel}
\citation{11087622}
\@writefile{toc}{\contentsline {section}{\numberline {2}Related Work}{3}{}\protected@file@percent }
\newlabel{sec:related}{{2}{3}{}{section.2}{}}
\citation{austin2021structured}
\citation{Lin_2020}
\citation{hoogeboom2021argmaxflowsmultinomialdiffusion}
\citation{li2022diffusionlmimprovescontrollabletext}
\citation{meng2025aflnetyearslatercoverageguided,godefroid2017learnfuzzmachinelearninginput,she2019neuzzefficientfuzzingneural}
\citation{dai2019transformerxlattentivelanguagemodels}
\citation{zhou2021informerefficienttransformerlong}
\citation{wu2022autoformerdecompositiontransformersautocorrelation}
\citation{zhou2022fedformerfrequencyenhanceddecomposed}
\citation{nie2023patchtst}
\citation{rasul2021autoregressivedenoisingdiffusionmodels,tashiro2021csdiconditionalscorebaseddiffusion,wen2024diffstgprobabilisticspatiotemporalgraph,liu2023pristiconditionaldiffusionframework,kong2021diffwaveversatilediffusionmodel,11087622}
\citation{nist2023sp80082}
\citation{ho2020denoising,song2021score}
\citation{kollovieh2023tsdiff,sikder2023transfusion}
\@writefile{toc}{\contentsline {section}{\numberline {3}Methodology}{5}{}\protected@file@percent }
\newlabel{sec:method}{{3}{5}{}{section.3}{}}
\citation{vaswani2017attention}
\citation{ho2020denoising,kollovieh2023tsdiff}
\citation{austin2021structured,shi2024simplified}
\citation{yuan2025ctu,sha2026ddpm}
\citation{vaswani2017attention}
\citation{vaswani2017attention,nist2023sp80082}
\@writefile{lof}{\contentsline {figure}{\numberline {1}{\ignorespaces Masked-DDPM: Unified Synthesis for ICS traffic}}{6}{}\protected@file@percent }
\newlabel{fig:design}{{1}{6}{}{figure.1}{}}
\@writefile{toc}{\contentsline {subsection}{\numberline {3.1}Transformer trend module for continuous dynamics}{6}{}\protected@file@percent }
\newlabel{sec:method-trans}{{3.1}{6}{}{subsection.3.1}{}}
\citation{kollovieh2023tsdiff,sikder2023transfusion}
\citation{vaswani2017attention,kollovieh2023tsdiff,yuan2025ctu}
\citation{ho2020denoising}
\citation{ho2020denoising,song2021score}
\citation{kollovieh2023tsdiff,sikder2023transfusion}
\newlabel{eq:additive_decomp}{{1}{7}{}{equation.1}{}}
\newlabel{eq:trend_prediction}{{2}{7}{}{equation.2}{}}
\newlabel{eq:trend_loss}{{3}{7}{}{equation.3}{}}
\@writefile{toc}{\contentsline {subsection}{\numberline {3.2}DDPM for continuous residual generation}{7}{}\protected@file@percent }
\newlabel{sec:method-ddpm}{{3.2}{7}{}{subsection.3.2}{}}
\citation{ho2020denoising,sikder2023transfusion}
\citation{hang2023efficient}
\citation{yuan2025ctu,sha2026ddpm}
\citation{austin2021structured,shi2024simplified}
\citation{nist2023sp80082}
\citation{shi2024simplified}
\newlabel{eq:forward_corruption}{{4}{8}{}{equation.4}{}}
\newlabel{eq:forward_corruption_eq}{{5}{8}{}{equation.5}{}}
\newlabel{eq:reverse_process}{{6}{8}{}{equation.6}{}}
\newlabel{eq:ddpm_loss}{{7}{8}{}{equation.7}{}}
\newlabel{eq:snr_loss}{{8}{8}{}{equation.8}{}}
\@writefile{toc}{\contentsline {subsection}{\numberline {3.3}Masked diffusion for discrete ICS variables}{8}{}\protected@file@percent }
\newlabel{sec:method-discrete}{{3.3}{8}{}{subsection.3.3}{}}
\citation{nist2023sp80082}
\citation{shi2024simplified,yuan2025ctu}
\citation{nist2023sp80082}
\newlabel{eq:masking_process}{{9}{9}{}{equation.9}{}}
\newlabel{eq:discrete_denoising}{{10}{9}{}{equation.10}{}}
\newlabel{eq:discrete_loss}{{11}{9}{}{equation.11}{}}
\@writefile{toc}{\contentsline {subsection}{\numberline {3.4}Type-aware decomposition as factorization and routing layer}{9}{}\protected@file@percent }
\newlabel{sec:method-types}{{3.4}{9}{}{subsection.3.4}{}}
\citation{shi2025tabdiff,yuan2025ctu,nist2023sp80082}
\citation{kollovieh2023tsdiff,sikder2023transfusion}
\@writefile{lof}{\contentsline {figure}{\numberline {2}{\ignorespaces Type assignment and six-type taxonomy.}}{11}{}\protected@file@percent }
\newlabel{fig:type_taxonomy}{{2}{11}{}{figure.2}{}}
\@writefile{toc}{\contentsline {subsection}{\numberline {3.5}Joint optimization and sampling}{11}{}\protected@file@percent }
\newlabel{sec:method-joint}{{3.5}{11}{}{subsection.3.5}{}}
\citation{ho2020denoising,shi2024simplified,yuan2025ctu,nist2023sp80082}
\citation{coletta2023constrained,yang2001interlock,stenger2024survey}
\citation{lin1991divergence,yoon2019timegan}
\@writefile{toc}{\contentsline {section}{\numberline {4}Benchmark}{12}{}\protected@file@percent }
\newlabel{sec:benchmark}{{4}{12}{}{section.4}{}}
\@writefile{toc}{\contentsline {subsection}{\numberline {4.1}Core fidelity, legality, and reproducibility}{12}{}\protected@file@percent }
\newlabel{sec:benchmark-quant}{{4.1}{12}{}{subsection.4.1}{}}
\@writefile{lof}{\contentsline {figure}{\numberline {3}{\ignorespaces Benchmark evidence chain.}}{13}{}\protected@file@percent }
\newlabel{fig:benchmark_story}{{3}{13}{}{figure.3}{}}
\@writefile{lot}{\contentsline {table}{\numberline {1}{\ignorespaces Core benchmark summary. Lower is better except for validity rate.}}{13}{}\protected@file@percent }
\newlabel{tab:core_metrics}{{1}{13}{}{table.1}{}}
\@writefile{toc}{\contentsline {subsection}{\numberline {4.2}Type-aware diagnostics}{14}{}\protected@file@percent }
\newlabel{sec:benchmark-typed}{{4.2}{14}{}{subsection.4.2}{}}
\@writefile{lot}{\contentsline {table}{\numberline {2}{\ignorespaces Type-aware diagnostic summary. Lower values indicate better alignment.}}{14}{}\protected@file@percent }
\newlabel{tab:typed_diagnostics}{{2}{14}{}{table.2}{}}
\@writefile{toc}{\contentsline {subsection}{\numberline {4.3}Ablation study}{14}{}\protected@file@percent }
\newlabel{sec:benchmark-ablation}{{4.3}{14}{}{subsection.4.3}{}}
\@writefile{lof}{\contentsline {figure}{\numberline {4}{\ignorespaces Ablation impact.}}{15}{}\protected@file@percent }
\newlabel{fig:ablation_impact}{{4}{15}{}{figure.4}{}}
\@writefile{lot}{\contentsline {table}{\numberline {3}{\ignorespaces Ablation study. Lower is better except for anomaly AUPRC.}}{15}{}\protected@file@percent }
\newlabel{tab:ablation}{{3}{15}{}{table.3}{}}
\bibstyle{splncs04}
\bibdata{references}
\bibcite{10.1145/3055366.3055375}{1}
\bibcite{info16100910}{2}
\bibcite{austin2021structured}{3}
\@writefile{toc}{\contentsline {section}{\numberline {5}Conclusion and Future Work}{16}{}\protected@file@percent }
\newlabel{sec:conclusion}{{5}{16}{}{section.5}{}}
\bibcite{coletta2023constrained}{4}
\bibcite{dai2019transformerxlattentivelanguagemodels}{5}
\bibcite{godefroid2017learnfuzzmachinelearninginput}{6}
\bibcite{hang2023efficient}{7}
\bibcite{ho2020denoising}{8}
\bibcite{hoogeboom2021argmaxflowsmultinomialdiffusion}{9}
\bibcite{jiang2023netdiffusionnetworkdataaugmentation}{10}
\bibcite{10.1007/s10844-022-00753-1}{11}
\bibcite{kollovieh2023tsdiff}{12}
\bibcite{kong2021diffwaveversatilediffusionmodel}{13}
\bibcite{pmlr-v202-kotelnikov23a}{14}
\bibcite{li2022diffusionlmimprovescontrollabletext}{15}
\bibcite{lin1991divergence}{16}
\bibcite{Lin_2020}{17}
\bibcite{liu2023pristiconditionaldiffusionframework}{18}
\bibcite{11087622}{19}
\bibcite{7469060}{20}
\bibcite{meng2025aflnetyearslatercoverageguided}{21}
\bibcite{Nankya2023-gp}{22}
\bibcite{nist2023sp80082}{23}
\bibcite{nie2023patchtst}{24}
\bibcite{rasul2021autoregressivedenoisingdiffusionmodels}{25}
\bibcite{Ring_2019}{26}
\bibcite{sha2026ddpm}{27}
\bibcite{she2019neuzzefficientfuzzingneural}{28}
\bibcite{shi2024simplified}{29}
\bibcite{shi2025tabdiff}{30}
\bibcite{shin}{31}
\bibcite{sikder2023transfusion}{32}
\bibcite{song2021score}{33}
\bibcite{stenger2024survey}{34}
\bibcite{tashiro2021csdiconditionalscorebaseddiffusion}{35}
\bibcite{vaswani2017attention}{36}
\bibcite{10.1145/1151659.1159928}{37}
\bibcite{wen2024diffstgprobabilisticspatiotemporalgraph}{38}
\bibcite{wu2022autoformerdecompositiontransformersautocorrelation}{39}
\bibcite{yang2001interlock}{40}
\bibcite{10.1145/3544216.3544251}{41}
\bibcite{yoon2019timegan}{42}
\bibcite{yuan2025ctu}{43}
\bibcite{zhou2021informerefficienttransformerlong}{44}
\bibcite{zhou2022fedformerfrequencyenhanceddecomposed}{45}
\gdef \@abspage@last{20}
LaTeX2e+Proceedings+Templates+download/main.bbl (new file, 308 additions)
@@ -0,0 +1,308 @@
\begin{thebibliography}{10}
\providecommand{\url}[1]{\texttt{#1}}
\providecommand{\urlprefix}{URL }
\providecommand{\doi}[1]{https://doi.org/#1}

\bibitem{10.1145/3055366.3055375}
Ahmed, C.M., Palleti, V.R., Mathur, A.P.: Wadi: a water distribution testbed
for research in the design of secure cyber physical systems. In: Proceedings
of the 3rd International Workshop on Cyber-Physical Systems for Smart Water
Networks. p. 25–28. CySWATER '17, Association for Computing Machinery, New
York, NY, USA (2017). \doi{10.1145/3055366.3055375},
\url{https://doi.org/10.1145/3055366.3055375}

\bibitem{info16100910}
Ali, J., Ali, S., Al~Balushi, T., Nadir, Z.: Intrusion detection in industrial
control systems using transfer learning guided by reinforcement learning.
Information \textbf{16}(10) (2025). \doi{10.3390/info16100910},
\url{https://www.mdpi.com/2078-2489/16/10/910}

\bibitem{austin2021structured}
Austin, J., Johnson, D.D., Ho, J., Tarlow, D., van~den Berg, R.: Structured
denoising diffusion models in discrete state-spaces. In: Ranzato, M.,
Beygelzimer, A., Dauphin, Y., Liang, P., Vaughan, J.W. (eds.) Advances in
Neural Information Processing Systems. vol.~34, pp. 17981--17993. Curran
Associates, Inc. (2021),
\url{https://proceedings.neurips.cc/paper_files/paper/2021/file/958c530554f78bcd8e97125b70e6973d-Paper.pdf}

\bibitem{coletta2023constrained}
Coletta, A., Gopalakrishnan, S., Borrajo, D., Vyetrenko, S.: On the constrained
time-series generation problem. In: Oh, A., Naumann, T., Globerson, A.,
Saenko, K., Hardt, M., Levine, S. (eds.) Advances in Neural Information
Processing Systems. vol.~36, pp. 61048--61059. Curran Associates, Inc.
(2023),
\url{https://proceedings.neurips.cc/paper_files/paper/2023/file/bfb6a69c0d9e2bc596e1cd31f16fcdde-Paper-Conference.pdf}

\bibitem{dai2019transformerxlattentivelanguagemodels}
Dai, Z., Yang, Z., Yang, Y., Carbonell, J., Le, Q., Salakhutdinov, R.:
Transformer-{XL}: Attentive language models beyond a fixed-length context.
In: Korhonen, A., Traum, D., M{\`a}rquez, L. (eds.) Proceedings of the 57th
Annual Meeting of the Association for Computational Linguistics. pp.
2978--2988. Association for Computational Linguistics, Florence, Italy (Jul
2019). \doi{10.18653/v1/P19-1285}, \url{https://aclanthology.org/P19-1285/}

\bibitem{godefroid2017learnfuzzmachinelearninginput}
Godefroid, P., Peleg, H., Singh, R.: Learn\&fuzz: Machine learning for input
fuzzing. In: 2017 32nd IEEE/ACM International Conference on Automated
Software Engineering (ASE). pp. 50--59 (2017). \doi{10.1109/ASE.2017.8115618}

\bibitem{hang2023efficient}
Hang, T., Gu, S., Li, C., Bao, J., Chen, D., Hu, H., Geng, X., Guo, B.:
Efficient diffusion training via min-snr weighting strategy. In: Proceedings
of the IEEE/CVF International Conference on Computer Vision (ICCV). pp.
7441--7451 (October 2023)

\bibitem{ho2020denoising}
Ho, J., Jain, A., Abbeel, P.: Denoising diffusion probabilistic models. In:
Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., Lin, H. (eds.) Advances
in Neural Information Processing Systems. vol.~33, pp. 6840--6851. Curran
Associates, Inc. (2020),
\url{https://proceedings.neurips.cc/paper_files/paper/2020/file/4c5bcfec8584af0d967f1ab10179ca4b-Paper.pdf}

\bibitem{hoogeboom2021argmaxflowsmultinomialdiffusion}
Hoogeboom, E., Nielsen, D., Jaini, P., Forr\'{e}, P., Welling, M.: Argmax flows
and multinomial diffusion: Learning categorical distributions. In: Ranzato,
M., Beygelzimer, A., Dauphin, Y., Liang, P., Vaughan, J.W. (eds.) Advances in
Neural Information Processing Systems. vol.~34, pp. 12454--12465. Curran
Associates, Inc. (2021),
\url{https://proceedings.neurips.cc/paper_files/paper/2021/file/67d96d458abdef21792e6d8e590244e7-Paper.pdf}

\bibitem{jiang2023netdiffusionnetworkdataaugmentation}
Jiang, X., Liu, S., Gember-Jacobson, A., Bhagoji, A.N., Schmitt, P., Bronzino,
F., Feamster, N.: Netdiffusion: Network data augmentation through
protocol-constrained traffic generation. Proc. ACM Meas. Anal. Comput. Syst.
\textbf{8}(1) (Feb 2024). \doi{10.1145/3639037},
\url{https://doi.org/10.1145/3639037}

\bibitem{10.1007/s10844-022-00753-1}
Koay, A.M., Ko, R.K.L., Hettema, H., Radke, K.: Machine learning in industrial
control system (ics) security: current landscape, opportunities and
challenges. Journal of Intelligent Information Systems \textbf{60}(2),
377--405 (2023)

\bibitem{kollovieh2023tsdiff}
Kollovieh, M., Ansari, A.F., Bohlke-Schneider, M., Zschiegner, J., Wang, H.,
Wang, Y.B.: Predict, refine, synthesize: Self-guiding diffusion models for
probabilistic time series forecasting. In: Oh, A., Naumann, T., Globerson,
A., Saenko, K., Hardt, M., Levine, S. (eds.) Advances in Neural Information
Processing Systems. vol.~36, pp. 28341--28364. Curran Associates, Inc.
(2023),
\url{https://proceedings.neurips.cc/paper_files/paper/2023/file/5a1a10c2c2c9b9af1514687bc24b8f3d-Paper-Conference.pdf}

\bibitem{kong2021diffwaveversatilediffusionmodel}
Kong, Z., Ping, W., Huang, J., Zhao, K., Catanzaro, B.: Diffwave: A versatile
diffusion model for audio synthesis (2021),
\url{https://arxiv.org/abs/2009.09761}

\bibitem{pmlr-v202-kotelnikov23a}
Kotelnikov, A., Baranchuk, D., Rubachev, I., Babenko, A.: {T}ab{DDPM}:
Modelling tabular data with diffusion models. In: Krause, A., Brunskill, E.,
Cho, K., Engelhardt, B., Sabato, S., Scarlett, J. (eds.) Proceedings of the
40th International Conference on Machine Learning. Proceedings of Machine
Learning Research, vol.~202, pp. 17564--17579. PMLR (23--29 Jul 2023),
\url{https://proceedings.mlr.press/v202/kotelnikov23a.html}

\bibitem{li2022diffusionlmimprovescontrollabletext}
Li, X., Thickstun, J., Gulrajani, I., Liang, P.S., Hashimoto, T.B.:
Diffusion-lm improves controllable text generation. In: Koyejo, S., Mohamed,
S., Agarwal, A., Belgrave, D., Cho, K., Oh, A. (eds.) Advances in Neural
Information Processing Systems. vol.~35, pp. 4328--4343. Curran Associates,
Inc. (2022),
\url{https://proceedings.neurips.cc/paper_files/paper/2022/file/1be5bc25d50895ee656b8c2d9eb89d6a-Paper-Conference.pdf}

\bibitem{lin1991divergence}
Lin, J.: Divergence measures based on the shannon entropy. IEEE Transactions on
Information Theory \textbf{37}(1), 145--151 (1991). \doi{10.1109/18.61115}

\bibitem{Lin_2020}
Lin, Z., Jain, A., Wang, C., Fanti, G., Sekar, V.: Using gans for sharing
networked time series data: Challenges, initial promise, and open questions.
In: Proceedings of the ACM Internet Measurement Conference. p. 464–483. IMC
'20, Association for Computing Machinery, New York, NY, USA (2020).
\doi{10.1145/3419394.3423643}, \url{https://doi.org/10.1145/3419394.3423643}

\bibitem{liu2023pristiconditionaldiffusionframework}
Liu, M., Huang, H., Feng, H., Sun, L., Du, B., Fu, Y.: Pristi: A conditional
diffusion framework for spatiotemporal imputation. In: 2023 IEEE 39th
International Conference on Data Engineering (ICDE). pp. 1927--1939 (2023).
\doi{10.1109/ICDE55515.2023.00150}

\bibitem{11087622}
Liu, X., Xu, X., Liu, Z., Li, Z., Wu, K.: Spatio-temporal diffusion model for
cellular traffic generation. IEEE Transactions on Mobile Computing
\textbf{25}(1), 257--271 (2026). \doi{10.1109/TMC.2025.3591183}

\bibitem{7469060}
Mathur, A.P., Tippenhauer, N.O.: Swat: a water treatment testbed for research
and training on ics security. In: 2016 International Workshop on
Cyber-physical Systems for Smart Water Networks (CySWater). pp. 31--36
(2016). \doi{10.1109/CySWater.2016.7469060}

\bibitem{meng2025aflnetyearslatercoverageguided}
Meng, R., Pham, V.T., Böhme, M., Roychoudhury, A.: Aflnet five years later: On
coverage-guided protocol fuzzing. IEEE Transactions on Software Engineering
\textbf{51}(4), 960--974 (2025). \doi{10.1109/TSE.2025.3535925}

\bibitem{Nankya2023-gp}
Nankya, M., Chataut, R., Akl, R.: Securing industrial control systems:
Components, cyber threats, and machine learning-driven defense strategies.
Sensors \textbf{23}(21) (2023). \doi{10.3390/s23218840},
\url{https://www.mdpi.com/1424-8220/23/21/8840}

\bibitem{nist2023sp80082}
{National Institute of Standards and Technology}: Guide to operational
technology (ot) security. Special Publication 800-82 Rev. 3, NIST (sep 2023).
\doi{10.6028/NIST.SP.800-82r3},
\url{https://csrc.nist.gov/pubs/sp/800/82/r3/final}

\bibitem{nie2023patchtst}
Nie, Y., Nguyen, N.H., Sinthong, P., Kalagnanam, J.: A time series is worth 64
words: Long-term forecasting with transformers. In: International Conference
on Learning Representations (ICLR) (2023),
\url{https://arxiv.org/abs/2211.14730}

\bibitem{rasul2021autoregressivedenoisingdiffusionmodels}
Rasul, K., Seward, C., Schuster, I., Vollgraf, R.: Autoregressive denoising
diffusion models for multivariate probabilistic time series forecasting. In:
Meila, M., Zhang, T. (eds.) Proceedings of the 38th International Conference
on Machine Learning. Proceedings of Machine Learning Research, vol.~139, pp.
8857--8868. PMLR (18--24 Jul 2021),
\url{https://proceedings.mlr.press/v139/rasul21a.html}

\bibitem{Ring_2019}
Ring, M., Schlör, D., Landes, D., Hotho, A.: Flow-based network traffic
generation using generative adversarial networks. Computers \& Security
\textbf{82}, 156--172 (2019).
\doi{https://doi.org/10.1016/j.cose.2018.12.012},
\url{https://www.sciencedirect.com/science/article/pii/S0167404818308393}

\bibitem{sha2026ddpm}
Sha, Y., Yuan, Y., Wu, Y., Zhao, H.: Ddpm fusing mamba and adaptive attention:
An augmentation method for industrial control systems anomaly data (jan
2026). \doi{10.2139/ssrn.6055903},
\url{https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6055903}, SSRN
Electronic Journal

\bibitem{she2019neuzzefficientfuzzingneural}
She, D., Pei, K., Epstein, D., Yang, J., Ray, B., Jana, S.: Neuzz: Efficient
fuzzing with neural program smoothing. In: 2019 IEEE Symposium on Security
and Privacy (SP). pp. 803--817 (2019). \doi{10.1109/SP.2019.00052}

\bibitem{shi2024simplified}
Shi, J., Han, K., Wang, Z., Doucet, A., Titsias, M.: Simplified and generalized
masked diffusion for discrete data. In: Globerson, A., Mackey, L., Belgrave,
D., Fan, A., Paquet, U., Tomczak, J., Zhang, C. (eds.) Advances in Neural
Information Processing Systems. vol.~37, pp. 103131--103167. Curran
Associates, Inc. (2024). \doi{10.52202/079017-3277},
\url{https://proceedings.neurips.cc/paper_files/paper/2024/file/bad233b9849f019aead5e5cc60cef70f-Paper-Conference.pdf}

\bibitem{shi2025tabdiff}
Shi, J., Xu, M., Hua, H., Zhang, H., Ermon, S., Leskovec, J.: Tabdiff: a
mixed-type diffusion model for tabular data generation (2025),
\url{https://arxiv.org/abs/2410.20626}

\bibitem{shin}
Shin, H.K., Lee, W., Choi, S., Yun, J.H., Min, B.G., Kim, H.: Hai security
dataset (2023). \doi{10.34740/kaggle/dsv/5821622},
\url{https://www.kaggle.com/dsv/5821622}

\bibitem{sikder2023transfusion}
Sikder, M.F., Ramachandranpillai, R., Heintz, F.: Transfusion: Generating long,
high fidelity time series using diffusion models with transformers. Machine
Learning with Applications \textbf{20}, 100652 (2025).
\doi{https://doi.org/10.1016/j.mlwa.2025.100652},
\url{https://www.sciencedirect.com/science/article/pii/S2666827025000350}

\bibitem{song2021score}
Song, Y., Sohl-Dickstein, J., Kingma, D.P., Kumar, A., Ermon, S., Poole, B.:
Score-based generative modeling through stochastic differential equations
(2021), \url{https://arxiv.org/abs/2011.13456}

\bibitem{stenger2024survey}
Stenger, M., Leppich, R., Foster, I.T., Kounev, S., Bauer, A.: Evaluation is
key: a survey on evaluation measures for synthetic time series. Journal of
Big Data \textbf{11}(1), ~66 (2024)

\bibitem{tashiro2021csdiconditionalscorebaseddiffusion}
Tashiro, Y., Song, J., Song, Y., Ermon, S.: Csdi: Conditional score-based
diffusion models for probabilistic time series imputation. In: Ranzato, M.,
Beygelzimer, A., Dauphin, Y., Liang, P., Vaughan, J.W. (eds.) Advances in
Neural Information Processing Systems. vol.~34, pp. 24804--24816. Curran
Associates, Inc. (2021),
\url{https://proceedings.neurips.cc/paper_files/paper/2021/file/cfe8504bda37b575c70ee1a8276f3486-Paper.pdf}

\bibitem{vaswani2017attention}
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N.,
Kaiser, L.u., Polosukhin, I.: Attention is all you need. In: Guyon, I.,
Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S.,
Garnett, R. (eds.) Advances in Neural Information Processing Systems.
vol.~30. Curran Associates, Inc. (2017),
\url{https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf}

\bibitem{10.1145/1151659.1159928}
Vishwanath, K.V., Vahdat, A.: Realistic and responsive network traffic
generation. SIGCOMM Comput. Commun. Rev. \textbf{36}(4), 111–122 (Aug
2006). \doi{10.1145/1151659.1159928},
\url{https://doi.org/10.1145/1151659.1159928}

\bibitem{wen2024diffstgprobabilisticspatiotemporalgraph}
Wen, H., Lin, Y., Xia, Y., Wan, H., Wen, Q., Zimmermann, R., Liang, Y.:
Diffstg: Probabilistic spatio-temporal graph forecasting with denoising
diffusion models. In: Proceedings of the 31st ACM International Conference on
Advances in Geographic Information Systems. SIGSPATIAL '23, Association for
Computing Machinery, New York, NY, USA (2023). \doi{10.1145/3589132.3625614},
\url{https://doi.org/10.1145/3589132.3625614}

\bibitem{wu2022autoformerdecompositiontransformersautocorrelation}
Wu, H., Xu, J., Wang, J., Long, M.: Autoformer: Decomposition transformers with
auto-correlation for long-term series forecasting. In: Ranzato, M.,
Beygelzimer, A., Dauphin, Y., Liang, P., Vaughan, J.W. (eds.) Advances in
Neural Information Processing Systems. vol.~34, pp. 22419--22430. Curran
Associates, Inc. (2021),
\url{https://proceedings.neurips.cc/paper_files/paper/2021/file/bcc0d400288793e8bdcd7c19a8ac0c2b-Paper.pdf}

\bibitem{yang2001interlock}
Yang, S., Tan, L., He, C.: Automatic verification of safety interlock systems
for industrial processes. Journal of Loss Prevention in the Process
Industries \textbf{14}(5), 379--386 (2001).
\doi{https://doi.org/10.1016/S0950-4230(01)00014-6},
\url{https://www.sciencedirect.com/science/article/pii/S0950423001000146}

\bibitem{10.1145/3544216.3544251}
Yin, Y., Lin, Z., Jin, M., Fanti, G., Sekar, V.: Practical gan-based synthetic
ip header trace generation using netshare. In: Proceedings of the ACM SIGCOMM
2022 Conference. p. 458–472. SIGCOMM '22, Association for Computing
Machinery, New York, NY, USA (2022). \doi{10.1145/3544216.3544251},
\url{https://doi.org/10.1145/3544216.3544251}

\bibitem{yoon2019timegan}
Yoon, J., Jarrett, D., van~der Schaar, M.: Time-series generative adversarial
networks. In: Wallach, H., Larochelle, H., Beygelzimer, A., d\textquotesingle
Alch\'{e}-Buc, F., Fox, E., Garnett, R. (eds.) Advances in Neural Information
Processing Systems. vol.~32. Curran Associates, Inc. (2019),
\url{https://proceedings.neurips.cc/paper_files/paper/2019/file/c9efe5f26cd17ba6216bbe2a7d26d490-Paper.pdf}

\bibitem{yuan2025ctu}
Yuan, Y., Sha, Y., Zhao, H.: Ctu-ddpm: Generating industrial control system
time-series data with a cnn-transformer hybrid diffusion model. In:
Proceedings of the 2025 International Symposium on Artificial Intelligence
and Computational Social Sciences. p. 547–552. AICSS '25, Association for
Computing Machinery, New York, NY, USA (2025). \doi{10.1145/3776759.3776845},
\url{https://doi.org/10.1145/3776759.3776845}

\bibitem{zhou2021informerefficienttransformerlong}
Zhou, H., Zhang, S., Peng, J., Zhang, S., Li, J., Xiong, H., Zhang, W.:
Informer: Beyond efficient transformer for long sequence time-series
forecasting. Proceedings of the AAAI Conference on Artificial Intelligence
\textbf{35}(12), 11106--11115 (May 2021). \doi{10.1609/aaai.v35i12.17325},
\url{https://ojs.aaai.org/index.php/AAAI/article/view/17325}

\bibitem{zhou2022fedformerfrequencyenhanceddecomposed}
Zhou, T., Ma, Z., Wen, Q., Wang, X., Sun, L., Jin, R.: {FED}former: Frequency
enhanced decomposed transformer for long-term series forecasting. In:
Chaudhuri, K., Jegelka, S., Song, L., Szepesvari, C., Niu, G., Sabato, S.
(eds.) Proceedings of the 39th International Conference on Machine Learning.
Proceedings of Machine Learning Research, vol.~162, pp. 27268--27286. PMLR
(17--23 Jul 2022), \url{https://proceedings.mlr.press/v162/zhou22g.html}

\end{thebibliography}
LaTeX2e+Proceedings+Templates+download/main.blg (new file, 48 additions)
@@ -0,0 +1,48 @@
This is BibTeX, Version 0.99e
Capacity: max_strings=200000, hash_size=200000, hash_prime=170003
The top-level auxiliary file: main.aux
Reallocating 'name_of_file' (item size: 1) to 9 items.
The style file: splncs04.bst
Reallocating 'name_of_file' (item size: 1) to 11 items.
Database file #1: references.bib
You've used 45 entries,
2850 wiz_defined-function locations,
919 strings with 18435 characters,
and the built_in function-call counts, 34400 in all, are:
= -- 2785
> -- 1365
< -- 60
+ -- 534
- -- 488
* -- 2411
:= -- 4311
add.period$ -- 109
call.type$ -- 45
change.case$ -- 392
chr.to.int$ -- 0
cite$ -- 45
duplicate$ -- 2949
empty$ -- 2813
format.name$ -- 546
if$ -- 7511
int.to.chr$ -- 0
int.to.str$ -- 45
missing$ -- 645
newline$ -- 141
num.names$ -- 120
pop$ -- 1153
preamble$ -- 1
purify$ -- 294
quote$ -- 0
skip$ -- 902
stack$ -- 0
substring$ -- 1833
swap$ -- 1830
text.length$ -- 60
text.prefix$ -- 0
top$ -- 0
type$ -- 180
warning$ -- 0
while$ -- 224
width$ -- 47
write$ -- 561
LaTeX2e+Proceedings+Templates+download/main.log (new file, 752 additions)
@@ -0,0 +1,752 @@
This is pdfTeX, Version 3.141592653-2.6-1.40.28 (MiKTeX 25.12) (preloaded format=pdflatex 2026.4.14) 21 APR 2026 00:15
entering extended mode
restricted \write18 enabled.
%&-line parsing enabled.
**./main.tex
(main.tex
LaTeX2e <2025-11-01>
L3 programming layer <2025-12-24>
(llncs.cls
Document Class: llncs 2024/01/29 v2.24
LaTeX document class for Lecture Notes in Computer Science
(D:\MikTex\tex/latex/base\article.cls
Document Class: article 2025/01/22 v1.4n Standard LaTeX document class
(D:\MikTex\tex/latex/base\size10.clo
File: size10.clo 2025/01/22 v1.4n Standard LaTeX file (size option)
)
\c@part=\count275
\c@section=\count276
\c@subsection=\count277
\c@subsubsection=\count278
\c@paragraph=\count279
\c@subparagraph=\count280
\c@figure=\count281
\c@table=\count282
\abovecaptionskip=\skip49
\belowcaptionskip=\skip50
\bibindent=\dimen148
) (D:\MikTex\tex/latex/tools\multicol.sty
Package: multicol 2025/10/21 v2.0b multicolumn formatting (FMi)
\c@tracingmulticols=\count283
\mult@box=\box53
\multicol@leftmargin=\dimen149
\c@unbalance=\count284
\c@collectmore=\count285
\doublecol@number=\count286
\multicoltolerance=\count287
\multicolpretolerance=\count288
\full@width=\dimen150
\page@free=\dimen151
\premulticols=\dimen152
\postmulticols=\dimen153
\multicolsep=\skip51
\multicolbaselineskip=\skip52
\partial@page=\box54
\last@line=\box55
\mc@boxedresult=\box56
\maxbalancingoverflow=\dimen154
\mult@rightbox=\box57
\mult@grightbox=\box58
\mult@firstbox=\box59
\mult@gfirstbox=\box60
\@tempa=\box61
\@tempa=\box62
\@tempa=\box63
\@tempa=\box64
\@tempa=\box65
\@tempa=\box66
\@tempa=\box67
\@tempa=\box68
\@tempa=\box69
\@tempa=\box70
\@tempa=\box71
\@tempa=\box72
\@tempa=\box73
\@tempa=\box74
\@tempa=\box75
\@tempa=\box76
\@tempa=\box77
\@tempa=\box78
\@tempa=\box79
\@tempa=\box80
\@tempa=\box81
\@tempa=\box82
\@tempa=\box83
\@tempa=\box84
\@tempa=\box85
\@tempa=\box86
\@tempa=\box87
\@tempa=\box88
\@tempa=\box89
\@tempa=\box90
\@tempa=\box91
\@tempa=\box92
\@tempa=\box93
\@tempa=\box94
\@tempa=\box95
\@tempa=\box96
\c@minrows=\count289
\c@columnbadness=\count290
\c@finalcolumnbadness=\count291
\last@try=\dimen155
\multicolovershoot=\dimen156
\multicolundershoot=\dimen157
\mult@nat@firstbox=\box97
\colbreak@box=\box98
\mc@col@check@num=\count292
\g__mc_curr_col_int=\count293
) (D:\MikTex\tex/latex/oberdiek\aliascnt.sty
Package: aliascnt 2018/09/07 v1.5 Alias counters (HO)
)
\c@chapter=\count294
LaTeX Font Info: Redeclaring math symbol \Gamma on input line 379.
LaTeX Font Info: Redeclaring math symbol \Delta on input line 380.
LaTeX Font Info: Redeclaring math symbol \Theta on input line 381.
LaTeX Font Info: Redeclaring math symbol \Lambda on input line 382.
LaTeX Font Info: Redeclaring math symbol \Xi on input line 383.
LaTeX Font Info: Redeclaring math symbol \Pi on input line 384.
LaTeX Font Info: Redeclaring math symbol \Sigma on input line 385.
LaTeX Font Info: Redeclaring math symbol \Upsilon on input line 386.
LaTeX Font Info: Redeclaring math symbol \Phi on input line 387.
LaTeX Font Info: Redeclaring math symbol \Psi on input line 388.
LaTeX Font Info: Redeclaring math symbol \Omega on input line 389.
LaTeX Info: Redefining \vec on input line 394.
\tocchpnum=\dimen158
\tocsecnum=\dimen159
\tocsectotal=\dimen160
\tocsubsecnum=\dimen161
\tocsubsectotal=\dimen162
\tocsubsubsecnum=\dimen163
\tocsubsubsectotal=\dimen164
\tocparanum=\dimen165
\tocparatotal=\dimen166
\tocsubparanum=\dimen167
\@tempcntc=\count295
\fnindent=\dimen168
\c@@inst=\count296
\c@@auth=\count297
\c@auco=\count298
\instindent=\dimen169
\authrun=\box99
\authorrunning=\toks17
\tocauthor=\toks18
\titrun=\box100
\titlerunning=\toks19
\toctitle=\toks20
\c@theorem=\count299
\c@case=\count300
\c@conjecture=\count301
\c@corollary=\count302
\c@definition=\count303
\c@example=\count304
\c@exercise=\count305
\c@lemma=\count306
\c@note=\count307
\c@problem=\count308
\c@property=\count309
\c@proposition=\count310
\c@question=\count311
\c@solution=\count312
\c@remark=\count313
\headlineindent=\dimen170
)
(D:\MikTex\tex/latex/base\fontenc.sty
Package: fontenc 2025/07/18 v2.1d Standard LaTeX package
) (D:\MikTex\tex/latex/lm\lmodern.sty
Package: lmodern 2015/05/01 v1.6.1 Latin Modern Fonts
LaTeX Font Info: Overwriting symbol font `operators' in version `normal'
(Font) OT1/cmr/m/n --> OT1/lmr/m/n on input line 22.
LaTeX Font Info: Overwriting symbol font `letters' in version `normal'
(Font) OML/cmm/m/it --> OML/lmm/m/it on input line 23.
LaTeX Font Info: Overwriting symbol font `symbols' in version `normal'
(Font) OMS/cmsy/m/n --> OMS/lmsy/m/n on input line 24.
LaTeX Font Info: Overwriting symbol font `largesymbols' in version `normal'
(Font) OMX/cmex/m/n --> OMX/lmex/m/n on input line 25.
LaTeX Font Info: Overwriting symbol font `operators' in version `bold'
(Font) OT1/cmr/bx/n --> OT1/lmr/bx/n on input line 26.
LaTeX Font Info: Overwriting symbol font `letters' in version `bold'
(Font) OML/cmm/b/it --> OML/lmm/b/it on input line 27.
LaTeX Font Info: Overwriting symbol font `symbols' in version `bold'
(Font) OMS/cmsy/b/n --> OMS/lmsy/b/n on input line 28.
LaTeX Font Info: Overwriting symbol font `largesymbols' in version `bold'
(Font) OMX/cmex/m/n --> OMX/lmex/m/n on input line 29.
LaTeX Font Info: Overwriting math alphabet `\mathbf' in version `normal'
(Font) OT1/cmr/bx/n --> OT1/lmr/bx/n on input line 31.
LaTeX Font Info: Overwriting math alphabet `\mathsf' in version `normal'
(Font) OT1/cmss/m/n --> OT1/lmss/m/n on input line 32.
LaTeX Font Info: Overwriting math alphabet `\mathit' in version `normal'
(Font) OT1/cmr/m/it --> OT1/lmr/m/it on input line 33.
LaTeX Font Info: Overwriting math alphabet `\mathtt' in version `normal'
(Font) OT1/cmtt/m/n --> OT1/lmtt/m/n on input line 34.
LaTeX Font Info: Overwriting math alphabet `\mathbf' in version `bold'
(Font) OT1/cmr/bx/n --> OT1/lmr/bx/n on input line 35.
LaTeX Font Info: Overwriting math alphabet `\mathsf' in version `bold'
(Font) OT1/cmss/bx/n --> OT1/lmss/bx/n on input line 36.
LaTeX Font Info: Overwriting math alphabet `\mathit' in version `bold'
(Font) OT1/cmr/bx/it --> OT1/lmr/bx/it on input line 37.
LaTeX Font Info: Overwriting math alphabet `\mathtt' in version `bold'
(Font) OT1/cmtt/m/n --> OT1/lmtt/m/n on input line 38.
)
(D:\MikTex\tex/latex/graphics\graphicx.sty
Package: graphicx 2024/12/31 v1.2e Enhanced LaTeX Graphics (DPC,SPQR)
(D:\MikTex\tex/latex/graphics\keyval.sty
Package: keyval 2022/05/29 v1.15 key=value parser (DPC)
\KV@toks@=\toks21
)
(D:\MikTex\tex/latex/graphics\graphics.sty
Package: graphics 2024/08/06 v1.4g Standard LaTeX Graphics (DPC,SPQR)
(D:\MikTex\tex/latex/graphics\trig.sty
Package: trig 2023/12/02 v1.11 sin cos tan (DPC)
)
(D:\MikTex\tex/latex/graphics-cfg\graphics.cfg
File: graphics.cfg 2016/06/04 v1.11 sample graphics configuration
)
Package graphics Info: Driver file: pdftex.def on input line 106.
(D:\MikTex\tex/latex/graphics-def\pdftex.def
File: pdftex.def 2025/09/29 v1.2d Graphics/color driver for pdftex
))
\Gin@req@height=\dimen171
\Gin@req@width=\dimen172
)
(D:\MikTex\tex/latex/amsmath\amsmath.sty
Package: amsmath 2025/07/09 v2.17z AMS math features
\@mathmargin=\skip53

For additional information on amsmath, use the `?' option.
(D:\MikTex\tex/latex/amsmath\amstext.sty
Package: amstext 2024/11/17 v2.01 AMS text
(D:\MikTex\tex/latex/amsmath\amsgen.sty
File: amsgen.sty 1999/11/30 v2.0 generic functions
\@emptytoks=\toks22
\ex@=\dimen173
))
(D:\MikTex\tex/latex/amsmath\amsbsy.sty
Package: amsbsy 1999/11/29 v1.2d Bold Symbols
\pmbraise@=\dimen174
)
(D:\MikTex\tex/latex/amsmath\amsopn.sty
Package: amsopn 2022/04/08 v2.04 operator names
)
\inf@bad=\count314
LaTeX Info: Redefining \frac on input line 233.
\uproot@=\count315
\leftroot@=\count316
LaTeX Info: Redefining \overline on input line 398.
LaTeX Info: Redefining \colon on input line 409.
\classnum@=\count317
\DOTSCASE@=\count318
LaTeX Info: Redefining \ldots on input line 495.
LaTeX Info: Redefining \dots on input line 498.
LaTeX Info: Redefining \cdots on input line 619.
\Mathstrutbox@=\box101
\strutbox@=\box102
LaTeX Info: Redefining \big on input line 721.
LaTeX Info: Redefining \Big on input line 722.
LaTeX Info: Redefining \bigg on input line 723.
LaTeX Info: Redefining \Bigg on input line 724.
\big@size=\dimen175
LaTeX Font Info: Redeclaring font encoding OML on input line 742.
LaTeX Font Info: Redeclaring font encoding OMS on input line 743.

Package amsmath Warning: Unable to redefine math accent \vec.

\macc@depth=\count319
LaTeX Info: Redefining \bmod on input line 904.
LaTeX Info: Redefining \pmod on input line 909.
LaTeX Info: Redefining \smash on input line 939.
LaTeX Info: Redefining \relbar on input line 969.
LaTeX Info: Redefining \Relbar on input line 970.
\c@MaxMatrixCols=\count320
\dotsspace@=\muskip17
\c@parentequation=\count321
\dspbrk@lvl=\count322
\tag@help=\toks23
\row@=\count323
\column@=\count324
\maxfields@=\count325
\andhelp@=\toks24
\eqnshift@=\dimen176
\alignsep@=\dimen177
\tagshift@=\dimen178
\tagwidth@=\dimen179
\totwidth@=\dimen180
\lineht@=\dimen181
\@envbody=\toks25
\multlinegap=\skip54
\multlinetaggap=\skip55
\mathdisplay@stack=\toks26
LaTeX Info: Redefining \[ on input line 2950.
LaTeX Info: Redefining \] on input line 2951.
) (D:\MikTex\tex/latex/amsfonts\amssymb.sty
Package: amssymb 2013/01/14 v3.01 AMS font symbols
(D:\MikTex\tex/latex/amsfonts\amsfonts.sty
Package: amsfonts 2013/01/14 v3.01 Basic AMSFonts support
\symAMSa=\mathgroup4
\symAMSb=\mathgroup5
LaTeX Font Info: Redeclaring math symbol \hbar on input line 98.
LaTeX Font Info: Overwriting math alphabet `\mathfrak' in version `bold'
(Font) U/euf/m/n --> U/euf/b/n on input line 106.
)) (D:\MikTex\tex/latex/tools\bm.sty
Package: bm 2025/10/21 v1.2g Bold Symbol Support (DPC/FMi)
\symboldoperators=\mathgroup6
\symboldletters=\mathgroup7
\symboldsymbols=\mathgroup8
Package bm Info: No bold for \OMX/lmex/m/n, using \pmb.
Package bm Info: No bold for \U/msa/m/n, using \pmb.
Package bm Info: No bold for \U/msb/m/n, using \pmb.
LaTeX Font Info: Redeclaring math alphabet \mathbf on input line 149.
) (D:\MikTex\tex/latex/tools\array.sty
Package: array 2025/09/25 v2.6n Tabular extension package (FMi)
\col@sep=\dimen182
\ar@mcellbox=\box103
\extrarowheight=\dimen183
\NC@list=\toks27
\extratabsurround=\skip56
\backup@length=\skip57
\ar@cellbox=\box104
)
(C:\Users\Markyan04\AppData\Roaming\MiKTeX\tex/latex/booktabs\booktabs.sty
Package: booktabs 2020/01/12 v1.61803398 Publication quality tables
\heavyrulewidth=\dimen184
\lightrulewidth=\dimen185
\cmidrulewidth=\dimen186
\belowrulesep=\dimen187
\belowbottomsep=\dimen188
\aboverulesep=\dimen189
\abovetopsep=\dimen190
\cmidrulesep=\dimen191
\cmidrulekern=\dimen192
\defaultaddspace=\dimen193
\@cmidla=\count326
\@cmidlb=\count327
\@aboverulesep=\dimen194
\@belowrulesep=\dimen195
\@thisruleclass=\count328
\@lastruleclass=\count329
\@thisrulewidth=\dimen196
)
(C:\Users\Markyan04\AppData\Roaming\MiKTeX\tex/latex/microtype\microtype.sty
Package: microtype 2026/03/01 v3.2d Micro-typographical refinements (RS)
(D:\MikTex\tex/latex/etoolbox\etoolbox.sty
Package: etoolbox 2025/10/02 v2.5m e-TeX tools for LaTeX (JAW)
\etb@tempcnta=\count330
)
\MT@toks=\toks28
\MT@tempbox=\box105
\MT@count=\count331
LaTeX Info: Redefining \noprotrusionifhmode on input line 1084.
LaTeX Info: Redefining \leftprotrusion on input line 1085.
\MT@prot@toks=\toks29
LaTeX Info: Redefining \rightprotrusion on input line 1104.
LaTeX Info: Redefining \textls on input line 1449.
\MT@outer@kern=\dimen197
LaTeX Info: Redefining \microtypecontext on input line 2053.
LaTeX Info: Redefining \textmicrotypecontext on input line 2070.
\MT@listname@count=\count332
(C:\Users\Markyan04\AppData\Roaming\MiKTeX\tex/latex/microtype\microtype-pdftex.def
File: microtype-pdftex.def 2026/03/01 v3.2d Definitions specific to pdftex (RS)
LaTeX Info: Redefining \lsstyle on input line 944.
LaTeX Info: Redefining \lslig on input line 944.
\MT@outer@space=\skip58
)
Package microtype Info: Loading configuration file microtype.cfg.
(C:\Users\Markyan04\AppData\Roaming\MiKTeX\tex/latex/microtype\microtype.cfg
File: microtype.cfg 2026/03/01 v3.2d microtype main configuration file (RS)
)
LaTeX Info: Redefining \microtypesetup on input line 3065.
)
(C:\Users\Markyan04\AppData\Roaming\MiKTeX\tex/latex/float\float.sty
Package: float 2001/11/08 v1.3d Float enhancements (AL)
\c@float@type=\count333
\float@exts=\toks30
\float@box=\box106
\@float@everytoks=\toks31
\@floatcapt=\box107
)
(D:\MikTex\tex/latex/url\url.sty
\Urlmuskip=\muskip18
Package: url 2013/09/16 ver 3.4 Verb mode for urls, etc.
)
LaTeX Font Info: Trying to load font information for T1+lmr on input line 30.
(D:\MikTex\tex/latex/lm\t1lmr.fd
File: t1lmr.fd 2015/05/01 v1.6.1 Font defs for Latin Modern
)
(D:\MikTex\tex/latex/l3backend\l3backend-pdftex.def
File: l3backend-pdftex.def 2025-10-09 L3 backend support: PDF output (pdfTeX)
\l__color_backend_stack_int=\count334
) (main.aux)
\openout1 = `main.aux'.

LaTeX Font Info: Checking defaults for OML/cmm/m/it on input line 30.
LaTeX Font Info: ... okay on input line 30.
LaTeX Font Info: Checking defaults for OMS/cmsy/m/n on input line 30.
LaTeX Font Info: ... okay on input line 30.
LaTeX Font Info: Checking defaults for OT1/cmr/m/n on input line 30.
LaTeX Font Info: ... okay on input line 30.
LaTeX Font Info: Checking defaults for T1/cmr/m/n on input line 30.
LaTeX Font Info: ... okay on input line 30.
LaTeX Font Info: Checking defaults for TS1/cmr/m/n on input line 30.
LaTeX Font Info: ... okay on input line 30.
LaTeX Font Info: Checking defaults for OMX/cmex/m/n on input line 30.
LaTeX Font Info: ... okay on input line 30.
LaTeX Font Info: Checking defaults for U/cmr/m/n on input line 30.
LaTeX Font Info: ... okay on input line 30.
(D:\MikTex\tex/context/base/mkii\supp-pdf.mkii
[Loading MPS to PDF converter (version 2006.09.02).]
\scratchcounter=\count335
\scratchdimen=\dimen198
\scratchbox=\box108
\nofMPsegments=\count336
\nofMParguments=\count337
\everyMPshowfont=\toks32
\MPscratchCnt=\count338
\MPscratchDim=\dimen199
\MPnumerator=\count339
\makeMPintoPDFobject=\count340
\everyMPtoPDFconversion=\toks33
) (D:\MikTex\tex/latex/epstopdf-pkg\epstopdf-base.sty
Package: epstopdf-base 2020-01-24 v2.11 Base part for package epstopdf
Package epstopdf-base Info: Redefining graphics rule for `.eps' on input line 485.
(D:\MikTex\tex/latex/00miktex\epstopdf-sys.cfg
File: epstopdf-sys.cfg 2021/03/18 v2.0 Configuration of epstopdf for MiKTeX
))
LaTeX Info: Redefining \microtypecontext on input line 30.
Package microtype Info: Applying patch `item' on input line 30.
Package microtype Info: Applying patch `toc' on input line 30.
Package microtype Info: Applying patch `eqnum' on input line 30.
Package microtype Info: Applying patch `footnote' on input line 30.
Package microtype Info: Applying patch `verbatim' on input line 30.
LaTeX Info: Redefining \microtypesetup on input line 30.
Package microtype Info: Generating PDF output.
Package microtype Info: Character protrusion enabled (level 2).
Package microtype Info: Using default protrusion set `alltext'.
Package microtype Info: No font expansion.
Package microtype Info: No adjustment of tracking.
Package microtype Info: No adjustment of interword spacing.
Package microtype Info: No adjustment of character kerning.
(C:\Users\Markyan04\AppData\Roaming\MiKTeX\tex/latex/microtype\mt-cmr.cfg
File: mt-cmr.cfg 2013/05/19 v2.2 microtype config. file: Computer Modern Roman (RS)
)
LaTeX Font Info: Trying to load font information for OT1+lmr on input line 31.
(D:\MikTex\tex/latex/lm\ot1lmr.fd
File: ot1lmr.fd 2015/05/01 v1.6.1 Font defs for Latin Modern
)
LaTeX Font Info: Trying to load font information for OML+lmm on input line 31.
(D:\MikTex\tex/latex/lm\omllmm.fd
File: omllmm.fd 2015/05/01 v1.6.1 Font defs for Latin Modern
)
LaTeX Font Info: Trying to load font information for OMS+lmsy on input line 31.
(D:\MikTex\tex/latex/lm\omslmsy.fd
File: omslmsy.fd 2015/05/01 v1.6.1 Font defs for Latin Modern
)
LaTeX Font Info: Trying to load font information for OMX+lmex on input line 31.
(D:\MikTex\tex/latex/lm\omxlmex.fd
File: omxlmex.fd 2015/05/01 v1.6.1 Font defs for Latin Modern
)
LaTeX Font Info: External font `lmex10' loaded for size
(Font) <10> on input line 31.
LaTeX Font Info: External font `lmex10' loaded for size
(Font) <7> on input line 31.
LaTeX Font Info: External font `lmex10' loaded for size
(Font) <5> on input line 31.
LaTeX Font Info: Trying to load font information for U+msa on input line 31.
(D:\MikTex\tex/latex/amsfonts\umsa.fd
File: umsa.fd 2013/01/14 v3.01 AMS symbols A
)
(C:\Users\Markyan04\AppData\Roaming\MiKTeX\tex/latex/microtype\mt-msa.cfg
File: mt-msa.cfg 2006/02/04 v1.1 microtype config. file: AMS symbols (a) (RS)
)
LaTeX Font Info: Trying to load font information for U+msb on input line 31.
||||
|
||||
|
||||
(D:\MikTex\tex/latex/amsfonts\umsb.fd
|
||||
File: umsb.fd 2013/01/14 v3.01 AMS symbols B
|
||||
)
|
||||
(C:\Users\Markyan04\AppData\Roaming\MiKTeX\tex/latex/microtype\mt-msb.cfg
|
||||
File: mt-msb.cfg 2005/06/01 v1.0 microtype config. file: AMS symbols (b) (RS)
|
||||
)
|
||||
LaTeX Font Info: Trying to load font information for T1+lmtt on input line 3
|
||||
1.
|
||||
|
||||
(D:\MikTex\tex/latex/lm\t1lmtt.fd
|
||||
File: t1lmtt.fd 2015/05/01 v1.6.1 Font defs for Latin Modern
|
||||
)
|
||||
Package microtype Info: Loading generic protrusion settings for font family
|
||||
(microtype) `lmtt' (encoding: T1).
|
||||
(microtype) For optimal results, create family-specific settings.
|
||||
(microtype) See the microtype manual for details.
|
||||
LaTeX Font Info: External font `lmex10' loaded for size
|
||||
(Font) <9> on input line 31.
|
||||
LaTeX Font Info: External font `lmex10' loaded for size
|
||||
(Font) <6> on input line 31.
|
||||
LaTeX Font Info: Trying to load font information for TS1+lmr on input line 3
|
||||
6.
|
||||
(D:\MikTex\tex/latex/lm\ts1lmr.fd
|
||||
File: ts1lmr.fd 2015/05/01 v1.6.1 Font defs for Latin Modern
|
||||
) [1
|
||||
|
||||
{C:/Users/Markyan04/AppData/Local/MiKTeX/fonts/map/pdftex/pdftex.map}{D:/MikTex
|
||||
/fonts/enc/dvips/lm/lm-ec.enc}{D:/MikTex/fonts/enc/dvips/lm/lm-rm.enc}{D:/MikTe
|
||||
x/fonts/enc/dvips/lm/lm-mathit.enc}{D:/MikTex/fonts/enc/dvips/lm/lm-mathsy.enc}
|
||||
{D:/MikTex/fonts/enc/dvips/lm/lm-ts1.enc}]
|
||||
Underfull \vbox (badness 5105) has occurred while \output is active []
|
||||
|
||||
[2]
|
||||
[3]
|
||||
Underfull \vbox (badness 10000) has occurred while \output is active []
|
||||
|
||||
[4]
|
||||
<fig-design-v4-from-user-svg-cropped.pdf, id=26, 616.3025pt x 172.645pt>
|
||||
File: fig-design-v4-from-user-svg-cropped.pdf Graphic file (type pdf)
|
||||
<use fig-design-v4-from-user-svg-cropped.pdf>
|
||||
Package pdftex.def Info: fig-design-v4-from-user-svg-cropped.pdf used on input
|
||||
line 71.
|
||||
(pdftex.def) Requested size: 277.69987pt x 77.79225pt.
|
||||
|
||||
[5] [6 <./fig-design-v4-from-user-svg-cropped.pdf>] [7{D:/MikTex/fonts/enc/dvip
|
||||
s/lm/lm-mathex.enc}]
|
||||
Overfull \hbox (8.34737pt too wide) in paragraph at lines 137--139
|
||||
\T1/lmr/m/n/10 Because dif-fu-sion op-ti-miza-tion can ex-hibit timestep im-bal
|
||||
-ance (i.e., some timesteps
|
||||
[]
|
||||
|
||||
[8] [9]
|
||||
<typeclass-cropped.pdf, id=141, 616.3025pt x 221.82875pt>
|
||||
File: typeclass-cropped.pdf Graphic file (type pdf)
|
||||
<use typeclass-cropped.pdf>
|
||||
Package pdftex.def Info: typeclass-cropped.pdf used on input line 195.
|
||||
(pdftex.def) Requested size: 340.17958pt x 122.4462pt.
|
||||
|
||||
Underfull \vbox (badness 10000) has occurred while \output is active []
|
||||
|
||||
[10]
|
||||
[11 <./typeclass-cropped.pdf>] [12]
|
||||
<fig-benchmark-story-v2.png, id=172, 1089.6345pt x 360.036pt>
|
||||
File: fig-benchmark-story-v2.png Graphic file (type png)
|
||||
<use fig-benchmark-story-v2.png>
|
||||
Package pdftex.def Info: fig-benchmark-story-v2.png used on input line 225.
|
||||
(pdftex.def) Requested size: 347.12354pt x 114.69197pt.
|
||||
|
||||
Underfull \vbox (badness 10000) has occurred while \output is active []
|
||||
|
||||
[13 <./fig-benchmark-story-v2.png>]
|
||||
<fig-benchmark-ablations-v1.png, id=179, 727.299pt x 328.5pt>
|
||||
File: fig-benchmark-ablations-v1.png Graphic file (type png)
|
||||
<use fig-benchmark-ablations-v1.png>
|
||||
Package pdftex.def Info: fig-benchmark-ablations-v1.png used on input line 279
|
||||
.
|
||||
(pdftex.def) Requested size: 347.12354pt x 156.78598pt.
|
||||
[14] [15 <./fig-benchmark-ablations-v1.png>]
|
||||
Overfull \hbox (5.31874pt too wide) in paragraph at lines 321--322
|
||||
\T1/lmr/m/n/10 This pa-per ad-dresses the data scarcity and share-abil-ity bar-
|
||||
ri-ers that limit machine-
|
||||
[]
|
||||
|
||||
|
||||
Overfull \hbox (2.54008pt too wide) in paragraph at lines 323--324
|
||||
[]\T1/lmr/m/n/10 Overall, Mask-DDPM pro-vides a re-pro-ducible foun-da-tion for
|
||||
gen-er-at-ing share-
|
||||
[]
|
||||
|
||||
(main.bbl [16]
|
||||
Underfull \hbox (badness 1609) in paragraph at lines 29--35
|
||||
[]\T1/lmr/m/n/9 Coletta, A., Gopalakr-ish-nan, S., Bor-rajo, D., Vyetrenko, S.:
|
||||
On the con-
|
||||
[]
|
||||
|
||||
|
||||
Underfull \hbox (badness 2653) in paragraph at lines 84--91
|
||||
[]\T1/lmr/m/n/9 Kollovieh, M., Ansari, A.F., Bohlke-Schneider, M., Zschieg-ner,
|
||||
J., Wang,
|
||||
[]
|
||||
|
||||
|
||||
Underfull \hbox (badness 2662) in paragraph at lines 84--91
|
||||
\T1/lmr/m/n/9 H., Wang, Y.B.: Pre-dict, re-fine, syn-the-size: Self-guiding dif
|
||||
-fu-sion mod-els
|
||||
[]
|
||||
|
||||
[17]
|
||||
Underfull \hbox (badness 2626) in paragraph at lines 106--112
|
||||
\T1/lmr/m/n/9 wal, A., Bel-grave, D., Cho, K., Oh, A. (eds.) Ad-vances in Neu-r
|
||||
al In-
|
||||
[]
|
||||
|
||||
|
||||
Underfull \hbox (badness 1946) in paragraph at lines 106--112
|
||||
\T1/lmr/m/n/9 for-ma-tion Pro-cess-ing Sys-tems. vol. 35, pp. 4328--4343. Cur-r
|
||||
an As-so-ciates,
|
||||
[]
|
||||
|
||||
|
||||
Underfull \hbox (badness 1603) in paragraph at lines 118--123
|
||||
[]\T1/lmr/m/n/9 Lin, Z., Jain, A., Wang, C., Fanti, G., Sekar, V.: Us-ing gans
|
||||
for shar-
|
||||
[]
|
||||
|
||||
|
||||
Underfull \hbox (badness 2310) in paragraph at lines 118--123
|
||||
\T1/lmr/m/n/9 tions. In: Pro-ceed-ings of the ACM In-ter-net Mea-sure-ment Con-
|
||||
fer-ence. p.
|
||||
[]
|
||||
|
||||
|
||||
Underfull \hbox (badness 1394) in paragraph at lines 118--123
|
||||
\T1/lmr/m/n/9 464^^U483. IMC '20, As-so-ci-a-tion for Com-put-ing Ma-chin-ery,
|
||||
New York, NY,
|
||||
[]
|
||||
|
||||
|
||||
Underfull \hbox (badness 5316) in paragraph at lines 118--123
|
||||
\T1/lmr/m/n/9 USA (2020). https://doi.org/10.1145/3419394.3423643, $\T1/lmtt/m/
|
||||
n/9 https : / / doi . org / 10 .
|
||||
[]
|
||||
|
||||
|
||||
Underfull \hbox (badness 1043) in paragraph at lines 125--129
|
||||
[]\T1/lmr/m/n/9 Liu, M., Huang, H., Feng, H., Sun, L., Du, B., Fu, Y.: Pristi:
|
||||
A con-di-
|
||||
[]
|
||||
|
||||
|
||||
Underfull \hbox (badness 5592) in paragraph at lines 153--157
|
||||
[]\T1/lmr/m/n/9 National In-sti-tute of Stan-dards and Tech-nol-ogy: Guide to o
|
||||
p-er-a-tional
|
||||
[]
|
||||
|
||||
|
||||
Underfull \hbox (badness 3514) in paragraph at lines 153--157
|
||||
\T1/lmr/m/n/9 tech-nol-ogy (ot) se-cu-rity. Spe-cial Pub-li-ca-tion 800-82 Rev.
|
||||
3, NIST (sep
|
||||
[]
|
||||
|
||||
[18]
|
||||
Underfull \hbox (badness 1253) in paragraph at lines 180--185
|
||||
\T1/lmr/m/n/9 data (jan 2026). https://doi.org/10.2139/ssrn.6055903, $\T1/lmtt/
|
||||
m/n/9 https : / / papers . ssrn .
|
||||
[]
|
||||
|
||||
|
||||
Underfull \hbox (badness 1226) in paragraph at lines 192--198
|
||||
[]\T1/lmr/m/n/9 Shi, J., Han, K., Wang, Z., Doucet, A., Tit-sias, M.: Sim-pli-f
|
||||
ied and gen-
|
||||
[]
|
||||
|
||||
|
||||
Underfull \hbox (badness 3954) in paragraph at lines 192--198
|
||||
\T1/lmr/m/n/9 er-al-ized masked dif-fu-sion for dis-crete data. In: Glober-son,
|
||||
A., Mackey,
|
||||
[]
|
||||
|
||||
|
||||
Underfull \hbox (badness 3701) in paragraph at lines 192--198
|
||||
\T1/lmr/m/n/9 vances in Neu-ral In-for-ma-tion Pro-cess-ing Sys-tems. vol. 37,
|
||||
pp. 103131--
|
||||
[]
|
||||
|
||||
|
||||
Underfull \hbox (badness 10000) in paragraph at lines 192--198
|
||||
\T1/lmr/m/n/9 103167. Cur-ran As-so-ciates, Inc. (2024). https://doi.org/10.522
|
||||
02/079017-
|
||||
[]
|
||||
|
||||
|
||||
Underfull \hbox (badness 10000) in paragraph at lines 192--198
|
||||
\T1/lmr/m/n/9 3277, $\T1/lmtt/m/n/9 https : / / proceedings . neurips . cc / pa
|
||||
per _ files / paper / 2024 / file /
|
||||
[]
|
||||
|
||||
|
||||
Underfull \hbox (badness 10000) in paragraph at lines 210--215
|
||||
[]\T1/lmr/m/n/9 Sikder, M.F., Ra-machan-dran-pil-lai, R., Heintz, F.: Trans-fu-
|
||||
sion: Gen-
|
||||
[]
|
||||
|
||||
|
||||
Underfull \hbox (badness 10000) in paragraph at lines 210--215
|
||||
\T1/lmr/m/n/9 er-at-ing long, high fi-delity time se-ries us-ing dif-fu-sion mo
|
||||
d-els with
|
||||
[]
|
||||
|
||||
|
||||
Underfull \hbox (badness 10000) in paragraph at lines 210--215
|
||||
\T1/lmr/m/n/9 trans-form-ers. Ma-chine Learn-ing with Ap-pli-ca-tions \T1/lmr/b
|
||||
x/n/9 20\T1/lmr/m/n/9 , 100652 (2025).
|
||||
[]
|
||||
|
||||
|
||||
Underfull \hbox (badness 10000) in paragraph at lines 210--215
|
||||
\T1/lmr/m/n/9 https://doi.org/https://doi.org/10.1016/j.mlwa.2025.100652, $\T1/
|
||||
lmtt/m/n/9 https : / / www .
|
||||
[]
|
||||
|
||||
|
||||
Underfull \hbox (badness 2229) in paragraph at lines 243--247
|
||||
\T1/lmr/m/n/9 er-a-tion. SIG-COMM Com-put. Com-mun. Rev. \T1/lmr/bx/n/9 36\T1/l
|
||||
mr/m/n/9 (4), 111^^U122 (Aug 2006).
|
||||
[]
|
||||
|
||||
|
||||
Underfull \hbox (badness 10000) in paragraph at lines 243--247
|
||||
\T1/lmr/m/n/9 https://doi.org/10.1145/1151659.1159928, $\T1/lmtt/m/n/9 https :
|
||||
/ / doi . org / 10 . 1145 / 1151659 .
|
||||
[]
|
||||
|
||||
[19]) [20] (main.aux)
|
||||
***********
|
||||
LaTeX2e <2025-11-01>
|
||||
L3 programming layer <2025-12-24>
|
||||
***********
|
||||
)
|
||||
Here is how much of TeX's memory you used:
|
||||
6340 strings out of 467871
|
||||
97321 string characters out of 5435199
|
||||
552249 words of memory out of 5000000
|
||||
35104 multiletter control sequences out of 15000+600000
|
||||
706871 words of font info for 99 fonts, out of 8000000 for 9000
|
||||
1141 hyphenation exceptions out of 8191
|
||||
57i,9n,65p,2477b,352s stack positions out of 10000i,1000n,20000p,200000b,200000s
|
||||
<D:/MikTex/fonts/type1/public/lm/lmbx10.pfb><D:/MikTex/fonts/type1/public/lm/
|
||||
lmbx12.pfb><D:/MikTex/fonts/type1/public/lm/lmbx9.pfb><D:/MikTex/fonts/type1/pu
|
||||
blic/lm/lmex10.pfb><D:/MikTex/fonts/type1/public/lm/lmmi10.pfb><D:/MikTex/fonts
|
||||
/type1/public/lm/lmmi5.pfb><D:/MikTex/fonts/type1/public/lm/lmmi7.pfb><D:/MikTe
|
||||
x/fonts/type1/public/lm/lmmi9.pfb><D:/MikTex/fonts/type1/public/lm/lmmib10.pfb>
|
||||
<D:/MikTex/fonts/type1/public/lm/lmmib7.pfb><D:/MikTex/fonts/type1/public/lm/lm
|
||||
r10.pfb><D:/MikTex/fonts/type1/public/lm/lmr5.pfb><D:/MikTex/fonts/type1/public
|
||||
/lm/lmr6.pfb><D:/MikTex/fonts/type1/public/lm/lmr7.pfb><D:/MikTex/fonts/type1/p
|
||||
ublic/lm/lmr9.pfb><D:/MikTex/fonts/type1/public/lm/lmri10.pfb><D:/MikTex/fonts/
|
||||
type1/public/lm/lmri9.pfb><D:/MikTex/fonts/type1/public/lm/lmsy10.pfb><D:/MikTe
|
||||
x/fonts/type1/public/lm/lmsy7.pfb><D:/MikTex/fonts/type1/public/lm/lmsy9.pfb><D
|
||||
:/MikTex/fonts/type1/public/lm/lmtt10.pfb><D:/MikTex/fonts/type1/public/lm/lmtt
|
||||
9.pfb><D:/MikTex/fonts/type1/public/amsfonts/symbols/msbm10.pfb>
|
||||
Output written on main.pdf (20 pages, 1116013 bytes).
|
||||
PDF statistics:
|
||||
314 PDF objects out of 1000 (max. 8388607)
|
||||
0 named destinations out of 1000 (max. 500000)
|
||||
13845 words of extra memory for PDF output out of 14400 (max. 10000000)
|
||||
|
||||
BIN
LaTeX2e+Proceedings+Templates+download/main.pdf
Normal file
327
LaTeX2e+Proceedings+Templates+download/main.tex
Normal file
@@ -0,0 +1,327 @@
\documentclass[runningheads]{llncs}
\usepackage[T1]{fontenc}
\usepackage{lmodern}
\usepackage{graphicx}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{amsfonts}
\usepackage{bm}
\usepackage{array}
\usepackage{booktabs}
\usepackage[expansion=false]{microtype}
\usepackage{float}
\usepackage{url}

% Compatibility shim: the source manuscript uses natbib-style citep commands.
\newcommand{\citep}[1]{\cite{#1}}

\title{Mask-DDPM: Transformer-Conditioned Mixed-Type Diffusion for Semantically Valid ICS Telemetry Synthesis}
\titlerunning{Mask-DDPM for ICS Telemetry Synthesis}

\author{Zhenglan Chen\inst{1} \and Mingzhe Yang\inst{1} \and Hongyu Yan\inst{1} \and Huan Yang\inst{2}}
\authorrunning{Z. Chen et al.}

\institute{Aberdeen Institute of Data Science and Artificial Intelligence, South China Normal University, Guangzhou, Guangdong 510631, China \\
\email{\{20223803054,20223803063,20223803065\}@m.scnu.edu.cn}
\and
School of Artificial Intelligence, South China Normal University, Guangzhou, Guangdong 510631, China \\
\email{huan.yang@m.scnu.edu.cn}}
\begin{document}
\maketitle

\begin{abstract}
Industrial control systems (ICS) security research is increasingly constrained by the scarcity and limited shareability of realistic communication traces and process measurements, especially for attack scenarios. To mitigate this bottleneck, we study synthetic generation at the protocol-feature and process-signal level, where samples must simultaneously preserve temporal coherence, match continuous marginal distributions, and keep discrete supervisory variables strictly within valid vocabularies. We propose Mask-DDPM, a hybrid framework tailored to mixed-type, multi-scale ICS sequences. Mask-DDPM factorizes generation into (i) a causal Transformer trend module that rolls out a stable long-range temporal scaffold for continuous channels, (ii) a trend-conditioned residual DDPM that refines local stochastic structure and heavy-tailed fluctuations without degrading global dynamics, (iii) a masked (absorbing) diffusion branch for discrete variables that guarantees valid symbol generation by construction, and (iv) a type-aware decomposition/routing layer that aligns modeling mechanisms with heterogeneous ICS variable origins and enforces deterministic reconstruction where appropriate. Evaluated on fixed-length windows ($L=96$) derived from the HAI Security Dataset, Mask-DDPM achieves stable fidelity across seeds with mean KS = 0.3311 $\pm$ 0.0079 (continuous), mean JSD = 0.0284 $\pm$ 0.0073 (discrete), and mean absolute lag-1 autocorrelation difference = 0.2684 $\pm$ 0.0027, indicating faithful marginals, preserved short-horizon dynamics, and valid discrete semantics. The resulting generator provides a reproducible basis for data augmentation, benchmarking, and downstream ICS protocol reconstruction workflows.

\keywords{Machine Learning \and Cyber Defense \and ICS}
\end{abstract}
\section{Introduction}
\label{sec:intro}
Industrial control systems (ICS) form the backbone of modern critical infrastructure, including power grids, water treatment, manufacturing, and transportation. These systems monitor, regulate, and automate physical processes through sensors, actuators, programmable logic controllers (PLCs), and monitoring software. Unlike conventional IT systems, ICS operate in real time, closely coupled with physical processes and safety-critical constraints, using heterogeneous and legacy communication protocols such as Modbus/TCP and DNP3 that were not originally designed with robust security in mind. This architectural complexity and operational criticality make ICS high-impact targets for cyber attacks, where disruptions can result in physical damage, environmental harm, and even loss of life. Recent reviews of ICS security highlight the expanding attack surface due to increased connectivity, vulnerabilities in legacy systems, and the inadequacy of traditional security controls in capturing the nuances of ICS networks and protocols \citep{10.1007/s10844-022-00753-1,Nankya2023-gp}.

While machine learning (ML) techniques have shown promise for anomaly detection and automated cybersecurity within ICS, they rely heavily on labeled datasets that capture both benign operations and diverse attack patterns. In practice, real ICS traffic data, especially attack-triggered captures, are scarce due to confidentiality, safety, and legal restrictions, and available public ICS datasets are few, limited in scope, or fail to reflect current threat modalities. For instance, the HAI Security Dataset provides operational telemetry and anomaly flags from a realistic control system setup for research purposes, but must be carefully preprocessed to derive protocol-relevant features for ML tasks \citep{shin}. Data scarcity directly undermines model generalization, evaluation reproducibility, and the robustness of intrusion detection research, especially when training or testing ML models on realistic ICS behavior remains confined to small or outdated collections of examples \citep{info16100910}.

Synthetic data generation offers a practical pathway to mitigate these challenges. By programmatically generating feature-level sequences that mimic the statistical and temporal structure of real ICS telemetry, researchers can augment scarce training sets, standardize benchmarking, and preserve operational confidentiality. Relative to raw packet captures, feature-level synthesis abstracts critical protocol semantics and statistical patterns without exposing sensitive fields, making it more compatible with safety constraints and compliance requirements in ICS environments. Modern generative modeling, including diffusion models, has advanced significantly in producing high-fidelity synthetic data across domains. Diffusion approaches, such as denoising diffusion probabilistic models, learn to transform noise into coherent structured samples and have been successfully applied to tabular or time series data synthesis with better stability and data coverage compared to adversarial methods \citep{pmlr-v202-kotelnikov23a,rasul2021autoregressivedenoisingdiffusionmodels}.

Despite these advances, most existing work either focuses on packet-level generation \citep{jiang2023netdiffusionnetworkdataaugmentation} or is limited to generic tabular data \citep{pmlr-v202-kotelnikov23a}, rather than domain-specific control sequence synthesis tailored for ICS protocols where temporal coherence, multi-channel dependencies, and discrete protocol legality are jointly required. This gap motivates our focus on protocol feature-level generation for ICS, which involves synthesizing sequences of protocol-relevant fields conditioned on their temporal and cross-channel structure. In this work, we formulate a hybrid modeling pipeline that decouples long-horizon trends and local statistical detail while preserving discrete semantics of protocol tokens. By combining causal Transformers with diffusion-based refiners, and enforcing deterministic validity constraints during sampling, our framework generates semantically coherent, temporally consistent, and distributionally faithful ICS feature sequences. We evaluate features derived from the HAI Security Dataset and demonstrate that our approach produces high-quality synthetic sequences suitable for downstream augmentation, benchmarking, and integration into packet-reconstruction workflows that respect realistic ICS constraints.
% 2. Related Work
\section{Related Work}
\label{sec:related}
Early work on ``realistic'' network data generation mostly remained at the packet/flow-header level, through either replay or statistical synthesis based on single-point observations. Swing extracts user/application/network distributions from single-point observations in a closed-loop, network-responsive manner to reproduce burstiness and correlation across multiple time scales \citep{10.1145/1151659.1159928}. Subsequently, a series of works advanced header synthesis to learning-based generation: a WGAN-based method added explicit verification of protocol-field consistency to NetFlow/IPFIX \citep{Ring_2019}, NetShare recast header modeling as flow-level time series and improved fidelity and scalability through domain encoding and parallel fine-tuning \citep{10.1145/3544216.3544251}, and DoppelGANger preserved the long-range structure and downstream sorting consistency of networked time series by decoupling attributes from sequences \citep{Lin_2020}. However, in industrial control system (ICS) scenarios the original PCAP is usually not shareable, and public testbeds (such as SWaT and WADI) mostly provide process/monitoring telemetry and protocol interactions for security assessment; public datasets emphasize operational variables rather than packet-level traces \citep{7469060,10.1145/3055366.3055375}. This makes protocol- and semantics-aware synthesis at the feature/telemetry level more feasible and necessary in practice: the goal is to reproduce high-level distributions and multi-scale temporal patterns under operational semantics and physical constraints, without relying on the original packets. From this perspective, the generation paradigm naturally shifts from packet-syntax reproduction to modeling high-level spatio-temporal distributions and their uncertainty, which demands stable training, strong distribution fitting, and interpretable uncertainty characterization.

Diffusion models fit this path well: DDPM achieves high-quality sampling and stable optimization through an efficient $\epsilon$-parameterization and weighted variational objectives \citep{ho2020denoising}, and the SDE perspective unifies score-based and diffusion models, providing likelihood evaluation and predictor--corrector sampling strategies based on probability-flow ODEs \citep{song2021score}. For time series, TimeGrad replaces a constrained output distribution with conditional denoising, capturing high-dimensional correlations at each step \citep{rasul2021autoregressivedenoisingdiffusionmodels}; CSDI performs explicitly conditional diffusion and uses two-dimensional attention to exploit temporal and cross-feature dependencies simultaneously, which suits conditioning and missing-value imputation \citep{tashiro2021csdiconditionalscorebaseddiffusion}; for more general spatio-temporal structure, DiffSTG generalizes diffusion to spatio-temporal graphs, combining TCN/GCN with a denoising U-Net to improve CRPS and inference efficiency non-autoregressively \citep{wen2024diffstgprobabilisticspatiotemporalgraph}, and PriSTI further enhances conditional features and geographic relationships, remaining robust under high missing rates and sensor failures \citep{liu2023pristiconditionaldiffusionframework}; for long sequences in continuous domains, DiffWave verifies that diffusion can match the quality of strong vocoders under fast non-autoregressive synthesis \citep{kong2021diffwaveversatilediffusionmodel}; and studies of cellular communication traffic show that diffusion can recover spatio-temporal patterns and characterize uncertainty at urban scale \citep{11087622}. Together these results point to one conclusion: when the research focus is telemetry and high-level features rather than raw messages, diffusion models provide stable, fine-grained distribution fitting and uncertainty quantification, which is exactly what ICS telemetry synthesis requires. At the same time, entrusting all structure to a monolithic diffusion model is inadvisable: long-range temporal skeletons and fine-grained marginal distributions often stand in optimization tension and must be decoupled explicitly in the model.

Looking further at the mechanism complexity of ICS: its channel types are inherently mixed, containing both continuous process trajectories and discrete supervision/status variables, and discrete channels must remain legal under operational constraints. The time-series diffusion progress above has mainly occurred in continuous spaces, but discrete diffusion has also developed systematic methods: D3PM improves sampling quality and likelihood through absorbing/masking and structured transitions in discrete state spaces \citep{austin2021structured}, subsequent masked diffusion provides stable reconstruction on categorical data in a more simplified form \citep{shi2024simplified}, multinomial diffusion defines diffusion directly on a finite vocabulary through mechanisms such as argmax flows \citep{hoogeboom2021argmaxflowsmultinomialdiffusion}, and Diffusion-LM demonstrates an effective path to controllable text generation by imposing gradient constraints in continuous latent spaces \citep{li2022diffusionlmimprovescontrollabletext}. From the perspective of protocols and finite-state machines, coverage-guided fuzz testing emphasizes the criticality of sequence legality and state coverage \citep{meng2025aflnetyearslatercoverageguided,godefroid2017learnfuzzmachinelearninginput,she2019neuzzefficientfuzzingneural}, echoing the legality-by-construction property of discrete diffusion: preferentially adopt absorbing/masking diffusion on discrete channels, supplemented by type-aware conditioning and sampling constraints, to avoid the semantic invalidity and marginal distortion caused by post-hoc thresholding.

From the perspective of high-level synthesis, temporal structure is equally indispensable: ICS control often involves delay effects, phased operating conditions, and cross-channel coupling, so models must characterize low-frequency, long-range dependencies while overlaying multi-faceted fine-grained fluctuations on them. The Transformer family has provided ample evidence on long-sequence time-series tasks: Transformer-XL breaks the fixed-length context limitation through a reusable memory mechanism and significantly strengthens long-range dependency modeling \citep{dai2019transformerxlattentivelanguagemodels}; Informer uses ProbSparse attention and efficient decoding to balance span and efficiency in long-sequence prediction \citep{zhou2021informerefficienttransformerlong}; Autoformer robustly models long-term seasonality and trends through autocorrelation and decomposition mechanisms \citep{wu2022autoformerdecompositiontransformersautocorrelation}; FEDformer further improves long-period prediction through frequency-domain enhancement and decomposition \citep{zhou2022fedformerfrequencyenhanceddecomposed}; PatchTST improves the stability and generalization of long-sequence multivariate prediction through local patch representations and channel-independent modeling \citep{nie2023patchtst}. Combined with our positioning of diffusion above, this chain of evidence points to a natural division of labor: use attention-based sequence models to first extract stable low-frequency trends/conditions (the long-range skeleton), let diffusion focus on margins and details in the residual space, and apply discrete masking/absorbing diffusion to supervisory/mode variables to guarantee vocabulary legality by construction. This design inherits the advantages of time-series diffusion in distribution fitting and uncertainty characterization \citep{rasul2021autoregressivedenoisingdiffusionmodels,tashiro2021csdiconditionalscorebaseddiffusion,wen2024diffstgprobabilisticspatiotemporalgraph,liu2023pristiconditionaldiffusionframework,kong2021diffwaveversatilediffusionmodel,11087622} while stabilizing the macroscopic temporal support through the Transformer's long-range attention, yielding an operational, integrated generation pipeline under the mixed types and multi-scale dynamics of ICS.
% 3. Methodology
\section{Methodology}
\label{sec:method}
Industrial control system (ICS) telemetry is intrinsically mixed-type and mechanistically heterogeneous: continuous process trajectories (e.g., sensor and actuator signals) coexist with discrete supervisory states (e.g., modes, alarms, interlocks), and the underlying generating mechanisms range from physical inertia to program-driven step logic. This heterogeneity is not cosmetic: it directly affects what realistic synthesis means, because a generator must jointly satisfy (i) temporal coherence, (ii) distributional fidelity, and (iii) discrete semantic validity (i.e., every discrete output must belong to its legal vocabulary by construction). These properties are emphasized broadly in operational-technology security guidance and ICS engineering practice, where state logic and physical dynamics are tightly coupled \citep{nist2023sp80082}.

We model each training instance as a fixed-length window of length $L$, comprising continuous channels $\bm{X} \in \mathbb{R}^{L \times d_c}$ and discrete channels $\bm{Y} = \{y^{(j)}_{1:L}\}_{j=1}^{d_d}$, where each discrete variable satisfies $y^{(j)}_t \in \mathcal{V}_j$ for a finite vocabulary $\mathcal{V}_j$. Our objective is to learn a generator that produces synthetic $(\hat{\bm{X}}, \hat{\bm{Y}})$ that are simultaneously coherent and distributionally faithful, while also ensuring $\hat{y}^{(j)}_t\in\mathcal{V}_j$ for all $j$, $t$ by construction (rather than via post-hoc rounding or thresholding).

A key empirical and methodological tension in ICS synthesis is that temporal realism and marginal/distributional realism can compete when optimized monolithically: sequence models trained primarily for regression often over-smooth heavy tails and intermittent bursts, while purely distribution-matching objectives can erode long-range structure. Diffusion models provide a principled route to rich distribution modeling through iterative denoising, but they do not, by themselves, resolve (i) the need for a stable low-frequency temporal scaffold, nor (ii) the discrete legality constraints for supervisory variables \citep{ho2020denoising,song2021score}. Recent time-series diffusion work further suggests that separating coarse structure from stochastic refinement can be an effective inductive bias for long-horizon realism \citep{kollovieh2023tsdiff,sikder2023transfusion}. Figure~\ref{fig:design} summarizes how our framework maps these requirements into a staged generator for mixed-type ICS telemetry.
\begin{figure}[htbp]
\centering
\includegraphics[width=0.8\textwidth]{fig-design-v4-from-user-svg-cropped.pdf}
\caption{Mask-DDPM: unified synthesis pipeline for ICS telemetry.}
\label{fig:design}
\end{figure}
Motivated by these considerations, we propose Mask-DDPM, organized in the following order:
\begin{enumerate}
\item Transformer trend module: learns the dominant temporal backbone of continuous dynamics via attention-based sequence modeling \citep{vaswani2017attention}.

\item Residual DDPM for continuous variables: models distributional detail as stochastic residual structure conditioned on the learned trend \citep{ho2020denoising,kollovieh2023tsdiff}.

\item Masked diffusion for discrete variables: generates discrete ICS states with an absorbing/masking corruption process and categorical reconstruction \citep{austin2021structured,shi2024simplified}.

\item Type-aware decomposition: a type-aware factorization and routing layer that assigns variables to the most appropriate modeling mechanism and enforces deterministic constraints where warranted.
\end{enumerate}

This ordering is intentional. The trend module establishes a macro-temporal scaffold; residual diffusion then concentrates capacity on micro-structure and marginal fidelity; masked diffusion provides a native mechanism for discrete legality; and the type-aware layer operationalizes the observation that not all ICS variables should be modeled with the same stochastic mechanism. As shown in Figure~\ref{fig:design}, these components are arranged sequentially so that temporal scaffolding, residual refinement, and discrete legality are enforced in complementary rather than competing stages. Importantly, while diffusion-based generation for ICS telemetry has begun to emerge, existing approaches remain limited and typically emphasize continuous synthesis or augmentation; in contrast, our pipeline integrates (i) a Transformer-conditioned residual diffusion backbone, (ii) a discrete masked-diffusion branch, and (iii) explicit type-aware routing for heterogeneous variable mechanisms within a single coherent generator \citep{yuan2025ctu,sha2026ddpm}.
\subsection{Transformer trend module for continuous dynamics}
\label{sec:method-trans}
We instantiate the temporal backbone as a causal Transformer trend extractor, leveraging self-attention's ability to represent long-range dependencies and cross-channel interactions without recurrence \citep{vaswani2017attention}. Compared with recurrent trend extractors (e.g., GRU-style backbones), a Transformer trend module offers a direct mechanism to model delayed effects and multivariate coupling common in ICS, where control actions may influence downstream sensors with nontrivial lags and regime-dependent propagation \citep{vaswani2017attention,nist2023sp80082}. Crucially, in our design the Transformer is not asked to be the entire generator; instead, it serves a deliberately restricted role: providing a stable, temporally coherent conditioning signal that later stochastic components refine.

For continuous channels $\bm{X}$, we posit an additive decomposition:
\begin{equation}
\bm{X} = \bm{S} + \bm{R},
\label{eq:additive_decomp}
\end{equation}
where $\bm{S} \in \mathbb{R}^{L \times d_c}$ is a smooth trend capturing predictable temporal evolution, and $\bm{R} \in \mathbb{R}^{L \times d_c}$ is a residual capturing distributional detail (e.g., bursts, heavy tails, local fluctuations) that is difficult to represent robustly with a purely regression-based temporal objective. This separation reflects an explicit division of labor: the trend module prioritizes temporal coherence, while diffusion (introduced next) targets distributional realism at the residual level, a strategy aligned with predict-then-refine perspectives in time-series diffusion modeling \citep{kollovieh2023tsdiff,sikder2023transfusion}.
We parameterize the trend $\bm{S}$ using a causal Transformer $f_{\phi}$. With teacher forcing, we train $f_{\phi}$ to predict the next-step trend from past observations:
\begin{equation}
\hat{\bm{S}}_{t+1} = f_{\phi}(\bm{X}_{1:t}), \quad t = 1, \dots, L-1,
\label{eq:trend_prediction}
\end{equation}
using the mean-squared error objective:
\begin{equation}
\mathcal{L}_{\text{trend}}(\phi) = \frac{1}{(L-1)d_c} \sum_{t=1}^{L-1} \bigl\| \hat{\bm{S}}_{t+1} - \bm{X}_{t+1} \bigr\|_2^2.
\label{eq:trend_loss}
\end{equation}
At inference, we roll out the Transformer autoregressively to obtain $\hat{\bm{S}}$, and then define the residual target for diffusion as $\bm{R} = \bm{X} - \hat{\bm{S}}$. This setup intentionally locks in a coherent low-frequency scaffold before any stochastic refinement is applied, so the downstream diffusion modules no longer need to learn long-range structure and marginal detail simultaneously. In this sense, our use of Transformers is distinctive: it is a conditioning-first temporal backbone designed to stabilize mixed-type diffusion synthesis in ICS, rather than a monolithic generator \citep{vaswani2017attention,kollovieh2023tsdiff,yuan2025ctu}.
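
For concreteness, a minimal sketch of the trend module and its rollout follows. It is illustrative only: PyTorch is assumed, and names such as \texttt{TrendTransformer} are ours rather than part of a released artifact.
\begin{verbatim}
import torch
import torch.nn as nn

class TrendTransformer(nn.Module):
    """Causal Transformer f_phi: position t predicts the trend at t+1."""
    def __init__(self, d_c, d_model=128, nhead=4, nlayers=3):
        super().__init__()
        self.inp = nn.Linear(d_c, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.enc = nn.TransformerEncoder(layer, nlayers)
        self.out = nn.Linear(d_model, d_c)

    def forward(self, x):  # x: (B, t, d_c)
        mask = nn.Transformer.generate_square_subsequent_mask(x.size(1))
        h = self.enc(self.inp(x), mask=mask.to(x.device))  # causal attention
        return self.out(h)  # (B, t, d_c)

def trend_loss(model, X):  # X: (B, L, d_c); Eq. (3) with teacher forcing
    pred = model(X[:, :-1])
    return ((pred - X[:, 1:]) ** 2).mean()

@torch.no_grad()
def rollout(model, x0, L):  # autoregressive rollout of S-hat from a seed step
    seq = x0  # (B, 1, d_c)
    for _ in range(L - 1):
        seq = torch.cat([seq, model(seq)[:, -1:]], dim=1)
    return seq  # (B, L, d_c)
\end{verbatim}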
\subsection{DDPM for continuous residual generation}
\label{sec:method-ddpm}
We model the residual $\bm{R}$ with a denoising diffusion probabilistic model (DDPM) conditioned on the trend $\hat{\bm{S}}$ \citep{ho2020denoising}. Diffusion models learn complex data distributions by inverting a tractable noising process through iterative denoising, and have proven effective at capturing multimodality and heavy-tailed structure that is often attenuated by purely regression-based sequence models \citep{ho2020denoising,song2021score}. Conditioning the diffusion model on $\hat{\bm{S}}$ is central: it prevents the denoiser from re-learning the low-frequency scaffold and focuses capacity on residual micro-structure, mirroring the broader principle that diffusion excels as a distributional corrector when a reasonable coarse structure is available \citep{kollovieh2023tsdiff,sikder2023transfusion}.

Let $K$ denote the number of diffusion steps, with a noise schedule $\{\beta_k\}_{k=1}^K$, $\alpha_k = 1 - \beta_k$, and $\bar{\alpha}_k = \prod_{i=1}^k \alpha_i$. The forward corruption process is:
\begin{equation}
q(\bm{r}_k \mid \bm{r}_0) = \mathcal{N}\bigl( \sqrt{\bar{\alpha}_k}\,\bm{r}_0,\; (1 - \bar{\alpha}_k)\mathbf{I} \bigr),
\label{eq:forward_corruption}
\end{equation}
or, equivalently,
\begin{equation}
\bm{r}_k = \sqrt{\bar{\alpha}_k}\,\bm{r}_0 + \sqrt{1 - \bar{\alpha}_k}\,\boldsymbol{\epsilon}, \quad \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}).
\label{eq:forward_corruption_eq}
\end{equation}
The learned reverse process is parameterized as:
\begin{equation}
p_{\theta}(\bm{r}_{k-1} \mid \bm{r}_k, \hat{\bm{S}}) = \mathcal{N}\bigl( \boldsymbol{\mu}_{\theta}(\bm{r}_k, k, \hat{\bm{S}}),\; \boldsymbol{\Sigma}(k) \bigr),
\label{eq:reverse_process}
\end{equation}
where $\boldsymbol{\mu}_{\theta}$ is implemented by a Transformer denoiser that consumes (i) the noised residual $\bm{r}_k$, (ii) a timestep embedding for $k$, and (iii) conditioning features derived from $\hat{\bm{S}}$. This denoiser architecture is consistent with the growing use of attention-based denoisers for long-context time-series diffusion, while our key methodological emphasis is the trend-conditioned residual factorization as the object of diffusion learning \citep{ho2020denoising,sikder2023transfusion}.

We train the denoiser using the standard DDPM $\epsilon$-prediction objective:
\begin{equation}
\mathcal{L}_{\text{cont}}(\theta) = \mathbb{E}_{k,\bm{r}_0,\boldsymbol{\epsilon}} \left[ \bigl\| \boldsymbol{\epsilon} - \boldsymbol{\epsilon}_{\theta}(\bm{r}_k, k, \hat{\bm{S}}) \bigr\|_2^2 \right].
\label{eq:ddpm_loss}
\end{equation}
Because diffusion optimization can exhibit timestep imbalance (i.e., some timesteps dominate gradients), we optionally apply an SNR-based reweighting consistent with Min-SNR training:
\begin{equation}
\mathcal{L}^{\text{snr}}_{\text{cont}}(\theta) = \mathbb{E}_{k,\bm{r}_0,\boldsymbol{\epsilon}} \left[ w_k \bigl\| \boldsymbol{\epsilon} - \boldsymbol{\epsilon}_{\theta}(\bm{r}_k, k, \hat{\bm{S}}) \bigr\|_2^2 \right],
\label{eq:snr_loss}
\end{equation}
where $w_k = \min\{\mathrm{SNR}_k,\,\gamma\}/\mathrm{SNR}_k$, with $\mathrm{SNR}_k=\bar{\alpha}_k/(1-\bar{\alpha}_k)$ and a cap parameter $\gamma>0$ \citep{hang2023efficient}.

After sampling $\hat{\bm{R}}$ by reverse diffusion, we reconstruct the continuous output as $\hat{\bm{X}} = \hat{\bm{S}} + \hat{\bm{R}}$. Overall, the DDPM component serves as a distributional corrector on top of a temporally coherent backbone, which is particularly suited to ICS where low-frequency dynamics are strong and persistent but fine-scale variability (including bursts and regime-conditioned noise) remains important for realism. Relative to prior ICS diffusion efforts that primarily focus on continuous augmentation, our formulation elevates trend-conditioned residual diffusion as a modular mechanism for disentangling temporal structure from distributional refinement \citep{yuan2025ctu,sha2026ddpm}.
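
A minimal training-step sketch for Eqs.~\eqref{eq:forward_corruption_eq} and \eqref{eq:snr_loss} follows (again PyTorch is assumed, and \texttt{epsilon\_net} stands in for a hypothetical trend-conditioned denoiser):
\begin{verbatim}
import torch

def ddpm_training_step(epsilon_net, r0, s_hat, alpha_bar, gamma=5.0):
    """Min-SNR-weighted epsilon-prediction loss for one minibatch."""
    B, K = r0.size(0), alpha_bar.numel()
    k = torch.randint(0, K, (B,), device=r0.device)   # random timesteps
    ab = alpha_bar[k].view(B, 1, 1)                   # bar-alpha_k
    eps = torch.randn_like(r0)
    r_k = ab.sqrt() * r0 + (1.0 - ab).sqrt() * eps    # forward corruption
    eps_hat = epsilon_net(r_k, k, s_hat)              # conditioned on trend
    snr = ab / (1.0 - ab)                             # SNR_k
    w = torch.minimum(snr, torch.full_like(snr, gamma)) / snr
    return (w * (eps - eps_hat) ** 2).mean()
\end{verbatim}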
\subsection{Masked diffusion for discrete ICS variables}
\label{sec:method-discrete}
Discrete ICS variables must remain categorical, making Gaussian diffusion inappropriate for supervisory states and mode-like channels. While one can attempt continuous relaxations or post-hoc discretization, such strategies risk producing semantically invalid intermediate states (e.g., in-between modes) and can distort the discrete marginal distribution. Discrete-state diffusion provides a principled alternative by defining a valid corruption process directly on categorical variables \citep{austin2021structured,shi2024simplified}. In the ICS setting, this is not a secondary detail: supervisory tags often encode control logic boundaries (modes, alarms, interlocks) that must remain within a finite vocabulary to preserve semantic correctness \citep{nist2023sp80082}.

We therefore adopt masked (absorbing) diffusion for discrete channels, where corruption replaces tokens with a special \texttt{[MASK]} symbol according to a schedule \citep{shi2024simplified}. For each variable $j$, define a masking schedule $\{m_k\}_{k=1}^K$ (with $m_k\in[0,1]$) increasing in $k$. The forward corruption process is:
\begin{equation}
q(y^{(j)}_k \mid y^{(j)}_0) =
\begin{cases}
y^{(j)}_0, & \text{with probability } 1 - m_k, \\
\texttt{[MASK]}, & \text{with probability } m_k,
\end{cases}
\label{eq:masking_process}
\end{equation}
applied independently across $j$ and $t$. Let $\mathcal{M}$ denote the set of masked positions at step $k$. The denoiser $h_{\psi}$ predicts a categorical distribution over $\mathcal{V}_j$ for each masked token, conditioned on (i) the corrupted discrete sequence, (ii) the diffusion step $k$, and (iii) continuous context. Concretely, we condition on $\hat{\bm{S}}$ and $\hat{\bm{X}}$ to couple supervisory reconstruction to the underlying continuous dynamics:
\begin{equation}
p_{\psi}\bigl( y^{(j)}_0 \mid y_k, k, \hat{\bm{S}}, \hat{\bm{X}} \bigr) = h_{\psi}(y_k, k, \hat{\bm{S}}, \hat{\bm{X}}).
\label{eq:discrete_denoising}
\end{equation}
This conditioning choice is motivated by the fact that many discrete ICS states are not standalone; they are functions of regimes, thresholds, and procedural phases that manifest in continuous channels \citep{nist2023sp80082}. Training uses a categorical denoising objective:
\begin{equation}
\mathcal{L}_{\text{disc}}(\psi) = \mathbb{E}_{k} \left[ \frac{1}{|\mathcal{M}|} \sum_{(j,t) \in \mathcal{M}} \mathrm{CE}\bigl( h_{\psi}(y_k, k, \hat{\bm{S}}, \hat{\bm{X}})_{j,t},\; y^{(j)}_{0,t} \bigr) \right],
\label{eq:discrete_loss}
\end{equation}
where $\mathrm{CE}(\cdot,\cdot)$ is cross-entropy. At sampling time, we initialize all discrete tokens as \texttt{[MASK]} and iteratively unmask them using the learned conditionals, ensuring that every output token lies in its legal vocabulary by construction. This discrete branch is a key differentiator of our pipeline: unlike typical continuous-only diffusion augmentation in ICS, we integrate masked diffusion as a first-class mechanism for supervisory-variable legality within the same synthesis workflow \citep{shi2024simplified,yuan2025ctu}.
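
The corruption and unmasking loops admit a compact sketch (PyTorch assumed; \texttt{denoise\_net} is a hypothetical conditional denoiser returning per-position logits, reserving index 0 for \texttt{[MASK]} is our convention, and the uniform reveal schedule is one simple choice among several):
\begin{verbatim}
import torch

MASK = 0  # index 0 of every vocabulary reserved for [MASK]

def corrupt(y0, m_k):
    """Mask each token independently with probability m_k (Eq. 9)."""
    drop = torch.rand(y0.shape, device=y0.device) < m_k
    return torch.where(drop, torch.full_like(y0, MASK), y0)

@torch.no_grad()
def unmask_sample(denoise_net, shape, K, s_hat, x_hat, device="cpu"):
    """Start fully masked; reveal tokens over K reverse steps."""
    y = torch.full(shape, MASK, dtype=torch.long, device=device)
    for k in reversed(range(K)):
        logits = denoise_net(y, k, s_hat, x_hat)   # (..., |V| + 1)
        logits[..., MASK] = float("-inf")          # only legal tokens
        draw = torch.distributions.Categorical(logits=logits).sample()
        still_masked = y == MASK
        # reveal a random subset of still-masked positions this step
        reveal = still_masked & (torch.rand(shape, device=device)
                                 < 1.0 / (k + 1))
        y = torch.where(reveal, draw, y)
    return y  # every token lies in its vocabulary by construction
\end{verbatim}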
\subsection{Type-aware decomposition as factorization and routing layer}
\label{sec:method-types}
Even with a trend-conditioned residual DDPM and a discrete masked-diffusion branch, a single uniform modeling treatment can remain suboptimal because ICS variables are generated by qualitatively different mechanisms. For example, program-driven setpoints exhibit step-and-dwell dynamics; controller outputs follow control laws conditioned on process feedback; actuator positions may show saturation and dwell; and some derived tags are deterministic functions of other channels. Treating all channels as if they were exchangeable stochastic processes can misallocate model capacity and induce systematic error concentration on a small subset of mechanistically distinct variables \citep{nist2023sp80082}.

We therefore introduce a type-aware decomposition that formalizes this heterogeneity as a routing and constraint layer. Let $\tau(i)\in\{1,\dots,6\}$ assign each variable $i$ to a type class. For expository convenience, the assignment can be viewed as a mapping $\tau(i)=\mathrm{TypeAssign}(m_i, s_i, d_i)$, where $m_i$, $s_i$, and $d_i$ denote metadata/engineering role, temporal signature, and dependency pattern, respectively. The type assignment can be initialized from domain semantics (tag metadata, value domains, and engineering meaning), and subsequently refined via an error-attribution workflow described in the Benchmark section. Importantly, this refinement does not change the core diffusion backbone; it changes which mechanism is responsible for which variable, thereby aligning inductive bias with variable-generating mechanism while preserving overall coherence.

We use the following taxonomy (a schematic routing sketch is given at the end of this subsection):
\begin{enumerate}
\item Type 1 (program-driven / setpoint-like): externally commanded, step-and-dwell variables. These variables can be treated as exogenous drivers (conditioning signals) or routed to specialized change-point / dwell-time models, rather than being forced into a smooth denoiser that may over-regularize step structure.

\item Type 2 (controller outputs): continuous variables tightly coupled to feedback loops; these benefit from conditional modeling where the conditioning includes relevant process variables and commanded setpoints.

\item Type 3 (actuator states/positions): often exhibit saturation, dwell, and rate limits; these may require stateful dynamics beyond generic residual diffusion, motivating either specialized conditional modules or additional inductive constraints.

\item Type 4 (process variables): inertia-dominated continuous dynamics; these are the primary beneficiaries of the Transformer trend + residual DDPM pipeline.

\item Type 5 (derived/deterministic variables): algebraic or rule-based functions of other variables; we enforce deterministic reconstruction $\hat{x}^{(i)} = g_i(\hat{X},\hat{Y})$ rather than learning a stochastic generator, improving logical consistency and sample efficiency.

\item Type 6 (auxiliary/low-impact variables): weakly coupled or sparse signals; we allow simplified modeling (e.g., calibrated marginals or lightweight temporal models) to avoid allocating diffusion capacity where it is not warranted.
\end{enumerate}

\begin{figure}[H]
\centering
\includegraphics[width=0.98\textwidth]{typeclass-cropped.pdf}
\caption{Type assignment and six-type taxonomy.}
\label{fig:type_taxonomy}
\end{figure}

Figure~\ref{fig:type_taxonomy} visualizes the six-type taxonomy and the routing logic behind it. Type-aware decomposition improves synthesis quality through three mechanisms. First, it improves capacity allocation by preventing a small set of mechanistically atypical variables from dominating gradients and distorting the learned distribution for the majority class (typically Type 4). Second, it enables constraint enforcement by deterministically reconstructing Type 5 variables, preventing logically inconsistent samples that purely learned generators can produce. Third, it improves mechanism alignment by attaching inductive biases consistent with step/dwell or saturation behaviors where generic denoisers may implicitly favor smoothness.

From a novelty standpoint, this layer is not merely an engineering patch; it is an explicit methodological statement that ICS synthesis benefits from typed factorization, a principle that has analogues in mixed-type generative modeling more broadly, but that remains underexplored in diffusion-based ICS telemetry synthesis \citep{shi2025tabdiff,yuan2025ctu,nist2023sp80082}.
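
To make the routing layer concrete, a schematic sketch follows; the mechanism labels, plan keys, and \texttt{g} mapping are illustrative assumptions rather than the released implementation:
\begin{verbatim}
from typing import Callable, Dict, List

MECHANISMS = ["exogenous",      # Type 1: program-driven setpoints
              "conditional",    # Type 2: controller outputs
              "stateful",       # Type 3: actuator states/positions
              "trend+ddpm",     # Type 4: process variables
              "deterministic",  # Type 5: derived tags g_i(X, Y)
              "lightweight"]    # Type 6: auxiliary channels

def route(tau: Dict[str, int],
          g: Dict[str, Callable]) -> Dict[str, List[str]]:
    """Partition variables by modeling mechanism (Sec. 3.4)."""
    plan: Dict[str, List[str]] = {m: [] for m in MECHANISMS}
    for var, t in tau.items():
        plan[MECHANISMS[t - 1]].append(var)
    # Type-5 tags are reconstructed deterministically, never sampled.
    assert all(v in g for v in plan["deterministic"])
    return plan
\end{verbatim}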
\subsection{Joint optimization and sampling}
\label{sec:method-joint}
We train the model in a staged manner consistent with the above factorization, which improves optimization stability and encourages each component to specialize in its intended role. Specifically: (i) we train the trend Transformer $f_{\phi}$ to obtain $\hat{\bm{S}}$; (ii) we compute residual targets $\bm{R} = \bm{X} - \hat{\bm{S}}$ for the continuous variables routed to residual diffusion; (iii) we train the residual DDPM $p_{\theta}(\bm{R}\mid \hat{\bm{S}})$ and masked diffusion model $p_{\psi}(\bm{Y}\mid \text{masked}(\bm{Y}), \hat{\bm{S}}, \hat{\bm{X}})$; and (iv) we apply type-aware routing and deterministic reconstruction during sampling. This staged strategy is aligned with the design goal of separating temporal scaffolding from distributional refinement, and it mirrors the broader intuition in time-series diffusion that decoupling coarse structure and stochastic detail can mitigate structure-vs.-realism conflicts \citep{kollovieh2023tsdiff,sikder2023transfusion}.

A simple combined objective is $\mathcal{L} = \lambda\mathcal{L}_{\text{cont}} + (1-\lambda)\mathcal{L}_{\text{disc}}$ with $\lambda\in[0,1]$ controlling the balance between continuous and discrete learning. Type-aware routing determines which channels contribute to which loss and which are excluded in favor of deterministic reconstruction. In practice, this routing acts as a principled guardrail against negative transfer across variable mechanisms: channels that are best handled deterministically (Type 5) or as exogenous / specialized state channels (e.g., driver-like or actuator-state variables) are prevented from forcing the diffusion models into statistically incoherent compromises.

At inference time, generation follows the same structured order: (i) trend $\hat{\bm{S}}$ via the Transformer, (ii) residual $\hat{\bm{R}}$ via DDPM, (iii) discrete $\hat{\bm{Y}}$ via masked diffusion, and (iv) type-aware assembly with deterministic reconstruction for routed variables. This pipeline produces $(\hat{\bm{X}},\hat{\bm{Y}})$ that are temporally coherent by construction (through $\hat{\bm{S}}$), distributionally expressive (through $\hat{\bm{R}}$ denoising), and discretely valid (through masked diffusion), while explicitly accounting for heterogeneous variable-generating mechanisms through type-aware routing. In combination, these choices constitute our central methodological contribution: a unified Transformer + mixed diffusion generator for ICS telemetry, augmented by typed factorization to align model capacity with domain mechanism \citep{ho2020denoising,shi2024simplified,yuan2025ctu,nist2023sp80082}.
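
The staged inference order can be read directly off a short driver sketch (the component callables are assumptions standing in for the trained modules above):
\begin{verbatim}
from typing import Callable, Dict
import torch

def sample_window(
    trend_rollout: Callable[[], torch.Tensor],                 # -> S_hat
    sample_residual: Callable[[torch.Tensor], torch.Tensor],   # S_hat -> R_hat
    sample_discrete: Callable[..., torch.Tensor],              # -> Y_hat
    deterministic: Dict[int, Callable],                        # Type-5 g_i
):
    """Staged inference (Sec. 3.5): trend -> residual -> discrete -> assembly."""
    s_hat = trend_rollout()                 # (i) temporal scaffold
    r_hat = sample_residual(s_hat)          # (ii) residual reverse diffusion
    x_hat = s_hat + r_hat                   # continuous reconstruction
    y_hat = sample_discrete(s_hat, x_hat)   # (iii) legal discrete tokens
    for ch, g_i in deterministic.items():   # (iv) Type-5 reconstruction
        x_hat[..., ch] = g_i(x_hat, y_hat)
    return x_hat, y_hat
\end{verbatim}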
% 4. Benchmark
\section{Benchmark}
\label{sec:benchmark}
A credible ICS generator must clear three hurdles. It must first be \emph{semantically legal}: any out-of-vocabulary supervisory token renders a sample unusable, regardless of marginal fidelity. It must then match the heterogeneous statistics of mixed-type telemetry, including continuous process channels and discrete supervisory states. Finally, it must preserve \emph{mechanism-level realism}: switch-and-dwell behavior, bounded control motion, cross-tag coordination, and short-horizon persistence. We therefore organize the benchmark as a funnel from legality and reproducibility to structural diagnosis and ablation \citep{coletta2023constrained,yang2001interlock,stenger2024survey}.

For continuous channels, we use the Kolmogorov--Smirnov (KS) statistic because ICS process signals are often bounded, saturated, heavy-tailed, and plateau-dominated, so moment matching alone is too weak. KS directly compares empirical cumulative distributions, makes no parametric assumption, and is sensitive to support or shape mismatches that are operationally meaningful in telemetry. For discrete channels, realism is primarily about how probability mass is distributed over a finite vocabulary, so we use Jensen--Shannon divergence (JSD) between per-feature categorical marginals and average across discrete variables \citep{lin1991divergence,yoon2019timegan}. To assess short-horizon dynamics, we compare lag-1 autocorrelation feature-wise and report the mean absolute difference between real and synthetic lag-1 coefficients. We also track semantic legality by counting out-of-vocabulary discrete outputs and report a filtered KS that excludes near-constant channels so that trivially flat tags do not dominate the aggregate.
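
For reference, the three headline metrics admit a direct implementation sketch (NumPy/SciPy assumed; the function names are ours):
\begin{verbatim}
import numpy as np
from scipy.stats import ks_2samp

def jsd(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two categorical marginals."""
    p, q = p + eps, q + eps
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: float(np.sum(a * np.log(a / b)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def lag1_autocorr(x):
    """Lag-1 autocorrelation of a 1-D series."""
    x = x - x.mean()
    return float((x[:-1] * x[1:]).sum() / ((x ** 2).sum() + 1e-12))

def continuous_channel_metrics(real, synth):
    """KS statistic and lag-1 gap for one channel; inputs: (N, L) arrays."""
    ks = ks_2samp(real.ravel(), synth.ravel()).statistic
    d_ac = abs(np.mean([lag1_autocorr(r) for r in real]) -
               np.mean([lag1_autocorr(s) for s in synth]))
    return ks, d_ac
\end{verbatim}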
\subsection{Core fidelity, legality, and reproducibility}
\label{sec:benchmark-quant}
Across three independent runs, Mask-DDPM achieves mean KS $=0.3311 \pm 0.0079$, mean JSD $=0.0284 \pm 0.0073$, and mean absolute lag-1 difference $=0.2684 \pm 0.0027$, while maintaining a validity rate of \textbf{100\%} across the modeled discrete channels. The small dispersion across runs shows that mixed-type fidelity is reproducible rather than dependent on a single favorable seed. On a representative diagnostic slice, the model attains mean KS $=0.4025$, filtered mean KS $=0.3191$, mean JSD $=0.0166$, and mean absolute lag-1 difference $=0.2859$, again with zero invalid discrete tokens. The main pattern is that discrete legality is already solved, while continuous mismatch is concentrated in a limited subset of difficult channels rather than spread uniformly across the telemetry space.

\begin{figure}[htbp]
\centering
\includegraphics[width=\textwidth]{fig-benchmark-story-v2.png}
\caption{Benchmark evidence chain.}
\label{fig:benchmark_story}
\end{figure}

\begin{table}[htbp]
\centering
\caption{Core benchmark summary. Lower is better except for validity rate.}
\label{tab:core_metrics}
\begin{tabular}{@{}lcc@{}}
\toprule
\textbf{Metric} & \textbf{3-run mean $\pm$ std} & \textbf{Diagnostic slice} \\
\midrule
Mean KS (continuous) & $0.3311 \pm 0.0079$ & $0.4025$ \\
Filtered mean KS & -- & $0.3191$ \\
Mean JSD (discrete) & $0.0284 \pm 0.0073$ & $0.0166$ \\
Mean abs. $\Delta$ lag-1 autocorr & $0.2684 \pm 0.0027$ & $0.2859$ \\
Validity rate (26 discrete tags) $\uparrow$ & $100.0 \pm 0.0\%$ & $100.0\%$ \\
\bottomrule
\end{tabular}
\end{table}
%Question about the following part. "Figure~\ref{fig:benchmark_story} turns the table into a structural diagnosis."
|
||||
Figure~\ref{fig:benchmark_story} turns the table into a structural diagnosis. The left panel shows seed-level stability across the three benchmark runs. The middle panel shows that the dominant continuous mismatch is concentrated in a relatively small subset of control-sensitive variables rather than indicating a global collapse of the generator. The right panel shows that the remaining realism gap is mechanism-specific, with program-like long-dwell behavior and actuator-state occupancy contributing more strongly than PV-like channels on this slice.

\subsection{Type-aware diagnostics}
\label{sec:benchmark-typed}

Type-aware diagnostics make that mechanism gap explicit. Table~\ref{tab:typed_diagnostics} reports one representative statistic per variable family on the same diagnostic slice. Because each family is evaluated with a different proxy, the absolute-error column should be interpreted within type, while the relative-error column is the more comparable cross-type indicator.
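
Concretely, writing $s_{\mathrm{real}}(v)$ and $s_{\mathrm{syn}}(v)$ for the proxy statistic of variable $v$ on real and synthetic data, the two columns use the scale-normalized error (the standard choice, assumed here):
\begin{equation*}
\mathrm{err}_{\mathrm{abs}}(v) = \bigl| s_{\mathrm{syn}}(v) - s_{\mathrm{real}}(v) \bigr|, \qquad
\mathrm{err}_{\mathrm{rel}}(v) = \frac{\bigl| s_{\mathrm{syn}}(v) - s_{\mathrm{real}}(v) \bigr|}{\bigl| s_{\mathrm{real}}(v) \bigr|},
\end{equation*}
averaged within each family; dividing by the magnitude of the real statistic removes the per-type scale, which is what makes the relative column comparable across families.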

\begin{table}[htbp]
\centering
\caption{Type-aware diagnostic summary. Lower values indicate better alignment.}
\label{tab:typed_diagnostics}
\begin{tabular}{@{}llcc@{}}
\toprule
\textbf{Type} & \textbf{Proxy statistic} & \textbf{Mean abs. error} & \textbf{Mean rel. error} \\
\midrule
Program & mean dwell & $318.70$ & $2.19$ \\
Controller & change rate & $0.104$ & $0.25$ \\
Actuator & top-3 mass & $0.0615$ & $0.69$ \\
PV & tail ratio & $1.614$ & $0.20$ \\
Auxiliary & lag-1 autocorr & $0.125$ & $0.37$ \\
\bottomrule
\end{tabular}
\end{table}

Program-like channels remain the hardest family by a clear margin: mean-dwell mismatch is still large, indicating that the generator does not yet sustain the long plateaus characteristic of schedule-driven behavior. Actuator channels form the next clear difficulty, while PV channels are the most stable family under this diagnostic. In short, legality is solved, but the remaining realism gap is not uniform across types; it is dominated primarily by long-dwell program behavior and actuator-state occupancy.

\subsection{Ablation study}
\label{sec:benchmark-ablation}

We evaluate ten controlled variants under a shared pipeline and summarize six representative metrics: continuous fidelity (KS), discrete fidelity (JSD), short-horizon dynamics (lag-1), cross-variable coupling, predictive transfer, and downstream anomaly utility. Figure~\ref{fig:ablation_impact} visualizes signed changes relative to the full model, and Table~\ref{tab:ablation} gives the underlying values.
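
Here the signed change for a variant $v$ on metric $m$ is the raw difference against the full model,
\begin{equation*}
\Delta m(v) = m(v) - m(\mathrm{full}),
\end{equation*}
so a positive bar indicates degradation on the lower-is-better metrics and improvement on AUPRC.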

\begin{figure}[htbp]
\centering
\includegraphics[width=\textwidth]{fig-benchmark-ablations-v1.png}
\caption{Ablation impact: signed changes of each variant relative to the full model across the six summary metrics.}
\label{fig:ablation_impact}
\end{figure}

\begin{table}[htbp]
\centering
\small
\caption{Ablation study. Lower is better except for anomaly AUPRC.}
\label{tab:ablation}
\begin{tabular}{@{}lcccccc@{}}
\toprule
\textbf{Variant} & \textbf{KS$\downarrow$} & \textbf{JSD$\downarrow$} & \textbf{Lag-1$\downarrow$} & \textbf{Coupling$\downarrow$} & \textbf{Pred. RMSE$\downarrow$} & \textbf{AUPRC$\uparrow$} \\
\midrule
\multicolumn{7}{@{}l}{\textit{Full model}} \\
Full model & $0.402$ & $0.028$ & $0.291$ & $0.215$ & $0.972$ & $0.644$ \\
\midrule
\multicolumn{7}{@{}l}{\textit{Structure and conditioning}} \\
No temporal scaffold & $0.408$ & $0.031$ & $0.664$ & $0.306$ & $0.977$ & $0.645$ \\
No file-level context & $0.405$ & $0.033$ & $0.237$ & $0.262$ & $0.986$ & $0.640$ \\
No type routing & $0.356$ & $0.022$ & $0.138$ & $0.324$ & $1.017$ & $0.647$ \\
\midrule
\multicolumn{7}{@{}l}{\textit{Distribution shaping}} \\
No quantile transform & $0.599$ & $0.010$ & $0.156$ & $0.300$ & $1.653$ & $0.417$ \\
No post-calibration & $0.543$ & $0.024$ & $0.253$ & $0.249$ & $1.086$ & $0.647$ \\
\midrule
\multicolumn{7}{@{}l}{\textit{Loss and target design}} \\
No SNR weighting & $0.400$ & $0.022$ & $0.299$ & $0.214$ & $0.961$ & $0.637$ \\
No quantile loss & $0.413$ & $0.018$ & $0.311$ & $0.213$ & $0.965$ & $0.645$ \\
No residual-stat loss & $0.404$ & $0.029$ & $0.285$ & $0.210$ & $0.970$ & $0.647$ \\
Epsilon target & $0.482$ & $0.102$ & $0.728$ & $0.195$ & $0.968$ & $0.647$ \\
\bottomrule
\end{tabular}
\end{table}

The ablation results reveal three distinct roles. First, temporal staging is what makes the sequence look dynamical rather than merely plausible frame by frame: removing the temporal scaffold leaves KS nearly unchanged but more than doubles lag-1 error ($0.291 \rightarrow 0.664$) and substantially worsens coupling ($0.215 \rightarrow 0.306$). Second, quantile-based distribution shaping is one of the main contributors to usable continuous realism: without the quantile transform, KS degrades sharply ($0.402 \rightarrow 0.599$), synthetic-only predictive RMSE deteriorates ($0.972 \rightarrow 1.653$), and anomaly utility collapses ($0.644 \rightarrow 0.417$). Third, routing is the key counterexample to one-dimensional evaluation: disabling type routing can improve KS or lag-1 in isolation, yet it worsens coupling ($0.215 \rightarrow 0.324$) and predictive transfer ($0.972 \rightarrow 1.017$), showing that typed decomposition helps preserve coordinated mechanism-level behavior.

Taken together, the benchmark supports a focused claim. Mask-DDPM already provides stable mixed-type fidelity and perfect discrete legality, while the remaining error is concentrated in ICS-specific channels whose realism depends on rare switching, long dwell intervals, constrained occupancy, and persistent local dynamics.

% 5. Conclusion and Future Work
\section{Conclusion and Future Work}
\label{sec:conclusion}

This paper addresses the data scarcity and shareability barriers that limit machine-learning research for industrial control systems (ICS) security by proposing Mask-DDPM, a hybrid synthetic telemetry generator at the protocol-feature level. By combining a causal Transformer trend module, a trend-conditioned residual DDPM, a masked diffusion branch for discrete variables, and a type-aware routing layer, the framework preserves long-horizon temporal structure, improves local distributional fidelity, and guarantees discrete semantic legality. On windows derived from the HAI Security Dataset, the model achieves stable mixed-type fidelity across seeds, with mean KS $= 0.3311 \pm 0.0079$ on continuous features, mean JSD $= 0.0284 \pm 0.0073$ on discrete features, and mean absolute lag-1 autocorrelation difference $= 0.2684 \pm 0.0027$.

Overall, Mask-DDPM provides a reproducible foundation for generating shareable and semantically valid ICS feature sequences for data augmentation, benchmarking, and downstream packet/trace reconstruction workflows. Future work will proceed in two complementary directions. Vertically, we will strengthen the theoretical foundation of the framework by introducing more explicit control-theoretic constraints, structured state-space or causal priors, and formal transition models for supervisory logic, so that legality, stability, and cross-channel coupling can be characterized more rigorously. Horizontally, we will extend the framework beyond the current setting to additional industrial control protocols such as Modbus/TCP, DNP3, IEC 104, and OPC UA, and investigate analogous adaptations to automotive communication protocols such as CAN/CAN FD and automotive Ethernet. A related extension is controllable attack or violation injection on top of legal base traces, enabling reproducible adversarial benchmarks for anomaly detection and intrusion-detection studies.

\bibliographystyle{splncs04}
\bibliography{references}

\end{document}
20
LaTeX2e+Proceedings+Templates+download/readme.txt
Normal file
@@ -0,0 +1,20 @@
Dear llncs user,

The files in this directory belong to the LaTeX2e package
for Springer's Lecture Notes in Computer Science (LNCS) and
other proceedings book series.

It consists of the following files:

  readme.txt        this file

  history.txt       the version history of the package

  llncs.cls         the LaTeX2e document class

  samplepaper.tex   a sample paper
  fig1.eps          a figure used in the sample paper

  llncsdoc.pdf      the documentation of the class (PDF version)

  splncs04.bst      current LNCS BibTeX style with alphabetic sorting
633
LaTeX2e+Proceedings+Templates+download/references.bib
Normal file
@@ -0,0 +1,633 @@
@inproceedings{vaswani2017attention,
  author    = {Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N and Kaiser, \L ukasz and Polosukhin, Illia},
  title     = {Attention is All you Need},
  booktitle = {Advances in Neural Information Processing Systems},
  volume    = {30},
  publisher = {Curran Associates, Inc.},
  year      = {2017},
  url       = {https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf}
}

@inproceedings{ho2020denoising,
  author    = {Ho, Jonathan and Jain, Ajay and Abbeel, Pieter},
  title     = {Denoising Diffusion Probabilistic Models},
  booktitle = {Advances in Neural Information Processing Systems},
  volume    = {33},
  pages     = {6840--6851},
  publisher = {Curran Associates, Inc.},
  year      = {2020},
  url       = {https://proceedings.neurips.cc/paper_files/paper/2020/file/4c5bcfec8584af0d967f1ab10179ca4b-Paper.pdf}
}

@inproceedings{austin2021structured,
  author    = {Austin, Jacob and Johnson, Daniel D. and Ho, Jonathan and Tarlow, Daniel and van den Berg, Rianne},
  title     = {Structured Denoising Diffusion Models in Discrete State-Spaces},
  booktitle = {Advances in Neural Information Processing Systems},
  volume    = {34},
  pages     = {17981--17993},
  publisher = {Curran Associates, Inc.},
  year      = {2021},
  url       = {https://proceedings.neurips.cc/paper_files/paper/2021/file/958c530554f78bcd8e97125b70e6973d-Paper.pdf}
}

@inproceedings{shi2024simplified,
  author    = {Shi, Jiaxin and Han, Kehang and Wang, Zhe and Doucet, Arnaud and Titsias, Michalis},
  title     = {Simplified and Generalized Masked Diffusion for Discrete Data},
  booktitle = {Advances in Neural Information Processing Systems},
  volume    = {37},
  pages     = {103131--103167},
  publisher = {Curran Associates, Inc.},
  year      = {2024},
  doi       = {10.52202/079017-3277},
  url       = {https://proceedings.neurips.cc/paper_files/paper/2024/file/bad233b9849f019aead5e5cc60cef70f-Paper-Conference.pdf}
}

@inproceedings{hang2023efficient,
  author    = {Hang, Tiankai and Gu, Shuyang and Li, Chen and Bao, Jianmin and Chen, Dong and Hu, Han and Geng, Xin and Guo, Baining},
  title     = {Efficient Diffusion Training via Min-SNR Weighting Strategy},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  pages     = {7441--7451},
  year      = {2023}
}

@inproceedings{kollovieh2023tsdiff,
  author    = {Kollovieh, Marcel and Ansari, Abdul Fatir and Bohlke-Schneider, Michael and Zschiegner, Jasper and Wang, Hao and Wang, Yuyang (Bernie)},
  title     = {Predict, Refine, Synthesize: Self-Guiding Diffusion Models for Probabilistic Time Series Forecasting},
  booktitle = {Advances in Neural Information Processing Systems},
  volume    = {36},
  pages     = {28341--28364},
  publisher = {Curran Associates, Inc.},
  year      = {2023},
  url       = {https://proceedings.neurips.cc/paper_files/paper/2023/file/5a1a10c2c2c9b9af1514687bc24b8f3d-Paper-Conference.pdf}
}

@article{sikder2023transfusion,
  author  = {Md Fahim Sikder and Resmi Ramachandranpillai and Fredrik Heintz},
  title   = {TransFusion: Generating long, high fidelity time series using diffusion models with transformers},
  journal = {Machine Learning with Applications},
  volume  = {20},
  pages   = {100652},
  year    = {2025},
  doi     = {10.1016/j.mlwa.2025.100652},
  url     = {https://www.sciencedirect.com/science/article/pii/S2666827025000350}
}

@misc{song2021score,
  author        = {Yang Song and Jascha Sohl-Dickstein and Diederik P. Kingma and Abhishek Kumar and Stefano Ermon and Ben Poole},
  title         = {Score-Based Generative Modeling through Stochastic Differential Equations},
  year          = {2021},
  eprint        = {2011.13456},
  archiveprefix = {arXiv},
  primaryclass  = {cs.LG},
  url           = {https://arxiv.org/abs/2011.13456}
}

@misc{shi2025tabdiff,
  author        = {Juntong Shi and Minkai Xu and Harper Hua and Hengrui Zhang and Stefano Ermon and Jure Leskovec},
  title         = {TabDiff: a Mixed-type Diffusion Model for Tabular Data Generation},
  year          = {2025},
  eprint        = {2410.20626},
  archiveprefix = {arXiv},
  primaryclass  = {cs.LG},
  url           = {https://arxiv.org/abs/2410.20626}
}

@inproceedings{yuan2025ctu,
  author    = {Yuan, Yusong and Sha, Yun and Zhao, Haidong},
  title     = {CTU-DDPM: Generating Industrial Control System Time-Series Data with a CNN-Transformer Hybrid Diffusion Model},
  booktitle = {Proceedings of the 2025 International Symposium on Artificial Intelligence and Computational Social Sciences},
  series    = {AICSS '25},
  pages     = {547--552},
  publisher = {Association for Computing Machinery},
  address   = {New York, NY, USA},
  year      = {2025},
  doi       = {10.1145/3776759.3776845},
  url       = {https://doi.org/10.1145/3776759.3776845}
}

@misc{sha2026ddpm,
  author = {Sha, Yun and Yuan, Yusong and Wu, Yonghao and Zhao, Haidong},
  title  = {DDPM Fusing Mamba and Adaptive Attention: An Augmentation Method for Industrial Control Systems Anomaly Data},
  note   = {SSRN Electronic Journal},
  year   = {2026},
  doi    = {10.2139/ssrn.6055903},
  url    = {https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6055903}
}
@techreport{nist2023sp80082,
  author      = {{National Institute of Standards and Technology}},
  title       = {Guide to Operational Technology (OT) Security},
  institution = {NIST},
  type        = {Special Publication},
  number      = {800-82 Rev. 3},
  year        = {2023},
  doi         = {10.6028/NIST.SP.800-82r3},
  url         = {https://csrc.nist.gov/pubs/sp/800/82/r3/final}
}

@article{10.1007/s10844-022-00753-1,
  author    = {Koay, Abigail MY and Ko, Ryan K L and Hettema, Hinne and Radke, Kenneth},
  title     = {Machine learning in industrial control system (ICS) security: current landscape, opportunities and challenges},
  journal   = {Journal of Intelligent Information Systems},
  volume    = {60},
  number    = {2},
  pages     = {377--405},
  publisher = {Springer},
  year      = {2023}
}

@article{Nankya2023-gp,
  author  = {Nankya, Mary and Chataut, Robin and Akl, Robert},
  title   = {Securing Industrial Control Systems: Components, Cyber Threats, and Machine Learning-Driven Defense Strategies},
  journal = {Sensors},
  volume  = {23},
  number  = {21},
  pages   = {8840},
  year    = {2023},
  doi     = {10.3390/s23218840},
  url     = {https://www.mdpi.com/1424-8220/23/21/8840}
}

@misc{shin,
  author    = {Shin, Hyeok-Ki and Lee, Woomyo and Choi, Seungoh and Yun, Jeong-Han and Min, Byung Gil and Kim, HyoungChun},
  title     = {HAI Security Dataset},
  publisher = {Kaggle},
  year      = {2023},
  doi       = {10.34740/kaggle/dsv/5821622},
  url       = {https://www.kaggle.com/dsv/5821622}
}

@article{info16100910,
  author  = {Ali, Jokha and Ali, Saqib and Al Balushi, Taiseera and Nadir, Zia},
  title   = {Intrusion Detection in Industrial Control Systems Using Transfer Learning Guided by Reinforcement Learning},
  journal = {Information},
  volume  = {16},
  number  = {10},
  pages   = {910},
  year    = {2025},
  doi     = {10.3390/info16100910},
  url     = {https://www.mdpi.com/2078-2489/16/10/910}
}

@inproceedings{pmlr-v202-kotelnikov23a,
  author    = {Kotelnikov, Akim and Baranchuk, Dmitry and Rubachev, Ivan and Babenko, Artem},
  title     = {{T}ab{DDPM}: Modelling Tabular Data with Diffusion Models},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  series    = {Proceedings of Machine Learning Research},
  volume    = {202},
  pages     = {17564--17579},
  publisher = {PMLR},
  year      = {2023},
  url       = {https://proceedings.mlr.press/v202/kotelnikov23a.html}
}

@inproceedings{rasul2021autoregressivedenoisingdiffusionmodels,
  author    = {Rasul, Kashif and Seward, Calvin and Schuster, Ingmar and Vollgraf, Roland},
  title     = {Autoregressive Denoising Diffusion Models for Multivariate Probabilistic Time Series Forecasting},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  series    = {Proceedings of Machine Learning Research},
  volume    = {139},
  pages     = {8857--8868},
  publisher = {PMLR},
  year      = {2021},
  url       = {https://proceedings.mlr.press/v139/rasul21a.html}
}

@article{jiang2023netdiffusionnetworkdataaugmentation,
  author    = {Jiang, Xi and Liu, Shinan and Gember-Jacobson, Aaron and Bhagoji, Arjun Nitin and Schmitt, Paul and Bronzino, Francesco and Feamster, Nick},
  title     = {NetDiffusion: Network Data Augmentation Through Protocol-Constrained Traffic Generation},
  journal   = {Proc. ACM Meas. Anal. Comput. Syst.},
  volume    = {8},
  number    = {1},
  articleno = {11},
  publisher = {Association for Computing Machinery},
  year      = {2024},
  doi       = {10.1145/3639037},
  url       = {https://doi.org/10.1145/3639037}
}

@article{Ring_2019,
  author  = {Markus Ring and Daniel Schlör and Dieter Landes and Andreas Hotho},
  title   = {Flow-based network traffic generation using Generative Adversarial Networks},
  journal = {Computers \& Security},
  volume  = {82},
  pages   = {156--172},
  year    = {2019},
  doi     = {10.1016/j.cose.2018.12.012},
  url     = {https://www.sciencedirect.com/science/article/pii/S0167404818308393}
}

@inproceedings{10.1145/3544216.3544251,
  author    = {Yin, Yucheng and Lin, Zinan and Jin, Minhao and Fanti, Giulia and Sekar, Vyas},
  title     = {Practical GAN-based synthetic IP header trace generation using NetShare},
  booktitle = {Proceedings of the ACM SIGCOMM 2022 Conference},
  series    = {SIGCOMM '22},
  pages     = {458--472},
  publisher = {Association for Computing Machinery},
  year      = {2022},
  doi       = {10.1145/3544216.3544251},
  url       = {https://doi.org/10.1145/3544216.3544251}
}

@inproceedings{Lin_2020,
  author    = {Lin, Zinan and Jain, Alankar and Wang, Chen and Fanti, Giulia and Sekar, Vyas},
  title     = {Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions},
  booktitle = {Proceedings of the ACM Internet Measurement Conference},
  series    = {IMC '20},
  pages     = {464--483},
  publisher = {Association for Computing Machinery},
  year      = {2020},
  doi       = {10.1145/3419394.3423643},
  url       = {https://doi.org/10.1145/3419394.3423643}
}

@inproceedings{7469060,
  author    = {Mathur, Aditya P. and Tippenhauer, Nils Ole},
  title     = {SWaT: a water treatment testbed for research and training on ICS security},
  booktitle = {2016 International Workshop on Cyber-physical Systems for Smart Water Networks (CySWater)},
  pages     = {31--36},
  year      = {2016},
  doi       = {10.1109/CySWater.2016.7469060}
}

@inproceedings{10.1145/3055366.3055375,
  author    = {Ahmed, Chuadhry Mujeeb and Palleti, Venkata Reddy and Mathur, Aditya P.},
  title     = {WADI: a water distribution testbed for research in the design of secure cyber physical systems},
  booktitle = {Proceedings of the 3rd International Workshop on Cyber-Physical Systems for Smart Water Networks},
  series    = {CySWATER '17},
  pages     = {25--28},
  publisher = {Association for Computing Machinery},
  year      = {2017},
  doi       = {10.1145/3055366.3055375},
  url       = {https://doi.org/10.1145/3055366.3055375}
}
@inproceedings{tashiro2021csdiconditionalscorebaseddiffusion,
  author    = {Tashiro, Yusuke and Song, Jiaming and Song, Yang and Ermon, Stefano},
  title     = {CSDI: Conditional Score-based Diffusion Models for Probabilistic Time Series Imputation},
  booktitle = {Advances in Neural Information Processing Systems},
  volume    = {34},
  pages     = {24804--24816},
  publisher = {Curran Associates, Inc.},
  year      = {2021},
  url       = {https://proceedings.neurips.cc/paper_files/paper/2021/file/cfe8504bda37b575c70ee1a8276f3486-Paper.pdf}
}

@inproceedings{wen2024diffstgprobabilisticspatiotemporalgraph,
  author    = {Wen, Haomin and Lin, Youfang and Xia, Yutong and Wan, Huaiyu and Wen, Qingsong and Zimmermann, Roger and Liang, Yuxuan},
  title     = {DiffSTG: Probabilistic Spatio-Temporal Graph Forecasting with Denoising Diffusion Models},
  booktitle = {Proceedings of the 31st ACM International Conference on Advances in Geographic Information Systems},
  series    = {SIGSPATIAL '23},
  articleno = {60},
  publisher = {Association for Computing Machinery},
  year      = {2023},
  doi       = {10.1145/3589132.3625614},
  url       = {https://doi.org/10.1145/3589132.3625614}
}

@inproceedings{liu2023pristiconditionaldiffusionframework,
  author    = {Liu, Mingzhe and Huang, Han and Feng, Hao and Sun, Leilei and Du, Bowen and Fu, Yanjie},
  title     = {PriSTI: A Conditional Diffusion Framework for Spatiotemporal Imputation},
  booktitle = {2023 IEEE 39th International Conference on Data Engineering (ICDE)},
  pages     = {1927--1939},
  year      = {2023},
  doi       = {10.1109/ICDE55515.2023.00150}
}

@misc{kong2021diffwaveversatilediffusionmodel,
  author        = {Zhifeng Kong and Wei Ping and Jiaji Huang and Kexin Zhao and Bryan Catanzaro},
  title         = {DiffWave: A Versatile Diffusion Model for Audio Synthesis},
  year          = {2021},
  eprint        = {2009.09761},
  archiveprefix = {arXiv},
  primaryclass  = {eess.AS},
  url           = {https://arxiv.org/abs/2009.09761}
}

@article{11087622,
  author  = {Liu, Xiaosi and Xu, Xiaowen and Liu, Zhidan and Li, Zhenjiang and Wu, Kaishun},
  title   = {Spatio-Temporal Diffusion Model for Cellular Traffic Generation},
  journal = {IEEE Transactions on Mobile Computing},
  volume  = {25},
  number  = {1},
  pages   = {257--271},
  year    = {2026},
  doi     = {10.1109/TMC.2025.3591183}
}

@inproceedings{hoogeboom2021argmaxflowsmultinomialdiffusion,
  author    = {Hoogeboom, Emiel and Nielsen, Didrik and Jaini, Priyank and Forr\'{e}, Patrick and Welling, Max},
  title     = {Argmax Flows and Multinomial Diffusion: Learning Categorical Distributions},
  booktitle = {Advances in Neural Information Processing Systems},
  volume    = {34},
  pages     = {12454--12465},
  publisher = {Curran Associates, Inc.},
  year      = {2021},
  url       = {https://proceedings.neurips.cc/paper_files/paper/2021/file/67d96d458abdef21792e6d8e590244e7-Paper.pdf}
}

@inproceedings{li2022diffusionlmimprovescontrollabletext,
  author    = {Li, Xiang and Thickstun, John and Gulrajani, Ishaan and Liang, Percy S and Hashimoto, Tatsunori B},
  title     = {Diffusion-LM Improves Controllable Text Generation},
  booktitle = {Advances in Neural Information Processing Systems},
  volume    = {35},
  pages     = {4328--4343},
  publisher = {Curran Associates, Inc.},
  year      = {2022},
  url       = {https://proceedings.neurips.cc/paper_files/paper/2022/file/1be5bc25d50895ee656b8c2d9eb89d6a-Paper-Conference.pdf}
}

@article{meng2025aflnetyearslatercoverageguided,
  author  = {Meng, Ruijie and Pham, Van-Thuan and Böhme, Marcel and Roychoudhury, Abhik},
  title   = {AFLNet Five Years Later: On Coverage-Guided Protocol Fuzzing},
  journal = {IEEE Transactions on Software Engineering},
  volume  = {51},
  number  = {4},
  pages   = {960--974},
  year    = {2025},
  doi     = {10.1109/TSE.2025.3535925}
}

@inproceedings{godefroid2017learnfuzzmachinelearninginput,
  author    = {Godefroid, Patrice and Peleg, Hila and Singh, Rishabh},
  title     = {Learn\&Fuzz: Machine learning for input fuzzing},
  booktitle = {2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE)},
  pages     = {50--59},
  year      = {2017},
  doi       = {10.1109/ASE.2017.8115618}
}

@inproceedings{she2019neuzzefficientfuzzingneural,
  author    = {She, Dongdong and Pei, Kexin and Epstein, Dave and Yang, Junfeng and Ray, Baishakhi and Jana, Suman},
  title     = {NEUZZ: Efficient Fuzzing with Neural Program Smoothing},
  booktitle = {2019 IEEE Symposium on Security and Privacy (SP)},
  pages     = {803--817},
  year      = {2019},
  doi       = {10.1109/SP.2019.00052}
}

@inproceedings{dai2019transformerxlattentivelanguagemodels,
  author    = {Dai, Zihang and Yang, Zhilin and Yang, Yiming and Carbonell, Jaime and Le, Quoc and Salakhutdinov, Ruslan},
  title     = {Transformer-{XL}: Attentive Language Models beyond a Fixed-Length Context},
  booktitle = {Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics},
  pages     = {2978--2988},
  publisher = {Association for Computational Linguistics},
  address   = {Florence, Italy},
  year      = {2019},
  doi       = {10.18653/v1/P19-1285},
  url       = {https://aclanthology.org/P19-1285/}
}

@article{zhou2021informerefficienttransformerlong,
  author  = {Zhou, Haoyi and Zhang, Shanghang and Peng, Jieqi and Zhang, Shuai and Li, Jianxin and Xiong, Hui and Zhang, Wancai},
  title   = {Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting},
  journal = {Proceedings of the AAAI Conference on Artificial Intelligence},
  volume  = {35},
  number  = {12},
  pages   = {11106--11115},
  year    = {2021},
  doi     = {10.1609/aaai.v35i12.17325},
  url     = {https://ojs.aaai.org/index.php/AAAI/article/view/17325}
}

@inproceedings{wu2022autoformerdecompositiontransformersautocorrelation,
  author    = {Wu, Haixu and Xu, Jiehui and Wang, Jianmin and Long, Mingsheng},
  title     = {Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting},
  booktitle = {Advances in Neural Information Processing Systems},
  volume    = {34},
  pages     = {22419--22430},
  publisher = {Curran Associates, Inc.},
  year      = {2021},
  url       = {https://proceedings.neurips.cc/paper_files/paper/2021/file/bcc0d400288793e8bdcd7c19a8ac0c2b-Paper.pdf}
}

@inproceedings{zhou2022fedformerfrequencyenhanceddecomposed,
  author    = {Zhou, Tian and Ma, Ziqing and Wen, Qingsong and Wang, Xue and Sun, Liang and Jin, Rong},
  title     = {{FED}former: Frequency Enhanced Decomposed Transformer for Long-term Series Forecasting},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  series    = {Proceedings of Machine Learning Research},
  volume    = {162},
  pages     = {27268--27286},
  publisher = {PMLR},
  year      = {2022},
  url       = {https://proceedings.mlr.press/v162/zhou22g.html}
}

@article{2023,
  author    = {Damjanović, Ivan and Milošević, Marko and Stevanović, Dragan},
  title     = {A Note on Extremal Sombor Indices of Trees with a Given Degree Sequence},
  journal   = {Match Communications in Mathematical and in Computer Chemistry},
  volume    = {90},
  number    = {1},
  pages     = {197--202},
  publisher = {University Library in Kragujevac},
  year      = {2023},
  doi       = {10.46793/match.90-1.197d},
  url       = {http://dx.doi.org/10.46793/match.90-1.197D}
}

@article{stenger2024survey,
  author    = {Stenger, Michael and Leppich, Robert and Foster, Ian T and Kounev, Samuel and Bauer, Andre},
  title     = {Evaluation is key: a survey on evaluation measures for synthetic time series},
  journal   = {Journal of Big Data},
  volume    = {11},
  number    = {1},
  pages     = {66},
  publisher = {Springer},
  year      = {2024}
}

@article{lin1991divergence,
  author  = {Lin, J.},
  title   = {Divergence measures based on the Shannon entropy},
  journal = {IEEE Transactions on Information Theory},
  volume  = {37},
  number  = {1},
  pages   = {145--151},
  year    = {1991},
  doi     = {10.1109/18.61115}
}

@inproceedings{yoon2019timegan,
  author    = {Yoon, Jinsung and Jarrett, Daniel and van der Schaar, Mihaela},
  title     = {Time-series Generative Adversarial Networks},
  booktitle = {Advances in Neural Information Processing Systems},
  volume    = {32},
  publisher = {Curran Associates, Inc.},
  year      = {2019},
  url       = {https://proceedings.neurips.cc/paper_files/paper/2019/file/c9efe5f26cd17ba6216bbe2a7d26d490-Paper.pdf}
}

@inproceedings{10.1145/3490354.3494393,
  author    = {Ni, Hao and Szpruch, Lukasz and Sabate-Vidales, Marc and Xiao, Baoren and Wiese, Magnus and Liao, Shujian},
  title     = {Sig-Wasserstein GANs for time series generation},
  booktitle = {Proceedings of the Second ACM International Conference on AI in Finance},
  series    = {ICAIF '21},
  articleno = {28},
  publisher = {Association for Computing Machinery},
  year      = {2022},
  doi       = {10.1145/3490354.3494393},
  url       = {https://doi.org/10.1145/3490354.3494393}
}

@inproceedings{coletta2023constrained,
  author    = {Coletta, Andrea and Gopalakrishnan, Sriram and Borrajo, Daniel and Vyetrenko, Svitlana},
  title     = {On the Constrained Time-Series Generation Problem},
  booktitle = {Advances in Neural Information Processing Systems},
  volume    = {36},
  pages     = {61048--61059},
  publisher = {Curran Associates, Inc.},
  year      = {2023},
  url       = {https://proceedings.neurips.cc/paper_files/paper/2023/file/bfb6a69c0d9e2bc596e1cd31f16fcdde-Paper-Conference.pdf}
}

@article{yang2001interlock,
  author  = {S.H. Yang and L.S. Tan and C.H. He},
  title   = {Automatic verification of safety interlock systems for industrial processes},
  journal = {Journal of Loss Prevention in the Process Industries},
  volume  = {14},
  number  = {5},
  pages   = {379--386},
  year    = {2001},
  doi     = {10.1016/S0950-4230(01)00014-6},
  url     = {https://www.sciencedirect.com/science/article/pii/S0950423001000146}
}

@article{10.1145/1151659.1159928,
  author    = {Vishwanath, Kashi Venkatesh and Vahdat, Amin},
  title     = {Realistic and responsive network traffic generation},
  journal   = {SIGCOMM Comput. Commun. Rev.},
  volume    = {36},
  number    = {4},
  pages     = {111--122},
  publisher = {Association for Computing Machinery},
  year      = {2006},
  doi       = {10.1145/1151659.1159928},
  url       = {https://doi.org/10.1145/1151659.1159928}
}

@inproceedings{nie2023patchtst,
  title={A Time Series is Worth 64 Words: Long-term Forecasting with Transformers},
author={Nie, Yuqi and Nguyen, Nam H. and Sinthong, Phanwadee and Kalagnanam, Jayant},
|
||||
booktitle={International Conference on Learning Representations (ICLR)},
|
||||
year={2023},
|
||||
url={https://arxiv.org/abs/2211.14730}
|
||||
}
|
||||
173
LaTeX2e+Proceedings+Templates+download/samplepaper.tex
Normal file
@@ -0,0 +1,173 @@
% This is samplepaper.tex, a sample chapter demonstrating the
% LLNCS macro package for Springer Computer Science proceedings;
% Version 2.21 of 2022/01/12
%
\documentclass[runningheads]{llncs}
%
\usepackage[T1]{fontenc}
% T1 fonts will be used to generate the final print and online PDFs,
% so please use T1 fonts in your manuscript whenever possible.
% Other font encodings may result in incorrect characters.
%
\usepackage{graphicx}
% Used for displaying a sample figure. If possible, figure files should
% be included in EPS format.
%
% If you use the hyperref package, please uncomment the following two lines
% to display URLs in blue roman font according to Springer's eBook style:
%\usepackage{color}
%\renewcommand\UrlFont{\color{blue}\rmfamily}
%\urlstyle{rm}
%
\begin{document}
%
\title{Contribution Title}
%
%\titlerunning{Abbreviated paper title}
% If the paper title is too long for the running head, you can set
% an abbreviated paper title here
%
\author{First Author\inst{1}\orcidID{0000-1111-2222-3333} \and
Second Author\inst{2,3}\orcidID{1111-2222-3333-4444} \and
Third Author\inst{3}\orcidID{2222-3333-4444-5555}}
%
\authorrunning{F. Author et al.}
% First names are abbreviated in the running head.
% If there are more than two authors, 'et al.' is used.
%
\institute{Princeton University, Princeton NJ 08544, USA \and
Springer Heidelberg, Tiergartenstr. 17, 69121 Heidelberg, Germany
\email{lncs@springer.com}\\
\url{http://www.springer.com/gp/computer-science/lncs} \and
ABC Institute, Rupert-Karls-University Heidelberg, Heidelberg, Germany\\
\email{\{abc,lncs\}@uni-heidelberg.de}}
%
\maketitle              % typeset the header of the contribution
%
\begin{abstract}
The abstract should briefly summarize the contents of the paper in
150--250 words.

\keywords{First keyword \and Second keyword \and Another keyword.}
\end{abstract}
%
%
%
\section{First Section}
\subsection{A Subsection Sample}
Please note that the first paragraph of a section or subsection is
not indented. The first paragraph that follows a table, figure,
equation etc. does not need an indent, either.

Subsequent paragraphs, however, are indented.

\subsubsection{Sample Heading (Third Level)} Only two levels of
headings should be numbered. Lower level headings remain unnumbered;
they are formatted as run-in headings.

\paragraph{Sample Heading (Fourth Level)}
The contribution should contain no more than four levels of
headings. Table~\ref{tab1} gives a summary of all heading levels.

\begin{table}
\caption{Table captions should be placed above the
tables.}\label{tab1}
\begin{tabular}{|l|l|l|}
\hline
Heading level &  Example & Font size and style\\
\hline
Title (centered) &  {\Large\bfseries Lecture Notes} & 14 point, bold\\
1st-level heading &  {\large\bfseries 1 Introduction} & 12 point, bold\\
2nd-level heading & {\bfseries 2.1 Printing Area} & 10 point, bold\\
3rd-level heading & {\bfseries Run-in Heading in Bold.} Text follows & 10 point, bold\\
4th-level heading & {\itshape Lowest Level Heading.} Text follows & 10 point, italic\\
\hline
\end{tabular}
\end{table}


\noindent Displayed equations are centered and set on a separate
line.
\begin{equation}
x + y = z
\end{equation}
Please try to avoid rasterized images for line-art diagrams and
schemas. Whenever possible, use vector graphics instead (see
Fig.~\ref{fig1}).

\begin{figure}
\includegraphics[width=\textwidth]{fig1.eps}
\caption{A figure caption is always placed below the illustration.
Please note that short captions are centered, while long ones are
justified by the macro package automatically.} \label{fig1}
\end{figure}

\begin{theorem}
This is a sample theorem. The run-in heading is set in bold, while
the following text appears in italics. Definitions, lemmas,
propositions, and corollaries are styled the same way.
\end{theorem}
%
% the environments 'definition', 'lemma', 'proposition', 'corollary',
% 'remark', and 'example' are defined in the LLNCS documentclass as well.
%
\begin{proof}
Proofs, examples, and remarks have the initial word in italics,
while the following text appears in normal font.
\end{proof}
For citations of references, we prefer the use of square brackets
and consecutive numbers. Citations using labels or the author/year
convention are also acceptable. The following bibliography provides
a sample reference list with entries for journal
articles~\cite{ref_article1}, an LNCS chapter~\cite{ref_lncs1}, a
book~\cite{ref_book1}, proceedings without editors~\cite{ref_proc1},
and a homepage~\cite{ref_url1}. Multiple citations are grouped
\cite{ref_article1,ref_lncs1,ref_book1},
\cite{ref_article1,ref_book1,ref_proc1,ref_url1}.

\begin{credits}
\subsubsection{\ackname} A bold run-in heading in small font size at the end of the paper is
used for general acknowledgments, for example: This study was funded
by X (grant number Y).

\subsubsection{\discintname}
It is now necessary to declare any competing interests or to specifically
state that the authors have no competing interests. Please place the
statement with a bold run-in heading in small font size beneath the
(optional) acknowledgments\footnote{If EquinOCS, our proceedings submission
system, is used, then the disclaimer can be provided directly in the system.},
for example: The authors have no competing interests to declare that are
relevant to the content of this article. Or: Author A has received research
grants from Company W. Author B has received a speaker honorarium from
Company X and owns stock in Company Y. Author C is a member of committee Z.
\end{credits}
%
% ---- Bibliography ----
%
% BibTeX users should specify bibliography style 'splncs04'.
% References will then be sorted and formatted in the correct style.
%
% \bibliographystyle{splncs04}
% \bibliography{mybibliography}
%
\begin{thebibliography}{8}
\bibitem{ref_article1}
Author, F.: Article title. Journal \textbf{2}(5), 99--110 (2016)

\bibitem{ref_lncs1}
Author, F., Author, S.: Title of a proceedings paper. In: Editor,
F., Editor, S. (eds.) CONFERENCE 2016, LNCS, vol. 9999, pp. 1--13.
Springer, Heidelberg (2016). \doi{10.1007/1234567890}

\bibitem{ref_book1}
Author, F., Author, S., Author, T.: Book title. 2nd edn. Publisher,
Location (1999)

\bibitem{ref_proc1}
Author, A.-B.: Contribution title. In: 9th International Proceedings
on Proceedings, pp. 1--2. Publisher, Location (2010)

\bibitem{ref_url1}
LNCS Homepage, \url{http://www.springer.com/lncs}, last accessed 2023/10/25
\end{thebibliography}
\end{document}
1548
LaTeX2e+Proceedings+Templates+download/splncs04.bst
Normal file
BIN
LaTeX2e+Proceedings+Templates+download/typeclass-cropped.pdf
Normal file
13
README.md
@@ -1 +1,12 @@
## Hi there!
## File organisation
Currently, the docs have 4 parts:

- **API**: contains detailed docs about API designs. (Note that `system_design.md` is a temp draft for discussing high-level design, not for API definitions.)

- **knowledges**: contains theoretical knowledge/designs for the project (mainly diffusion stuff).

- **notes4coding**: contains detailed descriptions for coding; may be useful to coding agents (like codex).

- **papers**: contains useful papers we've found, as PDFs.

Here is the Feishu Docs link: https://my.feishu.cn/wiki/Za4dwCsG6iPD9qklRLWcoJOZnnb?from=from_copylink
6347
arxiv-style/IEEEtran.cls
Normal file
27
arxiv-style/README.md
Normal file
@@ -0,0 +1,27 @@
## Files layout
- `arxiv.sty` and `template.tex`: The arXiv template we are using.

- `equations.tex`: **Duplicated**; contains the equations from the methodology section.

- `main.tex` and `references.bib`: These produce our paper, currently using the arXiv template. Note that the references are template independent.

## How to compile
It's recommended to use `MiKTeX` as the compiler on Windows.

To compile the LaTeX source into a PDF, follow these steps:

```bash
pdflatex ./main.tex

# Build the reference DB; run once unless references.bib was updated
bibtex main

# Always run the compile command twice
pdflatex ./main.tex
pdflatex ./main.tex
```

## Troubleshooting
If you encounter warnings during compilation, simply press `Enter` to continue.

If references in the PDF show up as `[??]`, compile twice more.
262
arxiv-style/arxiv.sty
Normal file
@@ -0,0 +1,262 @@
\NeedsTeXFormat{LaTeX2e}

\ProcessOptions\relax

% fonts
\renewcommand{\rmdefault}{ptm}
\renewcommand{\sfdefault}{phv}

% set page geometry
\usepackage[verbose=true,letterpaper]{geometry}
\AtBeginDocument{
  \newgeometry{
    textheight=9in,
    textwidth=6.5in,
    top=1in,
    headheight=14pt,
    headsep=25pt,
    footskip=30pt
  }
}

\widowpenalty=10000
\clubpenalty=10000
\flushbottom
\sloppy


\newcommand{\headeright}{A Preprint}
\newcommand{\undertitle}{A Preprint}
\newcommand{\shorttitle}{\@title}

\usepackage{fancyhdr}
\fancyhf{}
\pagestyle{fancy}
\renewcommand{\headrulewidth}{0.4pt}
\fancyheadoffset{0pt}
\rhead{\scshape \footnotesize \headeright}
\chead{\shorttitle}
\cfoot{\thepage}


%Handling Keywords
\def\keywordname{{\bfseries \emph{Keywords}}}%
\def\keywords#1{\par\addvspace\medskipamount{\rightskip=0pt plus1cm
\def\and{\ifhmode\unskip\nobreak\fi\ $\cdot$
}\noindent\keywordname\enspace\ignorespaces#1\par}}

% font sizes with reduced leading
\renewcommand{\normalsize}{%
  \@setfontsize\normalsize\@xpt\@xipt
  \abovedisplayskip 7\p@ \@plus 2\p@ \@minus 5\p@
  \abovedisplayshortskip \z@ \@plus 3\p@
  \belowdisplayskip \abovedisplayskip
  \belowdisplayshortskip 4\p@ \@plus 3\p@ \@minus 3\p@
}
\normalsize
\renewcommand{\small}{%
  \@setfontsize\small\@ixpt\@xpt
  \abovedisplayskip 6\p@ \@plus 1.5\p@ \@minus 4\p@
  \abovedisplayshortskip \z@ \@plus 2\p@
  \belowdisplayskip \abovedisplayskip
  \belowdisplayshortskip 3\p@ \@plus 2\p@ \@minus 2\p@
}
\renewcommand{\footnotesize}{\@setfontsize\footnotesize\@ixpt\@xpt}
\renewcommand{\scriptsize}{\@setfontsize\scriptsize\@viipt\@viiipt}
\renewcommand{\tiny}{\@setfontsize\tiny\@vipt\@viipt}
\renewcommand{\large}{\@setfontsize\large\@xiipt{14}}
\renewcommand{\Large}{\@setfontsize\Large\@xivpt{16}}
\renewcommand{\LARGE}{\@setfontsize\LARGE\@xviipt{20}}
\renewcommand{\huge}{\@setfontsize\huge\@xxpt{23}}
\renewcommand{\Huge}{\@setfontsize\Huge\@xxvpt{28}}

% sections with less space
\providecommand{\section}{}
\renewcommand{\section}{%
  \@startsection{section}{1}{\z@}%
  {-2.0ex \@plus -0.5ex \@minus -0.2ex}%
  { 1.5ex \@plus  0.3ex \@minus  0.2ex}%
  {\large\bf\raggedright}%
}
\providecommand{\subsection}{}
\renewcommand{\subsection}{%
  \@startsection{subsection}{2}{\z@}%
  {-1.8ex \@plus -0.5ex \@minus -0.2ex}%
  { 0.8ex \@plus  0.2ex}%
  {\normalsize\bf\raggedright}%
}
\providecommand{\subsubsection}{}
\renewcommand{\subsubsection}{%
  \@startsection{subsubsection}{3}{\z@}%
  {-1.5ex \@plus -0.5ex \@minus -0.2ex}%
  { 0.5ex \@plus  0.2ex}%
  {\normalsize\bf\raggedright}%
}
\providecommand{\paragraph}{}
\renewcommand{\paragraph}{%
  \@startsection{paragraph}{4}{\z@}%
  {1.5ex \@plus 0.5ex \@minus 0.2ex}%
  {-1em}%
  {\normalsize\bf}%
}
\providecommand{\subparagraph}{}
\renewcommand{\subparagraph}{%
  \@startsection{subparagraph}{5}{\z@}%
  {1.5ex \@plus 0.5ex \@minus 0.2ex}%
  {-1em}%
  {\normalsize\bf}%
}
\providecommand{\subsubsubsection}{}
\renewcommand{\subsubsubsection}{%
  \vskip5pt{\noindent\normalsize\rm\raggedright}%
}

% float placement
\renewcommand{\topfraction      }{0.85}
\renewcommand{\bottomfraction   }{0.4}
\renewcommand{\textfraction     }{0.1}
\renewcommand{\floatpagefraction}{0.7}

\newlength{\@abovecaptionskip}\setlength{\@abovecaptionskip}{7\p@}
\newlength{\@belowcaptionskip}\setlength{\@belowcaptionskip}{\z@}

\setlength{\abovecaptionskip}{\@abovecaptionskip}
\setlength{\belowcaptionskip}{\@belowcaptionskip}

% swap above/belowcaptionskip lengths for tables
\renewenvironment{table}
  {\setlength{\abovecaptionskip}{\@belowcaptionskip}%
   \setlength{\belowcaptionskip}{\@abovecaptionskip}%
   \@float{table}}
  {\end@float}

% footnote formatting
\setlength{\footnotesep }{6.65\p@}
\setlength{\skip\footins}{9\p@ \@plus 4\p@ \@minus 2\p@}
\renewcommand{\footnoterule}{\kern-3\p@ \hrule width 12pc \kern 2.6\p@}
\setcounter{footnote}{0}

% paragraph formatting
\setlength{\parindent}{\z@}
\setlength{\parskip  }{5.5\p@}

% list formatting
\setlength{\topsep    }{4\p@ \@plus 1\p@   \@minus 2\p@}
\setlength{\partopsep }{1\p@ \@plus 0.5\p@ \@minus 0.5\p@}
\setlength{\itemsep   }{2\p@ \@plus 1\p@   \@minus 0.5\p@}
\setlength{\parsep    }{2\p@ \@plus 1\p@   \@minus 0.5\p@}
\setlength{\leftmargin }{3pc}
\setlength{\leftmargini }{\leftmargin}
\setlength{\leftmarginii }{2em}
\setlength{\leftmarginiii}{1.5em}
\setlength{\leftmarginiv }{1.0em}
\setlength{\leftmarginv }{0.5em}
\def\@listi  {\leftmargin\leftmargini}
\def\@listii {\leftmargin\leftmarginii
  \labelwidth\leftmarginii
  \advance\labelwidth-\labelsep
  \topsep 2\p@ \@plus 1\p@    \@minus 0.5\p@
  \parsep 1\p@ \@plus 0.5\p@ \@minus 0.5\p@
  \itemsep \parsep}
\def\@listiii{\leftmargin\leftmarginiii
  \labelwidth\leftmarginiii
  \advance\labelwidth-\labelsep
  \topsep 1\p@ \@plus 0.5\p@ \@minus 0.5\p@
  \parsep \z@
  \partopsep 0.5\p@ \@plus 0\p@ \@minus 0.5\p@
  \itemsep \topsep}
\def\@listiv {\leftmargin\leftmarginiv
  \labelwidth\leftmarginiv
  \advance\labelwidth-\labelsep}
\def\@listv  {\leftmargin\leftmarginv
  \labelwidth\leftmarginv
  \advance\labelwidth-\labelsep}
\def\@listvi {\leftmargin\leftmarginvi
  \labelwidth\leftmarginvi
  \advance\labelwidth-\labelsep}

% create title
\providecommand{\maketitle}{}
\renewcommand{\maketitle}{%
  \par
  \begingroup
  \renewcommand{\thefootnote}{\fnsymbol{footnote}}
  % for perfect author name centering
  %\renewcommand{\@makefnmark}{\hbox to \z@{$^{\@thefnmark}$\hss}}
  % The footnote-mark was overlapping the footnote-text,
  % added the following to fix this problem               (MK)
  \long\def\@makefntext##1{%
    \parindent 1em\noindent
    \hbox to 1.8em{\hss $\m@th ^{\@thefnmark}$}##1
  }
  \thispagestyle{empty}
  \@maketitle
  \@thanks
  %\@notice
  \endgroup
  \let\maketitle\relax
  \let\thanks\relax
}

% rules for title box at top of first page
\newcommand{\@toptitlebar}{
  \hrule height 2\p@
  \vskip 0.25in
  \vskip -\parskip%
}
\newcommand{\@bottomtitlebar}{
  \vskip 0.29in
  \vskip -\parskip
  \hrule height 2\p@
  \vskip 0.09in%
}

% create title (includes both anonymized and non-anonymized versions)
\providecommand{\@maketitle}{}
\renewcommand{\@maketitle}{%
  \vbox{%
    \hsize\textwidth
    \linewidth\hsize
    \vskip 0.1in
    \@toptitlebar
    \centering
    {\LARGE\sc \@title\par}
    \@bottomtitlebar
    \textsc{\undertitle}\\
    \vskip 0.1in
    \def\And{%
      \end{tabular}\hfil\linebreak[0]\hfil%
      \begin{tabular}[t]{c}\bf\rule{\z@}{24\p@}\ignorespaces%
    }
    \def\AND{%
      \end{tabular}\hfil\linebreak[4]\hfil%
      \begin{tabular}[t]{c}\bf\rule{\z@}{24\p@}\ignorespaces%
    }
    \begin{tabular}[t]{c}\bf\rule{\z@}{24\p@}\@author\end{tabular}%
    \vskip 0.4in \@minus 0.1in \center{\@date} \vskip 0.2in
  }
}

% add conference notice to bottom of first page
\newcommand{\ftype@noticebox}{8}
\newcommand{\@notice}{%
  % give a bit of extra room back to authors on first page
  \enlargethispage{2\baselineskip}%
  \@float{noticebox}[b]%
  \footnotesize\@noticestring%
  \end@float%
}

% abstract styling
\renewenvironment{abstract}
{
  \centerline
  {\large \bfseries \scshape Abstract}
  \begin{quote}
}
{
  \end{quote}
}

\endinput
95
arxiv-style/equations.tex
Normal file
@@ -0,0 +1,95 @@
\documentclass[10pt, twocolumn]{article}
\usepackage{amsmath, amssymb}
\usepackage{bm}
\usepackage[margin=1in]{geometry}
\usepackage{microtype}

\title{Equations: Mask-DDPM Methodology}
\author{}
\date{}

\begin{document}
\maketitle

\section{Problem Formulation}
Each training instance is a fixed-length window of length $L$, comprising continuous channels $\bm{X} \in \mathbb{R}^{L \times d_c}$ and discrete channels $\bm{Y} = \{y^{(j)}_{1:L}\}_{j=1}^{d_d}$, where each discrete variable satisfies $y^{(j)}_t \in \mathcal{V}_j$ for a finite vocabulary $\mathcal{V}_j$.

\section{Transformer Trend Module for Continuous Dynamics}
We posit an additive decomposition of the continuous signal:
\begin{equation}
\bm{X} = \bm{S} + \bm{R},
\label{eq:additive_decomp}
\end{equation}
where $\bm{S} \in \mathbb{R}^{L \times d_c}$ captures the smooth temporal trend and $\bm{R} \in \mathbb{R}^{L \times d_c}$ represents distributional residuals.

The causal Transformer trend extractor $f_{\phi}$ predicts the next-step trend via:
\begin{equation}
\hat{\bm{S}}_{t+1} = f_{\phi}(\bm{X}_{1:t}), \quad t = 1, \dots, L-1.
\label{eq:trend_prediction}
\end{equation}
Training minimizes the mean-squared error:
\begin{equation}
\mathcal{L}_{\text{trend}}(\phi) = \frac{1}{(L-1)d_c} \sum_{t=1}^{L-1} \bigl\| \hat{\bm{S}}_{t+1} - \bm{X}_{t+1} \bigr\|_2^2.
\label{eq:trend_loss}
\end{equation}
At inference, the residual target is defined as $\bm{R} = \bm{X} - \hat{\bm{S}}$.

\section{DDPM for Continuous Residual Generation}
Let $K$ denote diffusion steps with noise schedule $\{\beta_k\}_{k=1}^K$, $\alpha_k = 1 - \beta_k$, and $\bar{\alpha}_k = \prod_{i=1}^k \alpha_i$. The forward corruption process is:
\begin{align}
q(\bm{r}_k \mid \bm{r}_0) &= \mathcal{N}\bigl( \sqrt{\bar{\alpha}_k}\,\bm{r}_0,\; (1 - \bar{\alpha}_k)\mathbf{I} \bigr), \\
\bm{r}_k &= \sqrt{\bar{\alpha}_k}\,\bm{r}_0 + \sqrt{1 - \bar{\alpha}_k}\,\boldsymbol{\epsilon}, \quad \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}),
\label{eq:forward_process}
\end{align}
where $\bm{r}_0 \equiv \bm{R}$.

The reverse process is parameterized as:
\begin{equation}
p_{\theta}(\bm{r}_{k-1} \mid \bm{r}_k, \hat{\bm{S}}) = \mathcal{N}\bigl( \boldsymbol{\mu}_{\theta}(\bm{r}_k, k, \hat{\bm{S}}),\; \boldsymbol{\Sigma}(k) \bigr).
\label{eq:reverse_process}
\end{equation}
Training employs the $\epsilon$-prediction objective:
\begin{equation}
\mathcal{L}_{\text{cont}}(\theta) = \mathbb{E}_{k,\bm{r}_0,\boldsymbol{\epsilon}} \left[ \bigl\| \boldsymbol{\epsilon} - \boldsymbol{\epsilon}_{\theta}(\bm{r}_k, k, \hat{\bm{S}}) \bigr\|_2^2 \right].
\label{eq:ddpm_loss}
\end{equation}
Optionally, SNR-based reweighting yields:
\begin{equation}
\mathcal{L}^{\text{snr}}_{\text{cont}}(\theta) = \mathbb{E}_{k,\bm{r}_0,\boldsymbol{\epsilon}} \left[ w_k \bigl\| \boldsymbol{\epsilon} - \boldsymbol{\epsilon}_{\theta}(\bm{r}_k, k, \hat{\bm{S}}) \bigr\|_2^2 \right],
\label{eq:snr_loss}
\end{equation}
where $w_k = \min(\mathrm{SNR}_k, \gamma) / \mathrm{SNR}_k$ and $\mathrm{SNR}_k = \bar{\alpha}_k / (1 - \bar{\alpha}_k)$. The final continuous output is reconstructed as $\hat{\bm{X}} = \hat{\bm{S}} + \hat{\bm{R}}$.

\section{Masked Diffusion for Discrete Variables}
For discrete channel $j$, the forward masking process follows schedule $\{m_k\}_{k=1}^K$:
\begin{equation}
q(y^{(j)}_k \mid y^{(j)}_0) =
\begin{cases}
y^{(j)}_0, & \text{with probability } 1 - m_k, \\
\texttt{[MASK]}, & \text{with probability } m_k,
\end{cases}
\label{eq:masking_process}
\end{equation}
applied independently across variables and timesteps.

The denoiser $h_{\psi}$ predicts categorical distributions conditioned on continuous context:
\begin{equation}
p_{\psi}\bigl( y^{(j)}_0 \mid y_k, k, \hat{\bm{S}}, \hat{\bm{X}} \bigr) = h_{\psi}(y_k, k, \hat{\bm{S}}, \hat{\bm{X}}).
\label{eq:discrete_denoising}
\end{equation}
Training minimizes the categorical cross-entropy:
\begin{equation}
\mathcal{L}_{\text{disc}}(\psi) = \mathbb{E}_{k} \left[ \frac{1}{|\mathcal{M}|} \sum_{(j,t) \in \mathcal{M}} \mathrm{CE}\bigl( h_{\psi}(y_k, k, \hat{\bm{S}}, \hat{\bm{X}})_{j,t},\; y^{(j)}_{0,t} \bigr) \right],
\label{eq:discrete_loss}
\end{equation}
where $\mathcal{M}$ denotes masked positions at step $k$.

\section{Joint Optimization}
The combined objective balances continuous and discrete learning:
\begin{equation}
\mathcal{L} = \lambda \, \mathcal{L}_{\text{cont}} + (1 - \lambda) \, \mathcal{L}_{\text{disc}}, \quad \lambda \in [0,1].
\label{eq:joint_objective}
\end{equation}
Type-aware routing enforces deterministic reconstruction $\hat{x}^{(i)} = g_i(\hat{\bm{X}}, \hat{\bm{Y}})$ for derived variables.

\end{document}
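The corruption processes above are compact enough to check numerically. Below is a minimal NumPy sketch (editor's illustration, not a file in this commit): the linear beta schedule, the linear mask schedule $m_k$, the stand-in denoisers, and $\gamma = 5$ are all assumptions made for the demo, not values taken from the repository.

```python
# Illustrative sketch of the Mask-DDPM forward processes and joint loss.
# Schedules and the zero "denoisers" are assumptions for demonstration only.
import numpy as np

rng = np.random.default_rng(0)
L, d_c, K = 64, 4, 100                       # window length, continuous dims, diffusion steps

# Noise schedule: beta_k, alpha_bar_k (Eq. eq:forward_process)
beta = np.linspace(1e-4, 0.02, K)
alpha_bar = np.cumprod(1.0 - beta)

# Trend/residual split (Eq. eq:additive_decomp); S_hat stands in for f_phi
X = rng.normal(size=(L, d_c))
S_hat = np.zeros_like(X)
r0 = X - S_hat

# Forward corruption at a random step k: r_k = sqrt(abar) r0 + sqrt(1-abar) eps
k = int(rng.integers(1, K))
eps = rng.normal(size=r0.shape)
r_k = np.sqrt(alpha_bar[k]) * r0 + np.sqrt(1.0 - alpha_bar[k]) * eps

# Epsilon-prediction loss (Eq. eq:ddpm_loss) with a dummy predictor
eps_theta = np.zeros_like(eps)
loss_cont = np.mean((eps - eps_theta) ** 2)

# SNR reweighting (Eq. eq:snr_loss): w_k = min(SNR_k, gamma) / SNR_k
snr = alpha_bar[k] / (1.0 - alpha_bar[k])
gamma = 5.0
w_k = min(snr, gamma) / snr

# Discrete masking (Eq. eq:masking_process): replace tokens with [MASK] w.p. m_k
vocab, d_d = 8, 2
MASK = vocab                                  # extra [MASK] token id
m = np.linspace(0.0, 1.0, K + 1)[1:]          # assumed linear mask schedule
y0 = rng.integers(0, vocab, size=(L, d_d))
masked = rng.random(y0.shape) < m[k]
y_k = np.where(masked, MASK, y0)

# Joint objective (Eq. eq:joint_objective); the discrete CE term is elided here
lam = 0.5
loss_disc = 0.0
loss = lam * w_k * loss_cont + (1.0 - lam) * loss_disc
print(f"k={k}, w_k={w_k:.3f}, joint loss={loss:.4f}")
```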
BIN
arxiv-style/fig-benchmark-ablations-v1.png
Normal file
After Width: | Height: | Size: 181 KiB |
BIN
arxiv-style/fig-benchmark-story-v2.png
Normal file
After Width: | Height: | Size: 163 KiB |
BIN
arxiv-style/fig-design-v1.png
Normal file
After Width: | Height: | Size: 582 KiB |
BIN
arxiv-style/fig-design-v2.png
Normal file
After Width: | Height: | Size: 669 KiB |
BIN
arxiv-style/fig-design-v4-from-user-svg-cropped.pdf
Normal file
BIN
arxiv-style/fig-design-v4.png
Normal file
After Width: | Height: | Size: 1.1 MiB |
BIN
arxiv-style/fig-overall-benchmark-v1.png
Normal file
After Width: | Height: | Size: 123 KiB |
1
arxiv-style/fig-scripts/.python-version
Normal file
@@ -0,0 +1 @@
3.12
237
arxiv-style/fig-scripts/draw_channels.py
Normal file
@@ -0,0 +1,237 @@
#!/usr/bin/env python3
"""
Draw *separate* SVG figures for:
1) Continuous channels (multiple smooth curves per figure)
2) Discrete channels (multiple step-like/token curves per figure)

Outputs (default):
  out/continuous_channels.svg
  out/discrete_channels.svg

Notes:
- Transparent background (good for draw.io / LaTeX / diagrams).
- No axes/frames by default (diagram-friendly).
- Curves are synthetic placeholders; replace `make_*_channels()` with your real data.
"""

from __future__ import annotations

import argparse
from dataclasses import dataclass
from pathlib import Path

import numpy as np
import matplotlib.pyplot as plt


# ----------------------------
# Data generators (placeholders)
# ----------------------------

@dataclass
class GenParams:
    seconds: float = 10.0
    fs: int = 200
    seed: int = 7
    n_cont: int = 6               # number of continuous channels (curves)
    n_disc: int = 5               # number of discrete channels (curves)
    disc_vocab: int = 8           # token/vocab size for discrete channels
    disc_change_rate_hz: float = 1.2  # how often discrete tokens change


def make_continuous_channels(p: GenParams) -> tuple[np.ndarray, np.ndarray]:
    """
    Returns:
        t: shape (T,)
        Y: shape (n_cont, T)
    """
    rng = np.random.default_rng(p.seed)
    T = int(p.seconds * p.fs)
    t = np.linspace(0, p.seconds, T, endpoint=False)

    Y = []
    for i in range(p.n_cont):
        # Multi-scale smooth-ish signals
        f1 = 0.15 + 0.06 * i
        f2 = 0.8 + 0.15 * (i % 3)
        phase = rng.uniform(0, 2 * np.pi)
        y = (
            0.9 * np.sin(2 * np.pi * f1 * t + phase)
            + 0.35 * np.sin(2 * np.pi * f2 * t + 1.3 * phase)
        )
        # Add mild colored-ish noise by smoothing white noise
        w = rng.normal(0, 1, size=T)
        w = np.convolve(w, np.ones(9) / 9.0, mode="same")
        y = y + 0.15 * w

        # Normalize each channel for consistent visual scale
        y = (y - np.mean(y)) / (np.std(y) + 1e-9)
        y = 0.8 * y + 0.15 * i  # vertical offset to separate curves a bit
        Y.append(y)

    return t, np.vstack(Y)


def make_discrete_channels(p: GenParams) -> tuple[np.ndarray, np.ndarray]:
    """
    Discrete channels as piecewise-constant token IDs (integers).
    Returns:
        t: shape (T,)
        X: shape (n_disc, T)  (integers in [0, disc_vocab-1])
    """
    rng = np.random.default_rng(p.seed + 100)
    T = int(p.seconds * p.fs)
    t = np.linspace(0, p.seconds, T, endpoint=False)

    # expected number of changes per channel
    expected_changes = int(max(1, p.seconds * p.disc_change_rate_hz))

    X = np.zeros((p.n_disc, T), dtype=int)
    for c in range(p.n_disc):
        # pick change points
        k = rng.poisson(expected_changes) + 1
        change_pts = np.unique(rng.integers(0, T, size=k))
        change_pts = np.sort(np.concatenate([[0], change_pts, [T]]))

        cur = rng.integers(0, p.disc_vocab)
        for a, b in zip(change_pts[:-1], change_pts[1:]):
            # occasional token jump
            if a != 0:
                if rng.random() < 0.85:
                    cur = rng.integers(0, p.disc_vocab)
            X[c, a:b] = cur

    return t, X


# ----------------------------
# Plotting helpers
# ----------------------------

def _make_transparent_figure(width_in: float, height_in: float) -> tuple[plt.Figure, plt.Axes]:
    fig = plt.figure(figsize=(width_in, height_in), dpi=200)
    fig.patch.set_alpha(0.0)
    ax = fig.add_axes([0.03, 0.03, 0.94, 0.94])
    ax.patch.set_alpha(0.0)
    return fig, ax


def save_continuous_channels_svg(
    t: np.ndarray,
    Y: np.ndarray,
    out_path: Path,
    *,
    lw: float = 2.0,
    clean: bool = True,
) -> None:
    """
    Plot multiple continuous curves in one figure and save SVG.
    Y shape: (n_cont, T)
    """
    fig, ax = _make_transparent_figure(width_in=6.0, height_in=2.2)

    # Let matplotlib choose different colors automatically (good defaults).
    for i in range(Y.shape[0]):
        ax.plot(t, Y[i], linewidth=lw)

    if clean:
        ax.set_axis_off()
    else:
        ax.set_xlabel("t")
        ax.set_ylabel("value")

    # Set limits with padding
    y_all = Y.reshape(-1)
    ymin, ymax = float(np.min(y_all)), float(np.max(y_all))
    ypad = 0.08 * (ymax - ymin + 1e-9)
    ax.set_xlim(t[0], t[-1])
    ax.set_ylim(ymin - ypad, ymax + ypad)

    out_path.parent.mkdir(parents=True, exist_ok=True)
    fig.savefig(out_path, format="svg", bbox_inches="tight", pad_inches=0.0, transparent=True)
    plt.close(fig)


def save_discrete_channels_svg(
    t: np.ndarray,
    X: np.ndarray,
    out_path: Path,
    *,
    lw: float = 2.0,
    clean: bool = True,
    vertical_spacing: float = 1.25,
) -> None:
    """
    Plot multiple discrete (piecewise-constant) curves in one figure and save SVG.
    X shape: (n_disc, T) integers.

    We draw each channel as a step plot, offset vertically so curves don't overlap.
    """
    fig, ax = _make_transparent_figure(width_in=6.0, height_in=2.2)

    for i in range(X.shape[0]):
        y = X[i].astype(float) + i * vertical_spacing
        ax.step(t, y, where="post", linewidth=lw)

    if clean:
        ax.set_axis_off()
    else:
        ax.set_xlabel("t")
        ax.set_ylabel("token id (offset)")

    y_all = (X.astype(float) + np.arange(X.shape[0])[:, None] * vertical_spacing).reshape(-1)
    ymin, ymax = float(np.min(y_all)), float(np.max(y_all))
    ypad = 0.10 * (ymax - ymin + 1e-9)
    ax.set_xlim(t[0], t[-1])
    ax.set_ylim(ymin - ypad, ymax + ypad)

    out_path.parent.mkdir(parents=True, exist_ok=True)
    fig.savefig(out_path, format="svg", bbox_inches="tight", pad_inches=0.0, transparent=True)
    plt.close(fig)


# ----------------------------
# CLI
# ----------------------------

def main() -> None:
    ap = argparse.ArgumentParser()
    ap.add_argument("--outdir", type=Path, default=Path("out"))
    ap.add_argument("--seed", type=int, default=7)
    ap.add_argument("--seconds", type=float, default=10.0)
    ap.add_argument("--fs", type=int, default=200)

    ap.add_argument("--n-cont", type=int, default=6)
    ap.add_argument("--n-disc", type=int, default=5)
    ap.add_argument("--disc-vocab", type=int, default=8)
    ap.add_argument("--disc-change-rate", type=float, default=1.2)

    ap.add_argument("--keep-axes", action="store_true", help="Show axes/labels (default: off)")
    args = ap.parse_args()

    p = GenParams(
        seconds=args.seconds,
        fs=args.fs,
        seed=args.seed,
        n_cont=args.n_cont,
        n_disc=args.n_disc,
        disc_vocab=args.disc_vocab,
        disc_change_rate_hz=args.disc_change_rate,
    )

    t_c, Y = make_continuous_channels(p)
    t_d, X = make_discrete_channels(p)

    cont_path = args.outdir / "continuous_channels.svg"
    disc_path = args.outdir / "discrete_channels.svg"

    save_continuous_channels_svg(t_c, Y, cont_path, clean=not args.keep_axes)
    save_discrete_channels_svg(t_d, X, disc_path, clean=not args.keep_axes)

    print("Wrote:")
    print(f"  {cont_path}")
    print(f"  {disc_path}")


if __name__ == "__main__":
    main()
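Since the docstring suggests replacing the synthetic generators with real data, here is a short sketch of programmatic use (editor's illustration, not part of this commit; it assumes `draw_channels.py` is importable from the working directory). The array shapes follow the function docstrings above.

```python
import numpy as np
from pathlib import Path

from draw_channels import save_continuous_channels_svg

# Two real (here: toy) channels sampled on a shared time grid.
t = np.linspace(0.0, 10.0, 2000, endpoint=False)   # (T,)
Y = np.stack([np.sin(t), 0.5 * np.cos(2 * t)])     # (n_cont, T)

save_continuous_channels_svg(t, Y, Path("out/real_channels.svg"), clean=True)
```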
272
arxiv-style/fig-scripts/draw_synthetic_ics_optionA.py
Normal file
@@ -0,0 +1,272 @@
#!/usr/bin/env python3
"""
Option A: "Synthetic ICS Data" mini-panel (high-level features, not packets)

What it draws (one SVG, transparent background):
- Top: 2–3 continuous feature curves (smooth, time-aligned)
- Bottom: discrete/categorical feature strip (colored blocks)
- One vertical dashed alignment line crossing both
- Optional shaded regime window
- Optional "real vs synthetic" ghost overlay (faint gray behind one curve)

Usage:
  uv run python draw_synthetic_ics_optionA.py --out ./assets/synth_ics_optionA.svg
"""

from __future__ import annotations

import argparse
from dataclasses import dataclass
from pathlib import Path

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle


@dataclass
class Params:
    seed: int = 7
    seconds: float = 10.0
    fs: int = 300

    n_curves: int = 3    # continuous channels shown
    n_bins: int = 40     # discrete blocks across x
    disc_vocab: int = 8  # number of discrete categories

    # Layout / style
    width_in: float = 6.0
    height_in: float = 2.2
    curve_lw: float = 2.3
    ghost_lw: float = 2.0         # "real" overlay line width
    strip_height: float = 0.65    # bar height in [0,1] strip axis
    strip_gap_frac: float = 0.10  # gap between blocks (fraction of block width)

    # Visual cues
    show_alignment_line: bool = True
    align_x_frac: float = 0.58    # where to place dashed line, fraction of timeline
    show_regime_window: bool = True
    regime_start_frac: float = 0.30
    regime_end_frac: float = 0.45
    show_real_ghost: bool = True  # faint gray "real" behind first synthetic curve


def _smooth(x: np.ndarray, win: int) -> np.ndarray:
    win = max(3, int(win) | 1)  # odd
    k = np.ones(win, dtype=float)
    k /= k.sum()
    return np.convolve(x, k, mode="same")


def make_continuous_curves(p: Params) -> tuple[np.ndarray, np.ndarray, np.ndarray | None]:
    """
    Returns:
        t: (T,)
        Y_syn: (n_curves, T)  synthetic curves
        y_real: (T,) or None  optional "real" ghost curve (for one channel)
    """
    rng = np.random.default_rng(p.seed)
    T = int(p.seconds * p.fs)
    t = np.linspace(0, p.seconds, T, endpoint=False)

    Y = []
    for i in range(p.n_curves):
        # multi-scale smooth temporal patterns
        f_slow = 0.09 + 0.03 * (i % 3)
        f_mid = 0.65 + 0.18 * (i % 4)
        ph = rng.uniform(0, 2 * np.pi)

        y = (
            0.95 * np.sin(2 * np.pi * f_slow * t + ph)
            + 0.30 * np.sin(2 * np.pi * f_mid * t + 0.7 * ph)
        )

        # regime-like bumps
        bumps = np.zeros_like(t)
        for _ in range(2):
            mu = rng.uniform(0.8, p.seconds - 0.8)
            sig = rng.uniform(0.35, 0.85)
            bumps += np.exp(-0.5 * ((t - mu) / (sig + 1e-9)) ** 2)
        y += 0.55 * bumps

        # mild smooth noise
        noise = _smooth(rng.normal(0, 1, size=T), win=int(p.fs * 0.04))
        y += 0.10 * noise

        # normalize for clean presentation
        y = (y - y.mean()) / (y.std() + 1e-9)
        y *= 0.42
        Y.append(y)

    Y_syn = np.vstack(Y)

    # Optional "real" ghost: similar to first curve, but slightly different
    y_real = None
    if p.show_real_ghost:
        base = Y_syn[0].copy()
        drift = _smooth(rng.normal(0, 1, size=T), win=int(p.fs * 0.18))
        drift = drift / (np.std(drift) + 1e-9)
        y_real = base * 0.95 + 0.07 * drift

    return t, Y_syn, y_real


def make_discrete_strip(p: Params) -> np.ndarray:
    """
    Piecewise-constant categorical IDs across n_bins.
    Returns:
        ids: (n_bins,) in [0, disc_vocab-1]
    """
    rng = np.random.default_rng(p.seed + 123)
    n = p.n_bins
    ids = np.zeros(n, dtype=int)

    cur = rng.integers(0, p.disc_vocab)
    for i in range(n):
        # occasional change
        if i == 0 or rng.random() < 0.28:
            cur = rng.integers(0, p.disc_vocab)
        ids[i] = cur

    return ids


def _axes_clean(ax: plt.Axes) -> None:
    """Keep axes lines optional but remove all text/numbers (diagram-friendly)."""
    ax.set_xlabel("")
    ax.set_ylabel("")
    ax.set_title("")
    ax.set_xticks([])
    ax.set_yticks([])
    ax.tick_params(
        axis="both",
        which="both",
        bottom=False,
        left=False,
        top=False,
        right=False,
        labelbottom=False,
        labelleft=False,
    )


def draw_optionA(out_path: Path, p: Params) -> None:
    # Figure
    fig = plt.figure(figsize=(p.width_in, p.height_in), dpi=200)
    fig.patch.set_alpha(0.0)

    # Two stacked axes (shared x)
    ax_top = fig.add_axes([0.08, 0.32, 0.90, 0.62])
    ax_bot = fig.add_axes([0.08, 0.12, 0.90, 0.16], sharex=ax_top)
    ax_top.patch.set_alpha(0.0)
    ax_bot.patch.set_alpha(0.0)

    # Generate data
    t, Y_syn, y_real = make_continuous_curves(p)
    ids = make_discrete_strip(p)

    x0, x1 = float(t[0]), float(t[-1])
    span = x1 - x0

    # Optional shaded regime window
    if p.show_regime_window:
        rs = x0 + p.regime_start_frac * span
        re = x0 + p.regime_end_frac * span
        ax_top.axvspan(rs, re, alpha=0.12)  # default color, semi-transparent
        ax_bot.axvspan(rs, re, alpha=0.12)

    # Optional vertical dashed alignment line
    if p.show_alignment_line:
        vx = x0 + p.align_x_frac * span
        ax_top.axvline(vx, linestyle="--", linewidth=1.2, alpha=0.7)
        ax_bot.axvline(vx, linestyle="--", linewidth=1.2, alpha=0.7)

    # Continuous curves (use fixed colors for consistency)
    curve_colors = ["#1f77b4", "#ff7f0e", "#2ca02c", "#9467bd"]  # blue, orange, green, purple

    # Ghost "real" behind the first curve (faint gray)
    if y_real is not None:
        ax_top.plot(t, y_real, linewidth=p.ghost_lw, color="0.65", alpha=0.55, zorder=1)

    for i in range(Y_syn.shape[0]):
        ax_top.plot(
            t, Y_syn[i],
            linewidth=p.curve_lw,
            color=curve_colors[i % len(curve_colors)],
            zorder=2
        )

    # Set top y-limits with padding
    ymin, ymax = float(Y_syn.min()), float(Y_syn.max())
    ypad = 0.10 * (ymax - ymin + 1e-9)
    ax_top.set_xlim(x0, x1)
    ax_top.set_ylim(ymin - ypad, ymax + ypad)

    # Discrete strip as colored blocks
    palette = [
        "#e41a1c", "#377eb8", "#4daf4a", "#984ea3",
        "#ff7f00", "#ffff33", "#a65628", "#f781bf",
    ]

    n = len(ids)
    bin_w = span / n
    gap = p.strip_gap_frac * bin_w
    ax_bot.set_ylim(0, 1)

    y = (1 - p.strip_height) / 2
    for i, cat in enumerate(ids):
        left = x0 + i * bin_w + gap / 2
        width = bin_w - gap
        ax_bot.add_patch(
            Rectangle(
                (left, y), width, p.strip_height,
                facecolor=palette[int(cat) % len(palette)],
                edgecolor="none",
            )
        )

    # Clean axes: no ticks/labels; keep spines (axes lines) visible
    _axes_clean(ax_top)
    _axes_clean(ax_bot)
    for ax in (ax_top, ax_bot):
        for side in ("left", "bottom", "top", "right"):
            ax.spines[side].set_visible(True)

    out_path.parent.mkdir(parents=True, exist_ok=True)
    fig.savefig(out_path, format="svg", transparent=True, bbox_inches="tight", pad_inches=0.0)
    plt.close(fig)


def main() -> None:
    ap = argparse.ArgumentParser()
    ap.add_argument("--out", type=Path, default=Path("synth_ics_optionA.svg"))
    ap.add_argument("--seed", type=int, default=7)
    ap.add_argument("--seconds", type=float, default=10.0)
    ap.add_argument("--fs", type=int, default=300)
    ap.add_argument("--curves", type=int, default=3)
    ap.add_argument("--bins", type=int, default=40)
    ap.add_argument("--vocab", type=int, default=8)

    ap.add_argument("--no-align", action="store_true")
    ap.add_argument("--no-regime", action="store_true")
    ap.add_argument("--no-ghost", action="store_true")
    args = ap.parse_args()

    p = Params(
        seed=args.seed,
        seconds=args.seconds,
        fs=args.fs,
        n_curves=args.curves,
        n_bins=args.bins,
        disc_vocab=args.vocab,
        show_alignment_line=not args.no_align,
        show_regime_window=not args.no_regime,
        show_real_ghost=not args.no_ghost,
    )

    draw_optionA(args.out, p)
    print(f"Wrote: {args.out}")


if __name__ == "__main__":
    main()
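The CLI flags map one-to-one onto the `Params` dataclass, so the visual cues can also be toggled programmatically. A sketch (editor's illustration, assuming the module is importable; the output path is arbitrary):

```python
from pathlib import Path

from draw_synthetic_ics_optionA import Params, draw_optionA

# Same figure, but without the ghost overlay and regime shading.
p = Params(seed=3, show_real_ghost=False, show_regime_window=False)
draw_optionA(Path("assets/synth_ics_optionA_minimal.svg"), p)
```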
318
arxiv-style/fig-scripts/draw_synthetic_ics_optionB.py
Normal file
@@ -0,0 +1,318 @@
#!/usr/bin/env python3
"""
Option B: "Synthetic ICS Data" as a mini process-story strip (high-level features)
- ONE SVG, transparent background
- Two frames by default: "steady/normal" -> "disturbance/recovery"
- Each frame contains:
  - Top: multiple continuous feature curves
  - Bottom: discrete/categorical strip (colored blocks)
  - A vertical dashed alignment line crossing both
  - Optional shaded regime window
- A right-pointing arrow between frames

No text, no numbers (axes lines only). Good for draw.io embedding.

Run:
  uv run python draw_synthetic_ics_optionB.py --out ./assets/synth_ics_optionB.svg
"""

from __future__ import annotations

import argparse
from dataclasses import dataclass
from pathlib import Path

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle, FancyArrowPatch


@dataclass
class Params:
    seed: int = 7
    seconds: float = 8.0
    fs: int = 250

    # Two-frame story
    n_frames: int = 2

    # Per-frame visuals
    n_curves: int = 3
    n_bins: int = 32
    disc_vocab: int = 8

    # Layout
    width_in: float = 8.2
    height_in: float = 2.4
    # Relative layout inside the figure
    margin_left: float = 0.05
    margin_right: float = 0.05
    margin_bottom: float = 0.12
    margin_top: float = 0.10
    frame_gap: float = 0.08  # gap (figure fraction) between frames (space for arrow)

    # Styling
    curve_lw: float = 2.1
    ghost_lw: float = 1.8
    strip_height: float = 0.65
    strip_gap_frac: float = 0.12

    # Cues
    show_alignment_line: bool = True
    align_x_frac: float = 0.60
    show_regime_window: bool = True
    regime_start_frac: float = 0.30
    regime_end_frac: float = 0.46
    show_real_ghost: bool = False  # keep default off for cleaner story
    show_axes_spines: bool = True  # axes lines only (no ticks/labels)


# ---------- helpers ----------

def _smooth(x: np.ndarray, win: int) -> np.ndarray:
    win = max(3, int(win) | 1)
    k = np.ones(win, dtype=float)
    k /= k.sum()
    return np.convolve(x, k, mode="same")


def _axes_only(ax: plt.Axes, *, keep_spines: bool) -> None:
    ax.set_xlabel("")
    ax.set_ylabel("")
    ax.set_title("")
    ax.set_xticks([])
    ax.set_yticks([])
    ax.tick_params(
        axis="both",
        which="both",
        bottom=False,
        left=False,
        top=False,
        right=False,
        labelbottom=False,
        labelleft=False,
    )
    ax.grid(False)
    if keep_spines:
        for s in ("left", "right", "top", "bottom"):
            ax.spines[s].set_visible(True)
    else:
        for s in ("left", "right", "top", "bottom"):
            ax.spines[s].set_visible(False)


def make_frame_continuous(seed: int, seconds: float, fs: int, n_curves: int, style: str) -> tuple[np.ndarray, np.ndarray]:
    """
    style:
    - "steady": smoother, smaller bumps
    - "disturb": larger bumps and more variance
    """
    rng = np.random.default_rng(seed)
    T = int(seconds * fs)
    t = np.linspace(0, seconds, T, endpoint=False)

    amp_bump = 0.40 if style == "steady" else 0.85
    amp_noise = 0.09 if style == "steady" else 0.14
    amp_scale = 0.38 if style == "steady" else 0.46

    base_freqs = [0.10, 0.08, 0.12, 0.09]
    mid_freqs = [0.65, 0.78, 0.90, 0.72]

    Y = []
    for i in range(n_curves):
        f_slow = base_freqs[i % len(base_freqs)]
        f_mid = mid_freqs[i % len(mid_freqs)]
        ph = rng.uniform(0, 2 * np.pi)

        y = (
            0.95 * np.sin(2 * np.pi * f_slow * t + ph)
            + 0.28 * np.sin(2 * np.pi * f_mid * t + 0.65 * ph)
        )

        bumps = np.zeros_like(t)
        n_bumps = 2 if style == "steady" else 3
        for _ in range(n_bumps):
            mu = rng.uniform(0.9, seconds - 0.9)
            sig = rng.uniform(0.35, 0.75) if style == "steady" else rng.uniform(0.20, 0.55)
            bumps += np.exp(-0.5 * ((t - mu) / (sig + 1e-9)) ** 2)
        y += amp_bump * bumps

        noise = _smooth(rng.normal(0, 1, size=T), win=int(fs * 0.04))
        y += amp_noise * noise

        y = (y - y.mean()) / (y.std() + 1e-9)
        y *= amp_scale
        Y.append(y)

    return t, np.vstack(Y)


def make_frame_discrete(seed: int, n_bins: int, vocab: int, style: str) -> np.ndarray:
    """
    style:
    - "steady": fewer transitions
    - "disturb": more transitions
    """
    rng = np.random.default_rng(seed + 111)
    ids = np.zeros(n_bins, dtype=int)

    p_change = 0.20 if style == "steady" else 0.38
    cur = rng.integers(0, vocab)
    for i in range(n_bins):
        if i == 0 or rng.random() < p_change:
            cur = rng.integers(0, vocab)
        ids[i] = cur
    return ids


def draw_frame(ax_top: plt.Axes, ax_bot: plt.Axes, t: np.ndarray, Y: np.ndarray, ids: np.ndarray, p: Params) -> None:
    # Optional cues
    x0, x1 = float(t[0]), float(t[-1])
    span = x1 - x0

    if p.show_regime_window:
        rs = x0 + p.regime_start_frac * span
        re = x0 + p.regime_end_frac * span
        ax_top.axvspan(rs, re, alpha=0.12)  # default color
        ax_bot.axvspan(rs, re, alpha=0.12)

    if p.show_alignment_line:
        vx = x0 + p.align_x_frac * span
        ax_top.axvline(vx, linestyle="--", linewidth=1.15, alpha=0.7)
        ax_bot.axvline(vx, linestyle="--", linewidth=1.15, alpha=0.7)

    # Curves
    curve_colors = ["#1f77b4", "#ff7f0e", "#2ca02c", "#9467bd"]
    for i in range(Y.shape[0]):
        ax_top.plot(t, Y[i], linewidth=p.curve_lw, color=curve_colors[i % len(curve_colors)])

    ymin, ymax = float(Y.min()), float(Y.max())
    ypad = 0.10 * (ymax - ymin + 1e-9)
    ax_top.set_xlim(x0, x1)
    ax_top.set_ylim(ymin - ypad, ymax + ypad)

    # Discrete strip
    palette = [
        "#e41a1c", "#377eb8", "#4daf4a", "#984ea3",
        "#ff7f00", "#ffff33", "#a65628", "#f781bf",
    ]

    ax_bot.set_xlim(x0, x1)
    ax_bot.set_ylim(0, 1)

    n = len(ids)
    bin_w = span / n
    gap = p.strip_gap_frac * bin_w
    y = (1 - p.strip_height) / 2

    for i, cat in enumerate(ids):
        left = x0 + i * bin_w + gap / 2
        width = bin_w - gap
        ax_bot.add_patch(
            Rectangle((left, y), width, p.strip_height, facecolor=palette[int(cat) % len(palette)], edgecolor="none")
        )

    # Axes-only style
    _axes_only(ax_top, keep_spines=p.show_axes_spines)
    _axes_only(ax_bot, keep_spines=p.show_axes_spines)


# ---------- main drawing ----------

def draw_optionB(out_path: Path, p: Params) -> None:
    fig = plt.figure(figsize=(p.width_in, p.height_in), dpi=200)
    fig.patch.set_alpha(0.0)

    # Compute frame layout in figure coordinates
    # Each frame has two stacked axes: top curves and bottom strip.
    usable_w = 1.0 - p.margin_left - p.margin_right
    usable_h = 1.0 - p.margin_bottom - p.margin_top

    # Leave gap between frames for arrow
    total_gap = p.frame_gap * (p.n_frames - 1)
    frame_w = (usable_w - total_gap) / p.n_frames

    # Within each frame: vertical split
    top_h = usable_h * 0.70
    bot_h = usable_h * 0.18
    v_gap = usable_h * 0.06
    # bottoms
    bot_y = p.margin_bottom
    top_y = bot_y + bot_h + v_gap

    axes_pairs = []
    for f in range(p.n_frames):
        left = p.margin_left + f * (frame_w + p.frame_gap)
        ax_top = fig.add_axes([left, top_y, frame_w, top_h])
        ax_bot = fig.add_axes([left, bot_y, frame_w, bot_h], sharex=ax_top)
        ax_top.patch.set_alpha(0.0)
        ax_bot.patch.set_alpha(0.0)
        axes_pairs.append((ax_top, ax_bot))

    # Data per frame
    styles = ["steady", "disturb"] if p.n_frames == 2 else ["steady"] * (p.n_frames - 1) + ["disturb"]
    for idx, ((ax_top, ax_bot), style) in enumerate(zip(axes_pairs, styles)):
        t, Y = make_frame_continuous(p.seed + 10 * idx, p.seconds, p.fs, p.n_curves, style=style)
        ids = make_frame_discrete(p.seed + 10 * idx, p.n_bins, p.disc_vocab, style=style)
        draw_frame(ax_top, ax_bot, t, Y, ids, p)

    # Add a visual arrow between frames (in figure coordinates)
    if p.n_frames >= 2:
        for f in range(p.n_frames - 1):
            # center between frame f and f+1
            x_left = p.margin_left + f * (frame_w + p.frame_gap) + frame_w
            x_right = p.margin_left + (f + 1) * (frame_w + p.frame_gap)
            x_mid = (x_left + x_right) / 2
            # arrow y in the middle of the frame stack
            y_mid = bot_y + (bot_h + v_gap + top_h) / 2

            arr = FancyArrowPatch(
                (x_mid - 0.015, y_mid),
                (x_mid + 0.015, y_mid),
                transform=fig.transFigure,
                arrowstyle="-|>",
                mutation_scale=18,
                linewidth=1.6,
                color="black",
            )
            fig.patches.append(arr)

    out_path.parent.mkdir(parents=True, exist_ok=True)
    fig.savefig(out_path, format="svg", transparent=True, bbox_inches="tight", pad_inches=0.0)
    plt.close(fig)


def main() -> None:
    ap = argparse.ArgumentParser()
    ap.add_argument("--out", type=Path, default=Path("synth_ics_optionB.svg"))
    ap.add_argument("--seed", type=int, default=7)
    ap.add_argument("--seconds", type=float, default=8.0)
    ap.add_argument("--fs", type=int, default=250)
    ap.add_argument("--frames", type=int, default=2, choices=[2, 3], help="2 or 3 frames (story strip)")
    ap.add_argument("--curves", type=int, default=3)
    ap.add_argument("--bins", type=int, default=32)
    ap.add_argument("--vocab", type=int, default=8)
    ap.add_argument("--no-align", action="store_true")
    ap.add_argument("--no-regime", action="store_true")
    ap.add_argument("--no-spines", action="store_true")
    args = ap.parse_args()

    p = Params(
        seed=args.seed,
        seconds=args.seconds,
        fs=args.fs,
        n_frames=args.frames,
        n_curves=args.curves,
        n_bins=args.bins,
        disc_vocab=args.vocab,
        show_alignment_line=not args.no_align,
        show_regime_window=not args.no_regime,
        show_axes_spines=not args.no_spines,
    )

    draw_optionB(args.out, p)
    print(f"Wrote: {args.out}")


if __name__ == "__main__":
    main()
201
arxiv-style/fig-scripts/draw_transformer_lower_half.py
Normal file
@@ -0,0 +1,201 @@
#!/usr/bin/env python3
"""
Draw the *Transformer section* lower-half visuals:
- Continuous channels: multiple smooth curves (like the colored trend lines)
- Discrete channels: small colored bars/ticks along the bottom

Output: ONE SVG with transparent background, axes hidden.

Run:
    uv run python draw_transformer_lower_half.py --out ./assets/transformer_lower_half.svg
"""

from __future__ import annotations

import argparse
from dataclasses import dataclass
from pathlib import Path

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle


@dataclass
class Params:
    seed: int = 7
    seconds: float = 10.0
    fs: int = 300

    # Continuous channels
    n_curves: int = 3
    curve_lw: float = 2.4

    # Discrete bars
    n_bins: int = 40          # number of discrete bars/ticks across time
    bar_height: float = 0.11  # relative height inside bar strip axis
    bar_gap: float = 0.08     # gap between bars (fraction of bar width)

    # Canvas sizing
    width_in: float = 5.8
    height_in: float = 1.9


def _smooth(x: np.ndarray, win: int) -> np.ndarray:
    win = max(3, int(win) | 1)  # odd
    k = np.ones(win, dtype=float)
    k /= k.sum()
    return np.convolve(x, k, mode="same")


def make_continuous_curves(p: Params) -> tuple[np.ndarray, np.ndarray]:
    """
    Produce 3 smooth curves with gentle long-term temporal patterning.
    Returns:
        t: (T,)
        Y: (n_curves, T)
    """
    rng = np.random.default_rng(p.seed)
    T = int(p.seconds * p.fs)
    t = np.linspace(0, p.seconds, T, endpoint=False)

    Y = []
    base_freqs = [0.12, 0.09, 0.15]
    mid_freqs = [0.65, 0.85, 0.75]

    for i in range(p.n_curves):
        f1 = base_freqs[i % len(base_freqs)]
        f2 = mid_freqs[i % len(mid_freqs)]
        ph = rng.uniform(0, 2 * np.pi)

        # Smooth trend + mid wiggle
        y = (
            1.00 * np.sin(2 * np.pi * f1 * t + ph)
            + 0.35 * np.sin(2 * np.pi * f2 * t + 0.7 * ph)
        )

        # Add a couple of smooth bumps (like slow pattern changes)
        bumps = np.zeros_like(t)
        for _ in range(2):
            mu = rng.uniform(0.8, p.seconds - 0.8)
            sig = rng.uniform(0.35, 0.75)
            bumps += np.exp(-0.5 * ((t - mu) / sig) ** 2)
        y += 0.55 * bumps

        # Mild smooth noise
        noise = _smooth(rng.normal(0, 1, size=T), win=int(p.fs * 0.04))
        y += 0.12 * noise

        # Normalize and compress amplitude to fit nicely
        y = (y - y.mean()) / (y.std() + 1e-9)
        y *= 0.42

        Y.append(y)

    return t, np.vstack(Y)


def make_discrete_bars(p: Params) -> np.ndarray:
    """
    Generate discrete "token-like" bars across time bins.
    Returns:
        ids: (n_bins,) integer category ids
    """
    rng = np.random.default_rng(p.seed + 123)
    n = p.n_bins

    # A piecewise-constant sequence with occasional changes (looks like a discrete channel)
    ids = np.zeros(n, dtype=int)
    cur = rng.integers(0, 8)
    for i in range(n):
        if i == 0 or rng.random() < 0.25:
            cur = rng.integers(0, 8)
        ids[i] = cur
    return ids


def draw_transformer_lower_half_svg(out_path: Path, p: Params) -> None:
    # --- Figure + transparent background ---
    fig = plt.figure(figsize=(p.width_in, p.height_in), dpi=200)
    fig.patch.set_alpha(0.0)

    # Two stacked axes: curves (top), bars (bottom)
    # Tight, diagram-style layout
    ax_curves = fig.add_axes([0.06, 0.28, 0.90, 0.68])  # [left, bottom, width, height]
    ax_bars = fig.add_axes([0.06, 0.10, 0.90, 0.14])

    ax_curves.patch.set_alpha(0.0)
    ax_bars.patch.set_alpha(0.0)

    for ax in (ax_curves, ax_bars):
        ax.set_axis_off()

    # --- Data ---
    t, Y = make_continuous_curves(p)
    ids = make_discrete_bars(p)

    # --- Continuous curves (explicit colors to match the "multi-colored" look) ---
    # Feel free to swap these hex colors to match your figure theme.
    curve_colors = ["#1f77b4", "#ff7f0e", "#2ca02c"]  # blue / orange / green

    for i in range(Y.shape[0]):
        ax_curves.plot(t, Y[i], linewidth=p.curve_lw, color=curve_colors[i % len(curve_colors)])

    # Set curve bounds with padding (keeps it clean)
    ymin, ymax = float(Y.min()), float(Y.max())
    pad = 0.10 * (ymax - ymin + 1e-9)
    ax_curves.set_xlim(t[0], t[-1])
    ax_curves.set_ylim(ymin - pad, ymax + pad)

    # --- Discrete bars: small colored rectangles along the timeline ---
    # A small palette for categories (repeats if more categories appear)
    bar_palette = [
        "#e41a1c", "#377eb8", "#4daf4a", "#984ea3",
        "#ff7f00", "#ffff33", "#a65628", "#f781bf",
    ]

    # Convert bins into time spans
    n = len(ids)
    x0, x1 = t[0], t[-1]
    total = x1 - x0
    bin_w = total / n
    gap = p.bar_gap * bin_w

    # Draw bars in [0, 1] y-space inside ax_bars
    ax_bars.set_xlim(x0, x1)
    ax_bars.set_ylim(0, 1)

    for i, cat in enumerate(ids):
        left = x0 + i * bin_w + gap / 2
        width = bin_w - gap
        color = bar_palette[int(cat) % len(bar_palette)]
        rect = Rectangle(
            (left, (1 - p.bar_height) / 2),
            width,
            p.bar_height,
            facecolor=color,
            edgecolor="none",
        )
        ax_bars.add_patch(rect)

    out_path.parent.mkdir(parents=True, exist_ok=True)
    fig.savefig(out_path, format="svg", transparent=True, bbox_inches="tight", pad_inches=0.0)
    plt.close(fig)


def main() -> None:
    ap = argparse.ArgumentParser()
    ap.add_argument("--out", type=Path, default=Path("transformer_lower_half.svg"))
    ap.add_argument("--seed", type=int, default=7)
    ap.add_argument("--seconds", type=float, default=10.0)
    ap.add_argument("--fs", type=int, default=300)
    ap.add_argument("--bins", type=int, default=40)
    args = ap.parse_args()

    p = Params(seed=args.seed, seconds=args.seconds, fs=args.fs, n_bins=args.bins)
    draw_transformer_lower_half_svg(args.out, p)
    print(f"Wrote: {args.out}")


if __name__ == "__main__":
    main()
202
arxiv-style/fig-scripts/draw_transformer_lower_half_axes.py
Normal file
@@ -0,0 +1,202 @@
#!/usr/bin/env python3
"""
Transformer section lower-half visuals WITH AXES ONLY:
- Axes spines visible
- NO numbers (tick labels hidden)
- NO words (axis labels removed)
- Transparent background
- One SVG output

Run:
    uv run python draw_transformer_lower_half_axes.py --out ./assets/transformer_lower_half_axes_only.svg
"""

from __future__ import annotations

import argparse
from dataclasses import dataclass
from pathlib import Path

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle


@dataclass
class Params:
    seed: int = 7
    seconds: float = 10.0
    fs: int = 300

    # Continuous channels
    n_curves: int = 3
    curve_lw: float = 2.4

    # Discrete bars
    n_bins: int = 40
    bar_height: float = 0.55  # fraction of the discrete-axis y-range
    bar_gap: float = 0.08     # fraction of bar width

    # Figure size
    width_in: float = 6.6
    height_in: float = 2.6


def _smooth(x: np.ndarray, win: int) -> np.ndarray:
    win = max(3, int(win) | 1)  # odd
    k = np.ones(win, dtype=float)
    k /= k.sum()
    return np.convolve(x, k, mode="same")


def make_continuous_curves(p: Params) -> tuple[np.ndarray, np.ndarray]:
    rng = np.random.default_rng(p.seed)
    T = int(p.seconds * p.fs)
    t = np.linspace(0, p.seconds, T, endpoint=False)

    Y = []
    base_freqs = [0.12, 0.09, 0.15]
    mid_freqs = [0.65, 0.85, 0.75]

    for i in range(p.n_curves):
        f1 = base_freqs[i % len(base_freqs)]
        f2 = mid_freqs[i % len(mid_freqs)]
        ph = rng.uniform(0, 2 * np.pi)

        y = (
            1.00 * np.sin(2 * np.pi * f1 * t + ph)
            + 0.35 * np.sin(2 * np.pi * f2 * t + 0.7 * ph)
        )

        bumps = np.zeros_like(t)
        for _ in range(2):
            mu = rng.uniform(0.8, p.seconds - 0.8)
            sig = rng.uniform(0.35, 0.75)
            bumps += np.exp(-0.5 * ((t - mu) / sig) ** 2)
        y += 0.55 * bumps

        noise = _smooth(rng.normal(0, 1, size=T), win=int(p.fs * 0.04))
        y += 0.12 * noise

        y = (y - y.mean()) / (y.std() + 1e-9)
        y *= 0.42
        Y.append(y)

    return t, np.vstack(Y)


def make_discrete_bars(p: Params) -> np.ndarray:
    rng = np.random.default_rng(p.seed + 123)
    n = p.n_bins

    ids = np.zeros(n, dtype=int)
    cur = rng.integers(0, 8)
    for i in range(n):
        if i == 0 or rng.random() < 0.25:
            cur = rng.integers(0, 8)
        ids[i] = cur
    return ids


def _axes_only(ax: plt.Axes) -> None:
    """Keep spines (axes lines), remove all ticks/labels/words."""
    # No labels
    ax.set_xlabel("")
    ax.set_ylabel("")
    ax.set_title("")

    # Keep spines as the only axes element
    for side in ("top", "right", "bottom", "left"):
        ax.spines[side].set_visible(True)

    # Remove tick marks and tick labels entirely
    ax.set_xticks([])
    ax.set_yticks([])
    ax.tick_params(
        axis="both",
        which="both",
        bottom=False,
        left=False,
        top=False,
        right=False,
        labelbottom=False,
        labelleft=False,
    )

    # No grid
    ax.grid(False)


def draw_transformer_lower_half_svg(out_path: Path, p: Params) -> None:
    fig = plt.figure(figsize=(p.width_in, p.height_in), dpi=200)
    fig.patch.set_alpha(0.0)

    # Two axes sharing x (top curves, bottom bars)
    ax_curves = fig.add_axes([0.10, 0.38, 0.86, 0.56])
    ax_bars = fig.add_axes([0.10, 0.14, 0.86, 0.18], sharex=ax_curves)

    ax_curves.patch.set_alpha(0.0)
    ax_bars.patch.set_alpha(0.0)

    # Data
    t, Y = make_continuous_curves(p)
    ids = make_discrete_bars(p)

    # Top: continuous curves
    curve_colors = ["#1f77b4", "#ff7f0e", "#2ca02c"]  # blue / orange / green
    for i in range(Y.shape[0]):
        ax_curves.plot(t, Y[i], linewidth=p.curve_lw, color=curve_colors[i % len(curve_colors)])

    ymin, ymax = float(Y.min()), float(Y.max())
    ypad = 0.10 * (ymax - ymin + 1e-9)
    ax_curves.set_xlim(t[0], t[-1])
    ax_curves.set_ylim(ymin - ypad, ymax + ypad)

    # Bottom: discrete bars (colored strip)
    bar_palette = [
        "#e41a1c", "#377eb8", "#4daf4a", "#984ea3",
        "#ff7f00", "#ffff33", "#a65628", "#f781bf",
    ]

    x0, x1 = t[0], t[-1]
    total = x1 - x0
    n = len(ids)
    bin_w = total / n
    gap = p.bar_gap * bin_w

    ax_bars.set_xlim(x0, x1)
    ax_bars.set_ylim(0, 1)

    bar_y = (1 - p.bar_height) / 2
    for i, cat in enumerate(ids):
        left = x0 + i * bin_w + gap / 2
        width = bin_w - gap
        color = bar_palette[int(cat) % len(bar_palette)]
        ax_bars.add_patch(Rectangle((left, bar_y), width, p.bar_height, facecolor=color, edgecolor="none"))

    # Apply "axes only" styling (no numbers/words)
    _axes_only(ax_curves)
    _axes_only(ax_bars)

    out_path.parent.mkdir(parents=True, exist_ok=True)
    fig.savefig(out_path, format="svg", transparent=True, bbox_inches="tight", pad_inches=0.0)
    plt.close(fig)


def main() -> None:
    ap = argparse.ArgumentParser()
    ap.add_argument("--out", type=Path, default=Path("transformer_lower_half_axes_only.svg"))
    ap.add_argument("--seed", type=int, default=7)
    ap.add_argument("--seconds", type=float, default=10.0)
    ap.add_argument("--fs", type=int, default=300)
    ap.add_argument("--bins", type=int, default=40)
    ap.add_argument("--curves", type=int, default=3)
    args = ap.parse_args()

    p = Params(seed=args.seed, seconds=args.seconds, fs=args.fs, n_bins=args.bins, n_curves=args.curves)
    draw_transformer_lower_half_svg(args.out, p)
    print(f"Wrote: {args.out}")


if __name__ == "__main__":
    main()
161
arxiv-style/fig-scripts/gen_noise_ddmp.py
Normal file
@@ -0,0 +1,161 @@
#!/usr/bin/env python3
"""
Generate "Noisy Residual" and "Denoised Residual" curves as SVGs.

- Produces TWO separate SVG files:
    noisy_residual.svg
    denoised_residual.svg

- Curves are synthetic but shaped like residual noise + denoised residual.
- Uses only matplotlib + numpy.
"""

from __future__ import annotations

import argparse
from dataclasses import dataclass
from pathlib import Path

import numpy as np
import matplotlib.pyplot as plt


@dataclass
class CurveParams:
    seconds: float = 12.0           # length of the signal
    fs: int = 250                   # samples per second
    seed: int = 7                   # RNG seed for reproducibility
    base_amp: float = 0.12          # smooth baseline amplitude
    noise_amp: float = 0.55         # high-frequency noise amplitude
    burst_amp: float = 1.2          # occasional spike amplitude
    burst_rate_hz: float = 0.35     # average spike frequency
    denoise_smooth_ms: float = 120  # smoothing window for "denoised" (ms)


def gaussian_smooth(x: np.ndarray, sigma_samples: float) -> np.ndarray:
    """Gaussian smoothing using explicit kernel convolution (no SciPy dependency)."""
    if sigma_samples <= 0:
        return x.copy()

    radius = int(np.ceil(4 * sigma_samples))
    k = np.arange(-radius, radius + 1, dtype=float)
    kernel = np.exp(-(k**2) / (2 * sigma_samples**2))
    kernel /= kernel.sum()
    return np.convolve(x, kernel, mode="same")


def make_residual(params: CurveParams) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
    """
    Create synthetic residual:
    - baseline: smooth wavy trend + slight drift
    - noise: band-limited-ish high-frequency noise
    - bursts: sparse spikes / impulse-like events
    Returns: (t, noisy, denoised)
    """
    rng = np.random.default_rng(params.seed)
    n = int(params.seconds * params.fs)
    t = np.linspace(0, params.seconds, n, endpoint=False)

    # Smooth baseline (small): combination of sinusoids + small random drift
    baseline = (
        0.7 * np.sin(2 * np.pi * 0.35 * t + 0.2)
        + 0.35 * np.sin(2 * np.pi * 0.9 * t + 1.2)
        + 0.25 * np.sin(2 * np.pi * 0.15 * t + 2.0)
    )
    baseline *= params.base_amp
    drift = np.cumsum(rng.normal(0, 1, size=n))
    drift = drift / (np.max(np.abs(drift)) + 1e-9) * (params.base_amp * 0.25)
    baseline = baseline + drift

    # High-frequency noise: whitened then lightly smoothed to look "oscillatory"
    raw = rng.normal(0, 1, size=n)
    hf = raw - gaussian_smooth(raw, sigma_samples=params.fs * 0.03)  # remove slow part
    hf = hf / (np.std(hf) + 1e-9)
    hf *= params.noise_amp

    # Bursts/spikes: Poisson process impulses convolved with short kernel
    expected_bursts = params.burst_rate_hz * params.seconds
    k_bursts = rng.poisson(expected_bursts)
    impulses = np.zeros(n)
    if k_bursts > 0:
        idx = rng.integers(0, n, size=k_bursts)
        impulses[idx] = rng.normal(loc=1.0, scale=0.4, size=k_bursts)

    # Shape impulses into spikes (asymmetric bump)
    spike_kernel_len = int(params.fs * 0.06)  # ~60ms
    spike_kernel_len = max(spike_kernel_len, 7)
    spike_t = np.arange(spike_kernel_len)
    spike_kernel = np.exp(-spike_t / (params.fs * 0.012))  # fast decay
    spike_kernel *= np.hanning(spike_kernel_len)  # taper
    spike_kernel /= (spike_kernel.max() + 1e-9)

    bursts = np.convolve(impulses, spike_kernel, mode="same")
    bursts *= params.burst_amp

    noisy = baseline + hf + bursts

    # "Denoised": remove high-frequency using Gaussian smoothing,
    # but keep spike structures partially.
    smooth_sigma = (params.denoise_smooth_ms / 1000.0) * params.fs / 3.0
    denoised = gaussian_smooth(noisy, sigma_samples=smooth_sigma)

    return t, noisy, denoised


def save_curve_svg(
    t: np.ndarray,
    y: np.ndarray,
    out_path: Path,
    *,
    width_in: float = 5.4,
    height_in: float = 1.6,
    lw: float = 2.2,
    pad: float = 0.03,
) -> None:
    """
    Save a clean, figure-only SVG suitable for embedding in diagrams.
    - No axes, ticks, labels.
    - Tight bounding box.
    """
    fig = plt.figure(figsize=(width_in, height_in), dpi=200)
    ax = fig.add_axes([pad, pad, 1 - 2 * pad, 1 - 2 * pad])

    ax.plot(t, y, linewidth=lw)

    # Make it "icon-like" for diagrams: no axes or frames
    ax.set_axis_off()

    # Ensure bounds include a little padding
    ymin, ymax = np.min(y), np.max(y)
    ypad = 0.08 * (ymax - ymin + 1e-9)
    ax.set_xlim(t[0], t[-1])
    ax.set_ylim(ymin - ypad, ymax + ypad)

    out_path.parent.mkdir(parents=True, exist_ok=True)
    fig.savefig(out_path, format="svg", bbox_inches="tight", pad_inches=0.0)
    plt.close(fig)


def main() -> None:
    ap = argparse.ArgumentParser()
    ap.add_argument("--outdir", type=Path, default=Path("."), help="Output directory")
    ap.add_argument("--seed", type=int, default=7, help="RNG seed")
    ap.add_argument("--seconds", type=float, default=12.0, help="Signal length (s)")
    ap.add_argument("--fs", type=int, default=250, help="Sampling rate (Hz)")
    ap.add_argument("--prefix", type=str, default="", help="Filename prefix (optional)")
    args = ap.parse_args()

    params = CurveParams(seconds=args.seconds, fs=args.fs, seed=args.seed)
    t, noisy, denoised = make_residual(params)

    noisy_path = args.outdir / f"{args.prefix}noisy_residual.svg"
    den_path = args.outdir / f"{args.prefix}denoised_residual.svg"

    save_curve_svg(t, noisy, noisy_path)
    save_curve_svg(t, denoised, den_path)

    print(f"Wrote:\n  {noisy_path}\n  {den_path}")


if __name__ == "__main__":
    main()
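
A note on the smoothing parameterization above: make_residual converts denoise_smooth_ms into a Gaussian sigma in samples via sigma = (ms / 1000) * fs / 3, i.e. the requested window is treated as roughly a +/-1.5-sigma support. A minimal standalone sketch of the conversion (illustrative only; it assumes gaussian_smooth from the file above is importable in scope):

    import numpy as np

    fs, ms = 250, 120                    # the CurveParams defaults
    sigma = (ms / 1000.0) * fs / 3.0     # -> 10.0 samples
    x = np.random.default_rng(0).normal(size=fs)
    y = gaussian_smooth(x, sigma_samples=sigma)
    print(sigma, y.std() < x.std())      # smoothing acts as a low-pass: variance drops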
188
arxiv-style/fig-scripts/make_ddpm_like_svg.py
Normal file
@@ -0,0 +1,188 @@
#!/usr/bin/env python3
"""
DDPM-like residual curve SVGs (separate files, fixed colors):
- noisy_residual.svg    (blue)
- denoised_residual.svg (purple)
"""

from __future__ import annotations

import argparse
from dataclasses import dataclass
from pathlib import Path

import numpy as np
import matplotlib.pyplot as plt


@dataclass
class DDPMStyleParams:
    seconds: float = 12.0
    fs: int = 250
    seed: int = 7

    baseline_amp: float = 0.10
    mid_wiggle_amp: float = 0.18
    colored_noise_amp: float = 0.65
    colored_alpha: float = 1.0

    burst_rate_hz: float = 0.30
    burst_amp: float = 0.9
    burst_width_ms: float = 55

    denoise_sigmas_ms: tuple[float, ...] = (25, 60, 140)
    denoise_weights: tuple[float, ...] = (0.25, 0.35, 0.40)
    denoise_texture_keep: float = 0.10


def gaussian_smooth(x: np.ndarray, sigma_samples: float) -> np.ndarray:
    if sigma_samples <= 0:
        return x.copy()
    radius = int(np.ceil(4 * sigma_samples))
    k = np.arange(-radius, radius + 1, dtype=float)
    kernel = np.exp(-(k**2) / (2 * sigma_samples**2))
    kernel /= kernel.sum()
    return np.convolve(x, kernel, mode="same")


def colored_noise_1_f(n: int, rng: np.random.Generator, alpha: float) -> np.ndarray:
    white = rng.normal(0, 1, size=n)
    spec = np.fft.rfft(white)

    freqs = np.fft.rfftfreq(n, d=1.0)
    scale = np.ones_like(freqs)
    nonzero = freqs > 0
    scale[nonzero] = 1.0 / (freqs[nonzero] ** (alpha / 2.0))

    spec *= scale
    x = np.fft.irfft(spec, n=n)

    x = x - np.mean(x)
    x = x / (np.std(x) + 1e-9)
    return x


def make_ddpm_like_residual(p: DDPMStyleParams) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
    rng = np.random.default_rng(p.seed)
    n = int(p.seconds * p.fs)
    t = np.linspace(0, p.seconds, n, endpoint=False)

    baseline = (
        0.8 * np.sin(2 * np.pi * 0.18 * t + 0.4)
        + 0.35 * np.sin(2 * np.pi * 0.06 * t + 2.2)
    ) * p.baseline_amp

    mid = (
        0.9 * np.sin(2 * np.pi * 0.9 * t + 1.1)
        + 0.5 * np.sin(2 * np.pi * 1.6 * t + 0.2)
        + 0.3 * np.sin(2 * np.pi * 2.4 * t + 2.6)
    ) * p.mid_wiggle_amp

    col = colored_noise_1_f(n, rng, alpha=p.colored_alpha) * p.colored_noise_amp

    expected = p.burst_rate_hz * p.seconds
    k = rng.poisson(expected)
    impulses = np.zeros(n)
    if k > 0:
        idx = rng.integers(0, n, size=k)
        impulses[idx] = rng.normal(loc=1.0, scale=0.35, size=k)

    width = max(int(p.fs * (p.burst_width_ms / 1000.0)), 7)
    u = np.arange(width)
    kernel = np.exp(-u / (p.fs * 0.012)) * np.hanning(width)
    kernel /= (kernel.max() + 1e-9)
    bursts = np.convolve(impulses, kernel, mode="same") * p.burst_amp

    noisy = baseline + mid + col + bursts

    sigmas_samples = [(ms / 1000.0) * p.fs / 3.0 for ms in p.denoise_sigmas_ms]
    smooths = [gaussian_smooth(noisy, s) for s in sigmas_samples]

    den_base = np.zeros_like(noisy)
    for w, sm in zip(p.denoise_weights, smooths):
        den_base += w * sm

    hf = noisy - gaussian_smooth(noisy, sigma_samples=p.fs * 0.03)
    denoised = den_base + p.denoise_texture_keep * (hf / (np.std(hf) + 1e-9)) * (0.10 * np.std(den_base))

    return t, noisy, denoised


def save_single_curve_svg(
    t: np.ndarray,
    y: np.ndarray,
    out_path: Path,
    *,
    color: str,
    lw: float = 2.2,
) -> None:
    fig = plt.figure(figsize=(5.4, 1.6), dpi=200)

    # Make figure background transparent
    fig.patch.set_alpha(0.0)

    ax = fig.add_axes([0.03, 0.03, 0.94, 0.94])

    # Make axes background transparent
    ax.patch.set_alpha(0.0)

    ax.plot(t, y, linewidth=lw, color=color)

    # clean, diagram-friendly
    ax.set_axis_off()
    ymin, ymax = np.min(y), np.max(y)
    ypad = 0.08 * (ymax - ymin + 1e-9)
    ax.set_xlim(t[0], t[-1])
    ax.set_ylim(ymin - ypad, ymax + ypad)

    out_path.parent.mkdir(parents=True, exist_ok=True)
    fig.savefig(
        out_path,
        format="svg",
        bbox_inches="tight",
        pad_inches=0.0,
        transparent=True,  # <-- key for transparent output
    )
    plt.close(fig)


def main() -> None:
    ap = argparse.ArgumentParser()
    ap.add_argument("--outdir", type=Path, default=Path("."))
    ap.add_argument("--seed", type=int, default=7)
    ap.add_argument("--seconds", type=float, default=12.0)
    ap.add_argument("--fs", type=int, default=250)

    ap.add_argument("--alpha", type=float, default=1.0)
    ap.add_argument("--noise-amp", type=float, default=0.65)
    ap.add_argument("--texture-keep", type=float, default=0.10)

    ap.add_argument("--prefix", type=str, default="")
    args = ap.parse_args()

    p = DDPMStyleParams(
        seconds=args.seconds,
        fs=args.fs,
        seed=args.seed,
        colored_alpha=args.alpha,
        colored_noise_amp=args.noise_amp,
        denoise_texture_keep=args.texture_keep,
    )

    t, noisy, den = make_ddpm_like_residual(p)

    outdir = args.outdir
    noisy_path = outdir / f"{args.prefix}noisy_residual.svg"
    den_path = outdir / f"{args.prefix}denoised_residual.svg"

    # Fixed colors (blue / purple)
    save_single_curve_svg(t, noisy, noisy_path, color="blue")
    save_single_curve_svg(t, den, den_path, color="purple")

    print("Wrote:")
    print(f"  {noisy_path}")
    print(f"  {den_path}")


if __name__ == "__main__":
    main()
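
colored_noise_1_f shapes a white spectrum by 1/f^(alpha/2) in amplitude, which is 1/f^alpha in power. A quick standalone way to see this (illustrative sketch, not part of the repository; the slope of a single realization is noisy, so only an approximate match is expected):

    import numpy as np

    rng = np.random.default_rng(0)
    n, alpha = 1 << 14, 1.0
    spec = np.fft.rfft(rng.normal(0, 1, size=n))
    freqs = np.fft.rfftfreq(n, d=1.0)
    spec[1:] /= freqs[1:] ** (alpha / 2.0)      # same shaping as colored_noise_1_f
    x = np.fft.irfft(spec, n=n)

    power = np.abs(np.fft.rfft(x)) ** 2
    slope = np.polyfit(np.log(freqs[1:]), np.log(power[1:]), 1)[0]
    print(f"fitted log-log spectral slope ~ {slope:.2f} (expect about {-alpha:.1f})")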
10
arxiv-style/fig-scripts/pyproject.toml
Normal file
@@ -0,0 +1,10 @@
[project]
name = "fig-gen-ddpm"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.12"
dependencies = [
    "numpy>=1.26",
    "matplotlib>=3.8",
]
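
With this project file in place, the scripts in this directory can be run through uv, which resolves the numpy/matplotlib pins automatically. A usage sketch (flags taken from the script docstrings above; output paths are illustrative):

    cd arxiv-style/fig-scripts
    uv run python gen_noise_ddmp.py --outdir ./assets --seed 7
    uv run python make_ddpm_like_svg.py --outdir ./assets --alpha 1.0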
240
arxiv-style/fig-scripts/synth_ics_3d_waterfall.py
Normal file
@@ -0,0 +1,240 @@
#!/usr/bin/env python3
"""
3D "final combined outcome" (time × channel × value) with:
- NO numbers on axes (tick labels removed)
- Axis *titles* kept (texts are okay)
- Reduced whitespace: tight bbox + minimal margins
- White background (non-transparent) suitable for embedding into another SVG

Output:
    default PNG, optional SVG (2D projected vectors)

Run:
    uv run python synth_ics_3d_waterfall.py --out ./assets/synth_ics_3d.png
    uv run python synth_ics_3d_waterfall.py --out ./assets/synth_ics_3d.svg --format svg
"""

from __future__ import annotations

import argparse
from dataclasses import dataclass
from pathlib import Path

import numpy as np
import matplotlib.pyplot as plt


@dataclass
class Params:
    seed: int = 7
    seconds: float = 10.0
    fs: int = 220

    n_cont: int = 5
    n_disc: int = 2
    disc_vocab: int = 8
    disc_change_rate_hz: float = 1.1

    # view
    elev: float = 25.0
    azim: float = -58.0

    # figure size (smaller, more "cube-like")
    fig_w: float = 5.4
    fig_h: float = 5.0

    # discrete rendering
    disc_z_scale: float = 0.45
    disc_z_offset: float = -1.4

    # margins (figure fraction)
    left: float = 0.03
    right: float = 0.99
    bottom: float = 0.03
    top: float = 0.99


def _smooth(x: np.ndarray, win: int) -> np.ndarray:
    win = max(3, int(win) | 1)
    k = np.ones(win, dtype=float)
    k /= k.sum()
    return np.convolve(x, k, mode="same")


def make_continuous(p: Params) -> tuple[np.ndarray, np.ndarray]:
    rng = np.random.default_rng(p.seed)
    T = int(p.seconds * p.fs)
    t = np.linspace(0, p.seconds, T, endpoint=False)

    Y = []
    base_freqs = [0.08, 0.10, 0.12, 0.09, 0.11]
    mid_freqs = [0.55, 0.70, 0.85, 0.62, 0.78]

    for i in range(p.n_cont):
        f1 = base_freqs[i % len(base_freqs)]
        f2 = mid_freqs[i % len(mid_freqs)]
        ph = rng.uniform(0, 2 * np.pi)

        y = (
            0.95 * np.sin(2 * np.pi * f1 * t + ph)
            + 0.28 * np.sin(2 * np.pi * f2 * t + 0.65 * ph)
        )

        bumps = np.zeros_like(t)
        for _ in range(rng.integers(2, 4)):
            mu = rng.uniform(0.8, p.seconds - 0.8)
            sig = rng.uniform(0.25, 0.80)
            bumps += np.exp(-0.5 * ((t - mu) / (sig + 1e-9)) ** 2)
        y += 0.55 * bumps

        noise = _smooth(rng.normal(0, 1, size=T), win=int(p.fs * 0.05))
        y += 0.10 * noise

        y = (y - y.mean()) / (y.std() + 1e-9)
        Y.append(y)

    return t, np.vstack(Y)  # (n_cont, T)


def make_discrete(p: Params, t: np.ndarray) -> np.ndarray:
    rng = np.random.default_rng(p.seed + 123)
    T = len(t)

    expected_changes = max(1, int(p.seconds * p.disc_change_rate_hz))
    X = np.zeros((p.n_disc, T), dtype=int)

    for c in range(p.n_disc):
        k = rng.poisson(expected_changes) + 1
        pts = np.unique(rng.integers(0, T, size=k))
        pts = np.sort(np.concatenate([[0], pts, [T]]))

        cur = rng.integers(0, p.disc_vocab)
        for a, b in zip(pts[:-1], pts[1:]):
            if a != 0 and rng.random() < 0.85:
                cur = rng.integers(0, p.disc_vocab)
            X[c, a:b] = cur

    return X


def style_3d_axes(ax):
    # Make panes white but less visually heavy
    try:
        # Keep pane fill ON (white background) but reduce edge prominence
        ax.xaxis.pane.set_edgecolor("0.7")
        ax.yaxis.pane.set_edgecolor("0.7")
        ax.zaxis.pane.set_edgecolor("0.7")
    except Exception:
        pass

    ax.grid(True, linewidth=0.4, alpha=0.30)


def remove_tick_numbers_keep_axis_titles(ax):
    # Remove tick labels (numbers) and tick marks, keep axis titles
    ax.set_xticklabels([])
    ax.set_yticklabels([])
    ax.set_zticklabels([])

    ax.tick_params(
        axis="both",
        which="both",
        length=0,  # no tick marks
        pad=0,
    )
    # 3D has separate tick_params for z on some versions; this still works broadly:
    try:
        ax.zaxis.set_tick_params(length=0, pad=0)
    except Exception:
        pass


def main() -> None:
    ap = argparse.ArgumentParser()
    ap.add_argument("--out", type=Path, default=Path("synth_ics_3d.png"))
    ap.add_argument("--format", choices=["png", "svg"], default="png")

    ap.add_argument("--seed", type=int, default=7)
    ap.add_argument("--seconds", type=float, default=10.0)
    ap.add_argument("--fs", type=int, default=220)

    ap.add_argument("--n-cont", type=int, default=5)
    ap.add_argument("--n-disc", type=int, default=2)
    ap.add_argument("--disc-vocab", type=int, default=8)
    ap.add_argument("--disc-rate", type=float, default=1.1)

    ap.add_argument("--elev", type=float, default=25.0)
    ap.add_argument("--azim", type=float, default=-58.0)

    ap.add_argument("--fig-w", type=float, default=5.4)
    ap.add_argument("--fig-h", type=float, default=5.0)

    ap.add_argument("--disc-z-scale", type=float, default=0.45)
    ap.add_argument("--disc-z-offset", type=float, default=-1.4)

    args = ap.parse_args()

    p = Params(
        seed=args.seed,
        seconds=args.seconds,
        fs=args.fs,
        n_cont=args.n_cont,
        n_disc=args.n_disc,
        disc_vocab=args.disc_vocab,
        disc_change_rate_hz=args.disc_rate,
        elev=args.elev,
        azim=args.azim,
        fig_w=args.fig_w,
        fig_h=args.fig_h,
        disc_z_scale=args.disc_z_scale,
        disc_z_offset=args.disc_z_offset,
    )

    t, Yc = make_continuous(p)
    Xd = make_discrete(p, t)

    fig = plt.figure(figsize=(p.fig_w, p.fig_h), dpi=220, facecolor="white")
    ax = fig.add_subplot(111, projection="3d")
    style_3d_axes(ax)

    # Reduce whitespace around axes (tight placement)
    fig.subplots_adjust(left=p.left, right=p.right, bottom=p.bottom, top=p.top)

    # Draw continuous channels
    for i in range(p.n_cont):
        y = np.full_like(t, fill_value=i, dtype=float)
        z = Yc[i]
        ax.plot(t, y, z, linewidth=2.0)

    # Draw discrete channels as steps
    # (Axes3D has no step(); emulate a "steps-post" line by repeating
    #  points at the bin edges and drawing with plot())
    for j in range(p.n_disc):
        ch = p.n_cont + j
        y = np.full_like(t, fill_value=ch, dtype=float)
        z = p.disc_z_offset + p.disc_z_scale * Xd[j].astype(float)
        ax.plot(np.repeat(t, 2)[1:], np.repeat(y, 2)[1:], np.repeat(z, 2)[:-1], linewidth=2.2)

    # Axis titles kept
    ax.set_xlabel("time")
    ax.set_ylabel("channel")
    ax.set_zlabel("value")

    # Remove numeric tick labels + tick marks
    remove_tick_numbers_keep_axis_titles(ax)

    # Camera
    ax.view_init(elev=p.elev, azim=p.azim)

    # Save tightly (minimize white border)
    args.out.parent.mkdir(parents=True, exist_ok=True)
    save_kwargs = dict(bbox_inches="tight", pad_inches=0.03, facecolor="white")
    if args.format == "svg" or args.out.suffix.lower() == ".svg":
        fig.savefig(args.out, format="svg", **save_kwargs)
    else:
        fig.savefig(args.out, format="png", **save_kwargs)

    plt.close(fig)
    print(f"Wrote: {args.out}")


if __name__ == "__main__":
    main()
262
arxiv-style/fig-scripts/transformer_math_figure.py
Normal file
@@ -0,0 +1,262 @@
#!/usr/bin/env python3
"""
Transformer-ish "trend" visuals with NO equations:
- attention_weights.svg       : heatmap-like attention map (looks like "Transformer attends to positions")
- token_activation_trends.svg : multiple token-channel curves (continuous trends)
- discrete_tokens.svg         : step-like discrete channel trends (optional)

All SVGs have transparent background and no axes (diagram-friendly).
"""

from __future__ import annotations

import argparse
from dataclasses import dataclass
from pathlib import Path

import numpy as np
import matplotlib.pyplot as plt


# ----------------------------
# Synthetic data generators
# ----------------------------

@dataclass
class Params:
    seed: int = 7
    T: int = 24           # sequence length (positions)
    n_heads: int = 4      # attention heads to blend/choose
    n_curves: int = 7     # curves in token_activation_trends
    seconds: float = 10.0
    fs: int = 200


def _gaussian(x: np.ndarray, mu: float, sig: float) -> np.ndarray:
    return np.exp(-0.5 * ((x - mu) / (sig + 1e-9)) ** 2)


def make_attention_map(T: int, rng: np.random.Generator, mode: str) -> np.ndarray:
    """
    Create a transformer-like attention weight matrix A (T x T) with different visual styles:
    - "local":  mostly near-diagonal attention
    - "global": some global tokens attend broadly
    - "causal": lower-triangular (decoder-like) with local preference
    """
    i = np.arange(T)[:, None]  # query positions
    j = np.arange(T)[None, :]  # key positions

    if mode == "local":
        logits = -((i - j) ** 2) / (2 * (2.2 ** 2))
        logits += 0.15 * rng.normal(size=(T, T))
    elif mode == "global":
        logits = -((i - j) ** 2) / (2 * (3.0 ** 2))
        # Add a few "global" key positions that many queries attend to
        globals_ = rng.choice(T, size=max(2, T // 10), replace=False)
        for g in globals_:
            logits += 1.2 * _gaussian(j, mu=g, sig=1.0)
        logits += 0.12 * rng.normal(size=(T, T))
    elif mode == "causal":
        logits = -((i - j) ** 2) / (2 * (2.0 ** 2))
        logits += 0.12 * rng.normal(size=(T, T))
        logits = np.where(j <= i, logits, -1e9)  # mask future
    else:
        raise ValueError(f"Unknown attention mode: {mode}")

    # softmax rows
    logits = logits - np.max(logits, axis=1, keepdims=True)
    A = np.exp(logits)
    A /= (np.sum(A, axis=1, keepdims=True) + 1e-9)
    return A


def make_token_activation_trends(p: Params) -> tuple[np.ndarray, np.ndarray]:
    """
    Multiple smooth curves that feel like "representations evolving across layers/time".
    Returns:
        t: (N,)
        Y: (n_curves, N)
    """
    rng = np.random.default_rng(p.seed)
    N = int(p.seconds * p.fs)
    t = np.linspace(0, p.seconds, N, endpoint=False)

    Y = []
    for k in range(p.n_curves):
        # Multi-scale smooth components + some bursty response
        f1 = 0.10 + 0.04 * k
        f2 = 0.60 + 0.18 * (k % 3)
        phase = rng.uniform(0, 2 * np.pi)

        base = 0.9 * np.sin(2 * np.pi * f1 * t + phase) + 0.35 * np.sin(2 * np.pi * f2 * t + 0.7 * phase)

        # "attention-like gating": a few bumps where the curve spikes smoothly
        bumps = np.zeros_like(t)
        for _ in range(rng.integers(2, 5)):
            mu = rng.uniform(0.5, p.seconds - 0.5)
            sig = rng.uniform(0.15, 0.55)
            bumps += 0.9 * _gaussian(t, mu=mu, sig=sig)

        noise = rng.normal(0, 1, size=N)
        noise = np.convolve(noise, np.ones(11) / 11.0, mode="same")  # smooth noise

        y = base + 0.85 * bumps + 0.12 * noise

        # normalize and vertically offset
        y = (y - y.mean()) / (y.std() + 1e-9)
        y = 0.75 * y + 0.18 * k
        Y.append(y)

    return t, np.vstack(Y)


def make_discrete_trends(p: Params, vocab: int = 9, change_rate_hz: float = 1.3) -> tuple[np.ndarray, np.ndarray]:
    """
    Discrete step-like channels: useful if you want a "token-id / discrete feature" feel.
    Returns:
        t: (N,)
        X: (n_curves, N) integers
    """
    rng = np.random.default_rng(p.seed + 123)
    N = int(p.seconds * p.fs)
    t = np.linspace(0, p.seconds, N, endpoint=False)

    expected = max(1, int(p.seconds * change_rate_hz))
    X = np.zeros((p.n_curves, N), dtype=int)
    for c in range(p.n_curves):
        k = rng.poisson(expected) + 1
        pts = np.unique(rng.integers(0, N, size=k))
        pts = np.sort(np.concatenate([[0], pts, [N]]))

        cur = rng.integers(0, vocab)
        for a, b in zip(pts[:-1], pts[1:]):
            if a != 0 and rng.random() < 0.9:
                cur = rng.integers(0, vocab)
            X[c, a:b] = cur

    return t, X


# ----------------------------
# Plot helpers (SVG, transparent, axes-free)
# ----------------------------

def _transparent_fig_ax(width_in: float, height_in: float):
    fig = plt.figure(figsize=(width_in, height_in), dpi=200)
    fig.patch.set_alpha(0.0)
    ax = fig.add_axes([0.03, 0.03, 0.94, 0.94])
    ax.patch.set_alpha(0.0)
    ax.set_axis_off()
    return fig, ax


def save_attention_svg(A: np.ndarray, out: Path, *, show_colorbar: bool = False) -> None:
    fig, ax = _transparent_fig_ax(4.2, 4.2)

    # Using default colormap (no explicit color specification)
    im = ax.imshow(A, aspect="equal", interpolation="nearest")

    if show_colorbar:
        # colorbar can be useful, but it adds clutter in diagrams
        cax = fig.add_axes([0.92, 0.10, 0.03, 0.80])
        cb = fig.colorbar(im, cax=cax)
        cb.outline.set_linewidth(1.0)

    out.parent.mkdir(parents=True, exist_ok=True)
    fig.savefig(out, format="svg", bbox_inches="tight", pad_inches=0.0, transparent=True)
    plt.close(fig)


def save_multi_curve_svg(t: np.ndarray, Y: np.ndarray, out: Path, *, lw: float = 2.0) -> None:
    fig, ax = _transparent_fig_ax(6.0, 2.2)

    for i in range(Y.shape[0]):
        ax.plot(t, Y[i], linewidth=lw)

    y_all = Y.reshape(-1)
    ymin, ymax = float(np.min(y_all)), float(np.max(y_all))
    ypad = 0.08 * (ymax - ymin + 1e-9)
    ax.set_xlim(t[0], t[-1])
    ax.set_ylim(ymin - ypad, ymax + ypad)

    out.parent.mkdir(parents=True, exist_ok=True)
    fig.savefig(out, format="svg", bbox_inches="tight", pad_inches=0.0, transparent=True)
    plt.close(fig)


def save_discrete_svg(t: np.ndarray, X: np.ndarray, out: Path, *, lw: float = 2.0, spacing: float = 1.25) -> None:
    fig, ax = _transparent_fig_ax(6.0, 2.2)

    for i in range(X.shape[0]):
        y = X[i].astype(float) + i * spacing
        ax.step(t, y, where="post", linewidth=lw)

    y_all = (X.astype(float) + np.arange(X.shape[0])[:, None] * spacing).reshape(-1)
    ymin, ymax = float(np.min(y_all)), float(np.max(y_all))
    ypad = 0.10 * (ymax - ymin + 1e-9)
    ax.set_xlim(t[0], t[-1])
    ax.set_ylim(ymin - ypad, ymax + ypad)

    out.parent.mkdir(parents=True, exist_ok=True)
    fig.savefig(out, format="svg", bbox_inches="tight", pad_inches=0.0, transparent=True)
    plt.close(fig)


# ----------------------------
# CLI
# ----------------------------

def main() -> None:
    ap = argparse.ArgumentParser()
    ap.add_argument("--outdir", type=Path, default=Path("out"))
    ap.add_argument("--seed", type=int, default=7)

    # attention
    ap.add_argument("--T", type=int, default=24)
    ap.add_argument("--attn-mode", type=str, default="local", choices=["local", "global", "causal"])
    ap.add_argument("--colorbar", action="store_true")

    # curves
    ap.add_argument("--seconds", type=float, default=10.0)
    ap.add_argument("--fs", type=int, default=200)
    ap.add_argument("--n-curves", type=int, default=7)

    # discrete optional
    ap.add_argument("--with-discrete", action="store_true")
    ap.add_argument("--disc-vocab", type=int, default=9)
    ap.add_argument("--disc-rate", type=float, default=1.3)

    args = ap.parse_args()

    p = Params(
        seed=args.seed,
        T=args.T,
        n_curves=args.n_curves,
        seconds=args.seconds,
        fs=args.fs,
    )

    rng = np.random.default_rng(args.seed)

    # 1) attention map
    A = make_attention_map(args.T, rng, mode=args.attn_mode)
    save_attention_svg(A, args.outdir / "attention_weights.svg", show_colorbar=args.colorbar)

    # 2) continuous trends
    t, Y = make_token_activation_trends(p)
    save_multi_curve_svg(t, Y, args.outdir / "token_activation_trends.svg")

    # 3) discrete trends (optional)
    if args.with_discrete:
        td, X = make_discrete_trends(p, vocab=args.disc_vocab, change_rate_hz=args.disc_rate)
        save_discrete_svg(td, X, args.outdir / "discrete_tokens.svg")

    print("Wrote:")
    print(f"  {args.outdir / 'attention_weights.svg'}")
    print(f"  {args.outdir / 'token_activation_trends.svg'}")
    if args.with_discrete:
        print(f"  {args.outdir / 'discrete_tokens.svg'}")


if __name__ == "__main__":
    main()
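
make_attention_map normalizes logits with a row-wise softmax, so every query row of A is a probability distribution, and the "causal" mode should place no mass on future keys. A small sanity check along those lines (illustrative sketch; it assumes make_attention_map from the file above is importable in scope):

    import numpy as np

    rng = np.random.default_rng(7)
    A = make_attention_map(24, rng, mode="causal")
    assert np.allclose(A.sum(axis=1), 1.0, atol=1e-6)    # rows sum to 1
    assert np.allclose(np.triu(A, k=1), 0.0, atol=1e-6)  # no attention to the future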
BIN
arxiv-style/fig-type-aware-routing-realdata.pdf
Normal file
302
arxiv-style/main-ieee.tex
Normal file
@@ -0,0 +1,302 @@
|
||||
\documentclass[conference]{IEEEtran}
|
||||
\IEEEoverridecommandlockouts
|
||||
|
||||
\usepackage{cite}
|
||||
\usepackage{amsmath,amssymb,amsfonts}
|
||||
\usepackage{graphicx}
|
||||
\usepackage{url}
|
||||
\usepackage{textcomp}
|
||||
\usepackage{xcolor}
|
||||
\usepackage{booktabs}
|
||||
\usepackage{bm}
|
||||
|
||||
\title{Mask-DDPM: Transformer-Conditioned Mixed-Type Diffusion for Semantically Valid ICS Telemetry Synthesis}
|
||||
|
||||
% 若不需要日期,取消下面一行的注释
|
||||
|
||||
\author{
|
||||
\IEEEauthorblockN{Zhenglan Chen}
|
||||
\IEEEauthorblockA{\textit{Aberdeen Institute of Data Science and Artificial Intelligence} \\
|
||||
\textit{South China Normal University}\\
|
||||
Guangzhuo, China \\
|
||||
email address or ORCID}
|
||||
\and
|
||||
\IEEEauthorblockN{Mingzhe Yang}
|
||||
\IEEEauthorblockA{\textit{Aberdeen Institute of Data Science and Artificial Intelligence} \\
|
||||
\textit{South China Normal University}\\
|
||||
Guangzhuo, China \\
|
||||
email address or ORCID}
|
||||
\and
|
||||
\IEEEauthorblockN{Hongyu Yan}
|
||||
\IEEEauthorblockA{\textit{Aberdeen Institute of Data Science and Artificial Intelligence} \\
|
||||
\textit{South China Normal University}\\
|
||||
Guangzhuo, China \\
|
||||
email address or ORCID}
|
||||
\and
|
||||
\IEEEauthorblockN{Huan Yang}
|
||||
\IEEEauthorblockA{\textit{dept. name of organization (of Aff.)} \\
|
||||
\textit{South China Normal University}\\
|
||||
Guangzhuo, China \\
|
||||
email address or ORCID}
|
||||
}
|
||||
|
||||
% 页眉设置
|
||||
|
||||
%%% PDF 元数'
|
||||
|
||||
\begin{document}
|
||||
\maketitle
|
||||
|
||||
\begin{abstract}
|
||||
Industrial control systems (ICS) security research is increasingly constrained by the scarcity and non-shareability of realistic traffic and telemetry, especially for attack scenarios. To mitigate this bottleneck, we study synthetic generation at the protocol feature/telemetry level, where samples must simultaneously preserve temporal coherence, match continuous marginal distributions, and keep discrete supervisory variables strictly within valid vocabularies. We propose Mask-DDPM, a hybrid framework tailored to mixed-type, multi-scale ICS sequences. Mask-DDPM factorizes generation into (i) a causal Transformer trend module that rolls out a stable long-horizon temporal scaffold for continuous channels, (ii) a trend-conditioned residual DDPM that refines local stochastic structure and heavy-tailed fluctuations without degrading global dynamics, (iii) a masked (absorbing) diffusion branch for discrete variables that guarantees categorical legality by construction, and (iv) a type-aware decomposition/routing layer that aligns modeling mechanisms with heterogeneous ICS variable origins and enforces deterministic reconstruction where appropriate. Evaluated on fixed-length windows (L=96) derived from the HAI Security Dataset, Mask-DDPM achieves stable fidelity across seeds with mean KS = 0.3311 ± 0.0079 (continuous), mean JSD = 0.0284 ± 0.0073 (discrete), and mean absolute lag-1 autocorrelation difference = 0.2684 ± 0.0027, indicating faithful marginals, preserved short-horizon dynamics, and valid discrete semantics. The resulting generator provides a reproducible basis for data augmentation, benchmarking, and downstream ICS protocol reconstruction workflows.
|
||||
\end{abstract}
|
||||
|
||||
% 关键'
|
||||
\begin{IEEEkeywords}Machine Learning, Cyber Defense, ICS\end{IEEEkeywords}
|
||||
|
||||
% 1. Introduction
|
||||
\section{Introduction}
|
||||
\label{sec:intro}
|
||||
Industrial control systems (ICS) form the backbone of modern critical infrastructure, which includes power grids, water treatment, manufacturing, and transportation, among others. These systems monitor, regulate, and automate the physical processes through sensors, actuators, programmable logic controllers (PLCs), and monitoring software. Unlike conventional IT systems, ICS operate in real time, closely coupled with physical processes and safety-critical constraints, using heterogeneous and legacy communication protocols such as Modbus/TCP and DNP3 that were not originally designed with robust security in mind. This architectural complexity and operational criticality make ICS high-impact targets for cyber attacks, where disruptions can result in physical damage, environmental harm, and even loss of life. Recent reviews of ICS security highlight the expanding attack surface due to increased connectivity, legacy systems' vulnerabilities, and the inadequacy of traditional security controls in capturing the nuances of ICS networks and protocols \cite{10.1007/s10844-022-00753-1,Nankya2023-gp}
|
||||
|
||||
While machine learning (ML) techniques have shown promise for anomaly detection and automated cybersecurity within ICS, they rely heavily on labeled datasets that capture both benign operations and diverse attack patterns. In practice, real ICS traffic data, especially attack-triggered captures, are scarce due to confidentiality, safety, and legal restrictions, and available public ICS datasets are few, limited in scope, or fail to reflect current threat modalities. For instance, the HAI Security Dataset provides operational telemetry and anomaly flags from a realistic control system setup for research purposes, but must be carefully preprocessed to derive protocol-relevant features for ML tasks \cite{shin}. Data scarcity directly undermines model generalization, evaluation reproducibility, and the robustness of intrusion detection research, especially when training or testing ML models on realistic ICS behavior remains confined to small or outdated collections of examples \cite{info16100910}.
|
||||
|
||||
Synthetic data generation offers a practical pathway to mitigate these challenges. By programmatically generating feature-level sequences that mimic the statistical and temporal structure of real ICS telemetry, researchers can augment scarce training sets, standardize benchmarking, and preserve operational confidentiality. Relative to raw packet captures, feature-level synthesis abstracts critical protocol semantics and statistical patterns without exposing sensitive fields, making it more compatible with safety constraints and compliance requirements in ICS environments. Modern generative modeling, including diffusion models, has advanced significantly in producing high-fidelity synthetic data across domains. Diffusion approaches, such as denoising diffusion probabilistic models, learn to transform noise into coherent structured samples and have been successfully applied to tabular or time series data synthesis with better stability and data coverage compared to adversarial methods \cite{pmlr-v202-kotelnikov23a,rasul2021autoregressivedenoisingdiffusionmodels}
|
||||
|
||||
Despite these advances, most existing work either focuses on packet-level generation \cite{jiang2023netdiffusionnetworkdataaugmentation} or is limited to generic tabular data \cite{pmlr-v202-kotelnikov23a}, rather than domain-specific control sequence synthesis tailored for ICS protocols where temporal coherence, multi-channel dependencies, and discrete protocol legality are jointly required. This gap motivates our focus on protocol feature-level generation for ICS, which involves synthesizing sequences of protocol-relevant fields conditioned on their temporal and cross-channel structure. In this work, we formulate a hybrid modeling pipeline that decouples long-horizon trends and local statistical detail while preserving discrete semantics of protocol tokens. By combining causal Transformers with diffusion-based refiners, and enforcing deterministic validity constraints during sampling, our framework generates semantically coherent, temporally consistent, and distributionally faithful ICS feature sequences. We evaluate features derived from the HAI Security Dataset and demonstrate that our approach produces high-quality synthetic sequences suitable for downstream augmentation, benchmarking, and integration into packet-construction workflows that respect realistic ICS constraints.
|
||||
|
||||
% 2. Related Work
|
||||
\section{Related Work}
|
||||
\label{sec:related}
|
||||
Early generation of network data oriented towards ``realism'' mostly remained at the packet/flow header level, either through replay or statistical synthesis based on single-point observations. Swing, in a closed-loop, network-responsive manner, extracts user/application/network distributions from single-point observations to reproduce burstiness and correlation across multiple time scales \cite{10.1145/1159913.1159928}. Subsequently, a series of works advanced header synthesis to learning-based generation: the WGAN-based method added explicit verification of protocol field consistency to NetFlow/IPFIX \cite{Ring_2019}, NetShare reconstructed header modeling as flow-level time series and improved fidelity and scalability through domain encoding and parallel fine-tuning \cite{10.1145/3544216.3544251}, and DoppelGANger preserved the long-range structure and downstream sorting consistency of networked time series by decoupling attributes from sequences \cite{Lin_2020}. However, in industrial control system (ICS) scenarios, the original PCAP is usually not shareable, and public testbeds (such as SWaT, WADI) mostly provide process/monitoring telemetry and protocol interactions for security assessment, but public datasets emphasize operational variables rather than packet-level traces \cite{7469060,10.1145/3055366.3055375}. This makes ``synthesis at the feature/telemetry level, aware of protocol and semantics'' more feasible and necessary in practice: we are more concerned with reproducing high-level distributions and multi-scale temporal patterns according to operational semantics and physical constraints without relying on the original packets. From this perspective, the generation paradigm naturally shifts from ``packet syntax reproduction'' to ``modeling of high-level spatio-temporal distributions and uncertainties'', requiring stable training, strong distribution fitting, and interpretable uncertainty characterization.
|
||||
|
||||
Diffusion models exhibit good fit along this path: DDPM achieves high-quality sampling and stable optimization through efficient $\epsilon$ parameterization and weighted variational objectives \cite{ho2020denoising}, the SDE perspective unifies score-based and diffusion, providing likelihood evaluation and prediction-correction sampling strategies based on probability flow ODEs \cite{song2021score}. For time series, TimeGrad replaces the constrained output distribution with conditional denoising, capturing high-dimensional correlations at each step \cite{rasul2021autoregressivedenoisingdiffusionmodels}; CSDI explicitly performs conditional diffusion and uses two-dimensional attention to simultaneously leverage temporal and cross-feature dependencies, suitable for conditioning and filling in missing values \cite{tashiro2021csdiconditionalscorebaseddiffusion}; in a more general spatio-temporal structure, DiffSTG generalizes diffusion to spatio-temporal graphs, combining TCN/GCN with denoising U-Net to improve CRPS and inference efficiency in a non-autoregressive manner \cite{wen2024diffstgprobabilisticspatiotemporalgraph}, and PriSTI further enhances conditional features and geographical relationships, maintaining robustness under high missing rates and sensor failures \cite{liu2023pristiconditionaldiffusionframework}; in long sequences and continuous domains, DiffWave verifies that diffusion can also match the quality of strong vocoders under non-autoregressive fast synthesis \cite{kong2021diffwaveversatilediffusionmodel}; studies on cellular communication traffic show that diffusion can recover spatio-temporal patterns and provide uncertainty characterization at the urban scale \cite{11087622}. These results overall point to a conclusion: when the research focus is on ``telemetry/high-level features'' rather than raw messages, diffusion models provide stable and fine-grained distribution fitting and uncertainty quantification, which is exactly in line with the requirements of ICS telemetry synthesis. Meanwhile, directly entrusting all structures to a ``monolithic diffusion'' is not advisable: long-range temporal skeletons and fine-grained marginal distributions often have optimization tensions, requiring explicit decoupling in modeling.
|
||||
|
||||
Looking further into the mechanistic complexity of ICS: its channel types are inherently mixed, containing both continuous process trajectories and discrete supervisory/status variables, and the discrete channels must remain ``legal'' under operational constraints. The progress in time-series diffusion discussed above has mainly occurred in continuous spaces, but discrete diffusion has also developed systematic methods: D3PM improves sampling quality and likelihood through absorbing/masking and structured transitions in discrete state spaces \cite{austin2021structured}, subsequent masked diffusion provides stable reconstruction on categorical data in a more simplified form \cite{Lin_2020}, multinomial diffusion directly defines diffusion on a finite vocabulary through mechanisms such as argmax flows \cite{hoogeboom2021argmaxflowsmultinomialdiffusion}, and Diffusion-LM demonstrates an effective path for controllable text generation by imposing gradient constraints in continuous latent spaces \cite{li2022diffusionlmimprovescontrollabletext}. From the perspective of protocols and finite-state machines, coverage-guided fuzz testing emphasizes the criticality of ``sequence legality and state coverage'' \cite{meng2025aflnetyearslatercoverageguided,godefroid2017learnfuzzmachinelearninginput,she2019neuzzefficientfuzzingneural}, echoing the concept of ``legality by construction'' in discrete diffusion: preferentially adopting absorbing/masking diffusion on discrete channels, supplemented by type-aware conditioning and sampling constraints, to avoid the semantic invalidity and marginal distortion caused by post hoc thresholding.

From the perspective of high-level synthesis, temporal structure is equally indispensable: ICS control often involves delay effects, phased operating conditions, and cross-channel coupling, requiring models to characterize low-frequency, long-range dependencies while overlaying multi-modal fine-grained fluctuations on them. The Transformer line of work provides ample evidence on long-sequence time-series tasks: Transformer-XL breaks through the fixed-length context limitation through a reusable memory mechanism and significantly enhances long-range dependency modeling \cite{dai2019transformerxlattentivelanguagemodels}; Informer uses ProbSparse attention and efficient decoding to balance span and efficiency in long-sequence prediction \cite{zhou2021informerefficienttransformerlong}; Autoformer robustly models long-term seasonality and trends through autocorrelation and decomposition mechanisms \cite{wu2022autoformerdecompositiontransformersautocorrelation}; FEDformer further improves long-period prediction through frequency-domain enhancement and decomposition \cite{zhou2022fedformerfrequencyenhanceddecomposed}; PatchTST enhances the stability and generalization of long-sequence multivariate prediction through local patch-based representation and channel-independent modeling \cite{nie2023patchtst}. Combined with our positioning of diffusion above, this chain of evidence points to a natural division of labor: use attention-based sequence models to first extract stable low-frequency trends/conditions (long-range skeletons), then let diffusion focus on margins and details in the residual space; meanwhile, apply discrete masking/absorbing diffusion to supervisory/mode variables to ensure vocabulary legality by construction. This design not only inherits the advantages of time-series diffusion in distribution fitting and uncertainty characterization \cite{rasul2021autoregressivedenoisingdiffusionmodels,tashiro2021csdiconditionalscorebaseddiffusion,wen2024diffstgprobabilisticspatiotemporalgraph,liu2023pristiconditionaldiffusionframework,kong2021diffwaveversatilediffusionmodel,11087622}, but also stabilizes the macroscopic temporal support through the Transformer's long-range attention, yielding an operational, integrated generation pipeline under the mixed types and multi-scale dynamics of ICS.

% 3. Methodology
\section{Methodology}
\label{sec:method}
Industrial control system (ICS) telemetry is intrinsically mixed-type and mechanistically heterogeneous: continuous process trajectories (e.g., sensor and actuator signals) coexist with discrete supervisory states (e.g., modes, alarms, interlocks), and the underlying generating mechanisms range from physical inertia to program-driven step logic. This heterogeneity is not cosmetic--it directly affects what ``realistic'' synthesis means, because a generator must jointly satisfy (i) temporal coherence, (ii) distributional fidelity, and (iii) discrete semantic validity (i.e., every discrete output must belong to its legal vocabulary by construction). These properties are emphasized broadly in operational-technology security guidance and ICS engineering practice, where state logic and physical dynamics are tightly coupled \cite{nist2023sp80082}.

We model each training instance as a fixed-length window of length $L$, comprising continuous channels $\bm{X} \in \mathbb{R}^{L \times d_c}$ and discrete channels $\bm{Y} = \{y^{(j)}_{1:L}\}_{j=1}^{d_d}$, where each discrete variable satisfies $y^{(j)}_t \in \mathcal{V}_j$ for a finite vocabulary $\mathcal{V}_j$. Our objective is to learn a generator that produces synthetic $(\hat{\bm{X}}, \hat{\bm{Y}})$ that are simultaneously coherent and distributionally faithful, while also ensuring $\hat{y}^{(j)}_t\in\mathcal{V}_j$ for all $j$, $t$ by construction (rather than via post-hoc rounding or thresholding).
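To make this data representation concrete, a minimal preprocessing sketch is shown below; the helper \texttt{make\_windows}, its pandas-style interface, and the per-tag vocabulary construction are illustrative assumptions rather than a prescribed pipeline:
\begin{verbatim}
import numpy as np
import pandas as pd

L = 96  # window length used in the benchmark

def make_windows(df: pd.DataFrame, cont_cols, disc_cols, stride=L):
    """Slice a telemetry table into fixed-length mixed-type windows (X, Y).

    Discrete tags are mapped to integer indices of their finite
    vocabulary V_j, so legality is checkable by construction."""
    vocabs = {c: {v: i for i, v in enumerate(sorted(df[c].unique()))}
              for c in disc_cols}
    X, Y = [], []
    for s in range(0, len(df) - L + 1, stride):
        w = df.iloc[s:s + L]
        X.append(w[cont_cols].to_numpy(dtype=np.float32))   # (L, d_c)
        Y.append(np.stack([w[c].map(vocabs[c]).to_numpy()
                           for c in disc_cols], axis=1))    # (L, d_d)
    return np.stack(X), np.stack(Y), vocabs
\end{verbatim}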

A key empirical and methodological tension in ICS synthesis is that temporal realism and marginal/distributional realism can compete when optimized monolithically: sequence models trained primarily for regression often over-smooth heavy tails and intermittent bursts, while purely distribution-matching objectives can erode long-range structure. Diffusion models provide a principled route to rich distribution modeling through iterative denoising, but they do not, by themselves, resolve (i) the need for a stable low-frequency temporal scaffold, nor (ii) the discrete legality constraints for supervisory variables \cite{ho2020denoising,song2021score}. Recent time-series diffusion work further suggests that separating coarse structure from stochastic refinement can be an effective inductive bias for long-horizon realism \cite{kollovieh2023tsdiff,sikder2023transfusion}.

\begin{figure*}[t]
\centering
\includegraphics[width=\textwidth]{fig-design-v2.png}
\caption{Mask-DDPM: unified synthesis for ICS traffic.}
\label{fig:design}
\end{figure*}

Motivated by these considerations, we propose Mask-DDPM, organized in the following order:
\begin{enumerate}
\item Transformer trend module: learns the dominant temporal backbone of continuous dynamics via attention-based sequence modeling \cite{vaswani2017attention}.

\item Residual DDPM for continuous variables: models distributional detail as stochastic residual structure conditioned on the learned trend \cite{ho2020denoising,kollovieh2023tsdiff}.

\item Masked diffusion for discrete variables: generates discrete ICS states with an absorbing/masking corruption process and categorical reconstruction \cite{austin2021structured,shi2024simplified}.

\item Type-aware decomposition: a type-aware factorization and routing layer that assigns variables to the most appropriate modeling mechanism and enforces deterministic constraints where warranted.
\end{enumerate}

This ordering is intentional. The trend module establishes a macro-temporal scaffold; residual diffusion then concentrates capacity on micro-structure and marginal fidelity; masked diffusion provides a native mechanism for discrete legality; and the type-aware layer operationalizes the observation that not all ICS variables should be modeled with the same stochastic mechanism. Importantly, while diffusion-based generation for ICS telemetry has begun to emerge, existing approaches remain limited and typically emphasize continuous synthesis or augmentation; in contrast, our pipeline integrates (i) a Transformer-conditioned residual diffusion backbone, (ii) a discrete masked-diffusion branch, and (iii) explicit type-aware routing for heterogeneous variable mechanisms within a single coherent generator \cite{yuan2025ctu,sha2026ddpm}.

\subsection{Transformer trend module for continuous dynamics}
\label{sec:method-trans}
We instantiate the temporal backbone as a causal Transformer trend extractor, leveraging self-attention's ability to represent long-range dependencies and cross-channel interactions without recurrence \cite{vaswani2017attention}. Compared with recurrent trend extractors (e.g., GRU-style backbones), a Transformer trend module offers a direct mechanism to model delayed effects and multivariate coupling--common in ICS, where control actions may influence downstream sensors with nontrivial lags and regime-dependent propagation \cite{vaswani2017attention,nist2023sp80082}. Crucially, in our design the Transformer is not asked to be the entire generator; instead, it serves a deliberately restricted role: providing a stable, temporally coherent conditioning signal that later stochastic components refine.

For continuous channels $\bm{X}$, we posit an additive decomposition:
\begin{equation}
\bm{X} = \bm{S} + \bm{R},
\label{eq:additive_decomp}
\end{equation}
where $\bm{S} \in \mathbb{R}^{L \times d_c}$ is a smooth trend capturing predictable temporal evolution, and $\bm{R} \in \mathbb{R}^{L \times d_c}$ is a residual capturing distributional detail (e.g., bursts, heavy tails, local fluctuations) that is difficult to represent robustly with a purely regression-based temporal objective. This separation reflects an explicit division of labor: the trend module prioritizes temporal coherence, while diffusion (introduced next) targets distributional realism at the residual level--a strategy aligned with ``predict-then-refine'' perspectives in time-series diffusion modeling \cite{kollovieh2023tsdiff,sikder2023transfusion}.

We parameterize the trend $\bm{S}$ using a causal Transformer $f_\phi$. With teacher forcing, we train $f_{\phi}$ to predict the next-step trend from past observations:
\begin{equation}
\hat{\bm{S}}_{t+1} = f_{\phi}(\bm{X}_{1:t}), \quad t = 1, \dots, L-1,
\label{eq:trend_prediction}
\end{equation}
using the mean-squared error objective:
\begin{equation}
\mathcal{L}_{\text{trend}}(\phi) = \frac{1}{(L-1)d_c} \sum_{t=1}^{L-1} \bigl\| \hat{\bm{S}}_{t+1} - \bm{X}_{t+1} \bigr\|_2^2.
\label{eq:trend_loss}
\end{equation}
At inference, we roll out the Transformer autoregressively to obtain $\hat{\bm{S}}$, and then define the residual target for diffusion as $\bm{R} = \bm{X} - \hat{\bm{S}}$. This setup intentionally ``locks in'' a coherent low-frequency scaffold before any stochastic refinement is applied, thereby reducing the burden on downstream diffusion modules to simultaneously learn both long-range structure and marginal detail. In this sense, our use of Transformers is distinctive: it is a conditioning-first temporal backbone designed to stabilize mixed-type diffusion synthesis in ICS, rather than an end-to-end monolithic generator \cite{vaswani2017attention,kollovieh2023tsdiff,yuan2025ctu}.
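For concreteness, the following is a minimal PyTorch-style sketch of the trend module, its teacher-forced loss (Eq.~\ref{eq:trend_loss}), and the autoregressive rollout; layer sizes and the seed context are illustrative assumptions, and positional encodings are omitted for brevity:
\begin{verbatim}
import torch
import torch.nn as nn

class TrendTransformer(nn.Module):
    """Causal Transformer f_phi: maps X_{1:t} to next-step trend
    predictions. Positional encodings are omitted for brevity; a
    practical variant would add them."""
    def __init__(self, d_c, d_model=128, n_layers=4, n_heads=4):
        super().__init__()
        self.inp = nn.Linear(d_c, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, 4 * d_model,
                                           batch_first=True, norm_first=True)
        self.enc = nn.TransformerEncoder(layer, n_layers)
        self.out = nn.Linear(d_model, d_c)

    def forward(self, x):                     # x: (B, t, d_c)
        t = x.size(1)                         # causal mask: True = blocked
        causal = torch.triu(torch.ones(t, t, dtype=torch.bool,
                                       device=x.device), diagonal=1)
        h = self.enc(self.inp(x), mask=causal)
        return self.out(h)                    # predictions for steps 2..t+1

def trend_loss(model, x):                     # Eq. (trend_loss)
    pred = model(x[:, :-1])                   # teacher forcing: predict x_2..x_L
    return ((pred - x[:, 1:]) ** 2).mean()

@torch.no_grad()
def rollout(model, x_seed, L):                # autoregressive inference of S_hat
    seq = x_seed                              # (B, t0, d_c) seed context
    while seq.size(1) < L:
        nxt = model(seq)[:, -1:]              # last-position next-step prediction
        seq = torch.cat([seq, nxt], dim=1)
    return seq[:, :L]
\end{verbatim}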

\subsection{DDPM for continuous residual generation}
\label{sec:method-ddpm}
We model the residual $\bm{R}$ with a denoising diffusion probabilistic model (DDPM) conditioned on the trend $\hat{\bm{S}}$ \cite{ho2020denoising}. Diffusion models learn complex data distributions by inverting a tractable noising process through iterative denoising, and have proven effective at capturing multimodality and heavy-tailed structure that is often attenuated by purely regression-based sequence models \cite{ho2020denoising,song2021score}. Conditioning the diffusion model on $\hat{\bm{S}}$ is central: it prevents the denoiser from re-learning the low-frequency scaffold and focuses capacity on residual micro-structure, mirroring the broader principle that diffusion excels as a distributional corrector when a reasonable coarse structure is available \cite{kollovieh2023tsdiff,sikder2023transfusion}.

Let $K$ denote the number of diffusion steps, with a noise schedule $\{\beta_k\}_{k=1}^K$, $\alpha_k = 1 - \beta_k$, and $\bar{\alpha}_k = \prod_{i=1}^k \alpha_i$. The forward corruption process is:
\begin{equation}
q(\bm{r}_k \mid \bm{r}_0) = \mathcal{N}\bigl( \bm{r}_k;\; \sqrt{\bar{\alpha}_k}\,\bm{r}_0,\; (1 - \bar{\alpha}_k)\mathbf{I} \bigr);
\label{eq:forward_corruption}
\end{equation}
equivalently,
\begin{equation}
\bm{r}_k = \sqrt{\bar{\alpha}_k}\,\bm{r}_0 + \sqrt{1 - \bar{\alpha}_k}\,\boldsymbol{\epsilon}, \quad \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}).
\label{eq:forward_corruption_eq}
\end{equation}
The learned reverse process is parameterized as:
\begin{equation}
p_{\theta}(\bm{r}_{k-1} \mid \bm{r}_k, \hat{\bm{S}}) = \mathcal{N}\bigl( \boldsymbol{\mu}_{\theta}(\bm{r}_k, k, \hat{\bm{S}}),\; \boldsymbol{\Sigma}(k) \bigr),
\label{eq:reverse_process}
\end{equation}
where $\boldsymbol{\mu}_{\theta}$ is implemented by a Transformer denoiser that consumes (i) the noised residual $\bm{r}_k$, (ii) a timestep embedding for $k$, and (iii) conditioning features derived from $\hat{\bm{S}}$. This denoiser architecture is consistent with the growing use of attention-based denoisers for long-context time-series diffusion, while our key methodological emphasis is the trend-conditioned residual factorization as the object of diffusion learning \cite{ho2020denoising,sikder2023transfusion}.

We train the denoiser using the standard DDPM $\epsilon$-prediction objective:
\begin{equation}
\mathcal{L}_{\text{cont}}(\theta) = \mathbb{E}_{k,\bm{r}_0,\boldsymbol{\epsilon}} \left[ \bigl\| \boldsymbol{\epsilon} - \boldsymbol{\epsilon}_{\theta}(\bm{r}_k, k, \hat{\bm{S}}) \bigr\|_2^2 \right].
\label{eq:ddpm_loss}
\end{equation}
Because diffusion optimization can exhibit timestep imbalance (i.e., some timesteps dominate gradients), we optionally apply an SNR-based reweighting consistent with Min-SNR training:
\begin{equation}
\mathcal{L}^{\text{snr}}_{\text{cont}}(\theta) = \mathbb{E}_{k,\bm{r}_0,\boldsymbol{\epsilon}} \left[ w_k \bigl\| \boldsymbol{\epsilon} - \boldsymbol{\epsilon}_{\theta}(\bm{r}_k, k, \hat{\bm{S}}) \bigr\|_2^2 \right],
\label{eq:snr_loss}
\end{equation}
where $w_k=\min\{\mathrm{SNR}_k,\gamma\}/\mathrm{SNR}_k$, $\mathrm{SNR}_k=\bar{\alpha}_k/(1-\bar{\alpha}_k)$, and $\gamma>0$ is a cap parameter \cite{hang2023efficient}.

After sampling $\hat{\bm{R}}$ by reverse diffusion, we reconstruct the continuous output as $\hat{\bm{X}} = \hat{\bm{S}} + \hat{\bm{R}}$. Overall, the DDPM component serves as a distributional corrector on top of a temporally coherent backbone, which is particularly suited to ICS where low-frequency dynamics are strong and persistent but fine-scale variability (including bursts and regime-conditioned noise) remains important for realism. Relative to prior ICS diffusion efforts that primarily focus on continuous augmentation, our formulation elevates trend-conditioned residual diffusion as a modular mechanism for disentangling temporal structure from distributional refinement \cite{yuan2025ctu,sha2026ddpm}.
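A compact training-step sketch combining the closed-form corruption of Eq.~\ref{eq:forward_corruption_eq} with the Min-SNR weighting of Eq.~\ref{eq:snr_loss} is given below; the denoiser signature and the linear $\beta$ schedule are assumptions for illustration:
\begin{verbatim}
import torch

def make_alpha_bar(K, beta_1=1e-4, beta_K=0.02):
    """Linear beta schedule; alpha_bar_k = prod_{i<=k} (1 - beta_i)."""
    betas = torch.linspace(beta_1, beta_K, K)
    return torch.cumprod(1.0 - betas, dim=0)

def ddpm_training_step(denoiser, r0, s_hat, alpha_bar, gamma=5.0):
    """One Min-SNR-weighted eps-prediction step. `denoiser(r_k, k, s_hat)`
    is an assumed signature for the trend-conditioned Transformer
    denoiser; r0 = X - S_hat is the residual target, shape (B, L, d_c)."""
    B = r0.size(0)
    k = torch.randint(0, alpha_bar.numel(), (B,), device=r0.device)
    ab = alpha_bar.to(r0.device)[k].view(B, 1, 1)
    eps = torch.randn_like(r0)
    r_k = ab.sqrt() * r0 + (1.0 - ab).sqrt() * eps  # closed-form corruption
    snr = ab / (1.0 - ab)                           # SNR_k
    w = torch.minimum(snr, torch.full_like(snr, gamma)) / snr  # Min-SNR weight
    pred_eps = denoiser(r_k, k, s_hat)
    return (w * (eps - pred_eps) ** 2).mean()
\end{verbatim}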

\subsection{Masked diffusion for discrete ICS variables}
\label{sec:method-discrete}
Discrete ICS variables must remain categorical, making Gaussian diffusion inappropriate for supervisory states and mode-like channels. While one can attempt continuous relaxations or post-hoc discretization, such strategies risk producing semantically invalid intermediate states (e.g., ``in-between'' modes) and can distort the discrete marginal distribution. Discrete-state diffusion provides a principled alternative by defining a valid corruption process directly on categorical variables \cite{austin2021structured,shi2024simplified}. In the ICS setting, this is not a secondary detail: supervisory tags often encode control logic boundaries (modes, alarms, interlocks) that must remain within a finite vocabulary to preserve semantic correctness \cite{nist2023sp80082}.

We therefore adopt masked (absorbing) diffusion for discrete channels, where corruption replaces tokens with a special $\texttt{[MASK]}$ symbol according to a schedule \cite{shi2024simplified}. For each variable $j$, define a masking schedule $\{m_k\}_{k=1}^K$ (with $m_k\in[0,1]$) increasing in $k$. The forward corruption process is:
\begin{equation}
q(y^{(j)}_k \mid y^{(j)}_0) =
\begin{cases}
y^{(j)}_0, & \text{with probability } 1 - m_k, \\
\texttt{[MASK]}, & \text{with probability } m_k,
\end{cases}
\label{eq:masking_process}
\end{equation}
applied independently across $j$ and $t$. Let $\mathcal{M}$ denote the set of masked positions at step $k$. The denoiser $h_{\psi}$ predicts a categorical distribution over $\mathcal{V}_j$ for each masked token, conditioned on (i) the corrupted discrete sequence, (ii) the diffusion step $k$, and (iii) continuous context. Concretely, we condition on $\hat{\bm{S}}$ and $\hat{\bm{X}}$ to couple supervisory reconstruction to the underlying continuous dynamics:
\begin{equation}
p_{\psi}\bigl( y^{(j)}_0 \mid y_k, k, \hat{\bm{S}}, \hat{\bm{X}} \bigr) = h_{\psi}(y_k, k, \hat{\bm{S}}, \hat{\bm{X}}).
\label{eq:discrete_denoising}
\end{equation}
This conditioning choice is motivated by the fact that many discrete ICS states are not standalone; they are functions of regimes, thresholds, and procedural phases that manifest in continuous channels \cite{nist2023sp80082}. Training uses a categorical denoising objective:
\begin{equation}
\mathcal{L}_{\text{disc}}(\psi) = \mathbb{E}_{k} \left[ \frac{1}{|\mathcal{M}|} \sum_{(j,t) \in \mathcal{M}} \mathrm{CE}\bigl( h_{\psi}(y_k, k, \hat{\bm{S}}, \hat{\bm{X}})_{j,t},\; y^{(j)}_{0,t} \bigr) \right],
\label{eq:discrete_loss}
\end{equation}
where $\mathrm{CE}(\cdot,\cdot)$ is cross-entropy. At sampling time, we initialize all discrete tokens as $\texttt{[MASK]}$ and iteratively unmask them using the learned conditionals, ensuring that every output token lies in its legal vocabulary by construction. This discrete branch is a key differentiator of our pipeline: unlike typical continuous-only diffusion augmentation in ICS, we integrate masked diffusion as a first-class mechanism for supervisory-variable legality within the same end-to-end synthesis workflow \cite{shi2024simplified,yuan2025ctu}.
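The masked-diffusion branch admits a short sketch, shown below under simplifying assumptions: a single shared vocabulary size $V$, one extra index reserved for $\texttt{[MASK]}$, a uniform unmasking fraction per step, and a denoiser that outputs logits over the $V$ legal classes only (so every sampled token is in-vocabulary by construction):
\begin{verbatim}
import torch
import torch.nn.functional as F

V = 8          # shared legal vocabulary size (illustrative); tokens 0..V-1
MASK = V       # extra absorbing index, never emitted by the denoiser

def corrupt(y0, m_k):
    """Absorbing corruption: each token independently -> [MASK] w.p. m_k."""
    drop = torch.rand(y0.shape, device=y0.device) < m_k
    return torch.where(drop, torch.full_like(y0, MASK), y0), drop

def discrete_loss(h_psi, y0, m_sched, s_hat, x_hat):
    """Cross-entropy on masked positions only; h_psi returns logits of
    shape (B, L, d_d, V)."""
    k = torch.randint(0, len(m_sched), (1,)).item()
    y_k, masked = corrupt(y0, m_sched[k])
    logits = h_psi(y_k, k, s_hat, x_hat)
    return F.cross_entropy(logits[masked], y0[masked])

@torch.no_grad()
def sample_discrete(h_psi, shape, steps, s_hat, x_hat):
    """Start fully masked; progressively unmask. The final step (k = 0)
    unmasks everything, and every emitted token is a legal class."""
    y = torch.full(shape, MASK, dtype=torch.long)
    for k in reversed(range(steps)):
        probs = h_psi(y, k, s_hat, x_hat).softmax(dim=-1)
        draw = torch.distributions.Categorical(probs).sample()
        still = y == MASK
        pick = still & (torch.rand(shape) < 1.0 / (k + 1))
        y[pick] = draw[pick]
    return y
\end{verbatim}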

\subsection{Type-aware decomposition as a factorization and routing layer}
\label{sec:method-types}
Even with a trend-conditioned residual DDPM and a discrete masked-diffusion branch, a single uniform modeling treatment can remain suboptimal because ICS variables are generated by qualitatively different mechanisms. For example, program-driven setpoints exhibit step-and-dwell dynamics; controller outputs follow control laws conditioned on process feedback; actuator positions may show saturation and dwell; and some ``derived tags'' are deterministic functions of other channels. Treating all channels as if they were exchangeable stochastic processes can misallocate model capacity and induce systematic error concentration on a small subset of mechanistically distinct variables \cite{nist2023sp80082}.

We therefore introduce a type-aware decomposition that formalizes this heterogeneity as a routing and constraint layer. Let $\tau(i)\in\{1,\dots,6\}$ assign each variable $i$ to a type class. The type assignment can be initialized from domain semantics (tag metadata, value domains, and engineering meaning), and subsequently refined via an error-attribution workflow described in the Benchmark section. Importantly, this refinement does not change the core diffusion backbone; it changes which mechanism is responsible for which variable, thereby aligning inductive bias with variable-generating mechanism while preserving overall coherence.

We use the following taxonomy:
\begin{enumerate}
\item Type 1 (program-driven / setpoint-like): externally commanded, step-and-dwell variables. These variables can be treated as exogenous drivers (conditioning signals) or routed to specialized change-point / dwell-time models, rather than being forced into a smooth denoiser that may over-regularize step structure.

\item Type 2 (controller outputs): continuous variables tightly coupled to feedback loops; these benefit from conditional modeling where the conditioning includes relevant process variables and commanded setpoints.

\item Type 3 (actuator states/positions): often exhibit saturation, dwell, and rate limits; these may require stateful dynamics beyond generic residual diffusion, motivating either specialized conditional modules or additional inductive constraints.

\item Type 4 (process variables): inertia-dominated continuous dynamics; these are the primary beneficiaries of the Transformer trend + residual DDPM pipeline.

\item Type 5 (derived/deterministic variables): algebraic or rule-based functions of other variables; we enforce deterministic reconstruction $\hat{x}^{(i)} = g_i(\hat{\bm{X}},\hat{\bm{Y}})$ rather than learning a stochastic generator, improving logical consistency and sample efficiency.

\item Type 6 (auxiliary/low-impact variables): weakly coupled or sparse signals; we allow simplified modeling (e.g., calibrated marginals or lightweight temporal models) to avoid allocating diffusion capacity where it is not warranted.
\end{enumerate}

Type-aware decomposition improves synthesis quality through three mechanisms. First, it improves capacity allocation by preventing a small set of mechanistically atypical variables from dominating gradients and distorting the learned distribution for the majority class (typically Type 4). Second, it enables constraint enforcement by deterministically reconstructing Type 5 variables, preventing logically inconsistent samples that purely learned generators can produce. Third, it improves mechanism alignment by attaching inductive biases consistent with step/dwell or saturation behaviors where generic denoisers may implicitly favor smoothness.
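A minimal sketch of the routing layer follows; the tag names and the derived-tag rule are hypothetical placeholders for illustration, not actual HAI tags or rules:
\begin{verbatim}
from enum import IntEnum
from typing import Callable, Dict

class VarType(IntEnum):
    SETPOINT = 1      # program-driven, step-and-dwell
    CONTROLLER = 2    # feedback-coupled outputs
    ACTUATOR = 3      # saturation / dwell / rate limits
    PROCESS = 4       # inertia-dominated dynamics
    DERIVED = 5       # deterministic functions of other tags
    AUXILIARY = 6     # weakly coupled, low impact

# tau: tag -> type class (seeded from metadata, refined by error attribution)
tau: Dict[str, VarType] = {"tank_level": VarType.PROCESS,
                           "level_sp": VarType.SETPOINT,
                           "level_err": VarType.DERIVED}  # hypothetical tags

# Type-5 reconstruction rules g_i (hypothetical rule for illustration)
derived: Dict[str, Callable] = {
    "level_err": lambda X, Y: X["level_sp"] - X["tank_level"]}

def route(name: str) -> str:
    """Map a tag to the mechanism responsible for generating it."""
    t = tau[name]
    if t == VarType.DERIVED:
        return "deterministic"             # excluded from diffusion losses
    if t == VarType.SETPOINT:
        return "exogenous_driver"          # conditioning / change-point model
    if t in (VarType.PROCESS, VarType.CONTROLLER, VarType.ACTUATOR):
        return "trend_plus_residual_ddpm"  # Secs. 3.1-3.2 pipeline
    return "lightweight_marginal"          # Type 6
\end{verbatim}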

From a novelty standpoint, this layer is not merely an engineering ``patch''; it is an explicit methodological statement that ICS synthesis benefits from typed factorization--a principle that has analogues in mixed-type generative modeling more broadly, but that remains underexplored in diffusion-based ICS telemetry synthesis \cite{shi2025tabdiff,yuan2025ctu,nist2023sp80082}.

\subsection{Joint optimization and end-to-end sampling}
\label{sec:method-joint}
We train the model in a staged manner consistent with the above factorization, which improves optimization stability and encourages each component to specialize in its intended role. Specifically: (i) we train the trend Transformer $f_{\phi}$ to obtain $\hat{\bm{S}}$; (ii) we compute residual targets $\hat{\bm{R}} = \bm{X} - \hat{\bm{S}}$ for the continuous variables routed to residual diffusion; (iii) we train the residual DDPM $p_{\theta}(\bm{R}\mid \hat{\bm{S}})$ and masked diffusion model $p_{\psi}(\bm{Y}\mid \text{masked}(\bm{Y}), \hat{\bm{S}}, \hat{\bm{X}})$; and (iv) we apply type-aware routing and deterministic reconstruction during sampling. This staged strategy is aligned with the design goal of separating temporal scaffolding from distributional refinement, and it mirrors the broader intuition in time-series diffusion that decoupling coarse structure and stochastic detail can mitigate ``structure vs. realism'' conflicts \cite{kollovieh2023tsdiff,sikder2023transfusion}.

A simple combined objective is $\mathcal{L} = \lambda\mathcal{L}_{\text{cont}} + (1-\lambda)\mathcal{L}_{\text{disc}}$ with $\lambda\in[0,1]$ controlling the balance between continuous and discrete learning. Type-aware routing determines which channels contribute to which loss and which are excluded in favor of deterministic reconstruction. In practice, this routing acts as a principled guardrail against negative transfer across variable mechanisms: channels that are best handled deterministically (Type 5) or by specialized drivers (Type 1/3, depending on configuration) are prevented from forcing the diffusion models into statistically incoherent compromises.
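In code the combined objective is a one-liner; the routing described above determines what enters each term, as the following sketch notes:
\begin{verbatim}
def combined_loss(l_cont, l_disc, lam=0.5):
    """L = lam * L_cont + (1 - lam) * L_disc.

    Both terms are computed only over channels that type-aware routing
    sends to the residual DDPM / masked diffusion; Type-5 channels enter
    neither term and are reconstructed deterministically at sampling."""
    assert 0.0 <= lam <= 1.0
    return lam * l_cont + (1.0 - lam) * l_disc
\end{verbatim}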

At inference time, generation follows the same structured order: (i) trend $\hat{\bm{S}}$ via the Transformer, (ii) residual $\hat{\bm{R}}$ via DDPM, (iii) discrete $\hat{\bm{Y}}$ via masked diffusion, and (iv) type-aware assembly with deterministic reconstruction for routed variables. This pipeline produces $(\hat{\bm{X}},\hat{\bm{Y}})$ that are temporally coherent by construction (through $\hat{\bm{S}}$), distributionally expressive (through $\hat{\bm{R}}$ denoising), and discretely valid (through masked diffusion), while explicitly accounting for heterogeneous variable-generating mechanisms through type-aware routing. In combination, these choices constitute our central methodological contribution: a unified Transformer + mixed diffusion generator for ICS telemetry, augmented by typed factorization to align model capacity with domain mechanism \cite{ho2020denoising,shi2024simplified,yuan2025ctu,nist2023sp80082}.
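Putting the stages together, end-to-end sampling can be sketched as follows, reusing the \texttt{rollout} helper above; \texttt{sample\_residual} and \texttt{sample\_discrete} are assumed wrappers around the two reverse processes, and the derived-channel rule is hypothetical:
\begin{verbatim}
import torch

# Type-5 rules: column index -> function of the full continuous tensor
# (hypothetical rule: channel 2 is the difference of channels 1 and 0)
DERIVED = {2: lambda x: x[..., 1] - x[..., 0]}

@torch.no_grad()
def generate(trend, sample_residual, sample_discrete, x_seed, L):
    """End-to-end sampling in the structured order (i)-(iv)."""
    s_hat = rollout(trend, x_seed, L)      # (i)  autoregressive trend scaffold
    r_hat = sample_residual(s_hat)         # (ii) residual via reverse diffusion
    x_hat = s_hat + r_hat                  #      continuous output X = S + R
    y_hat = sample_discrete(s_hat, x_hat)  # (iii) iterative unmasking
    for col, g in DERIVED.items():         # (iv) deterministic Type-5 channels
        x_hat[..., col] = g(x_hat)
    return x_hat, y_hat
\end{verbatim}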

% 4. Benchmark
\section{Benchmark}
\label{sec:benchmark}
We evaluate the proposed pipeline on feature sequences derived from the HAI Security Dataset, using fixed-length windows ($L=96$) that preserve the mixed-type structure of ICS telemetry. The goal of this benchmark is not only to report ``overall similarity'' but to justify why the proposed factorization is a better fit for protocol feature synthesis: continuous channels must match physical marginals \cite{coletta2023constrained}, discrete channels must remain semantically legal, and both must retain short-horizon dynamics that underpin state transitions and interlocks \cite{yang2001interlock}.

This emphasis reflects evaluation practice in time-series generation, where strong results are typically supported by multiple complementary views (marginal fidelity, dependency/temporal structure, and downstream plausibility), rather than a single aggregate score \cite{stenger2024survey}. In the ICS setting, this multi-view requirement is sharper: a generator that matches continuous marginals while emitting out-of-vocabulary supervisory tokens is unusable for protocol reconstruction, and a generator that matches marginals but breaks lag structure can produce temporally implausible command/response sequences.

Recent ICS time-series generators often emphasize aggregate similarity scores and utility-driven evaluations (e.g., anomaly-detection performance) to demonstrate realism, which is valuable but can under-specify mixed-type protocol constraints. Our benchmark complements these practices by making mixed-type legality and per-feature distributional alignment explicit: discrete outputs are evaluated as categorical distributions (JSD) and are constrained to remain within the legal vocabulary by construction, while continuous channels are evaluated with nonparametric distribution tests (KS) \cite{yoon2019timegan}. This combination provides a direct, protocol-relevant justification for the hybrid design, rather than relying on a single composite score that may mask discrete failures.

For continuous channels, we measure distributional alignment using the Kolmogorov--Smirnov (KS) statistic computed per feature between the empirical distributions of real and synthetic samples, and then averaged across features. For discrete channels, we quantify marginal fidelity with Jensen--Shannon divergence (JSD) \cite{lin1991divergence,yoon2019timegan} between categorical distributions per feature, averaged across discrete variables. To assess temporal realism, we compare lag-1 autocorrelation at the feature level and report the mean absolute difference between real and synthetic lag-1 autocorrelation, averaged across features. In addition, to avoid degenerate comparisons driven by near-constant tags, features whose empirical standard deviation falls below a small threshold are excluded from continuous KS aggregation; such channels carry limited distributional information and can distort summary statistics.
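These metrics admit a direct implementation; the sketch below uses SciPy's two-sample KS test and Jensen--Shannon distance (squared to obtain the divergence), together with the near-constant-channel exclusion described above. The threshold value is an illustrative assumption:
\begin{verbatim}
import numpy as np
from scipy.stats import ks_2samp
from scipy.spatial.distance import jensenshannon

def ks_mean(real, synth, std_floor=1e-6):
    """Per-feature KS on continuous channels, skipping near-constant tags."""
    keep = real.std(axis=0) > std_floor
    return float(np.mean([ks_2samp(real[:, i], synth[:, i]).statistic
                          for i in np.where(keep)[0]]))

def jsd_mean(real_tok, synth_tok, vocab_sizes):
    """Per-feature JSD between categorical marginals (natural-log base)."""
    out = []
    for j, V in enumerate(vocab_sizes):
        p = np.bincount(real_tok[:, j], minlength=V) / len(real_tok)
        q = np.bincount(synth_tok[:, j], minlength=V) / len(synth_tok)
        out.append(jensenshannon(p, q) ** 2)  # squared distance = divergence
    return float(np.mean(out))

def lag1_gap(real, synth):
    """Mean absolute difference of per-feature lag-1 autocorrelation."""
    def ac1(a):
        a = a - a.mean(axis=0)
        denom = (a * a).sum(axis=0)
        return (a[1:] * a[:-1]).sum(axis=0) / np.where(denom == 0, 1, denom)
    return float(np.abs(ac1(real) - ac1(synth)).mean())
\end{verbatim}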

\subsection{Quantitative results}
\label{sec:benchmark-quant}
Across all runs, the mean continuous KS is 0.3311 (std 0.0079) and the mean discrete JSD is 0.0284 (std 0.0073), indicating that the generator preserves both continuous marginals and discrete semantic distributions at the feature level. Temporal consistency is similarly stable across runs, with a mean lag-1 autocorrelation difference of 0.2684 (std 0.0027), suggesting that the synthesized windows retain short-horizon dynamical structure \cite{ni2021sigwasserstein} instead of collapsing to marginal matching alone. The best-performing instance (by mean KS) attains 0.3224, and the small inter-seed variance shows that the reported fidelity is reproducible rather than driven by a single favorable initialization.
\begin{figure*}[t]
\centering
\includegraphics[width=\textwidth]{fig-overall-benchmark-v1.png}
% \caption{Description of the figure.}
\label{fig:benchmark}
\end{figure*}

\begin{table}[htbp]
\centering
\caption{Summary of benchmark metrics. Lower values indicate better performance.}
\label{tab:metrics}
\begin{tabular}{@{}l l c@{}}
\toprule
\textbf{Metric} & \textbf{Aggregation} & \textbf{Mean $\pm$ Std} \\
\midrule
KS (continuous) & mean over continuous features & 0.3311 $\pm$ 0.0079 \\
JSD (discrete) & mean over discrete features & 0.0284 $\pm$ 0.0073 \\
Abs $\Delta$ lag-1 autocorr & mean over features & 0.2684 $\pm$ 0.0027 \\
\bottomrule
\end{tabular}
\end{table}

To make the benchmark actionable (and comparable to prior work), we report type-appropriate, interpretable statistics instead of collapsing everything into a single similarity score. This matters in mixed-type ICS telemetry: continuous fidelity can be high while discrete semantics fail, and vice versa. By separating continuous (KS), discrete (JSD), and temporal (lag-1) views, the evaluation directly matches the design goals of the hybrid generator: distributional refinement for continuous residuals, vocabulary-valid reconstruction for discrete supervision, and trend-induced short-horizon coherence.

In addition, the seed-averaged reporting mirrors evaluation conventions in recent diffusion-based time-series generation studies, where robustness across runs is increasingly treated as a first-class signal rather than an afterthought. In this sense, the small inter-seed variance is itself evidence that the factorized training and typed routing reduce instability and localized error concentration, which is frequently observed when heterogeneous channels compete for the same modeling capacity.

% 5. Future Work
\section{Future Work}
\label{sec:future}
Future work will expand from ``generating legal ICS feature sequences'' to ``data construction and adversarial evaluation for security tasks''. The core contribution of this paper is generating feature sequences that are temporally consistent, distributionally credible, and discretely legal under mixed types and multi-scale dynamics. In practical intrusion- and anomaly-detection research, however, the more critical bottleneck is often the lack of illegal or anomalous data with clear attack semantics and sufficient coverage. A direct and important extension is therefore to use the legal sequences generated here as a controllable, reproducible baseline operation flow and then, while preserving sequence-level legality and engineering constraints, inject or mix illegal behaviors according to specified attack patterns, thereby systematically constructing datasets for training and evaluating the recognition of illegal data packets.

Specifically, attack injection can be upgraded from ``simple perturbation'' to ``semantically consistent patterned rewriting''. On continuous channels, this includes bias injection, covert manipulation near thresholds, instantaneous mutations, and intermittent bursts, which mimic the temporal characteristics attackers pursue for concealment without violating the basic boundary conditions of process dynamics. On discrete channels, it includes illegal state transitions, alarm suppression or delayed triggering, and mode camouflage, reflecting trajectory shapes that are ``unreachable but forcibly created'' under real control logic. Furthermore, the injection process itself can be coordinated with the type routing and constraint layer of this paper: for deterministically derived variables, illegal behaviors should propagate through modifications of upstream variables to maintain consistency; for supervisory variables constrained by finite-state machines, interpretable illegal transitions should be generated via a ``minimum violation path'' or controlled violation intensity, with violation points and violation types explicitly annotated so that downstream detection tasks can learn finer-grained discrimination criteria.

In terms of method form, this direction also naturally supports stronger controllability and measurability. Attack patterns can be treated as conditional variables that jointly orchestrate legitimate generation and illegal injection, producing control samples of ``different attack strategies under the same legitimate framework'' and turning dataset construction into a repeatable scenario-generation process. Meanwhile, by controlling the injection location, duration, amplitude, and coupling range, one can systematically sweep the performance-degradation curves of detectors under different threat intensities and operating-condition stages, yielding a more stable benchmark than a single capture or a single script. Ultimately, this approach turns the legitimate-data generation capability presented here into infrastructure for security research: first providing a shareable, reproducible distribution of legitimate operations, then injecting semantically explicit illegal patterns in a controlled manner, and producing datasets with sufficient coverage and consistent annotation for training and evaluating models that identify illegal packets and abnormal sequences, improving reproducibility and engineering credibility in this direction.

% 6. Conclusion
\section{Conclusion}
\label{sec:conclusion}
This paper addresses the data scarcity and shareability barriers that limit machine-learning research for industrial control system (ICS) security by proposing a practical synthetic telemetry generation framework at the protocol feature level. We introduced Mask-DDPM, a hybrid generator designed explicitly for the mixed-type and multi-scale nature of ICS data, where continuous process dynamics must remain temporally coherent while discrete supervisory variables must remain categorically legal by construction.

Our main contributions are: (i) a causal Transformer trend module that provides a stable long-horizon temporal scaffold for continuous channels; (ii) a trend-conditioned residual DDPM that focuses modeling capacity on local stochastic detail and marginal fidelity without destabilizing global structure; (iii) a masked (absorbing) diffusion branch for discrete variables that guarantees in-vocabulary outputs and supports semantics-aware conditioning on continuous context; and (iv) a type-aware decomposition/routing layer that aligns model mechanisms with heterogeneous ICS variable origins (e.g., process inertia, step-and-dwell setpoints, deterministic derived tags), enabling deterministic enforcement where appropriate and improving capacity allocation.

We evaluated the approach on windows derived from the HAI Security Dataset and reported mixed-type, protocol-relevant metrics rather than a single aggregate score. Across seeds, the model achieves stable fidelity with mean KS $= 0.3311 \pm 0.0079$ on continuous features, mean JSD $= 0.0284 \pm 0.0073$ on discrete features, and mean absolute lag-1 autocorrelation difference $0.2684 \pm 0.0027$, indicating that Mask-DDPM preserves both marginal distributions and short-horizon dynamics while maintaining discrete legality.

Overall, Mask-DDPM provides a reproducible foundation for generating shareable, semantically valid ICS feature sequences suitable for data augmentation, benchmarking, and downstream packet/trace reconstruction workflows. Building on this capability, a natural next step is to move from purely legal synthesis toward controllable scenario construction, including structured attack/violation injection under engineering constraints to support adversarial evaluation and more comprehensive security benchmarks.

% References
\bibliographystyle{IEEEtran}
\bibliography{references}

\end{document}
370
arxiv-style/main.tex
Normal file
@@ -0,0 +1,370 @@
\documentclass{article}

\usepackage{arxiv}

\usepackage[utf8]{inputenc} % allow utf-8 input
\usepackage[T1]{fontenc}    % use 8-bit T1 fonts
\usepackage{hyperref}       % hyperlinks
\usepackage{url}            % simple URL typesetting
\usepackage{booktabs}       % professional-quality tables
\usepackage{amsfonts}       % blackboard math symbols
\usepackage{nicefrac}       % compact symbols for 1/2, etc.
\usepackage{microtype}      % microtypography
\usepackage{amsmath}        % cleveref must be loaded after amsmath!
\usepackage{cleveref}       % smart cross-referencing
\usepackage{lipsum}         % Can be removed after putting your text content
\usepackage{graphicx}
\usepackage[numbers]{natbib}
\usepackage{doi}

% Packages for equations
\usepackage{amssymb}
\usepackage{bm}
\usepackage{array}   % For column formatting
\usepackage{caption} % Better caption spacing
\usepackage{float}   % Precise figure placement

\title{Mask-DDPM: Transformer-Conditioned Mixed-Type Diffusion for Semantically Valid ICS Telemetry Synthesis}

\date{}

\newif\ifuniqueAffiliation
\uniqueAffiliationtrue

\ifuniqueAffiliation
\author{
Zhenglan Chen \\
Aberdeen Institute of Data Science and Artificial Intelligence\\
South China Normal University\\
Guangzhou, Guangdong 510631, China \\
\texttt{20223803054@m.scnu.edu.cn} \\
\And
Mingzhe Yang \\
Aberdeen Institute of Data Science and Artificial Intelligence\\
South China Normal University\\
Guangzhou, Guangdong 510631, China \\
\texttt{20223803063@m.scnu.edu.cn} \\
\And
Hongyu Yan \\
Aberdeen Institute of Data Science and Artificial Intelligence\\
South China Normal University\\
Guangzhou, Guangdong 510631, China \\
\texttt{20223803065@m.scnu.edu.cn}
\And
Huan Yang \\
School of Artificial Intelligence\\
South China Normal University\\
Guangzhou, Guangdong 510631, China \\
\texttt{huan.yang@m.scnu.edu.cn} \\
}
\fi

\renewcommand{\shorttitle}{Mask-DDPM}

\hypersetup{
pdftitle={Mask-DDPM: Transformer-Conditioned Mixed-Type Diffusion for Semantically Valid ICS Telemetry Synthesis},
pdfsubject={cs.LG, cs.CR},
pdfauthor={Zhenglan Chen, Mingzhe Yang, Hongyu Yan, Huan Yang},
pdfkeywords={Machine Learning, Cyber Defense, ICS},
}

\begin{document}
\maketitle

\begin{abstract}
Industrial control systems (ICS) security research is increasingly constrained by the scarcity and limited shareability of realistic communication traces and process measurements, especially for attack scenarios. To mitigate this bottleneck, we study synthetic generation at the protocol-feature and process-signal level, where samples must simultaneously preserve temporal coherence, match continuous marginal distributions, and keep discrete supervisory variables strictly within valid vocabularies. We propose Mask-DDPM, a hybrid framework tailored to mixed-type, multi-scale ICS sequences. Mask-DDPM factorizes generation into (i) a causal Transformer trend module that rolls out a stable long-range temporal scaffold for continuous channels, (ii) a trend-conditioned residual DDPM that refines local stochastic structure and heavy-tailed fluctuations without degrading global dynamics, (iii) a masked (absorbing) diffusion branch for discrete variables that guarantees valid symbol generation by construction, and (iv) a type-aware decomposition/routing layer that aligns modeling mechanisms with heterogeneous ICS variable origins and enforces deterministic reconstruction where appropriate. Evaluated on fixed-length windows ($L=96$) derived from the HAI Security Dataset, Mask-DDPM achieves stable fidelity across seeds with mean KS = 0.3311 $\pm$ 0.0079 (continuous), mean JSD = 0.0284 $\pm$ 0.0073 (discrete), and mean absolute lag-1 autocorrelation difference = 0.2684 $\pm$ 0.0027, indicating faithful marginals, preserved short-horizon dynamics, and valid discrete semantics. The resulting generator provides a reproducible basis for data augmentation, benchmarking, and downstream ICS protocol reconstruction workflows.
\end{abstract}

\keywords{Machine Learning \and Cyber Defense \and ICS}

% 1. Introduction
\section{Introduction}
\label{sec:intro}
Industrial control systems (ICS) form the backbone of modern critical infrastructure, including power grids, water treatment, manufacturing, and transportation. These systems monitor, regulate, and automate physical processes through sensors, actuators, programmable logic controllers (PLCs), and monitoring software. Unlike conventional IT systems, ICS operate in real time, closely coupled with physical processes and safety-critical constraints, using heterogeneous and legacy communication protocols such as Modbus/TCP and DNP3 that were not originally designed with robust security in mind. This architectural complexity and operational criticality make ICS high-impact targets for cyber attacks, where disruptions can result in physical damage, environmental harm, and even loss of life. Recent reviews of ICS security highlight the expanding attack surface due to increased connectivity, vulnerabilities in legacy systems, and the inadequacy of traditional security controls in capturing the nuances of ICS networks and protocols \citep{10.1007/s10844-022-00753-1, Nankya2023-gp}.

While machine learning (ML) techniques have shown promise for anomaly detection and automated cybersecurity within ICS, they rely heavily on labeled datasets that capture both benign operations and diverse attack patterns. In practice, real ICS traffic data, especially attack-triggered captures, are scarce due to confidentiality, safety, and legal restrictions, and available public ICS datasets are few, limited in scope, or fail to reflect current threat modalities. For instance, the HAI Security Dataset provides operational telemetry and anomaly flags from a realistic control system setup for research purposes, but must be carefully preprocessed to derive protocol-relevant features for ML tasks \citep{shin}. Data scarcity directly undermines model generalization, evaluation reproducibility, and the robustness of intrusion detection research, especially when training or testing ML models on realistic ICS behavior remains confined to small or outdated collections of examples \citep{info16100910}.

Synthetic data generation offers a practical pathway to mitigate these challenges. By programmatically generating feature-level sequences that mimic the statistical and temporal structure of real ICS telemetry, researchers can augment scarce training sets, standardize benchmarking, and preserve operational confidentiality. Relative to raw packet captures, feature-level synthesis abstracts critical protocol semantics and statistical patterns without exposing sensitive fields, making it more compatible with safety constraints and compliance requirements in ICS environments. Modern generative modeling, including diffusion models, has advanced significantly in producing high-fidelity synthetic data across domains. Diffusion approaches, such as denoising diffusion probabilistic models, learn to transform noise into coherent structured samples and have been successfully applied to tabular or time series data synthesis with better stability and data coverage compared to adversarial methods \citep{pmlr-v202-kotelnikov23a, rasul2021autoregressivedenoisingdiffusionmodels}.

Despite these advances, most existing work either focuses on packet-level generation \citep{jiang2023netdiffusionnetworkdataaugmentation} or is limited to generic tabular data \citep{pmlr-v202-kotelnikov23a}, rather than domain-specific control sequence synthesis tailored for ICS protocols where temporal coherence, multi-channel dependencies, and discrete protocol legality are jointly required. This gap motivates our focus on protocol feature-level generation for ICS, which involves synthesizing sequences of protocol-relevant fields conditioned on their temporal and cross-channel structure. In this work, we formulate a hybrid modeling pipeline that decouples long-horizon trends and local statistical detail while preserving discrete semantics of protocol tokens. By combining causal Transformers with diffusion-based refiners, and enforcing deterministic validity constraints during sampling, our framework generates semantically coherent, temporally consistent, and distributionally faithful ICS feature sequences. We evaluate features derived from the HAI Security Dataset and demonstrate that our approach produces high-quality synthetic sequences suitable for downstream augmentation, benchmarking, and integration into packet-reconstruction workflows that respect realistic ICS constraints.

% 2. Related Work
\section{Related Work}
\label{sec:related}
Early generation of network data oriented towards ``realism'' mostly remained at the packet/flow header level, either through replay or statistical synthesis based on single-point observations. Swing, in a closed-loop, network-responsive manner, extracts user/application/network distributions from single-point observations to reproduce burstiness and correlation across multiple time scales \citep{10.1145/1151659.1159928,10.1145/1159913.1159928}. Subsequently, a series of works advanced header synthesis to learning-based generation: the WGAN-based method added explicit verification of protocol field consistency to NetFlow/IPFIX \citep{Ring_2019}, NetShare reconstructed header modeling as flow-level time series and improved fidelity and scalability through domain encoding and parallel fine-tuning \citep{10.1145/3544216.3544251}, and DoppelGANger preserved the long-range structure and downstream sorting consistency of networked time series by decoupling attributes from sequences \citep{Lin_2020}. However, in industrial control system (ICS) scenarios, the original PCAP is usually not shareable, and public testbeds (such as SWaT, WADI) mostly provide process/monitoring telemetry and protocol interactions for security assessment, but public datasets emphasize operational variables rather than packet-level traces \citep{7469060,10.1145/3055366.3055375}. This makes ``synthesis at the feature/telemetry level, aware of protocol and semantics'' more feasible and necessary in practice: we are more concerned with reproducing high-level distributions and multi-scale temporal patterns according to operational semantics and physical constraints, without relying on the original packets. From this perspective, the generation paradigm naturally shifts from ``packet syntax reproduction'' to ``modeling of high-level spatio-temporal distributions and uncertainties'', requiring stable training, strong distribution fitting, and interpretable uncertainty characterization.

Diffusion models are a good fit along this path: DDPM achieves high-quality sampling and stable optimization through an efficient $\epsilon$-parameterization and weighted variational objectives \citep{NEURIPS2020_4c5bcfec}, and the SDE perspective unifies score-based and diffusion models, providing likelihood evaluation and predictor-corrector sampling strategies based on probability-flow ODEs \citep{song2021scorebasedgenerativemodelingstochastic}. For time series, TimeGrad replaces the constrained output distribution with conditional denoising, capturing high-dimensional correlations at each step \citep{rasul2021autoregressivedenoisingdiffusionmodels}; CSDI explicitly performs conditional diffusion and uses two-dimensional attention to simultaneously leverage temporal and cross-feature dependencies, suitable for conditioning and filling in missing values \citep{tashiro2021csdiconditionalscorebaseddiffusion}; in a more general spatio-temporal setting, DiffSTG generalizes diffusion to spatio-temporal graphs, combining TCN/GCN with a denoising U-Net to improve CRPS and inference efficiency in a non-autoregressive manner \citep{wen2024diffstgprobabilisticspatiotemporalgraph}, and PriSTI further enhances conditional features and geographical relationships, maintaining robustness under high missing rates and sensor failures \citep{liu2023pristiconditionaldiffusionframework}; in long sequences and continuous domains, DiffWave verifies that diffusion can match the quality of strong vocoders under non-autoregressive fast synthesis \citep{kong2021diffwaveversatilediffusionmodel}; studies on cellular communication traffic show that diffusion can recover spatio-temporal patterns and provide uncertainty characterization at the urban scale \citep{11087622}. These results overall point to one conclusion: when the research focus is on ``telemetry/high-level features'' rather than raw messages, diffusion models provide stable, fine-grained distribution fitting and uncertainty quantification, which is exactly in line with the requirements of ICS telemetry synthesis. Meanwhile, entrusting all structure to a ``monolithic diffusion'' is not advisable: long-range temporal skeletons and fine-grained marginal distributions often exhibit optimization tensions, requiring explicit decoupling in modeling.

Looking further into the mechanistic complexity of ICS: its channel types are inherently mixed, containing both continuous process trajectories and discrete supervisory/status variables, and the discrete channels must remain ``legal'' under operational constraints. The progress in time-series diffusion discussed above has mainly occurred in continuous spaces, but discrete diffusion has also developed systematic methods: D3PM improves sampling quality and likelihood through absorbing/masking and structured transitions in discrete state spaces \citep{austin2023structureddenoisingdiffusionmodels}, subsequent masked diffusion provides stable reconstruction on categorical data in a more simplified form \citep{Lin_2020}, multinomial diffusion directly defines diffusion on a finite vocabulary through mechanisms such as argmax flows \citep{hoogeboom2021argmaxflowsmultinomialdiffusion}, and Diffusion-LM demonstrates an effective path for controllable text generation by imposing gradient constraints in continuous latent spaces \citep{li2022diffusionlmimprovescontrollabletext}. From the perspective of protocols and finite-state machines, coverage-guided fuzz testing emphasizes the criticality of ``sequence legality and state coverage'' \citep{meng2025aflnetyearslatercoverageguided,godefroid2017learnfuzzmachinelearninginput,she2019neuzzefficientfuzzingneural}, echoing the concept of ``legality by construction'' in discrete diffusion: preferentially adopting absorbing/masking diffusion on discrete channels, supplemented by type-aware conditioning and sampling constraints, to avoid the semantic invalidity and marginal distortion caused by post hoc thresholding.

From the perspective of high-level synthesis, temporal structure is equally indispensable: ICS control often involves delay effects, phased operating conditions, and cross-channel coupling, requiring models to characterize low-frequency, long-range dependencies while overlaying multi-faceted fine-grained fluctuations on them. The Transformer line of work provides ample evidence on long-sequence time-series tasks: Transformer-XL breaks through the fixed-length context limitation through a reusable memory mechanism and significantly enhances long-range dependency modeling \citep{dai2019transformerxlattentivelanguagemodels}; Informer uses ProbSparse attention and efficient decoding to balance span and efficiency in long-sequence prediction \citep{zhou2021informerefficienttransformerlong}; Autoformer robustly models long-term seasonality and trends through autocorrelation and decomposition mechanisms \citep{wu2022autoformerdecompositiontransformersautocorrelation}; FEDformer further improves long-period prediction through frequency-domain enhancement and decomposition \citep{zhou2022fedformerfrequencyenhanceddecomposed}; PatchTST enhances the stability and generalization of long-sequence multivariate prediction through local patch-based representation and channel-independent modeling \citep{nie2023patchtst}. Combined with our positioning of diffusion above, this chain of evidence points to a natural division of labor: use attention-based sequence models to first extract stable low-frequency trends/conditions (long-range skeletons), then let diffusion focus on margins and details in the residual space; meanwhile, apply discrete masking/absorbing diffusion to supervisory/mode variables to ensure vocabulary legality by construction. This design not only inherits the advantages of time-series diffusion in distribution fitting and uncertainty characterization \citep{rasul2021autoregressivedenoisingdiffusionmodels,tashiro2021csdiconditionalscorebaseddiffusion,wen2024diffstgprobabilisticspatiotemporalgraph,liu2023pristiconditionaldiffusionframework,kong2021diffwaveversatilediffusionmodel,11087622}, but also stabilizes the macroscopic temporal support through the Transformer's long-range attention, yielding an operational, integrated generation pipeline under the mixed types and multi-scale dynamics of ICS.

% 3. Methodology
\section{Methodology}
\label{sec:method}
Industrial control system (ICS) telemetry is intrinsically mixed-type and mechanistically heterogeneous: continuous process trajectories (e.g., sensor and actuator signals) coexist with discrete supervisory states (e.g., modes, alarms, interlocks), and the underlying generating mechanisms range from physical inertia to program-driven step logic. This heterogeneity is not cosmetic: it directly affects what realistic synthesis means, because a generator must jointly satisfy (i) temporal coherence, (ii) distributional fidelity, and (iii) discrete semantic validity (i.e., every discrete output must belong to its legal vocabulary by construction). These properties are emphasized broadly in operational-technology security guidance and ICS engineering practice, where state logic and physical dynamics are tightly coupled \citep{nist2023sp80082}.

We model each training instance as a fixed-length window of length $L$, comprising continuous channels $\bm{X} \in \mathbb{R}^{L \times d_c}$ and discrete channels $\bm{Y} = \{y^{(j)}_{1:L}\}_{j=1}^{d_d}$, where each discrete variable satisfies $y^{(j)}_t \in \mathcal{V}_j$ for a finite vocabulary $\mathcal{V}_j$. Our objective is to learn a generator that produces synthetic $(\hat{\bm{X}}, \hat{\bm{Y}})$ that are simultaneously coherent and distributionally faithful, while also ensuring $\hat{y}^{(j)}_t\in\mathcal{V}_j$ for all $j$, $t$ by construction (rather than via post-hoc rounding or thresholding).

A key empirical and methodological tension in ICS synthesis is that temporal realism and marginal/distributional realism can compete when optimized monolithically: sequence models trained primarily for regression often over-smooth heavy tails and intermittent bursts, while purely distribution-matching objectives can erode long-range structure. Diffusion models provide a principled route to rich distribution modeling through iterative denoising, but they do not, by themselves, resolve (i) the need for a stable low-frequency temporal scaffold, nor (ii) the discrete legality constraints for supervisory variables \citep{ho2020denoising,song2021score}. Recent time-series diffusion work further suggests that separating coarse structure from stochastic refinement can be an effective inductive bias for long-horizon realism \citep{kollovieh2023tsdiff,sikder2023transfusion}. Figure~\ref{fig:design} summarizes how our framework maps these requirements into a staged generator for mixed-type ICS telemetry.

\begin{figure}[htbp]
\centering
\includegraphics[width=0.8\textwidth]{fig-design-v4-from-user-svg-cropped.pdf}
\caption{Mask-DDPM: unified synthesis for ICS traffic.}
\label{fig:design}
\end{figure}

Motivated by these considerations, we propose Mask-DDPM, organized in the following order:
\begin{enumerate}
\item Transformer trend module: learns the dominant temporal backbone of continuous dynamics via attention-based sequence modeling \citep{vaswani2017attention}.

\item Residual DDPM for continuous variables: models distributional detail as stochastic residual structure conditioned on the learned trend \citep{ho2020denoising,kollovieh2023tsdiff}.

\item Masked diffusion for discrete variables: generates discrete ICS states with an absorbing/masking corruption process and categorical reconstruction \citep{austin2021structured, shi2024simplified}.

\item Type-aware decomposition: a type-aware factorization and routing layer that assigns variables to the most appropriate modeling mechanism and enforces deterministic constraints where warranted.
\end{enumerate}

This ordering is intentional. The trend module establishes a macro-temporal scaffold; residual diffusion then concentrates capacity on micro-structure and marginal fidelity; masked diffusion provides a native mechanism for discrete legality; and the type-aware layer operationalizes the observation that not all ICS variables should be modeled with the same stochastic mechanism. As shown in Figure~\ref{fig:design}, these components are arranged sequentially so that temporal scaffolding, residual refinement, and discrete legality are enforced in complementary rather than competing stages. Importantly, while diffusion-based generation for ICS telemetry has begun to emerge, existing approaches remain limited and typically emphasize continuous synthesis or augmentation; in contrast, our pipeline integrates (i) a Transformer-conditioned residual diffusion backbone, (ii) a discrete masked-diffusion branch, and (iii) explicit type-aware routing for heterogeneous variable mechanisms within a single coherent generator \citep{yuan2025ctu,sha2026ddpm}.
|
||||
|
||||
\subsection{Transformer trend module for continuous dynamics}
|
||||
\label{sec:method-trans}
|
||||
We instantiate the temporal backbone as a causal Transformer trend extractor, leveraging self-attention's ability to represent long-range dependencies and cross-channel interactions without recurrence \citep{vaswani2017attention}. Compared with recurrent trend extractors (e.g., GRU-style backbones), a Transformer trend module offers a direct mechanism to model delayed effects and multivariate coupling common in ICS, where control actions may influence downstream sensors with nontrivial lags and regime-dependent propagation \citep{vaswani2017attention,nist2023sp80082}. Crucially, in our design the Transformer is not asked to be the entire generator; instead, it serves a deliberately restricted role: providing a stable, temporally coherent conditioning signal that later stochastic components refine.

For continuous channels $\bm{X}$, we posit an additive decomposition:
\begin{equation}
\bm{X} = \bm{S} + \bm{R},
\label{eq:additive_decomp}
\end{equation}
where $\bm{S} \in \mathbb{R}^{L \times d_c}$ is a smooth trend capturing predictable temporal evolution, and $\bm{R} \in \mathbb{R}^{L \times d_c}$ is a residual capturing distributional detail (e.g., bursts, heavy tails, local fluctuations) that is difficult to represent robustly with a purely regression-based temporal objective. This separation reflects an explicit division of labor: the trend module prioritizes temporal coherence, while diffusion (introduced next) targets distributional realism at the residual level, a strategy aligned with predict-then-refine perspectives in time-series diffusion modeling \citep{kollovieh2023tsdiff,sikder2023transfusion}.

We parameterize the trend $\bm{S}$ using a causal Transformer $f_{\phi}$. With teacher forcing, we train $f_{\phi}$ to predict the next-step trend from past observations:
\begin{equation}
\hat{\bm{S}}_{t+1} = f_{\phi}(\bm{X}_{1:t}), \quad t = 1, \dots, L-1,
\label{eq:trend_prediction}
\end{equation}
using the mean-squared error objective:
\begin{equation}
\mathcal{L}_{\text{trend}}(\phi) = \frac{1}{(L-1)d_c} \sum_{t=1}^{L-1} \bigl\| \hat{\bm{S}}_{t+1} - \bm{X}_{t+1} \bigr\|_2^2.
\label{eq:trend_loss}
\end{equation}
At inference, we roll out the Transformer autoregressively to obtain $\hat{\bm{S}}$, and then define the residual target for diffusion as $\bm{R} = \bm{X} - \hat{\bm{S}}$. This setup intentionally locks in a coherent low-frequency scaffold before any stochastic refinement is applied, thereby reducing the burden on downstream diffusion modules to simultaneously learn both long-range structure and marginal detail. In this sense, our use of Transformers is distinctive: it is a conditioning-first temporal backbone designed to stabilize mixed-type diffusion synthesis in ICS, rather than an end-to-end monolithic generator \citep{vaswani2017attention,kollovieh2023tsdiff,yuan2025ctu}.
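
For concreteness, the following is a minimal PyTorch-style sketch of the trend module: a causal Transformer encoder trained with the teacher-forced objective of Eq.~(\ref{eq:trend_loss}) and rolled out autoregressively at inference. All names, layer sizes, and the omission of positional encodings are illustrative simplifications, not our exact implementation.
\begin{verbatim}
# Minimal sketch of the causal Transformer trend module; positional
# encodings and other details are omitted for brevity.
import torch
import torch.nn as nn

class TrendTransformer(nn.Module):
    def __init__(self, d_c, d_model=128, n_heads=4, n_layers=2):
        super().__init__()
        self.inp = nn.Linear(d_c, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           batch_first=True)
        self.enc = nn.TransformerEncoder(layer, n_layers)
        self.out = nn.Linear(d_model, d_c)

    def forward(self, x):                        # x: (B, t, d_c)
        t = x.size(1)                            # causal mask: no lookahead
        mask = torch.triu(torch.full((t, t), float("-inf"),
                                     device=x.device), diagonal=1)
        h = self.enc(self.inp(x), mask=mask)
        return self.out(h)                       # one-step-ahead trend

def trend_loss(model, x):                        # teacher forcing (eq:trend_loss)
    pred = model(x[:, :-1])                      # predicts S_hat for steps 2..L
    return ((pred - x[:, 1:]) ** 2).mean()

@torch.no_grad()
def rollout(model, seed, L):                     # autoregressive inference
    seq = seed                                   # seed window, (B, t0, d_c)
    while seq.size(1) < L:
        seq = torch.cat([seq, model(seq)[:, -1:, :]], dim=1)
    return seq                                   # S_hat, (B, L, d_c)
\end{verbatim}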

\subsection{DDPM for continuous residual generation}
\label{sec:method-ddpm}
We model the residual $\bm{R}$ with a denoising diffusion probabilistic model (DDPM) conditioned on the trend $\hat{\bm{S}}$ \citep{ho2020denoising}. Diffusion models learn complex data distributions by inverting a tractable noising process through iterative denoising, and have proven effective at capturing multimodality and heavy-tailed structure that is often attenuated by purely regression-based sequence models \citep{ho2020denoising,song2021score}. Conditioning the diffusion model on $\hat{\bm{S}}$ is central: it prevents the denoiser from re-learning the low-frequency scaffold and focuses capacity on residual micro-structure, mirroring the broader principle that diffusion excels as a distributional corrector when a reasonable coarse structure is available \citep{kollovieh2023tsdiff, sikder2023transfusion}.

Let $K$ denote the number of diffusion steps, with a noise schedule $\{\beta_k\}_{k=1}^K$, $\alpha_k = 1 - \beta_k$, and $\bar{\alpha}_k = \prod_{i=1}^k \alpha_i$. The forward corruption process is
\begin{equation}
q(\bm{r}_k \mid \bm{r}_0) = \mathcal{N}\bigl( \sqrt{\bar{\alpha}_k}\,\bm{r}_0,\; (1 - \bar{\alpha}_k)\mathbf{I} \bigr),
\label{eq:forward_corruption}
\end{equation}
or, equivalently,
\begin{equation}
\bm{r}_k = \sqrt{\bar{\alpha}_k}\,\bm{r}_0 + \sqrt{1 - \bar{\alpha}_k}\,\boldsymbol{\epsilon}, \quad \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}).
\label{eq:forward_corruption_eq}
\end{equation}
The learned reverse process is parameterized as
\begin{equation}
p_{\theta}(\bm{r}_{k-1} \mid \bm{r}_k, \hat{\bm{S}}) = \mathcal{N}\bigl( \boldsymbol{\mu}_{\theta}(\bm{r}_k, k, \hat{\bm{S}}),\; \boldsymbol{\Sigma}(k) \bigr),
\label{eq:reverse_process}
\end{equation}
where $\boldsymbol{\mu}_{\theta}$ is implemented by a Transformer denoiser that consumes (i) the noised residual $\bm{r}_k$, (ii) a timestep embedding for $k$, and (iii) conditioning features derived from $\hat{\bm{S}}$. Under the $\epsilon$-parameterization used below, the mean is recovered as $\boldsymbol{\mu}_{\theta}(\bm{r}_k, k, \hat{\bm{S}}) = \frac{1}{\sqrt{\alpha_k}}\bigl(\bm{r}_k - \frac{\beta_k}{\sqrt{1-\bar{\alpha}_k}}\,\boldsymbol{\epsilon}_{\theta}(\bm{r}_k, k, \hat{\bm{S}})\bigr)$, with $\boldsymbol{\Sigma}(k)$ fixed to the schedule variance (e.g., $\beta_k\mathbf{I}$) \citep{ho2020denoising}. This denoiser architecture is consistent with the growing use of attention-based denoisers for long-context time-series diffusion, while our key methodological emphasis is the trend-conditioned residual factorization as the object of diffusion learning \citep{ho2020denoising,sikder2023transfusion}.

We train the denoiser using the standard DDPM $\epsilon$-prediction objective:
\begin{equation}
\mathcal{L}_{\text{cont}}(\theta) = \mathbb{E}_{k,\bm{r}_0,\boldsymbol{\epsilon}} \left[ \bigl\| \boldsymbol{\epsilon} - \boldsymbol{\epsilon}_{\theta}(\bm{r}_k, k, \hat{\bm{S}}) \bigr\|_2^2 \right].
\label{eq:ddpm_loss}
\end{equation}
Because diffusion optimization can exhibit timestep imbalance (i.e., some timesteps dominate gradients), we optionally apply an SNR-based reweighting consistent with Min-SNR training:
\begin{equation}
\mathcal{L}^{\text{snr}}_{\text{cont}}(\theta) = \mathbb{E}_{k,\bm{r}_0,\boldsymbol{\epsilon}} \left[ w_k \bigl\| \boldsymbol{\epsilon} - \boldsymbol{\epsilon}_{\theta}(\bm{r}_k, k, \hat{\bm{S}}) \bigr\|_2^2 \right],
\label{eq:snr_loss}
\end{equation}
where the weight is $w_k = \min(\mathrm{SNR}_k, \gamma)/\mathrm{SNR}_k$ with $\mathrm{SNR}_k=\bar{\alpha}_k/(1-\bar{\alpha}_k)$, and $\gamma>0$ is a cap parameter \citep{hang2023efficient}.
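
A minimal sketch of one training step follows, combining the closed-form corruption of Eq.~(\ref{eq:forward_corruption_eq}) with the weighted objective of Eq.~(\ref{eq:snr_loss}); the denoiser \texttt{eps\_model} is an assumed callable, and $\gamma=5$ follows the default suggested by \citet{hang2023efficient}.
\begin{verbatim}
# Sketch of one trend-conditioned DDPM training step with optional
# Min-SNR reweighting (eq:snr_loss); names are illustrative.
import torch

def ddpm_step(eps_model, r0, s_hat, alpha_bar, gamma=5.0):
    # r0: clean residuals (B, L, d_c); s_hat: trend condition (B, L, d_c)
    # alpha_bar: (K,) cumulative products of the noise schedule
    B, K = r0.size(0), alpha_bar.size(0)
    k = torch.randint(0, K, (B,), device=r0.device)      # random timestep
    ab = alpha_bar[k].view(B, 1, 1)
    eps = torch.randn_like(r0)
    r_k = ab.sqrt() * r0 + (1 - ab).sqrt() * eps         # forward corruption
    eps_hat = eps_model(r_k, k, s_hat)                   # predict the noise
    snr = ab / (1 - ab)                                  # SNR_k
    w = torch.clamp(snr, max=gamma) / snr                # min(SNR_k, gamma)/SNR_k
    return (w * (eps - eps_hat) ** 2).mean()             # weighted epsilon loss
\end{verbatim}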
After sampling $\hat{\bm{R}}$ by reverse diffusion, we reconstruct the continuous output as $\hat{\bm{X}} = \hat{\bm{S}} + \hat{\bm{R}}$. Overall, the DDPM component serves as a distributional corrector on top of a temporally coherent backbone, which is particularly suited to ICS where low-frequency dynamics are strong and persistent but fine-scale variability (including bursts and regime-conditioned noise) remains important for realism. Relative to prior ICS diffusion efforts that primarily focus on continuous augmentation, our formulation elevates trend-conditioned residual diffusion as a modular mechanism for disentangling temporal structure from distributional refinement \citep{yuan2025ctu,sha2026ddpm}.
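
A corresponding ancestral-sampling sketch for Eq.~(\ref{eq:reverse_process}) is given below, using the $\epsilon$-parameterized posterior mean and $\boldsymbol{\Sigma}(k)=\beta_k\mathbf{I}$; again, the names are illustrative rather than our exact implementation.
\begin{verbatim}
# Ancestral sampling for the residual DDPM, then X = S + R.
import torch

@torch.no_grad()
def sample_residual(eps_model, s_hat, betas):
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)
    r = torch.randn_like(s_hat)                       # r_K ~ N(0, I)
    for k in reversed(range(betas.size(0))):
        kk = torch.full((r.size(0),), k, device=r.device, dtype=torch.long)
        eps_hat = eps_model(r, kk, s_hat)
        # posterior mean under the epsilon-parameterization
        mean = (r - betas[k] / (1 - alpha_bar[k]).sqrt() * eps_hat) \
               / alphas[k].sqrt()
        noise = torch.randn_like(r) if k > 0 else torch.zeros_like(r)
        r = mean + betas[k].sqrt() * noise            # Sigma(k) = beta_k * I
    return r                                          # R_hat

# x_hat = s_hat + sample_residual(eps_model, s_hat, betas)
\end{verbatim}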

\subsection{Masked diffusion for discrete ICS variables}
\label{sec:method-discrete}
Discrete ICS variables must remain categorical, making Gaussian diffusion inappropriate for supervisory states and mode-like channels. While one can attempt continuous relaxations or post-hoc discretization, such strategies risk producing semantically invalid intermediate states (e.g., in-between modes) and can distort the discrete marginal distribution. Discrete-state diffusion provides a principled alternative by defining a valid corruption process directly on categorical variables \citep{austin2021structured,shi2024simplified}. In the ICS setting, this is not a secondary detail: supervisory tags often encode control logic boundaries (modes, alarms, interlocks) that must remain within a finite vocabulary to preserve semantic correctness \citep{nist2023sp80082}.

We therefore adopt masked (absorbing) diffusion for discrete channels, where corruption replaces tokens with a special $\texttt{[MASK]}$ symbol according to a schedule \citep{shi2024simplified}. For each variable $j$, define a masking schedule $\{m_k\}_{k=1}^K$ (with $m_k\in[0,1]$) increasing in $k$. The forward corruption process is
\begin{equation}
q(y^{(j)}_k \mid y^{(j)}_0) =
\begin{cases}
y^{(j)}_0, & \text{with probability } 1 - m_k, \\
\texttt{[MASK]}, & \text{with probability } m_k,
\end{cases}
\label{eq:masking_process}
\end{equation}
applied independently across $j$ and $t$. Let $\mathcal{M}$ denote the set of masked positions at step $k$. The denoiser $h_{\psi}$ predicts, for each masked token, a categorical distribution over the legal vocabulary $\mathcal{V}_j$, conditioned on (i) the corrupted discrete sequence, (ii) the diffusion step $k$, and (iii) continuous context. Concretely, we condition on $\hat{\bm{S}}$ and $\hat{\bm{X}}$ to couple supervisory reconstruction to the underlying continuous dynamics:
\begin{equation}
p_{\psi}\bigl( y^{(j)}_0 \mid y_k, k, \hat{\bm{S}}, \hat{\bm{X}} \bigr) = h_{\psi}(y_k, k, \hat{\bm{S}}, \hat{\bm{X}}).
\label{eq:discrete_denoising}
\end{equation}
This conditioning choice is motivated by the fact that many discrete ICS states are not standalone; they are functions of regimes, thresholds, and procedural phases that manifest in continuous channels \citep{nist2023sp80082}. Training uses a categorical denoising objective:
\begin{equation}
\mathcal{L}_{\text{disc}}(\psi) = \mathbb{E}_{k} \left[ \frac{1}{|\mathcal{M}|} \sum_{(j,t) \in \mathcal{M}} \mathrm{CE}\bigl( h_{\psi}(y_k, k, \hat{\bm{S}}, \hat{\bm{X}})_{j,t},\; y^{(j)}_{0,t} \bigr) \right],
\label{eq:discrete_loss}
\end{equation}
where $\mathrm{CE}(\cdot,\cdot)$ is cross-entropy. At sampling time, we initialize all discrete tokens as $\texttt{[MASK]}$ and iteratively unmask them using the learned conditionals, ensuring that every output token lies in its legal vocabulary by construction. This discrete branch is a key differentiator of our pipeline: unlike typical continuous-only diffusion augmentation in ICS, we integrate masked diffusion as a first-class mechanism for supervisory-variable legality within the same end-to-end synthesis workflow \citep{shi2024simplified,yuan2025ctu}.
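
A minimal sketch of the corruption rule of Eq.~(\ref{eq:masking_process}) and of sampling by iterative unmasking follows. The reservation of index 0 for $\texttt{[MASK]}$, the gradual commit schedule, and the callable \texttt{h\_psi} are illustrative assumptions, not our exact implementation.
\begin{verbatim}
# Absorbing corruption and iterative unmasking for discrete channels.
# Index 0 is reserved for [MASK]; legal tokens are 1..V.
import torch

MASK = 0

def corrupt(y0, m_k):
    # y0: (B, L, d_y) integer tokens; m_k: masking probability at step k
    drop = torch.rand(y0.shape, device=y0.device) < m_k
    return torch.where(drop, torch.full_like(y0, MASK), y0)

@torch.no_grad()
def unmask(h_psi, shape, K, s_hat, x_hat, device="cpu"):
    y = torch.full(shape, MASK, dtype=torch.long, device=device)
    for k in reversed(range(K)):
        logits = h_psi(y, k, s_hat, x_hat)           # (B, L, d_y, V)
        draw = torch.distributions.Categorical(
            logits=logits).sample() + 1              # a legal token in 1..V
        keep_masked = torch.rand(y.shape, device=device) < k / K
        y = torch.where((y == MASK) & ~keep_masked, draw, y)
    return y   # every token lies in the legal vocabulary by construction
\end{verbatim}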

\subsection{Type-aware decomposition as a factorization and routing layer}
\label{sec:method-types}
Even with a trend-conditioned residual DDPM and a discrete masked-diffusion branch, a single uniform modeling treatment can remain suboptimal because ICS variables are generated by qualitatively different mechanisms. For example, program-driven setpoints exhibit step-and-dwell dynamics; controller outputs follow control laws conditioned on process feedback; actuator positions may show saturation and dwell; and some derived tags are deterministic functions of other channels. Treating all channels as if they were exchangeable stochastic processes can misallocate model capacity and induce systematic error concentration on a small subset of mechanistically distinct variables \citep{nist2023sp80082}.

We therefore introduce a type-aware decomposition that formalizes this heterogeneity as a routing and constraint layer. Let $\tau(i)\in\{1,\dots,6\}$ assign each variable $i$ to a type class. For expository convenience, the assignment can be viewed as a mapping $\tau(i)=\mathrm{TypeAssign}(m_i, s_i, d_i)$, where $m_i$, $s_i$, and $d_i$ denote metadata/engineering role, temporal signature, and dependency pattern, respectively; a minimal routing sketch is given after the taxonomy below. The type assignment can be initialized from domain semantics (tag metadata, value domains, and engineering meaning), and subsequently refined via an error-attribution workflow described in the Benchmark section. Importantly, this refinement does not change the core diffusion backbone; it changes which mechanism is responsible for which variable, thereby aligning inductive bias with variable-generating mechanism while preserving overall coherence.

We use the following taxonomy:
\begin{enumerate}
\item Type 1 (program-driven / setpoint-like): externally commanded, step-and-dwell variables. These variables can be treated as exogenous drivers (conditioning signals) or routed to specialized change-point / dwell-time models, rather than being forced into a smooth denoiser that may over-regularize step structure.

\item Type 2 (controller outputs): continuous variables tightly coupled to feedback loops; these benefit from conditional modeling where the conditioning includes relevant process variables and commanded setpoints.

\item Type 3 (actuator states/positions): often exhibit saturation, dwell, and rate limits; these may require stateful dynamics beyond generic residual diffusion, motivating either specialized conditional modules or additional inductive constraints.

\item Type 4 (process variables): inertia-dominated continuous dynamics; these are the primary beneficiaries of the Transformer trend + residual DDPM pipeline.

\item Type 5 (derived/deterministic variables): algebraic or rule-based functions of other variables; we enforce deterministic reconstruction $\hat{x}^{(i)} = g_i(\hat{\bm{X}},\hat{\bm{Y}})$ rather than learning a stochastic generator, improving logical consistency and sample efficiency.

\item Type 6 (auxiliary/low-impact variables): weakly coupled or sparse signals; we allow simplified modeling (e.g., calibrated marginals or lightweight temporal models) to avoid allocating diffusion capacity where it is not warranted.
\end{enumerate}
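
As an illustration of the routing layer, the sketch below maps hypothetical tag metadata to a type in $\{1,\dots,6\}$ and then to a generation mechanism. The metadata fields, thresholds, and example tag names are placeholders, not HAI column names or our exact rules.
\begin{verbatim}
# Illustrative type assignment and routing; all names are hypothetical.
from dataclasses import dataclass

@dataclass
class TagMeta:
    role: str          # engineering role, e.g. "setpoint", "pv", "derived"
    n_levels: int      # distinct observed values (temporal-signature proxy)
    has_parents: bool  # whether the tag is a function of other tags

def type_assign(meta: TagMeta) -> int:
    if meta.has_parents and meta.role == "derived":
        return 5                  # deterministic reconstruction
    if meta.role == "setpoint":
        return 1                  # exogenous driver / dwell model
    if meta.role == "controller":
        return 2                  # conditional continuous model
    if meta.role == "actuator":
        return 3                  # discrete/stateful mechanism
    if meta.n_levels <= 2:
        return 6                  # auxiliary, lightweight model
    return 4                      # default: trend + residual DDPM

ROUTE = {1: "exogenous", 2: "residual_ddpm", 3: "masked_diffusion",
         4: "residual_ddpm", 5: "deterministic", 6: "marginal"}

tau = {tag: type_assign(m) for tag, m in
       {"P1_SP": TagMeta("setpoint", 40, False),
        "P1_PV": TagMeta("pv", 900, False)}.items()}
# {'P1_SP': 1, 'P1_PV': 4}; mechanisms follow via ROUTE[tau[tag]]
\end{verbatim}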

\begin{figure}[H]
\centering
\includegraphics[width=0.98\textwidth]{typeclass-cropped.pdf}
\caption{Type assignment and six-type taxonomy.}
\label{fig:type_taxonomy}
\end{figure}

Figure~\ref{fig:type_taxonomy} visualizes the six-type taxonomy and the routing logic behind it. Type-aware decomposition improves synthesis quality through three mechanisms. First, it improves capacity allocation by preventing a small set of mechanistically atypical variables from dominating gradients and distorting the learned distribution for the majority class (typically Type 4). Second, it enables constraint enforcement by deterministically reconstructing Type 5 variables, preventing logically inconsistent samples that purely learned generators can produce. Third, it improves mechanism alignment by attaching inductive biases consistent with step/dwell or saturation behaviors where generic denoisers may implicitly favor smoothness.

From a novelty standpoint, this layer is not merely an engineering patch; it is an explicit methodological statement that ICS synthesis benefits from typed factorization, a principle that has analogues in mixed-type generative modeling more broadly, but that remains underexplored in diffusion-based ICS telemetry synthesis \citep{shi2025tabdiff,yuan2025ctu,nist2023sp80082}.

\subsection{Joint optimization and end-to-end sampling}
\label{sec:method-joint}
We train the model in a staged manner consistent with the above factorization, which improves optimization stability and encourages each component to specialize in its intended role. Specifically: (i) we train the trend Transformer $f_{\phi}$ to obtain $\hat{\bm{S}}$; (ii) we compute residual targets $\bm{R} = \bm{X} - \hat{\bm{S}}$ for the continuous variables routed to residual diffusion; (iii) we train the residual DDPM $p_{\theta}(\bm{R}\mid \hat{\bm{S}})$ and masked diffusion model $p_{\psi}(\bm{Y}\mid \text{masked}(\bm{Y}), \hat{\bm{S}}, \hat{\bm{X}})$; and (iv) we apply type-aware routing and deterministic reconstruction during sampling. This staged strategy is aligned with the design goal of separating temporal scaffolding from distributional refinement, and it mirrors the broader intuition in time-series diffusion that decoupling coarse structure and stochastic detail can mitigate structure-vs.-realism conflicts \citep{kollovieh2023tsdiff,sikder2023transfusion}.

A simple combined objective is $\mathcal{L} = \lambda\mathcal{L}_{\text{cont}} + (1-\lambda)\mathcal{L}_{\text{disc}}$ with $\lambda\in[0,1]$ controlling the balance between continuous and discrete learning. Type-aware routing determines which channels contribute to which loss and which are excluded in favor of deterministic reconstruction. In practice, this routing acts as a principled guardrail against negative transfer across variable mechanisms: channels that are best handled deterministically (Type 5) or as exogenous / specialized state channels (e.g., driver-like or actuator-state variables) are prevented from forcing the diffusion models into statistically incoherent compromises.
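
A minimal sketch of how the combined objective and routing interact in training code is given below; the mechanism labels mirror the routing sketch above and are illustrative.
\begin{verbatim}
# Combined objective with routing masks; names are illustrative.
def combined_loss(loss_cont, loss_disc, lam=0.5):
    # loss_cont: L_cont from the residual DDPM (routed continuous channels)
    # loss_disc: L_disc from masked diffusion (routed discrete channels)
    assert 0.0 <= lam <= 1.0
    return lam * loss_cont + (1.0 - lam) * loss_disc

def loss_channels(route):
    # Channels routed to "deterministic" or "exogenous" mechanisms are
    # excluded from both losses and reconstructed/injected at sampling.
    cont = [i for i, m in route.items() if m == "residual_ddpm"]
    disc = [i for i, m in route.items() if m == "masked_diffusion"]
    return cont, disc
\end{verbatim}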
At inference time, generation follows the same structured order: (i) trend $\hat{\bm{S}}$ via the Transformer, (ii) residual $\hat{\bm{R}}$ via DDPM, (iii) discrete $\hat{\bm{Y}}$ via masked diffusion, and (iv) type-aware assembly with deterministic reconstruction for routed variables. This pipeline produces $(\hat{\bm{X}},\hat{\bm{Y}})$ that are temporally coherent by construction (through $\hat{\bm{S}}$), distributionally expressive (through $\hat{\bm{R}}$ denoising), and discretely valid (through masked diffusion), while explicitly accounting for heterogeneous variable-generating mechanisms through type-aware routing. In combination, these choices constitute our central methodological contribution: a unified Transformer + mixed diffusion generator for ICS telemetry, augmented by typed factorization to align model capacity with domain mechanism \citep{ho2020denoising,shi2024simplified,yuan2025ctu,nist2023sp80082}.
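
This order can be summarized in one orchestration function that reuses the illustrative sketches above (\texttt{rollout}, \texttt{sample\_residual}, \texttt{unmask}); the deterministic reconstructions $g_i$ appear as application-specific stubs.
\begin{verbatim}
# End-to-end sampling sketch following the stated order; illustrative only.
import torch

@torch.no_grad()
def generate(trend_model, eps_model, h_psi, seed, L, K, betas, route, g):
    s_hat = rollout(trend_model, seed, L)              # (i)   trend
    r_hat = sample_residual(eps_model, s_hat, betas)   # (ii)  residual
    x_hat = s_hat + r_hat                              #       X = S + R
    n_disc = sum(m == "masked_diffusion" for m in route.values())
    y_hat = unmask(h_psi, (x_hat.size(0), L, n_disc), K,
                   s_hat, x_hat, device=x_hat.device)  # (iii) discrete
    for i, mech in route.items():                      # (iv)  assembly
        if mech == "deterministic":
            x_hat[..., i] = g[i](x_hat, y_hat)         # enforce g_i
    return x_hat, y_hat
\end{verbatim}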

% 4. Benchmark
\section{Benchmark}
\label{sec:benchmark}
A credible ICS generator must clear three hurdles. It must first be \emph{semantically legal}: any out-of-vocabulary supervisory token renders a sample unusable, regardless of marginal fidelity. It must then match the heterogeneous statistics of mixed-type telemetry, including continuous process channels and discrete supervisory states. Finally, it must preserve \emph{mechanism-level realism}: switch-and-dwell behavior, bounded control motion, cross-tag coordination, and short-horizon persistence. We therefore organize the benchmark as a funnel from legality and reproducibility to structural diagnosis and ablation \citep{coletta2023constrained,yang2001interlock,stenger2024survey}.
For continuous channels, we use the Kolmogorov--Smirnov (KS) statistic because ICS process signals are often bounded, saturated, heavy-tailed, and plateau-dominated, so moment matching alone is too weak. KS directly compares empirical cumulative distributions, makes no parametric assumption, and is sensitive to support or shape mismatches that are operationally meaningful in telemetry. For discrete channels, realism is primarily about how probability mass is distributed over a finite vocabulary, so we use Jensen--Shannon divergence (JSD) between per-feature categorical marginals and average across discrete variables \citep{lin1991divergence,yoon2019timegan}. To assess short-horizon dynamics, we compare lag-1 autocorrelation feature-wise and report the mean absolute difference between real and synthetic lag-1 coefficients. We also track semantic legality by counting out-of-vocabulary discrete outputs and report a filtered KS that excludes near-constant channels so that trivially flat tags do not dominate the aggregate.
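
These aggregate metrics can be computed with standard tools; the following sketch (using \texttt{scipy}) shows per-channel KS, per-feature JSD, the lag-1 gap, and the validity check. Function names and array shapes are illustrative.
\begin{verbatim}
# Benchmark-metric sketch: KS, JSD, lag-1 gap, and discrete validity.
import numpy as np
from scipy.stats import ks_2samp
from scipy.spatial.distance import jensenshannon

def ks_per_channel(real, synth):            # real, synth: (N, d_c)
    return [ks_2samp(real[:, j], synth[:, j]).statistic
            for j in range(real.shape[1])]

def jsd_per_feature(real, synth, vocab):    # integer-coded (N,) arrays
    p = np.bincount(real, minlength=vocab) / len(real)
    q = np.bincount(synth, minlength=vocab) / len(synth)
    return jensenshannon(p, q, base=2) ** 2  # divergence (square of distance)

def lag1_gap(real, synth):                  # (N, d_c) -> mean |delta acf1|
    def acf1(x):
        x = x - x.mean(0)
        return (x[1:] * x[:-1]).sum(0) / ((x * x).sum(0) + 1e-12)
    return np.abs(acf1(real) - acf1(synth)).mean()

def validity(tokens, vocab_sizes):          # out-of-vocabulary check
    ok = [((tokens[:, j] >= 0) & (tokens[:, j] < v)).all()
          for j, v in enumerate(vocab_sizes)]
    return float(np.mean(ok))
\end{verbatim}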

\subsection{Core fidelity, legality, and reproducibility}
\label{sec:benchmark-quant}
Across three independent runs, Mask-DDPM achieves mean KS $=0.3311 \pm 0.0079$, mean JSD $=0.0284 \pm 0.0073$, and mean absolute lag-1 difference $=0.2684 \pm 0.0027$, while maintaining a validity rate of \textbf{100\%} across the modeled discrete channels. The small dispersion across runs shows that mixed-type fidelity is reproducible rather than dependent on a single favorable seed. On a representative diagnostic slice, the model attains mean KS $=0.4025$, filtered mean KS $=0.3191$, mean JSD $=0.0166$, and mean absolute lag-1 difference $=0.2859$, again with zero invalid discrete tokens. The main pattern is that discrete legality is already solved, while continuous mismatch is concentrated in a limited subset of difficult channels rather than spread uniformly across the telemetry space.

\begin{figure}[htbp]
\centering
\includegraphics[width=\textwidth]{fig-benchmark-story-v2.png}
\caption{Benchmark evidence chain: seed-level stability (left), concentration of continuous mismatch in control-sensitive channels (middle), and mechanism-specific realism gaps (right).}
\label{fig:benchmark_story}
\end{figure}

\begin{table}[htbp]
\centering
\caption{Core benchmark summary. Lower is better except for validity rate.}
\label{tab:core_metrics}
\begin{tabular}{@{}lcc@{}}
\toprule
\textbf{Metric} & \textbf{3-run mean $\pm$ std} & \textbf{Diagnostic slice} \\
\midrule
Mean KS (continuous) & $0.3311 \pm 0.0079$ & $0.4025$ \\
Filtered mean KS & -- & $0.3191$ \\
Mean JSD (discrete) & $0.0284 \pm 0.0073$ & $0.0166$ \\
Mean abs. $\Delta$ lag-1 autocorr & $0.2684 \pm 0.0027$ & $0.2859$ \\
Validity rate (26 discrete tags) $\uparrow$ & $100.0 \pm 0.0\%$ & $100.0\%$ \\
\bottomrule
\end{tabular}
\end{table}

Figure~\ref{fig:benchmark_story} turns the table into a structural diagnosis. The left panel shows seed-level stability across the three benchmark runs. The middle panel shows that the dominant continuous mismatch is concentrated in a relatively small subset of control-sensitive variables rather than indicating a global collapse of the generator. The right panel shows that the remaining realism gap is mechanism-specific, with program-like long-dwell behavior and actuator-state occupancy contributing more strongly than PV-like channels on this slice.

\subsection{Type-aware diagnostics}
\label{sec:benchmark-typed}
Type-aware diagnostics make that mechanism gap explicit. Table~\ref{tab:typed_diagnostics} reports one representative statistic per variable family on the same diagnostic slice. Because each family is evaluated with a different proxy, the absolute-error column should be interpreted within type, while the relative-error column is the more comparable cross-type indicator.

\begin{table}[htbp]
\centering
\caption{Type-aware diagnostic summary. Lower values indicate better alignment.}
\label{tab:typed_diagnostics}
\begin{tabular}{@{}llcc@{}}
\toprule
\textbf{Type} & \textbf{Proxy statistic} & \textbf{Mean abs. error} & \textbf{Mean rel. error} \\
\midrule
Program & mean dwell & $318.70$ & $2.19$ \\
Controller & change rate & $0.104$ & $0.25$ \\
Actuator & top-3 mass & $0.0615$ & $0.69$ \\
PV & tail ratio & $1.614$ & $0.20$ \\
Auxiliary & lag-1 autocorr & $0.125$ & $0.37$ \\
\bottomrule
\end{tabular}
\end{table}

Program-like channels remain the hardest family by a clear margin: mean-dwell mismatch is still large, indicating that the generator does not yet sustain the long plateaus characteristic of schedule-driven behavior. Actuator channels form the next clear difficulty, while PV channels are the most stable family under this diagnostic. In short, legality is solved, but the remaining realism gap is not uniform across types; it is dominated primarily by long-dwell program behavior and actuator-state occupancy.

\subsection{Ablation study}
\label{sec:benchmark-ablation}
We evaluate ten controlled variants under a shared pipeline and summarize six representative metrics: continuous fidelity (KS), discrete fidelity (JSD), short-horizon dynamics (lag-1), cross-variable coupling, predictive transfer, and downstream anomaly utility. Figure~\ref{fig:ablation_impact} visualizes signed changes relative to the full model, and Table~\ref{tab:ablation} gives the underlying values.

\begin{figure}[htbp]
\centering
\includegraphics[width=\textwidth]{fig-benchmark-ablations-v1.png}
\caption{Ablation impact: signed changes in each metric relative to the full model.}
\label{fig:ablation_impact}
\end{figure}

\begin{table}[htbp]
\centering
\small
\caption{Ablation study. Lower is better except for anomaly AUPRC.}
\label{tab:ablation}
\begin{tabular}{@{}lcccccc@{}}
\toprule
\textbf{Variant} & \textbf{KS$\downarrow$} & \textbf{JSD$\downarrow$} & \textbf{Lag-1$\downarrow$} & \textbf{Coupling$\downarrow$} & \textbf{Pred. RMSE$\downarrow$} & \textbf{AUPRC$\uparrow$} \\
\midrule
\multicolumn{7}{@{}l}{\textit{Full model}} \\
Full model & $0.402$ & $0.028$ & $0.291$ & $0.215$ & $0.972$ & $0.644$ \\
\midrule
\multicolumn{7}{@{}l}{\textit{Structure and conditioning}} \\
No temporal scaffold & $0.408$ & $0.031$ & $0.664$ & $0.306$ & $0.977$ & $0.645$ \\
No file-level context & $0.405$ & $0.033$ & $0.237$ & $0.262$ & $0.986$ & $0.640$ \\
No type routing & $0.356$ & $0.022$ & $0.138$ & $0.324$ & $1.017$ & $0.647$ \\
\midrule
\multicolumn{7}{@{}l}{\textit{Distribution shaping}} \\
No quantile transform & $0.599$ & $0.010$ & $0.156$ & $0.300$ & $1.653$ & $0.417$ \\
No post-calibration & $0.543$ & $0.024$ & $0.253$ & $0.249$ & $1.086$ & $0.647$ \\
\midrule
\multicolumn{7}{@{}l}{\textit{Loss and target design}} \\
No SNR weighting & $0.400$ & $0.022$ & $0.299$ & $0.214$ & $0.961$ & $0.637$ \\
No quantile loss & $0.413$ & $0.018$ & $0.311$ & $0.213$ & $0.965$ & $0.645$ \\
No residual-stat loss & $0.404$ & $0.029$ & $0.285$ & $0.210$ & $0.970$ & $0.647$ \\
Epsilon target & $0.482$ & $0.102$ & $0.728$ & $0.195$ & $0.968$ & $0.647$ \\
\bottomrule
\end{tabular}
\end{table}

The ablation results reveal three distinct roles. First, temporal staging is what makes the sequence look dynamical rather than merely plausible frame by frame: removing the temporal scaffold leaves KS nearly unchanged but more than doubles lag-1 error ($0.291 \rightarrow 0.664$) and substantially worsens coupling ($0.215 \rightarrow 0.306$). Second, quantile-based distribution shaping is one of the main contributors to usable continuous realism: without the quantile transform, KS degrades sharply ($0.402 \rightarrow 0.599$), synthetic-only predictive RMSE deteriorates ($0.972 \rightarrow 1.653$), and anomaly utility collapses ($0.644 \rightarrow 0.417$). Third, routing is the key counterexample to one-dimensional evaluation: disabling type routing can improve KS or lag-1 in isolation, yet it worsens coupling ($0.215 \rightarrow 0.324$) and predictive transfer ($0.972 \rightarrow 1.017$), showing that typed decomposition helps preserve coordinated mechanism-level behavior.

Taken together, the benchmark supports a focused claim. Mask-DDPM already provides stable mixed-type fidelity and perfect discrete legality, while the remaining error is concentrated in ICS-specific channels whose realism depends on rare switching, long dwell intervals, constrained occupancy, and persistent local dynamics.

% 5. Conclusion and Future Work
\section{Conclusion and Future Work}
\label{sec:conclusion}
This paper addresses the data scarcity and shareability barriers that limit machine-learning research for industrial control systems (ICS) security by proposing Mask-DDPM, a hybrid synthetic telemetry generator at the protocol-feature level. By combining a causal Transformer trend module, a trend-conditioned residual DDPM, a masked diffusion branch for discrete variables, and a type-aware routing layer, the framework preserves long-horizon temporal structure, improves local distributional fidelity, and guarantees discrete semantic legality. On windows derived from the HAI Security Dataset, the model achieves stable mixed-type fidelity across seeds, with mean KS = 0.3311 $\pm$ 0.0079 on continuous features, mean JSD = 0.0284 $\pm$ 0.0073 on discrete features, and mean absolute lag-1 autocorrelation difference = 0.2684 $\pm$ 0.0027.
Overall, Mask-DDPM provides a reproducible foundation for generating shareable and semantically valid ICS feature sequences for data augmentation, benchmarking, and downstream packet/trace reconstruction workflows. Future work will proceed in two complementary directions. Vertically, we will strengthen the theoretical foundation of the framework by introducing more explicit control-theoretic constraints, structured state-space or causal priors, and formal transition models for supervisory logic, so that legality, stability, and cross-channel coupling can be characterized more rigorously. Horizontally, we will extend the framework beyond the current setting to additional industrial control protocols such as Modbus/TCP, DNP3, IEC 104, and OPC UA, and investigate analogous adaptations to automotive communication protocols such as CAN/CAN FD and automotive Ethernet. A related extension is controllable attack or violation injection on top of legal base traces, enabling reproducible adversarial benchmarks for anomaly detection and intrusion-detection studies.
\bibliographystyle{unsrtnat}
\bibliography{references}
\end{document}
614
arxiv-style/references.bib
Normal file
@@ -0,0 +1,614 @@

@inproceedings{vaswani2017attention,
  title={Attention Is All You Need},
  author={Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N and Kaiser, {\L}ukasz and Polosukhin, Illia},
  booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
  volume={30},
  year={2017},
  url={https://arxiv.org/abs/1706.03762}
}

@inproceedings{ho2020denoising,
  title={Denoising Diffusion Probabilistic Models},
  author={Ho, Jonathan and Jain, Ajay and Abbeel, Pieter},
  booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
  volume={33},
  pages={6840--6851},
  year={2020},
  url={https://arxiv.org/abs/2006.11239}
}

@inproceedings{austin2021structured,
  title={Structured Denoising Diffusion Models in Discrete State-Spaces},
  author={Austin, Jacob and Johnson, Daniel D and Ho, Jonathan and Tarlow, Daniel and van den Berg, Rianne},
  booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
  volume={34},
  pages={17981--17993},
  year={2021},
  url={https://arxiv.org/abs/2107.03006}
}

@article{shi2024simplified,
  title={Simplified and Generalized Masked Diffusion for Discrete Data},
  author={Shi, Jiaxin and Han, Kehang and Wang, Zhe and Doucet, Arnaud and Titsias, Michalis K},
  journal={arXiv preprint},
  eprint={2406.04329},
  archivePrefix={arXiv},
  year={2024},
  url={https://arxiv.org/abs/2406.04329}
}

@inproceedings{hang2023efficient,
  title={Efficient Diffusion Training via Min-SNR Weighting Strategy},
  author={Hang, Tianyu and Gu, Shuyang and Li, Chen and Bao, Jianmin and Chen, Dong and Hu, Han and Geng, Xin and Guo, Baining},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  pages={7407--7417},
  year={2023},
  doi={10.1109/ICCV51070.2023.00702},
  url={https://arxiv.org/abs/2303.09556}
}

@inproceedings{kollovieh2023tsdiff,
  title={Predict, Refine, Synthesize: Self-Guiding Diffusion Models for Probabilistic Time Series Forecasting},
  author={Kollovieh, Marcel and Ansari, Abdul Fatir and Bohlke-Schneider, Michael and Salinas, David},
  booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
  volume={36},
  year={2023},
  url={https://arxiv.org/abs/2307.11494}
}

@article{sikder2023transfusion,
  title={TransFusion: Generating Long, High Fidelity Time Series using Diffusion Models with Transformers},
  author={Sikder, M. F. and Ramachandranpillai, R. and Heintz, F.},
  journal={arXiv preprint},
  eprint={2307.12667},
  archivePrefix={arXiv},
  year={2023},
  url={https://arxiv.org/abs/2307.12667}
}

@inproceedings{song2021score,
  title={Score-Based Generative Modeling through Stochastic Differential Equations},
  author={Song, Yang and Sohl-Dickstein, Jascha and Kingma, Diederik P and Kumar, Abhishek and Ermon, Stefano and Poole, Ben},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2021},
  url={https://arxiv.org/abs/2011.13456}
}

@inproceedings{shi2025tabdiff,
  title={TabDiff: A Mixed-type Diffusion Model for Tabular Data Generation},
  author={Shi, Juntong and Xu, Minkai and Hua, Harper and Zhang, Hengrui and Ermon, Stefano and Leskovec, Jure},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2025},
  url={https://arxiv.org/abs/2410.20626}
}

@inproceedings{yuan2025ctu,
  title={CTU-DDPM: Generating Industrial Control System Time-Series Data with a CNN-Transformer Hybrid Diffusion Model},
  author={Yuan, Yusong and Sha, Yun and Zhao, Wei and Zhang, Kun},
  booktitle={Proceedings of the 2025 International Symposium on Artificial Intelligence and Computational Social Sciences (ACM AICSS)},
  pages={123--132},
  year={2025},
  doi={10.1145/3776759.3776845},
  url={https://dl.acm.org/doi/10.1145/3776759.3776845}
}

@misc{sha2026ddpm,
  title={DDPM Fusing Mamba and Adaptive Attention: An Augmentation Method for Industrial Control Systems Anomaly Data},
  author={Sha, Yun and Yuan, Yusong and Wu, Yonghao and Zhao, Haidong},
  year={2026},
  month={jan},
  note={SSRN Electronic Journal},
  eprint={6055903},
  archivePrefix={SSRN},
  doi={10.2139/ssrn.6055903},
  url={https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6055903}
}

@techreport{nist2023sp80082,
  title={Guide to Operational Technology (OT) Security},
  author={{National Institute of Standards and Technology}},
  institution={NIST},
  type={Special Publication},
  number={800-82 Rev. 3},
  year={2023},
  month={sep},
  doi={10.6028/NIST.SP.800-82r3},
  url={https://csrc.nist.gov/pubs/sp/800/82/r3/final}
}

% References for the Introduction

@article{10.1007/s10844-022-00753-1,
  author={Koay, Abigail M. Y. and Ko, Ryan K. L and Hettema, Hinne and Radke, Kenneth},
  title={Machine learning in industrial control system (ICS) security: current landscape, opportunities and challenges},
  journal={J. Intell. Inf. Syst.},
  volume={60},
  number={2},
  pages={377--405},
  month={oct},
  year={2022},
  publisher={Kluwer Academic Publishers},
  issn={0925-9902},
  doi={10.1007/s10844-022-00753-1},
  url={https://doi.org/10.1007/s10844-022-00753-1}
}

@article{Nankya2023-gp,
  title={Securing industrial Control Systems: Components, cyber threats, and machine learning-driven defense strategies},
  author={Nankya, Mary and Chataut, Robin and Akl, Robert},
  journal={Sensors (Basel)},
  publisher={MDPI AG},
  volume={23},
  number={21},
  pages={8840},
  month={oct},
  year={2023}
}

@misc{shin,
  title={HAI Security Dataset},
  author={Shin, Hyeok-Ki and Lee, Woomyo and Choi, Seungoh and Yun, Jeong-Han and Min, Byung Gil and Kim, HyoungChun},
  year={2023},
  publisher={Kaggle},
  doi={10.34740/kaggle/dsv/5821622},
  url={https://www.kaggle.com/dsv/5821622}
}

@article{info16100910,
  author={Ali, Jokha and Ali, Saqib and Al Balushi, Taiseera and Nadir, Zia},
  title={Intrusion Detection in Industrial Control Systems Using Transfer Learning Guided by Reinforcement Learning},
  journal={Information},
  volume={16},
  number={10},
  article-number={910},
  year={2025},
  issn={2078-2489},
  doi={10.3390/info16100910},
  url={https://www.mdpi.com/2078-2489/16/10/910}
}

@inproceedings{pmlr-v202-kotelnikov23a,
  title={{T}ab{DDPM}: Modelling Tabular Data with Diffusion Models},
  author={Kotelnikov, Akim and Baranchuk, Dmitry and Rubachev, Ivan and Babenko, Artem},
  booktitle={Proceedings of the 40th International Conference on Machine Learning},
  pages={17564--17579},
  year={2023},
  editor={Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume={202},
  series={Proceedings of Machine Learning Research},
  month={23--29 Jul},
  publisher={PMLR},
  pdf={https://proceedings.mlr.press/v202/kotelnikov23a/kotelnikov23a.pdf},
  url={https://proceedings.mlr.press/v202/kotelnikov23a.html}
}

@misc{rasul2021autoregressivedenoisingdiffusionmodels,
  title={Autoregressive Denoising Diffusion Models for Multivariate Probabilistic Time Series Forecasting},
  author={Kashif Rasul and Calvin Seward and Ingmar Schuster and Roland Vollgraf},
  year={2021},
  eprint={2101.12072},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2101.12072}
}

@misc{jiang2023netdiffusionnetworkdataaugmentation,
  title={NetDiffusion: Network Data Augmentation Through Protocol-Constrained Traffic Generation},
  author={Xi Jiang and Shinan Liu and Aaron Gember-Jacobson and Arjun Nitin Bhagoji and Paul Schmitt and Francesco Bronzino and Nick Feamster},
  year={2023},
  eprint={2310.08543},
  archivePrefix={arXiv},
  primaryClass={cs.NI},
  url={https://arxiv.org/abs/2310.08543}
}

% References for Related Work

@inproceedings{10.1145/1159913.1159928,
  author={Vishwanath, Kashi Venkatesh and Vahdat, Amin},
  title={Realistic and responsive network traffic generation},
  booktitle={Proceedings of the 2006 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications},
  series={SIGCOMM '06},
  pages={111--122},
  year={2006},
  isbn={1595933085},
  publisher={Association for Computing Machinery},
  address={New York, NY, USA},
  location={Pisa, Italy},
  doi={10.1145/1159913.1159928},
  url={https://doi.org/10.1145/1159913.1159928}
}

@article{Ring_2019,
  title={Flow-based network traffic generation using Generative Adversarial Networks},
  author={Ring, Markus and Schlör, Daniel and Landes, Dieter and Hotho, Andreas},
  journal={Computers \& Security},
  volume={82},
  pages={156--172},
  month={may},
  year={2019},
  issn={0167-4048},
  publisher={Elsevier BV},
  doi={10.1016/j.cose.2018.12.012},
  url={http://dx.doi.org/10.1016/j.cose.2018.12.012}
}

@inproceedings{10.1145/3544216.3544251,
  author={Yin, Yucheng and Lin, Zinan and Jin, Minhao and Fanti, Giulia and Sekar, Vyas},
  title={Practical GAN-based synthetic IP header trace generation using NetShare},
  booktitle={Proceedings of the ACM SIGCOMM 2022 Conference},
  series={SIGCOMM '22},
  pages={458--472},
  year={2022},
  isbn={9781450394208},
  publisher={Association for Computing Machinery},
  address={New York, NY, USA},
  location={Amsterdam, Netherlands},
  doi={10.1145/3544216.3544251},
  url={https://doi.org/10.1145/3544216.3544251}
}

@inproceedings{Lin_2020,
  title={Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions},
  author={Lin, Zinan and Jain, Alankar and Wang, Chen and Fanti, Giulia and Sekar, Vyas},
  booktitle={Proceedings of the ACM Internet Measurement Conference},
  series={IMC '20},
  pages={464--483},
  month={oct},
  year={2020},
  publisher={ACM},
  doi={10.1145/3419394.3423643},
  url={http://dx.doi.org/10.1145/3419394.3423643}
}

@inproceedings{7469060,
  author={Mathur, Aditya P. and Tippenhauer, Nils Ole},
  title={SWaT: a water treatment testbed for research and training on ICS security},
  booktitle={2016 International Workshop on Cyber-physical Systems for Smart Water Networks (CySWater)},
  pages={31--36},
  year={2016},
  doi={10.1109/CySWater.2016.7469060}
}

@inproceedings{10.1145/3055366.3055375,
  author={Ahmed, Chuadhry Mujeeb and Palleti, Venkata Reddy and Mathur, Aditya P.},
  title={WADI: a water distribution testbed for research in the design of secure cyber physical systems},
  booktitle={Proceedings of the 3rd International Workshop on Cyber-Physical Systems for Smart Water Networks},
  series={CySWATER '17},
  pages={25--28},
  year={2017},
  isbn={9781450349758},
  publisher={Association for Computing Machinery},
  address={New York, NY, USA},
  location={Pittsburgh, Pennsylvania},
  doi={10.1145/3055366.3055375},
  url={https://doi.org/10.1145/3055366.3055375}
}

@misc{tashiro2021csdiconditionalscorebaseddiffusion,
  title={CSDI: Conditional Score-based Diffusion Models for Probabilistic Time Series Imputation},
  author={Yusuke Tashiro and Jiaming Song and Yang Song and Stefano Ermon},
  year={2021},
  eprint={2107.03502},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2107.03502}
}

@misc{wen2024diffstgprobabilisticspatiotemporalgraph,
  title={DiffSTG: Probabilistic Spatio-Temporal Graph Forecasting with Denoising Diffusion Models},
  author={Haomin Wen and Youfang Lin and Yutong Xia and Huaiyu Wan and Qingsong Wen and Roger Zimmermann and Yuxuan Liang},
  year={2024},
  eprint={2301.13629},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2301.13629}
}

@misc{liu2023pristiconditionaldiffusionframework,
  title={PriSTI: A Conditional Diffusion Framework for Spatiotemporal Imputation},
  author={Mingzhe Liu and Han Huang and Hao Feng and Leilei Sun and Bowen Du and Yanjie Fu},
  year={2023},
  eprint={2302.09746},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2302.09746}
}

@misc{kong2021diffwaveversatilediffusionmodel,
  title={DiffWave: A Versatile Diffusion Model for Audio Synthesis},
  author={Zhifeng Kong and Wei Ping and Jiaji Huang and Kexin Zhao and Bryan Catanzaro},
  year={2021},
  eprint={2009.09761},
  archivePrefix={arXiv},
  primaryClass={eess.AS},
  url={https://arxiv.org/abs/2009.09761}
}

@article{11087622,
  author={Liu, Xiaosi and Xu, Xiaowen and Liu, Zhidan and Li, Zhenjiang and Wu, Kaishun},
  title={Spatio-Temporal Diffusion Model for Cellular Traffic Generation},
  journal={IEEE Transactions on Mobile Computing},
  volume={25},
  number={1},
  pages={257--271},
  year={2026},
  doi={10.1109/TMC.2025.3591183}
}

@misc{hoogeboom2021argmaxflowsmultinomialdiffusion,
  title={Argmax Flows and Multinomial Diffusion: Learning Categorical Distributions},
  author={Emiel Hoogeboom and Didrik Nielsen and Priyank Jaini and Patrick Forré and Max Welling},
  year={2021},
  eprint={2102.05379},
  archivePrefix={arXiv},
  primaryClass={stat.ML},
  url={https://arxiv.org/abs/2102.05379}
}

@misc{li2022diffusionlmimprovescontrollabletext,
  title={Diffusion-LM Improves Controllable Text Generation},
  author={Xiang Lisa Li and John Thickstun and Ishaan Gulrajani and Percy Liang and Tatsunori B. Hashimoto},
  year={2022},
  eprint={2205.14217},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2205.14217}
}

@misc{meng2025aflnetyearslatercoverageguided,
  title={AFLNet Five Years Later: On Coverage-Guided Protocol Fuzzing},
  author={Ruijie Meng and Van-Thuan Pham and Marcel Böhme and Abhik Roychoudhury},
  year={2025},
  eprint={2412.20324},
  archivePrefix={arXiv},
  primaryClass={cs.SE},
  url={https://arxiv.org/abs/2412.20324}
}

@misc{godefroid2017learnfuzzmachinelearninginput,
  title={Learn\&Fuzz: Machine Learning for Input Fuzzing},
  author={Patrice Godefroid and Hila Peleg and Rishabh Singh},
  year={2017},
  eprint={1701.07232},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/1701.07232}
}

@misc{she2019neuzzefficientfuzzingneural,
  title={NEUZZ: Efficient Fuzzing with Neural Program Smoothing},
  author={Dongdong She and Kexin Pei and Dave Epstein and Junfeng Yang and Baishakhi Ray and Suman Jana},
  year={2019},
  eprint={1807.05620},
  archivePrefix={arXiv},
  primaryClass={cs.CR},
  url={https://arxiv.org/abs/1807.05620}
}

@misc{dai2019transformerxlattentivelanguagemodels,
  title={Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context},
  author={Zihang Dai and Zhilin Yang and Yiming Yang and Jaime Carbonell and Quoc V. Le and Ruslan Salakhutdinov},
  year={2019},
  eprint={1901.02860},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/1901.02860}
}

@misc{zhou2021informerefficienttransformerlong,
  title={Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting},
  author={Haoyi Zhou and Shanghang Zhang and Jieqi Peng and Shuai Zhang and Jianxin Li and Hui Xiong and Wancai Zhang},
  year={2021},
  eprint={2012.07436},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2012.07436}
}

@misc{wu2022autoformerdecompositiontransformersautocorrelation,
  title={Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting},
  author={Haixu Wu and Jiehui Xu and Jianmin Wang and Mingsheng Long},
  year={2022},
  eprint={2106.13008},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2106.13008}
}

@misc{zhou2022fedformerfrequencyenhanceddecomposed,
  title={FEDformer: Frequency Enhanced Decomposed Transformer for Long-term Series Forecasting},
  author={Tian Zhou and Ziqing Ma and Qingsong Wen and Xue Wang and Liang Sun and Rong Jin},
  year={2022},
  eprint={2201.12740},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2201.12740}
}

@article{2023,
  title={A Note on Extremal Sombor Indices of Trees with a Given Degree Sequence},
  author={Damjanović, Ivan and Milošević, Marko and Stevanović, Dragan},
  journal={Match Communications in Mathematical and in Computer Chemistry},
  volume={90},
  number={1},
  pages={197--202},
  year={2023},
  issn={0340-6253},
  publisher={University Library in Kragujevac},
  doi={10.46793/match.90-1.197d},
  url={http://dx.doi.org/10.46793/match.90-1.197D}
}

% References for Benchmark

@article{stenger2024survey,
|
||||
title={Evaluation is key: a survey on evaluation measures for synthetic time series},
|
||||
author={Stenger, Michael and Leppich, Robert and Foster, Ian T and Kounev, Samuel and Bauer, Andre},
|
||||
journal={Journal of Big Data},
|
||||
volume={11},
|
||||
number={1},
|
||||
pages={66},
|
||||
year={2024},
|
||||
publisher={Springer}
|
||||
}
|
||||
|
||||
@article{lin1991divergence,
|
||||
title={Divergence measures based on the Shannon entropy},
|
||||
author={Lin, Jianhua},
|
||||
journal={IEEE Transactions on Information Theory},
|
||||
volume={37},
|
||||
number={1},
|
||||
pages={145--151},
|
||||
year={1991}
|
||||
}
|
||||
|
||||
@inproceedings{yoon2019timegan,
|
||||
title={Time-series generative adversarial networks},
|
||||
author={Yoon, Jinsung and Jarrett, Daniel and van der Schaar, Mihaela},
|
||||
booktitle={Advances in Neural Information Processing Systems},
|
||||
volume={32},
|
||||
year={2019}
|
||||
}
|
||||
|
||||
@article{ni2021sigwasserstein,
|
||||
title={Sig-Wasserstein GANs for time series generation},
|
||||
author={Ni, Hao and Szpruch, Lukasz and Wiese, Magnus and Liao, Shujian and Xiao, Baoren},
|
||||
journal={Proceedings of the ACM on Measurement and Analysis of Computing Systems},
|
||||
volume={5},
|
||||
number={3},
|
||||
pages={1--25},
|
||||
year={2021}
|
||||
}
|
||||
|
||||
@inproceedings{coletta2023constrained,
|
||||
title={On the constrained time-series generation problem},
|
||||
author={Coletta, Alessandro and Rossi, Roberto and others},
|
||||
booktitle={Advances in Neural Information Processing Systems},
|
||||
volume={36},
|
||||
year={2023}
|
||||
}
|
||||
|
||||
@article{yang2001interlock,
|
||||
title={Automatic verification of safety interlock systems for industrial processes},
|
||||
author={Yang, Sheng-Hong and Hsieh, Min-Chi},
|
||||
journal={Journal of Loss Prevention in the Process Industries},
|
||||
volume={14},
|
||||
number={6},
|
||||
pages={473--483},
|
||||
year={2001},
|
||||
publisher={Elsevier}
|
||||
}
|
||||
|
||||
@misc{austin2023structureddenoisingdiffusionmodels,
|
||||
title={Structured Denoising Diffusion Models in Discrete State-Spaces},
|
||||
author={Jacob Austin and Daniel D. Johnson and Jonathan Ho and Daniel Tarlow and Rianne van den Berg},
|
||||
year={2023},
|
||||
eprint={2107.03006},
|
||||
archivePrefix={arXiv},
|
||||
primaryClass={cs.LG},
|
||||
url={https://arxiv.org/abs/2107.03006},
|
||||
}
|
||||
|
||||
@article{10.1145/1151659.1159928,
|
||||
author = {Vishwanath, Kashi Venkatesh and Vahdat, Amin},
|
||||
title = {Realistic and responsive network traffic generation},
|
||||
year = {2006},
|
||||
issue_date = {October 2006},
|
||||
publisher = {Association for Computing Machinery},
|
||||
address = {New York, NY, USA},
|
||||
volume = {36},
|
||||
number = {4},
|
||||
issn = {0146-4833},
|
||||
url = {https://doi.org/10.1145/1151659.1159928},
|
||||
doi = {10.1145/1151659.1159928},
|
||||
abstract = {This paper presents Swing, a closed-loop, network-responsive traffic generator that accurately captures the packet interactions of a range of applications using a simple structural model. Starting from observed traffic at a single point in the network, Swing automatically extracts distributions for user, application, and network behavior. It then generates live traffic corresponding to the underlying models in a network emulation environment running commodity network protocol stacks. We find that the generated traces are statistically similar to the original traces. Further, to the best of our knowledge, we are the first to reproduce burstiness in traffic across a range of timescales using a model applicable to a variety of network settings. An initial sensitivity analysis reveals the importance of capturing and recreating user, application, and network characteristics to accurately reproduce such burstiness. Finally, we explore Swing's ability to vary user characteristics, application properties, and wide-area network conditions to project traffic characteristics into alternate scenarios.},
|
||||
journal = {SIGCOMM Comput. Commun. Rev.},
|
||||
month = aug,
|
||||
pages = {111–122},
|
||||
numpages = {12},
|
||||
keywords = {burstiness, energy plot, generator, internet, modeling, structural model, traffic, wavelets}
|
||||
}
|
||||
|
||||
@inproceedings{NEURIPS2020_4c5bcfec,
|
||||
author = {Ho, Jonathan and Jain, Ajay and Abbeel, Pieter},
|
||||
booktitle = {Advances in Neural Information Processing Systems},
|
||||
editor = {H. Larochelle and M. Ranzato and R. Hadsell and M.F. Balcan and H. Lin},
|
||||
pages = {6840--6851},
|
||||
publisher = {Curran Associates, Inc.},
|
||||
title = {Denoising Diffusion Probabilistic Models},
|
||||
url = {https://proceedings.neurips.cc/paper_files/paper/2020/file/4c5bcfec8584af0d967f1ab10179ca4b-Paper.pdf},
|
||||
volume = {33},
|
||||
year = {2020}
|
||||
}
|
||||
|
||||
@misc{song2021scorebasedgenerativemodelingstochastic,
|
||||
title={Score-Based Generative Modeling through Stochastic Differential Equations},
|
||||
author={Yang Song and Jascha Sohl-Dickstein and Diederik P. Kingma and Abhishek Kumar and Stefano Ermon and Ben Poole},
|
||||
year={2021},
|
||||
eprint={2011.13456},
|
||||
archivePrefix={arXiv},
|
||||
primaryClass={cs.LG},
|
||||
url={https://arxiv.org/abs/2011.13456},
|
||||
}
|
||||
@inproceedings{nie2023patchtst,
|
||||
title={A Time Series is Worth 64 Words: Long-term Forecasting with Transformers},
|
||||
author={Nie, Yuqi and Nguyen, Nam H. and Sinthong, Phanwadee and Kalagnanam, Jayant},
|
||||
booktitle={International Conference on Learning Representations (ICLR)},
|
||||
year={2023},
|
||||
url={https://arxiv.org/abs/2211.14730}
|
||||
}
|
||||
BIN
arxiv-style/typeclass-cropped.pdf
Normal file
66
copy.cmd
Normal file
@@ -0,0 +1,66 @@
@echo off
setlocal

:: ====================================================================
:: Script name:  copy.cmd
:: Purpose:      Walk every subfolder of /papers and copy all .md files
::               into the /papers/md folder.
:: Author:       [your name/company, optional]
:: Date:         2023-10-27
:: Version:      1.0
:: ====================================================================

:: Define the source and target directories
set "SOURCE_DIR=%~dp0papers"
set "TARGET_DIR=%~dp0papers\md"

echo.
echo ======================================================
echo Copying Markdown files
echo ======================================================
echo.

:: Check that the source directory exists
if not exist "%SOURCE_DIR%" (
    echo Error: source directory "%SOURCE_DIR%" does not exist.
    echo Make sure this script sits next to the "papers" folder, or adjust SOURCE_DIR.
    goto :eof
)

:: Create the target directory if it does not exist
if not exist "%TARGET_DIR%" (
    echo Creating target directory: "%TARGET_DIR%"
    mkdir "%TARGET_DIR%"
    if errorlevel 1 (
        echo Error: could not create target directory "%TARGET_DIR%".
        goto :eof
    )
) else (
    echo Target directory "%TARGET_DIR%" already exists.
)

echo.
echo Scanning "%SOURCE_DIR%" and its subdirectories for .md files...
echo.

:: Walk all subfolders and copy each .md file.
:: for /r recurses into every subdirectory of SOURCE_DIR; inside a batch
:: file, copy overwrites existing files without prompting.
for /r "%SOURCE_DIR%" %%f in (*.md) do (
    echo Copying: "%%f"
    copy "%%f" "%TARGET_DIR%" >nul
    if errorlevel 1 (
        echo Warning: failed to copy "%%f".
    )
)

echo.
echo ======================================================
echo Markdown file copy complete
echo ======================================================
echo.

endlocal
pause
1042
fig/fig-design-v3.drawio.svg
Normal file
|
After Width: | Height: | Size: 1.3 MiB |
BIN
fig/fig-design-v3.png
Normal file
|
After Width: | Height: | Size: 1.1 MiB |
997
fig/fig-design-v4.drawio.svg
Normal file
|
After Width: | Height: | Size: 1.3 MiB |
0
fig/fig-type-aware-v1.drawio.svg
Normal file
1
fig/mask-ddpm-figure.drawio.svg
Normal file
429
knowledges/Intro-Modbus-Diffusion-Tech-Route.md
Normal file
@@ -0,0 +1,429 @@
# Applying Diffusion Models to Modbus

## 🧑💻 User

Please introduce this paper in detail, and also give a detailed introduction to the Diffusion model.

---
## 🤖 Assistant

Hello. Speaking as an AI engineering expert: based on the IEEE Transactions on Mobile Computing (2026) paper **"Spatio-Temporal Diffusion Model for Cellular Traffic Generation"** that you provided, here is an in-depth technical reading.

The paper proposes a novel model named **STOUTER**, which targets the core difficulty in generating large-scale cellular network traffic data: how to capture complex spatio-temporal patterns precisely while also modeling traffic uncertainty.

Below is a detailed reading of the paper, followed by a deep dive into diffusion model techniques.

---

### Part 1: a close reading of the STOUTER paper

#### 1. Background and pain points
* **Background**: 5G/6G network planning (e.g., base-station siting, resource allocation) requires massive amounts of traffic data.
* **Pain points**:
  * **Privacy and data access**: operator data is sensitive and hard to obtain publicly.
  * **Limitations of existing generative models**: traditional GANs are prone to **mode collapse** at scale, and the generated samples lack diversity.
  * **Complex spatio-temporal characteristics**: cellular traffic exhibits short- and long-term periodicity (daily/weekly patterns), spatial heterogeneity (base-station load differs by region), and high uncertainty (bursty traffic).

#### 2. Core solution: the STOUTER model
The paper proposes **STOUTER** (Spatio-Temporal diffusiOn model for cellUlar Traffic genERation), a diffusion-based generative framework built from three core modules:

##### A. Spatio-Temporal Graph Modeling
To let the model "understand" the spatio-temporal regularities of traffic, the authors construct two graphs:
1. **Temporal Graph**:
   * **Nodes**: the 24 hours of a day.
   * **Edges**: connect adjacent hours (short-term dependence) and the same hour across different days of a week (long-term periodicity, e.g., Monday 9 a.m. with Tuesday 9 a.m.).
   * **Role**: captures long- and short-term periodic patterns.
2. **Base Station Graph**:
   * **Nodes**: base stations.
   * **Edges**: connected by geographic distance (e.g., distance < 1 km).
   * **Features**: fused with POI (point-of-interest) information (education, healthcare, dining, etc.), since regions with different functions have very different traffic patterns.
   * **Role**: captures spatial correlation and functional differences between regions.

A **graph autoencoder** is pre-trained on these two graphs to extract latent spatio-temporal feature embeddings.

##### B. Diffusion-Based Generation
This is the model's core engine (the diffusion mechanism is explained in detail later).
* **Forward process**: Gaussian noise is added to real traffic data step by step until it becomes pure noise.
* **Reverse process**: starting from pure noise, a neural network denoises step by step to recover realistic traffic data.
* **Key idea**: the stochastic sampling of diffusion models naturally reproduces the **uncertainty pattern** in cellular traffic.

##### C. Spatio-Temporal Feature Fusion Module (STFFM) and STUnet
Without guidance, the reverse denoising process would only produce random noise. The authors design **STUnet** (Spatio-Temporal UNet) as the denoising network:
* **STFFM (Spatio-Temporal Feature Fusion Module)**: fuses the pre-trained temporal-graph and base-station-graph features, combines them with the diffusion time step, and injects the result into the network as a condition.
* **Mechanism**: the condition steers denoising toward trajectories that obey real spatio-temporal regularities, so samples stay diverse (stochastic) yet consistent with the physical world.

#### 3. Experimental results
* **Datasets**: real telecom data from Milan and Trentino, Italy.
* **Performance**: on Jensen-Shannon divergence (JSD, a measure of distributional similarity), STOUTER improves over existing GAN and VAE methods by **52.77%**.
* **Conclusion**: the generated data is highly realistic in long-term trend, short-term fluctuation, and spatial distribution, and can be used directly to train downstream traffic-prediction models.

---

### Part 2: diffusion models in detail

As an AI engineer, I consider diffusion models the key to understanding both this paper and current generative AI (e.g., Stable Diffusion, Sora).

#### 1. What is a diffusion model?
Diffusion models are a family of probabilistic generative models inspired by diffusion in thermodynamics (e.g., a drop of ink in water gradually spreads until uniformly distributed).

In AI, the process has two phases:
* **Forward diffusion (destroying data)**: systematically corrupt the data structure until it becomes random noise.
* **Reverse diffusion (creating data)**: learn to invert the corruption and reconstruct data from noise.

#### 2. Breaking down the mechanics

##### A. Forward Process (Diffusion Process)
Given samples $x_0$ from a real data distribution (here, real traffic matrices), define a Markov chain that adds a small amount of Gaussian noise at each step $t$.
* Formula: $q(x_t \mid x_{t-1}) = \mathcal{N}(x_t; \sqrt{1-\beta_t}\,x_{t-1}, \beta_t I)$
* As the number of steps $T$ grows (e.g., $T=1000$), the original data $x_0$ ends up as standard Gaussian noise $x_T$.
* **Engineer's view**: this process has no learnable parameters; it is a fixed mathematical transformation.

##### B. Reverse Process (Denoising Process)
This is the heart of training. Since the forward process adds noise, if we can train a neural network $p_\theta$ to predict the noise added at each step, we can subtract it.
* **Goal**: starting from pure noise $x_T$, predict and remove noise with the neural network, obtaining $x_{T-1}, \dots, x_0$ step by step.
* **Role of the network**: **STUnet** in this paper. Its input is the noisy data $x_t$ plus the spatio-temporal condition; its output is the predicted noise $\epsilon_\theta$.
* **Loss**: typically mean-squared error (MSE) or L1 between the predicted noise and the noise actually added.

#### 3. Why do diffusion models beat GANs?
In the context of this paper the advantages are clear:
1. **Training stability**: GANs require a generator-discriminator game and easily suffer vanishing gradients or non-convergence; the diffusion objective (predict the noise) is explicit and stable.
2. **No mode collapse**: GANs tend to memorize particular samples and generate duplicates; diffusion models estimate the whole data distribution and generate broader, more diverse samples (i.e., better at modeling traffic **uncertainty**).
3. **High-quality generation**: the iterative coarse-to-fine denoising yields data with richer detail.

#### 4. How this paper applies diffusion
The paper transfers diffusion from "image generation" to "spatio-temporal sequence generation":
* **Image pixels $\rightarrow$ traffic values**: treat base-station traffic as a 2D "image" (time $\times$ space).
* **Guidance**: a vanilla diffusion model generates unconditionally. STOUTER uses **STFFM** to inject spatio-temporal graph embeddings as a condition, effectively telling the model: "do not generate arbitrary noise; generate traffic that matches 'Monday 9 a.m., business district'."

---

### Summary and expert remarks

**"Spatio-Temporal Diffusion Model for Cellular Traffic Generation"** is a state-of-the-art work combining **graph neural networks (GNNs)** with **generative diffusion models**.

In my view its core value is twofold:
1. **Methodological novelty**: it shows that diffusion models are just as powerful for spatio-temporal (time-series) data outside the image domain.
2. **Engineering value**: beyond theory, it addresses the practical "no data available" pain point in network planning. Guiding generation with physical constraints (graph structure) gives the synthetic data **physical interpretability**, which purely data-driven black-box models cannot match.

The paper represents the latest direction of AI for Networking.

---
## 🧑💻 User

Now consider migrating this model to the Modbus protocol to generate valid data (reproducing the main features of the packets while precisely capturing the spatio-temporal patterns between packets).

---
## 🤖 Assistant

This is a challenging but highly valuable engineering-transfer problem.

Migrating **STOUTER** from **macroscopic cellular traffic (continuous, statistical data)** to the **microscopic Modbus protocol (discrete, command-oriented, strongly constrained data)** requires a **semantic reconstruction** at the foundation. We cannot copy the model verbatim; the architecture must be adapted to the characteristics of industrial control systems (ICS).

Modbus data carries not only "traffic volume" but also **command logic (Function Code)**, **register state (Data/Values)**, and **device topology (Master-Slave)**.

Below is a **Modbus-Diffusion migration architecture** designed for this setting.

---

### 1. Concept mapping: from Cellular to ICS

First, redefine the notions of "space" and "time" in STOUTER so that they fit the Modbus environment.

| Original STOUTER concept | Meaning in cellular networks | **Meaning after migration to Modbus/ICS** |
| :--- | :--- | :--- |
| **Spatial nodes** | Base stations | **Industrial assets/registers** <br> (e.g., the temperature sensor of PLC_1, the valve coil of PLC_2) |
| **Spatial edges** | Physical distance | **Logical/process connections** <br> (e.g., Master-Slave links, or the process coupling between "temperature" and "cooling valve") |
| **POI semantics** | Schools, hospitals, malls | **Device-function semantics** <br> (e.g., sensor, actuator, HMI, alarm) |
| **Temporal period** | 24-hour human activity cycle | **Polling cycle & process cycle** <br> (e.g., one takt of a production line, or SCADA's fixed query frequency) |
| **Traffic value** | Throughput | **Multi-modal packet features** <br> (inter-arrival time, function code, register address, register value) |

---

### 2. Architecture: Modbus-STOUTER

To generate valid Modbus data, split the model into three stages: **topology-graph construction**, **spatio-temporal diffusion generation**, and **protocol-compliant reassembly**.

#### 1. Spatial modeling: a Device-Register Graph
The original paper builds the base-station graph from POIs and distance; for Modbus we need an **industrial semantic topology graph**.

* **Node definition**:
  * **Coarse-grained**: PLC/RTU devices as nodes.
  * **Fine-grained (recommended)**: key registers (Holding Registers/Coils) as nodes. For example, node A is "boiler temperature" and node B is "feed-valve state".
* **Edge definition**:
  * **Communication edges**: physical Master-to-Slave links.
  * **Process edges**: couplings grounded in the physical process. For example, when the "temperature register" rises, the "valve register" usually acts. Such couplings can be built automatically by cross-correlation analysis of historical data, or manually from P&ID diagrams.
* **Role of graph convolution (GCN)**: learn the plant's **control-logic dependencies**, e.g., the model learns that "changes in node B typically lag node A".

#### 2. Temporal modeling: a Cyclic Temporal Graph
Industrial data has no human-style "rush hours"; instead it shows **high-frequency periodicity** and **abrupt state changes**.

* **Time encoding**: no longer "hours", but **polling intervals** or **process steps**.
* **Short- and long-term structure**:
  * **Short-term**: the request-response sequence within one TCP connection.
  * **Long-term**: the device's operating cycle (start - run - stop - maintenance).

#### 3. Modifying the diffusion core
This is the hardest technical point. Modbus packets carry both continuous values (e.g., temperature 45.3) and discrete values (e.g., function codes 0x03, 0x10); STOUTER natively handles only continuous values.

**Solution: embedding-based diffusion**

* **Input-vector design**: for each time step $t$ and node $i$, the input $x_{t,i}$ is no longer a scalar but a **concatenated vector**:
$$x_{t,i} = [\text{Embed}(\text{FunctionCode}) \oplus \text{Embed}(\text{Address}) \oplus \text{Normalize}(\text{Value}) \oplus \text{InterArrivalTime}]$$
* **Mixed noise injection**:
  * For the continuous parts (Value, Time): standard Gaussian noise.
  * For the discrete parts (Function Code, Address): either map them into a continuous latent space before adding noise, or use **discrete diffusion (e.g., Multinomial Diffusion)**.
* **STUnet changes**:
  * Add a **Protocol Constraints Head** inside the STFFM to learn the intrinsic rules of Modbus (e.g., a read request is normally followed by a response carrying data).

---

### 3. Key step: from "noise" to "valid packets"

A diffusion model outputs tensors, not binary packets (PCAP). To make the generated Modbus data not only *plausible* but also *valid*, a **post-processing module** is required.

#### **Post-Processing: a Deterministic Protocol Assembler**

Do not ask the AI to generate rigid fields such as TCP checksums or the Protocol ID; it will make mistakes and waste capacity. The AI should generate **semantics**.

1. **AI output**: the diffusion model emits a sequence of events such as `{timestamp: 10ms, source: Master, target: Slave_1, action: Write, address: 40001, value: 1234}`.
2. **Rule-based assembly**: use Python's `scapy` or `pymodbus` as the "encoder":
   * consume the AI's semantic output;
   * compute the Length field automatically;
   * fill in the Transaction ID automatically (monotonically increasing);
   * pack everything into a binary stream that conforms to the Modbus TCP specification (a minimal sketch follows).
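A minimal sketch of such an assembler, using only Python's standard `struct` module (the function name, the event values, and the zero-based address mapping are illustrative assumptions, not project code):

```python
import struct

def assemble_write_single_register(txn_id: int, unit_id: int,
                                   address: int, value: int) -> bytes:
    """Pack one Modbus TCP 'Write Single Register' (function code 0x06) frame.

    MBAP header: transaction id (2B), protocol id (2B, always 0),
    length (2B, counting unit id + PDU), unit id (1B); all big-endian.
    PDU: function code (1B), register address (2B), register value (2B).
    """
    pdu = struct.pack(">BHH", 0x06, address, value)
    mbap = struct.pack(">HHHB", txn_id, 0x0000, len(pdu) + 1, unit_id)
    return mbap + pdu

# One semantic event from the generator; data-model address 40001 maps to
# zero-based protocol address 0.
frame = assemble_write_single_register(txn_id=1, unit_id=17, address=0, value=1234)
assert frame.hex() == "0001000000061106000004d2"
```

The deterministic fields (length, transaction id) come from code, never from the model, which is exactly the division of labor argued for above.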
---

### 4. Advantages and potential value of this migration

Compared with traditional rule-based fuzzing or simple replay-attack scripts, the spatio-temporal diffusion approach has major advantages:

1. **Capturing stealthy attack patterns**:
   * Traditional attack detection relies on thresholds (e.g., a sudden traffic spike).
   * Modbus-STOUTER can simulate **"slow attacks"** or **"logic attacks"**: it can generate packet sequences that are fully protocol-compliant yet anomalous in their spatio-temporal logic (e.g., a perfectly legal "open valve" command at a moment when the valve must not open). This is crucial for training advanced IDS (intrusion detection systems).

2. **High-fidelity digital-twin data**:
   * Data is extremely hard to collect in industrial environments (for fear of disturbing production). The model can take a small sample and generate large volumes of synthetic data with **process-logic correlations** for system stress testing.

3. **Modeling packet-level uncertainty**:
   * Industrial networks exhibit jitter and packet loss. The inherently probabilistic nature of diffusion models captures these **random fluctuations** of network conditions, rather than only idealized data.

# PART 2: mzyang finds the discrete-value treatment questionable

Below is only the part you asked for: **a detailed description of the two "noising/denoising" methods** (Gaussian diffusion for continuous fields, masking for discrete fields), written so it can be dropped straight into your document.

---

## Method 1: Gaussian diffusion for continuous fields (the DDPM recipe)

### What it applies to

Continuous fields are quantities that take real values, for example:

* `inter_arrival_time` (inter-arrival time)
* `value` (read/write values, or some continuous statistic)
* `payload_len` (length can be treated as continuous, or bucketed)

These variables have notions of "magnitude" and "closeness", so they suit gradual Gaussian perturbation.

---

### Core idea (intuition)

Add noise to the real continuous data \(x_0\) step by step until it becomes "pure noise";
then train a model to recover the real data from the noise, one step at a time.

In one sentence: **first scramble the data, then learn to unscramble it.**

---

### Forward process (noising)

The classic DDPM recipe defines time steps \(t=1,\dots,T\) and adds a little Gaussian noise at each step:

\[
q(x_t \mid x_{t-1}) = \mathcal{N}\big(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I\big)
\]

where:

* \(\beta_t\) is the noise strength at step \(t\) (usually increasing)
* \(I\) is the identity matrix

The commonly used closed form (more practical):

\[
x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon,\qquad \epsilon\sim \mathcal{N}(0,I)
\]

where:

* \(\alpha_t = 1-\beta_t\)
* \(\bar{\alpha}_t=\prod_{s=1}^t \alpha_s\)

> Intuition: the larger \(t\) is, the larger \(\sqrt{1-\bar{\alpha}_t}\) becomes, the higher the noise fraction, and the closer \(x_t\) is to pure randomness. A minimal sketch of this closed form follows.
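A minimal PyTorch sketch of the closed-form noising step (the linear schedule and toy tensor shapes are illustrative assumptions):

```python
import torch

def q_sample(x0: torch.Tensor, t: torch.Tensor, alpha_bar: torch.Tensor):
    """Closed-form forward noising: x_t = sqrt(abar_t) x_0 + sqrt(1 - abar_t) eps."""
    eps = torch.randn_like(x0)                              # eps ~ N(0, I)
    abar = alpha_bar[t].view(-1, *([1] * (x0.dim() - 1)))   # broadcast per sample
    xt = abar.sqrt() * x0 + (1.0 - abar).sqrt() * eps
    return xt, eps                                          # eps is the training target

T = 1000
betas = torch.linspace(1e-4, 0.02, T)            # beta_t increasing, as noted above
alpha_bar = torch.cumprod(1.0 - betas, dim=0)    # abar_t = prod_s (1 - beta_s)

x0 = torch.randn(8, 4)                # toy batch: 8 events x 4 continuous fields
t = torch.randint(0, T, (8,))         # one random diffusion step per sample
xt, eps = q_sample(x0, t, alpha_bar)
```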
---

### Reverse process (denoising / generation)

The reverse direction learns a model \(p_\theta\) that takes \(x_t\) back to \(x_{t-1}\) step by step.

The most common training setup has the model predict the noise \(\epsilon\) (predicting \(x_0\) also works):

\[
\epsilon_\theta = f_\theta(x_t, t, \text{context})
\]

The most common training loss is MSE:

\[
\mathcal{L}_{\text{cont}} = \mathbb{E}\big[\|\epsilon - \epsilon_\theta(x_t,t)\|^2\big]
\]

---

### How sampling (generation) works (what matters most for deployment)

1. Start from Gaussian noise: \(x_T \sim \mathcal{N}(0,I)\)
2. For \(t=T,T-1,\dots,1\):
   * predict the noise \(\epsilon_\theta(x_t,t)\) with the model
   * apply the DDPM reverse update to obtain \(x_{t-1}\)
3. The final \(x_0\) is the generated continuous feature

> You do not need the full reverse formula in the document; saying "the model denoises step by step to obtain the continuous features" is enough. A runnable sketch of the loop is given below.
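A minimal sketch of that loop, assuming an `eps_model(x, t)` callable that returns the predicted noise (the variance choice \(\sigma_t^2=\beta_t\) is one standard DDPM variant):

```python
import torch

@torch.no_grad()
def ddpm_sample(eps_model, shape, betas: torch.Tensor) -> torch.Tensor:
    """Ancestral DDPM sampling: start at x_T ~ N(0, I), denoise down to x_0."""
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape)                                   # x_T
    for t in range(len(betas) - 1, -1, -1):
        eps_hat = eps_model(x, torch.full(shape[:1], t))     # predicted noise
        mean = (x - betas[t] / (1 - alpha_bar[t]).sqrt() * eps_hat) / alphas[t].sqrt()
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + betas[t].sqrt() * noise                   # sigma_t^2 = beta_t
    return x                                                 # generated x_0
```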
---

### Pros & caveats (for reviewers/teammates)

**Pros**

* Generation quality for continuous variables is usually very good (distributions, fluctuations, and bursts all look more natural)
* Training is stable (MSE is easy to optimize)

**Caveats**

* Continuous fields should be normalized/standardized first (e.g., z-score); otherwise the noise scale is hard to choose
* The noise schedule \(\beta_t\) affects quality: increasing schedules are generally more stable

---

---

## Method 2: mask noise for discrete fields (mask corruption / discrete denoising)

### What it applies to

Discrete fields are "multiple-choice" variables, for example:

* `func_code` (function code)
* `msg_type` (request/response)
* `direction` (C→S / S→C)
* `exception_code` (exception-code category)

These variables have no "addition" and no "in-between values", so Gaussian noise cannot be applied as in the continuous case.

---

### Core idea (intuition)

In discrete diffusion, "noise" is not a number added to the value but **covering up / dropping some tokens**.

* Forward: randomly replace a fraction of the discrete values with `[MASK]`
* Reverse: the model guesses the `[MASK]` tokens back from context (a classification problem)

In one sentence: **first dig holes, then fill them in.**

---

### Forward process (noising = masking)

For each time step \(t\), define a mask ratio (or mask probability) \(m_t\); usually the larger \(t\) is, the more is masked.

For every position of the discrete sequence \(x^{\text{disc}}_0\):

* with probability \(m_t\): replace it with `[MASK]`
* with probability \(1-m_t\): keep the original token

This yields the corrupted \(x^{\text{disc}}_t\).

> You can extend this to "mask or randomly substitute another token", but pure masking is clear and easy to implement for a first version.

---

### Reverse process (denoising = guessing tokens)

Model input: the masked discrete sequence + the time step \(t\) + (optionally) the continuous-feature context
Model output: a categorical distribution (softmax) for every masked position:

\[
p_\theta(x^{\text{disc}}_0 \mid x^{\text{disc}}_t, t, \text{context})
\]

The training loss is cross-entropy (most commonly computed only on the masked positions):

\[
\mathcal{L}_{\text{disc}} = -\sum_{j\in \text{masked}} \log p_\theta(x^{\text{disc}}_{0,j})
\]

> Intuition: the model's job is to "fill the dug-out cells back in with the right answers".

---

### How sampling (generation) works (simple version)

The simplest workable strategy (a minimal sketch follows this section):

1. Initialize the discrete sequence as all masks: `[MASK, MASK, ...]`
2. For \(t=T,T-1,\dots,1\):
   * have the model output softmax probabilities for every masked position
   * sample (or take the argmax) to fill in a subset of tokens
   * as \(t\) decreases, the number of masks shrinks (the sequence fills in progressively)

The result is a complete discrete-field sequence.

> In your document you can simply write "discrete fields are recovered step by step starting from all-mask"; do not agonize over exactly how many positions to fill per step.
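A minimal sketch of the mask corruption and the all-mask sampling loop (`MASK_ID`, the confidence-based unmasking schedule, and the `model` callable returning per-position logits are illustrative assumptions):

```python
import torch

MASK_ID = 0  # assumption: token id 0 is reserved for [MASK]

def mask_corrupt(x0: torch.Tensor, m_t: float) -> torch.Tensor:
    """Forward corruption: each token becomes [MASK] with probability m_t."""
    keep = torch.rand_like(x0, dtype=torch.float) >= m_t
    return torch.where(keep, x0, torch.full_like(x0, MASK_ID))

@torch.no_grad()
def unmask_sample(model, length: int, steps: int = 10) -> torch.Tensor:
    """Start from all-[MASK]; at each step, fill in the most confident tokens."""
    x = torch.full((length,), MASK_ID, dtype=torch.long)
    for s in range(steps, 0, -1):
        logits = model(x)                          # (length, vocab) per-position logits
        conf, pred = logits.softmax(-1).max(-1)
        masked = x == MASK_ID
        if not masked.any():
            break
        k = max(1, int(masked.sum().item() / s))   # unmask a growing fraction
        idx = torch.where(masked, conf, torch.full_like(conf, -1.0)).topk(k).indices
        x[idx] = pred[idx]                         # commit the most confident guesses
    return x
```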
---

### Pros & caveats (important)

**Pros**

* Inherently correct for discrete fields: it introduces no assumption that "nearby ids are similar"
* Especially suitable for protocol fields (function code, request/response type), since they can be predicted from context

**Caveats**

* High-cardinality discrete variables (e.g., addresses 0-65535) should not be masked over one huge softmax: it is hard to learn and the data is sparse.
  Better to split them into "region + offset", or bucket them first and then model discretely
* A sensible mask schedule is needed: do not mask too aggressively at the start, or the model can only guess blindly

---

---

## How to describe the two methods together (one-sentence version for the document)

For each event's feature vector, we split the fields into:

* continuous fields: generated by Gaussian-diffusion (DDPM) denoising (MSE loss)
* discrete fields: generated by mask-noise discrete denoising (classification recovery, CE loss)

A single shared backbone outputs both a continuous head and a discrete head, trained with a weighted sum:

\[
\mathcal{L} = \lambda\,\mathcal{L}_{\text{cont}} + (1-\lambda)\,\mathcal{L}_{\text{disc}}
\]

If you decide to implement this project, I suggest proceeding in these steps:

1. **Data preprocessing**: collect Modbus PCAPs and parse them into sequences of `(Timestamp, Source, Dest, Function, Register, Value)`.
2. **Graph construction**: build adjacency matrices from source/destination IPs and register addresses.
3. **Model changes**: modify STUnet's input/output layers to support multi-dimensional features (not just a traffic value).
4. **Mixed training**: train value prediction with an L1 loss and function-code classification with a cross-entropy loss.
5. **Validation**: verify not only JSD (distributional similarity) but also the **Protocol Validity Rate** and **Process Logic Consistency**.

This migration turns a model originally built to "predict human internet behavior" into a generator that "understands machine control logic", which is highly novel for industrial security and digital twins.
189
knowledges/draft-incomplete-methodology.md
Normal file
@@ -0,0 +1,189 @@
(Updated from your current draft; no benchmark-metric details are introduced here, as requested.)

## Methodology

Industrial control system (ICS) telemetry is intrinsically **mixed-type** and **mechanistically heterogeneous**: continuous process trajectories (e.g., sensor and actuator signals) coexist with discrete supervisory states (e.g., modes, alarms, interlocks), and the underlying generating mechanisms range from physical inertia to program-driven step logic. This heterogeneity is not cosmetic: it directly affects what “realistic” synthesis means, because a generator must jointly satisfy (i) temporal coherence, (ii) distributional fidelity, and (iii) discrete semantic validity (i.e., every discrete output must belong to its legal vocabulary by construction). These properties are emphasized broadly in operational-technology security guidance and ICS engineering practice, where state logic and physical dynamics are tightly coupled. [12]

We formalize each training instance as a fixed-length window of length \(L\), consisting of (i) continuous channels \(X\in\mathbb{R}^{L\times d_c}\) and (ii) discrete channels \(Y=\{y^{(j)}_{1:L}\}_{j=1}^{d_d}\), where each discrete variable \(y^{(j)}_t\in\mathcal{V}_j\) belongs to a finite vocabulary \(\mathcal{V}_j\). Our objective is to learn a generator that produces synthetic \((\hat{X},\hat{Y})\) that are simultaneously temporally coherent and distributionally faithful, while also ensuring \(\hat{y}^{(j)}_t\in\mathcal{V}_j\) for all \(j,t\) by construction (rather than via post-hoc rounding or thresholding).

A key empirical and methodological tension in ICS synthesis is that *temporal realism* and *marginal/distributional realism* can compete when optimized monolithically: sequence models trained primarily for regression often over-smooth heavy tails and intermittent bursts, while purely distribution-matching objectives can erode long-range structure. Diffusion models provide a principled route to rich distribution modeling through iterative denoising, but they do not, by themselves, resolve (i) the need for a stable low-frequency temporal scaffold, nor (ii) the discrete legality constraints for supervisory variables. [2,8] Recent time-series diffusion work further suggests that separating coarse structure from stochastic refinement can be an effective inductive bias for long-horizon realism. [6,7]

Motivated by these considerations, we propose **Mask-DDPM**, organized in the following order:

1. **Transformer trend module**: learns the dominant temporal backbone of continuous dynamics via attention-based sequence modeling. [1]
2. **Residual DDPM for continuous variables**: models distributional detail as stochastic residual structure conditioned on the learned trend. [2,6]
3. **Masked diffusion for discrete variables**: generates discrete ICS states with an absorbing/masking corruption process and categorical reconstruction. [3,4]
4. **Type-aware decomposition**: a **type-aware factorization and routing layer** that assigns variables to the most appropriate modeling mechanism and enforces deterministic constraints where warranted.

This ordering is intentional. The trend module establishes a macro-temporal scaffold; residual diffusion then concentrates capacity on micro-structure and marginal fidelity; masked diffusion provides a native mechanism for discrete legality; and the type-aware layer operationalizes the observation that not all ICS variables should be modeled with the same stochastic mechanism. Importantly, while diffusion-based generation for ICS telemetry has begun to emerge, existing approaches remain limited and typically emphasize continuous synthesis or augmentation; in contrast, our pipeline integrates (i) a Transformer-conditioned residual diffusion backbone, (ii) a discrete masked-diffusion branch, and (iii) explicit type-aware routing for heterogeneous variable mechanisms within a single coherent generator. [10,11]

---

## Transformer trend module for continuous dynamics

We instantiate the temporal backbone as a **causal Transformer** trend extractor, leveraging self-attention’s ability to represent long-range dependencies and cross-channel interactions without recurrence. [1] Compared with recurrent trend extractors (e.g., GRU-style backbones), a Transformer trend module offers a direct mechanism to model delayed effects and multivariate coupling, which are common in ICS, where control actions may influence downstream sensors with nontrivial lags and regime-dependent propagation. [1,12] Crucially, in our design the Transformer is *not* asked to be the entire generator; instead, it serves a deliberately restricted role: providing a stable, temporally coherent conditioning signal that later stochastic components refine.

For continuous channels \(X\), we posit an additive decomposition
\[
X = S + R,
\]
where \(S\in\mathbb{R}^{L\times d_c}\) is a smooth **trend** capturing predictable temporal evolution and \(R\in\mathbb{R}^{L\times d_c}\) is a **residual** capturing distributional detail (e.g., bursts, heavy tails, local fluctuations) that is difficult to represent robustly with a purely regression-oriented temporal objective. This separation reflects an explicit *division of labor*: the trend module prioritizes temporal coherence, while diffusion (introduced next) targets distributional realism at the residual level, a strategy aligned with “predict-then-refine” perspectives in time-series diffusion modeling. [6,7]

We parameterize the trend \(S\) using a causal Transformer \(f_{\phi}\). With teacher forcing, we train \(f_{\phi}\) to predict the next-step trend from past observations:
\[
\hat{S}_{t+1} = f_{\phi}(X_{1:t}), \qquad t=1,\dots,L-1,
\]
using the mean-squared error objective
\[
\mathcal{L}_{\text{trend}}(\phi)=\frac{1}{(L-1)d_c}\sum_{t=1}^{L-1}\left\| \hat{S}_{t+1} - X_{t+1}\right\|_2^2.
\]
At inference, we roll out the Transformer autoregressively to obtain \(\hat{S}\), and define the residual target for diffusion as \(R = X - \hat{S}\). This setup intentionally “locks in” a coherent low-frequency scaffold before any stochastic refinement is applied, thereby reducing the burden on downstream diffusion modules to simultaneously learn both long-range structure and marginal detail. In this sense, our use of Transformers is distinctive: it is a *conditioning-first* temporal backbone designed to stabilize mixed-type diffusion synthesis in ICS, rather than an end-to-end monolithic generator. [1,6,10]
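A minimal PyTorch sketch of such a causal trend extractor (model width, head count, and the toy shapes are illustrative assumptions, not the paper's configuration):

```python
import torch
import torch.nn as nn

class TrendTransformer(nn.Module):
    """Causal Transformer predicting the next-step trend from past observations."""

    def __init__(self, d_c: int, d_model: int = 64, nhead: int = 4, layers: int = 2):
        super().__init__()
        self.proj_in = nn.Linear(d_c, d_model)
        block = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(block, layers)
        self.proj_out = nn.Linear(d_model, d_c)

    def forward(self, x: torch.Tensor) -> torch.Tensor:     # x: (B, L, d_c)
        L = x.size(1)
        causal = torch.triu(torch.full((L, L), float("-inf"),
                                       device=x.device), diagonal=1)
        h = self.encoder(self.proj_in(x), mask=causal)       # no look-ahead
        return self.proj_out(h)

# Teacher forcing: the prediction at position t is trained against X_{t+1}.
f_phi = TrendTransformer(d_c=8)
X = torch.randn(16, 128, 8)                    # toy batch of windows
S_hat = f_phi(X[:, :-1])                       # predicts steps 2..L
loss_trend = ((S_hat - X[:, 1:]) ** 2).mean()  # the MSE objective above
loss_trend.backward()
```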
---

## DDPM for continuous residual generation

We model the residual \(R\) with a denoising diffusion probabilistic model (DDPM) conditioned on the trend \(\hat{S}\). [2] Diffusion models learn complex data distributions by inverting a tractable noising process through iterative denoising, and have proven effective at capturing multimodality and heavy-tailed structure that is often attenuated by purely regression-based sequence models. [2,8] Conditioning the diffusion model on \(\hat{S}\) is central: it prevents the denoiser from re-learning the low-frequency scaffold and focuses capacity on residual micro-structure, mirroring the broader principle that diffusion excels as a distributional corrector when a reasonable coarse structure is available. [6,7]

Let \(K\) denote the number of diffusion steps, with a noise schedule \(\{\beta_k\}_{k=1}^K\), \(\alpha_k = 1-\beta_k\), and \(\bar{\alpha}_k=\prod_{i=1}^k \alpha_i\). The forward corruption process is
\[
q(r_k\mid r_0)=\mathcal{N}\!\left(\sqrt{\bar{\alpha}_k}\,r_0,\ (1-\bar{\alpha}_k)\mathbf{I}\right),
\]
equivalently,
\[
r_k = \sqrt{\bar{\alpha}_k}\,r_0 + \sqrt{1-\bar{\alpha}_k}\,\epsilon,\qquad \epsilon\sim\mathcal{N}(0,\mathbf{I}),
\]
where \(r_0\equiv R\) and \(r_k\) is the noised residual at step \(k\).

The learned reverse process is parameterized as
\[
p_{\theta}(r_{k-1}\mid r_k,\hat{S})=\mathcal{N}\!\left(\mu_{\theta}(r_k,k,\hat{S}),\ \Sigma(k)\right),
\]
where \(\mu_\theta\) is implemented by a **Transformer denoiser** that consumes (i) the noised residual \(r_k\), (ii) a timestep embedding for \(k\), and (iii) conditioning features derived from \(\hat{S}\). This denoiser architecture is consistent with the growing use of attention-based denoisers for long-context time-series diffusion, while our key methodological emphasis is the *trend-conditioned residual* factorization as the object of diffusion learning. [2,7]

We train the denoiser using the standard DDPM \(\epsilon\)-prediction objective:
\[
\mathcal{L}_{\text{cont}}(\theta)
= \mathbb{E}_{k,r_0,\epsilon}\!\left[
\left\| \epsilon - \epsilon_{\theta}(r_k,k,\hat{S}) \right\|_2^2
\right].
\]
Because diffusion optimization can exhibit timestep imbalance (i.e., some timesteps dominate gradients), we optionally apply an SNR-based reweighting consistent with Min-SNR training:
\[
\mathcal{L}^{\text{snr}}_{\text{cont}}(\theta)
= \mathbb{E}_{k,r_0,\epsilon}\!\left[
w_k \left\| \epsilon - \epsilon_{\theta}(r_k,k,\hat{S}) \right\|_2^2
\right],
\qquad
w_k=\frac{\min(\mathrm{SNR}_k,\gamma)}{\mathrm{SNR}_k},
\]
where \(\mathrm{SNR}_k=\bar{\alpha}_k/(1-\bar{\alpha}_k)\) and \(\gamma>0\) is a cap parameter. [5]
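A minimal sketch of the Min-SNR weights under the schedule defined above (the step count and the cap \(\gamma\) are illustrative values):

```python
import torch

K = 600
betas = torch.linspace(1e-4, 0.02, K)           # noise schedule beta_k
alpha_bar = torch.cumprod(1.0 - betas, dim=0)   # cumulative product abar_k

snr = alpha_bar / (1.0 - alpha_bar)             # SNR_k = abar_k / (1 - abar_k)
gamma = 5.0                                     # cap parameter
w = torch.clamp(snr, max=gamma) / snr           # w_k = min(SNR_k, gamma) / SNR_k

# Per-sample weighted eps-prediction loss for a batch noised at steps k:
# loss = (w[k] * ((eps - eps_hat) ** 2).mean(dim=(1, 2))).mean()
```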
After sampling \(\hat{R}\) by reverse diffusion, we reconstruct the continuous output as
\[
\hat{X} = \hat{S} + \hat{R}.
\]
Overall, the DDPM component serves as a **distributional corrector** on top of a temporally coherent backbone, which is particularly suited to ICS where low-frequency dynamics are strong and persistent but fine-scale variability (including bursts and regime-conditioned noise) remains important for realism. Relative to prior ICS diffusion efforts that primarily focus on continuous augmentation, our formulation elevates *trend-conditioned residual diffusion* as a modular mechanism for disentangling temporal structure from distributional refinement. [10,11]

---

## Masked diffusion for discrete ICS variables

Discrete ICS variables must remain categorical, making Gaussian diffusion inappropriate for supervisory states and mode-like channels. While one can attempt continuous relaxations or post-hoc discretization, such strategies risk producing semantically invalid intermediate states (e.g., “in-between” modes) and can distort the discrete marginal distribution. Discrete-state diffusion provides a principled alternative by defining a valid corruption process directly on categorical variables. [3,4] In the ICS setting, this is not a secondary detail: supervisory tags often encode control logic boundaries (modes, alarms, interlocks) that must remain within a finite vocabulary to preserve semantic correctness. [12]

We therefore adopt **masked (absorbing) diffusion** for discrete channels, where corruption replaces tokens with a special \(\texttt{[MASK]}\) symbol according to a schedule. [4] For each variable \(j\), define a masking schedule \(\{m_k\}_{k=1}^K\) (with \(m_k\in[0,1]\) increasing in \(k\)). The forward corruption process is
\[
q(y^{(j)}_k \mid y^{(j)}_0)=
\begin{cases}
y^{(j)}_0, & \text{with probability } 1-m_k,\\
\texttt{[MASK]}, & \text{with probability } m_k,
\end{cases}
\]
applied independently across \(j\) and \(t\). Let \(\mathcal{M}\) denote the set of masked positions at step \(k\). The denoiser \(h_{\psi}\) predicts a categorical distribution over \(\mathcal{V}_j\) for each masked token, conditioned on (i) the corrupted discrete sequence, (ii) the diffusion step \(k\), and (iii) continuous context. Concretely, we condition on \(\hat{S}\) and (optionally) \(\hat{X}\) to couple supervisory reconstruction to the underlying continuous dynamics:
\[
p_{\psi}\!\left(y^{(j)}_0 \mid y_k, k, \hat{S}, \hat{X}\right)
= h_{\psi}(y_k,k,\hat{S},\hat{X}).
\]
This conditioning choice is motivated by the fact that many discrete ICS states are not standalone; they are functions of regimes, thresholds, and procedural phases that manifest in continuous channels. [12]

Training uses a categorical denoising objective:
\[
\mathcal{L}_{\text{disc}}(\psi)
= \mathbb{E}_{k}\!\left[
\frac{1}{|\mathcal{M}|}
\sum_{(j,t)\in\mathcal{M}}
\mathrm{CE}\!\left(
h_{\psi}(y_k,k,\hat{S},\hat{X})_{j,t},\
y^{(j)}_{0,t}
\right)
\right],
\]
where \(\mathrm{CE}(\cdot,\cdot)\) is cross-entropy. At sampling time, we initialize all discrete tokens as \(\texttt{[MASK]}\) and iteratively unmask them using the learned conditionals, ensuring that every output token lies in its legal vocabulary by construction. This discrete branch is a key differentiator of our pipeline: unlike typical continuous-only diffusion augmentation in ICS, we integrate masked diffusion as a first-class mechanism for supervisory-variable legality within the same end-to-end synthesis workflow. [4,10]
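A minimal sketch of this masked-position objective in PyTorch (the shapes, the `MASK_ID` convention, and the denoiser call are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

MASK_ID = 0  # assumed reserved token id for [MASK]

def masked_ce(logits: torch.Tensor, y0: torch.Tensor, yk: torch.Tensor) -> torch.Tensor:
    """Cross-entropy over masked positions only.

    logits: (B, L, V) denoiser output; y0: (B, L) clean tokens; yk: (B, L) corrupted.
    """
    masked = yk == MASK_ID                  # the set M of masked positions
    if not masked.any():
        return logits.sum() * 0.0           # keep the graph; nothing to reconstruct
    return F.cross_entropy(logits[masked], y0[masked])

# Usage: logits = h_psi(yk, k, S_hat, X_hat); loss_disc = masked_ce(logits, y0, yk)
```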
---

## Type-aware decomposition as a performance refinement layer

Even with a trend-conditioned residual DDPM and a discrete masked-diffusion branch, a single uniform modeling treatment can remain suboptimal because ICS variables are generated by qualitatively different mechanisms. For example, program-driven setpoints exhibit step-and-dwell dynamics; controller outputs follow control laws conditioned on process feedback; actuator positions may show saturation and dwell; and some “derived tags” are deterministic functions of other channels. Treating all channels as if they were exchangeable stochastic processes can misallocate model capacity and induce systematic error concentration on a small subset of mechanistically distinct variables. [12]

We therefore introduce a **type-aware decomposition** that formalizes this heterogeneity as a routing and constraint layer. Let \(\tau(i)\in\{1,\dots,6\}\) assign each variable \(i\) to a type class. The type assignment can be initialized from domain semantics (tag metadata, value domains, and engineering meaning), and subsequently refined via an error-attribution workflow described in the Benchmark section. Importantly, this refinement does **not** change the core diffusion backbone; it changes *which mechanism is responsible for which variable*, thereby aligning inductive bias with variable-generating mechanism while preserving overall coherence.

We use the following taxonomy:

* **Type 1 (program-driven / setpoint-like):** externally commanded, step-and-dwell variables. These variables can be treated as exogenous drivers (conditioning signals) or routed to specialized change-point / dwell-time models, rather than being forced into a smooth denoiser that may over-regularize step structure.
* **Type 2 (controller outputs):** continuous variables tightly coupled to feedback loops; these benefit from conditional modeling where the conditioning includes relevant process variables and commanded setpoints.
* **Type 3 (actuator states/positions):** often exhibit saturation, dwell, and rate limits; these may require stateful dynamics beyond generic residual diffusion, motivating either specialized conditional modules or additional inductive constraints.
* **Type 4 (process variables):** inertia-dominated continuous dynamics; these are the primary beneficiaries of the **Transformer trend + residual DDPM** pipeline.
* **Type 5 (derived/deterministic variables):** algebraic or rule-based functions of other variables; we enforce deterministic reconstruction \(\hat{x}^{(i)} = g_i(\hat{X},\hat{Y})\) rather than learning a stochastic generator, improving logical consistency and sample efficiency.
* **Type 6 (auxiliary/low-impact variables):** weakly coupled or sparse signals; we allow simplified modeling (e.g., calibrated marginals or lightweight temporal models) to avoid allocating diffusion capacity where it is not warranted.

Type-aware decomposition improves synthesis quality through three mechanisms. First, it improves **capacity allocation** by preventing a small set of mechanistically atypical variables from dominating gradients and distorting the learned distribution for the majority class (typically Type 4). Second, it enables **constraint enforcement** by deterministically reconstructing Type 5 variables, preventing logically inconsistent samples that purely learned generators can produce. Third, it improves **mechanism alignment** by attaching inductive biases consistent with step/dwell or saturation behaviors where generic denoisers may implicitly favor smoothness.

From a novelty standpoint, this layer is not merely an engineering “patch”; it is an explicit methodological statement that ICS synthesis benefits from **typed factorization**, a principle that has analogues in mixed-type generative modeling more broadly, but that remains underexplored in diffusion-based ICS telemetry synthesis. [9,10,12]

---

## Joint optimization and end-to-end sampling

We train the model in a staged manner consistent with the above factorization, which improves optimization stability and encourages each component to specialize in its intended role. Specifically: (i) we train the trend Transformer \(f_{\phi}\) to obtain \(\hat{S}\); (ii) we compute residual targets \(R=X-\hat{S}\) for the continuous variables routed to residual diffusion; (iii) we train the residual DDPM \(p_{\theta}(R\mid \hat{S})\) and masked diffusion model \(p_{\psi}(Y\mid \text{masked}(Y), \hat{S}, \hat{X})\); and (iv) we apply type-aware routing and deterministic reconstruction during sampling. This staged strategy is aligned with the design goal of separating temporal scaffolding from distributional refinement, and it mirrors the broader intuition in time-series diffusion that decoupling coarse structure and stochastic detail can mitigate “structure vs. realism” conflicts. [6,7]

A simple combined objective is
\[
\mathcal{L} = \lambda\,\mathcal{L}_{\text{cont}} + (1-\lambda)\,\mathcal{L}_{\text{disc}},
\]
with \(\lambda\in[0,1]\) controlling the balance between continuous and discrete learning. Type-aware routing determines which channels contribute to which loss and which are excluded in favor of deterministic reconstruction. In practice, this routing acts as a principled guardrail against negative transfer across variable mechanisms: channels that are best handled deterministically (Type 5) or by specialized drivers (Type 1/3, depending on configuration) are prevented from forcing the diffusion models into statistically incoherent compromises.

At inference time, generation follows the same structured order: (i) trend \(\hat{S}\) via the Transformer, (ii) residual \(\hat{R}\) via DDPM, (iii) discrete \(\hat{Y}\) via masked diffusion, and (iv) type-aware assembly with deterministic reconstruction for routed variables. This pipeline produces \((\hat{X},\hat{Y})\) that are temporally coherent by construction (through \(\hat{S}\)), distributionally expressive (through \(\hat{R}\) denoising), and discretely valid (through masked diffusion), while explicitly accounting for heterogeneous variable-generating mechanisms through type-aware routing. In combination, these choices constitute our central methodological contribution: a unified Transformer + mixed diffusion generator for ICS telemetry, augmented by typed factorization to align model capacity with domain mechanism. [2,4,10,12]

---

# References

[1] Vaswani, A., Shazeer, N., Parmar, N., et al. *Attention Is All You Need.* NeurIPS, 2017. arXiv:1706.03762. ([arXiv][1])
[2] Ho, J., Jain, A., Abbeel, P. *Denoising Diffusion Probabilistic Models.* NeurIPS, 2020. arXiv:2006.11239. ([arXiv][2])
[3] Austin, J., Johnson, D. D., Ho, J., Tarlow, D., van den Berg, R. *Structured Denoising Diffusion Models in Discrete State-Spaces (D3PM).* NeurIPS, 2021. arXiv:2107.03006. ([arXiv][3])
[4] Shi, J., Han, K., Wang, Z., Doucet, A., Titsias, M. K. *Simplified and Generalized Masked Diffusion for Discrete Data.* arXiv:2406.04329, 2024. ([arXiv][4])
[5] Hang, T., Wu, C., Zhang, H., et al. *Efficient Diffusion Training via Min-SNR Weighting Strategy.* arXiv:2303.09556, 2023. ([arXiv][5])
[6] Kollovieh, M., Ansari, A. F., Bohlke-Schneider, M., et al. *Predict, Refine, Synthesize: Self-Guiding Diffusion Models for Probabilistic Time Series Forecasting (TSDiff).* arXiv:2307.11494, 2023. ([arXiv][6])
[7] Sikder, M. F., Ramachandranpillai, R., Heintz, F. *TransFusion: Generating Long, High Fidelity Time Series using Diffusion Models with Transformers.* arXiv:2307.12667, 2023. ([arXiv][7])
[8] Song, Y., Sohl-Dickstein, J., Kingma, D. P., et al. *Score-Based Generative Modeling through Stochastic Differential Equations.* ICLR, 2021. arXiv:2011.13456. ([arXiv][8])
[9] Zhang, H., Zhang, J., Li, J., et al. *TabDiff: a Mixed-type Diffusion Model for Tabular Data Generation.* arXiv:2410.20626, 2024. ([arXiv][9])
[10] Yuan, H., Sha, K., Zhao, W. *CTU-DDPM: Conditional Transformer U-net DDPM for Industrial Control System Anomaly Data Augmentation.* ACM AICSS, 2025. DOI:10.1145/3776759.3776845.
[11] Sha, K., et al. *DDPM Fusing Mamba and Adaptive Attention: An Augmentation Method for Industrial Control Systems Anomaly Data.* SSRN, posted Jan 10, 2026. (SSRN 6055903). ([SSRN][10])
[12] NIST. *Guide to Operational Technology (OT) Security (SP 800-82r3).* 2023. ([NIST Computer Security Resource Center][11])

[1]: https://arxiv.org/abs/1706.03762 "https://arxiv.org/abs/1706.03762"
[2]: https://arxiv.org/abs/2006.11239 "https://arxiv.org/abs/2006.11239"
[3]: https://arxiv.org/abs/2107.03006 "https://arxiv.org/abs/2107.03006"
[4]: https://arxiv.org/abs/2406.04329 "https://arxiv.org/abs/2406.04329"
[5]: https://arxiv.org/abs/2303.09556 "https://arxiv.org/abs/2303.09556"
[6]: https://arxiv.org/abs/2307.11494 "https://arxiv.org/abs/2307.11494"
[7]: https://arxiv.org/abs/2307.12667 "https://arxiv.org/abs/2307.12667"
[8]: https://arxiv.org/abs/2011.13456 "https://arxiv.org/abs/2011.13456"
[9]: https://arxiv.org/abs/2410.20626 "https://arxiv.org/abs/2410.20626"
[10]: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6055903 "https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6055903"
[11]: https://csrc.nist.gov/pubs/sp/800/82/r3/final "https://csrc.nist.gov/pubs/sp/800/82/r3/final"
918
knowledges/mask-ddpm.md
Normal file
@@ -0,0 +1,918 @@
# mask-ddpm Project Manual (internal long-form edition | ~3,000 characters)

> Goal: in a mixed "paper narrative + manual" style, fully document the project's background, method, workflow, evaluation, limitations, and future directions.
> Audience: readers unfamiliar with diffusion models / time-series modeling should be able to follow.
> Note: no concrete implementation code here; concepts and methods only.

---

## Abstract

Industrial control system (ICS) data is multivariate, strongly temporal, and mixed-type. Traditional generative models usually satisfy either distributional fidelity or temporal fidelity, but rarely both. We propose a **two-stage hybrid diffusion framework (mask-ddpm)**: the first stage learns the temporal trend; the second stage uses a diffusion model to learn the residual distribution, while discrete variables are handled with mask-and-recover diffusion. To address the problem of a few variables dominating the KS metric, we further propose a **type-aware divide-and-conquer strategy** that separates program-driven variables from process variables, backed by a complete diagnosis and evaluation suite. Experiments show the framework markedly improves distributional consistency while retaining temporal structure, and offers interpretable error localization.

---

## 1. Background and motivation (why this problem is hard)

Industrial data differs fundamentally from ordinary time series:

1) **Physical inertia**: many variables change slowly; the previous moment strongly constrains the next.
2) **Program-driven signals**: some variables come from operators/schedulers and show "step + long dwell" patterns.
3) **Mixed types**: continuous sensors coexist with discrete states, which must be handled separately.

Using a single model for all variables causes visible problems:
- to match distributions, the model sacrifices temporal structure → lag-1 degrades
- to keep temporal stability, the model blurs distributions → KS rises
- treating discrete variables as continuous yields illegal values → JSD rises

A structured modeling strategy is therefore needed to "split the conflict apart".

---

## 2. Problem formulation

Given a real sequence \(X \in \mathbb{R}^{T\times D}\), where:
- \(T\): sequence length
- \(D\): variable dimension (continuous + discrete)

the goal is to learn a generator \(G\) producing \(\hat{X}\) such that:

**Distributional consistency**:
\[
\forall i,\; F^{(i)}_{gen}(x) \approx F^{(i)}_{real}(x)
\]

**Temporal consistency**:
\[
\rho_1(\hat{x}^{(i)}) \approx \rho_1(x^{(i)})
\]

**Discrete legality**:
\[
\hat{x}^{(j)} \in \mathcal{V}_j
\]

where \(\rho_1\) denotes the lag-1 correlation and \(\mathcal{V}_j\) is the legal vocabulary of discrete variable \(j\).

---

## 3. Method overview

The core design is **two stages + hybrid diffusion**:

### Stage 1: temporal trend modeling
A sequence model learns the "overall trend". Intuitively, this stage only answers "how the sequence moves".

### Stage 2: distributional residual modeling
A diffusion model learns the "residual beyond the trend" and corrects the distribution shape. Intuitively, this stage only answers "whether the values look right".

This decoupling keeps "temporal consistency" and "distributional consistency" from clashing directly.

---

## 4. Model details

### 4.1 Continuous-variable diffusion (DDPM)

Diffusion process:
\[
x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon,\quad \epsilon \sim \mathcal{N}(0,I)
\]

**Explanation**:
- \(x_0\) is the real residual
- \(x_t\) is the noised data
- the model learns to recover \(x_0\) or \(\epsilon\) from \(x_t\)

### 4.2 Discrete-variable diffusion (mask diffusion)

Discrete variables are modeled by "mask + recover":
- randomly mask some tokens
- learn to predict the original tokens

This guarantees discrete outputs that are **legal and interpretable**.

### 4.3 Temporal module (Temporal GRU)

A GRU learns the trend of the continuous sequences:
- input: continuous sequences
- output: trend sequences
- used to build residuals: \(x_{resid} = x - trend\)

Intuitively:
- the GRU handles the "direction"
- diffusion handles the "detail corrections"

---

## 5. Loss design

### 5.1 Continuous loss

\[
\mathcal{L}_{cont} =
\begin{cases}
\|\hat{\epsilon}-\epsilon\|^2 & \text{(noise prediction)}\\
\|\hat{x}_0-x_0\|^2 & \text{(original-value prediction)}
\end{cases}
\]

**Explanation**: measures "how close the denoised result is to the real residual".

### 5.2 Discrete loss

\[
\mathcal{L}_{disc} = \frac{1}{|\mathcal{M}|}\sum_{(i,t)\in\mathcal{M}} CE(\hat{p}_{i,t}, y_{i,t})
\]

**Explanation**: cross-entropy only on the masked tokens, ensuring correct discrete predictions.

### 5.3 Total loss

\[
\mathcal{L} = \lambda \mathcal{L}_{cont} + (1-\lambda)\mathcal{L}_{disc}
\]

**Explanation**: \(\lambda\) sets the trade-off between distributional and discrete learning.

### 5.4 Optional enhancements

**SNR weighting**:
\[
\mathcal{L}_{snr} = \frac{\text{SNR}_t}{\text{SNR}_t+\gamma}\,\mathcal{L}_{cont}
\]

**Quantile alignment**:
\[
\mathcal{L}_Q = \frac{1}{K}\sum_k \|Q_k(x_{real}) - Q_k(x_{gen})\|_1
\]

These terms help lower KS but may sacrifice temporal consistency.

---

## 6. Evaluation metrics

### 6.1 KS (distributional consistency)
\[
KS_i = \sup_x |F^{(i)}_{gen}(x) - F^{(i)}_{real}(x)|
\]

**Explanation**: smaller means the distributions are closer.

### 6.2 JSD (discrete consistency)
\[
JSD(P,Q)=\tfrac12 KL(P\|M)+\tfrac12 KL(Q\|M)
\]

**Explanation**: measures discrete-distribution discrepancy.

### 6.3 Lag-1 Diff (temporal consistency)
\[
\Delta_{lag1} = \frac{1}{d}\sum_i |\rho_1(\hat{x}_i) - \rho_1(x_i)|
\]

**Explanation**: measures the deviation in adjacent-step correlation. A minimal sketch of all three metrics follows.
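A minimal NumPy sketch of the three metrics (shapes are illustrative: per-channel 1-D samples for KS, histograms for JSD, and `(T, d)` windows for the lag-1 gap):

```python
import numpy as np

def ks_stat(real: np.ndarray, gen: np.ndarray) -> float:
    """KS = sup_x |F_gen(x) - F_real(x)|, via the pooled empirical CDFs."""
    grid = np.sort(np.concatenate([real, gen]))
    f_real = np.searchsorted(np.sort(real), grid, side="right") / len(real)
    f_gen = np.searchsorted(np.sort(gen), grid, side="right") / len(gen)
    return float(np.abs(f_real - f_gen).max())

def jsd(p: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> float:
    """JSD(P, Q) = 0.5 KL(P||M) + 0.5 KL(Q||M) with M = (P + Q) / 2."""
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    def kl(a, b):
        return float(np.sum(a * np.log((a + eps) / (b + eps))))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def lag1_diff(real: np.ndarray, gen: np.ndarray) -> float:
    """Mean absolute gap in lag-1 autocorrelation, averaged over channels."""
    def rho1(x):
        return np.array([np.corrcoef(c[:-1], c[1:])[0, 1] for c in x.T])
    return float(np.abs(rho1(real) - rho1(gen)).mean())
```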
---

## 7. Diagnosis and divide-and-conquer (type-aware)

In practice, a handful of variables dominate KS:
- setpoint/demand (program-driven)
- actuator (saturation/dwell)
- derived tags (deterministic mappings)

We therefore propose the type-aware split (a routing sketch follows the table):

| Type | Characteristic | Treatment |
|------|------|---------|
| Type1 | program-driven | dedicated model / resampling |
| Type2 | controller output | conditional generation |
| Type3 | actuator position | state + dwell model |
| Type4 | PV process variable | diffusion modeling |
| Type5 | derived variable | deterministic mapping |
| Type6 | auxiliary variable | simple statistical model |

The current implementation is a **KS-only baseline** (empirical resampling), used only to probe the diagnostic upper bound.
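A minimal sketch of type-aware routing as a plain lookup (the tag names, type assignments, and handler names are all illustrative assumptions, not the project's configuration):

```python
# Illustrative tag -> type-class assignments, following the taxonomy above.
TYPE_OF = {"setpoint_A": 1, "pid_out_B": 2, "valve_C": 3, "temp_D": 4, "flow_sum": 5}

# Which generation mechanism is responsible for which type class.
HANDLERS = {
    1: "empirical_resampler",    # program-driven: resample / dedicated model
    2: "conditional_generator",  # controller outputs
    3: "state_dwell_model",      # actuator positions
    4: "trend_plus_ddpm",        # process variables: the main pipeline
    5: "deterministic_map",      # derived tags: x_hat = g(X_hat, Y_hat)
    6: "marginal_model",         # auxiliary variables
}

def route(tag: str) -> str:
    """Return the handler name responsible for generating this tag."""
    return HANDLERS[TYPE_OF.get(tag, 4)]  # default: treat as a process variable
```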
---
|
||||
|
||||
## 8. 方法优势与局限
|
||||
|
||||
### 优势
|
||||
- 将时序与分布解耦,训练更稳定
|
||||
- 连续/离散混合扩散保证合法性
|
||||
- 诊断工具完整,可定位问题变量
|
||||
|
||||
### 局限
|
||||
- Type1/2/3 仍是难点
|
||||
- KS 与时序存在 trade‑off
|
||||
- KS‑only baseline 破坏联合分布
|
||||
|
||||
---
|
||||
|
||||
## 9. 未来方向
|
||||
|
||||
1) Type1/2/3 条件生成器替代重采样
|
||||
2) Type4 加工况 embedding
|
||||
3) 增加跨变量相关性评估
|
||||
|
||||
---
|
||||
|
||||
## 10. 结论
|
||||
|
||||
mask‑ddpm 通过“两阶段趋势 + 扩散残差”实现了更平衡的 ICS 序列生成框架。项目已经在大多数变量上取得合理分布对齐,但仍需针对程序驱动变量进行专门建模。该框架提供了一个清晰可扩展的研究路径。
|
||||
|
||||
---
|
||||
|
||||
## 附录:公式解释简表
|
||||
|
||||
| 公式 | 含义 |
|
||||
|------|------|
|
||||
| DDPM 加噪 | 描述连续变量如何被噪声腐蚀 |
|
||||
| 连续损失 | 衡量去噪结果与真实残差距离 |
|
||||
| 离散交叉熵 | 衡量遮蔽 token 是否正确恢复 |
|
||||
| KS | 分布最大偏差 |
|
||||
| JSD | 离散分布偏差 |
|
||||
| Lag‑1 Diff | 时序相关差异 |
|
||||
|
||||
|
||||
---
|
||||
|
||||
# Part 1 / 4 — 方法与理论基础(约 5k)
|
||||
|
||||
> 本部分重点:把“这个方法为什么合理”讲清楚,建立读者对整体框架的直觉与理论基础。
|
||||
|
||||
## 1.1 工业时序生成的独特性
|
||||
|
||||
工业控制系统(ICS)数据和常规时序数据(如股票、天气)有本质差别。最关键的三点是:
|
||||
|
||||
**(a) 强物理约束**
|
||||
许多过程变量受到物理规律约束,比如液位变化不可能突然跳跃,压力和温度的变化速度有上限。这意味着模型必须尊重“惯性”与“平滑性”。
|
||||
|
||||
**(b) 强程序驱动**
|
||||
工业系统中存在大量“外部控制信号”,例如设定值(setpoint)、需求(demand)、运行模式。这些变量不是由过程自然演化产生,而是外部程序/操作者指令造成。它们往往呈现“阶跃+停留”模式。
|
||||
|
||||
**(c) 混合变量类型**
|
||||
同时存在连续变量(传感器、控制量)和离散变量(状态、模式、告警)。如果把离散变量强行当连续变量,会造成语义错误,例如出现“介于运行/停机之间”的无意义状态。
|
||||
|
||||
因此,工业时序生成不是单纯的“预测下一步”,而是需要同时兼顾分布、时序、离散合法性。
|
||||
|
||||
---
|
||||
|
||||
## 1.2 为什么单一模型会失败
|
||||
|
||||
如果用单一的序列模型(例如 LSTM/Transformer)去同时优化分布与时序,通常会出现以下冲突:
|
||||
|
||||
- **分布损失优化 → 时序退化**
|
||||
模型为了匹配每个变量的分布,会倾向于忽略时间结构,生成“看起来像分布,但像噪声”的序列。
|
||||
|
||||
- **时序损失优化 → 分布退化**
|
||||
模型为了保持时序连续性,生成的数值往往过于平滑,导致分布偏离真实数据(尤其重尾、尖峰)。
|
||||
|
||||
- **离散变量错误**
|
||||
离散变量被当作连续变量,会出现非法 token,导致 JSD 增大。
|
||||
|
||||
因此,必须引入结构性拆分:让不同模块只负责自己擅长的部分。
|
||||
|
||||
---
|
||||
|
||||
## 1.3 设计原则:解耦与专职化
|
||||
|
||||
本项目遵循三条核心原则:
|
||||
|
||||
**原则 1:解耦“时序”与“分布”**
|
||||
- 时序结构由 Temporal 模块负责
|
||||
- 分布细节由 Diffusion 模块负责
|
||||
|
||||
**原则 2:连续与离散分开建模**
|
||||
- 连续变量适合高斯扩散
|
||||
- 离散变量适合 mask‑diffusion
|
||||
|
||||
**原则 3:对难学变量分类处理**
|
||||
- 程序驱动变量(Type1)不适合 DDPM
|
||||
- 执行器变量(Type3)需要额外模型
|
||||
|
||||
这些原则构成了本项目的总体方法论。
|
||||
|
||||
---
|
||||
|
||||
## 1.4 两阶段框架的理论直觉
|
||||
|
||||
两阶段方法可以理解为:
|
||||
|
||||
\[
|
||||
X = \text{Trend}(X) + \text{Residual}(X)
|
||||
\]
|
||||
|
||||
**趋势部分**:决定序列“怎么走”,例如逐渐上升/下降,保持物理惯性。
|
||||
**残差部分**:决定数值“长什么样”,例如分布形状、重尾、异常点等。
|
||||
|
||||
如果直接用一个模型去学 \(X\),它必须同时解决趋势与分布问题。而我们拆成两部分后:
|
||||
- 第一阶段(GRU)只学趋势
|
||||
- 第二阶段(Diffusion)只学残差
|
||||
|
||||
这样能显著减少“时序 vs 分布”的冲突。
|
||||
|
||||
---
|
||||
|
||||
## 1.5 扩散模型为什么适合分布对齐
|
||||
|
||||
扩散模型(DDPM)是一类生成式模型,核心思想是:
|
||||
1) 逐步加噪,把数据变成纯噪声
|
||||
2) 逐步去噪,恢复真实分布
|
||||
|
||||
数学形式:
|
||||
\[
|
||||
x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon
|
||||
\]
|
||||
|
||||
去噪的目标是预测 \(\epsilon\) 或 \(x_0\)。
|
||||
|
||||
**优点**:
|
||||
- 可以拟合复杂、多峰、重尾分布
|
||||
- 不依赖显式的概率密度函数
|
||||
|
||||
这正是 ICS 数据中“分布复杂”的现实需求。
|
||||
|
||||
---
|
||||
|
||||
## 1.6 为什么离散变量要用 mask‑diffusion
|
||||
|
||||
离散变量的关键问题是“语义合法性”。如果直接用连续扩散,会生成不合法值(例如 0.3、1.7 等),导致模型失效。
|
||||
|
||||
mask‑diffusion 的思路:
|
||||
1) 随机遮蔽部分 token
|
||||
2) 模型预测被遮蔽 token
|
||||
|
||||
损失是交叉熵:
|
||||
\[
|
||||
\mathcal{L}_{disc} = \frac{1}{|\mathcal{M}|}\sum_{(i,t)\in\mathcal{M}} CE(\hat{p}_{i,t},y_{i,t})
|
||||
\]
|
||||
|
||||
这样保证输出必然来自 vocab,语义上合法。
|
||||
|
||||
---
|
||||
|
||||
## 1.7 为什么需要 Type‑aware 分治
|
||||
|
||||
即使有两阶段 + 混合扩散,仍然会发现 KS 被少数变量主导:
|
||||
|
||||
- Type1(setpoint/demand)几乎是“程序信号”
|
||||
- Type3(执行器)具有“停留+饱和”特性
|
||||
- Type5(派生变量)是确定性函数
|
||||
|
||||
这些变量从统计机制上就不适合扩散模型,因此需要额外设计。
|
||||
|
||||
Type‑aware 的意义是:
|
||||
- 把“扩散学不好的变量”剥离出去
|
||||
- 让扩散专注于真正适合它的变量(Type4)
|
||||
|
||||
---
|
||||
|
||||
## 1.8 指标设计的意义(直觉解释)
|
||||
|
||||
我们采用三类核心指标:
|
||||
|
||||
**(a) KS:分布对齐**
|
||||
衡量生成分布与真实分布最大差距,越小越好。
|
||||
|
||||
**(b) JSD:离散一致性**
|
||||
衡量离散状态频率是否接近真实。
|
||||
|
||||
**(c) Lag‑1 Diff:时序一致性**
|
||||
衡量相邻时刻相关结构是否接近真实。
|
||||
|
||||
这三个指标对应三个核心目标:分布、时序、合法性。
|
||||
|
||||
---
|
||||
|
||||
## 1.9 理论层面的 trade‑off
|
||||
|
||||
可以理解为:
|
||||
|
||||
- KS 低 → 表示分布好
|
||||
- Lag‑1 diff 低 → 表示时序好
|
||||
|
||||
但在实践中:
|
||||
- 如果模型过度优化 KS,序列可能变成“随机抽样” → 时序崩坏
|
||||
- 如果模型过度优化时序,分布会过于平滑 → KS 上升
|
||||
|
||||
因此,一个好的模型必须在二者之间找到平衡点。这正是本项目提出两阶段框架的核心动机。
|
||||
|
||||
---
|
||||
|
||||
## 1.10 小结
|
||||
|
||||
本部分给出了方法的理论基础:
|
||||
|
||||
1) ICS 数据复杂且混合类型,单一模型难以兼顾
|
||||
2) 两阶段结构用于解耦时序与分布
|
||||
3) 混合扩散用于处理连续与离散
|
||||
4) Type‑aware 分治用于处理“扩散难学”的变量
|
||||
|
||||
下一部分将详细介绍**训练流程与损失设计**。
|
||||
|
||||
|
||||
---
|
||||
|
||||
# Part 2 / 4 — 训练流程与损失设计(约 5k)
|
||||
|
||||
> 本部分重点:让普通读者清楚“模型训练到底在做什么、每一步优化的目标是什么、为什么需要这些 loss”。
|
||||
|
||||
## 2.1 训练流程总览(高层视角)
|
||||
|
||||
整个训练分为两个阶段:
|
||||
|
||||
1) **Temporal Stage**:训练一个时序模型(GRU),学习序列趋势。
|
||||
2) **Diffusion Stage**:训练扩散模型,学习趋势残差的分布。
|
||||
|
||||
换句话说,训练不是一次性完成,而是先学“怎么变化”,再学“变化后的数值细节”。
|
||||
|
||||
---
|
||||
|
||||
## 2.2 Temporal Stage:学趋势的意义
|
||||
|
||||
工业过程的动态通常平滑且具有惯性。假设我们直接用扩散模型学习整个序列,那么它必须同时完成“序列走向”和“数值分布”的建模任务。这样会产生冲突:
|
||||
|
||||
- 过度优化分布 → 序列变随机
|
||||
- 过度优化时序 → 分布变窄
|
||||
|
||||
Temporal Stage 的目的就是先把“走向”固定住。
|
||||
|
||||
### 训练方式(直觉)
|
||||
- 输入:连续变量序列
|
||||
- 目标:预测下一时刻的连续序列
|
||||
- 损失:MSE
|
||||
|
||||
这样模型学到的是“趋势结构”,而不是复杂分布。
---

## 2.3 Diffusion Stage: learning the residual distribution

After the Temporal Stage yields the trend sequence \(T\), we form the residual:
\[
R = X - T
\]

The Diffusion Stage fits the residual distribution, so the final generated result is:
\[
\hat{X} = \hat{R} + T
\]

**Intuition**:
- The trend provides the skeleton
- The residual provides the "real noise and detail"

---

## 2.4 Continuous-branch loss design

### Base loss
In diffusion we can choose to predict the noise \(\epsilon\), or the original \(x_0\) directly:

\[
\mathcal{L}_{cont} =
\begin{cases}
\|\hat{\epsilon} - \epsilon\|^2 & \text{(eps prediction)}\\
\|\hat{x}_0 - x_0\|^2 & \text{(x0 prediction)}
\end{cases}
\]

### Why predict eps or x0?
- eps prediction is more stable (the original diffusion formulation)
- x0 prediction makes distribution correction easier

We therefore switch between them across experiments.
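A minimal sketch of the two parameterizations, reusing `q_sample` from the earlier forward-process snippet (`model` stands for any predictor taking `(x_t, t)`; this is illustrative only):

```python
import torch
import torch.nn.functional as F

def diffusion_loss(model, r0: torch.Tensor, alpha_bar: torch.Tensor, predict: str = "eps"):
    """Continuous-branch loss on residuals r0, under 'eps' or 'x0' parameterization."""
    t = torch.randint(0, len(alpha_bar), (r0.size(0),), device=r0.device)
    rt, eps = q_sample(r0, t, alpha_bar)   # forward-noised residuals + true noise
    out = model(rt, t)
    if predict == "eps":
        return F.mse_loss(out, eps)        # ||eps_hat - eps||^2
    return F.mse_loss(out, r0)             # x0 parameterization: ||x0_hat - x0||^2
```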
---

## 2.5 Discrete-branch loss design

Discrete variables use mask-diffusion. During training, a random subset of tokens is masked:

\[
\mathcal{L}_{disc} = \frac{1}{|\mathcal{M}|}\sum_{(i,t)\in\mathcal{M}} CE(\hat{p}_{i,t},y_{i,t})
\]

Intuition:
- Cross-entropy is computed only over masked tokens
- The model learns to recover the correct state from context

This avoids invalid outputs for discrete variables.

---

## 2.6 Total loss and the trade-off

The final loss is a linear combination:
\[
\mathcal{L} = \lambda\mathcal{L}_{cont} + (1-\lambda)\mathcal{L}_{disc}
\]

**Role of \(\lambda\)**:
- Large \(\lambda\) → better continuous distributions, discrete side may degrade
- Small \(\lambda\) → better discrete side, continuous distributions may drift

This is the key trade-off parameter during training.

---

## 2.7 SNR weighting (why it is needed)

Early diffusion steps are high-noise, late steps low-noise. A uniform loss across steps makes training unbalanced.
We therefore introduce an SNR weight:

\[
\mathcal{L}_{snr} = \frac{\text{SNR}_t}{\text{SNR}_t+\gamma}\mathcal{L}_{cont}
\]

Intuition (see the sketch below):
- Down-weight the loss at high-noise steps
- Emphasize fine structure at low-noise steps
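A minimal sketch of this weighting, assuming \(\text{SNR}_t = \bar{\alpha}_t / (1-\bar{\alpha}_t)\) (the γ value is an illustrative choice):

```python
import torch

def snr_weight(alpha_bar: torch.Tensor, t: torch.Tensor, gamma: float = 5.0) -> torch.Tensor:
    """Per-step weight SNR_t / (SNR_t + gamma), with SNR_t = ab_t / (1 - ab_t)."""
    ab = alpha_bar[t]
    snr = ab / (1.0 - ab)
    return snr / (snr + gamma)

# Illustrative usage inside a training step (per-sample weighting before the mean):
# w = snr_weight(alpha_bar, t)                                    # [B]
# loss = (w * ((out - eps) ** 2).flatten(1).mean(dim=1)).mean()
```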
---

## 2.8 Quantile loss (why it helps KS)

KS fundamentally compares distribution CDFs.
We therefore add a quantile loss:

\[
\mathcal{L}_Q = \frac{1}{K}\sum_k \|Q_k(x_{real}) - Q_k(x_{gen})\|_1
\]

Intuition (a sketch follows):
- Aligns the model on the key quantiles of the distribution
- Especially improves heavy-tailed and spiky variables
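A minimal sketch of a batch-level quantile loss under these definitions (the quantile grid is an illustrative choice):

```python
import torch

def quantile_loss(x_real: torch.Tensor, x_gen: torch.Tensor, K: int = 19) -> torch.Tensor:
    """L1 gap between K evenly spaced quantiles of real vs. generated batches.

    x_real, x_gen: [B, T, F]; quantiles are computed per feature over batch*time.
    """
    q = torch.linspace(0.05, 0.95, K, device=x_real.device)
    qr = torch.quantile(x_real.reshape(-1, x_real.size(-1)), q, dim=0)  # [K, F]
    qg = torch.quantile(x_gen.reshape(-1, x_gen.size(-1)), q, dim=0)   # [K, F]
    return (qr - qg).abs().mean()
```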
---

## 2.9 Residual statistics constraint (preventing collapse)

To keep the residual distribution from collapsing, we add mean/std alignment:

\[
\mathcal{L}_{stat} = \|\mu_{gen}-\mu_{real}\|^2 + \|\sigma_{gen}-\sigma_{real}\|^2
\]

Explanation:
- Constrains the residual mean and variance
- Prevents the model from learning overly smooth residuals

---

## 2.10 Why training is slow (in depth)

1) **Two training stages**
- Temporal training + Diffusion training
- Training cost nearly doubles

2) **Many diffusion steps**
- timesteps=600
- Every training/sampling pass iterates repeatedly

3) **Transformer backbone**
- Self-attention cost grows quadratically with sequence length
- At seq_len=128 the memory and time overhead is significant

---

## 2.11 Common training anomalies and their causes

**(a) Loss decreases slowly**
- Cause: diffusion training is inherently slow
- Fix: reduce timesteps or increase the batch size

**(b) Loss fluctuates heavily**
- Cause: randomness of noise sampling + distribution complexity
- Fix: add EMA or lower the learning rate

**(c) Poor lag-1**
- Cause: under-trained trend or a seq_len mismatch
- Fix: raise temporal_epochs and keep seq_len consistent with sample_seq_len

---

## 2.12 Summary

This part highlighted the core training logic:
- The Temporal Stage handles temporal structure
- The Diffusion Stage handles the residual distribution
- The loss design balances distribution and temporal structure through multiple terms

The next part covers the evaluation and diagnostics system in depth.

---

# Part 3 / 4 — Evaluation and Diagnostics (≈5k words)

> Focus of this part: help the reader understand how to judge model quality and why diagnostic tooling is necessary.

## 3.1 Why loss alone is not enough

Training loss only shows that the model is fitting its training objective; it does not guarantee real quality.
In this project the loss can drop while KS or lag-1 stays poor, because:
- The loss mainly optimizes local errors, while KS measures global distribution differences
- The model may learn the region "around the mean" while badly missing the tails
- Discrete variables can be mispredicted while the loss stays small

We therefore need evaluation metrics that sit closer to the actual goals.

---

## 3.2 Distribution consistency metric (KS)

The KS statistic is defined as:
\[
KS_i = \sup_x |F^{(i)}_{gen}(x)-F^{(i)}_{real}(x)|
\]

**Definitions**:
- \(F_{gen}\): the CDF of the generated distribution
- \(F_{real}\): the CDF of the real distribution
- KS is the maximum distance between the two CDF curves

**Intuition**:
- Small KS → the generated distribution nearly coincides with the real one
- Large KS → a clear distribution shift

**Caveat**: on spiky/discretized variables a tie-aware KS must be used, otherwise the statistic is overestimated.
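A minimal per-feature KS sketch using SciPy's two-sample test (the repo's tie-aware variant may differ in details):

```python
import numpy as np
from scipy.stats import ks_2samp

def per_feature_ks(real: np.ndarray, gen: np.ndarray) -> np.ndarray:
    """Two-sample KS statistic per feature; real/gen are [N, d] sample arrays."""
    return np.array([
        ks_2samp(real[:, i], gen[:, i]).statistic for i in range(real.shape[1])
    ])
```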
---

## 3.3 Discrete consistency metric (JSD)

Jensen-Shannon Divergence:
\[
JSD(P,Q)=\tfrac12 KL(P\|M)+\tfrac12 KL(Q\|M)
\]
where \(M = (P+Q)/2\).

**Interpretation** (a sketch follows):
- Measures the difference between discrete distributions
- Well suited to state-like variables
- Lower is better
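A minimal JSD sketch over state-frequency histograms (illustrative):

```python
import numpy as np

def jsd(p: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> float:
    """Jensen-Shannon divergence between two discrete distributions (natural log)."""
    p = np.asarray(p, dtype=float); q = np.asarray(q, dtype=float)
    p = p / p.sum(); q = q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: float(np.sum(a * np.log((a + eps) / (b + eps))))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Usage: build state-frequency histograms over the same vocab, then compare, e.g.
# jsd(np.bincount(real_tokens, minlength=V), np.bincount(gen_tokens, minlength=V))
```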
---

## 3.4 Temporal consistency metric (Lag-1 Diff)

Lag-1 correlation difference:
\[
\Delta_{lag1} = \frac{1}{d}\sum_i |\rho_1(\hat{x}_i)-\rho_1(x_i)|
\]

**Interpretation** (a sketch follows):
- Computes each variable's adjacent-step correlation
- Measures whether the "temporal inertia" is right
- Lower is better
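A minimal sketch of this metric (`x` is a [T, d] array; illustrative):

```python
import numpy as np

def lag1_corr(x: np.ndarray) -> np.ndarray:
    """Lag-1 autocorrelation per feature; x is [T, d]."""
    a, b = x[:-1], x[1:]
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    denom = a.std(axis=0) * b.std(axis=0) + 1e-12
    return (a * b).mean(axis=0) / denom

def lag1_diff(real: np.ndarray, gen: np.ndarray) -> float:
    """Mean absolute gap in lag-1 autocorrelation across features."""
    return float(np.abs(lag1_corr(gen) - lag1_corr(real)).mean())
```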
---

## 3.5 Why diagnostics are needed

Even when the average KS drops, it may be that some variables are very good while others are very bad.
That can mislead the optimization direction, so we must diagnose:

- Which variables are worst?
- In which region do their distributions differ?
- Which type do they belong to?

---

## 3.6 Diagnostic tool 1: per-feature KS ranking

`ranked_ks.py` outputs:
- Variables sorted by KS from high to low
- A view of the top-N variables that dominate the problem

**Value**:
- Pinpoints the bottleneck precisely
- Avoids blind model changes

---

## 3.7 Diagnostic tool 2: CDF plots

`diagnose_ks.py` plots CDF comparisons:
- Real vs. generated
- Directly reveals missing tails, median shifts, or saturation pile-ups

This is the most direct way to understand a high KS.

---

## 3.8 Diagnostic tool 3: Filtered KS

Filtered KS excludes:
- Variables with zero standard deviation
- Variables with extreme, anomalous KS

**Value**:
- Judges whether the overall model is already reasonable
- Demonstrates when KS is dragged down by a handful of variables

---

## 3.9 Type-wise evaluation

To match the Type-aware divide-and-conquer strategy, we also report:
- Mean KS for Type1
- Mean KS for Type2
- Mean KS for Type3
- ...

This exposes the contribution of each variable type.

---

## 3.10 The trade-off between metrics

A common pattern in practice:
- KS goes down → lag-1 goes up
- lag-1 goes down → KS goes up

Each experiment must therefore state its goal explicitly:
- If the goal is distributional consistency, accept some temporal degradation
- If the goal is temporal consistency, accept some distributional drift

This is also a trade-off that the paper must discuss.

---

## 3.11 What the KS-only baseline is for

The KS-only baseline is a diagnostic tool (see the sketch below):
- Replace the hard-to-learn variables with samples drawn from the real distribution
- Observe the best-case bound that the overall KS can drop to

Value:
- If KS stays high → the model's overall distribution is poorly learned
- If KS drops markedly → the bottleneck is concentrated in a few variables
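A minimal sketch of this baseline as empirical marginal resampling (illustrative; `hard_idx` is an assumed list of hard-variable columns):

```python
import numpy as np

def ks_only_baseline(real: np.ndarray, gen: np.ndarray, hard_idx) -> np.ndarray:
    """Replace hard variables in `gen` with i.i.d. draws from the real marginals.

    Diagnostic only: this destroys cross-variable dependence on those columns,
    so it bounds KS but is not a usable generator.
    """
    out = gen.copy()
    rng = np.random.default_rng(0)
    for i in hard_idx:
        out[:, i] = rng.choice(real[:, i], size=len(gen), replace=True)
    return out
```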
---

## 3.12 The complete evaluation flow

Evaluation consists of:
1) Generate data (generated.csv)
2) Compare against real data and compute KS/JSD/lag-1
3) Write eval.json
4) Run extra diagnostics: CDF/Rank/Filtered
5) If post-processing is enabled, additionally evaluate eval_post.json

---

## 3.13 Summary

This part explained:
- Why KS/JSD/lag-1 are the necessary evaluation metrics
- Why per-feature diagnostics matter
- Why Type-wise divide-and-conquer evaluation was introduced

The next part discusses experimental observations, limitations, and future directions.

---

# Part 4 / 4 — Experimental Observations, Limitations and Future Directions (≈5k words)

> Focus of this part: summarize the experimental findings, explain why they occur, and lay out directions for future work.

## 4.1 Recap of the typical experimental findings

Across many experiments we observe several stable phenomena:

1) **KS can drop markedly while lag-1 worsens**
- When distribution alignment is strengthened (quantile loss, calibration), the distribution improves but temporal consistency degrades

2) **Discrete JSD is comparatively stable**
- Discrete mask diffusion guarantees token validity
- JSD rarely swings as sharply as KS

3) **A few variables dominate KS**
- Type1/2/3 are the main sources of KS
- The overall distribution is thus actually reasonable; the problem concentrates in a few hard variables

---

## 4.2 Why Type1/2/3 variables are hard to learn

**Type1 (setpoint/demand)**
- Driven by external programs rather than the system's response
- Long constant stretches plus sudden steps
- Diffusion models favor continuous change and struggle to reproduce such jumps accurately

**Type2 (controller output)**
- Constrained by the control algorithm (PID)
- The value range often saturates at 0 or 100
- Extremely spiky distributions

**Type3 (actuator positions)**
- Frequent dwelling, saturation, and discrete notches
- The actual dynamics differ from simple continuous noise

These properties show that such variables need dedicated models rather than a generic DDPM.

---

## 4.3 Meaning and limits of the KS-only baseline

The KS-only baseline (empirical resampling) tells us:
- **How low KS can go in principle**
- Whether the bottleneck really lies in a few variables

Its limits are just as clear:
- It destroys dependencies between variables
- It cannot guarantee joint-distribution consistency
- It can serve only as a diagnostic tool, never as the final generator

---

## 4.4 The theoretical conflict between temporal structure and distribution

The temporal-vs-distribution conflict is very pronounced in this project:

- Focusing only on the distribution: the sequence loses inertia and turns into "random sampling"
- Focusing only on temporal structure: the sequence over-smooths and loses heavy tails or multimodality

The two-stage architecture alleviates this conflict but cannot remove it entirely.

---

## 4.5 Why conditional modeling is needed

Real industrial data usually depend on an operating condition/mode.
For example:
- Different setpoints lead to different PV distributions
- Different states change actuator behavior

Future work should therefore add conditioning to the model:

\[
P(X|C)\quad \text{rather than}\quad P(X)
\]

Such conditional modeling can substantially reduce distribution error.

---

## 4.6 Future directions (concrete, actionable)

### Direction 1: a Type1 program generator
- Generate setpoint/demand with an HMM or change-point model
- Feed it to the diffusion model as conditioning

### Direction 2: a Type2 controller simulator
- Train a small controller network
- Output the control signal from setpoint + PV

### Direction 3: a Type3 actuator-dynamics model
- Model "state + dwell + rate"
- More realistic than pure resampling

### Direction 4: conditional diffusion for Type4 PVs
- Add an operating-condition embedding
- More consistent distributions across regimes

---

## 4.7 Suggestions for writing up the experiments

A paper can be organized as follows:

1) Present the global KS/JSD/lag-1 metrics
2) Present the per-feature KS ranking
3) Emphasize that a few variables dominate the error
4) Propose the Type-aware divide-and-conquer strategy
5) Use the KS-only baseline as a diagnostic bound

This framing presents the method as well-reasoned and thorough.

---

## 4.8 Conclusions

This project provides a clear path:

- First decouple temporal structure from distribution (two stages)
- Then handle continuous vs. discrete variables (hybrid diffusion)
- Finally apply Type-aware divide-and-conquer to the hard variables

It is therefore not just a generative model but a systematic methodology for diagnosing and improving.

---

# Overall summary

The core contribution of mask-ddpm lies in **structurally decomposing the problem**:
- Temporal structure and distribution alignment are modeled separately
- Continuous and discrete variables are diffused separately
- Hard variable types are handled individually

Through these structural means, the project offers an interpretable, extensible approach to industrial time-series generation. It does not merely chase optimal metrics; it lays out a complete research path.
1
links.txt
Normal file
@@ -0,0 +1 @@
https://my.feishu.cn/wiki/Za4dwCsG6iPD9qklRLWcoJOZnnb?from=from_copylink
357
notes4coding/checker_design.md
Normal file
@@ -0,0 +1,357 @@
## Conversation summary (so far)

* You're building a **Modbus/TCP traffic generation system for security use** with this pipeline:
  **feature extraction → hybrid diffusion generates features → generator turns features into raw packets → checker validates traffic**.
* I proposed a checker that validates traffic at **multiple layers**:

  1. frame/IP/TCP parseability,
  2. optional TCP conversation sanity,
  3. Modbus/TCP structural + semantic validity (MBAP + PDU),
  4. optional "feature-fidelity" checks against generator sidecar metadata.
* I recommended a **generator → checker contract** that is easy to debug and automate:
  **PCAP/PCAPNG + JSONL sidecar** (1 metadata line per packet/event).
* You then shared your existing **descriptor-driven Modbus parser** (Rust) and a short doc.
  The public API/type model is: `Config → FunctionDescriptor → FieldDescriptor(FieldType, length/length_from/scale/enum_map)`
  and `parse_sawp_message(...)` returns JSON containing `unit`, `function`, optional `exception`, and `fields`.
* I said: the **descriptor engine is very reusable** for the checker, but the checker's entry point should ideally work on **raw Modbus/TCP bytes** (MBAP+PDU), not require `sawp_modbus::Message`.
  Also: for checker ergonomics, prefer structured error types over `Result<Value, String>` (but you can keep JSON output for the MVP).

---

# Implementation document: Modbus/TCP Checker (Rust)

This document is a practical, "not too complex" plan to implement the checker while still following good practices where they don't add much difficulty.

## 1) What the checker does (goals / non-goals)

### Goals

The checker verifies that generator output is:

1. **Parsable** as TCP/IP traffic,
2. **Modbus/TCP-valid** at the application level (MBAP + PDU rules),
3. **Consistent** in request/response pairing (Transaction ID matching),
4. Optionally **matching the expected features** (function code, unit id, quantities, payload size, timing tolerances, etc.).

### Non-goals (to keep it simple)

To avoid turning this into a full Wireshark, we deliberately **do not** implement:

* full TCP stream reassembly (segments split/merged),
* a full TCP state machine with retransmits/out-of-order handling,
* IP/TCP checksum verification by default.

Instead, we enforce a **generator constraint**: **one Modbus ADU per TCP payload** (no segmentation, no coalescing). This single constraint dramatically reduces checker complexity and is realistic for generated traces.

> Trade-off: best practice would handle segmentation/coalescing and reassembly; difficulty rises a lot. The "one ADU per TCP payload" rule is the best complexity/benefit lever for this project.

---

## 2) Generator output contract (what the checker consumes)

### Recommended output (MVP-friendly and debuggable)

**(A) PCAP or PCAPNG file**

* `trace.pcapng` (or `.pcap`) containing the raw generated packets

**(B) Sidecar JSONL metadata file**

* `trace.meta.jsonl` where each line describes the corresponding packet/event (same order)

This is the easiest way to:

* reproduce failures,
* correlate packet index with expected semantic fields,
* produce actionable reports.

### JSONL schema (minimal + optional)

**Minimal fields (recommended):**

* `trace_id` (string/uuid)
* `event_id` (monotonic integer)
* `pcap_index` (or implicit by line number)
* `ts_ns` timestamp
* `direction` (`"c2s"` or `"s2c"`)
* `flow` (src/dst ip/port)

**Optional `expected` block (for feature-fidelity checks):**

* `expected.modbus.transaction_id`, `unit_id`, `function_code`, and `expected.fields` (names matching your descriptor JSON).

Example line:

```json
{
  "trace_id": "c7f1...",
  "event_id": 42,
  "pcap_index": 42,
  "ts_ns": 1736451234567890123,
  "direction": "c2s",
  "flow": {"src_ip":"10.0.0.10","src_port":51012,"dst_ip":"10.0.0.20","dst_port":502},
  "expected": {
    "modbus": {"transaction_id": 513, "unit_id": 1, "function_code": 3},
    "fields": {"starting_address": 0, "quantity": 10}
  }
}
```

> Trade-off: best practice is a "self-describing PCAP" (pcapng custom blocks, or embedding metadata); difficulty higher. The JSONL sidecar is dead simple and works well.

---

## 3) Workflow (starting from generator output)

### Step 0 — Load inputs

1. Read `trace.meta.jsonl` into a lightweight iterator (don't load it all if the trace is huge).
2. Open `trace.pcapng` and stream packets in order.

### Step 1 — Align packets and metadata

For each packet index `i`:

* read packet `i` from the PCAP
* read metadata line `i` from the JSONL

If they mismatch (missing line/packet), record a **Fatal** alignment error and stop (or continue "best effort", your call).

### Step 2 — Decode packet and extract TCP payload

Decode:

* link layer (Ethernet/SLL/RAW depending on PCAP linktype),
* IPv4/IPv6,
* TCP,
* extract TCP payload bytes.

Minimal checks:

* the packet parses,
* TCP payload length > 0 when the direction indicates a Modbus message,
* port 502 is present on either side (configurable if you generate non-502).

### Step 3 — Parse Modbus/TCP ADU

Assuming the payload contains exactly one ADU:

* parse MBAP (7 bytes) + PDU
* validate basic MBAP invariants
* parse function code and PDU data
* decide request vs response based on `direction`
* parse PDU data using the descriptor map (your reusable part)
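To make Steps 2-3 concrete, here is a minimal reference sketch of the MBAP invariants in Python (the checker itself is planned in Rust; this is only a language-neutral illustration, and the field names are illustrative):

```python
import struct

MBAP_LEN = 7  # transaction(2) + protocol(2) + length(2) + unit(1)

def parse_adu(payload: bytes) -> dict:
    """Split one Modbus/TCP ADU into MBAP fields + PDU; raise on invariant violations."""
    if len(payload) < MBAP_LEN + 1:          # MBAP plus at least a function code
        raise ValueError("payload shorter than MBAP + function code")
    tid, pid, length, unit = struct.unpack(">HHHB", payload[:MBAP_LEN])
    if pid != 0:
        raise ValueError("protocol id must be 0 for Modbus/TCP")
    if length != len(payload) - 6:           # length field counts unit id + PDU
        raise ValueError("MBAP length field inconsistent with payload size")
    fc = payload[7]
    return {
        "transaction_id": tid,
        "unit_id": unit,
        "function_code": fc & 0x7F,
        "is_exception": bool(fc & 0x80),
        "pdu_data": payload[8:],             # bytes after the function code
    }
```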
### Step 4 — Stateful consistency checks

Maintain per-flow state:

* request/response pairing by `(transaction_id, unit_id)`
* an outstanding request table with timeout/window limits

### Step 5 — Feature-fidelity checks (optional)

If `expected` exists in the JSONL:

* compare the decoded Modbus header + parsed fields with the expected values
* compare sizes and (optionally) timing with tolerances

### Step 6 — Emit report

Output:

* `report.json` with a summary + per-finding samples (packet indices, flow key, reason, extracted fields)
* optional `report.txt` for quick reading

---

## 4) Reusing your existing parser (what to keep, what to adjust)

You already have:

* A descriptor model (`Config/FunctionDescriptor/FieldDescriptor/FieldType`)
* A function that returns a JSON representation with the shape the checker wants (`unit`, `function`, optional `exception`, `fields`)

### 4.1 What is immediately reusable

**Highly reusable for the checker:**

* Descriptor loading (serde)
* Field decoding logic (length/length_from, scale, enum_map)
* The "JSON output" idea for reporting and debugging

### 4.2 Small design adjustment to make reuse clean (recommended)

Your checker will naturally see **raw TCP payload bytes**. So the lowest-friction integration is:

* Implement a tiny **MBAP parser** in the checker:

  * returns `(transaction_id, protocol_id, length, unit_id, function_code, pdu_data)`
* Then call your descriptor-based decoder on `pdu_data` (bytes **after** the function code)

Your doc shows the parser conceptually returns JSON with `fields` and supports request vs response descriptors, which maps perfectly to `direction`.

**Suggested public entrypoint to expose from your parser module:**

* `parse_with_descriptor(pdu_data: &[u8], unit: u8, function: u8, fields: &Vec<FieldDescriptor>) -> Result<Value, String>`

If it's currently private, just make it `pub(crate)` or `pub` and reuse it. This avoids binding the checker to `sawp_modbus::Message` and keeps the implementation simple.

> Trade-off: best practice would be to return a typed struct + typed errors; easier to maintain long term but more refactor work. For your "don't make it hard" requirement, keeping JSON output + simple error types is totally fine for the first version.

### 4.3 How the checker chooses which descriptor to use

* If `direction == c2s` → request descriptor
* If `direction == s2c` → response descriptor

This matches the intent of having `request` and `response` descriptor vectors in your model.

---

## 5) Checker internal design (simple but extensible)

### 5.1 Core data structures

* `FlowKey { src_ip, src_port, dst_ip, dst_port, ip_version }`
* `PacketCtx { trace_id, event_id, pcap_index, ts_ns, direction, flow }`
* `DecodedModbus { transaction_id, protocol_id, length, unit_id, function_code, is_exception, exception_code?, pdu_data, parsed_fields_json? }`

### 5.2 "Rules" model (optional, but keeps code tidy)

Instead of huge if/else blocks, implement a few rules that return findings:

* `RuleMbapValid`
* `RuleFunctionPduWellFormed` (basic length sanity)
* `RuleTxIdPairing`
* `RuleExpectedMatch` (only if the sidecar has `expected`)

If you don't want a formal trait system initially, just implement these as functions that append to a `Vec<Finding>`.

### 5.3 Findings + severity

Use a compact severity scale:

* `Fatal`: cannot parse / cannot continue reliably
* `Error`: protocol invalid
* `Warn`: unusual but maybe acceptable
* `Info`: stats

A finding should include:

* `pcap_index`, `event_id`, `flow`, `severity`, `code`, `message`
* optional `observed` and `expected` snippets

---

## 6) What the checker validates (MVP vs stricter)

### MVP validations (recommended first milestone)

1. PCAP + JSONL aligned
2. Parse Ethernet/IP/TCP and extract the payload
3. MBAP:

   * payload length ≥ 7
   * length field consistency (basic)
4. PDU:

   * function code exists
   * exception handling if `fc & 0x80 != 0`
5. Descriptor parse success (request/response based on direction)
6. Transaction pairing:

   * every response matches an outstanding request by transaction_id/unit_id
   * no duplicate outstanding txid unless you allow it

### "Strict mode" additions (still reasonable)

* enforce unit_id range (if you want)
* enforce function-code-specific invariants using parsed fields

  * e.g., `byte_count == 2 * quantity` for register reads/writes (if present in the descriptor)
* timeouts:

  * a response must arrive within the configured window

### Heavy features (avoid unless needed)

* TCP reassembly and multi-ADU per segment
* checksum verification
* handling retransmits/out-of-order robustly

---

## 7) Dependencies (crates) for the checker

### Minimal set (keeps implementation easy)

* **PCAP reading**

  * `pcap` (libpcap-backed; you already use it in your codebase)
* **Packet decoding**

  * `pnet_packet` (you already use `pnet` patterns)
* **Config + sidecar + report**

  * `serde`, `serde_json`
* **Errors + logging**

  * `anyhow` (fast to integrate) and/or `thiserror` (nicer structured errors)
  * `tracing`, `tracing-subscriber`
* **Utilities**

  * `hashbrown` (optional; std HashMap is fine)
  * `hex` (useful for debug/trailing bytes like your parser does)

### If you want to reduce external requirements (optional alternative)

* Replace `pcap` with `pcap-file` (pure Rust; no libpcap dependency)
* Replace `pnet` with `etherparse` (often simpler APIs)

> Trade-off: "best practice" for portability is pure Rust (`pcap-file` + `etherparse`).
> "Best practice" for least effort *given your current code* is reusing `pcap` + `pnet`.

---

## 8) Suggested project layout (simple)

```
checker/
  src/
    main.rs          # CLI entry
    config.rs        # descriptor loading
    meta.rs          # JSONL reader structs
    pcap_in.rs       # pcap streaming
    decode.rs        # ethernet/ip/tcp extract payload
    mbap.rs          # Modbus/TCP MBAP parsing
    modbus_desc.rs   # reuse your parse_with_descriptor + types
    state.rs         # outstanding tx table
    validate.rs      # main validation pipeline
    report.rs        # report structs + JSON output
```

---

## 9) Practical implementation tips (to keep it from getting "hard")

1. **Enforce generator constraints**:

   * one ADU per TCP payload
   * no splitting/coalescing

   This keeps checker complexity low and makes failure reasons obvious.

2. **Keep JSON output for parsed fields** at first:

   * You already have a clean JSON shape (`unit`, `function`, `fields`)
   * Great for debugging mismatches with `expected.fields`

3. **Add strictness as "modes"**:

   * `--mode=mvp | strict`
   * or config file toggles

4. **Fail-fast vs best-effort**:

   * For CI or batch filtering, fail-fast on `Fatal` is fine.
   * For research/debugging, best-effort (continue and collect findings) is more useful.

---
@@ -0,0 +1,17 @@
@inproceedings{10.1145/3544216.3544251,
author = {Yin, Yucheng and Lin, Zinan and Jin, Minhao and Fanti, Giulia and Sekar, Vyas},
title = {Practical GAN-based synthetic IP header trace generation using NetShare},
year = {2022},
isbn = {9781450394208},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3544216.3544251},
doi = {10.1145/3544216.3544251},
abstract = {We explore the feasibility of using Generative Adversarial Networks (GANs) to automatically learn generative models to generate synthetic packet- and flow header traces for networking tasks (e.g., telemetry, anomaly detection, provisioning). We identify key fidelity, scalability, and privacy challenges and tradeoffs in existing GAN-based approaches. By synthesizing domain-specific insights with recent advances in machine learning and privacy, we identify design choices to tackle these challenges. Building on these insights, we develop an end-to-end framework, NetShare. We evaluate NetShare on six diverse packet header traces and find that: (1) across all distributional metrics and traces, it achieves 46\% more accuracy than baselines and (2) it meets users' requirements of downstream tasks in evaluating accuracy and rank ordering of candidate approaches.},
booktitle = {Proceedings of the ACM SIGCOMM 2022 Conference},
pages = {458–472},
numpages = {15},
keywords = {synthetic data generation, privacy, network packets, network flows, generative adversarial networks},
location = {Amsterdam, Netherlands},
series = {SIGCOMM '22}
}
@@ -0,0 +1,45 @@
# Practical GAN-based synthetic IP header trace generation using NetShare

**Question 1**: Summarize the paper, covering research background and problem, objectives, methods, main results, and conclusions, in 150-300 words, using the paper's own terminology and concepts.

The paper targets packet/flow header traces that are hard to share for network management tasks (telemetry, anomaly detection, provisioning), noting that existing GAN/tabular and time-series generation methods fall short on fidelity (cross-field correlations and fields with large support), the scalability-fidelity tradeoff, and the privacy-fidelity tradeoff. The goal is an end-to-end framework that automatically learns to generate synthetic IP header traces. The method, NetShare, merges traces across epochs and splits them by five-tuple into flow time series modeled with a DoppelGANger-style time-series GAN; wide-range numerics use log(1+x), IPs use bit encoding, and ports/protocols use IP2Vec trained on public data; chunking plus fine-tuning parallelizes training for efficiency, and public pretraining plus DP-SGD improves DP training. Results show NetShare is on average 46% more accurate than baselines across distributional metrics on six datasets, and it preserves accuracy and rank ordering in downstream tasks. The conclusion is that GANs are feasible in this setting but require domain-specific reformulation plus privacy/efficiency design.

**Question 2**: Extract the paper's original abstract (usually found after "Abstract" and before the Introduction).

We explore the feasibility of using Generative Adversarial Networks (GANs) to automatically learn generative models to generate synthetic packet- and flow header traces for networking tasks (e.g., telemetry, anomaly detection, provisioning). We identify key fidelity, scalability, and privacy challenges and tradeoffs in existing GAN-based approaches. By synthesizing domain-specific insights with recent advances in machine learning and privacy, we identify design choices to tackle these challenges. Building on these insights, we develop an end-to-end framework, NetShare. We evaluate NetShare on six diverse packet header traces and find that: (1) across all distributional metrics and traces, it achieves 46% more accuracy than baselines and (2) it meets users' requirements of downstream tasks in evaluating accuracy and rank ordering of candidate approaches.

**Question 3**: List all authors of the paper in the format: `Author 1, Author 2, Author 3`.

Yucheng Yin, Zinan Lin, Minhao Jin, Giulia Fanti, Vyas Sekar

**Question 4**: State directly which conference or journal published this paper, without reasoning or extra information.

ACM SIGCOMM 2022 Conference (SIGCOMM '22)

**Question 5**: Describe in detail the core problem the paper addresses, then summarize it concisely.

The core problem: how to generate synthetic packet/flow IP header traces that are realistic and usable for networking tasks without sharing the raw data, while meeting three practical constraints. (1) Fidelity: capture cross-field and cross-record correlations (flow sizes, the same flow across epochs, multi-packet flows) and learn the long tails and heavy hitters of large-support fields (IP/port, pkts/bytes). (2) Scalability: keep training cost manageable for traces with millions of records, avoiding the orders-of-magnitude more CPU/GPU-hours consumed by complex time-series GANs. (3) Privacy: avoid the "turn on DP, fidelity collapses" privacy-fidelity tradeoff under differential privacy ((ε,δ)-DP) mechanisms. In short: NetShare aims to turn synthetic network-header generation from infeasible into something usable, trainable, and privatizable in practice.

**Question 6**: List the methods proposed in the paper and summarize the core idea of each as concisely as possible.

(1) Reformulation as flow time series generation: rewrite per-epoch tabular records as "merge into a global trace → split into flows by five-tuple → generate a time series per flow", explicitly learning cross-packet/cross-epoch correlations. (2) Field representation/encoding design: log(1+x) for wide-range numerics such as pkts/bytes; bitwise encoding for IPs; IP2Vec embeddings trained on public data for ports/protocols, balancing learnability with DP feasibility. (3) Scalability via chunking + fine-tuning + parallelism: cut the flow trace into M fixed-time chunks, train a seed model on the first chunk, fine-tune the rest in parallel, and use flow tags to preserve cross-chunk dependencies. (4) Practical DP via public pretraining + DP-SGD fine-tuning: pretrain on similar public traces, then fine-tune on private data with DP-SGD to reduce DP-SGD rounds and improve the privacy-fidelity tradeoff. (5) Post-processing for correctness: map fields back, fill in derived fields (e.g., checksums), and reassemble PCAP/NetFlow to guarantee usable, protocol-consistent traces.

**Question 7**: List the datasets used in the paper, including names and sources.

Flow (NetFlow) datasets: (1) UGR16: NetFlow v9 collected at a Spanish ISP (the paper uses the third week of March 2016); (2) CIDDS: a dataset emulating a small business environment with injected malicious traffic (benign/attack and attack-type labels); (3) TON_IoT (TON): an IoT telemetry/security dataset (the paper uses the Train_Test_datasets subset). Packet (PCAP) datasets: (4) CAIDA Anonymized Internet Traces: anonymized PCAPs from a high-speed commercial backbone monitor (the paper uses the March 2018 New York collector subset); (5) Data Center (DC): captures from the "UNI1" data center studied in the IMC 2010 paper; (6) CyberAttack (CA): captures from the U.S. National CyberWatch Mid-Atlantic CCDC competition, March 2012 (public capture files from Netresec). In addition, public data for privacy/encoding pretraining: CAIDA Chicago collector 2015 (for IP2Vec port/protocol embeddings and the DP pretraining example).

**Question 8**: List all evaluation metrics used in the paper and briefly explain what each measures.

(1) Jensen-Shannon Divergence (JSD): difference between synthetic and real distributions of categorical fields (SA/DA, SP/DP, PR); lower is better. (2) Earth Mover's Distance (EMD / Wasserstein-1): distributional difference for continuous fields (TS, TD, PKT, BYT, PS, PAT, FS); lower is better. (3) Normalized EMD: EMD normalized per field to a fixed range for cross-field comparability in summaries/plots. (4) Downstream task accuracy: e.g., per-classifier accuracy on traffic type prediction, testing "train on synthetic, test on real" usability. (5) Order preservation / Spearman's rank correlation coefficient: consistency of the relative ranking of algorithms/models between real and synthetic data (classifiers, sketches, NetML modes); closer to 1 is better. (6) Heavy hitter count estimation relative error: |error_syn − error_real| / error_real, checking whether sketch-style telemetry errors are preserved. (7) Anomaly ratio relative error: |ratio_syn − ratio_real| / ratio_real, checking consistency of anomaly-detection output ratios. (8) Scalability: training cost in total CPU hours, for comparing the scalability-fidelity tradeoff. (9) Differential privacy parameters (ε, δ): privacy strength (δ fixed at 10^-5), for the privacy-fidelity tradeoff curves. (10) Protocol/domain correctness checks: IP address validity, pkt/byte vs. protocol constraints, port-protocol consistency, minimum packet length, etc., verifying semantic/syntactic correctness of generated traces.

**Question 9**: Summarize the experimental performance, including concrete numbers and conclusions.

(1) Overall fidelity: across six header traces, NetShare is 46% more accurate than baselines "across all distributional metrics and traces"; the paper further reports 48% better on average on NetFlow distributional metrics and 41% better on PCAP (by JSD/normalized EMD). (2) Downstream tasks: on TON traffic-type prediction, NetShare beats CTGAN/STAN/E-WGAN-GP across five classifiers; e.g., on MLP it is 12% more accurate than the next-best baseline (E-WGAN-GP) and reaches 84% of real-data accuracy. For rank preservation, Spearman correlations are 0.90 on CIDDS (baselines 0.60/0.60/0.70) and 0.70 on TON (baselines 0.10/0.60/-0.60). (3) Sketch telemetry: on heavy-flow count estimation, NetShare's relative error is on average 48% lower than baselines and it achieves a perfect match in sketch-algorithm ordering (as described in the paper). (4) Anomaly detection: NetML mode rank correlations are 1.00 on CAIDA, 0.94 on DC, and 0.88 on CA, overall better than baselines. (5) Privacy: under (ε,δ)-DP, public pretraining clearly beats naive DP-SGD; still, even at ε=24.24 on CAIDA, DP-pretrain-SAME's mean EMD degrades about 2.3× versus non-DP (naive about 3.5×), so high-dimensional DP synthesis remains an open problem.

**Question 10**: Clearly describe the work done in the paper, listing the motivation, contributions, and main innovations.

Motivation: (1) real packet/flow header traces are essential for evaluating monitoring algorithms, anomaly detection, and capacity planning, but business and privacy constraints make them hard to share; (2) traditional simulation/model-driven methods demand heavy domain knowledge and generalize poorly, while existing ML/GAN methods are unusable in terms of correlations, scale, and privacy.
Contributions and main innovations: (1) the end-to-end system NetShare, which reformulates header-trace synthesis from per-epoch tabular generation into global merging plus per-flow time-series GAN generation, systematically fixing the missing flow-size/cross-epoch correlations; (2) field representations designed for the fidelity-scalability-privacy space (log transform, IP bit encoding, IP2Vec on public data for ports/protocols), with an explanation of why some embeddings break DP (the training-data-dependent dictionary problem); (3) chunked parallel training with seed-model fine-tuning and flow tags to improve the scalability-fidelity tradeoff; (4) bringing "public pretraining + DP-SGD fine-tuning" to header-trace generation, empirically improving the privacy-fidelity tradeoff and quantifying its limits; (5) systematic evaluation on six public datasets with distributional metrics plus multiple downstream tasks (classification, sketch telemetry, anomaly detection), demonstrating practical accuracy and rank preservation, along with protocol-consistency checks and a usable PCAP/NetFlow output pipeline (including derived-field generation).
@@ -0,0 +1,10 @@
@ARTICLE{11087622,
author={Liu, Xiaosi and Xu, Xiaowen and Liu, Zhidan and Li, Zhenjiang and Wu, Kaishun},
journal={IEEE Transactions on Mobile Computing},
title={Spatio-Temporal Diffusion Model for Cellular Traffic Generation},
year={2026},
volume={25},
number={1},
pages={257-271},
keywords={Base stations;Diffusion models;Data models;Uncertainty;Predictive models;Generative adversarial networks;Knowledge graphs;Mobile computing;Telecommunication traffic;Semantics;Cellular traffic;data generation;diffusion model;spatio-temporal graph},
doi={10.1109/TMC.2025.3591183}}
@@ -0,0 +1,54 @@
# Spatio-Temporal Diffusion Model for Cellular Traffic Generation

**Question 1**: Summarize the paper, covering research background and problem, objectives, methods, main results, and conclusions, in 150-300 words, using the paper's own terminology and concepts.

Mobile network planning and resource allocation require usable traffic-demand modeling, but in practice methods depend heavily on historical data, operator data is privacy-restricted, and existing GAN/autoregressive approaches for city-scale base-station traffic generation are unstable, prone to mode collapse, and poor at capturing uncertainty under the same spatio-temporal context. The paper proposes STOUTER (Spatio-Temporal diffusiOn model for cellUlar Traffic genERation), which injects noise into real traffic via DDPM-style forward diffusion and reconstructs the generated traffic distribution from a Gaussian prior via reverse denoising. To capture long/short-term periodic patterns and spatial patterns, the authors pre-train a temporal graph and a base station graph and design the Spatio-Temporal Feature Fusion Module (STFFM), fusing time embeddings, base-station embeddings, and diffusion-step information inside a Spatio-Temporal UNet (STUnet) to predict noise. Experiments on CDR data from two regions, Milan and Trentino, covering five traffic types (plus aggregated Call/SMS), show STOUTER improves over baselines on Internet traffic by 19.23% in MAE, 18.35% in RMSE, and 52.77% in JSD on average. The conclusion is that STOUTER generates city-scale base-station traffic closer to the real distribution, including uncertainty fluctuations, providing high-quality synthetic data for downstream prediction and network optimization.

**Question 2**: Extract the paper's original abstract (usually found after "Abstract" and before the Introduction).

In the digital era, the increasing demand for network traffic necessitates strategic network infrastructure planning. Accurate modeling of traffic demand through cellular traffic generation is crucial for optimizing base station deployment, enhancing network efficiency, and fostering technological innovation. In this paper, we introduce STOUTER, a spatio-temporal diffusion model for cellular traffic generation. STOUTER incorporates noise into traffic data through a forward diffusion process, followed by a reverse reconstruction process to generate realistic cellular traffic. To effectively capture the spatio-temporal patterns inherent in cellular traffic, we pre-train a temporal graph and a base station graph, and design the Spatio-Temporal Feature Fusion Module (STFFM). Leveraging STFFM, we develop STUnet, which estimates noise levels during the reverse denoising process, successfully simulating the spatio-temporal patterns and uncertainty variations in cellular traffic. Extensive experiments conducted on five cellular traffic datasets across two regions demonstrate that STOUTER improves cellular traffic generation by 52.77% in terms of the Jensen-Shannon Divergence (JSD) metric compared to existing models. These results indicate that STOUTER can generate cellular traffic distributions that closely resemble real-world data, providing valuable support for downstream applications.

**Question 3**: List all authors of the paper in the format: `Author 1, Author 2, Author 3`.

Xiaosi Liu, Xiaowen Xu, Zhidan Liu, Zhenjiang Li, Kaishun Wu

**Question 4**: State directly which conference or journal published this paper, without reasoning or extra information.

IEEE Transactions on Mobile Computing

**Question 5**: Describe in detail the core problem the paper addresses, then summarize it concisely.

The core problem: given insufficient real operator data (privacy restrictions, acquisition difficulty), and given the compute/latency costs and long-horizon error accumulation of relying purely on historical data, how to generate city-scale, base-station-level cellular traffic that approximates the real distribution while exhibiting three key regularities: long/short-term periodicity (hourly/daily/weekly), spatial heterogeneity (differences across stations/regions), and uncertainty fluctuations under identical spatio-temporal conditions (the uncertainty pattern). The authors note that existing GAN-based generation is unstable and suffers mode collapse at scale, and often depends on knowledge graphs/multi-source data or fits only fixed patterns while ignoring uncertainty; hence the need for a more stable paradigm that "recovers the distribution" step by step from a random prior and can inject spatio-temporal priors into generation. In short: replace GANs with diffusion, explicitly fusing temporal-periodicity-graph and base-station-graph representations during denoising to generate station traffic with both regularity and stochastic variation.

**Question 6**: List the methods proposed in the paper and summarize the core idea of each as concisely as possible.

1) STOUTER: a spatio-temporal diffusion framework for cellular traffic generation; forward diffusion adds Gaussian noise step by step, and reverse denoising samples from N(0, I) and iteratively denoises to reconstruct traffic.
2) Temporal Graph (G_t) + representation learning: hour nodes with two types of directed edges (hourly relation, daily relation) encode short-term and weekly periodicity; a GraphMAE-style GIN encoder/decoder yields time-slot embeddings F_t.
3) Base Station Graph (G_s) + representation learning: stations as nodes, with unweighted binary edges within a 1 km threshold; node features are counts over eight POI categories; a GCN encoder/decoder yields station embeddings F_s.
4) Pretraining objective L_pre (SCE): Scaled Cosine Error for graph-autoencoder reconstruction, with γ>1 emphasizing hard samples to improve representation quality.
5) STFFM (Spatio-Temporal Feature Fusion Module): fuses the SinPosEmb+MLP embedding of diffusion step t with Concat(F_s, F_t) through an FC layer into F_st, injected as conditioning into the convolutional features of every residual block to steer denoising toward spatio-temporal patterns.
6) STUnet: a Spatio-Temporal UNet with Down/Mid/Up blocks as the noise predictor ε_θ(x_t, t); multi-head attention in the Mid-Block and multi-head linear attention in Down/Up blocks fuse historical traffic, diffusion step, and spatio-temporal semantics.
7) Diffusion training objective: trained on noise-prediction error (the paper gives the squared-error form between ε_t and ε_θ).

**Question 7**: List the datasets used in the paper, including names and sources.

Public CDR datasets: Milan and Trentino ("Call Detailed Records (CDRs) from Italy", provided by the Semantics and Knowledge Innovation Lab; the paper cites Barlacchi et al., Sci. Data 2015 [52]). Traffic types include five classes: Internet, Received-SMS, Sent-SMS, Incoming-Call, Outgoing-Call; plus aggregated datasets Call (Incoming+Outgoing) and SMS (Received+Sent). POI data source: OpenStreetMap (for the eight-category POI features of base-station nodes; cited as [34]).

**Question 8**: List all evaluation metrics used in the paper and briefly explain what each measures.

1) MAE (Mean Absolute Error): average absolute deviation between generated and real traffic per base station, reflecting numeric error.
2) RMSE (Root Mean Square Error): more sensitive to large errors, measuring overall reconstruction deviation while emphasizing outliers.
3) KL divergence: the base quantity for distribution difference (used in the paper as a component of JSD).
4) JSD (Jensen-Shannon Divergence): a symmetric, more stable distribution-similarity measure comparing generated vs. real traffic distributions; one of the paper's main metrics (52.77% reported improvement). The downstream case study additionally uses MAE and RMSE to evaluate LSTM prediction error.

**Question 9**: Summarize the experimental performance, including concrete numbers and conclusions.

Overall (Section VI-B): across Milan and Trentino, on five raw traffic types plus aggregated Call/SMS, STOUTER beats TCN-GAN, VAE, and diffusion baselines using DiffWave/WaveNet denoisers on MAE, RMSE, and JSD. On the Internet datasets the paper reports average improvements over baselines of 19.23% MAE, 18.35% RMSE, and 52.77% JSD. Visualization (Fig. 9): on a one-month view of Trentino Internet, STOUTER better reproduces periodic trends with smaller deviations; example metrics are STOUTER MAE 0.18 and RMSE 0.2558, versus VAE (0.2923/0.4528), DiffWave (0.2359/0.3384), and TCN-GAN (1.5976/2.5625). Ablation (Fig. 7): removing the Temporal Graph (w/o TG), the Base Station Graph (w/o BSG), or both (w/o ST) degrades the metrics, showing the spatio-temporal representations and their fusion are essential; on data with stronger randomness and weak spatio-temporal correlation (e.g., Received-SMS) the JSD advantage shrinks. Parameter study (Fig. 8): increasing diffusion steps t from 10 to 1000 generally improves quality, stabilizing around t=600 for Milan Internet and t=800 for Trentino Internet, with diminishing returns beyond the threshold. Case study (Fig. 10): when training an LSTM forecaster on generated data, STOUTER's data keeps the MAE/RMSE gap to a model trained on real data within 0.1, showing the generated data is usable for downstream tasks.

**Question 10**: Clearly describe the work done in the paper, listing the motivation, contributions, and main innovations.

Motivation: 1) base-station deployment and resource optimization need trustworthy traffic-demand data, but real data is restricted by privacy/access; 2) prediction methods depend on large amounts of history and accumulate error over long horizons; 3) GAN-based city-scale traffic generation suffers from unstable training and mode collapse, depends strongly on multi-source knowledge graphs, and usually ignores uncertainty fluctuations.
Contributions: 1) STOUTER, a spatio-temporal diffusion model for large-scale base-station cellular traffic generation that captures the uncertainty pattern by denoising step by step from a Gaussian prior. 2) Two prior graphs with pretrained representations: a Temporal Graph modeling hourly/daily (including weekly) relations for long/short-term periodic patterns, and a Base Station Graph fusing station distance and eight POI categories for spatial patterns. 3) STFFM and STUnet: fusing diffusion-step embeddings and spatio-temporal graph embeddings at every layer of the denoiser, combined with multi-head/linear attention, to model spatio-temporal patterns and uncertainty jointly. 4) Large-scale empirical validation: on Milan/Trentino's five traffic types, a 52.77% JSD improvement over SOTA baselines (Internet task), with ablations, variants, parameter sensitivity, and a downstream prediction case study demonstrating effectiveness and usability.
Main innovation: injecting spatio-temporal graph priors (temporal graph + base station graph) modularly into a DDPM-style denoiser (STUnet), reproducing periodicity, spatial heterogeneity, and uncertainty fluctuations without a city knowledge graph, and systematically demonstrating with distributional metrics (JSD) that the generated distribution is closer to reality.
@@ -0,0 +1,11 @@
@article{Ring_2019,
title={Flow-based network traffic generation using Generative Adversarial Networks},
volume={82},
ISSN={0167-4048},
url={http://dx.doi.org/10.1016/j.cose.2018.12.012},
DOI={10.1016/j.cose.2018.12.012},
journal={Computers & Security},
publisher={Elsevier BV},
author={Ring, Markus and Schlör, Daniel and Landes, Dieter and Hotho, Andreas},
year={2019},
month=may, pages={156–172} }
@@ -0,0 +1,48 @@
# Flow-based Network Traffic Generation using Generative Adversarial Networks

**Question 1**: Summarize the paper, covering research background and problem, objectives, methods, main results, and conclusions, in 150-300 words, using the paper's own terminology and concepts.

The paper addresses the scarcity of flow-based data for NIDS evaluation and the associated privacy/labeling difficulties, proposing Generative Adversarial Networks to generate realistic NetFlow/IPFIX flows. The core challenge is that GANs only handle continuous attributes, while flow data contains many categorical fields (IPs/ports/protocol). The authors propose three preprocessing and generation approaches: N-WGAN-GP (numeric normalization, treating IPs/ports as continuous values), B-WGAN-GP (binarization, turning IP/port/bytes/packets into bit vectors), and E-WGAN-GP (embedding, mapping IP/port/bytes/packets/duration into continuous R^m embeddings learned with IP2Vec). The generator is an Improved Wasserstein GAN with Gradient Penalty (WGAN-GP) trained with TTUR, and evaluation uses distribution visualizations, per-attribute Euclidean distances, and seven domain-knowledge quality tests. On the CIDDS-001 data (weeks 2-4 for training, 8.5M flows generated per method), experiments show E-WGAN-GP and B-WGAN-GP reproduce conditional distributions and structural relations (subnet behavior, protocol/port consistency), with E-WGAN-GP best overall on the domain checks; N-WGAN-GP tends to produce implausible values. Conclusion: GANs combined with suitable discrete-to-continuous representations (especially IP2Vec embeddings) can generate high-quality flow-based network traffic for NIDS training and evaluation.

**Question 2**: Extract the paper's original abstract (usually found after "Abstract" and before the Introduction).

Flow-based data sets are necessary for evaluating network-based intrusion detection systems (NIDS). In this work, we propose a novel methodology for generating realistic flow-based network traffic. Our approach is based on Generative Adversarial Networks (GANs) which achieve good results for image generation. A major challenge lies in the fact that GANs can only process continuous attributes. However, flow-based data inevitably contain categorical attributes such as IP addresses or port numbers. Therefore, we propose three different preprocessing approaches for flow-based data in order to transform them into continuous values. Further, we present a new method for evaluating the generated flow-based network traffic which uses domain knowledge to define quality tests. We use the three approaches for generating flow-based network traffic based on the CIDDS-001 data set. Experiments indicate that two of the three approaches are able to generate high quality data. Keywords: GANs, TTUR WGAN-GP, NetFlow, Generation, IDS

**Question 3**: List all authors of the paper in the format: `Author 1, Author 2, Author 3`.

Markus Ring, Daniel Schlör, Dieter Landes, Andreas Hotho

**Question 4**: State directly which conference or journal published this paper, without reasoning or extra information.

Computers & Security

**Question 5**: Describe in detail the core problem the paper addresses, then summarize it concisely.

The core problem: how to make GANs, which only process continuous variables, generate flow-based network traffic containing many categorical fields (IP addresses, ports, protocols), while keeping the generated data close to real traffic in distributional diversity and internal semantic relations (subnet structure, protocol-port consistency), for NIDS training and evaluation. In short: convert discrete flow attributes into GAN-friendly continuous representations (numeric, binary, or IP2Vec embeddings), learn real flow characteristics with WGAN-GP+TTUR to generate high-quality synthetic flows, then assess quality with domain-knowledge rules and distribution distances.

**Question 6**: List the methods proposed in the paper and summarize the core idea of each as concisely as possible.

(1) N-WGAN-GP: normalize the four IP octets and ports to [0,1] as numbers, treat bytes/packets/duration as continuous, and generate with WGAN-GP+TTUR. (2) B-WGAN-GP: map IPs to 32-bit and ports to 16-bit binary, bytes/packets to fixed-length binary representations, preserving subnet structure, then generate with WGAN-GP. (3) E-WGAN-GP: extend IP2Vec to learn m-dimensional embeddings of IP/port/protocol and bytes/packets/duration; the GAN generates embeddings that are mapped back to original values via nearest neighbors (e.g., cosine similarity). (4) Baseline: independently sampling each attribute's empirical distribution as a control. (5) Evaluation methodology: distribution visualization, per-attribute Euclidean distance, and seven domain-knowledge quality checks.

**Question 7**: List the datasets used in the paper, including names and sources.

CIDDS-001 (flow-based benchmark data sets for intrusion detection), a public dataset from Coburg University and Würzburg University; NetFlow captured in an OpenStack-emulated enterprise network environment, using weeks 2-4 (about 22 million flows) for training and week 1 (about 8.5 million flows) as the evaluation reference.

**Question 8**: List all evaluation metrics used in the paper and briefly explain what each measures.

(1) Distribution visualizations (violin plots / temporal distributions): check whether conditional distributions and subnet behavior match real traffic. (2) Per-attribute Euclidean distance: compare the probability distributions of generated data vs. the training set per attribute, measuring distributional closeness and plausible concept drift. (3) Domain knowledge checks (7 rules): protocol-flag consistency, internal/external IP constraints, HTTP/HTTPS/DNS protocol-port correspondence, broadcast/multicast placement, NetBIOS flow structure, bytes/packets range relations, etc., testing internal semantic and relational correctness. (4) Generation scale and coverage (8.5M flows per method): ensuring diversity and statistical stability.

**Question 9**: Summarize the experimental performance, including concrete numbers and conclusions.

On CIDDS-001, each method generates about 8.5 million flows. For temporal distributions, E-WGAN-GP's curve is smooth and close to the real week-1 working-hours/lunch-break behavior. Attribute-distribution visualizations show: the baseline cannot capture conditional distributions; N-WGAN-GP wrongly concentrates on the ext subnet; B-WGAN-GP and E-WGAN-GP distinguish client/server ports and destination IP ranges. Euclidean distances (relative to weeks 2-4): E-WGAN-GP is close to the week-1 reference on source/destination port, bytes, and packets (e.g., destination port 0.0327, bytes 0.0278, packets 0.0251), while N-WGAN-GP has large distances on ports/bytes/packets (e.g., source port 0.5658, bytes 0.5858, packets 1.0416). Seven domain checks (% passed): E-WGAN-GP is highest overall on Tests 1-7 (e.g., Test1 99.77, Test2 99.98, Test7 99.49); B-WGAN-GP does well on protocol-port consistency (Test3 99.97, Test4 99.90) but poorly on broadcast-related rules (Test5 47.13, Test6 40.19); N-WGAN-GP is clearly worse on UDP/TCP flags and DNS/HTTP rules (Test4 87.14); the baseline is lowest overall. Conclusion: E-WGAN-GP has the best quality, B-WGAN-GP is second and can additionally generate unseen values, and N-WGAN-GP is unsuitable.

**Question 10**: Clearly describe the work done in the paper, listing the motivation, contributions, and main innovations.

Motivation: NIDS evaluation lacks up-to-date, shareable, labeled flow-based data; GANs are strong generators, but network flows contain many categorical fields with complex semantic relations, limiting direct application.

Contributions: (1) the first GAN framework for generating complete flow-based network traffic, trained stably with WGAN-GP+TTUR; (2) three discrete-to-continuous conversion strategies (numeric/binary/embedding) that systematically remove the key obstacle of categorical attributes; (3) an extension of IP2Vec to learn contextual embeddings of numeric fields such as bytes/packets/duration, with a scheme for mapping embeddings back; (4) a combined evaluation methodology of distribution distances plus seven domain-knowledge rules; (5) large-scale experiments on CIDDS-001 validating that two of the three methods generate high-quality data.

Main innovations: unifying complex discrete flow attributes and numeric attributes into one continuous space via IP2Vec embeddings, so the GAN can internally learn conditional dependencies between attributes; domain-knowledge-driven quality tests that cover the blind spots of distribution-only evaluation; and a binarization strategy that preserves subnet and structural information while remaining able to generate unseen values.
13
papers/Topic1 NetworkICS traffic generation/Intro.txt
Normal file
@@ -0,0 +1,13 @@
Network traffic / trace generation and the "usability" discussion (supporting semantic trace generation rather than raw-byte generation)

Yin et al. Practical GAN-based Synthetic IP Header Trace Generation using NetShare. ACM SIGCOMM 2022.
Use: it emphasizes generating usable protocol-field traces and practical evaluation (not just visual similarity). Its field-level consistency/constraint evaluation ideas are worth borrowing.

Lin et al. Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions. ACM IMC 2020.
Use: a dedicated discussion of the challenges of sharing/synthesizing networked time series (correlations, privacy, evaluation); its viewpoints fit well when citing for the design of evaluation metrics for Modbus synthesis.

Ring et al. Flow-based Network Traffic Generation using Generative Adversarial Networks. Computers & Security 2019.
Use: GAN-baseline related work, for contrasting diffusion models' training stability and diversity advantages.

Vishwanath & Vahdat. Swing: Realistic and Responsive Network Traffic Generation. IEEE/ACM ToN 2009.
Use: a classic traditional traffic generator; used in related work as the non-deep-learning synthesis contrast.
@@ -0,0 +1,17 @@
@inproceedings{10.1145/3544216.3544251,
author = {Yin, Yucheng and Lin, Zinan and Jin, Minhao and Fanti, Giulia and Sekar, Vyas},
title = {Practical GAN-based synthetic IP header trace generation using NetShare},
year = {2022},
isbn = {9781450394208},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3544216.3544251},
doi = {10.1145/3544216.3544251},
abstract = {We explore the feasibility of using Generative Adversarial Networks (GANs) to automatically learn generative models to generate synthetic packet- and flow header traces for networking tasks (e.g., telemetry, anomaly detection, provisioning). We identify key fidelity, scalability, and privacy challenges and tradeoffs in existing GAN-based approaches. By synthesizing domain-specific insights with recent advances in machine learning and privacy, we identify design choices to tackle these challenges. Building on these insights, we develop an end-to-end framework, NetShare. We evaluate NetShare on six diverse packet header traces and find that: (1) across all distributional metrics and traces, it achieves 46\% more accuracy than baselines and (2) it meets users' requirements of downstream tasks in evaluating accuracy and rank ordering of candidate approaches.},
booktitle = {Proceedings of the ACM SIGCOMM 2022 Conference},
pages = {458–472},
numpages = {15},
keywords = {generative adversarial networks, network flows, network packets, privacy, synthetic data generation},
location = {Amsterdam, Netherlands},
series = {SIGCOMM '22}
}
@@ -0,0 +1,48 @@
# Practical GAN-based synthetic IP header trace generation using NetShare

**Question 1**: Summarize the paper, covering research background and problem, objectives, methods, main results, and conclusions, in 150-300 words, using the paper's own terminology and concepts.

To address the difficulty of sharing packet/flow header data in network management and security, the paper explores GAN-generated synthetic IP header traces as a substitute for real PCAP/NetFlow data. The authors identify three major challenges: fidelity, scalability-fidelity, and privacy-fidelity; existing tabular/time-series GANs struggle with field correlations and large value spaces, training is costly, and differentially private training destroys fidelity. They propose the end-to-end framework NetShare: reformulate cross-epoch header traces as merged flow-level time series generated with a DoppelGANger-style time-series GAN; apply domain-knowledge field encodings (log(1+x), IP bit encoding, IP2Vec embeddings for ports/protocols) to improve learnability; preserve cross-chunk correlations via chunked parallelism + seed fine-tuning and "flow tags"; and introduce DP-SGD with pretraining on similar public data to improve the privacy-fidelity tradeoff. On six heterogeneous datasets (UGR16, CIDDS, TON; CAIDA, data center, and cyber-attack PCAPs), results show NetShare improves distributional metrics by 46% on average over baselines, preserves accuracy and rank ordering in downstream tasks, achieves a better scalability-fidelity tradeoff, and beats naive DP under differential privacy. Conclusion: NetShare combines domain insight with ML to practically generate high-fidelity, scalable synthetic header traces with a better privacy tradeoff.

**Question 2**: Extract the paper's original abstract (usually found after "Abstract" and before the Introduction).

We explore the feasibility of using Generative Adversarial Networks (GANs) to automatically learn generative models to generate synthetic packet- and flow header traces for networking tasks (e.g., telemetry, anomaly detection, provisioning). We identify key fidelity, scalability, and privacy challenges and tradeoffs in existing GAN-based approaches. By synthesizing domain-specific insights with recent advances in machine learning and privacy, we identify design choices to tackle these challenges. Building on these insights, we develop an end-to-end framework, NetShare. We evaluate NetShare on six diverse packet header traces and find that: (1) across all distributional metrics and traces, it achieves 46% more accuracy than baselines and (2) it meets users' requirements of downstream tasks in evaluating accuracy and rank ordering of candidate approaches.

**Question 3**: List all authors of the paper in the format: `Author 1, Author 2, Author 3`.

Yucheng Yin, Zinan Lin, Minhao Jin, Giulia Fanti, Vyas Sekar

**Question 4**: State directly which conference or journal published this paper, without reasoning or extra information.

ACM SIGCOMM 2022

**Question 5**: Describe in detail the core problem the paper addresses, then summarize it concisely.

The core problem is how to use GANs to generate high-fidelity, scalable, privacy-protecting synthetic PCAP/NetFlow header traces that approach real data in field distributions, cross-packet/cross-epoch correlations, and downstream-task accuracy and ranking, while easing the fidelity loss caused by training cost and differential privacy. In short: reformulate header generation as a flow-level time-series problem, combine domain-aware encodings with parallel fine-tuning, and couple public-data pretraining with DP-SGD for high-fidelity, practical private synthesis.

**Question 6**: List the methods proposed in the paper and summarize the core idea of each as concisely as possible.

(1) Problem reformulation: merge multiple epochs into one giant trace, split flows by five-tuple, and learn cross-packet/cross-epoch correlations with a time-series GAN. (2) Domain-aware encoding: log(1+x) for wide-range continuous fields; bit encoding for IPs; IP2Vec embeddings (trained on public data) for ports/protocols, improving learnability and privacy. (3) Scalable training: split the trace into fixed-time chunks; train a seed model on the first chunk and fine-tune the rest in parallel; use "flow tags" to preserve cross-chunk appearance relations. (4) Privacy enhancement: DP-SGD fine-tuning combined with pretraining on similar public data, improving the privacy-fidelity tradeoff. (5) Consistency post-processing: nearest-neighbor inverse mapping of embeddings, filling in derived fields (e.g., checksums), and merging by original time fields into PCAP/NetFlow.

**Question 7**: List the datasets used in the paper, including names and sources.

(1) UGR16 (NetFlow v9, Spanish ISP, third week of March 2016); (2) CIDDS (emulated small business environment, labeled Benign/Attack with types); (3) TON_IoT (IoT telemetry sub-dataset with nine attack classes plus normal flows); (4) CAIDA (backbone-link PCAP, March 2018 New York collector; public data); (5) Data Center (PCAP from the UNI1 data center studied in IMC 2010); (6) CyberAttack (U.S. collegiate cyber-defense competition PCAP, March 2012). One million consecutive samples per dataset are used for evaluation.

**Question 8**: List all evaluation metrics used in the paper and briefly explain what each measures.

(1) Distributional consistency: Jensen-Shannon Divergence for SA/DA, SP/DP, PR; Earth Mover's Distance (normalized) for TS, TD, PKT, BYT, PS, PAT, FS; both measure the distance between synthetic and real distributions. (2) Downstream-task fidelity: accuracy preservation and order preservation, using real/synthetic train-test accuracies and Spearman rank correlation to assess ranking consistency of classifiers and measurement algorithms. (3) Sketch re-estimation error: relative error of heavy-hitter count estimates, assessing usability for network measurement. (4) Anomaly-detection fidelity: relative error of anomaly ratios and rank correlation across NetML modes (IAT, SIZE, IAT_SIZE, STATS, SAMP-NUM, SAMP-SIZE). (5) Scalability-fidelity tradeoff: training CPU hours vs. mean JSD/EMD, measuring efficiency vs. quality. (6) Privacy-fidelity tradeoff: differential-privacy parameters (ε, δ) vs. mean JSD/EMD, assessing the impact of DP training. (7) Protocol/domain-knowledge consistency tests: IP validity, byte-packet relations, port-protocol consistency, minimum packet length, etc., ensuring semantic/syntactic correctness.

**Question 9**: Summarize the experimental performance, including concrete numbers and conclusions.

Across the six datasets, NetShare improves distributional metrics by 46% on average over baselines (about 48% on NetFlow, about 41% on PCAP); in the UGR16 and CAIDA examples, NetShare's mean JSD/normalized EMD is clearly lower than CTGAN, E-WGAN-GP, STAN, and the PCAP baselines. Downstream: for TON flow classification, NetShare is 12% more accurate than the next-best baseline on MLP and reaches 84% of real-data performance, with higher rank correlations on CIDDS/TON (e.g., 0.90 on CIDDS); for sketch re-estimation, NetShare has the lowest relative error on three datasets (about 48% lower on average) and preserves algorithm ordering; for PCAP anomaly detection, NetShare has the lowest relative error in most modes and a perfect mode-ranking match on CAIDA. On scalability, chunked parallel fine-tuning yields a favorable CPU-hours-vs-fidelity tradeoff, with nearly an order of magnitude better JSD than CTGAN in the PCAP setting while being nearly an order of magnitude faster than DoppelGANger. On privacy, naive DP-SGD severely degrades fidelity; pretraining on similar public data reduces mean EMD degradation from 3.5× (naive DP) to 2.3× at ε≈24 on CAIDA, though a tradeoff remains. In protocol-consistency tests NetShare passes ≈95-99% on multiple checks. Conclusion: NetShare beats existing methods on fidelity, task usability, and training efficiency, and pretraining improves usability under DP.

**Question 10**: Clearly describe the work done in the paper, listing the motivation, contributions, and main innovations.

Motivation: real PCAP/NetFlow header traces are hard to share due to privacy/policy; existing simulation/model/ML methods fall short on field correlations, large value spaces, cross-epoch correlation, and privacy guarantees, with high training cost and severe fidelity loss under differential privacy.

Contributions: (1) NetShare, a flow-level time-series reformulation implemented as an end-to-end system with a web prototype; (2) domain-aware encoding strategies (log transform, IP bit encoding, public-pretrained IP2Vec for ports/protocols) that make network fields learnable and robust for GANs; (3) parallel scalable training (chunking + seed fine-tuning + flow tags) that preserves correlations while sharply cutting CPU hours; (4) a differentially private fine-tuning framework (public pretraining + DP-SGD) improving the privacy-fidelity tradeoff; (5) systematic evaluation across six datasets and many distribution/task/consistency metrics showing an average 46% fidelity gain over baselines with rank preservation.

Main innovations: the merge-across-epochs, per-five-tuple time-series generation paradigm for network header traces; a hybrid representation of port/protocol embeddings plus IP bit encoding balancing fidelity and privacy; parallel fine-tuning with flow tags to keep cross-chunk correlation; and the first empirical demonstration in network header generation that DP pretraining plus fine-tuning improves the privacy-fidelity tradeoff.
@@ -0,0 +1,38 @@
@article{10.1145/1151659.1159928,
author = {Vishwanath, Kashi Venkatesh and Vahdat, Amin},
title = {Realistic and responsive network traffic generation},
year = {2006},
issue_date = {October 2006},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
volume = {36},
number = {4},
issn = {0146-4833},
url = {https://doi.org/10.1145/1151659.1159928},
doi = {10.1145/1151659.1159928},
abstract = {This paper presents Swing, a closed-loop, network-responsive traffic generator that accurately captures the packet interactions of a range of applications using a simple structural model. Starting from observed traffic at a single point in the network, Swing automatically extracts distributions for user, application, and network behavior. It then generates live traffic corresponding to the underlying models in a network emulation environment running commodity network protocol stacks. We find that the generated traces are statistically similar to the original traces. Further, to the best of our knowledge, we are the first to reproduce burstiness in traffic across a range of timescales using a model applicable to a variety of network settings. An initial sensitivity analysis reveals the importance of capturing and recreating user, application, and network characteristics to accurately reproduce such burstiness. Finally, we explore Swing's ability to vary user characteristics, application properties, and wide-area network conditions to project traffic characteristics into alternate scenarios.},
journal = {SIGCOMM Comput. Commun. Rev.},
month = aug,
pages = {111–122},
numpages = {12},
keywords = {burstiness, energy plot, generator, internet, modeling, structural model, traffic, wavelets}
}

@inproceedings{10.1145/1159913.1159928,
author = {Vishwanath, Kashi Venkatesh and Vahdat, Amin},
title = {Realistic and responsive network traffic generation},
year = {2006},
isbn = {1595933085},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/1159913.1159928},
doi = {10.1145/1159913.1159928},
abstract = {This paper presents Swing, a closed-loop, network-responsive traffic generator that accurately captures the packet interactions of a range of applications using a simple structural model. Starting from observed traffic at a single point in the network, Swing automatically extracts distributions for user, application, and network behavior. It then generates live traffic corresponding to the underlying models in a network emulation environment running commodity network protocol stacks. We find that the generated traces are statistically similar to the original traces. Further, to the best of our knowledge, we are the first to reproduce burstiness in traffic across a range of timescales using a model applicable to a variety of network settings. An initial sensitivity analysis reveals the importance of capturing and recreating user, application, and network characteristics to accurately reproduce such burstiness. Finally, we explore Swing's ability to vary user characteristics, application properties, and wide-area network conditions to project traffic characteristics into alternate scenarios.},
booktitle = {Proceedings of the 2006 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications},
pages = {111–122},
numpages = {12},
keywords = {burstiness, energy plot, generator, internet, modeling, structural model, traffic, wavelets},
location = {Pisa, Italy},
series = {SIGCOMM '06}
}
@@ -0,0 +1,58 @@
|
||||
# Realistic and responsive network traffic generation
|
||||
|
||||
|
||||
|
||||
**第一个问题**:请对论文的内容进行摘要总结,包含研究背景与问题、研究目的、方法、主要结果和结论,字数要求在150-300字之间,使用论文中的术语和概念。
|
||||
|
||||
论文提出Swing,一个闭环、网络响应式的traffic generator,旨在在网络仿真/仿真环境中用简单的structural model重现真实链路上的packet/flow交互与burstiness。背景问题是现有生成器难以同时匹配多时间尺度的到达过程方差(energy plots)、应用与用户层行为,以及广域网条件。Swing从单点tcpdump观察自动提取用户/会话/连接/网络四层参数分布(如numRRE、interRRE、numconn、interConn、numpairs、request/response sizes、reqthink,以及链路capacity/latency/loss),在ModelNet哑铃拓扑上用commodity stacks生成live traffic,并记录target link上的trace。方法包含被动估计RTT(SYN→SYN/ACK、响应→ACK时差)、packet-pair容量、基于序列号洞的丢包率,并用小波MRA的energy plots验证burstiness。结果表明:Swing在Mawi、CAIDA、Auck多数据集上,生成trace的统计特性与原trace相似,首次跨多时间尺度(含sub-RTT)重现字节/包到达能量谱;敏感性分析揭示需同时捕获用户/应用/网络特征,且可在场景投射中响应地调整延迟、容量、应用占比。结论:简单但结构化的模型加网络条件重放即可现实、响应地生成真实网络流量。
|
||||
|
||||
**第二个问题**:请提取论文的摘要原文,摘要一般在Abstract之后,Introduction之前。
|
||||
|
||||
This paper presents Swing, a closed-loop, network-responsive traffic generator that accurately captures the packet interactions of a range of applications using a simple structural model. Starting from observed traffic at a single point in the network, Swing automatically extracts distributions for user, application, and network behavior. It then generates live traffic corresponding to the underlying models in a network emulation environment running commodity network protocol stacks. We find that the generated traces are statistically similar to the original traces. Further, to the best of our knowledge, we are the first to reproduce burstiness in traffic across a range of timescales using a model applicable to a variety of network settings. An initial sensitivity analysis reveals the importance of capturing and recreating user, application, and network characteristics to accurately reproduce such burstiness. Finally, we explore Swing’s ability to vary user characteristics, application properties, and wide-area network conditions to project traffic characteristics into alternate scenarios.

**Question 3**: List all of the paper's authors in this format: `Author1, Author2, Author3`.

Kashi Venkatesh Vishwanath, Amin Vahdat

**Question 4**: State directly which conference or journal published this paper, without reasoning or extra information.

ACM SIGCOMM 2006

**Question 5**: Describe in detail the core problem the paper addresses, then summarize it concisely.

Core problem: how to build, from packet header traces observed at a single point, a semantically meaningful and responsive structural model, and then generate live traffic in a network emulation environment that accurately reproduces burstiness across timescales (including sub-RTT) and user/application/network interactions, while also supporting scenario projection. In brief: Swing combines a four-layer structural model with passive estimation of network characteristics and emulated replay, achieving realistic traffic generation that is statistically similar and matches energy spectra across scales.

**Question 6**: State the methods the paper proposes, summarizing the core idea of each as concisely as possible.

(1) Structured parametric model: four layers (Users/Sessions/Connections/Network) of parameters and distributions (e.g., numRRE, interRRE, numconn, interConn, numpairs, request/response sizes, reqthink, capacity/latency/loss) uniformly describe application behavior.

(2) Session/RRE extraction: cluster connections into RREs and sessions based on SYN/FIN, timing, and RREtimeout/SESStimeout, yielding empirical CDFs of the distributions.

(3) Passive estimation of network characteristics: estimate the RTT decomposition from SYN→SYN/ACK and response→ACK time differences; estimate bottleneck capacity with packet pairs; estimate loss from sequence-number "holes" and retransmissions (a sketch of the packet-pair step follows this list).

(4) Emulated generation: configure link MTU/capacity/latency/loss on a ModelNet dumbbell topology; deploy multiple generators/listeners driving live connections from the distributions with commodity stacks.

(5) Validation and sensitivity analysis: compare the multiscale variance of byte/packet arrivals with wavelet MRA energy plots; study the effect of omitting network/user/application parameters.

(6) Scenario projection: vary latency, capacity, response sizes, application mix, etc., to test how the trace responds under alternative scenarios.
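
As a concrete illustration of step (3), here is a minimal sketch of packet-pair bottleneck-capacity estimation, assuming a hypothetical list of `(timestamp, size)` tuples for one direction of a connection; this is our simplification, not Swing's actual code:

```python
def packet_pair_capacity(packets, min_size=1000):
    """Estimate bottleneck capacity (bps) from back-to-back packet pairs.

    packets: list of (timestamp_sec, size_bytes) tuples in arrival order.
    Hypothetical input format; Swing itself works from tcpdump traces.
    """
    estimates = []
    for (t1, s1), (t2, s2) in zip(packets, packets[1:]):
        gap = t2 - t1
        # Only large, closely spaced packets are likely to have queued
        # back-to-back at the bottleneck; a small gap bound limits noise.
        if s2 >= min_size and 0 < gap < 0.01:
            estimates.append(s2 * 8 / gap)  # bits per second
    if not estimates:
        return None
    # The median of the per-pair estimates approximates the bottleneck rate.
    estimates.sort()
    return estimates[len(estimates) // 2]

# Example: ~1500-byte packets spaced 1 ms apart -> ~12 Mbps bottleneck.
trace = [(0.000, 1500), (0.001, 1500), (0.002, 1500)]
print(packet_pair_capacity(trace))
```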

**Question 7**: State the datasets used in the paper, including their names and sources.

(1) The CAIDA OC-48 MFN Backbone 1 link (San Jose–Seattle) public trace; (2) the MAWI (WIDE project) trans-Pacific 18 Mbps CAR link trace; (3) the University of Auckland (NLANR archive, Auckland-VI) OC3c ATM link trace. All come from public trace repositories (CAIDA, MAWI, NLANR).

**Question 8**: List all the metrics used to evaluate the method, with a brief explanation of what each measures.

(1) Per-application/aggregate bandwidth and pps: compare Mbps and packets/sec of the generated vs. original traces to verify macroscopic load consistency.

(2) Parameter-distribution agreement: median and IQR of each model parameter (numconn, interConn, numpairs, numRRE, interRRE, reqthink, request/response sizes) to check that the structural model is reproduced.

(3) Wavelet MRA energy plots: compare the variance of byte/packet arrivals across timescales to validate burstiness (including the characteristic "dips" at the RTT and bottleneck-capacity scales).

(4) CDFs of network characteristics: CDFs of two-way latency, upstream/downstream capacity, and loss rate, comparing the passive estimates against the generated trace's network conditions.

(5) Sensitivity analysis: the effect on the energy plots of removing or changing network/application parameters (e.g., no network replay; latency only; latency+capacity; dropping interRRE/interConn), assessing their necessity and importance.

(6) Responsiveness experiments: the effect of doubling latency/response sizes or changing the application mix on energy plots and bandwidth, assessing scenario-projection ability.

**Question 9**: Summarize the experimental performance, including concrete numbers and the conclusions drawn.

Across the Auck/Mawi/CAIDA datasets: aggregate bandwidth and pps (e.g., 184 Mbps for CAIDA; multiple Auck and Mawi cases) closely match between Trace and Swing; parameter medians/IQRs agree well (e.g., HTTP request/response sizes, numconn, interConn, reqthink), with the remaining differences in human "think" parameters still acceptable. For energy plots: Swing's byte/packet arrival spectra for HTTP, SQUID, and KAZAA, in both directions, closely overlay the original traces, reproducing the RTT-scale dip (e.g., 200 ms) and the bottleneck-capacity dip (e.g., 3.5 Mbps → about 8 ms). The passively estimated latency/capacity/loss CDFs agree well with the generated traces. Sensitivity: not replaying network conditions loses sub-RTT structure; latency alone, or latency plus capacity, is insufficient; removing interRRE or interConn changes large-scale or overall burstiness. Responsiveness: doubling latency shifts the energy-plot features right by one scale (log2); doubling response sizes raises the whole spectrum and grows bandwidth from 9 Mbps to 19 Mbps; increasing the SQUID share makes the overall spectrum resemble the SQUID curve. Conclusion: Swing is the first to reproduce burstiness across timescales; generated traces are statistically similar to the originals, and its semantically meaningful parameters enable responsive scenario projection.

**Question 10**: Describe the paper's work clearly, listing its motivation, contributions, and main innovations.

Motivation: a traffic generator is needed that is realistic, responsive, and maximally random—matching burstiness across timescales, automatically extracting user/application/network parameters from a single observation point, and supporting projection onto different network/application scenarios.

Contributions: (1) Swing, a closed-loop, network-responsive generation framework whose unified structural model covers Users/Sessions/Connections/Network; (2) automatic extraction, from single-point packet headers, of empirical CDFs for session/RRE/connection parameters and of network characteristics (RTT decomposition, packet-pair capacity, loss from sequence-number holes); (3) live traffic generation with commodity stacks on a ModelNet dumbbell topology that statistically reproduces the original trace's bandwidth/pps and structural parameter distributions; (4) the first reproduction, via wavelet MRA energy plots, of byte/packet arrival burstiness across timescales (including sub-RTT); (5) systematic sensitivity and responsiveness analyses showing that all three classes of characteristics (user/application/network) are necessary, and demonstrating projection by varying latency/capacity/application mix.

Main innovation: combining a concise structural model with passive estimation of network conditions, replaying multi-layer interactions in an emulator, reproducing burstiness across timescales for the first time while preserving statistical similarity, and supporting parameterized, responsive projection toward future scenarios.

@@ -0,0 +1,10 @@
@inproceedings{Lin_2020,
series = {IMC '20},
title = {Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions},
url = {http://dx.doi.org/10.1145/3419394.3423643},
DOI = {10.1145/3419394.3423643},
booktitle = {Proceedings of the ACM Internet Measurement Conference},
publisher = {ACM},
author = {Lin, Zinan and Jain, Alankar and Wang, Chen and Fanti, Giulia and Sekar, Vyas},
year = {2020},
month = oct,
pages = {464--483},
collection = {IMC '20}
}

@@ -0,0 +1,71 @@
# Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions

**Question 1**: Summarize the paper's content, covering the research background and problem, the objectives, the methods, the main results, and the conclusions, in 150-300 words, using the paper's own terminology and concepts.

Addressing the long-standing pain point that data sharing is hard in the networking and systems community, the paper explores using Generative Adversarial Networks to generate high-fidelity synthetic data for long time series correlated with multidimensional metadata, supporting structural characterization, predictive modeling, and algorithm evaluation with minimal expert involvement. The authors identify key GAN challenges in fidelity (long-term dependencies, complex multidimensional relationships, mode collapse) and privacy (theoretical guarantees sacrifice fidelity; memorization leaks), and propose the DoppelGANger workflow: decoupled joint generation of metadata and time series as P(A)·P(R|A), an auxiliary discriminator for metadata, a batch-generation RNN to capture long-range correlations, and auto-normalization that generates per-sample min/max values as "fake metadata" to mitigate mode collapse. On the real-world WWT, MBA, and GCUT datasets, DG improves up to 43% over baselines on structural microbenchmarks (distributions, autocorrelation) and downstream tasks, and preserves algorithm rankings. On privacy, it shows that DP-SGD destroys temporal correlations, proposes larger training sets to mitigate membership inference, and achieves business-attribute obfuscation by retraining only the metadata generator. Conclusion: DG demonstrates initial feasibility in fidelity and generality, but the privacy-fidelity tradeoff remains an open problem.

**Question 2**: Extract the paper's abstract verbatim; the abstract usually appears after "Abstract" and before the Introduction.

Limited data access is a longstanding barrier to data-driven research and development in the networked systems community. In this work, we explore if and how generative adversarial networks (GANs) can be used to incentivize data sharing by enabling a generic framework for sharing synthetic datasets with minimal expert knowledge. As a specific target, our focus in this paper is on time series datasets with metadata (e.g., packet loss rate measurements with corresponding ISPs). We identify key challenges of existing GAN approaches for such workloads with respect to fidelity (e.g., long-term dependencies, complex multidimensional relationships, mode collapse) and privacy (i.e., existing guarantees are poorly understood and can sacrifice fidelity). To improve fidelity, we design a custom workflow called DoppelGANger (DG) and demonstrate that across diverse real-world datasets (e.g., bandwidth measurements, cluster requests, web sessions) and use cases (e.g., structural characterization, predictive modeling, algorithm comparison), DG achieves up to 43% better fidelity than baseline models. Although we do not resolve the privacy problem in this work, we identify fundamental challenges with both classical notions of privacy and recent advances to improve the privacy properties of GANs, and suggest a potential roadmap for addressing these challenges. By shedding light on the promise and challenges, we hope our work can rekindle the conversation on workflows for data sharing.

**Question 3**: List all of the paper's authors in this format: `Author1, Author2, Author3`.

Zinan Lin, Alankar Jain, Chen Wang, Giulia Fanti, Vyas Sekar

**Question 4**: State directly which conference or journal published this paper, without reasoning or extra information.

ACM Internet Measurement Conference (IMC 2020)

**Question 5**: Describe in detail the core problem the paper addresses, then summarize it concisely.

Core problem: how to use GANs, without heavy domain expertise, to generate high-fidelity synthetic data for networked long time series with multidimensional metadata—preserving long-range temporal correlations and complex measurement-metadata relationships while avoiding mode collapse—and, on the privacy side, avoiding memorization leaks while still supporting real downstream tasks. In brief: the paper proposes DoppelGANger, which systematically improves fidelity via decoupled/conditional generation, a batched RNN, an auxiliary discriminator, and per-sample normalization (min/max fake metadata); validates its effect across datasets and tasks; and analyzes both the damage done by differentially private training and feasible privacy tradeoffs.

**Question 6**: State the methods the paper proposes, summarizing the core idea of each as concisely as possible.

- Decoupling + conditional generation: factor P(A,R) = P(A)·P(R|A); generate metadata with an MLP first, then condition an RNN on the metadata to generate the time series, preserving measurement-metadata dependencies.

- Auxiliary discriminator: a separate discriminator for metadata trained jointly (with a weighted loss) alongside the main discriminator, improving metadata-distribution fidelity.

- Batch-generation RNN: emit S consecutive records per RNN step, reducing the number of steps and capturing long-range correlations.

- Auto-normalization: normalize each sample individually, learn/generate its min/max as "fake metadata," and rescale after generation to mitigate mode collapse (see the sketch after this list).

- Variable-length flags: generate termination flags to learn the length distribution.

- Privacy strategies: use larger training sets to lower membership-inference success; retrain only the metadata generator to obfuscate attribute distributions; evaluate DP-SGD's impact on fidelity.
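
As a rough illustration of the auto-normalization idea, here is a minimal sketch assuming plain NumPy arrays (names such as `fake_meta` are ours, not the paper's): each series is rescaled to [-1, 1] by its own min/max, and the (min, max) pair is appended to the metadata the GAN must also learn to generate.

```python
import numpy as np

def auto_normalize(series):
    """Per-sample rescaling to [-1, 1]; returns the scaled series plus
    the (min, max) pair treated as additional 'fake metadata'."""
    lo, hi = series.min(), series.max()
    if hi == lo:                      # constant series: avoid divide-by-zero
        return np.zeros_like(series), (lo, hi)
    scaled = 2.0 * (series - lo) / (hi - lo) - 1.0
    return scaled, (lo, hi)

def auto_denormalize(scaled, fake_meta):
    """Invert the scaling using the generated (min, max) fake metadata."""
    lo, hi = fake_meta
    return (scaled + 1.0) / 2.0 * (hi - lo) + lo

x = np.array([10.0, 12.0, 400.0, 11.0])   # value ranges vary wildly per sample
x_scaled, fake_meta = auto_normalize(x)
assert np.allclose(auto_denormalize(x_scaled, fake_meta), x)
```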

**Question 7**: State the datasets used in the paper, including their names and sources.

- Wikipedia Web Traffic (WWT): the Kaggle Web Traffic Time Series Forecasting dataset; daily page views with domain/access/agent metadata.

- FCC Measuring Broadband America (MBA): public US FCC data; per-household UDP ping loss rates and total traffic, with access technology/ISP/state metadata.

- Google Cluster Usage Traces (GCUT): public Google cluster job resource-usage logs; per-second measurements aggregated into 5-minute statistics, with job end-event type as metadata.

**Question 8**: List all the metrics used to evaluate the method, with a brief explanation of what each measures.

- Autocorrelation (ACF) and its mean squared error (MSE): fidelity of short- and long-term temporal correlations, comparing real vs. synthetic autocorrelation curves.

- Distribution distances such as Jensen-Shannon and Wasserstein-1: closeness of the measurement, metadata, and joint distributions (e.g., (max+min)/2, bandwidth CDFs, Pearson-correlation CDFs).

- Variable-length distribution fit: histograms of job durations (capturing multimodality).

- Downstream prediction accuracy: accuracy or R² of classifiers/regressors (MLP, LR, SVM, etc.) trained and tested across real and synthetic data, testing generalization.

- Rank preservation (Spearman rank correlation): consistency of algorithm rankings between real and synthetic data.

- Training cost and stability: training time and the sensitivity of MSE to sample size, assessing usability.

- Privacy evaluation: membership-inference success rate vs. training-set size; autocorrelation degradation under DP-SGD.

**Question 9**: Summarize the experimental performance, including concrete numbers and the conclusions drawn.

On WWT, DG reduces autocorrelation MSE by about 91.2% relative to the best baseline (RCGAN). For prediction, an MLP classifier on GCUT trained on DG-generated data and tested on real data is 43% more accurate than the next-best baseline (AR), reaching about 80% of the real-data accuracy. For rankings, DG achieves Spearman rank correlations of 1.00 on GCUT and 0.80 on WWT, preserving rankings better than most baselines. On length distributions, DG accurately reproduces GCUT's bimodal pattern, where RNN-style baselines miss the secondary mode; and DG has the smallest Wasserstein-1 distance for the measurement-metadata joint distribution (DSL/Cable total-bandwidth CDFs in MBA). On privacy, membership-inference success drops to about 51% (near random) as training samples grow, but DP-SGD (e.g., ε ≈ 1.45 or lower) severely damages autocorrelation and fidelity. Conclusion: DG clearly beats baselines on structural microbenchmarks and downstream tasks, with good generality and usability; the privacy-fidelity tradeoff still needs new methods.

**Question 10**: Describe the paper's work clearly, listing its motivation, contributions, and main innovations.

Motivation: data sharing in networking/systems is restricted; a generic synthetic-data framework is needed that works without heavy expert involvement, preserves long time series and multidimensional relationships, and balances privacy with practical task usability.

Contributions:

- DoppelGANger, systematically addressing GAN fidelity challenges for time series (long-range correlations, mode collapse, measurement-metadata relationships).

- Decoupled/conditional generation with an auxiliary discriminator, improving metadata and joint-distribution fidelity.

- The batch-generation RNN and per-sample auto-normalization (min/max fake metadata), mitigating mode collapse and learning long-range dependencies; empirically high fidelity (up to 43% gains) on WWT/MBA/GCUT across structural/predictive/ranking evaluations.

- A systematic analysis of the privacy tradeoff, revealing DP-SGD's damage to temporal fidelity and proposing practical mitigations (larger training sets; retraining only the metadata generator).

Main innovations:

- Integrating per-sample normalization with fake metadata into the GAN workflow, resolving mode collapse caused by widely varying value ranges.

- The combination of decoupled metadata-to-series conditional generation and an auxiliary discriminator, effectively improving joint-distribution learning for networked long time series.

- The batched sequence-generation strategy, letting the RNN generate long sequences stably and trainably without sacrificing long-range correlations.

- A task-oriented evaluation framework for data sharing (structure/prediction/ranking) plus practical privacy recommendations.

@@ -0,0 +1,9 @@
@misc{meng2025aflnetyearslatercoverageguided,
title = {AFLNet Five Years Later: On Coverage-Guided Protocol Fuzzing},
author = {Ruijie Meng and Van-Thuan Pham and Marcel Böhme and Abhik Roychoudhury},
year = {2025},
eprint = {2412.20324},
archivePrefix = {arXiv},
primaryClass = {cs.SE},
url = {https://arxiv.org/abs/2412.20324},
}

@@ -0,0 +1,47 @@
# AFLNet Five Years Later: On Coverage-Guided Protocol Fuzzing

**Question 1**: Summarize the paper's content, covering the research background and problem, the objectives, the methods, the main results, and the conclusions, in 150-300 words, using the paper's own terminology and concepts.

Protocol implementations are stateful and message-driven; the same message can produce different responses in different internal states, making traditional coverage-guided greybox fuzzing (e.g., AFL) ineffective for network protocols. The paper provides an extended technical discussion and a large-scale empirical evaluation of AFLNet (the first code- and state-coverage-guided protocol fuzzer) and reviews its impact over five years. Methodologically, AFLNet uses a message sequence as the seed, builds the initial corpus via pcap record/replay, learns the implemented protocol state machine (IPSM) online, identifies states via response codes, and tracks #fuzz/#selected/#paths statistics; seed selection interleaves queue order with state heuristics to steer toward progressive states; sequences are split into M1/M2/M3, with protocol-aware and byte-level mutations applied to M2; and branch coverage and state-transition coverage are maintained in the same bitmap to decide what is interesting. Results: state feedback alone significantly beats black-box fuzzing on some targets; adding state feedback improves state coverage by 35.67× on average, but the code-coverage gains are mostly insignificant; the interleaved seed-selection strategy is the most robust on combined code/state coverage. Conclusion: state feedback substantially enlarges exploration of the protocol state space, but "how to define a state" and throughput remain key challenges.

**Question 2**: Extract the paper's abstract verbatim; the abstract usually appears after "Abstract" and before the Introduction.

Abstract—Protocol implementations are stateful which makes them difficult to test: Sending the same test input message twice might yield a different response every time. Our proposal to consider a sequence of messages as a seed for coverage-directed greybox fuzzing, to associate each message with the corresponding protocol state, and to maximize the coverage of both the state space and the code was first published in 2020 in a short tool demonstration paper. AFLNet was the first code- and state-coverage-guided protocol fuzzer; it used the response code as an indicator of the current protocol state. Over the past five years, the tool paper has gathered hundreds of citations, the code repository was forked almost 200 times and has seen over thirty pull requests from practitioners and researchers, and our initial proposal has been improved upon in many significant ways. In this paper, we first provide an extended discussion and a full empirical evaluation of the technical contributions of AFLNet and then reflect on the impact that our approach and our tool had in the past five years, on both the research and the practice of protocol fuzzing.

**Question 3**: List all of the paper's authors in this format: `Author1, Author2, Author3`.

Ruijie Meng, Van-Thuan Pham, Marcel Böhme, Abhik Roychoudhury

**Question 4**: State directly which conference or journal published this paper, without reasoning or extra information.

The provided text does not state a conference or journal.

**Question 5**: Describe in detail the core problem the paper addresses, then summarize it concisely.

The core problem: how to extend coverage-guided greybox fuzzing—designed for single-input, near-stateless programs—to network protocols, which are stateful, driven by message sequences, and have enormous state spaces, while accounting for both code coverage and state-space coverage. Traditional approaches either use stateful blackbox fuzzing with hand-written protocol models (relying on incomplete state/data models and not retaining "interesting" inputs for evolution) or concatenate message sequences into a file for AFL (unable to focus on key messages and prone to generating many invalid sequences). The paper presents and systematically evaluates AFLNet's solution: message sequences as seeds, online inference of the IPSM, and state feedback incorporated into both guidance and the "interesting" decision, enabling more systematic exploration of protocol implementations' states and code. In brief: make greybox fuzzing "see and exploit protocol state," so that stateful protocols can be fuzzed effectively.

**Question 6**: State the methods the paper proposes, summarizing the core idea of each as concisely as possible.

(1) SCGF with message sequences as seeds: use a sequence of messages, rather than a single file, as the evolutionary seed, fitting stateful servers. (2) Record/replay driving (pcap→parse→send): extract the initial corpus from real traffic and replay it reproducibly for fuzzing iterations. (3) Lightweight learning of the IPSM (implemented protocol state machine): extract state transitions from response sequences, incrementally build/update the state machine online, and maintain #fuzz/#selected/#paths statistics. (4) Steering toward progressive states: pick states via "blind-spot / recently discovered / productive" heuristics, then do AFL-style prioritized selection within the sub-corpus that reaches the chosen state. (5) Interleaved seed selection: switch to the state-heuristic strategy on coverage plateaus, otherwise follow AFL queue order, balancing throughput and steering. (6) Three-part sequence mutation M1/M2/M3: keep the prefix M1 fixed to guarantee reaching the target state, mutate only the candidate part M2, and still execute the suffix M3 to observe propagation effects (a sketch follows below). (7) Protocol-aware mutation operators: message replacement/insertion/duplication/deletion, stacked with byte-level mutations. (8) A single bitmap for code + state coverage: reserve a bitmap region (SHIFT_SIZE) for state transitions, so that interesting seeds are defined by branches and state transitions together.
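
A minimal sketch of the M1/M2/M3 idea (our own simplification, not AFLNet's C code): given a message sequence and the index of the message that drives the server into the target state, keep the prefix intact, mutate the chosen region, and keep the suffix to observe downstream effects.

```python
import random

def mutate_sequence(messages, target_idx, mutate_bytes):
    """Split a message sequence into M1 (prefix reaching the target state),
    M2 (candidate messages under mutation), and M3 (suffix), AFLNet-style.

    messages: list[bytes]; target_idx: index of the last message of M1.
    mutate_bytes: a byte-level mutator, e.g. from an AFL-style toolbox.
    """
    m1 = messages[:target_idx + 1]                 # replayed verbatim
    m2 = messages[target_idx + 1:target_idx + 2]   # region under mutation
    m3 = messages[target_idx + 2:]                 # run to observe effects

    # Protocol-aware operators on M2: duplicate, delete, or byte-mutate.
    op = random.choice(["duplicate", "delete", "mutate"])
    if op == "duplicate" and m2:
        m2 = m2 + m2
    elif op == "delete":
        m2 = []
    elif m2:
        m2 = [mutate_bytes(m2[0])]
    return m1 + m2 + m3

def flip_one_byte(msg: bytes) -> bytes:
    """Toy byte-level mutator."""
    if not msg:
        return msg
    i = random.randrange(len(msg))
    return msg[:i] + bytes([msg[i] ^ 0xFF]) + msg[i + 1:]

seq = [b"USER anon\r\n", b"PASS guest\r\n", b"LIST\r\n", b"QUIT\r\n"]
print(mutate_sequence(seq, target_idx=1, mutate_bytes=flip_one_byte))
```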

**Question 7**: State the datasets used in the paper, including their names and sources.

The benchmark is ProFuzzBench (Natella & Pham, ISSTA 2021 tool/benchmark paper: "ProFuzzBench: A Benchmark for Stateful Protocol Fuzzing"); the paper evaluates on its default network-protocol targets (e.g., Bftpd, DNSmasq, OpenSSH, TinyDTLS, Live555, ProFTPD, Pure-FTPd, Exim, DCMTK, Kamailio, forked-daapd, lightFTP).

**Question 8**: List all the metrics used to evaluate the method, with a brief explanation of what each measures.

(1) Code coverage: branch coverage (number of covered branches) measures the explored code—"uncovered code cannot trigger bugs." (2) State-space coverage: the number of state transitions constructed in the IPSM (and the number/coverage of states) measures exploration of the protocol state space. (3) Vargha-Delaney effect size (Â12): the probability of superiority between two sets of independent runs, used to judge whether a difference constitutes a "significant advantage" (the paper uses Â12 ≥ 0.71 or ≤ 0.29 as the significance threshold). (4) Coverage trends over time: branches covered over 24-hour runs, comparing how quickly variants reach the same coverage (e.g., the reported "about 6×/4× faster to the same branch count").

**Question 9**: Summarize the experimental performance, including concrete numbers and the conclusions drawn.

RQ1 (state feedback only): AFLNetDARK (state feedback only) significantly beats AFLNetBLACK (no code/state feedback) on code coverage for 6 of the 12 ProFuzzBench targets—Bftpd, DNSmasq, Kamailio, lightFTP, ProFTPD, Pure-FTPd—and reaches BLACK's branch counts about 6× and 4× faster on OpenSSH and TinyDTLS respectively; gains are small for targets with very few states (e.g., DCMTK ends with only 3 states). Conclusion: with "enough" states, state feedback is effective guidance where code instrumentation is unavailable. RQ2 (state+code vs. code only, Table 1): AFLNetQUEUE improves average branch coverage over AFLNetCODE by only +0.01%, but average state coverage by +35.67×; for example, OpenSSH's state count rises from 93.5 to 30480.9 (+325.00×, Â12=1.00), DNSmasq from 282.5 to 27364.0 (+95.85×, Â12=1.00), and Bftpd from 170.5 to 334.0 (+0.96×, Â12=1.00). Conclusion: the additional state feedback hugely expands state-space exploration, but the code-coverage gains are mostly insignificant. RQ3 (seed-selection strategies, Tables 2/3): the interleaved strategy (AFLNet) performs best overall; its average branch coverage is -0.52% relative to AFLNetQUEUE but +1.65% relative to AFLNetIPSM, while its state coverage is +5.77% over AFLNetQUEUE and +12.77% over AFLNetIPSM. Conclusion: interleaved seed selection is the most robust overall, and state coverage does not simply correlate with code coverage.

**Question 10**: Describe the paper's work clearly, listing its motivation, contributions, and main innovations.

Motivation: network protocol implementations are stateful reactive systems whose inputs are message sequences rather than single files; AFL-style CGF has no notion of state or sequence structure, while SBF relies on hand-written models and does not evolve retained seeds, making it hard to penetrate the state space and find implementation divergences/hidden transitions and bugs.

Contributions: (1) a systematic exposition of AFLNet: message sequences incorporated into the greybox evolutionary framework, with state coverage and code coverage as joint feedback signals; (2) online, lightweight IPSM learning and guidance: the implemented protocol state machine is built from response codes, with #fuzz/#selected/#paths statistics driving state selection and steering; (3) target-state-oriented sequence splitting and mutation (M1/M2/M3): guaranteeing reachability while focusing mutations on key regions; (4) a single bitmap encoding branch and state-transition coverage: space reserved for state transitions in the AFL bitmap so that the "interesting" decision considers both state and code; (5) a large-scale five-years-later evaluation and configuration guidance: 24h×10-run experiments on ProFuzzBench, separately evaluating state feedback, state+code synergy, and seed-selection strategies, with Â12 effect sizes reported.

Main innovation: using "state feedback + online state-machine learning" to extend greybox fuzzing from stateless programs to exploration of protocol implementations' state spaces, with reproducible ablation-style empirical conclusions and best-practice recommendations.

17
papers/Topic2 Protocol-aware generation & fuzzing/Intro.txt
Normal file
@@ -0,0 +1,17 @@
Protocol state machines / fuzzing / learned input generation (supporting "generating valid request-response interaction sequences")

For Modbus TCP, "valid" means more than legal fields; it also includes:

request and response pairing

a consistent Transaction ID / a sensible increment policy

function code consistent with the address range (e.g., 0x03 against the holding-register range)

sensible trigger conditions for exception responses

These hard constraints tend to be discussed more systematically in the fuzzing / protocol testing literature; a sketch of assembling a minimally valid request follows.
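
As a concrete illustration (a minimal sketch assuming standard Modbus TCP MBAP framing; not taken from the note above), a function-code 0x03 Read Holding Registers request can be assembled deterministically while a model supplies only the semantic fields:

```python
import struct

def modbus_read_holding_registers(txn_id: int, unit_id: int,
                                  start_addr: int, count: int) -> bytes:
    """Assemble a Modbus TCP 'Read Holding Registers' (0x03) request.

    MBAP header: transaction id, protocol id (0), length, unit id;
    PDU: function code 0x03, starting address, register count.
    """
    pdu = struct.pack(">BHH", 0x03, start_addr, count)
    mbap = struct.pack(">HHHB", txn_id, 0x0000, len(pdu) + 1, unit_id)
    return mbap + pdu

def is_matching_response(request: bytes, response: bytes) -> bool:
    """Pairing check: same transaction id and unit id, and the response
    function code echoes 0x03 (or 0x83 for an exception response)."""
    req_txn, _, _, req_unit = struct.unpack(">HHHB", request[:7])
    rsp_txn, _, _, rsp_unit = struct.unpack(">HHHB", response[:7])
    return (req_txn, req_unit) == (rsp_txn, rsp_unit) and \
        response[7] in (0x03, 0x83)

req = modbus_read_holding_registers(txn_id=1, unit_id=1,
                                    start_addr=0x0000, count=2)
print(req.hex())  # 000100000006 01 03 0000 0002 (spacing added for reading)
```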

Pham et al. AFLNet: A Greybox Fuzzer for Network Protocols. ICST 2020.

Use: state-coverage fuzzing for network protocols; its "state feedback" idea can be borrowed to couple a diffusion generator with protocol-stack feedback (validity rate / coverage) for reinforcement.

She et al. NEUZZ: Efficient Fuzzing with Neural Program Smoothing. IEEE S&P 2019.

Use: the representative work on neural-network-guided fuzzing; serves as related-work support for a future "generative model + feedback-driven optimization / guided sampling" direction.

Godefroid, Peleg, Singh. Learn&Fuzz: Machine Learning for Input Fuzzing. ASE 2017.

Use: learn the input format, then generate; consistent with the "semantic-level generation + deterministic assembler" philosophy (the model learns semantics, rules handle framing details).

@@ -0,0 +1,9 @@
@misc{godefroid2017learnfuzzmachinelearninginput,
title = {Learn&Fuzz: Machine Learning for Input Fuzzing},
author = {Patrice Godefroid and Hila Peleg and Rishabh Singh},
year = {2017},
eprint = {1701.07232},
archivePrefix = {arXiv},
primaryClass = {cs.AI},
url = {https://arxiv.org/abs/1701.07232},
}

@@ -0,0 +1,47 @@
# Learn&Fuzz: Machine Learning for Input Fuzzing

**Question 1**: Summarize the paper's content, covering the research background and problem, the objectives, the methods, the main results, and the conclusions, in 150-300 words, using the paper's own terminology and concepts.

The paper targets the key bottleneck of grammar-based fuzzing: input grammars must be hand-written, which is slow and error-prone, even though complex structured formats (such as PDF) depend on this kind of fuzzing the most. The goal is to use neural-network-based statistical learning to automatically produce a grammar/generative model usable for input fuzzing, and to resolve the learn&fuzz tension (learning favors well-formed inputs, while fuzzing must break structure to cover error handling and unexpected paths). The method trains, unsupervised, on about 63,000 non-binary PDF objects using a seq2seq RNN (LSTM) that learns a character-level probability distribution, and proposes three sampling-based generation strategies (NoSample/Sample/SampleSpace) plus the SampleFuzz algorithm, which uses the learned distribution to guide "where to fuzz." Experiments target the Microsoft Edge PDF parser, evaluated by instruction coverage, pass rate, and AppVerifier-monitored bugs: SampleSpace reaches a 97% pass rate at 50 epochs; the best overall coverage is Sample-40e; among learn+fuzz combinations, SampleFuzz attains the highest coverage at 567,634 instructions with a 68.24% pass rate, beating several random-fuzzing baselines, and a longer experiment found (and led to the fix of) a stack-overflow bug. Conclusion: statistical generative models can learn input structure automatically and use probability information to apply fuzz more intelligently and increase coverage.

**Question 2**: Extract the paper's abstract verbatim; the abstract usually appears after "Abstract" and before the Introduction.

Abstract. Fuzzing consists of repeatedly testing an application with modified, or fuzzed, inputs with the goal of finding security vulnerabilities in input-parsing code. In this paper, we show how to automate the generation of an input grammar suitable for input fuzzing using sample inputs and neural-network-based statistical machine-learning techniques. We present a detailed case study with a complex input format, namely PDF, and a large complex security-critical parser for this format, namely, the PDF parser embedded in Microsoft’s new Edge browser. We discuss (and measure) the tension between conflicting learning and fuzzing goals: learning wants to capture the structure of well-formed inputs, while fuzzing wants to break that structure in order to cover unexpected code paths and find bugs. We also present a new algorithm for this learn&fuzz challenge which uses a learnt input probability distribution to intelligently guide where to fuzz inputs.

**Question 3**: List all of the paper's authors in this format: `Author1, Author2, Author3`.

Patrice Godefroid, Hila Peleg, Rishabh Singh

**Question 4**: State directly which conference or journal published this paper, without reasoning or extra information.

arXiv:1701.07232v1

**Question 5**: Describe in detail the core problem the paper addresses, then summarize it concisely.

The core problem: how to learn, from sample inputs and without a hand-written format specification, a generative input model that is "grammar-like enough" for grammar-based fuzzing of complex structured inputs (with PDF as the case study), and how to strike a controllable balance between "generating mostly well-formed inputs to get deep into the parsing pipeline" and "deliberately injecting ill-formed fragments to reach exception/error-handling code." Traditional blackbox/whitebox fuzzing is less effective than grammar-based fuzzing on complex text-structured formats, but the latter requires manual grammars; existing grammar/automata-learning methods suit PDF objects poorly, since these are relatively flat yet have an enormous space of token/key-value combinations. The paper learns the probability distribution over character sequences with a seq2seq RNN as a statistical grammar, and uses that distribution to perturb high-confidence positions "against the grain" for learn&fuzz. In brief: automatically learn input structure from samples with a neural network, and use the learned distribution to guide more effective structured fuzzing.

**Question 6**: State the methods the paper proposes, summarizing the core idea of each as concisely as possible.

(1) seq2seq RNN statistical input modeling: treat a PDF object as a character sequence and train an encoder-decoder (LSTM) to learn p(x_t|x_<t) as a generative language model. (2) Training-data construction (shift-by-one, unsupervised): chunk the concatenated character stream into input sequences of length d whose outputs are the same sequences shifted right by one, enabling unsupervised next-character learning. (3) NoSample generation: take the argmax character at each step, maximizing well-formedness with very low diversity. (4) Sample generation: sample each character from the distribution, increasing diversity but producing more ill-formed fragments. (5) SampleSpace generation: sample only at whitespace boundaries and use argmax inside tokens, a compromise between diversity and syntactic consistency. (6) The SampleFuzz algorithm: during generation, sample a character c with probability p(c); if the model is highly confident (p(c) > p_t) and a coin flip triggers (p_fuzz > t_fuzz), replace c with the lowest-probability character (argmin), injecting anomalies at "the places least expected to be wrong" to push the parser into error handling / unexpected paths (see the sketch below). (7) The host-append mechanism for embedding PDF objects into whole files: append new objects to a well-formed host following PDF incremental-update rules (updating xref and trailer) to drive end-to-end testing of the Edge PDF parser.
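
A minimal sketch of the SampleFuzz decision rule (our own rendering of the algorithm's core idea; `next_char_dist` stands in for the learned RNN's per-step distribution and is an assumption here):

```python
import random

def sample_fuzz_step(next_char_dist, t_fuzz=0.9, p_t=0.9):
    """One generation step of SampleFuzz.

    next_char_dist: dict mapping each candidate character to its learned
    probability p(c) at the current position (stand-in for RNN output).
    """
    chars, probs = zip(*next_char_dist.items())
    c = random.choices(chars, weights=probs)[0]      # sample as usual
    p_fuzz = random.random()
    # Fuzz only where the model is most confident: replace a high-probability
    # character with the lowest-probability one to inject an anomaly.
    if p_fuzz > t_fuzz and next_char_dist[c] > p_t:
        c = min(next_char_dist, key=next_char_dist.get)
    return c

dist = {"e": 0.95, "x": 0.04, "\x00": 0.01}
print("".join(sample_fuzz_step(dist) for _ in range(20)))
```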

**Question 7**: State the datasets used in the paper, including their names and sources.

(1) PDF training corpus: about 63,000 non-binary PDF objects extracted from 534 PDF files; the 534 PDFs were provided by the Windows fuzzing team as the result of seed minimization over a larger PDF collection sourced from the public Web and PDFs used in past fuzzing (the paper gives the dataset no proper name). (2) Target program/benchmark: the Edge PDF parser embedded in the Microsoft Edge browser (driven by a single-process test driver supplied by the Windows team). (3) Host PDFs: the three smallest of the 534 PDFs, host1/host2/host3 (about 26 KB/33 KB/16 KB), used to append generated objects into complete PDFs.

**Question 8**: List all the metrics used to evaluate the method, with a brief explanation of what each measures.

(1) Coverage (instruction coverage): the set of unique executed instructions (identified by dll-name, dll-offset); the union over a test set measures coverage and is the core fuzzing-effectiveness metric. (2) Pass rate: parse logs are grepped for parsing errors to label pass/fail; pass means the parser considered the input well-formed, used mainly to estimate learning quality and "structure preservation." (3) Bugs: memory-corruption defects caught under AppVerifier (e.g., buffer overflows, stack overflow from runaway recursion), measuring real vulnerability-finding ability.

**Question 9**: Summarize the experimental performance, including concrete numbers and the conclusions drawn.

Baseline coverage (hosts and baseline): the three hosts alone cover about 353,327 (host1) to 457,464 (host2) unique instructions, with a union (host123) of 494,652; appending 1,000 real objects to the hosts gives a baseline123 coverage of 553,873, and all hosts themselves have a 100% pass rate. Learning quality (pass rate): Sample exceeds a 70% pass rate at just 10 epochs; SampleSpace is higher overall, peaking at 97% at 50 epochs. Coverage without fuzzing: the choice of host matters noticeably; the best overall is Sample-40e (winning in the host123 setting), whose coverage set is almost a superset of the others (missing only 1,680 instructions relative to SampleSpace-40e). Learn+fuzz comparison (30,000 PDFs per configuration, Figure 8): SampleFuzz has the highest coverage at 567,634 with a 68.24% pass rate; next is Sample+Random at 566,964 and 41.81%; Sample-10K at 565,590 and 78.92%; baseline+Random at 564,195 and 44.05%; SampleSpace+Random at 563,930 and 36.97%. Conclusions: coverage and pass rate are in tension—random fuzzing raises coverage but sharply lowers the pass rate; SampleFuzz achieves a better tradeoff at around 65%-70% pass rate along with the highest coverage. Bugs: the regular experiments found none (the target had already been fuzzed for years), but a longer run (Sample+Random, 100,000 objects / 300,000 PDFs, about 5 days) found a stack-overflow bug that was subsequently fixed.

**Question 10**: Describe the paper's work clearly, listing its motivation, contributions, and main innovations.

Motivation: grammar-based fuzzing is the most effective approach for complex structured inputs, but hand-writing input grammars is "laborious, time-consuming, and error-prone," limiting its use on large real parsers (such as a browser's PDF parser); moreover, learning to generate "well-formed inputs" conflicts with fuzzing's goal of "breaking structure to find bugs," so a controllable combination is needed.

Contributions: (1) the first use of neural-network-based statistical learning (seq2seq RNN/LSTM) to learn a generative input model from samples, automating grammar construction for fuzzing; (2) for the extremely complex PDF format, an explicitly scoped target (non-binary PDF objects) and an end-to-end engineering pipeline (appending generated objects to hosts per PDF incremental-update rules) to drive realistic evaluation of the Edge PDF parser; (3) a systematic, quantified analysis of the learn&fuzz tension: pass rate characterizes learning quality, instruction coverage characterizes fuzzing effectiveness, and the two trade off against each other; (4) SampleFuzz: using the learned input probability distribution to substitute low-probability characters at high-confidence positions, "intelligently choosing where to fuzz," achieving a better coverage/pass-rate tradeoff and the highest coverage.

Main innovation: turning a probabilistic language model's uncertainty/confidence directly into a fuzzing decision signal (inject anomalies where confidence is high), so inputs keep enough structure to parse deeply while systematically reaching error handling and unexpected paths.

@@ -0,0 +1,9 @@
@misc{she2019neuzzefficientfuzzingneural,
title = {NEUZZ: Efficient Fuzzing with Neural Program Smoothing},
author = {Dongdong She and Kexin Pei and Dave Epstein and Junfeng Yang and Baishakhi Ray and Suman Jana},
year = {2019},
eprint = {1807.05620},
archivePrefix = {arXiv},
primaryClass = {cs.CR},
url = {https://arxiv.org/abs/1807.05620},
}

@@ -0,0 +1,60 @@
# NEUZZ: Efficient Fuzzing with Neural Program Smoothing

**Question 1**: Summarize the paper's content, covering the research background and problem, the objectives, the methods, the main results, and the conclusions, in 150-300 words, using the paper's own terminology and concepts.

The paper observes that traditional graybox fuzzing relies on evolutionary guidance and easily gets stuck in fruitless sequences of random mutations, struggling to trigger deep branches and hard-to-trigger bugs, while directly applying gradient-guided optimization is blocked by the discontinuities, plateaus, and ridges of real-world programs. The goal is to use program smoothing to construct a differentiable surrogate function so that gradient methods become usable for coverage-guided fuzzing. Methodologically, NEUZZ trains a feed-forward NN as a smooth approximation from inputs to the edge bitmap (branch/edge coverage), using binary cross-entropy with label dimensionality reduction (merging always co-occurring edges); it then selects the top-k high-gradient bytes via ∇x f_i(θ,x) and mutates them according to gradient sign, with coverage-based filtration driving incremental learning to avoid forgetting. Main results: on 10 real-world programs over 24 hours, NEUZZ achieves the highest edge coverage against 10 state-of-the-art fuzzers (up to 3×, and up to roughly 10× over AFL on some programs); it finds 31 previously unknown bugs (including 2 CVEs) and finds more injected/real vulnerabilities on LAVA-M and DARPA CGC respectively. Conclusion: neural program smoothing plus gradient-guided mutation significantly improves fuzzing efficiency and coverage, and scales well.

**Question 2**: Extract the paper's abstract verbatim; the abstract usually appears after "Abstract" and before the Introduction.

Abstract—Fuzzing has become the de facto standard technique for finding software vulnerabilities. However, even state-of-the-art fuzzers are not very efficient at finding hard-to-trigger software bugs. Most popular fuzzers use evolutionary guidance to generate inputs that can trigger different bugs. Such evolutionary algorithms, while fast and simple to implement, often get stuck in fruitless sequences of random mutations. Gradient-guided optimization presents a promising alternative to evolutionary guidance. Gradient-guided techniques have been shown to significantly outperform evolutionary algorithms at solving high-dimensional structured optimization problems in domains like machine learning by efficiently utilizing gradients or higher-order derivatives of the underlying function. However, gradient-guided approaches are not directly applicable to fuzzing as real-world program behaviors contain many discontinuities, plateaus, and ridges where the gradient-based methods often get stuck. We observe that this problem can be addressed by creating a smooth surrogate function approximating the target program’s discrete branching behavior. In this paper, we propose a novel program smoothing technique using surrogate neural network models that can incrementally learn smooth approximations of a complex, real-world program’s branching behaviors. We further demonstrate that such neural network models can be used together with gradient-guided input generation schemes to significantly increase the efficiency of the fuzzing process. Our extensive evaluations demonstrate that NEUZZ significantly outperforms 10 state-of-the-art graybox fuzzers on 10 popular real-world programs both at finding new bugs and achieving higher edge coverage. NEUZZ found 31 previously unknown bugs (including two CVEs) that other fuzzers failed to find in 10 real-world programs and achieved 3X more edge coverage than all of the tested graybox fuzzers over 24 hour runs. Furthermore, NEUZZ also outperformed existing fuzzers on both LAVA-M and DARPA CGC bug datasets.

**Question 3**: List all of the paper's authors in this format: `Author1, Author2, Author3`.

Dongdong She, Kexin Pei, Dave Epstein, Junfeng Yang, Baishakhi Ray, Suman Jana

**Question 4**: State directly which conference or journal published this paper, without reasoning or extra information.

arXiv:1807.05620v4

**Question 5**: Describe in detail the core problem the paper addresses, then summarize it concisely.

The core problem: coverage-guided fuzzing is essentially an optimization problem (maximize new edge coverage/bugs), but a real program's branching behavior is a highly discrete, non-smooth function of the input, so both mainstream approaches hit bottlenecks: evolutionary algorithms cannot exploit gradient structure and stall inefficiently, while gradient-guided optimization, though efficient, gets "stuck" wherever gradients are unavailable or unreliable at the program's discontinuities, plateaus, and ridges. NEUZZ asks: without introducing expensive whitebox smoothing such as symbolic execution, how can program branching behavior be turned into a differentiable, gradient-yielding approximation so that gradients genuinely steer mutations toward uncovered edges and hidden vulnerabilities? In brief: smooth program branching with a differentiable neural surrogate model, making gradient-guided mutation usable, and more efficient, on real programs.

**Question 6**: State the methods the paper proposes, summarizing the core idea of each as concisely as possible.

(1) Neural program smoothing: train a feed-forward surrogate NN mapping input byte sequences to a "smooth approximation" of the edge bitmap, making it differentiable with computable gradients.

(2) Edge-label dimensionality-reduction preprocessing: keep only edges seen in the training set and merge edges that "always co-occur" to mitigate multicollinearity, shrinking the output dimension (roughly from 65,536 to ~4,000).

(3) Gradient-guided mutation (Algorithm 1): compute ∇x f_i(θ,x) for a chosen output edge neuron, take the top-k high-gradient bytes as critical bytes, and increment/decrement them by gradient sign, clipping to [0,255], to generate directed mutations (a sketch follows this list).

(4) Exponentially growing mutation targets: start with few bytes and expand the number of mutated bytes each round, covering more of the input space while keeping each search effective.

(5) Incremental learning + coverage-based filtration: add newly discovered inputs (those triggering new edges), keep only "a summary of old data that contributed new coverage" to bound data size, and retrain iteratively to sharpen the surrogate while avoiding catastrophic forgetting.

(6) Magic-check support (LAVA/CGC settings): instrument magic-byte checks with a custom LLVM pass; use NN gradients to locate critical bytes, then locally brute-force adjacent bytes (4×256) to trigger multi-byte conditions efficiently.
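
A minimal sketch of the gradient-guided mutation of step (3) combined with the exponential growth of step (4), using NumPy and assuming a `grad` vector already computed as ∇x f_i(θ,x) by some autodiff framework; everything here is our illustration, not NEUZZ's released code:

```python
import numpy as np

def gradient_guided_mutations(seed: np.ndarray, grad: np.ndarray, max_k=16):
    """Yield mutants of one seed using gradient information.

    seed: uint8 array of input bytes; grad: same-shape float array holding
    the gradient of a target edge's NN output w.r.t. each input byte.
    """
    order = np.argsort(-np.abs(grad))        # bytes ranked by |gradient|
    k = 2
    while k <= max_k:                        # exponentially grow the target
        idx = order[:k]                      # current critical bytes
        direction = np.sign(grad[idx]).astype(np.int16)
        mutant = seed.astype(np.int16)       # widen to avoid uint8 wraparound
        # Move each critical byte along its gradient sign, clip to [0, 255].
        mutant[idx] = np.clip(mutant[idx] + 16 * direction, 0, 255)
        yield mutant.astype(np.uint8)
        k *= 2

seed = np.frombuffer(b"\x7fELF....padding....", dtype=np.uint8).copy()
grad = np.random.randn(seed.size)            # stand-in for the NN gradient
for m in gradient_guided_mutations(seed, grad):
    pass  # feed each mutant to the instrumented target and record coverage
```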

**Question 7**: State the datasets used in the paper, including their names and sources.

(1) Ten real-world programs: binutils-2.30 (readelf -a, nm -C, objdump -D, size, strip), harfbuzz-1.7.6, libjpeg-9c, mupdf-1.12.0, libxml2-2.9.7, zlib-1.2.11 (listed in Table IIb; sourced from the corresponding open-source projects/versions).

(2) The LAVA-M bug dataset: a subset of the LAVA project (base64, md5sum, uniq, who, with injected magic-number-triggered vulnerabilities; cited as [28] LAVA).

(3) The DARPA CGC dataset: Cyber Grand Challenge binaries/services (the paper randomly selects 50 binaries for evaluation; cited as the [26] CGC repository).

(4) Training data: AFL-2.52b is run for 1 hour to produce the initial seed corpus and edge-coverage labels for NN training (about 2K training inputs per program on average).

**Question 8**: List all the metrics used to evaluate the method, with a brief explanation of what each measures.

(1) Bugs found / crashes: counts of real vulnerabilities and crashes, measuring bug-finding ability; memory issues are deduplicated via AddressSanitizer stack traces, and integer overflows are validated by manual analysis plus UBSan.

(2) Edge coverage (new control-flow edges): new-edge counts from AFL's edge coverage report, the core effectiveness metric of coverage-guided fuzzing.

(3) Coverage-growth curves under time budgets: 24 h (real programs), 5 h (LAVA-M), 6 h (CGC), showing "how fast new edges are reached."

(4) Training cost/time (NEUZZ train(s), training time in seconds): the cost of the learning component (e.g., reported when comparing against an RNN-based fuzzer).

(5) Coverage under a fixed mutation budget (e.g., 1M mutations): comparing methods/models at equal mutation counts, removing training-duration confounds.

(6) NN prediction accuracy (about 95% test accuracy on average): the surrogate's quality at predicting branching behavior (which indirectly affects gradient guidance).

**Question 9**: Summarize the experimental performance, including concrete numbers and the conclusions drawn.

Real programs (24 h): NEUZZ attains the highest edge coverage on all 10 programs (Table VI examples: readelf -a 4,942; harfbuzz 6,081; nm -C 2,056; libxml 1,596; mupdf 487; zlib 376), often leading with ">1,000 new edges within the first hour"; the paper summarizes edge-coverage advantages over AFL of roughly 6×, 1.5×, 9×, 1.8×, 3.7×, 1.9×, 10×, 1.3×, and 3× on 9 of the 10 programs, and gains of about 4.2×, 1.3×, 7×, 1.2×, and 2.5× over the second-best fuzzer. Real bugs (Table III): NEUZZ finds 60 bugs in total in the six-fuzzer comparison (AFL 29, AFLFast 27, VUzzer 14, KleeFL 26; the table breaks results down by project and bug type), covering 5 bug classes and additionally receiving 2 CVEs (CVE-2018-19931/19932). LAVA-M (5 h): NEUZZ finds 48 in base64, 60 in md5sum, 29 in uniq, and 1,582 in who (Table IV), overall beating Angora and the other baselines. CGC (6 h, 50 binaries): NEUZZ triggers bugs in 31 vulnerable binaries vs. AFL's 21 and Driller's 25, covering everything AFL/Driller found plus 6 more (Table V). Against an RNN fuzzer (1M mutations, Table VII): NEUZZ obtains about 8.4×/4.2×/6.7×/3.7× more edge coverage on readelf/libjpeg/libxml/mupdf, with roughly 20× lower training cost. Model ablation (Table VIII): linear models lag clearly; incremental learning improves further (e.g., readelf -a: 1,723 → 1,800 → 2,020). Conclusion: neural smoothing plus gradient-directed mutation significantly beats many SOTA fuzzers on both coverage and bug finding, with manageable training/execution cost and scalability to large programs.

**Question 10**: Describe the paper's work clearly, listing its motivation, contributions, and main innovations.

Motivation: evolutionary fuzzing is inefficient on deep logic and sparse vulnerabilities; gradient optimization is stronger on high-dimensional structured problems but is blocked when applied directly to programs by the non-differentiable, discontinuous behavior induced by discrete branching; existing program smoothing relies on symbolic execution/abstract interpretation, which is expensive and does not scale.

Contributions: (1) the core observation that "program smoothing is essential for gradient-guided fuzzing," formalizing fuzzing as an optimization problem and motivating a smooth surrogate; (2) the first scalable surrogate-NN program smoothing: a feed-forward NN learns a smooth input → edge-coverage-bitmap approximation, with label dimensionality reduction making training feasible; (3) a coverage-oriented gradient-guided mutation strategy: use ∇x f_i(θ,x) to locate critical bytes and mutation directions, systematically generating high-value mutations rather than uniform randomness; (4) an incremental-learning pipeline with coverage-based filtration that keeps correcting the surrogate with newly covered data while avoiding catastrophic forgetting; (5) the NEUZZ implementation and large-scale comparisons on real programs, LAVA-M, and CGC, showing clear superiority over 10 SOTA fuzzers on bug counts and edge coverage.

Main innovation: using a "differentiable surrogate model" to smooth discrete branching behavior into a gradient-yielding function, then turning gradients directly into decision signals for mutation positions/directions, gaining stronger exploration than heavyweight symbolic/taint analyses at low cost.

@@ -0,0 +1,9 @@
@misc{rasul2021autoregressivedenoisingdiffusionmodels,
title = {Autoregressive Denoising Diffusion Models for Multivariate Probabilistic Time Series Forecasting},
author = {Kashif Rasul and Calvin Seward and Ingmar Schuster and Roland Vollgraf},
year = {2021},
eprint = {2101.12072},
archivePrefix = {arXiv},
primaryClass = {cs.LG},
url = {https://arxiv.org/abs/2101.12072},
}

@@ -0,0 +1,44 @@
# Autoregressive Denoising Diffusion Models for Multivariate Probabilistic Time Series Forecasting

**Question 1**: Summarize the paper's content, covering the research background and problem, the objectives, the methods, the main results, and the conclusions, in 150-300 words, using the paper's own terminology and concepts.

The paper targets high-dimensional multivariate probabilistic time series forecasting, noting that existing methods often rely on tractable distribution families (e.g., multivariate Gaussians / low-rank approximations), copulas, or normalizing flows, which limits how well multimodal/disconnected structure in real distributions can be expressed, while high-dimensional covariance modeling is costly. The goal is a model that learns a flexible conditional distribution at every time step while retaining the advantages of autoregressive forecasting. The proposed TimeGrad encodes history and covariates with an RNN (LSTM/GRU) into a hidden state h_{t−1} and, at each time step, learns p_θ(x_t^0|h_{t−1}) in the form of a conditional denoising diffusion probabilistic model / score matching; training uses the noise-prediction loss ||ε − ε_θ(·, h_{t−1}, n)||², equivalent to a variational lower bound, and inference samples from white noise through the reverse Markov chain in the style of annealed Langevin dynamics. Experiments on six real-world datasets (up to 2,000 dimensions), evaluated with CRPS and CRPS_sum, show TimeGrad reaching a new SOTA on all but the smallest dataset; the conclusion is that the "autoregression + diffusion/EBM" combination effectively learns high-dimensional correlated forecast distributions.

**Question 2**: Extract the paper's abstract verbatim; the abstract usually appears after "Abstract" and before the Introduction.

In this work, we propose TimeGrad, an autoregressive model for multivariate probabilistic time series forecasting which samples from the data distribution at each time step by estimating its gradient. To this end, we use diffusion probabilistic models, a class of latent variable models closely connected to score matching and energy-based methods. Our model learns gradients by optimizing a variational bound on the data likelihood and at inference time converts white noise into a sample of the distribution of interest through a Markov chain using Langevin sampling. We demonstrate experimentally that the proposed autoregressive denoising diffusion model is the new state-of-the-art multivariate probabilistic forecasting method on real-world data sets with thousands of correlated dimensions. We hope that this method is a useful tool for practitioners and lays the foundation for future research in this area.

**Question 3**: List all of the paper's authors in this format: `Author1, Author2, Author3`.

Kashif Rasul, Calvin Seward, Ingmar Schuster, Roland Vollgraf

**Question 4**: State directly which conference or journal published this paper, without reasoning or extra information.

The provided text does not state a conference or journal.

**Question 5**: Describe in detail the core problem the paper addresses, then summarize it concisely.

The core problem: in multivariate time series forecasting with dimension D in the thousands or more, how to learn, for each future step, the conditional predictive distribution q_X(x_t^0|x_{1:t-1}^0, c_{1:T})—high-dimensional, correlated, possibly multimodal and non-Gaussian—while avoiding the O(D²) parameters and O(D³) computation of traditional multivariate Gaussian covariances, as well as the difficulties normalizing flows/VAEs can have with disconnected modes ("spurious density bridging modes" / hard-to-learn maps). TimeGrad replaces the per-step emission distribution in an autoregressive framework with a diffusion/score-based EBM that can be sampled via Langevin dynamics, fitting complex distributions with fewer structural restrictions. In brief: "RNN conditioning + a diffusion denoising score model" in place of a traditional output distribution, for high-dimensional correlated probabilistic forecasting.

**Question 6**: State the methods the paper proposes, summarizing the core idea of each as concisely as possible.

(1) The overall TimeGrad framework: factor the multivariate forecast autoregressively as Π_{t=t0}^T p_θ(x_t^0|h_{t−1}), where h_{t−1} is the RNN encoding of history and covariates. (2) A conditional diffusion emission head (conditional denoising diffusion model): at each time step t, treat the target vector x_t^0 as a training sample, apply the fixed forward diffusion q(x_n|x_{n−1}) to add noise, and learn the conditional reverse kernel p_θ(x_{n−1}|x_n, h_{t−1}). (3) The ε-prediction training objective: following the ε-parameterization of Ho et al. (2020), optimize E[||ε − ε_θ(√ᾱ_n x_t^0 + √(1−ᾱ_n) ε, h_{t−1}, n)||²] (choosing Σ_θ = β̃_n) to learn the score/gradient (a sketch follows below). (4) Inference and sampling: given h_{t−1}, start from x_t^N ~ N(0, I) and run the reverse updates for n = N…1 (with a noise term z) to obtain x_t^0; feed the sampled x_t^0 back into the RNN to roll out multi-step trajectories. (5) Dimension scaling: normalize each dimension by its context-window mean (or by 1 if the mean is 0) and rescale predictions afterward, handling heterogeneous magnitudes across dimensions. (6) Covariates: embeddings for categorical features, combined with time-dependent (hour/day, etc.), time-independent, and lag features as part of the RNN input.
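
A minimal sketch of the ε-prediction training step in (3), assuming a PyTorch-style `eps_model(x_noisy, h, n)` network and a precomputed ᾱ schedule; all names are ours, for illustration only:

```python
import torch

def timegrad_loss(eps_model, x0, h, alpha_bar):
    """One training step of the conditional DDPM objective
    E || eps - eps_theta( sqrt(abar_n) x0 + sqrt(1-abar_n) eps, h, n ) ||^2.

    x0: (batch, D) target vectors; h: (batch, H) RNN hidden states;
    alpha_bar: (N,) cumulative products of (1 - beta_n).
    """
    batch = x0.shape[0]
    n = torch.randint(0, alpha_bar.shape[0], (batch,))       # diffusion step
    abar = alpha_bar[n].unsqueeze(-1)                         # (batch, 1)
    eps = torch.randn_like(x0)                                # target noise
    x_noisy = abar.sqrt() * x0 + (1 - abar).sqrt() * eps      # forward q
    return ((eps - eps_model(x_noisy, h, n)) ** 2).mean()

# Toy usage with a linear stand-in for the conditional noise predictor.
D, H, N = 8, 16, 100
lin = torch.nn.Linear(D + H, D)
eps_model = lambda x, h, n: lin(torch.cat([x, h], dim=-1))
betas = torch.linspace(1e-4, 0.1, N)
alpha_bar = torch.cumprod(1 - betas, dim=0)
loss = timegrad_loss(eps_model, torch.randn(4, D), torch.randn(4, H), alpha_bar)
loss.backward()
```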

**Question 7**: State the datasets used in the paper, including their names and sources.

Six public datasets, stated to be "preprocessed exactly as in (Salinas et al., 2019a)": Exchange (Lai et al., 2018); Solar (Lai et al., 2018); Electricity (UCI ElectricityLoadDiagrams20112014: https://archive.ics.uci.edu/ml/datasets/ElectricityLoadDiagrams20112014); Traffic (UCI PEMS-SF: https://archive.ics.uci.edu/ml/datasets/PEMS-SF); Taxi (NYC TLC Trip Record Data: https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page); Wikipedia (provided in the GluonTS repository: https://github.com/mbohlkeschneider/gluon-ts/tree/mv_release/datasets).

**Question 8**: List all the metrics used to evaluate the method, with a brief explanation of what each measures.

(1) CRPS (Continuous Ranked Probability Score): scores the compatibility of the predictive CDF F with the observation x for each dimension; smaller CRPS means the predictive distribution is closer to the truth, and it is a proper scoring rule. (2) CRPS_sum: sum over the D dimensions to obtain a scalar series, compute CRPS against its predictive distribution F̂_sum, and average over the forecast horizon; it measures how well the model captures uncertainty of "totals/aggregates" and is likewise a proper scoring function. The paper stresses that CRPS-type metrics were chosen because some baselines provide no analytic likelihood, or their likelihoods are not comparable.

**Question 9**: Summarize the experimental performance, including concrete numbers and the conclusions drawn.

Across the six datasets, using CRPS_sum against baselines from several families (VAR/VAR-Lasso/GARCH/VES, KVAE, the Vec-LSTM family, GP-Copula/GP-scaling, Transformer-MAF), TimeGrad is best on all but the smallest dataset: on Exchange, TimeGrad scores 0.006±0.001 (the best is VES at 0.005±0.000, so TimeGrad is slightly behind); Solar 0.287±0.02 (beating Transformer-MAF 0.301±0.014 and others); Electricity 0.0206±0.001 (close to yet better than Transformer-MAF 0.0207±0.000); Traffic 0.044±0.006 (vs. Transformer-MAF 0.056±0.001); Taxi 0.114±0.02 (vs. Transformer-MAF 0.179±0.002 and others); Wikipedia 0.0485±0.002 (vs. Transformer-MAF 0.063±0.003). Ablations show the number of diffusion steps N can be reduced to about 10 without noticeable loss, with an optimum around N≈100 and no benefit beyond that. Conclusion: TimeGrad, as an "autoregressive denoising diffusion model," reaches SOTA on high-dimensional correlated probabilistic forecasting and is friendlier to multi-scale/multimodal distributions.

**Question 10**: Describe the paper's work clearly, listing its motivation, contributions, and main innovations.

Motivation: (1) multivariate probabilistic forecasting must capture uncertainty and cross-dimension correlations, but multivariate Gaussian / low-rank / second-order-correlation models are clearly restrictive and computationally expensive; (2) normalizing flows/VAEs can produce spurious density or face mapping difficulties on high-dimensional, disconnected-mode distributions; (3) EBM/score-based methods are expressive, but their training has not been well integrated with time-series conditioning.

Contributions and innovations: (1) TimeGrad: model each time step's conditional emission distribution as an RNN-conditioned denoising diffusion probabilistic model, learning the gradient/score via ε-prediction and sampling with a Langevin-style reverse chain, enabling flexible high-dimensional distribution modeling within autoregressive forecasting; (2) the training and inference algorithms (a per-step objective and rolled-out sampling trajectories), plus scale normalization and covariate embeddings adapted to realistic multi-dimensional magnitude differences; (3) a systematic comparison on six real benchmarks (up to 2,000 dimensions) with CRPS and CRPS_sum, achieving SOTA except on the smallest dataset; (4) an analysis of the diffusion length N, showing N≈10 is already usable and N≈100 is optimal, informing the sampling-vs-quality tradeoff in practice.

@@ -0,0 +1,9 @@
@misc{tashiro2021csdiconditionalscorebaseddiffusion,
title = {CSDI: Conditional Score-based Diffusion Models for Probabilistic Time Series Imputation},
author = {Yusuke Tashiro and Jiaming Song and Yang Song and Stefano Ermon},
year = {2021},
eprint = {2107.03502},
archivePrefix = {arXiv},
primaryClass = {cs.LG},
url = {https://arxiv.org/abs/2107.03502},
}

@@ -0,0 +1,44 @@
# CSDI: Conditional Score-based Diffusion Models for Probabilistic Time Series Imputation

**Question 1**: Summarize the paper's content, covering the research background and problem, the objectives, the methods, the main results, and the conclusions, in 150-300 words, using the paper's own terminology and concepts.

The paper studies probabilistic imputation of missing values in multivariate time series. It notes that traditional, mostly autoregressive imputation methods are common, yet score-based diffusion models already clearly outperform such counterparts in image/audio generation, and that existing imputation approaches which "approximate the posterior score with an unconditional diffusion model" add noise to the observed values and do not target the exact conditional distribution. The goal is to learn the conditional distribution q(x0^ta|x0^co) directly and improve imputation quality. The proposed CSDI builds on the DDPM parameterization, extends the reverse process to pθ(x^ta_{t-1}|x^ta_t, x0^co), conditions the denoising network εθ explicitly on the observations (including a conditional mask), and trains with masked-language-modeling-style self-supervision that splits the observed values into conditional observations and imputation targets (with random/historical/mix/testpattern strategies); the implementation uses 2D attention (a temporal Transformer plus a feature Transformer) to capture temporal and cross-feature dependencies, with time/feature embeddings. Results: on PhysioNet 2012 and Beijing air-quality data, CSDI improves CRPS by 40-65% over existing probabilistic imputation methods and lowers deterministic-imputation MAE by 5-20%, while remaining competitive on interpolation and probabilistic forecasting. The conclusion is that a conditional diffusion model "explicitly trained for imputation" clearly beats posterior-approximation diffusion imputation and RNN baselines.

**Question 2**: Extract the paper's abstract verbatim; the abstract usually appears after "Abstract" and before the Introduction.

The imputation of missing values in time series has many applications in healthcare and finance. While autoregressive models are natural candidates for time series imputation, score-based diffusion models have recently outperformed existing counterparts including autoregressive models in many tasks such as image generation and audio synthesis, and would be promising for time series imputation. In this paper, we propose Conditional Score-based Diffusion models for Imputation (CSDI), a novel time series imputation method that utilizes score-based diffusion models conditioned on observed data. Unlike existing score-based approaches, the conditional diffusion model is explicitly trained for imputation and can exploit correlations between observed values. On healthcare and environmental data, CSDI improves by 40-65% over existing probabilistic imputation methods on popular performance metrics. In addition, deterministic imputation by CSDI reduces the error by 5-20% compared to the state-of-the-art deterministic imputation methods. Furthermore, CSDI can also be applied to time series interpolation and probabilistic forecasting, and is competitive with existing baselines. The code is available at https://github.com/ermongroup/CSDI.

**Question 3**: List all of the paper's authors in this format: `Author1, Author2, Author3`.

Yusuke Tashiro, Jiaming Song, Yang Song, Stefano Ermon

**Question 4**: State directly which conference or journal published this paper, without reasoning or extra information.

35th Conference on Neural Information Processing Systems (NeurIPS 2021)

**Question 5**: Describe in detail the core problem the paper addresses, then summarize it concisely.

The core problem: given a multivariate time series X ∈ R^{K×L} with missing values (mask M), learn the conditional distribution q(x0^ta|x0^co) of the missing part and perform probabilistic imputation (outputting a distribution/samples), while fully exploiting correlations of the observed values along both the time and feature axes. Existing score-based/diffusion imputation typically approximates the conditional reverse process with an "unconditional diffusion model": during reverse sampling, the observations x0^co are also noised and concatenated into x_t, so the observed information is corrupted by noise and the result does not correspond to the exact conditional distribution, limiting imputation quality. CSDI avoids this approximation via an "explicitly conditional reverse process + an εθ(x^ta_t, t | x0^co) trained specifically for imputation," constructing (conditional, target) pairs self-supervisedly during training. In brief: directly model "the distribution of the missing given the observed" with an explicitly conditional diffusion model, replacing posterior-approximation diffusion imputation that noises the observations.

**Question 6**: State the methods the paper proposes, summarizing the core idea of each as concisely as possible.

(1) The CSDI conditional diffusion-imputation framework: change the DDPM reverse chain to pθ(x^ta_{t-1}|x^ta_t, x0^co), feeding the observed condition into every denoising step to recover the distribution of missing values. (2) Conditional DDPM parameterization: keep DDPM's μ_DDPM/σ_DDPM forms but extend εθ to a conditional network εθ(x^ta_t, t | x0^co) (with m_co added in the implementation). (3) Self-supervised training (masked-language-modeling style): split the observed values into x0^co and targets x0^ta, noise x0^ta into x^ta_t, and optimize ||ε − εθ(x^ta_t, t | x0^co)||², requiring no ground truth for the genuinely missing entries (a sketch follows below). (4) Target-choice strategies: Random (a random fraction of observed values as targets), Historical (targets determined by missing patterns of other training samples), Mix (random + historical), Testpattern (the test missing pattern, when known). (5) Input alignment for time series: zero-pad x^ta_t and x0^co to a fixed K×L shape, with the conditional mask m_co indicating conditional positions and the output mask. (6) A 2D-attention denoising network: within each residual layer, run both a temporal Transformer (along L, capturing temporal dependencies) and a feature Transformer (along K, capturing feature correlations). (7) Side information: 128-dimensional time embeddings of timestamps s plus feature-ID embeddings, strengthening conditional modeling.
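
A minimal sketch of the self-supervised target split in (3)-(4) for the Random strategy, assuming NumPy arrays and a 0/1 observation mask; names are ours:

```python
import numpy as np

def random_target_split(x, obs_mask, target_frac=0.2, rng=None):
    """Split the observed entries of x (K x L) into conditional observations
    and imputation targets, masked-language-modeling style.

    obs_mask: 1 where x was actually observed, 0 where truly missing.
    Returns (cond_mask, target_mask); cond_mask feeds the denoiser as
    m_co, target_mask marks the entries the loss is computed on.
    """
    rng = rng or np.random.default_rng()
    observed = np.argwhere(obs_mask == 1)
    n_target = max(1, int(target_frac * len(observed)))
    picked = observed[rng.choice(len(observed), n_target, replace=False)]
    target_mask = np.zeros_like(obs_mask)
    target_mask[picked[:, 0], picked[:, 1]] = 1
    cond_mask = obs_mask - target_mask       # remaining observed = condition
    return cond_mask, target_mask

K, L = 4, 6
x = np.random.randn(K, L)
obs_mask = (np.random.rand(K, L) > 0.3).astype(int)
cond_mask, target_mask = random_target_split(x, obs_mask)
# Training then noises x * target_mask and conditions on x * cond_mask.
```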

**Question 7**: State the datasets used in the paper, including their names and sources.

Imputation/interpolation datasets: (1) the PhysioNet Challenge 2012 healthcare dataset: 4,000 ICU clinical time series, 35 variables, 48 hours (citing Silva et al., 2012). (2) The air-quality (Beijing PM2.5) dataset: hourly PM2.5 from 36 Beijing stations (citing Yi et al., 2016), covering 2014/05/01-2015/04/30 with structured missingness / artificial ground truth. Forecasting datasets (from GluonTS, preprocessed as in Salinas et al. 2019): solar (generation records of 137 PV plants), electricity (UCI ElectricityLoadDiagrams20112014), traffic (UCI PEMS-SF), taxi (half-hourly series of 1,214 locations aggregated from NYC TLC trip records), and wiki (2,000-dimensional Wikipedia page views).

**Question 8**: List all the metrics used to evaluate the method, with a brief explanation of what each measures.

(1) CRPS (continuous ranked probability score): how well the predictive distribution of probabilistic imputation/interpolation matches the observations (lower is better); approximated with 100 samples via an integral of quantile losses; used for probabilistic imputation and forecasting (also reported per task). (2) Normalized average CRPS: the CRPS summed over all targets (k,l) divided by Σ|x_{k,l}|, for comparability across feature scales (Eq. 18). (3) MAE (mean absolute error): deterministic-imputation error (lower is better); CSDI takes the median of 100 samples as its deterministic imputation. (4) RMSE: a supplementary deterministic imputation/interpolation error metric (lower is better; Tables 9/10 in the appendix). (5) CRPS-sum: in forecasting, the CRPS of the distribution of "the sum over the K dimensions" (lower is better), measuring joint/aggregate uncertainty (Eq. 19). (6) MSE: a supplementary point-error metric for probabilistic forecasting (lower is better; Table 12). (7) NLL (negative log likelihood): analyzed for comparison (Table 8), showing it is sensitive to the noise schedule and not necessarily correlated with sample quality/CRPS, so it is not used as a main metric.

**Question 9**: Summarize the experimental performance, including concrete numbers and the conclusions drawn.

Probabilistic imputation (CRPS, lower is better, Table 2): on the healthcare data, CSDI scores 0.238(0.001)/0.330(0.002)/0.522(0.002) at 10%/50%/90% missingness, beating Multitask GP (0.489/0.581/0.942), GP-VAE (0.574/0.774/0.998), and V-RIN (0.808/0.831/0.922), as well as "unconditional diffusion" imputation (0.360/0.458/0.671). On air quality, CSDI scores 0.108(0.001), beating Multitask GP 0.301, GP-VAE 0.397, V-RIN 0.526, and unconditional 0.135. Deterministic imputation (MAE, Table 3): 0.217/0.301/0.481 on healthcare (10/50/90% missing), beating BRITS 0.284/0.368/0.517; on air quality, CSDI reaches 9.60(0.04), beating the cited GLIMA 10.54, the cited BRITS 11.56, and unconditional 12.13, an overall improvement stated as 5-20%. Interpolation (irregular sampling, CRPS, Table 4): CSDI at 0.380/0.418/0.556 (10/50/90% missing) beats Latent ODE and mTANs. Forecasting (CRPS-sum, Table 5): CSDI is best on electricity and traffic (0.017 and 0.020) and close to or slightly better than SOTA baselines on solar/taxi/wiki (e.g., wiki 0.047 vs. TimeGrad 0.049). Conclusion: explicit conditional diffusion plus self-supervised training markedly improves probabilistic/deterministic imputation and transfers to interpolation and forecasting.

**Question 10**: Describe the paper's work clearly, listing its motivation, contributions, and main innovations.

Motivation: (1) missing values are pervasive in time series and affect healthcare/finance applications; (2) existing deep imputation is mostly autoregressive and falls short of diffusion models' performance in generative tasks; (3) existing diffusion imputation mostly "approximates the conditional posterior score with unconditional models," requiring noise on the observations, losing information, and deviating from the exact conditional distribution.

Contributions and innovations: (1) CSDI: a conditional score-based diffusion model that explicitly models pθ(x0^ta|x0^co) for the imputation task, conditioning directly on observed values in reverse denoising to exploit correlations; (2) a self-supervised training mechanism: borrowing from masked language modeling, split the observations into condition and target, with random/historical/mix/testpattern target-choice strategies, enabling training even when the training set lacks ground truth for truly missing values; (3) a structural design for multivariate time series: 2D attention (temporal + feature Transformers) and side information (time and feature embeddings), strengthening cross-time/cross-feature dependency modeling; (4) systematic evidence: 40-65% CRPS gains on healthcare and air quality, 5-20% lower deterministic MAE, demonstrated extensibility and competitiveness on interpolation and forecasting, and ablations showing components such as 2D attention are key.

@@ -0,0 +1,11 @@
@inproceedings{NEURIPS2020_4c5bcfec,
author = {Ho, Jonathan and Jain, Ajay and Abbeel, Pieter},
booktitle = {Advances in Neural Information Processing Systems},
editor = {H. Larochelle and M. Ranzato and R. Hadsell and M.F. Balcan and H. Lin},
pages = {6840--6851},
publisher = {Curran Associates, Inc.},
title = {Denoising Diffusion Probabilistic Models},
url = {https://proceedings.neurips.cc/paper_files/paper/2020/file/4c5bcfec8584af0d967f1ab10179ca4b-Paper.pdf},
volume = {33},
year = {2020}
}

@@ -0,0 +1,44 @@
# Denoising Diffusion Probabilistic Models

**Question 1**: Summarize the paper's content, covering the research background and problem, the objectives, the methods, the main results, and the conclusions, in 150-300 words, using the paper's own terminology and concepts.

The paper studies whether diffusion probabilistic models (diffusion models) can achieve high-quality image synthesis. The background problem: diffusion models had been easy to define and train, but lacked any demonstration of high-quality samples. The goal is to improve their training and parameterization and demonstrate their sample quality. Methodologically, a fixed forward diffusion q(x_t|x_{t-1}) gradually adds Gaussian noise, while Gaussian transitions of the reverse process p_θ(x_{t-1}|x_t) are learned; the paper establishes an explicit connection to denoising score matching and Langevin dynamics, adopts the noise-prediction (ε) parameterization, and trains with the simplified weighted variational objective L_simple; the model is a U-Net with self-attention and time embeddings. Results: on unconditional CIFAR10, an Inception Score of 9.46 and FID of 3.17; on 256×256 LSUN, sample quality close to ProgressiveGAN. The conclusion is that diffusion models can produce high-quality samples, and that their sampling can be interpreted as progressive decoding / a generalization of progressive lossy decompression.

**Question 2**: Extract the paper's abstract verbatim; the abstract usually appears after "Abstract" and before the Introduction.

We present high quality image synthesis results using diffusion probabilistic models, a class of latent variable models inspired by considerations from nonequilibrium thermodynamics. Our best results are obtained by training on a weighted variational bound designed according to a novel connection between diffusion probabilistic models and denoising score matching with Langevin dynamics, and our models naturally admit a progressive lossy decompression scheme that can be interpreted as a generalization of autoregressive decoding. On the unconditional CIFAR10 dataset, we obtain an Inception score of 9.46 and a state-of-the-art FID score of 3.17. On 256x256 LSUN, we obtain sample quality similar to ProgressiveGAN. Our implementation is available at https://github.com/hojonathanho/diffusion.

**Question 3**: List all of the paper's authors in this format: `Author1, Author2, Author3`.

Jonathan Ho, Ajay Jain, Pieter Abbeel

**Question 4**: State directly which conference or journal published this paper, without reasoning or extra information.

34th Conference on Neural Information Processing Systems (NeurIPS 2020)

**Question 5**: Describe in detail the core problem the paper addresses, then summarize it concisely.

The core question: can diffusion probabilistic models, as "Markov chains trained with variational inference," reach sample quality on images comparable to GANs/autoregressive models, and how should the reverse-process parameterization and training objective be designed to achieve this? Specific difficulties include: how the reverse process p_θ(x_{t-1}|x_t) learns "denoising inverse diffusion"; how to optimize stably using the computable variational-bound decomposition (L_T, L_{1:T-1}, L_0); how the sampling chain unifies with score-based / annealed Langevin dynamics; and why sample quality is high while log likelihood (bits/dim) is not competitive. In brief: take diffusion models from "theoretically sound but unproven in practice" to "a stably trainable generative framework producing SOTA-quality images," with a unified interpretation via score matching/Langevin dynamics and progressive decoding.

**Question 6**: State the methods the paper proposes, summarizing the core idea of each as concisely as possible.

(1) The forward/reverse Markov-chain formulation of diffusion models: fix q(x_t|x_{t-1}) to add Gaussian noise stepwise (with a β_t schedule) and learn Gaussian reverse transitions p_θ(x_{t-1}|x_t). (2) The ε-prediction parameterization of the reverse mean: a network ε_θ(x_t, t) predicts the noise ε, giving μ_θ(x_t,t) = (1/√α_t)·(x_t − (β_t/√(1−ᾱ_t))·ε_θ(x_t,t)), so sampling takes a form similar to Langevin dynamics (a sketch follows below). (3) The equivalence with denoising score matching: derive that, under the ε-parameterization, L_{t-1} becomes a multi-noise-level denoising squared error consistent with the score-matching form. (4) The simplified training objective L_simple: train on the unweighted E[||ε − ε_θ(√ᾱ_t x_0 + √(1−ᾱ_t) ε, t)||²] with t sampled uniformly, a "reweighted variational bound" that improves sample quality and simplifies implementation. (5) Fixed variance Σ_θ(x_t,t) = σ_t² I: not learning the reverse variance, which experiments show is more stable and yields better FID. (6) The progressive lossy decompression / progressive decoding interpretation: view sampling as gradually recovering information, analogously to autoregressive decoding.
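
A minimal sketch of the reverse sampling loop in (2), assuming a trained noise predictor `eps_model(x, t)` and the usual β schedule (PyTorch; the naming is ours):

```python
import torch

@torch.no_grad()
def ddpm_sample(eps_model, shape, betas):
    """Ancestral sampling: x_{t-1} = 1/sqrt(alpha_t) *
    (x_t - beta_t / sqrt(1 - abar_t) * eps_theta(x_t, t)) + sigma_t * z."""
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape)                         # x_T ~ N(0, I)
    for t in range(len(betas) - 1, -1, -1):
        z = torch.randn(shape) if t > 0 else torch.zeros(shape)
        eps = eps_model(x, torch.full(shape[:1], t))
        mean = (x - betas[t] / (1 - alpha_bar[t]).sqrt() * eps) \
            / alphas[t].sqrt()
        x = mean + betas[t].sqrt() * z             # the sigma_t^2 = beta_t choice
    return x

# Toy usage with an untrained stand-in network.
net = lambda x, t: torch.zeros_like(x)
sample = ddpm_sample(net, (2, 4), torch.linspace(1e-4, 0.02, 50))
```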

**Question 7**: State the datasets used in the paper, including their names and sources.

(1) CIFAR10: the main unconditional-generation benchmark (loaded via TensorFlow Datasets). (2) LSUN 256×256: the Bedroom, Church, and Cat categories (data prepared with StyleGAN code; FID computed with StyleGAN2 code). (3) CelebA-HQ 256×256: used to show high-resolution face samples and interpolations (loaded via TensorFlow Datasets).

**Question 8**: List all the metrics used to evaluate the method, with a brief explanation of what each measures.

(1) Inception Score (IS): recognizability and diversity of generated samples, higher is better (mainly for CIFAR10). (2) Fréchet Inception Distance (FID): distance between the generated and real distributions in Inception feature space, lower is better (CIFAR10/LSUN etc.). (3) Negative log likelihood (NLL) / bits per dimension: likelihood quality as a lossless codelength on discrete data, lower is better (for comparison with likelihood-based models). (4) Rate-distortion analysis: treat L_{1:T} as rate and L_0 as distortion, plotting rate (bits/dim) and distortion (RMSE) over time/rate, used to explain "high perceptual quality but middling likelihood." (5) Distortion: reconstruction error measured as RMSE on the [0,255] pixel scale. (6) Progressive sampling quality over time: track IS and FID over the reverse steps (T−t), observing coarse-to-fine generation. (7) Train-test NLL gap: the bits/dim difference between train and test checks for overfitting (the paper reports a gap ≤ 0.03 bits/dim).

**Question 9**: Summarize the experimental performance, including concrete numbers and the conclusions drawn.

On unconditional CIFAR10, the paper's best model (L_simple) reaches IS = 9.46±0.11 and FID = 3.17, with test NLL ≤ 3.75 bits/dim (train 3.72). In the parameterization/objective ablation, ε-prediction under L_simple clearly beats μ̃-prediction: μ̃-prediction with the true variational bound L and fixed isotropic Σ gives FID 13.22, whereas ε-prediction + L_simple brings FID down to 3.17. At high resolution, LSUN 256×256 generation quality matches ProgressiveGAN: Bedroom FID down to 4.90 (large model), Church FID 7.89, Cat FID 19.75. The paper also notes that despite strong samples, diffusion models' log likelihoods trail other likelihood-based models; the rate-distortion analysis shows a large share of bits is spent on "imperceptible detail," supporting the "excellent lossy compressor / progressive decoder" interpretation.

**Question 10**: Describe the paper's work clearly, listing its motivation, contributions, and main innovations.

Motivation: diffusion models are efficient to train and simple to define, but previously lacked empirical evidence of high-quality samples; meanwhile, score matching / energy-based models / Langevin sampling could generate images but lacked a unified framework with computable likelihoods and a "directly trained sampler."

Contributions: (1) demonstrating that diffusion models can reach high sample quality: SOTA-level FID of 3.17 on CIFAR10 and ProgressiveGAN-level quality on LSUN 256×256; (2) the key theoretical connection: an explicit equivalence between diffusion models, multi-noise-level denoising score matching, and annealed Langevin dynamics sampling, presented as a main contribution; (3) the ε-prediction reverse-process parameterization: predicting the noise ε_θ instead of μ̃_t directly, simplifying the training objective and bringing sampling closer to Langevin dynamics; (4) the simplified weighted variational objective L_simple: a reweighting of the standard variational bound that down-weights small-t denoising terms, clearly improving sample quality with a simpler implementation; (5) the progressive-lossy-decoding interpretation and autoregressive generalization: reverse diffusion viewed as progressive decoding, with a generalized correspondence to autoregressive "bit ordering / mask ordering," supported by rate-distortion results and stepwise-generation visualizations.

@@ -0,0 +1,9 @@
@misc{wen2024diffstgprobabilisticspatiotemporalgraph,
title = {DiffSTG: Probabilistic Spatio-Temporal Graph Forecasting with Denoising Diffusion Models},
author = {Haomin Wen and Youfang Lin and Yutong Xia and Huaiyu Wan and Qingsong Wen and Roger Zimmermann and Yuxuan Liang},
year = {2024},
eprint = {2301.13629},
archivePrefix = {arXiv},
primaryClass = {cs.LG},
url = {https://arxiv.org/abs/2301.13629},
}

@@ -0,0 +1,45 @@
# DiffSTG: Probabilistic Spatio-Temporal Graph Forecasting with Denoising Diffusion Models

**Question 1**: Summarize the paper's content, covering the research background and problem, the objectives, the methods, the main results, and the conclusions, in 150-300 words, using the paper's own terminology and concepts.

Targeting spatio-temporal graph (STG) forecasting, where STGNNs fail to capture intrinsic uncertainties, the paper studies probabilistic STG forecasting and its challenges in uncertainty modeling and complex ST dependencies. The goal is to generalize denoising diffusion probabilistic models (DDPM) to STGs, proposing the non-autoregressive DiffSTG framework and designing UGnet, the first STG-oriented denoising network. Methodologically, it builds a conditional diffusion: conditioning on historical graph signals and the graph structure, it unifies [x^h, x^p] into x_all in a masked formulation and conditions the reverse process on x_all^msk; training uses the noise-prediction loss ||ε − ε_θ(x_all^n, n | x_all^msk, G)||²; UGnet aggregates temporal scales Unet-style, models temporal dependencies with TCNs and spatial correlations with GCNs; and sampling is accelerated via subsequence sampling plus reuse of the last k steps. Results on PEMS08, AIR-BJ, and AIR-GZ: CRPS is reduced by 4%-14% and RMSE by 2%-7%, with inference markedly faster than TimeGrad. Conclusion: DiffSTG combines the ST modeling power of STGNNs with the uncertainty quantification of diffusion models, suiting decision-making scenarios.

**Question 2**: Extract the paper's abstract verbatim; the abstract usually appears after "Abstract" and before the Introduction.

Spatio-temporal graph neural networks (STGNN) have emerged as the dominant model for spatio-temporal graph (STG) forecasting. Despite their success, they fail to model intrinsic uncertainties within STG data, which cripples their practicality in downstream tasks for decision-making. To this end, this paper focuses on probabilistic STG forecasting, which is challenging due to the difficulty in modeling uncertainties and complex ST dependencies. In this study, we present the first attempt to generalize the popular denoising diffusion probabilistic models to STGs, leading to a novel non-autoregressive framework called DiffSTG, along with the first denoising network UGnet for STG in the framework. Our approach combines the spatio-temporal learning capabilities of STGNNs with the uncertainty measurements of diffusion models. Extensive experiments validate that DiffSTG reduces the Continuous Ranked Probability Score (CRPS) by 4%-14%, and Root Mean Squared Error (RMSE) by 2%-7% over existing methods on three real-world datasets. The code is in https://github.com/wenhaomin/DiffSTG.

**Question 3**: List all of the paper's authors in this format: `Author1, Author2, Author3`.

Haomin Wen, Youfang Lin, Yutong Xia, Huaiyu Wan, Qingsong Wen, Roger Zimmermann, Yuxuan Liang

**Question 4**: State directly which conference or journal published this paper, without reasoning or extra information.

No specific conference or journal is given in the provided paper content.

**Question 5**: Describe in detail the core problem the paper addresses, then summarize it concisely.

The core problem: in STG forecasting, simultaneously (1) capture spatial correlations across nodes and temporal dependencies across time (complex ST dependencies), (2) output usable probability distributions and uncertainty quantification for multi-step forecasts (rather than point predictions only), and (3) avoid the inference inefficiency of autoregressive/stepwise generation in existing diffusion-based time-series methods (such as TimeGrad). The paper observes that time-series diffusion models usually model temporal dependencies only per variable or per node, lacking explicit graph-structure conditioning, while TimeGrad must rerun the diffusion chain for every future step, incurring cost on the order of S×T_p×N. DiffSTG addresses this with "graph conditioning + one-shot non-autoregressive multi-step generation + an ST-specific denoising network." In brief: make a diffusion probabilistic model an STG probabilistic forecaster that "can exploit graph structure, generates all steps at once, and also outputs uncertainty."

**Question 6**: State the methods the paper proposes, summarizing the core idea of each as concisely as possible.

(1) The non-autoregressive DiffSTG probabilistic-forecasting framework: generate all prediction steps x^p directly with a single reverse diffusion run (rather than running it stepwise per future step). (2) Conditional diffusion for STGs: condition the reverse process p_θ(x_{n-1}|x_n, x^h, G) explicitly on history and graph structure to learn p(x^p|x^h, G). (3) Generalized masked conditional diffusion: concatenate [x^h, x^p] into x_all and use the masked x_all^msk as the condition, unifying the "reconstruct history + predict future" training objective (a sketch follows below). (4) The UGnet denoising network: a Unet structure along the time dimension captures multi-scale temporal dependencies, with TCNs modeling temporal dependencies and GCNs modeling spatial correlations inside each block, plus noise-level embeddings. (5) Sampling acceleration: sample only a subsequence {τ_1…τ_M} to reduce the number of steps to M, and reuse near-final samples across the last k steps to cut the number of required reverse chains (S → S/k).
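
A minimal sketch of the masked conditioning in (3), assuming NumPy arrays of shape (T, N) for T = T_h + T_p time steps over N nodes; names and shapes are our assumptions:

```python
import numpy as np

def build_masked_condition(x_h, t_p):
    """Build x_all and its masked condition x_all^msk for DiffSTG-style
    masked conditional diffusion.

    x_h: (T_h, N) observed history; t_p: number of future steps.
    x_all stacks history and the (unknown) future; the mask keeps history
    and zeroes the future part, so the denoiser sees only the condition.
    """
    t_h, n = x_h.shape
    x_all = np.concatenate([x_h, np.zeros((t_p, n))], axis=0)  # (T, N)
    mask = np.concatenate([np.ones((t_h, n)), np.zeros((t_p, n))], axis=0)
    x_all_msk = x_all * mask          # condition: history kept, future masked
    return x_all, x_all_msk, mask

x_h = np.random.randn(12, 170)        # e.g., 12 past steps, 170 sensors
x_all, x_all_msk, mask = build_masked_condition(x_h, t_p=12)
# Training noises x_all and optimizes ||eps - eps_theta(x_all^n, n | x_all_msk, G)||^2.
```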

**Question 7**: State the datasets used in the paper, including their names and sources.

(1) PEMS08: a traffic-flow dataset from the Caltrans Performance Measurement System (PeMS); the paper uses the version extracted/curated by STSGCN and builds adjacency from the actual road network. (2) AIR-BJ: hourly PM2.5 from 34 Beijing monitoring stations (2019/01/01-2019/12/31), sourced from the work of Yi et al. [46]; the spatial correlation matrix A is built from station distances. (3) AIR-GZ: hourly PM2.5 from 41 Guangzhou stations (2017/01/01-2017/12/31), from the same source [46], with a distance-based graph.

**Question 8**: List all the metrics used to evaluate the method, with a brief explanation of what each measures.

(1) CRPS (Continuous Ranked Probability Score): compatibility of the forecast distribution F with the observation x, measuring probabilistic forecast quality; lower is better (the paper's primary metric). (2) MAE: mean absolute deviation of point forecasts; for probabilistic models, the mean (or an aggregate) over multiple samples yields the deterministic output; lower is better. (3) RMSE: root mean squared error of point forecasts, more sensitive to large errors; lower is better. (4) Inference time / time cost: comparing diffusion-based methods' inference times and acceleration (Table 3 gives second-level timings for various sample counts S).

**Question 9**: Summarize the experimental performance, including concrete numbers and the conclusions drawn.

Probabilistic comparison on the three datasets (Table 2, lower is better): on AIR-BJ, DiffSTG scores MAE 17.88, RMSE 29.60, CRPS 0.34 (beating TimeGrad 0.36 and DeepAR 0.37, among others); on AIR-GZ, MAE 10.95, RMSE 16.66, CRPS 0.22 (beating DeepAR 0.23 and TimeGrad / MC Dropout 0.25); on PEMS08, MAE 17.68, RMSE 27.13, CRPS 0.06 (beating DeepAR / MC Dropout 0.07 and TimeGrad 0.09). Relative to the strongest baseline per dataset, CRPS drops by 5.6%/4.3%/14.3% and RMSE by roughly 7.1%/2.4%/7.6% (the "Error reduction" row of Table 2). Inference efficiency (Table 3, AIR-GZ, N=100): at S=8, TimeGrad takes 9.58 s vs. DiffSTG (M=100, k=1) at 0.24 s (about a 40× speedup); the further-accelerated DiffSTG (M=40, k=2) takes 0.07 s and remains much faster than TimeGrad and CSDI as S grows. Conclusion: DiffSTG leads on both probabilistic quality (CRPS) and deterministic error (MAE/RMSE), with markedly faster inference thanks to non-autoregressive generation and sampling subsequences.

**Question 10**: Describe the paper's work clearly, listing its motivation, contributions, and main innovations.

Motivation: (1) most existing STGNNs are deterministic and provide no uncertainty for decision-making; (2) diffusion-based probabilistic time-series models (TimeGrad/CSDI etc.) lack graph-based spatial dependency modeling, and autoregressive multi-step generation is inefficient.

Contributions: (1) the first systematic generalization of DDPM to probabilistic STG forecasting, proposing the non-autoregressive DiffSTG to model p(x^p|x^h, G) and quantify uncertainty; (2) UGnet, the first diffusion denoising network for STGs: a temporal Unet capturing multi-scale temporal patterns, combined with TCN+GCN to learn ST dependencies explicitly; (3) masked generalized conditional diffusion (x_all and x_all^msk in the same space) unifying "reconstruct history + predict future," improving condition utilization and task generality (forecasting/generation/imputation); (4) sampling-acceleration strategies (subsequence sampling + last-k-step reuse), empirically delivering large inference speedups at preserved performance.

Main innovations: the STG-specific denoising architecture UGnet, non-autoregressive multi-step diffusion forecasting, and the masked conditional diffusion formulation for STG tasks.