forked from manbo/internal-docs
Compare commits
3 Commits
latex-ieee
...
esorics
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
6f6a4b6a20 | ||
|
|
0ba59c131c | ||
|
|
566e251743 |
1
.gitignore
vendored
1
.gitignore
vendored
@@ -5,5 +5,4 @@ arxiv-style/*.log
|
|||||||
arxiv-style/*.blg
|
arxiv-style/*.blg
|
||||||
arxiv-style/*.bbl
|
arxiv-style/*.bbl
|
||||||
arxiv-style/*.out
|
arxiv-style/*.out
|
||||||
fig/
|
|
||||||
.DS_Store
|
.DS_Store
|
||||||
|
|||||||
BIN
arxiv-style/fig-benchmark-ablations-v1.png
Normal file
BIN
arxiv-style/fig-benchmark-ablations-v1.png
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 203 KiB |
BIN
arxiv-style/fig-benchmark-story-v2.png
Normal file
BIN
arxiv-style/fig-benchmark-story-v2.png
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 291 KiB |
1
arxiv-style/fig-scripts/.python-version
Normal file
1
arxiv-style/fig-scripts/.python-version
Normal file
@@ -0,0 +1 @@
|
|||||||
|
3.12
|
||||||
237
arxiv-style/fig-scripts/draw_channels.py
Normal file
237
arxiv-style/fig-scripts/draw_channels.py
Normal file
@@ -0,0 +1,237 @@
|
|||||||
|
#!/usr/bin/env python3
|
||||||
|
"""
|
||||||
|
Draw *separate* SVG figures for:
|
||||||
|
1) Continuous channels (multiple smooth curves per figure)
|
||||||
|
2) Discrete channels (multiple step-like/token curves per figure)
|
||||||
|
|
||||||
|
Outputs (default):
|
||||||
|
out/continuous_channels.svg
|
||||||
|
out/discrete_channels.svg
|
||||||
|
|
||||||
|
Notes:
|
||||||
|
- Transparent background (good for draw.io / LaTeX / diagrams).
|
||||||
|
- No axes/frames by default (diagram-friendly).
|
||||||
|
- Curves are synthetic placeholders; replace `make_*_channels()` with your real data.
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import argparse
|
||||||
|
from dataclasses import dataclass
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
import numpy as np
|
||||||
|
import matplotlib.pyplot as plt
|
||||||
|
|
||||||
|
|
||||||
|
# ----------------------------
|
||||||
|
# Data generators (placeholders)
|
||||||
|
# ----------------------------
|
||||||
|
|
||||||
|
@dataclass
class GenParams:
    """Configuration for the synthetic continuous/discrete channel generators."""

    seconds: float = 10.0  # length of the generated timeline
    fs: int = 200  # samples per second (T = seconds * fs)
    seed: int = 7  # RNG seed so figures are reproducible
    n_cont: int = 6  # number of continuous channels (curves)
    n_disc: int = 5  # number of discrete channels (curves)
    disc_vocab: int = 8  # token/vocab size for discrete channels
    disc_change_rate_hz: float = 1.2  # how often discrete tokens change
|
||||||
|
|
||||||
|
|
||||||
|
def make_continuous_channels(p: GenParams) -> tuple[np.ndarray, np.ndarray]:
|
||||||
|
"""
|
||||||
|
Returns:
|
||||||
|
t: shape (T,)
|
||||||
|
Y: shape (n_cont, T)
|
||||||
|
"""
|
||||||
|
rng = np.random.default_rng(p.seed)
|
||||||
|
T = int(p.seconds * p.fs)
|
||||||
|
t = np.linspace(0, p.seconds, T, endpoint=False)
|
||||||
|
|
||||||
|
Y = []
|
||||||
|
for i in range(p.n_cont):
|
||||||
|
# Multi-scale smooth-ish signals
|
||||||
|
f1 = 0.15 + 0.06 * i
|
||||||
|
f2 = 0.8 + 0.15 * (i % 3)
|
||||||
|
phase = rng.uniform(0, 2 * np.pi)
|
||||||
|
y = (
|
||||||
|
0.9 * np.sin(2 * np.pi * f1 * t + phase)
|
||||||
|
+ 0.35 * np.sin(2 * np.pi * f2 * t + 1.3 * phase)
|
||||||
|
)
|
||||||
|
# Add mild colored-ish noise by smoothing white noise
|
||||||
|
w = rng.normal(0, 1, size=T)
|
||||||
|
w = np.convolve(w, np.ones(9) / 9.0, mode="same")
|
||||||
|
y = y + 0.15 * w
|
||||||
|
|
||||||
|
# Normalize each channel for consistent visual scale
|
||||||
|
y = (y - np.mean(y)) / (np.std(y) + 1e-9)
|
||||||
|
y = 0.8 * y + 0.15 * i # vertical offset to separate curves a bit
|
||||||
|
Y.append(y)
|
||||||
|
|
||||||
|
return t, np.vstack(Y)
|
||||||
|
|
||||||
|
|
||||||
|
def make_discrete_channels(p: GenParams) -> tuple[np.ndarray, np.ndarray]:
    """Generate piecewise-constant integer token tracks.

    Returns:
        t: time axis, shape (T,)
        X: token matrix, shape (n_disc, T), integers in [0, disc_vocab - 1]
    """
    rng = np.random.default_rng(p.seed + 100)
    n_samples = int(p.seconds * p.fs)
    t = np.linspace(0, p.seconds, n_samples, endpoint=False)

    # Expected number of token switches per channel over the timeline (>= 1).
    mean_changes = int(max(1, p.seconds * p.disc_change_rate_hz))

    X = np.zeros((p.n_disc, n_samples), dtype=int)
    for ch in range(p.n_disc):
        # Sample Poisson-many candidate switch positions, deduplicate them,
        # and bracket with the sequence endpoints to form segments.
        n_pts = rng.poisson(mean_changes) + 1
        cuts = np.unique(rng.integers(0, n_samples, size=n_pts))
        cuts = np.sort(np.concatenate([[0], cuts, [n_samples]]))

        token = rng.integers(0, p.disc_vocab)
        for start, stop in zip(cuts[:-1], cuts[1:]):
            # After the first segment, resample the token 85% of the time.
            if start != 0 and rng.random() < 0.85:
                token = rng.integers(0, p.disc_vocab)
            X[ch, start:stop] = token

    return t, X
|
||||||
|
|
||||||
|
|
||||||
|
# ----------------------------
|
||||||
|
# Plotting helpers
|
||||||
|
# ----------------------------
|
||||||
|
|
||||||
|
def _make_transparent_figure(width_in: float, height_in: float) -> tuple[plt.Figure, plt.Axes]:
    """Create a 200-dpi figure with one near-full-bleed axes, both transparent."""
    fig = plt.figure(figsize=(width_in, height_in), dpi=200)
    ax = fig.add_axes([0.03, 0.03, 0.94, 0.94])
    # Fully transparent backgrounds so the SVG composites cleanly in diagrams.
    for background in (fig.patch, ax.patch):
        background.set_alpha(0.0)
    return fig, ax
|
||||||
|
|
||||||
|
|
||||||
|
def save_continuous_channels_svg(
    t: np.ndarray,
    Y: np.ndarray,
    out_path: Path,
    *,
    lw: float = 2.0,
    clean: bool = True,
) -> None:
    """Plot the rows of Y as overlaid continuous curves and write an SVG.

    Y shape: (n_cont, T). With clean=True (default) the axes are hidden
    entirely, which makes the output diagram-friendly.
    """
    fig, ax = _make_transparent_figure(width_in=6.0, height_in=2.2)

    # One line per row; matplotlib's default color cycle separates them.
    for row in Y:
        ax.plot(t, row, linewidth=lw)

    if clean:
        ax.set_axis_off()
    else:
        ax.set_xlabel("t")
        ax.set_ylabel("value")

    # Pad the vertical limits slightly so curves never touch the frame.
    flat = Y.reshape(-1)
    lo, hi = float(np.min(flat)), float(np.max(flat))
    pad = 0.08 * (hi - lo + 1e-9)
    ax.set_xlim(t[0], t[-1])
    ax.set_ylim(lo - pad, hi + pad)

    out_path.parent.mkdir(parents=True, exist_ok=True)
    fig.savefig(out_path, format="svg", bbox_inches="tight", pad_inches=0.0, transparent=True)
    plt.close(fig)
|
||||||
|
|
||||||
|
|
||||||
|
def save_discrete_channels_svg(
    t: np.ndarray,
    X: np.ndarray,
    out_path: Path,
    *,
    lw: float = 2.0,
    clean: bool = True,
    vertical_spacing: float = 1.25,
) -> None:
    """Plot integer rows of X as vertically offset step curves; write an SVG.

    X shape: (n_disc, T), integers. Each channel is drawn as a step plot,
    offset upward by `vertical_spacing` per row so curves don't overlap.
    """
    fig, ax = _make_transparent_figure(width_in=6.0, height_in=2.2)

    for idx, row in enumerate(X):
        shifted = row.astype(float) + idx * vertical_spacing
        ax.step(t, shifted, where="post", linewidth=lw)

    if clean:
        ax.set_axis_off()
    else:
        ax.set_xlabel("t")
        ax.set_ylabel("token id (offset)")

    # Limits computed over all offset values, with a little headroom.
    offsets = np.arange(X.shape[0])[:, None] * vertical_spacing
    flat = (X.astype(float) + offsets).reshape(-1)
    lo, hi = float(np.min(flat)), float(np.max(flat))
    pad = 0.10 * (hi - lo + 1e-9)
    ax.set_xlim(t[0], t[-1])
    ax.set_ylim(lo - pad, hi + pad)

    out_path.parent.mkdir(parents=True, exist_ok=True)
    fig.savefig(out_path, format="svg", bbox_inches="tight", pad_inches=0.0, transparent=True)
    plt.close(fig)
|
||||||
|
|
||||||
|
|
||||||
|
# ----------------------------
|
||||||
|
# CLI
|
||||||
|
# ----------------------------
|
||||||
|
|
||||||
|
def main() -> None:
    """CLI entry point: parse flags, generate channels, write both SVGs."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--outdir", type=Path, default=Path("out"))
    parser.add_argument("--seed", type=int, default=7)
    parser.add_argument("--seconds", type=float, default=10.0)
    parser.add_argument("--fs", type=int, default=200)

    parser.add_argument("--n-cont", type=int, default=6)
    parser.add_argument("--n-disc", type=int, default=5)
    parser.add_argument("--disc-vocab", type=int, default=8)
    parser.add_argument("--disc-change-rate", type=float, default=1.2)

    parser.add_argument("--keep-axes", action="store_true", help="Show axes/labels (default: off)")
    opts = parser.parse_args()

    params = GenParams(
        seconds=opts.seconds,
        fs=opts.fs,
        seed=opts.seed,
        n_cont=opts.n_cont,
        n_disc=opts.n_disc,
        disc_vocab=opts.disc_vocab,
        disc_change_rate_hz=opts.disc_change_rate,
    )

    t_cont, Y = make_continuous_channels(params)
    t_disc, X = make_discrete_channels(params)

    cont_path = opts.outdir / "continuous_channels.svg"
    disc_path = opts.outdir / "discrete_channels.svg"

    hide_axes = not opts.keep_axes
    save_continuous_channels_svg(t_cont, Y, cont_path, clean=hide_axes)
    save_discrete_channels_svg(t_disc, X, disc_path, clean=hide_axes)

    print("Wrote:")
    print(f" {cont_path}")
    print(f" {disc_path}")


if __name__ == "__main__":
    main()
|
||||||
272
arxiv-style/fig-scripts/draw_synthetic_ics_optionA.py
Normal file
272
arxiv-style/fig-scripts/draw_synthetic_ics_optionA.py
Normal file
@@ -0,0 +1,272 @@
|
|||||||
|
#!/usr/bin/env python3
|
||||||
|
"""
|
||||||
|
Option A: "Synthetic ICS Data" mini-panel (high-level features, not packets)
|
||||||
|
|
||||||
|
What it draws (one SVG, transparent background):
|
||||||
|
- Top: 2–3 continuous feature curves (smooth, time-aligned)
|
||||||
|
- Bottom: discrete/categorical feature strip (colored blocks)
|
||||||
|
- One vertical dashed alignment line crossing both
|
||||||
|
- Optional shaded regime window
|
||||||
|
- Optional "real vs synthetic" ghost overlay (faint gray behind one curve)
|
||||||
|
|
||||||
|
Usage:
|
||||||
|
uv run python draw_synthetic_ics_optionA.py --out ./assets/synth_ics_optionA.svg
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import argparse
|
||||||
|
from dataclasses import dataclass
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
import numpy as np
|
||||||
|
import matplotlib.pyplot as plt
|
||||||
|
from matplotlib.patches import Rectangle
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
class Params:
    """Configuration for the Option-A panel: data sizes, layout, visual cues."""

    seed: int = 7  # RNG seed so output is reproducible
    seconds: float = 10.0  # timeline length
    fs: int = 300  # samples per second

    n_curves: int = 3  # continuous channels shown
    n_bins: int = 40  # discrete blocks across x
    disc_vocab: int = 8  # number of discrete categories

    # Layout / style
    width_in: float = 6.0  # figure width (inches)
    height_in: float = 2.2  # figure height (inches)
    curve_lw: float = 2.3  # synthetic curve line width
    ghost_lw: float = 2.0  # "real" overlay line width
    strip_height: float = 0.65  # bar height in [0,1] strip axis
    strip_gap_frac: float = 0.10  # gap between blocks (fraction of block width)

    # Visual cues
    show_alignment_line: bool = True
    align_x_frac: float = 0.58  # where to place dashed line, fraction of timeline
    show_regime_window: bool = True
    regime_start_frac: float = 0.30  # shaded window start, fraction of timeline
    regime_end_frac: float = 0.45  # shaded window end, fraction of timeline
    show_real_ghost: bool = True  # faint gray "real" behind first synthetic curve
|
||||||
|
|
||||||
|
|
||||||
|
def _smooth(x: np.ndarray, win: int) -> np.ndarray:
|
||||||
|
win = max(3, int(win) | 1) # odd
|
||||||
|
k = np.ones(win, dtype=float)
|
||||||
|
k /= k.sum()
|
||||||
|
return np.convolve(x, k, mode="same")
|
||||||
|
|
||||||
|
|
||||||
|
def make_continuous_curves(p: Params) -> tuple[np.ndarray, np.ndarray, np.ndarray | None]:
    """Build the synthetic continuous curves, plus an optional "real" ghost.

    Returns:
        t:      (T,) time axis
        Y_syn:  (n_curves, T) standardized, scaled synthetic curves
        y_real: (T,) faint twin of curve 0, or None when show_real_ghost is off
    """
    rng = np.random.default_rng(p.seed)
    n_samples = int(p.seconds * p.fs)
    t = np.linspace(0, p.seconds, n_samples, endpoint=False)

    curves = []
    for idx in range(p.n_curves):
        slow_hz = 0.09 + 0.03 * (idx % 3)
        mid_hz = 0.65 + 0.18 * (idx % 4)
        phi = rng.uniform(0, 2 * np.pi)

        sig = 0.95 * np.sin(2 * np.pi * slow_hz * t + phi)
        sig += 0.30 * np.sin(2 * np.pi * mid_hz * t + 0.7 * phi)

        # Two Gaussian bumps per curve mimic regime-like excursions.
        bumps = np.zeros_like(t)
        for _ in range(2):
            center = rng.uniform(0.8, p.seconds - 0.8)
            width = rng.uniform(0.35, 0.85)
            bumps += np.exp(-0.5 * ((t - center) / (width + 1e-9)) ** 2)
        sig += 0.55 * bumps

        # Smoothed white noise adds texture without jagged edges.
        sig += 0.10 * _smooth(rng.normal(0, 1, size=n_samples), win=int(p.fs * 0.04))

        # Standardize, then shrink so several curves share one panel cleanly.
        sig = (sig - sig.mean()) / (sig.std() + 1e-9)
        curves.append(0.42 * sig)

    Y_syn = np.vstack(curves)

    # Optional ghost: curve 0 with a slow random drift layered on top.
    y_real = None
    if p.show_real_ghost:
        base = Y_syn[0].copy()
        drift = _smooth(rng.normal(0, 1, size=n_samples), win=int(p.fs * 0.18))
        drift = drift / (np.std(drift) + 1e-9)
        y_real = base * 0.95 + 0.07 * drift

    return t, Y_syn, y_real
|
||||||
|
|
||||||
|
|
||||||
|
def make_discrete_strip(p: Params) -> np.ndarray:
    """Sample n_bins categorical IDs with sticky runs (28% switch chance).

    Returns:
        ids: (n_bins,) integers in [0, disc_vocab - 1]
    """
    rng = np.random.default_rng(p.seed + 123)
    n_blocks = p.n_bins
    ids = np.zeros(n_blocks, dtype=int)

    # This draw is replaced immediately at i == 0, but it is kept so the
    # RNG draw sequence (and thus the rendered strip) stays stable.
    token = rng.integers(0, p.disc_vocab)
    for i in range(n_blocks):
        if i == 0 or rng.random() < 0.28:
            token = rng.integers(0, p.disc_vocab)
        ids[i] = token

    return ids
|
||||||
|
|
||||||
|
|
||||||
|
def _axes_clean(ax: plt.Axes) -> None:
    """Remove all text, ticks, and tick labels; spines are left untouched."""
    ax.set_xlabel("")
    ax.set_ylabel("")
    ax.set_title("")
    ax.set_xticks([])
    ax.set_yticks([])
    # Belt-and-braces: also disable the tick marks/labels on every side.
    ax.tick_params(
        axis="both", which="both",
        bottom=False, left=False, top=False, right=False,
        labelbottom=False, labelleft=False,
    )
|
||||||
|
|
||||||
|
|
||||||
|
def draw_optionA(out_path: Path, p: Params) -> None:
    """Render the Option-A panel and save it as a transparent SVG.

    Layout: a tall top axes holding the continuous curves (plus the optional
    faint "real" ghost) stacked above a short bottom axes holding the
    categorical block strip; regime shading and the dashed alignment line
    span both axes.
    """
    # Figure
    fig = plt.figure(figsize=(p.width_in, p.height_in), dpi=200)
    fig.patch.set_alpha(0.0)

    # Two stacked axes (shared x)
    ax_top = fig.add_axes([0.08, 0.32, 0.90, 0.62])
    ax_bot = fig.add_axes([0.08, 0.12, 0.90, 0.16], sharex=ax_top)
    ax_top.patch.set_alpha(0.0)
    ax_bot.patch.set_alpha(0.0)

    # Generate data
    t, Y_syn, y_real = make_continuous_curves(p)
    ids = make_discrete_strip(p)

    x0, x1 = float(t[0]), float(t[-1])
    span = x1 - x0

    # Optional shaded regime window
    if p.show_regime_window:
        rs = x0 + p.regime_start_frac * span
        # NOTE: `re` here shadows the stdlib `re` module name locally.
        re = x0 + p.regime_end_frac * span
        ax_top.axvspan(rs, re, alpha=0.12)  # default color, semi-transparent
        ax_bot.axvspan(rs, re, alpha=0.12)

    # Optional vertical dashed alignment line
    if p.show_alignment_line:
        vx = x0 + p.align_x_frac * span
        ax_top.axvline(vx, linestyle="--", linewidth=1.2, alpha=0.7)
        ax_bot.axvline(vx, linestyle="--", linewidth=1.2, alpha=0.7)

    # Continuous curves (use fixed colors for consistency)
    curve_colors = ["#1f77b4", "#ff7f0e", "#2ca02c", "#9467bd"]  # blue, orange, green, purple

    # Ghost "real" behind the first curve (faint gray, zorder=1 keeps it
    # underneath the synthetic curves drawn at zorder=2).
    if y_real is not None:
        ax_top.plot(t, y_real, linewidth=p.ghost_lw, color="0.65", alpha=0.55, zorder=1)

    for i in range(Y_syn.shape[0]):
        ax_top.plot(
            t, Y_syn[i],
            linewidth=p.curve_lw,
            color=curve_colors[i % len(curve_colors)],
            zorder=2
        )

    # Set top y-limits with padding
    ymin, ymax = float(Y_syn.min()), float(Y_syn.max())
    ypad = 0.10 * (ymax - ymin + 1e-9)
    ax_top.set_xlim(x0, x1)
    ax_top.set_ylim(ymin - ypad, ymax + ypad)

    # Discrete strip as colored blocks
    palette = [
        "#e41a1c", "#377eb8", "#4daf4a", "#984ea3",
        "#ff7f00", "#ffff33", "#a65628", "#f781bf",
    ]

    n = len(ids)
    bin_w = span / n
    gap = p.strip_gap_frac * bin_w
    ax_bot.set_ylim(0, 1)

    # Center each block vertically within the [0, 1] strip axis.
    y = (1 - p.strip_height) / 2
    for i, cat in enumerate(ids):
        left = x0 + i * bin_w + gap / 2
        width = bin_w - gap
        ax_bot.add_patch(
            Rectangle(
                (left, y), width, p.strip_height,
                facecolor=palette[int(cat) % len(palette)],
                edgecolor="none",
            )
        )

    # Clean axes: no ticks/labels; keep spines (axes lines) visible
    _axes_clean(ax_top)
    _axes_clean(ax_bot)
    for ax in (ax_top, ax_bot):
        for side in ("left", "bottom", "top", "right"):
            ax.spines[side].set_visible(True)

    out_path.parent.mkdir(parents=True, exist_ok=True)
    fig.savefig(out_path, format="svg", transparent=True, bbox_inches="tight", pad_inches=0.0)
    plt.close(fig)
|
||||||
|
|
||||||
|
|
||||||
|
def main() -> None:
    """CLI entry point: parse flags, build Params, render the Option-A SVG."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--out", type=Path, default=Path("synth_ics_optionA.svg"))
    parser.add_argument("--seed", type=int, default=7)
    parser.add_argument("--seconds", type=float, default=10.0)
    parser.add_argument("--fs", type=int, default=300)
    parser.add_argument("--curves", type=int, default=3)
    parser.add_argument("--bins", type=int, default=40)
    parser.add_argument("--vocab", type=int, default=8)

    parser.add_argument("--no-align", action="store_true")
    parser.add_argument("--no-regime", action="store_true")
    parser.add_argument("--no-ghost", action="store_true")
    opts = parser.parse_args()

    params = Params(
        seed=opts.seed,
        seconds=opts.seconds,
        fs=opts.fs,
        n_curves=opts.curves,
        n_bins=opts.bins,
        disc_vocab=opts.vocab,
        show_alignment_line=not opts.no_align,
        show_regime_window=not opts.no_regime,
        show_real_ghost=not opts.no_ghost,
    )

    draw_optionA(opts.out, params)
    print(f"Wrote: {opts.out}")


if __name__ == "__main__":
    main()
|
||||||
318
arxiv-style/fig-scripts/draw_synthetic_ics_optionB.py
Normal file
318
arxiv-style/fig-scripts/draw_synthetic_ics_optionB.py
Normal file
@@ -0,0 +1,318 @@
|
|||||||
|
#!/usr/bin/env python3
|
||||||
|
"""
|
||||||
|
Option B: "Synthetic ICS Data" as a mini process-story strip (high-level features)
|
||||||
|
- ONE SVG, transparent background
|
||||||
|
- Two frames by default: "steady/normal" -> "disturbance/recovery"
|
||||||
|
- Each frame contains:
|
||||||
|
- Top: multiple continuous feature curves
|
||||||
|
- Bottom: discrete/categorical strip (colored blocks)
|
||||||
|
- A vertical dashed alignment line crossing both
|
||||||
|
- Optional shaded regime window
|
||||||
|
- A right-pointing arrow between frames
|
||||||
|
|
||||||
|
No text, no numbers (axes lines only). Good for draw.io embedding.
|
||||||
|
|
||||||
|
Run:
|
||||||
|
uv run python draw_synthetic_ics_optionB.py --out ./assets/synth_ics_optionB.svg
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import argparse
|
||||||
|
from dataclasses import dataclass
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
import numpy as np
|
||||||
|
import matplotlib.pyplot as plt
|
||||||
|
from matplotlib.patches import Rectangle, FancyArrowPatch
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
class Params:
    """Configuration for the Option-B story strip: data, layout, and cues."""

    seed: int = 7  # RNG seed so output is reproducible
    seconds: float = 8.0  # timeline length per frame
    fs: int = 250  # samples per second

    # Two-frame story
    n_frames: int = 2

    # Per-frame visuals
    n_curves: int = 3  # continuous channels per frame
    n_bins: int = 32  # discrete blocks per frame
    disc_vocab: int = 8  # number of discrete categories

    # Layout
    width_in: float = 8.2  # figure width (inches)
    height_in: float = 2.4  # figure height (inches)
    # Relative layout inside the figure (figure-fraction coordinates)
    margin_left: float = 0.05
    margin_right: float = 0.05
    margin_bottom: float = 0.12
    margin_top: float = 0.10
    frame_gap: float = 0.08  # gap (figure fraction) between frames (space for arrow)

    # Styling
    curve_lw: float = 2.1  # curve line width
    ghost_lw: float = 1.8  # "real" overlay line width
    strip_height: float = 0.65  # bar height in the [0,1] strip axis
    strip_gap_frac: float = 0.12  # gap between blocks (fraction of block width)

    # Cues
    show_alignment_line: bool = True
    align_x_frac: float = 0.60  # dashed line position, fraction of timeline
    show_regime_window: bool = True
    regime_start_frac: float = 0.30  # shaded window start, fraction of timeline
    regime_end_frac: float = 0.46  # shaded window end, fraction of timeline
    show_real_ghost: bool = False  # keep default off for cleaner story
    show_axes_spines: bool = True  # axes lines only (no ticks/labels)
|
||||||
|
|
||||||
|
|
||||||
|
# ---------- helpers ----------
|
||||||
|
|
||||||
|
def _smooth(x: np.ndarray, win: int) -> np.ndarray:
|
||||||
|
win = max(3, int(win) | 1)
|
||||||
|
k = np.ones(win, dtype=float)
|
||||||
|
k /= k.sum()
|
||||||
|
return np.convolve(x, k, mode="same")
|
||||||
|
|
||||||
|
|
||||||
|
def _axes_only(ax: plt.Axes, *, keep_spines: bool) -> None:
    """Blank out all labels/ticks; show or hide all four spines together."""
    ax.set_xlabel("")
    ax.set_ylabel("")
    ax.set_title("")
    ax.set_xticks([])
    ax.set_yticks([])
    ax.tick_params(
        axis="both", which="both",
        bottom=False, left=False, top=False, right=False,
        labelbottom=False, labelleft=False,
    )
    ax.grid(False)
    # One loop covers both cases; set_visible takes the flag directly.
    for side in ("left", "right", "top", "bottom"):
        ax.spines[side].set_visible(keep_spines)
|
||||||
|
|
||||||
|
|
||||||
|
def make_frame_continuous(seed: int, seconds: float, fs: int, n_curves: int, style: str) -> tuple[np.ndarray, np.ndarray]:
    """Generate one frame's continuous curves.

    style:
        - "steady": smoother, smaller bumps
        - "disturb": larger bumps and more variance (any non-"steady" value)
    """
    rng = np.random.default_rng(seed)
    n_samples = int(seconds * fs)
    t = np.linspace(0, seconds, n_samples, endpoint=False)

    steady = style == "steady"
    amp_bump = 0.40 if steady else 0.85
    amp_noise = 0.09 if steady else 0.14
    amp_scale = 0.38 if steady else 0.46

    base_freqs = [0.10, 0.08, 0.12, 0.09]
    mid_freqs = [0.65, 0.78, 0.90, 0.72]

    curves = []
    for idx in range(n_curves):
        slow_hz = base_freqs[idx % len(base_freqs)]
        mid_hz = mid_freqs[idx % len(mid_freqs)]
        phi = rng.uniform(0, 2 * np.pi)

        sig = 0.95 * np.sin(2 * np.pi * slow_hz * t + phi)
        sig += 0.28 * np.sin(2 * np.pi * mid_hz * t + 0.65 * phi)

        # "disturb" frames get one extra, narrower Gaussian bump.
        bumps = np.zeros_like(t)
        for _ in range(2 if steady else 3):
            center = rng.uniform(0.9, seconds - 0.9)
            width = rng.uniform(0.35, 0.75) if steady else rng.uniform(0.20, 0.55)
            bumps += np.exp(-0.5 * ((t - center) / (width + 1e-9)) ** 2)
        sig += amp_bump * bumps

        # Smoothed white noise adds texture without jagged edges.
        sig += amp_noise * _smooth(rng.normal(0, 1, size=n_samples), win=int(fs * 0.04))

        # Standardize, then rescale per style for consistent visual weight.
        sig = (sig - sig.mean()) / (sig.std() + 1e-9)
        curves.append(amp_scale * sig)

    return t, np.vstack(curves)
|
||||||
|
|
||||||
|
|
||||||
|
def make_frame_discrete(seed: int, n_bins: int, vocab: int, style: str) -> np.ndarray:
    """Generate one frame's categorical strip.

    style:
        - "steady": fewer transitions (20% switch chance per bin)
        - "disturb": more transitions (38% switch chance per bin)
    """
    rng = np.random.default_rng(seed + 111)
    ids = np.zeros(n_bins, dtype=int)

    switch_prob = 0.20 if style == "steady" else 0.38
    # The initial draw is replaced at i == 0 but kept to preserve the
    # RNG draw sequence (and therefore the rendered pattern).
    token = rng.integers(0, vocab)
    for i in range(n_bins):
        if i == 0 or rng.random() < switch_prob:
            token = rng.integers(0, vocab)
        ids[i] = token
    return ids
|
||||||
|
|
||||||
|
|
||||||
|
def draw_frame(ax_top: plt.Axes, ax_bot: plt.Axes, t: np.ndarray, Y: np.ndarray, ids: np.ndarray, p: Params) -> None:
    """Draw one story frame: curves on ax_top, categorical strip on ax_bot.

    Adds the optional regime shading and dashed alignment line to both axes,
    then strips ticks/labels via _axes_only (spines per p.show_axes_spines).
    """
    # Optional cues
    x0, x1 = float(t[0]), float(t[-1])
    span = x1 - x0

    if p.show_regime_window:
        rs = x0 + p.regime_start_frac * span
        # NOTE: `re` here shadows the stdlib `re` module name locally.
        re = x0 + p.regime_end_frac * span
        ax_top.axvspan(rs, re, alpha=0.12)  # default color
        ax_bot.axvspan(rs, re, alpha=0.12)

    if p.show_alignment_line:
        vx = x0 + p.align_x_frac * span
        ax_top.axvline(vx, linestyle="--", linewidth=1.15, alpha=0.7)
        ax_bot.axvline(vx, linestyle="--", linewidth=1.15, alpha=0.7)

    # Curves (fixed color cycle for cross-frame consistency)
    curve_colors = ["#1f77b4", "#ff7f0e", "#2ca02c", "#9467bd"]
    for i in range(Y.shape[0]):
        ax_top.plot(t, Y[i], linewidth=p.curve_lw, color=curve_colors[i % len(curve_colors)])

    # Pad the vertical limits so curves never touch the frame edges.
    ymin, ymax = float(Y.min()), float(Y.max())
    ypad = 0.10 * (ymax - ymin + 1e-9)
    ax_top.set_xlim(x0, x1)
    ax_top.set_ylim(ymin - ypad, ymax + ypad)

    # Discrete strip
    palette = [
        "#e41a1c", "#377eb8", "#4daf4a", "#984ea3",
        "#ff7f00", "#ffff33", "#a65628", "#f781bf",
    ]

    ax_bot.set_xlim(x0, x1)
    ax_bot.set_ylim(0, 1)

    n = len(ids)
    bin_w = span / n
    gap = p.strip_gap_frac * bin_w
    # Center each block vertically within the [0, 1] strip axis.
    y = (1 - p.strip_height) / 2

    for i, cat in enumerate(ids):
        left = x0 + i * bin_w + gap / 2
        width = bin_w - gap
        ax_bot.add_patch(
            Rectangle((left, y), width, p.strip_height, facecolor=palette[int(cat) % len(palette)], edgecolor="none")
        )

    # Axes-only style
    _axes_only(ax_top, keep_spines=p.show_axes_spines)
    _axes_only(ax_bot, keep_spines=p.show_axes_spines)
|
||||||
|
|
||||||
|
|
||||||
|
# ---------- main drawing ----------
|
||||||
|
|
||||||
|
def draw_optionB(out_path: Path, p: Params) -> None:
    """Render the two/three-frame process-story strip and save it as SVG.

    Each frame is a (curves, strip) axes pair laid out manually in figure
    coordinates; a black arrow is drawn in the gap between adjacent frames.
    The last frame always uses the "disturb" style, the rest "steady".
    """
    fig = plt.figure(figsize=(p.width_in, p.height_in), dpi=200)
    fig.patch.set_alpha(0.0)  # transparent background for diagram embedding

    # Compute frame layout in figure coordinates.
    # Each frame has two stacked axes: top curves and bottom strip.
    usable_w = 1.0 - p.margin_left - p.margin_right
    usable_h = 1.0 - p.margin_bottom - p.margin_top

    # Leave gap between frames for the arrow.
    total_gap = p.frame_gap * (p.n_frames - 1)
    frame_w = (usable_w - total_gap) / p.n_frames

    # Within each frame: vertical split (70% curves, 18% strip, 6% gap).
    top_h = usable_h * 0.70
    bot_h = usable_h * 0.18
    v_gap = usable_h * 0.06
    # bottoms
    bot_y = p.margin_bottom
    top_y = bot_y + bot_h + v_gap

    axes_pairs = []
    for f in range(p.n_frames):
        left = p.margin_left + f * (frame_w + p.frame_gap)
        ax_top = fig.add_axes([left, top_y, frame_w, top_h])
        ax_bot = fig.add_axes([left, bot_y, frame_w, bot_h], sharex=ax_top)
        ax_top.patch.set_alpha(0.0)
        ax_bot.patch.set_alpha(0.0)
        axes_pairs.append((ax_top, ax_bot))

    # Data per frame; distinct seeds keep frames visually independent.
    styles = ["steady", "disturb"] if p.n_frames == 2 else ["steady"] * (p.n_frames - 1) + ["disturb"]
    for idx, ((ax_top, ax_bot), style) in enumerate(zip(axes_pairs, styles)):
        t, Y = make_frame_continuous(p.seed + 10 * idx, p.seconds, p.fs, p.n_curves, style=style)
        ids = make_frame_discrete(p.seed + 10 * idx, p.n_bins, p.disc_vocab, style=style)
        draw_frame(ax_top, ax_bot, t, Y, ids, p)

    # Add a visual arrow between adjacent frames (in figure coordinates).
    if p.n_frames >= 2:
        for f in range(p.n_frames - 1):
            # Horizontal center of the gap between frame f and f+1.
            x_left = p.margin_left + f * (frame_w + p.frame_gap) + frame_w
            x_right = p.margin_left + (f + 1) * (frame_w + p.frame_gap)
            x_mid = (x_left + x_right) / 2
            # Arrow y sits at the vertical middle of the frame stack.
            y_mid = bot_y + (bot_h + v_gap + top_h) / 2

            arr = FancyArrowPatch(
                (x_mid - 0.015, y_mid),
                (x_mid + 0.015, y_mid),
                transform=fig.transFigure,
                arrowstyle="-|>",
                mutation_scale=18,
                linewidth=1.6,
                color="black",
            )
            # FIX: appending directly to fig.patches bypasses matplotlib's
            # artist registration and is deprecated (removed behavior in
            # newer releases); Figure.add_artist is the supported API.
            fig.add_artist(arr)

    out_path.parent.mkdir(parents=True, exist_ok=True)
    fig.savefig(out_path, format="svg", transparent=True, bbox_inches="tight", pad_inches=0.0)
    plt.close(fig)
|
||||||
|
|
||||||
|
|
||||||
|
def main() -> None:
    """CLI entry point: parse options, assemble Params, render the story strip."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--out", type=Path, default=Path("synth_ics_optionB.svg"))
    parser.add_argument("--seed", type=int, default=7)
    parser.add_argument("--seconds", type=float, default=8.0)
    parser.add_argument("--fs", type=int, default=250)
    parser.add_argument("--frames", type=int, default=2, choices=[2, 3], help="2 or 3 frames (story strip)")
    parser.add_argument("--curves", type=int, default=3)
    parser.add_argument("--bins", type=int, default=32)
    parser.add_argument("--vocab", type=int, default=8)
    parser.add_argument("--no-align", action="store_true")
    parser.add_argument("--no-regime", action="store_true")
    parser.add_argument("--no-spines", action="store_true")
    ns = parser.parse_args()

    # Flags are "disable" switches on the CLI, so invert them for Params.
    cfg = Params(
        seed=ns.seed,
        seconds=ns.seconds,
        fs=ns.fs,
        n_frames=ns.frames,
        n_curves=ns.curves,
        n_bins=ns.bins,
        disc_vocab=ns.vocab,
        show_alignment_line=not ns.no_align,
        show_regime_window=not ns.no_regime,
        show_axes_spines=not ns.no_spines,
    )

    draw_optionB(ns.out, cfg)
    print(f"Wrote: {ns.out}")


if __name__ == "__main__":
    main()
|
||||||
201
arxiv-style/fig-scripts/draw_transformer_lower_half.py
Normal file
201
arxiv-style/fig-scripts/draw_transformer_lower_half.py
Normal file
@@ -0,0 +1,201 @@
|
|||||||
|
#!/usr/bin/env python3
|
||||||
|
"""
|
||||||
|
Draw the *Transformer section* lower-half visuals:
|
||||||
|
- Continuous channels: multiple smooth curves (like the colored trend lines)
|
||||||
|
- Discrete channels: small colored bars/ticks along the bottom
|
||||||
|
|
||||||
|
Output: ONE SVG with transparent background, axes hidden.
|
||||||
|
|
||||||
|
Run:
|
||||||
|
uv run python draw_transformer_lower_half.py --out ./assets/transformer_lower_half.svg
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import argparse
|
||||||
|
from dataclasses import dataclass
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
import numpy as np
|
||||||
|
import matplotlib.pyplot as plt
|
||||||
|
from matplotlib.patches import Rectangle
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
class Params:
    """Tunable knobs for the lower-half figure: synthetic data and canvas layout."""

    seed: int = 7          # RNG seed so the rendered figure is reproducible
    seconds: float = 10.0  # signal duration in seconds
    fs: int = 300          # samples per second

    # Continuous channels
    n_curves: int = 3      # number of smooth curves drawn in the top strip
    curve_lw: float = 2.4  # line width of each curve

    # Discrete bars
    n_bins: int = 40  # number of discrete bars/ticks across time
    bar_height: float = 0.11  # relative height inside bar strip axis
    bar_gap: float = 0.08  # gap between bars (fraction of bar width)

    # Canvas sizing
    width_in: float = 5.8   # figure width in inches
    height_in: float = 1.9  # figure height in inches
|
||||||
|
|
||||||
|
|
||||||
|
def _smooth(x: np.ndarray, win: int) -> np.ndarray:
|
||||||
|
win = max(3, int(win) | 1) # odd
|
||||||
|
k = np.ones(win, dtype=float)
|
||||||
|
k /= k.sum()
|
||||||
|
return np.convolve(x, k, mode="same")
|
||||||
|
|
||||||
|
|
||||||
|
def make_continuous_curves(p: Params) -> tuple[np.ndarray, np.ndarray]:
    """
    Synthesize a few smooth curves with gentle long-term temporal patterning.

    Each curve = slow sinusoid + mid-band wiggle + two Gaussian bumps
    + lightly smoothed noise, then standardized and compressed to ~0.42 std.

    Returns:
        t: (T,)
        Y: (n_curves, T)
    """
    rng = np.random.default_rng(p.seed)
    n_samples = int(p.seconds * p.fs)
    t = np.linspace(0, p.seconds, n_samples, endpoint=False)

    slow_hz = [0.12, 0.09, 0.15]  # slow trend frequencies (Hz)
    mid_hz = [0.65, 0.85, 0.75]   # mid-band wiggle frequencies (Hz)

    rows = []
    for k in range(p.n_curves):
        f_slow = slow_hz[k % len(slow_hz)]
        f_mid = mid_hz[k % len(mid_hz)]
        phase = rng.uniform(0, 2 * np.pi)

        # Smooth trend plus a mid-band wiggle.
        y = (
            1.00 * np.sin(2 * np.pi * f_slow * t + phase)
            + 0.35 * np.sin(2 * np.pi * f_mid * t + 0.7 * phase)
        )

        # A couple of smooth Gaussian bumps (slow pattern changes).
        bumps = np.zeros_like(t)
        for _ in range(2):
            center = rng.uniform(0.8, p.seconds - 0.8)
            spread = rng.uniform(0.35, 0.75)
            bumps += np.exp(-0.5 * ((t - center) / spread) ** 2)
        y += 0.55 * bumps

        # Mild, lightly smoothed noise texture.
        y += 0.12 * _smooth(rng.normal(0, 1, size=n_samples), win=int(p.fs * 0.04))

        # Standardize, then compress the amplitude to fit the strip nicely.
        y = (y - y.mean()) / (y.std() + 1e-9)
        y *= 0.42
        rows.append(y)

    return t, np.vstack(rows)
|
||||||
|
|
||||||
|
|
||||||
|
def make_discrete_bars(p: Params) -> np.ndarray:
    """
    Sample a token-like categorical sequence across time bins.

    Run-length flavored: each bin repeats the previous category, with a 25%
    chance per bin of re-drawing a fresh one (bin 0 always draws).

    Returns:
        ids: (n_bins,) integer category ids in [0, 8)
    """
    rng = np.random.default_rng(p.seed + 123)
    ids = np.zeros(p.n_bins, dtype=int)
    # NOTE: this draw is immediately overwritten at pos == 0, but it is kept
    # so the RNG stream (and thus the rendered figure) stays identical.
    current = rng.integers(0, 8)
    for pos in range(p.n_bins):
        if pos == 0 or rng.random() < 0.25:
            current = rng.integers(0, 8)
        ids[pos] = current
    return ids
|
||||||
|
|
||||||
|
|
||||||
|
def draw_transformer_lower_half_svg(out_path: Path, p: Params) -> None:
    """
    Render the lower-half visual (continuous curves above a discrete bar strip)
    as one transparent, axis-free SVG at *out_path*.
    """
    # --- Figure + transparent background ---
    fig = plt.figure(figsize=(p.width_in, p.height_in), dpi=200)
    fig.patch.set_alpha(0.0)

    # Two stacked axes: curves (top), bars (bottom)
    # Tight, diagram-style layout
    ax_curves = fig.add_axes([0.06, 0.28, 0.90, 0.68])  # [left, bottom, width, height]
    ax_bars = fig.add_axes([0.06, 0.10, 0.90, 0.14])

    # Transparent axes backgrounds so the SVG composes over any canvas.
    ax_curves.patch.set_alpha(0.0)
    ax_bars.patch.set_alpha(0.0)

    # Hide all axis decorations: the output should read as a diagram element.
    for ax in (ax_curves, ax_bars):
        ax.set_axis_off()

    # --- Data ---
    t, Y = make_continuous_curves(p)
    ids = make_discrete_bars(p)

    # --- Continuous curves (explicit colors to match the “multi-colored” look) ---
    # Feel free to swap these hex colors to match your figure theme.
    curve_colors = ["#1f77b4", "#ff7f0e", "#2ca02c"]  # blue / orange / green

    for i in range(Y.shape[0]):
        ax_curves.plot(t, Y[i], linewidth=p.curve_lw, color=curve_colors[i % len(curve_colors)])

    # Set curve bounds with padding (keeps strokes away from the clip edge)
    ymin, ymax = float(Y.min()), float(Y.max())
    pad = 0.10 * (ymax - ymin + 1e-9)  # 1e-9 guards a flat (zero-range) signal
    ax_curves.set_xlim(t[0], t[-1])
    ax_curves.set_ylim(ymin - pad, ymax + pad)

    # --- Discrete bars: small colored rectangles along the timeline ---
    # A small palette for categories (repeats if more categories appear)
    bar_palette = [
        "#e41a1c", "#377eb8", "#4daf4a", "#984ea3",
        "#ff7f00", "#ffff33", "#a65628", "#f781bf",
    ]

    # Convert bins into time spans
    n = len(ids)
    x0, x1 = t[0], t[-1]
    total = x1 - x0
    bin_w = total / n
    gap = p.bar_gap * bin_w

    # Draw bars in [0,1] y-space inside ax_bars
    ax_bars.set_xlim(x0, x1)
    ax_bars.set_ylim(0, 1)

    for i, cat in enumerate(ids):
        left = x0 + i * bin_w + gap / 2  # center the bar inside its bin
        width = bin_w - gap
        color = bar_palette[int(cat) % len(bar_palette)]
        rect = Rectangle(
            (left, (1 - p.bar_height) / 2),  # vertically centered strip
            width,
            p.bar_height,
            facecolor=color,
            edgecolor="none",
        )
        ax_bars.add_patch(rect)

    out_path.parent.mkdir(parents=True, exist_ok=True)
    fig.savefig(out_path, format="svg", transparent=True, bbox_inches="tight", pad_inches=0.0)
    plt.close(fig)
|
||||||
|
|
||||||
|
|
||||||
|
def main() -> None:
    """CLI entry point: parse options and render the lower-half SVG."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--out", type=Path, default=Path("transformer_lower_half.svg"))
    parser.add_argument("--seed", type=int, default=7)
    parser.add_argument("--seconds", type=float, default=10.0)
    parser.add_argument("--fs", type=int, default=300)
    parser.add_argument("--bins", type=int, default=40)
    ns = parser.parse_args()

    cfg = Params(seed=ns.seed, seconds=ns.seconds, fs=ns.fs, n_bins=ns.bins)
    draw_transformer_lower_half_svg(ns.out, cfg)
    print(f"Wrote: {ns.out}")


if __name__ == "__main__":
    main()
|
||||||
202
arxiv-style/fig-scripts/draw_transformer_lower_half_axes.py
Normal file
202
arxiv-style/fig-scripts/draw_transformer_lower_half_axes.py
Normal file
@@ -0,0 +1,202 @@
|
|||||||
|
#!/usr/bin/env python3
|
||||||
|
"""
|
||||||
|
Transformer section lower-half visuals WITH AXES ONLY:
|
||||||
|
- Axes spines visible
|
||||||
|
- NO numbers (tick labels hidden)
|
||||||
|
- NO words (axis labels removed)
|
||||||
|
- Transparent background
|
||||||
|
- One SVG output
|
||||||
|
|
||||||
|
Run:
|
||||||
|
    uv run python draw_transformer_lower_half_axes.py --out ./assets/transformer_lower_half_axes_only.svg
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import argparse
|
||||||
|
from dataclasses import dataclass
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
import numpy as np
|
||||||
|
import matplotlib.pyplot as plt
|
||||||
|
from matplotlib.patches import Rectangle
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
class Params:
    """Knobs for the axes-only lower-half figure: synthetic data and layout."""

    seed: int = 7          # RNG seed so the rendered figure is reproducible
    seconds: float = 10.0  # signal duration in seconds
    fs: int = 300          # samples per second

    # Continuous channels
    n_curves: int = 3      # number of smooth curves in the top axes
    curve_lw: float = 2.4  # line width of each curve

    # Discrete bars
    n_bins: int = 40  # number of time bins in the bar strip
    bar_height: float = 0.55  # fraction of the discrete-axis y-range
    bar_gap: float = 0.08  # fraction of bar width

    # Figure size
    width_in: float = 6.6   # inches
    height_in: float = 2.6  # inches
|
||||||
|
|
||||||
|
|
||||||
|
def _smooth(x: np.ndarray, win: int) -> np.ndarray:
|
||||||
|
win = max(3, int(win) | 1) # odd
|
||||||
|
k = np.ones(win, dtype=float)
|
||||||
|
k /= k.sum()
|
||||||
|
return np.convolve(x, k, mode="same")
|
||||||
|
|
||||||
|
|
||||||
|
def make_continuous_curves(p: Params) -> tuple[np.ndarray, np.ndarray]:
    """
    Synthesize smooth curves: slow sinusoid + mid-band wiggle + two Gaussian
    bumps + lightly smoothed noise, standardized and scaled to ~0.42 std.

    Returns (t, Y) with t of shape (T,) and Y of shape (n_curves, T).
    """
    rng = np.random.default_rng(p.seed)
    n_samples = int(p.seconds * p.fs)
    t = np.linspace(0, p.seconds, n_samples, endpoint=False)

    slow_hz = [0.12, 0.09, 0.15]
    mid_hz = [0.65, 0.85, 0.75]

    rows = []
    for k in range(p.n_curves):
        f_slow = slow_hz[k % len(slow_hz)]
        f_mid = mid_hz[k % len(mid_hz)]
        phase = rng.uniform(0, 2 * np.pi)

        y = (
            1.00 * np.sin(2 * np.pi * f_slow * t + phase)
            + 0.35 * np.sin(2 * np.pi * f_mid * t + 0.7 * phase)
        )

        # Two smooth bumps that read as slow pattern changes.
        bumps = np.zeros_like(t)
        for _ in range(2):
            center = rng.uniform(0.8, p.seconds - 0.8)
            spread = rng.uniform(0.35, 0.75)
            bumps += np.exp(-0.5 * ((t - center) / spread) ** 2)
        y += 0.55 * bumps

        # Mild smoothed-noise texture.
        y += 0.12 * _smooth(rng.normal(0, 1, size=n_samples), win=int(p.fs * 0.04))

        # Standardize then compress.
        y = (y - y.mean()) / (y.std() + 1e-9)
        y *= 0.42
        rows.append(y)

    return t, np.vstack(rows)
|
||||||
|
|
||||||
|
|
||||||
|
def make_discrete_bars(p: Params) -> np.ndarray:
    """
    Sample a run-length style categorical sequence: each bin repeats the
    previous category, with a 25% chance per bin of drawing a new one
    (bin 0 always draws). Returns (n_bins,) int ids in [0, 8).
    """
    rng = np.random.default_rng(p.seed + 123)
    ids = np.zeros(p.n_bins, dtype=int)
    # Dead draw (overwritten at pos == 0) kept deliberately so the RNG
    # stream — and thus the rendered figure — is unchanged.
    current = rng.integers(0, 8)
    for pos in range(p.n_bins):
        if pos == 0 or rng.random() < 0.25:
            current = rng.integers(0, 8)
        ids[pos] = current
    return ids
|
||||||
|
|
||||||
|
|
||||||
|
def _axes_only(ax: plt.Axes) -> None:
    """Strip an axes down to its spines: no titles, labels, ticks, or grid."""
    # Clear every piece of text attached to the axes.
    ax.set_xlabel("")
    ax.set_ylabel("")
    ax.set_title("")

    # The spines are the only axes element we keep visible.
    for spine in ax.spines.values():
        spine.set_visible(True)

    # Remove tick marks and tick labels on every side.
    ax.set_xticks([])
    ax.set_yticks([])
    ax.tick_params(
        axis="both",
        which="both",
        bottom=False,
        left=False,
        top=False,
        right=False,
        labelbottom=False,
        labelleft=False,
    )

    # And no grid lines either.
    ax.grid(False)
|
||||||
|
|
||||||
|
|
||||||
|
def draw_transformer_lower_half_svg(out_path: Path, p: Params) -> None:
    """
    Render the lower-half visual (curves + discrete bar strip) with spines
    visible but no tick labels or text, as one transparent SVG at *out_path*.
    """
    fig = plt.figure(figsize=(p.width_in, p.height_in), dpi=200)
    fig.patch.set_alpha(0.0)  # transparent figure background

    # Two axes sharing x (top curves, bottom bars)
    ax_curves = fig.add_axes([0.10, 0.38, 0.86, 0.56])
    ax_bars = fig.add_axes([0.10, 0.14, 0.86, 0.18], sharex=ax_curves)

    # Transparent axes backgrounds so the SVG composes over any canvas.
    ax_curves.patch.set_alpha(0.0)
    ax_bars.patch.set_alpha(0.0)

    # Data
    t, Y = make_continuous_curves(p)
    ids = make_discrete_bars(p)

    # Top: continuous curves
    curve_colors = ["#1f77b4", "#ff7f0e", "#2ca02c"]  # blue / orange / green
    for i in range(Y.shape[0]):
        ax_curves.plot(t, Y[i], linewidth=p.curve_lw, color=curve_colors[i % len(curve_colors)])

    # Pad the y-limits so strokes are not clipped at the extremes.
    ymin, ymax = float(Y.min()), float(Y.max())
    ypad = 0.10 * (ymax - ymin + 1e-9)  # 1e-9 guards a flat signal
    ax_curves.set_xlim(t[0], t[-1])
    ax_curves.set_ylim(ymin - ypad, ymax + ypad)

    # Bottom: discrete bars (colored strip)
    bar_palette = [
        "#e41a1c", "#377eb8", "#4daf4a", "#984ea3",
        "#ff7f00", "#ffff33", "#a65628", "#f781bf",
    ]

    # Convert bins into time spans on the shared x-axis.
    x0, x1 = t[0], t[-1]
    total = x1 - x0
    n = len(ids)
    bin_w = total / n
    gap = p.bar_gap * bin_w

    ax_bars.set_xlim(x0, x1)
    ax_bars.set_ylim(0, 1)

    bar_y = (1 - p.bar_height) / 2  # vertically center the strip
    for i, cat in enumerate(ids):
        left = x0 + i * bin_w + gap / 2
        width = bin_w - gap
        color = bar_palette[int(cat) % len(bar_palette)]
        ax_bars.add_patch(Rectangle((left, bar_y), width, p.bar_height, facecolor=color, edgecolor="none"))

    # Apply "axes only" styling (no numbers/words)
    _axes_only(ax_curves)
    _axes_only(ax_bars)

    out_path.parent.mkdir(parents=True, exist_ok=True)
    fig.savefig(out_path, format="svg", transparent=True, bbox_inches="tight", pad_inches=0.0)
    plt.close(fig)
|
||||||
|
|
||||||
|
|
||||||
|
def main() -> None:
    """CLI entry point: parse options and render the axes-only lower-half SVG."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--out", type=Path, default=Path("transformer_lower_half_axes_only.svg"))
    parser.add_argument("--seed", type=int, default=7)
    parser.add_argument("--seconds", type=float, default=10.0)
    parser.add_argument("--fs", type=int, default=300)
    parser.add_argument("--bins", type=int, default=40)
    parser.add_argument("--curves", type=int, default=3)
    ns = parser.parse_args()

    cfg = Params(seed=ns.seed, seconds=ns.seconds, fs=ns.fs, n_bins=ns.bins, n_curves=ns.curves)
    draw_transformer_lower_half_svg(ns.out, cfg)
    print(f"Wrote: {ns.out}")


if __name__ == "__main__":
    main()
|
||||||
161
arxiv-style/fig-scripts/gen_noise_ddmp.py
Normal file
161
arxiv-style/fig-scripts/gen_noise_ddmp.py
Normal file
@@ -0,0 +1,161 @@
|
|||||||
|
#!/usr/bin/env python3
|
||||||
|
"""
|
||||||
|
Generate "Noisy Residual" and "Denoised Residual" curves as SVGs.
|
||||||
|
|
||||||
|
- Produces TWO separate SVG files:
|
||||||
|
noisy_residual.svg
|
||||||
|
denoised_residual.svg
|
||||||
|
|
||||||
|
- Curves are synthetic but shaped like residual noise + denoised residual.
|
||||||
|
- Uses only matplotlib + numpy.
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import argparse
|
||||||
|
from dataclasses import dataclass
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
import numpy as np
|
||||||
|
import matplotlib.pyplot as plt
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
class CurveParams:
    """Knobs controlling the synthetic residual pair (noisy + denoised)."""

    seconds: float = 12.0  # length of the signal
    fs: int = 250  # samples per second
    seed: int = 7  # RNG seed for reproducibility
    base_amp: float = 0.12  # smooth baseline amplitude
    noise_amp: float = 0.55  # high-frequency noise amplitude
    burst_amp: float = 1.2  # occasional spike amplitude
    burst_rate_hz: float = 0.35  # average spike frequency
    denoise_smooth_ms: float = 120  # smoothing window for "denoised" (ms)
|
||||||
|
|
||||||
|
|
||||||
|
def gaussian_smooth(x: np.ndarray, sigma_samples: float) -> np.ndarray:
    """Gaussian smoothing via explicit kernel convolution (no SciPy dependency)."""
    if sigma_samples <= 0:
        # Degenerate width: no smoothing, but still return a fresh array.
        return x.copy()

    half = int(np.ceil(4 * sigma_samples))  # ±4σ support
    offsets = np.arange(-half, half + 1, dtype=float)
    weights = np.exp(-(offsets**2) / (2 * sigma_samples**2))
    return np.convolve(x, weights / weights.sum(), mode="same")
|
||||||
|
|
||||||
|
|
||||||
|
def make_residual(params: CurveParams) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
    """
    Create a synthetic residual:
    - baseline: smooth wavy trend + slight random-walk drift
    - noise: band-limited-ish high-frequency noise
    - bursts: sparse spikes / impulse-like events
    The "denoised" signal is a Gaussian smooth of the noisy one.
    Returns: (t, noisy, denoised)
    """
    rng = np.random.default_rng(params.seed)
    n = int(params.seconds * params.fs)
    t = np.linspace(0, params.seconds, n, endpoint=False)

    # Smooth baseline (small): combination of sinusoids + small random drift
    baseline = (
        0.7 * np.sin(2 * np.pi * 0.35 * t + 0.2)
        + 0.35 * np.sin(2 * np.pi * 0.9 * t + 1.2)
        + 0.25 * np.sin(2 * np.pi * 0.15 * t + 2.0)
    )
    baseline *= params.base_amp
    # Random-walk drift, renormalized to a quarter of the baseline amplitude.
    drift = np.cumsum(rng.normal(0, 1, size=n))
    drift = drift / (np.max(np.abs(drift)) + 1e-9) * (params.base_amp * 0.25)
    baseline = baseline + drift

    # High-frequency noise: whitened then lightly smoothed to look "oscillatory"
    raw = rng.normal(0, 1, size=n)
    hf = raw - gaussian_smooth(raw, sigma_samples=params.fs * 0.03)  # remove slow part
    hf = hf / (np.std(hf) + 1e-9)  # unit std before applying the amplitude knob
    hf *= params.noise_amp

    # Bursts/spikes: Poisson process impulses convolved with short kernel
    expected_bursts = params.burst_rate_hz * params.seconds
    k_bursts = rng.poisson(expected_bursts)
    impulses = np.zeros(n)
    if k_bursts > 0:
        idx = rng.integers(0, n, size=k_bursts)
        impulses[idx] = rng.normal(loc=1.0, scale=0.4, size=k_bursts)

    # Shape impulses into spikes (asymmetric bump)
    spike_kernel_len = int(params.fs * 0.06)  # ~60ms
    spike_kernel_len = max(spike_kernel_len, 7)  # floor so the kernel is never degenerate
    spike_t = np.arange(spike_kernel_len)
    spike_kernel = np.exp(-spike_t / (params.fs * 0.012))  # fast decay
    spike_kernel *= np.hanning(spike_kernel_len)  # taper
    spike_kernel /= (spike_kernel.max() + 1e-9)

    bursts = np.convolve(impulses, spike_kernel, mode="same")
    bursts *= params.burst_amp

    noisy = baseline + hf + bursts

    # "Denoised": remove high-frequency using Gaussian smoothing,
    # but keep spike structures partially.
    # NOTE(review): the /3 presumably converts the ms "window" to a sigma — confirm.
    smooth_sigma = (params.denoise_smooth_ms / 1000.0) * params.fs / 3.0
    denoised = gaussian_smooth(noisy, sigma_samples=smooth_sigma)

    return t, noisy, denoised
|
||||||
|
|
||||||
|
|
||||||
|
def save_curve_svg(
    t: np.ndarray,
    y: np.ndarray,
    out_path: Path,
    *,
    width_in: float = 5.4,
    height_in: float = 1.6,
    lw: float = 2.2,
    pad: float = 0.03,
) -> None:
    """
    Save a clean, figure-only SVG suitable for embedding in diagrams.

    - No axes, ticks, labels.
    - Tight bounding box.
    - Transparent background, so the curve composes over any diagram canvas
      (matches the sibling scripts in this directory).

    Args:
        t: (T,) x-coordinates.
        y: (T,) curve values.
        out_path: destination .svg path; parent directories are created.
        width_in, height_in: figure size in inches.
        lw: curve line width.
        pad: inner margin as a fraction of the figure.
    """
    fig = plt.figure(figsize=(width_in, height_in), dpi=200)
    fig.patch.set_alpha(0.0)  # fix: previously opaque, which defeats diagram embedding
    ax = fig.add_axes([pad, pad, 1 - 2 * pad, 1 - 2 * pad])
    ax.patch.set_alpha(0.0)

    ax.plot(t, y, linewidth=lw)

    # Make it "icon-like" for diagrams: no axes or frames
    ax.set_axis_off()

    # Ensure bounds include a little padding so the stroke is not clipped
    ymin, ymax = np.min(y), np.max(y)
    ypad = 0.08 * (ymax - ymin + 1e-9)  # 1e-9 guards a flat signal
    ax.set_xlim(t[0], t[-1])
    ax.set_ylim(ymin - ypad, ymax + ypad)

    out_path.parent.mkdir(parents=True, exist_ok=True)
    fig.savefig(out_path, format="svg", bbox_inches="tight", pad_inches=0.0, transparent=True)
    plt.close(fig)
|
||||||
|
|
||||||
|
|
||||||
|
def main() -> None:
    """CLI entry point: synthesize the residual pair and write both SVGs."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--outdir", type=Path, default=Path("."), help="Output directory")
    parser.add_argument("--seed", type=int, default=7, help="RNG seed")
    parser.add_argument("--seconds", type=float, default=12.0, help="Signal length (s)")
    parser.add_argument("--fs", type=int, default=250, help="Sampling rate (Hz)")
    parser.add_argument("--prefix", type=str, default="", help="Filename prefix (optional)")
    ns = parser.parse_args()

    cfg = CurveParams(seconds=ns.seconds, fs=ns.fs, seed=ns.seed)
    t, noisy, denoised = make_residual(cfg)

    noisy_path = ns.outdir / f"{ns.prefix}noisy_residual.svg"
    den_path = ns.outdir / f"{ns.prefix}denoised_residual.svg"

    save_curve_svg(t, noisy, noisy_path)
    save_curve_svg(t, denoised, den_path)

    print(f"Wrote:\n {noisy_path}\n {den_path}")


if __name__ == "__main__":
    main()
|
||||||
188
arxiv-style/fig-scripts/make_ddpm_like_svg.py
Normal file
188
arxiv-style/fig-scripts/make_ddpm_like_svg.py
Normal file
@@ -0,0 +1,188 @@
|
|||||||
|
#!/usr/bin/env python3
|
||||||
|
"""
|
||||||
|
DDPM-like residual curve SVGs (separate files, fixed colors):
|
||||||
|
- noisy_residual.svg (blue)
|
||||||
|
- denoised_residual.svg (purple)
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import argparse
|
||||||
|
from dataclasses import dataclass
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
import numpy as np
|
||||||
|
import matplotlib.pyplot as plt
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
class DDPMStyleParams:
    """Knobs for the DDPM-like residual pair (noisy + denoised curves)."""

    seconds: float = 12.0  # signal length (s)
    fs: int = 250  # samples per second
    seed: int = 7  # RNG seed for reproducibility

    baseline_amp: float = 0.10  # slow trend amplitude
    mid_wiggle_amp: float = 0.18  # mid-frequency wiggle amplitude
    colored_noise_amp: float = 0.65  # 1/f colored-noise amplitude
    colored_alpha: float = 1.0  # spectral exponent of the colored noise

    burst_rate_hz: float = 0.30  # average burst frequency
    burst_amp: float = 0.9  # burst amplitude
    burst_width_ms: float = 55  # burst kernel width (ms)

    # Multi-scale denoise: per-scale smoothing widths (ms) and blend weights.
    denoise_sigmas_ms: tuple[float, ...] = (25, 60, 140)
    denoise_weights: tuple[float, ...] = (0.25, 0.35, 0.40)
    denoise_texture_keep: float = 0.10  # fraction of HF texture re-injected
|
||||||
|
|
||||||
|
|
||||||
|
def gaussian_smooth(x: np.ndarray, sigma_samples: float) -> np.ndarray:
    """Smooth *x* with a normalized Gaussian kernel of std *sigma_samples*."""
    if sigma_samples <= 0:
        return x.copy()  # no-op width, but still a fresh array
    half = int(np.ceil(4 * sigma_samples))  # truncate the kernel at ±4σ
    taps = np.arange(-half, half + 1, dtype=float)
    weights = np.exp(-(taps**2) / (2 * sigma_samples**2))
    return np.convolve(x, weights / weights.sum(), mode="same")
|
||||||
|
|
||||||
|
|
||||||
|
def colored_noise_1_f(n: int, rng: np.random.Generator, alpha: float) -> np.ndarray:
    """Return n samples of zero-mean, unit-std 1/f^alpha colored noise."""
    # Shape white noise in the frequency domain.
    spectrum = np.fft.rfft(rng.normal(0, 1, size=n))
    freqs = np.fft.rfftfreq(n, d=1.0)

    # Leave DC untouched; attenuate each positive frequency by f^(-alpha/2).
    gain = np.ones_like(freqs)
    positive = freqs > 0
    gain[positive] = 1.0 / (freqs[positive] ** (alpha / 2.0))

    shaped = np.fft.irfft(spectrum * gain, n=n)

    # Standardize to zero mean, unit std.
    shaped = shaped - np.mean(shaped)
    return shaped / (np.std(shaped) + 1e-9)
|
||||||
|
|
||||||
|
|
||||||
|
def make_ddpm_like_residual(p: DDPMStyleParams) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
    """
    Build a DDPM-style residual pair.

    noisy = slow baseline + mid-band wiggle + 1/f colored noise + sparse bursts
    denoised = weighted blend of multi-scale Gaussian smooths of `noisy`,
               plus a small re-injected high-frequency texture.

    Returns: (t, noisy, denoised)
    """
    rng = np.random.default_rng(p.seed)
    n = int(p.seconds * p.fs)
    t = np.linspace(0, p.seconds, n, endpoint=False)

    # Slow baseline trend.
    baseline = (
        0.8 * np.sin(2 * np.pi * 0.18 * t + 0.4)
        + 0.35 * np.sin(2 * np.pi * 0.06 * t + 2.2)
    ) * p.baseline_amp

    # Mid-frequency wiggle mixture.
    mid = (
        0.9 * np.sin(2 * np.pi * 0.9 * t + 1.1)
        + 0.5 * np.sin(2 * np.pi * 1.6 * t + 0.2)
        + 0.3 * np.sin(2 * np.pi * 2.4 * t + 2.6)
    ) * p.mid_wiggle_amp

    # 1/f^alpha colored noise (unit std), scaled by its amplitude knob.
    col = colored_noise_1_f(n, rng, alpha=p.colored_alpha) * p.colored_noise_amp

    # Sparse Poisson bursts shaped by a short decaying, tapered kernel.
    expected = p.burst_rate_hz * p.seconds
    k = rng.poisson(expected)
    impulses = np.zeros(n)
    if k > 0:
        idx = rng.integers(0, n, size=k)
        impulses[idx] = rng.normal(loc=1.0, scale=0.35, size=k)

    width = max(int(p.fs * (p.burst_width_ms / 1000.0)), 7)  # length floor keeps kernel non-degenerate
    u = np.arange(width)
    kernel = np.exp(-u / (p.fs * 0.012)) * np.hanning(width)
    kernel /= (kernel.max() + 1e-9)
    bursts = np.convolve(impulses, kernel, mode="same") * p.burst_amp

    noisy = baseline + mid + col + bursts

    # Multi-scale denoise: blend Gaussian smooths at several widths.
    # NOTE(review): the /3 presumably converts a ms "window" into a sigma — confirm.
    sigmas_samples = [(ms / 1000.0) * p.fs / 3.0 for ms in p.denoise_sigmas_ms]
    smooths = [gaussian_smooth(noisy, s) for s in sigmas_samples]

    den_base = np.zeros_like(noisy)
    for w, sm in zip(p.denoise_weights, smooths):
        den_base += w * sm

    # Re-inject a touch of normalized high-frequency texture so the denoised
    # curve does not look artificially flat.
    hf = noisy - gaussian_smooth(noisy, sigma_samples=p.fs * 0.03)
    denoised = den_base + p.denoise_texture_keep * (hf / (np.std(hf) + 1e-9)) * (0.10 * np.std(den_base))

    return t, noisy, denoised
|
||||||
|
|
||||||
|
|
||||||
|
def save_single_curve_svg(
    t: np.ndarray,
    y: np.ndarray,
    out_path: Path,
    *,
    color: str,
    lw: float = 2.2,
) -> None:
    """Save one curve as a transparent, axis-free SVG in the given color."""
    fig = plt.figure(figsize=(5.4, 1.6), dpi=200)

    # Make figure background transparent
    fig.patch.set_alpha(0.0)

    ax = fig.add_axes([0.03, 0.03, 0.94, 0.94])

    # Make axes background transparent
    ax.patch.set_alpha(0.0)

    ax.plot(t, y, linewidth=lw, color=color)

    # clean, diagram-friendly
    ax.set_axis_off()
    # Pad the y-limits slightly so the stroke is not clipped at the extremes.
    ymin, ymax = np.min(y), np.max(y)
    ypad = 0.08 * (ymax - ymin + 1e-9)  # 1e-9 guards a flat signal
    ax.set_xlim(t[0], t[-1])
    ax.set_ylim(ymin - ypad, ymax + ypad)

    out_path.parent.mkdir(parents=True, exist_ok=True)
    fig.savefig(
        out_path,
        format="svg",
        bbox_inches="tight",
        pad_inches=0.0,
        transparent=True,  # <-- key for transparent output
    )
    plt.close(fig)
|
||||||
|
|
||||||
|
|
||||||
|
def main() -> None:
    """CLI entry point: build params, synthesize the pair, write colored SVGs."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--outdir", type=Path, default=Path("."))
    parser.add_argument("--seed", type=int, default=7)
    parser.add_argument("--seconds", type=float, default=12.0)
    parser.add_argument("--fs", type=int, default=250)

    parser.add_argument("--alpha", type=float, default=1.0)
    parser.add_argument("--noise-amp", type=float, default=0.65)
    parser.add_argument("--texture-keep", type=float, default=0.10)

    parser.add_argument("--prefix", type=str, default="")
    ns = parser.parse_args()

    cfg = DDPMStyleParams(
        seconds=ns.seconds,
        fs=ns.fs,
        seed=ns.seed,
        colored_alpha=ns.alpha,
        colored_noise_amp=ns.noise_amp,
        denoise_texture_keep=ns.texture_keep,
    )

    t, noisy, den = make_ddpm_like_residual(cfg)

    noisy_path = ns.outdir / f"{ns.prefix}noisy_residual.svg"
    den_path = ns.outdir / f"{ns.prefix}denoised_residual.svg"

    # Fixed colors as you requested
    save_single_curve_svg(t, noisy, noisy_path, color="blue")
    save_single_curve_svg(t, den, den_path, color="purple")

    print("Wrote:")
    print(f" {noisy_path}")
    print(f" {den_path}")


if __name__ == "__main__":
    main()
|
||||||
10
arxiv-style/fig-scripts/pyproject.toml
Normal file
10
arxiv-style/fig-scripts/pyproject.toml
Normal file
@@ -0,0 +1,10 @@
|
|||||||
|
[project]
|
||||||
|
name = "fig-gen-ddpm"
|
||||||
|
version = "0.1.0"
|
||||||
|
description = "Figure-generation scripts (synthetic channel and DDPM-style residual SVG/PNG visuals) for the paper"
|
||||||
|
readme = "README.md"
|
||||||
|
requires-python = ">=3.12"
|
||||||
|
dependencies = [
|
||||||
|
"numpy>=1.26",
|
||||||
|
"matplotlib>=3.8",
|
||||||
|
]
|
||||||
240
arxiv-style/fig-scripts/synth_ics_3d_waterfall.py
Normal file
240
arxiv-style/fig-scripts/synth_ics_3d_waterfall.py
Normal file
@@ -0,0 +1,240 @@
|
|||||||
|
#!/usr/bin/env python3
|
||||||
|
"""
|
||||||
|
3D "final combined outcome" (time × channel × value) with:
|
||||||
|
- NO numbers on axes (tick labels removed)
|
||||||
|
- Axis *titles* kept (texts are okay)
|
||||||
|
- Reduced whitespace: tight bbox + minimal margins
|
||||||
|
- White background (non-transparent) suitable for embedding into another SVG
|
||||||
|
|
||||||
|
Output:
|
||||||
|
default PNG, optional SVG (2D projected vectors)
|
||||||
|
|
||||||
|
Run:
|
||||||
|
    uv run python synth_ics_3d_waterfall.py --out ./assets/synth_ics_3d.png
|
||||||
|
    uv run python synth_ics_3d_waterfall.py --out ./assets/synth_ics_3d.svg --format svg
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import argparse
|
||||||
|
from dataclasses import dataclass
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
import numpy as np
|
||||||
|
import matplotlib.pyplot as plt
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
class Params:
    """Configuration for the 3D time × channel × value waterfall render."""

    seed: int = 7          # RNG seed for reproducibility
    seconds: float = 10.0  # signal duration (s)
    fs: int = 220          # samples per second

    n_cont: int = 5   # number of continuous channels
    n_disc: int = 2   # number of discrete channels
    disc_vocab: int = 8  # discrete category count
    disc_change_rate_hz: float = 1.1  # average category-change rate (Hz)

    # view
    elev: float = 25.0   # 3D camera elevation (degrees)
    azim: float = -58.0  # 3D camera azimuth (degrees)

    # figure size (smaller, more "cube-like")
    fig_w: float = 5.4
    fig_h: float = 5.0

    # discrete rendering
    disc_z_scale: float = 0.45   # vertical scale applied to discrete traces
    disc_z_offset: float = -1.4  # vertical offset below continuous traces

    # margins (figure fraction)
    left: float = 0.03
    right: float = 0.99
    bottom: float = 0.03
    top: float = 0.99
|
||||||
|
|
||||||
|
|
||||||
|
def _smooth(x: np.ndarray, win: int) -> np.ndarray:
|
||||||
|
win = max(3, int(win) | 1)
|
||||||
|
k = np.ones(win, dtype=float)
|
||||||
|
k /= k.sum()
|
||||||
|
return np.convolve(x, k, mode="same")
|
||||||
|
|
||||||
|
|
||||||
|
def make_continuous(p: Params) -> tuple[np.ndarray, np.ndarray]:
    """Synthesize p.n_cont smooth channels of length p.seconds * p.fs.

    Each channel is two sinusoids plus a few Gaussian bumps and low-pass
    noise, standardized to zero mean / unit std.

    Returns:
        t: (T,) time axis
        Y: (n_cont, T) stacked channels
    """
    rng = np.random.default_rng(p.seed)
    n_samples = int(p.seconds * p.fs)
    t = np.linspace(0, p.seconds, n_samples, endpoint=False)

    slow_freqs = [0.08, 0.10, 0.12, 0.09, 0.11]
    fast_freqs = [0.55, 0.70, 0.85, 0.62, 0.78]

    channels = []
    for idx in range(p.n_cont):
        slow = slow_freqs[idx % len(slow_freqs)]
        fast = fast_freqs[idx % len(fast_freqs)]
        phase = rng.uniform(0, 2 * np.pi)

        # Dominant slow trend plus a weaker mid-frequency ripple.
        trace = (
            0.95 * np.sin(2 * np.pi * slow * t + phase)
            + 0.28 * np.sin(2 * np.pi * fast * t + 0.65 * phase)
        )

        # A few Gaussian bumps away from the edges of the window.
        bumps = np.zeros_like(t)
        for _ in range(rng.integers(2, 4)):
            center = rng.uniform(0.8, p.seconds - 0.8)
            width = rng.uniform(0.25, 0.80)
            bumps += np.exp(-0.5 * ((t - center) / (width + 1e-9)) ** 2)
        trace += 0.55 * bumps

        # Low-pass filtered white noise for texture.
        trace += 0.10 * _smooth(rng.normal(0, 1, size=n_samples), win=int(p.fs * 0.05))

        # Standardize per channel (epsilon guards a zero std).
        trace = (trace - trace.mean()) / (trace.std() + 1e-9)
        channels.append(trace)

    return t, np.vstack(channels)  # (n_cont, T)
|
||||||
|
|
||||||
|
|
||||||
|
def make_discrete(p: Params, t: np.ndarray) -> np.ndarray:
    """Piecewise-constant integer token channels aligned with time axis *t*.

    Returns:
        X: (n_disc, len(t)) int array with values in [0, p.disc_vocab)
    """
    rng = np.random.default_rng(p.seed + 123)
    n = len(t)

    mean_changes = max(1, int(p.seconds * p.disc_change_rate_hz))
    tokens = np.zeros((p.n_disc, n), dtype=int)

    for ch in range(p.n_disc):
        # Random breakpoints partition the timeline into constant segments.
        n_breaks = rng.poisson(mean_changes) + 1
        breaks = np.unique(rng.integers(0, n, size=n_breaks))
        breaks = np.sort(np.concatenate([[0], breaks, [n]]))

        token = rng.integers(0, p.disc_vocab)
        for start, stop in zip(breaks[:-1], breaks[1:]):
            # At each interior breakpoint, resample the token with prob. 0.85.
            if start != 0 and rng.random() < 0.85:
                token = rng.integers(0, p.disc_vocab)
            tokens[ch, start:stop] = token

    return tokens
|
||||||
|
|
||||||
|
|
||||||
|
def style_3d_axes(ax):
    """Soften a 3D axes: light-grey pane edges and a faint grid."""
    # Pane fill stays on (white background); only the edges are de-emphasized.
    # Guarded because pane attributes vary across Matplotlib versions.
    try:
        for axis in (ax.xaxis, ax.yaxis, ax.zaxis):
            axis.pane.set_edgecolor("0.7")
    except Exception:
        pass

    ax.grid(True, linewidth=0.4, alpha=0.30)
|
||||||
|
|
||||||
|
|
||||||
|
def remove_tick_numbers_keep_axis_titles(ax):
    """Strip tick labels and tick marks from a 3D axes, leaving axis titles intact."""
    for clear_labels in (ax.set_xticklabels, ax.set_yticklabels, ax.set_zticklabels):
        clear_labels([])

    # Hide the tick marks themselves (length=0) and drop label padding.
    ax.tick_params(axis="both", which="both", length=0, pad=0)

    # Some Matplotlib versions handle z-axis ticks separately; best-effort.
    try:
        ax.zaxis.set_tick_params(length=0, pad=0)
    except Exception:
        pass
|
||||||
|
|
||||||
|
|
||||||
|
def main() -> None:
    """CLI entry point: synthesize channels and render the 3D waterfall figure."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--out", type=Path, default=Path("synth_ics_3d.png"))
    parser.add_argument("--format", choices=["png", "svg"], default="png")

    parser.add_argument("--seed", type=int, default=7)
    parser.add_argument("--seconds", type=float, default=10.0)
    parser.add_argument("--fs", type=int, default=220)

    parser.add_argument("--n-cont", type=int, default=5)
    parser.add_argument("--n-disc", type=int, default=2)
    parser.add_argument("--disc-vocab", type=int, default=8)
    parser.add_argument("--disc-rate", type=float, default=1.1)

    parser.add_argument("--elev", type=float, default=25.0)
    parser.add_argument("--azim", type=float, default=-58.0)

    parser.add_argument("--fig-w", type=float, default=5.4)
    parser.add_argument("--fig-h", type=float, default=5.0)

    parser.add_argument("--disc-z-scale", type=float, default=0.45)
    parser.add_argument("--disc-z-offset", type=float, default=-1.4)

    args = parser.parse_args()

    params = Params(
        seed=args.seed,
        seconds=args.seconds,
        fs=args.fs,
        n_cont=args.n_cont,
        n_disc=args.n_disc,
        disc_vocab=args.disc_vocab,
        disc_change_rate_hz=args.disc_rate,
        elev=args.elev,
        azim=args.azim,
        fig_w=args.fig_w,
        fig_h=args.fig_h,
        disc_z_scale=args.disc_z_scale,
        disc_z_offset=args.disc_z_offset,
    )

    t, cont = make_continuous(params)
    disc = make_discrete(params, t)

    fig = plt.figure(figsize=(params.fig_w, params.fig_h), dpi=220, facecolor="white")
    ax = fig.add_subplot(111, projection="3d")
    style_3d_axes(ax)

    # Tight subplot placement to minimize surrounding whitespace.
    fig.subplots_adjust(left=params.left, right=params.right, bottom=params.bottom, top=params.top)

    # Continuous channels: one smooth line per lane (y = channel index).
    for lane, curve in enumerate(cont):
        ax.plot(t, np.full_like(t, fill_value=lane, dtype=float), curve, linewidth=2.0)

    # Discrete channels: step traces in the lanes after the continuous ones,
    # with token ids rescaled/offset along z.
    for k, token_row in enumerate(disc):
        lane = params.n_cont + k
        levels = params.disc_z_offset + params.disc_z_scale * token_row.astype(float)
        ax.step(t, np.full_like(t, fill_value=lane, dtype=float), levels, where="post", linewidth=2.2)

    # Axis titles stay; numeric tick labels are removed below.
    ax.set_xlabel("time")
    ax.set_ylabel("channel")
    ax.set_zlabel("value")
    remove_tick_numbers_keep_axis_titles(ax)

    ax.view_init(elev=params.elev, azim=params.azim)

    # Save with a tight bounding box to trim the white border.
    args.out.parent.mkdir(parents=True, exist_ok=True)
    save_kwargs = dict(bbox_inches="tight", pad_inches=0.03, facecolor="white")
    out_fmt = "svg" if args.format == "svg" or args.out.suffix.lower() == ".svg" else "png"
    fig.savefig(args.out, format=out_fmt, **save_kwargs)

    plt.close(fig)
    print(f"Wrote: {args.out}")
|
||||||
|
|
||||||
|
|
||||||
|
# Script entry point: only run when executed directly, not on import.
if __name__ == "__main__":
    main()
|
||||||
262
arxiv-style/fig-scripts/transformer_math_figure.py
Normal file
262
arxiv-style/fig-scripts/transformer_math_figure.py
Normal file
@@ -0,0 +1,262 @@
|
|||||||
|
#!/usr/bin/env python3
|
||||||
|
"""
|
||||||
|
Transformer-ish "trend" visuals with NO equations:
|
||||||
|
- attention_weights.svg : heatmap-like attention map (looks like "Transformer attends to positions")
|
||||||
|
- token_activation_trends.svg: multiple token-channel curves (continuous trends)
|
||||||
|
- discrete_tokens.svg : step-like discrete channel trends (optional)
|
||||||
|
|
||||||
|
All SVGs have transparent background and no axes (diagram-friendly).
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import argparse
|
||||||
|
from dataclasses import dataclass
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
import numpy as np
|
||||||
|
import matplotlib.pyplot as plt
|
||||||
|
|
||||||
|
|
||||||
|
# ----------------------------
|
||||||
|
# Synthetic data generators
|
||||||
|
# ----------------------------
|
||||||
|
|
||||||
|
@dataclass
class Params:
    """Shared knobs for the attention-map and trend-curve figures."""

    seed: int = 7  # RNG seed for all synthetic data
    T: int = 24  # sequence length (positions)
    n_heads: int = 4  # attention heads to blend/choose
    # NOTE(review): n_heads is not referenced by the visible generators — confirm before relying on it.
    n_curves: int = 7  # curves in token_activation_trends
    seconds: float = 10.0  # duration of the synthetic trends
    fs: int = 200  # samples per second (N = seconds * fs)
|
||||||
|
|
||||||
|
|
||||||
|
def _gaussian(x: np.ndarray, mu: float, sig: float) -> np.ndarray:
|
||||||
|
return np.exp(-0.5 * ((x - mu) / (sig + 1e-9)) ** 2)
|
||||||
|
|
||||||
|
|
||||||
|
def make_attention_map(T: int, rng: np.random.Generator, mode: str) -> np.ndarray:
    """
    Build a transformer-like attention weight matrix (T x T).

    Modes:
        "local"  — mostly near-diagonal attention
        "global" — near-diagonal plus a few broadly-attended key positions
        "causal" — lower-triangular (decoder-like) with local preference

    Raises:
        ValueError: on an unrecognized mode.
    """
    q = np.arange(T)[:, None]  # query positions
    k = np.arange(T)[None, :]  # key positions

    if mode == "local":
        logits = -((q - k) ** 2) / (2 * (2.2 ** 2))
        logits += 0.15 * rng.normal(size=(T, T))
    elif mode == "global":
        logits = -((q - k) ** 2) / (2 * (3.0 ** 2))
        # A handful of "global" keys that many queries attend to.
        for g in rng.choice(T, size=max(2, T // 10), replace=False):
            logits += 1.2 * _gaussian(k, mu=g, sig=1.0)
        logits += 0.12 * rng.normal(size=(T, T))
    elif mode == "causal":
        logits = -((q - k) ** 2) / (2 * (2.0 ** 2))
        logits += 0.12 * rng.normal(size=(T, T))
        logits = np.where(k <= q, logits, -1e9)  # mask future keys
    else:
        raise ValueError(f"Unknown attention mode: {mode}")

    # Row-wise softmax (max-shifted for numerical stability).
    logits = logits - np.max(logits, axis=1, keepdims=True)
    weights = np.exp(logits)
    weights /= (np.sum(weights, axis=1, keepdims=True) + 1e-9)
    return weights
|
||||||
|
|
||||||
|
|
||||||
|
def make_token_activation_trends(p: Params) -> tuple[np.ndarray, np.ndarray]:
    """
    Smooth curves that feel like representations evolving across layers/time.

    Returns:
        t: (N,) time axis
        Y: (n_curves, N) normalized, vertically offset curves
    """
    rng = np.random.default_rng(p.seed)
    n_samples = int(p.seconds * p.fs)
    t = np.linspace(0, p.seconds, n_samples, endpoint=False)

    curves = []
    for idx in range(p.n_curves):
        # Multi-scale smooth components: slow drift plus a mid-frequency ripple.
        slow = 0.10 + 0.04 * idx
        mid = 0.60 + 0.18 * (idx % 3)
        phase = rng.uniform(0, 2 * np.pi)

        base = 0.9 * np.sin(2 * np.pi * slow * t + phase) + 0.35 * np.sin(2 * np.pi * mid * t + 0.7 * phase)

        # "Attention-like gating": a few smooth Gaussian spikes.
        bumps = np.zeros_like(t)
        for _ in range(rng.integers(2, 5)):
            center = rng.uniform(0.5, p.seconds - 0.5)
            width = rng.uniform(0.15, 0.55)
            bumps += 0.9 * _gaussian(t, mu=center, sig=width)

        # Box-smoothed white noise for gentle texture.
        noise = np.convolve(rng.normal(0, 1, size=n_samples), np.ones(11) / 11.0, mode="same")

        curve = base + 0.85 * bumps + 0.12 * noise

        # Normalize, then offset each curve vertically by its index.
        curve = (curve - curve.mean()) / (curve.std() + 1e-9)
        curves.append(0.75 * curve + 0.18 * idx)

    return t, np.vstack(curves)
|
||||||
|
|
||||||
|
|
||||||
|
def make_discrete_trends(p: Params, vocab: int = 9, change_rate_hz: float = 1.3) -> tuple[np.ndarray, np.ndarray]:
    """
    Discrete step-like channels (token-id / discrete-feature feel).

    Returns:
        t: (N,) time axis
        X: (n_curves, N) integer token values in [0, vocab)
    """
    rng = np.random.default_rng(p.seed + 123)
    n_samples = int(p.seconds * p.fs)
    t = np.linspace(0, p.seconds, n_samples, endpoint=False)

    mean_changes = max(1, int(p.seconds * change_rate_hz))
    X = np.zeros((p.n_curves, n_samples), dtype=int)
    for ch in range(p.n_curves):
        # Random breakpoints partition the timeline into constant segments.
        n_breaks = rng.poisson(mean_changes) + 1
        breaks = np.unique(rng.integers(0, n_samples, size=n_breaks))
        breaks = np.sort(np.concatenate([[0], breaks, [n_samples]]))

        token = rng.integers(0, vocab)
        for start, stop in zip(breaks[:-1], breaks[1:]):
            # Resample the token at interior breakpoints with prob. 0.9.
            if start != 0 and rng.random() < 0.9:
                token = rng.integers(0, vocab)
            X[ch, start:stop] = token

    return t, X
|
||||||
|
|
||||||
|
|
||||||
|
# ----------------------------
|
||||||
|
# Plot helpers (SVG, transparent, axes-free)
|
||||||
|
# ----------------------------
|
||||||
|
|
||||||
|
def _transparent_fig_ax(width_in: float, height_in: float):
    """Create a figure plus frameless axes with fully transparent patches."""
    figure = plt.figure(figsize=(width_in, height_in), dpi=200)
    figure.patch.set_alpha(0.0)
    # Nearly full-bleed axes; no spines/ticks, transparent background.
    axes = figure.add_axes([0.03, 0.03, 0.94, 0.94])
    axes.patch.set_alpha(0.0)
    axes.set_axis_off()
    return figure, axes
|
||||||
|
|
||||||
|
|
||||||
|
def save_attention_svg(A: np.ndarray, out: Path, *, show_colorbar: bool = False) -> None:
    """Render attention matrix *A* as a transparent, axes-free SVG heatmap."""
    fig, ax = _transparent_fig_ax(4.2, 4.2)

    # Default colormap; nearest-neighbor keeps cell boundaries crisp.
    image = ax.imshow(A, aspect="equal", interpolation="nearest")

    if show_colorbar:
        # Optional slim colorbar on the right (adds clutter in diagrams).
        colorbar_axes = fig.add_axes([0.92, 0.10, 0.03, 0.80])
        colorbar = fig.colorbar(image, cax=colorbar_axes)
        colorbar.outline.set_linewidth(1.0)

    out.parent.mkdir(parents=True, exist_ok=True)
    fig.savefig(out, format="svg", bbox_inches="tight", pad_inches=0.0, transparent=True)
    plt.close(fig)
|
||||||
|
|
||||||
|
|
||||||
|
def save_multi_curve_svg(t: np.ndarray, Y: np.ndarray, out: Path, *, lw: float = 2.0) -> None:
    """Plot each row of *Y* against *t* and save as a transparent, axes-free SVG."""
    fig, ax = _transparent_fig_ax(6.0, 2.2)

    for row in Y:
        ax.plot(t, row, linewidth=lw)

    # Fit limits to the data with a small (8%) vertical margin.
    flat = Y.reshape(-1)
    lo, hi = float(np.min(flat)), float(np.max(flat))
    margin = 0.08 * (hi - lo + 1e-9)
    ax.set_xlim(t[0], t[-1])
    ax.set_ylim(lo - margin, hi + margin)

    out.parent.mkdir(parents=True, exist_ok=True)
    fig.savefig(out, format="svg", bbox_inches="tight", pad_inches=0.0, transparent=True)
    plt.close(fig)
|
||||||
|
|
||||||
|
|
||||||
|
def save_discrete_svg(t: np.ndarray, X: np.ndarray, out: Path, *, lw: float = 2.0, spacing: float = 1.25) -> None:
    """Step-plot each integer row of *X* (vertically spaced) into a transparent SVG."""
    fig, ax = _transparent_fig_ax(6.0, 2.2)

    for idx in range(X.shape[0]):
        ax.step(t, X[idx].astype(float) + idx * spacing, where="post", linewidth=lw)

    # Limits computed over all offset traces, with a 10% vertical margin.
    offsets = np.arange(X.shape[0])[:, None] * spacing
    flat = (X.astype(float) + offsets).reshape(-1)
    lo, hi = float(np.min(flat)), float(np.max(flat))
    margin = 0.10 * (hi - lo + 1e-9)
    ax.set_xlim(t[0], t[-1])
    ax.set_ylim(lo - margin, hi + margin)

    out.parent.mkdir(parents=True, exist_ok=True)
    fig.savefig(out, format="svg", bbox_inches="tight", pad_inches=0.0, transparent=True)
    plt.close(fig)
|
||||||
|
|
||||||
|
|
||||||
|
# ----------------------------
|
||||||
|
# CLI
|
||||||
|
# ----------------------------
|
||||||
|
|
||||||
|
def main() -> None:
    """CLI entry point: write the attention-map and trend SVGs."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--outdir", type=Path, default=Path("out"))
    parser.add_argument("--seed", type=int, default=7)

    # attention
    parser.add_argument("--T", type=int, default=24)
    parser.add_argument("--attn-mode", type=str, default="local", choices=["local", "global", "causal"])
    parser.add_argument("--colorbar", action="store_true")

    # curves
    parser.add_argument("--seconds", type=float, default=10.0)
    parser.add_argument("--fs", type=int, default=200)
    parser.add_argument("--n-curves", type=int, default=7)

    # discrete optional
    parser.add_argument("--with-discrete", action="store_true")
    parser.add_argument("--disc-vocab", type=int, default=9)
    parser.add_argument("--disc-rate", type=float, default=1.3)

    args = parser.parse_args()

    params = Params(
        seed=args.seed,
        T=args.T,
        n_curves=args.n_curves,
        seconds=args.seconds,
        fs=args.fs,
    )

    rng = np.random.default_rng(args.seed)

    # 1) attention heatmap
    attn = make_attention_map(args.T, rng, mode=args.attn_mode)
    save_attention_svg(attn, args.outdir / "attention_weights.svg", show_colorbar=args.colorbar)

    # 2) continuous token-activation trends
    t, trends = make_token_activation_trends(params)
    save_multi_curve_svg(t, trends, args.outdir / "token_activation_trends.svg")

    # 3) discrete step trends (only when requested)
    if args.with_discrete:
        td, tokens = make_discrete_trends(params, vocab=args.disc_vocab, change_rate_hz=args.disc_rate)
        save_discrete_svg(td, tokens, args.outdir / "discrete_tokens.svg")

    print("Wrote:")
    print(f" {args.outdir / 'attention_weights.svg'}")
    print(f" {args.outdir / 'token_activation_trends.svg'}")
    if args.with_discrete:
        print(f" {args.outdir / 'discrete_tokens.svg'}")
|
||||||
|
|
||||||
|
|
||||||
|
# Script entry point: only run when executed directly, not on import.
if __name__ == "__main__":
    main()
|
||||||
BIN
arxiv-style/fig-type-aware-routing-realdata.pdf
Normal file
BIN
arxiv-style/fig-type-aware-routing-realdata.pdf
Normal file
Binary file not shown.
@@ -161,12 +161,12 @@ We model the residual RRR with a denoising diffusion probabilistic model (DDPM)
|
|||||||
|
|
||||||
Let $K$ denote the number of diffusion steps, with a noise schedule $\{\beta_k\}_{k=1}^K$, $\alpha_k = 1 - \beta_k$, and $\bar{\alpha}_k = \prod_{i=1}^k \alpha_i$. The forward corruption process is:
|
Let $K$ denote the number of diffusion steps, with a noise schedule $\{\beta_k\}_{k=1}^K$, $\alpha_k = 1 - \beta_k$, and $\bar{\alpha}_k = \prod_{i=1}^k \alpha_i$. The forward corruption process is:
|
||||||
\begin{equation}
|
\begin{equation}
|
||||||
q(\bm{r}_k \mid \bm{r}_0) = \mathcal{N}\bigl( \sqrt{\bar{\alpha}_k}\,\bm{r}_0,\; (1 - \bar{\alpha}_k)\mathbf{I} \bigr)
|
q(\bm{r}_k \mid \bm{r}_0) = \mathcal{N}\bigl( \sqrt{\bar{\alpha}_k}\,\bm{r}_0,\; (1 - \bar{\alpha}_k)\mathbf{I} \bigr)
|
||||||
\label{eq:forward_corruption}
|
\label{eq:forward_corruption}
|
||||||
\end{equation}
|
\end{equation}
|
||||||
equivalently,
|
equivalently,
|
||||||
\begin{equation}
|
\begin{equation}
|
||||||
\bm{r}_k = \sqrt{\bar{\alpha}_k}\,\bm{r}_0 + \sqrt{1 - \bar{\alpha}_k}\,\boldsymbol{\epsilon}, \quad \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})
|
\bm{r}_k = \sqrt{\bar{\alpha}_k}\,\bm{r}_0 + \sqrt{1 - \bar{\alpha}_k}\,\boldsymbol{\epsilon}, \quad \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})
|
||||||
\label{eq:forward_corruption_eq}
|
\label{eq:forward_corruption_eq}
|
||||||
\end{equation}
|
\end{equation}
|
||||||
The learned reverse process is parameterized as:
|
The learned reverse process is parameterized as:
|
||||||
@@ -236,6 +236,13 @@ We use the following taxonomy:
|
|||||||
\item Type 6 (auxiliary/low-impact variables): weakly coupled or sparse signals; we allow simplified modeling (e.g., calibrated marginals or lightweight temporal models) to avoid allocating diffusion capacity where it is not warranted.
|
\item Type 6 (auxiliary/low-impact variables): weakly coupled or sparse signals; we allow simplified modeling (e.g., calibrated marginals or lightweight temporal models) to avoid allocating diffusion capacity where it is not warranted.
|
||||||
\end{enumerate}
|
\end{enumerate}
|
||||||
|
|
||||||
|
\begin{figure}[htbp]
|
||||||
|
\centering
|
||||||
|
\includegraphics[width=\textwidth]{fig-type-aware-routing-realdata.pdf}
|
||||||
|
\caption{Type-aware decomposition as mechanism-aligned routing. The left panel formalizes the assignment $\tau(i)=\mathrm{TypeAssign}(m_i,s_i,d_i)$ from metadata, temporal signature, and dependency pattern. The center panel organizes the resulting six-type taxonomy and embeds representative real HAI telemetry signatures as miniature evidence for each type. The right panel shows how the current implementation uses this taxonomy: Type 1 variables act as explicit conditioning signals together with file-level context, Types 2/3/4/6 share the learned generator, and Type 5 variables are deterministically reconstructed after sampling. The representative insets are selected automatically from the configured type sets and normalized within each inset for qualitative comparison.}
|
||||||
|
\label{fig:type-routing-realdata}
|
||||||
|
\end{figure}
|
||||||
|
|
||||||
Type-aware decomposition improves synthesis quality through three mechanisms. First, it improves capacity allocation by preventing a small set of mechanistically atypical variables from dominating gradients and distorting the learned distribution for the majority class (typically Type 4). Second, it enables constraint enforcement by deterministically reconstructing Type 5 variables, preventing logically inconsistent samples that purely learned generators can produce. Third, it improves mechanism alignment by attaching inductive biases consistent with step/dwell or saturation behaviors where generic denoisers may implicitly favor smoothness.
|
Type-aware decomposition improves synthesis quality through three mechanisms. First, it improves capacity allocation by preventing a small set of mechanistically atypical variables from dominating gradients and distorting the learned distribution for the majority class (typically Type 4). Second, it enables constraint enforcement by deterministically reconstructing Type 5 variables, preventing logically inconsistent samples that purely learned generators can produce. Third, it improves mechanism alignment by attaching inductive biases consistent with step/dwell or saturation behaviors where generic denoisers may implicitly favor smoothness.
|
||||||
|
|
||||||
From a novelty standpoint, this layer is not merely an engineering “patch”; it is an explicit methodological statement that ICS synthesis benefits from typed factorization—a principle that has analogues in mixed-type generative modeling more broadly, but that remains underexplored in diffusion-based ICS telemetry synthesis \citep{shi2025tabdiff,yuan2025ctu,nist2023sp80082}.
|
From a novelty standpoint, this layer is not merely an engineering “patch”; it is an explicit methodological statement that ICS synthesis benefits from typed factorization—a principle that has analogues in mixed-type generative modeling more broadly, but that remains underexplored in diffusion-based ICS telemetry synthesis \citep{shi2025tabdiff,yuan2025ctu,nist2023sp80082}.
|
||||||
@@ -251,42 +258,141 @@ At inference time, generation follows the same structured order: (i) trend $\hat
|
|||||||
% 4. Benchmark
|
% 4. Benchmark
|
||||||
\section{Benchmark}
|
\section{Benchmark}
|
||||||
\label{sec:benchmark}
|
\label{sec:benchmark}
|
||||||
We evaluate the proposed pipeline on feature sequences derived from the HAI Security Dataset, using fixed-length windows ($L=96$) that preserve the mixed-type structure of ICS telemetry. The goal of this benchmark is not only to report ``overall similarity'', but to justify why the proposed factorization is a better fit for protocol feature synthesis: continuous channels must match physical marginals \citep{coletta2023constrained}, discrete channels must remain semantically legal, and both must retain short-horizon dynamics that underpin state transitions and interlocks \citep{yang2001interlock}.
|
A credible ICS generator must clear four progressively harder hurdles. It must first be \emph{semantically legal}: any out-of-vocabulary supervisory token renders a sample unusable, no matter how good its marginals look. It must then match the heterogeneous statistics of mixed-type telemetry, including continuous process channels and discrete supervisory states. Third, it must preserve \emph{mechanism-level realism}: switch-and-dwell behavior, bounded control motion, cross-tag coordination, and short-horizon persistence. Finally, these properties should matter downstream rather than only under offline similarity scores. We therefore organize the benchmark as a funnel rather than a flat metric list, moving from reproducibility and legality to diagnostic localization, extended realism, and ablation \citep{coletta2023constrained,yang2001interlock,stenger2024survey}.
|
||||||
|
|
||||||
This emphasis reflects evaluation practice in time-series generation, where strong results are typically supported by multiple complementary views (marginal fidelity, dependency/temporal structure, and downstream plausibility), rather than a single aggregate score \citep{stenger2024survey}. In the ICS setting, this multi-view requirement is sharper: a generator that matches continuous marginals while emitting out-of-vocabulary supervisory tokens is unusable for protocol reconstruction, and a generator that matches marginals but breaks lag structure can produce temporally implausible command/response sequences.
|
This organization is particularly important for ICS telemetry. A generator can look competitive on one-dimensional marginals while still failing on the aspects that make a trace operationally plausible: long plateaus in setpoint-like variables, concentrated occupancy in actuator states, tight controller--sensor coupling, or persistent support signals. Our goal is therefore not to maximize a single scalar, but to show which parts of realism have already been solved, which remain brittle, and which model components are responsible for each regime.
|
||||||
|
|
||||||
Recent ICS time-series generators often emphasize aggregate similarity scores and utility-driven evaluations (e.g., anomaly-detection performance) to demonstrate realism, which is valuable but can under-specify mixed-type protocol constraints. Our benchmark complements these practices by making mixed-type legality and per-feature distributional alignment explicit: discrete outputs are evaluated as categorical distributions (JSD) and are constrained to remain within the legal vocabulary by construction, while continuous channels are evaluated with nonparametric distribution tests (KS) \citep{yoon2019timegan}. This combination provides a direct, protocol-relevant justification for the hybrid design, rather than relying on a single composite score that may mask discrete failures.
|
For continuous channels, we measure marginal alignment with the Kolmogorov--Smirnov (KS) statistic per feature and average it over continuous variables. For discrete channels, we compute Jensen--Shannon divergence (JSD) between per-feature categorical marginals and average across discrete variables \citep{lin1991divergence,yoon2019timegan}. To assess short-horizon dynamics, we compare lag-1 autocorrelation feature-wise and report the mean absolute difference between real and synthetic lag-1 coefficients. We additionally track semantic legality by counting out-of-vocabulary discrete outputs, and we report a filtered KS that excludes near-constant channels whose variance is effectively zero. These core measures are complemented with type-aware diagnostics, extended realism metrics, and ablations.
|
||||||
|
|
||||||
For continuous channels, we measure distributional alignment using the Kolmogorov--Smirnov (KS) statistic computed per feature between the empirical distributions of real and synthetic samples, and then averaged across features. For discrete channels, we quantify marginal fidelity with Jensen--Shannon divergence (JSD) \citep{lin1991divergence,yoon2019timegan} between categorical distributions per feature, averaged across discrete variables. To assess temporal realism, we compare lag-1 autocorrelation at the feature level and report the mean absolute difference between real and synthetic lag-1 autocorrelation, averaged across features. In addition, to avoid degenerate comparisons driven by near-constant tags, features whose empirical standard deviation falls below a small threshold are excluded from continuous KS aggregation; such channels carry limited distributional information and can distort summary statistics.
|
\subsection{Core fidelity, legality, and reproducibility}
|
||||||
|
|
||||||
\subsection{Quantitative results}
|
|
||||||
\label{sec:benchmark-quant}
|
\label{sec:benchmark-quant}
|
||||||
Across all runs, the mean continuous KS is 0.3311 (std 0.0079) and the mean discrete JSD is 0.0284 (std 0.0073), indicating that the generator preserves both continuous marginals and discrete semantic distributions at the feature level. Temporal consistency is similarly stable across runs, with a mean lag-1 autocorrelation difference of 0.2684 (std 0.0027), suggesting that the synthesized windows retain short-horizon dynamical structure \citep{ni2021sigwasserstein} instead of collapsing to marginal matching alone. The best-performing instance (by mean KS) attains 0.3224, and the small inter-seed variance shows that the reported fidelity is reproducible rather than driven by a single favorable initialization.
|
Across the three-run reproducibility sweep, Mask-DDPM achieves mean KS $=0.3311 \pm 0.0079$, mean JSD $=0.0284 \pm 0.0073$, and mean absolute lag-1 difference $=0.2684 \pm 0.0027$. The strongest individual seed reaches KS $=0.3224$, while the best runs for JSD and lag-1 are $0.0209$ and $0.2661$, respectively. Just as importantly, all three runs produce zero out-of-vocabulary tokens across the 26 modeled discrete channels, giving a validity rate of \textbf{100\%}. This is the first major benchmark takeaway: semantic legality is already saturated by construction, so the remaining difficulty is no longer ``can the model emit valid symbols?'' but rather ``can it place valid symbols and trajectories in the right temporal and cross-channel context?''
|
||||||
|
|
||||||
|
The latest fully diagnosed run provides the complementary view that a seed summary cannot offer. In that run, the model attains mean KS $=0.4025$, filtered mean KS $=0.3191$, mean JSD $=0.0166$, and mean absolute lag-1 difference $=0.2859$, again with zero invalid discrete tokens. Two points matter most. First, the discrete branch remains the most reliable component: low JSD combined with perfect validity means the generator is consistently learning legal supervisory semantics rather than merely matching coarse occupancy counts. Second, the sizable gap between overall KS and filtered KS shows that continuous mismatch is not spread uniformly across all channels. Instead, a relatively small subset of difficult variables dominates the error budget.
|
||||||
|
|
||||||
\begin{figure}[htbp]
|
\begin{figure}[htbp]
|
||||||
\centering
|
\centering
|
||||||
\includegraphics[width=0.8\textwidth]{fig-overall-benchmark-v1.png}
|
\includegraphics[width=\textwidth]{fig-benchmark-story-v2.png}
|
||||||
% \caption{Description of the figure.}
|
\caption{Benchmark evidence chain. Left: seed-level reproducibility over the three benchmark runs, showing that the global metrics are stable across seeds. Middle: top-10 continuous features ranked by KS in the latest fully diagnosed run, with overall and filtered average KS overlaid to show that a small subset of tags dominates the continuous error budget. Right: representative type-aware mismatch scores from the same run, using program dwell, controller change rate, actuator top-3 mass, PV tail ratio, and auxiliary lag-1 persistence as mechanism-level diagnostics. Lower is better in all panels.}
|
||||||
\label{fig:benchmark}
|
\label{fig:benchmark}
|
||||||
\end{figure}
|
\end{figure}
|
||||||
|
|
||||||
\begin{table}[htbp]
|
\begin{table}[htbp]
|
||||||
\centering
|
\centering
|
||||||
\caption{Summary of benchmark metrics. Lower values indicate better performance.}
|
\caption{Main benchmark summary. The left column reports reproducibility across three complete runs; the right column reports the latest diagnosed run used for the per-feature, type-aware, and extended analyses. Lower is better except for validity rate.}
|
||||||
\label{tab:metrics}
|
\label{tab:core_metrics}
|
||||||
\begin{tabular}{@{}l l c c@{}}
|
\begin{tabular}{@{}lcc@{}}
|
||||||
\toprule
|
\toprule
|
||||||
\textbf{Metric} & \textbf{Aggregation} & \textbf{Lower is better} & \textbf{Mean $\pm$ Std} \\
|
\textbf{Metric} & \textbf{3-run mean $\pm$ std} & \textbf{Latest diagnosed run} \\
|
||||||
\midrule
|
\midrule
|
||||||
KS (continuous) & mean over continuous features & \checkmark & 0.3311 $\pm$ 0.0079 \\
|
Mean KS (continuous) & $0.3311 \pm 0.0079$ & $0.4025$ \\
|
||||||
JSD (discrete) & mean over discrete features & \checkmark & 0.0284 $\pm$ 0.0073 \\
|
Filtered mean KS & -- & $0.3191$ \\
|
||||||
Abs $\Delta$ lag-1 autocorr & mean over features & \checkmark & 0.2684 $\pm$ 0.0027 \\
|
Mean JSD (discrete) & $0.0284 \pm 0.0073$ & $0.0166$ \\
|
||||||
|
Mean abs. $\Delta$ lag-1 autocorr & $0.2684 \pm 0.0027$ & $0.2859$ \\
|
||||||
|
Validity rate (26 discrete tags) $\uparrow$ & $100.0 \pm 0.0\%$ & $100.0\%$ \\
|
||||||
\bottomrule
|
\bottomrule
|
||||||
\end{tabular}
|
\end{tabular}
|
||||||
\end{table}
|
\end{table}
|
||||||
|
|
||||||
To make the benchmark actionable (and comparable to prior work), we report type-appropriate, interpretable statistics instead of collapsing everything into a single similarity score. This matters in mixed-type ICS telemetry: continuous fidelity can be high while discrete semantics fail, and vice versa. By separating continuous (KS), discrete (JSD), and temporal (lag-1) views, the evaluation directly matches the design goals of the hybrid generator: distributional refinement for continuous residuals, vocabulary-valid reconstruction for discrete supervision, and trend-induced short-horizon coherence.
|
Figure~\ref{fig:benchmark} turns these numbers into a diagnosis rather than a scoreboard. The largest KS contributors are concentrated in a handful of control-relevant tags, including \texttt{P1\_B4002}, \texttt{P1\_FCV02Z}, \texttt{P1\_B3004}, \texttt{P1\_B2004}, and \texttt{P1\_PCV02Z}. This means the current limitation is not a global collapse of the continuous generator. The model has already cleared the first hurdle (legality) and a large part of the second (mixed-type marginal fidelity). What remains difficult is the third hurdle: reproducing a small set of hard channels whose realism depends on step-like transitions, long plateaus, tightly bounded operating regions, or strong local persistence.
|
||||||
|
|
||||||
In addition, the seed-averaged reporting mirrors evaluation conventions in recent diffusion-based time-series generation studies, where robustness across runs is increasingly treated as a first-class signal rather than an afterthought. In this sense, the small inter-seed variance is itself evidence that the factorized training and typed routing reduce instability and localized error concentration, which is frequently observed when heterogeneous channels compete for the same modeling capacity.
|
\subsection{Extended realism and downstream utility}
|
||||||
|
\label{sec:benchmark-extended}
|
||||||
|
The next question is whether samples that look cleaner under fidelity metrics are also more structurally faithful and more useful. To probe this, we additionally evaluate two-sample distance, cross-variable coupling, spectral similarity, predictive consistency, memorization risk, and downstream anomaly-detection utility on the latest diagnosed run. Because this run contains only four synthetic windows (384 generated rows at $L=96$), we treat the resulting numbers as \emph{small-sample diagnostic evidence} rather than as the final word. They are still informative because they tell us which kinds of realism can be improved by post-processing and which ones cannot be repaired so easily.
|
||||||
|
|
||||||
|
\begin{table}[htbp]
|
||||||
|
\centering
|
||||||
|
\caption{Extended realism and utility on the latest diagnosed run. The post-processed column corresponds to the typed post-processing baseline. Lower is better except for AUPRC. For reference, the real-only predictor RMSE is $0.558$ and the real-only anomaly AUPRC is $0.653$.}
|
||||||
|
\label{tab:extended_eval}
|
||||||
|
\begin{tabular}{@{}lcc@{}}
|
||||||
|
\toprule
|
||||||
|
\textbf{Metric} & \textbf{Raw generator} & \textbf{Post-processed} \\
|
||||||
|
\midrule
|
||||||
|
Continuous MMD (RBF) & $0.6499$ & $0.2166$ \\
|
||||||
|
Discriminative accuracy (ideal $0.5$) & $1.0000$ & $0.5000$ \\
|
||||||
|
Mean abs. corr. diff. & $0.2134$ & $0.1909$ \\
|
||||||
|
Mean abs. lag-1 corr. diff. & $0.2132$ & $0.1989$ \\
|
||||||
|
PSD $L_1$ distance & $0.0195$ & $0.0224$ \\
|
||||||
|
Memorization ratio & $2.9515$ & $1.6205$ \\
|
||||||
|
Predictive RMSE (synthetic-only) & $0.9722$ & $0.9641$ \\
|
||||||
|
Predictive RMSE (real + synthetic) & $0.5433$ & $0.5413$ \\
|
||||||
|
Anomaly AUPRC (synthetic-only) & $0.5889$ & $0.5894$ \\
|
||||||
|
Anomaly AUPRC (real + synthetic) & $0.6449$ & $0.6476$ \\
|
||||||
|
\bottomrule
|
||||||
|
\end{tabular}
|
||||||
|
\end{table}
|
||||||
|
|
||||||
|
Table~\ref{tab:extended_eval} reveals a useful asymmetry. Typed post-processing substantially improves distribution-level realism: continuous MMD drops from $0.6499$ to $0.2166$, discriminative accuracy moves from a trivially separable $1.0$ to the chance-level ideal of $0.5$, both contemporaneous and lagged correlation errors decrease, and the memorization ratio contracts from $2.95$ to $1.62$. In other words, post-processing is very effective at pulling the generated windows closer to the real holdout manifold without collapsing into exact training-set copies. Yet predictive and downstream utility improve only modestly. Synthetic-only predictors remain clearly weaker than real-only ones, and real-plus-synthetic anomaly utility stays slightly below the real-only baseline. This is an important benchmark result: once legality and low-order marginals are largely under control, the remaining gap is driven less by superficial distribution mismatch and more by mechanism-level dynamics that post hoc distribution shaping cannot fully restore.
|
||||||
|
|
||||||
|
\subsection{Type-aware diagnostics}
|
||||||
|
\label{sec:benchmark-typed}
|
||||||
|
Type-aware diagnostics make that mechanism gap explicit. Table~\ref{tab:typed_diagnostics} summarizes one representative statistic per variable family, computed on the latest fully analyzed run. These statistics are not redundant with the main benchmark table. They answer a different question: \emph{if legality is already solved, what kind of control behavior is still implausible?}
|
||||||
|
|
||||||
|
\begin{table}[htbp]
|
||||||
|
\centering
|
||||||
|
\caption{Type-aware diagnostic summary on the latest fully diagnosed run. ``Mean abs. error'' is reported in the native unit of the corresponding diagnostic statistic; ``Mean rel. error'' normalizes by the real-data value to indicate severity. Lower values indicate better alignment.}
|
||||||
|
\label{tab:typed_diagnostics}
|
||||||
|
\begin{tabular}{@{}llcc@{}}
|
||||||
|
\toprule
|
||||||
|
\textbf{Type} & \textbf{Proxy statistic} & \textbf{Mean abs. error} & \textbf{Mean rel. error} \\
|
||||||
|
\midrule
|
||||||
|
Program & mean dwell & $315.75$ & $0.64$ \\
|
||||||
|
Controller & change rate & $0.352$ & $0.84$ \\
|
||||||
|
Actuator & top-3 mass & $0.0117$ & $0.67$ \\
|
||||||
|
PV & tail ratio & $0.0796$ & $0.21$ \\
|
||||||
|
Auxiliary & lag-1 autocorr & $0.467$ & $0.77$ \\
|
||||||
|
\bottomrule
|
||||||
|
\end{tabular}
|
||||||
|
\end{table}
|
||||||
|
|
||||||
|
This typed view sharpens the story substantially. Program-like channels remain the hardest class because the model still under-represents long dwell behavior: it switches too often instead of maintaining the long plateaus characteristic of setpoints and schedule-driven tags. Controllers are too reactive, as reflected in the large change-rate mismatch. Actuator channels are closer in aggregate but still spread probability mass too broadly, indicating that the generator does not yet reproduce the concentrated occupancy of a few valid operating states. PV diagnostics are the most encouraging: their tail-ratio error is materially smaller, suggesting that the continuous branch already captures a meaningful portion of process-variable shape even though some upper-tail behavior remains underfit. Auxiliary channels expose a different weakness, namely that support signals with strong short-horizon persistence are still not reproduced as faithfully as their low-order marginals. In short, legality is already solved, but control realism is not.
|
||||||
|
|
||||||
|
\subsection{Ablation study}
|
||||||
|
\label{sec:benchmark-ablation}
|
||||||
|
A good ablation does more than show that removing components changes numbers; it should identify which failure mode each component is preventing. We therefore evaluate ten one-seed variants under the same pipeline and summarize six representative metrics: continuous fidelity (KS), discrete fidelity (JSD), short-horizon dynamics (lag-1), cross-variable coupling, predictive transfer, and downstream anomaly utility. Figure~\ref{fig:benchmark-ablations} visualizes signed changes relative to the full model, where red means that the ablated variant is worse. Table~\ref{tab:ablation} gives the underlying values.
|
||||||
|
|
||||||
|
\begin{figure}[htbp]
|
||||||
|
\centering
|
||||||
|
\includegraphics[width=\textwidth]{fig-benchmark-ablations-v1.png}
|
||||||
|
\caption{Ablation impact relative to the full model. For KS, JSD, lag-1 error, coupling error, and predictive RMSE, positive values mean the ablated model is worse than the full model. For AUPRC, positive values mean the ablated model loses downstream utility. The figure makes clear that different components protect different notions of realism rather than contributing uniformly to every metric.}
|
||||||
|
\label{fig:benchmark-ablations}
|
||||||
|
\end{figure}
|
||||||
|
|
||||||
|
\begin{table}[htbp]
|
||||||
|
\centering
|
||||||
|
\small
|
||||||
|
\caption{Ablation study on the latest one-seed runs. Lower is better except for anomaly AUPRC.}
|
||||||
|
\label{tab:ablation}
|
||||||
|
\begin{tabular}{@{}lcccccc@{}}
|
||||||
|
\toprule
|
||||||
|
\textbf{Variant} & \textbf{KS$\downarrow$} & \textbf{JSD$\downarrow$} & \textbf{Lag-1$\downarrow$} & \textbf{Coupling$\downarrow$} & \textbf{Pred. RMSE$\downarrow$} & \textbf{AUPRC$\uparrow$} \\
|
||||||
|
\midrule
|
||||||
|
\multicolumn{7}{@{}l}{\textit{Full model}} \\
|
||||||
|
Full model & $0.402$ & $0.028$ & $0.291$ & $0.215$ & $0.972$ & $0.644$ \\
|
||||||
|
\midrule
|
||||||
|
\multicolumn{7}{@{}l}{\textit{Structure and conditioning}} \\
|
||||||
|
No temporal scaffold & $0.408$ & $0.031$ & $0.664$ & $0.306$ & $0.977$ & $0.645$ \\
|
||||||
|
No file condition & $0.405$ & $0.033$ & $0.237$ & $0.262$ & $0.986$ & $0.640$ \\
|
||||||
|
No type routing & $0.356$ & $0.022$ & $0.138$ & $0.324$ & $1.017$ & $0.647$ \\
|
||||||
|
\midrule
|
||||||
|
\multicolumn{7}{@{}l}{\textit{Distribution shaping}} \\
|
||||||
|
No quantile transform & $0.599$ & $0.010$ & $0.156$ & $0.300$ & $1.653$ & $0.417$ \\
|
||||||
|
No post-calibration & $0.543$ & $0.024$ & $0.253$ & $0.249$ & $1.086$ & $0.647$ \\
|
||||||
|
\midrule
|
||||||
|
\multicolumn{7}{@{}l}{\textit{Loss and target design}} \\
|
||||||
|
No SNR weighting & $0.400$ & $0.022$ & $0.299$ & $0.214$ & $0.961$ & $0.637$ \\
|
||||||
|
No quantile loss & $0.413$ & $0.018$ & $0.311$ & $0.213$ & $0.965$ & $0.645$ \\
|
||||||
|
No residual-stat loss & $0.404$ & $0.029$ & $0.285$ & $0.210$ & $0.970$ & $0.647$ \\
|
||||||
|
Epsilon target & $0.482$ & $0.102$ & $0.728$ & $0.195$ & $0.968$ & $0.647$ \\
|
||||||
|
\bottomrule
|
||||||
|
\end{tabular}
|
||||||
|
\end{table}
|
||||||
|
|
||||||
|
The ablation results reveal three distinct roles. First, temporal staging is what makes the sequence look dynamical rather than merely plausible frame by frame: removing the temporal scaffold leaves KS nearly unchanged but more than doubles lag-1 error ($0.291 \rightarrow 0.664$) and substantially worsens coupling ($0.215 \rightarrow 0.306$). Second, quantile-based distribution shaping is what makes the continuous branch usable: without the quantile transform, KS degrades sharply ($0.402 \rightarrow 0.599$), synthetic-only predictive RMSE deteriorates dramatically ($0.972 \rightarrow 1.653$), and anomaly utility collapses ($0.644 \rightarrow 0.417$). This is not a cosmetic gain; it is one of the main contributors to usable process realism.
|
||||||
|
|
||||||
|
The routing ablation supplies the most instructive counterexample. Disabling type routing actually improves several one-dimensional metrics (for example KS and lag-1), yet it worsens coupling ($0.215 \rightarrow 0.324$) and predictive transfer ($0.972 \rightarrow 1.017$). This is exactly why the benchmark cannot stop at scalar per-feature scores: typed decomposition helps the generator coordinate variables and preserve mechanism-level consistency even when simpler metrics may look deceptively better without it. Finally, the target-parameterization ablation is the clearest failure case: replacing the current target with an epsilon target causes the largest degradation in JSD ($0.028 \rightarrow 0.102$) and lag-1 ($0.291 \rightarrow 0.728$), making it the most destructive ablation overall. By contrast, SNR weighting, quantile loss, and residual-stat regularization behave as second-order refinements whose effects are real but materially smaller.
|
||||||
|
|
||||||
|
Taken together, the benchmark now supports a sharper claim than a plain KS/JSD table could offer. Mask-DDPM already provides stable mixed-type fidelity, perfect discrete legality, and a meaningful amount of continuous realism. The remaining error is concentrated in a small subset of ICS-specific channels whose realism depends on rare switching, long dwell intervals, constrained occupancy, and persistent local dynamics. The ablation study clarifies why: temporal staging protects dynamical realism, quantile-based shaping protects continuous fidelity and downstream utility, and type-aware routing protects coordinated mechanism-level behavior even when simpler metrics do not fully reveal its value.
|
||||||
|
|
||||||
% 5. Future Work
|
% 5. Future Work
|
||||||
\section{Future Work}
|
\section{Future Work}
|
||||||
|
|||||||
1042
fig/fig-design-v3.drawio.svg
Normal file
1042
fig/fig-design-v3.drawio.svg
Normal file
File diff suppressed because one or more lines are too long
|
After Width: | Height: | Size: 1.3 MiB |
BIN
fig/fig-design-v3.png
Normal file
BIN
fig/fig-design-v3.png
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 1.1 MiB |
1026
fig/fig-design-v4.drawio.svg
Normal file
1026
fig/fig-design-v4.drawio.svg
Normal file
File diff suppressed because one or more lines are too long
|
After Width: | Height: | Size: 1.3 MiB |
0
fig/fig-type-aware-v1.drawio.svg
Normal file
0
fig/fig-type-aware-v1.drawio.svg
Normal file
1
fig/mask-ddpm-figure.drawio.svg
Normal file
1
fig/mask-ddpm-figure.drawio.svg
Normal file
File diff suppressed because one or more lines are too long
Reference in New Issue
Block a user