Refine paper text, add figure captions/labels

Clarify and reflow several paragraphs in arxiv-style/main.tex: reference Figure~\ref{fig:design} when describing the staged generator; rename the residual token from RRR to \bm{R}; replace the unnumbered \caption* commands with numbered \caption and add labels for the type taxonomy, benchmark story, and ablation figures (fig:type_taxonomy, fig:benchmark_story, fig:ablation_impact); add a reference to Table~\ref{tab:core_metrics} and a brief commented note before the benchmark paragraph. These edits improve cross-referencing, readability, and figure numbering. Also refresh the texput.log timestamp.
2026-04-17 17:40:59 +08:00
parent 025d0c2632
commit 4a6dcb77a5
2 changed files with 15 additions and 11 deletions

arxiv-style/main.tex

@@ -106,7 +106,7 @@ Industrial control system (ICS) telemetry is intrinsically mixed-type and mechan
 We model each training instance as a fixed-length window of length $L$, comprising continuous channels $\bm{X} \in \mathbb{R}^{L \times d_c}$ and discrete channels $\bm{Y} = \{y^{(j)}_{1:L}\}_{j=1}^{d_d}$, where each discrete variable satisfies $y^{(j)}_t \in \mathcal{V}_j$ for a finite vocabulary $\mathcal{V}_j$. Our objective is to learn a generator that produces synthetic $(\hat{\bm{X}}, \hat{\bm{Y}})$ that are simultaneously coherent and distributionally faithful, while also ensuring $\hat{y}^{(j)}_t\in\mathcal{V}_j$ for all $j$, $t$ by construction (rather than via post-hoc rounding or thresholding).
-A key empirical and methodological tension in ICS synthesis is that temporal realism and marginal/distributional realism can compete when optimized monolithically: sequence models trained primarily for regression often over-smooth heavy tails and intermittent bursts, while purely distribution-matching objectives can erode long-range structure. Diffusion models provide a principled route to rich distribution modeling through iterative denoising, but they do not, by themselves, resolve (i) the need for a stable low-frequency temporal scaffold, nor (ii) the discrete legality constraints for supervisory variables \citep{ho2020denoising,song2021score}. Recent time-series diffusion work further suggests that separating coarse structure from stochastic refinement can be an effective inductive bias for long-horizon realism \citep{kollovieh2023tsdiff,sikder2023transfusion}.
+A key empirical and methodological tension in ICS synthesis is that temporal realism and marginal/distributional realism can compete when optimized monolithically: sequence models trained primarily for regression often over-smooth heavy tails and intermittent bursts, while purely distribution-matching objectives can erode long-range structure. Diffusion models provide a principled route to rich distribution modeling through iterative denoising, but they do not, by themselves, resolve (i) the need for a stable low-frequency temporal scaffold, nor (ii) the discrete legality constraints for supervisory variables \citep{ho2020denoising,song2021score}. Recent time-series diffusion work further suggests that separating coarse structure from stochastic refinement can be an effective inductive bias for long-horizon realism \citep{kollovieh2023tsdiff,sikder2023transfusion}. Figure~\ref{fig:design} summarizes how our framework maps these requirements into a staged generator for mixed-type ICS telemetry.
 \begin{figure}[htbp]
 \centering
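As an aside on the hunk above: the mixed-type window and the by-construction legality guarantee reduce to a small container plus index-valued sampling. A minimal Python sketch, where the window length, channel counts, and vocabulary sizes are illustrative assumptions rather than values from the paper:

import numpy as np

L, d_c = 64, 8                 # window length and continuous width (assumed)
vocabs = [3, 5, 2]             # |V_j| per discrete channel (assumed)

class Window:
    """Fixed-length window: X in R^{L x d_c}, Y a list of per-channel token arrays."""
    def __init__(self, X, Y):
        assert X.shape == (L, d_c)
        for j, y in enumerate(Y):                      # y^{(j)}_{1:L}
            assert y.shape == (L,) and int(y.max()) < vocabs[j]
        self.X, self.Y = X, Y

# Sampling discrete channels as vocabulary *indices* (e.g., from a categorical
# head) makes hat{y}^{(j)}_t in V_j hold by construction -- no post-hoc
# rounding or thresholding is ever applied.
rng = np.random.default_rng(0)
Y_hat = [rng.integers(0, V, size=L) for V in vocabs]   # always legal tokens
X_hat = rng.standard_normal((L, d_c))
window = Window(X_hat, Y_hat)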
@@ -126,7 +126,7 @@ Motivated by these considerations, we propose Mask-DDPM, organized in the follow
 \item Type-aware decomposition: a type-aware factorization and routing layer that assigns variables to the most appropriate modeling mechanism and enforces deterministic constraints where warranted.
 \end{enumerate}
-This ordering is intentional. The trend module establishes a macro-temporal scaffold; residual diffusion then concentrates capacity on micro-structure and marginal fidelity; masked diffusion provides a native mechanism for discrete legality; and the type-aware layer operationalizes the observation that not all ICS variables should be modeled with the same stochastic mechanism. Importantly, while diffusion-based generation for ICS telemetry has begun to emerge, existing approaches remain limited and typically emphasize continuous synthesis or augmentation; in contrast, our pipeline integrates (i) a Transformer-conditioned residual diffusion backbone, (ii) a discrete masked-diffusion branch, and (iii) explicit type-aware routing for heterogeneous variable mechanisms within a single coherent generator \citep{yuan2025ctu,sha2026ddpm}.
+This ordering is intentional. The trend module establishes a macro-temporal scaffold; residual diffusion then concentrates capacity on micro-structure and marginal fidelity; masked diffusion provides a native mechanism for discrete legality; and the type-aware layer operationalizes the observation that not all ICS variables should be modeled with the same stochastic mechanism. As shown in Figure~\ref{fig:design}, these components are arranged sequentially so that temporal scaffolding, residual refinement, and discrete legality are enforced in complementary rather than competing stages. Importantly, while diffusion-based generation for ICS telemetry has begun to emerge, existing approaches remain limited and typically emphasize continuous synthesis or augmentation; in contrast, our pipeline integrates (i) a Transformer-conditioned residual diffusion backbone, (ii) a discrete masked-diffusion branch, and (iii) explicit type-aware routing for heterogeneous variable mechanisms within a single coherent generator \citep{yuan2025ctu,sha2026ddpm}.
 \subsection{Transformer trend module for continuous dynamics}
 \label{sec:method-trans}
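The intentional ordering described in this hunk can be read as a four-stage sampling loop. The Python sketch below captures only the control flow; every interface name here (rollout, sample, assemble) is a placeholder assumption, not the paper's actual API:

def generate_window(trend_model, residual_ddpm, masked_diffusion, router, cond):
    S_hat = trend_model.rollout(cond)             # 1. macro-temporal scaffold
    R_hat = residual_ddpm.sample(cond=S_hat)      # 2. residual micro-structure
    X_hat = S_hat + R_hat                         #    continuous channels
    Y_hat = masked_diffusion.sample(cond=S_hat)   # 3. legal discrete tokens
    return router.assemble(X_hat, Y_hat)          # 4. type-aware routing and
                                                  #    deterministic constraints

Because each stage conditions on the output of the previous one, the trend, residual, and discrete branches refine complementary aspects of the window rather than competing for the same capacity.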
@@ -153,7 +153,7 @@ At inference, we roll out the Transformer autoregressively to obtain $\hat{\bm{S
 \subsection{DDPM for continuous residual generation}
 \label{sec:method-ddpm}
-We model the residual RRR with a denoising diffusion probabilistic model (DDPM) conditioned on the trend $\hat{\bm{S}}$ \citep{ho2020denoising}. Diffusion models learn complex data distributions by inverting a tractable noising process through iterative denoising, and have proven effective at capturing multimodality and heavy-tailed structure that is often attenuated by purely regression-based sequence models \citep{ho2020denoising,song2021score}. Conditioning the diffusion model on $\hat{\bm{S}}$ is central: it prevents the denoiser from re-learning the low-frequency scaffold and focuses capacity on residual micro-structure, mirroring the broader principle that diffusion excels as a distributional corrector when a reasonable coarse structure is available \citep{kollovieh2023tsdiff, sikder2023transfusion}.
+We model the residual $\bm{R}$ with a denoising diffusion probabilistic model (DDPM) conditioned on the trend $\hat{\bm{S}}$ \citep{ho2020denoising}. Diffusion models learn complex data distributions by inverting a tractable noising process through iterative denoising, and have proven effective at capturing multimodality and heavy-tailed structure that is often attenuated by purely regression-based sequence models \citep{ho2020denoising,song2021score}. Conditioning the diffusion model on $\hat{\bm{S}}$ is central: it prevents the denoiser from re-learning the low-frequency scaffold and focuses capacity on residual micro-structure, mirroring the broader principle that diffusion excels as a distributional corrector when a reasonable coarse structure is available \citep{kollovieh2023tsdiff, sikder2023transfusion}.
 Let $\bm{K}$ denote the number of diffusion steps, with a noise schedule $\{\beta_k\}_{k=1}^K$, $\alpha_k = 1 - \beta_k$, and $\bar{\alpha}_k = \prod_{i=1}^k \alpha_i$. The forward corruption process is:
 \begin{equation}
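The schedule definitions in this hunk translate directly into code via the standard closed-form corruption $r_k = \sqrt{\bar{\alpha}_k}\,r_0 + \sqrt{1-\bar{\alpha}_k}\,\epsilon$. A small Python sketch; the linear beta range and the number of steps are assumptions for illustration, not the paper's settings:

import numpy as np

K = 1000
betas = np.linspace(1e-4, 0.02, K)     # {beta_k}, assumed linear schedule
alphas = 1.0 - betas                   # alpha_k = 1 - beta_k
alpha_bars = np.cumprod(alphas)        # alpha_bar_k = prod_{i<=k} alpha_i

def q_sample(r0, k, rng):
    """Draw r_k ~ q(r_k | r_0) for a residual window r0 of shape [L, d_c]."""
    eps = rng.standard_normal(r0.shape)
    return np.sqrt(alpha_bars[k]) * r0 + np.sqrt(1.0 - alpha_bars[k]) * eps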
@@ -235,10 +235,11 @@ We use the following taxonomy:
 \begin{figure}[H]
 \centering
 \includegraphics[width=0.98\textwidth]{typeclass-cropped.pdf}
-\caption*{Type assignment and six-type taxonomy.}
+\caption{Type assignment and six-type taxonomy.}
+\label{fig:type_taxonomy}
 \end{figure}
-Type-aware decomposition improves synthesis quality through three mechanisms. First, it improves capacity allocation by preventing a small set of mechanistically atypical variables from dominating gradients and distorting the learned distribution for the majority class (typically Type 4). Second, it enables constraint enforcement by deterministically reconstructing Type 5 variables, preventing logically inconsistent samples that purely learned generators can produce. Third, it improves mechanism alignment by attaching inductive biases consistent with step/dwell or saturation behaviors where generic denoisers may implicitly favor smoothness. Figure~\ref{fig:type_taxonomy} visualizes the six-type taxonomy and the routing logic behind it.
+Type-aware decomposition improves synthesis quality through three mechanisms. First, it improves capacity allocation by preventing a small set of mechanistically atypical variables from dominating gradients and distorting the learned distribution for the majority class (typically Type 4). Second, it enables constraint enforcement by deterministically reconstructing Type 5 variables, preventing logically inconsistent samples that purely learned generators can produce. Third, it improves mechanism alignment by attaching inductive biases consistent with step/dwell or saturation behaviors where generic denoisers may implicitly favor smoothness.
 From a novelty standpoint, this layer is not merely an engineering patch; it is an explicit methodological statement that ICS synthesis benefits from typed factorization, a principle that has analogues in mixed-type generative modeling more broadly, but that remains underexplored in diffusion-based ICS telemetry synthesis \citep{shi2025tabdiff,yuan2025ctu,nist2023sp80082}.
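The three mechanisms above hinge on a simple dispatch: each channel is assigned a type, and Type 5 channels bypass the learned sampler entirely. A toy Python sketch; only the Type 4 (majority) and Type 5 (deterministically reconstructable) semantics come from the text, and the remaining type names plus the derive_fn hook are placeholders:

from enum import IntEnum

class ChannelType(IntEnum):
    TYPE_1 = 1
    TYPE_2 = 2
    TYPE_3 = 3
    TYPE_4 = 4   # majority class: generic stochastic modeling
    TYPE_5 = 5   # derived: reconstructed deterministically from parent channels
    TYPE_6 = 6

def route(ctype, learned_sample, derive_fn=None, parents=None):
    # Deterministic reconstruction of Type 5 variables rules out logically
    # inconsistent samples by construction; every other type keeps the draw
    # from its assigned generative branch.
    if ctype is ChannelType.TYPE_5:
        return derive_fn(parents)
    return learned_sample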
@@ -263,12 +264,13 @@ For continuous channels, we prioritize marginal agreement because ICS process si
 \label{sec:benchmark-quant}
 Across three independent runs, Mask-DDPM achieves mean KS $=0.3311 \pm 0.0079$, mean JSD $=0.0284 \pm 0.0073$, and mean absolute lag-1 difference $=0.2684 \pm 0.0027$, while maintaining a validity rate of \textbf{100\%} across the modeled discrete channels. The small dispersion across runs suggests that the generator is reproducible at the level of global mixed-type fidelity rather than depending on a single favorable seed. This is the first major benchmark takeaway: semantic legality is already saturated by construction, so the remaining challenge is no longer whether the model can emit valid symbols, but whether it can place valid symbols and trajectories in the right temporal and cross-channel context.
-A representative diagnostic slice provides the complementary localized view. On that slice, the model attains mean KS $=0.4025$, filtered mean KS $=0.3191$, mean JSD $=0.0166$, and mean absolute lag-1 difference $=0.2859$, again with zero invalid discrete tokens. Two patterns matter most. First, the discrete branch remains consistently reliable: low JSD together with perfect validity indicates that supervisory semantics are being learned rather than repaired after the fact. Second, the gap between overall KS and filtered KS suggests that continuous mismatch is concentrated in a limited subset of difficult channels instead of being spread uniformly across the telemetry space.
+A representative diagnostic slice provides the complementary localized view. As summarized in Table~\ref{tab:core_metrics}, the model attains mean KS $=0.4025$, filtered mean KS $=0.3191$, mean JSD $=0.0166$, and mean absolute lag-1 difference $=0.2859$ on that slice, again with zero invalid discrete tokens. Two patterns matter most. First, the discrete branch remains consistently reliable: low JSD together with perfect validity indicates that supervisory semantics are being learned rather than repaired after the fact. Second, the gap between overall KS and filtered KS suggests that continuous mismatch is concentrated in a limited subset of difficult channels instead of being spread uniformly across the telemetry space.
 \begin{figure}[htbp]
 \centering
 \includegraphics[width=\textwidth]{fig-benchmark-story-v2.png}
-\caption*{Benchmark evidence chain.}
+\caption{Benchmark evidence chain.}
+\label{fig:benchmark_story}
 \end{figure}
 \begin{table}[htbp]
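The four headline quantities in this hunk (KS, JSD, lag-1 difference, validity) are all computable per channel. A hedged Python sketch of plausible definitions; the paper's exact estimators and aggregation may differ:

import numpy as np
from scipy.stats import ks_2samp
from scipy.spatial.distance import jensenshannon

def continuous_metrics(real, synth):
    """Per-channel marginal and short-horizon scores (real, synth: 1-D arrays)."""
    ks = ks_2samp(real, synth).statistic          # marginal fidelity
    acf1 = lambda x: np.corrcoef(x[:-1], x[1:])[0, 1]
    lag1 = abs(acf1(real) - acf1(synth))          # absolute lag-1 difference
    return ks, lag1

def discrete_metrics(real_tok, synth_tok, V):
    """Per-channel JSD over token histograms, plus a validity rate."""
    p = np.bincount(real_tok, minlength=V) / len(real_tok)
    q = np.bincount(synth_tok, minlength=V) / len(synth_tok)
    jsd = jensenshannon(p, q) ** 2                # squared distance = divergence
    valid = float(np.mean((synth_tok >= 0) & (synth_tok < V)))
    return jsd, valid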
@@ -288,7 +290,8 @@ Validity rate (26 discrete tags) $\uparrow$ & $100.0 \pm 0.0\%$ & $100.0\%$ \\
 \end{tabular}
 \end{table}
-The benchmark evidence chain plot turns the table into a structural diagnosis. Continuous error is concentrated in a relatively small subset of control-sensitive channels rather than indicating a global collapse of the generator, while the type-aware panel shows that the remaining gap is mechanism-specific. In other words, the model has largely solved legality and a substantial portion of mixed-type marginal fidelity, but realism remains harder for behaviors governed by switching, long dwell, bounded operating regimes, and strong local persistence.
+%Question about the following part. "Figure~\ref{fig:benchmark_story} turns the table into a structural diagnosis."
+Figure~\ref{fig:benchmark_story} turns the table into a structural diagnosis. Continuous error is concentrated in a relatively small subset of control-sensitive channels rather than indicating a global collapse of the generator, while the type-aware panel shows that the remaining gap is mechanism-specific. In other words, the model has largely solved legality and a substantial portion of mixed-type marginal fidelity, but realism remains harder for behaviors governed by switching, long dwell, bounded operating regimes, and strong local persistence.
 \subsection{Extended realism and downstream utility}
 \label{sec:benchmark-extended}
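The concentration claim in the rewritten paragraph (overall KS well above filtered KS) is easy to probe numerically. A sketch under an assumed drop-the-k-worst filtering rule; the paper's actual filter may differ:

import numpy as np

def filtered_mean_ks(ks_per_channel, k=3):
    """Mean KS after excluding the k worst (largest-KS) channels."""
    kept = np.sort(np.asarray(ks_per_channel))[:-k]
    return float(np.mean(kept))

ks = np.array([0.21, 0.25, 0.28, 0.30, 0.33, 0.62, 0.71, 0.78])  # toy values
print(float(np.mean(ks)), filtered_mean_ks(ks))  # 0.435 vs 0.274: a large gap
                                                 # signals concentrated error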
@@ -343,12 +346,13 @@ This typed view sharpens the story substantially. Program-like channels remain t
 \subsection{Ablation study}
 \label{sec:benchmark-ablation}
-A good ablation does more than show that removing components changes numbers; it should identify which failure mode each component is preventing. We therefore evaluate ten controlled variants under a shared pipeline and summarize six representative metrics: continuous fidelity (KS), discrete fidelity (JSD), short-horizon dynamics (lag-1), cross-variable coupling, predictive transfer, and downstream anomaly utility. The ablation summary plot visualizes signed changes relative to the full model, and Table~\ref{tab:ablation} gives the underlying values.
+A good ablation does more than show that removing components changes numbers; it should identify which failure mode each component is preventing. We therefore evaluate ten controlled variants under a shared pipeline and summarize six representative metrics: continuous fidelity (KS), discrete fidelity (JSD), short-horizon dynamics (lag-1), cross-variable coupling, predictive transfer, and downstream anomaly utility. Figure~\ref{fig:ablation_impact} visualizes signed changes relative to the full model, and Table~\ref{tab:ablation} gives the underlying values.
 \begin{figure}[htbp]
 \centering
 \includegraphics[width=\textwidth]{fig-benchmark-ablations-v1.png}
-\caption*{Ablation impact.}
+\caption{Ablation impact.}
+\label{fig:ablation_impact}
 \end{figure}
 \begin{table}[htbp]
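For the signed-change summary that Figure~\ref{fig:ablation_impact} now labels, one plausible convention is sketched below; the full-model values, the variant values, and the orientation of each metric are illustrative assumptions:

# Positive delta = the ablated variant is worse than the full model.
FULL = {"KS": 0.30, "JSD": 0.03, "lag1": 0.27}        # placeholder values
LOWER_IS_BETTER = {"KS", "JSD", "lag1"}

def signed_deltas(variant, full=FULL):
    return {m: (variant[m] - full[m]) if m in LOWER_IS_BETTER
               else (full[m] - variant[m])
            for m in full}

print(signed_deltas({"KS": 0.35, "JSD": 0.05, "lag1": 0.26}))
# approx {'KS': 0.05, 'JSD': 0.02, 'lag1': -0.01}: the variant degrades
# marginal and discrete fidelity while slightly helping short-horizon dynamics.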

texput.log

@@ -1,4 +1,4 @@
-This is pdfTeX, Version 3.141592653-2.6-1.40.28 (MiKTeX 25.12) (preloaded format=pdflatex 2026.4.14) 14 APR 2026 14:30
+This is pdfTeX, Version 3.141592653-2.6-1.40.28 (MiKTeX 25.12) (preloaded format=pdflatex 2026.4.14) 17 APR 2026 17:18
 entering extended mode
 restricted \write18 enabled.
 %&-line parsing enabled.