forked from manbo/internal-docs
Merge pull request 'docs: update ML/DL pre-requirements' (#1) from cnm/internal-docs:master into master
Reviewed-on: manbo/internal-docs#1
@@ -44,8 +44,142 @@ Optional: low level language, experience on **network programming**

## ML / DL

> This section focuses on **practical, engineering-oriented understanding and usage** of ML/DL,
> rather than theoretical derivation or cutting-edge research.
>
> To be finished by Mingzhe Yang and Hongyu Yan
>
> You are **NOT required** to be an ML researcher,
> but you should be able to **read code, run experiments, and explain model behaviors**.

---

### 1. Foundations

> Assumed background: completion of a **Year 3 “AI Foundations” (or equivalent)** course.
> If you have not formally taken such a course, **understanding the core concepts is sufficient**.
> No in-depth theoretical derivations are required.

You should know:

- The difference between **Machine Learning (ML)** and **Deep Learning (DL)**
- Learning paradigms:
  - Supervised / Unsupervised / Self-supervised learning
- Basic concepts:
  - Dataset / Batch / Epoch
  - Loss function
  - Optimizer (e.g. SGD, Adam)
  - Overfitting / Underfitting
- The difference between training and inference

You should be able to:

- Train a basic model using common frameworks (e.g. `PyTorch`, `TensorFlow`)
- Understand and implement a standard training loop (a minimal sketch follows this list):
  - Forward pass → loss computation → backward pass → parameter update

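A minimal PyTorch sketch of that loop, assuming a toy linear-regression model on synthetic data (all names here are illustrative, not project code):

```python
import torch
import torch.nn as nn

# Toy setup: a single linear layer fitting y = 3x + 1 on synthetic data.
model = nn.Linear(1, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(256, 1)
y = 3 * x + 1

for epoch in range(100):
    pred = model(x)              # forward pass
    loss = criterion(pred, y)    # loss computation
    optimizer.zero_grad()
    loss.backward()              # backward pass (computes gradients)
    optimizer.step()             # parameter update

print(loss.item())  # should approach 0
```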

---

### 2. Neural Network Basics

You should know:

- Common network layers:
  - Linear / Fully Connected layers
  - Convolution layers
  - Normalization layers (BatchNorm / LayerNorm)
- Common activation functions:
  - ReLU / LeakyReLU / Sigmoid / Tanh
- The **conceptual role** of backpropagation (no formula derivation required)

You should understand:

- How data flows through a neural network
- How gradients affect parameter updates (see the sketch after this list)
- Why deeper networks are harder to train

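To make the data flow and gradient flow concrete, here is a small self-contained PyTorch sketch (layer sizes and names are illustrative):

```python
import torch
import torch.nn as nn

# A small MLP: Linear -> ReLU -> LayerNorm -> Linear
net = nn.Sequential(
    nn.Linear(8, 16),
    nn.ReLU(),
    nn.LayerNorm(16),
    nn.Linear(16, 1),
)

x = torch.randn(4, 8)          # batch of 4 samples, 8 features each
out = net(x)                   # forward: data flows layer by layer
loss = out.pow(2).mean()       # a dummy scalar loss
loss.backward()                # backward: gradients flow in reverse

# Each parameter now holds a gradient that the optimizer would use:
print(net[0].weight.grad.shape)  # torch.Size([16, 8])
```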
---

### 3. Generative Models (Overview Level)

You should take a glance at the following models and understand their **core ideas and behavioral characteristics**.

#### GAN (Generative Adversarial Network)

You should know:

- The roles of the Generator and the Discriminator
- The adversarial training process (sketched below)
- What the loss functions roughly represent

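A compressed sketch of one adversarial training step, assuming a standard non-saturating GAN setup with hypothetical toy `G` and `D` networks (a real generator/discriminator would be deeper, e.g. DCGAN):

```python
import torch
import torch.nn as nn

# Hypothetical toy networks on 2-D data; purely illustrative.
G = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
D = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

real = torch.randn(64, 2)   # stand-in for a real data batch
z = torch.randn(64, 16)     # latent noise

# 1) Discriminator step: push real -> 1, fake -> 0.
fake = G(z).detach()        # detach: don't backprop into G here
d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
opt_d.zero_grad()
d_loss.backward()
opt_d.step()

# 2) Generator step: try to fool D into predicting 1 for fakes.
g_loss = bce(D(G(z)), torch.ones(64, 1))
opt_g.zero_grad()
g_loss.backward()
opt_g.step()
```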
You should understand:

- Why GAN training can be unstable
- What *mode collapse* means
- Typical use cases (e.g. image generation, data augmentation)

Optional but recommended:

- Run or read code of a simple GAN implementation (e.g. DCGAN)

---

#### Diffusion Models

You should know:

- The forward process: gradually adding noise to data (see the sketch after this list)
- The reverse process: denoising and sampling
- Why diffusion models can generate high-quality samples

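A minimal sketch of the forward (noising) process, assuming the standard DDPM closed form `x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε` with an illustrative linear beta schedule:

```python
import torch

# Illustrative DDPM-style linear beta schedule over T timesteps.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bar = torch.cumprod(alphas, dim=0)   # cumulative product ᾱ_t

def add_noise(x0: torch.Tensor, t: int) -> torch.Tensor:
    """Sample x_t from q(x_t | x_0) in one shot."""
    eps = torch.randn_like(x0)
    return alpha_bar[t].sqrt() * x0 + (1 - alpha_bar[t]).sqrt() * eps

x0 = torch.randn(4, 3, 32, 32)   # stand-in for an image batch
x_mid = add_noise(x0, t=500)     # partially noised
x_end = add_noise(x0, t=T - 1)   # nearly pure Gaussian noise
```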
You should understand:

- Differences between diffusion models and GANs in training stability
- Why diffusion sampling is usually slower
- High-level ideas of noise prediction vs data prediction

Optional but recommended:

- Run inference using a pretrained diffusion model (a hedged sketch follows this list)
- Understand the role of timestep / scheduler

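One way to try this, assuming the Hugging Face `diffusers` library is installed and a GPU is available; the model id below is illustrative, not a project requirement:

```python
# pip install diffusers transformers torch   (assumed environment)
import torch
from diffusers import DiffusionPipeline

# Illustrative checkpoint; other text-to-image models work similarly.
pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

# Fewer inference steps -> faster but lower quality; the scheduler decides
# how timesteps are spaced during the reverse (denoising) process.
image = pipe("a photo of a mountain lake", num_inference_steps=25).images[0]
image.save("sample.png")
```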
---

### 4. Engineering Perspective

You should be familiar with:

- Differences between GPU and CPU training / inference
- Basic memory and performance considerations
- Model checkpoint loading and saving (a common pattern is sketched after this list)
- Reproducibility basics (random seed, configuration, logging)

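A common checkpointing and seeding pattern in PyTorch (file names and dictionary keys are illustrative):

```python
import random
import numpy as np
import torch

# Reproducibility basics: seed every RNG the code touches.
seed = 42
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)

model = torch.nn.Linear(8, 2)
optimizer = torch.optim.Adam(model.parameters())

# Save: store states plus any bookkeeping needed to resume.
torch.save(
    {"model": model.state_dict(), "optim": optimizer.state_dict(), "epoch": 10},
    "checkpoint.pt",
)

# Load: restore both model and optimizer state to resume training.
ckpt = torch.load("checkpoint.pt", map_location="cpu")
model.load_state_dict(ckpt["model"])
optimizer.load_state_dict(ckpt["optim"])
start_epoch = ckpt["epoch"] + 1
```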
You should be able to:

- Read and modify existing ML/DL codebases
- Debug common issues (a few first-aid checks are sketched after this list):
  - NaN loss
  - No convergence
  - OOM (out-of-memory)
- Integrate ML/DL components into a larger system (e.g. networked services, data pipelines)

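A few first-aid checks for those failure modes; thresholds and the helper name are illustrative:

```python
import torch

def debug_step(model, loss, optimizer, max_grad_norm=1.0):
    """One guarded optimization step with common sanity checks."""
    # NaN loss: catch it early instead of training on garbage.
    if torch.isnan(loss) or torch.isinf(loss):
        raise RuntimeError("loss is NaN/Inf: check LR, data, normalization")

    optimizer.zero_grad()
    loss.backward()

    # Non-convergence often comes with exploding gradients; clipping helps.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
    optimizer.step()

# OOM: reduce batch size, or trade compute for memory by disabling
# gradient tracking during inference:
#   with torch.no_grad():
#       out = model(x)
```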
---

### 5. Relation to This Project

You should understand:

- ML/DL models are treated as **modules**, not black boxes
- Model outputs should be **interpretable or observable** when possible
- ML components may interact with:
  - Network traffic
  - Logs / metrics
  - Online or streaming data

You are expected to:

- Use ML/DL as a **tool**, not an end goal
- Be comfortable combining ML logic with system / network code (one illustration follows this list)

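As one illustration of wrapping a model behind a network endpoint; Flask, the route, and the stand-in model here are all assumptions, not the project's actual stack:

```python
# pip install flask torch   (assumed environment)
import torch
from flask import Flask, jsonify, request

app = Flask(__name__)
model = torch.nn.Linear(4, 2)   # stand-in for a real trained model
model.eval()

@app.route("/predict", methods=["POST"])
def predict():
    # Expect JSON like {"features": [0.1, 0.2, 0.3, 0.4]}
    feats = request.get_json()["features"]
    with torch.no_grad():   # inference: no gradient tracking
        logits = model(torch.tensor([feats], dtype=torch.float32))
    return jsonify({"logits": logits[0].tolist()})

if __name__ == "__main__":
    app.run(port=8000)
```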
Take a glance at `GAN` and `Diffusion`.