# Required Knowledge for the Project
> Please note: you can **always** ask an LLM for help.
>
> **Important**: We assume you already have a working knowledge of `Linux` (bash, ssh, etc.), `Git`, and teamwork basics (eg. how to open `issues` and `PR`s (pull requests)).
Because of the nature of this project, the knowledge requirements are divided into two parts: **network** and **ML/DL**.
As year 3+ students, you should have at least a good understanding of the courses **AI Foundations**, **Network Security Technology**, and **Computer Networks**.
> Here, "requirement" means you have used that language in a multi-file project with more than 100 LOC (lines of code).
>
> For languages like `C/C++`, you should also have experience with at least one build system (`make`, `cmake`, `xmake`, etc.).
For programming, `Python` is a must, and a low-level language (eg. `C`, `C++`, `Rust`, `Zig`, etc.) will help a lot.
So the minimum programming requirement is: `Python` + one GC language other than `Python` (eg. `JS/TS`, `Java`, `Go`, `C#`, `Lua`, etc.).
Optional: a low-level language and experience with **network programming**.
## Network
1. **OSI 7-layer model**: You should know:
    - Major protocols from L2 to L4, including their **packet schema**, **routing**, and **packet construction/deconstruction** (see the header-parsing sketch after this list)
    - Connection definition and tracing, eg. "how to trace a TCP/UDP connection"
2. **Behaviors and common algorithms**: You should know:
    - Broadcast mechanism
    - Common routing algorithms
    - `send/send_to`, `recv/recv_from` or their equivalents in a modern OS kernel (see the UDP socket sketch after this list)
3. **Security**: You should know:
    - Common firewall behaviors
    - A few packet-level attacks (like `TCP RST` injection, `replay`, etc.)
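
For item 1, here is a minimal sketch of what "packet schema" and "construction/deconstruction" mean in practice: building and parsing the fixed 20-byte IPv4 header with Python's `struct` module. The header below is hand-crafted for illustration, not a captured packet.

```python
import socket
import struct

def parse_ipv4_header(raw: bytes) -> dict:
    """Deconstruct the fixed 20-byte IPv4 header (RFC 791) from raw bytes."""
    ver_ihl, tos, total_len, ident, flags_frag, ttl, proto, checksum, src, dst = \
        struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return {
        "version": ver_ihl >> 4,
        "ihl": ver_ihl & 0x0F,      # header length in 32-bit words
        "total_length": total_len,
        "ttl": ttl,
        "protocol": proto,          # 6 = TCP, 17 = UDP
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }

# Construct an example header (illustrative field values only).
example = struct.pack(
    "!BBHHHBBH4s4s",
    0x45, 0, 40, 1, 0, 64, 6, 0,
    socket.inet_aton("192.0.2.1"), socket.inet_aton("192.0.2.2"),
)
print(parse_ipv4_header(example))
```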
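
For item 2, a minimal sketch of `send_to`/`recv_from` using Python's standard `socket` module: a single UDP exchange on localhost (the port number is arbitrary).

```python
import socket

# "Server": bind to a local port and wait for one datagram.
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 9999))          # arbitrary port for illustration

# "Client": send a datagram, then wait for the echo.
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.sendto(b"ping", ("127.0.0.1", 9999))

data, addr = server.recvfrom(1024)        # recv_from returns (payload, sender address)
server.sendto(data, addr)                 # send_to the address that sent it

reply, _ = client.recvfrom(1024)
print(reply)                              # b"ping"
```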
## ML / DL
> This section focuses on **practical, engineering-oriented understanding and usage** of ML/DL,
> rather than theoretical derivation or cutting-edge research.
>
> You are **NOT required** to be an ML researcher,
> but you should be able to **read code, run experiments, and explain model behaviors**.
---
### 1. Foundations
> Assumed background: completion of a **Year 3 “AI Foundations” (or equivalent)** course.
> If you have not formally taken such a course, **understanding the core concepts is sufficient**.
> No in-depth theoretical derivations are required.
You should know:
- The difference between **Machine Learning (ML)** and **Deep Learning (DL)**
- Learning paradigms:
- Supervised / Unsupervised / Self-supervised learning
- Basic concepts:
- Dataset / Batch / Epoch
- Loss function
- Optimizer (eg. SGD, Adam)
- Overfitting / Underfitting
- The difference between training and inference
You should be able to:
- Train a basic model using common frameworks (eg. `PyTorch`, `TensorFlow`)
- Understand and implement a standard training loop:
- Forward pass → loss computation → backward pass → parameter update
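
A minimal sketch of that standard training loop in `PyTorch`, on random tensors; the model, data shapes, and hyperparameters are placeholders, not project code.

```python
import torch
from torch import nn

model = nn.Linear(10, 1)                      # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

x = torch.randn(32, 10)                       # one fake batch of inputs
y = torch.randn(32, 1)                        # matching fake targets

for epoch in range(5):
    optimizer.zero_grad()                     # clear gradients from the previous step
    pred = model(x)                           # forward pass
    loss = loss_fn(pred, y)                   # loss computation
    loss.backward()                           # backward pass (autograd)
    optimizer.step()                          # parameter update
    print(f"epoch {epoch}: loss = {loss.item():.4f}")
```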
---
### 2. Neural Network Basics
You should know:
- Common network layers:
- Linear / Fully Connected layers
- Convolution layers
- Normalization layers (BatchNorm / LayerNorm)
- Common activation functions:
- ReLU / LeakyReLU / Sigmoid / Tanh
- The **conceptual role** of backpropagation (no formula derivation required)
You should understand:
- How data flows through a neural network
- How gradients affect parameter updates
- Why deeper networks are harder to train
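
A minimal sketch, in `PyTorch`, of data flowing through the layer types listed above; all dimensions here are made up for illustration.

```python
import torch
from torch import nn

# A tiny network built from the layer types listed above.
net = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),   # convolution layer
    nn.BatchNorm2d(8),                           # normalization layer
    nn.ReLU(),                                   # activation function
    nn.Flatten(),
    nn.Linear(8 * 32 * 32, 10),                  # fully connected layer
)

x = torch.randn(4, 3, 32, 32)        # a batch of 4 fake RGB images
out = net(x)                         # data flows through the layers in order
out.sum().backward()                 # gradients flow back through every layer
print(out.shape)                     # torch.Size([4, 10])
print(net[0].weight.grad.shape)      # each parameter now has a gradient
```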
---
### 3. Generative Models (Overview Level)
You should take a glance at the following models and understand their **core ideas and behavioral characteristics**.
#### GAN (Generative Adversarial Network)
You should know:
- The roles of the Generator and the Discriminator
- The adversarial training process
- What the loss functions roughly represent
You should understand:
- Why GAN training can be unstable
- What *mode collapse* means
- Typical use cases (eg. image generation, data augmentation)
Optional but recommended:
- Run or read code of a simple GAN implementation (eg. DCGAN)
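
A minimal, non-authoritative sketch of one adversarial training step; the tiny MLPs and the random "real" batch are placeholders, only there to show how the Generator and Discriminator losses are wired together.

```python
import torch
from torch import nn

G = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))   # Generator: noise -> fake sample
D = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))    # Discriminator: sample -> real/fake logit
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

real = torch.randn(64, 2)            # stand-in for a batch of real data
noise = torch.randn(64, 16)

# Discriminator step: real samples labeled 1, generated samples labeled 0.
fake = G(noise).detach()             # detach so this step does not update G
d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: try to make D classify generated samples as real.
g_loss = bce(D(G(noise)), torch.ones(64, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```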
---
#### Diffusion Models
You should know:
- The forward process: gradually adding noise to data
- The reverse process: denoising and sampling
- Why diffusion models can generate high-quality samples
You should understand:
- Differences between diffusion models and GANs in training stability
- Why diffusion sampling is usually slower
- High-level ideas of noise prediction vs data prediction
Optional but recommended:
- Run inference using a pretrained diffusion model
- Understand the role of timestep / scheduler
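
A minimal sketch of the forward (noising) process in DDPM-style notation, just to make "gradually adding noise" concrete; the linear beta schedule and tensor shapes are illustrative, not taken from any particular paper or library.

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)            # illustrative linear noise schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)   # cumulative product of (1 - beta_t)

def q_sample(x0: torch.Tensor, t: int) -> torch.Tensor:
    """Forward process: jump straight to step t by mixing x0 with Gaussian noise."""
    noise = torch.randn_like(x0)
    return alphas_bar[t].sqrt() * x0 + (1 - alphas_bar[t]).sqrt() * noise

x0 = torch.randn(4, 3, 32, 32)       # stand-in for a batch of clean images
x_noisy = q_sample(x0, t=500)        # a heavily noised version of x0

# A diffusion model is trained to predict the noise (or the data) from x_noisy and t;
# sampling reverses the process one timestep at a time, which is why it is usually slower.
```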
---
### 4. Engineering Perspective
You should be familiar with:
- Differences between GPU and CPU training / inference
- Basic memory and performance considerations
- Model checkpoint loading and saving
- Reproducibility basics (random seed, configuration, logging)
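
A minimal sketch of the seeding and checkpointing basics above, in `PyTorch`; the file name and model are placeholders.

```python
import random
import numpy as np
import torch
from torch import nn

# Reproducibility basics: fix every random seed you rely on.
seed = 42
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)

model = nn.Linear(10, 1)                           # placeholder model
optimizer = torch.optim.Adam(model.parameters())

# Saving: keep model and optimizer state together with any config you need later.
torch.save({
    "model": model.state_dict(),
    "optimizer": optimizer.state_dict(),
    "seed": seed,
}, "checkpoint.pt")

# Loading: restore the saved states into freshly constructed objects.
ckpt = torch.load("checkpoint.pt")
model.load_state_dict(ckpt["model"])
optimizer.load_state_dict(ckpt["optimizer"])
```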
You should be able to:
- Read and modify existing ML/DL codebases
- Debug common issues (see the defensive-check sketch after this list):
- NaN loss
- No convergence
- OOM (out-of-memory)
- Integrate ML/DL components into a larger system (eg. networked services, data pipelines)
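
A small, hedged example of the kind of defensive check that helps with the debugging items above: skip the update on a NaN loss and clip gradients to guard against explosions. The model, data, and clipping threshold are arbitrary placeholders.

```python
import torch
from torch import nn

model = nn.Linear(10, 1)                              # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
x, y = torch.randn(32, 10), torch.randn(32, 1)

loss = nn.functional.mse_loss(model(x), y)
if torch.isnan(loss):
    optimizer.zero_grad()                             # NaN loss: skip this update and investigate
else:
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # arbitrary clipping threshold
    optimizer.step()
```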
---
### 5. Relation to This Project
You should understand:
- ML/DL models are treated as **modules**, not black boxes
- Model outputs should be **interpretable or observable** when possible
- ML components may interact with:
- Network traffic
- Logs / metrics
- Online or streaming data
You are expected to:
- Use ML/DL as a **tool**, not an end goal
- Be comfortable combining ML logic with system / network code
At minimum, take a glance at `GAN` and `Diffusion` models.