internal-docs/knowledges/pre_requirements.md
2025-12-29 12:01:11 +08:00

Required Knowledge for the Project

Please note: You can always ask an LLM for help.

Important: We assume you have a working understanding of Linux (bash, ssh, etc), Git, and teamwork basics (eg. how to file issues and open PRs (pull requests)).

Because of the nature of this project, knowledge requirements are divided into two parts: Network and ML/DL.

As Year 3+ students, you should have at least a good understanding of the courses AI Foundations, Network Security Technology, and Computer Networks.

Here, a language "requirement" means you have used that language in a multi-file project with more than 100 LOC (lines of code).

For languages like C/C++, you should have experience with at least one build system (make, cmake, xmake, etc).

For programming, Python is a must, and a low-level language (eg. C, C++, Rust, Zig) will help a lot.

So the minimum programming requirement is: Python + a GC (garbage-collected) language other than Python (eg. JS/TS, Java, Go, C#, Lua).

Optional: a low-level language, experience with network programming.

Network

  1. OSI 7-layer model: You should know:
  • Major protocols from L2 to L4, including packet formats, routing, and packet construction/deconstruction.

  • How a connection is defined and traced. Eg. "how to trace a TCP/UDP connection"
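
As a rough illustration of connection tracing, one common approach is to group packets by their 5-tuple (protocol, source IP/port, destination IP/port). The sketch below assumes packet headers have already been parsed; `FlowKey`, `flow_key`, and the sample packets are made up for illustration and are not part of this project.

```python
from collections import Counter
from typing import NamedTuple


class FlowKey(NamedTuple):
    """Hypothetical direction-independent key identifying one connection."""
    proto: str
    a: tuple  # (ip, port) of the endpoint that sorts first
    b: tuple  # (ip, port) of the other endpoint


def flow_key(proto, src_ip, src_port, dst_ip, dst_port):
    # Sort the two endpoints so that requests and replies share one key.
    a, b = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    return FlowKey(proto, a, b)


# Made-up parsed headers: (proto, src_ip, src_port, dst_ip, dst_port)
packets = [
    ("tcp", "10.0.0.1", 40000, "10.0.0.2", 80),
    ("tcp", "10.0.0.2", 80, "10.0.0.1", 40000),  # reply on the same connection
    ("udp", "10.0.0.1", 5353, "10.0.0.3", 53),
]

# Count how many packets belong to each connection.
packets_per_flow = Counter(flow_key(*p) for p in packets)
for key, count in packets_per_flow.items():
    print(key.proto, key.a, "<->", key.b, "packets:", count)
```

For TCP, a real tracer would additionally follow the SYN/FIN/RST flags to know when a connection starts and ends; UDP has no such state, so tracers usually fall back to an idle timeout.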

  2. Behaviors and common algorithms: You should know:
  • Broadcast mechanisms

  • Common routing algorithms

  • send/send_to, recv/recv_from, or their equivalents on a modern OS kernel
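
For reference, Python exposes these calls as `sendto` and `recvfrom` on socket objects. The following is a minimal sketch that sends one UDP datagram over loopback; the payload and buffer size are arbitrary choices for the example.

```python
import socket

# Receiver: bind to loopback and let the OS pick a free port (port 0).
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(2.0)  # avoid blocking forever if the datagram is lost
addr = receiver.getsockname()

# Sender: UDP is connectionless, so the destination is given per call.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"hello", addr)

# recvfrom returns the payload together with the sender's address.
data, peer = receiver.recvfrom(1024)
print(data, peer)

sender.close()
receiver.close()
```

`send`/`recv` are the variants used on connected sockets (eg. TCP), where the peer address is already fixed. Sending to a broadcast address usually also requires enabling `SO_BROADCAST` on the socket via `setsockopt`.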

  3. Security: You should know:
  • Common firewall behaviors

  • Attacks that need only a few packets (eg. TCP RST injection, replay)
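
To make the replay case concrete, here is a toy sketch of one common defense: reject messages whose timestamp is too old and messages whose nonce has been seen before. `ReplayFilter` and its parameters are hypothetical names for this example, and the sketch assumes the nonce and timestamp are already covered by message authentication (otherwise an attacker could simply change them).

```python
class ReplayFilter:
    """Toy replay filter: drops stale timestamps and already-seen nonces."""

    def __init__(self, max_age_s=30.0):
        self.max_age_s = max_age_s
        self.seen = {}  # nonce -> timestamp claimed by the sender

    def accept(self, nonce, sent_at, now):
        # Forget nonces that are old enough to fail the freshness check anyway.
        self.seen = {n: t for n, t in self.seen.items()
                     if now - t <= self.max_age_s}
        if abs(now - sent_at) > self.max_age_s:
            return False  # too old (or too far in the future)
        if nonce in self.seen:
            return False  # same message seen before: likely a replay
        self.seen[nonce] = sent_at
        return True


f = ReplayFilter(max_age_s=30.0)
print(f.accept("n1", sent_at=100.0, now=100.0))  # first delivery
print(f.accept("n1", sent_at=100.0, now=101.0))  # same message captured and resent
```

The freshness window keeps the nonce table bounded: once a nonce is forgotten, any replay of that message is already too old to pass the timestamp check.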

ML/DL

This section focuses on practical, engineering-oriented understanding and usage of ML/DL,
rather than theoretical derivation or cutting-edge research.

You are NOT required to be an ML researcher,
but you should be able to read code, run experiments, and explain model behaviors.


1. Foundations

Assumed background: completion of a Year 3 “AI Foundations” (or equivalent) course.
If you have not formally taken such a course, understanding the core concepts is sufficient.
No in-depth theoretical derivations are required.

You should know:

  • The difference between Machine Learning (ML) and Deep Learning (DL)
  • Learning paradigms:
    • Supervised / Unsupervised / Self-supervised learning
  • Basic concepts:
    • Dataset / Batch / Epoch
    • Loss function
    • Optimizer (eg. SGD, Adam)
    • Overfitting / Underfitting
  • The difference between training and inference

You should be able to:

  • Train a basic model using common frameworks (eg. PyTorch, TensorFlow)
  • Understand and implement a standard training loop:
    • Forward pass → loss computation → backward pass → parameter update
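
The four steps of the loop can be sketched without any framework. The example below fits a line to toy data with full-batch gradient descent; the data, learning rate, and epoch count are arbitrary choices for illustration. In a framework such as PyTorch, the backward pass and the update are normally handled by autograd and an optimizer instead of hand-written gradients.

```python
# Toy dataset generated from y = 2x + 1 (made up for this example).
data = [(x, 2.0 * x + 1.0) for x in [0.0, 1.0, 2.0, 3.0, 4.0]]

w, b = 0.0, 0.0  # model parameters
lr = 0.05        # learning rate

for epoch in range(500):
    # 1. Forward pass: compute predictions with the current parameters.
    preds = [w * x + b for x, _ in data]

    # 2. Loss computation: mean squared error.
    loss = sum((p - y) ** 2 for p, (_, y) in zip(preds, data)) / len(data)

    # 3. Backward pass: gradients of the loss w.r.t. w and b (derived by hand).
    grad_w = sum(2.0 * (p - y) * x for p, (x, y) in zip(preds, data)) / len(data)
    grad_b = sum(2.0 * (p - y) for p, (_, y) in zip(preds, data)) / len(data)

    # 4. Parameter update: one plain gradient-descent step.
    w -= lr * grad_w
    b -= lr * grad_b

print(f"w={w:.4f} b={b:.4f} loss={loss:.2e}")
```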

2. Neural Network Basics

You should know:

  • Common network layers:
    • Linear / Fully Connected layers
    • Convolution layers
    • Normalization layers (BatchNorm / LayerNorm)
  • Common activation functions:
    • ReLU / LeakyReLU / Sigmoid / Tanh
  • The conceptual role of backpropagation (no formula derivation required)

You should understand:

  • How data flows through a neural network
  • How gradients affect parameter updates
  • Why deeper networks are harder to train
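
One reason deeper networks are harder to train is the vanishing gradient: by the chain rule, the gradient reaching early layers is a product of per-layer factors. The sketch below uses a deliberately simplified model (a chain of scalar sigmoid layers with a fixed weight, both assumptions of this example) to show how quickly that product shrinks with depth.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def input_gradient(depth, x=0.5, weight=1.0):
    """d(output)/d(input) for a chain of `depth` scalar sigmoid layers."""
    grad = 1.0
    a = x
    for _ in range(depth):
        a = sigmoid(weight * a)
        # Chain rule: each layer multiplies the gradient by
        # weight * sigmoid'(z), where sigmoid'(z) = a * (1 - a) <= 0.25.
        grad *= weight * a * (1.0 - a)
    return grad


for depth in (1, 5, 10, 20):
    print(depth, input_gradient(depth))
```

Since each sigmoid factor is at most 0.25, the gradient here is bounded by 0.25 to the power of the depth. Choices such as ReLU activations and normalization layers are commonly used to soften this effect.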

3. Generative Models (Overview Level)

You should take a glance at the following models and understand their core ideas and behavioral characteristics.

GAN (Generative Adversarial Network)

You should know:

  • The roles of the Generator and the Discriminator
  • The adversarial training process
  • What the loss functions roughly represent
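
To give a feel for what the losses represent, the sketch below evaluates the commonly used binary cross-entropy form of the GAN losses on hand-picked discriminator outputs. It assumes the non-saturating generator loss; the function names and probabilities are made up for this example, and no actual networks are involved.

```python
import math


def bce(prob, target):
    """Binary cross-entropy for a single predicted probability."""
    eps = 1e-12
    prob = min(max(prob, eps), 1.0 - eps)  # avoid log(0)
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))


def discriminator_loss(d_real, d_fake):
    # The discriminator wants D(x) near 1 on real data
    # and D(G(z)) near 0 on generated data.
    return bce(d_real, 1.0) + bce(d_fake, 0.0)


def generator_loss(d_fake):
    # Non-saturating form: the generator wants D(G(z)) near 1.
    return bce(d_fake, 1.0)


# d_real = D(x), d_fake = D(G(z)); values chosen by hand.
print("confident D:", discriminator_loss(0.9, 0.1), generator_loss(0.1))
print("fooled D:   ", discriminator_loss(0.5, 0.5), generator_loss(0.5))
```

A low discriminator loss means it separates real from fake well, which in turn gives a high generator loss; the two objectives pull against each other, which is the adversarial part.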

You should understand:

  • Why GAN training can be unstable
  • What mode collapse means
  • Typical use cases (eg. image generation, data augmentation)

Optional but recommended:

  • Run or read code of a simple GAN implementation (eg. DCGAN)

Diffusion Models

You should know:

  • The forward process: gradually adding noise to data
  • The reverse process: denoising and sampling
  • Why diffusion models can generate high-quality samples
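
The forward process can be sketched in a few lines. The example assumes a DDPM-style formulation with a linear noise schedule, where a noisy sample at step t can be drawn directly as x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps. The schedule values, `q_sample`, and the toy data are illustrative choices, not project settings.

```python
import math
import random

T = 1000
# Linear beta schedule from 1e-4 to 0.02 (an assumed, commonly seen setting).
betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]

# abar_t: cumulative product of (1 - beta), the fraction of signal kept at step t.
alpha_bars = []
prod = 1.0
for beta in betas:
    prod *= 1.0 - beta
    alpha_bars.append(prod)


def q_sample(x0, t, rng):
    """Draw x_t given clean data x0 by mixing in Gaussian noise."""
    signal = math.sqrt(alpha_bars[t])
    noise = math.sqrt(1.0 - alpha_bars[t])
    return [signal * v + noise * rng.gauss(0.0, 1.0) for v in x0]


rng = random.Random(0)
x0 = [1.0, -1.0, 0.5]  # toy "data" vector
for t in (0, 499, 999):
    print(t, alpha_bars[t], q_sample(x0, t, rng))
```

At small t the sample is almost the original data; at the last step almost no signal remains. The reverse process trains a model to undo this corruption step by step.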

You should understand:

  • Differences between diffusion models and GANs in training stability
  • Why diffusion sampling is usually slower
  • High-level ideas of noise prediction vs data prediction

Optional but recommended:

  • Run inference using a pretrained diffusion model
  • Understand the role of timestep / scheduler

4. Engineering Perspective

You should be familiar with:

  • Differences between GPU and CPU training / inference
  • Basic memory and performance considerations
  • Model checkpoint loading and saving
  • Reproducibility basics (random seed, configuration, logging)
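
As a framework-free sketch of the last two points, the example below shows that fixing the seed makes a random run repeatable, and that a checkpoint is just saved state plus the configuration that produced it. The file name, state layout, and the `run` helper are assumptions for this example; ML frameworks provide their own seeding and save/load utilities.

```python
import json
import os
import random
import tempfile


def run(seed, steps):
    """Stand-in for a training run whose randomness depends only on the seed."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(steps)]


# Same seed, same sequence: the run can be reproduced.
print(run(42, 3) == run(42, 3))

# A checkpoint stores parameters together with step and configuration.
state = {
    "step": 10,
    "params": {"w": 2.0, "b": 1.0},
    "config": {"seed": 42, "lr": 0.05},
}
path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
with open(path, "w") as f:
    json.dump(state, f)

# Loading restores the exact state, so training can resume from step 10.
with open(path) as f:
    restored = json.load(f)
print(restored["step"], restored["config"])
```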

You should be able to:

  • Read and modify existing ML/DL codebases
  • Debug common issues:
    • NaN loss
    • No convergence
    • OOM (out-of-memory)
  • Integrate ML/DL components into a larger system (eg. networked services, data pipelines)
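
For NaN or diverging loss, a cheap first step is to fail fast at the step where the loss stops being finite, instead of discovering it many epochs later. The sketch below uses a toy objective (minimizing w squared) with a deliberately oversized learning rate; `check_loss` and `train` are hypothetical helpers for illustration.

```python
import math


def check_loss(loss, step):
    """Raise as soon as the loss becomes NaN or infinite."""
    if math.isnan(loss) or math.isinf(loss):
        raise FloatingPointError(f"non-finite loss {loss!r} at step {step}")
    return loss


def train(lr, steps=2000):
    w = 1.0
    for step in range(steps):
        loss = w * w          # toy objective
        check_loss(loss, step)
        grad = 2.0 * w
        w -= lr * grad        # with lr > 1 the magnitude of w grows every step
    return w


print("lr=0.1 ends at w =", train(lr=0.1))
try:
    train(lr=1.5)
except FloatingPointError as err:
    print("caught:", err)
```

Knowing the first bad step narrows the search: typical suspects are a learning rate that is too high, bad input data, or numerically unsafe operations such as log(0) or division by zero.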

5. Relation to This Project

You should understand:

  • ML/DL models are treated as modules, not black boxes
  • Model outputs should be interpretable or observable when possible
  • ML components may interact with:
    • Network traffic
    • Logs / metrics
    • Online or streaming data

You are expected to:

  • Use ML/DL as a tool, not an end goal
  • Be comfortable combining ML logic with system / network code

Take a glance at GAN and Diffusion.