Upload files to "drafts"

This commit is contained in:
2026-01-24 14:05:01 +08:00
commit fdcf5838b3

@@ -0,0 +1,193 @@
## Testing environment setup
Install tools:
```bash
sudo apt update
sudo apt install -y hyperfine heaptrack valgrind
sudo apt install -y \
build-essential clang lld pkg-config \
linux-perf \
iperf3 netperf net-tools \
tcpdump ethtool iproute2 \
bpftrace bpfcc-tools \
strace ltrace \
sysstat procps \
git perl
```
Install FlameGraph (not packaged on Debian):
```bash
git clone https://github.com/brendangregg/FlameGraph ~/FlameGraph
echo 'export PATH="$HOME/FlameGraph:$PATH"' >> ~/.bashrc
source ~/.bashrc
which flamegraph.pl
```
Modify the `Cargo.toml` of version 0.1.0:
```toml
[profile.release]
lto = true
codegen-units = 1
debug = 1
strip = "none"
panic = "abort"
```
Build with frame pointers to help profiling:
```bash
git clone https://github.com/DaZuo0122/oxidinetd.git
cd oxidinetd
# Frame pointers let perf walk the stack without DWARF unwinding
RUSTFLAGS="-C force-frame-pointers=yes" cargo build --release
```
`profiling.conf`:
```yaml
127.0.0.1 9000 127.0.0.1 9001
```
Backend iperf3 server:
```bash
iperf3 -s -p 9001
```
Forwarder:
```bash
./oi -c profiling.conf
```
Generate traffic through the forwarder (1 stream, then 8 parallel streams):
```bash
iperf3 -c 127.0.0.1 -p 9000 -t 30 -P 1
iperf3 -c 127.0.0.1 -p 9000 -t 30 -P 8
```
Verify that connections exist on both the listen port and the backend port:
```bash
sudo ss -tnp | grep -E '(:9000|:9001)'
```
## Testing
CPU hotspot:
```bash
sudo perf top -p $(pidof oi)
```
If you see lots of:
- sys_read, sys_write, __x64_sys_sendto, tcp_sendmsg → syscall/copy overhead
- futex, __lll_lock_wait → contention/locks
- epoll_wait → executor wake behavior / too many idle polls
Hard numbers:
```bash
sudo perf stat -p $(pidof oi) -e \
cycles,instructions,cache-misses,branches,branch-misses,context-switches,cpu-migrations \
-- sleep 30
```
Big differences to watch (oi versus rinetd under the same load):
- context-switches much higher on oi → too many tasks/wakers / lock contention
- instructions much higher on oi for same throughput → runtime overhead / copies
- cache-misses higher → allocations / poor locality
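
Raw counter totals are hard to compare when two runs move different amounts of data, so it can help to normalize them first. The sketch below is only an illustration: the struct and field names are made up, and the values are assumed to be typed in by hand from `perf stat` output plus the byte total iperf3 reports for the same window.

```rust
/// Counters from one `perf stat` run plus the bytes moved in that window.
/// Hypothetical helper type; nothing here is part of oi or perf.
#[derive(Debug, Clone, Copy)]
struct PerfSample {
    cycles: u64,
    instructions: u64,
    context_switches: u64,
    bytes_forwarded: u64,
}

impl PerfSample {
    /// Instructions retired per CPU cycle.
    fn ipc(&self) -> f64 {
        self.instructions as f64 / self.cycles as f64
    }

    /// Instructions spent per forwarded byte (lower is better).
    fn instructions_per_byte(&self) -> f64 {
        self.instructions as f64 / self.bytes_forwarded as f64
    }

    /// Context switches per GiB forwarded.
    fn ctx_switches_per_gib(&self) -> f64 {
        let gib = self.bytes_forwarded as f64 / (1u64 << 30) as f64;
        self.context_switches as f64 / gib
    }
}

/// A ratio above 1.0 means `a` needs more instructions per byte than `b`.
fn instruction_overhead(a: &PerfSample, b: &PerfSample) -> f64 {
    a.instructions_per_byte() / b.instructions_per_byte()
}
```

Calling `instruction_overhead(&oi_sample, &rinetd_sample)` then gives a single number for the "instructions much higher for same throughput" check, independent of how many bytes each run happened to transfer.
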
Flamegraph (record, then render):
```bash
sudo perf record -F 199 -g -p $(pidof oi) -- sleep 30
sudo perf script | stackcollapse-perf.pl | flamegraph.pl > oi.svg
```
If the stacks look flat or are missing frames (common with async + LTO), use DWARF unwinding:
```bash
sudo perf record -F 199 --call-graph dwarf,16384 -p $(pidof oi) -- sleep 30
sudo perf script | stackcollapse-perf.pl | flamegraph.pl > oi.svg
```
Syscall-cost check:
```bash
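# -f follows all threads; -ff is incompatible with -c (no per-process counts are kept)
sudo strace -f -c -p $(pidof oi) -o /tmp/oi.strace
# run 15-30 s under load, then Ctrl+C; the summary table is written on detach
cat /tmp/oi.strace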
```
If you see a huge % time in read/write/sendmsg/recvmsg, you're dominated by copying + syscalls.
eBPF tools: skipped.
## Smol-focused bottlenecks + the “fix list”
A) If you're syscall/copy bound
Best improvement candidates:
- buffer reuse (no per-loop Vec allocation)
- reduce tiny writes (coalesce)
- zero-copy splice (Linux-only, biggest win but more complex)

For Linux zero-copy, you'd implement a splice(2)-based fast path (socket→pipe→socket). That's how high-performance forwarders avoid double-copy.
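
As a minimal sketch of the first item (buffer reuse), the copy loop can borrow one buffer from the caller instead of allocating a `Vec` per iteration. This uses the blocking `std::io` traits as a stand-in; oi's real loop is async and may look different, and the function name is hypothetical.

```rust
use std::io::{self, Read, Write};

/// Copies `src` to `dst` until EOF, reusing the caller's buffer so the loop
/// itself never allocates. Returns the number of bytes forwarded.
/// `buf` must be non-empty, otherwise the first read reports 0 bytes and the
/// loop exits immediately.
fn forward_reusing_buffer<R: Read, W: Write>(
    src: &mut R,
    dst: &mut W,
    buf: &mut [u8],
) -> io::Result<u64> {
    let mut total = 0u64;
    loop {
        let n = match src.read(buf) {
            Ok(0) => break, // EOF: the peer closed its side
            Ok(n) => n,
            // A signal interrupted the call; retrying is safe
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        // Write exactly the bytes that were read, not the whole buffer
        dst.write_all(&buf[..n])?;
        total += n as u64;
    }
    dst.flush()?;
    Ok(total)
}
```

A larger buffer (for example 64 KiB, allocated once per connection) also cuts the number of read/write syscalls per byte, which is the other half of the syscall/copy cost.
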
B) If you're executor/waker bound (common for async forwarders)
Symptoms:
- perf shows a lot of runtime / wake / scheduling
- perf stat shows more context switches than rinetd

Fixes:
- don't spawn 2 tasks per connection (one per direction) unless needed
  → do a single task that forwards both directions in one loop (state machine)
- avoid any shared Mutex on hot path (logging/metrics)
- keep per-conn state minimal
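
For the "no shared Mutex on the hot path" item, one common option is to keep metrics in atomic counters so that recording a read never blocks another connection. The sketch below assumes the only metrics are a byte total and a connection count; `ForwardStats` and `simulate` are hypothetical names, and plain threads stand in for connection tasks.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Lock-free counters shared by all connections.
#[derive(Default)]
struct ForwardStats {
    bytes: AtomicU64,
    connections: AtomicU64,
}

impl ForwardStats {
    /// Relaxed ordering is enough: these are statistics, not synchronization.
    fn add_bytes(&self, n: u64) {
        self.bytes.fetch_add(n, Ordering::Relaxed);
    }

    fn connection_opened(&self) {
        self.connections.fetch_add(1, Ordering::Relaxed);
    }

    /// Returns (bytes, connections) as seen at the time of the call.
    fn snapshot(&self) -> (u64, u64) {
        (
            self.bytes.load(Ordering::Relaxed),
            self.connections.load(Ordering::Relaxed),
        )
    }
}

/// Simulates `workers` connections, each recording `chunks` reads of
/// `chunk_len` bytes, and returns the final totals.
fn simulate(workers: usize, chunks: u64, chunk_len: u64) -> (u64, u64) {
    let stats = Arc::new(ForwardStats::default());
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let stats = Arc::clone(&stats);
            thread::spawn(move || {
                stats.connection_opened();
                for _ in 0..chunks {
                    stats.add_bytes(chunk_len);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().expect("worker panicked");
    }
    stats.snapshot()
}
```

Atomics still bounce a cache line between cores under heavy load, so accumulating bytes in a per-connection local and adding them once when the connection closes may be cheaper still.
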
C) If you're single-thread limited
smol can be extremely fast, but if you're effectively running everything on one thread, throughput may cap earlier.
Fix direction:
- move to smol::Executor + N threads (usually num_cpus)
- or run multiple block_on() workers (careful: avoid accept() duplication)
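
One way to read "avoid accept() duplication" is to keep a single accept loop and hand each accepted connection to one of N workers. The sketch below shows only that topology, using std threads and channels rather than smol; in a smol version each worker would presumably run its own `block_on()` loop instead. The function names are hypothetical, and the job type `T` would be the accepted stream in a real forwarder.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread::{self, JoinHandle};

/// Spawns `n` worker threads, each with its own queue.
/// `handle` is called as `handle(worker_index, job)` for every job received.
fn spawn_workers<T, F>(n: usize, handle: F) -> (Vec<Sender<T>>, Vec<JoinHandle<()>>)
where
    T: Send + 'static,
    F: Fn(usize, T) + Send + Clone + 'static,
{
    let mut senders = Vec::with_capacity(n);
    let mut joins = Vec::with_capacity(n);
    for id in 0..n {
        let (tx, rx) = channel::<T>();
        let handle = handle.clone();
        joins.push(thread::spawn(move || {
            // The loop ends once every Sender for this queue is dropped
            for job in rx {
                handle(id, job);
            }
        }));
        senders.push(tx);
    }
    (senders, joins)
}

/// Single dispatcher: job i goes to worker i % N, so no job is handled twice.
/// `senders` must be non-empty.
fn dispatch_round_robin<T, I>(jobs: I, senders: &[Sender<T>])
where
    I: IntoIterator<Item = T>,
{
    for (i, job) in jobs.into_iter().enumerate() {
        // send() fails only if that worker has exited; the job is dropped then
        let _ = senders[i % senders.len()].send(job);
    }
}
```

With a `std::net::TcpListener`, the `jobs` iterator could be `listener.incoming().flatten()`, which keeps `accept()` on one thread while the copying work spreads across the workers.
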