Update: changes buffer from 8KiB to 16 and finally 64, have discovered more things than expect.
This commit is contained in:
@@ -1,193 +1,533 @@
|
|||||||
|
## Testing enviornment setup
|
||||||
## Testing enviornment setup
|
Install tools:
|
||||||
Install tools:
|
|
||||||
|
```bash
|
||||||
```bash
|
sudo apt update
|
||||||
sudo apt update
|
|
||||||
|
sudo apt install -y hyperfine heaptrack valgrind
|
||||||
sudo apt install -y hyperfine heaptrack valgrind
|
|
||||||
|
sudo apt install -y \
|
||||||
sudo apt install -y \
|
build-essential clang lld pkg-config \
|
||||||
build-essential clang lld pkg-config \
|
linux-perf \
|
||||||
linux-perf \
|
iperf3 netperf net-tools \
|
||||||
iperf3 netperf net-tools \
|
tcpdump ethtool iproute2 \
|
||||||
tcpdump ethtool iproute2 \
|
bpftrace bpfcc-tools \
|
||||||
bpftrace bpfcc-tools \
|
strace ltrace \
|
||||||
strace ltrace \
|
sysstat procps \
|
||||||
sysstat procps \
|
git perl
|
||||||
git perl
|
```
|
||||||
```
|
|
||||||
|
|
||||||
|
Install framegraph(not shipped on debian):
|
||||||
Install framegraph(not shipped on debian):
|
|
||||||
|
```bash
|
||||||
```bash
|
git clone https://github.com/brendangregg/FlameGraph ~/FlameGraph
|
||||||
git clone https://github.com/brendangregg/FlameGraph ~/FlameGraph
|
|
||||||
|
echo 'export PATH="$HOME/FlameGraph:$PATH"' >> ~/.bashrc
|
||||||
echo 'export PATH="$HOME/FlameGraph:$PATH"' >> ~/.bashrc
|
|
||||||
|
source ~/.bashrc
|
||||||
source ~/.bashrc
|
|
||||||
|
which flamegraph.pl
|
||||||
which flamegraph.pl
|
```
|
||||||
```
|
|
||||||
|
modify the Cargo.toml of verion 0.1.0:
|
||||||
modify the Cargo.toml of verion 0.1.0:
|
|
||||||
|
```toml
|
||||||
```toml
|
[profile.release]
|
||||||
[profile.release]
|
lto = true
|
||||||
lto = true
|
codegen-units = 1
|
||||||
codegen-units = 1
|
debug = 1
|
||||||
debug = 1
|
strip = "none"
|
||||||
strip = "none"
|
panic = "abort"
|
||||||
panic = "abort"
|
```
|
||||||
```
|
|
||||||
|
|
||||||
|
Build with frame pointers to help profiling:
|
||||||
Build with frame pointers to help profiling:
|
|
||||||
|
```bash
|
||||||
```bash
|
git clone https://github.com/DaZuo0122/oxidinetd.git
|
||||||
git clone https://github.com/DaZuo0122/oxidinetd.git
|
|
||||||
|
RUSTFLAGS="-C force-frame-pointers=yes" cargo build --release
|
||||||
RUSTFLAGS="-C force-frame-pointers=yes" cargo build --release
|
```
|
||||||
```
|
|
||||||
|
|
||||||
|
`profiling.conf`:
|
||||||
`profiling.conf`:
|
|
||||||
|
```yaml
|
||||||
```yaml
|
127.0.0.1 9000 127.0.0.1 9001
|
||||||
127.0.0.1 9000 127.0.0.1 9001
|
```
|
||||||
```
|
|
||||||
|
|
||||||
|
Backend iperf3 server:
|
||||||
Backend iperf3 server:
|
|
||||||
|
```bash
|
||||||
```bash
|
iperf3 -s -p 9001
|
||||||
iperf3 -s -p 9001
|
```
|
||||||
```
|
|
||||||
|
|
||||||
|
forwarder:
|
||||||
forwarder:
|
|
||||||
|
```bash
|
||||||
```bash
|
./oi -c profiling.conf
|
||||||
./oi -c profiling.conf
|
```
|
||||||
```
|
|
||||||
|
|
||||||
|
triggers redirect:
|
||||||
triggers redirect:
|
|
||||||
|
```bash
|
||||||
```bash
|
iperf3 -c 127.0.0.1 -p 9000 -t 30 -P 1
|
||||||
iperf3 -c 127.0.0.1 -p 9000 -t 30 -P 1
|
iperf3 -c 127.0.0.1 -p 9000 -t 30 -P 8
|
||||||
iperf3 -c 127.0.0.1 -p 9000 -t 30 -P 8
|
```
|
||||||
```
|
|
||||||
|
verification:
|
||||||
verification:
|
|
||||||
|
```bash
|
||||||
```bash
|
sudo ss -tnp | egrep '(:9000|:9001)'
|
||||||
sudo ss -tnp | egrep '(:9000|:9001)'
|
```
|
||||||
```
|
|
||||||
|
## Testing
|
||||||
## Testing
|
|
||||||
|
CPU hotspot:
|
||||||
CPU hotspot:
|
|
||||||
|
```bash
|
||||||
```bash
|
sudo perf top -p $(pidof oi)
|
||||||
sudo perf top -p $(pidof oi)
|
```
|
||||||
```
|
|
||||||
|
If you see lots of:
|
||||||
If you see lots of:
|
|
||||||
|
- sys_read, sys_write, __x64_sys_sendto, tcp_sendmsg → syscall/copy overhead
|
||||||
- sys_read, sys_write, __x64_sys_sendto, tcp_sendmsg → syscall/copy overhead
|
|
||||||
|
- futex, __lll_lock_wait → contention/locks
|
||||||
- futex, __lll_lock_wait → contention/locks
|
|
||||||
|
- epoll_wait → executor wake behavior / too many idle polls
|
||||||
- epoll_wait → executor wake behavior / too many idle polls
|
|
||||||
|
|
||||||
|
Hard numbers:
|
||||||
Hard numbers:
|
|
||||||
|
```bash
|
||||||
```bash
|
sudo perf stat -p $(pidof oi) -e \
|
||||||
sudo perf stat -p $(pidof oi) -e \
|
cycles,instructions,cache-misses,branches,branch-misses,context-switches,cpu-migrations \
|
||||||
cycles,instructions,cache-misses,branches,branch-misses,context-switches,cpu-migrations \
|
-- sleep 30
|
||||||
-- sleep 30
|
```
|
||||||
```
|
|
||||||
|
Big differences to watch:
|
||||||
Big differences to watch:
|
|
||||||
|
- context-switches much higher on oi → too many tasks/wakers / lock contention
|
||||||
- context-switches much higher on oi → too many tasks/wakers / lock contention
|
|
||||||
|
- instructions much higher on oi for same throughput → runtime overhead / copies
|
||||||
- instructions much higher on oi for same throughput → runtime overhead / copies
|
|
||||||
|
- cache-misses higher → allocations / poor locality
|
||||||
- cache-misses higher → allocations / poor locality
|
|
||||||
|
|
||||||
|
Flamegraph
|
||||||
Flamegraph
|
Record:
|
||||||
Record:
|
|
||||||
|
```bash
|
||||||
```bash
|
sudo perf record -F 199 -g -p $(pidof oi) -- sleep 30
|
||||||
sudo perf record -F 199 -g -p $(pidof oi) -- sleep 30
|
sudo perf script | stackcollapse-perf.pl | flamegraph.pl > oi.svg
|
||||||
sudo perf script | stackcollapse-perf.pl | flamegraph.pl > oi.svg
|
```
|
||||||
```
|
|
||||||
|
If the stack looks “flat / missing” (common with async + LTO), use dwarf unwinding:
|
||||||
If the stack looks “flat / missing” (common with async + LTO), use dwarf unwinding:
|
|
||||||
|
```bash
|
||||||
```bash
|
sudo perf record -F 199 --call-graph dwarf,16384 -p $(pidof oi) -- sleep 30
|
||||||
sudo perf record -F 199 --call-graph dwarf,16384 -p $(pidof oi) -- sleep 30
|
sudo perf script | stackcollapse-perf.pl | flamegraph.pl > oi.svg
|
||||||
sudo perf script | stackcollapse-perf.pl | flamegraph.pl > oi.svg
|
```
|
||||||
```
|
|
||||||
|
|
||||||
|
syscall-cost check:
|
||||||
syscall-cost check:
|
|
||||||
|
```bash
|
||||||
```bash
|
sudo strace -ff -c -p $(pidof oi) -o /tmp/oi.strace
|
||||||
sudo strace -ff -c -p $(pidof oi) -o /tmp/oi.strace
|
# run 15–30s under load, then Ctrl+C
|
||||||
# run 15–30s under load, then Ctrl+C
|
tail -n +1 /tmp/oi.strace.*
|
||||||
tail -n +1 /tmp/oi.strace.*
|
```
|
||||||
```
|
|
||||||
|
If you see huge % time in read/write/sendmsg/recvmsg, you’re dominated by copying + syscalls.
|
||||||
If you see huge % time in read/write/sendmsg/recvmsg, you’re dominated by copying + syscalls.
|
|
||||||
|
|
||||||
|
ebpf stuffs
|
||||||
ebpf stuffs
|
|
||||||
|
--skipped--
|
||||||
--skipped--
|
|
||||||
|
|
||||||
|
Smol-focused bottlenecks + the “fix list”
|
||||||
Smol-focused bottlenecks + the “fix list”
|
A) If you’re syscall/copy bound
|
||||||
A) If you’re syscall/copy bound
|
|
||||||
|
Best improvement candidates:
|
||||||
Best improvement candidates:
|
|
||||||
|
buffer reuse (no per-loop Vec allocation)
|
||||||
buffer reuse (no per-loop Vec allocation)
|
|
||||||
|
reduce tiny writes (coalesce)
|
||||||
reduce tiny writes (coalesce)
|
|
||||||
|
zero-copy splice (Linux-only, biggest win but more complex)
|
||||||
zero-copy splice (Linux-only, biggest win but more complex)
|
|
||||||
|
For Linux zero-copy, you’d implement a splice(2)-based fast path (socket→pipe→socket). That’s how high-performance forwarders avoid double-copy.
|
||||||
For Linux zero-copy, you’d implement a splice(2)-based fast path (socket→pipe→socket). That’s how high-performance forwarders avoid double-copy.
|
|
||||||
|
B) If you’re executor/waker bound (common for async forwarders)
|
||||||
B) If you’re executor/waker bound (common for async forwarders)
|
|
||||||
|
Symptoms:
|
||||||
Symptoms:
|
|
||||||
|
perf shows a lot of runtime / wake / scheduling
|
||||||
perf shows a lot of runtime / wake / scheduling
|
|
||||||
|
perf stat shows more context switches than rinetd
|
||||||
perf stat shows more context switches than rinetd
|
|
||||||
|
Fixes:
|
||||||
Fixes:
|
|
||||||
|
don’t spawn 2 tasks per connection (one per direction) unless needed
|
||||||
don’t spawn 2 tasks per connection (one per direction) unless needed
|
→ do a single task that forwards both directions in one loop (state machine)
|
||||||
→ do a single task that forwards both directions in one loop (state machine)
|
|
||||||
|
avoid any shared Mutex on hot path (logging/metrics)
|
||||||
avoid any shared Mutex on hot path (logging/metrics)
|
|
||||||
|
keep per-conn state minimal
|
||||||
keep per-conn state minimal
|
|
||||||
|
C) If you’re single-thread limited
|
||||||
C) If you’re single-thread limited
|
|
||||||
|
smol can be extremely fast, but if you’re effectively running everything on one thread, throughput may cap earlier.
|
||||||
smol can be extremely fast, but if you’re effectively running everything on one thread, throughput may cap earlier.
|
|
||||||
|
Fix direction:
|
||||||
Fix direction:
|
|
||||||
|
move to smol::Executor + N threads (usually num_cpus)
|
||||||
move to smol::Executor + N threads (usually num_cpus)
|
|
||||||
|
or run multiple block_on() workers (careful: avoid accept() duplication)
|
||||||
or run multiple block_on() workers (careful: avoid accept() duplication)
|
|
||||||
|
|
||||||
|
## outcome oi
|
||||||
|
|
||||||
|
### CPU hotspot
|
||||||
|
testing commands:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
iperf3 -c 127.0.0.1 -p 9000 -t 30 -P 1
|
||||||
|
|
||||||
|
sudo perf stat -p $(pidof oi) -e \
|
||||||
|
cycles,instructions,cache-misses,branches,branch-misses,context-switches,cpu-migrations \
|
||||||
|
-- sleep 30
|
||||||
|
```
|
||||||
|
|
||||||
|
perf report:
|
||||||
|
|
||||||
|
```text
|
||||||
|
Performance counter stats for process id '207279':
|
||||||
|
|
||||||
|
98,571,874,480 cpu_atom/cycles/ (0.10%)
|
||||||
|
134,732,064,800 cpu_core/cycles/ (99.90%)
|
||||||
|
75,889,748,906 cpu_atom/instructions/ # 0.77 insn per cycle (0.10%)
|
||||||
|
159,098,987,713 cpu_core/instructions/ # 1.18 insn per cycle (99.90%)
|
||||||
|
30,443,258 cpu_atom/cache-misses/ (0.10%)
|
||||||
|
3,155,528 cpu_core/cache-misses/ (99.90%)
|
||||||
|
15,003,063,317 cpu_atom/branches/ (0.10%)
|
||||||
|
31,479,765,962 cpu_core/branches/ (99.90%)
|
||||||
|
149,091,165 cpu_atom/branch-misses/ # 0.99% of all branches (0.10%)
|
||||||
|
195,562,861 cpu_core/branch-misses/ # 0.62% of all branches (99.90%)
|
||||||
|
1,138 context-switches
|
||||||
|
37 cpu-migrations
|
||||||
|
|
||||||
|
33.004738330 seconds time elapsed
|
||||||
|
```
|
||||||
|
|
||||||
|
|
||||||
|
### FlameGraph
|
||||||
|
testing commands:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
sudo perf record -F 199 -g -p $(pidof oi) -- sleep 30
|
||||||
|
sudo perf script | stackcollapse-perf.pl | flamegraph.pl > oi.svg
|
||||||
|
```
|
||||||
|
|
||||||
|
outcome:
|
||||||
|
|
||||||
|
oi.svg
|
||||||
|
|
||||||
|
commands:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
sudo perf record -F 199 --call-graph dwarf,16384 -p $(pidof oi) -- sleep 30
|
||||||
|
sudo perf script | stackcollapse-perf.pl | flamegraph.pl > oi_dwarf.svg
|
||||||
|
```
|
||||||
|
|
||||||
|
outcome:
|
||||||
|
|
||||||
|
oi_dwarf.svg
|
||||||
|
|
||||||
|
|
||||||
|
### syscall-cost check
|
||||||
|
|
||||||
|
```bash
|
||||||
|
sudo strace -ff -C -p $(pidof oi) -o /tmp/oi.strace
|
||||||
|
# run 15–30s under load, then Ctrl+C
|
||||||
|
tail -n +1 /tmp/oi.strace.*
|
||||||
|
```
|
||||||
|
|
||||||
|
|
||||||
|
## More real setup
|
||||||
|
|
||||||
|
traffic goes through real kernel routing + 2 TCP legs
|
||||||
|
|
||||||
|
|
||||||
|
Create namespaces + veth links:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
sudo ip netns add ns_client
|
||||||
|
sudo ip netns add ns_server
|
||||||
|
|
||||||
|
sudo ip link add veth_c type veth peer name veth_c_ns
|
||||||
|
sudo ip link set veth_c_ns netns ns_client
|
||||||
|
|
||||||
|
sudo ip link add veth_s type veth peer name veth_s_ns
|
||||||
|
sudo ip link set veth_s_ns netns ns_server
|
||||||
|
|
||||||
|
sudo ip addr add 10.0.1.1/24 dev veth_c
|
||||||
|
sudo ip link set veth_c up
|
||||||
|
sudo ip addr add 10.0.0.1/24 dev veth_s
|
||||||
|
sudo ip link set veth_s up
|
||||||
|
|
||||||
|
sudo ip netns exec ns_client ip addr add 10.0.1.2/24 dev veth_c_ns
|
||||||
|
sudo ip netns exec ns_client ip link set veth_c_ns up
|
||||||
|
sudo ip netns exec ns_client ip link set lo up
|
||||||
|
|
||||||
|
sudo ip netns exec ns_server ip addr add 10.0.0.2/24 dev veth_s_ns
|
||||||
|
sudo ip netns exec ns_server ip link set veth_s_ns up
|
||||||
|
sudo ip netns exec ns_server ip link set lo up
|
||||||
|
|
||||||
|
sudo sysctl -w net.ipv4.ip_forward=1
|
||||||
|
```
|
||||||
|
|
||||||
|
|
||||||
|
Config to force redirect path:
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
10.0.1.1 9000 10.0.0.2 9001
|
||||||
|
```
|
||||||
|
|
||||||
|
|
||||||
|
Start backend server in ns_server:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
sudo ip netns exec ns_server iperf3 -s -p 9001
|
||||||
|
```
|
||||||
|
|
||||||
|
|
||||||
|
Run client in ns_client → forwarder → backend:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
sudo ip netns exec ns_client iperf3 -c 10.0.1.1 -p 9000 -t 30 -P 8
|
||||||
|
```
|
||||||
|
|
||||||
|
perf report:
|
||||||
|
|
||||||
|
```text
|
||||||
|
sudo perf stat -p $(pidof oi) -e cycles,instructions,cache-misses,branches,branch-misses,context-switches,cpu-migrations -- sleep 33
|
||||||
|
|
||||||
|
Performance counter stats for process id '209785':
|
||||||
|
|
||||||
|
113,810,599,893 cpu_atom/cycles/ (0.11%)
|
||||||
|
164,681,878,450 cpu_core/cycles/ (99.89%)
|
||||||
|
102,575,167,734 cpu_atom/instructions/ # 0.90 insn per cycle (0.11%)
|
||||||
|
237,094,207,911 cpu_core/instructions/ # 1.44 insn per cycle (99.89%)
|
||||||
|
33,093,338 cpu_atom/cache-misses/ (0.11%)
|
||||||
|
5,381,441 cpu_core/cache-misses/ (99.89%)
|
||||||
|
20,012,975,873 cpu_atom/branches/ (0.11%)
|
||||||
|
46,120,077,111 cpu_core/branches/ (99.89%)
|
||||||
|
211,767,555 cpu_atom/branch-misses/ # 1.06% of all branches (0.11%)
|
||||||
|
245,969,685 cpu_core/branch-misses/ # 0.53% of all branches (99.89%)
|
||||||
|
1,686 context-switches
|
||||||
|
150 cpu-migrations
|
||||||
|
|
||||||
|
33.004363800 seconds time elapsed
|
||||||
|
```
|
||||||
|
|
||||||
|
flamegraph
|
||||||
|
|
||||||
|
|
||||||
|
### Add latency + small-packet tests
|
||||||
|
|
||||||
|
netperf (request/response)
|
||||||
|
|
||||||
|
Start netserver in backend namespace:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
sudo ip netns exec ns_server netserver -p 9001
|
||||||
|
```
|
||||||
|
|
||||||
|
Run TCP_RR against forwarded port:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
sudo ip netns exec ns_client netperf -H 10.0.1.1 -p 9000 -t TCP_RR -l 30 -- -r 32,32
|
||||||
|
```
|
||||||
|
|
||||||
|
## After opt
|
||||||
|
|
||||||
|
Here, we changed future_lite::io 8KiB buffer to a customized 16KiB buffer. (To avoid conflict, I changed binary name to oiopt).
|
||||||
|
|
||||||
|
```rust
|
||||||
|
async fn pump(mut r: TcpStream, mut w: TcpStream) -> io::Result<u64> {
|
||||||
|
// let's try 16KiB instead of future_lite::io 8KiB
|
||||||
|
// and do a profiling to see the outcome
|
||||||
|
let mut buf = vec![0u8; 16 * 1024];
|
||||||
|
let mut total = 0u64;
|
||||||
|
|
||||||
|
loop {
|
||||||
|
let n = r.read(&mut buf).await?;
|
||||||
|
if n == 0 {
|
||||||
|
// EOF: send FIN to peer
|
||||||
|
let _ = w.shutdown(Shutdown::Write);
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
w.write_all(&buf[0..n]).await?;
|
||||||
|
total += n as u64;
|
||||||
|
}
|
||||||
|
|
||||||
|
Ok(total)
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
// And change the function call in handle_tcp_connection
|
||||||
|
let client_to_server = pump(client_stream.clone(), server_stream.clone());
|
||||||
|
let server_to_client = pump(server_stream, client_stream);
|
||||||
|
```
|
||||||
|
|
||||||
|
|
||||||
|
### outcomes
|
||||||
|
|
||||||
|
Still with `sudo ip netns exec ns_client iperf3 -c 10.0.1.1 -p 9000 -t 30 -P 8`
|
||||||
|
|
||||||
|
|
||||||
|
perf stat:
|
||||||
|
|
||||||
|
```text
|
||||||
|
sudo perf stat -p $(pidof oiopt) -e cycles,instructions,cache-misses,branches,branch-misses,context-switches,cpu-migrations -- sleep 33
|
||||||
|
|
||||||
|
Performance counter stats for process id '883435':
|
||||||
|
|
||||||
|
118,960,667,431 cpu_atom/cycles/ (0.05%)
|
||||||
|
131,934,369,110 cpu_core/cycles/ (99.95%)
|
||||||
|
100,530,466,140 cpu_atom/instructions/ # 0.85 insn per cycle (0.05%)
|
||||||
|
185,203,788,299 cpu_core/instructions/ # 1.40 insn per cycle (99.95%)
|
||||||
|
11,027,490 cpu_atom/cache-misses/ (0.05%)
|
||||||
|
2,123,369 cpu_core/cache-misses/ (99.95%)
|
||||||
|
19,641,945,774 cpu_atom/branches/ (0.05%)
|
||||||
|
36,245,438,057 cpu_core/branches/ (99.95%)
|
||||||
|
214,098,497 cpu_atom/branch-misses/ # 1.09% of all branches (0.05%)
|
||||||
|
179,848,095 cpu_core/branch-misses/ # 0.50% of all branches (99.95%)
|
||||||
|
2,308 context-switches
|
||||||
|
31 cpu-migrations
|
||||||
|
|
||||||
|
33.004555878 seconds time elapsed
|
||||||
|
|
||||||
|
```
|
||||||
|
|
||||||
|
system call check:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
sudo timeout 30s strace -c -f -p $(pidof oiopt)
|
||||||
|
```
|
||||||
|
|
||||||
|
output:
|
||||||
|
|
||||||
|
```text
|
||||||
|
strace: Process 883435 attached with 4 threads
|
||||||
|
strace: Process 883438 detached
|
||||||
|
strace: Process 883437 detached
|
||||||
|
strace: Process 883436 detached
|
||||||
|
strace: Process 883435 detached
|
||||||
|
% time seconds usecs/call calls errors syscall
|
||||||
|
------ ----------- ----------- --------- --------- ----------------
|
||||||
|
57.80 14.590016 442121 33 epoll_wait
|
||||||
|
28.84 7.279883 4 1771146 sendto
|
||||||
|
13.33 3.363882 1 1771212 48 recvfrom
|
||||||
|
0.02 0.003843 61 62 44 futex
|
||||||
|
0.01 0.001947 12 159 epoll_ctl
|
||||||
|
0.00 0.000894 99 9 9 connect
|
||||||
|
0.00 0.000620 34 18 9 accept4
|
||||||
|
0.00 0.000503 14 34 timerfd_settime
|
||||||
|
0.00 0.000446 13 33 33 read
|
||||||
|
0.00 0.000271 15 18 ioctl
|
||||||
|
0.00 0.000189 21 9 write
|
||||||
|
0.00 0.000176 19 9 socket
|
||||||
|
0.00 0.000099 11 9 getsockopt
|
||||||
|
0.00 0.000079 4 18 shutdown
|
||||||
|
0.00 0.000049 2 18 close
|
||||||
|
------ ----------- ----------- --------- --------- ----------------
|
||||||
|
100.00 25.242897 7 3542787 143 total
|
||||||
|
|
||||||
|
```
|
||||||
|
|
||||||
|
## Further tests to explain why this huge
|
||||||
|
|
||||||
|
Changed 16KiB buffer to 64KiB, and named the binary to oiopt64
|
||||||
|
|
||||||
|
iperf3 throughput under `-P 8`, highest 54.1Gbits/sec, other threads are much higher than before(16KiB buffer)
|
||||||
|
|
||||||
|
perf stat:
|
||||||
|
|
||||||
|
```text
|
||||||
|
sudo perf stat -p $(pidof oiopt64) -e cycles,instructions,cache-misses,branches,branch-misses,context-switches,cpu-migrations -- sleep 33
|
||||||
|
|
||||||
|
Performance counter stats for process id '893123':
|
||||||
|
|
||||||
|
120,859,810,675 cpu_atom/cycles/ (0.15%)
|
||||||
|
134,735,934,329 cpu_core/cycles/ (99.85%)
|
||||||
|
79,946,979,880 cpu_atom/instructions/ # 0.66 insn per cycle (0.15%)
|
||||||
|
127,036,644,759 cpu_core/instructions/ # 0.94 insn per cycle (99.85%)
|
||||||
|
24,713,474 cpu_atom/cache-misses/ (0.15%)
|
||||||
|
9,604,449 cpu_core/cache-misses/ (99.85%)
|
||||||
|
15,584,074,530 cpu_atom/branches/ (0.15%)
|
||||||
|
24,796,180,117 cpu_core/branches/ (99.85%)
|
||||||
|
175,778,825 cpu_atom/branch-misses/ # 1.13% of all branches (0.15%)
|
||||||
|
135,067,353 cpu_core/branch-misses/ # 0.54% of all branches (99.85%)
|
||||||
|
1,519 context-switches
|
||||||
|
50 cpu-migrations
|
||||||
|
|
||||||
|
33.006529572 seconds time elapsed
|
||||||
|
```
|
||||||
|
|
||||||
|
system call check:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
sudo timeout 30s strace -c -f -p $(pidof oiopt64)
|
||||||
|
```
|
||||||
|
|
||||||
|
output:
|
||||||
|
|
||||||
|
```text
|
||||||
|
strace: Process 893123 attached with 4 threads
|
||||||
|
strace: Process 893126 detached
|
||||||
|
strace: Process 893125 detached
|
||||||
|
strace: Process 893124 detached
|
||||||
|
strace: Process 893123 detached
|
||||||
|
% time seconds usecs/call calls errors syscall
|
||||||
|
------ ----------- ----------- --------- --------- ----------------
|
||||||
|
54.56 18.079500 463576 39 epoll_wait
|
||||||
|
27.91 9.249443 7 1294854 2 sendto
|
||||||
|
17.49 5.796927 4 1294919 51 recvfrom
|
||||||
|
0.01 0.003778 50 75 49 futex
|
||||||
|
0.01 0.002188 12 175 epoll_ctl
|
||||||
|
0.00 0.000747 83 9 9 connect
|
||||||
|
0.00 0.000714 17 40 timerfd_settime
|
||||||
|
0.00 0.000510 13 39 38 read
|
||||||
|
0.00 0.000452 25 18 9 accept4
|
||||||
|
0.00 0.000310 17 18 ioctl
|
||||||
|
0.00 0.000232 23 10 write
|
||||||
|
0.00 0.000200 22 9 socket
|
||||||
|
0.00 0.000183 20 9 getsockopt
|
||||||
|
0.00 0.000100 5 18 shutdown
|
||||||
|
0.00 0.000053 2 18 close
|
||||||
|
0.00 0.000020 20 1 mprotect
|
||||||
|
0.00 0.000015 15 1 sched_yield
|
||||||
|
0.00 0.000005 5 1 madvise
|
||||||
|
------ ----------- ----------- --------- --------- ----------------
|
||||||
|
100.00 33.135377 12 2590253 158 total
|
||||||
|
```
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
### Cleanup:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
sudo ip netns del ns_client
|
||||||
|
sudo ip netns del ns_server
|
||||||
|
sudo ip link del veth_c
|
||||||
|
sudo ip link del veth_s
|
||||||
|
```
|
||||||
Reference in New Issue
Block a user