Latency tuning guidelines
Many of our tutorials and applications depend on reliable, low-latency processing.
This is a rough checklist of steps for creating an image (e.g. baseline_1804_lowlatency) with low, deterministic latency.
Factors affecting latency:
- Maximum stable processor clock speed
  - We cannot depend on boost states (Intel Turbo Boost), as they cannot be maintained under all load conditions or across many cores.
  - It is possible to hold a small number of cores at a high boost clock by disabling the others, but this usually requires BIOS support.
- Number of CPUs
  - Systems with multiple CPUs can introduce latency through communication between them. Single-CPU systems are much simpler to optimize.
  - NUMA issues: with multiple CPUs, you must pay close attention to which memory is allocated to which CPU, as well as to where PCIe devices are attached.
- Choosing a node to use:
  - Based on the factors above, pick a node with a single CPU and the highest available clock speed.
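When comparing candidate nodes, the socket count and the NUMA placement of PCI devices can be read directly from the kernel. The sketch below is one possible check and is not part of the original checklist: `count_sockets` and `pci_numa_nodes` are hypothetical helper names, and it assumes an x86 Linux node whose /proc/cpuinfo carries `physical id` lines.

```shell
#!/bin/sh
# Count distinct CPU packages (sockets) in a cpuinfo-style file.
# The optional file argument (default /proc/cpuinfo) is only there to ease testing.
count_sockets() {
    awk -F: '/^physical id/ { seen[$2] = 1 }
             END { n = 0; for (k in seen) n++; print n }' "${1:-/proc/cpuinfo}"
}

# Print "<pci address> <numa node>" for every PCI device that reports one.
# A value of -1 means the kernel has no NUMA affinity recorded for the device.
pci_numa_nodes() {
    for f in /sys/bus/pci/devices/*/numa_node; do
        [ -r "$f" ] || continue
        printf '%s %s\n' "$(basename "$(dirname "$f")")" "$(cat "$f")"
    done
}

if [ -r /proc/cpuinfo ]; then
    echo "sockets: $(count_sockets)"
fi
pci_numa_nodes
```

A result of `sockets: 1` matches the single-CPU recommendation above; on multi-socket nodes, the second listing shows which node each PCIe device hangs off.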
- Install a low-latency kernel:
apt install linux-lowlatency-hwe-18.04 linux-tools-lowlatency-hwe-18.04
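After rebooting, it is worth confirming that the node actually came up on the new kernel. A minimal sketch, assuming Ubuntu's convention that low-latency kernel releases end in `-lowlatency`; `is_lowlatency_kernel` is a hypothetical helper name:

```shell
#!/bin/sh
# Succeed if the given kernel release (default: the running kernel)
# looks like an Ubuntu lowlatency build.
is_lowlatency_kernel() {
    release="${1:-$(uname -r)}"
    case "$release" in
        *-lowlatency) return 0 ;;
        *)            return 1 ;;
    esac
}

if is_lowlatency_kernel; then
    echo "running lowlatency kernel: $(uname -r)"
else
    echo "not a lowlatency kernel: $(uname -r)"
fi
```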
- Use tuned-adm
  - tuned is a performance optimization project that wraps many configuration methods.
apt install tuned
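The page does not say which tuned profile was applied, so the following is only a sketch under stated assumptions: it builds on the stock `latency-performance` profile and sets the cpu plugin's `force_latency` option. The profile name `lowlatency-custom`, the helper `write_profile`, and the value `1` are illustrative choices, not the original configuration.

```shell
#!/bin/sh
# Write a small custom tuned profile into a profile directory.
# "lowlatency-custom" is a hypothetical profile name; force_latency=1 is an assumed value.
# The optional directory argument is only there so the function can be tried without root.
write_profile() {
    dir="${1:-/etc/tuned/lowlatency-custom}"
    mkdir -p "$dir" || return 1
    cat > "$dir/tuned.conf" <<'EOF'
[main]
summary=Sketch: latency-performance with an explicit latency cap
include=latency-performance

[cpu]
# Request a maximum CPU wake-up latency of 1 microsecond.
force_latency=1
EOF
}

# As root on the target node:
#   write_profile
#   tuned-adm profile lowlatency-custom
#   tuned-adm active
```

`tuned-adm list` shows the available profiles, and `tuned-adm active` confirms which one is currently applied.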
Monitoring tools
- htop
- i7z
- hwloc (to show PCI and NUMA layout)
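Alongside the interactive tools, a quick scripted reading of the current core clocks can be handy when logging results. A minimal sketch, assuming an x86 Linux node whose /proc/cpuinfo exposes `cpu MHz` lines; `avg_mhz` is a hypothetical helper name:

```shell
#!/bin/sh
# Print the mean of all "cpu MHz" entries in a cpuinfo-style file,
# rounded to whole MHz. The optional file argument eases testing.
avg_mhz() {
    awk -F: '/^cpu MHz/ { sum += $2; n++ }
             END { if (n) printf "%.0f\n", sum / n; else print "unknown" }' "${1:-/proc/cpuinfo}"
}

if [ -r /proc/cpuinfo ]; then
    echo "average core clock: $(avg_mhz) MHz"
fi

# For the PCI and NUMA layout, hwloc provides the lstopo command:
#   lstopo
```

Running it while a workload is active gives a rough figure comparable to the MHz values recorded under Results.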
Results
Dell R740 results
transition latency | cstate |
---|---|
1 | 0 |
2 | 1 |
10 | 10 |

tuned latency | cstate | workload | MHz |
---|---|---|---|
1 | 0 | idle | 3300 |
1 | 0 | stress -c 48 | 3300 |
1 | 0 | stress-ng --matrix 48 | 3000 |
1 | 0 | mprime -t | 2300 |
- USRP 2974:
  - C0, 2500 MHz, all cores
  - C0, 2300 MHz, stress-ng --matrix 0
  - C0, 2000 MHz, prime95 AVX
- Dell R740xd, Xeon Gold:
  - C0, 3300 MHz, all cores
  - C0, 3000 MHz, stress-ng --matrix 0
  - C0, 2300 MHz, prime95 AVX-512