
Latency tuning guidelines

Many of our tutorials and applications depend on reliable, low-latency processing.

This is a rough checklist of steps for creating an image (e.g. baseline_1804_lowlatency) with minimal, deterministic latency.

Factors affecting latency:

  • maximum stable processor clock speed
    • We cannot depend on boost states (Intel Turbo Boost), as they cannot be maintained under all load conditions or across many cores.
    • It is possible to hold a small number of cores at a high boost clock by disabling the others, but this usually requires BIOS support.
  • number of CPUs
    • Systems with multiple CPUs can add latency through communication between them. Single-CPU systems are much simpler to optimize.
    • NUMA issues: with multiple CPUs, you must pay close attention to which memory is allocated to which CPU, as well as where PCIe devices are attached.
  1. Choose a node:
    • Based on the factors above, pick a node with a single CPU and the highest clock speed available.
  2. Install a low-latency kernel:
    • apt install linux-lowlatency-hwe-18.04 linux-tools-lowlatency-hwe-18.04
  3. Use tuned-adm:
    • tuned is a performance optimization project that wraps many configuration methods.
    • apt install tuned
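
Step 3 can be sketched as a small custom tuned profile. This is an illustration under assumptions, not the configuration used for the image: the profile name lowlatency-sketch and the helper write_tuned_profile are hypothetical, latency-performance is assumed as the parent profile, and the default force_latency of 1 simply mirrors the "tuned latency" value in the results on this page.

```shell
# Sketch: write a custom tuned profile that inherits from latency-performance
# and sets the cpu plugin's force_latency option explicitly.
# Assumptions (not from the original page): the profile name
# "lowlatency-sketch" and latency-performance as the parent profile.

# write_tuned_profile PROFILE_ROOT [FORCE_LATENCY]
#   PROFILE_ROOT   directory holding tuned profiles (/etc/tuned on a real system)
#   FORCE_LATENCY  value for force_latency (default 1)
write_tuned_profile() {
    profile_root=$1
    force_latency=${2:-1}
    profile_dir="$profile_root/lowlatency-sketch"

    mkdir -p "$profile_dir" || return 1
    cat > "$profile_dir/tuned.conf" <<EOF
[main]
summary=Sketch: latency-performance with an explicit force_latency
include=latency-performance

[cpu]
force_latency=$force_latency
EOF
}

# Usage on a real system (as root):
#   write_tuned_profile /etc/tuned 1
#   tuned-adm profile lowlatency-sketch
#   tuned-adm active      # confirm which profile is applied
```

Keeping the profile in its own directory leaves the stock profiles untouched, so switching back is a single tuned-adm profile call.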

Monitoring tools

  • htop
  • i7z
  • hwloc (to show PCI and NUMA layout)
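
Alongside these tools, the two factors listed earlier (number of CPUs and sustained clock speed) can be checked with a quick script. The following is a minimal sketch, assuming an x86 Linux system whose /proc/cpuinfo contains "physical id" and "cpu MHz" lines; the helper names count_sockets and mhz_range are hypothetical.

```shell
# Sketch: summarize socket count and current core clocks from /proc/cpuinfo.
# Both helpers take an optional file argument so they can be run against a
# saved copy of cpuinfo from another node.

# count_sockets [CPUINFO] -> number of distinct "physical id" values
count_sockets() {
    awk -F': *' '/^physical id/ { seen[$2] = 1 }
                 END { n = 0; for (id in seen) n++; print n }' \
        "${1:-/proc/cpuinfo}"
}

# mhz_range [CPUINFO] -> "min max" of the "cpu MHz" readings, rounded to MHz
mhz_range() {
    awk -F': *' '/^cpu MHz/ {
                     v = $2 + 0
                     if (n == 0 || v < min) min = v
                     if (n == 0 || v > max) max = v
                     n++
                 }
                 END { if (n) printf "%.0f %.0f\n", min, max }' \
        "${1:-/proc/cpuinfo}"
}

# Usage:
#   count_sockets     # 1 on a single-CPU node
#   mhz_range         # run while a load test is active to see sustained clocks
```

A wide gap between the two values printed by mhz_range under load suggests that some cores are not holding the same clock.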

Results

Dell R740 results

transition latency   cstate
1                    0
2                    1
10                   10
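
The idle states a given machine exposes, and the exit latency the kernel reports for each, can be read from the Linux cpuidle sysfs tree. A minimal sketch, assuming the standard layout under /sys/devices/system/cpu/cpu0/cpuidle, where each stateN directory has a name file and a latency file (exit latency in microseconds); the helper name list_cstates is hypothetical.

```shell
# Sketch: list the idle states the kernel exposes for one CPU.

# list_cstates [CPUIDLE_DIR]
#   Prints one line per idle state: index, name, exit latency in microseconds.
list_cstates() {
    cpuidle_dir=${1:-/sys/devices/system/cpu/cpu0/cpuidle}
    for state in "$cpuidle_dir"/state[0-9]*; do
        # Skip the unexpanded glob when the directory has no states.
        [ -r "$state/name" ] || continue
        printf '%s %s %s\n' "${state##*/state}" \
            "$(cat "$state/name")" "$(cat "$state/latency")"
    done
}

# Usage:
#   list_cstates                                       # states for cpu0
#   list_cstates /sys/devices/system/cpu/cpu5/cpuidle  # states for another core
```

Comparing this listing with the latency limit configured through tuned shows which states remain eligible, since the kernel only selects idle states whose exit latency does not exceed the active limit.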
tuned latency   cstate   workload             MHz
1               0        idle                 3300
1               0        stress -c 48         3300
1               0        stress --matrix 48   3000
1               0        mprime -t            2300
  • USRP 2974:
    • C0 2500 MHz all core
    • C0 2300 MHz stress --matrix 0
    • C0 2000 MHz prime95 AVX
  • Dell R740xd, Xeon Gold:
    • 3300 MHz C0 all core
    • 3000 MHz C0 stress-ng --matrix 0
    • 2300 MHz C0 p95 AVX-512
Last modified on May 12, 2020, 12:31:53 AM