== Latency tuning guidelines ==

Many of our tutorials and applications depend on reliable, low-latency processing. This is a rough checklist of steps for creating an image (e.g. baseline_1804_lowlatency) whose latency is both low and deterministic.

Factors affecting this:
 * maximum stable processor clock speed
   * we cannot depend on boost states (Intel Turbo Boost), as they can't be maintained under all load conditions or across many cores.
   * it is possible to hold a small number of cores at a high boost frequency by disabling the others, but this usually requires BIOS support.
 * number of CPUs
   * systems with multiple CPUs can introduce latency from communication between them. Single-CPU systems are much simpler to optimize.
   * NUMA issues: with multiple CPUs, you must pay close attention to which memory is allocated to which CPU, as well as where PCIe devices are attached.

 1. Choose a node:
    * based on the factors above, pick a node with a single CPU and the highest clock speed available.
 1. Install a low-latency kernel:
    * {{{ apt install linux-lowlatency-hwe-18.04 linux-tools-lowlatency-hwe-18.04 }}}
 1. Use tuned-adm:
    * tuned is a performance optimization project that wraps many configuration methods.
    * {{{ apt install tuned }}}

== Monitoring tools ==
 * htop
 * i7z
 * hwloc (to show PCI and NUMA layout)

== Results ==

Dell R740 results:

|| transition latency || C-state ||
|| 1 || 0 ||
|| 2 || 1 ||
|| 10 || 10 ||

|| tuned latency || C-state || workload || clock (MHz) ||
|| 1 || 0 || idle || 3300 ||
|| 1 || 0 || {{{stress -c 48}}} || 3300 ||
|| 1 || 0 || {{{stress-ng --matrix 48}}} || 3000 ||
|| 1 || 0 || {{{mprime -t}}} || 2300 ||

 * USRP 2974:
   * C0, 2500 MHz, all-core
   * C0, 2300 MHz, {{{stress-ng --matrix 0}}}
   * C0, 2000 MHz, Prime95 AVX
 * Dell R740xd, Xeon Gold:
   * C0, 3300 MHz, all-core
   * C0, 3000 MHz, {{{stress-ng --matrix 0}}}
   * C0, 2300 MHz, Prime95 AVX-512
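The kernel step and the MHz figures above can be spot-checked from a shell. The sketch below is an illustration, not the procedure used to collect these results. It assumes a Bash shell, an Ubuntu kernel whose `uname -r` ends in `-lowlatency` when the low-latency flavor is booted, and an x86 host where `/proc/cpuinfo` exposes per-core `cpu MHz` lines (a coarse snapshot; i7z gives a better view of C-state residency). The function names `is_lowlatency_kernel` and `mhz_summary` are hypothetical helpers, not part of any package.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for spot-checking a latency-tuned node (assumptions noted above).

# Succeeds if the given kernel release (default: the running kernel)
# is an Ubuntu low-latency flavor, i.e. ends in "-lowlatency".
is_lowlatency_kernel() {
  local release="${1:-$(uname -r)}"
  case "$release" in
    *-lowlatency) return 0 ;;
    *) return 1 ;;
  esac
}

# Reads /proc/cpuinfo-formatted text on stdin and prints "min avg max"
# of the per-core "cpu MHz" values, rounded to whole MHz.
# Exits with status 1 (and prints nothing) if no "cpu MHz" lines are found.
mhz_summary() {
  awk -F: '
    /^cpu MHz/ {
      v = $2 + 0                       # strip leading space, convert to number
      if (n == 0 || v < min) min = v
      if (n == 0 || v > max) max = v
      sum += v
      n++
    }
    END {
      if (n == 0) exit 1
      printf "%.0f %.0f %.0f\n", min, sum / n, max
    }'
}
```

For example, after rebooting into the new kernel, `is_lowlatency_kernel && echo ok` confirms the flavor, and running `mhz_summary < /proc/cpuinfo` in a second terminal while `stress-ng --matrix 0` is loading the machine gives a quick min/average/max clock reading to compare against the tables above.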