linuxperf (CPUs running Linux)
Introduction
linuxperf is an Adaptyst system module built on top of Linux “perf” with custom patches:
- It samples both on-CPU and off-CPU activity of every spawned thread and process.
- It minimises the risk of broken profiled stacks for programs compiled with frame pointers (i.e. in the case of GCC, with -fno-omit-frame-pointer and -mno-omit-leaf-frame-pointer if available) by automatically detecting inappropriate kernel and CPU configurations.
- It performs cache-aware roofline profiling using the CARM Tool from INESC-ID (only Intel x86-64 at the moment; AMD x86-64 support is coming soon).
- In Adaptyst Analyser, it displays results as a timeline of threads/processes, each with interactive non-time-ordered and time-ordered flame graphs. If roofline profiling has been run, cache-aware roofline plots are also available, with an option to plot specific code segments from flame graphs as points.
- Its main functionality is designed with hardware portability in mind, supporting at least the x86-64, arm64, and RISC-V ISAs.
- It supports custom sampling-based “perf” events for profiling low-level software-hardware interactions.
On-CPU profiling uses perf with the task-clock event. Off-CPU profiling is based on
eBPF-implemented sampling, explained in the diagram below using the example of a single
process with interleaved on-CPU and off-CPU activity. The sampling period is calculated from a
user-provided off-CPU sampling frequency.
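To make the relationship between the frequency and the period concrete, here is a minimal Python sketch that converts an off_cpu_freq value into a period in nanoseconds, including the special values 0 and -1 described in the Options table. The nanosecond unit, the integer division, and the helper name off_cpu_period_ns are assumptions for illustration only; linuxperf's eBPF code may compute the period differently.

```python
from typing import Optional

NS_PER_SECOND = 1_000_000_000


def off_cpu_period_ns(off_cpu_freq: int) -> Optional[int]:
    """Convert an off-CPU sampling frequency (Hz) into a period (ns).

    Hypothetical helper mirroring the documented meaning of off_cpu_freq:
    0 disables off-CPU profiling (None is returned) and -1 captures every
    off-CPU event (a period of 0, i.e. no sampling, is returned).
    """
    if off_cpu_freq == 0:
        return None  # off-CPU profiling disabled
    if off_cpu_freq == -1:
        return 0  # every off-CPU event is captured
    if off_cpu_freq < 0:
        raise ValueError("off_cpu_freq must be -1, 0, or a positive integer")
    return NS_PER_SECOND // off_cpu_freq


if __name__ == "__main__":
    for freq in (1000, 10, 0, -1):
        print(freq, off_cpu_period_ns(freq))
```

With the default off_cpu_freq of 1000, this sketch yields one sampling point every millisecond.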

The module is open-source and licensed under GNU GPL v2 (this GPL version only). The repository can be found on GitHub.
Installation
Requirements
To use linuxperf, you need to satisfy the Adaptyst core requirements and the requirements below:
- Linux 5.8 or newer compiled with:
  - CONFIG_DEBUG_INFO_BTF=y (or equivalent; you can check this by seeing if /sys/kernel/btf exists in your system)
  - CONFIG_FTRACE_SYSCALLS=y (or equivalent; you can check this by seeing if /sys/kernel/tracing/events/syscalls exists in your system and is not empty, but you may need to mount /sys/kernel/tracing first)
  - If you want complete kernel debug symbols, CONFIG_KALLSYMS=y and CONFIG_KALLSYMS_ALL=y (or equivalent) should also be set.
  - Kernel recompilation may NOT be needed! If you have /sys/kernel/btf and /sys/kernel/tracing/events/syscalls as explained above and you don’t care about having kernel debug symbols, you’re already good to go here!
- addr2line (part of binutils, tested with 2.42.0)
- libnuma (if the machine running your profiled application has NUMA, tested with 2.0.19)
- The patched “perf” dependencies:
  - libtraceevent (tested with 1.8.4)
  - libelf (tested with elfutils 0.193)
  - flex (tested with 2.6.4)
  - Bison (tested with 3.8.2)
  - libbpf (tested with 1.5.1)
  - libpython (corresponding to Python 3.7 or newer)
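The kernel requirements above can be checked quickly without root. The Python sketch below (a hypothetical helper, not shipped with linuxperf) parses the running kernel's version, checks for /sys/kernel/btf, and checks that /sys/kernel/tracing/events/syscalls exists and is non-empty. It does not check the CONFIG_KALLSYMS options, and a False result for the syscall tracepoints may simply mean that tracefs is not mounted yet or is not readable by your user.

```python
import os
import platform
import re
from typing import Dict, Tuple


def kernel_version(release: str) -> Tuple[int, int]:
    """Extract (major, minor) from a release string such as '6.8.0-45-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def check_kernel(tracing_dir: str = "/sys/kernel/tracing") -> Dict[str, bool]:
    """Read-only checks for the kernel requirements listed above."""
    syscalls_dir = os.path.join(tracing_dir, "events", "syscalls")
    try:
        # The directory must exist and be non-empty.
        syscalls_ok = len(os.listdir(syscalls_dir)) > 0
    except OSError:
        syscalls_ok = False  # missing, tracefs not mounted, or not readable
    return {
        "linux_5_8_or_newer": kernel_version(platform.release()) >= (5, 8),
        "btf": os.path.exists("/sys/kernel/btf"),
        "syscall_tracepoints": syscalls_ok,
    }


if __name__ == "__main__":
    for requirement, ok in check_kernel().items():
        print("OK     " if ok else "MISSING", requirement)
```

Comparing (major, minor) as integer tuples avoids the classic string-comparison trap where "5.10" would sort before "5.8".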
If you build from source, you also need:
- CMake 3.20 or newer
- GCC (if building with roofline support; otherwise, you can use any compiler you like; tested with 13.3.1)
- pkg-config (tested with 2.4.3)
- Clang older than 20 for building the patched “perf” (it can be removed after installing linuxperf; tested with 19.1.7)
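Similarly, the build tools can be checked before running CMake. The following sketch looks up tools on PATH and parses the output of clang --version to verify that Clang is older than 20. The tool list and the function names are illustrative assumptions, and development headers and libraries (e.g. libbpf, libelf) are not checked.

```python
import re
import shutil
import subprocess
from typing import List, Optional, Sequence

# Tools named in the requirement lists above (illustrative, not exhaustive).
BUILD_TOOLS = ("cmake", "pkg-config", "clang", "flex", "bison", "addr2line")


def missing_tools(tools: Sequence[str] = BUILD_TOOLS) -> List[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def clang_major_version(version_output: str) -> Optional[int]:
    """Parse the major version from 'clang --version' output."""
    match = re.search(r"clang version (\d+)\.", version_output)
    return int(match.group(1)) if match else None


def clang_is_supported() -> bool:
    """True if clang is on PATH and its major version is below 20."""
    if shutil.which("clang") is None:
        return False
    result = subprocess.run(["clang", "--version"], capture_output=True, text=True)
    major = clang_major_version(result.stdout)
    return major is not None and major < 20


if __name__ == "__main__":
    print("Missing tools:", missing_tools() or "none")
    print("Clang < 20 available:", clang_is_supported())
```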
Development headers
If you build from source, development headers of the dependencies above should also be installed! Otherwise, the compilation may fail.
(The headers of addr2line, CMake, Clang, flex, and Bison are not needed.)
The tested dependency versions are a guideline only: linuxperf may compile and run without issues with older versions. However, it is recommended to use the newest versions available for your distribution (or to install from source if the distribution versions don’t solve problems such as compilation errors).
linuxperf uses the patched “perf”, temporarily available at https://gitlab.cern.ch/adaptyst/linux (inside tools/perf). However, you don’t have to download and install it manually: this is handled automatically by the installation scripts (see the “Manually from source” section below). If the patched “perf” actually needs extra dependencies that are not listed above, please let us know.
A profiled program along with its dependencies should be compiled with frame pointers (i.e. in the case of GCC, with the -fno-omit-frame-pointer flag along with -mno-omit-leaf-frame-pointer if available). If you can, it is recommended to have everything in the system compiled with frame pointers (this can be achieved e.g. in Gentoo and Fedora 38+).
Manually from source
Please clone the GitHub repository at the tag of your choice (it’s usually the newest one from here) and then, in a separate directory:
- run cmake <path to your repository> (as either non-root or root, non-root recommended),
- run cmake --build . (as either non-root or root, non-root recommended),
- run cmake --install . (as root, unless you are installing to a non-system module directory).
The patched “perf” is downloaded and set up automatically by the build system (by default, in a way that does not conflict with any existing “perf” installations).
Here are the CMake options you can use/change for linuxperf:
- ROOFLINE: indicates whether cache-aware roofline support should be enabled (default: ON; this requires GCC)
- INSTALL_PATH: indicates the path where linuxperf should be installed (default: the value provided by Adaptyst via ADAPTYST_MODULE_PATH in CMake, usually /opt/adaptyst/modules)
- PERF: indicates whether the patched “perf” should be compiled and installed (default: ON). If this option is set to OFF, no other PERF_* values are considered and the path to a local “perf” installation needs to be set manually via the perf_* module options.
- PERF_REPOSITORY_DIR: indicates the path to a local (not remote) patched “perf” repository which should be used for setting up “perf” (as a reminder, the patched “perf” is available at https://gitlab.cern.ch/adaptyst/linux). If this is set, “perf” will not be downloaded.
- PERF_TAG: indicates the git tag in the Adaptyst patched “perf” repository to be cloned/downloaded when setting up “perf” (default: dev-20250408)
Container image
Container images temporarily unavailable
Due to the new modular release of Adaptyst, container images are currently being redesigned and will be available again soon. Sorry for any inconvenience caused!
Adaptyst Analyser module
The module for Adaptyst Analyser can be found on GitHub. As with all modules, the Adaptyst Analyser part is independent of the Adaptyst one and can be installed via adaptyst-analyser by cloning the repository at the tag of your choice (it’s usually the newest one from here) and running adaptyst-analyser <path to the cloned repository>.
Prerequisites
Before running the module for the first time, you need to set the maximum number of stack entries to be collected by running sysctl kernel.perf_event_max_stack=<value>, where <value> is a number of your choice larger than or equal to 1024. Otherwise, the off-CPU profiling will fail.
Important
Max stack sizes larger than 1024 are currently not supported for off-CPU stacks! The maximum number of entries in off-CPU stacks is always set to 1024, regardless of the value of kernel.perf_event_max_stack.
If your machine has NUMA (non-uniform memory access), you should note that NUMA memory balancing in Linux limits the reliability of obtaining complete stacks across all CPUs / CPU cores. In this case, you must either disable NUMA balancing by running sysctl kernel.numa_balancing=0 or run Adaptyst with linuxperf on a single NUMA memory node.
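The two sysctl settings above can be inspected by reading /proc/sys, since each dotted sysctl name maps to a path there (e.g. kernel.perf_event_max_stack is /proc/sys/kernel/perf_event_max_stack). The sketch below only reports which commands still seem to be needed and does not change anything. Counting NUMA nodes via /sys/devices/system/node is an assumption about how to detect a NUMA machine, and the function names are hypothetical.

```python
from pathlib import Path
from typing import List, Optional

MIN_MAX_STACK = 1024  # minimum value required before the first run


def read_sysctl(name: str, root: str = "/proc/sys") -> Optional[int]:
    """Read an integer sysctl such as 'kernel.perf_event_max_stack' (None if absent)."""
    path = Path(root, *name.split("."))
    if not path.is_file():
        return None
    return int(path.read_text().strip())


def numa_node_count(root: str = "/sys/devices/system/node") -> int:
    """Count the NUMA nodes exposed by the kernel (1 if none are listed)."""
    base = Path(root)
    nodes = list(base.glob("node[0-9]*")) if base.is_dir() else []
    return max(len(nodes), 1)


def prerequisite_problems(max_stack: Optional[int], numa_balancing: Optional[int],
                          numa_nodes: int) -> List[str]:
    """Return the sysctl commands (to run as root) that still seem to be needed."""
    fixes = []
    if max_stack is None or max_stack < MIN_MAX_STACK:
        fixes.append(f"sysctl kernel.perf_event_max_stack={MIN_MAX_STACK}")
    # NUMA balancing only matters on machines with more than one NUMA node.
    if numa_nodes > 1 and numa_balancing:
        fixes.append("sysctl kernel.numa_balancing=0 (or profile on a single node)")
    return fixes


if __name__ == "__main__":
    for fix in prerequisite_problems(
        read_sysctl("kernel.perf_event_max_stack"),
        read_sysctl("kernel.numa_balancing"),
        numa_node_count(),
    ):
        print("Suggested (as root):", fix)
```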
Permissions needed
By default, linuxperf must have root permissions to be used.
However, linuxperf can also be used as non-root as long as all of the requirements below are met:
- The patched “perf” executable has the CAP_PERFMON, CAP_BPF, and CAP_IPC_LOCK capabilities set as permitted and effective (you can do this by running setcap cap_perfmon,cap_bpf,cap_ipc_lock+ep <path to "perf">; the default path is /opt/adaptyst/modules/linuxperf/perf/bin/perf). If you want to see kernel symbols in stack traces, the executable must also have the CAP_SYSLOG capability set as permitted and effective.
- You are part of the tracing group. If it doesn’t exist, you must create it first. The name tracing is arbitrary here; you can give the group any name you want.
- /sys/kernel/tracing is mounted as tracefs with permissions 750 or more lax and with the tracing group as its group:
  - Mount /sys/kernel/tracing in a standard way if it is not mounted yet (i.e. run mount -t tracefs nodev /sys/kernel/tracing).
  - Once /sys/kernel/tracing is mounted in a standard way, remount the directory by running mount -o remount,mode=0750,gid=<GID of the tracing group> /sys/kernel/tracing.
  - If the above doesn’t work, you need to change the group ownership of all contents inside /sys/kernel/tracing by running, for example, chown -R root:tracing /sys/kernel/tracing. You may also need to change file permissions in a similar way by running, for example, chmod -R 750 /sys/kernel/tracing.
  - You can also opt for automating the above in any way you like.
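The tracefs part of the checklist above can be verified from Python with os.stat and the grp module. This sketch assumes the group is called tracing (as noted, the name is arbitrary, so pass your own) and that the mode of the top-level /sys/kernel/tracing directory is what matters. It does not check the capabilities of the patched “perf” executable (getcap <path to "perf"> shows those). The helper names are hypothetical.

```python
import grp
import os
import stat
from typing import List

REQUIRED_BITS = stat.S_IRWXU | stat.S_IRGRP | stat.S_IXGRP  # i.e. 0o750


def mode_at_least_750(mode: int) -> bool:
    """True if every permission bit of 0750 is set (750 or more lax)."""
    return (mode & REQUIRED_BITS) == REQUIRED_BITS


def tracing_dir_problems(path: str = "/sys/kernel/tracing",
                         group: str = "tracing") -> List[str]:
    """Check the tracefs mount against the non-root requirements above."""
    try:
        info = os.stat(path)
    except OSError as exc:
        return [f"cannot stat {path}: {exc}"]
    try:
        gid = grp.getgrnam(group).gr_gid
    except KeyError:
        return [f"group {group!r} does not exist"]
    problems = []
    if info.st_gid != gid:
        problems.append(f"{path} is not owned by group {group!r}")
    mode = stat.S_IMODE(info.st_mode)
    if not mode_at_least_750(mode):
        problems.append(f"{path} has mode {mode:o}, expected 750 or more lax")
    # Group membership changes usually require logging in again.
    if gid != os.getegid() and gid not in os.getgroups():
        problems.append(f"the current user is not in group {group!r}")
    return problems


if __name__ == "__main__":
    print(tracing_dir_problems() or "tracefs looks OK for non-root use")
```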
Docker
If Adaptyst is run in a Docker container, please note the following:
- Your container must have the CAP_PERFMON, CAP_BPF, and CAP_IPC_LOCK capabilities (and optionally CAP_SYSLOG).
- eBPF-based context switch tracing (needed for off-CPU profiling) and system call tracing (needed for tracing threads/processes) may not work out of the box. If this happens, see if running your container with --pid=host helps.
- You may need to mount /sys/kernel/tracing manually, either when creating your container or inside your container.
- A user inside your container must belong to the group which owns /sys/kernel/tracing.
Apptainer/Singularity
If Adaptyst is run in an Apptainer/Singularity container, please note the following:
- Your container must have the CAP_PERFMON, CAP_BPF, and CAP_IPC_LOCK capabilities (and optionally CAP_SYSLOG). Make sure that Apptainer/Singularity supports this in your case (e.g. on AlmaLinux 9, apptainer-suid must be installed in addition to Apptainer alone).
- /sys/kernel/tracing must be mounted in the container, e.g. by bind mounting in Apptainer/Singularity.
Options
| Name | Type | Default value | Explanation |
|---|---|---|---|
| buffer_size | Unsigned integer | 1024 | Internal communication buffer size in bytes. |
| warmup | Unsigned integer | 1 | Warmup time in seconds between all profilers signalling their readiness and notifying the analysed program that it can start. Increase this value if you see missing information after profiling. |
| freq | Unsigned integer | 10 | Sampling frequency per second for on-CPU time profiling. |
| buffer | Unsigned integer | 1 | Buffer up to this number of events before sending data for processing (1 effectively disables buffering). |
| off_cpu_freq | Integer | 1000 | Sampling frequency per second for off-CPU time profiling (0 disables off-CPU profiling, -1 makes linuxperf capture all off-CPU events). |
| off_cpu_buffer | Unsigned integer | 0 | Buffer up to this number of off-CPU events before sending data for processing (0 leaves the default eBPF adaptive buffering, 1 effectively disables buffering). |
| events | Array of strings | [] | Extra perf events to be used for sampling with a given period. Each string must be of the form EVENT,PERIOD,TITLE,UNIT, i.e. take a sample every PERIOD occurrences of EVENT and display the results on the website under the title TITLE with the unit UNIT. Run perf list for the list of possible values for EVENT. |
| filter | One of: deny:<FILE>, allow:<FILE>, python:<FILE>, or an empty string (i.e. not set) | Not set | Set stack trace filtering options. deny:<FILE> cuts all stack elements matching a set of conditions specified in a given text file. allow:<FILE> accepts only stack elements matching a set of conditions specified in a given text file. python:<FILE> sends all stack trace elements to a given Python script for filtering. Unless filter_mark is used, all filtered out elements are deleted completely. See Stack filtering for more details. |
| filter_mark | Boolean | False | When filter is used, mark filtered-out stack elements as “(cut)” and squash any consecutive “(cut)” markers into one rather than deleting them completely. |
| capture_mode | One of: kernel, user, or both | user | Capture only kernel (kernel), only user (i.e. non-kernel, user), or both stack trace types (both). See Kernel and user stacks for more details. |
| perf_path | String | <MOD PATH>/perf | Path to the patched “perf” installation. Change it only if you know what you’re doing. |
| perf_script_path | String | <MOD PATH> | Path to the linuxperf scripts interacting with “perf”. Change it only if you know what you’re doing. |
| roofline* | Unsigned integer | 0 | Also run cache-aware roofline profiling with the specified sampling frequency per second (0 means “do not run roofline profiling”). See Cache-aware roofline analysis for more details. |
| roofline_benchmark_path* | String | Not set | Path to the cache-aware roofline benchmarking results produced by the CARM Tool. Either this option or carm_tool_path must be set if roofline > 0. |
| carm_tool_path* | String | Not set | Path to the CARM Tool cloned repository. Either this option or roofline_benchmark_path must be set if roofline > 0. |
*These options are not available if linuxperf has been compiled without the cache-aware roofline analysis support.
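Because each entry of the events option packs four fields into one string, a malformed entry is an easy mistake. The sketch below splits and validates one entry. Splitting from the right (so that raw PMU event names such as cpu/event=0x3c,umask=0x0/ keep their commas) is this sketch's own choice, as is the hypothetical EventSpec type; linuxperf's actual parsing rules are not documented here, so such names may not be accepted by linuxperf itself. Whether a given event exists is ultimately decided by “perf”.

```python
from typing import NamedTuple


class EventSpec(NamedTuple):
    event: str   # perf event name, e.g. taken from `perf list`
    period: int  # one sample every `period` occurrences of the event
    title: str   # title shown on the website
    unit: str    # unit shown next to the sampled values


def parse_event_spec(spec: str) -> EventSpec:
    """Split one 'events' entry of the form EVENT,PERIOD,TITLE,UNIT."""
    # Split from the right so commas inside EVENT survive.
    # Assumption: TITLE and UNIT contain no commas.
    parts = spec.rsplit(",", 3)
    if len(parts) != 4:
        raise ValueError(f"expected EVENT,PERIOD,TITLE,UNIT, got {spec!r}")
    event, period, title, unit = parts
    try:
        period_value = int(period)
    except ValueError:
        period_value = 0
    if period_value <= 0:
        raise ValueError(f"PERIOD must be a positive integer, got {period!r}")
    return EventSpec(event, period_value, title, unit)


if __name__ == "__main__":
    print(parse_event_spec("cache-misses,10000,Cache misses,misses"))
```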
Stack filtering
For finer control over which parts of stack traces are saved and processed (e.g. into flame graphs), linuxperf offers an option of defining a dedicated filter using either regular expressions or a custom Python script implementing the filtering API.
Regular expressions
A regex-based filter can be configured using a text file. The syntax is as follows:
# Each line can be either a comment starting with #, a regex prefixed
# by "SYM " indicating that symbol names should be queried in stack traces,
# a regex prefixed by "EXEC " indicating that executable paths (e.g.
# shared libraries) should be queried in stack traces, a regex prefixed
# by "ANY " indicating that both symbol names and executable paths should
# be queried, or "OR" separating the condition groups.
#
# A condition group is a set of SYM, EXEC, and ANY statements. The group
# is considered satisfied when *all* statements inside it evaluate to true.
#
# The entire filter evaluates to true when *any* condition group evaluates
# to true.
#
# In the example below, the filter for a stack trace element evaluates to true
# when *either*:
# - the symbol name in the element matches <regex1> AND the executable path
# in the element matches <regex2> AND anything in the element matches <regex3>
# - the executable path in the element matches <regex4>
# - the executable path in the element matches <regex5>
SYM <regex1>
EXEC <regex2>
ANY <regex3>
OR
EXEC <regex4>
OR
EXEC <regex5>
Regular expressions should be written using the Python variant. The filter defined in this way processes every element in a stack trace one by one, so it is not possible to, for example, define conditions based on the contents of several elements at once. This can be done using the Python API instead.
After the filter is defined, it can be supplied to linuxperf via the “filter” option. The filter can serve as either an allowlist (i.e. if the filter evaluates to true for an element, the element is saved, otherwise the element is cut) or a denylist (i.e. if the filter evaluates to true for an element, the element is cut, otherwise the element is saved).
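To illustrate the evaluation rules (statements within a group are ANDed, groups separated by OR are ORed, and ANY matches either field), the following Python sketch parses a filter file and decides whether one stack element is kept in allowlist or denylist mode. It is not linuxperf's implementation: in particular, using re.search (a match anywhere in the string rather than a full match) is an assumption, as are the function names.

```python
import re
from typing import List, Tuple

Group = List[Tuple[str, re.Pattern]]  # one condition group: (kind, regex) pairs


def parse_filter(text: str) -> List[Group]:
    """Parse the SYM/EXEC/ANY/OR filter file format into condition groups."""
    groups: List[Group] = [[]]
    for raw in text.splitlines():
        stripped = raw.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank line or comment
        if stripped == "OR":
            groups.append([])  # start a new condition group
            continue
        kind, _, regex = raw.lstrip().partition(" ")
        if kind not in ("SYM", "EXEC", "ANY"):
            raise ValueError(f"unknown statement: {raw!r}")
        groups[-1].append((kind, re.compile(regex)))
    return [group for group in groups if group]


def matches(groups: List[Group], symbol: str, exec_path: str) -> bool:
    """True if ANY group has ALL of its statements matching the element."""
    def statement_holds(kind: str, pattern: re.Pattern) -> bool:
        sym_hit = pattern.search(symbol) is not None
        exec_hit = pattern.search(exec_path) is not None
        return {"SYM": sym_hit, "EXEC": exec_hit, "ANY": sym_hit or exec_hit}[kind]

    return any(all(statement_holds(k, p) for k, p in group) for group in groups)


def keep_element(groups: List[Group], symbol: str, exec_path: str, mode: str) -> bool:
    """'allow' keeps matching elements; 'deny' cuts them."""
    hit = matches(groups, symbol, exec_path)
    return hit if mode == "allow" else not hit


if __name__ == "__main__":
    rules = parse_filter("SYM ^main$\nEXEC myapp\nOR\nEXEC libc")
    print(keep_element(rules, "malloc", "/lib/libc.so.6", "deny"))
```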
Python API
For more advanced use cases, linuxperf allows running a Python script which is given a full stack trace at once per sample/event for processing.
The script should implement the following global functions:
- setup() (no arguments and no return value): called once at the beginning of profiling by each “perf”-based profiler (linuxperf uses two such profilers by default: one for process/thread tracing and one for on-CPU/off-CPU profiling; additionally, one profiler per custom performance counter is used).
- process(callchain) (callchain is a tuple and the return value is a list): called once per sample/event with an entire stack trace stored in callchain. It must return a list of exactly the same size as callchain containing only True or False values, where the i-th value indicates whether the i-th element of the stack trace should be saved (True) or cut (False).
  - Every element of callchain is of the form ((symbol name, executable path), offset address). The first element is the top of the stack.
  - symbol name is a string and can be either a demangled symbol name if found, an executable path enclosed in [] (e.g. [/lib/libc.so]) if found, or in the worst case, an instruction pointer address in hex enclosed in [] (e.g. [0xFF]).
  - executable path is a string and can be either an executable path if found or an empty string otherwise.
  - offset address is a string and can be either an offset address in hex within the executable pointed to by executable path if found, or an instruction pointer address in hex otherwise.
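Below is a minimal example script for the python:<FILE> filter, following the two functions described above. It shows something the regex filter cannot express: a condition that depends on other elements of the same stack trace. Here, every frame below main (i.e. its callers, such as C runtime start-up code) is cut, as is any frame whose executable path contains ld-linux (the dynamic loader). Both choices are arbitrary examples, not recommendations.

```python
# Example linuxperf filter script for the "python:<FILE>" filter option.
# It cuts the frames below "main" (its callers, e.g. C runtime start-up
# code) and frames from the dynamic loader. These choices are examples only.


def setup():
    # Called once per "perf"-based profiler; nothing to initialise here.
    pass


def process(callchain):
    # callchain[0] is the top of the stack; each element is
    # ((symbol name, executable path), offset address).
    keep = []
    below_main = False
    for (symbol, exec_path), _offset in callchain:
        in_loader = "ld-linux" in exec_path
        keep.append(not below_main and not in_loader)
        if symbol == "main":
            below_main = True  # every later element is a caller of main
    return keep  # same length as callchain, True = save, False = cut
```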
Once your script is defined, it can be supplied to linuxperf via the “filter” option.
Security warning
The Python API allows executing arbitrary Python code! Please exercise extreme caution when asking linuxperf to use scripts from unknown sources for filtering.
Marking deleted elements
By default, when a stack trace element is meant to be deleted, it is removed completely without leaving any traces (e.g. if B is cut from A -> B -> C, only A -> C remains).
However, it is possible to tell linuxperf to mark deleted elements as (cut)
instead, with all consecutive (cut) markers squashed into one (cut),
e.g. if B is cut from A -> B -> B -> C -> B -> D, the resulting trace is
A -> (cut) -> C -> (cut) -> D. To do so, set the “filter_mark” option.
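The difference between deleting and marking can be expressed in a few lines. The sketch below only illustrates the documented behaviour (apply_filter is a hypothetical name, not part of linuxperf) and reproduces the A -> B -> B -> C -> B -> D example with B cut:

```python
from typing import List, Sequence

CUT = "(cut)"


def apply_filter(trace: Sequence[str], keep: Sequence[bool], mark: bool) -> List[str]:
    """Delete cut elements or, if mark is True, replace each run of them with one '(cut)'."""
    result: List[str] = []
    for element, kept in zip(trace, keep):
        if kept:
            result.append(element)
        elif mark and (not result or result[-1] != CUT):
            result.append(CUT)  # first cut element of a run
        # otherwise: deleted outright, or already covered by the previous marker
    return result


trace = ["A", "B", "B", "C", "B", "D"]
keep = [element != "B" for element in trace]
print(" -> ".join(apply_filter(trace, keep, mark=False)))  # A -> C -> D
print(" -> ".join(apply_filter(trace, keep, mark=True)))   # A -> (cut) -> C -> (cut) -> D
```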
Kernel and user stacks
By default, linuxperf collects only stack traces from the user space of a program. However,
there is an option of accessing the kernel space as well and obtaining traces from there.
Please use the “capture_mode” option for this: you can ask linuxperf to extract either user
stacks only (user), kernel stacks only (kernel), or both stack trace types merged into one
(both).
Unless you profile a specialised program involving e.g. device drivers, you shouldn’t need to change where stack traces will be collected from for a given application.
User space vs kernel space
If you are not familiar with the “user space” and “kernel space” terms, please check out this Wikipedia article.
Cache-aware roofline analysis
By default, cache-aware roofline profiling is not performed. If you want to use this feature, this section is for you.
Introduction
Thanks to the integration with the CARM Tool from INESC-ID, linuxperf can automatically perform cache-aware roofline profiling of your program after doing a few initial configuration steps.
As a result, it will be possible to view various roofline graphs in Adaptyst Analyser and plot specific code segments there to quickly check whether they are more memory-bound or compute-bound.
Supported architectures
Currently, only x86-64 Intel CPUs are supported. Adding support for x86-64 AMD CPUs is work in progress.
Initial setup
To get started, clone the CARM Tool repository from https://github.com/champ-hub/carm-roofline and set the “carm_tool_path” option to the path of the cloned repository directory.
Profiling
In order to run cache-aware roofline profiling in addition to standard analysis done by linuxperf, set the “roofline” option to the sampling frequency of roofline-related performance counters (in Hz, sampling is done in the same way as custom “perf” events).
If no roofline benchmarking has been performed for a machine yet, linuxperf will start the CARM Tool automatically and let it perform the necessary tests. This may take a long time, so please be patient. Afterwards, your session will run as usual.
Running roofline benchmarks manually
It is also possible to run the roofline benchmarks manually. In this case, please run the run.py script inside the CARM Tool repository and set the “roofline_benchmark_path” option to the path to the CSV file generated by the CARM Tool.
Normally, the CSV file is produced inside carm_results/roofline in the directory where run.py has been run. The location may differ if you have used custom options in run.py; in that case, please consult the CARM Tool documentation and/or help message.
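If you run the benchmarks several times, it can help to locate the most recent CSV file automatically. The sketch below assumes the default carm_results/roofline layout described above and that the newest file (by modification time) is the one you want; it does not inspect the CSV contents, and newest_carm_csv is a hypothetical helper. The printed path is what you would set “roofline_benchmark_path” to.

```python
from pathlib import Path
from typing import Optional


def newest_carm_csv(run_dir: str = ".") -> Optional[Path]:
    """Return the newest CSV in <run_dir>/carm_results/roofline, or None."""
    results_dir = Path(run_dir, "carm_results", "roofline")
    if not results_dir.is_dir():
        return None
    csv_files = sorted(results_dir.glob("*.csv"), key=lambda p: p.stat().st_mtime)
    return csv_files[-1] if csv_files else None


if __name__ == "__main__":
    csv_path = newest_carm_csv()
    print(csv_path.resolve() if csv_path else "no CARM Tool CSV found")
```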
Note
If you run roofline benchmarks manually, you don’t have to specify the “carm_tool_path” option as long as the “roofline_benchmark_path” one is set.
Features for Adaptyst Analyser
Note
A more interactive/pictorial version of this section is coming soon!
The Adaptyst Analyser part of the module can be set up by following the instructions here.
As suggested by the introduction, the main view of results produced by linuxperf in Adaptyst Analyser is a timeline of threads/processes:
- You can browse the thread/process tree (including expanding and collapsing threads/processes) on the left and see how long each thread/process ran on the right in the form of timeline blocks.
- Each thread/process has a corresponding name, PID, and TID.
- Each block has red and blue parts. Red parts correspond to on-CPU activity, while blue parts correspond to off-CPU activity. Depending on the off-CPU sampling frequency chosen when profiling, not all off-CPU activity may have been captured.
- Right-click a thread/process block to check the exact runtime of the thread/process, the perf-sampled runtime, available linuxperf analysis results (e.g. flame graphs), and the stack trace of a function which spawned the thread/process if available. If the difference between the sampled and exact runtime is significant (the threshold can be adjusted by the user in the module settings), the sampled runtime will be shown in red.
- Click an analysis result of your choice in a thread/process context menu to open it in a new Adaptyst Analyser window.
- For flame graphs, you can change the profiling metric, switch between non-time-ordered and time-ordered graphs, search for a specific phrase (regular expressions are also supported), interact with the graphs themselves (e.g. zoom in/out), and download them as SVG. For performance reasons, blocks corresponding to less than a specific percentage of samples will be collapsed (“(compressed)” will be shown instead; you can click it to expand it). This behaviour can be adjusted in the module settings.
- You can open general analysis results (e.g. roofline plots) by clicking the “General analyses” icon next to the refresh icon above the timeline view.
- When checking the spawning stack trace of a thread/process, you can hover over functions to see the corresponding source code files and line numbers if available. If a function is green, you can also click it to open the source code inside the website, with the spawning line highlighted.
- When checking flame graphs, you can right-click a function block to open the menu, where you can either plot the block on a roofline plot or open the corresponding source code (if available) inside the website, with most-metric-contributing lines highlighted in different shades of red/blue along with an option to hover over line numbers to check the sampled metric values.
- You can use regular expressions for replacing block names in flame graphs. Click the magnifying-glass-like icon next to the search box and follow the instructions (you can enter your regex in the search box beforehand to see its matches if you want). If you want to edit the replacements, right-click the same icon to open the menu with your entered replacements, click the replacement you want to change or remove, and follow the instructions.
Off-CPU timeline sampling
If you have a session with a huge number of off-CPU regions, rendering the timeline may become resource- and time-consuming for a web browser. In this case, you may want to enable off-CPU timeline sampling, which samples captured off-CPU regions similarly to how linuxperf samples off-CPU activity during profiling.
This can be done on the client side by setting the off-CPU timeline display scale to less than 1 in the module settings (the smaller the value is than 1, the larger the sampling period is; the current period is shown when a session is opened).
Troubleshooting
Profiler “…” (perf-record / perf-script) has returned non-zero exit code
If you get an error message similar to the one in the title, please look at the logs saved by Adaptyst.
If the logs mention “can’t access trace events”, permission denied issues, or problems with eBPF, please ensure that the requirements for using linuxperf as non-root are met or run Adaptyst as root. If it doesn’t work or the logs specify a different problem (or don’t say anything), feel free to contact us.
One or more expected symbol maps haven’t been found
If you get a warning message as in the title, you can check whether your profiled program can be configured to emit “perf” symbol maps as documented here.
You should note that the lack of symbol maps is not an error: it just leaves some symbol names unresolved, pointing to the name of an expected map file instead. This does not cause broken stack traces.
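If you are unsure whether your program emitted a map at all, note that “perf” symbol maps conventionally live at /tmp/perf-<PID>.map, with one START SIZE symbolname line per symbol and START/SIZE written in hexadecimal (see the JIT interface notes in the “perf” documentation). The Python sketch below parses such a file and resolves an address, falling back to the [0x...] form used for unresolved addresses elsewhere in this document. The helper names are hypothetical, and the map files linuxperf actually expects are the ones named in its warning.

```python
from pathlib import Path
from typing import List, Tuple

MapEntry = Tuple[int, int, str]  # (start address, size, symbol name)


def perf_map_path(pid: int, tmp_dir: str = "/tmp") -> Path:
    """Conventional location of a "perf" symbol map for a process."""
    return Path(tmp_dir, f"perf-{pid}.map")


def parse_perf_map(text: str) -> List[MapEntry]:
    """Parse 'START SIZE symbolname' lines (START and SIZE in hex)."""
    entries = []
    for line in text.splitlines():
        parts = line.split(" ", 2)  # symbol names may contain spaces
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        start, size, name = parts
        entries.append((int(start, 16), int(size, 16), name))
    return entries


def lookup(entries: List[MapEntry], address: int) -> str:
    """Resolve an address to a symbol name, or '[0x...]' if unknown."""
    for start, size, name in entries:
        if start <= address < start + size:
            return name
    return f"[{address:#x}]"


if __name__ == "__main__":
    import os
    path = perf_map_path(os.getpid())
    print(path, "exists" if path.exists() else "does not exist")
```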
“perf” compilation fails when installing linuxperf
The patched “perf” is an integral part of linuxperf. When you get errors at the stage of setting up “perf” and they point to some dependency missing (e.g. Clang), please install it and try again. If you get complaints about a program/library missing which is not listed in the installation instructions, please install it as well and let us know!
If the solution above doesn’t help, you can contact us. Additionally, if you know what you’re doing, you can change the “perf” compilation and installation settings (e.g. disable it altogether) in CMake (see Advanced CMake options in the installation guide) and, for example, try setting up the patched “perf” manually (the repository is at https://gitlab.cern.ch/adaptyst/linux; go inside tools/perf there).