Running Adaptyst

Prerequisites

Before running Adaptyst for the first time, set the maximum number of stack entries to be collected by running sysctl kernel.perf_event_max_stack=<value>, where <value> is a number of your choice greater than or equal to 1024. Otherwise, off-CPU profiling will fail.

Important

Max stack sizes larger than 1024 are currently not supported for off-CPU stacks! The maximum number of entries in off-CPU stacks is always set to 1024, regardless of the value of kernel.perf_event_max_stack.

If your machine has NUMA (non-uniform memory access), note that NUMA memory balancing in Linux makes it less reliable to obtain complete stacks across all CPUs / CPU cores. In this case, you must either disable NUMA balancing by running sysctl kernel.numa_balancing=0 or run Adaptyst on a single NUMA node.
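Both prerequisites can be checked before a run. The sketch below is a minimal example, assuming a Linux system that exposes sysctl values under /proc/sys. The names stack_limit_ok, check_prerequisites, and PROC_SYS are illustrative and not part of Adaptyst.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check; not part of Adaptyst.
# PROC_SYS can be pointed elsewhere for testing (illustrative assumption).
PROC_SYS="${PROC_SYS:-/proc/sys}"

# Succeeds if the given value meets the documented minimum of 1024.
stack_limit_ok() {
    [ "$1" -ge 1024 ] 2>/dev/null
}

# Prints one sysctl command for each prerequisite that is not met.
check_prerequisites() {
    local max_stack numa
    max_stack="$(cat "$PROC_SYS/kernel/perf_event_max_stack" 2>/dev/null || echo 0)"
    if ! stack_limit_ok "$max_stack"; then
        echo "sysctl kernel.perf_event_max_stack=1024"
    fi
    # The numa_balancing entry is absent on kernels without NUMA balancing.
    if [ -r "$PROC_SYS/kernel/numa_balancing" ]; then
        numa="$(cat "$PROC_SYS/kernel/numa_balancing")"
        if [ "$numa" != "0" ]; then
            echo "sysctl kernel.numa_balancing=0"
        fi
    fi
}
```

Calling check_prerequisites prints nothing when both settings are already in place. As an alternative to disabling NUMA balancing, a run can be pinned to one node with numactl --cpunodebind=0 --membind=0 adaptyst -- <command>, assuming numactl is installed.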

Configuration files

If you haven’t provided a custom config file path when installing Adaptyst, the system-wide configuration file is /etc/adaptyst.conf. To provide your own local settings, create .adaptyst/adaptyst.conf inside your home directory: values in this local file override the system-wide ones, and the file can also define extra values. Both files follow exactly the same syntax.

If you want to use a different path for the system-wide config and/or local config at runtime, you can set the ADAPTYST_CONFIG and ADAPTYST_LOCAL_CONFIG environment variables respectively. Additionally, Adaptyst calls its Python scripts from inside /opt/adaptyst by default, but this path can also be changed at runtime by setting the ADAPTYST_SCRIPT_DIR environment variable. Most of these paths can also be changed permanently when compiling Adaptyst from source; see the installation guide.

The currently-supported settings in the configuration files are shown below:

Config file format with supported settings
# The local path to an installation directory of the Adaptyst-patched "perf"
# (with bin etc. directories inside). In case of no changes to the installation
# options, this is /opt/adaptyst/perf by default.
perf_path=<path>

# The local path to the CARM Tool repository. See "Cache-aware
# roofline profiling".
carm_tool_path=<path>

# The local path to a CSV file with the roofline benchmarking results produced
# by the CARM Tool. See "Cache-aware roofline profiling".
roofline_benchmark_path=<path>
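
The precedence between the two files can be illustrated with a short helper. This is only a sketch of the documented rule (local values override system-wide ones), assuming plain key=value lines as in the listing above. The function name effective_setting is hypothetical, and the code is not Adaptyst's actual parser.

```shell
#!/usr/bin/env bash
# Illustrative only: mimics the documented precedence between the
# system-wide and local config files. Not Adaptyst's real parser.
effective_setting() {
    local key="$1" file line value="" found=1
    # Later files win, so the local config is read last.
    for file in "${ADAPTYST_CONFIG:-/etc/adaptyst.conf}" \
                "${ADAPTYST_LOCAL_CONFIG:-$HOME/.adaptyst/adaptyst.conf}"; do
        [ -r "$file" ] || continue
        while IFS= read -r line || [ -n "$line" ]; do
            case "$line" in
                "#"*|"") continue ;;                   # comments, blank lines
                "$key="*) value="${line#*=}"; found=0 ;;
            esac
        done < "$file"
    done
    [ "$found" -eq 0 ] && printf '%s\n' "$value"
    return "$found"
}
```

For example, effective_setting perf_path prints the value from the local file if it is set there and falls back to the system-wide file otherwise. The function returns a non-zero status when the key appears in neither file.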

Profiling

To profile your program, please run the following command as root:

adaptyst <command to be profiled>

Important

  • If your command contains whitespace, you must run Adaptyst in one of these ways: adaptyst "<command to be profiled>" or adaptyst -- <command to be profiled>.
  • If your command uses any shell-specific syntax (e.g. redirection using < or >), you need to pass it through your shell as Adaptyst uses direct system calls to run programs. For example, to profile program123 < test.txt run through bash, you need to execute adaptyst -- /bin/bash -c "program123 < test.txt".

Running Adaptyst as non-root

Adaptyst can be run as non-root as long as all of the requirements below are met:

  • The Adaptyst-patched “perf” executable has the CAP_PERFMON, CAP_BPF, and CAP_IPC_LOCK capabilities set as permitted and effective (you can do it by running setcap cap_perfmon,cap_bpf,cap_ipc_lock+ep <path to "perf">; the default path is /opt/adaptyst/perf/bin/perf). If you want to see kernel symbols in stack traces, the executable must also have the CAP_SYSLOG capability set as permitted and effective.
  • You are part of the tracing group. If it doesn’t exist, you must create it first. The name tracing is arbitrary here; you can give the group any name you want.
  • /sys/kernel/tracing is mounted as tracefs with permissions 750 or more lax and is owned by the tracing group.
    • Mount /sys/kernel/tracing in a standard way if not mounted yet (i.e. run mount -t tracefs nodev /sys/kernel/tracing).
    • Once /sys/kernel/tracing is mounted in a standard way, remount the directory by running mount -o remount,mode=0750,gid=<GID of the tracing group> /sys/kernel/tracing.
    • If the above doesn’t work, you need to change group ownership of all contents inside /sys/kernel/tracing by running for example chown -R root:tracing /sys/kernel/tracing. You may also need to change file permissions in a similar way by running for example chmod -R 750 /sys/kernel/tracing.
    • You can also opt for automating the above in any way you like.
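
One way to automate the steps above is to generate the root commands first and review them before running anything. The sketch below only prints the commands. The function name print_nonroot_setup and its arguments are illustrative, and it assumes the standard groupadd, usermod, getent, and mountpoint utilities are available.

```shell
#!/usr/bin/env bash
# Prints (does not run) the root commands for the non-root setup.
# print_nonroot_setup and its arguments are hypothetical names,
# not part of Adaptyst. Arguments must not contain single quotes.
print_nonroot_setup() {
    local user="$1"
    local group="${2:-tracing}"
    local perf="${3:-/opt/adaptyst/perf/bin/perf}"
    cat <<EOF
setcap cap_perfmon,cap_bpf,cap_ipc_lock+ep '$perf'
getent group '$group' >/dev/null || groupadd '$group'
usermod -aG '$group' '$user'
mountpoint -q /sys/kernel/tracing || mount -t tracefs nodev /sys/kernel/tracing
mount -o remount,mode=0750,gid="\$(getent group '$group' | cut -d: -f3)" /sys/kernel/tracing
EOF
}
```

For example, print_nonroot_setup alice > setup.sh writes the commands to a file that can be inspected and then run as root. To see kernel symbols in stack traces, cap_syslog would be added to the setcap capability list. The chown/chmod fallback is left out on purpose, since it is only needed if remounting does not work.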

Roofline profiling

By default, cache-aware roofline profiling is not performed. If you want to do this as well, see Cache-aware roofline profiling.

If you want to see what extra options you can set (e.g. an on-CPU/off-CPU sampling frequency, the quiet mode), run adaptyst --help.

Help message printed by adaptyst --help
$ adaptyst --help
Adaptyst: a performance analysis tool 


adaptyst [OPTIONS] [COMMAND...]


POSITIONALS:
  COMMAND                          Command to be profiled (required) 

OPTIONS:
  -h,      --help                  Print this help message and exit 
  -v,      --version               Print version and exit 
  -F,      --freq UINT>0           Sampling frequency per second for on-CPU 
                                   time profiling (default: 10) 
                                   
  -B,      --buffer UINT>0         Buffer up to this number of events before 
                                   sending data for processing (1 effectively 
                                   disables buffering) (default: 1) 
                                   
  -f,      --off-cpu-freq UINT or -1 
                                   Sampling frequency per second for off-CPU 
                                   time profiling (0 disables off-CPU 
                                   profiling, -1 makes Adaptyst capture *all* 
                                   off-CPU events) (default: 1000) 
                                   
  -b,      --off-cpu-buffer UINT   Buffer up to this number of off-CPU events 
                                   before sending data for processing (0 
                                   leaves the default adaptive buffering, 1 
                                   effectively disables buffering) (default: 0) 
                                   
  -p,      --post-process UINT     Number of threads isolated from profiled 
                                   command to use for profilers and processing 
                                   (must not be greater than 13). Use 0 to not 
                                   isolate profiler and processing threads 
                                   from profiled command threads (NOT 
                                   RECOMMENDED). (default: 1) 
                                   
  -a,      --address ADDRESS:PORT  Delegate processing to another machine 
                                   running adaptyst-server. All results will 
                                   be stored on that machine. 
                                   
  -c,      --codes TYPE[:ARG]      Send the newline-separated list of detected 
                                   source code files to a specified 
                                   destination rather than pack the code files 
                                   on the same machine where a profiled 
                                   program is run. The value can be either 
                                   "srv" (i.e. the server receives the list, 
                                   looks for the files there, and creates a 
                                   source code archive there as well), 
                                   "file:<path>" (i.e. the list is saved to 
                                   <path> and can be then read e.g. by 
                                   adaptyst-code), or "fd:<number>" (i.e. the 
                                   list is written to a specified file 
                                   descriptor). 
                                   
  -s,      --server-buffer UINT>0  Communication buffer size in bytes for 
                                   internal adaptyst-server. Not to be used 
                                   with -a. (default when no -a: 1024) 
                                   
  -w,      --warmup UINT>0         Warmup time in seconds between 
                                   adaptyst-server signalling readiness for 
                                   receiving data and starting the profiled 
                                   program. Increase this value if you see 
                                   missing information after profiling (note 
                                   that adaptyst-server is also used 
                                   internally if no -a option is specified). 
                                   (default: 1) 
                                   
  -e,      --event EVENT,PERIOD,TITLE 
                                   Extra perf event to be used for sampling 
                                   with a given period (i.e. do a sample on 
                                   every PERIOD occurrences of an event and 
                                   display the results under the title TITLE 
                                   in a website). Run "perf list" for the list 
                                   of possible events. You can specify 
                                   multiple events by specifying this option 
                                   more than once. Use quotes if you need to 
                                   use spaces. 
                                   
  -r,      --roofline UINT>0       Run also cache-aware roofline profiling 
                                   with the specified sampling frequency per 
                                   second 
                                   
  -i,      --filter TYPE:FILE      Set stack trace filtering options. 
                                   deny:<FILE> cuts all stack elements 
                                   matching a set of conditions specified in a 
                                   given text file (use - for stdin). 
                                   allow:<FILE> accepts only stack elements 
                                   matching a set of conditions specified in a 
                                   given text file (use - for stdin). 
                                   python:<FILE> sends all stack trace 
                                   elements to a given Python script for 
                                   filtering. Unless -k is used, all filtered 
                                   out elements are deleted completely. See 
                                   the Adaptyst documentation to check in 
                                   detail how to use filtering. 
                                   
  -k,      --mark Needs: --filter  When -i is used, mark filtered out stack 
                                   trace elements as "(cut)" and squash any 
                                   consecutive "(cut)"'s into one rather than 
                                   deleting them completely 
                                   
  -m,      --mode kernel OR user OR both 
                                   Capture only kernel, only user (i.e. 
                                   non-kernel), or both stack trace types 
                                   respectively (default: "user") 
                                   
  -q,      --quiet                 Do not print anything (if set, check exit 
                                   code for any errors) 
                                   

If you want to change the paths of the system-wide and local Adaptyst 
configuration files, set the environment variables ADAPTYST_CONFIG and 
ADAPTYST_LOCAL_CONFIG respectively to values of your choice. Similarly, 
you can set the ADAPTYST_SCRIPT_DIR environment variable to change the path 
where Adaptyst looks for its Python scripts.

After profiling is completed, you can check the results inside the results directory.

You can run adaptyst multiple times. All profiling results will be saved inside the same results directory, provided that every adaptyst execution is done inside the same working directory.

Structure of results

The structure of results is as follows:

  • (year)_(month)_(day)_(hour)_(minute)_(second)_(hostname)__(command): the directory for a given profiling session
    • out: the directory with output logs
      • perf_(record or script)_(event)_stdout.log, perf_(record or script)_(event)_stderr.log: stdout and stderr logs from perf-record/perf-script. (event) can be either “main” (on-CPU/off-CPU profiling), “syscall” (syscall profiling for tracing threads/processes), or a custom perf event specified by the user.
      • stdout.log, stderr.log: stdout and stderr logs from the profiled command.
    • processed: the directory with processed profiling information
      • metadata.json: metadata (such as the thread/process tree and thread/process spawning stack traces) stored in JSON.
      • (event)_callchains.json: mappings between compressed callchain names and uncompressed ones stored in JSON. (event) can be either “walltime” (on-CPU/off-CPU profiling), “syscall” (syscall profiling for tracing threads/processes, applicable to metadata.json), or a custom perf event specified by the user.
      • (PID)_(TID).json: all samples gathered by on-CPU/off-CPU profiling and custom perf event profiling (if any) stored in JSON, per thread/process.
      • event_dict.data: mappings between custom perf events and their website titles as specified by the user (it is not created when no custom events are provided).
      • sources.json: mappings between executable offsets and corresponding source code files and line numbers.
      • src.zip: the ZIP archive with all source code files detected by Adaptyst. This can be created manually by adaptyst-code if not present.
      • roofline.csv: if enabled, the CARM Tool roofline benchmarking results for a machine (they are not program-specific).

Please note that the overall output format will change as part of implementing our roadmap. Once this happens, the documentation will be updated accordingly.
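
When several sessions accumulate in results, a small helper can locate the most recent one. The sketch below is a minimal example under one loud assumption: the date and time fields in session directory names are zero-padded, so that name order matches chronological order for runs on the same host. The function name latest_session is hypothetical, and the helper may need adjusting if the output format changes.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Adaptyst. Assumes zero-padded
# timestamp fields in session directory names (unverified assumption).
latest_session() {
    local results="${1:-results}" dir latest=""
    for dir in "$results"/*/; do
        [ -d "$dir" ] || continue          # no sessions: glob stays literal
        dir="${dir%/}"
        # Keep the name that sorts last.
        [[ -z "$latest" || "$dir" > "$latest" ]] && latest="$dir"
    done
    [ -n "$latest" ] && printf '%s\n' "$latest"
}
```

For example, ls "$(latest_session)/processed" lists the processed files of the newest session in the current working directory.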

Docker

Adaptyst can run in a Docker container. For a quick start, we provide a ready-to-use image (see Installation).

Please note the following:

  • Your container must have the CAP_PERFMON, CAP_BPF, and CAP_IPC_LOCK capabilities (and optionally CAP_SYSLOG, see “Running Adaptyst as non-root” above).
  • eBPF-based context switch tracing needed for off-CPU profiling and system call tracing needed for tracing threads/processes may not work out-of-the-box. If this happens, see if running your container with --pid=host helps.
  • You may need to mount /sys/kernel/tracing manually, either when creating your container or inside your container. See “Running Adaptyst as non-root” above.
  • A user inside your container must belong to the group which owns /sys/kernel/tracing.
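
The points above can be combined into a single docker run invocation. The sketch below only prints the command so that it can be reviewed first. It assumes a Docker version that recognises the PERFMON and BPF capability names and an image in which adaptyst is on the PATH. The function name adaptyst_docker_cmd is hypothetical, and the image name is a placeholder to be taken from the installation guide.

```shell
#!/usr/bin/env bash
# Prints a docker run command line for profiling inside a container.
# adaptyst_docker_cmd is a hypothetical helper, not part of Adaptyst;
# the image name must be supplied by the caller.
adaptyst_docker_cmd() {
    local image="$1"; shift
    local cmd=(docker run
        --cap-add=PERFMON --cap-add=BPF --cap-add=IPC_LOCK
        --pid=host
        -v /sys/kernel/tracing:/sys/kernel/tracing
        "$image" adaptyst -- "$@")
    # %q quotes each argument so the output can be pasted into a shell.
    printf '%q ' "${cmd[@]}"
    echo
}
```

For example, adaptyst_docker_cmd <image> ./app prints the full command line. Adding --cap-add=SYSLOG would cover the optional CAP_SYSLOG capability, and --group-add <GID> is one way to put the container user in the group that owns /sys/kernel/tracing. Results are written inside the container, so a bind mount for the working directory is worth adding; its path depends on the image.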

Apptainer/Singularity

Adaptyst can also run in an Apptainer/Singularity container. For a quick start, we provide a ready-to-use image (see Installation).

Please note the following:

  • Your container must have the CAP_PERFMON, CAP_BPF, and CAP_IPC_LOCK capabilities (and optionally CAP_SYSLOG, see “Running Adaptyst as non-root” above). Make sure that Apptainer/Singularity supports this in your case (e.g. on AlmaLinux 9, apptainer-suid must be installed in addition to Apptainer alone).
  • /sys/kernel/tracing must be mounted in the container, e.g. by bind mounting in Apptainer/Singularity.

Profiled source code packaging

By default, Adaptyst looks for and packs source code files on the same machine where your profiled program is run. This behaviour can be changed by running adaptyst with the -c option.

You can either make adaptyst-server look for and pack source code files instead (if the server runs on a different machine), save the list of detected source code file paths to a text file, or print the list to a given file descriptor. In the latter two cases, you can supply the list to adaptyst-code to create a ZIP archive with the code files. adaptyst-code is a Python script and has less strict setup requirements than the full Adaptyst suite.

The resulting ZIP archive should be placed as src.zip inside results/<your profiling session>/processed (see “Structure of results” above).

Here’s an example of running Adaptyst with a custom source code packing mode:

$ ls
<initial contents>
$ adaptyst -c file:code_paths.lst -- <command>
...
$ ls
<initial contents>
code_paths.lst
results
$ adaptyst-code code_paths.lst
...
$ ls
<initial contents>
code_paths.lst
results
src.zip
$ mv src.zip results/<your profiling session>/processed/
$

If you want to see how you can configure adaptyst-code to your needs, run adaptyst-code --help.

Help message printed by adaptyst-code --help
usage: adaptyst-code [-h] [-o FILE] [-v] PATHS_FILE

Profiled code packaging tool for Adaptyst

positional arguments:
  PATHS_FILE  path to a code paths file generated by Adaptyst
              (usually as code_paths.lst) or written manually with
              one source code file path per line (use "-" for stdin)

options:
  -h, --help  show this help message and exit
  -o FILE     path to an output ZIP archive to be produced (src.zip
              in the current directory by default, use "-" for stdout)
  -v          print path-processing-related errors to stderr (these
              are not printed by default)
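
Since adaptyst-code does not report path-processing errors unless -v is given, it can help to check the list before packing, especially when it was produced on another machine. The sketch below assumes the documented format of one source code file path per line. The function name missing_sources is hypothetical and not part of Adaptyst.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Adaptyst: prints every path from a
# code paths file (one path per line) that does not exist locally.
missing_sources() {
    local list="${1:-code_paths.lst}" path
    while IFS= read -r path || [ -n "$path" ]; do
        [ -z "$path" ] && continue               # skip blank lines
        [ -e "$path" ] || printf '%s\n' "$path"
    done < "$list"
    return 0
}
```

For example, missing_sources code_paths.lst prints nothing when every listed file is present on the machine where adaptyst-code will run.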