Cache-aware roofline profiling

Introduction

Thanks to its integration with the CARM Tool from INESC-ID, Adaptyst can automatically perform cache-aware roofline profiling of your program once a few initial configuration steps are completed.

As a result, you can view various roofline graphs in Adaptyst Analyser and plot specific code segments on them to quickly check whether they are memory-bound or compute-bound.
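To make the memory-bound versus compute-bound distinction concrete, here is a minimal sketch of the general roofline bound: attainable performance is the smaller of the compute peak and arithmetic intensity multiplied by bandwidth. It does not reproduce the calculations of Adaptyst or the CARM Tool. The function names and every machine figure below are hypothetical and chosen only for illustration.

```python
def attainable_gflops(intensity, peak_gflops, bandwidth_gbs):
    """Roofline bound: min(compute peak, intensity * bandwidth)."""
    return min(peak_gflops, intensity * bandwidth_gbs)


def classify(intensity, peak_gflops, bandwidth_gbs):
    """Return 'memory-bound' left of the ridge point, else 'compute-bound'."""
    ridge = peak_gflops / bandwidth_gbs  # intensity at which the two roofs meet
    return "memory-bound" if intensity < ridge else "compute-bound"


# Hypothetical figures, not measured on any real machine.
PEAK_GFLOPS = 100.0
BANDWIDTH_GBS = {"L1": 400.0, "L2": 200.0, "L3": 100.0, "DRAM": 25.0}

if __name__ == "__main__":
    intensity = 0.5  # FLOP per byte of a hypothetical code segment
    for level, bandwidth in BANDWIDTH_GBS.items():
        bound = attainable_gflops(intensity, PEAK_GFLOPS, bandwidth)
        verdict = classify(intensity, PEAK_GFLOPS, bandwidth)
        print(f"{level}: at most {bound} GFLOP/s, {verdict}")
```

A cache-aware roofline draws one bandwidth roof per memory level, which is why the same code segment can sit under the compute roof relative to one level and under a bandwidth roof relative to another.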

Supported architectures

Currently, only x86-64 Intel CPUs are supported. Support for x86-64 AMD CPUs is a work in progress.

Initial setup

To get started, clone the CARM Tool repository from https://github.com/champ-hub/carm-roofline and add the following line to your Adaptyst configuration file:

carm_tool_path=<path to the cloned repository directory>

Profiling

To run cache-aware roofline profiling in addition to the standard profiling done by Adaptyst, run adaptyst with the --roofline <num> option, where <num> is the sampling frequency of roofline-related performance counters in Hz. Sampling is done in the same way as for custom “perf” events.

If no roofline benchmarking has been performed for the machine yet, Adaptyst will start the CARM Tool automatically and let it perform the necessary tests. This may take a long time. Afterwards, your profiling session will run as usual.

Running roofline benchmarks manually

It is also possible to run the roofline benchmarks manually. In this case, please run the run.py script inside the CARM Tool repository and add the following line to your Adaptyst configuration file afterwards:

roofline_benchmark_path=<path to the CSV file generated by the CARM Tool>

Normally, the CSV file is produced inside carm_results/roofline in the directory where run.py was run. The location may differ if you used custom options with run.py; in that case, consult the CARM Tool documentation and/or help message.
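If you are unsure which file to reference, the following minimal sketch looks for a CSV file in the default carm_results/roofline location and prints the matching configuration line. It is not part of Adaptyst or the CARM Tool. The helper names are hypothetical, and picking the most recently modified CSV is an assumption, since the exact file name depends on the CARM Tool.

```python
from pathlib import Path


def newest_benchmark_csv(run_dir):
    """Return the most recently modified CSV under carm_results/roofline, or None."""
    results = Path(run_dir) / "carm_results" / "roofline"
    if not results.is_dir():
        return None
    # Assumption: the newest CSV is the benchmark result you want to use.
    csvs = sorted(results.glob("*.csv"), key=lambda p: p.stat().st_mtime)
    return csvs[-1] if csvs else None


def config_line(csv_path):
    """Build the Adaptyst configuration line for a given CSV file."""
    return f"roofline_benchmark_path={Path(csv_path).resolve()}"


if __name__ == "__main__":
    # "." stands for the directory where run.py was run.
    found = newest_benchmark_csv(".")
    if found is None:
        print("No CSV file found in carm_results/roofline")
    else:
        print(config_line(found))
```

Run the script from the directory where run.py was run, then copy the printed line into your Adaptyst configuration file.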