Introduction

running-ng is a collection of scripts that help people run workloads in methodologically sound settings.

Disclaimer

At this stage, the focus of this project is driven by the internal use of members from Steve Blackburn's lab. If you are a member of the lab, you know what to do if you encounter any issue, and you can ignore the below.

If you are a member of the public, please kindly note that the project is open-source and documented on a "good-faith" basis. We might not have the time to consider your feature requests. Please don't be offended if we ignore these. Having said that, you are very welcome to use it, and we will be very pleased if this helps anyone. In particular, we are grateful if you report any bugs you find, with steps to reproduce them.

⚠️ Warning

The syntax (of configuration files and command line arguments) of running-ng is not stabilized yet. When you use it, expect breaking changes, although we will try to minimize this where possible.

running-ng has been tested by a few people, and we think it is stable enough to use for your experiments. However, there are probably a few wrinkles to be ironed out. Please file any bug or feature request on the issue tracker.

You are also welcome to implement new features and/or fix bugs by opening pull requests. But before you do so, please discuss with Steve first for major design changes. For non-user-facing changes, please discuss with the maintainers first.

History

The predecessor of running-ng is running, a set of scripts written in Perl, dating back to 2005. However, the type of workloads we are evaluating has changed a bit, and we want a new set of scripts that fit our needs better.

Two major sources of inspiration are mu-perf-benchmarks and menthol.

mu-perf-benchmarks is a performance regression framework built for The Mu Micro Virtual Machine. Zixian coauthored the framework with John Zhang in 2017. It features a web frontend for displaying results. You can see the live instance here.

menthol is a benchmarking framework built for running benchmarks in high-performance computing (HPC) settings. Zixian built it for his research project about evaluating Chapel's performance in 2018. The framework can run benchmarks in different languages on either a single node or a cluster through the PBS job scheduler.

Maintainers

Installation

pip3 install --user -U running-ng

The base configuration files can usually be found in paths like ~/.local/lib/python3.6/site-packages/running/config/base. The exact path might differ depending on your Python version, etc.

Adding running to PATH

You will need to add the folder where running is installed to your PATH. On a typical Linux installation, running is installed to ~/.local/bin.

You will need to refer to the documentation of the shell you are using.

Here is an example for bash.

# Add the following to ~/.bashrc
PATH=$PATH:$HOME/.local/bin

You don't need to use export. Generally, $PATH already exists and is exported to child processes.

Please check whether your ~/.bash_profile or ~/.profile sources ~/.bashrc. If not, when you use a login shell (e.g., in the case of tmux), the content of ~/.bashrc might not be applied.

To ensure ~/.bashrc is always sourced, you can add the following to ~/.bash_profile.

if [ -f ~/.bashrc ]; then
  . ~/.bashrc
fi

If you are a moma user, please change these dotfiles on squirrel.moma, and then run sudo /moma-admin/config/update_self.fish. Note that you should run this command using an SSH session on a standard terminal instead of using the integrated terminal in VSCode Remote. Please check here for how to set up a UNIX password for sudo.

Quickstart

This guide will show you how to use running-ng to compare two different builds of JVMs.

Note that for each occurrence in the form /path/to/*, you need to replace it with the real path of the respective item in the filesystem.

Installation

Please follow the installation guide to install running-ng. You will need Python 3.6+.

Then, create a file two_builds.yml with the following content.

includes:
  - "$RUNNING_NG_PACKAGE_DATA/base/runbms.yml"

The YAML file represents a dictionary (key-value pairs) that defines the experiments you are running. The includes directive here will populate the dictionary with some default values shipped with running-ng.

If you use moma machines, please substitute runbms.yml with runbms-anu.yml.

Prepare Benchmarks

Add the following to two_builds.yml.

benchmarks:
  dacapochopin-29a657f:
    - avrora
    - batik
    - biojava
    - cassandra
    - eclipse
    - fop
    - graphchi
    - h2
    - h2o
    - jme
    - jython
    - luindex
    - lusearch
    - pmd
    - sunflow
    - tradebeans
    - tradesoap
    - tomcat
    - xalan
    - zxing

This specifies a list of benchmarks used in this experiment from the benchmark suite dacapochopin-29a657f. The benchmark suite is defined in $RUNNING_NG_PACKAGE_DATA/base/dacapo.yml. By default, the minimum heap sizes of dacapochopin-29a657f benchmarks are measured with AdoptOpenJDK 15 using G1 GC. If you are using OpenJDK 11 or 17, you can override the value of suites.dacapochopin-29a657f.minheap to temurin-11-G1 or temurin-17-G1 respectively. That is, you can, for example, add "suites.dacapochopin-29a657f.minheap": "temurin-17-G1" to overrides.

Then, add the following to two_builds.yml.

overrides:
  "suites.dacapochopin-29a657f.timing_iteration": 5
  "suites.dacapochopin-29a657f.callback": "probe.DacapoChopinCallback"

That is, we want to run five iterations for each invocation, and use DacapoChopinCallback because it is the appropriate callback for this release of DaCapo.
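As a rough illustration of how a dotted override key can address a value nested inside the configuration dictionary, here is a minimal Python sketch. It assumes that the key is split on "." and that each segment selects one level of the dictionary; apply_override is a hypothetical helper written for this guide, not a function of running-ng.

```python
from typing import Any, Dict


def apply_override(config: Dict[str, Any], dotted_key: str, value: Any) -> None:
    """Set config[a][b][c] = value for a dotted key "a.b.c" (hypothetical helper)."""
    *parents, last = dotted_key.split(".")
    node = config
    for segment in parents:
        # Assumption of this sketch: missing intermediate levels are created.
        node = node.setdefault(segment, {})
    node[last] = value


# A tiny stand-in for the dictionary produced by the includes directive.
config = {"suites": {"dacapochopin-29a657f": {"timing_iteration": 3}}}
apply_override(config, "suites.dacapochopin-29a657f.timing_iteration", 5)
apply_override(
    config, "suites.dacapochopin-29a657f.callback", "probe.DacapoChopinCallback"
)
print(config["suites"]["dacapochopin-29a657f"])
```

The point of the sketch is only that an override replaces a single nested value and leaves the rest of the included defaults untouched.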

Prepare Your Builds

In this guide, we assume you use mmtk-openjdk. Please follow its build guide.

I assume you produced two different builds you want to compare. Add the following to two_builds.yml.

runtimes:
  build1:
    type: OpenJDK
    release: 11
    home: "/path/to/build1/jdk" # make sure /path/to/build1/jdk/bin/java exists
  build2:
    type: OpenJDK
    release: 11
    home: "/path/to/build2/jdk" # make sure /path/to/build2/jdk/bin/java exists

This defines two builds of runtimes.

I recommend that you use absolute paths for the builds, although relative paths will work, and will be relative to where you run running.

I strongly recommend you rename the builds (both the name in the configuration file and the folder name) to something more sensible, preferably with the commit hash for easy troubleshooting and performance debugging later.

Prepare Probes

Please clone probes, and run make.

Add the following to two_builds.yml.

modifiers:
  probes_cp:
    type: JVMClasspath
    val: "/path/to/probes/out /path/to/probes/out/probes.jar"
  probes:
    type: JVMArg
    val: "-Djava.library.path=/path/to/probes/out -Dprobes=RustMMTk"

This defines two modifiers, which will be used later to modify the JVM command line arguments.

Please only use absolute paths for all the above.

Prepare Configs

Finally, add the following to two_builds.yml.

configs:
  - "build1|ms|s|c2|mmtk_gc-SemiSpace|tph|probes_cp|probes"
  - "build2|ms|s|c2|mmtk_gc-SemiSpace|tph|probes_cp|probes"

The syntax is described here.

Sanity Checks

The basic form of usage looks like this.

running runbms /path/to/log two_builds.yml 8

That is, run the experiments as specified by two_builds.yml, store the results in /path/to/log, and explore eight different heap sizes (with careful arrangement of which size to run first and which to run later).

See here for a complete reference of runbms.

Dry run

A dry run (by supplying -d to running, not to runbms) allows you to see the commands to be executed.

running -d runbms /path/to/log two_builds.yml 8 -i 1

Make sure it looks like what you want.

Single Invocation

Now, actually run the experiment, but only for one invocation (by supplying -i 1 to runbms).

running runbms /path/to/log two_builds.yml 8 -i 1

This allows you to spot any issue early, instead of wasting several days only to realize that something didn't work.

Run It

Once you are happy with everything, run the experiments.

running runbms /path/to/log two_builds.yml 8 -p "two_builds"

Don't forget to give the results folder a prefix so that you can later tell what the experiment was for.

Analysing Results

This is outside the scope of this quickstart guide.

Basics

Briefly talk about how basic concepts fit together here...

Before diving into the details, please read the design principles to help you better understand why things are organized in such a way.

Design Principles

Sound methodology

Sound methodology is crucial for the type of performance analysis work we do. Please see the documentation for each of the commands for details. We also try to include sensible default values in the base configuration files.

Reproducibility

It should be easy to reproduce a set of experiments. To this end, various commands will save as much metadata as possible with the results. For example, runbms saves the flattened configuration file and command line arguments in the results folder. For each log, basic information about the execution environment, such as uname, the model name of the CPU, and frequencies of CPU cores, is saved as well.

Extensibility

Broadly, the project consists of two parts: the core and the commands. The core provides abstractions for core concepts, such as benchmarks and execution environments, and can be extended through class inheritance.

The commands are the user-facing parts that use the core to provide concrete functionality.

Reusability

The configuration files can be easily reused through the includes and overrides mechanisms. For example, people might want to run multiple sets of experiments with minor tweaks, and being able to share a common base configuration file is ergonomic. This is also crucial to the first point that people can get a set of sensible default values by including base configuration files shipped with the project.

Human-readable syntax

We use YAML as the format for the configuration files. Please read the syntax reference for more details.

Configuration File Syntax

The configuration file is in YAML format. You can find a good YAML tutorial here. Below is the documentation for all the top-level keys that are common to all commands.

benchmarks

A YAML list of benchmarks to run in each specified benchmark suite.

For example:

benchmarks:
  dacapo2006:
    - eclipse
  dacapobach:
    - avrora
    - fop

tells running to run the eclipse benchmark from the dacapo2006 benchmark suite, and the avrora and fop benchmarks from the dacapobach benchmark suite. These benchmark suites have to be defined previously (usually through an includes key).

Note that each benchmark of a benchmark suite can either be a string or a suite-specific dictionary. For example, for the DaCapo benchmark suite, the following two snippets are equivalent.

benchmarks:
  dacapo2006:
    - eclipse

benchmarks:
  dacapo2006:
    - {name: eclipse, bm_name: eclipse, size: default}

configs

A YAML list of configuration strings to be used to run the benchmarks. These are specified as a runtime followed by a '|' separated list of modifiers, i.e. "<runtime>|<modifier>|...|<modifier>".

For example:

configs:
  - "openjdk11|ms|s|c2"
  - "openjdk15|ms|s"

tells running to use the openjdk11 runtime with the ms, s, and c2 modifiers, and the openjdk15 runtime with the ms and s modifiers. In the example above, we assume that both the runtimes and modifiers have been previously defined (in either the current configuration file or in an includes file).

Each segment in a configuration string can contain whitespace, which makes multi-line editing easier.

For example:

configs:
  - "openjdk8 |foo-1 |bar|buzz"
  - "openjdk15|foo-16|   |buzz"
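To make the whitespace rule concrete, here is a small Python sketch of how such a configuration string could be split. It assumes that whitespace around each segment is stripped and that segments left empty for alignment are ignored; split_config is a hypothetical helper, not the parser used by running-ng.

```python
from typing import List, Tuple


def split_config(config: str) -> Tuple[str, List[str]]:
    """Split "<runtime>|<modifier>|..." into a runtime and its modifiers."""
    segments = [segment.strip() for segment in config.split("|")]
    runtime = segments[0]
    # Assumption: blank segments exist only for visual alignment and are dropped.
    modifiers = [segment for segment in segments[1:] if segment]
    return runtime, modifiers


print(split_config("openjdk8 |foo-1 |bar|buzz"))
print(split_config("openjdk15|foo-16|   |buzz"))
```

Under these assumptions, the second string yields the openjdk15 runtime with only the foo-16 and buzz modifiers.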

includes

A YAML list of paths to YAML files that are to be included into the current configuration file for definitions of some keys.

This is primarily used to provide re-usability and extensibility of configuration files. A pre-processor step in running takes care of including all the specified files. A flattened version of the final configuration file is also generated and placed in the results folder for reproducibility.

The paths can be either absolute or relative. Relative paths are resolved relative to the current file. For example, if $HOME/configs/foo.yml has an include line ../bar.yml, the line is interpreted as $HOME/bar.yml. Similarly,

includes:
 - "./base/suites.yml"
 - "./base/modifiers.yml"

includes the suites.yml and modifiers.yml files located at ./base respectively.

Any environment variables in the paths are also resolved before any further processing. This includes a special environment variable $RUNNING_NG_PACKAGE_DATA that allows you to refer to various configuration files shipping with running-ng, regardless of how you installed running-ng. For example, in a global pip installation, $RUNNING_NG_PACKAGE_DATA will look like /usr/local/lib/python3.10/dist-packages/running/config.
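The resolution rules for include paths can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the actual pre-processor: resolve_include is a hypothetical helper, environment variables are expanded first, and the value given to RUNNING_NG_PACKAGE_DATA below is a placeholder.

```python
import os


def resolve_include(current_file: str, include: str) -> str:
    """Resolve one include path found in current_file (hypothetical helper)."""
    expanded = os.path.expandvars(include)
    if os.path.isabs(expanded):
        return os.path.normpath(expanded)
    # Relative paths are interpreted relative to the including file.
    base = os.path.dirname(current_file)
    return os.path.normpath(os.path.join(base, expanded))


# Placeholder value for illustration only.
os.environ["RUNNING_NG_PACKAGE_DATA"] = "/opt/running/config"
print(resolve_include("/home/user/configs/foo.yml", "../bar.yml"))
print(resolve_include("/home/user/configs/foo.yml", "./base/suites.yml"))
print(resolve_include(
    "/home/user/configs/foo.yml", "$RUNNING_NG_PACKAGE_DATA/base/runbms.yml"
))
```

The first call mirrors the ../bar.yml example above: the result is one directory above the folder that contains foo.yml.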

overrides

Under construction 🚧.

modifiers

A YAML dictionary of program arguments or environment variables that are to be used with config strings. The key for a modifier cannot contain -. Each modifier requires a type key with other keys being specific to that type. For more information regarding the different types of modifiers, please refer to this page.

Warning preview feature ⚠️. We can exclude certain benchmarks from using a specific modifier by using an excludes key along with a YAML list of benchmarks to be excluded from each benchmark suite.

For example:

modifiers:
  s:
    type: JVMArg
    val: "-server"
  c2:
    type: JVMArg
    val: "-XX:-TieredCompilation -Xcomp"
    excludes:
      dacapo2006:
        - eclipse

specifies two modifiers, s and c2, both of type JVMArg with their respective values. Here, the eclipse benchmark from the dacapo2006 benchmark suite has been excluded from the c2 modifier.

Warning preview feature ⚠️. Similarly, we can attach the modifier only to specific benchmarks by using an includes key.

For example:

modifiers:
  c2:
    type: JVMArg
    val: "-XX:-TieredCompilation -Xcomp"
    includes:
      dacapo2006:
        - eclipse

The c2 modifier will only be attached when running the eclipse benchmark from the dacapo2006 benchmark suite.

excludes has a higher priority than includes.

For example:

modifiers:
  c2:
    type: JVMArg
    val: "-XX:-TieredCompilation -Xcomp"
    includes:
      dacapo2006:
        - eclipse
        - fop
    excludes:
      dacapo2006:
        - fop

The c2 modifier will only be attached when running the eclipse benchmark from the dacapo2006 benchmark suite, no other benchmark will run with this modifier (not even fop even though it appears in the includes).
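The interaction between includes and excludes can be summarized by the following Python sketch. It is only a model of the behaviour described above: modifier_applies is a hypothetical helper, and it assumes that a modifier without an includes key applies to every benchmark that is not excluded.

```python
from typing import Any, Dict


def modifier_applies(modifier: Dict[str, Any], suite: str, benchmark: str) -> bool:
    """Decide whether a modifier is attached to a benchmark (hypothetical helper)."""
    # excludes is checked first because it has the higher priority.
    if benchmark in modifier.get("excludes", {}).get(suite, []):
        return False
    includes = modifier.get("includes")
    if includes is None:
        # Assumption: without an includes key, the modifier applies everywhere.
        return True
    return benchmark in includes.get(suite, [])


c2 = {
    "type": "JVMArg",
    "val": "-XX:-TieredCompilation -Xcomp",
    "includes": {"dacapo2006": ["eclipse", "fop"]},
    "excludes": {"dacapo2006": ["fop"]},
}
for name in ["eclipse", "fop", "antlr"]:
    print(name, modifier_applies(c2, "dacapo2006", name))
```

With the c2 definition from the last example, only eclipse receives the modifier.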

Value Options

These are special modifiers whose values can be specified through their use in a configuration string. Concrete values are specified as - separated values after the modifier's name in a configuration string. These values will be indexed by the modifier through syntax similar to Python format strings.

This is best understood via an example:

modifiers:
  env_var:
    type: EnvVar
    var: "FOO{0}"
    val: "{1}"

[...]

configs:
  - "openjdk11|env_var-42-43"

runs the openjdk11 runtime with the environment variable FOO42 set to 43. Note that value options are not limited to environment variables, and can be used for all modifier types.
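The substitution in this example can be reproduced with Python's own str.format, which is the syntax that value options resemble. The sketch below assumes that the segment is split on - and that the pieces after the name are passed as positional arguments; expand_env_var is a hypothetical helper, not part of running-ng.

```python
from typing import Any, Dict, Tuple


def expand_env_var(modifiers: Dict[str, Dict[str, Any]], segment: str) -> Tuple[str, str]:
    """Expand a config segment such as "env_var-42-43" (hypothetical helper)."""
    # Modifier keys cannot contain "-", so the first piece is always the name.
    name, *values = segment.split("-")
    definition = modifiers[name]
    return definition["var"].format(*values), definition["val"].format(*values)


modifiers = {"env_var": {"type": "EnvVar", "var": "FOO{0}", "val": "{1}"}}
print(expand_env_var(modifiers, "env_var-42-43"))
```

This also shows why - is reserved in modifier names: it is the separator between the name and its value options.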

runtimes

A YAML dictionary of runtime definitions that are to be used with config strings. Each runtime requires a type key with other keys being specific to that type. For more information regarding the different types of runtimes, please refer to this page.

suites

A YAML dictionary of benchmark suite definitions that are to be used as keys of benchmarks. Each benchmark suite requires a type key with other keys being specific to that type. For more information regarding the different types of benchmark suites, please refer to this page.

Benchmark Suite

BinaryBenchmarkSuite (preview ⚠️)

A BinaryBenchmarkSuite is a suite of programs which can be used to run binary benchmarks such as for C/C++ benchmarking.

Keys

programs: a YAML dictionary of benchmarks, keyed by benchmark name, in the format:

programs:
  <BM_NAME_1>:
    path: /full/path/to/benchmark/binary_1
    args: "Any arguments to binary_1"
  <BM_NAME_2>:
    path: /full/path/to/benchmark/binary_2
    args: "Any arguments to binary_2"
  [...]

A possible use-case could use wrapper shell scripts around the benchmark to output timing and other information in a tab-separated table.

DaCapo

DaCapo benchmark suite.

Keys

release: one of the possible values ["2006", "9.12", "evaluation"]. The value is required.

path: path to the DaCapo jar. The value is required. Environment variables will be expanded.

minheap: a string that selects one of the minheap_values sets to use.

minheap_values: a dictionary containing multiple named sets of minimal heap sizes that are enough for a benchmark from the suite to run without triggering OutOfMemoryError. Each size is measured in MiB. The default value is an empty dictionary. The minheap values are used only when running runbms with a valid N value. If the minheap value for a benchmark is not specified, a default of 4096 is used. An example looks like this.

minheap_values:
  adoptopenjdk-15-G1:
    avrora: 7
    batik: 253
  temurin-17-G1:
    avrora: 7
    batik: 189
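The way minheap selects one named set, with the 4096 MiB fallback, can be sketched as a nested dictionary lookup. The helper name lookup_minheap is hypothetical, and the behaviour for an unknown set name (falling back to the default) is an assumption of this sketch.

```python
from typing import Dict

DEFAULT_MINHEAP_MIB = 4096  # fallback stated in the documentation above


def lookup_minheap(
    minheap_values: Dict[str, Dict[str, int]], minheap: str, benchmark: str
) -> int:
    """Return the minheap in MiB for a benchmark (hypothetical helper)."""
    return minheap_values.get(minheap, {}).get(benchmark, DEFAULT_MINHEAP_MIB)


minheap_values = {
    "adoptopenjdk-15-G1": {"avrora": 7, "batik": 253},
    "temurin-17-G1": {"avrora": 7, "batik": 189},
}
print(lookup_minheap(minheap_values, "temurin-17-G1", "batik"))
print(lookup_minheap(minheap_values, "temurin-17-G1", "fop"))
```

A benchmark that is missing from the selected set silently gets the large default, so it is worth checking that every benchmark you run has a measured value.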

timing_iteration: specifying the timing iteration. It can either be a number, which is passed to DaCapo as -n, or a string converge. The default value is 3.

callback: the class (possibly within some packages) for the DaCapo callback. The value is passed to DaCapo as -c. The default value is null.

timeout: timeout for one invocation of a benchmark in seconds. The default value is null.

wrapper (preview ⚠️): specifying a wrapper (i.e., extra stuff on the command line before java) when running benchmarks. The default value is null, a no-op. There are two possible ways to specify wrapper. First, a single string with shell-like syntax. Multiple arguments are space separated. This wrapper is used for all benchmarks in the benchmark suite. Second, a dictionary of strings with shell-like syntax to specify possibly different wrappers for different benchmarks. If a benchmark doesn't have a wrapper in the dictionary, it is treated as null.

companion (preview ⚠️): the syntax is similar to wrapper. The companion program will start before the main program. The main program will start two seconds after the companion program to make sure the companion is fully initialized. Once the main program finishes, we will wait for the companion program to finish. Therefore, companion programs should have appropriate timeouts or detect when the main program finishes. Here is an example of using companion to launch bpftrace in the background to count the system calls.

includes:
  - "$RUNNING_NG_PACKAGE_DATA/base/runbms.yml"

overrides:
  "suites.dacapo2006.timing_iteration": 1
  "suites.dacapo2006.companion": "sudo bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @syscall[args->id] = count(); @process[comm] = count();} interval:s:10 { printf(\"Goodbye world!\\n\"); exit(); }'"
  "invocations": 1

benchmarks:
  dacapo2006:
    - fop

configs:
  - "temurin-17"

In the log file, the output from the main program and the output from the companion program are separated by *****.
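The start and wait order of a companion program can be modelled with the standard subprocess module. The sketch below is not the code used by runbms: run_with_companion is a hypothetical helper, the delay is a parameter (two seconds in the description above, shortened in the demo), and the way the two outputs are joined only loosely mimics the log layout.

```python
import subprocess
import sys
import time
from typing import List, Tuple


def run_with_companion(
    main_cmd: List[str], companion_cmd: List[str], delay: float = 2.0
) -> Tuple[str, str]:
    """Start the companion, wait, run the main program, then wait for the companion."""
    companion = subprocess.Popen(companion_cmd, stdout=subprocess.PIPE, text=True)
    time.sleep(delay)  # give the companion time to initialize
    main = subprocess.run(main_cmd, stdout=subprocess.PIPE, text=True)
    # Blocks until the companion exits, hence the need for a timeout inside it.
    companion_out, _ = companion.communicate()
    return main.stdout, companion_out


main_out, companion_out = run_with_companion(
    [sys.executable, "-c", "print('main')"],
    [sys.executable, "-c", "print('companion')"],
    delay=0.1,
)
print(main_out + "*****\n" + companion_out)
```

Because the last step waits for the companion, a companion that never exits would stall the whole experiment, which is why the bpftrace example above exits on its own after ten seconds.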

size: specifying the size of input data. Note that the names of the sizes are subject to change depending on the DaCapo releases. The default value is null, which means DaCapo will use the default size unless you override that for individual benchmarks.

Benchmark Specification

Some of the suite-wide keys can be overridden on a per-benchmark basis. The keys currently supported are timing_iteration, size, and timeout. Note that, within a suite, your choice of name should uniquely identify a particular way of running a benchmark of name bm_name. The name is used to get the minheap value, etc., which can depend on the size of input data and/or the timing iteration. Therefore, it is highly recommended that you give a name different from the bm_name.

Note that you might need to adjust various other values, including but not limited to the minheap value dictionary and the modifier exclusion dictionary.

The following is an example.

benchmarks:
  dacapo2006:
    - {name: eclipse_large, bm_name: eclipse, size: large}

SPECjbb2015 (preview ⚠️)

SPECjbb2015.

Keys

release: one of the possible values ["1.03"]. The value is required.

path: path to the jar. The value is required. Note that the property file should reside in path/../config/specjbb2015.props per the standard folder structure of the ISO image provided by SPEC. Environment variables will be expanded.

Benchmark Specification

Only strings are allowed, which should correspond to the mode of the SPECjbb2015 controller. Right now, only "composite" is supported.

SPECjvm98 (preview ⚠️)

SPECjvm98.

Note that you will need to prepend probes to the classpaths, so that the modified SpecApplication can be used.

Here is an example configuration file.

includes:
  - "/home/zixianc/running-ng/src/running/config/base/runbms.yml"

modifiers:
  probes_cp:
    type: JVMClasspathPrepend
    val: "/home/zixianc/MMTk-Dev/evaluation/probes /home/zixianc/MMTk-Dev/evaluation/probes/probes.jar"

benchmarks:
  specjvm98:
    - _213_javac

configs:
  - "adoptopenjdk-8|probes_cp"

Keys

release: one of the possible values ["1.03_05"]. The value is required.

path: path to the SPECjvm98 folder, where you can find SpecApplication.class. The value is required. Environment variables will be expanded.

timing_iteration: specifying the timing iteration. It can only be a number, which is passed to SpecApplication as -i. The value is required.

Benchmark Specification

Only strings are allowed, which should correspond to the benchmark programs of SPECjvm98. The following are the benchmarks:

  • _200_check
  • _201_compress
  • _202_jess
  • _209_db
  • _213_javac
  • _222_mpegaudio
  • _227_mtrt
  • _228_jack

Octane (preview ⚠️)

Keys

path: path to the Octane benchmark folder. The value is required. Environment variables will be expanded.

wrapper: path to the Octane wrapper written by Wenyu Zhao. The value is required.

timing_iteration: specifying the timing iteration using an integer. The value is required.

minheap: a string that selects one of the minheap_values sets to use.

minheap_values: a dictionary containing multiple named sets of minimal heap sizes that are enough for a benchmark from the suite to run without triggering Fatal javascript OOM in .... Each size is measured in MiB. The default value is an empty dictionary. The minheap values are used only when running runbms with a valid N value. If the minheap value for a benchmark is not specified, a default of 4096 is used. An example looks like this.

minheap_values:
  d8:
    octane:
      box2d: 5
      codeload: 159
      crypto: 3

JuliaGCBenchmarks (preview ⚠️)

GC benchmarks for Julia: https://github.com/JuliaCI/GCBenchmarks

Keys

path: path to the GCBenchmarks folder. The value is required. Environment variables will be expanded.

minheap: a string that selects one of the minheap_values sets to use.

minheap_values: a dictionary containing multiple named sets of minimal heap sizes that are enough for a benchmark from the suite to run without triggering Out of Memory!. An example looks like this:

    minheap_values:
      julia-mmtk-immix:
        multithreaded/binary_tree/tree_immutable: 225
        multithreaded/binary_tree/tree_mutable: 384
        multithreaded/bigarrays/objarray: 9225
        serial/TimeZones: 5960
        serial/append: 1563
        serial/bigint/pollard: 198
        serial/linked/list: 4325
        serial/linked/tree: 216
        serial/strings/strings: 2510
        slow/bigint/pidigits: 198
        slow/rb_tree/rb_tree: 8640

Runtime

JikesRVM

NativeExecutable (preview ⚠️)

A NativeExecutable runtime tells runbms to run the benchmarks directly on native hardware. This is supposed to be used in tandem with BinaryBenchmarkSuite.

OpenJDK

D8 (preview ⚠️)

Keys

executable: path to the d8 executable. Environment variables will be expanded.

SpiderMonkey (preview ⚠️)

Keys

executable: path to the js executable. Environment variables will be expanded.

JavaScriptCore (preview ⚠️)

Keys

executable: path to the jsc executable. Environment variables will be expanded.

JuliaMMTK (preview ⚠️)

Keys

executable: path to the julia executable. Environment variables will be expanded.

JuliaStock (preview ⚠️)

Julia with the stock GC. It does not allow setting a heap size, and will not throw OOM unless killed by the operating system.

Keys

executable: path to the julia executable. Environment variables will be expanded.

Modifier

EnvVar

Keys

var: name of the variable.

val: value of the variable. Environment variables will be expanded.

Description

Set an environment variable. Might override an environment variable inherited from the parent process.

JVMArg

JVM specific.

Keys

val: a single string with shell-like syntax. Multiple arguments are space separated. Environment variables will be expanded.

Description

Specify arguments to a JVM, as opposed to the program.
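The phrase "shell-like syntax" used for val throughout this page can be illustrated with Python's shlex module. The sketch assumes that environment variables are expanded before the string is split; parse_val is a hypothetical helper, and PROBES is a made-up variable used only for the demonstration.

```python
import os
import shlex
from typing import List


def parse_val(val: str) -> List[str]:
    """Expand environment variables, then split with shell-like rules."""
    return shlex.split(os.path.expandvars(val))


os.environ["PROBES"] = "/path/to/probes"  # made-up variable for illustration
print(parse_val("-XX:-TieredCompilation -Xcomp"))
print(parse_val("-Djava.library.path=$PROBES/out -Dprobes=RustMMTk"))
print(parse_val('-Dmessage="hello world"'))
```

The last call shows what shell-like quoting buys you: a quoted value with a space stays a single argument.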

JSArg (preview ⚠️)

JavaScriptRuntime specific.

Keys

val: a single string with shell-like syntax. Multiple arguments are space separated. Environment variables will be expanded.

Description

Specify arguments to a JavaScript runtime (e.g., d8), as opposed to the program.

JVMClasspathAppend

JVM specific.

Keys

val: a single string with shell-like syntax. Multiple classpaths are space separated. Environment variables will be expanded.

Description

Append a list of classpaths to the existing classpaths.

JVMClasspathPrepend

JVM specific.

Keys

val: a single string with shell-like syntax. Multiple classpaths are space separated. Environment variables will be expanded.

Description

Prepend a list of classpaths to the existing classpaths.

JVMClasspath

A backward-compatibility alias of JVMClasspathAppend. Environment variables will be expanded.

ProgramArg

Keys

val: a single string with shell-like syntax. Multiple arguments are space separated. Environment variables will be expanded.

Description

Specify arguments to a program, as opposed to the runtime.

ModifierSet (preview ⚠️)

Keys

val: | separated values, with possible value options. See here for details.

Description

Specify a set of modifiers, including other ModifierSets. That is, you can use ModifierSet recursively.
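The recursive nature of ModifierSet can be sketched as a depth-first expansion. This is a simplified model: expand_modifier is a hypothetical helper, the set names below are invented, and value options are deliberately ignored.

```python
from typing import Any, Dict, List


def expand_modifier(modifiers: Dict[str, Dict[str, Any]], name: str) -> List[str]:
    """Flatten a modifier name into the non-set modifiers it stands for."""
    definition = modifiers[name]
    if definition["type"] != "ModifierSet":
        return [name]
    flat: List[str] = []
    for member in definition["val"].split("|"):
        member = member.strip()
        if member:
            # A member may itself be a ModifierSet, hence the recursion.
            flat.extend(expand_modifier(modifiers, member))
    return flat


modifiers = {
    "s": {"type": "JVMArg", "val": "-server"},
    "c2": {"type": "JVMArg", "val": "-XX:-TieredCompilation -Xcomp"},
    "foo": {"type": "EnvVar", "var": "FOO", "val": "1"},
    "server_c2": {"type": "ModifierSet", "val": "s|c2"},  # invented name
    "everything": {"type": "ModifierSet", "val": "foo|server_c2"},  # invented name
}
print(expand_modifier(modifiers, "everything"))
```

A set that refers to itself, directly or indirectly, would recurse forever in this sketch, so definitions should form a hierarchy without cycles.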

Wrapper (preview ⚠️)

Keys

val: a single string with shell-like syntax. Multiple arguments are space separated. Environment variables will be expanded.

Description

Specify a wrapper. If a wrapper also exists for the benchmark suite you use, this wrapper will follow that.

Companion (preview ⚠️)

Keys

val: a single string with shell-like syntax. Multiple arguments are space separated.

Description

Specify a companion program. If a companion program also exists for the benchmark suite you use, this companion program will follow that.

JuliaArg (preview ⚠️)

JuliaMMTk and JuliaStock specific.

Keys

val: a single string with shell-like syntax. Multiple arguments are space separated. Environment variables will be expanded.

NoImplicitHeapsizeModifier (preview ⚠️)

runbms specific.

Description

Normally runbms will iterate through a set of heap sizes, either specific multiples of the minheap of each benchmark via -s, or spreading the multiples across 1~heap_range (using N and optionally ns).

This modifier prevents runbms from applying the heap sizes for certain configs, which is useful, for example, for running NoGC or EpsilonGC.

Keys

No argument is allowed.

Command References

Please see the sections in this chapter for the references for each of the subcommands.

Usage

running [-h|--help] [-v|--verbose] [-d|--dry-run] [--version] subcommand

-h: print help message.

-v: use DEBUG for logging level. The default logging level is INFO.

-d: enable dry run. Each subcommand that respects this flag will print out the commands to be executed in a child process instead of actually executing them.

--version: print the version number of running-ng.

Convention

For each subcommand, the documentation can roughly be divided into two parts, the command line usage and the keys in the config file.

Unless otherwise specified, the keys specified here are common to all subcommands, with the keys specific to each subcommand documented in their respective documentation.

runbms

This subcommand runs benchmarks with different configs, possibly with varying heap sizes.

Usage

runbms [-h|--help] [-i|--invocations INVOCATIONS] [-s|--slice SLICE] [-p|--id-prefix ID_PREFIX] [-m|--minheap-multiplier MINHEAP_MULTIPLIER] [--skip-oom SKIP_OOM] [--skip-timeout SKIP_TIMEOUT] [--resume RESUME] [--workdir WORKDIR] [--skip-log-compression] LOG_DIR CONFIG [N] [n ...]

-h: print help message.

-i: set the number of invocations. Overrides invocations in the config file.

-s: only use the specified heap sizes. This is a comma-separated string of integers or floating point numbers. For each slice s in SLICE, we run benchmarks at s * minheap. N and ns are ignored.

-p: add a prefix to the folder names where the results are stored. By default, the folder that stores the result is named using the host name and the timestamp. However, you can add a prefix to the folder name to signify which experiments the results belong to.

-m (preview ⚠️): multiply the minheap value for each benchmark by MINHEAP_MULTIPLIER. Do NOT use this unless you know what you are doing. Overrides minheap_multiplier in the config file.

--skip-oom (preview ⚠️): skip the remaining invocations if a benchmark under a config has run out of memory more than SKIP_OOM times.

--skip-timeout (preview ⚠️): skip the remaining invocations if a benchmark under a config has timed out more than SKIP_TIMEOUT times.

--resume (preview ⚠️): resume a previous run under LOG_DIR/RESUME. If a .log.gz already exists for a group of invocations, they will be skipped. Remember to clean up the partial *.log files before resuming.

--workdir (preview ⚠️): use the specified directory as the working directory for benchmarks. If not specified, a temporary directory will be created under an OS-dependent location with a runbms- prefix.

--skip-log-compression: skip compressing log file as gzip.

LOG_DIR: where to store the results. This is required.

CONFIG: the path to the configuration file. This is required.

N: the number of different heap sizes to explore. Must be a power of two. Explore heap sizes denoted by 0, 1, ..., and N (N + 1 different sizes in total). The heap size 0 represents 1.0 * minheap, and the heap size N represents heap_range * minheap (by default, 6.0 * minheap). If N is omitted, then the script will run benchmarks without explicitly setting heap sizes, unless you specify -s or use a modifier that sets the heap size.

n: the heap sizes to explore. Instead of exploring 0, 1, ..., and N, only explore the ns specified.
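As an illustration of the -s option, the following Python sketch turns a SLICE string into heap sizes for one benchmark. heap_sizes_from_slice is a hypothetical helper; whether and how runbms rounds the resulting sizes is not covered here.

```python
from typing import List


def heap_sizes_from_slice(slice_arg: str, minheap_mib: float) -> List[float]:
    """Compute s * minheap for each comma-separated s in SLICE."""
    return [float(s) * minheap_mib for s in slice_arg.split(",")]


# With a minheap of 7 MiB (the avrora value in the earlier example):
print(heap_sizes_from_slice("1,1.5,2", 7))
```

In this model, -s 1,1.5,2 would run each benchmark at exactly one, one and a half, and two times its own minheap, regardless of N.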

Keys

invocations: see above.

minheap_multiplier: see above.

heap_range: the heap size relative to the minheap when n = N.

spread_factor: changes how 0, 1, ..., and N are spread out. When spread_factor is zero, the differences between 0, 1, ..., and N are the same. The larger the spread_factor is, the coarser the spacing is at the end relative to start. Please do NOT change this unless you understand how it works.

remote_host: the remote host to rsync the results to. The exact absolute path of LOG_DIR is used on both the local and the remote machine.

plugins (preview ⚠️): plugins of this command. Must be a dictionary, similar to how modifiers are declared.

Plugins (preview ⚠️)

Zulip

Zulip integration for notifying when experiments start or end. No message will be sent if it's a dry run.

Here is an example.

plugins:
  zulip:
    type: Zulip
    request:
      type: private
      to: ["your user id here"]

Keys

request: please follow the Zulip API documentation. Note that you don't need to put in content here. Please contact the administrators of your organization for your user ID. If you use a bot user and want to post to a channel, please subscribe the bot user to the channel so that messages can be edited.

config_file: an optional string specifying the path to the config file. If not specified, the default is ~/.zuliprc. Please make sure that this file can only be accessed by you (e.g., chmod 600 ~/.zuliprc). If you are a moma user, please create this file on squirrel, and it will then be synced to other machines. Please follow the Zulip documentation for the syntax of the config file and for obtaining an API key. If you can't create a new bot, please contact the administrators of your organization.

CopyFile

Copying files from the working directory.

Here is an example.

plugins:
  dacapo_latency:
    type: CopyFile
    patterns:
      - "scratch/dacapo-latency-*.csv"

Keys

patterns: a list of patterns following the Python 3 pathlib.Path.glob syntax. Files matching the patterns will be copied to LOG_DIR, where a separate subfolder will be created for each invocation.

skip_failed: don't copy files from failed runs. The default value is true.
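If you are unsure what a pattern will match, you can try it with pathlib directly. The following Python sketch only illustrates the pathlib.Path.glob semantics that the patterns key relies on. It is not the plugin's implementation, and the helper name match_patterns and the file names are made up for the demonstration.

```python
import tempfile
from pathlib import Path


def match_patterns(workdir, patterns):
    """Return paths (relative to workdir) matched by any of the patterns."""
    root = Path(workdir)
    matched = []
    for pattern in patterns:
        matched.extend(
            sorted(p.relative_to(root).as_posix() for p in root.glob(pattern))
        )
    return matched


# Build a throwaway working directory with hypothetical output files.
with tempfile.TemporaryDirectory() as tmp:
    scratch = Path(tmp) / "scratch"
    scratch.mkdir()
    for name in ["dacapo-latency-simple.csv", "dacapo-latency-metered.csv", "other.txt"]:
        (scratch / name).write_text("")
    demo_matches = match_patterns(tmp, ["scratch/dacapo-latency-*.csv"])
```

Here only the two CSV files whose names start with dacapo-latency- are matched, and other.txt is left out.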

Interpreting the Outputs

Under construction 🚧.

Console Outputs

Log directory

Heap Size Calculations

Please refer to the source code (here and here) for the actual algorithm.

The basic idea is as follows. First, we start with the ends and the middle and gradually fill the gaps. This makes sure you can see the big-picture trend. Second, the differences between sizes are smaller for small sizes and larger for large sizes, because performance is much more sensitive to changes in heap size when the heap is small.
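The "ends first, then the middle, then fill the gaps" idea can be sketched in Python as follows. This is one ordering that is consistent with the description above; the order that runbms actually uses may differ, so treat the function name bisection_order and its details as assumptions.

```python
def bisection_order(N):
    """Order 0, 1, ..., N so that coarse coverage comes before fine coverage."""
    if N < 1 or N & (N - 1) != 0:
        raise ValueError("N must be a positive power of two")
    order = [0, N]  # both ends first
    step = N
    while step > 1:
        half = step // 2
        # Midpoints of the gaps left by the previous rounds.
        order.extend(range(half, N, step))
        step = half
    return order


order = bisection_order(8)  # [0, 8, 4, 2, 6, 1, 3, 5, 7]
```

With this ordering, stopping an experiment early still leaves data points spread across the whole range rather than clustered at one end.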

Best Practices

Under construction 🚧.

Continuously Monitor Your Experiments

The results are rsynced to remote_host once all invocations for a benchmark at a heap size are finished. To avoid disturbing the experiments, you shouldn't log into the experiment machine. Instead, log into the remote host and check LOG_DIR there for new results as they come in.

minheap

This subcommand runs benchmarks with different configs while varying heap sizes in a binary search fashion in order to determine the minimum heap required to run each benchmark.

The result is stored in a YAML file. The dictionary keys are encoded config strings. For each config, there is one dictionary per benchmark suite, where the minimum heap size for each benchmark is stored. An example is as follows.

temurin-17.openjdk_common.hotspot_gc-G1:
  dacapochopin-69a704e:
    avrora: 7
    batik: 189
temurin-17.openjdk_common.hotspot_gc-Parallel:
  dacapochopin-69a704e:
    avrora: 5
    batik: 235

At the end of each run, minheap will print out the configuration that achieves the smallest minheap size for the most benchmarks. The minheap values for that configuration will be printed out, which can then be used to populate the minheap values of a benchmark suite, such as a DaCapo benchmark suite. An example is as follows.

temurin-17.openjdk_common.hotspot_gc-G1 obtained the most number of smallest minheap sizes: 8
Minheap configuration to be copied to runbms config files
dacapochopin-69a704e:
  avrora: 7
  batik: 189
  biojava: 95
  eclipse: 411
  fop: 15
  graphchi: 255
  h2: 773
  jme: 29
  jython: 25
  luindex: 42
  lusearch: 21
  pmd: 156
  sunflow: 29
  tomcat: 21
  tradebeans: 131
  tradesoap: 103
  xalan: 8
  zxing: 97
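To illustrate how "the most number of smallest minheap sizes" could be counted from the result structure shown earlier, here is a small Python sketch. It works on a nested dictionary with the same shape as the YAML result file. The function name count_smallest and the tie handling (every config that attains the minimum gets a point) are assumptions for illustration, not a description of how minheap itself breaks ties.

```python
def count_smallest(results):
    """Count, per config, the benchmarks where it has the smallest minheap.

    results has the shape {config: {suite: {benchmark: minheap}}}.
    """
    wins = {config: 0 for config in results}
    # Every (suite, benchmark) pair that appears under any config.
    pairs = {
        (suite, bm)
        for suites in results.values()
        for suite, bms in suites.items()
        for bm in bms
    }
    for suite, bm in pairs:
        values = {
            config: results[config][suite][bm]
            for config in results
            if bm in results[config].get(suite, {})
        }
        smallest = min(values.values())
        for config, value in values.items():
            if value == smallest:  # assumption: ties count for all
                wins[config] += 1
    return wins


example = {
    "temurin-17.openjdk_common.hotspot_gc-G1": {
        "dacapochopin-69a704e": {"avrora": 7, "batik": 189},
    },
    "temurin-17.openjdk_common.hotspot_gc-Parallel": {
        "dacapochopin-69a704e": {"avrora": 5, "batik": 235},
    },
}
wins = count_smallest(example)
```

For the two-benchmark example above, each config wins one benchmark: Parallel has the smaller value for avrora and G1 has the smaller value for batik.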

Usage

minheap [-h] [-a|--attempts ATTEMPTS] CONFIG RESULT

-h: print help message.

-a (preview ⚠️): set the number of attempts. Overrides attempts in the config file.

CONFIG: the path to the configuration file. This is required.

RESULT: where to store the results. This file contains both the interim results and the final result. An interrupted execution can be resumed by using the same RESULT path. This is required.

Keys

maxheap: the upper bound of the search.

attempts (preview ⚠️): for a particular heap size, if an invocation passes or fails with OOM (timeout treated as OOM), the binary search will continue with the next appropriate heap size. If an invocation crashes and if the total number of invocations has not exceeded ATTEMPTS, the same heap size will be repeated. If all ATTEMPTS invocations crash, the binary search for this config will stop, and minheap will report inf.
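The interaction between the binary search and attempts can be sketched in Python as follows. This is a simplified model, not the minheap implementation: the callback run is a hypothetical stand-in for one invocation and returns "pass", "oom" (which also covers timeouts), or "crash". The lower bound of 1 and the function name find_minheap are likewise assumptions.

```python
import math


def find_minheap(run, maxheap, attempts):
    """Binary search for the smallest heap size at which run() passes."""
    lo, hi = 1, maxheap
    best = math.inf  # stays inf if no heap size up to maxheap passes
    while lo <= hi:
        mid = (lo + hi) // 2
        outcome = "crash"
        for _ in range(attempts):
            outcome = run(mid)
            if outcome != "crash":
                break  # a pass or an OOM settles this heap size
        if outcome == "crash":
            return math.inf  # every attempt crashed: stop and report inf
        if outcome == "pass":
            best = mid
            hi = mid - 1  # try smaller heaps
        else:
            lo = mid + 1  # OOM or timeout: try larger heaps
    return best
```

For example, with a fake run that passes at 189 or above and reports OOM otherwise, find_minheap(run, 1024, 3) returns 189.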

fillin

Here are some recipes for common tasks.

Whole-Process Performance Event Monitoring

JVMTI

Please clone and build probes, and then build distillation. You might need to change the paths referred to in the Makefiles to match your environment.

Under the distillation folder, you will find a JVMTI agent, libperf_statistics.so. You can check the source code here. To use the agent, there are four things you need to do.

First, you will need to tell the dynamic linker to load the shared library before the VM boots. This ensures that the inherit flag of perf_event_attr works properly and all child threads subsequently spawned are included in the results.

modifiers:
  jvmti_env:
    type: EnvVar
    var: "LD_PRELOAD"
    val: "/path/to/distillation/libperf_statistics.so"

Second, you need to specify a list of events you want to measure.

modifiers:
  perf:
    type: EnvVar
    var: "PERF_EVENTS"
    val: "PERF_COUNT_HW_CPU_CYCLES,PERF_COUNT_HW_INSTRUCTIONS,PERF_COUNT_HW_CACHE_LL:MISS,PERF_COUNT_HW_CACHE_L1D:MISS,PERF_COUNT_HW_CACHE_DTLB:MISS"

If you want to get a full list of events you can use on a particular machine, you can clone and build libpfm4 and run the showevtinfo program.

Third, you need to tell the JVM to load the agent. Note that you need to specify the absolute path.

modifiers:
  jvmti:
    type: JVMArg
    val: "-agentpath:/path/to/distillation/libperf_statistics.so"

Finally, you need to let the DaCapo benchmark signal the start and the end of a benchmark iteration. We will reuse the RustMMTk probe here, as the callback functions in the JVMTI agent are also called harness_begin and harness_end.

modifiers:
  probes_cp:
    type: JVMClasspath
    val: "/path/to/probes/out /path/to/probes/out/probes.jar"
  probes:
    type: JVMArg
    val: "-Djava.library.path=/path/to/probes/out -Dprobes=RustMMTk"

Now, putting it all together, you can define a set of modifiers, and use that set in your config strings.

modifiers:
  jvmti_common:
    type: ModifierSet
    val: "probes|probes_cp|jvmti|jvmti_env|perf"

MMTk

Please clone and build probes. You will need to build mmtk-core with the perf_counter feature.

First, you need to let the DaCapo benchmark signal the start and the end of a benchmark iteration.

modifiers:
  probes_cp:
    type: JVMClasspath
    val: "/path/to/probes/out /path/to/probes/out/probes.jar"
  probes:
    type: JVMArg
    val: "-Djava.library.path=/path/to/probes/out -Dprobes=RustMMTk"

Then, you can specify a list of events you want to measure.

modifiers:
  mmtk_perf:
    type: EnvVar
    var: "MMTK_PHASE_PERF_EVENTS"
    val: "PERF_COUNT_HW_CPU_CYCLES,0,-1;PERF_COUNT_HW_INSTRUCTIONS,0,-1;PERF_COUNT_HW_CACHE_LL:MISS,0,-1;PERF_COUNT_HW_CACHE_L1D:MISS,0,-1;PERF_COUNT_HW_CACHE_DTLB:MISS,0,-1"

Note that the list is semicolon-separated. Each entry consists of three parts, separated by commas. The first part is the name of the event. Please refer to the previous section for details. The second part and the third part are pid and cpu, per man perf_event_open. In most cases, you want to use 0,-1, that is measuring the calling thread (the results will be combined later through the inherit flag) on any CPU. For some events, such as RAPL, only package-wide measurement is supported, and you will have to adjust the values accordingly.
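Because the value is easy to get wrong by hand, a small Python helper can assemble it from a list of event names. This sketch follows the format described above (semicolon-separated entries, each being name,pid,cpu). The function name format_mmtk_perf_events is hypothetical, and the defaults of 0 and -1 are the common case mentioned in the text.

```python
def format_mmtk_perf_events(events, pid=0, cpu=-1):
    """Join event names into the name,pid,cpu;name,pid,cpu;... format."""
    return ";".join(f"{name},{pid},{cpu}" for name in events)


value = format_mmtk_perf_events(
    ["PERF_COUNT_HW_CPU_CYCLES", "PERF_COUNT_HW_INSTRUCTIONS"]
)
# value == "PERF_COUNT_HW_CPU_CYCLES,0,-1;PERF_COUNT_HW_INSTRUCTIONS,0,-1"
```

The resulting string can be pasted into the val field of the EnvVar modifier. For package-wide events, pass different pid and cpu values as appropriate for your machine.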

Note that you might have to increase the value of MAX_PHASES in crate::util::statistics::stats to a larger value, e.g., 1 << 14, so that the array storing the per-phase value will not overflow.

Work-Packet Performance Event Monitoring

It's similar to the whole-process performance event monitoring for MMTk. Just use MMTK_WORK_PERF_EVENTS instead of MMTK_PHASE_PERF_EVENTS.

Machine-Specific Known Problems

On Xeon D-1540 Broadwell, the PERF_COUNT_HW_CACHE_LL:MISS event is always zero.

perf stat -e LLC-load-misses,cycles /bin/ls

 Performance counter stats for '/bin/ls':

                 0      LLC-load-misses
         1,729,786      cycles

       0.001135511 seconds time elapsed

       0.001180000 seconds user
       0.000000000 seconds sys

On AMD machines, the PERF_COUNT_HW_CACHE_LL:MISS event fails to open. The perf_event_open syscall fails with No such file or directory.

Frequently Asked Questions

Changelog

Unreleased

Added

Changed

Deprecated

Removed

Fixed

Security

v0.4.7 (2024-08-30)

Fixed

Commands

  • runbms: correctly apply a default minheap value for a benchmark without a defined minheap value in the config file.

v0.4.6 (2024-05-23)

Fixed

Commands

  • runbms: remove superfluous log messages when no config has a NoImplicitHeapsizeModifier.

v0.4.5 (2024-05-23)

Added

Modifiers

  • NoImplicitHeapsizeModifier

v0.4.4 (2023-11-23)

Fixed

Benchmark Suites

  • DaCapo correctly accepts the 23.11 release specified in dacapo.yml.

v0.4.3 (2023-11-20)

Added

Base Configurations

  • DaCapo 23.11-Chopin available as dacapochopin. Please use dacapochopin_jdk9, dacapochopin_jdk11, dacapochopin_jdk17, and dacapochopin_jdk21 modifiers for JDK 9, 11, 17, and 21 respectively when you use this suite with these JDK versions.
  • Temurin 21

Changed

Base Configurations

  • Environment variables are expanded when resolving paths of runtimes and benchmark suites.
  • --add-exports java.base/jdk.internal.ref=ALL-UNNAMED is no longer automatically added when running DaCapo benchmarks on >= JDK 9. This approach doesn't scale now that we have more workarounds specific to different JDK versions. It is also too opaque, as it is not clear how it is implemented. New modifiers are introduced to address this issue.

Modifiers

  • EnvVar val is expanded using the outside environment prior to benchmark execution.

Deprecated

  • Deprecating Python 3.7 support for users. Python 3.7 was last released on June 6, 2023 (3.7.17), and no new release has been made since.

Removed

  • Dropping Python 3.7 support for developers (NOT users). pytest 7.4+ requires at least Python 3.8 (still supported by Ubuntu 20.04 LTS).

v0.4.2 (2023-09-10)

Changed

  • All Modifier instances now support includes for attaching them only to certain benchmarks.

Fixed

Runtimes

  • D8 now detects new JavaScript OOM error pattern.

Security

v0.4.1 (2023-08-22)

Fixed

Commands

  • runbms: apply modifiers in the config file.
  • minheap: apply modifiers in the config file.

v0.4.0 (2023-08-17)

Added

Modifiers

  • JuliaArg

Runtimes

  • JuliaMMTK
  • JuliaStock

Benchmark Suites

  • JuliaGCBenchmarks

Commands

  • runbms gains an extra argument, --skip-log-compression, to skip compressing log files with gzip.

Changed

Base Configurations

  • runbms: don't sync to squirrel.moma for the default runbms.yml. moma machine users should use runbms-anu.yml for the old behaviour.

Fixed

  • Gracefully handle empty modifier strings in configs, such as openjdk7||foobar.

Benchmark Suites

  • DaCapo-specific workarounds are now handled by the DaCapo class rather than the JavaBenchmark class to avoid confusion.

v0.3.9 (2023-08-02)

Fixed

Benchmark Suites

  • DaCapo: don't explicitly pass -s default to DaCapo unless the user requests so by setting the size key of DaCapo or overriding the sizes for individual benchmarks using the benchmark specification syntax. This is so that users can override the size via ProgramArg.

v0.3.8 (2023-02-21)

Changed

Commands

  • runbms: companion programs are now expected to self-terminate.

v0.3.7 (2023-02-14)

Fixed

Commands

  • runbms: better heuristics to detect whether a host is in the moma subnet.

v0.3.6 (2023-01-16)

Added

Base Configurations

  • DaCapo Chopin Snapshot-6e411f33

Fixed

  • Fixed type annotations in untyped functions and made Optionals explicit.

v0.3.5 (2022-10-13)

Changed

Commands

  • runbms: when a companion program exits with a non-zero code, a warning is generated instead of an exception to prevent stopping the entire experiment.

v0.3.4 (2022-10-13)

Fixed

Commands

  • runbms: fix the file descriptor leak when running benchmarks with companion programs.

v0.3.3 (2022-10-12)

Changed

Commands

  • runbms prints out the logged-in users when emitting a warning that the machine has more than one logged-in user.

Fixed

Modifiers

  • Companion: skip value options expansion if no value option is provided to avoid interpreting bpftrace syntax as replacement fields.

v0.3.2 (2022-10-12)

Added

Modifiers

  • Companion

v0.3.1 (2022-09-18)

Added

Base Syntax

  • Use the $RUNNING_NG_PACKAGE_DATA environment variable to refer to base configurations shipped with running-ng, such as $RUNNING_NG_PACKAGE_DATA/base/runbms.yml, regardless of how you installed running-ng.

Benchmark Suites

  • DaCapo gains an extra key companion to facilitate eBPF tracing programs.

Changed

  • Overhauled Python packaging with PEP 517
  • zulip is now an optional Python dependency. Use pip install running-ng[zulip] if you want to use the Zulip runbms plugin.

Removed

  • Dropping Python 3.6 support for users.

Base Configurations

  • Removing AdoptOpenJDK from the base configuration files. AdoptOpenJDK is now replaced by Temurin.

v0.3.0 (2022-03-19)

Added

Modifiers

  • JVMClasspathAppend
  • JVMClasspathPrepend

Benchmark Suites

  • SPECjvm98

Changed

Modifiers

  • JVMClasspath is now an alias of JVMClasspathAppend. This is backward compatible.

Commands

  • runbms prints out the version number of running-ng in log files.

Deprecated

  • Deprecating Python 3.6 support for users. Python 3.6 will NOT be supported once moma machines are upgraded to the latest Ubuntu LTS.

Removed

  • Dropping Python 3.6 support for developers (NOT users). pytest 7.1+ requires at least Python 3.7.

v0.2.2 (2022-03-07)

Fixed

Benchmark Suites

  • JavaBenchmarkSuite: Some DaCapo benchmarks refer to internal classes (e.g., under jdk.internal.ref), and DaCapo implemented a workaround for this behaviour in the jar. However, since we are invoking DaCapo using -cp and the name of the main class, that workaround is bypassed. That workaround is now reimplemented in running-ng through an extra JVM argument --add-exports.

v0.2.1 (2022-03-05)

Changed

Commands

  • runbms now skips printing CPU frequencies if the system doesn't support it, e.g., when using Docker Desktop on Mac.

Fixed

Benchmark Suites

  • BinaryBenchmarkSuite: fixes missing parameter when constructing BinaryBenchmark due to a bug in previous refactoring

v0.2.0 (2022-02-20)

Added

Base Configurations

  • AdoptOpenJDK 16
  • DaCapo Chopin Snapshot-29a657f, Chopin Snapshot-f480064
  • Temurin 8, 11, 17
  • SPECjbb 2015, 1.03

Commands

  • minheap gains an extra key attempts (can be overridden by --attempts) so that crashes don't cause bogus minheap measurements.
  • minheap stores results in a YAML file, which is also used to resume an interrupted execution.
  • minheap prints the minheap values of the best config at the end.
  • runbms gains an extra argument, --resume, to resume an interrupted execution from a log folder.
  • runbms gains an extra argument, --workdir, to override the default working directory.
  • runbms adds more information about the environment to the log file, including the date, logged-in users, system load, and top processes.
  • runbms gains a callback-based plugin system, and an extra key plugins is added.
  • runbms gains a plugin CopyFile to copy files from the working directory.
  • runbms gains a plugin Zulip, which sends messages about the progress of the experiments, and warns about reservation expiration on moma machines.
  • runbms outputs a warning message if more than one user is logged in.
  • runbms uses uppercase letters if there are more than 26 configs.

Modifiers

  • ModifierSet
  • Wrapper
  • JSArg

Runtimes

  • D8
  • SpiderMonkey
  • JavaScriptCore
  • JVM now detects OOM generated in the form of Rust panic from mmtk-core.

Benchmark Suites

  • DaCapo gains an extra key size, which is used to specify the size of the input.
  • DaCapo now allows individual benchmarks to override the timing iteration, input size, and timeout of the suite.
  • SPECjbb2015: basic support for running SPECjbb 2015 in composite mode.
  • Octane: basic support for running Octane using Wenyu's wrapper script.

Changed

Benchmark Suites

  • The minheap key of DaCapo changes from a dictionary to a string. The string is used to look up minheap_values, which are collections of minheap values. This makes it easier to store multiple sets of minheap values for the same benchmark suite measured using different runtimes.

Base Syntax

  • Whitespace can be used in config strings for visual alignment. It is ignored when parsed.

Commands

  • The --slice argument of runbms now accepts multiple comma-separated floating point numbers.

Removed

Base Configurations

  • DaCapo Chopin Snapshot-69a704e

Fixed

Commands

  • Resolving relative paths of runtimes before running. Otherwise, they would be resolved relative to the runbms working directory.
  • Use the BinaryIO interface for file IO and interprocess communication to prevent invalid UTF-8 characters from crashing the script.
  • Subprocesses now inherit environment variables from the parent process.
  • minheap now runs in a temporary working directory to avoid file-based conflicts between concurrent executions. Note that network-port-based conflicts can still happen.

v0.1.0 (2021-08-09)

Initial release.

Added

Commands

  • fillin
  • minheap
  • runbms

Modifiers

  • JVMArg
  • JVMClasspath
  • EnvVar
  • ProgramArg

Runtimes

  • NativeExecutable
  • OpenJDK
  • JikesRVM

Benchmark Suites

  • BinaryBenchmarkSuite
  • DaCapo

Base Configurations

  • AdoptOpenJDK 8, 11, 12, 13, 14, 15
  • DaCapo 2006, 9.12 (Bach), 9.12 MR1, 9.12 MR1 for Java 6, Chopin Snapshot-69a704e