Introduction
running-ng
is a collection of scripts that help people run workloads in methodologically sound settings.
Disclaimer
At this stage, the focus of this project is driven by the internal use of members from Steve Blackburn's lab. If you are a member of the lab, you know what to do if you encounter any issue, and you can ignore the below.
If you are a member of the public, please kindly note that the project is open-source and documented on a "good-faith" basis. We might not have the time to consider your feature requests. Please don't be offended if we ignore these. Having said that, you are very welcome to use it, and we will be very pleased if this helps anyone. In particular, we are grateful if you report bugs you found with steps to reproduce them.
⚠️ Warning
The syntax (of configuration files and command line arguments) of running-ng
is not stabilized yet.
When you use it, expect breaking changes, although we will try to minimize this where possible.
running-ng
has been tested by a few people, and we think it is stable enough to use for your experiments.
However, there are probably a few wrinkles to be ironed out.
Please file any bug or feature request on the issue tracker.
You are also welcome to implement new features and/or fix bugs by opening pull requests. But before you do so, please discuss with Steve first for major design changes. For non-user-facing changes, please discuss with the maintainers first.
History
The predecessor of running-ng
is running
, a set of scripts written in Perl, dating back to 2005.
However, the types of workloads we are evaluating have changed a bit, and we want a new set of scripts that fits our needs better.
Two major sources of inspiration are mu-perf-benchmarks
and menthol
.
mu-perf-benchmarks
is a performance regression framework built for The Mu Micro Virtual Machine.
Zixian coauthored the framework with John Zhang in 2017.
It features a web frontend for displaying results.
You can see the live instance here.
menthol
is a benchmarking framework built for running benchmarks in high-performance computing (HPC) settings.
Zixian built it for his research project about evaluating Chapel's performance in 2018.
The framework can run benchmarks in different languages either on a single node or on a cluster through the PBS job scheduler.
Maintainers
Installation
pip3 install --user -U running-ng
The base configuration files can usually be found in paths like ~/.local/lib/python3.6/site-packages/running/config/base
.
The exact path might differ depending on your Python version, etc.
Adding running
to PATH
You will need to add the folder where running
is installed to your PATH
.
On a typical Linux installation, running
is installed to ~/.local/bin
.
You will need to refer to the documentation of the shell you are using.
Here is an example for bash.
# Add the following to ~/.bashrc
PATH=$PATH:$HOME/.local/bin
You don't need to use export
.
Generally, $PATH
already exists and is exported to child processes.
Please check whether your ~/.bash_profile
or ~/.profile
source
s ~/.bashrc
.
If not, when you use a login shell (e.g., in the case of tmux), the content of ~/.bashrc
might not be applied.
To ensure ~/.bashrc
is always sourced, you can add the following to ~/.bash_profile
.
if [ -f ~/.bashrc ]; then
    . ~/.bashrc
fi
If you are a moma user, please change these dotfiles on squirrel.moma
, and then run sudo /moma-admin/config/update_self.fish
.
Note that you should run this command using an SSH session on a standard terminal instead of using the integrated terminal in VSCode Remote.
Please check here for how to set up a UNIX password for sudo
.
Quickstart
This guide will show you how to use running-ng
to compare two different builds of JVMs.
Note that for each occurrence in the form /path/to/*
, you need to replace it with the real path of the respective item in the filesystem.
Installation
Please follow the installation guide to install running-ng
.
You will need Python 3.6+.
Then, create a file two_builds.yml
with the following content.
includes:
  - "$RUNNING_NG_PACKAGE_DATA/base/runbms.yml"
The YAML file represents a dictionary (key-value pairs) that defines the experiments you are running.
The includes
directive here will populate the dictionary with some default values shipped with running-ng
.
If you use moma machines, please substitute runbms.yml
with runbms-anu.yml
.
Prepare Benchmarks
Add the following to two_builds.yml
.
benchmarks:
  dacapochopin-29a657f:
    - avrora
    - batik
    - biojava
    - cassandra
    - eclipse
    - fop
    - graphchi
    - h2
    - h2o
    - jme
    - jython
    - luindex
    - lusearch
    - pmd
    - sunflow
    - tradebeans
    - tradesoap
    - tomcat
    - xalan
    - zxing
This specifies a list of benchmarks used in this experiment from the benchmark suite dacapochopin-29a657f
.
The benchmark suite is defined in $RUNNING_NG_PACKAGE_DATA/base/dacapo.yml
.
By default, the minimum heap sizes of dacapochopin-29a657f
benchmarks are measured with AdoptOpenJDK 15 using G1 GC.
If you are using OpenJDK 11 or 17, you can override the value of suites.dacapochopin-29a657f.minheap
to temurin-17-G1
or temurin-11-G1
.
That is, you can, for example, add "suites.dacapochopin-29a657f.minheap": "temurin-17-G1"
to overrides
.
Then, add the following to two_builds.yml
.
overrides:
  "suites.dacapochopin-29a657f.timing_iteration": 5
  "suites.dacapochopin-29a657f.callback": "probe.DacapoChopinCallback"
That is, we want to run five iterations for each invocation, and use DacapoChopinCallback
because it is the appropriate callback for this release of DaCapo.
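As a rough illustration of how a dotted override key can address a nested entry of the configuration dictionary, here is a minimal Python sketch. The helper name `apply_overrides` and the rule of splitting keys on `.` are assumptions made for this example; they are inferred from the override keys shown above and are not taken from the running-ng source.

```python
def apply_overrides(config, overrides):
    """Update a nested dictionary in place using '.'-separated keys.

    Hypothetical helper: assumes each override key is a path into
    nested dictionaries, as the example keys above suggest.
    """
    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split(".")
        node = config
        for part in parents:
            # Create intermediate dictionaries if they are missing.
            node = node.setdefault(part, {})
        node[leaf] = value
    return config


# A tiny stand-in for the dictionary produced by the includes.
config = {
    "suites": {
        "dacapochopin-29a657f": {"timing_iteration": 3, "callback": None}
    }
}
apply_overrides(
    config,
    {
        "suites.dacapochopin-29a657f.timing_iteration": 5,
        "suites.dacapochopin-29a657f.callback": "probe.DacapoChopinCallback",
    },
)
print(config["suites"]["dacapochopin-29a657f"])
```

A key without any dot, such as "invocations", simply sets a top-level entry in this sketch.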
Prepare Your Builds
In this guide, we assume you use mmtk-openjdk
.
Please follow its build guide.
I assume you produced two different builds you want to compare.
Add the following to two_builds.yml
.
runtimes:
  build1:
    type: OpenJDK
    release: 11
    home: "/path/to/build1/jdk" # make sure /path/to/build1/jdk/bin/java exists
  build2:
    type: OpenJDK
    release: 11
    home: "/path/to/build2/jdk" # make sure /path/to/build2/jdk/bin/java exists
This defines two builds of runtimes.
I recommend that you use absolute paths for the builds, although relative paths will work, and will be relative to where you run running
.
I strongly recommend you rename the builds (both the name in the configuration file and the folder name) to something more sensible, preferably with the commit hash for easy troubleshooting and performance debugging later.
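If you want to double-check the "make sure .../bin/java exists" comments before starting a long experiment, a small script along the following lines can help. This is only a sketch: the function name `missing_java_binaries` is made up for this example, and the dictionary mirrors the runtimes section by hand rather than reading your YAML file.

```python
from pathlib import Path


def missing_java_binaries(runtimes):
    """Return the names of runtimes whose <home>/bin/java is not a file."""
    missing = []
    for name, runtime in runtimes.items():
        java = Path(runtime["home"]) / "bin" / "java"
        if not java.is_file():
            missing.append(name)
    return missing


# Mirrors the runtimes section; replace the placeholder paths with real ones.
runtimes = {
    "build1": {"type": "OpenJDK", "release": 11, "home": "/path/to/build1/jdk"},
    "build2": {"type": "OpenJDK", "release": 11, "home": "/path/to/build2/jdk"},
}
print(missing_java_binaries(runtimes))
```

An empty list means every build has a java binary at the expected location.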
Prepare Probes
Please clone probes
, and run make
.
Add the following to two_builds.yml
.
modifiers:
  probes_cp:
    type: JVMClasspath
    val: "/path/to/probes/out /path/to/probes/out/probes.jar"
  probes:
    type: JVMArg
    val: "-Djava.library.path=/path/to/probes/out -Dprobes=RustMMTk"
This defines two modifiers, which will be used later to modify the JVM command line arguments.
Please only use absolute paths for all the above.
Prepare Configs
Finally, add the following to two_builds.yml
.
configs:
  - "build1|ms|s|c2|mmtk_gc-SemiSpace|tph|probes_cp|probes"
  - "build2|ms|s|c2|mmtk_gc-SemiSpace|tph|probes_cp|probes"
The syntax is described here.
Sanity Checks
The basic form of usage looks like this.
running runbms /path/to/log two_builds.yml 8
That is, run the experiments as specified by two_builds.yml
, store the results in /path/to/log
, and explore eight different heap sizes (with careful arrangement of which size to run first and which to run later).
See here for a complete reference of runbms
.
Dry run
A dry run (by supplying -d
to running
NOT runbms
) allows you to see the commands to be executed.
running -d runbms /path/to/log two_builds.yml 8 -i 1
Make sure it looks like what you want.
Single Invocation
Now, actually run the experiment, but only for one invocation (by supplying -i 1
to runbms
).
running runbms /path/to/log two_builds.yml 8 -i 1
This allows you to spot any issues before wasting several days, only to realize that something didn't work.
Run It
Once you are happy with everything, run the experiments.
running runbms /path/to/log two_builds.yml 8 -p "two_builds"
Don't forget to give the results folder a prefix so that you can later tell what the experiment was for.
Analysing Results
This is outside the scope of this quickstart guide.
Basics
Briefly talk about how basic concepts fit together here...
Before diving into the details, please read the design principles to help you better understand why things are organized in such a way.
Design Principles
Sound methodology
Sound methodology is crucial for the type of performance analysis work we do. Please see the documentation for each of the commands for details. We also try to include sensible default values in the base configuration files.
Reproducibility
It should be easy to reproduce a set of experiments.
To this end, various commands will save as much metadata as possible with the results.
For example, runbms
saves the flattened configuration file and command line arguments in the results folder.
For each log, basic information about the execution environment, such as uname
, the model name of the CPU, and frequencies of CPU cores, is saved as well.
Extensibility
Broadly, the project consists of two parts: the core and the commands. The core provides abstractions for core concepts, such as benchmarks and execution environments, and can be extended through class inheritance.
The commands are the user-facing parts that use the core to provide concrete functionality.
Reusability
The configuration files can be easily reused through the includes
and overrides
mechanisms.
For example, people might want to run multiple sets of experiments with minor tweaks, and being able to share a common base configuration file is ergonomic.
This also supports the first principle: people can get a set of sensible default values by including the base configuration files shipped with the project.
Human-readable syntax
We use YAML as the format for the configuration files. Please read the syntax reference for more details.
Configuration File Syntax
The configuration file is in YAML format. You can find a good YAML tutorial here. Below is the documentation for all the top-level keys that are common to all commands.
benchmarks
A YAML list of benchmarks to run in each specified benchmark suite.
For example:
benchmarks:
  dacapo2006:
    - eclipse
  dacapobach:
    - avrora
    - fop
specifies running
to run the eclipse
benchmark from the dacapo2006
benchmark suite; and the avrora
and fop
benchmarks from the dacapobach
benchmark suite. These benchmark suites have to be defined previously (usually
through an includes
key).
Note that each benchmark of a benchmark suite can either be a string or a suite-specific dictionary. For example, for the DaCapo benchmark suite, the following two snippets are equivalent.
benchmarks:
  dacapo2006:
    - eclipse

benchmarks:
  dacapo2006:
    - {name: eclipse, bm_name: eclipse, size: default}
configs
A YAML list of configuration strings to be used to run the benchmarks. These are
specified as a runtime
followed by a '|'
separated list of
modifiers, i.e. "<runtime>|<modifier>|...|<modifier>"
.
For example:
configs:
  - "openjdk11|ms|s|c2"
  - "openjdk15|ms|s"
specifies running
to use the openjdk11
runtime
with ms
, s
, and c2
modifiers; and the openjdk15
runtime
with the ms
, and s
modifiers. In
the example above, we assume that both the runtimes
and modifiers have been
previously defined (in either the current configuration file or in an includes
file).
Each segment in a configuration string can contain whitespace, which makes multi-line editing easier.
For example:
configs:
  - "openjdk8 |foo-1 |bar|buzz"
  - "openjdk15|foo-16|   |buzz"
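To make the structure of a configuration string concrete, the following Python sketch splits one into a runtime and a list of modifiers. It is an illustration only: the function name `parse_config` is hypothetical, and the choice to skip segments that contain nothing but whitespace is an assumption based on the aligned example above, not a statement about the running-ng parser.

```python
def parse_config(config):
    """Split "<runtime>|<modifier>|...|<modifier>" into its parts."""
    segments = [segment.strip() for segment in config.split("|")]
    runtime, modifiers = segments[0], segments[1:]
    # Assumption: blank segments exist only for alignment and are skipped.
    return runtime, [modifier for modifier in modifiers if modifier]


print(parse_config("openjdk8 |foo-1 |bar|buzz"))
print(parse_config("openjdk15|foo-16|   |buzz"))
```

Under these assumptions, the second string yields the runtime openjdk15 with the modifiers foo-16 and buzz.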
includes
A YAML list of paths to YAML files that are to be included into the current configuration file for definitions of some keys.
This is primarily used to provide re-usability and extensibility of
configuration files. A pre-processor step in running
takes care of including
all the specified files. A flattened version of the final configuration file is
also generated and placed in the results folder for reproducibility.
The paths can be either absolute or relative.
Relative paths are resolved relative to the current file.
For example, if $HOME/configs/foo.yml
has an include
line ../bar.yml
, the
line is interpreted as $HOME/bar.yml
.
Similarly,
includes:
  - "./base/suites.yml"
  - "./base/modifiers.yml"
includes the suites.yml
and modifiers.yml
files located at ./base
respectively.
Any environment variables in the paths are also resolved before any further processing.
This includes a special environment variable $RUNNING_NG_PACKAGE_DATA
that allows
you to refer to various configuration files shipped with running-ng, regardless of how you installed running-ng.
For example, in a global pip
installation, $RUNNING_NG_PACKAGE_DATA
will look like /usr/local/lib/python3.10/dist-packages/running/config
.
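The path rules for includes can be sketched in a few lines of Python. This is a minimal illustration, assuming a POSIX filesystem; the function name `resolve_include` is made up for this example, and the value assigned to RUNNING_NG_PACKAGE_DATA is just the sample path mentioned above.

```python
import os


def resolve_include(current_file, include):
    """Resolve one includes entry found in current_file."""
    # Environment variables are expanded before anything else.
    expanded = os.path.expandvars(include)
    if os.path.isabs(expanded):
        return os.path.normpath(expanded)
    # Relative paths are resolved against the folder of the including file.
    base = os.path.dirname(os.path.abspath(current_file))
    return os.path.normpath(os.path.join(base, expanded))


# Sample install location, set here only so the example is self-contained.
os.environ["RUNNING_NG_PACKAGE_DATA"] = (
    "/usr/local/lib/python3.10/dist-packages/running/config"
)
print(resolve_include("/home/user/configs/foo.yml", "../bar.yml"))
print(resolve_include("/home/user/configs/foo.yml", "./base/suites.yml"))
print(
    resolve_include(
        "/home/user/configs/foo.yml", "$RUNNING_NG_PACKAGE_DATA/base/runbms.yml"
    )
)
```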
overrides
Under construction 🚧.
modifiers
A YAML dictionary of program arguments or environment variables that are to be
used with config strings. The key of a modifier must not contain -
, because this character separates value options in configuration strings.
Each modifier requires a type
key with other keys being specific to that
type
. For more information regarding the different type
s of modifiers,
please refer to this page.
Warning preview feature ⚠️. We can exclude certain benchmarks from using a
specific modifier by using an excludes
key along with a YAML list of benchmarks
to be excluded from each benchmark suite.
For example:
modifiers:
  s:
    type: JVMArg
    val: "-server"
  c2:
    type: JVMArg
    val: "-XX:-TieredCompilation -Xcomp"
    excludes:
      dacapo2006:
        - eclipse
specifies two modifiers, s
and c2
, both of type
JVMArg
with their respective values. Here, the
eclipse
benchmark from the dacapo2006
benchmark suite has been excluded from
the c2
modifier.
Warning preview feature ⚠️. Similarly, we can attach the modifier only to
specific benchmarks by using an includes
key.
For example:
modifiers:
  c2:
    type: JVMArg
    val: "-XX:-TieredCompilation -Xcomp"
    includes:
      dacapo2006:
        - eclipse
The c2
modifier will only be attached when running the eclipse
benchmark
from the dacapo2006
benchmark suite.
excludes
has a higher priority than includes
.
For example:
modifiers:
  c2:
    type: JVMArg
    val: "-XX:-TieredCompilation -Xcomp"
    includes:
      dacapo2006:
        - eclipse
        - fop
    excludes:
      dacapo2006:
        - fop
The c2
modifier will only be attached when running the eclipse
benchmark
from the dacapo2006
benchmark suite, no other benchmark will run with this
modifier (not even fop
even though it appears in the includes
).
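The interaction between includes and excludes can be summarized by a small decision function. The sketch below is written for illustration and is not the running-ng implementation; the name `modifier_applies` is hypothetical, and it assumes that a modifier with an includes key is attached only to the benchmarks listed there.

```python
def modifier_applies(modifier, suite, benchmark):
    """Decide whether a modifier is attached to a benchmark of a suite."""
    excludes = modifier.get("excludes", {})
    if benchmark in excludes.get(suite, []):
        return False  # excludes has a higher priority than includes
    includes = modifier.get("includes")
    if includes is None:
        return True  # no includes key: attach to every benchmark
    return benchmark in includes.get(suite, [])


c2 = {
    "type": "JVMArg",
    "val": "-XX:-TieredCompilation -Xcomp",
    "includes": {"dacapo2006": ["eclipse", "fop"]},
    "excludes": {"dacapo2006": ["fop"]},
}
for benchmark in ["eclipse", "fop", "antlr"]:
    print(benchmark, modifier_applies(c2, "dacapo2006", benchmark))
```

With the c2 definition from the last example, only eclipse gets the modifier.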
Value Options
These are special modifiers whose values can be specified through their use in a
configuration string. Concrete values are specified as -
separated
values after the modifier's name in a configuration string. These values will be
indexed by the modifier through syntax similar to Python format strings.
This is best understood via an example:
modifiers:
  env_var:
    type: EnvVar
    var: "FOO{0}"
    val: "{1}"
[...]
configs:
  - "openjdk11|env_var-42-43"
runs the openjdk11
runtime
with the environment
variable FOO42
set to 43
. Note that value options are not limited only to
environment variables, and can be used for all modifier type
s.
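The substitution of value options can be pictured with Python's own `str.format`. The following is a hedged sketch rather than the actual implementation: `instantiate_modifier` is a made-up name, and the sketch assumes that the segment is split on `-` and that every string value of the modifier is formatted with the resulting values.

```python
def instantiate_modifier(modifiers, segment):
    """Fill the value options of a config-string segment into a modifier."""
    # The first part is the modifier's name, the rest are the value options.
    name, *values = segment.split("-")
    template = modifiers[name]
    return {
        key: value.format(*values) if isinstance(value, str) else value
        for key, value in template.items()
    }


modifiers = {"env_var": {"type": "EnvVar", "var": "FOO{0}", "val": "{1}"}}
print(instantiate_modifier(modifiers, "env_var-42-43"))
# prints {'type': 'EnvVar', 'var': 'FOO42', 'val': '43'}
```

Because the segment is split on `-`, a modifier whose name contained `-` could not be told apart from a modifier with value options.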
runtimes
A YAML dictionary of runtime definitions that are to be used with config strings.
Each runtime requires a type
key with other keys being specific to that
type
. For more information regarding the different type
s of runtimes,
please refer to this page.
suites
A YAML dictionary of benchmark suite definitions that are to be used as keys of benchmarks
.
Each benchmark suite requires a type
key with other keys being specific to that
type
. For more information regarding the different type
s of benchmark suites,
please refer to this page.
Benchmark Suite
BinaryBenchmarkSuite
(preview ⚠️)
A BinaryBenchmarkSuite
is a suite of programs which can be used to run binary
benchmarks such as for C/C++ benchmarking.
Keys
programs
: a YAML dictionary of benchmarks in the format:
programs:
  <BM_NAME_1>:
    path: /full/path/to/benchmark/binary_1
    args: "Any arguments to binary_1"
  <BM_NAME_2>:
    path: /full/path/to/benchmark/binary_2
    args: "Any arguments to binary_2"
  [...]
A possible use-case could use wrapper shell scripts around the benchmark to output timing and other information in a tab-separated table.
DaCapo
Keys
release
: one of the possible values ["2006", "9.12", "evaluation"]
.
The value is required.
path
: path to the DaCapo jar
.
The value is required.
Environment variables will be expanded.
minheap
: a string that selects one of the minheap_values
sets to use.
minheap_values
: a dictionary containing multiple named sets of minimal heap sizes that are enough for a benchmark from the suite to run without triggering OutOfMemoryError
.
Each size is measured in MiB.
The default value is an empty dictionary.
The minheap values are used only when running runbms
with a valid N
value.
If the minheap value for a benchmark is not specified, a default of 4096
is used.
An example looks like this.
minheap_values:
  adoptopenjdk-15-G1:
    avrora: 7
    batik: 253
  temurin-17-G1:
    avrora: 7
    batik: 189
timing_iteration
: specifying the timing iteration.
It can either be a number, which is passed to DaCapo as -n
, or a string converge
.
The default value is 3.
callback
: the class (possibly within some packages) for the DaCapo callback. The value is passed to DaCapo as -c
.
The default value is null
.
timeout
: timeout for one invocation of a benchmark in seconds.
The default value is null
.
wrapper
(preview ⚠️): specifying a wrapper (i.e., extra stuff on the command line before java
) when running benchmarks.
The default value is null
, a no-op.
There are two possible ways to specify wrapper
.
First, a single string with shell-like syntax.
Multiple arguments are space separated.
This wrapper is used for all benchmarks in the benchmark suite.
Second, a dictionary of strings with shell-like syntax to specify possibly different wrappers for different benchmarks.
If a benchmark doesn't have a wrapper in the dictionary, it is treated as null
.
companion
(preview ⚠️): the syntax is similar to wrapper
.
The companion program will start before the main program.
The main program will start two seconds after the companion program to make sure the companion is fully initialized.
Once the main program finishes, we will wait for the companion program to finish.
Therefore, companion programs should have appropriate timeouts or detect when the main program finishes.
Here is an example of using companion
to launch bpftrace
in the background to count the system calls.
includes:
  - "$RUNNING_NG_PACKAGE_DATA/base/runbms.yml"
overrides:
  "suites.dacapo2006.timing_iteration": 1
  "suites.dacapo2006.companion": "sudo bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @syscall[args->id] = count(); @process[comm] = count();} interval:s:10 { printf(\"Goodbye world!\\n\"); exit(); }'"
  "invocations": 1
benchmarks:
  dacapo2006:
    - fop
configs:
  - "temurin-17"
In the log file, the output from the main program and the output from the companion program are separated by *****
.
size
: specifying the size of input data.
Note that the names of the sizes are subject to change depending on the DaCapo releases.
The default value is null
, which means DaCapo will use the default size unless you override that for individual benchmarks.
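The start-up and shutdown order described for the companion key can be mimicked with the standard `subprocess` module. This is a sketch for building intuition, not the running-ng implementation: `run_with_companion` is a hypothetical name, the exact layout of the separator line is an assumption, and the example uses two trivial Python one-liners with a shortened delay in place of a real benchmark and bpftrace.

```python
import subprocess
import sys
import time


def run_with_companion(main_cmd, companion_cmd, delay=2.0):
    """Start the companion, wait, run the main program, then wait for the companion."""
    companion = subprocess.Popen(
        companion_cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    time.sleep(delay)  # give the companion time to initialize
    main = subprocess.run(
        main_cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    # Blocks until the companion exits, so the companion needs its own timeout.
    companion_output, _ = companion.communicate()
    return main.stdout + "*****\n" + companion_output


log = run_with_companion(
    [sys.executable, "-c", "print('main done')"],
    [sys.executable, "-c", "print('companion done')"],
    delay=0.1,
)
print(log)
```

The call to `communicate` is the reason a companion without a timeout would keep the whole invocation waiting.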
Benchmark Specification
Some of the suite-wide keys can be overridden on a per-benchmark basis.
The keys currently supported are timing_iteration
, size
, and timeout
.
Note that, within a suite, your choice of name
should uniquely identify a particular way of running a benchmark of name bm_name
.
The name
is used to get the minheap value, etc., which can depend on the size of input data and/or the timing iteration.
Therefore, it is highly recommended that you give a name
different from the bm_name
.
Note that you might need to adjust various other values, including but not limited to the minheap value dictionary and the modifier exclusion dictionary.
The following is an example.
benchmarks:
  dacapo2006:
    - {name: eclipse_large, bm_name: eclipse, size: large}
SPECjbb2015
(preview ⚠️)
Keys
release
: one of the possible values ["1.03"]
.
The value is required.
path
: path to the jar
.
The value is required.
Note that the property file should reside in path/../config/specjbb2015.props
per the standard folder structure of the ISO image provided by SPEC.
Environment variables will be expanded.
Benchmark Specification
Only strings are allowed, which should correspond to the mode of the SPECjbb2015 controller.
Right now, only "composite"
is supported.
SPECjvm98
(preview ⚠️)
Note that you will need to prepend probes to the classpaths, so that the modified SpecApplication
can be used.
Here is an example configuration file.
includes:
  - "/home/zixianc/running-ng/src/running/config/base/runbms.yml"
modifiers:
  probes_cp:
    type: JVMClasspathPrepend
    val: "/home/zixianc/MMTk-Dev/evaluation/probes /home/zixianc/MMTk-Dev/evaluation/probes/probes.jar"
benchmarks:
  specjvm98:
    - _213_javac
configs:
  - "adoptopenjdk-8|probes_cp"
Keys
release
: one of the possible values ["1.03_05"]
.
The value is required.
path
: path to the SPECjvm98 folder, where you can find SpecApplication.class
.
The value is required.
Environment variables will be expanded.
timing_iteration
: specifying the timing iteration.
It can only be a number, which is passed to SpecApplication as -i
.
The value is required.
Benchmark Specification
Only strings are allowed, which should correspond to the benchmark programs of SPECjvm98. The following are the benchmarks:
- _200_check
- _201_compress
- _202_jess
- _209_db
- _213_javac
- _222_mpegaudio
- _227_mtrt
- _228_jack
Octane
(preview ⚠️)
Keys
path
: path to the Octane benchmark folder.
The value is required.
Environment variables will be expanded.
wrapper
: path to the Octane wrapper written by Wenyu Zhao.
The value is required.
timing_iteration
: specifying the timing iteration using an integer.
The value is required.
minheap
: a string that selects one of the minheap_values
sets to use.
minheap_values
: a dictionary containing multiple named sets of minimal heap sizes that are enough for a benchmark from the suite to run without triggering Fatal javascript OOM in ...
.
Each size is measured in MiB.
The default value is an empty dictionary.
The minheap values are used only when running runbms
with a valid N
value.
If the minheap value for a benchmark is not specified, a default of 4096
is used.
An example looks like this.
minheap_values:
d8:
octane:
box2d: 5
codeload: 159
crypto: 3
JuliaGCBenchmarks
(preview ⚠️)
GC benchmarks for Julia: https://github.com/JuliaCI/GCBenchmarks
Keys
path
: path to the GCBenchmarks folder.
The value is required.
Environment variables will be expanded.
minheap
: a string that selects one of the minheap_values
sets to use.
minheap_values
: a dictionary containing multiple named sets of minimal heap sizes that are enough for a benchmark from the suite to run without triggering Out of Memory!
.
An example looks like this:
minheap_values:
  julia-mmtk-immix:
    multithreaded/binary_tree/tree_immutable: 225
    multithreaded/binary_tree/tree_mutable: 384
    multithreaded/bigarrays/objarray: 9225
    serial/TimeZones: 5960
    serial/append: 1563
    serial/bigint/pollard: 198
    serial/linked/list: 4325
    serial/linked/tree: 216
    serial/strings/strings: 2510
    slow/bigint/pidigits: 198
    slow/rb_tree/rb_tree: 8640
Runtime
JikesRVM
NativeExecutable
(preview ⚠️)
A NativeExecutable
type tells runbms
to
run the benchmarks directly on native hardware. This is meant to be used in
tandem with
BinaryBenchmarkSuite
.
OpenJDK
D8
(preview ⚠️)
Keys
executable
: path to the d8
executable.
Environment variables will be expanded.
SpiderMonkey
(preview ⚠️)
Keys
executable
: path to the js
executable.
Environment variables will be expanded.
JavaScriptCore
(preview ⚠️)
Keys
executable
: path to the jsc
executable.
Environment variables will be expanded.
JuliaMMTK
(preview ⚠️)
Keys
executable
: path to the julia
executable.
Environment variables will be expanded.
JuliaStock
(preview ⚠️)
Julia with the stock GC. It does not allow setting a heap size, and will not throw OOM unless killed by the operating system.
Keys
executable
: path to the julia
executable.
Environment variables will be expanded.
Modifier
EnvVar
Keys
var
: name of the variable.
val
: value of the variable.
Environment variables will be expanded.
Description
Set an environment variable. Might override an environment variable inherited from the parent process.
JVMArg
JVM
specific.
Keys
val
: a single string with shell-like syntax.
Multiple arguments are space separated.
Environment variables will be expanded.
Description
Specify arguments to a JVM, as opposed to the program.
JSArg
(preview ⚠️)
JavaScriptRuntime
specific.
Keys
val
: a single string with shell-like syntax.
Multiple arguments are space separated.
Environment variables will be expanded.
Description
Specify arguments to a JavaScript runtime (e.g., d8
), as opposed to the program.
JVMClasspathAppend
JVM
specific.
Keys
val
: a single string with shell-like syntax.
Multiple classpaths are space separated.
Environment variables will be expanded.
Description
Append a list of classpaths to the existing classpaths.
JVMClasspathPrepend
JVM
specific.
Keys
val
: a single string with shell-like syntax.
Multiple classpaths are space separated.
Environment variables will be expanded.
Description
Prepend a list of classpaths to the existing classpaths.
JVMClasspath
A backward-compatibility alias of JVMClasspathAppend
.
Environment variables will be expanded.
ProgramArg
Keys
val
: a single string with shell-like syntax.
Multiple arguments are space separated.
Environment variables will be expanded.
Description
Specify arguments to a program, as opposed to the runtime.
ModifierSet
(preview ⚠️)
Keys
val
: |
separated values, with possible value options. See here for details.
Description
Specify a set of modifiers, including other ModifierSet
s.
That is, you can use ModifierSet
recursively.
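The recursive nature of ModifierSet can be sketched as a simple flattening step. The code below is an illustration under stated assumptions: `expand_modifier` and the modifier names `common` and `all` are hypothetical, value options are deliberately left out, and no check for cyclic definitions is made.

```python
def expand_modifier(modifiers, name):
    """Flatten a modifier name into the plain modifiers it stands for."""
    modifier = modifiers[name]
    if modifier["type"] != "ModifierSet":
        return [name]
    flattened = []
    # The val of a ModifierSet is a '|'-separated list of modifier names.
    for member in modifier["val"].split("|"):
        flattened.extend(expand_modifier(modifiers, member.strip()))
    return flattened


modifiers = {
    "s": {"type": "JVMArg", "val": "-server"},
    "c2": {"type": "JVMArg", "val": "-XX:-TieredCompilation -Xcomp"},
    "probes": {"type": "JVMArg", "val": "-Dprobes=RustMMTk"},
    "common": {"type": "ModifierSet", "val": "s|c2"},
    "all": {"type": "ModifierSet", "val": "common|probes"},
}
print(expand_modifier(modifiers, "all"))
# prints ['s', 'c2', 'probes']
```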
Wrapper
(preview ⚠️)
Keys
val
: a single string with shell-like syntax.
Multiple arguments are space separated.
Environment variables will be expanded.
Description
Specify a wrapper. If a wrapper also exists for the benchmark suite you use, this wrapper will follow the suite's wrapper.
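One way to picture the ordering is to build the final command line as a list of arguments. The sketch below assumes that "follow" means the modifier's wrapper is placed after the suite's wrapper and before the runtime; the function name `build_command` and the sample wrappers (`taskset`, `perf stat`) are chosen only for illustration.

```python
import shlex


def build_command(suite_wrapper, modifier_wrapper, runtime_cmd):
    """Place the suite wrapper first, then the modifier wrapper, then the runtime."""
    command = []
    for wrapper in (suite_wrapper, modifier_wrapper):
        if wrapper is not None:
            # Wrappers use shell-like syntax, so split them like a shell would.
            command.extend(shlex.split(wrapper))
    return command + list(runtime_cmd)


print(
    build_command(
        "taskset -c 0",
        "perf stat -e cycles",
        ["java", "-jar", "dacapo.jar", "fop"],
    )
)
```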
Companion
(preview ⚠️)
Keys
val
: a single string with shell-like syntax.
Multiple arguments are space separated.
Description
Specify a companion program. If a companion program also exists for the benchmark suite you use, this companion program will follow the suite's companion program.
JuliaArg
(preview ⚠️)
JuliaMMTk
and JuliaStock
specific.
Keys
val
: a single string with shell-like syntax.
Multiple arguments are space separated.
Environment variables will be expanded.
NoImplicitHeapsizeModifier
(preview ⚠️)
runbms
specific.
Description
Normally runbms
will iterate through a set of heap sizes, either specific multiples of the minheap of each benchmark via -s
, or spreading the multiples across 1~heap_range
(using N
and optionally ns
).
This modifier prevents runbms
from applying the heap sizes for certain configs, which is useful, for example, for running NoGC
or EpsilonGC
.
Keys
No argument is allowed.
Command References
Please see the sections in this chapter for the references for each of the subcommands.
Usage
running [-h|--help] [-v|--verbose] [-d|--dry-run] [--version] subcommand
-h
: print help message.
-v
: use DEBUG
for logging level.
The default logging level is INFO
.
-d
: enable dry run.
Each of the subcommands that respect this flag will print out the commands to be executed in a child process instead of actually executing them.
--version
: print the version number of running-ng
.
Convention
For each subcommand, the documentation can roughly be divided into two parts, the command line usage and the keys in the config file.
Unless otherwise specified, the keys specified here are common to all subcommands, with the keys specific to each subcommand documented in their respective documentation.
runbms
This subcommand runs benchmarks with different configs, possibly with varying heap sizes.
Usage
runbms [-h|--help] [-i|--invocations INVOCATIONS] [-s|--slice SLICE] [-p|--id-prefix ID_PREFIX] [-m|--minheap-multiplier MINHEAP_MULTIPLIER] [--skip-oom SKIP_OOM] [--skip-timeout SKIP_TIMEOUT] [--resume RESUME] [--workdir WORKDIR] [--skip-log-compression] LOG_DIR CONFIG [N] [n ...]
-h
: print help message.
-i
: set the number of invocations.
Overrides invocations
in the config file.
-s
: only use the specified heap sizes.
This is a comma-separated string of integers or floating point numbers.
For each slice s
in SLICE
, we run benchmarks at s * minheap
.
N
and n
s are ignored.
-p
: add a prefix to the folder names where the results are stored.
By default, the folder that stores the result is named using the host name and the timestamp.
However, you can add a prefix to the folder name to signify which experiments the results belong to.
-m
(preview ⚠️): multiply the minheap value for each benchmark by MINHEAP_MULTIPLIER
.
Do NOT use this unless you know what you are doing.
Overrides minheap_multiplier
in the config file.
--skip-oom
(preview ⚠️): skip the remaining invocations if a benchmark under a config
has run out of memory more than SKIP_OOM
times.
--skip-timeout
(preview ⚠️): skip the remaining invocations if a benchmark under a config
has timed out more than SKIP_TIMEOUT
times.
--resume
(preview ⚠️): resume a previous run under LOG_DIR/RESUME
. If a .log.gz
already exists for a group of invocations, they will be skipped. Remember to clean up the partial *.log
files before resuming.
--workdir
(preview ⚠️): use the specified directory as the working directory for benchmarks.
If not specified, a temporary directory will be created under an OS-dependent location with a runbms-
prefix.
--skip-log-compression
: skip compressing log files with gzip.
LOG_DIR
: where to store the results.
This is required.
CONFIG
: the path to the configuration file.
This is required.
N
: the number of different heap sizes to explore.
Must be a power of two.
Explore heap sizes denoted by 0, 1, ..., and N
(N + 1
different sizes in total).
The heap size 0 represents 1.0 * minheap
, and the heap size N
represents heap_range * minheap
(by default, 6.0 * minheap
).
If N
is omitted, then the script will run benchmarks without explicitly setting heap sizes, unless you specify -s
or use a modifier that sets the heap size.
n
: the heap sizes to explore.
Instead of exploring 0, 1, ..., and N
, only explore the n
s specified.
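As a worked example of how the heap size indices map to multiples of the minheap, the sketch below covers only the evenly spaced case, that is, a spread_factor of zero. The function name `heap_factor` is made up for this example; with a non-zero spread_factor the intermediate values differ, while the end points (1.0 and heap_range) stay as described above.

```python
def heap_factor(n, N, heap_range=6.0):
    """Multiple of minheap for heap size index n, assuming spread_factor is 0."""
    if not 0 <= n <= N:
        raise ValueError("n must be between 0 and N")
    # Index 0 maps to 1.0, index N maps to heap_range, evenly spaced in between.
    return 1.0 + (heap_range - 1.0) * n / N


for n in range(0, 9):
    print(n, heap_factor(n, 8))
```

For N = 8 and the default heap_range of 6.0, index 4 corresponds to 3.5 times the minheap in this evenly spaced setting.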
Keys
invocations
: see above.
minheap_multiplier
: see above.
heap_range
: the heap size relative to the minheap when n = N
.
spread_factor
: changes how 0, 1, ..., and N
are spread out.
When spread_factor
is zero, the differences between 0, 1, ..., and N
are the same.
The larger the spread_factor
is, the coarser the spacing is at the end relative to the start.
Please do NOT change this unless you understand how it works.
remote_host
: the remote host to rsync
the results to.
The exact absolute path of LOG_DIR
is used on both the local and the remote machine.
plugins
(preview ⚠️): plugins of this command.
Must be a dictionary, similar to how modifiers are declared.
Plugins (preview ⚠️)
Zulip
Zulip integration for notifying when experiments start or end. No message will be sent if it's a dry run.
Here is an example.
plugins:
  zulip:
    type: Zulip
    request:
      type: private
      to: ["your user id here"]
Keys
request
: please follow the Zulip API documentation.
Note that you don't need to put in content
here.
Please contact the administrators of your organization for your user ID.
If you use a bot user and want to post to a channel, please subscribe the bot user to the channel so that messages can be edited.
config_file
: an optional string specifying the path to the config file.
If not specified, the default is ~/.zuliprc
.
Please make sure that this file can only be accessed by you (e.g., chmod 600 ~/.zuliprc
).
If you are a moma user, please create this file on squirrel
, and it will then be synced to other machines.
Please follow the Zulip documentation for the syntax of the config file and for obtaining an API key.
If you can't create a new bot, please contact the administrators of your organization.
CopyFile
Copying files from the working directory.
Here is an example.
plugins:
  dacapo_latency:
    type: CopyFile
    patterns:
      - "scratch/dacapo-latency-*.csv"
Keys
patterns
: a list of patterns following the Python 3 pathlib.Path.glob
syntax.
Files matching the patterns will be copied to LOG_DIR
where different subfolders will be created for each invocation.
skip_failed
: don't copy files from failed runs. The default value is true.
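To illustrate how the patterns are matched, here is a rough sketch of glob-based copying with pathlib.Path.glob. It is not the plugin's implementation: the function copy_matching and the destination folder naming are assumptions made for this example.

```python
import shutil
from pathlib import Path


def copy_matching(workdir: Path, dest: Path, patterns: list[str]) -> list[Path]:
    """Copy every file under workdir that matches any glob pattern into dest.

    Hypothetical helper; the real plugin decides the per-invocation subfolder
    layout itself.
    """
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for pattern in patterns:
        # Patterns are relative to workdir, e.g. "scratch/dacapo-latency-*.csv".
        for src in sorted(workdir.glob(pattern)):
            if src.is_file():
                copied.append(Path(shutil.copy2(src, dest / src.name)))
    return copied
```

Trying your patterns against a working directory this way, before starting a long experiment, is a cheap way to confirm they match the files you expect.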
Interpreting the Outputs
Under construction 🚧.
Console Outputs
Log directory
Heap Size Calculations
Please refer to the source code (here and here) for the actual algorithm.
But the basic idea is as follows. First, we start with the ends and the middle and gradually fill the gaps. This is to make sure you can see the big-picture trend. Second, the differences between sizes are smaller for small sizes and larger for large sizes, because the performance is much more sensitive to changes in heap size when the heap is small.
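The two ideas can be sketched in a few lines of Python. This is only an illustration with the properties described above, not the algorithm used by running-ng: the function names, the breadth-first bisection order, and the quadratic spacing formula are all assumptions.

```python
from collections import deque


def exploration_order(N: int) -> list[int]:
    """One possible visiting order for 0..N: both ends, then repeated midpoints."""
    if N < 0:
        raise ValueError("N must be non-negative")
    order = [0] if N == 0 else [0, N]
    queue = deque([(0, N)])
    while queue:
        lo, hi = queue.popleft()
        if hi - lo < 2:
            continue  # no unexplored index strictly between lo and hi
        mid = (lo + hi) // 2
        order.append(mid)
        queue.append((lo, mid))
        queue.append((mid, hi))
    return order


def heap_multiplier(n: int, N: int, heap_range: float, spread_factor: float) -> float:
    """Map n in 0..N to a multiple of minheap between 1.0 and heap_range.

    Assumed spacing: the gap between n and n + 1 is proportional to
    1 + spread_factor * n, so a zero spread_factor gives equal gaps and a
    larger one makes the gaps grow towards the large heap sizes.
    """
    if N < 1 or not 0 <= n <= N or spread_factor < 0:
        raise ValueError("expected N >= 1, 0 <= n <= N, spread_factor >= 0")

    def position(k: int) -> float:
        return k + spread_factor * (k * k - k) / 2

    return 1.0 + position(n) / position(N) * (heap_range - 1.0)
```

With N = 8, exploration_order visits 0, 8, 4, 2, 6, 1, 3, 5, 7, so an interrupted experiment still covers the whole range coarsely.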
Best Practices
Under construction 🚧.
Continuously Monitor Your Experiments
The results are rsync
ed to remote_host
once all invocations for a benchmark at a heap size are finished.
You shouldn't log into the experiment machine, so as not to disturb the experiments.
Instead, log into the remote host and check the LOG_DIR
there to see the new results as they come in.
minheap
This subcommand runs benchmarks with different configs while varying heap sizes in a binary search fashion in order to determine the minimum heap required to run each benchmark.
The result is stored in a YAML file. The dictionary keys are encoded config strings. For each config, there is one dictionary per benchmark suite, where the minimum heap size for each benchmark is stored. An example is as follows.
temurin-17.openjdk_common.hotspot_gc-G1:
  dacapochopin-69a704e:
    avrora: 7
    batik: 189
temurin-17.openjdk_common.hotspot_gc-Parallel:
  dacapochopin-69a704e:
    avrora: 5
    batik: 235
At the end of each run, minheap
will print out the configuration that achieves the smallest minheap size for most benchmarks.
The minheap values for that configuration will be printed out, which can then be used to populate the minheap values of a benchmark suite, such as a DaCapo benchmark suite.
An example is as follows.
temurin-17.openjdk_common.hotspot_gc-G1 obtained the most number of smallest minheap sizes: 8
Minheap configuration to be copied to runbms config files
dacapochopin-69a704e:
  avrora: 7
  batik: 189
  biojava: 95
  eclipse: 411
  fop: 15
  graphchi: 255
  h2: 773
  jme: 29
  jython: 25
  luindex: 42
  lusearch: 21
  pmd: 156
  sunflow: 29
  tomcat: 21
  tradebeans: 131
  tradesoap: 103
  xalan: 8
  zxing: 97
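The summary line above counts, for each config, the benchmarks on which it achieved the smallest minheap. A minimal sketch of such a count over the result dictionary follows. The function count_smallest is hypothetical, and the tie-breaking rule (the first config seen wins) is an assumption, not necessarily what minheap does.

```python
from collections import Counter


def count_smallest(results: dict[str, dict[str, dict[str, float]]]) -> Counter:
    """Count, per config, the benchmarks where it has the smallest minheap.

    results maps config -> suite -> benchmark -> minheap, mirroring the YAML
    layout shown earlier. On ties, the first config encountered keeps the win.
    """
    best: dict[tuple[str, str], tuple[float, str]] = {}
    for config, suites in results.items():
        for suite, benchmarks in suites.items():
            for benchmark, size in benchmarks.items():
                key = (suite, benchmark)
                if key not in best or size < best[key][0]:
                    best[key] = (size, config)
    return Counter(config for _, config in best.values())
```

Applied to the two-config example earlier in this section, each config wins one benchmark: G1 has the smaller value for batik, and Parallel has the smaller value for avrora.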
Usage
minheap [-h] [-a|--attempts ATTEMPTS] CONFIG RESULT
-h
: print help message.
-a
(preview ⚠️): set the number of attempts.
Overrides attempts
in the config file.
CONFIG
: the path to the configuration file.
This is required.
RESULT
: where to store the results.
This file contains both the interim results and the final result.
An interrupted execution can be resumed by using the same RESULT
path.
This is required.
Keys
maxheap
: the upper bound of the search.
attempts
(preview ⚠️): for a particular heap size, if an invocation passes or fails with OOM (timeout treated as OOM), the binary search will continue with the next appropriate heap size.
If an invocation crashes and if the total number of invocations has not exceeded ATTEMPTS
, the same heap size will be repeated.
If all ATTEMPTS
invocations crash, the binary search for this config will stop, and minheap
will report inf
.
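The search described above can be sketched as follows. This is a simplified illustration, not the code in running-ng: the names find_minheap, PASS, OOM, and CRASH are hypothetical, heap sizes are assumed to be whole numbers starting at 1, and a timeout is assumed to be reported as OOM by the run callback.

```python
import math
from typing import Callable

PASS, OOM, CRASH = "pass", "oom", "crash"


def find_minheap(run: Callable[[int], str], maxheap: int, attempts: int) -> float:
    """Binary-search the smallest heap size in 1..maxheap for which run passes.

    run(heap) returns PASS, OOM, or CRASH. A crashing heap size is retried up
    to attempts times; if every attempt crashes, the search gives up with inf.
    """
    lo, hi = 1, maxheap
    best = math.inf
    while lo <= hi:
        mid = (lo + hi) // 2
        outcome = CRASH
        for _ in range(attempts):
            outcome = run(mid)
            if outcome != CRASH:
                break
        if outcome == CRASH:
            return math.inf  # all attempts at this size crashed
        if outcome == PASS:
            best = mid      # mid works; look for something smaller
            hi = mid - 1
        else:
            lo = mid + 1    # OOM (or timeout); a larger heap is needed
    return best
```

In this sketch, a benchmark that never passes within maxheap also yields inf, because best is never updated.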
fillin
Here are some recipes for common tasks.
Whole-Process Performance Event Monitoring
JVMTI
Please clone and build probes
, and then build distillation
.
You might need to change the paths referred to in the Makefile
s to match your environment.
Under the distillation
folder, you will find a JVMTI agent, libperf_statistics.so
.
You can check the source code here.
To use the agent, there are four things you need to do.
First, you will need to tell the dynamic linker to load the shared library before the VM boots.
This ensures that the inherit
flag of perf_event_attr
works properly and all child threads subsequently spawned are included in the results.
modifiers:
  jvmti_env:
    type: EnvVar
    var: "LD_PRELOAD"
    val: "/path/to/distillation/libperf_statistics.so"
Second, you need to specify a list of events you want to measure.
modifiers:
  perf:
    type: EnvVar
    var: "PERF_EVENTS"
    val: "PERF_COUNT_HW_CPU_CYCLES,PERF_COUNT_HW_INSTRUCTIONS,PERF_COUNT_HW_CACHE_LL:MISS,PERF_COUNT_HW_CACHE_L1D:MISS,PERF_COUNT_HW_CACHE_DTLB:MISS"
If you want to get a full list of events you can use on a particular machine, you can clone and build libpfm4
and run the showevtinfo
program.
Third, you need to tell the JVM to load the agent. Note that you need to specify the absolute path.
modifiers:
  jvmti:
    type: JVMArg
    val: "-agentpath:/path/to/distillation/libperf_statistics.so"
Finally, you need to let the DaCapo benchmark signal the start and the end of a benchmark iteration.
We will reuse the RustMMTk
probe here, as the callback functions in the JVMTI agent are also called harness_begin
and harness_end
.
modifiers:
  probes_cp:
    type: JVMClasspath
    val: "/path/to/probes/out /path/to/probes/out/probes.jar"
  probes:
    type: JVMArg
    val: "-Djava.library.path=/path/to/probes/out -Dprobes=RustMMTk"
Now, putting it all together, you can define a set of modifiers, and use that set in your config strings.
modifiers:
  jvmti_common:
    type: ModifierSet
    val: "probes|probes_cp|jvmti|jvmti_env|perf"
MMTk
Please clone and build probes
.
You will need to build mmtk-core
with the perf_counter
feature.
First, you need to let the DaCapo benchmark signal the start and the end of a benchmark iteration.
modifiers:
  probes_cp:
    type: JVMClasspath
    val: "/path/to/probes/out /path/to/probes/out/probes.jar"
  probes:
    type: JVMArg
    val: "-Djava.library.path=/path/to/probes/out -Dprobes=RustMMTk"
Then, you can specify a list of events you want to measure.
modifiers:
  mmtk_perf:
    type: EnvVar
    var: "MMTK_PHASE_PERF_EVENTS"
    val: "PERF_COUNT_HW_CPU_CYCLES,0,-1;PERF_COUNT_HW_INSTRUCTIONS,0,-1;PERF_COUNT_HW_CACHE_LL:MISS,0,-1;PERF_COUNT_HW_CACHE_L1D:MISS,0,-1;PERF_COUNT_HW_CACHE_DTLB:MISS,0,-1"
Note that the list is semicolon-separated.
Each entry consists of three parts, separated by commas.
The first part is the name of the event.
Please refer to the previous section for details.
The second part and the third part are pid
and cpu
, per man perf_event_open
.
In most cases, you want to use 0,-1
, that is, measuring the calling thread (the results will be combined later through the inherit
flag) on any CPU.
For some events, such as RAPL, only package-wide measurement is supported, and you will have to adjust the values accordingly.
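Since the value is tedious to write by hand, a small helper can assemble it from a list of event names. This is a convenience sketch following the format described above; the function mmtk_perf_events is hypothetical and not part of running-ng or mmtk-core.

```python
def mmtk_perf_events(events: list[str], pid: int = 0, cpu: int = -1) -> str:
    """Build a semicolon-separated list of "name,pid,cpu" entries.

    The defaults (pid 0, cpu -1) measure the calling thread on any CPU.
    """
    return ";".join(f"{name},{pid},{cpu}" for name in events)


# Example: two events with the default pid and cpu.
value = mmtk_perf_events(["PERF_COUNT_HW_CPU_CYCLES", "PERF_COUNT_HW_INSTRUCTIONS"])
```

The resulting string can be pasted into the val field of the EnvVar modifier. For events that need different pid and cpu values, build those entries separately and join them with semicolons.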
Note that you might have to increase the value of MAX_PHASES
in crate::util::statistics::stats
to a larger value, e.g., 1 << 14
, so that the array storing the per-phase value will not overflow.
Work-Packet Performance Event Monitoring
It's similar to the whole-process performance event monitoring for MMTk.
Just use MMTK_WORK_PERF_EVENTS
instead of MMTK_PHASE_PERF_EVENTS
.
Machine-Specific Known Problems
On Xeon D-1540 Broadwell, the PERF_COUNT_HW_CACHE_LL:MISS
event is always zero.
perf stat -e LLC-load-misses,cycles /bin/ls
Performance counter stats for '/bin/ls':
0 LLC-load-misses
1,729,786 cycles
0.001135511 seconds time elapsed
0.001180000 seconds user
0.000000000 seconds sys
On AMD machines, the PERF_COUNT_HW_CACHE_LL:MISS
event fails to open.
perf_event_open
syscall fails with No such file or directory
.
Frequently Asked Questions
Changelog
Unreleased
Added
Changed
Deprecated
Removed
Fixed
Security
v0.4.7
(2024-08-30)
Fixed
Commands
runbms
: correctly apply a default minheap value for a benchmark without a defined minheap value in the config file.
v0.4.6
(2024-05-23)
Fixed
Commands
runbms
: remove superfluous log messages when no configs
has a NoImplicitHeapsizeModifier
.
v0.4.5
(2024-05-23)
Added
Modifiers
NoImplicitHeapsizeModifier
v0.4.4
(2023-11-23)
Fixed
Benchmark Suites
DaCapo
correctly accepts the 23.11
release
specified in dacapo.yml
.
v0.4.3
(2023-11-20)
Added
Base Configurations
- DaCapo 23.11-Chopin available as
dacapochopin
. Please use dacapochopin_jdk9
, dacapochopin_jdk11
, dacapochopin_jdk17
, and dacapochopin_jdk21
modifiers for JDK 9, 11, 17, and 21 respectively when you use this suite with these JDK versions.
- Temurin 21
Changed
Base Configurations
- Environment variables are expanded when resolving paths of runtimes and benchmark suites.
--add-exports java.base/jdk.internal.ref=ALL-UNNAMED
is no longer automatically added when running DaCapo benchmarks on JDK 9 or later. This approach doesn't scale now that we have more workarounds specific to different JDK versions. It is also too opaque, and it is not clear how it's implemented. New modifiers are introduced to address this issue.
Modifiers
EnvVar
val
is expanded using the outside environment prior to benchmark execution.
Deprecated
- Deprecating Python 3.7 support for users. Python 3.7 was last released on June 6, 2023 (3.7.17), and no new release has been made since.
Removed
- Dropping Python 3.7 support for developers (NOT users). pytest 7.4+ requires at least Python 3.8 (still supported by Ubuntu 20.04 LTS).
v0.4.2
(2023-09-10)
Changed
- All
Modifier
instances now support includes
for attaching them only to certain benchmarks.
Fixed
Runtimes
D8
now detects new JavaScript OOM error pattern.
Security
v0.4.1
(2023-08-22)
Fixed
Commands
runbms
: apply modifiers in the config file.
minheap
: apply modifiers in the config file.
v0.4.0
(2023-08-17)
Added
Modifiers
JuliaArg
Runtimes
JuliaMMTK
JuliaStock
Benchmark Suites
JuliaGCBenchmarks
Commands
runbms
gains an extra argument, --skip-log-compression
, to skip compressing log files with gzip
.
Changed
Base Configurations
runbms
: don't sync to squirrel.moma
for the default runbms.yml
. moma machine users should use runbms-anu.yml
for the old behaviour.
Fixed
- Gracefully handle empty modifier strings in configs, such as
openjdk7||foobar
.
Benchmark Suites
- DaCapo-specific workarounds are now handled by the
DaCapo
class rather than the JavaBenchmark
class to avoid confusion.
v0.3.9
(2023-08-02)
Fixed
Benchmark Suites
DaCapo
: don't explicitly pass -s default
to DaCapo unless the user requests it by setting the size
key of DaCapo
or overriding the sizes for individual benchmarks using the benchmark specification syntax. This is so that users can override the size via ProgramArg
.
v0.3.8
(2023-02-21)
Changed
Commands
runbms
: companion programs are now expected to self-terminate.
v0.3.7
(2023-02-14)
Fixed
Commands
runbms
: better heuristics to detect whether a host is in the moma subnet.
v0.3.6
(2023-01-16)
Added
Base Configurations
- DaCapo Chopin Snapshot-6e411f33
Fixed
- Fixed type annotations in untyped functions and made
Optional
s explicit.
v0.3.5
(2022-10-13)
Changed
Commands
runbms
: when a companion program exits with a non-zero code, a warning is generated instead of an exception to prevent stopping the entire experiment.
v0.3.4
(2022-10-13)
Fixed
Commands
runbms
: fix the file descriptor leak when running benchmarks with companion programs.
v0.3.3
(2022-10-12)
Changed
Commands
runbms
prints out the logged-in users when warning that the machine has more than one logged-in user.
Fixed
Modifiers
Companion
: skip value options expansion if no value option is provided to avoid interpreting bpftrace syntax as replacement fields.
v0.3.2
(2022-10-12)
Added
Modifiers
Companion
v0.3.1
(2022-09-18)
Added
Base Syntax
- Use the
$RUNNING_NG_PACKAGE_DATA
environment variable to refer to base configurations shipped with running-ng, such as $RUNNING_NG_PACKAGE_DATA/base/runbms.yml
, regardless of how you installed running-ng.
Benchmark Suites
DaCapo
gains an extra key companion
to facilitate eBPF tracing programs.
Changed
- Overhauled Python packaging with PEP 517
zulip
is now an optional Python dependency. Use pip install running-ng[zulip]
if you want to use the Zulip
runbms
plugin.
Removed
- Dropping Python 3.6 support for users.
Base Configurations
- Removing AdoptOpenJDK from the base configuration files. AdoptOpenJDK is now replaced by Temurin.
v0.3.0
(2022-03-19)
Added
Modifiers
JVMClasspathAppend
JVMClasspathPrepend
Benchmark Suites
SPECjvm98
Changed
Modifiers
JVMClasspath
is now an alias of JVMClasspathAppend
. This is backward compatible.
Commands
runbms
prints out the version number of running-ng
in log files.
Deprecated
- Deprecating Python 3.6 support for users. Python 3.6 will NOT be supported once moma machines are upgraded to the latest Ubuntu LTS.
Removed
- Dropping Python 3.6 support for developers (NOT users). pytest 7.1+ requires at least Python 3.7.
v0.2.2
(2022-03-07)
Fixed
Benchmark Suites
JavaBenchmarkSuite
: Some DaCapo benchmarks refer to internal classes (e.g., under jdk.internal.ref
), and DaCapo implemented a workaround for this behaviour in the jar. However, since we are invoking DaCapo using -cp
and the name of the main class, that workaround is bypassed. That workaround is now reimplemented in running-ng through an extra JVM argument --add-exports
.
v0.2.1
(2022-03-05)
Changed
Commands
runbms
now skips printing CPU frequencies if the system doesn't support it, e.g., when using Docker Desktop on Mac.
Fixed
Benchmark Suites
BinaryBenchmarkSuite
: fixes a missing parameter when constructing BinaryBenchmark
due to a bug in a previous refactoring.
v0.2.0
(2022-02-20)
Added
Base Configurations
- AdoptOpenJDK 16
- DaCapo Chopin Snapshot-29a657f, Chopin Snapshot-f480064
- Temurin 8, 11, 17
- SPECjbb 2015, 1.03
Commands
minheap
gains an extra key attempts
(can be overridden by --attempts
) so that crashes don't cause bogus minheap measurements.
minheap
stores results in a YAML file, which is also used to resume an interrupted execution.
minheap
prints the minheap values of the best config at the end.
runbms
gains an extra argument, --resume
, to resume an interrupted execution from a log folder.
runbms
gains an extra argument, --workdir
, to override the default working directory.
runbms
adds more information about the environment to the log file, including the date, logged-in users, system load, and top processes.
runbms
gains a callback-based plugin system, and an extra key plugins
is added.
runbms
gains a plugin CopyFile
to copy files from the working directory.
runbms
gains a plugin Zulip
, which sends messages about the progress of the experiments, and warns about reservation expiration on moma machines.
runbms
outputs a warning message if more than one user is logged in.
runbms
uses uppercase letters if there are more than 26 configs.
Modifiers
ModifierSet
Wrapper
JSArg
Runtimes
D8
SpiderMonkey
JavaScriptCore
JVM
now detects OOM generated in the form of a Rust panic from mmtk-core
.
Benchmark Suites
DaCapo
gains an extra key size
, which is used to specify the size of the input.
DaCapo
now allows individual benchmarks to override the timing iteration, input size, and timeout of the suite.
SPECjbb2015
: basic support for running SPECjbb 2015 in composite mode.
Octane
: basic support for running Octane using Wenyu's wrapper script.
Changed
Benchmark Suites
- The
minheap
key of DaCapo
changes from a dictionary to a string. The string is used to look up minheap_values
, which are collections of minheap values. This makes it easier to store multiple sets of minheap values for the same benchmark suite measured using different runtimes.
Base Syntax
- Whitespace can be used in config strings for visual alignment. It is ignored when parsed.
Commands
- The
--slice
argument of runbms
now accepts multiple comma-separated floating point numbers.
Removed
Base Configurations
- DaCapo Chopin Snapshot-69a704e
Fixed
Commands
- Resolving relative paths of runtimes before running. Otherwise, they would be resolved relative to the
runbms
working directory.
- Use the
BinaryIO
interface for file IO and interprocess communication to prevent invalid UTF-8 characters from crashing the script.
- Subprocesses now inherit environment variables from the parent process.
minheap
now runs in a temporary working directory to avoid file-based conflicts between concurrent executions. Note that network-port-based conflicts can still happen.
v0.1.0
(2021-08-09)
Initial release.
Added
Commands
fillin
minheap
runbms
Modifiers
JVMArg
JVMClasspath
EnvVar
ProgramArg
Runtimes
NativeExecutable
OpenJDK
JikesRVM
Benchmark Suites
BinaryBenchmarkSuite
DaCapo
Base Configurations
- AdoptOpenJDK 8, 11, 12, 13, 14, 15
- DaCapo 2006, 9.12 (Bach), 9.12 MR1, 9.12 MR1 for Java 6, Chopin Snapshot-69a704e