GEOPM - Global Extensible Open Power Manager
============================================

DISCLAIMER
----------
SEE COPYING FILE FOR LICENSE INFORMATION.

LAST UPDATE
-----------
2019 April 3

Christopher Cantalupo <christopher.m.cantalupo@intel.com> <br>

WEB PAGES
---------
https://geopm.github.io <br>
https://geopm.github.io/man/geopm.7.html <br>
https://geopm.slack.com

SUMMARY
-------
The Global Extensible Open Power Manager (GEOPM) is a framework for
exploring power and energy optimizations targeting high performance
computing.  The GEOPM package provides many built-in features.  A
simple use case is reading hardware counters and setting hardware
controls with platform independant syntax using a command line tool on
a particular compute node.  An advanced use case is dynamically
coordinating hardware settings across all compute nodes used by an
application is response to the application's behavior and requests
from the resource manager.  The dynamic coordination is implemented as
a hierarchical control system for scalable communication and
decentralized control. The hierarchical control system can optimize
for various objective functions including maximizing global
application performance within a power bound or minimizing energy
consumption with marginal degradation of application performance.  The
root of the control hierarchy tree can communicate with the system
resource manager to extend the hierarchy above the individual MPI
application and enable the management of system power resources for
multiple MPI jobs and multiple users by the system resource manager.

The GEOPM package provides two libraries: libgeopm for use with MPI
applications, and libgeopmpolicy for use with applications that do not
link to MPI.  There are several command line tools included in GEOPM
which have dedicated manual pages.  The geopmlaunch(1) command line
tool is used to launch an MPI application while enabling the GEOPM
runtime to create a GEOPM Controller thread on each compute node.  The
Controller loads plugins and executes the Agent algorithm to control
the compute application.  The geopmlaunch(1) command is part of the
geopmpy python package that is included in the GEOPM installation.
See the GEOPM overview man page for further documentation and links:
[geopm(7)](https://geopm.github.io/man/geopm.7.html).

The GEOPM runtime is extended through three plugin classes: Agent,
IOGroup, and Comm.  New implementations of these classes can be
dynamically loaded at runtime by the GEOPM Controller.  The Agent
class defines which data are collected, how control decisions are
made, and what messages are communicated between Agents in the tree
hierarchy.  The reading of data and writing of controls from within a
compute node is abstracted from the Agent through the PlatformIO
interface.  This interface provides access to the IOGroup
implementations that provide a variety of signals and controls.
IOGroup plugins can be developed independently of the Agents to extend
the read and write capabilities provided by GEOPM.  The PlatformIO
abstraction enables Agent implementations to be ported to different
hardware platforms without modification.  Messaging between Agents
running on different compute nodes is encapsulated in the Comm class.
New implementations of the Comm class make it possible to port
inter-node communication used by the GEOPM runtime to different
underlying communication protocols and hardware without modifying the
Agent implementations.

The libgeopm library can be called directly or indirectly within MPI
applications to enable application feedback for informing the control
decisions.  The indirect calls are facilitated by GEOPM's integration
with MPI and OpenMP through their profiling decorators, and the direct
calls are made through the geopm_prof_c(3) or geopm_fortran(3)
interfaces.  Marking up a compute application with profiling
information through these interfaces can enable better integration of
the GEOPM runtime with the compute application and more precise
control.

TRAVIS CI
---------
[![Build Status](https://travis-ci.org/geopm/geopm.svg)](https://travis-ci.org/geopm/geopm)

The GEOPM public GitHub project has been integrated with Travis
continuous integration.

http://travis-ci.org/geopm/geopm

All pull requests will be built and tested automatically by Travis.

INSTALL
-------
The OpenHPC project provides the most robust way to install GEOPM.

[OpenHPC](https://openhpc.community/)

The GEOPM project was first packaged with OpenHPC version 1.3.6.  The
OpenHPC install guide contains documentation on how to install GEOPM
and its dependencies and can be found on the OpenHPC download page.

[OpenHPC Downloads](https://openhpc.community/downloads/)

The OpenHPC packages are distributed from the OpenHPC OBS build server.

[yum and zypper repositories](http://build.openhpc.community/OpenHPC:/1.3/updates/)

The OpenHPC project packages all of the dependencies required by GEOPM
that are not part of a standard Linux distribution.  This includes the
msr-safe kernel driver and MSR save/restore functionality built into
the Slurm resource manager to enable robust reset of hardware controls
when returning compute nodes to the general pool available to other
users.

PYTHON INSTALL
--------------
The GEOPM python tools are packaged in the RPMs described above, but
they are also available from PyPI as the `geopmpy` package.  For
example, to install the geopmpy package into your home directory, run
the following command:

    pip install --user geopmpy

Note this installs only the GEOPM python tools and does not install
the full GEOPM runtime.

BUILD REQUIREMENTS
------------------
In order to build the GEOPM package from source, the below
requirements must be met.

The GEOPM package requires a compiler that supports the MPI 2.2 and
C++11 standards.  These requirements can be met by using GCC version
4.7 or greater and installing the openmpi-devel package version 1.7 or
greater on RHEL or SLES Linux.  Documentation creation including
man pages further requires the rubygems package, and the ruby-devel
package.

RHEL:

    yum install openmpi-devel ruby-devel rubygems

SUSE:

    zypper install openmpi-devel ruby-devel rubygems

Alternatively these can be installed from source, and an alternate MPI
implementation to OpenMPI can be selected (e.g. one provided by
OpenMPI).  See

    ./configure --help

for details on how to use non-standard install locations for build
requirements through the

    ./configure --with-<feature>

options.


BUILD INSTRUCTIONS
------------------
The source code can be rebuilt from the source RPMs available from
OpenHPC.  To build from the git repository follow the instructions
below.

To build all targets and install it in a "build/geopm" subdirectory of your
home directory run the following commands:

    ./autogen.sh
    ./configure --prefix=$HOME/build/geopm
    make
    make install

If building with the Intel toolchain the following environment variables
must be set prior to running configure:

    export CC=icc
    export CXX=icpc
    export FC=ifort
    export F77=ifort
    export MPICC=mpicc
    export MPICXX=mpicxx

An RPM can be created on a RHEL or SUSE system with the

    make rpm

target.  Note that the --with-mpi-bin option may be required to inform
configure about the location of the MPI compiler wrappers.  The following
command may be sufficient to determine the location:

    dirname $(find /usr -name mpicc)

To build in an environment without support for OpenMP (i.e. clang on Mac OS X)
use the

    ./configure --disable-openmp

option.  The

    ./configure --disable-mpi

option can be used to build only targets which do not require MPI.  By default
MPI targets are built.


RUN REQUIREMENTS
----------------
We are targeting SLES12 and RHEL7 distributions for functional runtime
support.  There is a single runtime requirement that can be obtained
from these distributions for the OpenMPI implementation of MPI.  To install,
follow the instructions below for your Linux distribution.

RHEL:

    yum install openmpi

SUSE:

    zypper install openmpi

Alternatively the MPI requirement can be met by using OpenHPC
packages.

### SYSTEMD CONFIGURATION
In order for GEOPM to properly use shared memory to communicate
between the Controller and the application, it may be necessary to
alter the configuration for systemd.  The default behavior of systemd
is to clean-up all inter-process communication for non-system users.
This causes issues with GEOPM's initialization routines for shared
memory.  This can be disabled by ensuring that `RemoveIPC=no` is set
in `/etc/systemd/logind.conf`.  Most Linux distributions change the
default setting to disable this behavior.  More information can be
found [here](https://superuser.com/a/1179962).

### MSR DRIVER
The msr-safe kernel driver must be loaded at runtime to support
user-level read and write of white-listed MSRs.  The msr-safe kernel
driver is distributed with OpenHPC and can be installed using the RPMs
distributed there (see INSTALL section above).

The source code for the driver can be found here at the link below.

[msr-safe repo](https://github.com/scalability-llnl/msr-safe)

Alternately, you can run GEOPM as root with the standard msr driver loaded:

    modprobe msr

### LINUX POWER MANAGEMENT
Note that other Linux mechanisms for power management can interfere
with GEOPM, and these must be disabled.  We suggest disabling the
intel_pstate kernel driver by modifying the kernel command line
through grub2 or the boot loader on your system by adding:

    "intel_pstate=disable"

The cpufreq driver will be enabled when the intel_pstate driver is
disabled.  The cpufreq driver has several modes controlled by the
scaling_governor sysfs entry.  When the performance mode is selected,
the driver will not interfere with GEOPM.  For SLURM based systems the
[GEOPM launch wrappers](#geopm-mpi-launch-wrapper) will attempt to set
the scaling governor to "performance".  This alleviates the need to
manually set the governor.  Older versions of SLURM require the
desired governors to be explicitly listed in /etc/slurm.conf.  In
particular, SLURM 15.x requires the following option:

    CpuFreqGovernors=OnDemand,Performance

More information on the slurm.conf file can be found
[here](https://slurm.schedmd.com/slurm.conf.html).
Non-SLURM systems must still set the scaling governor through some
other mechanism to ensure proper GEOPM behavior.  The following
command will set the governor to performance:

    echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

See kernel documentation
[here](https://www.kernel.org/doc/Documentation/cpu-freq/governors.txt)
for more information.

### GEOPM APPLICATION LAUNCH WRAPPER

The GEOPM package installs the command, "geopmlaunch".  This is a
wrapper for the MPI launch commands like "srun", "aprun", and
"mpiexec" where the wrapper script enables the GEOPM runtime.  The
"geopmlaunch" command supports exactly the same command line interface
as the underlying launch command, but the wrapper extends the
interface with GEOPM specific options.  The "geopmlaunch" application
launches the primary compute application and the GEOPM control thread
on each compute node and manages the CPU affinity requirements for all
processes.  The wrapper is documented in the geopmlaunch(1) man page.

There are several underlying MPI application launchers that
"geopmlaunch" wrapper supports.  See the geopmlaunch(1) man page for
information on available launchers and how to select them.  If the
launch mechanism for your system is not supported, then affinity
requirements must be enforced by the user and all options to the GEOPM
runtime must be passed through environment variables.  Please consult
the geopm(7) man page for documentation of the environment variables
used by the GEOPM runtime that are otherwise controlled by the wrapper
script.

### CPU AFFINITY REQUIREMENTS
The GEOPM runtime requires that each MPI process of the application
under control is affinitized to distinct CPUs.  This is a strict
requirement for the runtime and must be enforced by the MPI launch
command.

Affinitizing the GEOPM control thread to a CPU that is distinct from
the application CPUs may improve performance of the application, but
this is not a requirement.  On systems where an application achieves
highest performance when leaving a CPU unused by the application so
that this CPU can be dedicated to the operating system, it is usually
best to affinitize the GEOPM control thread to this CPU designated for
system threads.

There are many ways to launch an MPI application, and there is no
single uniform way of enforcing MPI rank CPU affinities across
different job launch mechanisms.  Additionally, OpenMP runtimes, which
are associated with the compiler choice, have different mechanisms for
affinitizing OpenMP threads within CPUs available to each MPI process.
To complicate things further the GEOPM control thread can be launched
as an application thread or a process that may be part of the primary
MPI application or a completely separate MPI application.  For these
reasons it is difficult to document how to correctly affinitize
processes in all configurations.  Please refer to your site
documentation about CPU affinity for the best solution on the system
you are using and consider extending the geopmlaunch wrapper to
support your system configuration (please see the CONTRIBUTING.md file
for information about how to share these implementation with the
community).

TESTING
-------
From within the source code directory, unit tests can be executed with
the "make check" target.  The unit tests can be built without executing
them with the "make checkprogs" target.  A typical parallel build and
test cyle is executed with the following commands:

    make -j
    make checkprogs -j
    make check

The unit tests can be executed on any development system, including
VMs and containers, that meets the BUILD REQUIREMENTS section above.

The integration tests are located in the "test_integration" directory.
These tests require a system meeting all of the requirements discussed
in the RUN REQUIREMENTS section above and can be executed as follows:

    cd test_integration
    ./geopm_test_integration.py

These integration tests are based on pyunit and leverage the geopmpy
python package to validate the runtime.  Please report failures of
these tests as issues.

STATUS
------
This software is production quality as of version 1.0.  We will be
enforcing [semantic versioning](https://semver.org/) for all releases
following version 1.0.  We are very interested in feedback from the
community.  Refer to the ChangeLog a high level history of changes in
each release.  See
[github issues page](https://github.com/geopm/geopm/issues)
for information about ongoing work and please provide feedback by
opening issues.  Test coverage by unit tests is lacking for some files
and will continue to be improved.  The line coverage results from gcov
as reported by gcovr for the latest release can be found
[here](http://geopm.github.io/coverage/index.html)

Some new features of GEOPM are still under development, and their
interfaces may change before they are included in official releases.
To enable these features in the GEOPM install location, configure
GEOPM with the `--enable-beta` configure flag.  The features currently
considered unfinalized are the endpoint interface, the `geopmendpoint`
application, the `geopmanalysis` application, and the `geopmplotter`
application.

ACKNOWLEDGMENTS
---------------
Development of the GEOPM software package has been partially funded
through contract B609815 with Argonne National Laboratory.
