PCPMON README
=============

PCPMON is a graphical client for Performance Co-Pilot (PCP), a free
performance-monitoring package from SGI. It lets you graphically display
values gathered from one or more computers in real time. The program also
evaluates expressions over the gathered values.


Setup
=====

Program settings (the displayed values and the display/interval settings) are
set up in the GUI. The settings can be saved to a file in XML format. When an
argument is given to the program, it is treated as a saved settings file and
is loaded.

Stack mode
----------

The program can display data in "stack mode", which shows vertical bars as
high as the sum of all the values and indicates proportionally how much each
value contributes to the sum. It is useful, for example, for showing CPU load
broken down into time spent in user/system/nice mode.

Stack mode only makes sense when all the values have the same unit; otherwise
you would be mixing apples with pears.
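
As an illustration of how one stacked bar is composed, here is a minimal Python
sketch. It is not PCPMON's drawing code; the function names and the sample
values are hypothetical.

```python
def stack_segments(values):
    """Return (total, segments) for one stacked bar.

    Each segment is a (bottom, top) pair; the bar is as high as the sum
    of all values, and each value occupies its own slice of that height.
    """
    total = sum(values)
    segments = []
    bottom = 0.0
    for v in values:
        segments.append((bottom, bottom + v))
        bottom += v
    return total, segments


def shares(values):
    """Return the fraction of the bar height contributed by each value."""
    total = sum(values)
    if total == 0:
        return [0.0 for _ in values]
    return [v / total for v in values]


# Hypothetical CPU times for user, system and nice, all in the same unit.
total, segments = stack_segments([30.0, 15.0, 5.0])
# total == 50.0, segments == [(0.0, 30.0), (30.0, 45.0), (45.0, 50.0)]
```

Because the segments are simply added together, values in different units
would produce a bar whose height has no meaning.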

Archive mode
============

The program can operate either in live mode, where the graph shows current
values, or in archive mode, where the values are obtained from a PCP archive.
Each value has a defined host (computer name) from which it is obtained in
live mode. Archive mode is invoked by defining one or more aliases for the
host(s) on the command line. An alias is defined after the -a switch, for
example:

pcpmon -a arthur=/var/log/pcp/pmlogger/arthur/20000605.08.33 cpu

The switch above specifies that metrics which would be obtained from the
computer named arthur in live mode should instead be read from the PCP archive
with base path /var/log/pcp/pmlogger/arthur/20000605.08.33. You can define more
aliases by putting more -a switches (each with its own name=path pair) on the
command line, but you cannot alias one name to more than one file. In archive
mode, aliases must be defined for all computer names used (i.e., you cannot
combine archive and live values).

The -o option must follow an -a option. It takes one parameter: a time offset
in seconds that is added to the timestamps of the archive records. The offset
can be negative. This feature is useful for displaying data from two days in
one graph, for example, for comparison. The command line would look like:

pcpmon -a arthur=/var/log/pcp/pmlogger/arthur/20000605.00.10 \
arthur=/var/log/pcp/pmlogger/arthur/20000606.00.10 -o -86400 cpu

The time range shown is the intersection of the time ranges of all the
archives. The exact date and time of the leftmost sample shown is displayed
below the graph.
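
The effect of an offset on the displayed range can be sketched in Python as
follows. This is only an illustration under simple assumptions: time ranges are
(start, end) pairs in seconds, and the function names and sample numbers are
hypothetical, not taken from PCPMON.

```python
def shifted_range(start, end, offset=0):
    """Apply a time offset (in seconds, may be negative) to an archive range."""
    return (start + offset, end + offset)


def intersect_ranges(ranges):
    """Return the intersection of (start, end) ranges, or None if it is empty."""
    start = max(r[0] for r in ranges)
    end = min(r[1] for r in ranges)
    if start >= end:
        return None
    return (start, end)


DAY = 86400  # seconds in one day

# Hypothetical archives: the first covers most of day one, the second
# covers most of day two and is shifted back by one day.
day_one = shifted_range(600, DAY)
day_two = shifted_range(DAY + 900, 2 * DAY, offset=-DAY)

visible = intersect_ranges([day_one, day_two])
# visible == (900, 86400)
```

With the offset applied, both archives overlap on the same time axis, so their
values can be compared in one graph.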

In archive mode, eight navigation buttons appear above the graph:
|<  - jump to the beginning of the time interval
<<< - jump three screens back
<<  - jump one screen back
<   - jump 1/3 of the screen back
>   - jump 1/3 of the screen forward
>>  - jump one screen forward
>>> - jump three screens forward
>|  - jump to the end of the time interval

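The navigation buttons can be modelled with a small Python sketch. It assumes
that a "screen" is the time span currently visible and that the view is clamped
to the archive time range; both are assumptions, and all names below are
hypothetical rather than PCPMON internals.

```python
# Jump size of each relative button, in thirds of a screen.
STEPS_IN_THIRDS = {
    "<<<": -9, "<<": -3, "<": -1,
    ">": 1, ">>": 3, ">>>": 9,
}


def jump(left, button, screen, range_start, range_end):
    """Return the new left edge of the view after pressing a button.

    left        -- current left edge (seconds)
    screen      -- width of the visible window (seconds)
    range_start -- start of the available time interval
    range_end   -- end of the available time interval
    """
    if button == "|<":
        new_left = range_start
    elif button == ">|":
        new_left = range_end - screen
    else:
        new_left = left + STEPS_IN_THIRDS[button] * screen / 3
    # Clamp so that the window stays inside the available interval.
    latest_left = max(range_start, range_end - screen)
    return min(max(new_left, range_start), latest_left)
```

For example, with a 300-second screen, `jump(900, "<", 300, 0, 10000)` moves
the left edge back by 100 seconds.
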
Oversampling
------------

To show long time periods in archive mode, PCPMON features "oversampling".
Oversampling means that one sample on the screen is computed from one or more
samples in the archive. The many-to-one conversion is done by a grouping
function. The current grouping functions are:
- min - show the minimum value
- max - show the maximum value
- sum - show the sum of the values
- arithmetic average - show the arithmetic average of the values
- geometric average - show the geometric average of the values
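
The grouping functions listed above can be sketched in Python like this. The
sketch assumes fixed-size groups of consecutive archive samples and positive
values for the geometric average; how PCPMON forms the groups and handles zero
or negative values is not stated here, and the names are hypothetical.

```python
import math


def arithmetic_average(values):
    return sum(values) / len(values)


def geometric_average(values):
    # Assumes all values are positive (the logarithm is undefined otherwise).
    return math.exp(sum(math.log(v) for v in values) / len(values))


GROUPING_FUNCTIONS = {
    "min": min,
    "max": max,
    "sum": sum,
    "arithmetic average": arithmetic_average,
    "geometric average": geometric_average,
}


def oversample(samples, group_size, func):
    """Reduce each run of group_size archive samples to one screen sample."""
    return [
        func(samples[i:i + group_size])
        for i in range(0, len(samples), group_size)
    ]


archive = [1, 2, 3, 4, 5, 6]
peaks = oversample(archive, 3, GROUPING_FUNCTIONS["max"])   # [3, 6]
totals = oversample(archive, 3, GROUPING_FUNCTIONS["sum"])  # [6, 15]
```

Which function is appropriate depends on the metric: max keeps short peaks
visible, while the averages smooth them out.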

Screenshot mode
===============

The program can be run in command-line mode. When the --shot option is
specified, the program just outputs the first screen drawn as a PNG image. This
mode does not require a running X Window System, but the program must still be
compiled with GTK (and thus the X11 libraries).

Expressions
===========

The only slightly complicated part of the program is the expressions that
define the measured values. If you consider yourself an expert, you can simply
read the exprparser.y file, where the expression grammar is defined. For
everyone else:

The expressions are much like "normal" arithmetic expressions. They can contain
numeric constants or variables. Variables can take three forms:

pmname[instance_index], pmname["instance_name"] or just:
pmname which is equivalent to pmname[0]

pmname is a Performance Co-Pilot performance metric name, such as
kernel.all.load. Try running the pminfo program to see the available pmnames.
Some metrics have multiple instance values. In that case, you can use the
index to get a value other than the first one, or you can use the symbolic
instance name. Example:

kernel.all.load gives 1 minute load average,
kernel.all.load[0] gives 1 minute load average too,
kernel.all.load["1 minute"] gives 1 minute load average too,
kernel.all.load[1] gives 5 minute load average,
kernel.all.load["5 minute"] gives 5 minute load average too,
kernel.all.load[2] gives 15 minute load average, and
kernel.all.load["15 minute"] gives 15 minute load average.

In the example above, you can use symbolic names as well as indices, since the
indices are permanent. However, if you wish to show, e.g., a network interface
metric, the indices differ from computer to computer, so you should use
symbolic names instead.
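
The three variable forms can be illustrated with a short Python sketch that
maps a selector to an instance index. It is a hypothetical model, not the
PCPMON resolver; the interface lists for the two hosts are invented for the
example.

```python
def resolve_instance(instance_names, selector=None):
    """Map an instance selector to an index into instance_names.

    selector is None  -> pmname                  (same as pmname[0])
    selector is int   -> pmname[instance_index]
    selector is str   -> pmname["instance_name"]
    """
    if selector is None:
        return 0
    if isinstance(selector, int):
        if not 0 <= selector < len(instance_names):
            raise IndexError("no such instance index: %d" % selector)
        return selector
    # list.index raises ValueError when the name is unknown.
    return instance_names.index(selector)


load_instances = ["1 minute", "5 minute", "15 minute"]
resolve_instance(load_instances)               # 0
resolve_instance(load_instances, 1)            # 1
resolve_instance(load_instances, "15 minute")  # 2

# Hypothetical interface order on two different hosts: the same name
# resolves to a different index on each of them.
host_a = ["lo", "eth0"]
host_b = ["eth0", "lo"]
resolve_instance(host_a, "eth0")  # 1
resolve_instance(host_b, "eth0")  # 0
```

An expression written with the index 1 would therefore read eth0 on one host
and lo on the other, which is why symbolic names are the safer choice there.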

Instance names are currently resolved when the program is started or when values
are modified. This should be enough for most cases.

Operators in expressions:
-------------------------

Note: All arithmetic is done in double-precision floating point.

Unary:

- (unary minus) - Arithmetic unary minus

^ (delta) - Delta operator. It is very useful because many values (e.g.,
transferred bytes) are counters, while you are usually more interested in
transfer rates. So you simply write ^transferred.bytes.expression. Delta can
be applied to expressions too, not only to variables. However, this makes
counter-wrap detection slightly less reliable, although it should not cause
big problems. Please note that writing ^^variable won't give you the second
derivative. Delta is not a derivative.
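
The idea behind delta and wrap detection can be sketched in Python. The sketch
assumes that delta is the plain difference between two consecutive samples and
that a negative difference is treated as a wrap of a 32-bit counter; both are
assumptions for illustration, and whether PCPMON normalizes the result by the
sampling interval is not stated here.

```python
COUNTER_MODULUS = 2 ** 32  # assumed counter width, for illustration only


def delta(previous, current, modulus=COUNTER_MODULUS):
    """Difference between two consecutive samples of a counter.

    A negative difference is interpreted as the counter having wrapped
    around once, so the modulus is added back.
    """
    diff = current - previous
    if diff < 0:
        diff += modulus
    return diff


# Plain counter: 150 units since the previous sample.
delta(100, 250)           # 150

# The counter wrapped between the two samples.
delta(2 ** 32 - 10, 5)    # 15

# Delta of an expression such as (a - b): the value legitimately fell
# from 50 to 20, but the heuristic above mistakes the drop for a wrap.
delta(50, 20)             # 4294967266, not -30
```

The last call shows why applying delta to a whole expression can confuse a
wrap heuristic of this kind: an expression, unlike a counter, may decrease.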

Binary:

- + * / % - Arithmetic operators (% is modulo)

Grouping:

Normal grouping using ( ) parentheses applies.

Functions:
----------

The following arithmetic functions are available:

abs(a) - absolute value of a (a for a>=0, -a for a<0)
ceil(a) - smallest integer not less than a
floor(a) - largest integer not greater than a
interval() - current sampling interval in seconds
log(a) - decadic (base-10) logarithm of a
min(a,b) - the lower of a and b
max(a,b) - the higher of a and b
round(a) - a rounded to the nearest integer

Examples:
---------

A few examples follow. You can see more by clicking the ... (three dots)
button to the right of the expression input box.

Free memory [MB]: (mem.util.free+mem.util.cached+mem.util.bufmem) / 1048576

CPU utilization [%]: ^(kernel.all.cpu.nice+kernel.all.cpu.user+kernel.all.cpu.sys)/^(kernel.all.cpu.nice+kernel.all.cpu.user+kernel.all.cpu.sys+kernel.all.cpu.idle)*100
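
To see what the CPU utilization expression computes, here is a small worked
example in Python. The counter values are made up, the dictionary keys only
mirror the last component of each metric name, and delta is assumed to be the
plain difference between two consecutive samples.

```python
def cpu_utilization(prev, cur):
    """Evaluate ^(nice+user+sys) / ^(nice+user+sys+idle) * 100.

    prev and cur are two consecutive samples of the CPU time counters.
    """
    busy_prev = prev["nice"] + prev["user"] + prev["sys"]
    busy_cur = cur["nice"] + cur["user"] + cur["sys"]
    delta_busy = busy_cur - busy_prev
    delta_total = (busy_cur + cur["idle"]) - (busy_prev + prev["idle"])
    # delta_total is zero only if no CPU time elapsed between the samples.
    return delta_busy / delta_total * 100


# Hypothetical counter readings at two consecutive sampling times.
previous = {"nice": 10, "user": 200, "sys": 90, "idle": 700}
current = {"nice": 20, "user": 400, "sys": 130, "idle": 1450}

utilization = cpu_utilization(previous, current)
# busy time grew by 250, total time by 1000, so utilization == 25.0
```

Because both the numerator and the denominator are deltas of the same kind of
counter, their units cancel and the result is a percentage.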

