Version 11 (modified by msherman, 4 years ago)

Resource Control with OMF

A brief overview

OMF is a framework that provides control and management functionality for large-scale testbeds with multiple networked resources. Experiments are described in OEDL, a domain-specific language. The framework is implemented as a set of libraries and services, mainly written in Ruby. OMF can be used to initiate experiments, control resources, and collect live measurements for recording and analysis. The full details for OMF can be found here.

To get started quickly, the sections below describe some of the most commonly used OMF commands in their basic form. These commands are run from the testbed console.

OMF usage

user@console:~$ omf

Run a command on the testbed(s)

Usage: omf  [COMMAND] [ARGUMENT]...
  Available COMMANDs:
    help   Print this help message or a specify command usage
    exec   Execute an experiment script
    load   Load a disk image on a given set of nodes
    save   Save a disk image from a given node into a file
    tell   Switch a given set of nodes ON/OFF or reboot them
    stat   Returns the status of a given set of nodes
  To get more help on individual commands: 'omf [COMMAND]'

OMF stat to retrieve the status of node(s)

user@console:~$ omf stat

Returns the status of the nodes in a testbed.

Usage:
omf stat [-h] -t TOPOLOGY [-s] [-c AGGREGATE]

Arguments:

-h, --help                 print this help message
-t, --topology TOPOLOGY    a valid topology file or description (MANDATORY)
-s, --summary              print a summary of the node status for the testbed
-c, --config AGGREGATE     use testbed AGGREGATE

Examples:

Get the status of all nodes in the testbed. This is a good way to find the FQDNs of all the nodes from the console.

omf stat -t all

Get the status of a single node.

omf stat -t sdr2-md1.bed.cosmos-lab.org

Get the status of multiple nodes. Use a comma-separated list to specify multiple nodes in the topology (no space before or after the comma).

omf stat -t sdr2-md1.bed.cosmos-lab.org,srv1-co1.bed.cosmos-lab.org
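When the node names come from a script or a longer list, the comma-separated topology can be assembled rather than typed by hand. The following is a minimal sketch; join_nodes is a hypothetical helper and not part of OMF, and the command is only printed so it can be checked before it is run.

```shell
# join_nodes: join all arguments with commas and no surrounding spaces.
# (Hypothetical helper for illustration; not part of OMF.)
join_nodes() (
    # The function body runs in a subshell, so changing IFS here
    # does not affect the calling shell.
    IFS=,
    printf '%s\n' "$*"
)

topology=$(join_nodes sdr2-md1.bed.cosmos-lab.org srv1-co1.bed.cosmos-lab.org)

# Print the command instead of running it, so it can be reviewed first.
echo "omf stat -t $topology"
```

Once the printed command looks right, it can be pasted at the console prompt.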

OMF tell to power on or off the node(s)

user@console:~$ omf tell

Switch ON/OFF and reboot the nodes in a testbed.

Usage:
omf tell [-h] -t TOPOLOGY -a ACTION [-c AGGREGATE]

Arguments:
-h, --help                 print this help message
-a, --action ACTION        specify an ACTION:
                           on          turn node(s) ON
                           offs        turn node(s) OFF (soft)
                           offh        turn node(s) OFF (hard)
                           reboot      reboots node(s) (soft)
                           reset       resets node(s) (hard)
-t, --topology TOPOLOGY    a valid topology file or description (MANDATORY)
-c, --config AGGREGATE     use testbed AGGREGATE

Examples:

Reset (hard) a node.

omf tell -a reset -t sdr2-md1.bed.cosmos-lab.org

Reboot (soft) a node.

omf tell -a reboot -t sdr2-md1.bed.cosmos-lab.org

Turn on multiple nodes.

omf tell -a on -t sdr2-md1.bed.cosmos-lab.org,srv1-co1.bed.cosmos-lab.org

Turn off (hard) multiple nodes.

omf tell -a offh -t sdr2-md1.bed.cosmos-lab.org,srv1-co1.bed.cosmos-lab.org
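Because the action names are short and easy to mistype (for example off instead of offs or offh), a wrapper script may want to check the action before calling omf tell. The sketch below accepts only the ACTION values listed in the help text above; is_tell_action is a hypothetical helper, and the command is printed rather than executed.

```shell
# is_tell_action: succeed only for the ACTION values documented for 'omf tell'.
# (Hypothetical helper for illustration; not part of OMF.)
is_tell_action() {
    case "$1" in
        on|offs|offh|reboot|reset) return 0 ;;
        *) return 1 ;;
    esac
}

action=offs
node=sdr2-md1.bed.cosmos-lab.org

if is_tell_action "$action"; then
    # Print the command instead of running it, so it can be reviewed first.
    echo "omf tell -a $action -t $node"
else
    echo "unknown action: $action" >&2
fi
```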

OMF load to install a disk image on the node(s)

user@console:~$ omf load

Install a given disk image on the nodes in a testbed.

Usage:
omf load [-h] -t TOPOLOGY [-i IMAGE_PATH] [-o TIMEOUT] [-c AGGREGATE]

Arguments:

-h, --help                 print this help message
-t, --topology TOPOLOGY    a valid topology file or description (MANDATORY)
                           (if a file 'TOPOLOGY' doesn't exist, interpret it as a
                           comma-separated list of nodes)
-i, --image IMAGE          disk image to load
                           (default is 'baseline.ndz', the latest stable baseline image)
-c, --config AGGREGATE     use testbed AGGREGATE
-o, --timeout TIMEOUT      a duration (in sec.) after which imageNodes should stop waiting for
                           nodes that have not finished their image installation
                           (default is 800 sec, i.e. 13min 20sec)
-r, --resize SIZE          Resizes the first partition to SIZE GB or to maximum size if SIZE=0 or
                           leave x percent of free space if SIZE=x%
  --outpath PATH           Path where the resulting Topologies should be saved
                           (default is '/tmp')
  --outprefix PREFIX       Prefix to use for naming the resulting Topologies
                           (default is your experiment ID)

Examples:

Load the default baseline image (baseline.ndz) on all nodes in the testbed.

omf load -t system:topo:all -i baseline.ndz

Load a specific image (my_image.ndz) on all nodes in a topology.

omf load -t system:topo:all -i my_image.ndz

Load a specific image (my_image.ndz) onto a single node.

omf load -t sdr2-md1.bed.cosmos-lab.org -i my_image.ndz

Load a specific image (my_image.ndz) onto multiple nodes with a timeout of 400 seconds.

omf load -t sdr2-md1.bed.cosmos-lab.org,srv1-co1.bed.cosmos-lab.org -i my_image.ndz -o 400
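The -o value is given in seconds. As a quick check when choosing a timeout, the conversion to minutes and seconds can be sketched as follows; fmt_timeout is a hypothetical helper and not an OMF command.

```shell
# fmt_timeout: print a duration in seconds as minutes and seconds.
# (Hypothetical helper for illustration; not part of OMF.)
fmt_timeout() {
    secs=$1
    printf '%dmin %dsec\n' $((secs / 60)) $((secs % 60))
}

fmt_timeout 800   # the documented default: 13min 20sec
fmt_timeout 400   # the value used in the example above: 6min 40sec
```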

OMF save to export the disk image from a single node into the repository

user@console:~$ omf save

Save a disk image from a given node into an archive file.

Usage:
omf save -n NODE [-h] [-c AGGREGATE]

Arguments:
-h, --help           print this help message
-n, --node NODE      a valid description of a single node (MANDATORY)
                     (no default here, you have to enter a node!)
-r, --resize SIZE    Resizes the first partition to SIZE GB or to maximum size if SIZE=0 or
                     leave x percent of free space if SIZE=x%

NODE must be specified in FQDN (fully qualified domain name) format (e.g., node1-1.sb1.orbit-lab.org).

Examples:

Save the disk image from a node.

omf save -n sdr2-md1.bed.cosmos-lab.org

Example of Output

Once you have the image prepared the way you want it, run the following on the node:

console:~$ ssh root@sdr2-lg1.sb1.cosmos-lab.org
root@sdr-console: ./prepare.sh

This will remove udev rules (to prevent renaming of interfaces) and discard log files to reduce the size of the image. It also shuts down the node.

Once the node has shut down, save its disk image by running the omf save command on the console:

console:~$ omf save -n sdr2-lg1.sb1.cosmos-lab.org

The output of this image saving process will look like the following:

INFO NodeHandler: OMF Experiment Controller 5.4 (git c005675)
INFO NodeHandler: Slice ID: pxe_slice 
INFO NodeHandler: Experiment ID: pxe_slice-2013-02-06t14.14.46-05.00
INFO NodeHandler: Message authentication is disabled
INFO Experiment: load system:exp:stdlib
INFO property.resetDelay: resetDelay = 230 (Fixnum)
INFO property.resetTries: resetTries = 1 (Fixnum)
INFO Experiment: load system:exp:eventlib
INFO Experiment: load system:exp:saveNode
INFO property.node: node = "node1-1.sb1.orbit-lab.org" (String)
INFO property.pxe: pxe = "1.1.6" (String)
INFO property.domain: domain = "grid.orbit-lab.org" (String)
INFO property.started: started = "false" (String)
INFO property.image: image = nil (NilClass)
INFO property.resize: resize = nil (NilClass)
WARN exp: Saving only works for ext2/ext3 partitions and MBR (msdos) partition tables. Saving any other filesystem or partition table type will produce a 0 byte image.
INFO Topology: Loading topology 'node1-1.sb1.orbit-lab.org'.
INFO Experiment: Resetting resources
INFO stdlib: Waiting for nodes (Up/Down/Total): 0/1/1 - (still down: node1-1.sb1.orbit-lab.org) [0 sec.]
.
.
.
INFO stdlib: Waiting for nodes (Up/Down/Total): 0/1/1 - (still down: node1-1.sb1.orbit-lab.org) [80 sec.]
INFO ALL_UP: Event triggered. Starting the associated tasks.
INFO node1-1.sb1.orbit-lab.org:  
INFO node1-1.sb1.orbit-lab.org: - Saving image of '/dev/sda' on node 'node1-1.sb1.orbit-lab.org'
INFO node1-1.sb1.orbit-lab.org:   to the file 'bob-node-node1-1.sb1.orbit-lab.org-2013-02-06-14-16-23.ndz' on host '10.10.0.42'
INFO node1-1.sb1.orbit-lab.org:  
INFO property.started: started = "true" (String)
INFO exp:  
INFO exp: - Saving process started at: Wed Feb 06 14:16:27 -0500 2013
INFO exp:   (this may take a while depending on the size of your image)
INFO Experiment: DONE!
INFO ExecApp: Application 'commServer' finished
INFO run: Experiment sb1_2008_07_20_23_38_04 finished after 9:19
done.

Please make sure that the process ends without errors.

If there are no errors, at the end of the saving process you will have a disk image file with a name such as:

bob-node-sdr2-lg1.sb1.cosmos-lab.org-2013-02-06-14-16-23.ndz

in the directory

/export/omf/omf-images

This directory is available on each console, as well as the machine with the host name "frisbee". This information is printed in the output shown above.

You can then reload this disk image on a node (or nodes) using the omf load command.

Working with Saved Images

Images are treated as standard Linux files. That means that you can:

  • Check that they have a nonzero size: ls -al imagename
  • Rename them: mv imagename imagenewname
  • Delete them: rm imagename
  • Set permissions: chmod 600 filename
  • Set user and group: chown username:groupname filename

When you use omf load, the -i flag refers to a file name in this directory. It obeys Linux file permissions, so if you want to keep other people from loading your image, ensure that it does not grant read permission to the group or to everyone.
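The size check and the permission change can be combined into one step after saving. The sketch below runs against a throwaway temporary file so that it is safe to try anywhere; check_image is a hypothetical helper, and for a real image you would pass the path of your .ndz file in the image directory instead.

```shell
# check_image: succeed if the file exists and is not empty.
# (Hypothetical helper for illustration; not part of OMF.)
check_image() {
    [ -s "$1" ]
}

# Stand-in for a saved image; replace with the path to a real .ndz file.
img=$(mktemp)
printf 'placeholder' > "$img"

if check_image "$img"; then
    chmod 600 "$img"    # read/write for the owner only
    ls -l "$img"
else
    echo "image is missing or empty: $img" >&2
fi

rm -f "$img"            # clean up the stand-in file
```

A zero-byte result is worth checking for explicitly, since the save output warns that unsupported filesystems or partition tables produce a 0 byte image.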

OMF exec to execute an OEDL script

user@console:~$ omf exec

Execute an experiment script.

Usage:
omf exec [OPTIONS] ExperimentName [-- EXP_OPTIONS]

ExperimentName is the filename of the experiment script

[EXP_OPTIONS] are any options defined in the experiment script

[OPTIONS] are any of the following:
-a, --allow-missing              Continue experiment even if some nodes did not check in
-c, --config NAME                Configuration section from the config file ('default' if omitted)
-C, --configfile FILE            File containing local configuration parameters
-d, --debug                      Operate in debug mode
-i, --interactive                Run the experiment controller in interactive mode
-l, --libraries LIST             Comma separated list of libraries to load (defaults to [system:exp:stdlib,system:exp:eventlib,system:exp:winlib])
--log FILE                       File containing logging configuration information
-m, --message MESSAGE            Message to add to experiment trace
-n, --just-print                 Print the commands that would be executed, but do not execute them
-N, --no-am                      Don't use the Aggregate Manager (AM)
-p, --print URI                  Print the contents of the experiment script
-o, --output-result FILE         File to write final state information to
-e, --experiment-id EXPID        Set the ID for this experiment, instead of the default standard ID
-O, --output-app                 Display STDOUT & STDERR output from the executed applications
-r, --reset                      If set, then reset (reboot) the nodes before the experiment
-s, --shutdown                   If set, then shut down resources at the end of an experiment
-S, --slice NAME                 Name of the Slice where this EC should operate
-t, --tags TAGS                  Comma separated list of tags to add to experiment trace
--oml-uri URI                    The URI to the OML server for this experiment
-x, --extra-libs LIST            Comma separated list of libraries to load in addition to [system:exp:stdlib,system:exp:eventlib,system:exp:winlib]
--slave-mode EXPID               Run in slave mode in disconnected experiment, EXPID is the exp. ID
--slave-mode-resource NAME       When in slave mode, NAME is the HRN of the resource for this EC
-h, --help                       Show this message
-v, --version                    Show the version
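As an illustration of how the pieces of the usage line fit together, the sketch below assembles an invocation that sets an experiment ID with -e, shows application output with -O, and passes one experiment option after the -- separator. The script name, experiment ID, and the --duration option are all hypothetical; the options actually available after -- are whatever your own experiment script defines, and their exact form is an assumption here. The command is printed rather than executed.

```shell
# Hypothetical names: replace with your own script, experiment ID and options.
script=my_experiment.rb
exp_id=my-test-run-01

# -e sets the experiment ID and -O displays application output (both listed above).
# Everything after "--" is passed to the experiment script; "--duration 30"
# is an assumed, script-defined option used only for illustration.
cmd="omf exec -e $exp_id -O $script -- --duration 30"

# Print the command instead of running it, so it can be reviewed first.
echo "$cmd"
```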