Resource Control with OMF
A brief overview
OMF is a framework that provides control, measurement, and management functionality for large-scale testbeds with many networked resources. Experiments are described in OEDL (OMF Experiment Description Language), a domain-specific language that is part of OMF. The framework is implemented as a set of libraries and services written mainly in Ruby. OMF can be used to initiate experiments, control resources, and collect live measurements for recording and analysis. Full details on OMF can be found in the OMF documentation.
To get started quickly, the sections below describe some of the most frequently used OMF commands in their basic form. These commands are run from the testbed console.
OMF usage
user@console:~$ omf
Run a command on the testbed(s)
Usage: omf [COMMAND] [ARGUMENT]...

Available COMMANDs:
    help    Print this help message or a specify command usage
    exec    Execute an experiment script
    load    Load a disk image on a given set of nodes
    save    Save a disk image from a given node into a file
    tell    Switch a given set of nodes ON/OFF or reboot them
    stat    Returns the status of a given set of nodes

To get more help on individual commands: 'omf [COMMAND]'
OMF status to retrieve status of node(s)
user@console:~$ omf status
Returns the status of the nodes in a testbed.
Usage: omf stat [-h] -t TOPOLOGY [-h] [-s] [-c AGGREGATE]

Arguments:
    -h, --help                  print this help message
    -t, --topology TOPOLOGY     a valid topology file or description (MANDATORY)
    -s, --summary               print a summary of the node status for the testbed
    -c, --config AGGREGATE      use testbed AGGREGATE
Examples:
Get the status of all nodes in the testbed. This is a good way to find the FQDNs of all the nodes reachable from the console.
omf stat -t all
Get the status of a single node.
omf stat -t sdr2-md1.bed.cosmos-lab.org
Get the status of multiple nodes. Use a comma-separated list to specify multiple nodes in the topology (no space before or after a comma).
omf stat -t sdr2-md1.bed.cosmos-lab.org,srv1-co1.bed.cosmos-lab.org
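Typing long comma-separated node lists by hand makes it easy to slip in a space. The sketch below is a hypothetical helper (join_nodes is an illustrative name, not part of OMF) that builds the topology string from separate node names, assuming a POSIX shell on the console.

```shell
# Hypothetical helper (not part of OMF): join node FQDNs into the
# comma-separated topology string accepted by -t, with no spaces
# around the commas. The ( ) body runs in a subshell, so the IFS
# change does not leak into the calling shell.
join_nodes() (
  IFS=','
  printf '%s\n' "$*"
)

# Usage:
#   omf stat -t "$(join_nodes sdr2-md1.bed.cosmos-lab.org srv1-co1.bed.cosmos-lab.org)"
```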
OMF tell to power on or off the node(s)
user@console:~$ omf tell
Switch ON/OFF and reboot the nodes in a testbed.
Usage: omf tell [-h] -t TOPOLOGY -a ACTION [-c AGGREGATE]

Arguments:
    -h, --help                  print this help message
    -a, --action ACTION         specify an ACTION:
                                  on      turn node(s) ON
                                  offs    turn node(s) OFF (soft)
                                  offh    turn node(s) OFF (hard)
                                  reboot  reboots node(s) (soft)
                                  reset   resets node(s) (hard)
    -t, --topology TOPOLOGY     a valid topology file or description (MANDATORY)
    -c, --config AGGREGATE      use testbed AGGREGATE
Examples:
Reset (hard) a node.
omf tell -a reset -t sdr2-md1.bed.cosmos-lab.org
Reboot (soft) a node.
omf tell -a reboot -t sdr2-md1.bed.cosmos-lab.org
Turn on multiple nodes.
omf tell -a on -t sdr2-md1.bed.cosmos-lab.org,srv1-co1.bed.cosmos-lab.org
Turn off (hard) multiple nodes.
omf tell -a offh -t sdr2-md1.bed.cosmos-lab.org,srv1-co1.bed.cosmos-lab.org
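Because omf tell accepts only the five action names listed in its usage text (plain "off" is not one of them), a small guard can catch typos before anything is sent to the testbed. This is a hypothetical wrapper, not part of OMF; safe_tell and valid_tell_action are illustrative names, and it assumes omf is on the console's PATH.

```shell
# Hypothetical guard (not part of OMF): accept only the ACTION values
# listed by "omf tell" before calling it.
valid_tell_action() {
  case $1 in
    on|offs|offh|reboot|reset) return 0 ;;
    *) return 1 ;;
  esac
}

safe_tell() {
  local action="$1" topo="$2"
  if ! valid_tell_action "$action"; then
    echo "unknown action: $action (use on, offs, offh, reboot or reset)" >&2
    return 2
  fi
  omf tell -a "$action" -t "$topo"
}

# Usage:
#   safe_tell offh sdr2-md1.bed.cosmos-lab.org,srv1-co1.bed.cosmos-lab.org
```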
OMF load to install a disk image on the node(s)
user@console:~$ omf load
Install a given disk image on the nodes in a testbed.
Usage: omf load [-h] -t TOPOLOGY [-i IMAGE_PATH] [-o TIMEOUT] [-c AGGREGATE]

Arguments:
    -h, --help                  print this help message
    -t, --topology TOPOLOGY     a valid topology file or description (MANDATORY)
                                (if a file 'TOPOLOGY' doesn't exist, interpret it as
                                a comma-separated list of nodes)
    -i, --image IMAGE           disk image to load
                                (default is 'baseline.ndz', the latest stable baseline image)
    -c, --config AGGREGATE      use testbed AGGREGATE
    -o, --timeout TIMEOUT       a duration (in sec.) after which imageNodes should stop
                                waiting for nodes that have not finished their image
                                installation (default is 800 sec, i.e. 13min 20sec)
    -r, --resize SIZE           Resizes the first partition to SIZE GB or to maximum size
                                if SIZE=0 or leave x percent of free space if SIZE=x%
    --outpath PATH              Path where the resulting Topologies should be saved
                                (default is '/tmp')
    --outprefix PREFIX          Prefix to use for naming the resulting Topologies
                                (default is your experiment ID)
Examples:
Load the default baseline image (baseline.ndz) on all nodes in the testbed.
omf load -t system:topo:all -i baseline.ndz
Load a specific image (my_image.ndz) on all nodes in a topology.
omf load -t system:topo:all -i my_image.ndz
Load a specific image (my_image.ndz) onto a single node.
omf load -t sdr2-md1.bed.cosmos-lab.org -i my_image.ndz
Load a specific image (my_image.ndz) onto multiple nodes with a timeout of 400 seconds.
omf load -t sdr2-md1.bed.cosmos-lab.org,srv1-co1.bed.cosmos-lab.org -i my_image.ndz -o 400
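The -o timeout is given in seconds (the default of 800 seconds is 13 min 20 s). The sketch below is a hypothetical wrapper (load_image is an illustrative name, not part of OMF) that passes -o only when a timeout is supplied and, as an assumption, rejects anything that is not a whole number of seconds, such as "6m".

```shell
# Hypothetical wrapper (not part of OMF): load IMAGE onto TOPOLOGY,
# adding -o only when a timeout in whole seconds is given.
load_image() {
  local topo="$1" image="$2" timeout="${3:-}"
  if [ -z "$timeout" ]; then
    omf load -t "$topo" -i "$image"   # omf load applies its default timeout
    return
  fi
  case $timeout in
    *[!0-9]*)
      echo "timeout must be a whole number of seconds: $timeout" >&2
      return 2 ;;
  esac
  omf load -t "$topo" -i "$image" -o "$timeout"
}

# Usage:
#   load_image sdr2-md1.bed.cosmos-lab.org,srv1-co1.bed.cosmos-lab.org my_image.ndz 400
```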
OMF save to export the disk image from a single node into the repository
Once the image is prepared the way you want it, run the following script on the node:
root@srv1-lg1.sb1.orbit-lab.org: /root/prepare.sh
This removes udev rules (to prevent interfaces from being renamed) and discards log files to reduce the size of the image. It also shuts down the node.
Once the node has shut down, save its disk image with the following command:
user@console:~$ omf save
Save a disk image from a given node into an archive file.
Usage: omf save -n NODE [-h] [-c AGGREGATE]

Arguments:
    -h, --help                  print this help message
    -n, --node NODE             a valid description of a single node (MANDATORY)
                                (no default here, you have to enter a node!)
    -r, --resize SIZE           Resizes the first partition to SIZE GB or to maximum size
                                if SIZE=0 or leave x percent of free space if SIZE=x%
NODE must be specified as an FQDN (fully qualified domain name), e.g. srv1-lg1.sb2.cosmos-lab.org.
Examples:
Save the disk image from a node.
omf save -n srv1-lg1.sb2.cosmos-lab.org
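Since omf save takes exactly one node and expects its FQDN, a quick check on the argument can save a failed run. The sketch below is a hypothetical guard (save_node is an illustrative name, not part of OMF); it only checks the shape of the name (no commas, at least two dots), which is an assumption about what counts as an FQDN here, and it does not verify that the node exists.

```shell
# Hypothetical guard (not part of OMF): pass a single node FQDN to omf save.
# Shape check only: no comma-separated lists, at least host.domain.tld.
save_node() {
  local node="$1"
  case $node in
    *,*)
      echo "omf save takes a single node, not a list: $node" >&2
      return 2 ;;
    *.*.*)
      ;;  # looks like an FQDN
    *)
      echo "give the node's FQDN, e.g. srv1-lg1.sb2.cosmos-lab.org: $node" >&2
      return 2 ;;
  esac
  omf save -n "$node"
}

# Usage:
#   save_node srv1-lg1.sb2.cosmos-lab.org
```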
Example of Output
Once the image is prepared the way you want it, log in to the node and run:
console:~$ ssh root@sdr2-lg1.sb1.cosmos-lab.org
root@sdr-console: ./prepare.sh
This removes udev rules (to prevent interfaces from being renamed) and discards log files to reduce the size of the image. It also shuts down the node.
Once the node has shut down, save the disk image that was running on it by using the omf save command on the console:
console:~$ omf save -n sdr2-lg1.sb1.cosmos-lab.org
The output of this image saving process will look like the following:
INFO NodeHandler: OMF Experiment Controller 5.4 (git c005675)
INFO NodeHandler: Slice ID: pxe_slice
INFO NodeHandler: Experiment ID: pxe_slice-2013-02-06t14.14.46-05.00
INFO NodeHandler: Message authentication is disabled
INFO Experiment: load system:exp:stdlib
INFO property.resetDelay: resetDelay = 230 (Fixnum)
INFO property.resetTries: resetTries = 1 (Fixnum)
INFO Experiment: load system:exp:eventlib
INFO Experiment: load system:exp:saveNode
INFO property.node: node = "node1-1.sb1.orbit-lab.org" (String)
INFO property.pxe: pxe = "1.1.6" (String)
INFO property.domain: domain = "grid.orbit-lab.org" (String)
INFO property.started: started = "false" (String)
INFO property.image: image = nil (NilClass)
INFO property.resize: resize = nil (NilClass)
WARN exp: Saving only works for ext2/ext3 partitions and MBR (msdos) partition tables. Saving any other filesystem or partition table type will produce a 0 byte image.
INFO Topology: Loading topology 'node1-1.sb1.orbit-lab.org'.
INFO Experiment: Resetting resources
INFO stdlib: Waiting for nodes (Up/Down/Total): 0/1/1 - (still down: node1-1.sb1.orbit-lab.org) [0 sec.]
. . .
INFO stdlib: Waiting for nodes (Up/Down/Total): 0/1/1 - (still down: node1-1.sb1.orbit-lab.org) [80 sec.]
INFO ALL_UP: Event triggered. Starting the associated tasks.
INFO node1-1.sb1.orbit-lab.org:
INFO node1-1.sb1.orbit-lab.org: - Saving image of '/dev/sda' on node 'node1-1.sb1.orbit-lab.org'
INFO node1-1.sb1.orbit-lab.org: to the file 'bob-node-node1-1.sb1.orbit-lab.org-2013-02-06-14-16-23.ndz' on host '10.10.0.42'
INFO node1-1.sb1.orbit-lab.org:
INFO property.started: started = "true" (String)
INFO exp:
INFO exp: - Saving process started at: Wed Feb 06 14:16:27 -0500 2013
INFO exp: (this may take a while depending on the size of your image)
INFO Experiment: DONE!
INFO ExecApp: Application 'commServer' finished
INFO run: Experiment sb1_2008_07_20_23_38_04 finished after 9:19 done.
Please make sure that the process ends without errors.
If there are no errors, at the end of the saving process you will have a disk image file with the name:
bob-node-sdr2-lg1.sb1.cosmos-lab.org-2013-02-06-14-16-23.ndz
in the directory
/export/omf/omf-images
This directory is available on each console, as well as on the machine with the host name "frisbee". The image file name and the host it is saved to are printed in the output shown above.
You can then reload this disk image on a node (or nodes) using the omf load command.
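The example above suggests that saved images are named <user>-node-<node FQDN>-<YYYY-MM-DD-HH-MM-SS>.ndz. Assuming that pattern holds (it is inferred from this single example, not from OMF documentation), the hypothetical helper latest_image below picks the most recently saved image for a given node in /export/omf/omf-images, so it can be passed to omf load -i. It compares only the zero-padded timestamp suffix, so images saved by different users are still ordered by time.

```shell
# Hypothetical helper (not part of OMF): print the file name of the newest
# image saved from NODE, assuming names of the form
# <user>-node-<NODE>-<YYYY-MM-DD-HH-MM-SS>.ndz. Returns 1 if none is found.
latest_image() {
  local node="$1" dir="${2:-/export/omf/omf-images}" latest
  latest=$(
    for f in "$dir"/*-node-"$node"-*.ndz; do
      [ -e "$f" ] || continue          # no match: the pattern stays literal
      f=${f##*/}                       # keep only the file name
      stamp=${f##*-node-"$node"-}      # e.g. 2013-02-06-14-16-23.ndz
      printf '%s %s\n' "$stamp" "$f"
    done | LC_ALL=C sort | tail -n 1 | cut -d ' ' -f 2-
  )
  [ -n "$latest" ] || return 1
  printf '%s\n' "$latest"
}

# Usage:
#   omf load -t sdr2-lg1.sb1.cosmos-lab.org -i "$(latest_image sdr2-lg1.sb1.cosmos-lab.org)"
```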
Working with Saved Images
Images are treated as standard Linux files. That means you can:
- Check that they have a nonzero size
ls -al imagename
- Rename them
mv imagename imagenewname
- Delete them
rm imagename
- Set permissions
chmod 600 imagename
- Set user and group
chown username:groupname imagename
When you use omf load, the -i flag refers to a file name in the /export/omf/omf-images directory. omf load obeys Linux file permissions, so if you want to keep other people from loading your image, make sure the file does not grant read permission to its group or to others.
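The two checks above (a nonzero size, and no read access for group or others) can be combined in one step. The sketch below is a hypothetical helper (protect_image is an illustrative name, not part of OMF); it relies on the statement above that omf load obeys Linux file permissions.

```shell
# Hypothetical helper (not part of OMF): refuse an empty image, then make
# the file readable and writable by its owner only.
protect_image() {
  local img="$1"
  if [ ! -s "$img" ]; then
    # A 0-byte file usually means the save failed, for example because the
    # node did not use ext2/ext3 partitions with an MBR (msdos) partition table.
    echo "image is missing or empty: $img" >&2
    return 1
  fi
  chmod 600 "$img"   # rw for the owner; no access for group or others
}

# Usage:
#   protect_image /export/omf/omf-images/bob-node-sdr2-lg1.sb1.cosmos-lab.org-2013-02-06-14-16-23.ndz
```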
omf exec to execute an OEDL script
user@console:~$ omf exec
Execute an experiment script.
Usage: exec [OPTIONS] ExperimentName [-- EXP_OPTIONS]

ExperimentName is the filename of the experiment script
[EXP_OPTIONS] are any options defined in the experiment script
[OPTIONS] are any of the following:
    -a, --allow-missing            Continue experiment even if some nodes did not check in
    -c, --config NAME              Configuration section from the config file ('default' if omitted)
    -C, --configfile FILE          File containing local configuration parameters
    -d, --debug                    Operate in debug mode
    -i, --interactive              Run the experiment controller in interactive mode
    -l, --libraries LIST           Comma separated list of libraries to load (defaults to [system:exp:stdlib,system:exp:eventlib,system:exp:winlib])
        --log FILE                 File containing logging configuration information
    -m, --message MESSAGE          Message to add to experiment trace
    -n, --just-print               Print the commands that would be executed, but do not execute them
    -N, --no-am                    Don't use the Aggregate Manager (AM)
    -p, --print URI                Print the contents of the experiment script
    -o, --output-result FILE       File to write final state information to
    -e, --experiment-id EXPID      Set the ID for this experiment, instead of the default standard ID
    -O, --output-app               Display STDOUT & STDERR output from the executed applications
    -r, --reset                    If set, then reset (reboot) the nodes before the experiment
    -s, --shutdown                 If set, then shut down resources at the end of an experiment
    -S, --slice NAME               Name of the Slice where this EC should operate
    -t, --tags TAGS                Comma separated list of tags to add to experiment trace
        --oml-uri URI              The URI to the OML server for this experiment
    -x, --extra-libs LIST          Comma separated list of libraries to load in addition to [system:exp:stdlib,system:exp:eventlib,system:exp:winlib]
        --slave-mode EXPID         Run in slave mode in disconnected experiment, EXPID is the exp. ID
        --slave-mode-resource NAME When in slave mode, NAME is the HRN of the resource for this EC
    -h, --help                     Show this message
    -v, --version                  Show the version
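In the usage above, OPTIONS for omf exec come before the script name, while options defined by the experiment script itself go after a "--" separator. The sketch below is a hypothetical wrapper (exec_experiment is an illustrative name, not part of OMF) that shows application output with -O and inserts the "--" only when experiment options are given. The script name my_experiment.rb and its --duration option in the usage comment are made-up examples.

```shell
# Hypothetical wrapper (not part of OMF): run an OEDL script with -O so that
# application STDOUT/STDERR is displayed, passing any remaining arguments to
# the script itself after the "--" separator.
exec_experiment() {
  local script="$1"
  shift
  if [ "$#" -gt 0 ]; then
    set -- -- "$@"   # EXP_OPTIONS must come after "--"
  fi
  omf exec -O "$script" "$@"
}

# Usage (my_experiment.rb and --duration are made-up examples):
#   exec_experiment my_experiment.rb --duration 60
```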