Version 18 (modified by msherman, 5 years ago)

Getting Started

This guide walks you through the steps needed to start using the COSMOS testbed.

You can browse the wiki as an anonymous user. After following the steps below, you can log in using the navigation bar in the top right.

Make an Account

NOTE: If you already have an ORBIT account, you can skip this step.

Here is the account creation flow:

  1. Request an account
    1. Select your group in the drop-down
    2. Specify your contact info and mailing list preference
  2. Reply to the confirmation email for the account request
    1. To keep the email out of your spam filter, whitelist accountmanager@orbit-lab.org
  3. Your group PI approves your request
  4. You receive a confirmation email for the account approval

First, you will need to create a user account.

If your organization is not listed and you are a faculty or staff member (or a project manager at an eligible institution), you can create a user group that you will be in charge of. Please use an email address that can be independently verified.

Usage of COSMOS is governed by the UserGuide/AcceptibleUse terms and conditions.

Note that once you fill in the form, you will receive an email asking you to confirm your account request. You must open the link in that email within 30 minutes to complete the submission.

Create and Configure SSH Keys

SSH access to COSMOS domains requires the use of public key authentication. If you try to connect using the username and password that you use for accessing the scheduler and status pages, you will receive the following message:

not_a_user@laptop:~$ ssh not_a_user@bed.cosmos-lab.org
not_a_user@bed.cosmos-lab.org: Permission denied (publickey).

You need to configure the SSH client on your computer to use a private key for connecting to COSMOS machines instead of a password.

Additionally, the corresponding public key needs to be added to your COSMOS account.

This page describes the procedure for:

  • generating a public/private key pair
  • configuring your SSH client to use the private key
  • uploading the public key to your COSMOS account.

The instructions here are for specific SSH client software. If you use a different SSH client, follow the documentation provided with that client and use these instructions for reference.
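
For an OpenSSH client on Linux or macOS, the three steps listed above might look like the following sketch. The key path `~/.ssh/cosmos_rsa`, the host pattern `*.cosmos-lab.org`, the key type (RSA, 4096 bits), and the helper function names are all assumptions chosen for illustration, not COSMOS requirements. Check which key types your account page accepts before relying on it.

```shell
#!/bin/sh
# Sketch only: the key path, key type, and host pattern are assumptions.
KEY_FILE="${KEY_FILE:-$HOME/.ssh/cosmos_rsa}"

# Step 1: generate a public/private key pair (ssh-keygen prompts for a passphrase).
generate_cosmos_key() {
    mkdir -p "$(dirname "$KEY_FILE")"
    ssh-keygen -t rsa -b 4096 -f "$KEY_FILE"
}

# Step 2: print an OpenSSH client config stanza that selects the private key.
# $1 is your COSMOS username.
cosmos_ssh_config() {
    printf 'Host *.cosmos-lab.org\n'
    printf '    User %s\n' "$1"
    printf '    IdentityFile %s\n' "$KEY_FILE"
    printf '    IdentitiesOnly yes\n'
}

# Step 3: print the public key so it can be pasted into your COSMOS account.
show_cosmos_public_key() {
    cat "$KEY_FILE.pub"
}
```

A possible sequence is to run `generate_cosmos_key`, then `cosmos_ssh_config your_username >> ~/.ssh/config`, and finally `show_cosmos_public_key` to copy the public key into your account.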

Make a Reservation

  1. Before you can access the testbed, you need to make a reservation for a particular experiment server and get it approved by the reservation service.
  2. The reservation scheduler can be seen when you first log in to the control panel.
  3. On the scheduler, select the grid square corresponding to the subdomain you wish to reserve at the time you want the reservation to start. This will open a dialog allowing you to configure your reservation.
  4. To make a group reservation, use one of two methods:
    1. Method 1: Use the Group Reservation tab. This automatically grants access to all users in the group.
    2. Method 2: Use the Participants tab to manually add individual users to your reservation.
  5. When you are done, click Submit. A popup window should appear.
  6. In the scheduler, your reservation should appear in yellow, indicating that it is pending approval.
  7. Just before your time slot starts, the reservation will be approved automatically.
  8. During the time slot, you will be able to log in to the console.

Log in to your Reservation

During your approved time slot, you will be able to ssh into the console of the respective domain. A console is a dedicated machine that allows access to all resources in that domain.

not_a_user@laptop:~$ ssh your_username@console.sb1.cosmos-lab.org
                                       Welcome to
   _____ ____   _____ __  __  ____   _____      _               ____
  / ____/ __ \ / ____|  \/  |/ __ \ / ____|    | |        /\   |  _ \
 | |   | |  | | (___ | \  / | |  | | (___ _____| |       /  \  | |_) | ___  _ __ __ _
 | |   | |  | |\___ \| |\/| | |  | |\___ \_____| |      / /\ \ |  _ < / _ \| '__/ _` |
 | |___| |__| |____) | |  | | |__| |____) |    | |____ / ____ \| |_) | (_) | | | (_| |
  \_____\____/|_____/|_|  |_|\____/|_____/     |______/_/    \_\____(_)___/|_|  \__, |
                                                                                 __/ |
                                                                                |___/
 Hostname         : console.sb1.cosmos-lab.org
 Operating system : Ubuntu 16.04.5 LTS; Kernel: 4.15.0-45-generic; Arch: x86_64;
 CPU              : 6 x Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz
                    1 socket(s) with 6 core(s) per socket and 1 thread(s) per core
 Memory           : 3.9G
 Uptime           : up 5 weeks, 1 day, 19 hours, 28 minutes
 Users logged in  : 13
Last login: Thu May 16 10:18:10 2019 from 192.168.203.237
your_username@console:~$

Control Resources with OMF

Get the status of nodes

user@console:~$ omf stat

Returns the status of the nodes in a testbed.

Usage:
omf stat [-h] -t TOPOLOGY [-s] [-c AGGREGATE]

Arguments:

-h, --help                 print this help message
-t, --topology TOPOLOGY    a valid topology file or description (MANDATORY)
-s, --summary              print a summary of the node status for the testbed
-c, --config AGGREGATE     use testbed AGGREGATE

Examples:

Get the status of all nodes in the testbed. This is a good way to find the FQDNs of all the nodes from the console.

omf stat -t all

Get the status of a single node.

omf stat -t sdr2-md1.bed.cosmos-lab.org

Get the status of multiple nodes. Use a comma-separated list to specify multiple nodes in the topology (no space before or after the comma).

omf stat -t sdr2-md1.bed.cosmos-lab.org,srv1-co1.bed.cosmos-lab.org
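
Because the topology list must not contain spaces, it can help to build it programmatically. The following is a minimal sketch; `join_nodes` is a hypothetical helper, not part of OMF, and the script only prints the resulting command rather than running it.

```shell
#!/bin/sh
# join_nodes: join all arguments with commas and no spaces, as -t expects.
# The function body runs in a subshell so the IFS change does not leak out.
join_nodes() (
    IFS=','
    printf '%s\n' "$*"
)

TOPOLOGY="$(join_nodes sdr2-md1.bed.cosmos-lab.org srv1-co1.bed.cosmos-lab.org)"

# Print the command that would be run on the console.
echo "omf stat -t $TOPOLOGY"
```

On the console, replace the final `echo` with the command itself once the printed line looks right.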

Load an image onto the node

user@console:~$ omf load
Install a given disk image on the nodes in a testbed.

Usage:
omf load [-h] -t TOPOLOGY [-i IMAGE_PATH] [-o TIMEOUT] [-c AGGREGATE]

Arguments:

-h, --help                 print this help message
-t, --topology TOPOLOGY    a valid topology file or description (MANDATORY)
                           (if a file 'TOPOLOGY' doesn't exist, interpret it as a
                           comma-separated list of nodes)
-i, --image IMAGE          disk image to load
                           (default is 'baseline.ndz', the latest stable baseline image)
-c, --config AGGREGATE     use testbed AGGREGATE
-o, --timeout TIMEOUT      a duration (in sec.) after which imageNodes should stop waiting for
                           nodes that have not finished their image installation
                           (default is 800 sec, i.e. 13min 20sec)
-r, --resize SIZE          Resizes the first partition to SIZE GB or to maximum size if SIZE=0 or
                           leave x percent of free space if SIZE=x%
  --outpath PATH           Path where the resulting Topologies should be saved
                           (default is '/tmp')
  --outprefix PREFIX       Prefix to use for naming the resulting Topologies
                           (default is your experiment ID)

Examples:

Load the default baseline image (baseline.ndz) on all nodes in the testbed.

omf load -t system:topo:all -i baseline.ndz

Load a specific image (my_image.ndz) on all nodes in a topology.

omf load -t system:topo:all -i my_image.ndz

Load a specific image (my_image.ndz) onto a single node.

omf load -t sdr2-md1.bed.cosmos-lab.org -i my_image.ndz

Load a specific image (my_image.ndz) onto multiple nodes with a timeout of 400 seconds.

omf load -t sdr2-md1.bed.cosmos-lab.org,srv1-co1.bed.cosmos-lab.org -i my_image.ndz -o 400
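
The `-o` timeout is given in seconds (the default of 800 is 13 min 20 sec), which is easy to get wrong when thinking in minutes. The sketch below converts minutes to seconds and prints the resulting command line; `load_cmd` is a hypothetical helper, and nothing is executed against the testbed.

```shell
#!/bin/sh
# load_cmd: print an 'omf load' command line with the timeout given in minutes.
# $1 = topology, $2 = image, $3 = timeout in whole minutes
load_cmd() {
    case "$3" in
        ''|*[!0-9]*)
            echo "timeout must be a whole number of minutes" >&2
            return 1 ;;
    esac
    # -o expects seconds, so multiply the minutes by 60.
    echo "omf load -t $1 -i $2 -o $(( $3 * 60 ))"
}

# A 20-minute timeout becomes -o 1200.
load_cmd sdr2-md1.bed.cosmos-lab.org my_image.ndz 20
```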

Turn the node on

user@console:~$ omf tell

Switch ON/OFF and reboot the nodes in a testbed.

Usage:
omf tell [-h] -t TOPOLOGY -a ACTION [-c AGGREGATE]

Arguments:
-h, --help                 print this help message
-a, --action ACTION        specify an ACTION:
                           on          turn node(s) ON
                           offs        turn node(s) OFF (soft)
                           offh        turn node(s) OFF (hard)
                           reboot      reboots node(s) (soft)
                           reset       resets node(s) (hard)
-t, --topology TOPOLOGY    a valid topology file or description (MANDATORY)
-c, --config AGGREGATE     use testbed AGGREGATE

Examples:

Reset (hard) a node.

omf tell -a reset -t sdr2-md1.bed.cosmos-lab.org

Reboot (soft) a node.

omf tell -a reboot -t sdr2-md1.bed.cosmos-lab.org

Turn on multiple nodes.

omf tell -a on -t sdr2-md1.bed.cosmos-lab.org,srv1-co1.bed.cosmos-lab.org

Turn off (hard) multiple nodes.

omf tell -a offh -t sdr2-md1.bed.cosmos-lab.org,srv1-co1.bed.cosmos-lab.org
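
Since the action names are short and easy to mistype (`offs` versus `offh`, for example), a small wrapper can check the action against the list in the help text before anything is sent to the nodes. This is a sketch under that assumption; `tell_cmd` is a hypothetical helper that only prints the command.

```shell
#!/bin/sh
# tell_cmd: print an 'omf tell' command line after checking ACTION against
# the actions listed in the help text above.
# $1 = action, $2 = topology
tell_cmd() {
    case "$1" in
        on|offs|offh|reboot|reset)
            echo "omf tell -a $1 -t $2" ;;
        *)
            echo "unknown action: $1 (expected on, offs, offh, reboot, or reset)" >&2
            return 1 ;;
    esac
}

tell_cmd reboot sdr2-md1.bed.cosmos-lab.org
```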

Log into the node and make changes

TODO

Save the node with your changes

Once the image is prepared the way you want it, run the following script on the node:

root@srv1-lg1.sb1.orbit-lab.org: /root/prepare.sh

This removes the udev rules (to prevent renaming of interfaces) and discards log files to reduce the size of the image. It also shuts down the node.

Once the node has shut down, save the existing disk image with the command:

user@console:~$ omf save

Save a disk image from a given node into an archive file.

Usage:
omf save -n NODE [-h] [-c AGGREGATE]

Arguments:
-h, --help           print this help message
-n, --node NODE      a valid description of a single node (MANDATORY)
                     (no default here, you have to enter a node!)
-r, --resize SIZE    Resizes the first partition to SIZE GB or to maximum size if SIZE=0 or
                     leave x percent of free space if SIZE=x%

NODE must be specified in FQDN (fully qualified domain name) format (e.g. srv1-lg1.sb2.cosmos-lab.org).

Examples:

Save the disk image from a node.

omf save -n srv1-lg1.sb2.cosmos-lab.org
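
Because `omf save` needs a fully qualified name, a quick sanity check can catch a short hostname before the command is issued. The sketch below is one possible check, not an OMF feature: `is_fqdn` and `save_cmd` are hypothetical helpers, and the rule of "at least three dot-separated labels" is an assumption based on the node names shown in this guide.

```shell
#!/bin/sh
# is_fqdn: succeed if $1 looks like a fully qualified domain name.
# Assumed rule: at least three dot-separated labels made of letters,
# digits, and hyphens, with no empty labels.
is_fqdn() {
    case "$1" in
        *.*.*) ;;
        *) return 1 ;;
    esac
    case "$1" in
        .*|*.|*..*|*[!A-Za-z0-9.-]*) return 1 ;;
    esac
    return 0
}

# save_cmd: print the 'omf save' command line for a node, or fail.
save_cmd() {
    if is_fqdn "$1"; then
        echo "omf save -n $1"
    else
        echo "not a fully qualified domain name: $1" >&2
        return 1
    fi
}

save_cmd srv1-lg1.sb2.cosmos-lab.org
```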

Run a Hello World Experiment

user@console:~$ omf exec

Execute an experiment script.

Usage:
exec [OPTIONS] ExperimentName [-- EXP_OPTIONS]

ExperimentName is the filename of the experiment script

[EXP_OPTIONS] are any options defined in the experiment script

[OPTIONS] are any of the following:
-a, --allow-missing              Continue experiment even if some nodes did not check in
-c, --config NAME                Configuration section from the config file ('default' if omitted)
-C, --configfile FILE            File containing local configuration parameters
-d, --debug                      Operate in debug mode
-i, --interactive                Run the experiment controller in interactive mode
-l, --libraries LIST             Comma separated list of libraries to load (defaults to [system:exp:stdlib,system:exp:eventlib,system:exp:winlib])
--log FILE                       File containing logging configuration information
-m, --message MESSAGE            Message to add to experiment trace
-n, --just-print                 Print the commands that would be executed, but do not execute them
-N, --no-am                      Don't use the Aggregate Manager (AM)
-p, --print URI                  Print the contents of the experiment script
-o, --output-result FILE         File to write final state information to
-e, --experiment-id EXPID        Set the ID for this experiment, instead of the default standard ID
-O, --output-app                 Display STDOUT & STDERR output from the executed applications
-r, --reset                      If set, then reset (reboot) the nodes before the experiment
-s, --shutdown                   If set, then shut down resources at the end of an experiment
-S, --slice NAME                 Name of the Slice where this EC should operate
-t, --tags TAGS                  Comma separated list of tags to add to experiment trace
--oml-uri URI                    The URI to the OML server for this experiment
-x, --extra-libs LIST            Comma separated list of libraries to load in addition to [system:exp:stdlib,system:exp:eventlib,system:exp:winlib]
--slave-mode EXPID               Run in slave mode in disconnected experiment, EXPID is the exp. ID
--slave-mode-resource NAME       When in slave mode, NAME is the HRN of the resource for this EC
-h, --help                       Show this message
-v, --version                    Show the version
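
As an illustration of how the options combine, the sketch below assembles an `omf exec` command line using two options from the list above: `-O` to display application output and `-m` to attach a message to the experiment trace. The script name `hello_world.rb` is a hypothetical placeholder, and `exec_cmd` is a helper invented for this example. It only prints the command.

```shell
#!/bin/sh
# hello_world.rb is a hypothetical experiment script name.
EXPERIMENT="hello_world.rb"

# exec_cmd: print an 'omf exec' command line.
# $1 = experiment script, $2 = message for the experiment trace
exec_cmd() {
    # Options go before the experiment name, as in the usage line.
    echo "omf exec -O -m \"$2\" $1"
}

exec_cmd "$EXPERIMENT" "first test run"
```

Adding `-n` (`--just-print`) to the real command prints the commands that would be executed without running them, which can be useful for a first try.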

Get Help and Support

COSMOS Wiki

Many of our users' most common questions have documented answers in this wiki. Please use the search function in the top-right corner.

Frequently Asked Questions

Many issues are very common. Please refer to the FAQ.

How to request help

In order to best solve your issues, please include the following in your communication:

  • Institutional Affiliation
  • email and username used for account registration
  • brief description of your issue
    • what you were trying to accomplish
    • what did not work as you expected
  • any other relevant information such as:
    • the commands you ran, and their output
    • error messages
    • log files
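
One way to capture the commands you ran together with their output is a small logging wrapper, sketched below. The log path and the `run_logged` helper are assumptions for illustration. Note that because of the pipe, the wrapper's exit status is that of `tee`, not of the command being logged.

```shell
#!/bin/sh
# Hypothetical log location; any writable path works.
SUPPORT_LOG="${SUPPORT_LOG:-$HOME/cosmos_support.log}"

# run_logged: run a command, appending the command line and its combined
# stdout/stderr to $SUPPORT_LOG while still showing the output on screen.
run_logged() {
    printf '$ %s\n' "$*" >> "$SUPPORT_LOG"
    "$@" 2>&1 | tee -a "$SUPPORT_LOG"
}
```

For example, `run_logged omf stat -t all` records both the command and its output in the log file, which can then be attached to a support request.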

Community Mailing List

To ask questions of the user community, use the mailing list orbit-users@…

Technical Support

To get technical support from the testbed maintainers, email problems@…
