
Disk Images

Summary

The imaging process is carried out by the commands 'omf load' and 'omf save'.

These provision a full disk image onto a set of nodes, and should work for any ext2/3/4 filesystem.

After saving an image from one node and loading it onto another, it will appear to the user that a copy of the hard disk has been made. Specifically, this is a block-based copy, not a file-based one.

The baseline image is a recommended starting point, as this provisioning tool does not currently work with standard .iso or similar files, instead using a custom compressed .ndz format.
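As a rough sketch of the load, customize, save cycle described above, the wrapper below prints the commands it would run. Only 'omf load', 'omf save' and the -i flag come from this page. The -t and -n flags and the node name node1-1 are assumptions, so check them against the omf help output on your console before relying on them.

```shell
#!/bin/sh
# Sketch of a load / save round trip. The node name and the -t / -n flags
# are assumptions; confirm them with the omf help output on your console.

NODE="${NODE:-node1-1}"          # hypothetical node name
IMAGE="${IMAGE:-baseline.ndz}"   # file name under /export/omf-images-5.4/

# Print each command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# Provision an image onto one or more nodes.
load_image() { run omf load -i "$1" -t "$2"; }

# Save the disk of one node as a new image.
save_image() { run omf save -n "$1"; }

load_image "$IMAGE" "$NODE"
# ... log in to the node and customize it here ...
save_image "$NODE"
```

With the default DRY_RUN=1 the script only echoes the two commands, which makes it safe to try before touching a reserved node.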

Working with Image files

These images are loaded and saved using OMF commands.

Saved images can be seen in the directory /export/omf-images-5.4/ on the Consoles.

Images are treated as standard Linux files. That means that you can:

  • Check that they have a nonzero size: ls -al imagename
  • Rename them: mv imagename imagenewname
  • Delete them: rm imagename
  • Set permissions: chmod 600 filename
  • Set user and group: chown username:groupname filename

When you use omf load, the -i flag refers to a file name in this directory. It obeys Linux file permissions, so if you want to keep other people from loading your image, ensure that it does not grant read permission to the group or to everyone.
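To check whether an image is exposed to other users, a small helper like the following can classify a file by its mode bits. This is a sketch that assumes GNU stat (as shipped with Ubuntu); the function names are made up for illustration.

```shell
# True when an octal permission digit includes the read bit.
has_read() {
    case "$1" in
        4|5|6|7) return 0 ;;
        *) return 1 ;;
    esac
}

# Print "world-readable", "group-readable" or "private" for a file.
image_exposure() {
    mode=$(stat -c '%a' "$1") || return 1
    other=${mode#"${mode%?}"}     # last octal digit
    rest=${mode%?}
    group=${rest#"${rest%?}"}     # second-to-last octal digit
    if has_read "$other"; then
        echo "world-readable"
    elif has_read "$group"; then
        echo "group-readable"
    else
        echo "private"
    fi
}

# Example (hypothetical image name):
#   chmod 600 /export/omf-images-5.4/myimage.ndz
#   image_exposure /export/omf-images-5.4/myimage.ndz
```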

Security and Access

Images

Images you save are written to the directory /export/omf-images-5.4/.

They are created writable by your user, and readable by your group and by all logged-in users. You can customize this via the chmod and chown commands. For example, you may want to restrict the ability to load your images to only members of a specific group.
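A minimal sketch of the group restriction mentioned above: change the file's group, then remove access for everyone else. The group must be one you belong to; the image and group names in the example comment are hypothetical.

```shell
# Make an image loadable only by its owner and the members of one group.
share_with_group() {   # share_with_group <image> <group>
    image=$1
    group=$2
    chown ":$group" "$image" &&   # change the group only, keep the owner
    chmod 640 "$image"            # owner read/write, group read, others nothing
}

# Example (hypothetical names):
#   share_with_group /export/omf-images-5.4/myimage.ndz myproject
```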

SSH

WARNING: For nodes that may be accessible externally (mobile nodes, tunnels to an external subnet, etc.), it is YOUR responsibility to set credentials to prevent remote login.

This can be done via the passwd command, and/or by editing the file /etc/ssh/sshd_config.

The default baseline image allows passwordless access as the user native from RFC1918 private IP space: 10/8, 172.16/12, 192.168/16. Root login is disabled.

Passwordless sudo is enabled for the user native.

You should set up your own accounts, or customize your image's ssh config if you need something different.
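One way to script the sshd_config changes is a helper that sets a single option, replacing an existing (possibly commented-out) line or appending a new one. This is only a sketch: the helper name is made up, and which options you need depends on your experiment. PermitEmptyPasswords and PasswordAuthentication are standard sshd_config keywords.

```shell
# Set one option in an sshd_config-style file.
# Replaces an existing "Key value" or "#Key value" line, otherwise appends.
set_sshd_option() {   # set_sshd_option <file> <key> <value>
    file=$1
    key=$2
    value=$3
    if grep -Eq "^[#[:space:]]*${key}[[:space:]]" "$file"; then
        sed -i -E "s|^[#[:space:]]*${key}[[:space:]].*|${key} ${value}|" "$file"
    else
        printf '%s %s\n' "$key" "$value" >> "$file"
    fi
}

# Example, run as root on the node before saving the image:
#   passwd native
#   set_sshd_option /etc/ssh/sshd_config PermitEmptyPasswords no
#   set_sshd_option /etc/ssh/sshd_config PasswordAuthentication no   # key-based logins only
#   sshd -t && systemctl reload ssh    # validate the file, then reload
```

Running sshd -t before reloading guards against locking yourself out with a syntax error.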

Pre-defined images

Image Name               Description                  Username  Updated     Status
bare.ndz                 Ubuntu 18.04 + basic config  root      2020-05-11  ready
baseline.ndz             bare + omf tools             root      2020-05-11  ready
baseline-uhd.ndz         baseline + uhd 3.15          root      2020-05-11  ready
baseline-gr.ndz          baseline-uhd + gnuradio 3.8  root      2020-05-11  ready
baseline-cuda.ndz        baseline + cuda + drivers    root      2020-05-20  internal*
baseline-tensorflow.ndz  baseline-cuda + tensorflow   root      2020-05-20  internal*
baseline-pytorch.ndz     baseline-cuda + pytorch      root      2020-05-20  internal*
baseline-deepstream.ndz  baseline-cuda + deepstream   root      2020-05-26  internal*

The images listed here are shortcuts to versioned image snapshots. Use the name under Image Name in the table.

To see the specific, versioned image, run:

user@console.bed:/export/omf-images-5.4$ ls -al baseline.ndz
lrwxrwxrwx 1 msherman winlab 36 May 11 21:25 baseline.ndz -> deploy-baseline-18.04-2020-05-11.ndz
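
The same lookup can be scripted with readlink, which is handy if you want to record the exact snapshot an experiment used. The function names below are illustrative only.

```shell
# Print the versioned file that an image shortcut points to.
resolve_image() {   # resolve_image <path to shortcut>
    basename "$(readlink -f "$1")"
}

# List every shortcut in a directory together with its target.
list_shortcuts() {   # list_shortcuts <directory>
    for f in "$1"/*.ndz; do
        if [ -L "$f" ]; then
            printf '%s -> %s\n' "$(basename "$f")" "$(resolve_image "$f")"
        fi
    done
}

# Example:
#   list_shortcuts /export/omf-images-5.4
```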

*Images listed as internal are currently only available to members of winlab. Please contact a COSMOS administrator if you would like to use one of these images.

Bare

This is a customized image, built off of Ubuntu Server 18.04. The main changes are:

  • /etc/fstab and /etc/default/grub are modified
  • A recent kernel, kernel headers, and build-essential are installed
  • DHCP client, DNS resolution, and hostname are configured
  • ssh is installed and configured
  • Temporary files, bash history, apt lists, and similar leftovers are purged

Reference Dockerfile

FROM scratch as bare
ADD src/18.04-server-cloudimg-amd64-root.tar.xz            /
#docker optimizations for apt
RUN set -xe \
    \
# https://github.com/docker/docker/blob/9a9fc01af8fb5d98b8eec0740716226fadb3735c/contrib/mkimage/debootstrap#L85-L105
    && echo 'DPkg::Post-Invoke { "rm -f /var/cache/apt/archives/*.deb /var/cache/apt/archives/partial/*.deb /var/cache/apt/*.bin || true"; };' > /etc/apt/apt.conf.d/docker-clean \
    && echo 'APT::Update::Post-Invoke { "rm -f /var/cache/apt/archives/*.deb /var/cache/apt/archives/partial/*.deb /var/cache/apt/*.bin || true"; };' >> /etc/apt/apt.conf.d/docker-clean \
    && echo 'Dir::Cache::pkgcache ""; Dir::Cache::srcpkgcache "";' >> /etc/apt/apt.conf.d/docker-clean \
    \
# https://github.com/docker/docker/blob/9a9fc01af8fb5d98b8eec0740716226fadb3735c/contrib/mkimage/debootstrap#L109-L115
    && echo 'Acquire::Languages "none";' > /etc/apt/apt.conf.d/docker-no-languages \
    \
# https://github.com/docker/docker/blob/9a9fc01af8fb5d98b8eec0740716226fadb3735c/contrib/mkimage/debootstrap#L118-L130
    && echo 'Acquire::GzipIndexes "true"; Acquire::CompressionTypes::Order:: "gz";' > /etc/apt/apt.conf.d/docker-gzip-indexes \
    \
# https://github.com/docker/docker/blob/9a9fc01af8fb5d98b8eec0740716226fadb3735c/contrib/mkimage/debootstrap#L134-L151
    && echo 'Apt::AutoRemove::SuggestsImportant "false";' > /etc/apt/apt.conf.d/docker-autoremove-suggests

ARG KERNEL_TYPE="generic"
ARG COMMON_PKGS="vim emacs git dnsutils"

ENV DEBIAN_FRONTEND=noninteractive \
    TERM=linux
#set up apt sources
COPY files/apt/ /etc/apt/
RUN wget -qO - https://www.mellanox.com/downloads/ofed/RPM-GPG-KEY-Mellanox | apt-key add -
#install bootloader and kernel, common packages
RUN apt update && apt install --no-install-recommends -fy \
    linux-image-${KERNEL_TYPE} \
    linux-headers-${KERNEL_TYPE} \
    grub-pc \
    software-properties-common \
    build-essential \
    ssh \
    ${COMMON_PKGS}
    
#disable auto updates
RUN apt -fy purge unattended-upgrades

#create users with "blank" passwords. WARNING, very insecure!!!
RUN echo "root:root" | chpasswd && \
    sed -i 's/^\(root:\)[^:]*\(:.*\)$/\1\2/' /etc/shadow && \
    cp -r /etc/skel/. /root/

COPY files/fstab /etc/fstab
COPY files/grub /etc/default/grub
RUN rm /etc/default/grub.d/*
COPY files/00-netplan.yaml /etc/netplan/00-netplan.yaml
COPY files/ssh/server/* /etc/ssh/
COPY files/ssh/client/* /root/.ssh/

#fix ssh key permissions
RUN chmod 400 /etc/ssh/ssh_host_*_key && chmod 444 /etc/ssh/ssh_host_*_key.pub

#16.04 and prior use ifupdown
#COPY dhcp/hostname-ifupdown /etc/dhcp/dhclient-exit-hooks.d/hostname
#18.04 uses netplan and networkd-dispatcher
COPY files/dhcp/hostname-networkd /etc/networkd-dispatcher/routable.d/20-hostname.sh
RUN chmod +x /etc/networkd-dispatcher/routable.d/20-hostname.sh

#clean up build
RUN rm -f /etc/apt/apt.conf.d/01proxy && \
    rm -rf /var/lib/apt/lists/* && \
    apt clean && \
    apt autoclean

#commands are run when container is started
#workaround for "locked" files in docker-build
#this may delay image saving
COPY files/late_commands.sh /root/late_commands.sh
ENTRYPOINT ["/root/late_commands.sh"]
CMD ["/bin/bash"]

Baseline

The baseline image is a very bare install of Ubuntu 18.04 Bionic.

You should customize it to your needs, and use that as a base for your experiments.

After you save an image, it will NOT track changes to the baseline; it is a copy, not a delta.

You may periodically want to re-create your experimental images when a new baseline has been released, to support new hardware, or newer drivers, etc.

Reference Dockerfile

FROM container_bare:latest as baseline
ENV DEBIAN_FRONTEND=noninteractive \
    TERM=linux

RUN apt update && apt -y install \
    ruby ruby-dev iw

#Install OMF6 RC

#fix dependencies
RUN gem install hashie:'~>2' facter:'~>2' omf_rc:'6.2.3'
#manually patch omf_rc
COPY files/omf/config.yml /var/lib/gems/2.5.0/gems/omf_rc-6.2.3/config/config.yml
COPY files/omf/environment /var/lib/gems/2.5.0/gems/omf_rc-6.2.3/init/
COPY files/omf/omf_rc.service /var/lib/gems/2.5.0/gems/omf_rc-6.2.3/init/
COPY files/omf/install_omf_rc /usr/local/bin/install_omf_rc
RUN install_omf_rc -i -c

#copy misc files needed
COPY files/blacklist/* /etc/modprobe.d/
COPY files/prepare.sh /root/prepare.sh

#clean up build
RUN rm -f /etc/apt/apt.conf.d/01proxy && \
    rm -rf /var/lib/apt/lists/* && \
    apt clean && \
    apt autoclean

Baseline UHD

Baseline UHD starts from Baseline, then builds and installs UHD 3.15 from source, and downloads the FPGA images with uhd_images_downloader.

Reference Dockerfile

FROM container_baseline:latest as baseline-uhd
ENV DEBIAN_FRONTEND=noninteractive \
    TERM=linux

RUN apt update && apt -y install \
    cmake \
    debhelper \
    doxygen \
    dpdk-dev \
    libboost-date-time-dev \
    libboost-dev \
    libboost-filesystem-dev \
    libboost-program-options-dev \
    libboost-regex-dev \
    libboost-serialization-dev \
    libboost-system-dev \
    libboost-test-dev \
    libboost-thread-dev \
    libncurses5-dev \
    libusb-1.0-0-dev \
    pkg-config \
    python3-apt \
    python3-pip \
    python3-dev \
    python3-mako \
    python3-numpy \
    python3-requests

#install UHD
ARG UHD_VERSION=3.15.0
ARG UHD_PATCH=$UHD_VERSION.0
ARG UHD_TAG=v$UHD_PATCH
WORKDIR /opt/
RUN git clone https://github.com/EttusResearch/uhd -b $UHD_TAG --single-branch
RUN cd uhd/host && mkdir build && cd build && \
cmake .. && make -j`nproc`
RUN cd uhd/host/build && make test
RUN cd /opt/uhd/host/build && make install

#clean up build dir
RUN rm -rf /opt/uhd

#trick apt into thinking uhd was installed from repo
RUN apt update && apt install -y \
    equivs
RUN equivs-control libuhd-dev.control && \
    sed -i "s/<package name; defaults to equivs-dummy>/libuhd-dev/g" libuhd-dev.control && \
    sed -i "s/# Version: <enter version here; defaults to 1.0>/Version: $UHD_PATCH/g" libuhd-dev.control && \
    equivs-build libuhd-dev.control && \
    dpkg -i libuhd-dev*.deb
RUN equivs-control libuhd$UHD_VERSION.control && \
    sed -i "s/<package name; defaults to equivs-dummy>/libuhd$UHD_VERSION/g" libuhd$UHD_VERSION.control && \
    sed -i "s/# Version: <enter version here; defaults to 1.0>/Version: $UHD_PATCH/g" libuhd$UHD_VERSION.control && \
    equivs-build libuhd$UHD_VERSION.control && \
    dpkg -i libuhd$UHD_VERSION*.deb
RUN rm -f /opt/*.control && rm -f /opt/*.deb

#enable libraries and download images
RUN ldconfig
RUN uhd_images_downloader

#install usrp PCIe drivers
WORKDIR /opt/
ADD files/usrp/niusrprio-installer-18.0.0.tar.gz /opt/
#tell it how to log, set kernel target to latest installed
#handle exit 2 for some reason..
RUN cp ./niusrprio_installer/niusrprio_pcie /usr/local/bin/ && \
    KERNELTARGET=$(ls -tr /lib/modules | tail -1) \
    LOG_MSD_STDERR=true \
    /opt/niusrprio_installer/INSTALL --accept-license --no-prompt; \
    if [ "$?" -eq 2 ]; then exit 0; fi
#install unit file and udev rule
COPY files/usrp/niusrprio.service /etc/systemd/system/
COPY files/usrp/99-usrprio.rules /etc/udev/rules.d/

#add UHD related sysctls to system
RUN echo "net.core.rmem_max=33554432" >> /etc/sysctl.conf && \
    echo "net.core.wmem_max=33554432" >> /etc/sysctl.conf

#clean up build
RUN rm -f /etc/apt/apt.conf.d/01proxy && \
    rm -rf /var/lib/apt/lists/* && \
    apt clean && \
    apt autoclean
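
The Dockerfile above appends two socket buffer limits to /etc/sysctl.conf. On a running node you can check that they are in effect with a sketch like the following; the helper name is made up, and the fallback to 0 covers environments where the key cannot be read.

```shell
REQUIRED=33554432   # value the image writes to /etc/sysctl.conf

# Succeeds when the current limit is at least the required one.
buffer_ok() {   # buffer_ok <current> <required>
    [ "$1" -ge "$2" ]
}

for key in net.core.rmem_max net.core.wmem_max; do
    # Fall back to 0 if the key cannot be read (for example inside a container).
    current=$(sysctl -n "$key" 2>/dev/null || echo 0)
    current=${current:-0}
    if buffer_ok "$current" "$REQUIRED"; then
        echo "$key=$current ok"
    else
        echo "$key=$current is below $REQUIRED; try: sudo sysctl -w $key=$REQUIRED"
    fi
done
```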

Baseline Gnu Radio

Baseline Gnu Radio starts from Baseline UHD and then builds gnuradio version 3.8 from source, against the installed UHD version (currently 3.15).

If you need a different version of UHD or Gnu Radio, please build it yourself from the parent image.
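Before deciding whether to rebuild, it can help to confirm which versions an image actually carries. The sketch below wraps the version queries so that a missing tool is reported instead of causing an error; the wrapper name is illustrative.

```shell
# Print the first line of a tool's output, or a note if the tool is absent.
report_version() {   # report_version <command> [args...]
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1 | head -n 1
    else
        echo "$1: not installed"
    fi
}

report_version uhd_config_info --version
report_version gnuradio-config-info --version
```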

Reference Dockerfile

FROM container_baseline-uhd:latest as baseline-gr
ENV DEBIAN_FRONTEND=noninteractive \
    TERM=linux

RUN dpkg -l | grep uhd

RUN add-apt-repository -s ppa:gnuradio/gnuradio-releases && \
    add-apt-repository -s ppa:ettusresearch/uhd && \
    apt update && \
    apt build-dep -qy \
        gnuradio

ARG GR_VERSION=v3.8.1.0
WORKDIR /opt/
RUN git clone https://github.com/gnuradio/gnuradio -b $GR_VERSION \
    --single-branch --recurse-submodules

RUN cd gnuradio && mkdir build && cd build && \
cmake .. && make -j24
# RUN cd gnuradio/build && make test -j24
RUN cd gnuradio/build && make install -j24
RUN ldconfig

#set envs
#GNURADIO_PREFIX is resolved at build time; PYTHONPATH and LD_LIBRARY_PATH are expanded at login
RUN GNURADIO_PREFIX="$(gnuradio-config-info --prefix)" && \
    echo "export PYTHONPATH=$GNURADIO_PREFIX/lib/python3/dist-packages:$GNURADIO_PREFIX/lib/python3/site-packages:\$PYTHONPATH" >> /root/.bashrc && \
    echo "export LD_LIBRARY_PATH=$GNURADIO_PREFIX/lib:\$LD_LIBRARY_PATH" >> /root/.bashrc

#remove source
RUN rm -rf /opt/gnuradio

#clean up
RUN rm -f /etc/apt/apt.conf.d/01proxy && \
    rm -rf /var/lib/apt/lists/* && \
    apt clean && \
    apt autoclean

Baseline CUDA

The CUDA baseline image is meant to be run on the COSMOS server machines containing V100 GPUs. It is built with CUDA drivers and libraries for general-purpose GPU programming. The baseline image is built with driver version 440.33 and CUDA 10.2 libraries. If you need an older version of CUDA, there is also an image for version 10.1 called "baseline_cuda_1804_101.ndz", which has driver version 418.39. Note: all CUDA images are currently only available to members of winlab. Please contact a COSMOS administrator if you would like to use this image.

Build Steps

If you would like to create your own cuda image, you can do so by starting with the baseline_1804 image.

  1. Select the version of cuda you need from the cuda toolkit archive, then choose your operating system (Linux), architecture (x86_64), distribution (Ubuntu), and version (18.04). Choose "runfile(local)" as the installer type. If there is a download link available, copy the link and use wget to download it onto the node. If there is no download link, there should be a url provided to use with wget.
    • After the installer has downloaded, use chmod +x to make it executable.
    • Blacklist nouveau by creating the file /etc/modprobe.d/blacklist-nouveau.conf with these lines:
      blacklist nouveau
      options nouveau modeset=0
      
    • Run sudo update-initramfs -u to apply the changes.
    • Reboot the node, and execute the installer.
  2. You will also have to add the directory of CUDA binaries to the path. Edit the .bashrc file and add:
    export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
    export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
    
  3. To verify your cuda installation, you can build and run some of the cuda samples. They'll be found in /usr/local/cuda/samples.
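
The file edits in the steps above can be sketched as two small helpers. The function names are made up, the target directory is a parameter so the blacklist can be tried outside /etc, and the installer file name in the comments is a placeholder for whatever you downloaded.

```shell
# Write the nouveau blacklist from step 1 into a modprobe.d directory.
write_nouveau_blacklist() {   # write_nouveau_blacklist [directory]
    dir=${1:-/etc/modprobe.d}
    printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0' \
        > "$dir/blacklist-nouveau.conf"
}

# Prepend a directory to a colon-separated list without leaving a
# trailing colon when the list is empty (the idiom used in step 2).
prepend_path() {   # prepend_path <dir> <existing list>
    printf '%s\n' "$1${2:+:$2}"
}

# On the node, as root:
#   write_nouveau_blacklist && update-initramfs -u && reboot
#   sh ./cuda_installer.run     # placeholder name for the downloaded runfile
# After installation:
#   nvidia-smi                  # lists the GPUs if the driver loaded
#   nvcc --version              # reports the toolkit version once PATH is set
```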

Baseline Tensorflow

Baseline Tensorflow is built from the baseline 18.04 CUDA 10.1 image, and has cuDNN version 7.6 and Tensorflow version 2.2 installed. Note: all CUDA images are currently only available to members of winlab. Please contact a COSMOS administrator if you would like to use this image.

Baseline Pytorch

Baseline Pytorch is built from the baseline 18.04 CUDA 10.2 image, and has cuDNN version 7.6 and Pytorch version 1.5 installed. Note: all CUDA images are currently only available to members of winlab. Please contact a COSMOS administrator if you would like to use this image.

Baseline Deepstream

Baseline Deepstream includes version 5.0 of the Nvidia Deepstream SDK. The image is built from the baseline 18.04 CUDA 10.2 image, and includes cuDNN version 7.6 and TensorRT version 7. Note: all CUDA images are currently only available to members of winlab. Please contact a COSMOS administrator if you would like to use this image.

Advanced

Image CI Pipeline

Documentation TODO

Building a baseline image

  1. Use PXE or USB to install the Ubuntu netinstall ISO
  2. Start it up, run update and dist-upgrade
  3. Set netplan.io to DHCP on all physical Ethernet interfaces
  4. Add dhclient-exit-hook/hostname to dynamically set the hostname based on DHCP
  5. Add the prepare.sh script to generalize the system prior to saving images
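
For step 3, a netplan file along the following lines enables DHCP on wired interfaces. This is a hedged sketch, not the file shipped in the baseline image: the "en*" match pattern and the "wired" label are assumptions, and interface naming on your hardware may differ.

```shell
# Write a netplan file that enables DHCPv4 on every interface whose
# name starts with "en". The match pattern is an assumption.
write_netplan_dhcp() {   # write_netplan_dhcp [target file]
    target=${1:-/etc/netplan/00-netplan.yaml}
    cat > "$target" <<'EOF'
network:
  version: 2
  ethernets:
    wired:
      match:
        name: "en*"
      dhcp4: true
EOF
}

# On the node, as root:
#   write_netplan_dhcp && netplan apply
```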
Last modified on May 26, 2020, 7:37:00 PM