Disk Images
Summary
The imaging process is executed by the commands 'omf load' and 'omf save'.
These provision a full disk image onto a set of nodes, and should work for any ext2/3/4 filesystem.
After saving an image from one node and loading it onto another, it will appear to the user that a copy of the hard disk has been made. Specifically, this is a block-based copy, not a file-based one.
The baseline image is a recommended starting point, as this provisioning tool does not currently work with standard .iso or similar files, instead using a custom compressed .ndz format.
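A save/load cycle might look like the sketch below. The `-i`, `-t`, and `-n` flags, the node name `node1-1.example.org`, and the helper names `run`, `load_image`, and `save_image` are assumptions for illustration and are not taken from this page; check the OMF documentation for your testbed for the exact syntax. The `run` wrapper only prints each command unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch of an image save/load cycle.
# ASSUMPTION: the omf flags and node names below are illustrative.

# Print each command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# Load an image (a file name in the image directory) onto a node or node set.
load_image() {
    run omf load -i "$1" -t "$2"
}

# Save the disk of a single node as a new image.
save_image() {
    run omf save -n "$1"
}

# Example: start from the baseline, customize the node, then snapshot it.
load_image baseline.ndz node1-1.example.org
# ... log in to the node, install packages, configure ...
save_image node1-1.example.org
```

Running the script as-is prints the two commands without executing them, which is a cheap way to review what would be run before touching any nodes.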
Security and Access
Images
Images you save are written to the directory "/export/omf-images-5.4/".
By default they are writable by your user, and readable by your group and by all logged-in users. You can customize this via the chmod and chown commands. For example, you may want to restrict the ability to load your images to only members of a specific group.
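As a minimal sketch of the group restriction described above: the helper name `restrict_image`, the image name, and the group name are hypothetical, and you must be a member of the target group (or root) for `chgrp` to succeed.

```shell
#!/bin/sh
# Restrict a saved image so that only its owner and one group can read it.
# Usage: restrict_image <image-file> <group>
restrict_image() {
    image="$1"
    group="$2"
    # Hand the file to the chosen group, then set
    # owner read/write, group read, others no access.
    chgrp "$group" "$image" && chmod 640 "$image"
}

# Example (hypothetical image and group names):
#   restrict_image /export/omf-images-5.4/my-experiment.ndz myproject
```

You can confirm the result with `ls -l` on the image file.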
SSH
WARNING: For nodes that may be accessible externally, [mobile nodes, tunnels to an external subnet, etc] it is YOUR responsibility to set credentials to prevent remote login.
This can be done via the passwd command and/or by editing the file /etc/ssh/sshd_config.
The default baseline image is configured as follows:
- Passwordless access is allowed for the user native, from RFC 1918 private IP space: 10/8, 172.16/12, 192.168/16
- Root login is disabled
- Passwordless sudo is enabled for the user native
You should set up your own accounts, or customize your image's ssh config if you need something different.
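One way to script an sshd_config change is sketched below. The helper name `set_sshd_option` is hypothetical, and the keywords in the usage comment (`PermitEmptyPasswords`, `PermitRootLogin`) are standard sshd_config options chosen for illustration; this page does not say which settings the baseline image actually uses. The helper does not handle `Match` blocks.

```shell
#!/bin/sh
# Set a single "Keyword value" option in an sshd_config-style file.
# Replaces an existing (possibly commented-out) line, or appends a new one.
# Usage: set_sshd_option <file> <keyword> <value>
set_sshd_option() {
    file="$1"
    key="$2"
    value="$3"
    if grep -Eq "^[#[:space:]]*${key}[[:space:]]" "$file"; then
        # Rewrite the existing line in place (GNU sed).
        sed -i -E "s|^[#[:space:]]*${key}[[:space:]].*|${key} ${value}|" "$file"
    else
        printf '%s %s\n' "$key" "$value" >> "$file"
    fi
}

# Example, run as root on the node, followed by a restart of the ssh service:
#   passwd native
#   set_sshd_option /etc/ssh/sshd_config PermitEmptyPasswords no
#   set_sshd_option /etc/ssh/sshd_config PermitRootLogin no
#   sshd -t    # check the configuration for syntax errors
```

Remember to save a new image afterwards, since changes made on a node are lost the next time an image is loaded onto it.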
Pre-defined images
Image Name | Description | username | Updated | Status |
bare.ndz | ubuntu18.04 + basic config | root | 2020-05-11 | ready |
baseline.ndz | bare + omf tools | root | 2020-05-11 | ready |
baseline-uhd.ndz | baseline + uhd 3.15 | root | 2020-05-11 | ready |
baseline-gr.ndz | baseline-uhd + gnuradio 3.8 | root | 2020-05-11 | ready |
baseline-cuda.ndz | baseline + cuda + drivers | root | n/a | |
baseline-tensorflow.ndz | baseline-cuda + tensorflow | root | n/a | |
baseline-pytorch.ndz | baseline-cuda + pytorch | root | n/a | |
The images listed here are shortcuts to versioned image snapshots. Use the name under Image Name in the table.
To see the specific, versioned image, run:
user@console.bed:/export/omf-images-5.4$ ls -al baseline.ndz
lrwxrwxrwx 1 msherman winlab 36 May 11 21:25 baseline.ndz -> deploy-baseline-18.04-2020-05-11.ndz
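If you only need the name of the versioned snapshot, for example to record it in your experiment notes, `readlink` can resolve the shortcut directly. This is a small sketch; the helper name `resolve_image` is hypothetical.

```shell
#!/bin/sh
# Print the versioned snapshot that an image shortcut points to.
# Usage: resolve_image <shortcut-name> [image-dir]
resolve_image() {
    dir="${2:-/export/omf-images-5.4}"
    # -f follows the symlink chain and prints the final absolute path.
    readlink -f "$dir/$1"
}

# Example:
#   resolve_image baseline.ndz
```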
Bare
This is a customized image, built on Ubuntu Server 18.04. The main changes are:
- /etc/fstab and /etc/default/grub are modified
- A recent kernel, kernel headers, and build-essential are installed
- dhcp client, dns resolution, and hostname are configured
- ssh is installed and configured
- temporary files, bash history, apt lists, and such are purged.
Reference Dockerfile
FROM scratch as bare
ADD src/18.04-server-cloudimg-amd64-root.tar.xz /

#docker optimizations for apt
RUN set -xe \
    \
# https://github.com/docker/docker/blob/9a9fc01af8fb5d98b8eec0740716226fadb3735c/contrib/mkimage/debootstrap#L85-L105
    && echo 'DPkg::Post-Invoke { "rm -f /var/cache/apt/archives/*.deb /var/cache/apt/archives/partial/*.deb /var/cache/apt/*.bin || true"; };' > /etc/apt/apt.conf.d/docker-clean \
    && echo 'APT::Update::Post-Invoke { "rm -f /var/cache/apt/archives/*.deb /var/cache/apt/archives/partial/*.deb /var/cache/apt/*.bin || true"; };' >> /etc/apt/apt.conf.d/docker-clean \
    && echo 'Dir::Cache::pkgcache ""; Dir::Cache::srcpkgcache "";' >> /etc/apt/apt.conf.d/docker-clean \
    \
# https://github.com/docker/docker/blob/9a9fc01af8fb5d98b8eec0740716226fadb3735c/contrib/mkimage/debootstrap#L109-L115
    && echo 'Acquire::Languages "none";' > /etc/apt/apt.conf.d/docker-no-languages \
    \
# https://github.com/docker/docker/blob/9a9fc01af8fb5d98b8eec0740716226fadb3735c/contrib/mkimage/debootstrap#L118-L130
    && echo 'Acquire::GzipIndexes "true"; Acquire::CompressionTypes::Order:: "gz";' > /etc/apt/apt.conf.d/docker-gzip-indexes \
    \
# https://github.com/docker/docker/blob/9a9fc01af8fb5d98b8eec0740716226fadb3735c/contrib/mkimage/debootstrap#L134-L151
    && echo 'Apt::AutoRemove::SuggestsImportant "false";' > /etc/apt/apt.conf.d/docker-autoremove-suggests

ARG KERNEL_TYPE="generic"
ARG COMMON_PKGS="vim emacs git dnsutils"

ENV DEBIAN_FRONTEND=noninteractive \
    TERM=linux

#set up apt sources
COPY files/apt/ /etc/apt/
RUN wget -qO - https://www.mellanox.com/downloads/ofed/RPM-GPG-KEY-Mellanox | apt-key add -

#install bootloader and kernel, common packages
RUN apt update && apt install --no-install-recommends -fy \
    linux-image-${KERNEL_TYPE} \
    linux-headers-${KERNEL_TYPE} \
    grub-pc \
    software-properties-common \
    build-essential \
    ssh \
    ${COMMON_PKGS}

#disable auto updates
RUN apt -fy purge unattended-upgrades

#create users with "blank" passwords. WARNING, very insecure!!!
RUN echo "root:root" | chpasswd && \
    sed -i 's/^\(root:\)[^:]*\(:.*\)$/\1\2/' /etc/shadow && \
    cp -r /etc/skel/. /root/

COPY files/fstab /etc/fstab
COPY files/grub /etc/default/grub
RUN rm /etc/default/grub.d/*
COPY files/00-netplan.yaml /etc/netplan/00-netplan.yaml

COPY files/ssh/server/* /etc/ssh/
COPY files/ssh/client/* /root/.ssh/
#fix ssh key permissions
RUN chmod 400 /etc/ssh/ssh_host_*_key && chmod 444 /etc/ssh/ssh_host_*_key.pub

#16.04 and prior use ifupdown
#COPY dhcp/hostname-ifupdown /etc/dhcp/dhclient-exit-hooks.d/hostname
#18.04 uses netplan and networkd-dispatcher
COPY files/dhcp/hostname-networkd /etc/networkd-dispatcher/routable.d/20-hostname.sh
RUN chmod +x /etc/networkd-dispatcher/routable.d/20-hostname.sh

#clean up build
RUN rm -f /etc/apt/apt.conf.d/01proxy && \
    rm -rf /var/lib/apt/lists/* && \
    apt clean && \
    apt autoclean

#commands are run when container is started
#workaround for "locked" files in docker-build
#this may delay image saving
COPY files/late_commands.sh /root/late_commands.sh
ENTRYPOINT ["/root/late_commands.sh"]
CMD ["/bin/bash"]
Baseline
The baseline image is a very bare install of Ubuntu 18.04 (Bionic).
You should customize it to your needs, and use that as a base for your experiments.
After you save an image, it will NOT track changes to the baseline: it is a copy, not a delta.
You may periodically want to re-create your experimental images when a new baseline has been released, for example to support new hardware or newer drivers.
Reference Dockerfile
FROM container_bare:latest as baseline

ENV DEBIAN_FRONTEND=noninteractive \
    TERM=linux

RUN apt update && apt -y install \
    ruby ruby-dev iw

#Install OMF6 RC
#fix dependencies
RUN gem install hashie:'~>2' facter:'~>2' omf_rc:'6.2.3'

#manually patch omf_rc
COPY files/omf/config.yml /var/lib/gems/2.5.0/gems/omf_rc-6.2.3/config/config.yml
COPY files/omf/environment /var/lib/gems/2.5.0/gems/omf_rc-6.2.3/init/
COPY files/omf/omf_rc.service /var/lib/gems/2.5.0/gems/omf_rc-6.2.3/init/
COPY files/omf/install_omf_rc /usr/local/bin/install_omf_rc
RUN install_omf_rc -i -c

#copy misc files needed
COPY files/blacklist/* /etc/modprobe.d/
COPY files/prepare.sh /root/prepare.sh

#clean up build
RUN rm -f /etc/apt/apt.conf.d/01proxy && \
    rm -rf /var/lib/apt/lists/* && \
    apt clean && \
    apt autoclean
Baseline UHD
Baseline UHD starts from Baseline, then installs UHD 3.15 from source and downloads the FPGA images with uhd_images_downloader.
Reference Dockerfile
FROM container_baseline:latest as baseline-uhd

ENV DEBIAN_FRONTEND=noninteractive \
    TERM=linux

RUN apt update && apt -y install \
    cmake \
    debhelper \
    doxygen \
    dpdk-dev \
    libboost-date-time-dev \
    libboost-dev \
    libboost-filesystem-dev \
    libboost-program-options-dev \
    libboost-regex-dev \
    libboost-serialization-dev \
    libboost-system-dev \
    libboost-test-dev \
    libboost-thread-dev \
    libncurses5-dev \
    libusb-1.0-0-dev \
    pkg-config \
    python3-apt \
    python3-pip \
    python3-dev \
    python3-mako \
    python3-numpy \
    python3-requests

#install UHD
ARG UHD_VERSION=3.15.0
ARG UHD_PATCH=$UHD_VERSION.0
ARG UHD_TAG=v$UHD_PATCH

WORKDIR /opt/
RUN git clone https://github.com/EttusResearch/uhd -b $UHD_TAG --single-branch
RUN cd uhd/host && mkdir build && cd build && \
    cmake .. && make -j`nproc`
RUN cd uhd/host/build && make test
RUN cd /opt/uhd/host/build && make install

#clean up build dir
RUN rm -rf /opt/uhd

#trick apt into thinking uhd was installed from repo
RUN apt update && apt install -y \
    equivs
RUN equivs-control libuhd-dev.control && \
    sed -i "s/<package name; defaults to equivs-dummy>/libuhd-dev/g" libuhd-dev.control && \
    sed -i "s/# Version: <enter version here; defaults to 1.0>/Version: $UHD_PATCH/g" libuhd-dev.control && \
    equivs-build libuhd-dev.control && \
    dpkg -i libuhd-dev*.deb
RUN equivs-control libuhd$UHD_VERSION.control && \
    sed -i "s/<package name; defaults to equivs-dummy>/libuhd$UHD_VERSION/g" libuhd$UHD_VERSION.control && \
    sed -i "s/# Version: <enter version here; defaults to 1.0>/Version: $UHD_PATCH/g" libuhd$UHD_VERSION.control && \
    equivs-build libuhd$UHD_VERSION.control && \
    dpkg -i libuhd$UHD_VERSION*.deb
RUN rm -f /opt/*.control && rm -f /opt/*.deb

#enable libraries and download images
RUN ldconfig
RUN uhd_images_downloader

#install usrp PCIe drivers
WORKDIR /opt/
ADD files/usrp/niusrprio-installer-18.0.0.tar.gz /opt/
#tell it how to log, set kernel target to latest installed
#handle exit 2 for some reason..
RUN cp ./niusrprio_installer/niusrprio_pcie /usr/local/bin/ && \
    KERNELTARGET=$(ls -tr /lib/modules | tail -1) \
    LOG_MSD_STDERR=true \
    /opt/niusrprio_installer/INSTALL --accept-license --no-prompt; \
    if [ "$?" -eq 2 ]; then exit 0; fi

#install unit file and udev rule
COPY files/usrp/niusrprio.service /etc/systemd/system/
COPY files/usrp/99-usrprio.rules /etc/udev/rules.d/

#add UHD related sysctls to system
RUN echo "net.core.rmem_max=33554432" >> /etc/sysctl.conf && \
    echo "net.core.wmem_max=33554432" >> /etc/sysctl.conf

#clean up build
RUN rm -f /etc/apt/apt.conf.d/01proxy && \
    rm -rf /var/lib/apt/lists/* && \
    apt clean && \
    apt autoclean
Baseline Gnu Radio
Baseline Gnu Radio starts from Baseline UHD and then builds gnuradio version 3.8 from source, against the installed UHD version (currently 3.15).
If you need a different version of UHD or Gnu Radio, please build it yourself from the parent image.
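Before deciding whether to rebuild, it may help to check which versions are present on a node. The sketch below uses a hypothetical helper, `tool_version`, around the `uhd_config_info` and `gnuradio-config-info` utilities that ship with UHD and GNU Radio; the output format depends on the installed versions.

```shell
#!/bin/sh
# Report the version of a tool, or note that it is not installed.
# Usage: tool_version <command>
tool_version() {
    if command -v "$1" >/dev/null 2>&1; then
        "$1" --version
    else
        echo "$1: not found"
    fi
}

tool_version uhd_config_info
tool_version gnuradio-config-info
```

On an image without UHD or GNU Radio (for example bare.ndz), both calls simply report that the tool was not found.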
Reference Dockerfile
FROM container_baseline-uhd:latest as baseline-gr

ENV DEBIAN_FRONTEND=noninteractive \
    TERM=linux

RUN dpkg -l | grep uhd

RUN add-apt-repository -s ppa:gnuradio/gnuradio-releases && \
    add-apt-repository -s ppa:ettusresearch/uhd && \
    apt update && \
    apt build-dep -qy \
    gnuradio

ARG GR_VERSION=v3.8.1.0

WORKDIR /opt/
RUN git clone https://github.com/gnuradio/gnuradio -b $GR_VERSION \
    --single-branch --recurse-submodules
RUN cd gnuradio && mkdir build && cd build && \
    cmake .. && make -j24
# RUN cd gnuradio/build && make test -j24
RUN cd gnuradio/build && make install -j24
RUN ldconfig

#set envs
RUN export "GNURADIO_PREFIX=$(gnuradio-config-info --prefix)" >> /root/.bashrc && \
    echo "export PYTHONPATH=$GNURADIO_PREFIX/lib/python3/dist-packages:$GNURADIO_PREFIX/lib/python3/site-packages:$PYTHONPATH" >> /root/.bashrc && \
    echo "export LD_LIBRARY_PATH=$GNURADIO_PREFIX/lib:$LD_LIBRARY_PATH" >> /root/.bashrc

#remove source
RUN rm -rf /opt/gnuradio

#clean up
RUN rm -f /etc/apt/apt.conf.d/01proxy && \
    rm -rf /var/lib/apt/lists/* && \
    apt clean && \
    apt autoclean
Baseline CUDA
The cuda baseline image is meant to be run on the cosmos server machines containing V100 GPUs. It is built with Nvidia drivers for the GPUs and CUDA libraries for general purpose GPU programming. The baseline image is built with driver version 410.104 with cuda 10.0 libraries.
If you would like to create a cuda image using different versions of either the drivers or cuda, you can do so by starting with the baseline_1804 image.
- Select the driver version you need from the Nvidia Driver Downloads Page. Be sure to specify the product type as "Tesla" and the product series as "V-Series". Click download, then on the following page right click on "agree & download" and copy the link address. On the node, use wget or curl to download the link you copied.
- Install the downloaded package, for example: dpkg -i nvidia-diag-driver-local-repo-ubuntu1804-410.104_1.0-1_amd64.deb. Note: you may be asked to add a GPG key during the installation process. Use the command that is given.
- Run apt-get update
- Run apt-get install cuda-drivers
- Log out of the node and use omf tell to turn it off and on again. When you log back into the node, running lsmod should show that the nvidia drivers have been loaded.
- Select the version of cuda you need from the cuda toolkit archive, then choose your operating system (Linux), architecture (x86_64), distribution (Ubuntu), and version (18.04). Choose "deb(local)" as the installer type. Again, copy the download link and use wget to download it onto the node. You can then follow the installation instructions on the download page.
- To verify your cuda installation, you can build and run some of the cuda samples. They'll be found in /usr/local/cuda/samples.
- You will also have to add the directory of cuda binaries to the path. Edit the .profile file and add 'PATH="$PATH:/usr/local/cuda/bin"'
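The PATH step above can be scripted so that it is safe to run more than once. This is a sketch only; the helper name `add_cuda_path` is hypothetical, and it assumes CUDA is installed under /usr/local/cuda as described above.

```shell
#!/bin/sh
# Append the CUDA bin directory to PATH in a profile file, at most once.
# Usage: add_cuda_path [profile-file]   (defaults to ~/.profile)
add_cuda_path() {
    profile="${1:-$HOME/.profile}"
    # Single quotes keep $PATH literal so it is expanded at login time.
    line='PATH="$PATH:/usr/local/cuda/bin"'
    # -x matches the whole line, -F treats the pattern as a fixed string.
    grep -qxF "$line" "$profile" 2>/dev/null || printf '%s\n' "$line" >> "$profile"
}

# After power cycling the node and logging in again, these standard
# commands can confirm the driver and toolkit are visible:
#   lsmod | grep nvidia
#   nvcc --version
```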
Baseline Tensorflow
TODO
Baseline Pytorch
TODO
Advanced
Image CI Pipeline
Documentation TODO
Building a baseline image
- Use PXE or USB to install from the Ubuntu netinstall ISO
- Start it up, then run update and dist-upgrade
- Set netplan.io to DHCP on all physical Ethernet interfaces
- Add dhclient-exit-hook/hostname to dynamically set the hostname based on DHCP
- Add the prepare.sh script to generalize the system prior to saving images
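For the netplan step, a configuration along the following lines enables DHCP on every interface whose name matches a pattern. This is a sketch under assumptions: the contents of the actual 00-netplan.yaml used for the images are not shown on this page, and the `en*` match pattern, the `all-en` ID, and the helper name `write_netplan` are illustrative.

```shell
#!/bin/sh
# Write a netplan file that enables DHCP on all matching Ethernet interfaces.
# ASSUMPTION: the "en*" pattern and "all-en" ID are examples, not the
# values used in the published images.
# Usage: write_netplan <output-file>
write_netplan() {
    cat > "$1" <<'EOF'
network:
  version: 2
  ethernets:
    all-en:
      match:
        name: "en*"
      dhcp4: true
EOF
}

# Example, as root on the node:
#   write_netplan /etc/netplan/00-netplan.yaml && netplan apply
```

Matching by name pattern rather than listing interfaces individually keeps the image usable on nodes with different numbers of network cards.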