The 'Bring Your Own Container' (BYOC) Model

27 Apr 2024

research

1. Introduction

In this article, I will outline a model for managing a shared computing environment and creating reproducible environments, which is especially useful within a research group. We have been testing this model for the last two years at the Robot Manipulation Lab at the University of Leeds, with great success on our robot manipulator. I refer to this model as the “Bring Your Own Container” (BYOC) model.

At its essence, the BYOC model relies on a set of constraints imposed on the users of the system: everyone is a user without elevated permissions, and everyone works inside isolated containers. Each container encapsulates the software for a specific task. For example, a pose estimation system like DOPE would live in one container, while a motion planning stack would live in another.

This article aims to encourage people to adopt a similar philosophy and create systems that are versatile and reproducible. The article includes, I hope, insightful notes on how to set up such a system.

Today I will discuss a lesser-known container platform called Apptainer/Singularity and explain why our robotics research group uses it over other solutions. These notes, however, are not specific to robotics or research: a similar configuration can be adopted in any situation where a computer must be shared among users who would otherwise require elevated permissions.

2. Background and why one should consider the BYOC model

During my PhD, while working on shared computers (such as a robot computer or a computer with a powerful GPU for machine learning tasks), I often encountered installation conflicts, also known as dependency hell. In several cases, I had to reformat the computer. When multiple users with elevated permissions (i.e., sudo) work on the same computer and have different requirements, issues can arise. This is due to the difficulty in predicting how one change may impact other environments and users.

I encountered dependency issues even when working alone on my own computer. Installing a package or building one from source often required a specific version of a library, leading to conflicts with other dependencies. This resulted in my system becoming broken over time.

Containers to the rescue

While working as a software developer in industry in 2016, I was introduced to Docker containers by my colleague David. Although I did not use them at that time, I understood their benefits and the issues they addressed. When I began my PhD, the University of Leeds provided HPC (High-Performance Computing) facilities that mandated the use of Singularity containers (now known as Apptainer containers) for code deployment. We also utilised Apptainer containers for teaching robotics at the school, specifically to have ROS installed in an Ubuntu environment.

I started testing the “containerisation” of my research on my personal computer. At the same time, another PhD student, Logan, was also experimenting with these containers on his personal computer, so we gradually became the go-to people in the lab for containerising software. Since then, I have started “containerising” parts of my research to make it easily accessible to the community. It was extremely helpful to have different software running smoothly on my computer without causing any issues, and being able to recreate that environment whenever needed, even on a new machine. Sharing my code and software with others was also easy.

Fast-forward to 2021: when I started my post-doc, Mehmet and I decided to test this containerisation philosophy on shared computers in order to create a more versatile environment for everyone.

3. The BYOC (Bring Your Own Container) Model

When I started my post-doc, I had the opportunity to oversee and influence the development and integration of the robotic system. I was responsible for architecting the entire system to support our research for the next 5 years.

From the beginning, my goal was to create a versatile shared environment for everyone. We decided to implement certain policies for users and create a system that would not deteriorate over time. The BYOC model involves imposing constraints on users and utilising containers to install and manage software.

3.1 Operating System

The assumption here is that you are using a Linux distribution. The specific distribution and version are not important. Containers allow you to install different operating systems on top of a host OS. For example, you could be running Fedora as the host OS, but have Ubuntu 22.04 installed on top of it in a container.

3.2 Separating OS files from user files (optional)

Although this step is optional, we have chosen to separate the operating system from the user data. Our servers are equipped with two SSDs: one small but very fast SSD for hosting the operating system(s) files, and another larger SSD (8 TB) for hosting user data in the /home directory. In Linux, you have the ability to specify where the /home directory is stored, even on a separate disk. This allows you to keep most of the OS files on one drive and host the /home directory on another.
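As a sketch, mounting a second disk at /home is a one-line entry in /etc/fstab. The UUID below is a placeholder; find the real one for your data SSD with `lsblk -f` or `blkid`:

```
# /etc/fstab entry mounting the large data SSD at /home
# (placeholder UUID -- obtain yours with `lsblk -f` or `blkid`)
UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  /home  ext4  defaults  0  2
```

After adding the entry, `mount -a` (as the administrator) mounts it without a reboot.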

I recommend this approach because if you ever need to format the computer to install a newer version of the operating system, your users’ data will not be affected.

3.3 The administrator

The system has one and only one administrator account with elevated permissions (root access). This account is not used on a daily basis by anyone; it is reserved for essential maintenance tasks such as upgrading packages or installing necessary system-wide software like a web browser, text editor, or utility tools.

Access to the administrator account is limited to a few individuals with managing responsibilities (such as PIs and post-docs), who agree not to use the account for daily tasks or to install non-essential software for personal use.

3.4 Users

Everyone else, including the system manager, has a standard account without elevated permissions. Users can have elevated permissions inside a container, which cannot, in turn, harm the host operating system.

With this policy in place, if the system malfunctions, the blame falls on the administrator. It is essential for those managing the system to adhere to the above constraints and refrain from using the administrator account on a daily basis.

3.5 Container Software

There are multiple container platforms available today, with Docker being the most popular and widely used. While Docker is great for software development, it is not optimised for HPC. For example, running a GPU-intensive simulator with a GUI may not be optimal under Docker. I am aware of Rocker (a Docker derivative aimed at tackling this problem), but I have not had the opportunity to test it yet.

We have found Apptainer (previously called Singularity) to be particularly well-suited for robotics, as it is optimised for High-Performance Computing tasks and the processes running in the container are almost “native” to the host OS. This results in minimal impact on performance. We have had great success using Apptainer/Singularity, and in this article, I will discuss Apptainer specifically.

One of the great features of Apptainer is its compatibility with Docker images. You can bootstrap an Apptainer container from a Docker image, even pulling that image from DockerHub.
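For example, both of the following work (a sketch; assumes Apptainer is installed on the host):

```shell
# Build a local Apptainer image directly from a DockerHub image.
apptainer pull docker://ubuntu:22.04

# Or build from a definition file whose header bootstraps from Docker
# (as the template's Singularity file, shown later, does).
apptainer build --fakeroot mycontainer.sif Singularity
```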

3.6 Documentation

Another important ingredient for success with such a system is to develop comprehensive documentation of your system. We have two such documents, one for regular users and one for administrators.

User Docs

Most new users who join the group may not be familiar with these concepts and will require an “induction”. Having user documentation in place can be very helpful in making it easier for them to access important knowledge and also in preserving that knowledge over time as people join and leave the group. We host our user documentation on GitHub and use Sphinx with markdown. New users can clone the docs and run make html to build the HTML locally and access the documentation.

Admin Docs

Although not as detailed as the user documentation, admin documentation can be useful for preserving important details about the system design.

3.7 Enabling remote access for multiple concurrent users

The server we use is very powerful and can, most of the time, accommodate multiple users simultaneously. We have ensured that users can work on their accounts in their containers remotely, and the rule is to work remotely unless there is a need to run things on the real robot. We use VNC to allow remote access to the server.

4. Case Study: containerising a robotic manipulation system

Please note that your users may initially find this counter-intuitive and may experience some friction that may seem unnecessary at first. However, over time, they will become accustomed to this way of working and will see the benefits for themselves.

The robotic system consists of the following components:

  1. A Franka Emika Panda robot.
  2. A ‘Robot Computer’ with Ubuntu and a real-time kernel patch (also known as PREEMPT-RT).
  3. Several powerful servers running Ubuntu/Linux, accessed by multiple users.
  4. Robot sensors and end-effectors.

Each computer has a hostname defined in /etc/hosts. Each computer has a single administrator (not used daily) and multiple users (without elevated permissions) for daily use. All users have their own user space and must work within containers in order to install any software. On the host OS, they can only access basic tools such as a browser and code editor.
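For illustration, the /etc/hosts entries might look like this (the addresses and hostnames are made up; use your lab network's):

```
# /etc/hosts (example entries)
192.168.1.10   robot-computer
192.168.1.20   server-1
192.168.1.21   server-2
```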

The Robot Computer

The robot arm is directly connected to the Robot Computer. This PC has the PREEMPT-RT patch and ROS installed directly. It has a single administrator account (not used on a daily basis) and a number of user accounts without elevated permissions, one for each user of the system. The Robot Computer also serves as the ROS Master of the system.

Each user of the system will have their own account (without elevated permissions) and will need to create their ROS workspace using our fork of franka_ros. Our fork includes a modified launch file with the cobot pump end-effector and also integrates our ROS implementation of the pump.

The Servers

The Robot Computer is headless (accessible via SSH) and is not used frequently; users SSH into it from a server solely to start the ROS drivers for the robot arm and to initiate a roscore.

For the Panda robot specifically, a user needs to access the “Desk” to enable and unlock the robot. This can only be done on the “Robot Computer,” which is headless. We configured the servers such that the desk can be accessed from a server using the following alias:

alias desk="ssh -YC YOUR_USERNAME@robot-computer firefox https://panda-control -no-remote"

Users will spend most of their time on one of the servers. Containerisation happens on the servers. A user’s “daily routine” may involve the following process:

  1. Logging into the server with their account.
  2. Opening a desk via SSH to turn on the robot (i.e., running desk in the terminal to invoke the alias above).
  3. SSH’ing into the Robot Computer to start the drivers and ROS.
  4. Starting a container for a specific task on the server they are working on, for example:
    1. Starting a container to run the ROS drivers for a sensor like RealSense.
    2. Starting a container to run motion planning libraries.
    3. Starting a container to run their code.

All these containers can communicate with other containers and even computers on the network (including the ‘Robot Computer’) using ROS out-of-the-box, as long as the ROS_MASTER_URI and ROS_IP/ROS_HOSTNAME are correctly set within the .bashrc file of the container.
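As a sketch, the relevant lines in a container's .bashrc might look like this ("robot-computer" is the hostname alias used earlier; adjust for your network):

```shell
# Point ROS at the roscore running on the Robot Computer.
export ROS_MASTER_URI=http://robot-computer:11311
# Advertise this machine by hostname so other ROS nodes can reach it back.
export ROS_HOSTNAME=$(hostname)
```

With these set, nodes inside the container register with the master on the Robot Computer like any other machine on the network.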

5. How to build Apptainer containers

I provide a template container repo here. You can use this template repo to start a new container.

You will find the following files:

  1. home/: This directory becomes the “home” directory of the container. It is optional, as you could have the container access the home directory of the host OS. However, it is not recommended because any user-specific changes will affect your host OS and other containers.
  2. scripts/: These are the “recipe” or “definition” files used to build the container:
    1. post_script.sh: This file contains all the commands that you want to run during the build. These commands should be non-interactive, meaning they should not require user input.
    2. run_script.sh: This file contains all the commands that you want to run every time the container is executed. Typically, user-specific commands are included here.
  3. Singularity: This file specifies the base image to pull from DockerHub (e.g., Ubuntu 20.04) as well as other metadata and the path to the post and run scripts.
  4. Makefile: For convenience, this Makefile allows one to build and use the containers. Note that certain commands in this file would normally require elevated permissions; however, the --fakeroot flag addresses this and allows a user to build a container without them.
    1. make sandbox-build: Builds a sandbox container (a mutable container).
    2. make sandbox-run: Runs a sandbox container.
    3. make sandbox-write: Runs a sandbox container in write mode, allowing for the installation of new system-wide packages¹.
    4. make img-build: Builds an image (immutable container).
    5. make img-run: Runs an image container.

Here is an example of the Singularity file:

Bootstrap: docker
From: ubuntu:20.04

%labels

AUTHOR Rafael Papallas (rpapallas.com)

%environment
    export LANG=C.UTF-8
    export LC_ALL=C.UTF-8

%files
  scripts /scripts

%post
  /scripts/post_script.sh

%runscript
  exec /bin/bash "$@" --rcfile /scripts/run_script.sh

Here is a basic version of the post script:

apt-get update
DEBIAN_FRONTEND=noninteractive apt-get install -y keyboard-configuration
DEBIAN_FRONTEND=noninteractive TZ="Europe/London" apt-get install -y tzdata
apt-get -y upgrade

apt-get install -y \
    wget \
    tmux \
    vim-gtk \
    zip \
    unzip \
    git \
    build-essential \
    pypy \
    cmake \
    curl \
    software-properties-common \
    apt-utils \
    python3-pip \
    ninja-build \
    curl \
    python-is-python3 \
    && apt-get -y autoremove \
    && apt-get clean

# Let's have a custom PS1 to help people realise in which container they are working.
CUSTOM_ENV=/.singularity.d/env/99-zz_custom_env.sh
cat >$CUSTOM_ENV <<EOF
#!/bin/bash
PS1="[CONTAINER_NAME] Singularity> \w \$ "
EOF
chmod 755 $CUSTOM_ENV

It installs essential software such as wget, tmux, zip, git, etc. Commands to install ROS should also be included here. It is important that no command in this file is interactive; for example, when using apt-get install, we include -y to make it non-interactive. Additionally, there is no need for sudo, as these commands are executed as root during the build.
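For instance, a non-interactive ROS Noetic install in post_script.sh might look like this (the standard apt-based setup, sketched from memory; verify against the official ROS Noetic installation instructions):

```shell
# Add the ROS package repository and its key, non-interactively.
echo "deb http://packages.ros.org/ros/ubuntu focal main" > /etc/apt/sources.list.d/ros-latest.list
curl -s https://raw.githubusercontent.com/ros/rosdistro/master/ros.asc | apt-key add -
apt-get update
apt-get install -y ros-noetic-desktop-full   # -y keeps the build non-interactive
```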

Here is a list of notable containers we have built:

6. Summary and takeaway points

In this article, I outline our philosophy of managing a complex system used by numerous users. The main idea is to isolate each user to their own account, without permissions to modify the system’s state, by requiring them to work within containers in their user space. In detail:

  1. We use Apptainer (previously known as Singularity) containers because they are excellent for HPC and ML tasks and have minimal virtualisation impact, even with the GPU.
  2. No user on the system has elevated permissions, not even the “system manager”.
  3. We have a single administrator account to manage the system, but this account is not used on a daily basis by anyone.
  4. Everyone works within containers they build, allowing them to create reproducible environments.
  5. We enable remote access via VNC so people can work with GUI even remotely and accommodate concurrent connections to the system.

This entire process may seem unnecessary and introduce friction to a group of users; however, it will pay dividends over time for three main reasons:

  1. It will be extremely difficult to break a system that operates in this manner because every user is isolated. As a result, users can rely on a robust system to do their research.
  2. It will instil a useful habit of automating the creation of such software environments through the Apptainer definition file, enabling one to replicate the exact same environment in the future.
  3. It will make it easier for users to share their research in the future by simply sharing their definition files.

  1. This could be a trap. Anything you install in this mode will not be transferable because those commands are not in the definition file. An agreement you need to make with yourself is to install packages and configure the container using the write mode, but always remember to update the post_script.sh file to ensure those changes are available for the next build. The write mode is useful for experimentation when you want to try different packages because changing the definition file and rebuilding is time-consuming. ↩︎