docker-overview

Overview of the Docker container system

Sheffield R meetup - 6th March 2018

Longer version of materials prepared for CRUK Cambridge available here

Mark Dunning (@DrMarkDunning) Bioinformatics Core Director

Sheffield Bioinformatics Core

web : sbc.shef.ac.uk
twitter: @SheffBioinfCore
email: bioinformatics-core@sheffield.ac.uk

Basics

https://docs.docker.com/engine/docker-overview

Docker is an open platform for developers to build and ship applications, whether on laptops, servers in a data center, or the cloud.

Installing Docker

Mac

Windows

(may require some messing around with virtualisation or Hyper-V)

Once you have installed Docker using the instructions above, you can open a terminal (Mac) or command prompt (Windows) and run the following to download an image of the Ubuntu operating system from Docker Hub:

docker pull ubuntu

To run a command inside this new environment we can do:

docker run ubuntu echo "Hello World"

:tada::tada:

To use the container in interactive mode we have to specify the -it argument. This means that the container doesn’t exit straight away, but instead runs its default command (bash for the ubuntu image) to give a terminal prompt. The --rm flag removes the container once it exits.

docker run -it --rm ubuntu

Volumes in Docker

You’ll notice that when you launch a container, you don’t automatically have access to the files on your OS. In Docker, we can mount volumes using the -v argument to make files accessible, e.g. -v /PATH/TO/YOUR/data:/data makes the host directory /PATH/TO/YOUR/data visible as /data inside the container.

## should say that no file or directory exists
docker run --rm ubuntu ls /data

## On Windows, give the host directory as a Windows-style path
docker run --rm -v c:\work:/data ubuntu ls /data

## On Unix (Mac / Linux) it would be something more sensible, like
docker run --rm -v /home/USER/work:/data ubuntu ls /data
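Typing out the absolute host path each time gets tedious, so a small wrapper can help. The following is a minimal sketch for a POSIX shell on Mac or Linux; run_with_data is a hypothetical helper name, not something Docker provides. It resolves a host directory to an absolute path and mounts it at /data.

```shell
# Hypothetical helper (not part of Docker): run a command in the ubuntu
# image with a host directory mounted at /data inside the container.
run_with_data() {
  # Resolve the first argument to an absolute path for the -v mount;
  # give up if the directory does not exist.
  rwd_host_dir="$(cd "$1" && pwd)" || return 1
  shift
  # Remaining arguments are the command to run inside the container
  docker run --rm -v "${rwd_host_dir}:/data" ubuntu "$@"
}
```

For example, run_with_data . ls /data would list the contents of the current directory from inside the container.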

Running R (and RStudio) through Docker

The latest release of R and the development version (R-devel) are provided by the Rocker project https://github.com/rocker-org/rocker

docker run --rm -it r-base R

For the latest development version of R (in the rocker/r-devel image the development build is started with RD rather than R):

docker run --rm -it rocker/r-devel RD

Previous versions of R are also available, e.g. via the version-tagged rocker/r-ver images

RStudio is also supported. See https://github.com/rocker-org/rocker/wiki/Using-the-RStudio-image

docker run -p 8787:8787 rocker/rstudio

You can install whatever R packages you need in this container and analyse your data
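The current Rocker documentation starts the RStudio image with a password supplied through the PASSWORD environment variable; the login user is rstudio. The sketch below combines this with a volume so that files written under the mounted directory stay on the host after the container exits. The helper name start_rstudio and the mount point /home/rstudio/work are illustrative assumptions.

```shell
# Hypothetical wrapper: start RStudio Server in the background with a
# password, and mount the current directory inside the rstudio user's home.
start_rstudio() {
  # $1 = password for the "rstudio" user (required)
  [ -n "${1:-}" ] || { echo "usage: start_rstudio PASSWORD" >&2; return 1; }
  docker run --rm -d -p 8787:8787 \
    -e PASSWORD="$1" \
    -v "$(pwd):/home/rstudio/work" \
    rocker/rstudio
}
```

After start_rstudio some_password, RStudio should be reachable in a browser at http://localhost:8787, and docker stop with the printed container ID shuts it down.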

N.B. Python fans needn’t feel left out; there are docker containers for jupyter too.

Once a docker container has exited, you can jump back in with docker start and docker attach (provided it was not started with --rm, which deletes the container on exit)

docker ps -a ## note the name of the container that just exited
docker start <name-of-container-that-just-exited>
docker attach <name-of-container-that-just-exited>

You can then save the current state of the container as a new image

docker commit <name-of-container-that-just-exited> <new image>
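Giving the container a name when it is first started avoids hunting through the docker ps output afterwards. The helpers below are a sketch under that assumption: the function names, and names such as my_session, are placeholders, and the container must have been started without --rm.

```shell
# Hypothetical helpers wrapping the restart / snapshot steps.

# List the names of containers that have exited
list_exited() {
  docker ps -a --filter status=exited --format '{{.Names}}'
}

# Restart a stopped container and attach to it in one step
# (-a attaches the terminal, -i keeps STDIN open)
resume_container() {
  docker start -ai "$1"
}

# Save a container's current state as a new image
# $1 = container name, $2 = name for the new image
snapshot_container() {
  docker commit "$1" "$2"
}
```

A session might look like: docker run -it --name my_session ubuntu, install some software and exit, then resume_container my_session to carry on, or snapshot_container my_session my_username/my_snapshot to keep the result.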

There may already be a Docker image available for popular sets of tools

The Dockerfile

The creation of Docker images is specified by a Dockerfile. This is a text file containing the sequence of instructions required to re-create your image from some starting point, which could be the standard Ubuntu image. Essentially we list the commands we use, step-by-step to install all the software required. If you already have a shell script to install your software, then translating to a Dockerfile is relatively painless.

A useful reference is the official Docker documentation on Dockerfiles, which goes into far more detail than we will here.

The example below shows a Dockerfile that starts from the Ubuntu image, uses git to clone a repository and installs some packages

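FROM ubuntu
LABEL maintainer="YOUR NAME <your.name@sheffield.ac.uk>"
# Refresh the package index and install in a single layer; r-base provides the R used below
RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y wget build-essential git r-base
RUN git clone.....
RUN R -e 'install.packages(....)'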

The docker build command will build a new image from a Dockerfile. With docker push you can distribute this on Docker Hub once you have a user name and have logged in with docker login.

docker build -t my_username/my_new_image .
docker push my_username/my_new_image
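When an image is rebuilt over time it helps to give each build an explicit tag rather than relying on the default latest. A minimal sketch, assuming you have already run docker login; build_and_push is a hypothetical helper and the image name and tag are placeholders.

```shell
# Hypothetical helper: build the Dockerfile in the current directory,
# tag the image, and push it to Docker Hub only if the build succeeded.
build_and_push() {
  # $1 = image name (e.g. my_username/my_new_image), $2 = tag (default: latest)
  [ -n "${1:-}" ] || { echo "usage: build_and_push IMAGE [TAG]" >&2; return 1; }
  bp_image="$1:${2:-latest}"
  docker build -t "$bp_image" . && docker push "$bp_image"
}
```

For example, build_and_push my_username/my_new_image 0.1 builds and pushes my_username/my_new_image:0.1, which others can then fetch with docker pull.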

Use Case 1:- Distributing software for a training course

Several headaches can emerge when preparing the materials for a training course

docker run --rm -p 8787:8787 markdunning/cancer-genome-toolkit

Use Case 2:- Distributing supplementary data for a publication

docker run -d -p 8787:8787 sje30/waverepo

The elephant in the room…

Sounds great so far! But…

There is an alternative….

Singularity