Deploy your containerized AI applications with nvidia-docker

More and more products and services are taking advantage of the modeling and prediction capabilities of AI. This article presents the nvidia-docker tool for integrating AI (Artificial Intelligence) software bricks into a microservice architecture. The main advantage explored here is the use of the host system's GPU (Graphics Processing Unit) resources to accelerate multiple containerized AI applications.

To understand the usefulness of nvidia-docker, we will start by describing what kind of AI can benefit from GPU acceleration. Next, we will present how to implement the nvidia-docker tool. Finally, we will describe what tools are available to use GPU acceleration in your applications and how to use them.

Why use GPUs in AI applications?

In the field of artificial intelligence, two main subfields are used: machine learning and deep learning. The latter is part of a larger family of machine learning methods based on artificial neural networks.

In the context of deep learning, where operations are essentially matrix multiplications, GPUs are more efficient than CPUs (Central Processing Units). This is why the use of GPUs has grown in recent years. Indeed, GPUs are considered the heart of deep learning because of their massively parallel architecture.
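
To make the parallelism argument concrete, here is a minimal pure-Python sketch of a matrix multiplication. It is illustrative only and runs on the CPU; the function name matmul is ours and not part of any library. The point is that each output cell depends on one row of the first matrix and one column of the second, and on nothing else, which is the kind of independent work a GPU can spread over its many cores.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows (illustrative, CPU only)."""
    rows, inner, cols = len(a), len(b), len(b[0])
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")
    # Each (i, j) cell is an independent dot product: no cell needs the
    # result of another, so all of them could be computed in parallel.
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]
```

A GPU library such as cuBLAS performs the same computation, but schedules the independent cells across thousands of threads instead of looping over them one by one.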

However, GPUs cannot execute just any program. Indeed, they use a specific language (CUDA for NVIDIA) to take advantage of their architecture. So, how can you use and communicate with GPUs from your applications?

The NVIDIA CUDA technology

NVIDIA CUDA (Compute Unified Device Architecture) is a parallel computing architecture combined with an API for programming GPUs. CUDA translates application code into an instruction set that GPUs can execute.

A CUDA SDK and libraries such as cuBLAS (Basic Linear Algebra Subroutines) and cuDNN (Deep Neural Network) have been developed to communicate easily and efficiently with a GPU. CUDA is available in C, C++ and Fortran. There are wrappers for other languages including Java, Python and R. For example, deep learning libraries like TensorFlow and Keras are based on these technologies.

Why use nvidia-docker?

Nvidia-docker addresses the needs of developers who want to add AI functionality to their applications, containerize them and deploy them on servers powered by NVIDIA GPUs.

The objective is to set up an architecture that allows the development and deployment of deep learning models in services available via an API. Thus, the utilization rate of GPU resources is optimized by making them available to multiple application instances.

In addition, we benefit from the advantages of containerized environments:

  • Isolation of the instances of each AI model.
  • Colocation of several models with their specific dependencies.
  • Colocation of the same model under several versions.
  • Consistent deployment of models.
  • Model performance monitoring.

Natively, using a GPU in a container requires installing CUDA in the container and giving privileges to access the device. With this in mind, the nvidia-docker tool has been developed, allowing NVIDIA GPU devices to be exposed in containers in an isolated and secure manner.

At the time of writing this article, the latest version of nvidia-docker is v2. This version differs greatly from v1 in the following ways:

  • Version 1: Nvidia-docker is implemented as an overlay to Docker. That is, to create the container you had to use nvidia-docker (e.g. nvidia-docker run ...), which performs the actions (among others the creation of volumes) that make the GPU devices visible in the container.
  • Version 2: The deployment is simplified by replacing Docker volumes with Docker runtimes. Indeed, to launch a container, it is now necessary to use the NVIDIA runtime via Docker (e.g. docker run --runtime nvidia ...).

Note that due to their different architecture, the two versions are not compatible. An application written in v1 must be rewritten for v2.
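
The difference between the two invocation styles can be sketched as a small helper that builds the command line for either version. This is only an illustration: the function name docker_command and the image name my-ai-service:latest are hypothetical, and only the nvidia-docker run and docker run --runtime nvidia prefixes come from the description above.

```python
def docker_command(version, image, *args):
    """Build the argument list that launches a GPU container.

    `image` and `args` are placeholders supplied by the caller.
    """
    if version == 1:
        # v1: the nvidia-docker wrapper replaces the docker binary.
        prefix = ["nvidia-docker", "run"]
    elif version == 2:
        # v2: plain docker, with the NVIDIA runtime selected explicitly.
        prefix = ["docker", "run", "--runtime", "nvidia"]
    else:
        raise ValueError("unknown nvidia-docker version: %r" % (version,))
    return prefix + [image, *args]


# Hypothetical image name, used only to show the resulting command.
print(" ".join(docker_command(2, "my-ai-service:latest")))
```

Returning a list rather than a single string keeps the result directly usable with subprocess.run without shell quoting issues.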

Setting up nvidia-docker

The required elements to use nvidia-docker are:

  • A container runtime.
  • An available GPU.
  • The NVIDIA Container Toolkit (main part of nvidia-docker).
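
As a rough pre-flight check, the list above can be sketched as a lookup of the corresponding executables on the PATH. The mapping below is an assumption on our part: it treats the presence of docker, nvidia-smi (shipped with the NVIDIA driver) and nvidia-container-runtime as a hint that each element is installed, which is not a guarantee that it is correctly configured. The function name missing_prerequisites is hypothetical.

```python
import shutil

# Assumption: an executable found on PATH is taken as a hint that the
# matching prerequisite is installed. It does not prove it works.
REQUIRED_TOOLS = {
    "container runtime": "docker",
    "NVIDIA driver": "nvidia-smi",
    "NVIDIA Container Toolkit": "nvidia-container-runtime",
}


def missing_prerequisites(which=shutil.which):
    """Return the labels of the prerequisites whose executable is not found.

    `which` can be replaced by a stub to exercise the logic without
    touching the real system.
    """
    return [label for label, exe in REQUIRED_TOOLS.items() if which(exe) is None]


if __name__ == "__main__":
    missing = missing_prerequisites()
    print("missing:", ", ".join(missing) if missing else "nothing")
```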



A container runtime is required to run the NVIDIA Container Toolkit. Docker is the recommended runtime, but Podman and containerd are also supported.

The official documentation gives the installation procedure for Docker.


Drivers are required to use a GPU device. In the case of NVIDIA GPUs, the drivers corresponding to a given OS can be obtained from the NVIDIA driver download page, by filling in the information on the GPU model.

The installation of the drivers is done via the executable. For Linux, use the following commands, replacing the name of the downloaded file:

chmod +x

Reboot the host machine at the end of the installation so that the installed drivers are taken into account.
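
After the reboot, one way to confirm that the driver sees the GPU is to query nvidia-smi, the command-line utility installed with the NVIDIA driver. The following is a minimal sketch, not an official procedure: the helper names parse_gpu_lines and list_gpus are ours, and the sketch simply returns an empty list when nvidia-smi is absent or fails.

```python
import subprocess

# Ask nvidia-smi for one CSV line per GPU: "<name>, <driver version>".
QUERY = ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"]


def parse_gpu_lines(text):
    """Turn the CSV output of QUERY into a list of dictionaries."""
    gpus = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip empty lines
        name, _, driver = line.partition(",")
        gpus.append({"name": name.strip(), "driver_version": driver.strip()})
    return gpus


def list_gpus():
    """Return the GPUs reported by the driver, or [] if the query fails."""
    try:
        result = subprocess.run(
            QUERY, capture_output=True, text=True, check=True, timeout=30
        )
    except (OSError, subprocess.SubprocessError):
        # nvidia-smi is missing, timed out, or exited with an error.
        return []
    return parse_gpu_lines(result.stdout)


if __name__ == "__main__":
    print(list_gpus() or "no NVIDIA GPU reported by the driver")
```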

Installing nvidia-docker

Nvidia-docker is available on the GitHub project page. To install it, follow the installation manual depending on your server and architecture specifics.
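
For the --runtime nvidia option to work, Docker has to know about the NVIDIA runtime. The installation guide normally takes care of this; the sketch below only shows the kind of runtimes entry that is commonly found in the Docker daemon configuration (usually /etc/docker/daemon.json). The exact path of the runtime binary is an assumption and should be checked against the installation guide for your platform. The function name nvidia_runtime_config is hypothetical.

```python
import json


def nvidia_runtime_config(path="nvidia-container-runtime"):
    """Return a daemon configuration fragment registering the NVIDIA runtime.

    `path` is an assumption: it may need to be an absolute path on your host.
    """
    return {"runtimes": {"nvidia": {"path": path, "runtimeArgs": []}}}


# Print the fragment as JSON so it can be compared with the existing file.
print(json.dumps(nvidia_runtime_config(), indent=4))
```

The Docker daemon has to be restarted after its configuration file changes.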

We now have an infrastructure that allows us to have isolated environments giving access to GPU resources. To use GPU acceleration in applications, several tools have been developed by NVIDIA (non-exhaustive list):

  • CUDA Toolkit: a set of tools for developing software/programs that can perform computations using the CPU, RAM, and GPU. It can be used on x86, Arm and POWER platforms.
  • NVIDIA cuDNN: a library of primitives to accelerate deep learning networks and optimize GPU performance for major frameworks such as Tensorflow and Keras.
  • NVIDIA cuBLAS: a library of GPU-accelerated linear algebra subroutines.

By using these tools in application code, AI and linear algebra tasks are accelerated. With the GPUs now visible, the application is able to send the data and operations to be processed on the GPU.

The CUDA Toolkit is the lowest-level option. It offers the most control (memory and instructions) to build custom applications. Libraries provide an abstraction of CUDA functionality. They allow you to focus on the application development rather than the CUDA implementation.
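
As an illustration of the library-level approach, the sketch below asks TensorFlow which GPUs it can see from inside the container, without writing any CUDA code. It assumes TensorFlow 2.x may or may not be installed in the image: when the package is absent, the hypothetical helper visible_gpus simply returns an empty list.

```python
import importlib.util


def visible_gpus():
    """Return the names of the GPUs TensorFlow can use, or [] if none.

    Assumption: TensorFlow 2.x. If the package is not installed, the
    function returns an empty list instead of failing.
    """
    if importlib.util.find_spec("tensorflow") is None:
        return []
    import tensorflow as tf

    # TensorFlow relies on CUDA and cuDNN under the hood; the application
    # only deals with this high-level device list.
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    print(visible_gpus())
```

An empty list inside a container started with the NVIDIA runtime usually means that either the driver, the runtime, or the CUDA libraries in the image are not set up as expected.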

Once all these elements are implemented, the architecture using the nvidia-docker service is ready to use.

Here is a diagram to summarize everything we have seen:



We have set up an architecture allowing the use of GPU resources from our applications in isolated environments. To summarize, the architecture is composed of the following bricks:

  • Operating system: Linux, Windows …
  • Docker: isolation of the environment using Linux containers
  • NVIDIA driver: installation of the driver for the hardware in question
  • NVIDIA container runtime: orchestration of the previous three
  • Applications on Docker container:
    • CUDA
    • cuDNN
    • cuBLAS
    • Tensorflow/Keras

NVIDIA continues to develop tools and libraries around AI technologies, with the goal of establishing itself as a leader. Other technologies may complement nvidia-docker or may be more suitable than nvidia-docker depending on the use case.
