Building a deep learning environment with Docker


Install Docker Engine on Ubuntu | Docker Documentation
Update the apt package index and install the required packages

$ sudo apt-get update
$ sudo apt-get -y install curl \
    apt-transport-https \
    ca-certificates \
    gnupg-agent \
    software-properties-common

Install Docker's official GPG public key

$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
$ sudo apt-key fingerprint 0EBFCD88

Add the stable repository

$ sudo add-apt-repository \
    "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
    $(lsb_release -cs) \
    stable"

Update the apt package index and install the latest version

$ sudo apt-get update
$ sudo apt-get -y install docker-ce docker-ce-cli containerd.io


Add the current user to the docker group so that docker can be run without sudo

$ sudo gpasswd -a $USER docker
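
Group membership is read at login, so the gpasswd change does not reach a shell that is already open. The following is a minimal sketch for checking whether the current session already has the docker group. The helper name `in_group` is hypothetical and not part of Docker.

```shell
#!/bin/sh
# in_group (hypothetical helper): succeeds if the first argument appears
# among the remaining arguments, which are treated as group names.
in_group() {
    group=$1
    shift
    for g in "$@"; do
        [ "$g" = "$group" ] && return 0
    done
    return 1
}

# id -nG prints the group names of the current session.
# It is left unquoted on purpose so that each name becomes its own argument.
if in_group docker $(id -nG); then
    echo "docker group is active in this session"
else
    echo "docker group is not active yet; log out and back in, or run: newgrp docker"
fi
```

If the second message appears right after running gpasswd, that is expected. Starting a new login session should pick up the group.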

To use the GPU from Docker on Ubuntu 20.04, install the NVIDIA Container Toolkit.
GitHub - NVIDIA/nvidia-docker: Build and run Docker containers leveraging NVIDIA GPUs

$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

$ sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
$ sudo systemctl restart docker
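
The `distribution` variable used above is built by sourcing /etc/os-release and concatenating `ID` and `VERSION_ID`. The sketch below reproduces that step against a sample file to show which value ends up in the repository URL. The sample contents are an assumption that mirrors the relevant fields on Ubuntu 20.04, not a copy of the real file.

```shell
#!/bin/sh
# Sample os-release contents (assumed; only the two fields that matter here).
sample=$(mktemp)
cat > "$sample" <<'EOF'
ID=ubuntu
VERSION_ID="20.04"
EOF

# Source the file inside a subshell so ID and VERSION_ID do not leak
# into the calling shell, then join them without a separator.
distribution=$(. "$sample"; echo "$ID$VERSION_ID")
rm -f "$sample"

echo "$distribution"    # prints ubuntu20.04
echo "https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list"
```

On a distribution that the repository does not list, the curl step would probably not return a valid nvidia-docker.list, so it may be worth printing `$distribution` before running it.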


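Check that a container can see the GPU. Running the image without a tag fails, because nvidia/cuda has no latest tag:

$ sudo docker run --gpus all --rm nvidia/cuda nvidia-smi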
Unable to find image 'nvidia/cuda:latest' locally
docker: Error response from daemon: manifest for nvidia/cuda:latest not found: manifest unknown: manifest unknown.
See 'docker run --help'.
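
When no tag is given, Docker falls back to `latest`, which is why the error above mentions nvidia/cuda:latest. A small guard like the one sketched below could catch untagged image names before they reach docker run. `has_explicit_tag` is a hypothetical helper, and the check is a simple heuristic rather than a full image-reference parser.

```shell
#!/bin/sh
# has_explicit_tag (hypothetical helper): succeeds if the image reference
# carries a tag or a digest. Only the last path component is inspected, so a
# registry port such as localhost:5000/cuda is not mistaken for a tag.
has_explicit_tag() {
    case "${1##*/}" in
        *:*|*@*) return 0 ;;
        *)       return 1 ;;
    esac
}

for image in nvidia/cuda nvidia/cuda:11.0-base; do
    if has_explicit_tag "$image"; then
        echo "$image: ok"
    else
        echo "$image: no tag given, Docker would try $image:latest"
    fi
done
```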



Specifying a tag explicitly works:

$ docker run --gpus all --rm nvidia/cuda:11.0-base nvidia-smi
Unable to find image 'nvidia/cuda:11.0-base' locally
11.0-base: Pulling from nvidia/cuda
54ee1f796a1e: Pull complete 
f7bfea53ad12: Pull complete 
46d371e02073: Pull complete 
b66c17bbf772: Pull complete 
3642f1a6dfb3: Pull complete 
e5ce55b8b4b9: Pull complete 
155bc0332b0a: Pull complete 
Digest: sha256:774ca3d612de15213102c2dbbba55df44dc5cf9870ca2be6c6e9c627fa63d67a
Status: Downloaded newer image for nvidia/cuda:11.0-base
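Sun Apr 25 12:51:06 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.119.03   Driver Version: 450.119.03   CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GT 710      Off  | 00000000:01:00.0 N/A |                  N/A |
| 50%   54C    P8    N/A /  N/A |    294MiB /   980MiB |     N/A      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+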


Try an image from NVIDIA NGC.

$ docker run --gpus all --rm -it nvcr.io/nvidia/tensorflow:20.09-tf1-py3


Unable to find image 'nvcr.io/nvidia/tensorflow:20.09-tf1-py3' locally
20.09-tf1-py3: Pulling from nvidia/tensorflow
Digest: sha256:e3db261638dc0283bd87d27b59be5731d2298604b44ec8bc81ab3f8e9128b6af
Status: Downloaded newer image for nvcr.io/nvidia/tensorflow:20.09-tf1-py3
== TensorFlow ==

NVIDIA Release 20.09-tf1 (build 16003718)
TensorFlow Version 1.15.3

Container image Copyright (c) 2020, NVIDIA CORPORATION.  All rights reserved.
Copyright 2017-2020 The TensorFlow Authors.  All rights reserved.

NVIDIA Deep Learning Profiler (dlprof) Copyright (c) 2020, NVIDIA CORPORATION.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying project or file.
ERROR: Detected NVIDIA GeForce GT 710 GPU, which is not supported by this container
ERROR: No supported GPU(s) detected to run this container

NOTE: MOFED driver for multi-node communication was not detected.
      Multi-node communication performance may be reduced.

NOTE: The SHMEM allocation limit is set to the default of 64MB.  This may be
   insufficient for TensorFlow.  NVIDIA recommends the use of the following flags:
   nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 ...

root@03d78aedf70c:/workspace#  python
Python 3.6.9 (default, Jul 17 2020, 12:50:27) 
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2021-04-25 13:14:16.276946: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
>>> print(tf.__version__)


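TensorFlow can still be imported, but the container's startup log shows that the GeForce GT 710 is not supported by this image:

ERROR: Detected NVIDIA GeForce GT 710 GPU, which is not supported by this container
ERROR: No supported GPU(s) detected to run this container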
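
The startup log above also notes that the default 64MB shared memory limit may be too small for TensorFlow, and recommends extra flags. The sketch below assembles the same run with those flags. It assumes they can be passed to `docker run --gpus all` unchanged, since they are ordinary docker run options and not specific to the nvidia-docker wrapper. `RUN` is a hypothetical switch: by default the script only prints the command.

```shell
#!/bin/sh
image="nvcr.io/nvidia/tensorflow:20.09-tf1-py3"

# Flags copied from the NOTE in the container's startup log.
set -- docker run --gpus all --rm -it \
    --shm-size=1g \
    --ulimit memlock=-1 \
    --ulimit stack=67108864 \
    "$image"

# RUN is a hypothetical switch: print the command by default,
# execute it only when RUN=1 is set in the environment.
if [ "${RUN:-0}" = "1" ]; then
    "$@"
else
    echo "$*"
fi
```

These flags only address the shared memory note. They are not expected to change the unsupported-GPU errors.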