kuro's Notes

Personal notes by 96

Building a Deep Learning Environment with Docker

These days you really ought to be able to use Docker, so here goes.

Install Docker Engine on Ubuntu | Docker Documentation
Update the apt package index and install the prerequisite packages

$ sudo apt-get update
$ sudo apt-get -y install curl \
    apt-transport-https \
    ca-certificates \
    gnupg-agent \
    software-properties-common

Install Docker's official GPG public key

$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
$ sudo apt-key fingerprint 0EBFCD88

Add the repository (stable)

$ sudo add-apt-repository \
    "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
    $(lsb_release -cs) \
    stable"

Update the apt package index again and install the latest version

$ sudo apt-get update
$ sudo apt-get -y install docker-ce docker-ce-cli containerd.io

Add the current user to the docker group.

$ sudo gpasswd -a $USER docker
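
The group change only applies to new login sessions, so log out and back in (or run newgrp docker) before calling docker without sudo. Below is a minimal sketch for checking whether the current shell has picked the group up. has_group is a hypothetical helper name of mine, not a standard command.

```shell
# has_group LIST NAME: succeed if the space-separated group LIST
# contains NAME as a whole word.
has_group() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# "id -nG" with no user argument lists the groups of the current
# process, so it reflects whether this session already has the group.
has_group "$(id -nG)" docker || echo "docker group not active yet in this session"
```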

To use the GPU from docker on Ubuntu 20.04, install the NVIDIA Container Toolkit.
GitHub - NVIDIA/nvidia-docker: Build and run Docker containers leveraging NVIDIA GPUs

$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

$ sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
$ sudo systemctl restart docker

Now test whether nvidia-smi can run inside docker:

$ sudo docker run --gpus all --rm nvidia/cuda nvidia-smi
Unable to find image 'nvidia/cuda:latest' locally
docker: Error response from daemon: manifest for nvidia/cuda:latest not found: manifest unknown: manifest unknown.
See 'docker run --help'.

That fails with an error. Hmm.

Addendum

$ docker run --gpus all --rm nvidia/cuda:11.0-base nvidia-smi
Unable to find image 'nvidia/cuda:11.0-base' locally
11.0-base: Pulling from nvidia/cuda
54ee1f796a1e: Pull complete 
f7bfea53ad12: Pull complete 
46d371e02073: Pull complete 
b66c17bbf772: Pull complete 
3642f1a6dfb3: Pull complete 
e5ce55b8b4b9: Pull complete 
155bc0332b0a: Pull complete 
Digest: sha256:774ca3d612de15213102c2dbbba55df44dc5cf9870ca2be6c6e9c627fa63d67a
Status: Downloaded newer image for nvidia/cuda:11.0-base
Sun Apr 25 12:51:06 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.119.03   Driver Version: 450.119.03   CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GT 710      Off  | 00000000:01:00.0 N/A |                  N/A |
| 50%   54C    P8    N/A /  N/A |    294MiB /   980MiB |     N/A      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

It worked.
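
The first attempt failed because no tag was given: as the error message shows, docker then looked for nvidia/cuda:latest, which the repository does not provide. A small guard like the following sketch can catch untagged image references in a script before docker run is called. has_explicit_tag is a hypothetical helper name of mine, not part of Docker.

```shell
# has_explicit_tag REF: succeed if the image reference carries an
# explicit tag (name:tag) or digest (name@sha256:...).
has_explicit_tag() {
    # Only the last path component can carry a tag; a colon earlier
    # in the reference is a registry port (e.g. localhost:5000/image).
    last=${1##*/}
    case $last in
        *:*|*@*) return 0 ;;
        *) return 1 ;;
    esac
}
```

With this, has_explicit_tag nvidia/cuda fails while has_explicit_tag nvidia/cuda:11.0-base succeeds.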

Next, try an image from NVIDIA NGC.

$ docker run --gpus all --rm -it nvcr.io/nvidia/tensorflow:20.09-tf1-py3

The NVIDIA driver is 450.119.03, so I went with the 20.09 release.

Unable to find image 'nvcr.io/nvidia/tensorflow:20.09-tf1-py3' locally
20.09-tf1-py3: Pulling from nvidia/tensorflow
f08d8e2a3ba1: Pulling fs layer 
3baa9cb2483b: Pulling fs layer 
94e5ff4c0b15: Pulling fs layer 
1860925334f9: Pulling fs layer 
c6b364205fad: Pulling fs layer 
ffcd5dc3448d: Pulling fs layer 
13cf13e5ce72: Pulling fs layer 
7202bec79e41: Pulling fs layer 
fdfbe893941b: Pulling fs layer 
f73bfa0e0e17: Pulling fs layer 
36aade146566: Pulling fs layer 
0cf8254e1bfe: Pulling fs layer 
40ff6c34e5e5: Pulling fs layer 
0adeec2cfe74: Pulling fs layer 
895d871af5fd: Pulling fs layer 
71c97f6ac83c: Pulling fs layer 
281aa21cb812: Pulling fs layer 
9c7e46bb4080: Pulling fs layer 
150dfb1677cd: Pulling fs layer 
3ce488e63cd3: Pulling fs layer 
0f6e3807a6dc: Pulling fs layer 
3585c705a7d0: Pulling fs layer 
de81ad699822: Pulling fs layer 
bb1a224031d9: Pulling fs layer 
de7308abc9b3: Pulling fs layer 
07620b1781c2: Pulling fs layer 
dbc3331c85c9: Pulling fs layer 
c863ab4a3ce5: Pulling fs layer 
2cb780dadd08: Pulling fs layer 
72698521ce7a: Pulling fs layer 
1860925334f9: Waiting 
c6b364205fad: Waiting 
13cf13e5ce72: Waiting 
c4f3861fc440: Pulling fs layer 
7202bec79e41: Waiting 
c8dbc3fd23eb: Pulling fs layer 
fdfbe893941b: Waiting 
f73bfa0e0e17: Waiting 
cebab9392074: Pulling fs layer 
098124793456: Pulling fs layer 
be7876894a57: Pulling fs layer 
8f6cac9eb6f5: Pull complete 
8794953727b3: Pull complete 
66611ff5ae22: Pull complete 
052da93182d9: Pull complete 
19ab74a7714f: Pull complete 
10fb2f25565b: Pull complete 
99d96c644f99: Pull complete 
e04d68703197: Pull complete 
54b734d972b3: Pull complete 
8737a875ce8c: Pull complete 
66447294ec52: Pull complete 
bff8468ce910: Pull complete 
102507e7b013: Pull complete 
ad60ed3798eb: Pull complete 
7d1ebbc9228a: Pull complete 
6513260fcbe9: Pull complete 
8aaacd84798e: Pull complete 
7961f1c63d21: Pull complete 
c079890a79ca: Pull complete 
1aef1f3f370b: Pull complete 
1db61ffb4058: Pull complete 
30ab2ccfcdb1: Pull complete 
ee27d709b773: Pull complete 
Digest: sha256:e3db261638dc0283bd87d27b59be5731d2298604b44ec8bc81ab3f8e9128b6af
Status: Downloaded newer image for nvcr.io/nvidia/tensorflow:20.09-tf1-py3
                                                                                                                                                
================
== TensorFlow ==
================

NVIDIA Release 20.09-tf1 (build 16003718)
TensorFlow Version 1.15.3

Container image Copyright (c) 2020, NVIDIA CORPORATION.  All rights reserved.
Copyright 2017-2020 The TensorFlow Authors.  All rights reserved.

NVIDIA Deep Learning Profiler (dlprof) Copyright (c) 2020, NVIDIA CORPORATION.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying project or file.
ERROR: Detected NVIDIA GeForce GT 710 GPU, which is not supported by this container
ERROR: No supported GPU(s) detected to run this container

NOTE: MOFED driver for multi-node communication was not detected.
      Multi-node communication performance may be reduced.

NOTE: The SHMEM allocation limit is set to the default of 64MB.  This may be
   insufficient for TensorFlow.  NVIDIA recommends the use of the following flags:
   nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 ...

root@03d78aedf70c:/workspace#  python
Python 3.6.9 (default, Jul 17 2020, 12:50:27) 
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2021-04-25 13:14:16.276946: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
>>> print(tf.__version__)
1.15.3
>>> 

Looks like it works?
No, it doesn't.

ERROR: Detected NVIDIA GeForce GT 710 GPU, which is not supported by this container
ERROR: No supported GPU(s) detected to run this container

So it says. The GT 710 isn't supported, then. That's no good.
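
The two constraints this post ran into, driver version and GPU generation, could be checked before pulling a multi-gigabyte image. Here is a minimal sketch. The thresholds are assumptions and do not come from the logs above: as far as I know, the 20.09 NGC release expects driver 450 or later and CUDA compute capability 6.0 or higher, and the GT 710 is a compute capability 3.5 part. Confirm both against NVIDIA's documentation for your release. The function names are hypothetical.

```shell
# Assumed minimums for the 20.09 NGC images (verify before relying on them).
MIN_DRIVER=450
MIN_CAPABILITY=6.0

# version_ge A B: succeed if dotted version A is >= B.
# Only the first two numeric fields are compared, which is enough for
# driver versions (450.119.03) and compute capabilities (3.5).
version_ge() {
    a_major=${1%%.*}; b_major=${2%%.*}
    a_minor=0; b_minor=0
    case $1 in *.*) a_rest=${1#*.}; a_minor=${a_rest%%.*} ;; esac
    case $2 in *.*) b_rest=${2#*.}; b_minor=${b_rest%%.*} ;; esac
    if [ "$a_major" -ne "$b_major" ]; then
        [ "$a_major" -gt "$b_major" ]
    else
        [ "$a_minor" -ge "$b_minor" ]
    fi
}

# check_gpu DRIVER CAPABILITY: print a verdict and fail if either
# value is below the assumed minimum.
check_gpu() {
    if ! version_ge "$1" "$MIN_DRIVER"; then
        echo "driver $1 is older than $MIN_DRIVER"
        return 1
    fi
    if ! version_ge "$2" "$MIN_CAPABILITY"; then
        echo "compute capability $2 is below $MIN_CAPABILITY"
        return 1
    fi
    echo "ok"
}
```

For the machine in this post, check_gpu 450.119.03 3.5 prints "compute capability 3.5 is below 6.0". The driver version can be read with nvidia-smi --query-gpu=driver_version --format=csv,noheader. I looked the compute capability up by hand rather than assuming this driver's nvidia-smi can report it.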