Deep Learning Again - kuro's Notes

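I'm picking Deep Learning back up after a long break.
Somewhere along the way I broke my environment, so I'm starting over with a rebuild.
Every time, setting up the environment takes so long that I'm worn out before I ever get to the actual data analysis.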

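This field moves extremely fast, so the information in a book I bought a year ago is already quite dated, and even the installation procedures seem to have become much more streamlined.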

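So here are my notes on installation as things stand now.
Learning from past mistakes: installing directly onto a bare Linux system makes things painful when something goes wrong, so this time I'll build everything inside a virtual machine.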

I'll set up a KVM virtual machine on a CentOS 7 host and install Ubuntu 18.04 as the guest OS.
The installation steps below are run after logging in to the guest over ssh.

$ cat /etc/os-release 
NAME="Ubuntu"
VERSION="18.04.5 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.5 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic

$ python3 -V
Python 3.6.9

$ python3 -m pip install tensorflow

$ python3 -m pip list
Package               Version
--------------------- -------------------
absl-py               0.12.0
apturl                0.5.2
asn1crypto            0.24.0
astor                 0.8.1
Brlapi                0.6.6
cached-property       1.5.2
certifi               2018.1.18
chardet               3.0.4
click                 6.7
colorama              0.3.7
command-not-found     0.3
cryptography          2.1.4
cupshelpers           1.0
cycler                0.10.0
decorator             4.1.2
defer                 1.0.6
distro-info           0.18ubuntu0.18.04.1
gast                  0.4.0
google-pasta          0.2.0
grpcio                1.36.1
h5py                  3.1.0
httplib2              0.9.2
idna                  2.6
importlib-metadata    3.8.0
Keras-Applications    1.0.8
Keras-Preprocessing   1.1.2
keyring               10.6.0
keyrings.alt          3.0
language-selector     0.1
launchpadlib          1.10.6
lazr.restfulclient    0.13.5
lazr.uri              1.0.3
louis                 3.5.0
macaroonbakery        1.1.3
Mako                  1.0.7
Markdown              3.3.4
MarkupSafe            1.0
matplotlib            2.1.1
netifaces             0.10.4
numpy                 1.19.5
oauth                 1.0.1
olefile               0.45.1
pexpect               4.2.1
Pillow                5.1.0
pip                   21.0.1
protobuf              3.15.6
pycairo               1.16.2
pycrypto              2.6.1
pycups                1.9.73
pydot-ng              2.0.0
pygobject             3.26.1
pymacaroons           0.13.0
PyNaCl                1.1.2
pyparsing             2.4.7
pyRFC3339             1.0
python-apt            1.6.5+ubuntu0.5
python-dateutil       2.6.1
python-debian         0.1.32
pytz                  2018.3
pyxdg                 0.25
PyYAML                3.12
reportlab             3.4.0
requests              2.18.4
requests-unixsocket   0.1.5
scipy                 0.19.1
SecretStorage         2.3.1
setuptools            54.2.0
simplejson            3.13.2
six                   1.15.0
ssh-import-id         5.7
system-service        0.3
systemd-python        234
tensorboard           1.14.0
tensorflow            1.14.0
tensorflow-estimator  1.14.0
termcolor             1.1.0
typing-extensions     3.7.4.3
ubuntu-drivers-common 0.0.0
ufw                   0.36
unattended-upgrades   0.1
urllib3               1.22
usb-creator           0.3.3
wadllib               1.3.2
Werkzeug              1.0.1
wheel                 0.36.2
wrapt                 1.12.1
xkit                  0.0.0
zipp                  3.4.1
zope.interface        4.3.2

Huh? Only tensorflow 1.14 gets installed? It should be past 2.0 by now.

Apparently the pip version is too old.
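
As a rough way to confirm that, the sketch below compares the pip version against 19.0, which I believe is the minimum the TensorFlow install guide asks for with the 2.x wheels. The threshold and the helper names (`parse_pip_version`, `pip_is_new_enough`, `current_pip_version`) are my own assumptions for illustration, not anything from pip or TensorFlow.

```python
import re
import subprocess
import sys

# Assumption: 19.0 is the minimum pip version for TensorFlow 2 wheels.
MIN_PIP = (19, 0)


def parse_pip_version(text):
    """Return (major, minor) from output such as 'pip 9.0.1 from ...'."""
    match = re.search(r"pip (\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("unrecognized pip --version output: %r" % text)
    return int(match.group(1)), int(match.group(2))


def pip_is_new_enough(version, minimum=MIN_PIP):
    """Compare versions as tuples, so (9, 0) < (19, 0) <= (21, 0)."""
    return tuple(version) >= tuple(minimum)


def current_pip_version():
    """Ask the pip that belongs to this interpreter for its version."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "--version"],
        stdout=subprocess.PIPE,
        universal_newlines=True,  # works on Python 3.6, unlike text=True
        check=True,
    )
    return parse_pip_version(result.stdout)
```

Calling `pip_is_new_enough(current_pip_version())` should return False on a pip that still needs upgrading.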

$ python3 -m pip install --upgrade pip

$ python3 -m pip install --upgrade tensorflow

$ python3 -m pip list 
......
tensorboard            2.4.1
tensorboard-plugin-wit 1.8.0
tensorflow             2.4.1
tensorflow-estimator   2.4.0
......

$ python3 -c "import tensorflow as tf;print(tf.reduce_sum(tf.random.normal([1000, 1000])))"
2021-03-28 01:24:30.266613: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-03-28 01:24:30.266703: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2021-03-28 01:24:32.709611: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-03-28 01:24:32.709778: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2021-03-28 01:24:32.709798: W tensorflow/stream_executor/cuda/cuda_driver.cc:326] failed call to cuInit: UNKNOWN ERROR (303)
2021-03-28 01:24:32.709825: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (kkuro-KVM): /proc/driver/nvidia/version does not exist
2021-03-28 01:24:32.710260: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-03-28 01:24:32.716037: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
tf.Tensor(-1853.1768, shape=(), dtype=float32)

That works. There are CUDA errors, but I'll install CUDA next.
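
The warnings in the log above name exactly which shared libraries are missing. Here is a small sketch that pulls them out of a saved copy of the log. It assumes the line format seen above (timestamp, one-letter level, source location, message), and `missing_libraries` is a hypothetical helper of my own, not part of TensorFlow.

```python
import re

# Matches lines such as:
# 2021-03-28 01:24:30.266613: W tensorflow/.../dso_loader.cc:60] message
LOG_LINE = re.compile(
    r"^\d{4}-\d{2}-\d{2} [\d:.]+: (?P<level>[IWEF]) (?P<source>\S+)\] (?P<message>.*)$"
)
MISSING_LIB = re.compile(r"Could not load dynamic library '([^']+)'")


def missing_libraries(log_text):
    """Return libraries named in W-level 'Could not load dynamic library' lines."""
    found = []
    for line in log_text.splitlines():
        entry = LOG_LINE.match(line)
        if entry is None or entry.group("level") != "W":
            continue  # skip info lines and continuation lines
        lib = MISSING_LIB.search(entry.group("message"))
        if lib and lib.group(1) not in found:
            found.append(lib.group(1))  # keep first-seen order, no duplicates
    return found
```

Run against the log above, this should list `libcudart.so.11.0` and `libcuda.so.1`, which are the pieces the CUDA toolkit and the NVIDIA driver are expected to supply.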

# Add NVIDIA package repositories
$ wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
$ sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
$ sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
$ sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/ /"
$ sudo apt-get update

$ wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb

$ sudo apt install ./nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
$ sudo apt-get update

# Install NVIDIA driver
$ sudo apt-get install --no-install-recommends nvidia-driver-450
# Reboot. Check that GPUs are visible using the command: nvidia-smi
$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

Huh? Why?
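
One thing that may be worth checking, as a guess rather than a confirmed diagnosis: a KVM guest only sees a GPU if the host passes one through, and the TensorFlow log earlier already reported that /proc/driver/nvidia/version does not exist on this guest. The sketch below looks in sysfs for any PCI device carrying NVIDIA's vendor ID (0x10de). The helper names and the default paths as parameters are my own choices for illustration.

```python
import glob
import os

NVIDIA_VENDOR_ID = "0x10de"  # PCI vendor ID assigned to NVIDIA


def nvidia_pci_devices(sysfs_root="/sys/bus/pci/devices"):
    """Return PCI addresses whose sysfs 'vendor' file reports NVIDIA."""
    hits = []
    pattern = os.path.join(sysfs_root, "*", "vendor")
    for vendor_file in sorted(glob.glob(pattern)):
        try:
            with open(vendor_file) as handle:
                vendor = handle.read().strip().lower()
        except OSError:
            continue  # unreadable entry, skip it
        if vendor == NVIDIA_VENDOR_ID:
            hits.append(os.path.basename(os.path.dirname(vendor_file)))
    return hits


def driver_loaded(proc_path="/proc/driver/nvidia/version"):
    """True if the path the TensorFlow log checked is present."""
    return os.path.exists(proc_path)
```

If `nvidia_pci_devices()` comes back empty inside the guest, no amount of driver installation there will help, and the place to look would be the host's passthrough configuration.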

Out of time for today.