With only 128 CUDA cores, it's easy to predict that this board won't be much use for training, but let's try MNIST anyway.
First, as a reference, I run it on a CentOS 7 server with a GT710 (192 CUDA cores), a Xeon E3-1330 v6, and 64 GB of RAM.
For the software environment, I build a TensorFlow environment on Docker:
$ docker run --gpus all -it --rm --name tensorflow-gpu -p 8888:8888 tensorflow/tensorflow:latest-gpu-py3-jupyter
and then, in a Jupyter notebook,
import tensorflow as tf
import time

mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation=tf.nn.relu),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

time_start = time.time()
model.fit(x_train, y_train, batch_size=1024, epochs=20)
model.evaluate(x_test, y_test)
time_end = time.time()
print("time:", (time_end - time_start))
I run a script like this.
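As an aside, one reason the per-epoch times come out so short is the large batch size. A quick check of the numbers taken from the script (not measured, just arithmetic):

```python
# With batch_size=1024, Keras needs ceil(60000 / 1024) gradient updates
# per epoch, i.e. only 59 steps.
import math

samples = 60000
batch_size = 1024
steps_per_epoch = math.ceil(samples / batch_size)
print(steps_per_epoch)  # 59
```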
Train on 60000 samples
Epoch 1/20
60000/60000 [==============================] - 1s 24us/sample - loss: 0.5912 - accuracy: 0.8328
Epoch 2/20
60000/60000 [==============================] - 1s 22us/sample - loss: 0.2486 - accuracy: 0.9295
Epoch 3/20
60000/60000 [==============================] - 1s 21us/sample - loss: 0.1904 - accuracy: 0.9465
Epoch 4/20
60000/60000 [==============================] - 1s 22us/sample - loss: 0.1543 - accuracy: 0.9566
Epoch 5/20
60000/60000 [==============================] - 1s 22us/sample - loss: 0.1296 - accuracy: 0.9642
Epoch 6/20
60000/60000 [==============================] - 1s 22us/sample - loss: 0.1108 - accuracy: 0.9690
Epoch 7/20
60000/60000 [==============================] - 1s 22us/sample - loss: 0.0968 - accuracy: 0.9725
Epoch 8/20
60000/60000 [==============================] - 1s 21us/sample - loss: 0.0848 - accuracy: 0.9763
Epoch 9/20
60000/60000 [==============================] - 1s 22us/sample - loss: 0.0752 - accuracy: 0.9790
Epoch 10/20
60000/60000 [==============================] - 1s 22us/sample - loss: 0.0679 - accuracy: 0.9811
Epoch 11/20
60000/60000 [==============================] - 1s 22us/sample - loss: 0.0601 - accuracy: 0.9837
Epoch 12/20
60000/60000 [==============================] - 1s 22us/sample - loss: 0.0554 - accuracy: 0.9838
Epoch 13/20
60000/60000 [==============================] - 1s 21us/sample - loss: 0.0496 - accuracy: 0.9863
Epoch 14/20
60000/60000 [==============================] - 1s 22us/sample - loss: 0.0452 - accuracy: 0.9876
Epoch 15/20
60000/60000 [==============================] - 1s 22us/sample - loss: 0.0418 - accuracy: 0.9884
Epoch 16/20
60000/60000 [==============================] - 1s 22us/sample - loss: 0.0375 - accuracy: 0.9895
Epoch 17/20
60000/60000 [==============================] - 1s 22us/sample - loss: 0.0346 - accuracy: 0.9906
Epoch 18/20
60000/60000 [==============================] - 1s 22us/sample - loss: 0.0325 - accuracy: 0.9910
Epoch 19/20
60000/60000 [==============================] - 1s 22us/sample - loss: 0.0293 - accuracy: 0.9926
Epoch 20/20
60000/60000 [==============================] - 1s 21us/sample - loss: 0.0279 - accuracy: 0.9928
10000/10000 [==============================] - 1s 57us/sample - loss: 0.0600 - accuracy: 0.9814
time: 27.059566020965576
The first run downloads the dataset at the start, but the timer only starts after that has finished.
27.06 seconds, then.
Doing the same on a Jetson Nano B01, I set up the Docker environment with
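As a sanity check, the ~22 µs/sample rate that Keras reports per epoch should roughly reproduce the measured total:

```python
# Estimate total training time from the per-sample rate in the log.
us_per_sample = 22e-6  # seconds per sample, from the training log
samples = 60000
epochs = 20

estimated = epochs * samples * us_per_sample
print(f"estimated: {estimated:.1f} s")  # ~26.4 s, close to the measured 27.06 s
```

The small gap is accounted for by the final `evaluate()` pass and per-epoch overhead.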
$ docker run -it --rm --runtime nvidia --network host nvcr.io/nvidia/l4t-ml:r32.5.0-py3
and run the same script:
time: 44.97859978675842
44.98 seconds.
The GT710 has 1.5x the cores and turned out to be 1.66x faster — an unsurprising result.
If it can't even keep up with a bottom-of-the-barrel GT710, the Jetson Nano really is no use for training.
I'll let it focus on inference.
The GT710 isn't really usable for training either, so I'd like to get my hands on a somewhat more decent graphics card.
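The ratio above is just the two measurements divided out; a quick back-of-the-envelope check:

```python
# CUDA core ratio vs measured speedup, using the numbers from the two runs.
gt710_cores, nano_cores = 192, 128
gt710_time, nano_time = 27.06, 44.98  # seconds, measured above

core_ratio = gt710_cores / nano_cores
speedup = nano_time / gt710_time
print(f"core ratio: {core_ratio:.2f}")  # 1.50
print(f"speedup:    {speedup:.2f}")     # 1.66
```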