kuro's Notes

96's personal notes

AlphaFold2 follow-up 2.1

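Last time, the Tensorflow versions seemed to be conflicting somehow, and I suspected that installing the environment myself had been the mistake, so I went back to Docker for now. I don't think Docker itself is the cause of the error, and inside Docker the versions shouldn't be able to mismatch, or so I thought: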

・・・・
I0827 15:11:45.201521 139638022334272 run_docker.py:193] I0827 06:11:45.200996 139677094938432 pipeline.py:207] Total number of templates (NB: this can include bad templates and is later filtered to top 4): 20.
I0827 15:11:45.240777 139638022334272 run_docker.py:193] I0827 06:11:45.240107 139677094938432 run_alphafold.py:142] Running model model_1
I0827 15:11:51.314033 139638022334272 run_docker.py:193] I0827 06:11:51.313426 139677094938432 model.py:132] Running predict with shape(feat) = {'aatype': (4, 177), 'residue_index': (4, 177), 'seq_length': (4,), 'template_aatype': (4, 4, 177), 'template_all_atom_masks': (4, 4, 177, 37), 'template_all_atom_positions': (4, 4, 177, 37, 3), 'template_sum_probs': (4, 4, 1), 'is_distillation': (4,), 'seq_mask': (4, 177), 'msa_mask': (4, 508, 177), 'msa_row_mask': (4, 508), 'random_crop_to_size_seed': (4, 2), 'template_mask': (4, 4), 'template_pseudo_beta': (4, 4, 177, 3), 'template_pseudo_beta_mask': (4, 4, 177), 'atom14_atom_exists': (4, 177, 14), 'residx_atom14_to_atom37': (4, 177, 14), 'residx_atom37_to_atom14': (4, 177, 37), 'atom37_atom_exists': (4, 177, 37), 'extra_msa': (4, 5120, 177), 'extra_msa_mask': (4, 5120, 177), 'extra_msa_row_mask': (4, 5120), 'bert_mask': (4, 508, 177), 'true_msa': (4, 508, 177), 'extra_has_deletion': (4, 5120, 177), 'extra_deletion_value': (4, 5120, 177), 'msa_feat': (4, 508, 177, 49), 'target_feat': (4, 177, 22)}
I0827 15:11:52.923670 139638022334272 run_docker.py:193] 2021-08-27 06:11:52.922207: W external/org_tensorflow/tensorflow/stream_executor/gpu/asm_compiler.cc:235] Falling back to the CUDA driver for PTX compilation; ptxas does not support CC 8.6
I0827 15:11:52.923956 139638022334272 run_docker.py:193] 2021-08-27 06:11:52.922279: W external/org_tensorflow/tensorflow/stream_executor/gpu/asm_compiler.cc:238] Used ptxas at ptxas
I0827 15:11:52.979863 139638022334272 run_docker.py:193] 2021-08-27 06:11:52.979194: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:625] failed to get PTX kernel "shift_right_logical_3" from module: CUDA_ERROR_NOT_FOUND: named symbol not found
I0827 15:11:52.990681 139638022334272 run_docker.py:193] 2021-08-27 06:11:52.990152: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2040] Execution of replica 0 failed: Internal: Could not find the corresponding function
I0827 15:11:53.107557 139638022334272 run_docker.py:193] Traceback (most recent call last):
I0827 15:11:53.107832 139638022334272 run_docker.py:193] File "/app/alphafold/run_alphafold.py", line 310, in <module>
I0827 15:11:53.108034 139638022334272 run_docker.py:193] app.run(main)
I0827 15:11:53.108211 139638022334272 run_docker.py:193] File "/opt/conda/lib/python3.7/site-packages/absl/app.py", line 312, in run
I0827 15:11:53.108422 139638022334272 run_docker.py:193] _run_main(main, args)
I0827 15:11:53.108574 139638022334272 run_docker.py:193] File "/opt/conda/lib/python3.7/site-packages/absl/app.py", line 258, in _run_main
I0827 15:11:53.108716 139638022334272 run_docker.py:193] sys.exit(main(argv))
I0827 15:11:53.108857 139638022334272 run_docker.py:193] File "/app/alphafold/run_alphafold.py", line 292, in main
I0827 15:11:53.109008 139638022334272 run_docker.py:193] random_seed=random_seed)
I0827 15:11:53.109150 139638022334272 run_docker.py:193] File "/app/alphafold/run_alphafold.py", line 149, in predict_structure
I0827 15:11:53.109290 139638022334272 run_docker.py:193] prediction_result = model_runner.predict(processed_feature_dict)
I0827 15:11:53.109431 139638022334272 run_docker.py:193] File "/app/alphafold/alphafold/model/model.py", line 133, in predict
I0827 15:11:53.109569 139638022334272 run_docker.py:193] result = self.apply(self.params, jax.random.PRNGKey(0), feat)
I0827 15:11:53.109706 139638022334272 run_docker.py:193] File "/opt/conda/lib/python3.7/site-packages/jax/_src/random.py", line 75, in PRNGKey
I0827 15:11:53.109843 139638022334272 run_docker.py:193] k1 = convert(lax.shift_right_logical(seed_arr, lax._const(seed_arr, 32)))
I0827 15:11:53.109997 139638022334272 run_docker.py:193] File "/opt/conda/lib/python3.7/site-packages/jax/_src/lax/lax.py", line 386, in shift_right_logical
I0827 15:11:53.110138 139638022334272 run_docker.py:193] return shift_right_logical_p.bind(x, y)
I0827 15:11:53.110274 139638022334272 run_docker.py:193] File "/opt/conda/lib/python3.7/site-packages/jax/core.py", line 265, in bind
I0827 15:11:53.110411 139638022334272 run_docker.py:193] out = top_trace.process_primitive(self, tracers, params)
I0827 15:11:53.110548 139638022334272 run_docker.py:193] File "/opt/conda/lib/python3.7/site-packages/jax/core.py", line 610, in process_primitive
I0827 15:11:53.110686 139638022334272 run_docker.py:193] return primitive.impl(*tracers, **params)
I0827 15:11:53.110825 139638022334272 run_docker.py:193] File "/opt/conda/lib/python3.7/site-packages/jax/interpreters/xla.py", line 274, in apply_primitive
I0827 15:11:53.110984 139638022334272 run_docker.py:193] return compiled_fun(*args)
I0827 15:11:53.111125 139638022334272 run_docker.py:193] File "/opt/conda/lib/python3.7/site-packages/jax/interpreters/xla.py", line 390, in _execute_compiled_primitive
I0827 15:11:53.111263 139638022334272 run_docker.py:193] out_bufs = compiled.execute(input_bufs)
I0827 15:11:53.111400 139638022334272 run_docker.py:193] RuntimeError: Internal: Could not find the corresponding function


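No good.
Still, the message looks different from last time (now it is a ptxas warning about CC 8.6, followed by a failure to find the PTX kernel "shift_right_logical_3"), so there is nothing for it but to pick that apart in detail.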
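To pick it apart, it helps to strip the outer run_docker.py prefix and keep only the warnings and errors logged inside the container. This is a minimal sketch, assuming only the two prefix formats seen in the log above. The regexes and the function name `container_problems` are my own, not part of AlphaFold.

```python
import re

# Outer prefix added by run_docker.py on the host, e.g.
# "I0827 15:11:52.979863 139638022334272 run_docker.py:193] "
OUTER = re.compile(r"^[IWEF]\d{4} \d{2}:\d{2}:\d{2}\.\d+ \d+ run_docker\.py:\d+\] ")

# Inner TensorFlow/XLA C++ log line, e.g.
# "2021-08-27 06:11:52.979194: E external/.../cuda_driver.cc:625] message"
CPP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+: ([IWEF]) (\S+)\] (.*)$")


def container_problems(lines):
    """Yield (severity, source, message) for W/E/F lines logged inside the container."""
    for line in lines:
        inner = OUTER.sub("", line.rstrip("\n"))
        match = CPP.match(inner)
        if match and match.group(1) in "WEF":
            yield match.group(1), match.group(2), match.group(3)


# Three lines copied from the log above.
sample = [
    'I0827 15:11:45.240777 139638022334272 run_docker.py:193] I0827 06:11:45.240107 139677094938432 run_alphafold.py:142] Running model model_1',
    'I0827 15:11:52.923670 139638022334272 run_docker.py:193] 2021-08-27 06:11:52.922207: W external/org_tensorflow/tensorflow/stream_executor/gpu/asm_compiler.cc:235] Falling back to the CUDA driver for PTX compilation; ptxas does not support CC 8.6',
    'I0827 15:11:52.979863 139638022334272 run_docker.py:193] 2021-08-27 06:11:52.979194: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:625] failed to get PTX kernel "shift_right_logical_3" from module: CUDA_ERROR_NOT_FOUND: named symbol not found',
]

problems = list(container_problems(sample))
for severity, source, message in problems:
    # Show only the file name and line, not the full source path.
    print(severity, source.rsplit("/", 1)[-1], message)
```

On these sample lines the filter drops the informational line and keeps the ptxas warning and the missing-kernel error, which is where I would start looking.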
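Incidentally, this time I submitted a protein with only a small number of amino acids, but the run still took about 6 hours, hardly any different from before.
So perhaps the bottleneck is not the length of the protein but the time spent reading the databases it is compared against?
That must be why an SSD is recommended. How much does a 3 TB SSD even cost? Even if I stripe 1.5 TB drives.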
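One way to check the disk hypothesis before buying anything would be to time a plain sequential read of one of the database files. This is a rough sketch: the function name `read_throughput` is mine, the demo reads a throwaway temporary file rather than a real database, and a second read of the same file will mostly measure the OS page cache rather than the disk.

```python
import os
import tempfile
import time


def read_throughput(path, chunk_size=16 * 1024 * 1024):
    """Read a file sequentially and return (bytes_read, MiB per second)."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    # Guard against a zero elapsed time on very small files.
    return total, total / (1024 * 1024) / max(elapsed, 1e-9)


# Demo on a throwaway 4 MiB file; point the path at a real database file instead.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0" * (4 * 1024 * 1024))
    demo_path = tmp.name
try:
    total_bytes, mib_per_s = read_throughput(demo_path)
    print(f"read {total_bytes} bytes at {mib_per_s:.1f} MiB/s")
finally:
    os.remove(demo_path)
```

If the number for the real databases comes out far below what the drive is rated for, that would support the idea that the 6 hours is mostly spent on I/O.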

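If I'm going that far, it might be quicker to build a dedicated machine than to keep using this makeshift server...
Motherboard (probably better to pick a reasonably mature platform)
CPU (cheat on Intel with AMD?)
Memory (about 128 GB?)
SSD (3 TB)
Case (reusing an old miniATX case might do)
With those it would be perfect.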

The other thing to do is to rework the script somehow so that it can skip everything up to the point where the error occurs.
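A possible shape for that skip, as a sketch only: cache the result of the slow feature-generation step in a pickle and reuse it on the next run. I am assuming here that the features can be saved as `features.pkl` in the output directory (I believe run_alphafold.py already writes such a file before the model step, but check your own output directory). `load_or_compute_features` and `fake_pipeline` are hypothetical names of mine, and `fake_pipeline` merely stands in for the real data pipeline call.

```python
import os
import pickle
import tempfile


def load_or_compute_features(output_dir, compute_features):
    """Return features cached in output_dir/features.pkl, computing them only if absent."""
    path = os.path.join(output_dir, "features.pkl")
    if os.path.exists(path):
        # Cache hit: skip the slow database search entirely.
        with open(path, "rb") as f:
            return pickle.load(f)
    features = compute_features()
    os.makedirs(output_dir, exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(features, f, protocol=4)
    return features


# Demo with a stand-in for the slow step (hypothetical, not the AlphaFold pipeline).
calls = []


def fake_pipeline():
    calls.append(1)
    return {"aatype": [0, 1, 2]}


with tempfile.TemporaryDirectory() as out_dir:
    first = load_or_compute_features(out_dir, fake_pipeline)
    second = load_or_compute_features(out_dir, fake_pipeline)

print(len(calls))  # the stand-in pipeline ran only once
```

With something like this wrapped around the feature step, each retry of the failing model step would not have to pay for the roughly 6 hours of database searching again.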