kuro's notes

Personal notes by 96

SiGN-SSM

I wanted to do some network analysis, so I set up the environment. It gave me quite a bit of trouble (as always), so here are my notes.

First, download it from HGC.
I grabbed both the macOS binary and the Linux binary.

For the Linux version, extract the archive and run

$ make INSTALLDIR=/path/of/your/choice install

to install it. It looks like it just copies the files, though.

$ signssm --threads 8 -n 10 -d 4-12 -o results sample.tsv

This runs the sample data.
As in the tutorial, the BIC value was lowest at around d=8, so I ran

$ signssm --threads 8 -n 100 -d 8 -o result1 sample.tsv

and moved on to the next step with the three files it produced.

$ sh bin/signssm_plot.sh 3 result1.D008.S001.K.dat sample.tsv output.pdf

When I tried to run this,

bin/signssm_plot.sh: line 109: gnuplot: command not found

came back, so

$ brew install gnuplot
...
==> Installing gnuplot dependency: harfbuzz
sh: warning: setlocale: LC_ALL: cannot change locale (C.UTF-8)
sh: warning: setlocale: LC_ALL: cannot change locale (C.UTF-8)
==> Patching
==> Applying 7c61caa7384e9c3afa0d9237bf6cd303eb5ef3a1.patch
patching file src/meson.build
Hunk #1 succeeded at 397 with fuzz 1 (offset -8 lines).
Hunk #2 succeeded at 406 (offset -8 lines).
Hunk #3 succeeded at 430 (offset -8 lines).
Hunk #4 succeeded at 532 (offset -8 lines).
Hunk #5 succeeded at 618 (offset -8 lines).
==> meson --prefix=/home/linuxbrew/.linuxbrew/Cellar/harfbuzz/2.7.0 --libdir=/home/linuxbrew/.linuxbrew
==> ninja
Last 15 lines from /home/rnaseq/.cache/Homebrew/Logs/harfbuzz/02.ninja:
       ^
cc1plus: some warnings being treated as errors
[147/289] Compiling C++ object src/test-bimap.p/test-bimap.cc.o
/bin/sh: warning: setlocale: LC_ALL: cannot change locale (C.UTF-8)
/bin/sh: warning: setlocale: LC_ALL: cannot change locale (C.UTF-8)
[148/289] Compiling C object test/api/test-buffer.p/test-buffer.c.o
/bin/sh: warning: setlocale: LC_ALL: cannot change locale (C.UTF-8)
/bin/sh: warning: setlocale: LC_ALL: cannot change locale (C.UTF-8)
[149/289] Compiling C++ object src/libharfbuzz-subset.so.0.20700.0.p/hb-subset-plan.cc.o
/bin/sh: warning: setlocale: LC_ALL: cannot change locale (C.UTF-8)
/bin/sh: warning: setlocale: LC_ALL: cannot change locale (C.UTF-8)
[150/289] Compiling C++ object src/libharfbuzz-subset.so.0.20700.0.p/hb-subset.cc.o
/bin/sh: warning: setlocale: LC_ALL: cannot change locale (C.UTF-8)
/bin/sh: warning: setlocale: LC_ALL: cannot change locale (C.UTF-8)
ninja: build stopped: subcommand failed.
sh: warning: setlocale: LC_ALL: cannot change locale (C.UTF-8)

READ THIS: https://docs.brew.sh/Troubleshooting

These open issues may also help:
Harfbuzz fails to build on Ubuntu 20.04 (WSL 2) https://github.com/Homebrew/linuxbrew-core/issues/20888

A flood of errors, and it would not install.
I tried various things to get around this, but never resolved it.
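A small pre-flight check would surface a missing gnuplot before the plot script reaches line 109. This is only a minimal sketch. `require_cmd` is a hypothetical helper name, not something that ships with SiGN-SSM.

```shell
# Hypothetical helper (not part of SiGN-SSM): fail early when a required
# command is not on PATH.
require_cmd() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "missing command: $1" >&2
        return 1
    fi
}

# Usage with the plot step from above (only runs the script if gnuplot exists):
#   require_cmd gnuplot && sh bin/signssm_plot.sh 3 result1.D008.S001.K.dat sample.tsv output.pdf
```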

So I gave up on Linux for the time being and moved over to the Mac.

$ sh signssm_plot.sh 3 best.D008.S001.K.dat sample.tsv output.pdf
signssm_plot.sh: line 109: gnuplot: command not found

Again? Is it going to be the same story after all?

$ brew install gnuplot
Updating Homebrew...
...
==> Installing gnuplot dependency: python@3.8
==> Pouring python@3.8-3.8.5.mojave.bottle.tar.gz
Error: The `brew link` step did not complete successfully
The formula built, but is not symlinked into /usr/local
Could not symlink bin/2to3
Target /usr/local/bin/2to3
already exists. You may want to remove it:
  rm '/usr/local/bin/2to3'

To force the link and overwrite all conflicting files:
  brew link --overwrite python@3.8

To list all files that would be deleted:
  brew link --overwrite --dry-run python@3.8

Possible conflicting files are:
/usr/local/bin/2to3 -> /Library/Frameworks/Python.framework/Versions/2.7/bin/2to3
/usr/local/bin/idle3 -> /Library/Frameworks/Python.framework/Versions/3.6/bin/idle3
/usr/local/bin/pydoc3 -> /Library/Frameworks/Python.framework/Versions/3.6/bin/pydoc3
/usr/local/bin/python3 -> /Library/Frameworks/Python.framework/Versions/3.6/bin/python3
/usr/local/bin/python3-config -> /Library/Frameworks/Python.framework/Versions/3.6/bin/python3-config
Error: Permission denied @ dir_s_mkdir - /usr/local/Frameworks

Again, part 2.
Following the error message, I ran

$ rm '/usr/local/bin/2to3'

and this time gnuplot installed without a hitch.
So now, finally,

$ sh signssm_plot.sh 3 result1.D008.S001.K.dat sample.tsv output.pdf

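It finished without problems and produced the PDF file. This run covers the profiles of 100 genes, so it comes to 100 pages of PDF. Do this for a whole genome and it would be 30,000 pages. Nobody can look through all of that.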
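Going back to the symlink conflict for a moment: before deleting a file that brew complains about, it may be worth looking at what it actually points to. A minimal sketch follows. `describe_link` is a hypothetical helper name. The `brew link --overwrite --dry-run` line is the one suggested in the brew output above.

```shell
# Hypothetical helper: report what a path is before removing it.
describe_link() {
    if [ -L "$1" ]; then
        # Symbolic link: show its target
        echo "$1 -> $(readlink "$1")"
    elif [ -e "$1" ]; then
        echo "$1 is a regular file or directory"
    else
        echo "$1 does not exist"
    fi
}

# Usage for the conflict above:
#   describe_link /usr/local/bin/2to3
#   brew link --overwrite --dry-run python@3.8
```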

So, the next thing to do is

then let's try permutation test on the HGC supercomputer system

Wait, what? From here on it has to be a supercomputer?


Incidentally, when I tried to run the signssm command on the Mac,

$ ./signssm
dyld: Library not loaded: /usr/local/lib/libmpi.1.dylib
  Referenced from: /Volumes/SSD/kkuro/local/bin/./signssm
  Reason: image not found
Abort trap: 6

as expected, it does not run as is.
It needs libmpi first, so I install open-mpi.

$ brew install openmpi
$ ./signssm
dyld: Library not loaded: /usr/local/lib/libmpi.1.dylib
  Referenced from: /Volumes/SSD/kkuro/local/bin/./signssm
  Reason: image not found
Abort trap: 6

No change at all.
It is a brute-force fix, but

$ ln -s /usr/local/Cellar/open-mpi/4.0.4_1/lib/libmpi.40.dylib /usr/local/lib/libmpi.1.dylib
$ ./signssm
dyld: Library not loaded: /usr/local/lib/gcc/x86_64-apple-darwin13.3.0/4.9.0/libgomp.1.dylib
  Referenced from: /Volumes/SSD/kkuro/local/bin/./signssm
  Reason: image not found
Abort trap: 6

Oh, come on.

$ mkdir /usr/local/lib/gcc/x86_64-apple-darwin13.3.0
$ mkdir /usr/local/lib/gcc/x86_64-apple-darwin13.3.0/4.9.0/
$ ln -s /usr/local/opt/gcc/lib/gcc/10/libgomp.dylib /usr/local/lib/gcc/x86_64-apple-darwin13.3.0/4.9.0/libgomp.1.dylib

How about now?

$ ./signssm
SiGN-SSM  version 1.2.1 (Fri Dec 19 17:15:38 2014 JST)
  Copyright (C) 2010-2014  SiGN Project Members.

! ABSOLUTELY NO WARRANTY.  SEE LICENSE FOR DETAILS.  Visit http://sign.hgc.jp/

Single process mode.
Reading input data from the standard input.

signssm(43237,0x10e02b5c0) malloc: can't allocate region
*** mach_vm_map(size=18446603546223603712) failed (error code=3)
signssm(43237,0x10e02b5c0) malloc: *** set a breakpoint in malloc_error_break to debug
[kkuro-Mac-mini-2018:43237] *** Process received signal ***
[kkuro-Mac-mini-2018:43237] Signal: Segmentation fault: 11 (11)
[kkuro-Mac-mini-2018:43237] Signal code: Address not mapped (1)
[kkuro-Mac-mini-2018:43237] Failing at address: 0x0
[kkuro-Mac-mini-2018:43237] [ 0] 0   libsystem_platform.dylib            0x00007fff5cca3b5d _sigtramp + 29
[kkuro-Mac-mini-2018:43237] [ 1] 0   ???                                 0x0000000000000000 0x0 + 0
[kkuro-Mac-mini-2018:43237] [ 2] 0   signssm                             0x0000000103351e26 SSMData_read_fp + 438
[kkuro-Mac-mini-2018:43237] *** End of error message ***
Segmentation fault: 11

At least it seems to start up now.
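The dyld errors above showed up one library at a time. A rough way to see all missing libraries at once is to list what the binary links against and check which paths are absent. This is a sketch under assumptions: `missing_libs` is a hypothetical helper, and entries such as `@rpath/...` are not real file paths, so they will always be reported as missing.

```shell
# Hypothetical helper: read library paths on stdin (one per line) and
# print the ones that do not exist on disk.
missing_libs() {
    while IFS= read -r lib; do
        [ -n "$lib" ] || continue      # skip empty lines
        [ -e "$lib" ] || echo "$lib"   # report paths that are absent
    done
}

# macOS usage: otool -L prints the binary name on the first line and one
# linked library per following line, with the path in the first column.
#   otool -L ./signssm | awk 'NR > 1 { print $1 }' | missing_libs
```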

Since Open MPI is installed now anyway, I try the whole thing from the beginning on the Mac.

$ mpirun ./signssm -d 8 -n 100 -o result sample.tsv

Oh, this is pretty fast.

SiGN-SSM  version 1.2.1 (Fri Dec 19 17:15:38 2014 JST)
  Copyright (C) 2010-2014  SiGN Project Members.

! ABSOLUTELY NO WARRANTY.  SEE LICENSE FOR DETAILS.  Visit http://sign.hgc.jp/

MPI mode: This is the root process.
Reading input data file: sample.tsv
Applying mean shift for the input data set.
Broadcasting the input data.
  Done.
THIS IS THE ROOT PROCESS [SERVER]
In total, 1 (set) x 100 (iteration) x 1 (dimension) = 100 jobs will be dispatched.
Total sets = 1
proc_first_dispatch: rank=1, set_id=1
proc_first_dispatch: rank=2, set_id=1
proc_first_dispatch: rank=3, set_id=1
proc_first_dispatch: rank=4, set_id=1
proc_first_dispatch: rank=5, set_id=1
After the first dispatch: iter=5, set=0, dim_idx=0, set_id=1.
Waiting a job request...
Request received: rank=3, likelihood=1272.128966 (set ID=1)
  Best likelihood updated for set ID=1: 1272.128966 count: 5
Job Dispatched: rank=3, setID1=0, setID2=1, dim=8
  iteration=5, set=0
  6/100

...

Waiting a job request...
Request received: rank=3, likelihood=2529.649199 (set ID=1)
Job Dispatched: rank=3, setID1=0, setID2=0, dim=-2
  iteration=0, set=0
Waiting a job request...
Request received: rank=4, likelihood=1243.279744 (set ID=1)
------------------------
 Set finished. setID=1.
------------------------
  |set_ranks[recv_id]|=5
  Best likelihood: 1633206613492228683064330596324346188692632137464403681452508477558896480390805133713951290953826731815033386049280101816652249859497132032.000000 @ rank: 2  count: 100
Job Dispatched: rank=4, setID1=-1, setID2=0, dim=-2
  iteration=0, set=0
All jobs finished.  Sending the final order.
For Client:1, num = 0
For Client:2, num = 1
For Client:3, num = 0
For Client:4, num = 0
For Client:5, num = 0
All time: 0 day 00 hour 01 min 49 sec
X-ALL-TIME:	109.028624

For comparison, on Linux (XEON E3-1330v6) with 7 threads it took

All time: 0 day 00 hour 05 min 58 sec
X-ALL-TIME:	357.768202

so MPI on the i7 (3.2GHz, 6 cores) came out about 3 times faster.
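The "about 3 times" figure is just the ratio of the two X-ALL-TIME values. A minimal sketch of the arithmetic follows. `all_time_secs` and `speedup` are hypothetical helper names. `all_time_secs` assumes the signssm output was saved to a log file, which the runs above did not actually do.

```shell
# Hypothetical helper: extract the X-ALL-TIME value (seconds) from a saved log.
all_time_secs() {
    awk '/^X-ALL-TIME:/ { print $2 }' "$1"
}

# Hypothetical helper: ratio of a slower time to a faster time, two decimals.
speedup() {
    awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.2f\n", slow / fast }'
}

# Linux 7 threads vs. Mac MPI, using the numbers reported above
speedup 357.768202 109.028624   # prints 3.28
```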

$ ./signssm --perm result.D008.S001.A.dat -o perm/result sample.tsv
$ ./signssm --ssmperm prefix=perm/result,ssm=result.D008.S001.A.dat,ed=1000,th=0.05 -o final.txt sample.tsv

and with that I made it all the way to the end.
signssm --perm
All time: 0 day 00 hour 12 min 26 sec
X-ALL-TIME: 746.334618

signssm --ssmperm
All time: 0 day 00 hour 00 min 00 sec
X-ALL-TIME: 0.267522

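That is roughly how long they take. These two steps do not appear to support MPI; under mpirun they did not process correctly.
When actually doing this on a whole genome, this second step will probably be the bottleneck. This is where I would really like to throw parallel processing on a cluster at it.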

It felt like things were going well, but looking at the resulting data, MPI apparently did not run correctly. Hmm.
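One possible hint is already in the Mac log above: the final "Best likelihood" is an astronomically large number, while the per-job likelihoods are in the low thousands. A rough sketch for flagging such lines in a saved log follows. `suspicious_likelihood` is a hypothetical helper, the log having been saved to a file is an assumption, and the default threshold of 1000000 is arbitrary.

```shell
# Hypothetical check: print "Best likelihood: <value>" lines whose value
# exceeds a limit. The default limit (1000000) is an arbitrary assumption.
suspicious_likelihood() {
    awk -v limit="${2:-1000000}" \
        '/Best likelihood: / && ($3 + 0) > (limit + 0)' "$1"
}

# Usage (mac_mpi.log is a hypothetical file holding the mpirun output):
#   suspicious_likelihood mac_mpi.log
```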

On the Linux side, by the way, running with MPI gave exactly the same results as the multithreaded run. That one is working properly.
The time, however, was
All time: 0 day 00 hour 34 min 28 sec
X-ALL-TIME: 2067.862958
which is ridiculously long. What is going on?
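For the "exactly the same results" check between the threaded and MPI runs, a byte-for-byte comparison of the output files is enough. A minimal sketch follows, assuming each run wrote its files into its own directory. `compare_runs` and the directory names are hypothetical.

```shell
# Hypothetical helper: compare every file in directory $1 with the file of
# the same name in directory $2; list the ones that differ or are missing.
compare_runs() {
    cr_status=0
    for f in "$1"/*; do
        [ -e "$f" ] || continue          # empty directory: nothing to compare
        name=$(basename "$f")
        if ! cmp -s "$f" "$2/$name"; then
            echo "differs: $name"
            cr_status=1
        fi
    done
    return "$cr_status"
}

# Usage (directory names are made up for illustration):
#   compare_runs out_threads out_mpi && echo "identical"
```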