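The SiGN-SSM distributed as a Linux binary turned out to be "rel 1.0.2. Multi-thread supported, MPI support not enabled".
So that is why specifying MPI left even multithreading idle and made the run painfully slow.
Also, the documentation says: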
You can also use signssm to do this (rel 1.10.0 or later):
$ ~tamada/sign/signssm --ssmperm prefix=perm/result,ssm=result.D008.S001.A.dat,ed=1000,th=0.05 -o final.txt sample.tsv
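This means the command cannot be used with 1.0.2. That step finishes almost instantly, though, so running it after moving the data over to the Mac is good enough.
(Incidentally, there is no version 1.10.0. It is probably a typo for 1.1.0, although 1.0.0 is also possible.)
Anyway, let's try compiling it.
The build apparently requires the BLAS and LAPACK libraries.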
$ wget https://github.com/Reference-LAPACK/lapack/archive/v3.9.0.tar.gz
This downloads LAPACK; BLAS is apparently bundled with it.
After extracting the archive, enter the directory and run
$ cp make.inc.example make.inc
$ make blaslib
and then
$ make lapacklib
Once the build finishes, copy the resulting libraries into place:
$ sudo cp librefblas.a /usr/lib/libblas.a
$ sudo cp liblapack.a /usr/lib/liblapack.a
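Before going further, it may be worth confirming that both archives really ended up in /usr/lib under the names used above. The following is only a minimal sketch: the function name check_static_libs is made up for this note and is not part of LAPACK or SiGN-SSM.

```shell
#!/bin/sh
# Hypothetical helper (not part of SiGN-SSM or LAPACK): report whether the
# two static archives copied above exist in the given directory.
check_static_libs() {
    dir=$1
    missing=0
    for lib in libblas.a liblapack.a; do
        if [ -f "$dir/$lib" ]; then
            echo "found:   $dir/$lib"
        else
            echo "missing: $dir/$lib"
            missing=1
        fi
    done
    # Exit status 0 only when both archives are present.
    return "$missing"
}

# Check the directory used above; prints one line per archive.
check_static_libs /usr/lib || echo "copy the archives before building SiGN-SSM"
```

The function returns a non-zero status when either file is missing, so it can also be used as a guard in front of a later build command.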
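Next, edit the SiGN-SSM build files as follows.

# Prepare "make.inc" for your own environment,
# or comment-out below and uncomment one of the followings.
# (note: comment out the next line)
#include make.inc
# Linux gcc + Netlib LAPACK. (Set the location of LAPACK in make.inc.gcc_lapack)
# (note: uncomment the next line)
include make.inc.gcc_lapack

# Netlib LAPACK DIRECTORY
# (note: rewritten to point at /usr/lib)
LAPACK=/usr/lib
# Platform specified in make.inc of LAPACK
# (note: commented out)
#PLAT=_LINUX
# (note: uncommented)
USE_MPI=1
#########################################################
# NOT NEED TO EDIT BELOW
# (note: changed lapack to liblapack)
LAPACK_LIB=$(LAPACK)/liblapack$(PLAT).a
# (note: changed blas to libblas)
LAPACK_LIB+=$(LAPACK)/libblas$(PLAT).a

The notes sit on their own lines because make keeps any whitespace between a variable's value and a trailing # comment. With these edits in place, run
$ make
This produces a binary named signssm; copy it somewhere convenient.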
As a first try:
$ bin/signssm -o result1 --threads 8 sample.tsv
SiGN-SSM version 1.2.1 (Fri Dec 19 17:15:38 2014 JST)
Copyright (C) 2010-2014 SiGN Project Members.
! ABSOLUTELY NO WARRANTY. SEE LICENSE FOR DETAILS.
Visit http://sign.hgc.jp/
Single process mode.
Reading input data file: sample.tsv
Applying mean shift for the input data set.
*******************************
** Single Execution Mode **
*******************************
--- INPUT DATA ---------
observed samples = 21
objects = 100
time points = 48
replicates = 3
------------------------
Max dimension = 4
========================================
= Start the loop for dim=4
Set 1...
Estimation failed. Retrying (1)...
Estimation failed. Retrying (2)...
1: Loops=5503, Likelihood=1829.617679
2: Loops=2723, Likelihood=1829.093921
Estimation failed. Retrying (1)...
3: Loops=4497, Likelihood=1829.805601
4: Loops=2908, Likelihood=1830.622691
5: Loops=28717, Likelihood=1829.705602
6: Loops=14724, Likelihood=1829.985808
・・・・・
95: Loops=2908, Likelihood=1829.762141
96: Loops=4126, Likelihood=1830.554303
97: Loops=4996, Likelihood=1830.595209
98: Loops=2147, Likelihood=1829.749469
Estimation failed. Retrying (1)...
99: Loops=4746, Likelihood=1829.645741
100: Loops=3037, Likelihood=1829.483204
Finished: dim=4, likelihood=1831.571195
0 day 00 hour 02 min 43 sec
X-SINGLE-TIME: 163.107068
Huh? Forget MPI not kicking in, is even multithreading not running?
For comparison, running the same job with the 1.0.2 binary gives
$ bin/signssm0 -o result0 --threads 8 sample.tsv
SiGN-SSM version 1.0.2 (Wed Feb 23 11:44:59 2011 JST)
Copyright (C) 2010 HGC, IMS, The University of Tokyo.
! ABSOLUTELY NO WARRANTY. SEE LICENSE FOR DETAILS.
Visit http://sign.hgc.jp/
Single process mode.
Reading input data file: sample.tsv
Applying mean shift for the input data set.
*******************************
** Thread Execution Mode **
*******************************
Number of threads = 8
--- INPUT DATA ---------
observed samples = 21
objects = 100
time points = 48
replicates = 3
------------------------
Max dimemsion = 4
Total single tasks = 100
{4} set=1, dim=4, likelihood=1830.679933, loops=198, finished=1
Best likelihood updated for set id = 1.
{6} set=1, dim=4, likelihood=1829.956352, loops=2757, finished=2
{3} set=1, dim=4, likelihood=1830.440368, loops=2997, finished=3
{1} set=1, dim=4, likelihood=1831.018207, loops=3138, finished=4
Best likelihood updated for set id = 1.
{5} set=1, dim=4, likelihood=1829.589914, loops=3391, finished=5
・・・・・
{6} set=1, dim=4, likelihood=1830.725075, loops=4010, finished=96
{0} set=1, dim=4, likelihood=1829.636737, loops=6723, finished=97
{2} set=1, dim=4, likelihood=1829.531546, loops=8539, finished=98
{3} set=1, dim=4, likelihood=1829.669849, loops=33276, finished=99
{5} set=1, dim=4, likelihood=1829.672068, loops=20355, finished=100
Outputting the calculated state variables: result0.D004.S001.K.dat
Outputting the result into a single file: result0.D004.S001.A.dat
Outputting the summary file: result0.D004.S001.B.dat
All time: 0 day 00 hour 01 min 04 sec
X-ALL-TIME: 63.882370
No MPI, but multithreading is working and the speed is decent.
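The quickest tell in these logs is the banner: the 1.0.2 run prints "Thread Execution Mode" followed by "Number of threads = 8", while the self-compiled 1.2.1 run prints "Single Execution Mode". Here is a small sketch of checking that automatically, assuming the output has been saved to a file. The function name execution_mode is made up, and it only recognizes the two banners that appear in the runs above.

```shell
#!/bin/sh
# Hypothetical helper: classify a saved SiGN-SSM log by the banner it printed.
# Only the two banners seen in the runs above are recognized.
execution_mode() {
    if grep -q 'Thread Execution Mode' "$1"; then
        echo thread
    elif grep -q 'Single Execution Mode' "$1"; then
        echo single
    else
        echo unknown
    fi
}

# Demo on a short excerpt of the 1.0.2 output shown above.
log=$(mktemp)
cat > "$log" <<'EOF'
*******************************
** Thread Execution Mode **
*******************************
Number of threads = 8
EOF
execution_mode "$log"
rm -f "$log"
```

For a real run, the output could be captured with something like `bin/signssm -o result1 --threads 8 sample.tsv | tee run.log` (run.log being an arbitrary file name) and then passed to the function.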
So then, recompiling with MPI disabled gives
$ bin/signssm2 -o result2 --threads 8 sample.tsv
SiGN-SSM version 1.2.1 (Fri Dec 19 17:15:38 2014 JST)
Copyright (C) 2010-2014 SiGN Project Members.
! ABSOLUTELY NO WARRANTY. SEE LICENSE FOR DETAILS.
Visit http://sign.hgc.jp/
Single process mode.
Reading input data file: sample.tsv
Applying mean shift for the input data set.
*******************************
** Single Execution Mode **
*******************************
--- INPUT DATA ---------
observed samples = 21
objects = 100
time points = 48
replicates = 3
------------------------
Max dimension = 4
========================================
= Start the loop for dim=4
Set 1...
Estimation failed. Retrying (1)...
Estimation failed. Retrying (2)...
1: Loops=5503, Likelihood=1829.617679
2: Loops=2723, Likelihood=1829.093921
Estimation failed. Retrying (1)...
3: Loops=4497, Likelihood=1829.805601
4: Loops=2908, Likelihood=1830.622691
5: Loops=28717, Likelihood=1829.705602
・・・・・
95: Loops=2908, Likelihood=1829.762141
96: Loops=4126, Likelihood=1830.554303
97: Loops=4996, Likelihood=1830.595209
98: Loops=2147, Likelihood=1829.749469
Estimation failed. Retrying (1)...
99: Loops=4746, Likelihood=1829.645741
100: Loops=3037, Likelihood=1829.483204
Finished: dim=4, likelihood=1831.571195
0 day 00 hour 02 min 45 sec
X-SINGLE-TIME: 164.772890
Outputting the calculated state variables: result2.D004.S001.K.dat
Outputting the p-values: result2.D004.S001.P.dat
Outputting the meta-analysis of p-values: result2.D004.S001.m.dat
Outputting the result into a single file: result2.D004.S001.A.dat
Outputting the summary file: result2.D004.S001.B.dat
Going to exit.
All time: 0 day 00 hour 02 min 45 sec
X-ALL-TIME: 164.900692
Again it does not run multithreaded, and the speed is unchanged.
What is going on?
I had forgotten the command for running under MPI.
$ mpirun ./bin/signssm1 -o result1 sample.tsv
SiGN-SSM version 1.2.1 (Fri Dec 19 17:15:38 2014 JST)
Copyright (C) 2010-2014 SiGN Project Members.
! ABSOLUTELY NO WARRANTY. SEE LICENSE FOR DETAILS.
Visit http://sign.hgc.jp/
Single process mode.
Reading input data file: sample.tsv
Applying mean shift for the input data set.
*******************************
** Single Execution Mode **
*******************************
--- INPUT DATA ---------
observed samples = 21
objects = 100
time points = 48
replicates = 3
------------------------
Max dimension = 4
========================================
= Start the loop for dim=4
Set 1...
SiGN-SSM version 1.2.1 (Fri Dec 19 17:15:38 2014 JST)
Copyright (C) 2010-2014 SiGN Project Members.
! ABSOLUTELY NO WARRANTY. SEE LICENSE FOR DETAILS.
Visit http://sign.hgc.jp/
Single process mode.
Reading input data file: sample.tsv
Applying mean shift for the input data set.
*******************************
** Single Execution Mode **
*******************************
--- INPUT DATA ---------
observed samples = 21
objects = 100
time points = 48
replicates = 3
------------------------
Max dimension = 4
========================================
= Start the loop for dim=4
Set 1...
SiGN-SSM version 1.2.1 (Fri Dec 19 17:15:38 2014 JST)
Copyright (C) 2010-2014 SiGN Project Members.
! ABSOLUTELY NO WARRANTY. SEE LICENSE FOR DETAILS.
Visit http://sign.hgc.jp/
Single process mode.
Reading input data file: sample.tsv
Applying mean shift for the input data set.
*******************************
** Single Execution Mode **
*******************************
--- INPUT DATA ---------
observed samples = 21
objects = 100
time points = 48
replicates = 3
------------------------
Max dimension = 4
SiGN-SSM version 1.2.1 (Fri Dec 19 17:15:38 2014 JST)
Copyright (C) 2010-2014 SiGN Project Members.
! ABSOLUTELY NO WARRANTY. SEE LICENSE FOR DETAILS.
Visit http://sign.hgc.jp/
Single process mode.
Reading input data file: sample.tsv
========================================
= Start the loop for dim=4
Set 1...
Applying mean shift for the input data set.
*******************************
** Single Execution Mode **
*******************************
--- INPUT DATA ---------
observed samples = 21
objects = 100
time points = 48
replicates = 3
------------------------
Max dimension = 4
========================================
= Start the loop for dim=4
Set 1...
Estimation failed. Retrying (1)...
Estimation failed. Retrying (1)...
Estimation failed. Retrying (1)...
Estimation failed. Retrying (1)...
Estimation failed. Retrying (2)...
Estimation failed. Retrying (2)...
・・・・・
95: Loops=2908, Likelihood=1829.762141
96: Loops=4126, Likelihood=1830.554303
97: Loops=4996, Likelihood=1830.595209
98: Loops=2147, Likelihood=1829.749469
Estimation failed. Retrying (1)...
99: Loops=4746, Likelihood=1829.645741
100: Loops=3037, Likelihood=1829.483204
Finished: dim=4, likelihood=1831.571195
0 day 00 hour 03 min 47 sec
X-SINGLE-TIME: 226.878186
Outputting the calculated state variables: result1.D004.S001.K.dat
Outputting the p-values: result1.D004.S001.P.dat
Outputting the meta-analysis of p-values: result1.D004.S001.m.dat
Outputting the result into a single file: result1.D004.S001.A.dat
Outputting the summary file: result1.D004.S001.B.dat
Going to exit.
All time: 0 day 00 hour 03 min 47 sec
X-ALL-TIME: 227.000185
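Well, it is indeed running under MPI, but it is not any faster. If anything, it is slower.
Maybe this is simply software that should not be run under MPI. Is that why the binary is distributed with multithreading support only? The self-compiled build cannot even manage that, so it is completely pointless.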
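One detail in the mpirun output deserves a second look: the version banner and "Single process mode." each appear four times. That would be consistent with mpirun starting several independent copies of the same job rather than splitting the work, though the log alone does not prove it. The sketch below counts the banner lines in a saved log and compares the X-ALL-TIME values from the runs above. The helper names count_banners and time_ratio are made up for this note.

```shell
#!/bin/sh
# Hypothetical helpers for reading saved SiGN-SSM logs.

# Number of lines in a log file that contain the startup banner.
count_banners() {
    grep -c 'SiGN-SSM version' "$1"
}

# Ratio of two wall-clock times in seconds, printed to two decimals.
time_ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# X-ALL-TIME values copied from the runs above.
t_102_threads=63.882370    # 1.0.2 binary, --threads 8
t_121_single=164.900692    # 1.2.1 compiled without MPI
t_121_mpirun=227.000185    # 1.2.1 compiled with MPI, started via mpirun

echo "1.2.1 single vs 1.0.2 threads: $(time_ratio "$t_121_single" "$t_102_threads")x"
echo "1.2.1 mpirun vs 1.0.2 threads: $(time_ratio "$t_121_mpirun" "$t_102_threads")x"
```

On these numbers the self-compiled build takes roughly 2.6 times as long as the threaded 1.0.2 binary, and the mpirun run roughly 3.6 times as long. A banner count above one for a saved mpirun log would be a hint, not proof, that each process repeated the whole computation.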
Still, 1.0.2 cannot use the options for the second and later steps, --perm and --ssmperm, so the workflow cannot be completed on Linux alone. I suppose I will keep at this a little longer.