kuro's notes

96's personal notes

Compiling SiGN-SSM from source

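The SiGN-SSM distributed as a Linux binary turned out to be "rel 1.0.2. Multi-thread supported, MPI support not enabled".
So that is why, when I asked for MPI, not even multithreading kicked in and it ran painfully slowly.
Also, there is this note: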
You can also use signssm to do this (rel 1.10.0 or later):

$ ~tamada/sign/signssm --ssmperm prefix=perm/result,ssm=result.D008.S001.A.dat,ed=1000,th=0.05 -o final.txt sample.tsv

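so this command cannot be used with 1.0.2. That step finishes in no time, though, so running it after moving the files over to the Mac is good enough.
(By the way, there is no such version as 1.10.0. I suspect it is a typo for 1.1.0, though it could also be 1.0.0.)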
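To see which release a given binary actually is, the version can be read from the banner it prints at startup. Below is a minimal sketch, assuming GNU sort (for -V). banner_version and version_at_least are hypothetical helper names, and the banner format is the one that appears in the logs later in this post.

```shell
# Hypothetical helper: read a saved log on stdin and print the version from
# a banner line such as
#   "SiGN-SSM  version 1.2.1 (Fri Dec 19 17:15:38 2014 JST)".
banner_version() {
  sed -n 's/^SiGN-SSM  *version  *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: succeed when version $1 is the same as or newer than
# version $2. Relies on "sort -V" (version sort) from GNU coreutils.
version_at_least() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Example (not run here): is the binary that wrote run.log new enough?
#   version_at_least "$(banner_version < run.log)" 1.1.0 && echo "ok"
```

Taken literally, 1.10.0 sorts after 1.2.1, so a check against 1.10.0 would reject even release 1.2.1, while a check against 1.1.0 accepts it.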

Anyway, let's try compiling.
Apparently the BLAS and LAPACK libraries are required.

$ wget https://github.com/Reference-LAPACK/lapack/archive/v3.9.0.tar.gz

This downloads LAPACK. BLAS is apparently bundled with LAPACK.
After extracting the archive, enter the directory and run

$ cp make.inc.example make.inc
$ make blaslib

and then

$ make lapacklib

to build. Once that finishes,

$ sudo cp librefblas.a /usr/lib/libblas.a
$ sudo cp liblapack.a /usr/lib/liblapack.a

copies the libraries into place.
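The LAPACK steps above can be collected into one function. This is only a sketch: run and build_lapack are hypothetical names, and the archive file name (v3.9.0.tar.gz) and extracted directory name (lapack-3.9.0) are assumptions about what wget and tar produce for this URL. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
# run: execute a command, or only print it when DRY_RUN=1 (hypothetical helper).
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# build_lapack: download, build and install the reference BLAS and LAPACK
# static libraries, stopping at the first step that fails.
build_lapack() {
  ver=3.9.0
  run wget "https://github.com/Reference-LAPACK/lapack/archive/v${ver}.tar.gz" &&
  run tar xzf "v${ver}.tar.gz" &&          # assumed archive name
  run cd "lapack-${ver}" &&                # assumed directory name
  run cp make.inc.example make.inc &&
  run make blaslib &&
  run make lapacklib &&
  run sudo cp librefblas.a /usr/lib/libblas.a &&
  run sudo cp liblapack.a /usr/lib/liblapack.a
}

# Example (not run here): preview the steps without touching anything.
#   DRY_RUN=1 build_lapack
```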

Meanwhile, on the SiGN-SSM side, I edited the Makefile as follows:

# Prepare "make.inc" for your own environment,
# or comment-out below and uncomment one of the followings.
#include make.inc # commented out this line

# Linux gcc + Netlib LAPACK. (Set the location of LAPACK in make.inc.gcc_lapack)
include make.inc.gcc_lapack # uncommented this line

and changed make.inc.gcc_lapack like this:

# Netlib LAPACK DIRECTORY
# (changed to /usr/lib)
LAPACK=/usr/lib

# Platform specified in make.inc of LAPACK
# (commented out)
#PLAT=_LINUX

# (uncommented)
USE_MPI=1

#########################################################
# NOT NEED TO EDIT BELOW
# (changed lapack to liblapack and blas to libblas)
LAPACK_LIB=$(LAPACK)/liblapack$(PLAT).a
LAPACK_LIB+=$(LAPACK)/libblas$(PLAT).a
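The two LAPACK_LIB lines expand to $(LAPACK)/liblapack$(PLAT).a and $(LAPACK)/libblas$(PLAT).a, so both files have to exist by the time make reaches the link step. Here is a small sketch for checking that beforehand. missing_archives is a hypothetical helper, not part of SiGN-SSM.

```shell
# Print the path of every archive that the LAPACK_LIB lines refer to but that
# does not exist. $1 is the LAPACK directory; $2 mirrors the optional PLAT
# suffix (empty when PLAT is commented out, as in the configuration here).
missing_archives() {
  dir=$1
  plat=${2:-}
  for name in liblapack libblas; do
    f="$dir/$name$plat.a"
    [ -f "$f" ] || echo "$f"
  done
}

# Example (not run here): no output means both archives are in place.
#   missing_archives /usr/lib
```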

With those changes in place, run

$ make

and a binary called signssm is produced; copy it somewhere convenient.
As a first test:

$ bin/signssm -o result1 --threads 8 sample.tsv 
SiGN-SSM  version 1.2.1 (Fri Dec 19 17:15:38 2014 JST)
  Copyright (C) 2010-2014  SiGN Project Members.

! ABSOLUTELY NO WARRANTY.  SEE LICENSE FOR DETAILS.  Visit http://sign.hgc.jp/

Single process mode.
Reading input data file: sample.tsv
Applying mean shift for the input data set.
*******************************
**   Single Execution Mode   **
*******************************
--- INPUT DATA ---------
 observed samples = 21
 objects = 100
 time points = 48
 replicates = 3
------------------------
Max dimension = 4
========================================
= Start the loop for dim=4
Set 1...
Estimation failed.  Retrying (1)...
Estimation failed.  Retrying (2)...
1: Loops=5503, Likelihood=1829.617679
2: Loops=2723, Likelihood=1829.093921
Estimation failed.  Retrying (1)...
3: Loops=4497, Likelihood=1829.805601
4: Loops=2908, Likelihood=1830.622691
5: Loops=28717, Likelihood=1829.705602
6: Loops=14724, Likelihood=1829.985808
・・・・・
95: Loops=2908, Likelihood=1829.762141
96: Loops=4126, Likelihood=1830.554303
97: Loops=4996, Likelihood=1830.595209
98: Loops=2147, Likelihood=1829.749469
Estimation failed.  Retrying (1)...
99: Loops=4746, Likelihood=1829.645741
100: Loops=3037, Likelihood=1829.483204
Finished: dim=4, likelihood=1831.571195
0 day 00 hour 02 min 43 sec
X-SINGLE-TIME:	163.107068

Huh? Never mind MPI not kicking in, is even multithreading not running?
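One thing worth checking at this point is whether the freshly built binary was linked against an MPI library at all. This is a rough sketch, assuming the MPI runtime is a shared library whose file name contains "libmpi" (an assumption; a statically linked MPI would not show up). links_mpi is a hypothetical helper that reads ldd output on stdin.

```shell
# Succeed if the ldd-style listing on stdin mentions an MPI shared library.
# Assumption: the library file name contains "libmpi".
links_mpi() {
  grep -q 'libmpi'
}

# Example (not run here):
#   ldd bin/signssm | links_mpi && echo "MPI library linked"
```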

For comparison, running the same job with the 1.0.2 binary gives:

$ bin/signssm0 -o result0 --threads 8 sample.tsv 
SiGN-SSM  version 1.0.2 (Wed Feb 23 11:44:59 2011 JST)
  Copyright (C) 2010  HGC, IMS, The University of Tokyo.

! ABSOLUTELY NO WARRANTY.  SEE LICENSE FOR DETAILS.  Visit http://sign.hgc.jp/

Single process mode.
Reading input data file: sample.tsv
Applying mean shift for the input data set.
*******************************
**   Thread Execution Mode   **
*******************************
Number of threads = 8
--- INPUT DATA ---------
 observed samples = 21
 objects = 100
 time points = 48
 replicates = 3
------------------------
Max dimemsion = 4
Total single tasks = 100
{4} set=1, dim=4, likelihood=1830.679933, loops=198, finished=1
Best likelihood updated for set id = 1.
{6} set=1, dim=4, likelihood=1829.956352, loops=2757, finished=2
{3} set=1, dim=4, likelihood=1830.440368, loops=2997, finished=3
{1} set=1, dim=4, likelihood=1831.018207, loops=3138, finished=4
Best likelihood updated for set id = 1.
{5} set=1, dim=4, likelihood=1829.589914, loops=3391, finished=5
・・・・・
{6} set=1, dim=4, likelihood=1830.725075, loops=4010, finished=96
{0} set=1, dim=4, likelihood=1829.636737, loops=6723, finished=97
{2} set=1, dim=4, likelihood=1829.531546, loops=8539, finished=98
{3} set=1, dim=4, likelihood=1829.669849, loops=33276, finished=99
{5} set=1, dim=4, likelihood=1829.672068, loops=20355, finished=100
Outputting the calculated state variables: result0.D004.S001.K.dat
Outputting the result into a single file: result0.D004.S001.A.dat
Outputting the summary file: result0.D004.S001.B.dat
All time: 0 day 00 hour 01 min 04 sec
X-ALL-TIME:	63.882370

There is no MPI, but multithreading is working and the speed is decent.


So I rebuilt with MPI disabled and tried again:

$ bin/signssm2 -o result2 --threads 8 sample.tsv 
SiGN-SSM  version 1.2.1 (Fri Dec 19 17:15:38 2014 JST)
  Copyright (C) 2010-2014  SiGN Project Members.

! ABSOLUTELY NO WARRANTY.  SEE LICENSE FOR DETAILS.  Visit http://sign.hgc.jp/

Single process mode.
Reading input data file: sample.tsv
Applying mean shift for the input data set.
*******************************
**   Single Execution Mode   **
*******************************
--- INPUT DATA ---------
 observed samples = 21
 objects = 100
 time points = 48
 replicates = 3
------------------------
Max dimension = 4
========================================
= Start the loop for dim=4
Set 1...
Estimation failed.  Retrying (1)...
Estimation failed.  Retrying (2)...
1: Loops=5503, Likelihood=1829.617679
2: Loops=2723, Likelihood=1829.093921
Estimation failed.  Retrying (1)...
3: Loops=4497, Likelihood=1829.805601
4: Loops=2908, Likelihood=1830.622691
5: Loops=28717, Likelihood=1829.705602
・・・・・

95: Loops=2908, Likelihood=1829.762141
96: Loops=4126, Likelihood=1830.554303
97: Loops=4996, Likelihood=1830.595209
98: Loops=2147, Likelihood=1829.749469
Estimation failed.  Retrying (1)...
99: Loops=4746, Likelihood=1829.645741
100: Loops=3037, Likelihood=1829.483204
Finished: dim=4, likelihood=1831.571195
0 day 00 hour 02 min 45 sec
X-SINGLE-TIME:	164.772890
Outputting the calculated state variables: result2.D004.S001.K.dat
Outputting the p-values: result2.D004.S001.P.dat
Outputting the meta-analysis of p-values: result2.D004.S001.m.dat
Outputting the result into a single file: result2.D004.S001.A.dat
Outputting the summary file: result2.D004.S001.B.dat
Going to exit.
All time: 0 day 00 hour 02 min 45 sec
X-ALL-TIME:	164.900692

So it still does not run multithreaded, and the speed is unchanged.
What is going on?
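The X-ALL-TIME lines make the comparison easy to quantify. Below is a minimal sketch with two hypothetical helpers, all_time and slowdown, assuming each run's output was saved to a file.

```shell
# Print the seconds from the last "X-ALL-TIME:" line of a log read on stdin.
all_time() {
  awk '$1 == "X-ALL-TIME:" { t = $2 } END { if (t != "") print t }'
}

# Print how many times longer a run of $1 seconds took than one of $2 seconds.
slowdown() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# Example (not run here), with hypothetical log file names:
#   slowdown "$(all_time < run_1.2.1.log)" "$(all_time < run_1.0.2.log)"
```

With the figures above, slowdown 164.900692 63.882370 prints 2.58, so the self-built 1.2.1 binary took roughly 2.6 times as long as the threaded 1.0.2 binary.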

I had forgotten the command for launching under MPI.

$ mpirun ./bin/signssm1 -o result1 sample.tsv 
SiGN-SSM  version 1.2.1 (Fri Dec 19 17:15:38 2014 JST)
  Copyright (C) 2010-2014  SiGN Project Members.

! ABSOLUTELY NO WARRANTY.  SEE LICENSE FOR DETAILS.  Visit http://sign.hgc.jp/

Single process mode.
Reading input data file: sample.tsv
Applying mean shift for the input data set.
*******************************
**   Single Execution Mode   **
*******************************
--- INPUT DATA ---------
 observed samples = 21
 objects = 100
 time points = 48
 replicates = 3
------------------------
Max dimension = 4
========================================
= Start the loop for dim=4
Set 1...
SiGN-SSM  version 1.2.1 (Fri Dec 19 17:15:38 2014 JST)
  Copyright (C) 2010-2014  SiGN Project Members.

! ABSOLUTELY NO WARRANTY.  SEE LICENSE FOR DETAILS.  Visit http://sign.hgc.jp/

Single process mode.
Reading input data file: sample.tsv
Applying mean shift for the input data set.
*******************************
**   Single Execution Mode   **
*******************************
--- INPUT DATA ---------
 observed samples = 21
 objects = 100
 time points = 48
 replicates = 3
------------------------
Max dimension = 4
========================================
= Start the loop for dim=4
Set 1...
SiGN-SSM  version 1.2.1 (Fri Dec 19 17:15:38 2014 JST)
  Copyright (C) 2010-2014  SiGN Project Members.

! ABSOLUTELY NO WARRANTY.  SEE LICENSE FOR DETAILS.  Visit http://sign.hgc.jp/

Single process mode.
Reading input data file: sample.tsv
Applying mean shift for the input data set.
*******************************
**   Single Execution Mode   **
*******************************
--- INPUT DATA ---------
 observed samples = 21
 objects = 100
 time points = 48
 replicates = 3
------------------------
Max dimension = 4
SiGN-SSM  version 1.2.1 (Fri Dec 19 17:15:38 2014 JST)
  Copyright (C) 2010-2014  SiGN Project Members.

! ABSOLUTELY NO WARRANTY.  SEE LICENSE FOR DETAILS.  Visit http://sign.hgc.jp/

Single process mode.
Reading input data file: sample.tsv
========================================
= Start the loop for dim=4
Set 1...
Applying mean shift for the input data set.
*******************************
**   Single Execution Mode   **
*******************************
--- INPUT DATA ---------
 observed samples = 21
 objects = 100
 time points = 48
 replicates = 3
------------------------
Max dimension = 4
========================================
= Start the loop for dim=4
Set 1...
Estimation failed.  Retrying (1)...
Estimation failed.  Retrying (1)...
Estimation failed.  Retrying (1)...
Estimation failed.  Retrying (1)...
Estimation failed.  Retrying (2)...
Estimation failed.  Retrying (2)...
・・・・・

95: Loops=2908, Likelihood=1829.762141
96: Loops=4126, Likelihood=1830.554303
97: Loops=4996, Likelihood=1830.595209
98: Loops=2147, Likelihood=1829.749469
Estimation failed.  Retrying (1)...
99: Loops=4746, Likelihood=1829.645741
100: Loops=3037, Likelihood=1829.483204
Finished: dim=4, likelihood=1831.571195
0 day 00 hour 03 min 47 sec
X-SINGLE-TIME:	226.878186
Outputting the calculated state variables: result1.D004.S001.K.dat
Outputting the p-values: result1.D004.S001.P.dat
Outputting the meta-analysis of p-values: result1.D004.S001.m.dat
Outputting the result into a single file: result1.D004.S001.A.dat
Outputting the summary file: result1.D004.S001.B.dat
Going to exit.
All time: 0 day 00 hour 03 min 47 sec
X-ALL-TIME:	227.000185

Well, it is certainly being launched through MPI, but it has not become faster at all. If anything, it is slower.
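In the mpirun log above, the startup banner and "Single process mode." appear four times. That suggests four copies of the program each ran the whole job on their own rather than sharing it, though this is an interpretation and not something the program states. Here is a small sketch for counting those lines in a saved log. count_single_process is a hypothetical helper.

```shell
# Count the lines in a log (read on stdin) where a process announced that it
# was running in single-process mode.
count_single_process() {
  grep -c '^Single process mode'
}

# Example (not run here), with a hypothetical log file name:
#   mpirun ./bin/signssm1 -o result1 sample.tsv | tee mpi.log
#   count_single_process < mpi.log
```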

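Maybe this is simply software that should not be run under MPI. Is that why the binary is distributed with multithreading support only? My own build cannot even manage that, so it is completely pointless.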

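That said, 1.0.2 still cannot use the options needed from the second step onward:

    • perm
    • ssmperm

so the workflow cannot be completed on Linux alone. I will keep at this a little longer.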