featureCountの方もまだ不完全だがDEXSeqも手付かずなのでまずは環境構築
$ dexseq_prepare_annotation2.py -f DEXSeq.gtf expression/Homo_sapiens/NCBI/build37.2/Annotation/Archives/archive-2014-06-02-13-47-29/Genes/genes.gtf genes.DEXSeq.gtf Could not import HTSeq. Please install the HTSeq Python framework available from http://www-huber.embl.de/users/anders/HTSeq
早速エラー
HTseqがないと。
$ pip install HTseq Collecting HTseq Downloading HTSeq-0.8.0.tar.gz (561kB) 100% |========================================| 563kB 2.3MB/s Could not import setuptools which is required to install from a source distribution. Please install setuptools.
インストールできないって言うし。
結局手動でダウンロードしてきてそのディレクトリで
$ python setup.py build symlinking folders for python2 Could not import 'setuptools', falling back to 'distutils'. Traceback (most recent call last): File "setup.py", line 193, in <module> **kwargs File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/distutils/core.py", line 111, in setup _setup_distribution = dist = klass(attrs) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/distutils/dist.py", line 259, in __init__ getattr(self.metadata, "set_" + key)(val) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/distutils/dist.py", line 1220, in set_requires distutils.versionpredicate.VersionPredicate(v) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/distutils/versionpredicate.py", line 113, in __init__ raise ValueError("expected parenthesized list: %r" % paren) ValueError: expected parenthesized list: '>=0.9.0'
今度はなんだ?setuptoolsがインポートできない?
ValueErrorってのも気になるが。setup.pyの中身をざっと見ると
setup_requires=[ 'Cython', 'numpy', 'pysam>=0.9.0', ],
こんな記述があって、エラーの中で0.9.0がどうのとなっているということは、この3つのパッケージもいるってことかな。
というわけでpip installで入れておく。
$ pip install Cython numpy pysum setuptools
で
$ python setup.py build symlinking folders for python2 running build running build_py running preprocess cythonizing Cython version 0.25.2 SWIGging /bin/sh: swig: command not found SWIG not found, but transpiled files found moving swigged .py module done creating build creating build/lib.macosx-10.6-intel-2.7 creating build/lib.macosx-10.6-intel-2.7/HTSeq copying HTSeq/__init__.py -> build/lib.macosx-10.6-intel-2.7/HTSeq copying HTSeq/_HTSeq_internal.py -> build/lib.macosx-10.6-intel-2.7/HTSeq copying HTSeq/StepVector.py -> build/lib.macosx-10.6-intel-2.7/HTSeq copying HTSeq/_version.py -> build/lib.macosx-10.6-intel-2.7/HTSeq creating build/lib.macosx-10.6-intel-2.7/HTSeq/scripts copying HTSeq/scripts/__init__.py -> build/lib.macosx-10.6-intel-2.7/HTSeq/scripts copying HTSeq/scripts/qa.py -> build/lib.macosx-10.6-intel-2.7/HTSeq/scripts copying HTSeq/scripts/count.py -> build/lib.macosx-10.6-intel-2.7/HTSeq/scripts running build_ext building 'HTSeq._HTSeq' extension creating build/temp.macosx-10.6-intel-2.7 creating build/temp.macosx-10.6-intel-2.7/src /usr/bin/clang -fno-strict-aliasing -fno-common -dynamic -arch i386 -arch x86_64 -g -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/include -I/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c src/_HTSeq.c -o build/temp.macosx-10.6-intel-2.7/src/_HTSeq.o -w /usr/bin/clang -bundle -undefined dynamic_lookup -arch i386 -arch x86_64 -g build/temp.macosx-10.6-intel-2.7/src/_HTSeq.o -o build/lib.macosx-10.6-intel-2.7/HTSeq/_HTSeq.so building 'HTSeq._StepVector' extension /usr/bin/clang -fno-strict-aliasing -fno-common -dynamic -arch i386 -arch x86_64 -g -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c src/StepVector_wrap.cxx -o build/temp.macosx-10.6-intel-2.7/src/StepVector_wrap.o -w /usr/bin/clang++ -bundle -undefined dynamic_lookup -arch i386 -arch x86_64 -g build/temp.macosx-10.6-intel-2.7/src/StepVector_wrap.o -o build/lib.macosx-10.6-intel-2.7/HTSeq/_StepVector.so clang: warning: libstdc++ is deprecated; move to libc++ with a minimum deployment target of OS X 10.9 clang: warning: libstdc++ is deprecated; move to libc++ with a minimum deployment target of OS X 10.9 running build_scripts creating build/scripts-2.7 copying and adjusting scripts/htseq-qa -> build/scripts-2.7 copying and adjusting scripts/htseq-count -> build/scripts-2.7 changing mode of build/scripts-2.7/htseq-qa from 644 to 755 changing mode of build/scripts-2.7/htseq-count from 644 to 755
HTseqは入ったな。
では最初に戻って
$ python local/bin/Subread_to_DEXSeq-master/dexseq_prepare_annotation2.py -f DEXSeq.gtf expression/Homo_sapiens/NCBI/build37.2/Annotation/Archives/archive-2014-06-02-13-47-29/Genes/genes.gtf genes.DEXSeq.gtf /Library/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Contents/MacOS/Python: can't open file 'local/bin/Subread_to_DEXSeq-master/dexseq_prepare_annotation2.py': [Errno 2] No such file or directory
まだだめらしい。
$ python setup.py install symlinking folders for python2 running install running bdist_egg running egg_info writing requirements to HTSeq.egg-info/requires.txt writing HTSeq.egg-info/PKG-INFO writing top-level names to HTSeq.egg-info/top_level.txt writing dependency_links to HTSeq.egg-info/dependency_links.txt file HTSeq/StepVector.py (for module HTSeq.StepVector) not found reading manifest file 'HTSeq.egg-info/SOURCES.txt' reading manifest template 'MANIFEST.in' warning: no files found matching 'README.md' warning: no previously-included files found matching 'todo.txt' no previously-included directories found matching 'example_data' no previously-included directories found matching 'doc' no previously-included directories found matching 'test' writing manifest file 'HTSeq.egg-info/SOURCES.txt' installing library code to build/bdist.macosx-10.6-intel/egg running install_lib running build_py running preprocess cythonizing Cython version 0.25.2 SWIGging /bin/sh: swig: command not found SWIG not found, but transpiled files found moving swigged .py module done creating build creating build/lib.macosx-10.6-intel-2.7 creating build/lib.macosx-10.6-intel-2.7/HTSeq copying HTSeq/__init__.py -> build/lib.macosx-10.6-intel-2.7/HTSeq copying HTSeq/_HTSeq_internal.py -> build/lib.macosx-10.6-intel-2.7/HTSeq copying HTSeq/StepVector.py -> build/lib.macosx-10.6-intel-2.7/HTSeq copying HTSeq/_version.py -> build/lib.macosx-10.6-intel-2.7/HTSeq creating build/lib.macosx-10.6-intel-2.7/HTSeq/scripts copying HTSeq/scripts/__init__.py -> build/lib.macosx-10.6-intel-2.7/HTSeq/scripts copying HTSeq/scripts/qa.py -> build/lib.macosx-10.6-intel-2.7/HTSeq/scripts copying HTSeq/scripts/count.py -> build/lib.macosx-10.6-intel-2.7/HTSeq/scripts running build_ext building 'HTSeq._HTSeq' extension creating build/temp.macosx-10.6-intel-2.7 creating build/temp.macosx-10.6-intel-2.7/src /usr/bin/clang -fno-strict-aliasing -fno-common -dynamic -arch i386 -arch x86_64 -g -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/include -I/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c src/_HTSeq.c -o build/temp.macosx-10.6-intel-2.7/src/_HTSeq.o -w /usr/bin/clang -bundle -undefined dynamic_lookup -arch i386 -arch x86_64 -g build/temp.macosx-10.6-intel-2.7/src/_HTSeq.o -o build/lib.macosx-10.6-intel-2.7/HTSeq/_HTSeq.so building 'HTSeq._StepVector' extension /usr/bin/clang -fno-strict-aliasing -fno-common -dynamic -arch i386 -arch x86_64 -g -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c src/StepVector_wrap.cxx -o build/temp.macosx-10.6-intel-2.7/src/StepVector_wrap.o -w /usr/bin/clang++ -bundle -undefined dynamic_lookup -arch i386 -arch x86_64 -g build/temp.macosx-10.6-intel-2.7/src/StepVector_wrap.o -o build/lib.macosx-10.6-intel-2.7/HTSeq/_StepVector.so clang: warning: libstdc++ is deprecated; move to libc++ with a minimum deployment target of OS X 10.9 clang: warning: libstdc++ is deprecated; move to libc++ with a minimum deployment target of OS X 10.9 creating build/bdist.macosx-10.6-intel creating build/bdist.macosx-10.6-intel/egg creating build/bdist.macosx-10.6-intel/egg/HTSeq copying build/lib.macosx-10.6-intel-2.7/HTSeq/__init__.py -> build/bdist.macosx-10.6-intel/egg/HTSeq copying build/lib.macosx-10.6-intel-2.7/HTSeq/_HTSeq.so -> build/bdist.macosx-10.6-intel/egg/HTSeq copying build/lib.macosx-10.6-intel-2.7/HTSeq/_HTSeq_internal.py -> build/bdist.macosx-10.6-intel/egg/HTSeq copying build/lib.macosx-10.6-intel-2.7/HTSeq/_StepVector.so -> build/bdist.macosx-10.6-intel/egg/HTSeq copying build/lib.macosx-10.6-intel-2.7/HTSeq/_version.py -> build/bdist.macosx-10.6-intel/egg/HTSeq creating build/bdist.macosx-10.6-intel/egg/HTSeq/scripts copying build/lib.macosx-10.6-intel-2.7/HTSeq/scripts/__init__.py -> build/bdist.macosx-10.6-intel/egg/HTSeq/scripts copying build/lib.macosx-10.6-intel-2.7/HTSeq/scripts/count.py -> build/bdist.macosx-10.6-intel/egg/HTSeq/scripts copying build/lib.macosx-10.6-intel-2.7/HTSeq/scripts/qa.py -> build/bdist.macosx-10.6-intel/egg/HTSeq/scripts copying build/lib.macosx-10.6-intel-2.7/HTSeq/StepVector.py -> build/bdist.macosx-10.6-intel/egg/HTSeq byte-compiling build/bdist.macosx-10.6-intel/egg/HTSeq/__init__.py to __init__.pyc byte-compiling build/bdist.macosx-10.6-intel/egg/HTSeq/_HTSeq_internal.py to _HTSeq_internal.pyc byte-compiling build/bdist.macosx-10.6-intel/egg/HTSeq/_version.py to _version.pyc byte-compiling build/bdist.macosx-10.6-intel/egg/HTSeq/scripts/__init__.py to __init__.pyc byte-compiling build/bdist.macosx-10.6-intel/egg/HTSeq/scripts/count.py to count.pyc byte-compiling build/bdist.macosx-10.6-intel/egg/HTSeq/scripts/qa.py to qa.pyc byte-compiling build/bdist.macosx-10.6-intel/egg/HTSeq/StepVector.py to StepVector.pyc creating stub loader for HTSeq/_HTSeq.so creating stub loader for HTSeq/_StepVector.so byte-compiling build/bdist.macosx-10.6-intel/egg/HTSeq/_HTSeq.py to _HTSeq.pyc byte-compiling build/bdist.macosx-10.6-intel/egg/HTSeq/_StepVector.py to _StepVector.pyc creating build/bdist.macosx-10.6-intel/egg/EGG-INFO installing scripts to build/bdist.macosx-10.6-intel/egg/EGG-INFO/scripts running install_scripts running build_scripts creating build/scripts-2.7 copying and adjusting scripts/htseq-qa -> build/scripts-2.7 copying and adjusting scripts/htseq-count -> build/scripts-2.7 changing mode of build/scripts-2.7/htseq-qa from 644 to 755 changing mode of build/scripts-2.7/htseq-count from 644 to 755 creating build/bdist.macosx-10.6-intel/egg/EGG-INFO/scripts copying build/scripts-2.7/htseq-count -> build/bdist.macosx-10.6-intel/egg/EGG-INFO/scripts copying build/scripts-2.7/htseq-qa -> build/bdist.macosx-10.6-intel/egg/EGG-INFO/scripts changing mode of build/bdist.macosx-10.6-intel/egg/EGG-INFO/scripts/htseq-count to 755 changing mode of build/bdist.macosx-10.6-intel/egg/EGG-INFO/scripts/htseq-qa to 755 copying HTSeq.egg-info/PKG-INFO -> build/bdist.macosx-10.6-intel/egg/EGG-INFO copying HTSeq.egg-info/SOURCES.txt -> build/bdist.macosx-10.6-intel/egg/EGG-INFO copying HTSeq.egg-info/dependency_links.txt -> build/bdist.macosx-10.6-intel/egg/EGG-INFO copying HTSeq.egg-info/requires.txt -> build/bdist.macosx-10.6-intel/egg/EGG-INFO copying HTSeq.egg-info/top_level.txt -> build/bdist.macosx-10.6-intel/egg/EGG-INFO writing build/bdist.macosx-10.6-intel/egg/EGG-INFO/native_libs.txt zip_safe flag not set; analyzing archive contents... HTSeq.StepVector: module references __file__ creating dist creating 'dist/HTSeq-0.8.0-py2.7-macosx-10.6-intel.egg' and adding 'build/bdist.macosx-10.6-intel/egg' to it removing 'build/bdist.macosx-10.6-intel/egg' (and everything under it) Processing HTSeq-0.8.0-py2.7-macosx-10.6-intel.egg creating /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/HTSeq-0.8.0-py2.7-macosx-10.6-intel.egg Extracting HTSeq-0.8.0-py2.7-macosx-10.6-intel.egg to /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages Adding HTSeq 0.8.0 to easy-install.pth file Installing htseq-count script to /Library/Frameworks/Python.framework/Versions/2.7/bin Installing htseq-qa script to /Library/Frameworks/Python.framework/Versions/2.7/bin Installed /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/HTSeq-0.8.0-py2.7-macosx-10.6-intel.egg Processing dependencies for HTSeq==0.8.0 Searching for pysam==0.11.2.2 Best match: pysam 0.11.2.2 Adding pysam 0.11.2.2 to easy-install.pth file Using /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages Searching for numpy==1.13.1 Best match: numpy 1.13.1 Adding numpy 1.13.1 to easy-install.pth file Using /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages Finished processing dependencies for HTSeq==0.8.0
これでどや
$ python local/bin/Subread_to_DEXSeq-master/dexseq_prepare_annotation2.py -f DEXSeq.gtf expression/Homo_sapiens/NCBI/build37.2/Annotation/Archives/archive-2014-06-02-13-47-29/Genes/genes.gtf genes.DEXSeq.gtf Traceback (most recent call last): File "local/bin/Subread_to_DEXSeq-master/dexseq_prepare_annotation2.py", line 136, in <module> assert l[i].iv.end <= l[i+1].iv.start, str(l[i+1]) + " starts too early" AssertionError: <GenomicFeature: exonic_part 'IL3RA' at Y: 1405508 -> 1405819 (strand '+')> starts too early
うーむ
gtfファイルとして今回iGenomeからダウンロードしたNCBIのbuild37.2を使用していたが、このプログラムはどうやらEnsemblのGRCh37を想定してスクリプトが書かれているとのこと。微妙な書式のブレかデータのエラーが引っかかるようです。
仕方がないのでEnsemblのデータをダウンロードし、展開
改めてやり直すと
$ python ~/local/bin/Subread_to_DEXSeq-master/dexseq_prepare_annotation2.py -f ./genesDEXSeq.gtf ~/expression/Homo_sapiens/Ensembl/GRCh37/Annotation/Archives/archive-2015-07-17-14-31-42/Genes/genes.gtf ./genesDEXSeq.gff Done!
無事完了。
アプリごとにアノテーションファイルを独自で持ってないといけないとはなんとも非効率的な。
ではfeatureCountをおもむろにやり直し。
$ ~/local/bin/subread-1.5.2-MaxOSX-x86_64/bin/featureCounts -T 4 -p -s 2 -f -O -a genesDEXSeq.gtf -t exonic_part -o counts_PE.txt SRR1234567_th_PE.bam SRR7654321_th_PE.bam -F GTF ========== _____ _ _ ____ _____ ______ _____ ===== / ____| | | | _ \| __ \| ____| /\ | __ \ ===== | (___ | | | | |_) | |__) | |__ / \ | | | | ==== \___ \| | | | _ <| _ /| __| / /\ \ | | | | ==== ____) | |__| | |_) | | \ \| |____ / ____ \| |__| | ========== |_____/ \____/|____/|_| \_\______/_/ \_\_____/ v1.5.2 //========================== featureCounts setting ===========================\\ || || || Input files : 2 BAM files || || P SRR1234567_th_PE.bam || || P SRR7654321_th_PE.bam || || || || Output file : counts_PE.txt || || Summary : counts_PE.txt.summary || || Annotation : genesDEXSeq.gtf (GTF) || || Dir for temp files : ./ || || || || Threads : 4 || || Level : feature level || || Paired-end : yes || || Strand specific : reversely stranded || || Multimapping reads : not counted || || Multi-overlapping reads : counted || || Min overlapping bases : 1 || || || || Chimeric reads : counted || || Both ends mapped : not required || || || \\===================== http://subread.sourceforge.net/ ======================// //================================= Running ==================================\\ || || || Load annotation file genesDEXSeq.gtf ... || || Features : 644354 || || Meta-features : 60153 || || Chromosomes/contigs : 265 || || || || Process BAM file SRR1234567_th_PE.bam... || || Paired-end reads are included. || || Assign fragments (read pairs) to features... || || || || WARNING: reads from the same pair were found not adjacent to each || || other in the input (due to read sorting by location or || || reporting of multi-mapping read pairs). || || || || Read re-ordering is performed. || || || || Both single-end and paired-end reads were found. || || Total fragments : 15002128 || || Successfully assigned fragments : 10670180 (71.1%) || || Running time : 1.11 minutes || || || || Process BAM file SRR7654321_th_PE.bam... || || Paired-end reads are included. || || Assign fragments (read pairs) to features... || || || || WARNING: reads from the same pair were found not adjacent to each || || other in the input (due to read sorting by location or || || reporting of multi-mapping read pairs). || || || || Read re-ordering is performed. || || || || Total fragments : 16073300 || || Successfully assigned fragments : 9769107 (60.8%) || || Running time : 2.58 minutes || || || || Read assignment finished. || || || || Summary of counting results can be found in file "counts_PE.txt" || || || \\===================== http://subread.sourceforge.net/ ======================//
これでいいのだろうか?
ここからはR
> source("https://bioconductor.org/biocLite.R") Bioconductor version 3.4 (BiocInstaller 1.24.0), ?biocLite for help A new version of Bioconductor is available after installing the most recent version of R; see http://bioconductor.org/install > biocLite("Rsubread") BioC_mirror: https://bioconductor.org Using Bioconductor 3.4 (BiocInstaller 1.24.0), R 3.3.3 (2017-03-06). Installing package(s) ‘Rsubread’ URL 'https://bioconductor.org/packages/3.4/bioc/bin/macosx/mavericks/contrib/3.3/Rsubread_1.24.2.tgz' を試しています Content type 'application/x-gzip' length 9320101 bytes (8.9 MB) ================================================== downloaded 8.9 MB The downloaded binary packages are in /var/folders/sf/3lk5x37135586w0jgqyjq_yc0000gn/T//RtmpyMsqr7/downloaded_packages > source("~/local/bin/Subread_to_DEXSeq-master/load_SubreadOutput.R") > suppressPackageStartupMessages({ + require(dplyr) + }) 警告メッセージ: library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE, で: ‘dplyr’ という名前のパッケージはありません
でた。
> install.packages("dplyr") also installing the dependencies ‘bindr’, ‘bindrcpp’, ‘glue’ URL 'https://cran.rstudio.com/bin/macosx/mavericks/contrib/3.3/bindr_0.1.tgz' を試しています Content type 'application/x-gzip' length 12121 bytes (11 KB) ================================================== downloaded 11 KB URL 'https://cran.rstudio.com/bin/macosx/mavericks/contrib/3.3/bindrcpp_0.2.tgz' を試しています Content type 'application/x-gzip' length 380129 bytes (371 KB) ================================================== downloaded 371 KB URL 'https://cran.rstudio.com/bin/macosx/mavericks/contrib/3.3/glue_1.1.1.tgz' を試しています Content type 'application/x-gzip' length 30626 bytes (29 KB) ================================================== downloaded 29 KB URL 'https://cran.rstudio.com/bin/macosx/mavericks/contrib/3.3/dplyr_0.7.1.tgz' を試しています Content type 'application/x-gzip' length 5932799 bytes (5.7 MB) ================================================== downloaded 5.7 MB The downloaded binary packages are in /var/folders/sf/3lk5x37135586w0jgqyjq_yc0000gn/T//RtmpyMsqr7/downloaded_packages
> setwd("~/expression/BAMs/") > samp <- data.frame(row.names = c("Patient_7","control_10"), + condition = rep(c("patient","control"),each=1)) > dxd.fc <- DEXSeqDataSetFromFeatureCounts("counts_PE.txt", + flattenedfile = "./genesDEXSeq.gtf",sampleData = samp) Reading and adding Exon IDs for DEXSeq DEXSeqDataSetFromFeatureCounts("counts_PE.txt", flattenedfile = "./genesDEXSeq.gtf", でエラー: Count files do not correspond to the flattened annotation file 追加情報: 警告メッセージ: 0 aggregate geneIDs were found truncated in featureCounts output
またまたエラー