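TopHat→cufflinks→cuffdiff→cummeRbund is the well-trodden path for RNA-seq analysis, but lately HISAT2→StringTie→Ballgown seems to be catching on. I'd better get on board. So, first things first: setting up the environment.
Apparently StringTie can be installed with brew, so right away: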
$ brew install stringtie
I ran that, and since brew itself runs on Ruby, it complained that my Ruby version was too old.
/usr/local/Homebrew/Library/Homebrew/brew.rb:12:in `<main>': Homebrew must be run under Ruby 2.3! You're running 2.0.0. (RuntimeError)
So, to update Ruby on OS X you install rbenv with brew and... hey, hang on, brew is the very thing that won't run. What am I supposed to do?
$ brew update
For now, this seems to be enough.
While I'm at it:
$ brew install rbenv
$ rbenv install 2.4.1
$ rbenv global 2.4.1
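To double-check that the new Ruby is actually the one being picked up, something like the following could be used. It is only a minimal sketch: `version_ge` is a helper name made up for illustration (it is not part of Homebrew or rbenv), the 2.3 threshold is taken from the error message above, and it assumes rbenv's shims are on the PATH (typically via `eval "$(rbenv init -)"` in the shell profile).

```shell
#!/bin/sh
# version_ge A B: succeed if dotted version A >= B (numeric fields only).
# Hypothetical helper for illustration; not part of Homebrew or rbenv.
version_ge() {
    a=$1
    b=$2
    while [ -n "$a" ] || [ -n "$b" ]; do
        # Take the leading field of each version; missing fields count as 0.
        fa=${a%%.*}
        fb=${b%%.*}
        [ -n "$fa" ] || fa=0
        [ -n "$fb" ] || fb=0
        if [ "$fa" -gt "$fb" ]; then return 0; fi
        if [ "$fa" -lt "$fb" ]; then return 1; fi
        # Drop the field that was just compared.
        case $a in *.*) a=${a#*.} ;; *) a= ;; esac
        case $b in *.*) b=${b#*.} ;; *) b= ;; esac
    done
    return 0
}

# Report whether the Ruby found on PATH meets the 2.3 minimum.
if command -v ruby >/dev/null 2>&1; then
    current=$(ruby -e 'print RUBY_VERSION')
    if version_ge "$current" 2.3; then
        echo "Ruby $current is new enough"
    else
        echo "Ruby $current is older than 2.3"
    fi
fi
```

If this still reports the old system Ruby after `rbenv global`, the shims are probably not on the PATH yet.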
Right, back to business. Let's install stringtie.
$ brew install stringtie
Updating Homebrew...
==> Auto-updated Homebrew!
Updated 1 tap (homebrew/core).
==> New Formulae
fruit
Error: No available formula with the name "stringtie"
==> Searching for a previously deleted formula (in the last month)...
Warning: homebrew/core is shallow clone. To get complete history run:
  git -C "$(brew --repo homebrew/core)" fetch --unshallow
Error: No previously deleted formula found.
==> Searching for similarly named formulae...
==> Searching local taps...
Error: No similarly named formulae found.
==> Searching taps...
==> Searching taps on GitHub...
Error: No formulae found in taps.
Whoa. It looks like the formula has moved. And brew won't even tell me where to...
Maybe brewsci/bio or brewsci/science.
$ brew tap brewsci/science
==> Tapping brewsci/science
Cloning into '/usr/local/Homebrew/Library/Taps/brewsci/homebrew-science'...
remote: Counting objects: 598, done.
remote: Compressing objects: 100% (595/595), done.
remote: Total 598 (delta 1), reused 186 (delta 1), pack-reused 0
Receiving objects: 100% (598/598), 536.53 KiB | 681.00 KiB/s, done.
Resolving deltas: 100% (1/1), done.
Tapped 579 formulae (617 files, 1.6MB)
$ brew install stringtie
Updating Homebrew...
==> Installing stringtie from brewsci/science
==> Downloading https://homebrew.bintray.com/bottles-science/stringtie-1.3.3b.el_capitan.b
######################################################################## 100.0%
==> Pouring stringtie-1.3.3b.el_capitan.bottle.tar.gz
🍺 /usr/local/Cellar/stringtie/1.3.3b: 5 files, 509.8KB
This one was the winner.
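The guesswork above (try a tap, see whether the formula is there) can be wrapped in a small loop. This is a rough sketch, not an official Homebrew feature: `install_from_taps` and the `BREW` override are names invented here, and the tap names are just the two guesses from this post. It relies on the fact that `brew install user/repo/formula` taps the repository automatically when needed.

```shell
#!/bin/sh
# BREW can be overridden (e.g. with a stub) for a dry run; defaults to brew.
BREW=${BREW:-brew}

# install_from_taps FORMULA TAP...: try TAP/FORMULA for each tap in order
# and stop at the first one that installs successfully.
# Hypothetical helper for illustration only.
install_from_taps() {
    formula=$1
    shift
    for tap in "$@"; do
        if "$BREW" install "$tap/$formula"; then
            echo "installed $formula from $tap"
            return 0
        fi
        echo "$formula not found in $tap, trying the next tap" >&2
    done
    return 1
}
```

With that defined, `install_from_taps stringtie brewsci/bio brewsci/science` would try the two candidates in turn and return non-zero if neither has the formula.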
$ stringtie -h
StringTie v1.3.3b usage:
 stringtie <input.bam ..> [-G <guide_gff>] [-l <label>] [-o <out_gtf>] [-p <cpus>]
  [-v] [-a <min_anchor_len>] [-m <min_tlen>] [-j <min_anchor_cov>] [-f <min_iso>]
  [-C <coverage_file_name>] [-c <min_bundle_cov>] [-g <bdist>] [-u] [-e]
  [-x <seqid,..>] [-A <gene_abund.out>] [-h] {-B | -b <dir_path>}
Assemble RNA-Seq alignments into potential transcripts.
 Options:
 --version : print just the version at stdout and exit
 -G reference annotation to use for guiding the assembly process (GTF/GFF3)
 --rf assume stranded library fr-firststrand
 --fr assume stranded library fr-secondstrand
 -l name prefix for output transcripts (default: STRG)
 -f minimum isoform fraction (default: 0.1)
 -m minimum assembled transcript length (default: 200)
 -o output path/file name for the assembled transcripts GTF (default: stdout)
 -a minimum anchor length for junctions (default: 10)
 -j minimum junction coverage (default: 1)
 -t disable trimming of predicted transcripts based on coverage
    (default: coverage trimming is enabled)
 -c minimum reads per bp coverage to consider for transcript assembly
    (default: 2.5)
 -v verbose (log bundle processing details)
 -g gap between read mappings triggering a new bundle (default: 50)
 -C output a file with reference transcripts that are covered by reads
 -M fraction of bundle allowed to be covered by multi-hit reads (default:0.95)
 -p number of threads (CPUs) to use (default: 1)
 -A gene abundance estimation output file
 -B enable output of Ballgown table files which will be created in the
    same directory as the output GTF (requires -G, -o recommended)
 -b enable output of Ballgown table files but these files will be
    created under the directory path given as <dir_path>
 -e only estimate the abundance of given reference transcripts (requires -G)
 -x do not assemble any transcripts on the given reference sequence(s)
 -u no multi-mapping correction (default: correction enabled)
 -h print this usage message and exit

Transcript merge usage mode:
 stringtie --merge [Options] { gtf_list | strg1.gtf ...}
With this option StringTie will assemble transcripts from multiple
input files generating a unified non-redundant set of isoforms. In this mode
the following options are available:
 -G <guide_gff> reference annotation to include in the merging (GTF/GFF3)
 -o <out_gtf> output file name for the merged transcripts GTF
    (default: stdout)
 -m <min_len> minimum input transcript length to include in the merge
    (default: 50)
 -c <min_cov> minimum input transcript coverage to include in the merge
    (default: 0)
 -F <min_fpkm> minimum input transcript FPKM to include in the merge
    (default: 1.0)
 -T <min_tpm> minimum input transcript TPM to include in the merge
    (default: 1.0)
 -f <min_iso> minimum isoform fraction (default: 0.01)
 -g <gap_len> gap between transcripts to merge together (default: 250)
 -i keep merged transcripts with retained introns; by default
    these are not kept unless there is strong evidence for them
 -l <label> name prefix for output transcripts (default: MSTRG)
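Going only by the options in the help text above, one plausible way to chain them towards Ballgown is: assemble each sample, merge the assemblies, then re-estimate abundances against the merged set with `-e -B`. The script below is a dry-run sketch under stated assumptions: the sample names, `reference.gtf`, `merged.gtf` and the `ballgown/` directory layout are all placeholders, the BAM files are assumed to be coordinate-sorted alignments (for example from HISAT2), and the helper function names are invented here.

```shell
#!/bin/sh
# Placeholder inputs; replace with real files.
REF_GTF=${REF_GTF:-reference.gtf}
THREADS=${THREADS:-4}
# RUN defaults to echo, so commands are printed instead of executed.
RUN=${RUN-echo}

# Step 1: assemble transcripts for one sample, guided by the annotation.
assemble() {
    sample=$1
    $RUN stringtie "$sample.bam" -G "$REF_GTF" -l "$sample" -p "$THREADS" -o "$sample.gtf"
}

# Step 2: merge per-sample assemblies into one non-redundant set.
merge() {
    $RUN stringtie --merge -G "$REF_GTF" -o merged.gtf "$@"
}

# Step 3: estimate abundances against the merged set only (-e) and write
# Ballgown tables next to the output GTF (-B); both options require -G.
quantify() {
    sample=$1
    $RUN mkdir -p "ballgown/$sample"
    $RUN stringtie "$sample.bam" -e -B -G merged.gtf -p "$THREADS" -o "ballgown/$sample/$sample.gtf"
}

# sampleA and sampleB are hypothetical sample names.
for s in sampleA sampleB; do assemble "$s"; done
merge sampleA.gtf sampleB.gtf
for s in sampleA sampleB; do quantify "$s"; done
```

As written, this only prints the commands. Saved as, say, `run_stringtie.sh` (a made-up name), running it as `RUN= sh run_stringtie.sh` with `RUN` set to an empty string would execute them for real.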