kuro's Notes

96's personal notes

Installing StringTie

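TopHat → Cufflinks → Cuffdiff → cummeRbund has been the standard route for RNA-seq analysis, but lately HISAT2 → StringTie → Ballgown seems to be catching on. I'd better get on board, so first up is setting up the environment.
Apparently StringTie can be installed with brew, so right away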

$ brew install stringtie

I tried that, and it turns out brew itself runs on Ruby, and it complained that my Ruby version is too old.

/usr/local/Homebrew/Library/Homebrew/brew.rb:12:in `<main>': Homebrew must be run under Ruby 2.3! You're running 2.0.0. (RuntimeError)

So, to update Ruby on OS X you install rbenv with brew and... hang on, brew is the very thing that won't run. What am I supposed to do?

$ brew update

This seems to be enough for now.
While I'm at it:

$ brew install rbenv
$ rbenv install 2.4.1
$ rbenv global 2.4.1
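
For what it's worth, here is a small sketch for checking whether the Ruby found on PATH meets the 2.3 minimum quoted in the error message above. `version_ge` is a helper name made up for this note and is not part of Homebrew or rbenv. As far as I understand, the version chosen with rbenv only takes effect in a shell where `eval "$(rbenv init -)"` has been run, so that is worth checking too.

```shell
# version_ge HAVE WANT: succeed if dotted numeric version HAVE >= WANT.
# Hypothetical helper for this note; it handles purely numeric fields only.
version_ge() {
  _vh=$1
  _vw=$2
  while [ -n "$_vh" ] || [ -n "$_vw" ]; do
    # Take the first field of each version (a missing field counts as 0).
    _h=${_vh%%.*}
    _w=${_vw%%.*}
    _h=${_h:-0}
    _w=${_w:-0}
    # Drop the first field from each version string.
    case $_vh in *.*) _vh=${_vh#*.} ;; *) _vh= ;; esac
    case $_vw in *.*) _vw=${_vw#*.} ;; *) _vw= ;; esac
    [ "$_h" -gt "$_w" ] && return 0
    [ "$_h" -lt "$_w" ] && return 1
  done
  return 0
}

# Report on whichever ruby is first on PATH, if there is one.
if command -v ruby >/dev/null 2>&1; then
  current=$(ruby -e 'print RUBY_VERSION')
  if version_ge "$current" 2.3; then
    echo "Ruby $current meets the 2.3 minimum"
  else
    echo "Ruby $current is older than 2.3"
  fi
fi
```

With the versions from this post, `version_ge 2.0.0 2.3` fails and `version_ge 2.4.1 2.3` succeeds.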

Right, back to it: let's install stringtie.

$ brew install stringtie
Updating Homebrew...
==> Auto-updated Homebrew!
Updated 1 tap (homebrew/core).
==> New Formulae
fruit

Error: No available formula with the name "stringtie" 
==> Searching for a previously deleted formula (in the last month)...
Warning: homebrew/core is shallow clone. To get complete history run:
  git -C "$(brew --repo homebrew/core)" fetch --unshallow

Error: No previously deleted formula found.
==> Searching for similarly named formulae...
==> Searching local taps...
Error: No similarly named formulae found.
==> Searching taps...
==> Searching taps on GitHub...
Error: No formulae found in taps.

What? It seems to have moved. And it won't tell me where to...
Maybe brewsci/bio or brewsci/science.

$ brew tap brewsci/science
==> Tapping brewsci/science
Cloning into '/usr/local/Homebrew/Library/Taps/brewsci/homebrew-science'...
remote: Counting objects: 598, done.
remote: Compressing objects: 100% (595/595), done.
remote: Total 598 (delta 1), reused 186 (delta 1), pack-reused 0
Receiving objects: 100% (598/598), 536.53 KiB | 681.00 KiB/s, done.
Resolving deltas: 100% (1/1), done.
Tapped 579 formulae (617 files, 1.6MB)
$ brew install stringtie
Updating Homebrew...
==> Installing stringtie from brewsci/science
==> Downloading https://homebrew.bintray.com/bottles-science/stringtie-1.3.3b.el_capitan.b
######################################################################## 100.0%
==> Pouring stringtie-1.3.3b.el_capitan.bottle.tar.gz
🍺  /usr/local/Cellar/stringtie/1.3.3b: 5 files, 509.8KB

That was the right one.
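
The trial and error above could be scripted roughly as follows. This is only a sketch: `find_formula_tap` is a name invented for this note, the candidate taps are just the two guessed at above, and note that it really does tap each candidate as it goes (`brew tap` and `brew info` are the only brew subcommands used).

```shell
# find_formula_tap FORMULA TAP...: print the first tap that provides FORMULA.
# Hypothetical helper; side effect: every candidate tried gets tapped.
find_formula_tap() {
  _formula=$1
  shift
  for _tap in "$@"; do
    # Skip taps that cannot be added at all.
    brew tap "$_tap" >/dev/null 2>&1 || continue
    # A fully qualified name (user/repo/formula) only resolves inside that tap.
    if brew info "$_tap/$_formula" >/dev/null 2>&1; then
      echo "$_tap"
      return 0
    fi
  done
  return 1
}

# Example (not run here):
#   tap=$(find_formula_tap stringtie brewsci/bio brewsci/science) &&
#     brew install "$tap/stringtie"
```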

$ stringtie -h
StringTie v1.3.3b usage:
 stringtie <input.bam ..> [-G <guide_gff>] [-l <label>] [-o <out_gtf>] [-p <cpus>]
  [-v] [-a <min_anchor_len>] [-m <min_tlen>] [-j <min_anchor_cov>] [-f <min_iso>]
  [-C <coverage_file_name>] [-c <min_bundle_cov>] [-g <bdist>] [-u]
  [-e] [-x <seqid,..>] [-A <gene_abund.out>] [-h] {-B | -b <dir_path>} 
Assemble RNA-Seq alignments into potential transcripts.
 Options:
 --version : print just the version at stdout and exit
 -G reference annotation to use for guiding the assembly process (GTF/GFF3)
 --rf assume stranded library fr-firststrand
 --fr assume stranded library fr-secondstrand
 -l name prefix for output transcripts (default: STRG)
 -f minimum isoform fraction (default: 0.1)
 -m minimum assembled transcript length (default: 200)
 -o output path/file name for the assembled transcripts GTF (default: stdout)
 -a minimum anchor length for junctions (default: 10)
 -j minimum junction coverage (default: 1)
 -t disable trimming of predicted transcripts based on coverage
    (default: coverage trimming is enabled)
 -c minimum reads per bp coverage to consider for transcript assembly
    (default: 2.5)
 -v verbose (log bundle processing details)
 -g gap between read mappings triggering a new bundle (default: 50)
 -C output a file with reference transcripts that are covered by reads
 -M fraction of bundle allowed to be covered by multi-hit reads (default:0.95)
 -p number of threads (CPUs) to use (default: 1)
 -A gene abundance estimation output file
 -B enable output of Ballgown table files which will be created in the
    same directory as the output GTF (requires -G, -o recommended)
 -b enable output of Ballgown table files but these files will be 
    created under the directory path given as <dir_path>
 -e only estimate the abundance of given reference transcripts (requires -G)
 -x do not assemble any transcripts on the given reference sequence(s)
 -u no multi-mapping correction (default: correction enabled)
 -h print this usage message and exit

Transcript merge usage mode: 
  stringtie --merge [Options] { gtf_list | strg1.gtf ...}
With this option StringTie will assemble transcripts from multiple
input files generating a unified non-redundant set of isoforms. In this mode
the following options are available:
  -G <guide_gff>   reference annotation to include in the merging (GTF/GFF3)
  -o <out_gtf>     output file name for the merged transcripts GTF
                    (default: stdout)
  -m <min_len>     minimum input transcript length to include in the merge
                    (default: 50)
  -c <min_cov>     minimum input transcript coverage to include in the merge
                    (default: 0)
  -F <min_fpkm>    minimum input transcript FPKM to include in the merge
                    (default: 1.0)
  -T <min_tpm>     minimum input transcript TPM to include in the merge
                    (default: 1.0)
  -f <min_iso>     minimum isoform fraction (default: 0.01)
  -g <gap_len>     gap between transcripts to merge together (default: 250)
  -i               keep merged transcripts with retained introns; by default
                   these are not kept unless there is strong evidence for them
  -l <label>       name prefix for output transcripts (default: MSTRG)
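
Going only by the help text above, a typical run would presumably be three steps: assemble each sample, merge the assemblies, then re-estimate abundances against the merged GTF with -e -B to get the Ballgown tables. Below is a dry-run sketch that just prints the commands. The sample names, `ref.gtf`, the `ballgown/` directory layout and the `run` / `stringtie_workflow` helpers are all assumptions made up for illustration; only the stringtie options themselves come from the usage message.

```shell
# Hypothetical inputs: sorted BAM files named <sample>.bam plus a reference GTF.
DRY_RUN=${DRY_RUN:-1}          # 1 = only print the commands
THREADS=${THREADS:-4}
REF_GTF=${REF_GTF:-ref.gtf}
SAMPLES=${SAMPLES:-"sampleA sampleB"}

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

stringtie_workflow() {
  # 1. Assemble transcripts per sample, guided by the reference annotation (-G).
  #    (Relies on sh/bash word splitting of $SAMPLES.)
  for s in $SAMPLES; do
    run stringtie "$s.bam" -G "$REF_GTF" -l "$s" -p "$THREADS" -o "$s.gtf"
  done

  # 2. Merge the per-sample GTFs into one non-redundant set (--merge mode).
  merge_inputs=
  for s in $SAMPLES; do
    merge_inputs="$merge_inputs $s.gtf"
  done
  run stringtie --merge -G "$REF_GTF" -o merged.gtf $merge_inputs

  # 3. Re-estimate abundances for the merged transcripts only (-e) and write
  #    Ballgown tables (-B) next to each output GTF.
  for s in $SAMPLES; do
    run mkdir -p "ballgown/$s"
    run stringtie "$s.bam" -e -B -G merged.gtf -p "$THREADS" -o "ballgown/$s/$s.gtf"
  done
}

stringtie_workflow
```

Setting DRY_RUN=0 would actually execute the commands, which of course needs real BAM files and an annotation in place.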