Here I run StringTie, a program similar to Cufflinks, on RNA-seq data mapped with HISAT2.
Compared with Cufflinks, StringTie is described as follows:
It uses a novel network flow algorithm as well as an optional de novo assembly step to assemble and quantitate full-length transcripts representing multiple splice variants for each gene locus.
So it seems well suited to detecting splice variants.
Usage is not particularly difficult:
stringtie hoge.bam -p 4 -o hoge.gtf -G genes.gff -A hoge_abd.txt
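-o specifies the output file for the assembled transcripts (GTF).
-A specifies the output file for the gene abundance estimates.
-p sets the number of threads, and -G gives the reference annotation used to guide the assembly.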
genes.gff is made beforehand by converting the annotation file
Homo_sapiens/NCBI/build37.2/Annotation/Archives/archive-2014-06-02-13-47-29/Genes/genes.gtf
to GFF:
gffread -E genes.gtf -o genes.gff
That is all there is to it.
This step finishes without taking much time.
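With more than one sample, the same command can be generated in a loop. The following is only a minimal sketch: the helper name stringtie_cmd and the sample name "control" are hypothetical, and it assumes each BAM file is named <sample>.bam and that genes.gff sits in the current directory. It prints the commands rather than running them, so they can be checked first.

```shell
#!/bin/sh
# Hypothetical helper: print the StringTie command line for one sample.
# Assumes <sample>.bam exists and genes.gff is in the current directory.
stringtie_cmd() {
    sample=$1
    threads=${2:-4}   # default to 4 threads, as in the command above
    printf 'stringtie %s.bam -p %s -o %s.gtf -G genes.gff -A %s_abd.txt\n' \
        "$sample" "$threads" "$sample" "$sample"
}

# "control" and "knockdown" are placeholder sample names.
for s in control knockdown; do
    stringtie_cmd "$s"
done
```

Piping the printed lines to sh would actually run them, once they look right.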
$ head hisat_results/knockdown/stringtie_results/knockdown.gtf
# StringTie version 1.3.3b
1 StringTie transcript 14362 29370 1000 - . gene_id "STRG.1"; transcript_id "STRG.1.1"; reference_id "NR_024540.1"; ref_gene_id "WASH7P"; ref_gene_name "WASH7P"; cov "4.131087"; FPKM "0.853882"; TPM "1.905962";
1 StringTie exon 14362 14829 1000 - . gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "1"; reference_id "NR_024540.1"; ref_gene_id "WASH7P"; ref_gene_name "WASH7P"; cov "3.137510";
1 StringTie exon 14970 15038 1000 - . gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "2"; reference_id "NR_024540.1"; ref_gene_id "WASH7P"; ref_gene_name "WASH7P"; cov "2.742950";
1 StringTie exon 15796 15947 1000 - . gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "3"; reference_id "NR_024540.1"; ref_gene_id "WASH7P"; ref_gene_name "WASH7P"; cov "1.672794";
1 StringTie exon 16607 16765 1000 - . gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "4"; reference_id "NR_024540.1"; ref_gene_id "WASH7P"; ref_gene_name "WASH7P"; cov "0.503783";
1 StringTie exon 16858 17055 1000 - . gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "5"; reference_id "NR_024540.1"; ref_gene_id "WASH7P"; ref_gene_name "WASH7P"; cov "5.274298";
1 StringTie exon 17233 17368 1000 - . gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "6"; reference_id "NR_024540.1"; ref_gene_id "WASH7P"; ref_gene_name "WASH7P"; cov "4.874069";
1 StringTie exon 17606 17742 1000 - . gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "7"; reference_id "NR_024540.1"; ref_gene_id "WASH7P"; ref_gene_name "WASH7P"; cov "11.357198";
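In the GTF, the per-transcript FPKM and TPM sit in the attribute column of the transcript rows, so they are easy to pull out with awk. A rough sketch follows, assuming the real file is tab-delimited with the attributes in column 9 (the usual GTF layout). The function names gtf_sample and transcript_tpm are made up for this example, and the sample data is two records copied from the output of knockdown.gtf.

```shell
#!/bin/sh
# Two records from knockdown.gtf, re-emitted with tab separators (illustrative sample).
gtf_sample() {
    printf '# StringTie version 1.3.3b\n'
    printf '1\tStringTie\ttranscript\t14362\t29370\t1000\t-\t.\t%s\n' \
        'gene_id "STRG.1"; transcript_id "STRG.1.1"; reference_id "NR_024540.1"; ref_gene_id "WASH7P"; ref_gene_name "WASH7P"; cov "4.131087"; FPKM "0.853882"; TPM "1.905962";'
    printf '1\tStringTie\texon\t14362\t14829\t1000\t-\t.\t%s\n' \
        'gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "1"; reference_id "NR_024540.1"; ref_gene_id "WASH7P"; ref_gene_name "WASH7P"; cov "3.137510";'
}

# Hypothetical helper: print "transcript_id<TAB>TPM" for every transcript row on stdin.
transcript_tpm() {
    awk -F'\t' '
        $3 == "transcript" {
            id = ""; tpm = ""
            # transcript_id " is 15 characters; drop it and the closing quote
            if (match($9, /transcript_id "[^"]+"/))
                id = substr($9, RSTART + 15, RLENGTH - 16)
            # TPM " is 5 characters; drop it and the closing quote
            if (match($9, /TPM "[^"]+"/))
                tpm = substr($9, RSTART + 5, RLENGTH - 6)
            print id "\t" tpm
        }'
}

gtf_sample | transcript_tpm
```

On a real file this would be run as transcript_tpm < knockdown.gtf; exon rows and comment lines are skipped because their third column is not "transcript".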
$ head hisat_results/knockdown/stringtie_results/knockdown_abd.txt
Gene ID       Gene Name     Reference  Strand  Start   End     Coverage   FPKM      TPM
MIR1302-2     MIR1302-2     1          +       30366   30503   0.000000   0.000000  0.000000
WASH7P        WASH7P        1          -       14362   29370   39.620258  8.189379  18.279625
STRG.1        -             1          -       14362   29370   18.183537  8.254394  18.424746
LOC729737     LOC729737     1          -       136698  140566  6.494237   1.342332  2.996250
STRG.2        -             1          -       139789  140534  3.615458   0.747303  1.668066
STRG.3        -             1          -       235944  259103  3.407473   0.704313  1.572108
LOC100132287  LOC100132287  1          +       323892  328581  12.898223  2.666025  5.950861
LOC100288646  LOC100288646  1          +       329790  342507  0.543889   0.112420  0.250934
STRG.4        -             1          +       319324  328581  10.151579  2.555893  5.705043
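The abundance table is convenient for a quick look at which loci are expressed. As a minimal sketch, the snippet below keeps rows at or above a TPM threshold and sorts them from highest to lowest. It assumes the file is tab-delimited with TPM in the ninth column, as in the header of knockdown_abd.txt. The names abd_sample and top_by_tpm and the default threshold of 1 are arbitrary choices for illustration, and the sample rows are copied from knockdown_abd.txt.

```shell
#!/bin/sh
# Header plus three rows from knockdown_abd.txt, re-emitted with tab separators.
abd_sample() {
    printf 'Gene ID\tGene Name\tReference\tStrand\tStart\tEnd\tCoverage\tFPKM\tTPM\n'
    printf 'WASH7P\tWASH7P\t1\t-\t14362\t29370\t39.620258\t8.189379\t18.279625\n'
    printf 'STRG.1\t-\t1\t-\t14362\t29370\t18.183537\t8.254394\t18.424746\n'
    printf 'STRG.2\t-\t1\t-\t139789\t140534\t3.615458\t0.747303\t1.668066\n'
}

# Hypothetical helper: print "Gene ID<TAB>TPM" for rows with TPM >= $1 (default 1),
# sorted by TPM in descending order. The header line is skipped.
top_by_tpm() {
    min=${1:-1}
    tab=$(printf '\t')
    awk -F'\t' -v min="$min" 'NR > 1 && $9 + 0 >= min + 0 { print $1 "\t" $9 }' |
        sort -t "$tab" -k2,2nr
}

abd_sample | top_by_tpm 5
```

On a real file this would be top_by_tpm 5 < knockdown_abd.txt. Gene IDs of the form STRG.N are loci assigned by StringTie itself, so filtering on that prefix is one way to list only those.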