kuro's notes

96's personal notes

Calculating read coverage with StringTie

Here I run StringTie, a program similar to Cufflinks, on RNA-seq data mapped with HISAT2.
Compared with Cufflinks, it is described as follows:
"It uses a novel network flow algorithm as well as an optional de novo assembly step to assemble and quantitate full-length transcripts representing multiple splice variants for each gene locus."
So it appears to be well suited to detecting splice variants.

Usage is not particularly difficult:

stringtie hoge.bam -p 4 -o hoge.gtf -G genes.gff -A hoge_abd.txt

 -o specifies the output file for the assembled transcripts
 -A specifies the output file for the gene abundance estimation
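
With several samples, the same call can be generated in a loop. The following is a minimal dry-run sketch that only prints the commands instead of running them. The helper name stringtie_cmd, the sample name control, and the location of the BAM files (hisat_results/<sample>/<sample>.bam) are assumptions made for illustration; only the options themselves come from the command above.

```shell
#!/bin/sh
# Dry-run sketch: print one StringTie command per sample instead of running it.
# ASSUMPTIONS: the helper name, the sample names, and the directory layout
# are illustrative; they are not something StringTie itself requires.
# Note: StringTie expects the input BAM to be sorted by coordinate.

stringtie_cmd() {
    sample=$1     # sample name, e.g. knockdown
    threads=$2    # value passed to -p
    dir="hisat_results/$sample"
    printf 'stringtie %s -p %s -o %s -G genes.gff -A %s\n' \
        "$dir/$sample.bam" \
        "$threads" \
        "$dir/stringtie_results/$sample.gtf" \
        "$dir/stringtie_results/${sample}_abd.txt"
}

for s in control knockdown; do
    stringtie_cmd "$s" 4
done
```

Once the printed commands look right, the output of the loop can be piped to sh to actually run them.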

genes.gtf is the annotation file at
Homo_sapiens/NCBI/build37.2/Annotation/Archives/archive-2014-06-02-13-47-29/Genes/genes.gtf
Convert it to GFF beforehand:

gffread -E genes.gtf -o genes.gff
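
A quick sanity check on the converted file is to count the feature types in column 3. The sketch below assumes the output is named genes.gff as in the command above; the function name feature_counts is made up for this note. As far as I know, StringTie's -G option is documented as accepting either GTF or GFF3, so the conversion may not be strictly required, but I have not tested that here.

```shell
#!/bin/sh
# Count feature types (column 3) in a GFF/GTF file, skipping comment lines.
# ASSUMPTION: feature_counts is a hypothetical helper, not part of gffread.

feature_counts() {
    awk -F '\t' '
        !/^#/ && NF >= 9 { n[$3]++ }
        END { for (f in n) print f "\t" n[f] }
    ' "$1" | sort
}

# Only run when the converted annotation is actually present.
if [ -f genes.gff ]; then
    feature_counts genes.gff
fi
```

If the counts contain no transcript-level or exon features at all, something probably went wrong in the conversion.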

Something like this.

This processing finishes without taking much time.

$ head hisat_results/knockdown/stringtie_results/knockdown.gtf
# StringTie version 1.3.3b
1	StringTie	transcript	14362	29370	1000	-	.	gene_id "STRG.1"; transcript_id "STRG.1.1"; reference_id "NR_024540.1"; ref_gene_id "WASH7P"; ref_gene_name "WASH7P"; cov "4.131087"; FPKM "0.853882"; TPM "1.905962";
1	StringTie	exon	14362	14829	1000	-	.	gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "1"; reference_id "NR_024540.1"; ref_gene_id "WASH7P"; ref_gene_name "WASH7P"; cov "3.137510";
1	StringTie	exon	14970	15038	1000	-	.	gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "2"; reference_id "NR_024540.1"; ref_gene_id "WASH7P"; ref_gene_name "WASH7P"; cov "2.742950";
1	StringTie	exon	15796	15947	1000	-	.	gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "3"; reference_id "NR_024540.1"; ref_gene_id "WASH7P"; ref_gene_name "WASH7P"; cov "1.672794";
1	StringTie	exon	16607	16765	1000	-	.	gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "4"; reference_id "NR_024540.1"; ref_gene_id "WASH7P"; ref_gene_name "WASH7P"; cov "0.503783";
1	StringTie	exon	16858	17055	1000	-	.	gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "5"; reference_id "NR_024540.1"; ref_gene_id "WASH7P"; ref_gene_name "WASH7P"; cov "5.274298";
1	StringTie	exon	17233	17368	1000	-	.	gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "6"; reference_id "NR_024540.1"; ref_gene_id "WASH7P"; ref_gene_name "WASH7P"; cov "4.874069";
1	StringTie	exon	17606	17742	1000	-	.	gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "7"; reference_id "NR_024540.1"; ref_gene_id "WASH7P"; ref_gene_name "WASH7P"; cov "11.357198";
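
The transcript lines carry cov, FPKM and TPM as quoted attributes in column 9. A minimal sketch for pulling them out into a tab-separated table with awk follows; the function name gtf_transcripts is made up for this note, and transcripts without a reference match get "-" in the gene name column.

```shell
#!/bin/sh
# Extract transcript_id, ref_gene_name, cov, FPKM and TPM from the
# "transcript" rows of a StringTie GTF as a tab-separated table.
# ASSUMPTION: gtf_transcripts is a hypothetical helper name.

gtf_transcripts() {
    awk -F '\t' '
        # Return the quoted value of attribute "key", or "-" if it is absent.
        function attr(s, key,    parts, n, i, v) {
            n = split(s, parts, /; */)
            for (i = 1; i <= n; i++) {
                if (index(parts[i], key " \"") == 1) {
                    v = substr(parts[i], length(key) + 3)  # skip key, space, quote
                    sub(/".*$/, "", v)                     # drop the closing quote
                    return v
                }
            }
            return "-"
        }
        !/^#/ && $3 == "transcript" {
            printf "%s\t%s\t%s\t%s\t%s\n",
                attr($9, "transcript_id"), attr($9, "ref_gene_name"),
                attr($9, "cov"), attr($9, "FPKM"), attr($9, "TPM")
        }
    ' "$1"
}

gtf=hisat_results/knockdown/stringtie_results/knockdown.gtf
if [ -f "$gtf" ]; then
    gtf_transcripts "$gtf" | head
fi
```

Exon rows are skipped on purpose, since they only carry a per-exon cov value and no FPKM or TPM.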
$ head hisat_results/knockdown/stringtie_results/knockdown_abd.txt 
Gene ID	Gene Name	Reference	Strand	Start	End	Coverage	FPKM	TPM
MIR1302-2	MIR1302-2	1	+	30366	30503	0.000000	0.000000	0.000000
WASH7P	WASH7P	1	-	14362	29370	39.620258	8.189379	18.279625
STRG.1	-	1	-	14362	29370	18.183537	8.254394	18.424746
LOC729737	LOC729737	1	-	136698	140566	6.494237	1.342332	2.996250
STRG.2	-	1	-	139789	140534	3.615458	0.747303	1.668066
STRG.3	-	1	-	235944	259103	3.407473	0.704313	1.572108
LOC100132287	LOC100132287	1	+	323892	328581	12.898223	2.666025	5.950861
LOC100288646	LOC100288646	1	+	329790	342507	0.543889	0.112420	0.250934
STRG.4	-	1	+	319324	328581	10.151579	2.555893	5.705043
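
The abundance table is plain tab-separated text with TPM in the ninth column, so the most highly expressed genes can be listed with standard tools. A minimal sketch follows; the function name top_by_tpm is made up for this note, and the column position is taken from the header line shown above.

```shell
#!/bin/sh
# List the N genes with the highest TPM from a StringTie -A abundance table.
# ASSUMPTION: top_by_tpm is a hypothetical helper; column 9 is TPM as in
# the header "Gene ID, Gene Name, Reference, Strand, Start, End, Coverage, FPKM, TPM".

top_by_tpm() {
    table=$1    # abundance table written by -A
    n=$2        # number of rows to keep
    tab=$(printf '\t')
    # Drop the header, sort numerically by column 9 in descending order,
    # then keep only the Gene ID and TPM columns.
    tail -n +2 "$table" \
        | LC_ALL=C sort -t "$tab" -k9,9nr \
        | head -n "$n" \
        | cut -f1,9
}

abd=hisat_results/knockdown/stringtie_results/knockdown_abd.txt
if [ -f "$abd" ]; then
    top_by_tpm "$abd" 20
fi
```

Rows whose Gene ID starts with STRG. and whose Gene Name is "-" are loci assembled by StringTie without a matching reference gene, so they may be worth looking at separately.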