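I want to check how a genomic sequence was actually edited after inducing indels with CRISPR. Going to NGS just for that feels like overkill, so I'd rather read it on an ordinary Sanger sequencer. The catch is that mutations usually come in heterozygous, so from the mutation site onward the two alleles' traces shift against each other and overlap.
So the two sets of peaks have to be separated and each sequence read out on its own. You could simply open the trace in a chromatogram viewer, follow the peaks, and type the A, C, G, T calls into an editor, but that gets tedious fast.
Someone must have had the same thought and written software to automate this, I figured, so I searched and found three: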
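To make the "untangling" concrete, here is a minimal sketch of the by-hand procedure. It assumes that one allele is still wild type and its sequence is known, and that each position has already been reduced to the set of bases showing a peak. The function name and the input format are my own invention for illustration; none of the tools mentioned here necessarily work this way.

```python
def second_allele(peaks, reference):
    """Recover the non-reference allele from overlapping base calls.

    peaks: one string per position listing the bases with a peak there,
           e.g. "AT" for a double peak or "G" for a clean single peak.
    reference: the known (wild-type) allele over the same positions.
    """
    out = []
    for called, ref in zip(peaks, reference):
        called = set(called)
        if ref not in called:
            raise ValueError(f"reference base {ref} has no peak in {called}")
        others = called - {ref}
        if len(others) > 1:
            raise ValueError(f"more than two peaks at one position: {called}")
        # A single peak means both alleles carry the same base here.
        out.append(others.pop() if others else ref)
    return "".join(out)


# Toy data: wild type versus the same stretch with its first base deleted,
# so the second allele is shifted left by one.
wild_type = "ATGGTGAG"
deleted = "TGGTGAGC"
peaks = ["".join(sorted(set(a + b))) for a, b in zip(wild_type, deleted)]

print(peaks)
print(second_allele(peaks, wild_type))
```

Real traces also have noise and uneven peak heights, which is exactly why doing this by eye is a chore.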
https://tide.nki.nl (TIDE)
Synthego (ICE)
CRISP-ID: Detecting CRISPR mediated indels by Sanger sequencing
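But all of them are web tools. That's probably just how things are these days, but I'm not keen on casually sending research data over the net, so I looked for something that runs locally. The second one, ICE, is published as open source, so it looks like it can be installed locally. It also appears to be Python, which suits me.
So: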
$ git clone git@github.com/synthego-open/ice.git
fatal: repository 'git@github.com/synthego-open/ice.git' does not exist
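Huh? (The SSH form needs a colon after the host, as in git@github.com:synthego-open/ice.git; written with a slash, git treats the whole thing as a local path.)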
$ git clone https://github.com/synthego-open/ice
Cloning into 'ice'...
remote: Enumerating objects: 336, done.
remote: Total 336 (delta 0), reused 0 (delta 0), pack-reused 336
Receiving objects: 100% (336/336), 2.02 MiB | 1.68 MiB/s, done.
Resolving deltas: 100% (203/203), done.
This one works.
$ cd ice
$ pip3 install -r requirements.txt
Collecting pytest==3.4.2 (from -r requirements.txt (line 1))
  Downloading https://files.pythonhosted.org/packages/f1/5c/411ceafef3b5e5486d16f174db18dc26f49e7704dbf59ef488e95db47339/pytest-3.4.2-py2.py3-none-any.whl (189kB)
    100% |████████████████████████████████| 194kB 4.2MB/s
Requirement already satisfied: biopython>=1.70 in /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages (from -r requirements.txt (line 2)) (1.70)
Collecting coverage>=4.4.1 (from -r requirements.txt (line 3))
  Downloading https://files.pythonhosted.org/packages/fb/af/ce7b0fe063ee0142786ee53ad6197979491ce0785567b6d8be751d2069e8/coverage-4.5.2.tar.gz (384kB)
    100% |████████████████████████████████| 389kB 4.2MB/s
Collecting pytest-cov==2.5.1 (from -r requirements.txt (line 4))
  Downloading https://files.pythonhosted.org/packages/30/7d/7f6a78ae44a1248ee28cc777586c18b28a1df903470e5d34a6e25712b8aa/pytest_cov-2.5.1-py2.py3-none-any.whl
Collecting matplotlib>=2.1.0 (from -r requirements.txt (line 5))
  Downloading https://files.pythonhosted.org/packages/28/6c/addb3560777f454b1d56f0020f89e901eaf68a62593d4795e38ddf24bbd6/matplotlib-3.0.2-cp36-cp36m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl (14.1MB)
    100% |████████████████████████████████| 14.1MB 892kB/s
Collecting numpy>=1.13.3 (from -r requirements.txt (line 6))
  Downloading https://files.pythonhosted.org/packages/88/b8/569d9c702685b595812fbfd9ee04f240653b7a15feec43cc98be3b34e5f5/numpy-1.16.1-cp36-cp36m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl (13.9MB)
    100% |████████████████████████████████| 13.9MB 1.5MB/s
Collecting pandas>=0.20.3 (from -r requirements.txt (line 7))
  Downloading https://files.pythonhosted.org/packages/99/12/bf4c58eea94cea4f91ff931f284146337814fb8546e6eb0b52584446fd52/pandas-0.24.1-cp36-cp36m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl (16.3MB)
    100% |████████████████████████████████| 16.3MB 732kB/s
Requirement already satisfied: scipy>=0.19.1 in /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages (from -r requirements.txt (line 8)) (0.19.1)
Collecting sklearn>=0.0 (from -r requirements.txt (line 9))
  Downloading https://files.pythonhosted.org/packages/1e/7a/dbb3be0ce9bd5c8b7e3d87328e79063f8b263b2b1bfa4774cb1147bfcd3f/sklearn-0.0.tar.gz
Collecting xlrd>=1.1.0 (from -r requirements.txt (line 10))
  Downloading https://files.pythonhosted.org/packages/b0/16/63576a1a001752e34bf8ea62e367997530dc553b689356b9879339cf45a4/xlrd-1.2.0-py2.py3-none-any.whl (103kB)
    100% |████████████████████████████████| 112kB 10.0MB/s
Collecting xlsxwriter>=1.0.2 (from -r requirements.txt (line 11))
  Downloading https://files.pythonhosted.org/packages/3d/1b/4caecd4efde1d41ba3bef1a81027032a7a6dff7d5112e1731f232c0addb9/XlsxWriter-1.1.2-py2.py3-none-any.whl (142kB)
    100% |████████████████████████████████| 143kB 370kB/s
Requirement already satisfied: attrs>=17.2.0 in /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages (from pytest==3.4.2->-r requirements.txt (line 1)) (18.2.0)
Requirement already satisfied: py>=1.5.0 in /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages (from pytest==3.4.2->-r requirements.txt (line 1)) (1.7.0)
Collecting pluggy<0.7,>=0.5 (from pytest==3.4.2->-r requirements.txt (line 1))
  Downloading https://files.pythonhosted.org/packages/ba/65/ded3bc40bbf8d887f262f150fbe1ae6637765b5c9534bd55690ed2c0b0f7/pluggy-0.6.0-py3-none-any.whl
Requirement already satisfied: setuptools in /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages (from pytest==3.4.2->-r requirements.txt (line 1)) (28.8.0)
Requirement already satisfied: six>=1.10.0 in /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages (from pytest==3.4.2->-r requirements.txt (line 1)) (1.10.0)
Collecting kiwisolver>=1.0.1 (from matplotlib>=2.1.0->-r requirements.txt (line 5))
  Downloading https://files.pythonhosted.org/packages/fb/96/619db9bf08f652790fa9f3c3884a67dc43da4bdaa185a5aa2117eb4651e1/kiwisolver-1.0.1-cp36-cp36m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl (108kB)
    100% |████████████████████████████████| 112kB 2.4MB/s
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages (from matplotlib>=2.1.0->-r requirements.txt (line 5)) (2.2.0)
Requirement already satisfied: cycler>=0.10 in /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages (from matplotlib>=2.1.0->-r requirements.txt (line 5)) (0.10.0)
Requirement already satisfied: python-dateutil>=2.1 in /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages (from matplotlib>=2.1.0->-r requirements.txt (line 5)) (2.6.1)
Requirement already satisfied: pytz>=2011k in /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages (from pandas>=0.20.3->-r requirements.txt (line 7)) (2017.2)
Collecting scikit-learn (from sklearn>=0.0->-r requirements.txt (line 9))
  Downloading https://files.pythonhosted.org/packages/cb/5f/dfa0a118b8a503e45cd2cf48acb9cf1de8deaf06a3cef1b1c19bd5cbbc45/scikit_learn-0.20.2-cp36-cp36m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl (7.9MB)
    100% |████████████████████████████████| 7.9MB 2.5MB/s
Installing collected packages: pluggy, pytest, coverage, pytest-cov, kiwisolver, numpy, matplotlib, pandas, scikit-learn, sklearn, xlrd, xlsxwriter
  Found existing installation: pluggy 0.8.1
    Uninstalling pluggy-0.8.1:
      Successfully uninstalled pluggy-0.8.1
  Found existing installation: pytest 4.2.0
    Uninstalling pytest-4.2.0:
      Successfully uninstalled pytest-4.2.0
  Running setup.py install for coverage ... done
  Found existing installation: numpy 1.13.0
    Uninstalling numpy-1.13.0:
      Successfully uninstalled numpy-1.13.0
  Found existing installation: matplotlib 2.0.2
    Uninstalling matplotlib-2.0.2:
      Successfully uninstalled matplotlib-2.0.2
  Running setup.py install for sklearn ... done
Successfully installed coverage-4.5.2 kiwisolver-1.0.1 matplotlib-3.0.2 numpy-1.16.1 pandas-0.24.1 pluggy-0.6.0 pytest-3.4.2 pytest-cov-2.5.1 scikit-learn-0.20.2 sklearn-0.0 xlrd-1.2.0 xlsxwriter-1.1.2
Done.
As for usage:
Running a single ICE analysis:

./ice_analysis_single.py \
    --control ./ice/tests/test_data/good_example_control.ab1 \
    --edited ./ice/tests/test_data/good_example_edited.ab1 \
    --target AACCAGTTGCAGGCGCCCCA \
    --out results/testing \
    --verbose

Running a batch analysis:

./ice_analysis_batch.py \
    --in ./ice/tests/test_data/batch_example.xlsx \
    --out ./results/ \
    --data ./ice/tests/test_data/ --verbose
So it says.
Let's try it.
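If you end up with several control/edited pairs and don't want to fill in the batch spreadsheet, the single-sample command can be assembled in a few lines of Python. This is only a sketch: the flags are the ones in the usage text above, while the helper name `ice_single_command` and the `samples` list are made up for illustration.

```python
import shlex


def ice_single_command(control, edited, target, out, verbose=True):
    """Build the argv list for one ice_analysis_single.py run."""
    cmd = [
        "./ice_analysis_single.py",
        "--control", control,
        "--edited", edited,
        "--target", target,
        "--out", out,
    ]
    if verbose:
        cmd.append("--verbose")
    return cmd


# One entry per sample; this one is the example from the usage text.
samples = [
    {
        "control": "./ice/tests/test_data/good_example_control.ab1",
        "edited": "./ice/tests/test_data/good_example_edited.ab1",
        "target": "AACCAGTTGCAGGCGCCCCA",
        "out": "results/testing",
    },
]

commands = [ice_single_command(**sample) for sample in samples]
for cmd in commands:
    # Print a copy-pastable shell line for each sample.
    print(" ".join(shlex.quote(part) for part in cmd))
```

To actually run them from the cloned ice directory, each list can be handed to `subprocess.run(cmd, check=True)`.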
$ ./ice_analysis_single.py --control ./ice/tests/test_data/good_example_control.ab1 --edited ./ice/tests/test_data/good_example_edited.ab1 --target AACCAGTTGCAGGCGCCCCA --out results/testing --verbose
Synthego ICE (https://synthego.com)
Version: 1.1.1
Base dir: /Users/kkuro/local/bin/ice/results
analyzing 462 number of edit proposals
Shape of coefficient matrix: (462, 228)
------------------------------------------------------
Inference Sequence length: 57
Inference sequence Output vector : 4x57 (228)
NNLS input shapes
--------------------------------------------------------
A (228, 462)
b (228,)
R_SQUARED 0.9823262526685657
discord (aln window): 0.14
after cutsite: 0.60
The result:
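The shapes in that log suggest what is going on: a 57-base inference window across four base channels gives a vector of 4 x 57 = 228 values, and the observed vector b is fitted as a non-negative mix of the 462 proposal columns of A. The toy below is my own reading of the log, not ICE's code. It one-hot encodes two made-up proposals instead of using real trace intensities, and it solves the two-column case with plain normal equations (clipped at zero) rather than a real NNLS solver.

```python
BASES = "ACGT"


def one_hot(seq):
    """Flatten a sequence into 4 * len(seq) values, one channel per base."""
    return [1.0 if b == base else 0.0 for base in BASES for b in seq]


def dot(x, y):
    return sum(p * q for p, q in zip(x, y))


def fit_two(a1, a2, b):
    """Least-squares weights for b ~ w1*a1 + w2*a2, clipped to be >= 0."""
    g11, g12, g22 = dot(a1, a1), dot(a1, a2), dot(a2, a2)
    r1, r2 = dot(a1, b), dot(a2, b)
    det = g11 * g22 - g12 * g12
    w1 = (r1 * g22 - r2 * g12) / det
    w2 = (g11 * r2 - g12 * r1) / det
    return max(w1, 0.0), max(w2, 0.0)


# Two hypothetical proposals over the same window:
# unedited, and a 1-bp deletion that shifts the rest left by one.
wild_type = "GCGCCCCATGGTGAGC"
one_bp_del = "GCGCCCCTGGTGAGCA"
a_wt, a_del = one_hot(wild_type), one_hot(one_bp_del)

# Simulated observation: 40% unedited, 60% deletion.
observed = [0.4 * x + 0.6 * y for x, y in zip(a_wt, a_del)]

w_wt, w_del = fit_two(a_wt, a_del, observed)
print(len(one_hot("A" * 57)))  # length of the flattened 57-base window
print(round(w_wt, 3), round(w_del, 3))
```

With hundreds of proposals the same idea needs a proper solver such as `scipy.optimize.nnls`, which is presumably where the "NNLS input shapes" line comes from.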
control NNN-NN-NGGGGCATCCTGTGTTCTACCTGGCACCTGTCCCCATAGAAAT
edited  NNNANNANNGGGCATCCTGTGTTCTACCTGGCACCTGTCCCCATAGAAAT

control GAGCGTGAGTGCCCGGGATCTGCTGCGGGGCTGTGCTGGGCTCTTTCTCA
edited  GAGCGTGAGTGCCCGGGATCTGCTGCGGGGCTGTGCTGGGCTCTTTCTCA

control GCCTGGCCCGAAGTTTCCAGATCTGATTGAGCGAGAGAGCAGCAGGACCT
edited  GCCTGGCCCGAAGTTTCCAGATCTGATTGAGCGAGAGAGCAGCAGGACCT

control GCCCCTCTGCTGGGCTCTTACCTTCGCGGCACTCGCCACTGCCCAGCAGC
edited  GCCCCTCTGCTGGGCTCTTACCTTCGCGGCACTCGCCACTGCCCAGCAGC

control AGGTGAGGCCCAACACAACCAGTTGCAGGCGCCCCATGGTGAGCATCAGC
edited  AGGTGAGGCCCAACACAACCAGTTGCAGGCGCCCC-TGGGGAGAATCACC

control CTCTGGGTGGCCCTCCCTCTGGGCCTCGGGTATTTATGGAGCTGGATCCA
edited  CCCTGGGGGGCCCCCCCCCTGGGGCCCGGGGATTTTTGGGGAGGGAGCCC

control AGGTCACATGCTTGTTCATGAGCTCTC-AGGCA-
edited  AGGGGCCATGTTTTTTTTTAAACCCCCNAGAAAA
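So that's what it looks like. It detected that the A immediately before the target's PAM sequence, TGG, has been deleted.
The sequence trace data looked like this.
Does the control have to be trace data as well? → I tried a fasta.txt and it didn't work. Hmm.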
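The same check can be done without squinting at the alignment. The sketch below takes the one alignment block that contains the target, finds where the PAM starts, and reports gaps in the edited row relative to that position. It assumes the target and PAM sit on the displayed strand inside a single block, with no gaps in the control row of that block; `deletions_relative_to_pam` is a name I made up.

```python
def deletions_relative_to_pam(control, edited, target):
    """List (offset, base) for bases present in control but gapped in edited.

    offset is counted from the first PAM base, so -1 means the base
    immediately 5' of the PAM.
    """
    start = control.find(target)
    if start < 0:
        raise ValueError("target not found in this alignment block")
    pam_start = start + len(target)
    return [
        (i - pam_start, c)
        for i, (c, e) in enumerate(zip(control, edited))
        if e == "-" and c != "-"
    ]


# The fifth block of the alignment above.
control = "AGGTGAGGCCCAACACAACCAGTTGCAGGCGCCCCATGGTGAGCATCAGC"
edited = "AGGTGAGGCCCAACACAACCAGTTGCAGGCGCCCC-TGGGGAGAATCACC"
target = "AACCAGTTGCAGGCGCCCCA"

pam_start = control.find(target) + len(target)
print(control[pam_start:pam_start + 3])  # the three bases after the target
print(deletions_relative_to_pam(control, edited, target))
```

Past the deletion the edited row is still a mix of two alleles, so the mismatches further downstream are expected and say nothing more about the edit itself.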