kuro's notes

96's personal notes

Genotyping CRISPR knockouts by Sanger sequencing

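To check how a genomic sequence was actually edited after inducing an INDEL with CRISPR, pulling out NGS is a bit overkill, so I'd like to read it on an ordinary Sanger sequencer instead. The catch is that mutations usually come in heterozygous, so from the mutation site onward the traces of the two alleles go out of register and pile up on top of each other.
So the two sets of peaks have to be separated and each sequence untangled. Sure, you can just open the trace in a viewer and type ACGT into an editor while following the peaks, but that gets tedious fast.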
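To make the overlap concrete, here is a toy sketch in Python. It ignores real peak heights and is not how any dedicated tool works. It treats each position of a heterozygous read as the set of bases seen there and tries single deletions until one explains the mixture. The function names are made up for illustration, and the example sequence is the ICE test target used later in this post plus a few following bases.

```python
def superimpose(allele_a, allele_b):
    """Per-position sets of bases seen when two alleles are sequenced together."""
    n = min(len(allele_a), len(allele_b))
    return [frozenset((allele_a[i], allele_b[i])) for i in range(n)]


def first_mixed_position(calls):
    """Index of the first position showing more than one base, or None."""
    for i, bases in enumerate(calls):
        if len(bases) > 1:
            return i
    return None


def guess_deletion(wild_type, calls, max_len=10):
    """Return (start, length) of a single deletion that explains the mixed calls.

    Assumes one allele is wild type and the other carries exactly one deletion.
    """
    start = first_mixed_position(calls)
    if start is None:
        return None  # no overlap: nothing looks heterozygous
    for length in range(1, max_len + 1):
        candidate = wild_type[:start] + wild_type[start + length:]
        expected = superimpose(wild_type, candidate)
        n = min(len(expected), len(calls))
        if expected[:n] == calls[:n]:
            return start, length
    return None


wild_type = "AACCAGTTGCAGGCGCCCCATGGTGAGCATCAGC"
mutant = wild_type[:19] + wild_type[20:]   # drop one base (the A before TGG)
calls = superimpose(wild_type, mutant)
print(first_mixed_position(calls))          # where the trace starts to look messy
print(guess_deletion(wild_type, calls))     # (start, length) of the inferred deletion
```

Inside a run of identical bases the reported start is the rightmost equivalent position, since deleting any base of the run gives the same read. Real traces also have noise and insertions, which this sketch does not attempt to handle.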

Surely someone has had the same idea and written software that does this automatically, I thought, and a search turned up three.

https://tide.nki.nl

Synthego

CRISP-ID: Detecting CRISPR mediated indels by Sanger sequencing


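But they're all web tools. That's how most things are these days, I suppose, but casually sending research data out over the net doesn't sit well with me, so I looked for something that runs locally. It turns out the second one, ICE, is released as open source, so it can be installed locally too. It also looks like Python, which suits me.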

So:

$ git clone git@github.com/synthego-open/ice.git
fatal: repository 'git@github.com/synthego-open/ice.git' does not exist

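Huh? (Looking at it again, the SSH-style address needs a colon after the host, git@github.com:synthego-open/ice.git, so the URL I typed was malformed.)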

$ git clone https://github.com/synthego-open/ice
Cloning into 'ice'...
remote: Enumerating objects: 336, done.
remote: Total 336 (delta 0), reused 0 (delta 0), pack-reused 336
Receiving objects: 100% (336/336), 2.02 MiB | 1.68 MiB/s, done.
Resolving deltas: 100% (203/203), done.

This one works.

$ cd ice
$ pip3 install -r requirements.txt
Collecting pytest==3.4.2 (from -r requirements.txt (line 1))
  Downloading https://files.pythonhosted.org/packages/f1/5c/411ceafef3b5e5486d16f174db18dc26f49e7704dbf59ef488e95db47339/pytest-3.4.2-py2.py3-none-any.whl (189kB)
    100% |████████████████████████████████| 194kB 4.2MB/s 
Requirement already satisfied: biopython>=1.70 in /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages (from -r requirements.txt (line 2)) (1.70)
Collecting coverage>=4.4.1 (from -r requirements.txt (line 3))
  Downloading https://files.pythonhosted.org/packages/fb/af/ce7b0fe063ee0142786ee53ad6197979491ce0785567b6d8be751d2069e8/coverage-4.5.2.tar.gz (384kB)
    100% |████████████████████████████████| 389kB 4.2MB/s 
Collecting pytest-cov==2.5.1 (from -r requirements.txt (line 4))
  Downloading https://files.pythonhosted.org/packages/30/7d/7f6a78ae44a1248ee28cc777586c18b28a1df903470e5d34a6e25712b8aa/pytest_cov-2.5.1-py2.py3-none-any.whl
Collecting matplotlib>=2.1.0 (from -r requirements.txt (line 5))
  Downloading https://files.pythonhosted.org/packages/28/6c/addb3560777f454b1d56f0020f89e901eaf68a62593d4795e38ddf24bbd6/matplotlib-3.0.2-cp36-cp36m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl (14.1MB)
    100% |████████████████████████████████| 14.1MB 892kB/s 
Collecting numpy>=1.13.3 (from -r requirements.txt (line 6))
  Downloading https://files.pythonhosted.org/packages/88/b8/569d9c702685b595812fbfd9ee04f240653b7a15feec43cc98be3b34e5f5/numpy-1.16.1-cp36-cp36m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl (13.9MB)
    100% |████████████████████████████████| 13.9MB 1.5MB/s 
Collecting pandas>=0.20.3 (from -r requirements.txt (line 7))
  Downloading https://files.pythonhosted.org/packages/99/12/bf4c58eea94cea4f91ff931f284146337814fb8546e6eb0b52584446fd52/pandas-0.24.1-cp36-cp36m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl (16.3MB)
    100% |████████████████████████████████| 16.3MB 732kB/s 
Requirement already satisfied: scipy>=0.19.1 in /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages (from -r requirements.txt (line 8)) (0.19.1)
Collecting sklearn>=0.0 (from -r requirements.txt (line 9))
  Downloading https://files.pythonhosted.org/packages/1e/7a/dbb3be0ce9bd5c8b7e3d87328e79063f8b263b2b1bfa4774cb1147bfcd3f/sklearn-0.0.tar.gz
Collecting xlrd>=1.1.0 (from -r requirements.txt (line 10))
  Downloading https://files.pythonhosted.org/packages/b0/16/63576a1a001752e34bf8ea62e367997530dc553b689356b9879339cf45a4/xlrd-1.2.0-py2.py3-none-any.whl (103kB)
    100% |████████████████████████████████| 112kB 10.0MB/s 
Collecting xlsxwriter>=1.0.2 (from -r requirements.txt (line 11))
  Downloading https://files.pythonhosted.org/packages/3d/1b/4caecd4efde1d41ba3bef1a81027032a7a6dff7d5112e1731f232c0addb9/XlsxWriter-1.1.2-py2.py3-none-any.whl (142kB)
    100% |████████████████████████████████| 143kB 370kB/s 
Requirement already satisfied: attrs>=17.2.0 in /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages (from pytest==3.4.2->-r requirements.txt (line 1)) (18.2.0)
Requirement already satisfied: py>=1.5.0 in /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages (from pytest==3.4.2->-r requirements.txt (line 1)) (1.7.0)
Collecting pluggy<0.7,>=0.5 (from pytest==3.4.2->-r requirements.txt (line 1))
  Downloading https://files.pythonhosted.org/packages/ba/65/ded3bc40bbf8d887f262f150fbe1ae6637765b5c9534bd55690ed2c0b0f7/pluggy-0.6.0-py3-none-any.whl
Requirement already satisfied: setuptools in /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages (from pytest==3.4.2->-r requirements.txt (line 1)) (28.8.0)
Requirement already satisfied: six>=1.10.0 in /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages (from pytest==3.4.2->-r requirements.txt (line 1)) (1.10.0)
Collecting kiwisolver>=1.0.1 (from matplotlib>=2.1.0->-r requirements.txt (line 5))
  Downloading https://files.pythonhosted.org/packages/fb/96/619db9bf08f652790fa9f3c3884a67dc43da4bdaa185a5aa2117eb4651e1/kiwisolver-1.0.1-cp36-cp36m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl (108kB)
    100% |████████████████████████████████| 112kB 2.4MB/s 
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages (from matplotlib>=2.1.0->-r requirements.txt (line 5)) (2.2.0)
Requirement already satisfied: cycler>=0.10 in /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages (from matplotlib>=2.1.0->-r requirements.txt (line 5)) (0.10.0)
Requirement already satisfied: python-dateutil>=2.1 in /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages (from matplotlib>=2.1.0->-r requirements.txt (line 5)) (2.6.1)
Requirement already satisfied: pytz>=2011k in /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages (from pandas>=0.20.3->-r requirements.txt (line 7)) (2017.2)
Collecting scikit-learn (from sklearn>=0.0->-r requirements.txt (line 9))
  Downloading https://files.pythonhosted.org/packages/cb/5f/dfa0a118b8a503e45cd2cf48acb9cf1de8deaf06a3cef1b1c19bd5cbbc45/scikit_learn-0.20.2-cp36-cp36m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl (7.9MB)
    100% |████████████████████████████████| 7.9MB 2.5MB/s 
Installing collected packages: pluggy, pytest, coverage, pytest-cov, kiwisolver, numpy, matplotlib, pandas, scikit-learn, sklearn, xlrd, xlsxwriter
  Found existing installation: pluggy 0.8.1
    Uninstalling pluggy-0.8.1:
      Successfully uninstalled pluggy-0.8.1
  Found existing installation: pytest 4.2.0
    Uninstalling pytest-4.2.0:
      Successfully uninstalled pytest-4.2.0
  Running setup.py install for coverage ... done
  Found existing installation: numpy 1.13.0
    Uninstalling numpy-1.13.0:
      Successfully uninstalled numpy-1.13.0
  Found existing installation: matplotlib 2.0.2
    Uninstalling matplotlib-2.0.2:
      Successfully uninstalled matplotlib-2.0.2
  Running setup.py install for sklearn ... done
Successfully installed coverage-4.5.2 kiwisolver-1.0.1 matplotlib-3.0.2 numpy-1.16.1 pandas-0.24.1 pluggy-0.6.0 pytest-3.4.2 pytest-cov-2.5.1 scikit-learn-0.20.2 sklearn-0.0 xlrd-1.2.0 xlsxwriter-1.1.2

Done.

Usage:

Running a single ICE analysis:

./ice_analysis_single.py \
	--control ./ice/tests/test_data/good_example_control.ab1  \
	--edited ./ice/tests/test_data/good_example_edited.ab1 \
	--target AACCAGTTGCAGGCGCCCCA \
	--out results/testing \
	--verbose
Running a batch analysis:

./ice_analysis_batch.py \
	--in ./ice/tests/test_data/batch_example.xlsx \
	--out ./results/ \
	--data ./ice/tests/test_data/ \
	--verbose

Or so it says.
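If several clones share one control and one target, the single-analysis command can also be driven from a small Python loop instead of filling in the batch spreadsheet. This is only a sketch: it uses nothing but the flags shown in the usage above, the file names are hypothetical, and the output naming (`clone1`, `clone2`, ...) is my own choice, not something ICE prescribes. By default it only prints the commands.

```python
import subprocess


def build_ice_command(control, edited, target, out, verbose=True,
                      script="./ice_analysis_single.py"):
    """Build the argument list for one ice_analysis_single.py run."""
    cmd = [script, "--control", control, "--edited", edited,
           "--target", target, "--out", out]
    if verbose:
        cmd.append("--verbose")
    return cmd


def run_all(control, edited_files, target, out_dir, dry_run=True):
    """Run (or, by default, just print) one analysis per edited trace."""
    commands = []
    for i, edited in enumerate(edited_files, start=1):
        cmd = build_ice_command(control, edited, target, f"{out_dir}/clone{i}")
        commands.append(cmd)
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops at the first analysis that exits with an error
            subprocess.run(cmd, check=True)
    return commands


# Hypothetical file names; replace with real .ab1 paths.
commands = run_all("wt.ab1", ["clone_a.ab1", "clone_b.ab1"],
                   "AACCAGTTGCAGGCGCCCCA", "results")
```

Passing `dry_run=False` actually launches the script, so it has to be called from the cloned `ice` directory.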
Let's try it.

$ ./ice_analysis_single.py --control ./ice/tests/test_data/good_example_control.ab1 --edited ./ice/tests/test_data/good_example_edited.ab1 --target AACCAGTTGCAGGCGCCCCA --out results/testing --verbose
Synthego ICE (https://synthego.com)
Version: 1.1.1
Base dir: /Users/kkuro/local/bin/ice/results
analyzing 462 number of edit proposals
Shape of coefficient matrix: (462, 228)
------------------------------------------------------
Inference Sequence length: 57
Inference sequence 
Output vector : 4x57 (228)

NNLS input shapes
--------------------------------------------------------
A (228, 462)
b (228,)
R_SQUARED 0.9823262526685657
discord (aln window): 0.14 after cutsite: 0.60

The result:

control                             NNN-NN-NGGGGCATCCTGTGTTCTACCTGGCACCTGTCCCCATAGAAAT
edited                              NNNANNANNGGGCATCCTGTGTTCTACCTGGCACCTGTCCCCATAGAAAT

control                             GAGCGTGAGTGCCCGGGATCTGCTGCGGGGCTGTGCTGGGCTCTTTCTCA
edited                              GAGCGTGAGTGCCCGGGATCTGCTGCGGGGCTGTGCTGGGCTCTTTCTCA

control                             GCCTGGCCCGAAGTTTCCAGATCTGATTGAGCGAGAGAGCAGCAGGACCT
edited                              GCCTGGCCCGAAGTTTCCAGATCTGATTGAGCGAGAGAGCAGCAGGACCT

control                             GCCCCTCTGCTGGGCTCTTACCTTCGCGGCACTCGCCACTGCCCAGCAGC
edited                              GCCCCTCTGCTGGGCTCTTACCTTCGCGGCACTCGCCACTGCCCAGCAGC

control                             AGGTGAGGCCCAACACAACCAGTTGCAGGCGCCCCATGGTGAGCATCAGC
edited                              AGGTGAGGCCCAACACAACCAGTTGCAGGCGCCCC-TGGGGAGAATCACC

control                             CTCTGGGTGGCCCTCCCTCTGGGCCTCGGGTATTTATGGAGCTGGATCCA
edited                              CCCTGGGGGGCCCCCCCCCTGGGGCCCGGGGATTTTTGGGGAGGGAGCCC

control                             AGGTCACATGCTTGTTCATGAGCTCTC-AGGCA-
edited                              AGGGGCCATGTTTTTTTTTAAACCCCCNAGAAAA

So it looks like this. It has detected that the A immediately before the target's PAM sequence, TGG, was deleted.
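That reading of the alignment can be double-checked with a few lines of Python. This is a sketch of my own, not part of ICE: the two strings are the alignment row containing the target, copied from the output above, and the helper names are made up. It assumes the target sits on the strand shown and that the control row has no gaps inside the target.

```python
def deleted_bases(control_row, edited_row):
    """List (column, base) where the edited row has a gap but the control has a base."""
    return [(i, c) for i, (c, e) in enumerate(zip(control_row, edited_row))
            if e == "-" and c != "-"]


def pam_start(control_row, target):
    """Column where the PAM begins, i.e. right after the target in the control row."""
    idx = control_row.find(target)
    if idx < 0:
        raise ValueError("target not found in this row")
    return idx + len(target)


control = "AGGTGAGGCCCAACACAACCAGTTGCAGGCGCCCCATGGTGAGCATCAGC"
edited = "AGGTGAGGCCCAACACAACCAGTTGCAGGCGCCCC-TGGGGAGAATCACC"
target = "AACCAGTTGCAGGCGCCCCA"

pam = pam_start(control, target)
print(control[pam:pam + 3])                       # the PAM bases
for col, base in deleted_bases(control, edited):
    print(base, "deleted at offset", col - pam, "from the PAM")
```

An offset of -1 means the base directly 5' of the PAM on this strand.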
The sequence data looked like this:

[Figure: sequence trace data]
I wondered whether the control also has to be trace data → a fasta.txt didn't work. Hmm.
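A quick way to tell a trace file from a plain-text sequence before handing it to a tool is to look at the first four bytes, since ABIF files (the .ab1 format) start with the ASCII marker `ABIF`. I haven't looked at how ICE itself validates its inputs, so take this as a standalone check and nothing more. The helper name and the demo files are invented for the example, and it assumes a plain ABIF file with no extra wrapper in front of the marker.

```python
import os
import tempfile


def looks_like_ab1(path):
    """True if the file starts with the ABIF magic bytes used by .ab1 traces."""
    with open(path, "rb") as handle:
        return handle.read(4) == b"ABIF"


# Demo on throwaway files: a FASTA text file and a file with only an ABIF marker.
with tempfile.TemporaryDirectory() as tmp:
    fasta = os.path.join(tmp, "control.fasta.txt")
    with open(fasta, "w") as handle:
        handle.write(">control\nACGT\n")
    fake_trace = os.path.join(tmp, "fake.ab1")
    with open(fake_trace, "wb") as handle:
        handle.write(b"ABIF" + b"\x00" * 124)   # marker bytes only, not a real trace
    fasta_ok = looks_like_ab1(fasta)
    trace_ok = looks_like_ab1(fake_trace)

print(fasta_ok, trace_ok)
```

It only checks the marker, so a truncated or corrupted trace would still pass.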