Usage¶
Data preprocessing¶
Help
python preprocessing.py -h
usage: preprocessing.py [-h] [--genome GENOME] [--tss TSS] [--outdir OUTDIR]

optional arguments:
  -h, --help            show this help message and exit
  --genome GENOME, -g GENOME
                        genome fasta
  --tss TSS, -t TSS     tss path
  --outdir OUTDIR, -o OUTDIR
                        outdir
File format information
The genome is a FASTA file:
# fasta format
head tair10.fa
>Chr1
CCCTAAACCCTAAACCCTAAACCCTAAACCTCTGAATCCTTAATCCCTAAATCCCTAAAT
CTTTAAATCCTACATCCATGAATCCCTAAATACCTAATTCCCTAAACCCGAAACCGGTTT
CTCTGGTTGAAAATCATTGTGTATATAATGATAATTTTATCGTTTTTATGTAATTGCTTA
TTGTTGTGTGTAGATTTTTTAAAAATATCATTTGAGGTCAATACAAATCCTATTTCTTGT
GGTTTTCTTTCCTTCACTTAGCTATGGATGGTTTATCTTCATTTGTTATATTGGATACAA
GCTTTGCTACGATCTACATTTGGGAATGTGAGTCTCTTATTGTAACCTTAGGGTTGGTTT
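To make sure the genome file parses cleanly before running the preprocessing script, a minimal check with Biopython (Biopython appears in the conda install command later on this page) could look like the sketch below; tair10.fa is the file from the example above.

from Bio import SeqIO

# List each sequence in the genome FASTA together with its length (e.g. Chr1 ...)
for record in SeqIO.parse("tair10.fa", "fasta"):
    print(record.id, len(record.seq))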
The TSS file is a comma-separated table (chromosome, strand, transcript ID, TSS position):
head cage_covered_5_leader.csv
Chr1,+,AT1G01010.1,3624
Chr1,+,AT1G01010.1,3662
Chr1,-,AT1G01020.2,8720
Chr1,-,AT1G01020.6,8720
Chr1,-,AT1G01020.1,8720
Chr1,-,AT1G01020.3,8720
Chr1,-,AT1G01020.4,8720
Chr1,-,AT1G01020.5,8720
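The TSS table can also be previewed with pandas; the column names used below are only illustrative, since the file itself has no header row.

import pandas as pd

# No header row in the CSV; the names below are illustrative only
tss = pd.read_csv("cage_covered_5_leader.csv", header=None,
                  names=["chrom", "strand", "transcript_id", "tss"])
print(tss.head())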
Example
mkdir data_cage_covered;
python preprocessing.py -g ../../tair10.fa \
-t cage_covered_5_leader.csv \
-o data_cage_covered
gffutils is tested with Python 3.6, 3.7, 3.8, and 3.9.
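Once preprocessing has run, the output directory should contain the arrays that the training commands below read. The file names class_data.npy and class_label.npy are taken from the trainTSARC.py example further down, so adjust them if your run produces different files.

import numpy as np

# Sanity-check the preprocessed classification data and labels
data = np.load("data_cage_covered/class_data.npy")
labels = np.load("data_cage_covered/class_label.npy")
print(data.shape, labels.shape)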
Model training¶
Ray 1.13.0 is used for hyperparameter optimization of the following (a minimal sketch follows this list):
batch size
learning rate
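The search itself happens inside the training scripts; purely as an illustration of how a Ray Tune 1.13 search over these two hyperparameters is set up, with a placeholder objective standing in for the real training loop and assumed search ranges:

from ray import tune

def objective(config):
    # Placeholder: the real scripts would train a model with these
    # hyperparameters and report a validation metric instead.
    tune.report(score=config["lr"] * config["batch_size"])

analysis = tune.run(
    objective,
    config={
        "lr": tune.loguniform(1e-4, 1e-1),        # learning rate range (assumed)
        "batch_size": tune.choice([32, 64, 128]), # batch size candidates (assumed)
    },
    num_samples=10,
)
print(analysis.get_best_config(metric="score", mode="min"))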
TSARC:
python trainTSARC.py -h
usage: testTSARC.py [-h] [--model {lr,cnn,gru,lstm,attention}] [--test_data_path TEST_DATA_PATH] [--test_label_path TEST_LABEL_PATH]
                    [--model_dict_path MODEL_DICT_PATH]

PyTorch Implementation of TSAR Predict

optional arguments:
  -h, --help            show this help message and exit
  --model {lr,cnn,gru,lstm,attention}
                        model name
  --test_data_path TEST_DATA_PATH
                        test data saved in numpy ndarray
  --test_label_path TEST_LABEL_PATH
                        test label saved in numpy ndarray
  --model_dict_path MODEL_DICT_PATH
                        saved model name
Example
python trainTSARC.py --model ResNet \
--train_data_path ../../data_cage_covered/class_data.npy \
--train_label_path ../../data_cage_covered/class_label.npy \
--lr 0.001 --batch_size 128 \
--model_save saved_model --epoch 100
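To inspect what --model_save wrote, a quick check could look like the following; the page does not say exactly what format is used, so the snippet assumes a file named saved_model that torch.load can read (for example, a state dict).

import torch

# Load the object trainTSARC.py saved; print its keys if it is a state dict
state = torch.load("saved_model", map_location="cpu")
print(type(state))
if isinstance(state, dict):
    print(list(state.keys())[:5])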
TSARL:
usage: testTSARL.py [-h] [--model {cnn,gru,attention}] [--test_data_path TEST_DATA_PATH] [--test_label_path TEST_LABEL_PATH]
                    [--scaler SCALER] [--model_dict_path MODEL_DICT_PATH]

PyTorch Implementation of TSAR Predict

optional arguments:
  -h, --help            show this help message and exit
  --model {cnn,gru,attention}
                        model name
  --test_data_path TEST_DATA_PATH
                        test data saved in numpy ndarray
  --test_label_path TEST_LABEL_PATH
                        test label saved in numpy ndarray
  --scaler SCALER       scaler used to scale the test data
  --model_dict_path MODEL_DICT_PATH
                        saved model name
Example
python trainTSARL.py --model ResNet \
--train_data_path ../data/regress_data.npy \
--train_label_path ../data/regress_label.npy \
--lr 0.001 --batch_size 128 \
--model_saved saved_model --epoch 200
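The TSARL testing options above include a --scaler argument. This page does not show how that scaler is produced, so the following is only one plausible way to fit and save one with scikit-learn, reusing the regression data file from the training example.

import pickle
import numpy as np
from sklearn.preprocessing import StandardScaler

# Fit a scaler on the (flattened) regression training data and pickle it,
# so that an object like this could be passed to --scaler at test time.
data = np.load("../data/regress_data.npy")
scaler = StandardScaler().fit(data.reshape(len(data), -1))
with open("scaler.pkl", "wb") as fh:
    pickle.dump(scaler, fh)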
Model prediction¶
TSARC:
Dependencies can be installed with conda:
conda install --channel conda-forge --channel bioconda numpy pandas biopython scikit-learn torch
python predictTSARC.py -h
usage: predictTSARC.py [-h] [--model_id {lr,cnn,gru,lstm,attention,resnet}]
                       [--input_path INPUT_PATH] [--output_path OUTPUT_PATH]
                       [--model_path MODEL_PATH]

PyTorch Implementation of TSARC Predict

optional arguments:
  -h, --help            show this help message and exit
  --model_id {lr,cnn,gru,lstm,attention,resnet}
                        model name
  --input_path INPUT_PATH
                        input data csv
  --output_path OUTPUT_PATH
                        output path
  --model_path MODEL_PATH
                        saved model name
TSARL:
python predictTSARL.py -h
usage: predictTSARL.py [-h] [--model_id {lr,cnn,gru,lstm,attention,resnet}]
                       [--input_path INPUT_PATH] [--output_path OUTPUT_PATH]
                       [--model_path MODEL_PATH]

PyTorch Implementation of TSARL Predict

optional arguments:
  -h, --help            show this help message and exit
  --model_id {lr,cnn,gru,lstm,attention,resnet}
                        model name
  --input_path INPUT_PATH
                        input data csv
  --output_path OUTPUT_PATH
                        output path
  --model_path MODEL_PATH
                        saved model name