Usage

Data preprocessing

Help

python preprocessing.py -h
usage: preprocessing.py [-h] [--genome GENOME] [--tss TSS] [--outdir OUTDIR]

optional arguments:
  -h, --help            show this help message and exit
  --genome GENOME, -g GENOME
                        genome fasta
  --tss TSS, -t TSS     tss path
  --outdir OUTDIR, -o OUTDIR
                        outdir

File format information

  • genome is a FASTA file:

    # fasta format
    head tair10.fa
    >Chr1
    CCCTAAACCCTAAACCCTAAACCCTAAACCTCTGAATCCTTAATCCCTAAATCCCTAAAT
    CTTTAAATCCTACATCCATGAATCCCTAAATACCTAATTCCCTAAACCCGAAACCGGTTT
    CTCTGGTTGAAAATCATTGTGTATATAATGATAATTTTATCGTTTTTATGTAATTGCTTA
    TTGTTGTGTGTAGATTTTTTAAAAATATCATTTGAGGTCAATACAAATCCTATTTCTTGT
    GGTTTTCTTTCCTTCACTTAGCTATGGATGGTTTATCTTCATTTGTTATATTGGATACAA
    GCTTTGCTACGATCTACATTTGGGAATGTGAGTCTCTTATTGTAACCTTAGGGTTGGTTT
    
  • TSS is a comma-separated table (CSV) file:

    head cage_covered_5_leader.csv
    Chr1,+,AT1G01010.1,3624
    Chr1,+,AT1G01010.1,3662
    Chr1,-,AT1G01020.2,8720
    Chr1,-,AT1G01020.6,8720
    Chr1,-,AT1G01020.1,8720
    Chr1,-,AT1G01020.3,8720
    Chr1,-,AT1G01020.4,8720
    Chr1,-,AT1G01020.5,8720
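
Before running the preprocessing step it can help to confirm that the two inputs agree with each other. The following is a minimal sketch, not part of this repository: the column names (`chrom`, `strand`, `transcript_id`, `position`) are assumptions inferred from the sample rows, positions are assumed to be 1-based, and the helper names are hypothetical.

```python
import csv
import io
from collections import namedtuple

# Column names are assumptions inferred from the sample rows; the file has no header.
TssRecord = namedtuple("TssRecord", ["chrom", "strand", "transcript_id", "position"])


def read_tss_table(handle):
    """Parse headerless 4-column CSV rows into TssRecord tuples."""
    records = []
    for row in csv.reader(handle):
        if not row:  # skip blank lines
            continue
        chrom, strand, transcript_id, position = row
        if strand not in ("+", "-"):
            raise ValueError(f"unexpected strand {strand!r}")
        records.append(TssRecord(chrom, strand, transcript_id, int(position)))
    return records


def read_fasta_lengths(handle):
    """Return {sequence name: length} for a FASTA stream."""
    lengths = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # keep the ID up to the first whitespace
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def check_tss_against_genome(records, lengths):
    """Return records whose chromosome is missing or whose position is out of range.

    Positions are assumed to be 1-based.
    """
    return [
        r for r in records
        if r.chrom not in lengths or not 1 <= r.position <= lengths[r.chrom]
    ]


# Small in-memory stand-ins for tair10.fa and cage_covered_5_leader.csv
fasta_text = ">Chr1\nCCCTAAACCC\nTAAACCCTAA\n"
tss_text = "Chr1,+,AT1G01010.1,5\nChr1,-,AT1G01020.2,8720\n"

lengths = read_fasta_lengths(io.StringIO(fasta_text))
records = read_tss_table(io.StringIO(tss_text))
bad = check_tss_against_genome(records, lengths)
print(lengths)
print(bad)
```

With real files, pass `open("tair10.fa")` and `open("cage_covered_5_leader.csv", newline="")` in place of the in-memory strings. Any record returned by `check_tss_against_genome` points at a chromosome name or coordinate that does not match the genome.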
    

Example

mkdir data_cage_covered;
python preprocessing.py -g ../../tair10.fa \
                        -t cage_covered_5_leader.csv \
                        -o data_cage_covered

gffutils is tested with Python 3.6, 3.7, 3.8, 3.9.

Model training

  • Ray 1.13.0 (hyperparameter optimization)

  • batch size

  • learning rate
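
To illustrate what searching over these two hyperparameters involves, here is a minimal sketch of random search in plain Python. It deliberately does not use Ray, so it runs without extra dependencies. The candidate batch sizes, the learning-rate range, and `toy_objective` are all assumptions made for illustration; in practice the objective would be the validation loss of a real training run.

```python
import math
import random


def sample_config(rng):
    """Draw one candidate: batch size from a fixed set, learning rate log-uniform."""
    return {
        "batch_size": rng.choice([32, 64, 128, 256]),  # assumed candidate set
        "lr": 10 ** rng.uniform(-5, -2),               # assumed range 1e-5 .. 1e-2
    }


def toy_objective(config):
    """Hypothetical stand-in for a validation loss; replace with a real training run."""
    return (math.log10(config["lr"]) + 3) ** 2 + abs(config["batch_size"] - 128) / 128


def random_search(n_trials, seed=0):
    """Sample n_trials configurations and return the one with the lowest objective."""
    rng = random.Random(seed)
    trials = [sample_config(rng) for _ in range(n_trials)]
    return min(trials, key=toy_objective)


best = random_search(n_trials=50)
print(best)
```

Sampling the learning rate on a log scale gives each order of magnitude equal weight, which is a common choice when the useful range spans several decades.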

TSARC:

python trainTSARC.py -h
usage: testTSARC.py [-h] [--model {lr,cnn,gru,lstm,attention}] [--test_data_path TEST_DATA_PATH] [--test_label_path TEST_LABEL_PATH]
                [--model_dict_path MODEL_DICT_PATH]

PyTorch Implementation of TSAR Predict

optional arguments:
  -h, --help            show this help message and exit
  --model {lr,cnn,gru,lstm,attention}
                    model name
  --test_data_path TEST_DATA_PATH
                    test data saved in numpy ndarry
  --test_label_path TEST_LABEL_PATH
                    test label saved in numpy ndarray
  --model_dict_path MODEL_DICT_PATH
                    model saved name

Example

python trainTSARC.py --model ResNet \
         --train_data_path ../../data_cage_covered/class_data.npy \
         --train_label_path ../../data_cage_covered/class_label.npy \
         --lr 0.001 --batch_size 128 \
         --model_save saved_model --epoch 100

TSARL:

usage: testTSARL.py [-h] [--model {cnn,gru,attention}] [--test_data_path TEST_DATA_PATH] [--test_label_path TEST_LABEL_PATH]
                [--scaler SCALER] [--model_dict_path MODEL_DICT_PATH]

PyTorch Implementation of TSAR Predict

optional arguments:
  -h, --help            show this help message and exit
  --model {cnn,gru,attention}
                        model name
  --test_data_path TEST_DATA_PATH
                        test data saved in numpy ndarry
  --test_label_path TEST_LABEL_PATH
                        test label saved in numpy ndarray
  --scaler SCALER       scaler for test data scale
  --model_dict_path MODEL_DICT_PATH
                        model saved name

Example

python trainTSARL.py --model ResNet \
              --train_data_path ../data/regress_data.npy \
              --train_label_path ../data/regress_label.npy \
              --lr 0.001 --batch_size 128 \
              --model_saved saved_model --epoch 200

Model predicting

TSARC:

conda install --channel conda-forge --channel bioconda numpy pandas biopython scikit-learn torch

python predictTSARC.py -h
usage: predictTSARC.py [-h] [--model_id {lr,cnn,gru,lstm,attention,resnet}]
                   [--input_path INPUT_PATH] [--output_path OUTPUT_PATH]
                   [--model_path MODEL_PATH]

PyTorch Implementation of TSARC Predict

optional arguments:
  -h, --help            show this help message and exit
  --model_id {lr,cnn,gru,lstm,attention,resnet}
                        model name
  --input_path INPUT_PATH
                        input data csv
  --output_path OUTPUT_PATH
                        output path
  --model_path MODEL_PATH
                        model saved name

TSARL:

python predictTSARL.py -h
usage: predictTSARL.py [-h] [--model_id {lr,cnn,gru,lstm,attention,resnet}]
                       [--input_path INPUT_PATH] [--output_path OUTPUT_PATH]
                       [--model_path MODEL_PATH]

PyTorch Implementation of TSARL Predict

optional arguments:
  -h, --help            show this help message and exit
  --model_id {lr,cnn,gru,lstm,attention,resnet}
                        model name
  --input_path INPUT_PATH
                        input data csv
  --output_path OUTPUT_PATH
                        output path
  --model_path MODEL_PATH
                        model saved name