orega evolves an input sequence to increase end-to-end distance using a genetic algorithm. Only nucleotides in a specified segment are mutated.
USAGE: orega <input file> <start> <length> <output file> [options]
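As an illustration of the synopsis above, the argument list could be assembled from a script as follows. This is a minimal sketch, not part of RNAstructure: the helper name, file names, and values are hypothetical, and it assumes that `orega` is on the PATH and that each option takes its value as the next argument. The command is only built and printed, not run.

```python
import shlex


def build_orega_command(input_file, start, length, output_file, **options):
    """Build an argument list that follows the USAGE synopsis.

    `options` maps long option names from this page (e.g. iter, seed) to
    values; a value of True emits a bare flag such as --dna.
    """
    cmd = ["orega", str(input_file), str(start), str(length), str(output_file)]
    for name, value in options.items():
        flag = "--" + name
        if value is True:
            cmd.append(flag)
        else:
            cmd.extend([flag, str(value)])
    return cmd


# Hypothetical file names and parameter values, for illustration only.
cmd = build_orega_command("input.seq", 20, 30, "evolved.fasta",
                          iter=500, seed=42, save="state.sav")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run` once the program and input file are actually available.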
<input file> |
The name of a sequence file containing input data. Note that lowercase nucleotides are forced single-stranded in structure prediction. |
<start> |
Integer that indicates the first nucleotide that can be mutated. |
<length> |
Integer that indicates the length of the sequence segment that can be mutated. |
<output file> |
The name of a FASTA file to which the output will be written. |
--nocomplexity
|
Use the objective function that does not include sequence complexity. The default uses an objective function that includes sequence complexity. |
-d, --dna
|
Use DNA parameters for folding. The default is RNA parameters. |
-h, --help |
Display the usage details message.
|
-v, --version
|
Display version and copyright information for this interface. |
-a, -A, --alphabet |
Specify the name of a folding alphabet and associated nearest neighbor parameters. The alphabet is the prefix for the thermodynamic parameter files, e.g. "rna" for RNA parameters, "dna" for DNA parameters, or the name of a custom extended/modified alphabet. The thermodynamic parameters need to reside at the location indicated by the environment variable DATAPATH. The default is "rna" (i.e. use RNA parameters). |
-f, --func |
The objective function: an integer that chooses which objective function to use. 1 = SIMPLE (sequence complexity is not used), 2 = COMPLEX (sequence complexity is used). The default is 2. |
-i, --iter
|
The number of iterations (optimization steps) to run. The default is 1000.
|
-mr, --mutate
|
The mutation rate, the probability that a nucleotide in the target segment should be mutated. The default is 0.03. |
-n, --population |
The population size, the number of concurrent sequences used in the genetic algorithm. The default is 10. |
-rf, --recomb |
The recombination frequency, the number of iterations that are run before a recombination/crossover step occurs. The default is 6. |
--restart |
Specify the name of a previous state file (created with the --save option) that should be loaded to restart an optimization from where it left off before, or continue optimizing a previous result. |
-rr, --crossover |
The recombination rate, the probability that a nucleotide will be selected as a recombination marker. The default is 0.03. |
-rs, --seed |
Specify a random seed. This is required to get exactly reproducible results. The default is to use a seed based on the current system time. |
-sav, --save |
Specify the name of a file where intermediate results can be saved. This file can be used to restart the calculation. |
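To make the roles of the mutation rate and the random seed concrete, here is a minimal sketch of a per-nucleotide mutation step restricted to the target segment. It is not orega's actual implementation: the function name, the 1-based interpretation of `start`, and the rule that a mutated position always receives a different nucleotide are assumptions made for illustration.

```python
import random

NUCLEOTIDES = "ACGU"


def mutate_segment(sequence, start, length, rate=0.03, rng=None):
    """Mutate each nucleotide in the segment with probability `rate`.

    Assumes `start` is 1-based. Positions outside the segment are never
    changed. A mutated position is assumed to receive a different base.
    """
    if rng is None:
        rng = random.Random()
    seq = list(sequence)
    first = start - 1
    for i in range(first, min(first + length, len(seq))):
        if rng.random() < rate:
            choices = [n for n in NUCLEOTIDES if n != seq[i]]
            seq[i] = rng.choice(choices)
    return "".join(seq)


# A fixed seed makes the result reproducible, as with the seed option.
parent = "GGGAAACCCUUUGGGAAACCC"
child_a = mutate_segment(parent, 4, 9, rate=0.5, rng=random.Random(7))
child_b = mutate_segment(parent, 4, 9, rate=0.5, rng=random.Random(7))
```

With the same seed, `child_a` and `child_b` are identical, and the nucleotides before position 4 and after position 12 match the parent.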
orega indirectly increases the end-to-end distance of a sequence by evolving a sequence segment to avoid base pairs. The objective function by default also includes the sequence complexity, and this is helpful to keep the sequence from evolving into repeats. The objective function is maximized and is the sum of (the mean probability of nucleotides in the specified segment being unpaired) and (the sequence complexity). The complexity component can optionally be removed with the --nocomplexity option.
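The objective described above can be sketched in a few lines. This is an illustrative assumption, not orega's code: the function and parameter names are hypothetical, `start` is taken to be 1-based, the unpaired probabilities are assumed to come from a partition function calculation, and the complexity score is passed in as a number because its definition is not given here.

```python
def objective(unpaired_prob, start, length, complexity, use_complexity=True):
    """Mean unpaired probability over the segment, plus complexity if used.

    `unpaired_prob[i]` is the probability that nucleotide i + 1 is unpaired.
    `complexity` is a precomputed sequence complexity score (hypothetical).
    """
    segment = unpaired_prob[start - 1:start - 1 + length]
    if not segment:
        raise ValueError("segment is empty")
    mean_unpaired = sum(segment) / len(segment)
    if use_complexity:
        return mean_unpaired + complexity
    return mean_unpaired


# Made-up probabilities for a 6-nucleotide sequence; segment is positions 3-5.
probs = [0.1, 0.2, 0.75, 0.5, 0.25, 0.9]
score_complex = objective(probs, 3, 3, complexity=0.5)
score_simple = objective(probs, 3, 3, complexity=0.5, use_complexity=False)
```

Setting `use_complexity=False` corresponds to the objective without the complexity term, as selected by --nocomplexity.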
The complexity calculation performed by orega assumes a 4-nucleotide alphabet (although RNAstructure can use larger alphabets to include modified nucleotides). It is important to provide a sequence that uses only the standard A, C, G, U/T nucleotide alphabet. Lowercase nucleotides are forced unpaired (as in the rest of RNAstructure), but these should only be included outside the region of sequence being evolved. Lowercase nucleotides within the evolved region will never be allowed to pair as the sequence evolves.
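The input constraints above can be checked before a run. The following is a minimal sketch of such a pre-flight check. It is not part of RNAstructure: the function name is hypothetical and `start` is assumed to be 1-based.

```python
def check_input_sequence(sequence, start, length):
    """Return a list of problems found; an empty list means none were found.

    Checks that only A, C, G, U/T are used (in either case) and that no
    lowercase nucleotide lies inside the evolved segment.
    """
    problems = []
    first = start - 1
    last = first + length  # exclusive end index
    if first < 0 or last > len(sequence):
        problems.append("segment extends outside the sequence")
    for i, nt in enumerate(sequence):
        if nt.upper() not in "ACGUT":
            problems.append(f"position {i + 1}: non-standard nucleotide {nt!r}")
        elif nt.islower() and first <= i < last:
            problems.append(f"position {i + 1}: lowercase inside evolved segment")
    return problems


# Lowercase flanks outside the evolved segment (positions 3-8) are acceptable.
ok = check_input_sequence("ggGAAACCCuu", 3, 6)
# A lowercase nucleotide at position 4 falls inside the segment (positions 3-6).
bad = check_input_sequence("GGGaAACCC", 3, 4)
```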
orega-cuda is the CUDA version (for execution on CUDA-enabled graphics cards). It accepts the same options, but uses the CUDA-enabled partition function in the background. This can dramatically improve runtimes.
- Lai, W. C., Kayedkhordeh, M., Cornell, E. V., Farah, E., Bellaousov, S., Rietmeijer, R., Mathews, D. H., & Ermolenko, D. N. "mRNAs and lncRNAs intrinsically form secondary structures with short end-to-end distances." Nature Communications, 9:4328. (2018).
- Reuter, J. S. and Mathews, D. H. "RNAstructure: software for RNA secondary structure prediction and analysis." BMC Bioinformatics, 11:129. (2010).