RNAstructure Command Line Help
orega and orega-cuda

orega evolves an input sequence to increase end-to-end distance using a genetic algorithm. Only nucleotides in a specified segment are mutated.

USAGE: orega <input file> <start> <length> <output file> [options]

Required parameters:

<input file> The name of a sequence file containing input data.
Note that lowercase nucleotides are forced single-stranded in structure prediction.
<start> Integer that indicates the first nucleotide that can be mutated.
<length> Integer that indicates the length of the sequence segment that can be mutated.
<output file> The name of a FASTA file to which the output will be written.
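As an illustration of how <start> and <length> delimit the mutable segment, here is a minimal Python sketch. It assumes <start> is 1-based (so 1 means the first nucleotide), which this page does not state explicitly; split_segment is a hypothetical helper and not part of RNAstructure.

```python
def split_segment(sequence: str, start: int, length: int):
    """Split a sequence into (prefix, mutable segment, suffix).

    Assumption: start is 1-based, so start=1 is the first nucleotide.
    """
    if start < 1 or length < 1 or start + length - 1 > len(sequence):
        raise ValueError("segment must lie inside the sequence")
    begin = start - 1      # convert to Python's 0-based indexing
    end = begin + length   # exclusive end of the mutable segment
    return sequence[:begin], sequence[begin:end], sequence[end:]


prefix, segment, suffix = split_segment("GGGAAACCCUUU", start=4, length=3)
print(prefix, segment, suffix)  # GGG AAA CCCUUU
```

Under that assumption, only the middle piece (AAA here) would be open to mutation; the prefix and suffix stay fixed.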

Options that do not require added values:

--nocomplexity
Use the objective function that does not include sequence complexity. The default uses an objective function that includes sequence complexity.
-d, --dna
Use DNA parameters for folding. The default is RNA parameters.
-h, --help
Display the usage details message.
-v, --version
Display version and copyright information for this interface.

Options that require added values:

-a, -A, --alphabet Specify the name of a folding alphabet and associated nearest neighbor parameters. The alphabet is the prefix for the thermodynamic parameter files, e.g. "rna" for RNA parameters, "dna" for DNA parameters, or the name of a custom extended/modified alphabet. The thermodynamic parameter files need to reside at the location indicated by the environment variable DATAPATH.
The default is "rna" (i.e. use RNA parameters).
-f, --func An integer that chooses which objective function to use. 1 = SIMPLE (no sequence complexity used), 2 = COMPLEX (the sequence complexity is used). The default is 2.
-i, --iter
The number of iterations (optimization steps) to run. The default is 1000.
-mr, --mutate
The mutation rate, the probability that a nucleotide in the target segment should be mutated. The default is 0.03.
-n, --population The population size, the number of concurrent sequences used in the genetic algorithm. The default is 10.
-rf, --recomb The recombination frequency, the number of iterations that are run before a recombination/crossover step occurs. The default is 6.
--restart Specify the name of a previous state file (created with the --save option) to load, either to resume an optimization from where it left off or to continue optimizing a previous result.
-rr, --crossover The recombination rate, the probability that a nucleotide will be selected as a recombination marker. The default is 0.03.
-rs, --seed Specify a random seed. This is required to get exactly reproducible results. The default is to use a seed based on the current system time.
-sav, --save Specify the name of a file where intermediate results can be saved. This file can be used to restart the calculation.
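The sketch below shows one way to assemble an orega command line from a script, using only flags listed on this page. It builds and prints the command without running it. The file names and the orega_command helper are hypothetical, and passing each value as a separate argument after its flag is an assumption about the argument parser.

```python
import shlex


def orega_command(input_file, start, length, output_file, *,
                  iterations=None, population=None, seed=None,
                  save=None, restart=None, dna=False, program="orega"):
    """Return an argv list for orega (hypothetical helper, not part of RNAstructure)."""
    # Required parameters, in the order given in the USAGE line.
    argv = [program, input_file, str(start), str(length), output_file]
    if dna:
        argv.append("--dna")
    # Options that take a value; flag names are taken from this help page.
    valued = (("--iter", iterations), ("--population", population),
              ("--seed", seed), ("--save", save), ("--restart", restart))
    for flag, value in valued:
        if value is not None:
            argv += [flag, str(value)]
    return argv


# Hypothetical file names; a fixed seed is what makes a run reproducible.
argv = orega_command("input.seq", 20, 30, "evolved.fasta",
                     iterations=2000, seed=42, save="run.sav")
print(shlex.join(argv))
```

The resulting list could be handed to subprocess.run on a machine where orega is installed and DATAPATH is set. Passing program="orega-cuda" would target the CUDA build, which takes the same options.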

Notes about objective function:

orega indirectly increases the end-to-end distance of a sequence by evolving a sequence segment to avoid base pairs. The objective function by default also includes the sequence complexity, and this is helpful to keep the sequence from evolving into repeats. The objective function is maximized and is the sum of (the mean probability of nucleotides in the specified segment being unpaired) and (the sequence complexity). The complexity component can optionally be removed with the --nocomplexity option.
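The structure of this objective can be sketched in Python as follows. This page does not define the complexity measure, so the sketch substitutes a stand-in (Shannon entropy of the base composition, scaled to the range 0 to 1) that is almost certainly not what orega computes. Applying it to the mutable segment only is also an assumption. The unpaired probabilities are taken as a plain list; how orega obtains them is not shown here.

```python
import math
from collections import Counter


def mean_unpaired(unpaired_probs):
    """Mean probability that nucleotides in the segment are unpaired."""
    return sum(unpaired_probs) / len(unpaired_probs)


def composition_entropy(segment):
    """Stand-in complexity term (an assumption, NOT orega's actual measure).

    Shannon entropy of the base composition, divided by log2(4) = 2 bits
    so that a 4-letter alphabet gives a value between 0 and 1.
    """
    counts = Counter(segment.upper())
    total = len(segment)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return entropy / 2.0


def objective(unpaired_probs, segment, use_complexity=True):
    """Sum of the mean unpaired probability and, optionally, the complexity term."""
    score = mean_unpaired(unpaired_probs)
    if use_complexity:  # False corresponds to --nocomplexity
        score += composition_entropy(segment)
    return score


probs = [0.9, 0.8, 1.0, 0.7]  # made-up unpaired probabilities for illustration
print(objective(probs, "ACGU"))                        # both terms
print(objective(probs, "ACGU", use_complexity=False))  # unpaired term only
```

With any complexity term of this kind, a repetitive segment such as AAAA scores lower than a mixed one at equal unpaired probability, which is how the default objective discourages evolution into repeats.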

Notes about alphabet size:

The complexity calculation performed by orega assumes a 4-nucleotide alphabet (although RNAstructure can use larger alphabets to include modified nucleotides). It is important to provide a sequence that uses only the standard A, C, G, U/T nucleotide alphabet. Lowercase nucleotides are forced unpaired (as in the rest of RNAstructure), but these should only be included outside the region of sequence being evolved. Lowercase nucleotides within the evolved region will never be allowed to pair as the sequence evolves.
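A pre-flight check for these two constraints could look like the Python sketch below. check_input is a hypothetical helper, not part of RNAstructure, and it assumes <start> is 1-based.

```python
def check_input(sequence: str, start: int, length: int):
    """Return a list of problems; an empty list means the input looks fine.

    Assumption: start is 1-based, so start=1 is the first nucleotide.
    """
    problems = []
    allowed = set("ACGUT")
    # Every nucleotide, upper- or lowercase, must come from the standard alphabet.
    for pos, nt in enumerate(sequence, start=1):
        if nt.upper() not in allowed:
            problems.append(f"position {pos}: '{nt}' is outside the A, C, G, U/T alphabet")
    # Lowercase nucleotides should stay outside the evolved segment.
    segment = sequence[start - 1:start - 1 + length]
    for offset, nt in enumerate(segment):
        if nt.islower():
            problems.append(f"position {start + offset}: lowercase '{nt}' inside the evolved segment")
    return problems


for problem in check_input("ggGAAaCCCUUU", start=4, length=3):
    print(problem)  # reports the lowercase 'a' at position 6
```

In this example the lowercase g nucleotides at positions 1 and 2 are accepted because they lie outside the evolved segment (positions 4 to 6).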

Notes for CUDA:

orega-cuda is the CUDA version of orega, for execution on CUDA-enabled graphics cards. It accepts the same options, but uses the CUDA-enabled partition function calculation in the background. This can dramatically improve runtimes.

References:

  1. Lai, W.C., Kayedkhordeh, M., Cornell, E.V., Farah, E., Bellaousov, S., Rietmeijer, R., Mathews, D.H., and Ermolenko, D.N.
    "mRNAs and lncRNAs intrinsically form secondary structures with short end-to-end distances."
    Nature Communications, 9:4328. (2018).
  2. Reuter, J.S. and Mathews, D.H.
    "RNAstructure: software for RNA secondary structure prediction and analysis."
    BMC Bioinformatics, 11:129. (2010).