RNAstructure Command Line Help: TurboFold

TurboFold predicts the common structure for two or more RNA sequences. It does this by generating pairwise alignments between the sequences using a hidden Markov model (HMM), which supplies extrinsic information to one of three selectable folding modes. The alignments and the folding output are used iteratively to improve each other.
TurboFold is available both as a serial (single-threaded) program called TurboFold and as a parallelized (multi-processor) program called TurboFold-smp for use in shared-memory environments.

USAGE 1: TurboFold <configuration file>

USAGE 2: TurboFold-smp <configuration file>
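Both usage forms take only the configuration file name, so TurboFold is easy to call from a script. The following Python sketch assumes that the `TurboFold` and `TurboFold-smp` executables are on the `PATH`; the helper names `build_turbofold_command` and `run_turbofold` are hypothetical and are not part of RNAstructure.

```python
import subprocess
from pathlib import Path


def build_turbofold_command(config_file: str, parallel: bool = False) -> list[str]:
    """Return the argument list for USAGE 1 (serial) or USAGE 2 (TurboFold-smp).

    Hypothetical helper: the configuration file is the only argument.
    """
    executable = "TurboFold-smp" if parallel else "TurboFold"
    return [executable, config_file]


def run_turbofold(config_file: str, parallel: bool = False) -> subprocess.CompletedProcess:
    """Run TurboFold on an existing configuration file (assumes the binary is on PATH)."""
    if not Path(config_file).is_file():
        raise FileNotFoundError(config_file)
    # check=True makes Python raise CalledProcessError on a non-zero exit status.
    return subprocess.run(build_turbofold_command(config_file, parallel), check=True)


if __name__ == "__main__":
    print(build_turbofold_command("turbofold.conf", parallel=True))
```

The Processors option in the configuration file only has an effect with TurboFold-smp, so `parallel=True` is the appropriate choice when Processors is greater than 1.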

Required parameters:

The name of a file containing required configuration data.

Options that do not require added values:

-h, -H, --help

Display the usage details message.

Options which require added values:

NONE

Notes:

Unknown nucleotides (e.g. N or X) are randomly mapped to A, C, G, or U by the HMM.
As of RNAstructure version 6.1, TurboFold can accept a multi-sequence FASTA file (using the InFasta configuration parameter).
This may be preferable when there are many input sequences and it is easier to list them all in a single file rather than in individual SEQ or FASTA files.
The names of the output CT files are generated automatically from the sequence names in the FASTA file. However, you can override the default output file names by specifying the OutCT or CT<N> parameters.

Configuration file format:

The following is a description of valid options allowed in the configuration file.

# IMPORTANT CONFIG FILE FORMAT NOTES:
#
# Option lines may be specified by the option name followed by an equals sign
# and the option's desired value. Option names are not case sensitive.
# When specifying an option, there may be nothing else on the line.
# <option> = <value>
#
# Specifying comment lines:
# Comment lines must begin with "#" followed by a space.
# There may not be more than one "#" in a comment line.
# However, a comment line may be an unbroken string of "#", as in a divider 
# between sets of options.
#
# Blank lines are skipped.
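The format rules above are simple enough to check before running TurboFold. Below is a minimal Python sketch of a reader that follows them: it skips blank lines and lines beginning with "#", splits each option line at the first "=", and lowercases option names because they are not case sensitive. It is an illustrative assumption, not RNAstructure's own parser, and the function names are hypothetical.

```python
def parse_turbofold_config(text: str) -> dict[str, str]:
    """Parse '<option> = <value>' lines into a dict keyed by lowercased option name.

    Hypothetical sketch of the documented rules; not RNAstructure's parser.
    """
    options: dict[str, str] = {}
    for line_number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comment/divider lines are skipped
        if "=" not in line:
            raise ValueError(f"line {line_number}: expected '<option> = <value>'")
        name, value = line.split("=", 1)
        options[name.strip().lower()] = value.strip()
    return options


def split_grouped(value: str) -> list[str]:
    """Split a grouped value such as '{a.seq;b.seq;}' into its entries.

    Empty slots, as in '{a.shape;;c.shape;}', are kept as ''.
    """
    if not (value.startswith("{") and value.endswith("}")):
        raise ValueError(f"not a grouped value: {value!r}")
    entries = value[1:-1].split(";")
    # A list that ends with ';' leaves one trailing empty string after split().
    if entries and entries[-1] == "":
        entries.pop()
    return entries
```

For example, `split_grouped(parse_turbofold_config(text)["inseq"])` yields the list of input sequence files. The reader is deliberately lenient about comment lines, since divider lines made only of "#" characters are allowed.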

# Mode specifies the resolving algorithm TurboFold uses after its initial fold.
# A valid mode is required for TurboFold to run properly.
# Valid modes can be one of three options:
#       1. MEA (Maximum expected accuracy)
#       2. ProbKnot (For pseudoknotted sequences)
#       3. Threshold (Finding most probable pairs)
# Modes should be specified as text strings: MEA, ProbKnot, or Threshold.
# The default mode is MEA.
Mode = MEA|ProbKnot|Threshold

# SequenceNumber specifies the number of sequences given for calculation.
# This is only needed if both sequences and CT files are specified 
# individually (see below).
SequenceNumber = <number of sequences>

#### Listing Input Sequences ####
# There are three formats in which input sequence files can be specified:
# grouped, grouped in a FASTA file, or individually.
#
# 1. Grouped: Place sequence file names in braces, separated by semicolons.
#    Filenames may contain spaces, but no extra space is allowed before or after
#    semicolons or braces.
InSeq = {path/to/input1.seq;path/to/input2.seq;path/to/input3.seq;}
#
# 2. Grouped in FASTA format: Specify the path of a single multi-sequence
#    FASTA file containing all input sequences.
#    The file name cannot contain spaces in this format.
InFasta = path/to/input/sequences.fasta
#
# 3. Individually: Each successive sequence is specified as "Seq<N>" where <N> goes 
#    from 1 to SequenceNumber.
#    The file names cannot contain spaces in this format.
SequenceNumber = 3
Seq1 = path/to/input1.seq
Seq2 = path/to/input2.seq
Seq3 = path/to/input3.seq


#### Listing Output CT files ####
# There are two formats in which output CT files can be specified -- either 
# grouped or individually.
#
# 1. Grouped: Place CT file names in braces, separated by semicolons.
#    Filenames may contain spaces, but no extra space is allowed before or after
#    semicolons or braces.
OutCT = {path/to/output1.ct;path/to/output2.ct;path/to/output3.ct;}
#
# 2. Individually: Each successive output file is specified as "CT<N>" where <N> goes 
#    from 1 to SequenceNumber.
#    The file names cannot contain spaces in this format.
CT1 = path/to/output1.ct
CT2 = path/to/output2.ct
CT3 = path/to/output3.ct

# Partition function save file (PFS) names can be specified for each sequence
# if this type of output is desired. These can be listed individually or grouped.
# Save files are not required.
# Individually:
# Save<N> (where <N> goes from 1 to SequenceNumber).
# There cannot be any spaces in the file names.
Save<N> = <save file N>
# or Grouped:
SaveFiles = {path/to/file1.pfs;path/to/file2.pfs;path/to/file3.pfs;}


# The starting partition function save file (PFS) names can be specified for each sequence
# if this type of output is desired.
# This is the partition function calculated with no extrinsic information.
# This can be useful for measuring the importance of sequence comparison.
# These files are not required.
# Specify them as grouped files:
StartingSaveFiles = {path/to/file1.pfs;path/to/file2.pfs;path/to/file3.pfs;}

# The output multiple sequence alignment filename can be specified. 
# Default is output.aln.
OutAln = <filename>
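With many sequences, the input and output lists are easier to generate than to type. The Python sketch below (hypothetical helper names) writes either the grouped form (InSeq, OutCT) with a semicolon after every name and no spaces around the separators, or the individual form (SequenceNumber, Seq<N>, CT<N>), rejecting file names that contain spaces as that format requires. It assumes one output CT file per input sequence.

```python
def grouped(names: list[str]) -> str:
    """Format names as a grouped value '{a;b;c;}' with no spaces around ';' or braces."""
    for name in names:
        if name != name.strip() or any(c in name for c in ";{}"):
            raise ValueError(f"invalid name for a grouped list: {name!r}")
    return "{" + "".join(f"{name};" for name in names) + "}"


def io_lines(seqs: list[str], cts: list[str], individual: bool = False) -> list[str]:
    """Return config lines listing input sequences and output CT files.

    Hypothetical helper; the option names are those documented for TurboFold.
    """
    if len(seqs) != len(cts):
        raise ValueError("expected one output CT file per input sequence")
    if not individual:
        return [f"InSeq = {grouped(seqs)}", f"OutCT = {grouped(cts)}"]
    if any(" " in name for name in seqs + cts):
        raise ValueError("file names cannot contain spaces in the individual format")
    lines = [f"SequenceNumber = {len(seqs)}"]
    lines += [f"Seq{n} = {name}" for n, name in enumerate(seqs, start=1)]
    lines += [f"CT{n} = {name}" for n, name in enumerate(cts, start=1)]
    return lines
```

Writing the returned lines to a file, one per line, gives the input/output part of a configuration file.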

################################################################
# TurboFold options
################################################################
# TurboFold options affect output regardless of the mode specified.

# Gamma specifies the TurboFold gamma value.
# This should not be confused with MeaGamma (below).
# Its default value is 0.3.
Gamma = 0.3

# Iterations specifies the number of iterations TurboFold goes through.
# This should not be confused with PkIterations (below).
# Its default value is 3.
Iterations = 3

# MaximumPairingDistance specifies the maximum distance between nucleotides that can pair,
# i.e. for nucleotide i to pair with j, |i - j| < MaximumPairingDistance.
# This applies to each sequence.
# Its default is no limit, which is indicated by a value of zero.
MaximumPairingDistance = 0

# Temperature specifies the temperature at which TurboFold is run, in Kelvin.
# Its default value is 310.15 K, which is 37 degrees C.
Temperature = 310.15

# Processors specifies the number of processors TurboFold is run on.
# Note that this flag only has an effect when TurboFold-smp, the parallel version 
# of TurboFold, is run.
# Its default value is 1.
Processors = 1

# The format of the output multiple sequence alignment can be either Fasta or Clustal.
# Default is Clustal.
AlnFormat = Fasta|Clustal

# The number of columns in the output multiple sequence alignment can be specified.
# Default is 60.
ColumnNumber = 60
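Two of the options above involve small calculations: Temperature is given in kelvin, and MaximumPairingDistance limits |i - j|, with zero meaning no limit. A minimal Python sketch of both rules as documented (the function names are hypothetical):

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Convert degrees Celsius to the kelvin value expected by Temperature."""
    return celsius + 273.15


def within_pairing_distance(i: int, j: int, max_distance: int) -> bool:
    """Apply the documented MaximumPairingDistance rule to nucleotides i and j.

    A value of 0 means no limit; otherwise |i - j| must be < max_distance.
    """
    if max_distance == 0:
        return True
    return abs(i - j) < max_distance
```

`celsius_to_kelvin(37)` gives the default of 310.15 (up to floating-point rounding). Note that the comparison is strict, so with MaximumPairingDistance = 100, nucleotides 1 and 101 cannot pair.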

################################################################
# Maximum expected accuracy (MEA) mode options
################################################################
# The following options only have an effect when MEA mode is specified. 
# If they are specified when TurboFold is in a different mode, they are ignored.

# MaxPercent specifies the maximum percent energy difference.
# Its default value is 50 (percent).
MaxPercent = 50

# MaxStructures specifies the maximum number of structures to calculate.
# Its default value is 1000 structures.
MaxStructures = 1000

# MeaGamma specifies the MEA mode gamma value.
# This should not be confused with Gamma (above).
# Its default value is 1.0.
MeaGamma = 1.0

# Window specifies the window size.
# Its default value is 5 nucleotides.
Window = 5

################################################################
# Pseudoknot (ProbKnot) mode options
################################################################
# The following options only have an effect when ProbKnot mode is specified. 
# If they are specified when TurboFold is in a different mode, they are ignored.

# MinHelixLength is the minimum helix length allowed during folding.
# Its default value is 3 nucleotides.
MinHelixLength = 3

# PkIterations specifies the number of iterations ProbKnot goes through.
# This should not be confused with Iterations (above).
# Its default value is 1.
PkIterations = 1

################################################################
# Probable Pairs (Threshold) mode options
################################################################
# The following options only have an effect when Threshold mode is specified. 
# If they are specified when TurboFold is in a different mode, they are ignored.

# Threshold specifies the probability threshold at which pairs are included in a structure.
# If a threshold is explicitly specified, it should be expressed as a number >= 0.5 and <= 1.0.
# Its default value is 0.
# This signifies that structures should be generated at the following thresholds:
#       >= 0.99, >= 0.97, >= 0.95, >= 0.90, >= 0.80, >= 0.70, >= 0.60, >= 0.50
Threshold = 0
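The Threshold rules can be sketched as follows: 0 expands to the documented series of thresholds, any other value must lie between 0.5 and 1.0, and a structure keeps the pairs whose probability meets the threshold. This Python sketch is illustrative only; the function names and the dict-of-pair-probabilities input are assumptions, not TurboFold's internal representation.

```python
DEFAULT_THRESHOLDS = [0.99, 0.97, 0.95, 0.90, 0.80, 0.70, 0.60, 0.50]


def thresholds_for(value: float) -> list[float]:
    """Return the threshold(s) implied by a Threshold setting."""
    if value == 0:
        return list(DEFAULT_THRESHOLDS)  # documented default series
    if not 0.5 <= value <= 1.0:
        raise ValueError("Threshold must be 0 or between 0.5 and 1.0")
    return [value]


def pairs_at_threshold(pair_probs: dict[tuple[int, int], float],
                       threshold: float) -> list[tuple[int, int]]:
    """Keep pairs (i, j) whose probability is >= threshold, sorted by i then j."""
    return sorted(pair for pair, p in pair_probs.items() if p >= threshold)
```

With a threshold strictly above 0.5, no nucleotide can appear in two selected pairs, because the probabilities of a nucleotide's alternative pairings sum to at most 1.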

################################################################
# Using SHAPE data
################################################################
# To apply SHAPE data to one or more sequences, use the SHAPEFiles option and 
# place SHAPE file names in braces, separated by semicolons.
# The order of the files should correspond to the list of input sequences.
# Note that there cannot be any spaces whatsoever between the braces or semicolons.
# The SHAPE file name can be left blank for sequences that lack SHAPE data.
# For example, the following line applies SHAPE data to sequences 1, 2, and 4, but
# not to sequences 3 and 5. (Note the empty slots in the 3rd and 5th positions.)
SHAPEFiles = {file1.shape;file2.shape;;file4.shape;;}

# SHAPE files can also be specified individually, which is often more 
# straightforward when only a few input sequences have SHAPE data:
SHAPE1 = file1.shape
SHAPE4 = file4.shape
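Following the empty-slot convention described above (a blank entry for each sequence without SHAPE data), the grouped SHAPEFiles value can be built from a mapping of 1-based sequence positions to file names. This Python sketch uses hypothetical helper names; it also produces the equivalent individual SHAPE<N> lines.

```python
def shape_files_value(shape_by_index: dict[int, str], sequence_count: int) -> str:
    """Build a grouped SHAPEFiles value with empty slots for sequences lacking data.

    shape_by_index maps 1-based sequence positions to SHAPE file names.
    """
    if any(n < 1 or n > sequence_count for n in shape_by_index):
        raise ValueError("SHAPE index outside 1..sequence_count")
    slots = []
    for n in range(1, sequence_count + 1):
        name = shape_by_index.get(n, "")  # '' leaves an empty slot
        if " " in name:
            raise ValueError(f"SHAPE file names cannot contain spaces: {name!r}")
        slots.append(name)
    return "{" + "".join(f"{slot};" for slot in slots) + "}"


def shape_individual_lines(shape_by_index: dict[int, str]) -> list[str]:
    """Equivalent individual form: one SHAPE<N> line per sequence that has data."""
    return [f"SHAPE{n} = {name}" for n, name in sorted(shape_by_index.items())]
```

For five sequences with data for sequences 1, 2, and 4, the grouped value has empty 3rd and 5th slots, while the individual form lists only SHAPE1, SHAPE2, and SHAPE4.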

# SHAPEintercept specifies the SHAPE intercept used by TurboFold.
# Note that if specified, this value is only used if one or more SHAPE files are also specified.
# Its default value is -0.6 kcal/mol.
SHAPEintercept = -0.6

# SHAPEslope specifies the SHAPE slope used by TurboFold.
# Note that if specified, this value is only used if one or more SHAPE files are also specified.
# Its default value is 1.8 kcal/mol.
SHAPEslope = 1.8

##### Using Rsample Mode for SHAPE data ##### 
# Rsample is a more accurate method of using SHAPE data, which uses stochastic
# sampling to match SHAPE values to structural motifs.
# Rsample mode can be enabled with the following option. 
# (If Rsample is not enabled, SHAPE data are used in legacy mode.)
UseRsample = 1

# The following options only have an effect when Rsample is enabled. 

# Specify a seed number for stochastic sampling in Rsample mode.
Seed = 1

# Cparam and Offset specify the SHAPE constraints in Rsample mode.
# Cparam is used to establish the relationship between free energy and reactivities.
# Offset is used to account for the fact that normalized reactivities can be less than zero.
# The default value of Cparam is 0.5; the default value of Offset is 1.1.
Cparam = 0.5
Offset = 1.1

# NumSamples specifies the number of samples for stochastic sampling in Rsample mode.
# The default value is 10000.
NumSamples = 10000

References:

Harmanci, A.O., Sharma, G., and Mathews, D.H.
"TurboFold: Iterative Probabilistic Estimation of Secondary Structures for Multiple RNA Sequences."
BMC Bioinformatics, 12:108. (2011).
Reuter, J.S. and Mathews, D.H.
"RNAstructure: software for RNA secondary structure prediction and analysis."
BMC Bioinformatics, 11:129. (2010).