RNAstructure Command Line Help: File Formats

RNAstructure Command Line Help
RNA Parameter Alphabets

Overview:

RNAstructure is provided with nearest neighbor thermodynamic parameters that estimate folding stability for nucleic acids. The use of these parameters is detailed in the Nearest Neighbor Database (NNDB).

Separate parameter tables (called alphabets) are available for RNA, DNA, in vivo-like RNA conditions, RNA + m6A, RNA + pseudouridine, RNA + N1-methyl-pseudouridine, multiple sequence alignments, and DNA + PZ. Each is detailed below. RNA is the default alphabet; other alphabets are selected with the --alphabet command line parameter.
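As an illustration of selecting an alphabet, the following Python sketch assembles an argument list that ends with the --alphabet option. The alphabet names are the ones listed on this page; the program name, the file names, and the order of the positional arguments are assumptions made for the example, so check the help page of the program you actually run.

```python
import shlex

# Alphabet names as they appear on this help page.
KNOWN_ALPHABETS = {"rna", "dna", "dmem", "m6a", "PSU", "FullP", "1", "Full1", "PZ"}

def build_command(program, seq_file, out_file, alphabet="rna"):
    """Return an argument list for a hypothetical invocation.

    The positional order (program, input, output) is an assumption,
    not something stated on this page.
    """
    if alphabet not in KNOWN_ALPHABETS:
        raise ValueError(f"unknown alphabet: {alphabet!r}")
    args = [program, seq_file, out_file]
    if alphabet != "rna":  # RNA is the default, so no option is needed
        args += ["--alphabet", alphabet]
    return args

# Hypothetical file names, for illustration only.
cmd = build_command("Fold", "example.fasta", "example.ct", alphabet="m6a")
print(shlex.join(cmd))
```

The function only builds the argument list; it does not run anything, so it can be inspected or logged before being handed to a process launcher.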

Parameter tables are located in RNAstructure/data_tables/. An explanation of the files is available.

RNA:

The RNA parameters are used by default, and use the file prefix "rna". The allowed nucleotides are A, C, G, U, T, X, N, a, c, g, u, t, x, n, and I. U and T are equivalent and both denote uracil. Lowercase nucleotides are forbidden from base pairing. X and N are nucleotides that do not stack or pair. I is an intermolecular linker used for bimolecular structure prediction.
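The character rules above can be checked before a sequence is submitted. The following Python sketch is not part of RNAstructure; the function name and return format are hypothetical. It reports characters outside the RNA alphabet and the positions that cannot pair (lowercase letters, X, and N). The linker I is accepted as valid but not treated specially.

```python
# Characters allowed by the RNA alphabet, as listed above.
RNA_ALLOWED = set("ACGUTXNacgutxnI")

def check_rna_sequence(seq):
    """Return (invalid, unpairable) for a sequence string.

    invalid    : list of (index, char) for characters not in the alphabet
    unpairable : 0-based indices of lowercase letters and of X or N
    """
    invalid = [(i, ch) for i, ch in enumerate(seq) if ch not in RNA_ALLOWED]
    unpairable = [i for i, ch in enumerate(seq)
                  if ch in RNA_ALLOWED and (ch.islower() or ch in "XN")]
    return invalid, unpairable

invalid, unpairable = check_rna_sequence("GGGaaaCCCNZ")
print(invalid)     # [(10, 'Z')]
print(unpairable)  # [3, 4, 5, 9]
```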

The RNA parameters are the Turner 2004 parameters as defined by Mathews et al. 2004. The multibranch loop parameters were revised to experimentally determined values as explained by Lu et al. 2009.

DNA:

The DNA parameters can be invoked with --alphabet dna. Many programs also have a --DNA switch to choose DNA. The allowed nucleotides are A, C, G, U, T, X, N, a, c, g, u, t, x, n, and I. U and T are equivalent and both denote thymine. Lowercase nucleotides are forbidden from base pairing. X and N are nucleotides that do not stack or pair. I is an intermolecular linker used for bimolecular structure prediction.

The DNA parameters were fit to available experimental data.

in vivo-like RNA:

The in vivo-like RNA parameters are based on the Turner 2004 rules and are fit to reproduce folding stabilities determined by optical melting experiments in advanced DMEM buffer. They can be invoked with --alphabet dmem.

The allowed nucleotides are A, C, G, U, T, X, N, a, c, g, u, t, x, n, and I. U and T are equivalent and both denote uracil. Lowercase nucleotides are forbidden from base pairing. X and N are nucleotides that do not stack or pair. I is an intermolecular linker used for bimolecular structure prediction.

RNA + m6A:

The RNA + m6A parameters include the covalently modified base m6A. They can be invoked with --alphabet m6a.

The allowed nucleotides are A, C, G, U, T, M, 6, X, N, a, c, g, u, m, t, x, n, and I. M, 6, or m are used for m6A. U and T are equivalent and both denote uracil. Lowercase nucleotides are forbidden from base pairing. X and N are nucleotides that do not stack or pair. I is an intermolecular linker used for bimolecular structure prediction.
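When modification sites are known as positions in an unmodified sequence, the sequence has to be rewritten with the m6A symbol before it is used with this alphabet. The following Python sketch shows one way to do that; the helper name and the 1-based position convention are assumptions made for illustration.

```python
def mark_m6a(seq, positions):
    """Return seq with the A at each 1-based position written as M (m6A)."""
    chars = list(seq)
    for pos in positions:
        if not 1 <= pos <= len(chars):
            raise IndexError(f"position {pos} is outside the sequence")
        base = chars[pos - 1]
        if base not in "Aa":
            raise ValueError(f"position {pos} is {base!r}, not A")
        # Keep the case: lowercase m stays forbidden from base pairing.
        chars[pos - 1] = "M" if base == "A" else "m"
    return "".join(chars)

print(mark_m6a("GGACUAGC", [3, 6]))  # GGMCUMGC
```

Preserving the case means a nucleotide that was already excluded from pairing in the input remains excluded after it is marked as m6A.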

These parameters were reported in Kierzek et al. (2022) and Szabat et al. (2022).

RNA + Pseudouridine:

The RNA + pseudouridine parameters include the covalent modification pseudouridine. They can be invoked with --alphabet PSU. Alternatively, for sequences in which all U bases are replaced by pseudouridine, the parameters can be invoked with --alphabet FullP. FullP is more efficient than PSU when no U is present; with FullP, U, T, and P all indicate pseudouridine.
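The choice between PSU and FullP can be made by inspecting the sequence. The following Python sketch assumes the sequence is written in PSU notation (U or T for uridine, P or p for pseudouridine). The function name is hypothetical, and returning "rna" for a sequence with no pseudouridine is a choice made for this example, not a rule stated on this page.

```python
def choose_pseudouridine_alphabet(seq):
    """Suggest an alphabet name for a sequence written in PSU notation."""
    has_uridine = any(ch in "UTut" for ch in seq)
    has_pseudouridine = any(ch in "Pp" for ch in seq)
    if not has_pseudouridine:
        return "rna"      # nothing modified: the default alphabet suffices
    if has_uridine:
        return "PSU"      # mixture of U and pseudouridine
    return "FullP"        # every U position is pseudouridine

print(choose_pseudouridine_alphabet("GGPCCPA"))  # FullP
print(choose_pseudouridine_alphabet("GGPCCUA"))  # PSU
```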

The allowed nucleotides are A, C, G, U, T, P, X, N, a, c, g, u, p, t, x, n, and I. P or p are used for pseudouridine. U and T are equivalent. Lowercase nucleotides are forbidden from base pairing. X and N are nucleotides that do not stack or pair. I is an intermolecular linker used for bimolecular structure prediction.

RNA + N1-methyl-Pseudouridine:

The RNA + N1-methyl-pseudouridine parameters include the covalent modification N1-methyl-pseudouridine. They can be invoked with --alphabet 1. Alternatively, for sequences in which all U bases are replaced by N1-methyl-pseudouridine, the parameters can be invoked with --alphabet Full1. Full1 is more efficient than 1 when no U is present; with Full1, U, T, 1, and M all indicate N1-methyl-pseudouridine.

The allowed nucleotides are A, C, G, U, T, 1, M, X, N, a, c, g, u, m, t, x, n, and I. 1, M, or m are used for N1-methyl-pseudouridine. U and T are equivalent. Lowercase nucleotides are forbidden from base pairing. X and N are nucleotides that do not stack or pair. I is an intermolecular linker used for bimolecular structure prediction.

Multiple Sequence Alignment Parameters:

The multiple sequence alignment parameters start with the prefix "msa". These are used by the programs AlignmentFold and AlignmentPartition for predicting conserved structures using multiple sequence alignments. In addition to RNA bases, they include the gap nucleotide "-". Generally, these would not be used in other programs.
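Because each alphabet's tables share a file prefix (for example "rna" or "msa"), the files belonging to one alphabet can be listed by name. The following Python sketch assumes that each table is stored as a file whose name begins with the prefix followed by a dot; that naming detail and the demonstration file names are assumptions, not taken from this page.

```python
import tempfile
from pathlib import Path

def tables_for_prefix(data_dir, prefix):
    """List file names in data_dir that begin with '<prefix>.'.

    Assumption: each table is stored as '<prefix>.<table name>...'.
    """
    return sorted(p.name for p in Path(data_dir).iterdir()
                  if p.is_file() and p.name.startswith(prefix + "."))

# Demonstration on a temporary directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("rna.example.dat", "msa.example.dat", "dna.example.dat"):
        (Path(tmp) / name).touch()
    print(tables_for_prefix(tmp, "msa"))  # ['msa.example.dat']
```

In practice the first argument would point at the data_tables directory of an RNAstructure installation.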

DNA + PZ:

The DNA + PZ tables include the Hachimoji synthetic bases P and Z for a folding alphabet of A, C, G, T, P, and Z. P forms base pairs with Z. Z also forms a pair with G.

This alphabet can be invoked with --alphabet PZ. The parameters are not distributed with RNAstructure by default, but are available upon request.

The allowed nucleotides are A, C, G, U, T, P, Z, X, N, a, c, g, u, p, z, t, x, n, and I. U and T are equivalent and represent thymine. Lowercase nucleotides are forbidden from base pairing. X and N are nucleotides that do not stack or pair. I is an intermolecular linker used for bimolecular structure prediction.

These parameters are reported in Pham et al. (2023).

dna-cuda and rna-cuda:

These are parameters in a legacy format that is used by partition-cuda.

References:

Kierzek, E., Zhang, X., Watson, R.M., Kierzek, R., & Mathews, D.H. (2022).
"Secondary Structure Prediction for RNA Sequences Including N6-methyladenosine."
Nature Communications, 13:1271.

Mathews, D.H., Disney, M.D., Childs, J.L., Schroeder, S.J., Zuker, M., & Turner, D.H. (2004).
"Incorporating chemical modification constraints into a dynamic programming algorithm for prediction of RNA secondary structure."
Proc. Natl. Acad. Sci. USA, 101:7287-7292.

Pham, T.M., Miffin, T., Sun, H., Sharp, K.K., Wang, X., Zhu, M., Hoshika, S., Peterson, R.J., Benner, S.A., Kahn, J.D., & Mathews, D.H. (2023).
"DNA Structure Design Is Improved Using an Artificially Expanded Alphabet of Base Pairs Including Loop and Mismatch Thermodynamic Parameters."
ACS Synthetic Biology, 12:2750-2763.

Szabat, M., Prochota, M., Kierzek, R., Kierzek, E., & Mathews, D.H. (2022).
"A Test and Refinement of Folding Free Energy Nearest Neighbor Parameters for RNA Including N6-Methyladenosine."
Journal of Molecular Biology, 434:167632.
