Improve the Accuracy of Secondary Structure Prediction Using Sequence Comparison
1. In this step, you'll use TurboFold to predict the secondary structure of the group I intron P546 domain from multiple homologous sequences. Finding the structure conserved across multiple homologs gives a more accurate prediction than using a single sequence.
In step 3, you predicted the structure for the Tetrahymena thermophila P546 domain of the group I intron. The accuracy of single-sequence structure prediction was relatively poor, but it improved dramatically with SHAPE data. In this part of the workshop, you will improve the accuracy of structure prediction for the same sequence by using TurboFold to integrate information from a total of four homologous sequences.
Copy the following FASTA sequence files to your local hard drive: P546.Tetrahymena.thermophila.fasta, P546.Ast.cL2500.fasta, P546.Cgl.mL1917.fasta, and P546.Cbr.cL1917.fasta.
2. Use the GUI to predict the secondary structure of P546 using TurboFold.
3. Viewing the results
When the calculation is complete, drawing windows will open for all four sequences, showing the predicted structures. Find the drawing window for the Tetrahymena structure. You can compare the predicted structure to the known structure. The CircleCompare plot can be found here. You will find that, using TurboFold with four sequences, the predicted structure is more accurate than the one predicted without additional information in step 3. All the predicted base pairs are in the known structure (positive predictive value, PPV, = 100%), but not all the known base pairs are predicted (sensitivity = 35 / 58 = 60.34%).
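The two accuracy figures above can be reproduced with a few lines of Python. This is a minimal sketch, not part of RNAstructure: the function name `score_prediction` and the toy base pairs are hypothetical, chosen only so that 35 of 58 known pairs are predicted, and the sketch counts exact matches only.

```python
def score_prediction(predicted_pairs, known_pairs):
    """Return (sensitivity, ppv) for two collections of (i, j) base pairs."""
    # Normalize each pair so that (i, j) and (j, i) count as the same pair.
    predicted = {tuple(sorted(p)) for p in predicted_pairs}
    known = {tuple(sorted(p)) for p in known_pairs}
    correct = predicted & known

    # Sensitivity: fraction of known pairs that were predicted.
    sensitivity = len(correct) / len(known) if known else 0.0
    # PPV: fraction of predicted pairs that are in the known structure.
    ppv = len(correct) / len(predicted) if predicted else 0.0
    return sensitivity, ppv


# Hypothetical toy data (not the real P546 pairs): 58 known pairs,
# of which the first 35 are predicted and no other pairs are predicted.
known = {(i, 200 - i) for i in range(1, 59)}
predicted = {(i, 200 - i) for i in range(1, 36)}

sens, ppv = score_prediction(predicted, known)
print(f"sensitivity = {sens:.2%}, PPV = {ppv:.2%}")
```

With these toy inputs the sensitivity is 35 / 58 and the PPV is 35 / 35, matching the figures quoted for the TurboFold prediction.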
From the RNAstructure GUI, start the TurboFold input form using the menu option shown in this screenshot.
Choose the four sequence homologs, starting with Tetrahymena. Click the button "..." next to "Sequence File" and choose "P546.Tetrahymena.thermophila.fasta". A CT file name for structure output will be filled in automatically next to "CT File". Click the button labeled "ADD-->" to add the sequence and CT file to the list. Then do the same for "P546.Ast.cL2500.fasta", "P546.Cgl.mL1917.fasta", and "P546.Cbr.cL1917.fasta". The input form will now list all four sequences.
At this time, the parameters can be adjusted from their default values. Generally, they should be left at their defaults, and they should be left at their defaults for the workshop. "Gamma" adjusts the extent to which TurboFold allows each sequence to have its own structure rather than a consensus structure. A larger gamma makes the predicted structures more similar to one another. The default is 0.3. The number of refinement iterations, "Iterations", defaults to 3.
"Mode" chooses the way structures are predicted. TurboFold estimates base pairing probabilities, and these need to be resolved to structures. "Maximum Expected Accuracy Mode" is the default and recommended mode. "ProbKnot/TurboKnot" is a mode that can predict pseudoknots. It is probably worth repeating the calculation in this mode to test whether common pseudoknots can occur in the structure. "Probability Threshold" can be used to predict structures with fewer pairs, but with pairs that are more likely to be correctly predicted. Under "Mode" is a set of parameters that adjust how structures are predicted from the pair probabilities. These parameters change depending on the mode being used.
The "Delete Sequence" button and the text box next to it are available for removing a sequence from the list if a sequence is added by mistake.
Click the "START" button to start the calculation.
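To illustrate the idea behind the "Probability Threshold" mode described above, here is a minimal sketch in Python. It is not TurboFold's implementation: the function name `threshold_pairs` and the pair probabilities are hypothetical, and the only point is that a higher threshold keeps fewer, more confident pairs.

```python
def threshold_pairs(pair_probs, threshold):
    """Keep only the base pairs whose estimated probability meets the threshold."""
    return sorted(pair for pair, p in pair_probs.items() if p >= threshold)


# Hypothetical pair probabilities, keyed by (i, j) nucleotide positions.
pair_probs = {
    (1, 20): 0.98,
    (2, 19): 0.91,
    (3, 18): 0.74,
    (5, 15): 0.55,
    (6, 14): 0.32,
}

loose = threshold_pairs(pair_probs, 0.5)   # more pairs, lower confidence
strict = threshold_pairs(pair_probs, 0.9)  # fewer pairs, higher confidence

print(len(loose), "pairs at threshold 0.5:", loose)
print(len(strict), "pairs at threshold 0.9:", strict)
```

With a threshold above 0.5, no nucleotide can appear in two selected pairs, because the pairing probabilities for a single nucleotide sum to at most 1.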