Experimental Results

by Kiryu Hisanori last modified 2008-03-24 01:10

Here, we summarize the performance of Rfold presented in the original paper. For more details, see the original reference.

Dataset

We extracted 151 alignments of structural RNA families from
the seed alignments in the Rfam7.0 database.
All these alignments had annotated secondary structures that had been published in the literature.
We then selected a single representative sequence from each family that had the maximal number of canonical base pairs.
From these sequences, we created four datasets (Dataset1 to Dataset4).
Dataset1 comprises these 151 RNA sequences.
Dataset2 is created from Dataset1 by appending random sequences
of length e = 100, 300, 500, and 1000 to both ends of each sequence.
Dataset3 contains a single sequence of length 172k bases,
which is obtained by alternately concatenating the sequences of Dataset1 and random sequences of length 1000.
Dataset4 comprises 10 random sequences of length 10k bases, and it is used as the control set
to estimate the false positive rate of the structure predictions.
The random sequences in these datasets were generated
by concatenating the 151 RNA sequences and shuffling the nucleotides of the resulting sequence while conserving the dinucleotide frequencies.
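
The page does not say which shuffling procedure was used. As an illustration only, the sketch below shuffles a sequence while keeping every dinucleotide count unchanged, using an Euler-path construction in the style of the Altschul-Erikson algorithm. The function names and the choice of algorithm are assumptions made for this example, not a description of the original pipeline.

```python
import random
from collections import Counter


def dinucleotide_counts(seq):
    """Count overlapping dinucleotides, e.g. 'ACG' -> {('A','C'): 1, ('C','G'): 1}."""
    return Counter(zip(seq, seq[1:]))


def _reaches(start, target, final_edge):
    """Follow the chosen final edges from `start`; True if they lead to `target`."""
    seen = set()
    v = start
    while v != target:
        if v in seen:          # cycle that never reaches the target
            return False
        seen.add(v)
        v = final_edge[v]
    return True


def dinucleotide_shuffle(seq, rng):
    """Return a shuffled copy of `seq` with identical dinucleotide counts.

    Hypothetical helper (assumed algorithm: Altschul-Erikson style Euler path).
    `rng` is a random.Random instance.
    """
    if len(seq) < 3:
        return seq
    # Successor list of every nucleotide, in order of appearance.
    successors = {}
    for a, b in zip(seq, seq[1:]):
        successors.setdefault(a, []).append(b)
    last = seq[-1]
    inner = [v for v in successors if v != last]

    # Pick one "final" outgoing edge per nucleotide until these edges
    # all lead to the last nucleotide of the sequence.
    while True:
        final_edge = {v: rng.choice(successors[v]) for v in inner}
        if all(_reaches(v, last, final_edge) for v in inner):
            break

    # Shuffle the remaining edges and put the final edge at the end.
    order = {}
    for v, edges in successors.items():
        edges = list(edges)
        if v in final_edge:
            edges.remove(final_edge[v])
        rng.shuffle(edges)
        if v in final_edge:
            edges.append(final_edge[v])
        order[v] = edges

    # Walk the edges in the chosen order, starting from the first nucleotide.
    out = [seq[0]]
    used = Counter()
    for _ in range(len(seq) - 1):
        cur = out[-1]
        out.append(order[cur][used[cur]])
        used[cur] += 1
    return "".join(out)
```

A shuffled sequence produced this way keeps the first and last nucleotides of the input, and `dinucleotide_counts` returns the same counts before and after shuffling.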

Accuracy Measures

To estimate the accuracy of the base pairing probabilities (BPPs) p(i, j) and the structure predictions,
we draw receiver operating characteristic (ROC) curves that represent the balance
between the sensitivity to the true base pairs
and the rate of false positives in the non-structured sequences.

In the case of BPP comparison, the sensitivity is defined by
the fraction of the true base pairs that have a BPP larger than the given threshold value p0.
The false positive rate is defined by the number of base pairs (i, j)
with p(i, j) > p0 in the non-structured sequence divided by the length of the sequence.
We draw the ROC curve by examining several values of p0.
In the case of structure prediction, the sensitivity is defined by the fraction of base pairs
that are correctly predicted by the programs.
We define the false positive rate as the fraction of the non-structured sequence that is covered by inner regions (i.e. the segments enclosed by any predicted base pair).
This definition penalizes long inner regions that contain only a small number of predicted base pairs.
Furthermore, only the base pairs that satisfy the maximal span constraint are counted as true base pairs,
so that distant base pairs with |i-j| > W, which cannot be predicted under the constraint, do not trivially reduce the sensitivity.
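
As a minimal sketch of the BPP measures defined above, the following function computes one (p0, false positive rate, sensitivity) point per threshold. The function name, the dictionary representation of the BPPs, and the optional `max_span` filter on the true base pairs are assumptions made for illustration; they are not taken from the Rfold code.

```python
def bpp_roc_points(true_pairs, bpp_structured, bpp_random, random_length,
                   thresholds, max_span=None):
    """Return a list of (p0, false_positive_rate, sensitivity) tuples.

    true_pairs      -- annotated base pairs (i, j) of the structured sequence
    bpp_structured  -- dict (i, j) -> p(i, j) for the structured sequence
    bpp_random      -- dict (i, j) -> p(i, j) for the non-structured sequence
    random_length   -- length of the non-structured sequence
    max_span        -- if given, only true pairs with |i - j| <= max_span count
    """
    if max_span is not None:
        true_pairs = [(i, j) for (i, j) in true_pairs if abs(i - j) <= max_span]

    points = []
    for p0 in thresholds:
        # Sensitivity: fraction of true base pairs with a BPP above p0.
        hits = sum(1 for pair in true_pairs
                   if bpp_structured.get(pair, 0.0) > p0)
        sensitivity = hits / len(true_pairs) if true_pairs else 0.0

        # False positive rate: number of pairs above p0 in the
        # non-structured sequence, divided by its length.
        false_hits = sum(1 for p in bpp_random.values() if p > p0)
        false_positive_rate = false_hits / random_length

        points.append((p0, false_positive_rate, sensitivity))
    return points
```

Because the false positive rate is normalized by the sequence length rather than by the number of possible pairs, it is a per-nucleotide frequency and is not bounded above by 1 in principle.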

Comparison of the quality of the computed local base pairing probabilities

Comparison with RNAplfold

This figure shows the ROC curves of the computed base pairing probabilities. False positive rates are calculated for Dataset4 (random sequences), and the sensitivities are calculated for Dataset1 (151 Rfam seed sequences) (circles) and Dataset2 with e = 1000 (i.e. random sequences of length 1000 are added to both ends of each Rfam sequence) (triangles). The open and filled symbols represent the ROC curves of Rfold and RNAplfold, respectively. In both cases, the maximal span W is set to 800.

Comparison of the accuracy of local structure predictions

Accuracy comparison with RNALfold

Comparison of the accuracies of the predicted structures.
We used Dataset4 for the computation of the false positive rate and Dataset3 for the sensitivity.
We examined three values of W:
50 (circle), 100 (triangle), and 200 (square).
Since RNALfold (denoted by "Lfold" in the figure) has no parameter that controls the balance
between the sensitivity and the false positive rate,
only one point is plotted for each value of the maximal span W.
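
Each point in such a plot pairs one sensitivity with one false positive rate, as defined under Accuracy Measures. The sketch below shows one way to compute both values from lists of base pairs. The function names are hypothetical, positions are assumed to be 0-based, and an inner region is assumed to include the two paired positions themselves.

```python
def structure_sensitivity(true_pairs, predicted_pairs, max_span):
    """Fraction of true base pairs with |i - j| <= max_span that are predicted."""
    eligible = {(i, j) for (i, j) in true_pairs if abs(i - j) <= max_span}
    if not eligible:
        return 0.0
    return len(eligible & set(predicted_pairs)) / len(eligible)


def inner_region_fraction(predicted_pairs, seq_length):
    """Fraction of positions enclosed by at least one predicted base pair.

    Applied to a non-structured (random) sequence, this is the false
    positive rate for structure predictions as defined on this page.
    """
    # Difference array: +1 where an enclosed segment starts, -1 after it ends.
    delta = [0] * (seq_length + 1)
    for i, j in predicted_pairs:
        lo, hi = min(i, j), max(i, j)
        delta[lo] += 1
        delta[hi + 1] -= 1

    covered = 0
    depth = 0
    for k in range(seq_length):
        depth += delta[k]
        if depth > 0:          # position k lies inside some base pair
            covered += 1
    return covered / seq_length
```

Nested base pairs do not increase the covered fraction, so a single long-range pair enclosing few other pairs is penalized as heavily as a densely paired region of the same length.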

Comparison of Running Time

Task                  Program    Running time
BPP computation       Rfold      15 min
BPP computation       RNAplfold  12 min
Structure prediction  Rfold      22 min
Structure prediction  RNALfold   30 sec

Comparison of the running times of Rfold, RNAplfold, and RNALfold.
We used Dataset3, which consists of a single sequence with a length of 172k bases.
The maximal span W is set to 100.
