SCARNA is an alignment tool for a pair of RNA sequences based on the predicted common secondary structure.
We show the performance of SCARNA in aligning RNA sequences through computational experiments on the benchmark dataset of tRNAs used by Gardner et al. (2005).
We evaluated the quality of the alignments by the sum-of-pairs score (SPS) and the structure conservation index (SCI) (Gardner et al., 2005). The SPS is defined as the fraction of all possible nucleotide pairs that are aligned both in the predicted alignment and in the reference alignment. The SPS provides a measure of the sensitivity of the prediction. The SCI provides a measure of the conserved secondary structure information contained within the alignment (Washietl et al., 2004). It is derived from the score calculated by the RNAalifold consensus folding algorithm (Hofacker et al., 2002; Washietl and Hofacker, 2004), which is based on the sum of a thermodynamic term and a covariance term. In contrast to the SPS, the SCI is independent of a reference alignment. The SCI is close to zero if RNAalifold identifies no common RNA structure in the alignment, whereas a set of perfectly conserved structures has an SCI of 1. The SCI captures the structural aspect of alignment accuracy and is therefore a useful measure in addition to the SPS.
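As an illustration of the SPS definition above, the following is a minimal sketch for a pairwise alignment. It is not the evaluation code used in the benchmark. The function names, the gap character "-", and the choice of the number of aligned pairs in the reference as the denominator are assumptions made for this example, and the sequences are made up.

```python
def aligned_pairs(row_a, row_b, gap="-"):
    """Return the set of (i, j) residue index pairs that share a column.

    i and j are positions in the ungapped first and second sequence.
    """
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    pairs = set()
    i = j = 0
    for a, b in zip(row_a, row_b):
        if a != gap and b != gap:
            pairs.add((i, j))
        if a != gap:
            i += 1
        if b != gap:
            j += 1
    return pairs


def sum_of_pairs_score(predicted, reference):
    """Fraction of reference-aligned pairs that the prediction also aligns.

    Assumption: the denominator is the number of aligned pairs in the
    reference alignment.
    """
    pred = aligned_pairs(*predicted)
    ref = aligned_pairs(*reference)
    if not ref:
        return 0.0
    return len(pred & ref) / len(ref)


# Toy example with made-up sequences (not benchmark data).
reference = ("GGCA-UC", "GGCAAUC")
predicted = ("GGC-AUC", "GGCAAUC")
print(sum_of_pairs_score(predicted, reference))
```

In this toy case the predicted alignment places the gap one column too early, so it recovers 5 of the 6 aligned pairs in the reference. A prediction identical to the reference gives a score of 1.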
Comparison of accuracies with those of other aligners
Gardner’s benchmark datasets are composed of pairs of tRNA sequences that are classified by sequence identity. Although none of the structural alignment programs can align RNA sequences of more than 150 bases without special measures, they can all align these short tRNA sequences, which are 71.8 nt long on average. The sequences and the reference alignments for calculating the SPS were obtained from the Rfam database. The experimental results are shown in the following figures.
The SPS and the SCI of SCARNA exceed those of sequence-based methods [e.g. ClustalW, MUSCLE, PCMA, POA (gp), ProAlign and Prrn] and are comparable with those of structure-based methods such as Foldalign2.0, PMcomp and Stemloc. While the relative performance of the sequence-based methods and the structure-based ones diverges dramatically below about 60% sequence identity, the SPS and the SCI of SCARNA do not drop. In particular, the SPS of SCARNA exceeds that of most of the structure-based methods at <50% sequence identity.
The comparison of the execution time of SCARNA with that of other methods, shown in the following figure, demonstrates its applicability to real sequences.