Murlet
Experimental Results
Here, we summarize the experimental results presented in the original paper. See that reference for more details.
The Dataset
The dataset consists of 85 multiple alignments of 10 sequences each: there are 17 sequence families, with five alignments per family. The dataset is reasonably diverse; the mean sequence length varies from 54 to 291 bases, and the mean pairwise sequence identity varies from 40% to 94%. We also used the BRAlibaseII multiple alignment dataset for comparison.
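The two dataset statistics above, mean sequence length and mean pairwise sequence identity, can be computed from a gapped alignment roughly as follows. This is a minimal sketch, not the authors' code: it assumes each alignment is a list of equal-length strings with gaps written as "-", and it uses one common identity convention (identical residues divided by the columns where neither sequence has a gap). The original paper may define identity differently, and the function names are hypothetical.

```python
from itertools import combinations
from typing import Sequence

GAP = "-"  # assumed gap character

def pairwise_identity(a: str, b: str) -> float:
    """Identical residues / columns where neither aligned sequence has a gap."""
    columns = [(x, y) for x, y in zip(a, b) if x != GAP and y != GAP]
    if not columns:
        return 0.0
    same = sum(1 for x, y in columns if x.upper() == y.upper())
    return same / len(columns)

def mean_pairwise_identity(alignment: Sequence[str]) -> float:
    """Average identity over all unordered pairs of rows."""
    pairs = list(combinations(alignment, 2))
    return sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)

def mean_length(alignment: Sequence[str]) -> float:
    """Average ungapped sequence length in bases."""
    return sum(len(row.replace(GAP, "")) for row in alignment) / len(alignment)

aln = ["GGAC-U", "GGACAU", "GCAC-U"]
print(round(mean_pairwise_identity(aln), 3), round(mean_length(aln), 3))  # prints 0.867 5.333
```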
Accuracy Measures
The accuracy of the alignments is measured by the standard sum-of-pairs score (SPS). To evaluate the quality of the structural alignment, consensus structures are predicted from the resulting alignments with the Pfold program.
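A minimal sketch of the SPS, assuming the common definition: the fraction of residue pairs aligned in the reference alignment that are also aligned in the test alignment. It further assumes that both alignments list the same sequences in the same order and that gaps are written as "-"; the function names are illustrative, not part of Murlet.

```python
from itertools import combinations
from typing import List, Set, Tuple

GAP = "-"  # assumed gap character
# (row_a, residue_index_a, row_b, residue_index_b) with row_a < row_b
ResiduePair = Tuple[int, int, int, int]

def aligned_pairs(alignment: List[str]) -> Set[ResiduePair]:
    """Collect every pair of residues placed in the same column."""
    next_index = [0] * len(alignment)  # running residue counter per row
    pairs: Set[ResiduePair] = set()
    for col in range(len(alignment[0])):
        present = {}
        for row, seq in enumerate(alignment):
            if seq[col] != GAP:
                present[row] = next_index[row]
                next_index[row] += 1
        for a, b in combinations(sorted(present), 2):
            pairs.add((a, present[a], b, present[b]))
    return pairs

def sum_of_pairs_score(reference: List[str], test: List[str]) -> float:
    """Fraction of reference residue pairs that the test alignment reproduces."""
    ref = aligned_pairs(reference)
    if not ref:
        return 0.0
    return len(ref & aligned_pairs(test)) / len(ref)

reference = ["GGA-C", "GGAUC"]
test = ["GGAC-", "GGAUC"]
print(sum_of_pairs_score(reference, test))  # prints 0.75
```

Here the test alignment misplaces the last residue of the first sequence, so one of the four reference pairs is lost.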
The Matthews correlation coefficients (MCC) are then calculated for the predictions. MCC is defined by the formula
\[\mathrm{MCC} = \frac{t_p t_n - f_p f_n}{\sqrt{(t_p + f_p)(t_p + f_n)(t_n + f_p)(t_n + f_n)}}\]
where \( t_p \) indicates the number of correctly predicted base pairs; \( t_n \), the number of base pairs that are correctly predicted as unpaired; \( f_p \), the number of incorrectly predicted base pairs; and \( f_n \), the number of true base pairs that are not predicted.
Note that \(t_n\) is computed in units of base pairs and is very large in most cases.
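The formula and the remark about \(t_n\) can be made concrete with a small sketch for a single sequence. It rests on an assumption not stated in the text: that \(t_n\) counts every candidate position pair \((i, j)\), \(i < j\), of a sequence of length \(L\) that is neither a reference nor a predicted base pair, so \(t_n = L(L-1)/2 - t_p - f_p - f_n\). Base pairs are tuples of 0-based residue indices, and the function names are hypothetical.

```python
import math
from typing import Set, Tuple

BasePair = Tuple[int, int]  # (i, j) with i < j, 0-based residue indices

def pair_counts(reference: Set[BasePair], predicted: Set[BasePair], length: int):
    """Return (t_p, t_n, f_p, f_n) for one sequence of the given length."""
    tp = len(reference & predicted)
    fp = len(predicted - reference)
    fn = len(reference - predicted)
    # Assumed reading of t_n: all L(L-1)/2 candidate pairs that are
    # neither reference nor predicted base pairs.
    tn = length * (length - 1) // 2 - tp - fp - fn
    return tp, tn, fp, fn

def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom > 0 else 0.0

ref = {(0, 9), (1, 8), (2, 7)}
pred = {(0, 9), (1, 8), (3, 6)}
counts = pair_counts(ref, pred, 10)
print(counts, round(mcc(*counts), 4))  # prints (2, 41, 1, 1) 0.6429
```

Because \(t_n\) dwarfs the other counts, MCC under this reading approaches \(\sqrt{\frac{t_p}{t_p+f_p}\cdot\frac{t_p}{t_p+f_n}}\), the geometric mean of positive predictive value and sensitivity (both 2/3 in this example).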
The numbers are computed by assigning both reference and predicted consensus structures to each sequence using the alignment and then counting the matches and mismatches of base pairs for all the sequences.
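The assignment-and-counting procedure just described might look like the following sketch. Assumptions, all labeled in the code: a consensus structure is a set of paired alignment columns \((i, j)\) with \(i < j\); the reference structure is projected through the reference alignment and the predicted structure through the test alignment; rows appear in the same order; a column pair becomes a base pair of a sequence only if that sequence has residues in both columns; and \(t_n\) counts the remaining candidate pairs \(L(L-1)/2 - t_p - f_p - f_n\) per sequence. Function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

GAP = "-"  # assumed gap character
ColumnPair = Tuple[int, int]  # consensus pair of alignment columns, i < j

def column_to_residue(row: str) -> Dict[int, int]:
    """Map each non-gap column of one aligned row to its residue index."""
    mapping: Dict[int, int] = {}
    for col, ch in enumerate(row):
        if ch != GAP:
            mapping[col] = len(mapping)
    return mapping

def project(consensus: Set[ColumnPair], row: str) -> Set[Tuple[int, int]]:
    """Assign a consensus structure to one sequence.

    A column pair becomes a base pair only if both columns hold residues."""
    m = column_to_residue(row)
    return {(m[i], m[j]) for i, j in consensus if i in m and j in m}

def summed_counts(ref_aln: List[str], ref_struct: Set[ColumnPair],
                  test_aln: List[str], test_struct: Set[ColumnPair]):
    """Sum (t_p, t_n, f_p, f_n) over all sequences (rows in the same order)."""
    tp = tn = fp = fn = 0
    for ref_row, test_row in zip(ref_aln, test_aln):
        ref = project(ref_struct, ref_row)
        pred = project(test_struct, test_row)
        n = len(ref_row.replace(GAP, ""))
        tp_s, fp_s, fn_s = len(ref & pred), len(pred - ref), len(ref - pred)
        tp, fp, fn = tp + tp_s, fp + fp_s, fn + fn_s
        # Assumed: t_n counts the remaining candidate pairs of this sequence.
        tn += n * (n - 1) // 2 - tp_s - fp_s - fn_s
    return tp, tn, fp, fn

print(summed_counts(["GGACC", "GG-CC"], {(0, 4), (1, 3)},
                    ["GGACC", "GGC-C"], {(0, 4), (1, 2)}))  # prints (3, 11, 1, 1)
```

Summing the counts over sequences before applying the MCC formula, rather than averaging per-sequence MCC values, follows the description in the text.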
Comparison of the SPS and MCC values for several multiple alignment programs.
Dataset | #Data | Murlet (SPS/MCC) | ProbCons (SPS/MCC) | ClustalW (SPS/MCC) | Stemloc (SPS/MCC) | PMMulti (SPS/MCC) | RNAcast (SPS/MCC) |
All | 85 | 0.81/0.71 | 0.80/0.65 | 0.70/0.50 | — | — | — |
Stemloc | 49 | 0.86/0.71 | 0.85/0.66 | 0.73/0.50 | 0.79/0.67 | — | — |
PMMulti | 50 | 0.86/0.71 | 0.85/0.67 | 0.73/0.51 | — | 0.58/0.54 | — |
RNAcast | 53 | 0.80/0.71 | 0.79/0.65 | 0.70/0.53 | — | — | 0.40/0.55 |
Common | 24 | 0.86/0.74 | 0.85/0.71 | 0.74/0.58 | 0.83/0.74 | 0.59/0.59 | 0.49/0.62 |
Each column shows the results of one program, given as “SPS/MCC”; the MCC values are computed for the structures predicted by the Pfold software. Each row gives the average SPS and MCC values over a set of alignments: the “All” row averages across all the families, and the “Stemloc”, “PMMulti”, “RNAcast”, and “Common” rows average across the subsets of alignments for which Stemloc, PMMulti, RNAcast, and all the programs, respectively, returned results. In every row, Murlet has the highest SPS and MCC values, tied with Stemloc for MCC in the “Common” row.
Dataset | #Data | Murlet (SPS/MCC) | ProbCons (SPS/MCC) | ClustalW (SPS/MCC) | Stemloc (SPS/MCC) | PMMulti (SPS/MCC) | RNAcast (SPS/MCC) |
All | 481 | 0.88/0.77 | 0.88/0.75 | 0.84/0.72 | — | — | — |
Stemloc | 386 | 0.88/0.78 | 0.88/0.75 | 0.83/0.72 | 0.86/0.78 | — | — |
PMMulti | 374 | 0.89/0.77 | 0.89/0.75 | 0.84/0.72 | — | 0.80/0.74 | — |
RNAcast | 421 | 0.89/0.77 | 0.88/0.74 | 0.85/0.72 | — | — | 0.62/0.66 |
Common | 310 | 0.90/0.77 | 0.89/0.75 | 0.85/0.73 | 0.88/0.77 | 0.81/0.74 | 0.64/0.67 |
This table shows the SPS and MCC values for the BRAlibaseII multiple alignment dataset. Although the SPS and MCC values are relatively high and the differences in scores among the programs are smaller than for the dataset above, Murlet still shows the highest or tied-highest SPS and MCC values in every row (ProbCons matches its SPS in the “All”, “Stemloc”, and “PMMulti” rows, and Stemloc matches its MCC in the “Stemloc” and “Common” rows).
Comparison of time and memory usage
Elapsed time and maximal resident memory for computing the alignments of the datasets. In both figures, the x-axis represents the mean sequence length of each sequence family. The y-axes represent the maximal resident physical memory of the process in megabytes (MB) (left) and the elapsed time in minutes (right). Each data point represents a specific sequence family of Table 2. Only alignments that were returned correctly are plotted. The memory and time consumption of ClustalW, ProbCons, and RNAcast is very small compared with that of the Sankoff-based programs, so several points for these programs coincide in the figure.
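The text does not say how elapsed time and maximal resident memory were measured. One way to collect both numbers for a single run on a Unix system is sketched below, using only Python's standard library (`time.perf_counter`, `subprocess.run`, and `resource.getrusage`). This is an assumption about methodology, not the authors' setup; the function name and the stand-in command are hypothetical.

```python
import resource
import subprocess
import sys
import time
from typing import List, Tuple

def run_and_measure(cmd: List[str]) -> Tuple[float, float]:
    """Run cmd to completion; return (elapsed minutes, peak resident memory in MB).

    Unix-only (uses the resource module). RUSAGE_CHILDREN reports the RSS of
    the largest waited-for child so far, so measure one command per process.
    """
    start = time.perf_counter()
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    elapsed_min = (time.perf_counter() - start) / 60.0
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    scale = 1024 * 1024 if sys.platform == "darwin" else 1024
    return elapsed_min, peak / scale

# Stand-in command (the Python interpreter itself); replace with an aligner's command line.
minutes, megabytes = run_and_measure([sys.executable, "-c", "pass"])
print(f"{minutes:.4f} min, {megabytes:.1f} MB")
```

Wall-clock time from `perf_counter` matches the “elapsed time” axis; CPU time would instead come from the `ru_utime` and `ru_stime` fields of the same `getrusage` result.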