Experimental Results
Here, we summarize the experimental results presented in the original paper. See that reference for more details.
The Dataset
The dataset consists of 85 multiple alignments of 10 sequences each.
There are 17 sequence families, with five alignments for each family. The dataset is reasonably diverse:
the mean sequence length ranges from 54 to 291 bases, and
the mean pairwise sequence identity ranges from 40% to 94%.
We also used the BRAlibaseII multiple alignment dataset for comparison.
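As an illustration of the mean pairwise sequence identity quoted above, the following minimal sketch computes it for one alignment. The text does not say which denominator was used, so counting only columns where neither sequence has a gap is an assumption, and the function names are hypothetical.

```python
from itertools import combinations


def pairwise_identity(a: str, b: str, gap: str = "-") -> float:
    """Identity of two aligned rows.

    Assumption: only columns where neither row has a gap are compared;
    other conventions (e.g. dividing by the alignment length) exist.
    """
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    compared = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not compared:
        return 0.0
    return sum(x == y for x, y in compared) / len(compared)


def mean_pairwise_identity(rows: list[str]) -> float:
    """Average of pairwise_identity over all unordered pairs of rows."""
    pairs = list(combinations(rows, 2))
    if not pairs:
        raise ValueError("need at least two aligned rows")
    return sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)
```

For example, `mean_pairwise_identity(["ACGU", "ACGA", "AC-U"])` averages the three pairwise identities 3/4, 3/3, and 2/3.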
Accuracy Measures
The accuracy of the alignments is measured by the standard sum-of-pairs score (SPS).
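For readers unfamiliar with SPS, here is a minimal sketch of the usual definition: the fraction of residue pairs aligned in the reference alignment that are also aligned in the test alignment. The function names are hypothetical, and the code assumes that both alignments list the same sequences in the same order and use `-` for gaps.

```python
from itertools import combinations


def aligned_pairs(rows: list[str], gap: str = "-") -> set:
    """Residue pairs ((seq, pos), (seq, pos)) that share an alignment column.

    pos is the residue index in the ungapped sequence.
    """
    if len({len(r) for r in rows}) > 1:
        raise ValueError("aligned rows must have equal length")
    counters = [0] * len(rows)  # next ungapped position for each sequence
    pairs = set()
    for col in range(len(rows[0])):
        present = []
        for s, row in enumerate(rows):
            if row[col] != gap:
                present.append((s, counters[s]))
                counters[s] += 1
        # every two residues in the same column form one aligned pair
        pairs.update(combinations(present, 2))
    return pairs


def sum_of_pairs_score(reference: list[str], test: list[str]) -> float:
    """Fraction of reference residue pairs reproduced by the test alignment."""
    ref = aligned_pairs(reference)
    if not ref:
        return 0.0
    return len(ref & aligned_pairs(test)) / len(ref)
```

A test alignment identical to the reference scores 1.0, and each reference pair that the test alignment fails to reproduce lowers the score.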
To assess the quality of the structural alignment, the consensus structures are predicted
from the alignment results using the Pfold program.
The Matthews correlation coefficients (MCC) are then calculated for the predictions.
MCC is defined by the formula
MCC = (tp tn - fp fn)/ sqrt((tp + fp)(tp + fn)(tn + fp)(tn + fn))
where tp indicates the number of correctly predicted base pairs;
tn, the number of base pairs that are correctly predicted as unpaired;
fp, the number of incorrectly predicted base pairs;
and fn, the number of true base pairs that are not predicted.
Note that tn is computed in units of base pairs and is very large in most cases.
The numbers are computed by assigning both reference and predicted consensus
structures to each sequence using the alignment and then counting the matches and mismatches of
base pairs for all the sequences.
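The counting and the formula above can be sketched as follows for a single sequence. This is an illustration rather than the evaluation code used for the tables: the function names are hypothetical, and tn is assumed to be the number of position pairs (i < j) that appear in neither the reference nor the predicted structure.

```python
from math import sqrt


def confusion_counts(reference, predicted, length):
    """Return (tp, tn, fp, fn) for base pairs given as (i, j) index tuples.

    Assumption: tn counts every position pair i < j of the sequence
    that is unpaired in both structures.
    """
    ref = {tuple(sorted(p)) for p in reference}
    pred = {tuple(sorted(p)) for p in predicted}
    tp = len(ref & pred)   # correctly predicted base pairs
    fp = len(pred - ref)   # predicted but not in the reference
    fn = len(ref - pred)   # in the reference but not predicted
    tn = length * (length - 1) // 2 - tp - fp - fn
    return tp, tn, fp, fn


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 if the denominator is zero."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

To follow the procedure described above for a whole alignment, the four counts of the individual sequences would be summed before calling `mcc`.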
Comparison of the SPS and MCC values for several multiple alignment programs.
| Dataset | #Data | Murlet (SPS/MCC) | ProbCons (SPS/MCC) | ClustalW (SPS/MCC) | Stemloc (SPS/MCC) | PMMulti (SPS/MCC) | RNAcast (SPS/MCC) |
|---|---|---|---|---|---|---|---|
| All | 85 | 0.81/0.71 | 0.80/0.65 | 0.70/0.50 | -- | -- | -- |
| Stemloc | 49 | 0.86/0.71 | 0.85/0.66 | 0.73/0.50 | 0.79/0.67 | -- | -- |
| PMMulti | 50 | 0.86/0.71 | 0.85/0.67 | 0.73/0.51 | -- | 0.58/0.54 | -- |
| RNAcast | 53 | 0.80/0.71 | 0.79/0.65 | 0.70/0.53 | -- | -- | 0.40/0.55 |
| Common | 24 | 0.86/0.74 | 0.85/0.71 | 0.74/0.58 | 0.83/0.74 | 0.59/0.59 | 0.49/0.62 |
Each column shows the SPS and MCC values of the alignments produced by one program. The MCC values are computed for the structures predicted by the Pfold software.
Each row shows the average SPS and MCC values, written as "SPS/MCC", for each program.
The values in the "All" row are the averages across all the families.
"Stemloc", "PMMulti", "RNAcast", and "Common" indicate the averages across
the subsets of alignments for which Stemloc, PMMulti, RNAcast,
and all the programs returned results, respectively.
For each row, the highest values of SPS and MCC are shown in boldface.
| Dataset | #Data | Murlet (SPS/MCC) | ProbCons (SPS/MCC) | ClustalW (SPS/MCC) | Stemloc (SPS/MCC) | PMMulti (SPS/MCC) | RNAcast (SPS/MCC) |
|---|---|---|---|---|---|---|---|
| All | 481 | 0.88/0.77 | 0.88/0.75 | 0.84/0.72 | -- | -- | -- |
| Stemloc | 386 | 0.88/0.78 | 0.88/0.75 | 0.83/0.72 | 0.86/0.78 | -- | -- |
| PMMulti | 374 | 0.89/0.77 | 0.89/0.75 | 0.84/0.72 | -- | 0.80/0.74 | -- |
| RNAcast | 421 | 0.89/0.77 | 0.88/0.74 | 0.85/0.72 | -- | -- | 0.62/0.66 |
| Common | 310 | 0.90/0.77 | 0.89/0.75 | 0.85/0.73 | 0.88/0.77 | 0.81/0.74 | 0.64/0.67 |
This table shows the SPS and MCC values for the BRAlibaseII
multiple alignment dataset. Although the SPS and MCC values are
relatively high and the differences in scores among the programs are
smaller than for the dataset above, Murlet still shows the highest
accuracy with regard to both the SPS and MCC values.
Comparison of time and memory usage

Elapsed time and maximal resident memory for computing alignments of the datasets. In both figures, the x-axis represents the mean length of the
sequence families. The y-axes represent the maximal resident physical memory
of the process in megabytes (MB) (left) and the elapsed time in minutes
(right). Each data point represents a specific sequence family of Table 2.
Only the alignments that were returned correctly are plotted. The memory and time
consumption of ClustalW, ProbCons, and RNAcast is very small
compared with that of the Sankoff-based programs, and several points for
these programs coincide in the figure.
