Inhalt des Dokuments
Computing Maximum Agreement Forests without Cluster Partitioning is Folly
Norbert Zeh (Dalhousie University in Halifax)
Computing a maximum (acyclic) agreement forest (M(A)AF) of a pair of phylogenetic trees is known to be fixed-parameter tractable; the two main techniques are kernelization and depth-bounded search. In theory, kernelization-based algorithms for this problem are not competitive, not even close. Yet, in practice, they perform remarkably well. In this paper, we shed light on why that is the case. First, our results show that—probably unsurprisingly—the problem kernel is often much smaller in practice than the theoretical worst case, but not small enough to fully explain the good performance of these algorithms. The key to performance is cluster partitioning, a technique used in almost all fast M(A)AF algorithms. In theory, cluster partitioning does not help: some instances are highly clusterable, others not at all. However, our experiments show that cluster partitioning leads to substantial performance improvements for kernelization-based M(A)AF algorithms. In contrast, kernelizing the individual clusters before solving them using exponential search yields only very modest performance improvements at best and often even hurts performance. Our study of the maximal cluster size before and after kernelization explains why: for the vast majority of inputs, kernelization leads to no reduction in the maximal cluster size at all. We also observe that the choice of the algorithm applied to solve individual clusters does have a significant impact on performance, even though our limited experiment to evaluate this produced no clear winner. Depth-bounded search, exponential search interleaved with kernelization, and an ILP-based algorithm all achieved competitive performance.
Back to the research colloquium site.