In this thesis we present an implementation of, improvements of and experiments on the two algorithms presented by [SODA 2013].
The algorithms calculate the triplet and quartet distances on two respectively rooted and unrooted trees, of arbitrary degree. Both the triplet and the quartet distances are measures for comparing the similarity between two trees.
The triplet distance calculation operates on sets of three leaves (triplets). The quartet distance calculation operates on sets of four leaves (quartets). The distances are defined as the number of triplets or quartets with different structures in the two trees.
The triplet distance calculation algorithm presented in [SODA 2013] runs in time $O(n \cdot \lg n)$. The space usage of the algorithm is $O(n \cdot \min(d_1, \lg n))$, where $d_1$ is the degree of the first input tree, $T_1$.
The quartet distance calculation algorithm runs in time $O(\max(d_1, d_2) \cdot n \cdot \lg n)$ where $d_2$ is the degree of the second input tree, $T_2$. The space usage of the algorithm is $O(\max(d_1, d_2) \cdot n \cdot \min(\max(d_1, d_2), \lg n))$.
We improve the quartet distance calculation algorithm, reducing the runtime to $O(\min(d_1, d_2) \cdot n \cdot \lg n)$. This improvement furthermore decreases the space usage to $O(\min(d_1, d_2) \cdot n \cdot \min(d_1, d_2, \lg n))$.
We furthermore significantly reduce the number of calculations needed to calculate the quartet distance.
Through experiments we provide empirical evidence that both the algorithms presented in [SODA 2013], as well as our improvements, are feasible and perform well in practice.
The Makefile specifies -m64, meaning that all compiles will result in 64-bit programs. This is only useful when your compiler supports it, and you intend to run the program on a 64-bit system.
To instead compile a 32-bit version, manually edit the file Makefile to state -m32 in the two places (CFLAGS and LDFLAGS at the top) where it currently specifies -m64. As noted below, you will further have to specify NO_N4_128=1 for all compiles.
Many different variations are supported. The combination is specified at compile-time. We here list a number of useful commands.
A few variations are supported. The combination is specified at compile-time. We here list the commands.