Inhalt des Dokuments
Entity Disambiguation by Partitioning Under Heterogeneity Constraints
Falk Hüffner (Postdoc TU Berlin)
We consider disambiguation of entries using a given partition of the entries into groups, where we know that no two entries in one group can refer to the same entity. One application is in correcting inaccurate interlanguage links in Wikipedia, and another is matching user profiles from different social networks. This is formalized by a simple and elegant graph-theoretic model: Given an undirected vertex-colored graph, delete a minimum number of edges such that none of the resulting connected components contain two vertices of the same color. This graph partitioning problem is NP-hard. We present methods based on mathematical programming that allow for an exact solution in real-world Wikipedia interlanguage link networks with millions of vertices. Moreover, these methods perform well in experiments on realistic random data. Finally, we demonstrate that a greedy heuristic provides solutions very close to optimal ones and scales well.
Back to the research colloquium site.