Avoiding Hubness in Music Information Retrieval
A number of recent publications have described and explored the so-called "hubness" phenomenon as a general problem of machine learning in high-dimensional data spaces. Hubs are data points that appear unusually often in the nearest neighbor lists of many other data points. This effect is particularly problematic for similarity search algorithms, because the same "similar" objects are found over and over again. It also has adverse effects on the many machine learning algorithms that rely on distance information. The effect has been shown to be a natural consequence of high dimensionality and as such is yet another aspect of the curse of dimensionality.

The hub problem has gained particular attention in the field of Music Information Retrieval (MIR), the interdisciplinary science of extracting information from music. In MIR, the hub problem has been studied primarily in the context of music recommendation based on models of audio similarity. Songs that act as hubs are reported as similar to very many other songs and hence keep a significant proportion of the audio collection from being recommended at all. Since proper modeling of audio similarity is the central challenge in MIR, a problem like hubness that interferes with this endeavor is of major concern for MIR in general. Similar effects exist in other forms of multimedia retrieval and recommendation. The main goal of this project is to conduct an in-depth study of the hubness problem in the context of MIR, with the aim of finding ways to avoid or at least attenuate its adverse effects.
Our research will focus on three possible solutions:
- finding parameterizations of audio similarity that are less prone to hubness
- transforming audio similarity spaces, thereby avoiding the asymmetries that lead to hubness
- treating audio similarity spaces as nearest neighbor graphs and using graph-theoretic results to avoid hub nodes

Although the emphasis of this project is on MIR, results concerning the prevention of hubs will also be of interest and applicable in the broader field of machine learning. Such additional ramifications will be explored where possible, so that our research has the potential to solve an important problem not only in MIR but also in multimedia retrieval and machine learning in general.
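To make the notion of a hub concrete, the following minimal sketch counts how often each point appears in the k-nearest-neighbor lists of all other points (its "k-occurrence") and reports the skewness of those counts. The choice of skewness as a hubness indicator, the Euclidean distance, the Gaussian toy data, and all function names are assumptions made for illustration; none of them is taken from the project description, and real audio similarity measures would replace the distance used here.

```python
import math
import random


def k_occurrence(points, k):
    """Count how often each point appears in the k-nearest-neighbor lists of the others."""
    n = len(points)
    counts = [0] * n
    for i in range(n):
        # Euclidean distances from point i to every other point (assumed metric).
        dists = [(math.dist(points[i], points[j]), j) for j in range(n) if j != i]
        dists.sort()
        for _, j in dists[:k]:
            counts[j] += 1
    return counts


def skewness(values):
    """Third standardized moment; a large positive value means a few points collect many neighbors."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    if var == 0:
        return 0.0
    third = sum((v - mean) ** 3 for v in values) / n
    return third / var ** 1.5


def hubness_demo(n_points=300, k=5, dims=(3, 50), seed=0):
    """Return {dimension: (skewness of k-occurrence, largest k-occurrence)} for random Gaussian data."""
    rng = random.Random(seed)
    result = {}
    for d in dims:
        points = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n_points)]
        counts = k_occurrence(points, k)
        result[d] = (skewness(counts), max(counts))
    return result


if __name__ == "__main__":
    for d, (skew, largest) in hubness_demo().items():
        print(f"dim={d:3d}  skewness={skew:.2f}  max k-occurrence={largest}")
```

Running the script compares a low-dimensional and a high-dimensional toy data set; in line with the description above, one would expect the high-dimensional case to show a more skewed distribution, with a few points appearing in far more neighbor lists than the average of k.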