High-Dimensional Data Analysis in Music Information Retrieval
Learning in high-dimensional spaces poses a number of challenges that are collectively referred to as the curse of dimensionality. Music Information Retrieval (MIR), the interdisciplinary science of retrieving information from music, very often relies on high-dimensional feature representations and models. A new aspect of the curse of dimensionality, so-called hubness, was first documented and established in MIR as a problem of computing music similarity. Hub songs are, according to the music similarity function, similar to very many other songs; as a consequence, they appear in very many recommendation lists and prevent other songs from being recommended at all. Hubness has since been identified as a general problem of machine learning in high-dimensional spaces. It is due to distance concentration, the property that causes all points in a high-dimensional data space to lie at almost the same distance from each other. Our own previous research has focused on the impact of distance concentration and hubness on nearest-neighbor-based music recommendation and genre classification. As a result, we have developed a general unsupervised method to pre-process and rescale distance spaces, which decisively reduces hubness and its adverse effects in music databases as well as in general machine learning datasets. Research by our group and by others has also made clear that concentration and hubness affect many more distance-based algorithms used in high-dimensional data analysis. The proposed project will explore existing approaches and develop new ones to deal with these problems by studying their effects on a wide range of methods in MIR, as well as in multimedia and machine learning.
In particular, we plan to (i) study and unify rescaling methods to avoid distance concentration and (ii) explore the role of hubness in unsupervised learning (clustering, visualization) and supervised learning (classification) in high-dimensional spaces. The main focus of this project is on MIR, since this is where the majority of results on hubness and concentration exist. Evaluating our results in the broader fields of multimedia and machine learning will ensure that our research has the potential to solve an important problem in MIR and, at the same time, a general problem of learning in high-dimensional spaces.
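The two effects described above, distance concentration and hubness, can be illustrated with a small numerical sketch. This is not the project's method or data: it assumes synthetic i.i.d. Gaussian points in place of music features, plain Euclidean distance in place of a music similarity function, and two commonly used diagnostics (the relative contrast of distances, and the skewness of the k-occurrence counts). All function names are hypothetical and chosen for illustration only.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix computed from the Gram matrix."""
    sq = np.sum(X * X, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.maximum(d2, 0.0, out=d2)  # clip tiny negatives caused by rounding
    return np.sqrt(d2)


def k_occurrence(D, k):
    """Count how often each point is among the k nearest neighbours of the others."""
    D = D.copy()
    np.fill_diagonal(D, np.inf)  # a point is never its own neighbour
    nearest = np.argsort(D, axis=1)[:, :k]
    return np.bincount(nearest.ravel(), minlength=D.shape[0])


def skewness(values):
    """Third standardized moment; large positive values indicate a few strong hubs."""
    centered = np.asarray(values, dtype=float) - np.mean(values)
    return np.mean(centered ** 3) / np.mean(centered ** 2) ** 1.5


def relative_contrast(D):
    """Mean over points of (farthest - nearest) / nearest neighbour distance."""
    n = D.shape[0]
    off_diag = D[~np.eye(n, dtype=bool)].reshape(n, n - 1)
    nearest = off_diag.min(axis=1)
    farthest = off_diag.max(axis=1)
    return np.mean((farthest - nearest) / nearest)


def summarize(n_points, n_dims, k=5, seed=0):
    """Return (relative contrast, k-occurrence skewness) for synthetic Gaussian data."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_points, n_dims))  # assumption: synthetic stand-in data
    D = pairwise_distances(X)
    return relative_contrast(D), skewness(k_occurrence(D, k))


if __name__ == "__main__":
    for dims in (3, 30, 300):
        contrast, skew = summarize(n_points=500, n_dims=dims)
        print(f"d={dims:4d}  relative contrast={contrast:8.3f}  N_k skewness={skew:6.3f}")
```

Under these assumptions, the relative contrast should shrink as the dimensionality grows (distances concentrate), while the k-occurrence distribution becomes more right-skewed (a few points turn into hubs that appear in many neighbour lists). Real music features have their own structure, so the magnitudes seen here should not be read as representative of any MIR dataset.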