Wednesday, April 25, 2007

Methodology

To investigate how people use tags to conceptualize musical genres, I have been downloading tag information made available through lastFM’s Audioscrobbler Web Services. The information is served as XML pages, from which I have been extracting the tag values. I chose to download the tag information for the top 10 musicians in each genre. As the most frequently tagged members of a genre, these musicians are more central to it and should therefore be the best exemplars for modeling.
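A minimal sketch of the extraction step, written in Python for illustration. The XML layout (a `toptags` root holding `tag` elements with `name` and `count` children), the sample data, and the `parse_tags` helper are all assumptions; the real Audioscrobbler pages may be structured differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page; the real Audioscrobbler layout may differ.
SAMPLE_XML = """<toptags artist="Example Artist">
  <tag><name>rock</name><count>100</count></tag>
  <tag><name>indie</name><count>42</count></tag>
  <tag><name>seen live</name><count>1</count></tag>
</toptags>"""


def parse_tags(xml_text):
    """Return a {tag name: count} dict from one artist's tag XML."""
    root = ET.fromstring(xml_text)
    tags = {}
    for tag in root.findall("tag"):
        # Lowercase the name so "Rock" and "rock" are treated as one tag.
        name = tag.findtext("name", default="").strip().lower()
        count = int(tag.findtext("count", default="0"))
        if name:
            tags[name] = count
    return tags


artist_tags = parse_tags(SAMPLE_XML)
```

Running the same function over the pages for each of the top 10 musicians in a genre would give one dictionary of tag counts per musician.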

After downloading and parsing the data, I removed all of the tags that had been applied only once. I made this decision because the purpose of the project is to model how the users of lastFM conceptualize a given genre; if only one person feels that something deserves a tag, then that tag is not representative enough of our culture to be accounted for.
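The filtering step could look roughly like this in Python. It assumes each musician's tags are held as a dictionary of raw application counts; the `drop_singletons` name and the example data are hypothetical.

```python
def drop_singletons(tag_counts, min_count=2):
    """Keep only the tags applied at least `min_count` times."""
    return {tag: n for tag, n in tag_counts.items() if n >= min_count}


# Made-up counts for one musician.
example = {"rock": 100, "indie": 42, "seen live": 1}
filtered = drop_singletons(example)
```

Making the threshold a parameter keeps the cutoff easy to revisit if a single application turns out to be too strict or too lenient.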

This week I have begun computing descriptive statistics on the data to get an idea of what I have to work with. I intend to continue the investigation with an analysis based on fuzzy set theory. This will involve writing a script in MATLAB that compares the values of each musician’s tags to see how central each tag is for that musician relative to the genre in general. Scaling the values as a series of fuzzy sets will remove the weighting that would result from analyzing raw tag counts alone. The tags that appear most frequently and appear to be good predictors of centrality will be used to inform the parameters of the model.
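One possible way to set up the fuzzy scaling is sketched below, in Python rather than the planned MATLAB script. It is only an illustration under stated assumptions: membership is taken to be a tag's count divided by that musician's largest count, and genre-level centrality is taken to be the mean membership across musicians. Neither choice comes from the analysis itself, and the function names and data are hypothetical.

```python
def memberships(tag_counts):
    """Scale one musician's tag counts to fuzzy memberships in [0, 1]."""
    peak = max(tag_counts.values(), default=0)
    if peak == 0:
        return {}
    return {tag: n / peak for tag, n in tag_counts.items()}


def genre_centrality(artists):
    """Average each tag's membership over all musicians in the genre.

    A tag that a musician lacks counts as membership 0 for that musician.
    """
    scaled = [memberships(counts) for counts in artists.values()]
    all_tags = set().union(*scaled)
    return {
        tag: sum(m.get(tag, 0.0) for m in scaled) / len(scaled)
        for tag in all_tags
    }


# Made-up counts: artist_a has ten times as many tags overall as artist_b.
genre = {
    "artist_a": {"rock": 200, "indie": 50},
    "artist_b": {"rock": 20, "punk": 10},
}
centrality = genre_centrality(genre)
```

Because each musician is rescaled to the same [0, 1] range before averaging, a heavily tagged musician does not dominate the genre-level scores the way it would if raw counts were summed.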
