Wednesday, April 25, 2007

Preliminary Data Analysis






These figures are based on the number of tags for each musician. Each symbol represents a different musician.

Methodology

To investigate how people use tags to conceptualize musical genres, I have been downloading tag information made available through lastFM’s Audioscrobbler Web Services. The information is served as XML pages, from which I have been extracting the tag values. I chose to download the tag information for the top 10 musicians in each genre. As the most frequently tagged members of a genre, they are more central to it and should therefore be the best exemplars for modeling.
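As a rough illustration of the extraction step, here is a minimal Python sketch that pulls tag names and counts out of one XML page. The sample document, its element names (toptags, tag, name, count) and the function name parse_tag_counts are all assumptions made for the example; the actual Audioscrobbler pages may be laid out differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page; the real feed's element names may differ.
SAMPLE_XML = """<toptags artist="Example Artist">
  <tag><name>Electronic</name><count>120</count></tag>
  <tag><name>ambient</name><count>45</count></tag>
  <tag><name>seen live</name><count>1</count></tag>
</toptags>"""


def parse_tag_counts(xml_text):
    """Return a {tag name: count} dict for one musician's tag page."""
    root = ET.fromstring(xml_text)
    counts = {}
    for tag in root.iter("tag"):
        name = tag.findtext("name")
        count = tag.findtext("count")
        if name is None or count is None:
            continue  # skip entries that lack a name or a count
        # Lowercase so that "Electronic" and "electronic" are one tag.
        counts[name.strip().lower()] = int(count)
    return counts


tag_counts = parse_tag_counts(SAMPLE_XML)
print(tag_counts)
```

Lowercasing the tag names is a design choice of this sketch, not something the feed requires; it keeps differently capitalized spellings of a tag from being counted separately.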

After downloading and parsing the data, I removed all of the tags that were applied only once. I made this decision because the purpose of the project is to model how the users of lastFM conceptualize a given genre, and if only one person feels that something deserves a tag, then it is not representative enough of our culture to be accounted for.
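The filtering rule described above can be sketched in a few lines of Python. The dictionary layout, the example counts and the name drop_singletons are hypothetical, chosen only to show the idea.

```python
def drop_singletons(tag_counts, min_count=2):
    """Keep only the tags that were applied at least min_count times."""
    return {tag: n for tag, n in tag_counts.items() if n >= min_count}


# Made-up counts for one musician, for illustration only.
artist_tags = {"electronic": 120, "ambient": 45, "seen live": 1}
filtered = drop_singletons(artist_tags)
print(filtered)
```

Keeping the threshold as a parameter makes it easy to check later whether a stricter cutoff changes the picture.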

This week I have begun computing descriptive statistics on the data to get an idea of what I have to work with. I intend to continue investigating the information with an analysis based on fuzzy set theory. This will involve writing a script in MATLAB that compares the values of each of the tags on a musician to see how central each tag is for the musician and for the genre in general. Scaling the values as a series of fuzzy sets will remove the weighting that would occur in an analysis of the raw number of tags. The tags that appear most frequently and appear to be good predictors of centrality will be used to inform the parameters that are modeled.
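One possible way to set up the fuzzy set scaling is sketched below in Python rather than MATLAB. The membership function is not fixed anywhere in the plan above, so this sketch assumes a simple one: each tag's count divided by the musician's largest tag count, giving a value between 0 and 1. Averaging those memberships across musicians as a genre-level centrality score is likewise an assumption, and the musician names and counts are made up.

```python
def memberships(tag_counts):
    """Scale one musician's counts to [0, 1] by dividing by the largest count."""
    top = max(tag_counts.values(), default=0)
    if top == 0:
        return {}
    return {tag: n / top for tag, n in tag_counts.items()}


def genre_centrality(artists):
    """Average each tag's membership over all musicians (0 where absent)."""
    scaled = [memberships(counts) for counts in artists.values()]
    all_tags = set().union(*scaled)
    return {
        tag: sum(m.get(tag, 0.0) for m in scaled) / len(scaled)
        for tag in all_tags
    }


# Made-up counts: Artist A is tagged ten times more often than Artist B.
artists = {
    "Artist A": {"electronic": 200, "ambient": 50},
    "Artist B": {"electronic": 20, "idm": 20},
}
centrality = genre_centrality(artists)
print(sorted(centrality.items(), key=lambda item: -item[1]))
```

Because each musician is scaled against their own largest count, a heavily tagged musician no longer outweighs a lightly tagged one, which is the weighting problem that raw tag counts would introduce.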

Motivation

After considering which genres would be interesting to explore and then model, I settled on electronic, electronica, ambient, house and IDM.

Electronic- Used 338,561 times by 44,618 people
Ambient- Used 116,422 times by 22,782 people
IDM- Used 41,921 times by 6,679 people
Electronica- Used 164,348 times by 25,479 people
House- Used 53,008 times by 9,899 people

I chose these tags because of the ways in which they overlap and complement each other. Electronic is the umbrella genre that covers all of the others. The other genres could be seen as subgenres of electronic music, but because electronic music has become so popular, a variety of ill-defined subgenres exist. Unlike many genres, such as rock, punk and punk-rock, these electronic genres are not considered derivatives of one another. IDM is thought of as distinct from house or ambient while still being a sub-genre of electronic. For that reason I felt that these would be interesting genres to model, in order to see if the distinctly different concepts that individuals have of the genres can be made visually apparent.

Introduction

I decided to create this blog so that I could have an easy way to display my progress on the independent research project that I am doing for Professor Hollan in the Human Computer Interaction lab at UCSD. In this project I set out to investigate how our cultural concept of music and genres is represented by the tags applied to music on the social/music networking site lastFM.

By exploring how people tag music, I intend to develop a better understanding of what motivates tagging and how we can use this rich source of user-generated content to visually represent complicated, ubiquitous ideas like musical genre.