Wednesday, May 9, 2007

Filtering

To eliminate extraneous attributes from the arrays and create a more manageable data set, I have been working on filtering the tag arrays. I took the normalized values and filtered out all of the tags that were more than 0.5 z scores below the mean. A cutoff of 0.5 z scores below the mean seemed like a good choice because it excluded every tag that only one artist has and most tags that only two artists share.
So far, this has produced arrays for each genre with about 50 tags, which is about how many tags I was hoping to work with. Now that I have normalized arrays for all of the data, I can move on to performing a fuzzy clustering analysis.
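
As a rough illustration of the filtering step described above, here is a minimal Python sketch. It assumes each genre's tags are stored as a mapping from tag name to the number of artists using that tag, and that z scores are computed across the tags within one genre using the population standard deviation. The function name `filter_tags`, the data layout, and the sample counts are all hypothetical and are not taken from my actual scripts.

```python
from statistics import mean, pstdev


def filter_tags(tag_counts, cutoff=-0.5):
    """Keep tags whose z score is at or above `cutoff`.

    `tag_counts` maps tag name -> number of artists using the tag
    (an assumed layout, for illustration only).
    """
    values = list(tag_counts.values())
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumption)

    # If every tag has the same count, z scores are undefined; keep everything.
    if sigma == 0:
        return dict(tag_counts)

    return {
        tag: count
        for tag, count in tag_counts.items()
        if (count - mu) / sigma >= cutoff
    }


# Made-up counts for one genre, only to show the behaviour.
example_counts = {
    "rock": 12,
    "indie": 9,
    "guitar": 7,
    "live": 2,
    "seen live once": 1,
    "my favourites": 1,
}

kept = filter_tags(example_counts)
print(sorted(kept))
```

With these made-up counts, the tags used by only one or two artists fall more than 0.5 z scores below the mean and are dropped, which matches the behaviour I saw with the real arrays. Whether the sample or population standard deviation is used will shift the cutoff slightly, so that choice is worth checking against the real data.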
