Lately I’ve been doing a lot of research on social networks centered, at least partially, on music. In doing this, I began gathering some statistics on bands that use MySpace. I wanted to get a ballpark figure of how active bands are in the MySpace network. My friend Dan collaborated with me on writing some scripts to gather data. We wrote 3 scripts to get our stats.
Script #1 is a PHP script that looked at MySpace profiles and saved only band profiles to my hard drive. After I chose my range of MySpace profiles to analyze, it was faster to split the range between a few copies of the script and run them in parallel. I also tried to make my requests through a proxy but found it to be too slow.
$start_time = time();
$total_music_profile_count = 0;
$friendID_start = 100000000; // Start at 100 million
$friendID_end = 101000000;   // 101 million
$PATH = ""; // insert path here
$ch = curl_init();

/* You can use Tor to surf anonymously... I found this too slow */
//$tor_address = '127.0.0.1:8118';

for($friendID=$friendID_start; $friendID<$friendID_end; ++$friendID){
    // Uncomment these to surf anonymously
    /*
    curl_setopt ($ch, CURLOPT_PROXY, $tor_address);
    curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt ($ch, CURLOPT_HTTPPROXYTUNNEL, true);
    curl_setopt ($ch, CURLOPT_PROXYTYPE, CURLPROXY_SOCKS5);
    */
    curl_setopt ($ch, CURLOPT_URL, "http://profile.myspace.com/index.cfm?fuseaction=user.viewprofile&friendID=$friendID");
    curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
    $result = curl_exec($ch);

    // Only band pages contain this marker text
    $pattern = '/MySpace music profile/i';
    if(preg_match ($pattern, $result ,$matches)){
        ++$total_music_profile_count;
        $filename = "$PATH/mp_$friendID.html";
        $file_HANDLE = fopen($filename, "wb");
        $bytes_written = fwrite($file_HANDLE, $result);
        fclose($file_HANDLE);
    }
    print "Processed MySpace profile with friend ID: $friendID\n";
}

$end_time = time();
$time_elapsed = $end_time - $start_time;
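Splitting the friend ID range between several copies of the script comes down to handing each copy its own start and end values. Here is a minimal Ruby sketch of that split. The `split_range` helper and the worker count of 4 are assumptions for illustration, not part of the original scripts.

```ruby
# Split a half-open friend ID range [first_id, last_id) into roughly equal
# chunks, one per worker. `workers` is a hypothetical parameter.
def split_range(first_id, last_id, workers)
  total = last_id - first_id
  base, extra = total.divmod(workers)
  chunks = []
  start = first_id
  workers.times do |i|
    # The first `extra` workers take one additional ID each
    size = base + (i < extra ? 1 : 0)
    chunks << [start, start + size]
    start += size
  end
  chunks
end

split_range(100_000_000, 101_000_000, 4).each_with_index do |(from, to), i|
  puts "worker #{i}: friendID #{from} up to (not including) #{to}"
end
```

Each pair can then be used as `$friendID_start` and `$friendID_end` in one copy of script #1.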
Script #2 is a Ruby script that parsed the saved profiles and wrote the band ID, band/artist name, profile view count, friends count, last login date, and member since date to a MySQL database. You will notice that the band_member_text regex is commented out. We wanted to know the number of band members in each band, but the content was too variable to evaluate. *I apologize about the syntax highlighting issue you see below. WP-Syntax (powered by Geshi) seems to have a problem understanding quotes in regular expressions.
#!/usr/bin/ruby
require 'rubygems'
require 'mysql'
require 'date'

def main
  db = Mysql.new(hostname, user, password, database) # insert db info here
  path = "" # insert path here
  entries = Dir.entries(path).reject { |e| e == "." || e == ".." }
  total_profile_views = 0
  total_friends = 0
  today = Date.today
  max_avg_friends_per_day = 0
  max_avg_profile_views_per_day = 0
  n = 0

  entries.each do |e|
    text = IO.read(path + "/" + e)

    band_id_regex = /^mp_([0-9]+)\.html/i
    band_name_regex = /<title>\s*MySpace\.com\s*\-\s*([^\-]+)/
    profile_views_regex = /Profile Views: (\s*)([0-9]+)/i
    num_friends_regex = /has <span class="redbtext">(\d+)<\/span> friends/
    last_login_regex = /last login: ( )?(\s*)([0-1]?[0-9]\/[0-3]?[0-9]\/[1-2][0-9]{3})/i
    member_since_regex = /Member Since<\/span><\/td><td id="ProfileMember Since" width="175" bgcolor="#d5e8fb" style="WORD-WRAP: break-word">((\d|\/){8,10})<\/td>/

    # The band ID comes from the file name written by script #1
    if e =~ band_id_regex
      band_id = $1.to_i
    else
      band_id = 0
    end

    if text =~ band_name_regex
      band_name = $1
      band_name.gsub!(/'+/, " ") # strip single quotes so the INSERT does not break
    else
      band_name = ""
    end

    if text =~ profile_views_regex
      profile_views = $2.to_i
      total_profile_views += profile_views
    else
      profile_views = 0
    end

    if text =~ num_friends_regex
      num_friends = $1.to_i
      total_friends += num_friends
    else
      num_friends = 0
    end

    if text =~ last_login_regex
      last_login = $3
      ll_date = Date.parse(last_login)
    else
      last_login = "1900-01-01"
      puts "last login parsing error for file: " + e
    end

    member_since_found = true
    if text =~ member_since_regex
      member_since = $1
      ms_date = Date.parse(member_since)
    else
      member_since = "1900-01-01" # dummy date
      member_since_found = false
      ms_date = Date.parse(member_since)
      puts "member_since parsing error for file: " + e
    end

    # Per-day averages over the lifetime of the profile
    if member_since_found
      avg_profile_views_per_day = (profile_views) / (today - 3 - ms_date)
      avg_profile_views_per_day = avg_profile_views_per_day.to_f
      avg_friends_per_day = (num_friends) / (today - 3 - ms_date)
      avg_friends_per_day = avg_friends_per_day.to_f
    else
      avg_friends_per_day = 0.0
      avg_profile_views_per_day = 0.0
    end

    if max_avg_friends_per_day < avg_friends_per_day
      max_avg_friends_per_day = avg_friends_per_day
    end
    if max_avg_profile_views_per_day < avg_profile_views_per_day
      max_avg_profile_views_per_day = avg_profile_views_per_day
    end

    # The two avg_* columns are the ones script #3 reads; their names are taken from that script
    mysql_query = "INSERT INTO music_profile_stats (band_id, band_name, profile_views, num_friends, last_login, avg_friends_per_day, avg_prof_views_per_day) VALUES (#{band_id}, '#{band_name}', #{profile_views}, #{num_friends}, '#{last_login}', #{avg_friends_per_day}, #{avg_profile_views_per_day})"
    db.query(mysql_query)
  end
end

main
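The per-day averages in script #2 divide a running total by the age of the profile in days. Here is a minimal Ruby sketch of that calculation on its own. The `per_day_rate` helper, the guard against a non-positive age, and both dates are assumptions added for illustration. The 3-day offset simply mirrors the `today - 3` term in the script.

```ruby
require 'date'

# Average events per day since a profile was created.
# offset_days mirrors the `today - 3` term in script #2; returning 0.0 for a
# non-positive age is an addition of this sketch, to avoid dividing by zero.
def per_day_rate(count, member_since, as_of, offset_days = 3)
  days = (as_of - offset_days - member_since).to_i
  return 0.0 if days <= 0
  count.to_f / days
end

as_of = Date.new(2008, 3, 31)         # hypothetical analysis date
member_since = Date.new(2007, 3, 28)  # hypothetical sign-up date
puts per_day_rate(3403, member_since, as_of)
```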
Results thus far:
- Total MySpace profiles analyzed: 2,132,917
- Total band profiles found: 87,710 (out of the 13.5 million on MySpace - count taken in March of 2008)
- Average number of profile views per band: 3,403
- Average number of friends per band: 201
For each band, I recorded the average number of profile views they received per day, as well as the average number of friends they acquired per day. Script #3 is a PHP script that compiled results about band activity by looking at this data.
require_once('./mysql_connect.php');

bucketize("music_profile_stats", "avg_friends_per_day", 0.01, 4, 1500);
bucketize("music_profile_stats", "avg_prof_views_per_day", 0.03, 5, 12500);

function bucketize($table, $column, $bucket_start, $bucket_multiplier, $max_val){
    $query = "SELECT * FROM $table WHERE 1=1 ORDER BY $column";
    $result = mysql_query($query);

    // Bucket lower bounds: 0, then a geometric series starting at $bucket_start
    $bucket = array();
    $num = 0;
    array_push($bucket, $num);
    $num = $bucket_start;
    while($num < $max_val){
        array_push($bucket, $num);
        $num *= $bucket_multiplier;
    }

    // Rows arrive sorted by $column, so the bucket index only ever moves forward
    $stats = array();
    $bucket_step = 0;
    while($row = mysql_fetch_array($result)){
        // "while" rather than "if", so a value that skips an empty bucket still lands in the right one
        while(($bucket_step < (count($bucket) - 1)) && ($row[$column] >= $bucket[$bucket_step+1]))
            $bucket_step += 1;
        $temp = $bucket[$bucket_step];
        if(!isset($stats["$temp"])) $stats["$temp"] = 0;
        $stats["$temp"] += 1;
    }

    print "**********************************************\n";
    foreach($stats as $k=>$v){
        // normalize data from per-day to per-year
        $temp = $k*365;
        print "$temp per year => $v\n";
    }
    print "**********************************************\n";
}
The results from this script are graphed below.
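The bucket boundaries used by script #3 grow geometrically, which is easier to see when they are printed out. This Ruby sketch rebuilds them for the friends-per-day call (`0.01, 4, 1500`) and sorts a few values into them. The two helper methods and the sample values are made up for illustration; only the three parameters come from the script.

```ruby
# Build bucket lower bounds: 0, then start, start*multiplier, ... while below max_val.
def bucket_bounds(start, multiplier, max_val)
  bounds = [0.0]
  value = start
  while value < max_val
    bounds << value
    value *= multiplier
  end
  bounds
end

# Count each value under the largest lower bound that does not exceed it.
# Values past the last bound are counted in the last bucket. Assumes values >= 0.
def bucketize(values, bounds)
  counts = Hash.new(0)
  values.each do |v|
    lower = bounds.select { |b| b <= v }.max
    counts[lower] += 1
  end
  counts
end

bounds = bucket_bounds(0.01, 4, 1500)
sample = [0.0, 0.005, 0.02, 0.5, 3.0, 2000.0] # made-up per-day averages
bucketize(sample, bounds).sort.each do |lower, count|
  puts "#{(lower * 365).round(2)} per year and up => #{count}"
end
```

With a multiplier of 4, ten boundaries are enough to cover everything from 0 to beyond 1,500 friends per day, which is why the graphs use so few bars.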


Conclusion:
- 54% of the bands in my sample have, on average, fewer than 1 profile view per day and acquire no more than 4 friends per year (very inactive)
- 8% of the bands in my sample have, on average, between 1 and 10 profile views per day and acquire between 4 and 15 friends per year (somewhat inactive)
I would consider the remaining 38% of my sample to be active members. Extrapolating these results to the 13.5 million bands on MySpace, I estimate that approximately 5 million of them are active in the MySpace network.
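The extrapolation is simple arithmetic on the figures above. As a short Ruby sketch (the variable names are mine):

```ruby
very_inactive_pct = 54
somewhat_inactive_pct = 8
active_pct = 100 - very_inactive_pct - somewhat_inactive_pct

total_bands = 13_500_000 # genre-count total from March 2008
estimated_active = total_bands * active_pct / 100

puts "active share: #{active_pct}%"                 # prints "active share: 38%"
puts "estimated active bands: #{estimated_active}"  # prints "estimated active bands: 5130000"
```

This assumes the sampled friend ID range is representative of all bands on the site, which the sample itself cannot confirm.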
16 Comments
wow, talk about coming out of the blog-gate fast and hard. nice first post, i’m anxious to see what’s next.
Well written article.
hello–great article. do you know out of the 5 million or so active bands- how many would you think are unsigned and need funding. and how many of them are based in the USA. Thanks
Jeremy,
There is no way to tell for sure how many of these bands are unsigned and need funding, but you could certainly use this data to make a reasonable guess. I did not include functionality to determine band location in these scripts, but I am sure it is possible to do with some degree of accuracy.
really cool job! but where is the profile information saved? i can't find anything after running it. i gave the php script the complete path, but nothing there
thx, felix
actually, i'm looking for a solution to get all bands
- from germany
- genre rock
- signed
- more than 150.000 profile views.
is that possible?
yours, felix
Felix,
To answer your first question, the data is first saved in files in the path you specify. Notice these lines in script #1:
$filename = "$PATH/mp_$friendID.html";
$file_HANDLE = fopen($filename, "wb");
$bytes_written = fwrite($file_HANDLE, $result);
fclose($file_HANDLE);
The files are stored in the path described by $filename. Make sure fopen and fwrite are working.
To answer your second question,
- from Germany: probably possible but not implemented in this script
- genre rock: again, probably possible but not implemented in this script
- signed: I think you would have to guess here -> if((profile views > 50,000)&&(num_friends > 2000)) band = signed; …something like that
- more than 150,000 profile views: this data is accessible from these script results
Hi,
This is really a great article.
Tell me, is there a way for you to compare the relation #friends/#profile views/#plays? A daily average, or even a yearly one would present valuable source of info, I’m sure, since the number of plays can be dominant when determining a band’s popularity on Myspace.
Great job, anyway!
NERD
Great info!
Great article, Tony! Quick question – how did you count or calculate the 13.5 million bands number? What does that include? I’m doing some research on this stuff and would really love to know.
Thanks a lot,
Nick
I just added up the number of bands in each genre shown at the bottom of this page: http://music.myspace.com/
Obviously 13.5 million is no longer accurate, but it is certainly not off by an order of magnitude…it’s still close
Hi Tony,
I’m not sure “Top Genres” numbers listed at the bottom of http://music.myspace.com/ actually count bands on MySpace. Here’s the count from today, adding up to ~21MM bands:
Hip Hop 2,682,753
Rap 2,534,305
Rock 1,898,020
R&B 1,672,610
Other 1,130,460
Alternative 907,461
Acoustic 788,026
Experimental 642,357
Pop 773,596
Metal 645,195
Indie 586,409
Punk 487,151
Hardcore 472,478
Electronica 252,457
Crunk 502,483
Emo 252,457
Techno 356,667
Reggae 333,455
Two-Step 336,664
Electro 304,379
DeathMetal 293,604
Club 282,235
Country 277,381
Latin 270,929
Reggaeton 264,176
Jazz 239,784
Classic Rock 153,111
House 221,237
Soul 221,145
Funk 211,247
Blues 204,349
Folk 204,349
Comedy 195,909
TOTAL 20,598,839
Three problems I see with this count are that it doesn’t include all genres (there’s a bunch more listed if you click “Show More Genres”), many artists show up multiple times, and each artist can list up to three genres that they are a part of. If you go here: http://topartists.myspace.com/index.cfm?fuseaction=music.topBands and search for “2-step”, you can see that there are multiple pages for Adam Lambert, and some of them list him as 2-step/2-step/2-step. I’m not sure whether each of these listings is counted in MySpace’s “Top Genres” table (haven’t had time to test this, perhaps by signing up as a new band myself).
Do you or others have any thoughts on this stuff? Am I missing something here? In trying to figure out the number of active unique bands on MySpace, your analysis of who’s active is great – how can we figure out the number of uniques?
Thanks,
Nick
Since this was meant to be very ballpark, I am not so worried about the other genres. The second point is important and changes some of these numbers. I agree that my analysis of active bands is much better than my count of uniques, based on the information you have brought to my attention.
One way you could account for the multiple genre issue is to take a random sample of MySpace band pages and check how many genres each band has. Then you can divide that average into the total. So if you find the average to be 2.3 genres per band, ( 22 million / 2.3 ) would give you uniques. If you want to be really accurate, then I agree you should take other genres into consideration.
If you want to be REALLY accurate, you could probably write the script to parse profile ids from 0 to some really large number. I would guess that every time a MySpace user is added, the friendID increments by 1 so you may be able to cover most of the range by creating a MySpace account and looking at your friendID. Ideally, a newly created MySpace account would have one of the highest friendID’s so you could use that as your upper limit. I cannot guarantee this will work without looking into it more, but that is my best guess right now.
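The genre correction described above can be sketched in a few lines of Ruby. The `estimate_uniques` helper and the ten sampled genre counts are hypothetical. They were chosen only so that the average comes out at the 2.3 used in the example, not taken from real MySpace pages.

```ruby
# Estimate unique bands by dividing the summed genre listings by the
# average number of genres per band found in a random sample of pages.
def estimate_uniques(total_listings, genres_per_band_sample)
  avg = genres_per_band_sample.sum.to_f / genres_per_band_sample.size
  (total_listings / avg).round
end

sample = [3, 2, 3, 1, 2, 3, 3, 2, 2, 2] # hypothetical genre counts from 10 sampled pages
puts estimate_uniques(22_000_000, sample)
```

A larger random sample would tighten the estimate, and this still does not account for artists who have several pages.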
great and interesting read, i’m really curious how i can apply this to my band page and gather statistics on daily plays along with the types of data you presented, since myspace daily plays resets at midnight each night. maybe you can offer some insight or even write something for the heck of it since i can’t program. you could then offer a web service for people to sign up and that could be lucrative for you.
it would be great to show a growth trend of both plays and friends to someone who may be interested in my band or anyone’s band. we currently get about 100 per day with only 500 friends, which i think is impressive. i notice other bands with 20,000 friends do about the same. thanks for your time and info.
thanks, ~yod
Yod,
Daily song plays might be difficult to get because they are in the Flash player (so they cannot be easily scraped). I will have to look into their API to see if that data is offered.
I also agree that those statistics are very nice to have, and I am actually working on a web marketing project for bands, so I will certainly look into this.
Thanks for your comments and congratulations on your high rate of “active users.” 100 plays per day with 500 friends is great, as is your music.