Tom Morris

26 Nov 2006

XPath: MySpace

Want to get data out of MySpace? Here are the XPath strings you need to extract data from a MySpace profile:

Top friends:
//table[4]/tr/td/table/tr/td/table/tr/td/table/tr/td/a/img/../@href
Name: //*[@class='nametext']
Status: //*[@id='ProfileStatus:']
Religion: //*[@id='ProfileReligion:']
Zodiac Sign: //*[@id='ProfileZodiac Sign:']
Smoke / Drink: //*[@id='ProfileSmoke / Drink:']
Children: //*[@id='ProfileChildren:']
Education: //*[@id='ProfileEducation:']
Number of friends: //*[@class='redbtext']
Schools: //a[contains(@href, "InterestType=SCH&")]
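
As a rough illustration of how a few of these expressions might be applied, here is a minimal Python sketch using only the standard library. The `SAMPLE` markup, the `FIELDS` mapping and the `extract_profile` helper are all made up for this example; a real profile page is nowhere near this tidy. `xml.etree.ElementTree` only understands a subset of XPath (relative paths starting with `.`, and no `contains()`), so the schools lookup is done with a plain Python filter instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for a profile page (an assumption for
# this sketch; real MySpace markup would not parse as XML).
SAMPLE = """
<html><body>
  <span class="nametext">Example User</span>
  <table>
    <tr><td id="ProfileStatus:">Single</td></tr>
    <tr><td id="ProfileReligion:">Atheist</td></tr>
  </table>
  <span class="redbtext">42</span>
  <a href="/schools?InterestType=SCH&amp;id=1">Some School</a>
</body></html>
"""

# Field name -> XPath from the list above, with the leading "." that
# ElementTree needs for a search relative to the root element.
FIELDS = {
    "name": ".//*[@class='nametext']",
    "status": ".//*[@id='ProfileStatus:']",
    "religion": ".//*[@id='ProfileReligion:']",
    "friend_count": ".//*[@class='redbtext']",
}


def extract_profile(markup):
    """Return a dict of field -> text for the first match of each XPath."""
    root = ET.fromstring(markup)
    profile = {}
    for field, xpath in FIELDS.items():
        node = root.find(xpath)
        has_text = node is not None and node.text
        profile[field] = node.text.strip() if has_text else None
    # ElementTree has no contains(), so filter the links in Python instead.
    profile["schools"] = [
        a.text for a in root.iter("a")
        if "InterestType=SCH&" in a.get("href", "")
    ]
    return profile


profile = extract_profile(SAMPLE)
print(profile)
```

With a full XPath engine the expressions could be used exactly as written above, `contains()` included.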

And if you want to get the friends from your friends page:

//a/img/../@href
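
That expression picks the href of every link that directly wraps an image, which is the same thing as `//a[img]/@href`. A small sketch of the idea in Python, again with the standard library's limited XPath support and an invented, well-formed `FRIENDS` snippet (both the markup and the `friend_links` helper are assumptions for illustration):

```python
import xml.etree.ElementTree as ET

# Hypothetical, tidy stand-in for a friends page.
FRIENDS = """
<html><body>
  <a href="/profile/1"><img src="a.jpg"/></a>
  <a href="/profile/2"><img src="b.jpg"/></a>
  <a href="/help">Help</a>
</body></html>
"""


def friend_links(markup):
    """Return the href of every <a> that has an <img> child."""
    root = ET.fromstring(markup)
    # ".//a[img]" selects the same elements as "//a/img/.."; ElementTree
    # cannot select attributes directly, so read href from each element.
    return [a.get("href") for a in root.findall(".//a[img]") if a.get("href")]


links = friend_links(FRIENDS)
print(links)
```

Note that text-only links such as the Help link are skipped, but any other image link on the page (adverts, navigation buttons) would be picked up too.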

The problem with this is that MySpace pages the friends list with JavaScript. The script only switches pages; it doesn’t do anything funky or AJAXy to pull the data down and swap it in dynamically, so it’s basically just bloat. It is probably easy enough to work around, but I haven’t spent the time to look under the JavaScript hood, since the crazy HTML was enough for me!

When running XPath queries over MySpace, it’s probably a good idea to set your parser to the most ultra-liberal mode you can find, because MySpace is probably the most expensive pile of non-validating crap around.
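
To give a feel for what a forgiving parser buys you, here is a hedged sketch using Python's built-in `html.parser`, which carries on past unclosed tags and unquoted attributes instead of giving up. It has no XPath support, so the `ImageLinkCollector` class below (a made-up name) only approximates the friends expression by recording the href of any link that has an image somewhere inside it. The `BROKEN` markup is invented for the example. Third-party lenient parsers such as lxml's HTML parser or html5lib are other options if you want a tree you can actually run XPath over.

```python
from html.parser import HTMLParser


class ImageLinkCollector(HTMLParser):
    """Collect hrefs of links that contain an image, tolerating bad markup."""

    def __init__(self):
        super().__init__()
        self.current_href = None  # href of the <a> we are currently inside
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.current_href = dict(attrs).get("href")
        elif tag == "img" and self.current_href is not None:
            if self.current_href not in self.links:
                self.links.append(self.current_href)

    def handle_endtag(self, tag):
        if tag == "a":
            self.current_href = None


# Invented non-validating markup: unclosed <td>, <tr> and <a>, unquoted
# attribute values.
BROKEN = (
    '<table><tr><td><a href="/profile/1"><img src=a.jpg></a>'
    '<td><a href="/help">Help</a>'
    '<a href=/profile/2><img src=b.jpg></table>'
)

collector = ImageLinkCollector()
collector.feed(BROKEN)
collector.close()
print(collector.links)
```

A strict XML parser would reject this input at the first unquoted attribute.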
