XPath: MySpace
Want to get data out of MySpace? Here are the XPath strings you need to extract data from a MySpace profile:
Top friends: //table[4]/tr/td/table/tr/td/table/tr/td/table/tr/td/a/img/../@href
Name: //*[@class='nametext']
Status: //*[@id='ProfileStatus:']
Religion: //*[@id='ProfileReligion:']
Zodiac Sign: //*[@id='ProfileZodiac Sign:']
Smoke / Drink: //*[@id='ProfileSmoke / Drink:']
Children: //*[@id='ProfileChildren:']
Education: //*[@id='ProfileEducation:']
Number of friends: //*[@class='redbtext']
Schools: //a[contains(@href, "InterestType=SCH&")]
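To show how a few of these strings might be wired up, here is a minimal sketch using Python's standard-library ElementTree. Several things are assumptions: the SAMPLE markup is a made-up, already well-formed stand-in (not real MySpace output), and the FIELDS table and extract_profile helper are hypothetical names. ElementTree only understands a subset of XPath, so the paths start with ".//" instead of "//" and the contains() test for Schools is done in plain Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical, already-tidied stand-in for a profile page (not real MySpace markup).
SAMPLE = """
<html><body>
  <span class="nametext">Tom</span>
  <table>
    <tr><td id="ProfileStatus:">Single</td></tr>
    <tr><td id="ProfileZodiac Sign:">Scorpio</td></tr>
  </table>
  <span class="redbtext">42</span>
  <a href="/schools?InterestType=SCH&amp;id=1">Example High</a>
  <a href="/other?InterestType=MUS&amp;id=2">Not a school</a>
</body></html>
"""

# A few of the XPath strings from the list, in ElementTree's ".//" form.
FIELDS = {
    "Name": ".//*[@class='nametext']",
    "Status": ".//*[@id='ProfileStatus:']",
    "Zodiac Sign": ".//*[@id='ProfileZodiac Sign:']",
    "Number of friends": ".//*[@class='redbtext']",
}


def extract_profile(markup):
    """Return a dict of profile fields found in well-formed markup."""
    root = ET.fromstring(markup)
    profile = {}
    for label, path in FIELDS.items():
        node = root.find(path)
        has_text = node is not None and node.text
        profile[label] = node.text.strip() if has_text else None
    # ElementTree has no contains(), so the Schools filter runs in Python.
    profile["Schools"] = [
        a.text for a in root.iter("a")
        if "InterestType=SCH&" in a.get("href", "")
    ]
    return profile


profile = extract_profile(SAMPLE)
print(profile)
```

A full XPath engine would accept the strings exactly as listed; the rewriting here is only needed because of ElementTree's limited dialect.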
And if you want to get the friends from your friends page:
//a/img/../@href
The catch with the friends page is that MySpace uses a paging mechanism written in JavaScript. The script only switches pages; it doesn’t do anything funky or AJAXy to get the data down and swap it in dynamically, so it’s basically just bloat. It is probably easy enough to work around, but I haven’t spent the time to look under the JavaScript hood, since the crazy HTML was enough for me!
When running XPath queries over MySpace, it’s a good idea to set your parser to the most ultra-liberal mode you can find, because MySpace is probably the most expensive pile of non-validating crap around.
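If no lenient XPath-capable parser is at hand, one fallback is to skip the tree entirely. The sketch below uses Python's standard-library html.parser, which tokenizes tag soup without complaining about unclosed or mismatched tags. It only approximates the //a/img/../@href query: it records the href of any <a> that is still open when an <img> shows up, which is closer to //a//img than to a strict parent/child match. The SOUP sample and the FriendLinkParser class are hypothetical, made up for illustration.

```python
from html.parser import HTMLParser

# Hypothetical tag soup: mixed case, unquoted attributes, unclosed tags.
SOUP = """
<table><tr><td>
<a href="/friend/1"><img src="a.jpg"></a>
<a href='/friend/2'><IMG SRC=b.jpg></a>
<a href="/about">About</a>
<td><a href="/friend/3"><img src="c.jpg">
</table>
"""


class FriendLinkParser(HTMLParser):
    """Collect the href of every <a> that is open when an <img> appears."""

    def __init__(self):
        super().__init__()
        self.current_href = None  # href of the <a> currently open, if any
        self.links = []           # unique hrefs, in document order

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag == "a":
            self.current_href = dict(attrs).get("href")
        elif tag == "img" and self.current_href is not None:
            if self.current_href not in self.links:
                self.links.append(self.current_href)

    def handle_endtag(self, tag):
        if tag == "a":
            self.current_href = None


parser = FriendLinkParser()
parser.feed(SOUP)
parser.close()
links = parser.links
print(links)
```

The text-only "About" link is skipped because no image appears while it is open, and the unclosed third link is still picked up.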