by Tom Morris
I’d eventually like to have a simple way of getting data from all the UK universities’ admissions departments. Considering that I’m not particularly bothered about getting any more degrees, I’m not totally sure why, but it would certainly be useful.
So, here are some XPaths for one college - namely, UCL.
First up, the undergraduate page:
Course Titles: //tr[count(td) = 5]/td[1]/a
Course Type: //tr[count(td) = 5]/td[3]
Course Code: //tr[count(td) = 5]/td[5]
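As a rough sketch of what these three expressions pick out, here is the same selection done with Python's standard library. ElementTree only supports a subset of XPath and has no count(), so the "exactly five cells" test is done in Python instead. The sample table, course names and codes are made up for illustration, and a real page would need a more forgiving HTML parser than ElementTree.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the undergraduate listing table.
SAMPLE = """
<table>
  <tr><td>Course</td><td>Type</td></tr>
  <tr>
    <td><a href="/arch">Architecture</a></td><td></td>
    <td>BSc</td><td></td><td>K100</td>
  </tr>
  <tr>
    <td><a href="/phil">Philosophy</a></td><td></td>
    <td>BA</td><td></td><td>V500</td>
  </tr>
</table>
"""

def undergraduate_courses(markup):
    """Return (title, type, code) for every row with exactly five cells."""
    root = ET.fromstring(markup)
    courses = []
    for row in root.iter("tr"):
        cells = row.findall("td")          # direct td children, as in count(td)
        if len(cells) != 5:
            continue
        link = cells[0].find("a")          # td[1]/a
        title = link.text if link is not None else None
        courses.append((title, cells[2].text, cells[4].text))  # td[3], td[5]
    return courses

courses = undergraduate_courses(SAMPLE)
```

The header row is skipped simply because it has two cells rather than five, which is all the count(td) predicate is doing.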
Secondly, postgraduate admissions:
Courses: //tr[count(td) = 3 and @height = '12' and td/@height != '25']
Links: //tr[count(td) = 3 and @height = '12' and td/@height != '25']/td[1]/a
Qualification: //tr[count(td) = 3 and @height = '12' and td/@height != '25']/td[3]
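The height test in that predicate is worth spelling out. In XPath 1.0, comparing a node-set with != is true when at least one node in the set differs, so a row passes if any of its cells has a height attribute other than 25, and fails if none of its cells has a height attribute at all. The sketch below mimics that in plain Python; the markup is invented purely to exercise each case.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup; the heights mimic the attributes the XPath tests.
SAMPLE = """
<table>
  <tr height="12">
    <td height="12"><a href="/msc-x">Example MSc</a></td><td></td><td>MSc</td>
  </tr>
  <tr height="12">
    <td height="25">Heading</td><td height="25"></td><td height="25"></td>
  </tr>
  <tr height="12"><td>No heights</td><td></td><td></td></tr>
  <tr height="30"><td height="12">Wrong row height</td><td></td><td></td></tr>
</table>
"""

def postgraduate_rows(markup):
    """Return the tr elements matched by the 'Courses' predicate."""
    root = ET.fromstring(markup)
    rows = []
    for row in root.iter("tr"):
        cells = row.findall("td")
        if len(cells) != 3 or row.get("height") != "12":
            continue
        # Only cells that actually carry a height attribute take part
        # in the comparison; one differing value is enough to match.
        heights = [c.get("height") for c in cells if c.get("height") is not None]
        if any(h != "25" for h in heights):
            rows.append(row)
    return rows

rows = postgraduate_rows(SAMPLE)
qualifications = [row.findall("td")[2].text for row in rows]          # td[3]
links = [a.get("href") for row in rows
         for a in row.findall("td")[0].findall("a")]                  # td[1]/a
```

Only the first sample row survives: the second has every cell at height 25, the third has no height attributes on its cells, and the fourth has the wrong row height.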
Now, universities: get your act together, make all this information available in an open XML format, and stop me having to write these blog entries!
Posted in XPath, Universities
by Tom Morris
Want to get data out of MySpace? Here are the XPath strings you need to extract data from a MySpace profile:
Top friends:
//table[4]/tr/td/table/tr/td/table/tr/td/table/tr/td/a/img/../@href
Name: //*[@class='nametext']
Status: //*[@id='ProfileStatus:']
Religion: //*[@id='ProfileReligion:']
Zodiac Sign: //*[@id='ProfileZodiac Sign:']
Smoke / Drink: //*[@id='ProfileSmoke / Drink:']
Children: //*[@id='ProfileChildren:']
Education: //*[@id='ProfileEducation:']
Number of friends: //*[@class='redbtext']
Schools: //a[contains(@href, "InterestType=SCH&")]
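To show how the id-based lookups might be used together, here is a minimal sketch with Python's ElementTree, which accepts attribute predicates of this form but has no contains(), so the school links are filtered in Python. The fragment below is a tidy, invented stand-in; the function names and sample values are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed fragment; real profile markup was far messier.
SAMPLE = """
<div>
  <span class="nametext">Tom</span>
  <table>
    <tr><td id="ProfileStatus:">Single</td></tr>
    <tr><td id="ProfileZodiac Sign:">Gemini</td></tr>
  </table>
  <a href="/schools?InterestType=SCH&amp;id=1">Example School</a>
  <a href="/other">Other link</a>
</div>
"""

FIELDS = ["Status", "Religion", "Zodiac Sign",
          "Smoke / Drink", "Children", "Education"]

def profile_fields(root):
    """Map each field name to its text, skipping fields that are absent."""
    found = {}
    for field in FIELDS:
        node = root.find(f".//*[@id='Profile{field}:']")
        if node is not None:
            found[field] = "".join(node.itertext()).strip()
    return found

def school_links(root):
    """Return hrefs containing 'InterestType=SCH&' (substring test in Python)."""
    return [a.get("href") for a in root.iter("a")
            if "InterestType=SCH&" in a.get("href", "")]

root = ET.fromstring(SAMPLE)
name = root.find(".//*[@class='nametext']").text
fields = profile_fields(root)
schools = school_links(root)
```

Fields that a profile leaves out are simply missing from the result, which is usually easier to handle than a pile of empty strings.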
And if you want to get the friends from your friends page:
//a/img/../@href
The problem with this is that MySpace uses a paging mechanism written in JavaScript. It only switches the pages; it doesn’t do anything funky or AJAXy to fetch the data and swap it in dynamically, so it’s basically just bloat. It is probably easy enough to work around, but I haven’t spent the time to look under the JavaScript hood, since the crazy HTML was enough for me!
When running XPath queries over MySpace, it’s a good idea to set your parser to the most liberal mode you can find, because MySpace is probably the most expensive pile of non-validating crap around.
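One way to see what a liberal parser buys you: Python's built-in html.parser does not validate at all, so it happily walks through tag soup. The sketch below uses it to approximate the friends-page query //a/img/../@href by collecting the href of each link that contains an image. The class name and the broken sample markup are invented for illustration, and unlike the XPath it accepts an image nested anywhere inside the link, not just a direct child.

```python
from html.parser import HTMLParser

class ImageLinkCollector(HTMLParser):
    """Collect the href of every <a> that contains an <img>."""

    def __init__(self):
        super().__init__()
        self.current_href = None   # href of the <a> we are inside, if any
        self.recorded = False      # whether that <a> has been collected yet
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.current_href = dict(attrs).get("href")
            self.recorded = False
        elif tag == "img" and self.current_href is not None and not self.recorded:
            self.hrefs.append(self.current_href)
            self.recorded = True

    def handle_endtag(self, tag):
        if tag == "a":
            self.current_href = None

# Hypothetical tag soup: unclosed cells, an unquoted attribute,
# upper-case tags and a stray end tag.
BROKEN = """
<table><tr><td><a href="/friend/1"><img src=a.jpg></a>
<td><A HREF="/friend/2"><IMG SRC="b.jpg"></A></font>
<td><a href="/about">About</a>
"""

collector = ImageLinkCollector()
collector.feed(BROKEN)
collector.close()
friend_links = collector.hrefs
```

The text-only "About" link is ignored because no image appears inside it, and the unclosed cells and stray end tag never raise an error.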
Posted in XPath, MySpace