[01:33] hey guys
[01:33] all the chatter died down I see
[01:33] quick question though..... :)
[01:34] willwh, ask away
[01:35] well - I am using a real simple bash script atm to scrape pages for image links.
[01:35] I have to do this for all sorts of sites
[01:35] so, sometimes, it'll be full paths, sometimes relative, etc.
[01:35] one sec let me throw it somewhere you can see it :]
[01:35] I'm looking for tips for expanding it
[01:36] complete tangent - anyone tried irssi-xmpp? :]
[01:37] I <3 my irssi setup, using screen
[01:38] sorry, I thought your question was about Ubuntu. Perhaps someone else can help.
[01:38] :)
[01:38] It's not
[01:38] I've had my share of scripting this week ;)
[01:38] thanks for the offer ofc
[01:38] I'm sure you have!
[01:39] mine is a really simple 2 liner atm
[01:39] but I'd love to really fill it out
[01:39] http://home.willskills.com/~willwh/crunch.txt
[01:39] which allows me to just paste a URL and throw out any line containing http:// (and grep the string 'paris')
[01:39] in this case
[01:40] and I can just keep pasting and getting output
[01:46] willwh, what else do you want to do with your script?
[01:50] bregma: I guess I'd like to check for any if it's a relative path, perhaps print the whole version of the link out
[01:50] or just print it, if it's full path
[01:51] i.e. links to images, as well as images on the page
[01:52] someone I was speaking to in channel a while back (I'd have to grep my logs)
[01:52] was going down the perl route - apparently a library that's pretty good for what I want to do
[01:52] I'd like to stick to bash purely for learning :)
[01:53] bash doesn't do good complex text handling
[01:53] ah
[01:53] python would probably be ideal, perl if you have no other choice
[01:53] ok.
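For reference, the relative-vs-full-path part of the question could look roughly like this in Python, using only the standard library. This is a sketch, not the script from the log: the names `ImageLinkParser` and `extract_image_links`, the extension list, and the example.com URLs are all made up for illustration. It collects `<img src>` values plus `<a href>` values that end in an image extension, and lets `urljoin` turn relative paths into full URLs while leaving full URLs alone.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumed list of extensions that count as "image links"; adjust as needed.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif")


class ImageLinkParser(HTMLParser):
    """Collect image URLs from <img src> and from <a href> that point at image files."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        candidate = None
        if tag == "img":
            candidate = attrs.get("src")
        elif tag == "a":
            href = attrs.get("href") or ""
            # Only keep links whose path ends in a known image extension.
            if urlparse(href).path.lower().endswith(IMAGE_EXTENSIONS):
                candidate = href
        if candidate:
            # urljoin resolves relative paths against the page URL and
            # returns absolute URLs unchanged.
            self.links.append(urljoin(self.base_url, candidate))


def extract_image_links(html_text, base_url):
    """Return absolute image URLs found in html_text, in document order."""
    parser = ImageLinkParser(base_url)
    parser.feed(html_text)
    return parser.links
```

Fetching the page itself is left out on purpose; the HTML text could come from `urllib.request.urlopen` or from whatever the existing bash script already downloads.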
[01:54] the classic approach was to use awk for the text processing in a shell script
[01:55] yes
[01:55] http://stackoverflow.com/questions/5927031/python-get-image-link-from-html - I guess this is an ok primer
[01:55] kind of similar to what I want to do
[01:59] yeah, xpath is the technology you want for extracting stuff from xhtml, and maybe well-formed html
[02:00] not my realm of expertise, though
[04:25] willwh: I've used BeautifulSoup (mentioned in your link).
[04:25] Works well, I've found.
[04:25] Mostly if you know where the element is in the tree. You might need a bit of code to find it, if there's not an easy API call
[06:03] dscassel: thank you
[06:03] I've read though that it's not being maintained any longer?
[06:03] lxml looks like it might do what I want too
=== maverickpi is now known as maverickpi[afk]
[12:31] Howdy
=== maverickpi[afk] is now known as maverickpi
[22:23] willwh: Whatever works. :)
[22:24] I think the Gnome people might be losing their minds.
[22:24] But then, maybe it's genius I just can't see.
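For reference, the XPath suggestion from 01:59 can be tried without installing anything, since Python's standard `xml.etree.ElementTree` supports a limited XPath subset. The sketch below is only an illustration under some assumptions: the function name `image_sources` is made up, the input must be well-formed, and the sample has no XML namespace (real XHTML usually declares one, in which case the tag would have to be written as `{http://www.w3.org/1999/xhtml}img`). For messy real-world HTML, a lenient parser such as BeautifulSoup or lxml, both mentioned above, is probably the better fit.

```python
import xml.etree.ElementTree as ET


def image_sources(xhtml_text):
    """Return the src attribute of every <img> in a well-formed document."""
    root = ET.fromstring(xhtml_text)
    # ".//img[@src]" selects all img elements, at any depth, that have a src attribute.
    return [img.get("src") for img in root.findall(".//img[@src]")]


if __name__ == "__main__":
    sample = (
        "<html><body>"
        '<img src="a.png"/>'
        '<p><img src="b/c.jpg" alt="x"/></p>'
        '<img alt="no src"/>'
        "</body></html>"
    )
    for src in image_sources(sample):
        print(src)
```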