/srv/irclogs.ubuntu.com/2011/05/19/#ubuntu-ca.txt

willwhhey guys01:33
willwhall the chatter died down I see01:33
willwhquick question though..... :)01:33
MagicFabwillwh, ask away01:34
willwhwell - I am using a real simple bash script atm to scrape pages for image links.01:35
willwhI have to do this for all sorts of sites01:35
willwhso, sometimes, it'll be full paths, sometimes relative, etc.01:35
willwhone sec let me throw it somewhere you can see it :]01:35
willwhI'm looking for tips for expanding it01:35
willwhcomplete tangent - anyone tried irssi-xmpp? :]01:36
willwhI <3 my irssi setup, using screen01:37
MagicFabsorry, I thought your question was about Ubuntu. Perhaps someone else can help.01:38
willwh:)01:38
willwhIt's not01:38
MagicFabI've had my share of scripting this week ;)01:38
willwhthanks for the offer ofc01:38
willwhI'm sure you have!01:38
willwhmine is a really simple 2 liner atm01:39
willwhbut I'd love to really fill it out01:39
willwhhttp://home.willskills.com/~willwh/crunch.txt01:39
willwhwhich allows me to just paste a URL and throw out any line containing http:// (and grep the string 'paris')01:39
willwhin this case01:39
willwhand I can just keep pasting and getting output01:40
bregmawillwh, what else do you want to do with your script?01:46
willwhbregma: I guess I'd like to check for any <img* links01:50
willwhif it's a relative path, perhaps print the whole version of the link out01:50
willwhor just print it, if it's full path01:50
willwhi.e. links to images, as well as images on the page01:51
willwhsomeone I was speaking to in channel a while back (I'd have to grep my logs)01:52
willwhwas going down the perl route - apparently a library that's pretty good for what I want to do01:52
willwhI'd like to stick to bash purely for learning :)01:52
bregmabash doesn't do good complex text handling01:53
willwhah01:53
bregmapython would probably be ideal, perl if you have no other choice01:53
willwhok.01:53
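
A minimal sketch of what willwh describes above, written in Python as bregma suggests and using only the standard library. It collects <img> sources plus <a> links that point at image files, and resolves relative paths against the page URL. The names ImageLinkParser, image_urls and IMAGE_EXTENSIONS are hypothetical, and the extension list is an assumption, not something taken from crunch.txt.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

# Assumed set of extensions that count as "links to images".
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif")


class ImageLinkParser(HTMLParser):
    """Collect absolute URLs of images shown on, or linked from, a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.found = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("src"):
            # urljoin leaves full URLs alone and expands relative ones.
            self.found.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "a" and attrs.get("href"):
            href = attrs["href"]
            # Look only at the path part so query strings do not hide the extension.
            if urlsplit(href).path.lower().endswith(IMAGE_EXTENSIONS):
                self.found.append(urljoin(self.base_url, href))


def image_urls(html, base_url):
    """Return image URLs found in html, in document order."""
    parser = ImageLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.found
```

Fetching the page itself (for example with urllib.request.urlopen) is left out here; the function only needs the HTML text and the URL it came from.
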
bregmathe classic approach was to use awk for the text processing in a shell script01:54
willwhyes01:55
willwhhttp://stackoverflow.com/questions/5927031/python-get-image-link-from-html - I guess this is an ok primer01:55
willwhkind of similar to what I want to do01:55
bregmayeah, xpath is the technology you want for extracting stuff from xhtml, and maybe well-formed html01:59
bregmanot my realm of expertise, though02:00
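
A rough illustration of the XPath idea bregma mentions, assuming the page is well-formed XHTML. Python's standard xml.etree.ElementTree supports only a limited subset of XPath, which is enough for "every img element that has a src attribute". The function name img_sources is hypothetical.

```python
import xml.etree.ElementTree as ET

# XHTML elements live in this namespace, so the path has to name it.
XHTML_NS = {"x": "http://www.w3.org/1999/xhtml"}


def img_sources(xhtml_text):
    """Return the src attribute of every <img> in a well-formed XHTML document."""
    root = ET.fromstring(xhtml_text)
    # ".//x:img[@src]" selects img elements at any depth that carry a src attribute.
    return [img.get("src") for img in root.findall(".//x:img[@src]", XHTML_NS)]
```

ET.fromstring raises ParseError on markup that is not well-formed XML, which is the limitation bregma hints at with "maybe well-formed html". Real-world pages usually need a more forgiving parser.
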
dscasselwillwh: I've used BeautifulSoup (mentioned in your link).04:25
dscasselWorks well, I've found.04:25
dscasselMostly if you know where the element is in the tree.  You might need a bit of code to find it, if there's not an easy API call04:25
willwhdscassel: thank you06:03
willwhI've read though that it's not being maintained any longer?06:03
willwhlxml looks like it might do what I want too06:03
=== maverickpi is now known as maverickpi[afk]
BluesKajHowdy12:31
=== maverickpi[afk] is now known as maverickpi
dscasselwillwh: Whatever works. :)22:23
dscasselI think the Gnome people might be losing their minds.22:24
dscasselBut then, maybe it's genius I just can't see.22:24
