plug westside meeting?

der.hans PLUGd at LuftHans.com
Wed Apr 27 12:04:40 MST 2011


On 27 Apr 2011, Lyle Tuttle chattered thus:

> I have reserved a room for the regular Westside PLUG Meeting tonight.

Thank you.

Topic: HTML Parsing and Scraping Web Pages

Abstract: HTML is meant to be easy to edit, understand, and parse. In
practice, on most useful sites, it fails at two, if not all three, of
these. Many people resort to silly hacks such as regexen or an overly (or
rather, appropriately) pedantic XML parser, but there is so much broken
HTML out there that these options quickly become unworkable. What you
need is a proper HTML parser, one that copes with broken tags and
generally terrible practices. This talk will be about BeautifulSoup, a
lovely, easy-to-use HTML parser for the Python programming language.
We'll go over some basic usage of BeautifulSoup, with examples that are
directly applicable to the modern interwebs.
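To make the abstract's point concrete, here is a minimal sketch in Python.
It is not BeautifulSoup: to stay self-contained it swaps in the standard
library's `html.parser.HTMLParser`, a tolerant event-driven parser, and
contrasts it with `xml.etree.ElementTree`, a strict XML parser. The sample
markup and the names `BROKEN_HTML`, `LinkCollector`, `xml_parse_fails`, and
`extract_links` are hypothetical, made up for illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sample of "real world" HTML: unclosed <p> and <li> tags,
# an unquoted attribute value, and a stray </div> with no matching opener.
BROKEN_HTML = """
<html><body>
<p>Episode list
<ul>
  <li><a href=ep1.html>Episode 1</a>
  <li><a href="ep2.html">Episode 2</a>
</ul>
</div>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs, ignoring unclosed or stray tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> we are currently inside, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an <a> element.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Other end tags, including the stray </div>, are simply ignored.
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def xml_parse_fails(markup):
    """Return True if a strict XML parser rejects the markup."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return True
    return False


def extract_links(markup):
    """Run the tolerant HTML parser and return the collected links."""
    parser = LinkCollector()
    parser.feed(markup)
    parser.close()
    return parser.links


print(xml_parse_fails(BROKEN_HTML))  # True
print(extract_links(BROKEN_HTML))
# [('ep1.html', 'Episode 1'), ('ep2.html', 'Episode 2')]
```

BeautifulSoup (version 4) can use this same `html.parser` as its backend
and wraps the result in a searchable tree, so the callback bookkeeping
above is the kind of code it saves you from writing by hand.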

Bio: Tuna is one of those coder sorts that runs Gentoo and uses
IRC. He views programming as a Golden Corral, but free of the obvious
consequences: try everything once, or a few times. This has led him from
Ruby as his first language to webapps written in bash, and writing a Qt
app to print batches of Hobby Lobby coupons for his grandmother. Tuna's
current projects are ircpoker, a poker-dealing IRC bot in plain C, and
a project to turn transcripts of 264 Frasier episodes into parsable
data. His dream job would have him doing crazy, different things all the
time. He hopes to find this at Openmoko in Taiwan.

ciao,

der.hans
-- 
#  http://www.LuftHans.com/        http://www.LuftHans.com/Classes/
#  Whenever I hear anyone arguing for slavery, I feel a strong impulse to
#  see it tried on him personally. -- Abraham Lincoln


More information about the PLUG-discuss mailing list