On 27 Apr 2011, Lyle Tuttle chattered thus:
> I have reserved a room for the regular Westside PLUG Meeting tonight.

Thank you.

Topic: HTML Parsing and Scraping Web Pages

Abstract:
HTML is meant to be easy to edit, understand, and parse. In practice, on most useful sites, it fails somewhere between two and five of these. Many people resort to silly hacks such as regexen, or use an overly (or rather, appropriately) pedantic XML parser, but there is a lot of varied, broken HTML out there that renders these options unworkable. What you need is a proper HTML parser, one that copes with broken tags and generally terrible practices. This talk will be about BeautifulSoup, a lovely, easy-to-use parser for the Python programming language. We'll go over some basic usage of BeautifulSoup, with examples that are directly applicable to the modern interwebs.

Bio:
Tuna is one of those coder sorts who runs Gentoo and uses IRC. He views programming as a Golden Corral buffet, minus the obvious consequences: try everything once, or a few times. This has led him from Ruby as his first language to webapps written in bash, and to writing a Qt app that prints batches of Hobby Lobby coupons for his grandmother. Tuna's current projects are ircpoker, a poker-dealing IRC bot in plain C, and a project to turn the transcripts of 264 episodes of Frasier into parseable data. His dream job would have him doing crazy, different things all the time. He hopes to find it at Openmoko in Taiwan.

ciao,

der.hans
-- 
# http://www.LuftHans.com/ http://www.LuftHans.com/Classes/
# Whenever I hear anyone arguing for slavery, I feel a strong impulse to
# see it tried on him personally. -- Abraham Lincoln
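The abstract argues that a strict XML parser chokes on real-world HTML, while a lenient HTML parser does not. Here is a minimal sketch of that point. Note that it does not use BeautifulSoup itself. It swaps in Python's standard-library `html.parser`, which BeautifulSoup can also use as a backend, so that the example runs without extra packages. The sample markup and the helper names `LinkCollector`, `strict_xml_parse_fails` and `collect_links` are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sloppy markup: an unquoted attribute, unclosed <p> and <li>,
# and a bare <br>. Browsers render this fine; a strict XML parser does not.
BROKEN_HTML = """
<html><body>
<p>Links:
<ul>
<li><a href=http://example.com/one>One</a>
<li><a href="http://example.com/two">Two</a>
</ul>
<br>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs from <a> tags, tolerating broken markup."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> we are currently inside, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def strict_xml_parse_fails(markup):
    """Return True if the strict XML parser rejects the markup."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return True
    return False


def collect_links(markup):
    """Extract links with the lenient standard-library HTML parser."""
    parser = LinkCollector()
    parser.feed(markup)
    parser.close()
    return parser.links


if __name__ == "__main__":
    print("XML parser rejects it:", strict_xml_parse_fails(BROKEN_HTML))
    print("Links found:", collect_links(BROKEN_HTML))
```

BeautifulSoup hides this kind of event-handler bookkeeping behind a tree you can search, for example with `BeautifulSoup(markup, "html.parser").find_all("a")`. That convenience is presumably what the talk covers.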