Daily Snippet, Developer to developer

Python’s time: HTML parser

Trying out some code from Dive Into Python about HTML parsing.
This code opens the HTML document at http://www.detik.com and picks out every link in it.

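# Python 2 only: sgmllib was removed in Python 3.
import urllib
from sgmllib import SGMLParser

class URLLister(SGMLParser):
    def reset(self):
        # reset() is called by SGMLParser.__init__, so urls starts out empty.
        SGMLParser.reset(self)
        self.urls = []

    def start_a(self, attrs):
        # Called for each <a> tag; attrs is a list of (name, value) pairs.
        href = [v for k, v in attrs if k=='href']
        if href:
            self.urls.extend(href)

# Download the page, feed it to the parser, then print every link found.
sock = urllib.urlopen("http://www.detik.com")
parser = URLLister()
parser.feed(sock.read())
sock.close()
parser.close()
for url in parser.urls:
    print url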
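
The snippet above relies on sgmllib, which only exists in Python 2. For anyone on Python 3, here is a rough sketch of the same idea using the standard library's html.parser and urllib.request instead. The helper names list_urls and fetch_urls are my own for illustration (they are not from Dive Into Python), and decoding the page as UTF-8 is an assumption about the site's encoding.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class URLLister(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; keep non-empty hrefs on <a>.
        if tag == "a":
            self.urls.extend(v for k, v in attrs if k == "href" and v)


def list_urls(html):
    """Hypothetical helper: return all link targets found in an HTML string."""
    parser = URLLister()
    parser.feed(html)
    parser.close()
    return parser.urls


def fetch_urls(url):
    """Hypothetical helper: download a page and list its links.

    Assumes the page can be decoded as UTF-8; undecodable bytes are replaced.
    """
    with urlopen(url) as sock:
        html = sock.read().decode("utf-8", errors="replace")
    return list_urls(html)


# Offline check on a small HTML string (no network access needed).
print(list_urls('<p><a href="/news">News</a> <a name="top">x</a></p>'))
# → ['/news']
```

To run it against the same site as the original, something like `for url in fetch_urls("http://www.detik.com"): print(url)` should do the job.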
