Module HTMLParser :: Class HTMLParser

Class HTMLParser

ParserBase --+
             |
            HTMLParser


Find tags and other markup and call handler functions.

Usage:
    p = HTMLParser()
    p.feed(data)
    ...
    p.close()

Start tags are handled by calling self.handle_starttag() or
self.handle_startendtag(); end tags by self.handle_endtag().  The
data between tags is passed from the parser to the derived class
by calling self.handle_data() with the data as argument (the data
may be split up in arbitrary chunks).  Entity references are
passed by calling self.handle_entityref() with the entity
reference as the argument.  Numeric character references are
passed to self.handle_charref() with the string containing the
reference as the argument.
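
The handler protocol described above can be sketched with a small subclass. This is an illustrative example only: the class name EventRecorder and the helper record_events are hypothetical and not part of the module. The import is written so that it also runs on Python 3, where the class lives in html.parser; that fallback is an assumption about the reader's environment, not something this page documents.

```python
try:
    from html.parser import HTMLParser  # Python 3 location (assumption)
except ImportError:
    from HTMLParser import HTMLParser  # Python 2 module documented here


class EventRecorder(HTMLParser):
    """Hypothetical subclass that records each handler call as a tuple."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        self.events.append(("start", tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        # May be called several times for one run of text
        self.events.append(("data", data))


def record_events(html):
    """Parse a complete document string and return the recorded events."""
    p = EventRecorder()
    p.feed(html)
    p.close()
    return p.events
```

Because handle_data() may receive the text in arbitrary chunks, code that needs the full text between two tags should concatenate the chunks rather than rely on a single call.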

Method Summary
  __init__(self)
Initialize and reset this instance.
  check_for_whole_start_tag(self, i)
  clear_cdata_mode(self)
  close(self)
Handle any buffered data.
  error(self, message)
  feed(self, data)
Feed data to the parser.
  get_starttag_text(self)
Return full source of start tag: '<...>'.
  goahead(self, end)
  handle_charref(self, name)
  handle_comment(self, data)
  handle_data(self, data)
  handle_decl(self, decl)
  handle_endtag(self, tag)
  handle_entityref(self, name)
  handle_pi(self, data)
  handle_startendtag(self, tag, attrs)
  handle_starttag(self, tag, attrs)
  parse_endtag(self, i)
  parse_pi(self, i)
  parse_starttag(self, i)
  reset(self)
Reset this instance.
  set_cdata_mode(self)
  unescape(self, s)
  unknown_decl(self, data)
Inherited from ParserBase: getpos, parse_comment, parse_declaration, parse_marked_section, updatepos

Class Variable Summary
tuple CDATA_CONTENT_ELEMENTS = ('script', 'style')

Method Details

__init__(self)
(Constructor)

Initialize and reset this instance.
Overrides:
markupbase.ParserBase.__init__

close(self)

Handle any buffered data.

feed(self, data)

Feed data to the parser.

Call this as often as you want, with as little or as much text
as you want (may include '\n').
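
As a rough sketch of incremental feeding, the example below passes a document to feed() in pieces that cut through the middle of a start tag and of a text run. ChunkCollector and parse_in_chunks are hypothetical names chosen for illustration, and the Python 3 import fallback is an assumption about the reader's environment.

```python
try:
    from html.parser import HTMLParser  # Python 3 location (assumption)
except ImportError:
    from HTMLParser import HTMLParser  # Python 2 module documented here


class ChunkCollector(HTMLParser):
    """Hypothetical subclass that collects start tags and text."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, attrs))

    def handle_data(self, data):
        self.text.append(data)


def parse_in_chunks(chunks):
    """Feed each chunk in turn, then close() to handle buffered data."""
    p = ChunkCollector()
    for chunk in chunks:
        p.feed(chunk)
    p.close()
    return p.tags, "".join(p.text)


# The start tag is split across the first two chunks; the parser
# buffers the incomplete tag until the rest of it arrives.
tags, text = parse_in_chunks(['<p cla', 'ss="x">he', 'llo</p>'])
```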

get_starttag_text(self)

Return full source of start tag: '<...>'.
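
One plausible use, sketched below, is calling get_starttag_text() from inside handle_starttag() to recover the tag exactly as it was written, since the tag argument passed to the handler is lowercased. SourceKeeper and starttag_sources are hypothetical names, and the Python 3 import fallback is an assumption.

```python
try:
    from html.parser import HTMLParser  # Python 3 location (assumption)
except ImportError:
    from HTMLParser import HTMLParser  # Python 2 module documented here


class SourceKeeper(HTMLParser):
    """Hypothetical subclass pairing each tag name with its raw source."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # tag is the normalized name; get_starttag_text() is the original text
        self.sources.append((tag, self.get_starttag_text()))


def starttag_sources(html):
    p = SourceKeeper()
    p.feed(html)
    p.close()
    return p.sources
```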

reset(self)

Reset this instance. Loses all unprocessed data.
Overrides:
markupbase.ParserBase.reset
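
To illustrate what losing unprocessed data means, the sketch below feeds an incomplete start tag, calls reset(), and then feeds a new document. The buffered fragment is discarded, so only the tag from the second document is reported. TagLogger and tags_after_reset are hypothetical names, and the Python 3 import fallback is an assumption.

```python
try:
    from html.parser import HTMLParser  # Python 3 location (assumption)
except ImportError:
    from HTMLParser import HTMLParser  # Python 2 module documented here


class TagLogger(HTMLParser):
    """Hypothetical subclass that logs start tag names."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tags_after_reset():
    p = TagLogger()
    p.feed("<p cla")     # incomplete tag stays in the parser's buffer
    p.reset()            # buffer is dropped; self.tags is our own attribute
    p.feed("<b>x</b>")
    p.close()
    return p.tags
```

Note that reset() clears only the parser's internal state; attributes added by a subclass, such as the tags list here, are left untouched unless the subclass overrides reset().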

Class Variable Details

CDATA_CONTENT_ELEMENTS

Type:
tuple
Value:
('script', 'style')                                                    
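
The elements listed in CDATA_CONTENT_ELEMENTS have their content treated as character data, so markup-like text inside them is not reported as tags. A minimal sketch of that behaviour follows; StartTagLister and start_tags are hypothetical names, and the Python 3 import fallback is an assumption.

```python
try:
    from html.parser import HTMLParser  # Python 3 location (assumption)
except ImportError:
    from HTMLParser import HTMLParser  # Python 2 module documented here


class StartTagLister(HTMLParser):
    """Hypothetical subclass that lists the start tags it is told about."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def start_tags(html):
    p = StartTagLister()
    p.feed(html)
    p.close()
    return p.tags


# The "<i>" inside the script body is passed through as data,
# not parsed as a start tag.
found = start_tags('<script>if (a<b) x="<i>";</script><p>t</p>')
```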

Generated by Epydoc 2.1 on Fri Dec 14 16:10:52 2007 http://epydoc.sf.net