Class Nokogiri::XML::Document
In: lib/nokogiri/xml/document.rb
ext/nokogiri/xml_text.c
Parent: Nokogiri::XML::Node

Methods

Constants

NCNAME_START_CHAR = "A-Za-z_"  (I'm ignoring Unicode characters here. See www.w3.org/TR/REC-xml-names/#ns-decl for more details.)
NCNAME_CHAR = NCNAME_START_CHAR + "\\-.0-9"
NCNAME_RE = /^xmlns(:[#{NCNAME_START_CHAR}][#{NCNAME_CHAR}]*)?$/
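
NCNAME_RE decides whether an attribute name is a namespace declaration. The following is a minimal pure-Ruby sketch that copies the three constants verbatim so it runs without Nokogiri. The helper name namespace_declaration? is hypothetical and is not part of Nokogiri's API:

```ruby
# Constants copied verbatim from the table above (ASCII-only NCName approximation).
NCNAME_START_CHAR = "A-Za-z_"
NCNAME_CHAR = NCNAME_START_CHAR + "\\-.0-9"
NCNAME_RE = /^xmlns(:[#{NCNAME_START_CHAR}][#{NCNAME_CHAR}]*)?$/

# Hypothetical helper: true when attr_name is "xmlns" or "xmlns:<prefix>".
def namespace_declaration?(attr_name)
  NCNAME_RE.match?(attr_name)
end

# A prefix may not start with a digit, and "xml:lang" is not a declaration.
%w[xmlns xmlns:foo xmlns:foo-bar.1 xmlns:1foo xml:lang].each do |name|
  puts "#{name}: #{namespace_declaration?(name)}"
end
```

In Ruby, ^ and $ match at line boundaries, so a string such as "xmlns\nanything" also matches. The anchors \A and \z would be the stricter choice if that mattered.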

External Aliases

serialize -> to_xml
dup -> clone

Attributes

errors  [RW]  A list of Nokogiri::XML::SyntaxError found when parsing a document

Public Class methods

Create a new document with the given version (defaults to "1.0").

Parse an XML file.

string_or_io may be a String or any object that responds to read and close, such as an IO or a StringIO.

url (optional) is the URI where this document is located.

encoding (optional) is the encoding that should be used when processing the document.

options (optional) is a configuration object that sets options during parsing, such as Nokogiri::XML::ParseOptions::RECOVER. See Nokogiri::XML::ParseOptions for more information.

block (optional) is passed a configuration object on which parse options may be set.

When parsing untrusted documents, it's recommended to use the nonet option, which forbids network access during parsing, as shown in this example code:

  Nokogiri::XML::Document.parse(xml_string) { |config| config.nonet }

Nokogiri.XML() is a convenience method which will call this method.

JRuby only: wraps Java's org.w3c.dom.Document and returns a Nokogiri::XML::Document.

Public Instance methods

<<(node_or_tags)

Alias for add_child

Canonicalize a document and return the result as a string. Takes an optional block that receives two parameters: obj and obj's parent. obj will be either a Nokogiri::XML::Node or a Nokogiri::XML::Namespace. The block must return a non-nil, non-false value if obj should be included in the canonicalized document.

Recursively get all namespaces from this node and its subtree and return them as a hash.

For example, given this document:

  <root xmlns:foo="bar">
    <bar xmlns:hello="world" />
  </root>

This method will return:

  { 'xmlns:foo' => 'bar', 'xmlns:hello' => 'world' }

WARNING: this method will clobber duplicate names in the keys. For example, given this document:

  <root xmlns:foo="bar">
    <bar xmlns:foo="baz" />
  </root>

The hash returned will look like this:

  { 'xmlns:foo' => 'bar' }

Non-prefixed default namespaces (as in "xmlns=") are not included in the hash.

Note that this method does an XPath lookup for nodes with namespaces; as a result, the order may depend on the implementation of the underlying XML library.

Create a CDATA Node containing string

Create a Comment Node containing string

Create an element with name, optionally setting its content and attributes.

  doc.create_element "div" # <div></div>
  doc.create_element "div", :class => "container" # <div class="container"></div>
  doc.create_element "div", "contents" # <div>contents</div>
  doc.create_element "div", "contents", :class => "container" # <div class="container">contents</div>
  doc.create_element("div") { |node| node['class'] = "container" } # <div class="container"></div>

Create a new entity named name.

type is an integer representing the type of entity to be created, and it defaults to Nokogiri::XML::EntityDecl::INTERNAL_GENERAL. See the constants on Nokogiri::XML::EntityDecl for more information.

external_id, system_id, and content set the External ID, System ID, and content respectively. All of these parameters are optional.

Apply any decorators to node

Get the list of decorators for the given key

A reference to self

Copy this Document. An optional depth may be passed in; it defaults to a deep copy. Pass 0 for a shallow copy or 1 for a deep copy.

Get the encoding for this Document

Set the encoding string for this Document

Create a Nokogiri::XML::DocumentFragment from tags. Returns an empty fragment if tags is nil.

The name of this document. Always returns "document"

Remove all namespaces from all nodes in the document.

This could be useful for developers who either don't understand namespaces or don't care about them.

The following example shows a use case, and you can decide for yourself whether this is a good thing or not:

  doc = Nokogiri::XML <<-EOXML
     <root>
       <car xmlns:part="http://general-motors.com/">
         <part:tire>Michelin Model XGV</part:tire>
       </car>
       <bicycle xmlns:part="http://schwinn.com/">
         <part:tire>I'm a bicycle tire!</part:tire>
       </bicycle>
     </root>
     EOXML

  doc.xpath("//tire").to_s # => ""
  doc.xpath("//part:tire", "part" => "http://general-motors.com/").to_s # => "<part:tire>Michelin Model XGV</part:tire>"
  doc.xpath("//part:tire", "part" => "http://schwinn.com/").to_s # => "<part:tire>I'm a bicycle tire!</part:tire>"

  doc.remove_namespaces!

  doc.xpath("//tire").to_s # => "<tire>Michelin Model XGV</tire><tire>I'm a bicycle tire!</tire>"
  doc.xpath("//part:tire", "part" => "http://general-motors.com/").to_s # => ""
  doc.xpath("//part:tire", "part" => "http://schwinn.com/").to_s # => ""

For more information on why this probably is not a good thing in general, please direct your browser to tenderlovemaking.com/2009/04/23/namespaces-in-xml.html

Get the root node for this document.

Set the root element on this document

Explore a document with shortcut methods. See Nokogiri::Slop for details.

Note that any nodes instantiated before slop! is called will not be decorated with sloppy behavior. So, if you're in irb, the preferred idiom is:

  irb> doc = Nokogiri::Slop my_markup

and not

  irb> doc = Nokogiri::HTML my_markup
  ... followed by irb's implicit inspect (and therefore instantiation of every node) ...
  irb> doc.slop!
  ... which does absolutely nothing.

JRuby only: returns Java's org.w3c.dom.Document for this Document.

Get the URL for this document.

Validate this Document against its DTD. Returns a list of errors found in the document, or nil when there is no DTD.

Get the XML version for this Document
