read_xml {xml2}R Documentation

Read HTML or XML.

Description

Read HTML or XML.

Usage

read_xml(x, encoding = "", ..., as_html = FALSE)

read_html(x, encoding = "", ...)

## S3 method for class 'character'
read_xml(x, encoding = "", ..., as_html = FALSE)

## S3 method for class 'raw'
read_xml(x, encoding = "", base_url = "", ...,
  as_html = FALSE)

## S3 method for class 'connection'
read_xml(x, encoding = "", n = 64 * 1024,
  verbose = FALSE, ..., base_url = "", as_html = FALSE)

Arguments

x

A string, a connection, or a raw vector.

A string can be either a path, a url or literal xml. Urls will be converted into connections either using base::url or, if installed, curl::curl. Local paths ending in .gz, .bz2, .xz, .zip will be automatically uncompressed.

If a connection, the complete connection is read into a raw vector before being parsed.

encoding

Specify a default encoding for the document. Unless otherwise specified XML documents are assumed to be in UTF-8 or UTF-16. If the document is not UTF-8/16, and lacks an explicit encoding directive, this allows you to supply a default.

...

Additional arguments passed on to methods.

as_html

Optionally parse an xml file as if it's html.

base_url

When loading from a connection, raw vector or literal html/xml, this allows you to specify a base url for the document. Base urls are used to turn relative urls into absolute urls.

n

If file is a connection, the number of bytes to read per iteration. Defaults to 64kb.

verbose

When reading from a slow connection, this prints some output on every iteration so you know its working.

Value

An XML document. HTML is normalised to valid XML - this may not be exactly the same transformation performed by the browser, but it's a reasonable approximation.

Examples

# Literal xml/html is useful for small examples
read_xml("<foo><bar /></foo>")
read_html("<html><title>Hi<title></html>")
read_html("<html><title>Hi")

# From a local path
read_html(system.file("extdata", "r-project.html", package = "xml2"))

# From a url
cd <- read_xml("http://www.xmlfiles.com/examples/cd_catalog.xml")
me <- read_html("http://had.co.nz")

[Package xml2 version 0.1.2 Index]