xml_find_all {xml2} | R Documentation |
XPath is like regular expressions for trees: it's worth learning if
you're trying to extract nodes from arbitrary locations in a document.
Use xml_find_all to find all matches; if there's no match you'll get an
empty result. Use xml_find_one to find a specific match; if there's no
match you'll get an error.
xml_find_all(x, xpath, ns = character())

xml_find_one(x, xpath, ns = character())
x | A document, node, or node set. |
xpath | A string containing an XPath (1.0) expression. |
ns | Optionally, a named vector giving prefix-url pairs, as produced by xml_ns. |
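As a rough sketch of what a "named vector giving prefix-url pairs" can look like, the snippet below builds one by hand instead of taking it from the document. The prefix `a` and the sample document are made up for illustration; the only requirement assumed here is that the URL matches the one declared in the XML.

```r
library(xml2)

doc <- read_xml('
  <root xmlns:f = "http://foo.com" xmlns:g = "http://bar.com">
    <f:doc><g:baz /></f:doc>
  </root>
')

# Hand-written mapping: the prefix "a" is our own choice (hypothetical),
# the URL must be the namespace URL used in the document
ns <- c(a = "http://foo.com")

# The XPath expression uses our prefix, not the one written in the document
docs <- xml_find_all(doc, ".//a:doc", ns)
length(docs)
```

Supplying the mapping explicitly keeps the XPath expression independent of whatever prefixes a particular document happens to declare.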
xml_find_all always returns a nodeset: if there are no matches the
nodeset will be empty. The result will always be unique; repeated
nodes are automatically de-duplicated.
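A minimal sketch of the empty-result case, assuming a small made-up document and a hypothetical element name `missing` that does not occur in it:

```r
library(xml2)

doc <- read_xml("<foo><bar><baz/></bar><baz/></foo>")

# There is no <missing> element, so this yields an empty nodeset
# rather than an error
hits <- xml_find_all(doc, ".//missing")

# Check the length before using the result downstream
if (length(hits) == 0) {
  message("no matches")
}
```

Because the empty case is an ordinary nodeset, a length check is enough and no error handling is needed around xml_find_all.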
xml_find_one returns a node if applied to a node, and a nodeset if
applied to a nodeset. The output is always the same size as the input.
If there are no matches, xml_find_one will throw an error; if there are
multiple matches, it will use the first with a warning.
library(xml2)

x <- read_xml("<foo><bar><baz/></bar><baz/></foo>")
xml_find_all(x, ".//baz")
xml_path(xml_find_all(x, ".//baz"))

# Note the difference between .// and //
# //  finds anywhere in the document (ignoring the current node)
# .// finds anywhere beneath the current node
(bar <- xml_find_all(x, ".//bar"))
xml_find_all(bar, ".//baz")
xml_find_all(bar, "//baz")

# Find all vs find one -----------------------------------------------------
x <- read_xml("<body>
  <p>Some <b>text</b>.</p>
  <p>Some <b>other</b> <b>text</b>.</p>
</body>")
para <- xml_find_all(x, ".//p")

# If you apply xml_find_all to a nodeset, it finds all matches,
# de-duplicates them, and returns them as a single list. This means you
# never know how many results you'll get
xml_find_all(para, ".//b")

# xml_find_one only returns one match per input node. If there are 0
# matches it will throw an error; if there is more than one it picks
# the first with a warning
xml_find_one(para, ".//b")

# Namespaces ---------------------------------------------------------------
# If the document uses namespaces, you'll need to use xml_ns to form
# a unique mapping between the full namespace url and a short prefix
x <- read_xml('
  <root xmlns:f = "http://foo.com" xmlns:g = "http://bar.com">
    <f:doc><g:baz /></f:doc>
    <f:doc><g:baz /></f:doc>
  </root>
')
xml_find_all(x, ".//f:doc")
xml_find_all(x, ".//f:doc", xml_ns(x))