Namespaces
In XML, namespaces allow to differentiate between elements or attributes that would have the same name otherwise.
General case
In most cases, you don't want to care about namespaces when parsing some XML.
When you see foo:bar
in XML elements or attributes, do as if the foo:
prefix was not
here, and everything should work as expected:
<root xmlns:xs="https://example.com/xml/schema">
<xs:item xs:type="text">Hello, world!</xs:item>
</root>
>>> @xml_handle_element("root", "item")
... def handler(node):
... yield (node.attributes["type"], node.text)
>>> with open("hello_ns.xml", "rb") as f:
... Parser(f).return_from(handler)
('text', 'Hello, world!')
Namespaced elements
If you want to differentiate between XML elements with the same name but different namespaces, you need to mention namespaces in the handlers using the Clark notation.
When parsing an XML element of name bar
and namespace
https://example.com/xml/schema
, BigXML looks for a handler in the following order:
{https://example.com/xml/schema}bar
: a handler for the specific namespacebar
: a handler that does not specify the namespace- No handler is used, the element is ignored
Tip
To match only XML elements of name bar
that do not have any namespace, use the following: {}bar
.
Example:
<root
xmlns="https://example.com/xml/purple"
xmlns:blue="https://example.com/xml/blue"
xmlns:red="https://example.com/xml/red">
<blue:item>Blue</blue:item>
<item xmlns="https://example.com/xml/blue">Also blue</item>
<red:item>Red</red:item>
<item>Purple</item>
</root>
>>> @xml_handle_element("root", "item")
... def handler_default(node):
... yield ("default", node.text)
>>> @xml_handle_element("root", "{}item")
... def handler_nothing(node):
... yield ("nothing", node.text)
>>> @xml_handle_element("root", "{https://example.com/xml/blue}item")
... def handler_blue(node):
... yield ("blue", node.text)
>>> @xml_handle_element("root", "{https://example.com/xml/purple}item")
... def handler_purple(node):
... yield ("purple", node.text)
>>> with open("colors.xml", "rb") as f:
... for item in Parser(f).iter_from(
... handler_default,
... handler_nothing,
... handler_blue,
... handler_purple,
... ):
... print(item)
('blue', 'Blue')
('blue', 'Also blue')
('default', 'Red')
('purple', 'Purple')
Note
In the example above, handler_purple
is used instead of handler_nothing
for the item Purple
because a default namespace has been attached to <root>
with the attribute xmlns
.
Namespaced attributes
When accessing the attributes of a node, you can use one of the following keys:
{https://example.com/xml/schema}bar
to get the attributebar
with the namespacehttps://example.com/xml/schema
;{}bar
to get the attributebar
without any namespace;bar
to get an attributebar
of any namespace.
Warning
The bar
syntax always returns the attribute bar
without any namespace if it exists.
However, if such an attribute does not exist but several attributes bar
with
various namespaces do exist, one of them will be returned. In that case, which
attribute is returned is not guaranteed, and a warning is emitted accordingly.
Example:
<root
xmlns="https://example.com/xml/purple"
xmlns:blue="https://example.com/xml/blue"
xmlns:red="https://example.com/xml/red">
<item color="Green">Case 0</item>
<item blue:color="Blue">Case 1</item>
<item red:color="Red">Case 2</item>
<item color="Green" blue:color="Blue" red:color="Red">Case 3</item>
</root>
>>> @xml_handle_element("root", "item")
... def handler(node):
... yield node.text
... yield ("default ns", node.attributes["color"])
... yield ("no ns", node.attributes.get("{}color"))
... yield ("blue ns", node.attributes.get("{https://example.com/xml/blue}color"))
>>> with open("attributes_ns.xml", "rb") as f:
... for item in Parser(f).iter_from(handler):
... print(item)
Case 0
('default ns', 'Green')
('no ns', 'Green')
('blue ns', None)
Case 1
('default ns', 'Blue')
('no ns', None)
('blue ns', 'Blue')
Case 2
('default ns', 'Red')
('no ns', None)
('blue ns', None)
Case 3
('default ns', 'Green')
('no ns', 'Green')
('blue ns', 'Blue')
Note
Contrary to XML elements, no default namespace apply to attributes: in the example,
Green
is matched by {}color
instead of {https://example.com/xml/purple}color
.