Namespaces
In XML, namespaces make it possible to differentiate between elements or attributes that would otherwise have the same name.
General case
In most cases, you do not need to care about namespaces when parsing XML.
When you see foo:bar in XML elements or attributes, act as if the foo: prefix were not
there, and everything should work as expected:
<root xmlns:xs="https://example.com/xml/schema">
    <xs:item xs:type="text">Hello, world!</xs:item>
</root>
>>> from bigxml import Parser, xml_handle_element
>>> @xml_handle_element("root", "item")
... def handler(node):
...     yield (node.attributes["type"], node.text)
>>> with open("hello_ns.xml", "rb") as f:
...     Parser(f).return_from(handler)
('text', 'Hello, world!')
Namespaced elements
If you want to differentiate between XML elements that have the same name but different namespaces, specify the namespace in the handler using Clark notation, that is {namespace}name.
When parsing an XML element of name bar and namespace
https://example.com/xml/schema, BigXML looks for a handler in the following order:
- {https://example.com/xml/schema}bar: a handler for the specific namespace
- bar: a handler that does not specify the namespace
- No handler is used, the element is ignored
Tip
To match only XML elements of name bar that do not have any namespace, use the following: {}bar.
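The lookup order described above can be sketched in plain Python. This is only an illustration of the rule, not BigXML's actual implementation: find_handler and the handlers dictionary are hypothetical names, and an element without any namespace is assumed to be represented by an empty namespace string.

```python
def find_handler(handlers, namespace, name):
    """Return the handler for an element, or None if the element is ignored.

    `handlers` maps handler keys ("{uri}name", "{}name" or "name") to handlers.
    `namespace` is the element's namespace URI, or "" when it has none.
    (Hypothetical helper, written only to illustrate the lookup order.)
    """
    # First choice: a handler registered for this exact namespace.
    specific = handlers.get("{" + namespace + "}" + name)
    if specific is not None:
        return specific
    # Second choice: a handler that does not specify the namespace.
    # If there is none either, None is returned and the element is ignored.
    return handlers.get(name)


# Handler names are plain strings here to keep the sketch self-contained.
handlers = {
    "item": "handler_default",
    "{}item": "handler_nothing",
    "{https://example.com/xml/blue}item": "handler_blue",
}

results = [
    find_handler(handlers, "https://example.com/xml/blue", "item"),
    find_handler(handlers, "https://example.com/xml/red", "item"),
    find_handler(handlers, "", "item"),
    find_handler(handlers, "https://example.com/xml/blue", "other"),
]
print(results)
# ['handler_blue', 'handler_default', 'handler_nothing', None]
```

In this sketch, the red item falls back to the handler without a namespace, and an element with no matching key at all is ignored.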
Example:
<root
    xmlns="https://example.com/xml/purple"
    xmlns:blue="https://example.com/xml/blue"
    xmlns:red="https://example.com/xml/red">
    <blue:item>Blue</blue:item>
    <item xmlns="https://example.com/xml/blue">Also blue</item>
    <red:item>Red</red:item>
    <item>Purple</item>
</root>
>>> @xml_handle_element("root", "item")
... def handler_default(node):
...     yield ("default", node.text)
>>> @xml_handle_element("root", "{}item")
... def handler_nothing(node):
...     yield ("nothing", node.text)
>>> @xml_handle_element("root", "{https://example.com/xml/blue}item")
... def handler_blue(node):
...     yield ("blue", node.text)
>>> @xml_handle_element("root", "{https://example.com/xml/purple}item")
... def handler_purple(node):
...     yield ("purple", node.text)
>>> with open("colors.xml", "rb") as f:
...     for item in Parser(f).iter_from(
...         handler_default,
...         handler_nothing,
...         handler_blue,
...         handler_purple,
...     ):
...         print(item)
('blue', 'Blue')
('blue', 'Also blue')
('default', 'Red')
('purple', 'Purple')
Note
In the example above, handler_purple is used instead of handler_nothing for the item Purple because a default namespace has been attached to <root> with the attribute xmlns.
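To see how a default namespace propagates to child elements, the short sketch below parses a reduced version of the document with Python's standard xml.etree.ElementTree module, which also reports element names in Clark notation. It does not use BigXML and is only meant to show which namespace each element ends up in.

```python
import xml.etree.ElementTree as ET

XML = """\
<root
    xmlns="https://example.com/xml/purple"
    xmlns:blue="https://example.com/xml/blue">
    <blue:item>Blue</blue:item>
    <item>Purple</item>
</root>
"""

root = ET.fromstring(XML)

# ElementTree exposes each element name as "{namespace}name".
tags = [child.tag for child in root]
print(tags)
# ['{https://example.com/xml/blue}item', '{https://example.com/xml/purple}item']
```

The second item carries no prefix, yet it belongs to the purple namespace because it inherits the xmlns declaration of its parent.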
Namespaced attributes
When accessing the attributes of a node, you can use one of the following keys:
- {https://example.com/xml/schema}bar to get the attribute bar with the namespace https://example.com/xml/schema;
- {}bar to get the attribute bar without any namespace;
- bar to get an attribute bar of any namespace.
Warning
The bar syntax always returns the attribute bar without any namespace if it exists.
However, if such an attribute does not exist but several attributes bar with
various namespaces do exist, one of them will be returned. In that case, which
attribute is returned is not guaranteed, and a warning is emitted accordingly.
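The three kinds of keys and the fallback described in the warning can be sketched as follows. This is a simplified model and not BigXML's code: get_attribute is a hypothetical helper, and the attributes are assumed to be stored in a plain dictionary keyed by Clark notation, with {}name for attributes without a namespace.

```python
import warnings


def get_attribute(attributes, key):
    """Look up `key` in `attributes`, a dict keyed by "{uri}name".

    Hypothetical helper illustrating the documented lookup rules.
    """
    if key.startswith("{"):
        # "{uri}bar" or "{}bar": exact match on namespace and name.
        return attributes[key]
    # "bar": the attribute without any namespace wins if it exists.
    if "{}" + key in attributes:
        return attributes["{}" + key]
    # Otherwise, collect the attributes named `key` in any namespace.
    candidates = [k for k in attributes if k.rpartition("}")[2] == key]
    if not candidates:
        raise KeyError(key)
    if len(candidates) > 1:
        # Which one is picked is an implementation detail: warn the caller.
        warnings.warn(f"Several attributes named {key!r}, returning one of them")
    return attributes[candidates[0]]


case_1 = {"{https://example.com/xml/blue}color": "Blue"}
case_3 = {
    "{}color": "Green",
    "{https://example.com/xml/blue}color": "Blue",
    "{https://example.com/xml/red}color": "Red",
}

print(get_attribute(case_1, "color"))  # Blue
print(get_attribute(case_3, "color"))  # Green
print(get_attribute(case_3, "{https://example.com/xml/red}color"))  # Red
```

When the result must be predictable, using a key with an explicit namespace avoids the ambiguous branch entirely.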
Example:
<root
    xmlns="https://example.com/xml/purple"
    xmlns:blue="https://example.com/xml/blue"
    xmlns:red="https://example.com/xml/red">
    <item color="Green">Case 0</item>
    <item blue:color="Blue">Case 1</item>
    <item red:color="Red">Case 2</item>
    <item color="Green" blue:color="Blue" red:color="Red">Case 3</item>
</root>
>>> @xml_handle_element("root", "item")
... def handler(node):
...     yield node.text
...     yield ("default ns", node.attributes["color"])
...     yield ("no ns", node.attributes.get("{}color"))
...     yield ("blue ns", node.attributes.get("{https://example.com/xml/blue}color"))
>>> with open("attributes_ns.xml", "rb") as f:
...     for item in Parser(f).iter_from(handler):
...         print(item)
Case 0
('default ns', 'Green')
('no ns', 'Green')
('blue ns', None)
Case 1
('default ns', 'Blue')
('no ns', None)
('blue ns', 'Blue')
Case 2
('default ns', 'Red')
('no ns', None)
('blue ns', None)
Case 3
('default ns', 'Green')
('no ns', 'Green')
('blue ns', 'Blue')
Note
Unlike XML elements, attributes are not affected by the default namespace: in the example,
Green is matched by {}color and not by {https://example.com/xml/purple}color.
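This behavior comes from the XML namespaces rules themselves and can be checked with Python's standard xml.etree.ElementTree module. The sketch below does not use BigXML. Note that ElementTree writes an attribute without a namespace as a bare name, whereas the {}color form is the BigXML key described above.

```python
import xml.etree.ElementTree as ET

XML = """\
<root
    xmlns="https://example.com/xml/purple"
    xmlns:blue="https://example.com/xml/blue">
    <item color="Green" blue:color="Blue">Case 3</item>
</root>
"""

item = ET.fromstring(XML)[0]

# The element inherits the default namespace...
print(item.tag)
# {https://example.com/xml/purple}item

# ...but the unprefixed attribute stays outside of any namespace.
print(sorted(item.attrib))
# ['color', '{https://example.com/xml/blue}color']
```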