Handlers

The methods iter_from and return_from take handlers as arguments.

Functions

A handler can be a generator function taking a node as an argument.

Such functions are usually decorated with xml_handle_element or xml_handle_text, to restrict the type of nodes they are called with.

<inventory>
    <book>9780261103573</book>
    <lego ean="5702014975200">79005</lego>
    <dvd>0883929452996</dvd>
</inventory>

>>> from bigxml import Parser, xml_handle_element

>>> @xml_handle_element("inventory", "book")
... @xml_handle_element("inventory", "dvd")
... def handle_ean(node):
...     yield (node.text, node.name)

>>> @xml_handle_element("inventory", "lego")
... def handle_lego(node):
...     yield (node.attributes["ean"], "toy")

>>> with open("inventory.xml", "rb") as stream:
...     for ean, kind in Parser(stream).iter_from(handle_ean, handle_lego):
...         print(f"{ean} ({kind})")
9780261103573 (book)
5702014975200 (toy)
0883929452996 (dvd)

Note

To handle different kinds of nodes at once, the same function can be decorated several times with xml_handle_element or xml_handle_text as shown above.
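
Stacking works because each application of a decorator can record one more path on the same function. The following is a minimal sketch of that idea in plain Python. It does not use the library: `handle_path` and the `_paths` attribute are hypothetical names made up for illustration, and nothing here describes how the library actually stores this information.

```python
def handle_path(*path):
    """Hypothetical decorator: remember one more path on the function."""

    def decorator(func):
        # Reuse the list from an earlier application, or start a new one.
        paths = getattr(func, "_paths", [])
        paths.append(path)
        func._paths = paths
        return func

    return decorator


@handle_path("inventory", "book")
@handle_path("inventory", "dvd")
def handle_ean(node):
    yield node


# Decorators are applied bottom-up, so the "dvd" path is recorded first.
print(handle_ean._paths)
```

Because the decorator returns the original function unchanged, it can be applied any number of times without wrapping the function in extra layers.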

Classes

Passing a class as a handler is a good way to group the handling of a node and its children.

Note

Although not mandatory, using a dataclass feels quite natural in most cases. See this recipe for more information.

Let's parse the following XML file:

<root>
    <cart user="Alice">
        <product price="7.35">9781846975769</product>
        <product price="2.12">9780008322052</product>
    </cart>
    <cart user="Bob">
        <product price="4.99">9780008117498</product>
        <product price="8.14">9780340960196</product>
        <product price="7.37">9780099580485</product>
    </cart>
</root>

Class instantiation

The class is instantiated automatically when a matching node is encountered:

>>> @xml_handle_element("root", "cart")
... class Cart:
...     pass

>>> with open("carts.xml", "rb") as stream:
...     for instance in Parser(stream).iter_from(Cart):
...         print(instance)
<__main__.Cart object...>
<__main__.Cart object...>

If your class has an __init__ method that takes one mandatory parameter, the encountered node is passed as that argument:

>>> @xml_handle_element("root", "cart")
... class Cart:
...     def __init__(self, node):
...         self.user = node.attributes["user"]

>>> with open("carts.xml", "rb") as stream:
...     for instance in Parser(stream).iter_from(Cart):
...         print(f"{instance} for user {instance.user}")
<__main__.Cart object...> for user Alice
<__main__.Cart object...> for user Bob
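
One way a library could decide whether to pass the node to __init__ is to count the mandatory positional parameters with `inspect.signature`. The sketch below illustrates that general technique only; it is an assumption, not the library's actual code. `count_mandatory_params`, `instantiate` and `FakeNode` are hypothetical helpers.

```python
import inspect


def count_mandatory_params(cls):
    """Count the positional parameters of cls(...) that have no default."""
    positional_kinds = (
        inspect.Parameter.POSITIONAL_ONLY,
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
    )
    return sum(
        1
        for param in inspect.signature(cls).parameters.values()
        if param.kind in positional_kinds
        and param.default is inspect.Parameter.empty
    )


def instantiate(cls, node):
    """Hypothetical helper: pass the node only if the class asks for it."""
    if count_mandatory_params(cls) == 1:
        return cls(node)
    return cls()


class FakeNode:
    """Stand-in for a parsed node, exposing only an attributes mapping."""

    attributes = {"user": "Alice"}


class WithNode:
    def __init__(self, node):
        self.user = node.attributes["user"]


class WithoutNode:
    def __init__(self):
        self.price = 0.0


print(instantiate(WithNode, FakeNode()).user)
print(instantiate(WithoutNode, FakeNode()).price)
```

The same signature check could in principle be used to tell whether a method such as xml_handler expects an extra argument.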

Class methods as sub-handlers

The methods decorated with xml_handle_element or xml_handle_text are used as sub-handlers:

>>> @xml_handle_element("root", "cart")
... class Cart:
...     def __init__(self):
...         self.price = 0.0
...
...     @xml_handle_element("product")
...     def handle_product(self, node):
...         self.price += float(node.attributes["price"])

>>> with open("carts.xml", "rb") as stream:
...     for instance in Parser(stream).iter_from(Cart):
...         print(f"{instance} total {instance.price:.2f}")
<__main__.Cart object...> total 9.47
<__main__.Cart object...> total 20.50

Note

If such a method yields items, they are ignored and a warning is issued. This behavior can be changed as explained below.

Changing yielded items

As seen above, the class handler yields the class instance. This default behavior can be changed by implementing an xml_handler method:

>>> @xml_handle_element("root", "cart")
... class Cart:
...     def __init__(self, node):
...         self.user = node.attributes["user"]
...         self.price = 0.0
...
...     @xml_handle_element("product")
...     def handle_product(self, node):
...         self.price += float(node.attributes["price"])
...
...     def xml_handler(self):
...         yield (self.user, self.price)

>>> with open("carts.xml", "rb") as stream:
...     for user, price in Parser(stream).iter_from(Cart):
...         print(f"{user} total {price:.2f}")
Alice total 9.47
Bob total 20.50

The xml_handler method can also take a single mandatory parameter. In that case, it receives an iterator over the items yielded by the sub-handlers.

We can rewrite a previous example to leverage this behavior:

>>> @xml_handle_element("root", "cart")
... class Cart:
...     @xml_handle_element("product")
...     def handle_product(self, node):
...         yield float(node.attributes["price"])
...
...     def xml_handler(self, iterator):
...         yield sum(iterator)

>>> with open("carts.xml", "rb") as stream:
...     for price in Parser(stream).iter_from(Cart):
...         print(f"{price:.2f}")
9.47
20.50

Warning

The children of the node handled by the class instance are parsed at the same time as the iterator is being iterated over. It is up to you to consume the iterator and to consider the side effects your methods may have.
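
This warning follows from how lazy iterators behave in Python in general. The sketch below uses a plain generator as a stand-in for the iterator passed to xml_handler. It does not use the library, and `fake_sub_handlers` and `seen` are made-up names. It shows that side effects only happen while the iterator is being consumed.

```python
seen = []


def fake_sub_handlers(prices):
    """Stand-in for the iterator of items yielded by sub-handlers."""
    for price in prices:
        seen.append(price)  # side effect, comparable to updating self.price
        yield price


iterator = fake_sub_handlers([7.35, 2.12])
print(seen)  # [] (nothing has run yet)

first = next(iterator)
print(seen)  # [7.35] (only the first item has been produced)

rest = list(iterator)
print(seen)  # [7.35, 2.12] (the iterator is fully consumed)
```

By analogy, an xml_handler that stops iterating early, or never iterates at all, should not rely on state that its sub-handlers would have set for the remaining children.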

Syntactic sugar

To avoid creating a handler that simply yields the node, the following handler types are available:

tuple of str / list of str

("html", "body", "p") is equivalent to the following handler:

@xml_handle_element("html", "body", "p")
def handler(node):
    yield node

str

"p" is equivalent to ["p"] or to the following handler:

@xml_handle_element("p")
def handler(node):
    yield node

This makes it easy to iterate over a specific type of child of the current node:

>>> @xml_handle_element("root", "cart")
... def handler(node):
...     yield (
...         node.attributes["user"],
...         sum(
...             float(subnode.attributes["price"])
...             for subnode in node.iter_from("product")
...         )
...     )

>>> with open("carts.xml", "rb") as stream:
...     for user, price in Parser(stream).iter_from(handler):
...         print(f"{user} total {price:.2f}")
Alice total 9.47
Bob total 20.50