Typing

BigXML comes natively with type hints, which are checked by mypy.

The benefits are twofold:

  • Improve the correctness of the code of the library itself;
  • Allow users of the library to benefit from well-defined type hints on the public API.

As a rule of thumb, we follow Postel's law: type hints are as permissive as possible for function arguments, and as precise as possible for return values.
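
To illustrate this rule of thumb outside of BigXML, here is a minimal sketch. The function name total_price is hypothetical and not part of the library. The argument uses the loosest annotation that works (Iterable[float]), while the return annotation is as narrow as possible (float).

```python
from typing import Iterable


def total_price(prices: Iterable[float]) -> float:
    """Accept any iterable of numbers, return a plain float."""
    # Hypothetical helper for illustration, not part of BigXML.
    return float(sum(prices))


# Lists, tuples and generators are all accepted by the type checker.
from_list = total_price([1.5, 2.5])
from_tuple = total_price((1.5, 2.5))
from_generator = total_price(p for p in [1.5, 2.5])
```

Callers are free to pass whatever iterable they already have, and they always know exactly what they get back.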

Handlers

>>> from typing import Iterator

>>> from bigxml import Parser, XMLElement, XMLText, xml_handle_element, xml_handle_text

>>> @xml_handle_text("p")
... def handle_text(node: XMLText) -> Iterator[str]:
...     yield node.text

>>> @xml_handle_element("p", "em")
... def handle_em(node: XMLElement) -> Iterator[str]:
...     yield node.text

>>> @xml_handle_element("root", "cart")
... class Cart:
...     @xml_handle_element("product")
...     def handle_product(self, node: XMLElement) -> Iterator[float]:
...         yield float(node.attributes["price"])
...
...     def xml_handler(self, iterator: Iterator[float]) -> Iterator[float]:
...         yield sum(iterator)

Note

Instead of Iterator[X], a handler can return any iterable, or None. Optional[Iterable[X]] can also be used as the return annotation if needed.
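
As a sketch of what these return annotations look like, the functions below are plain Python stand-ins: the names are hypothetical, there are no BigXML decorators, and a str argument takes the place of a node. The collect helper is also an assumption made for illustration only; it simply shows that the three shapes can be consumed uniformly.

```python
from typing import Iterable, Iterator, List, Optional


def as_list(text: str) -> List[str]:
    # A list is an iterable, so this annotation is as valid as Iterator[str].
    return text.split()


def as_iterator(text: str) -> Iterator[str]:
    # Generator functions are annotated with Iterator[X].
    yield from text.split()


def as_optional(text: str) -> Optional[Iterable[str]]:
    # None signals that there is nothing to return for this input.
    if not text.strip():
        return None
    return text.split()


def collect(result: Optional[Iterable[str]]) -> List[str]:
    # Hypothetical consumer: treat None as an empty iterable.
    return list(result) if result is not None else []
```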

Returned values from iter_from / return_from

We try to be as specific as possible with the return types of the iter_from and return_from methods.

>>> with open("paragraph.xml", "rb") as f:
...     for item in Parser(f).iter_from(handle_text, handle_em):
...         print(type(item), repr(item))
...         # reveal_type(item)
...         # Revealed type is "builtins.str"
<class 'str'> '\n    Hello,\n    '
<class 'str'> 'world'
<class 'str'> '\n    !\n'

However, there are some cases where a little help from your side is needed.

Several handlers with no common type in return value

Consider the following mixed.xml file:

<root>
    <number>42</number>
    <string>Abc</string>
</root>

>>> @xml_handle_text("root", "number")
... def handle_number(node: XMLText) -> Iterator[int]:
...     yield int(node.text)

>>> @xml_handle_text("root", "string")
... def handle_string(node: XMLText) -> Iterator[str]:
...     yield node.text

>>> with open("mixed.xml", "rb") as f:
...     for item in Parser(f).iter_from(handle_number, handle_string):
...         print(type(item), item)
...         # reveal_type(item)
...         # Revealed type is "builtins.object"
<class 'int'> 42
<class 'str'> Abc

Here the revealed type of item is object, because int and str share no more specific common type. This is correct but not precise.

In that case, instead of resorting to cast, you can use the provided HandlerTypeHelper: parametrize it with the expected type in square brackets and pass it as one of the handlers to iter_from or return_from:

>>> from typing import Union

>>> from bigxml import HandlerTypeHelper

>>> with open("mixed.xml", "rb") as f:
...     for item in Parser(f).iter_from(
...         HandlerTypeHelper[Union[int, str]],
...         handle_number,
...         handle_string,
...     ):
...         print(type(item), item)
...         # reveal_type(item)
...         # Revealed type is "Union[builtins.int, builtins.str]"
<class 'int'> 42
<class 'str'> Abc
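
For comparison, the cast alternative mentioned above can be sketched without BigXML. The mixed_items function below is a hypothetical stand-in for an iterator that a type checker only knows as yielding object; typing.cast then narrows each item at the point of use, with no runtime check.

```python
from typing import Iterator, List, Union, cast


def mixed_items() -> Iterator[object]:
    # Hypothetical stand-in for an iterator typed as yielding `object`.
    yield 42
    yield "Abc"


def describe(item: Union[int, str]) -> str:
    # isinstance narrows the Union for the type checker.
    if isinstance(item, int):
        return f"int: {item}"
    return f"str: {item}"


descriptions: List[str] = []
for raw in mixed_items():
    # cast() only informs the type checker; it performs no runtime validation.
    item = cast(Union[int, str], raw)
    descriptions.append(describe(item))
```

With cast, the narrowing has to be written wherever the items are consumed, whereas HandlerTypeHelper states the expected type once, in the call itself.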