
Incompatibility with lxml 6 #805

@GaetanLepage

Description


I noticed that the following tests fail with lxml 6:

  • tests/unit_tests.py::test_table_processing:
        processed_table = handle_table(table_cell_with_link, TAG_CATALOG, options)
>       result = [child.tag for child in processed_table.find(".//cell").iterdescendants()]
E       AttributeError: 'NoneType' object has no attribute 'find'
  • tests/realworld_tests.py::test_extract[False-True]:
        result = do_load_page('https://buchperlen.wordpress.com/2013/10/20/leandra-lou-der-etwas-andere-modeblog-jetzt-auch-zwischen-buchdeckeln/')
        if xmloutput is False:
>           assert 'Dann sollten Sie erst recht' in result and 'als saure Gürkchen entlarvte Ex-Boyfriends.' in result and 'Ähnliche Beiträge' not in result
E           TypeError: argument of type 'NoneType' is not iterable
  • tests/unit_tests.py::test_external:
        res = extract(bad_xml, output_format='xml')
>       assert "Features" in res
E       TypeError: argument of type 'NoneType' is not iterable

tests/unit_tests.py:475: TypeError
------------------------------ Captured log call -------------------------------
ERROR    trafilatura.utils:utils.py:259 parsed tree length: 1, wrong data type or not valid HTML
ERROR    trafilatura.core:core.py:237 empty HTML tree: None
WARNING  trafilatura.core:core.py:344 discarding data: None

Metadata


Assignees

No one assigned

Labels

bug (Something isn't working)

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
