Many nodes fail to be processed in khan_academy_fr #100

Open
benoit74 opened this issue Mar 4, 2024 · 2 comments

@benoit74
Collaborator

benoit74 commented Mar 4, 2024

1046 nodes failed to be processed in https://farm.openzim.org/pipeline/62191f74-ff73-473d-acc3-49af55fb5f8b/debug

I browsed through the errors and found the following patterns (I may have missed some):

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/kolibri2zim/scraper.py", line 98, in wrapper
    return func(self, item)
           ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/kolibri2zim/scraper.py", line 222, in add_node
    handler(node_id)
  File "/usr/local/lib/python3.12/site-packages/kolibri2zim/scraper.py", line 325, in add_topic_node
    node = self.db.get_node(node_id, with_parents=True, with_children=True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/kolibri2zim/database.py", line 189, in get_node
    "children_count": self.get_node_children_count(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/kolibri2zim/database.py", line 125, in get_node_children_count
    return self.get_cell(
           ^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/kolibri2zim/database.py", line 69, in get_cell
    return self.get_row(query, *args, **kwargs)[0]
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^
TypeError: 'NoneType' object is not subscriptable

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/kolibri2zim/scraper.py", line 100, in wrapper
    raise RuntimeError(f"Failed to process {kind} node {node_id}") from exc
RuntimeError: Failed to process topic node 232ba2df649f5225b0bf7d16613fc70b

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/kolibri2zim/scraper.py", line 98, in wrapper
    return func(self, item)
           ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/kolibri2zim/scraper.py", line 222, in add_node
    handler(node_id)
  File "/usr/local/lib/python3.12/site-packages/kolibri2zim/scraper.py", line 327, in add_topic_node
    html = self.jinja2_env.get_template("topic.html").render(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jinja2/environment.py", line 1301, in render
    self.environment.handle_exception()
  File "/usr/local/lib/python3.12/site-packages/jinja2/environment.py", line 936, in handle_exception
    raise rewrite_traceback_stack(source=source)
  File "/usr/local/lib/python3.12/site-packages/kolibri2zim/templates/topic.html", line 1, in top-level template code
    {% extends "base.html" %}
  File "/usr/local/lib/python3.12/site-packages/kolibri2zim/templates/base.html", line 36, in top-level template code
    {% block content %}{% endblock %}
    ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/kolibri2zim/templates/topic.html", line 9, in block 'content'
    {% for child in children %}
    ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/kolibri2zim/database.py", line 114, in get_node_children
    "thumbnail": self.get_thumbnail_name(rowdict["id"]),
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/kolibri2zim/database.py", line 217, in get_thumbnail_name
    thumbnail = self.get_node_thumbnail(node_id)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/kolibri2zim/database.py", line 214, in get_node_thumbnail
    return self.get_node_file(node_id, thumbnail=True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/kolibri2zim/database.py", line 198, in get_node_file
    return next(self.get_node_files(node_id, thumbnail=thumbnail))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/kolibri2zim/database.py", line 203, in get_node_files
    for row in self.get_rows(
  File "/usr/local/lib/python3.12/site-packages/kolibri2zim/database.py", line 73, in get_rows
    cursor = conn.execute(query, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
sqlite3.InterfaceError: bad parameter or other API misuse

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/kolibri2zim/scraper.py", line 100, in wrapper
    raise RuntimeError(f"Failed to process {kind} node {node_id}") from exc
RuntimeError: Failed to process topic node 02cf7d8d22b4520fb6c8cd1d8e731052

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/kolibri2zim/scraper.py", line 98, in wrapper
    return func(self, item)
           ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/kolibri2zim/scraper.py", line 222, in add_node
    handler(node_id)
  File "/usr/local/lib/python3.12/site-packages/kolibri2zim/scraper.py", line 327, in add_topic_node
    html = self.jinja2_env.get_template("topic.html").render(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: jinja2.environment.Template.render() argument after ** must be a mapping, not NoneType

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/kolibri2zim/scraper.py", line 100, in wrapper
    raise RuntimeError(f"Failed to process {kind} node {node_id}") from exc
RuntimeError: Failed to process topic node bccfcc046a7f5f8ea093a0c27bfa2f66

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/kolibri2zim/scraper.py", line 98, in wrapper
    return func(self, item)
           ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/kolibri2zim/scraper.py", line 218, in add_node
    thumbnail = self.db.get_node_thumbnail(node_id)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/kolibri2zim/database.py", line 214, in get_node_thumbnail
    return self.get_node_file(node_id, thumbnail=True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/kolibri2zim/database.py", line 198, in get_node_file
    return next(self.get_node_files(node_id, thumbnail=thumbnail))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/kolibri2zim/database.py", line 211, in get_node_files
    yield dict(row)
          ^^^^^^^^^
IndexError: tuple index out of range

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/kolibri2zim/scraper.py", line 100, in wrapper
    raise RuntimeError(f"Failed to process {kind} node {node_id}") from exc
RuntimeError: Failed to process topic node 4a3ad4543c5b5f1bbb02bc88b82e52c6

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/kolibri2zim/scraper.py", line 98, in wrapper
    return func(self, item)
           ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/kolibri2zim/scraper.py", line 220, in add_node
    self.funnel_file(thumbnail["id"], thumbnail["ext"])
  File "/usr/local/lib/python3.12/site-packages/kolibri2zim/scraper.py", line 227, in funnel_file
    url, fname = get_kolibri_url_for(fid, fext)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/kolibri2zim/scraper.py", line 85, in get_kolibri_url_for
    remote_dirs = (file_id[0], file_id[1])
                   ~~~~~~~^^^
TypeError: 'NoneType' object is not subscriptable

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/kolibri2zim/scraper.py", line 100, in wrapper
    raise RuntimeError(f"Failed to process {kind} node {node_id}") from exc
RuntimeError: Failed to process topic node 8ab0e74e98605f698b6ba9f244920f21
benoit74 added the bug (Something isn't working) label Mar 4, 2024
benoit74 self-assigned this Mar 4, 2024
@benoit74
Collaborator Author

benoit74 commented Mar 7, 2024

The sqlite3.InterfaceError: bad parameter or other API misuse seems to be a multithreading issue. I (luckily) hit it for no apparent reason while debugging another issue. We should probably place a lock around sqlite operations.

See for instance https://stackoverflow.com/a/22739924

@benoit74
Collaborator Author

benoit74 commented Mar 8, 2024

I can confirm that:

  • all nodes mentioned above are in fact OK: I reprocessed them in debug mode and got no errors
  • the issue arises only from the multithreaded use of sqlite, which is not supported
  • we should place a lock around sqlite usage (or get rid of multithreading, but I will keep that discussion for "Multithreading is significantly broken #106")

benoit74 added this to the 1.2.2 milestone Mar 8, 2024
benoit74 modified the milestones: 1.2.2, 2.0.0 Sep 24, 2024