Skip to content

Conversation

@sliverc
Copy link

@sliverc sliverc commented Sep 9, 2025

Fixes #291

Added transactional safety when writing to redis to avoid corrupted data when multiple processes write to the database. This only works if the complete object is stored. As the redis pipeline itself stores the object quicker as it does it all in one go, I do not expect any performance issues through this change not to have dirty fields.

I would have loved to set up a test but was not successful to reproduce it in a unit test, as it not always occurs. But the tests still run like before, so confident that it does not break more than before.

@cunla Let me know what you think as you have a deeper inside into the project and whether this change makes sense from your point of view.

@sliverc
Copy link
Author

sliverc commented Sep 9, 2025

To note with this change: it only fixes that no corrupt data is written into the redis database. In case if there is already invalid data in the redis database, the exception would stll be raised.

Either this data would need to be removed from redis directly or a PR implemented which ignores invalid models. I am happy to provide a PR in this direction but thought we first make sure this fix looks good and see how to move forward.

if self._parent_key is not None:
connection.sadd(self._parent_key, self.name)
mapping = self.serialize(with_nones=True)
if not self._save_all and len(self._dirty_fields) > 0:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why remove this logic?

@cunla
Copy link
Member

cunla commented Sep 14, 2025

This does not really fix the problem, though I do like some of these changes and adopted them.

The cause is the connection needs to be refreshed after fork.

@sliverc
Copy link
Author

sliverc commented Sep 16, 2025

Thanks for looking into this. Very much appreciated. We had a test run of your changes, and it seems the error now moved from JobModel to WorkerModel

web-pms_1           | Failed to deserialize 6880ac62150e-worker.1: WorkerModel.__init__() missing 8 required keyword-only arguments: 'name', 'queue_names', 'pid', 'hostname', 'ip_address', 'version', 'python_version', and 'state'

That's what we get in the scheduler output. I will have a closer look as well, but wanted to let you know. It could very well be that with the connection refresh that this fix works but something is missing when it comes to WorkerModel.

@sliverc
Copy link
Author

sliverc commented Sep 16, 2025

Just saw this that you committed 5096566 maybe that is the fix already for the worker model issue I outlined above?

We used the following commit for a test run 06bf0da

Are you still working on it, or how confident are you with the changes you made? We would love to test run it and see how well it works, but would be good to know when the code is in a state when it makes sense to test it again.

@cunla
Copy link
Member

cunla commented Sep 17, 2025

Do you mind trying again with the latest, I added printing the stacktrace in case this happens and it will help ensure this does not happen.

@sliverc
Copy link
Author

sliverc commented Sep 22, 2025

Sure, we will test it again and will let you know how it went.

@sliverc
Copy link
Author

sliverc commented Sep 22, 2025

@cunla
Tested again and after a while got following issue again:

web-1         | TypeError: WorkerModel.__init__() missing 8 required keyword-only arguments: 'name', 'queue_names', 'pid', 'hostname', 'ip_address', 'version', 'python_version', and 'state'
web-1         | Failed to deserialize b01f9333d5d1-worker.1: WorkerModel.__init__() missing 8 required keyword-only arguments: 'name', 'queue_names', 'pid', 'hostname', 'ip_address', 'version', 'python_version', and 'state'
web-1         | Traceback (most recent call last):
web-1         |   File "/venv/lib/python3.12/site-packages/scheduler/redis_models/base.py", line 175, in get
web-1         |     return cls.deserialize(decode_dict(res, set()))
web-1         |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
web-1         |   File "/venv/lib/python3.12/site-packages/scheduler/redis_models/base.py", line 135, in deserialize
web-1         |     instance = super(HashModel, cls).deserialize(data)
web-1         |                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
web-1         |   File "/venv/lib/python3.12/site-packages/scheduler/redis_models/base.py", line 111, in deserialize
web-1         |     res = cls(**data)
web-1         |           ^^^^^^^^^^^
web-1         | TypeError: WorkerModel.__init__() missing 8 required keyword-only arguments: 'name', 'queue_names', 'pid', 'hostname', 'ip_address', 'version', 'python_version', and 'state'
we

@sliverc
Copy link
Author

sliverc commented Sep 28, 2025

@cunla Just saw that you released version 4.0.8. Should that release fix this issue as well we have with the worker and shall we test again?

@cunla
Copy link
Member

cunla commented Sep 28, 2025

I am not 100% sure, would you mind trying?

@sliverc
Copy link
Author

sliverc commented Sep 28, 2025

@cunla Sure will do a test run again and give you feedback. Are those changes in this PR still needed as well?

@cunla
Copy link
Member

cunla commented Sep 29, 2025

I don't think so, but leave it open please since most of the discussion is here, and it will help me remember including you in the next release notes.

@sliverc
Copy link
Author

sliverc commented Sep 29, 2025

@cunla OK that's fine.

Unfortunately, the error still occurs even with version 4.0.8:

web-pms_1           | TypeError: WorkerModel.__init__() missing 8 required keyword-only arguments: 'name', 'queue_names', 'pid', 'hostname', 'ip_address', 'version', 'python_version', and 'state'
web-pms_1           | Failed to deserialize 2403018786d5-worker.2: WorkerModel.__init__() missing 8 required keyword-only arguments: 'name', 'queue_names', 'pid', 'hostname', 'ip_address', 'version', 'python_version', and 'state'
web-pms_1           | Traceback (most recent call last):
web-pms_1           |   File "/venv/lib/python3.12/site-packages/scheduler/redis_models/base.py", line 175, in get
web-pms_1           |     return cls.deserialize(decode_dict(res, set()))
web-pms_1           |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
web-pms_1           |   File "/venv/lib/python3.12/site-packages/scheduler/redis_models/base.py", line 135, in deserialize
web-pms_1           |     instance = super(HashModel, cls).deserialize(data)
web-pms_1           |                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
web-pms_1           |   File "/venv/lib/python3.12/site-packages/scheduler/redis_models/base.py", line 111, in deserialize
web-pms_1           |     res = cls(**data)
web-pms_1           |           ^^^^^^^^^^^
web-pms_1           | TypeError: WorkerModel.__init__() missing 8 required keyword-only arguments: 'name', 'queue_names', 'pid', 'hostname', 'ip_address', 'version', 'python_version', and 'state'
web-pms_1           | Failed to deserialize 2403018786d5-worker.2: WorkerModel.__init__() missing 8 required keyword-only arguments: 'name', 'queue_names', 'pid', 'hostname', 'ip_address', 'version', 'python_version', and 'state'
web-pms_1           | Traceback (most recent call last):
web-pms_1           |   File "/venv/lib/python3.12/site-packages/scheduler/redis_models/base.py", line 175, in get
web-pms_1           |     return cls.deserialize(decode_dict(res, set()))
web-pms_1           |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
web-pms_1           |   File "/venv/lib/python3.12/site-packages/scheduler/redis_models/base.py", line 135, in deserialize
web-pms_1           |     instance = super(HashModel, cls).deserialize(data)
web-pms_1           |                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
web-pms_1           |   File "/venv/lib/python3.12/site-packages/scheduler/redis_models/base.py", line 111, in deserialize
web-pms_1           |     res = cls(**data)
web-pms_1           |           ^^^^^^^^^^^
web-pms_1           | TypeError: WorkerModel.__init__() missing 8 required keyword-only arguments: 'name', 'queue_names', 'pid', 'hostname', 'ip_address', 'version', 'python_version', and 'state'
web-pms_1           | [https://sentry.gksw.at:443](https://sentry.gksw.at/) "POST /api/11/envelope/ HTTP/1.1" 200 0

@cunla
Copy link
Member

cunla commented Sep 29, 2025

Would you mind scheduling some time to review together? You are able to reproduce it easily, so it will help me understand what is going on

@sliverc
Copy link
Author

sliverc commented Sep 30, 2025

@cunla The initial issue in terms of JobModel as reported in #291 was more easily reproducible, and I was able to reproduce it locally fairly quickly. That issue seems to be fixed I think, but the error now moved to WorkerModel which does not so often occur (someone else also reported this at #299). I have not been able to reproduce it locally yet, but it occurs on our server deployment after a while.

I am happy to do a review session when I can reproduce it easily locally.

We have noticed the following things though which might help you better understand what the problem could be:

  1. We let the scheduler run overnight.
  2. we check the log in the morning and everything is working fine
  3. then we open django admin at http://localhost:8000/admin/scheduler/worker/ and then the worker crashes with the error message posted above

In case I can reproduce it locally, I will let you know. Let me know when you find a potential fix we can test as well.

@cclauss
Copy link
Contributor

cclauss commented Oct 12, 2025

Please rebase.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Error with deserialize JobModel

3 participants