fix(server): surface chat-template errors #208
base: main
Conversation
import zmq
from mlx_lm.server import convert_chat, process_message_content

try:  # pragma: no cover - jinja2 is an optional dependency during tests
Why not add jinja2 as a dependency?
jinja2 is only needed when a tokenizer uses a chat template that hits the Jinja renderer. The executor can run entirely without it (e.g., plain prompts, different tokenizers, or GPU builds in minimal environments), so we treat it as optional to keep install footprints small and avoid importing Jinja just to start the server.
The TemplateError handling is defensive: if jinja2 is present we surface a more precise error type to the HTTP handler; if it isn’t, we fall back to the generic exception. That lets tests and deployments without Jinja still run cleanly while giving better diagnostics where it’s installed.
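For context, a minimal sketch of the optional-import pattern being discussed; the fallback class name is an assumption about how the module aliases the error when jinja2 is absent:

```python
try:  # pragma: no cover - jinja2 is an optional dependency during tests
    from jinja2 import TemplateError  # precise error type when Jinja is installed
except ImportError:
    class TemplateError(Exception):  # fallback so `except TemplateError` still works
        """Stand-in used when jinja2 is not installed."""
```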
hi
We don't want users to manually install some dependencies, so adding it as a dependency is the straightforward way.
break
if isinstance(token, dict) and token.get("type") == "error":
    yield self._generate_error_stream_chunk(rid, token.get("payload", {}))
    continue
break?
The `continue` after emitting the error chunk is intentional: it keeps the loop alive so the handler can pull whatever comes next (typically the `None` sentinel) and finish the stream cleanly.
- `None` → `break`: we exit the loop and send the final chunk + `[DONE]`.
- Error dict → emit the SSE error chunk, then `continue` so we don't fall through to `yield self._generate_stream_chunk(...)` with a dict, and we keep waiting for the sentinel.

If we changed that `continue` to a `break`, an error would terminate the loop immediately: the client would miss the final `[DONE]`, the request would never be marked as finished in the streaming path, and resources would leak if the sentinel were never consumed.
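To make the control flow concrete, here is a standalone sketch of that loop; the queue read and the final `[DONE]` line are assumptions, while the chunk helpers are the ones shown in the diff:

```python
from typing import Iterator

def stream_response(handler, rid: str, queue) -> Iterator[str]:
    """Hypothetical rendering of the sentinel/error/token handling above."""
    while True:
        token = queue.get()                   # assumed IPC/queue read
        if token is None:                     # sentinel: generation finished
            break                             # exit, then emit final chunk + [DONE]
        if isinstance(token, dict) and token.get("type") == "error":
            yield handler._generate_error_stream_chunk(rid, token.get("payload", {}))
            continue                          # keep draining until the sentinel arrives
        yield handler._generate_stream_chunk(rid, token)  # normal token path
    yield "data: [DONE]\n\n"                  # SSE terminator after the loop exits
```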
detokenizer: StreamingDetokenizer = None
error_message: Optional[str] = None
error_type: Optional[str] = None
error_status: HTTPStatus = HTTPStatus.BAD_REQUEST
set default to None
Not necessary: `error_status` is typed as `HTTPStatus` and defaults to `HTTPStatus.BAD_REQUEST`, so `handle_executor_error()` can always assign a valid status (or leave the default) and `create_error_response()` can rely on a real `HTTPStatus` without adding `None` checks. Switching to `None` would force us to make the field `Optional[HTTPStatus]` and add guard code in every consumer, without any functional gain.
Maybe setting the default to `INTERNAL_SERVER_ERROR` would be better; `BAD_REQUEST` will confuse the client when the failure is not actually a bad request.
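For illustration, a minimal sketch of how the default interacts with the consumers mentioned above; the dataclass name and the `create_error_response` body are assumptions, only the field names and defaults come from the diff:

```python
from dataclasses import dataclass
from http import HTTPStatus
from typing import Optional

@dataclass
class RequestState:  # hypothetical container for the per-request error state
    error_message: Optional[str] = None
    error_type: Optional[str] = None
    # A concrete HTTPStatus default lets consumers skip None checks; the open
    # question above is whether BAD_REQUEST or INTERNAL_SERVER_ERROR is the
    # safer default when the executor never classifies the failure.
    error_status: HTTPStatus = HTTPStatus.BAD_REQUEST

def create_error_response(state: RequestState) -> dict:
    # Relies on error_status always being a real HTTPStatus.
    return {
        "error": {
            "message": state.error_message or "unknown error",
            "type": state.error_type or "internal_error",
            "code": state.error_status.value,
        }
    }
```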
📋 PR Title
fix(server): surface chat-template errors
📝 Change Type
💡 Description
Malformed chat requests that embed `<|channel|>` tags in `messages[*].content` caused the executor's tokenizer step to raise a template error, leaving the scheduler hanging and never responding to the client. This PR forwards those failures back through the IPC channel so the HTTP handler can immediately return a structured 400 error while keeping the node healthy. It also adds unit coverage for the new HTTP error-handling path.

Key Changes
- `_notify_http_request_error` in `Executor` to catch tokenizer/chat-template exceptions and send error envelopes to the HTTP server.
- `HTTPHandler` to track per-request error state, stream error chunks, and emit non-streaming 400 responses.

🔗 Related Issues
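As a rough sketch of the error envelope this PR forwards over the IPC channel; the function signature, field names, and the use of `send_json` are assumptions, not the actual implementation:

```python
import zmq

def notify_http_request_error(socket: zmq.Socket, rid: str, exc: Exception) -> None:
    """Hypothetical sketch: report a chat-template failure back to the HTTP
    handler instead of letting the scheduler hang on the request."""
    envelope = {
        "type": "error",  # matches the token.get("type") == "error" check above
        "payload": {
            "request_id": rid,
            "message": str(exc),
            "error_type": exc.__class__.__name__,
            "status": 400,  # surfaced to the client as a structured 400
        },
    }
    socket.send_json(envelope)
```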