Conversation

@Odysseusailoon (Collaborator) commented Nov 10, 2025

📋 PR Title

fix(server): surface chat-template errors

📝 Change Type

  • fix: Bug fix.

💡 Description

Malformed chat requests that embed <|channel|> tags in messages[*].content caused the executor’s tokenizer step to raise a template error, leaving the scheduler hanging and never responding to the client. This PR forwards those failures back through the IPC channel so the HTTP handler can immediately return a structured 400 error while keeping the node healthy. It also adds unit coverage for the new HTTP error-handling path.

Key Changes

  1. Add _notify_http_request_error in Executor to catch tokenizer/chat-template exceptions and send error envelopes to the HTTP server.
  2. Extend HTTPHandler to track per-request error state, stream error chunks, and emit non-streaming 400 responses.
  3. Manually tested it on the machine.
(Screenshot: manual test run, 2025-11-09 5:56 PM)
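A rough sketch of the error-forwarding path the key changes describe. Only the `_notify_http_request_error` name comes from the PR; the envelope shape and the `send` hook are assumptions for illustration, not the actual implementation:

```python
# Hypothetical sketch: the executor catches a tokenizer/chat-template
# failure and ships a structured error envelope back to the HTTP server
# over IPC instead of letting the scheduler hang with no response.
from typing import Any, Callable, Dict


def _notify_http_request_error(
    send: Callable[[Dict[str, Any]], None],
    request_id: str,
    exc: Exception,
) -> None:
    """Wrap an exception in an error envelope and push it to the HTTP side."""
    envelope = {
        "type": "error",
        "request_id": request_id,
        "payload": {
            "error_type": type(exc).__name__,
            "message": str(exc),
        },
    }
    send(envelope)


# Example: a malformed <|channel|> tag makes the template step raise.
sent = []
try:
    raise ValueError("unexpected <|channel|> tag in message content")
except ValueError as exc:
    _notify_http_request_error(sent.append, "req-1", exc)
```

On the HTTP side, the handler can turn this envelope into an immediate structured 400 response while the node itself stays healthy.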

🔗 Related Issues

@Odysseusailoon Odysseusailoon requested a review from a team November 10, 2025 01:55
import zmq
from mlx_lm.server import convert_chat, process_message_content

try: # pragma: no cover - jinja2 is an optional dependency during tests
Collaborator:

Why not add jinja2 as a dependency?

Collaborator (author):

jinja2 is only needed when a tokenizer uses a chat template that hits the Jinja renderer. The executor can run entirely without it (e.g., plain prompts, different tokenizers, or GPU builds in minimal environments), so we treat it as optional to keep install footprints small and avoid importing Jinja just to start the server.

The TemplateError handling is defensive: if jinja2 is present we surface a more precise error type to the HTTP handler; if it isn’t, we fall back to the generic exception. That lets tests and deployments without Jinja still run cleanly while giving better diagnostics where it’s installed.
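The defensive import pattern being discussed can be sketched like this (the `classify` helper is hypothetical, added only to show how the fallback affects error reporting):

```python
# Optional-import pattern: use jinja2's precise TemplateError when the
# package is installed; otherwise fall back to the generic Exception so
# environments without Jinja still run.
try:  # pragma: no cover - jinja2 is an optional dependency during tests
    from jinja2.exceptions import TemplateError
except ImportError:  # jinja2 absent: match any exception instead
    TemplateError = Exception  # type: ignore[assignment, misc]


def classify(exc: Exception) -> str:
    """Return a coarse error type for the HTTP handler (hypothetical helper)."""
    if isinstance(exc, TemplateError):
        return "chat_template_error"
    return "internal_error"
```

With jinja2 installed, only real template failures are classified as chat-template errors; without it, the generic fallback still lets the server start and report something sensible.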


Collaborator:

We don't want users to manually install dependencies, so adding jinja2 as a dependency is the straightforward way.

break
if isinstance(token, dict) and token.get("type") == "error":
yield self._generate_error_stream_chunk(rid, token.get("payload", {}))
continue
Collaborator:
break?

Collaborator (author):

The continue after emitting the error chunk is intentional: it keeps the loop alive so the handler can pull whatever comes next (typically the None sentinel) and finish the stream cleanly.

On None we break, exiting the loop and sending the final chunk + [DONE]. On an error dict we emit the SSE error chunk, then continue so we don't fall through to yield self._generate_stream_chunk(...) with a dict, and we keep waiting for the sentinel.

If we changed that continue to a break, an error would terminate the loop immediately: the client would miss the final [DONE], the request would never be marked as finished in the streaming path, and resources would leak if the sentinel is never consumed.
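The loop semantics described above can be sketched in a standalone form (the token values and SSE framing are simplified assumptions; in the real handler tokens arrive from the executor rather than a list):

```python
# Sketch of the streaming loop: None is the end-of-stream sentinel
# (break and emit [DONE]); an error dict yields an error chunk and then
# continues so the loop keeps draining until the sentinel arrives.
from typing import Any, Iterator, List


def stream(tokens: List[Any]) -> Iterator[str]:
    for token in tokens:
        if token is None:
            # Sentinel: close the stream cleanly with the final marker.
            yield "data: [DONE]\n\n"
            break
        if isinstance(token, dict) and token.get("type") == "error":
            # Emit the error chunk, then keep waiting for the sentinel so
            # the request is still marked finished in the streaming path.
            message = token.get("payload", {}).get("message", "")
            yield f'data: {{"error": "{message}"}}\n\n'
            continue
        # Normal path: a plain token becomes a regular SSE data chunk.
        yield f"data: {token}\n\n"


chunks = list(
    stream(["hello", {"type": "error", "payload": {"message": "bad template"}}, None])
)
```

Replacing that continue with a break would drop the trailing `data: [DONE]` chunk, which is exactly the leak scenario described above.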


detokenizer: StreamingDetokenizer = None
error_message: Optional[str] = None
error_type: Optional[str] = None
error_status: HTTPStatus = HTTPStatus.BAD_REQUEST
Collaborator:

Set the default to None?

Collaborator (author):

Not necessary: error_status is typed as HTTPStatus, and we default it to HTTPStatus.BAD_REQUEST so handle_executor_error() can always assign a valid status (or leave the default) and create_error_response() can later rely on a real HTTPStatus without adding None checks. Switching to None would force us to make the field Optional[HTTPStatus] and add guard code in every consumer, with no functional gain.
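The trade-off can be shown with a toy version of the state object; field names mirror the snippet above, while `handle_executor_error` here is a simplified stand-in for the real handler:

```python
# Toy sketch: keeping error_status typed as HTTPStatus with a concrete
# default means downstream code never needs a None guard.
from dataclasses import dataclass
from http import HTTPStatus
from typing import Optional


@dataclass
class RequestState:
    error_message: Optional[str] = None
    error_type: Optional[str] = None
    # Concrete default: create_error_response() can always rely on a real
    # HTTPStatus instead of checking for None first.
    error_status: HTTPStatus = HTTPStatus.BAD_REQUEST


def handle_executor_error(state: RequestState, payload: dict) -> None:
    state.error_message = payload.get("message")
    state.error_type = payload.get("error_type")
    # Override the default only when the executor reports a server-side fault.
    if payload.get("error_type") == "internal_error":
        state.error_status = HTTPStatus.INTERNAL_SERVER_ERROR


state = RequestState()
handle_executor_error(
    state, {"message": "bad chat template", "error_type": "template_error"}
)
```

Whether the default itself should be BAD_REQUEST or INTERNAL_SERVER_ERROR is exactly the question raised in the next comment; the typing argument is independent of which status is chosen.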

Collaborator:

Maybe defaulting to INTERNAL_SERVER_ERROR is better; BAD_REQUEST will confuse clients when the failure is not actually a bad request.



Development

Successfully merging this pull request may close these issues.

[Bug]: Failed requests cause the node to be unavailable
