
Conversation

@pan3793 (Member) commented Oct 31, 2025

What changes were proposed in this pull request?

This PR revises the following three execute* methods, as well as the getUpdateCount method, of SparkConnectStatement, all of which are defined in java.sql.Statement:

    /**
     * Executes the given SQL statement, which returns a single
     * {@code ResultSet} object.
     *<p>
     * <strong>Note:</strong>This method cannot be called on a
     * {@code PreparedStatement} or {@code CallableStatement}.
     * @param sql an SQL statement to be sent to the database, typically a
     *        static SQL {@code SELECT} statement
     * @return a {@code ResultSet} object that contains the data produced
     *         by the given query; never {@code null}
     * @throws SQLException if a database access error occurs,
     * this method is called on a closed {@code Statement}, the given
     *            SQL statement produces anything other than a single
     *            {@code ResultSet} object, the method is called on a
     * {@code PreparedStatement} or {@code CallableStatement}
     * @throws SQLTimeoutException when the driver has determined that the
     * timeout value that was specified by the {@code setQueryTimeout}
     * method has been exceeded and has at least attempted to cancel
     * the currently running {@code Statement}
     */
    ResultSet executeQuery(String sql) throws SQLException;

    /**
     * Executes the given SQL statement, which may be an {@code INSERT},
     * {@code UPDATE}, or {@code DELETE} statement or an
     * SQL statement that returns nothing, such as an SQL DDL statement.
     *<p>
     * <strong>Note:</strong>This method cannot be called on a
     * {@code PreparedStatement} or {@code CallableStatement}.
     * @param sql an SQL Data Manipulation Language (DML) statement, such as {@code INSERT}, {@code UPDATE} or
     * {@code DELETE}; or an SQL statement that returns nothing,
     * such as a DDL statement.
     *
     * @return either (1) the row count for SQL Data Manipulation Language (DML) statements
     *         or (2) 0 for SQL statements that return nothing
     *
     * @throws SQLException if a database access error occurs,
     * this method is called on a closed {@code Statement}, the given
     * SQL statement produces a {@code ResultSet} object, the method is called on a
     * {@code PreparedStatement} or {@code CallableStatement}
     * @throws SQLTimeoutException when the driver has determined that the
     * timeout value that was specified by the {@code setQueryTimeout}
     * method has been exceeded and has at least attempted to cancel
     * the currently running {@code Statement}
     */
    int executeUpdate(String sql) throws SQLException;

    /**
     * Executes the given SQL statement, which may return multiple results.
     * In some (uncommon) situations, a single SQL statement may return
     * multiple result sets and/or update counts.  Normally you can ignore
     * this unless you are (1) executing a stored procedure that you know may
     * return multiple results or (2) you are dynamically executing an
     * unknown SQL string.
     * <P>
     * The {@code execute} method executes an SQL statement and indicates the
     * form of the first result.  You must then use the methods
     * {@code getResultSet} or {@code getUpdateCount}
     * to retrieve the result, and {@code getMoreResults} to
     * move to any subsequent result(s).
     * <p>
     *<strong>Note:</strong>This method cannot be called on a
     * {@code PreparedStatement} or {@code CallableStatement}.
     * @param sql any SQL statement
     * @return {@code true} if the first result is a {@code ResultSet}
     *         object; {@code false} if it is an update count or there are
     *         no results
     * @throws SQLException if a database access error occurs,
     * this method is called on a closed {@code Statement},
     * the method is called on a
     * {@code PreparedStatement} or {@code CallableStatement}
     * @throws SQLTimeoutException when the driver has determined that the
     * timeout value that was specified by the {@code setQueryTimeout}
     * method has been exceeded and has at least attempted to cancel
     * the currently running {@code Statement}
     * @see #getResultSet
     * @see #getUpdateCount
     * @see #getMoreResults
     */
    boolean execute(String sql) throws SQLException;

    /**
     *  Retrieves the current result as an update count;
     *  if the result is a {@code ResultSet} object or there are no more results, -1
     *  is returned. This method should be called only once per result.
     *
     * @return the current result as an update count; -1 if the current result is a
     * {@code ResultSet} object or there are no more results
     * @throws SQLException if a database access error occurs or
     * this method is called on a closed {@code Statement}
     * @see #execute
     */
    int getUpdateCount() throws SQLException;
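
For illustration, here is a minimal caller-side sketch of the contract quoted above, i.e. how a JDBC client combines executeQuery, executeUpdate, execute, getResultSet, and getUpdateCount. The connection URL and table name are hypothetical and only serve to show the java.sql.Statement semantics; this is not the driver's own test code.

    import java.sql.DriverManager

    // hypothetical URL and SQL, purely to illustrate the java.sql.Statement contract
    val conn = DriverManager.getConnection("jdbc:example://localhost:15002")
    val stmt = conn.createStatement()

    // executeQuery must produce exactly one ResultSet
    val rs = stmt.executeQuery("SELECT 1 AS id")
    while (rs.next()) println(rs.getInt("id"))

    // executeUpdate must not produce a ResultSet; it returns a row count,
    // or 0 for statements that return nothing (e.g. DDL)
    val ddlCount = stmt.executeUpdate("CREATE TABLE t (id INT)")

    // execute accepts any SQL; the caller inspects the first result afterwards
    if (stmt.execute("SELECT 1")) {
      val firstResult = stmt.getResultSet   // first result is a ResultSet
    } else {
      val updateCount = stmt.getUpdateCount // -1 means there are no more results
    }

    stmt.close()
    conn.close()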

Why are the changes needed?

Make the implementation respect the JDBC API specification.

Does this PR introduce any user-facing change?

No, the Connect JDBC driver is an unreleased feature.

How was this patch tested?

New UTs are added.

Was this patch authored or co-authored using generative AI tooling?

No.

    if (resultSet != null) {
      -1
    } else {
      0 // always return 0 because affected rows is not supported yet
    }
pan3793 (Member, Author) commented:

This cannot be supported soon, because it requires non-trivial work on the classic sql module to add metrics for all data-writing commands, and changes to the Connect protocol to carry them back to the client.
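
To make the limitation concrete, here is a minimal caller-side sketch of the behavior described in this comment, assuming stmt is a SparkConnectStatement obtained from a Connect JDBC connection (the table name is made up):

    // hypothetical usage; the INSERT target table is made up
    stmt.execute("INSERT INTO t VALUES (1)") // a command that produces no ResultSet
    val count = stmt.getUpdateCount
    // `count` is 0 for now: the classic sql module does not expose affected-row metrics
    // for write commands, and the Connect protocol cannot carry them back to the client yet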

    val df = conn.spark.sql(sql)
    val sparkResult = df.collectResult()
    operationId = sparkResult.operationId
    if (hasResultSet(sparkResult)) {
      // ...
pan3793 (Member, Author) commented:

I checked some well-known JDBC drivers, and their implementations fall into the following categories:

  • keep a simple driver-side SQL parser that classifies the SQL type (e.g., DML vs. DDL) and decides from that type whether the statement will return a result set
  • Trino does this better: the server determines the query type in the analysis phase and returns that information to the client before execution
  • blindly execute the query and check the returned result

Here we use the last approach. (For contrast, a naive sketch of the first approach follows below.)
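
Below is a naive sketch of the first category, a driver-side keyword classifier. The helper name and keyword list are hypothetical; this is not what the PR implements, and it only shows how brittle such a heuristic can be (comments, CTEs, dialect-specific statements):

    // purely illustrative, hypothetical helper: classify by the leading SQL keyword
    def probablyReturnsResultSet(sql: String): Boolean = {
      val keyword = sql.trim.split("\\s+").headOption.getOrElse("").toUpperCase
      Set("SELECT", "SHOW", "DESCRIBE", "DESC", "EXPLAIN", "VALUES", "WITH").contains(keyword)
    }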

    private def hasResultSet(sparkResult: SparkResult[_]): Boolean = {
      // suppose this works in most cases
      sparkResult.schema.length > 0
    }
pan3793 (Member, Author) commented:

I didn't find a counterexample after thinking it over briefly; please let me know if you have a more reliable approach.
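
As a rough illustration of what the heuristic implies, assuming a Connect SparkSession named spark is in scope and the typical behavior that commands such as DDL produce an empty schema (these expectations are illustrative, not asserted by the PR):

    // illustrative only; mirrors the schema-length heuristic above
    val queryCols = spark.sql("SELECT 1").schema.length                // 1 -> treated as a result set
    val ddlCols   = spark.sql("CREATE TABLE t (id INT)").schema.length // 0 -> treated as no result set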

pan3793 commented Oct 31, 2025

cc @LuciferYang
