[SPARK-54108][CONNECT] Revise execute* methods of SparkConnectStatement #52810
```scala
if (resultSet != null) {
  -1
} else {
  0 // always return 0 because affected rows is not supported yet
```
This cannot be supported soon: it requires non-trivial work in the classic SQL module to add metrics for all data-writing commands, plus changes to the Connect protocol to carry those metrics back to the client.
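Until then, the driver can only honor the `java.sql.Statement#getUpdateCount` contract shape: `-1` when the current result is a `ResultSet`, and a placeholder `0` otherwise. A minimal sketch of that logic, using a hypothetical `StatementSketch` class with a `resultSet` field standing in for the driver's internal state:

```scala
// Hypothetical stand-in for the driver's statement state; not the actual
// SparkConnectStatement class.
class StatementSketch {
  // Non-null when the last execution produced rows.
  var resultSet: AnyRef = null

  def getUpdateCount: Int =
    if (resultSet != null) {
      -1 // JDBC: -1 when the current result is a ResultSet
    } else {
      0 // placeholder: affected-row counts are not propagated yet
    }
}
```

Once the Connect protocol can carry write metrics, the `0` branch would return the real affected-row count instead.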
```scala
val df = conn.spark.sql(sql)
val sparkResult = df.collectResult()
operationId = sparkResult.operationId
if (hasResultSet(sparkResult)) {
```
I checked some well-known JDBC drivers; their implementations fall into two classes:
- Keep a simple driver-side SQL parser that classifies the statement type (e.g., DML vs. DDL) and decide from that type whether the statement returns a result set.
  - Trino does this better: the server determines the query type during the analysis phase and returns that information to the client before execution.
- Blindly execute the query and check whether a result set comes back.

Here we use the latter approach.
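For reference, the first approach amounts to a tiny keyword-prefix classifier on the client. A hedged sketch (the object name, the keyword set, and the parsing are all illustrative, not part of this PR; real drivers handle comments, parentheses, and CTEs far more carefully):

```scala
// Hypothetical driver-side classifier: guess from the leading keyword
// whether a statement produces rows. Deliberately naive.
object SqlKind {
  private val ReturnsRows =
    Set("SELECT", "WITH", "SHOW", "DESCRIBE", "DESC", "EXPLAIN", "VALUES")

  def producesResultSet(sql: String): Boolean = {
    // Take the first whitespace-delimited token and compare case-insensitively.
    val head = sql.trim.takeWhile(!_.isWhitespace)
    ReturnsRows.contains(head.toUpperCase(java.util.Locale.ROOT))
  }
}
```

The weakness is exactly why this PR avoids it: the client-side guess can disagree with what the server actually returns.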
```scala
  0
private def hasResultSet(sparkResult: SparkResult[_]): Boolean = {
  // suppose this works in most cases
  sparkResult.schema.length > 0
```
I couldn't find a counterexample after thinking it through quickly; please let me know if you have a more reliable approach.
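The heuristic assumes commands (INSERT, CREATE, SET, ...) come back with an empty schema while queries always have at least one output column. A self-contained sketch of just that check, using a hypothetical `MiniResult` stand-in for Connect's `SparkResult` (only the column list matters here):

```scala
// Hypothetical stand-in: only models the schema width of a SparkResult.
final case class MiniResult(columnNames: Seq[String])

// The heuristic under discussion: rows exist iff the schema is non-empty.
def hasResultSet(r: MiniResult): Boolean = r.columnNames.nonEmpty
```

If any command were ever given a non-empty output schema (or a query a zero-column one), this heuristic would misclassify it, which is the counterexample being asked about.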
cc @LuciferYang
What changes were proposed in this pull request?
This PR revises the following 3 `execute*` methods, plus the additional `getUpdateCount` method, of `SparkConnectStatement`, all of which are defined in `java.sql.Statement`.

Why are the changes needed?
Make the implementation respect the JDBC API specification.
Does this PR introduce any user-facing change?
No, Connect JDBC Driver is an unreleased feature.
How was this patch tested?
New UTs are added.
Was this patch authored or co-authored using generative AI tooling?
No.