-
Notifications
You must be signed in to change notification settings - Fork 23
feat: add support for pyspark equivalent explode_outer, posexplode/posexplode_outer (via explode_with_index)
#249
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: feat/regexp-text-functions
Are you sure you want to change the base?
Conversation
|
Warning This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
This stack of pull requests is managed by Graphite. Learn more about stacking. |
0dd2fab to
082e2db
Compare
082e2db to
fffb538
Compare
7a31f76 to
10c1eb7
Compare
fffb538 to
3c163b7
Compare
cbe46cd to
8c3ed51
Compare
3c163b7 to
d12deee
Compare
8c3ed51 to
9dd1148
Compare
d12deee to
fa21928
Compare
9dd1148 to
718a5b9
Compare
fa21928 to
796dbb9
Compare
eb9d019 to
8771644
Compare
9eeb753 to
e79ec74
Compare
8771644 to
1c5aeb9
Compare
e79ec74 to
0c8d0aa
Compare
1c5aeb9 to
47c7af9
Compare
0c8d0aa to
babcf09
Compare
47c7af9 to
4ed937b
Compare
YoungVor
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nearly finished a pass. Implementation and tests looks great, had a few comments on the API. I'll finish my pass tomorrow
| self._session_state, | ||
| ) | ||
|
|
||
| def explode_with_index( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe posexplode_with_fields?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I don't think so -- this is just a superset of the functionality of those two functions, so it makes sense to keep it named separately, at least imo.
src/fenic/api/dataframe/dataframe.py
Outdated
| def explode_with_index( | ||
| self, | ||
| column: ColumnOrName, | ||
| index_name: str = "index", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any reason we couldn't keep everything as 'pos', so pos_name, and default to 'pos'? Its slightly strange to me to have the descriptions and default depart from the posexplode functions
in the example you could show that you can change it to index.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think it makes sense for the names of this function to be internally consistent (index), but I think it does make sense to change the default to pos to match the existing functions.
| column: Name of array column to explode (as string) or Column expression. | ||
| index_name: Name for the column containing 0-based array positions (default: "index"). | ||
| value_name: Optional name for the exploded value column. If None, uses the original column name. | ||
| keep_null_and_empty: If True, preserves rows where the array is null or empty (default: False). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could call this 'outer', or at least mention that behavior would mimick explode/explode_outer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added clarification. I think this is a more explicit name, but mentioned that it is the same as regular posexplode (False) or posexplode_outer (True).
|
Finished my pass, looks great! I love the consolidation of code with a single explode_with_index logical expression. A few nits, but mostly around the API documentation |
babcf09 to
6813598
Compare
4ed937b to
f048b72
Compare
6813598 to
60cf0bd
Compare
f048b72 to
05bda1e
Compare
60cf0bd to
26c24d7
Compare
05bda1e to
6140a8a
Compare
26c24d7 to
335c273
Compare
221d3f5 to
6a43034
Compare
88d048b to
27558aa
Compare
6a43034 to
615972c
Compare
27558aa to
83420e9
Compare
615972c to
2855f73
Compare
83420e9 to
404bb85
Compare
2855f73 to
40feadc
Compare
…`/`posexplode_outer` (via explode_with_index)
404bb85 to
67e400a
Compare
d531650 to
fbfaa0f
Compare
fbfaa0f to
686ce7f
Compare

TL;DR
Add outer explode support that preserves null/empty arrays, plus position-aware variants. No breaking changes; default explode behavior unchanged.
What’s included
explode_outer(col),posexplode_outer(col)keep_null_and_empty: boolonExplodeandExplodeWithIndexlogical planskeep_null_and_empty:explode_with_index(col, index_name="index", value_name=None, keep_null_and_empty=False)posexplode(col)andposexplode_outer(col)ExplodeWithIndexmessageExplodeandExplodeWithIndexcarrykeep_null_and_emptyvalue_namehandled correctly (None when omitted)ExplodeWithIndex(default/custom names, outer)Behavior details
Compatibility
posexplode/posexplode_outer(0-based indices here; position nulls for outer)Examples (with results)
Testing and quality