docs: s3 tables #292
Deploying localstack-docs
| Latest commit: | 9b0fd1d |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://902cbd9a.localstack-docs.pages.dev |
| Branch Preview URL: | https://s3tables.localstack-docs.pages.dev |
So nice to add this documentation now! @hovaesco did a phenomenal job implementing this service, and he has way more knowledge than me on managed Iceberg tables, so I'll let him give the final approval stamp.
I've shared what I know and the confusion around S3 Tables. The tutorial and the rest look really good; thanks a lot for adding this quickly! 🚀
| ## Introduction |
| Amazon S3 Tables are specialized S3 buckets for managing tabular data (for example, Apache Iceberg tables) with built-in maintenance features like automatic compaction and snapshot management. |
@hovaesco correct me if I'm wrong, but I feel like S3 Tables is more of a "catalog" that takes care of creating the underlying S3 buckets for you transparently, without you having to deal with them at all.
Saying that it's a managed Apache Iceberg solution using S3 storage might be a bit clearer and remove the confusion around S3 buckets altogether?
An "S3 Tables Bucket" is actually a collection of real S3 buckets, one per table.
Overall I think it might look like: S3Tables Namespace -> S3TablesBucket -> S3TablesTable -> S3 Bucket
(This is mostly just for context sharing, no need to write about the line above)
Correct. Maybe also don't say "for example, Apache Iceberg tables", because Iceberg is the only format supported now.
| { |
| "versionToken": "0c0c1509", |
| "warehouseLocation": "s3://hqpdve6ni1lb7w5bdn24lruswomtsh5bdrw66oip--table-s3" |
| } |
@hovaesco this is what you meant in the previous PR, right? S3 Tables will not return the MetadataLocation field if you did not execute any Iceberg request against it? So this response does not look too good, as it doesn't actually contain the metadata location? Or am I fully off track here?
good catch, so after merging my latest PR the output would be:
awslocal s3tables get-table-metadata-location \
--table-bucket-arn arn:aws:s3tables:us-east-1:000000000000:bucket/my-table-bucket \
--namespace my_namespace \
--name my_table
{
"versionToken": "b69a6fbb",
"metadataLocation": "s3://893wylknn9utyhrcv7xzlz5a9acqh1vn480tupa5--table-s3/metadata/00000-b6d96c57-403a-4387-ac59-ec55ac2e646b.metadata.json",
"warehouseLocation": "s3://893wylknn9utyhrcv7xzlz5a9acqh1vn480tupa5--table-s3"
}
The AWS S3 Tables service doesn't return metadataLocation (see https://github.com/localstack/localstack-pro/blob/1e9aef0522806c4974e5605b585f9d528e16504a/localstack-pro-core/tests/aws/services/s3tables/test_s3tables.snapshot.json#L364-L366), which is why I added a skip on this field. Without it being returned, PyIceberg does not work correctly. To me it seems to be a bug on the AWS side, but I will investigate it further.
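For anyone following this thread, here is a minimal sketch of how a client could guard against the missing field. It assumes only the two response shapes quoted in this conversation; `metadata_location` is a hypothetical helper written for illustration and is not part of PyIceberg, boto3, or LocalStack.

```python
import json
from typing import Optional


def metadata_location(response: dict) -> Optional[str]:
    """Return the Iceberg metadata location, or None if the field is absent."""
    # Hypothetical helper: use .get() because the field may be missing
    # from the response, as in the first payload below.
    return response.get("metadataLocation")


# Response without the field, copied from the docs snippet under review.
without_field = json.loads("""
{
  "versionToken": "0c0c1509",
  "warehouseLocation": "s3://hqpdve6ni1lb7w5bdn24lruswomtsh5bdrw66oip--table-s3"
}
""")

# Response with the field, copied from the comment above.
with_field = json.loads("""
{
  "versionToken": "b69a6fbb",
  "metadataLocation": "s3://893wylknn9utyhrcv7xzlz5a9acqh1vn480tupa5--table-s3/metadata/00000-b6d96c57-403a-4387-ac59-ec55ac2e646b.metadata.json",
  "warehouseLocation": "s3://893wylknn9utyhrcv7xzlz5a9acqh1vn480tupa5--table-s3"
}
""")

for label, response in (("without", without_field), ("with", with_field)):
    location = metadata_location(response)
    if location is None:
        print(f"{label}: no metadataLocation in the response")
    else:
        print(f"{label}: {location}")
```

Returning `None` rather than raising keeps the decision about how to handle a table with no metadata file with the caller, which seems to fit the uncertainty described above about which behavior is correct.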
LGTM 🚀 % added some comments
| ## API Coverage |
| <FeatureCoverage service="s3tables" client:load /> |
How is this generated? It says that the DeleteTableBucketPolicy and GetTablePolicy operations are supported, but that's not true, and there are other operations that are not supported either.
| --- |
| title: "S3 Tables" |
| description: Get started with Amazon S3 Tables on LocalStack |
| persistence: supported |
Not sure whether persistence is supported; we don't run any tests to verify that. @bentsku, could you help here and clarify what is required to support persistence in an AWS service?
Sure! By default, persistence would be enabled for the "control plane" via the store (we have some magic in place to automatically pick up stores), and here the "data plane" is in S3 so it should work by default.
You can test it with our persistence tests suite framework. I'll send you our internal docs 👍
| You can also create a table within the namespace. |
| Run the following command to create a table named `my_table` within the namespace `my_namespace`: |
The AWS API doc link is missing here, although it's present in the other sections (CreateNamespace, CreateTableBucket).
Hey @HarshCasper, have you addressed all feedback from @hovaesco? I didn't want to do my review until that technical round of reviews was done. 😸