From fe8056ffb6e0c5ce8a9a7cc6d795c2eb05675aec Mon Sep 17 00:00:00 2001 From: serramatutu Date: Thu, 30 Oct 2025 12:20:51 +0100 Subject: [PATCH 1/7] Add Timestamp With Offset canonical extension type --- docs/source/format/CanonicalExtensions.rst | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) diff --git a/docs/source/format/CanonicalExtensions.rst b/docs/source/format/CanonicalExtensions.rst index 8608a6388e0..e6a6cf64e34 100644 --- a/docs/source/format/CanonicalExtensions.rst +++ b/docs/source/format/CanonicalExtensions.rst @@ -483,6 +483,28 @@ binary values look like. .. _variant_primitive_type_mapping: +Timestamp With Offset +============= +This type represents a timestamp column that stores potentially different timezone offsets per value. The timestamp is stored in UTC alongside the original timezone offset in minutes. + +* Extension name: ``arrow.timestamp_with_offset``. + +* The storage type of the extension is a ``Struct`` with 2 fields, in order: + + * ``timestamp``: a non-nullable ``Timestamp(time_unit, "UTC")``, where ``time_unit`` is any Arrow ``TimeUnit`` (s, ms, us or ns). + + * ``offset_minutes``: a non-nullable signed 16-bit integer (``Int16``) representing the offset in minutes from the UTC timezone. Negative offsets represent time zones west of UTC, while positive offsets represent east. Offsets range from -779 (-12:59) to +780 (+13:00). + +* Extension type parameters: + + * ``time_unit``: the time-unit of each of the stored UTC timestamps. + +* Description of the serialization: + + Extension metadata is an empty string. + + When de/serializing to/from JSON, this type must be represented as an RFC3339 string, respecting the ``TimeUnit`` precision and time zone offset without loss of information. For example ``2025-01-01T00:00:00Z`` represents January 1st 2025 in UTC with second precision, and ``2025-01-01T00:00:00.000000001-07:00`` represents one nanosecond after January 1st 2025 in UTC-07. 
+ Primitive Type Mappings ----------------------- From 080220a36401e6468f381c241c6dae245c76967d Mon Sep 17 00:00:00 2001 From: serramatutu Date: Mon, 3 Nov 2025 14:19:23 +0100 Subject: [PATCH 2/7] fixe: move timestamp offset to its own section --- docs/source/format/CanonicalExtensions.rst | 44 +++++++++++----------- 1 file changed, 22 insertions(+), 22 deletions(-) diff --git a/docs/source/format/CanonicalExtensions.rst b/docs/source/format/CanonicalExtensions.rst index e6a6cf64e34..383edf5f949 100644 --- a/docs/source/format/CanonicalExtensions.rst +++ b/docs/source/format/CanonicalExtensions.rst @@ -483,28 +483,6 @@ binary values look like. .. _variant_primitive_type_mapping: -Timestamp With Offset -============= -This type represents a timestamp column that stores potentially different timezone offsets per value. The timestamp is stored in UTC alongside the original timezone offset in minutes. - -* Extension name: ``arrow.timestamp_with_offset``. - -* The storage type of the extension is a ``Struct`` with 2 fields, in order: - - * ``timestamp``: a non-nullable ``Timestamp(time_unit, "UTC")``, where ``time_unit`` is any Arrow ``TimeUnit`` (s, ms, us or ns). - - * ``offset_minutes``: a non-nullable signed 16-bit integer (``Int16``) representing the offset in minutes from the UTC timezone. Negative offsets represent time zones west of UTC, while positive offsets represent east. Offsets range from -779 (-12:59) to +780 (+13:00). - -* Extension type parameters: - - * ``time_unit``: the time-unit of each of the stored UTC timestamps. - -* Description of the serialization: - - Extension metadata is an empty string. - - When de/serializing to/from JSON, this type must be represented as an RFC3339 string, respecting the ``TimeUnit`` precision and time zone offset without loss of information. 
For example ``2025-01-01T00:00:00Z`` represents January 1st 2025 in UTC with second precision, and ``2025-01-01T00:00:00.000000001-07:00`` represents one nanosecond after January 1st 2025 in UTC-07. - Primitive Type Mappings ----------------------- @@ -566,6 +544,28 @@ Primitive Type Mappings | UUID extension type | UUID | +----------------------+------------------------+ +Timestamp With Offset +============= +This type represents a timestamp column that stores potentially different timezone offsets per value. The timestamp is stored in UTC alongside the original timezone offset in minutes. + +* Extension name: ``arrow.timestamp_with_offset``. + +* The storage type of the extension is a ``Struct`` with 2 fields, in order: + + * ``timestamp``: a non-nullable ``Timestamp(time_unit, "UTC")``, where ``time_unit`` is any Arrow ``TimeUnit`` (s, ms, us or ns). + + * ``offset_minutes``: a non-nullable signed 16-bit integer (``Int16``) representing the offset in minutes from the UTC timezone. Negative offsets represent time zones west of UTC, while positive offsets represent east. Offsets range from -779 (-12:59) to +780 (+13:00). + +* Extension type parameters: + + * ``time_unit``: the time-unit of each of the stored UTC timestamps. + +* Description of the serialization: + + Extension metadata is an empty string. + + When de/serializing to/from JSON, this type must be represented as an RFC3339 string, respecting the ``TimeUnit`` precision and time zone offset without loss of information. For example ``2025-01-01T00:00:00Z`` represents January 1st 2025 in UTC with second precision, and ``2025-01-01T00:00:00.000000001-07:00`` represents one nanosecond after January 1st 2025 in UTC-07. 
+ Community Extension Types ========================= From d8b900fd29603a34670ea2e45ae4f8eb5cbc8432 Mon Sep 17 00:00:00 2001 From: serramatutu Date: Mon, 3 Nov 2025 14:21:51 +0100 Subject: [PATCH 3/7] Add note about compat with ANSI SQL --- docs/source/format/CanonicalExtensions.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/docs/source/format/CanonicalExtensions.rst b/docs/source/format/CanonicalExtensions.rst index 383edf5f949..a45560e2a9b 100644 --- a/docs/source/format/CanonicalExtensions.rst +++ b/docs/source/format/CanonicalExtensions.rst @@ -547,6 +547,7 @@ Primitive Type Mappings Timestamp With Offset ============= This type represents a timestamp column that stores potentially different timezone offsets per value. The timestamp is stored in UTC alongside the original timezone offset in minutes. +This extension type is intended to be compatible with ANSI SQL's ``TIMESTAMP WITH TIME ZONE``, which is supported by multiple database engines. * Extension name: ``arrow.timestamp_with_offset``. From d9a85aa190a61d8fc9e9e95d2bebd5bf55370812 Mon Sep 17 00:00:00 2001 From: serramatutu Date: Fri, 7 Nov 2025 14:32:33 +0100 Subject: [PATCH 4/7] Make RFC3339 JSON a recommendation, not requirement --- docs/source/format/CanonicalExtensions.rst | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/docs/source/format/CanonicalExtensions.rst b/docs/source/format/CanonicalExtensions.rst index a45560e2a9b..7dc40e07f9c 100644 --- a/docs/source/format/CanonicalExtensions.rst +++ b/docs/source/format/CanonicalExtensions.rst @@ -565,7 +565,11 @@ This extension type is intended to be compatible with ANSI SQL's ``TIMESTAMP WIT Extension metadata is an empty string. - When de/serializing to/from JSON, this type must be represented as an RFC3339 string, respecting the ``TimeUnit`` precision and time zone offset without loss of information. 
For example ``2025-01-01T00:00:00Z`` represents January 1st 2025 in UTC with second precision, and ``2025-01-01T00:00:00.000000001-07:00`` represents one nanosecond after January 1st 2025 in UTC-07. +.. note:: + + Although not required, it is recommended that implementations represent this type as an RFC3339 string when de/serializing to/from JSON, respecting the ``TimeUnit`` precision and time zone offset without loss of information. For example ``2025-01-01T00:00:00Z`` represents January 1st 2025 in UTC with second precision, and ``2025-01-01T00:00:00.000000001-07:00`` represents one nanosecond after January 1st 2025 in UTC-07. + + The rationale behind this recommendation is that many programming languages provide support for parsing RFC3339 out of the box, facilitating consumption of timezone-aware JSON-encoded arrow arrays without extra boilerplate just for integrating with Arrow. Community Extension Types ========================= From 1bb212f13652947a363cfe2659725a71f3df9845 Mon Sep 17 00:00:00 2001 From: serramatutu Date: Fri, 7 Nov 2025 14:33:27 +0100 Subject: [PATCH 5/7] Add header to make it similar to other extensions --- docs/source/format/CanonicalExtensions.rst | 2 ++ 1 file changed, 2 insertions(+) diff --git a/docs/source/format/CanonicalExtensions.rst b/docs/source/format/CanonicalExtensions.rst index 7dc40e07f9c..c02fc50b247 100644 --- a/docs/source/format/CanonicalExtensions.rst +++ b/docs/source/format/CanonicalExtensions.rst @@ -544,6 +544,8 @@ Primitive Type Mappings | UUID extension type | UUID | +----------------------+------------------------+ +.. _timestamp_with_offset_extension: + Timestamp With Offset ============= This type represents a timestamp column that stores potentially different timezone offsets per value. The timestamp is stored in UTC alongside the original timezone offset in minutes. 
From 06ffdf958c34405918300789dd88a1b6ac5684ee Mon Sep 17 00:00:00 2001 From: serramatutu Date: Fri, 7 Nov 2025 14:43:37 +0100 Subject: [PATCH 6/7] Allow dictionary and run-end encodings for the offset --- docs/source/format/CanonicalExtensions.rst | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/docs/source/format/CanonicalExtensions.rst b/docs/source/format/CanonicalExtensions.rst index c02fc50b247..ae769143922 100644 --- a/docs/source/format/CanonicalExtensions.rst +++ b/docs/source/format/CanonicalExtensions.rst @@ -569,9 +569,13 @@ This extension type is intended to be compatible with ANSI SQL's ``TIMESTAMP WIT .. note:: - Although not required, it is recommended that implementations represent this type as an RFC3339 string when de/serializing to/from JSON, respecting the ``TimeUnit`` precision and time zone offset without loss of information. For example ``2025-01-01T00:00:00Z`` represents January 1st 2025 in UTC with second precision, and ``2025-01-01T00:00:00.000000001-07:00`` represents one nanosecond after January 1st 2025 in UTC-07. + It is also *permissible* for the ``offset_minutes`` field to be dictionary-encoded with a preferred (*but not required*) index type of ``int8``, or run-end-encoded with a preferred (*but not required*) runs type of ``int8``. - The rationale behind this recommendation is that many programming languages provide support for parsing RFC3339 out of the box, facilitating consumption of timezone-aware JSON-encoded arrow arrays without extra boilerplate just for integrating with Arrow. +.. note:: + + Although not required, it is *recommended* that implementations represent this type as an RFC3339 string when de/serializing to/from JSON, respecting the ``TimeUnit`` precision and time zone offset without loss of information. 
For example ``2025-01-01T00:00:00Z`` represents January 1st 2025 in UTC with second precision, and ``2025-01-01T00:00:00.000000001-07:00`` represents one nanosecond after January 1st 2025 in UTC-07. + + The rationale behind this recommendation is that many programming languages provide support for parsing RFC3339 out of the box, facilitating consumption of timezone-aware JSON-encoded Arrow arrays without extra boilerplate just for integrating with Arrow. Community Extension Types ========================= From 7520684cddda570eded48c0bf690ac6788a80eb3 Mon Sep 17 00:00:00 2001 From: Lucas Valente Date: Tue, 11 Nov 2025 07:32:51 +0100 Subject: [PATCH 7/7] lidavidm's suggestion Co-authored-by: David Li --- docs/source/format/CanonicalExtensions.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/source/format/CanonicalExtensions.rst b/docs/source/format/CanonicalExtensions.rst index ae769143922..d63d19d181d 100644 --- a/docs/source/format/CanonicalExtensions.rst +++ b/docs/source/format/CanonicalExtensions.rst @@ -573,9 +573,9 @@ This extension type is intended to be compatible with ANSI SQL's ``TIMESTAMP WIT .. note:: - Although not required, it is *recommended* that implementations represent this type as an RFC3339 string when de/serializing to/from JSON, respecting the ``TimeUnit`` precision and time zone offset without loss of information. For example ``2025-01-01T00:00:00Z`` represents January 1st 2025 in UTC with second precision, and ``2025-01-01T00:00:00.000000001-07:00`` represents one nanosecond after January 1st 2025 in UTC-07. + Although not required, it is *recommended* that implementations represent this type as an RFC3339 string when de/serializing to/from JSON, respecting the ``TimeUnit`` precision and time zone offset without loss of information. 
For example ``2025-01-01T00:00:00Z`` represents January 1st 2025 in UTC with second precision, and ``2025-01-01T00:00:00.000000001-07:00`` represents one nanosecond after January 1st 2025 in UTC-07. - The rationale behind this recommendation is that many programming languages provide support for parsing RFC3339 out of the box, facilitating consumption of timezone-aware JSON-encoded Arrow arrays without extra boilerplate just for integrating with Arrow. + The rationale behind this recommendation is that many programming languages provide support for parsing RFC3339 out of the box, facilitating consumption of timezone-aware JSON-encoded Arrow arrays without extra boilerplate just for integrating with Arrow. Community Extension Types =========================
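
As an illustration of the JSON recommendation in the final note, here is a minimal sketch of how a consumer might render one stored value (the UTC ``timestamp`` plus ``offset_minutes``) as an RFC3339 string. It uses only the Python standard library. The helper name ``to_rfc3339``, its signature, and the choice to print ``Z`` for a zero offset are assumptions made for illustration; none of them comes from the patch or from any Arrow implementation. The offset bounds are the ones stated in the patch text.

```python
from datetime import datetime, timedelta

# Fractional-second digits implied by each Arrow TimeUnit.
_DIGITS = {"s": 0, "ms": 3, "us": 6, "ns": 9}


def to_rfc3339(value: int, unit: str, offset_minutes: int) -> str:
    """Render a UTC epoch value plus an offset in minutes as RFC3339.

    Hypothetical helper: `value` is the raw integer from the `timestamp`
    field, `unit` is one of "s", "ms", "us", "ns".
    """
    # Range taken from the patch text: -779 (-12:59) to +780 (+13:00).
    if not -779 <= offset_minutes <= 780:
        raise ValueError(f"offset_minutes out of range: {offset_minutes}")

    digits = _DIGITS[unit]
    # Split into whole seconds and a non-negative sub-second remainder.
    # Integer arithmetic keeps nanoseconds exact.
    seconds, fraction = divmod(value, 10 ** digits)

    # Shift the UTC instant to the local wall-clock time for display.
    local = datetime(1970, 1, 1) + timedelta(seconds=seconds, minutes=offset_minutes)
    text = local.isoformat(timespec="seconds")
    if digits:
        text += f".{fraction:0{digits}d}"

    # Assumption: a zero offset is written as "Z", as in the patch's example.
    if offset_minutes == 0:
        return text + "Z"
    sign = "-" if offset_minutes < 0 else "+"
    hours, minutes = divmod(abs(offset_minutes), 60)
    return f"{text}{sign}{hours:02d}:{minutes:02d}"


# The two examples from the note, expressed as stored values.
print(to_rfc3339(1735689600, "s", 0))
print(to_rfc3339(1735714800000000001, "ns", -420))
```

The sub-second digits are formatted by hand because ``datetime`` only carries microsecond precision, so passing a nanosecond value through it would lose information, which the note asks implementations to avoid.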