API: to_datetime strings default to microsecond #62801
Conversation
gentle ping @jorisvandenbossche @rhshadrach @mroeschke since this is a blocker for several 3.0-milestone issues
Do we have tests with multiple strings with different precision correctly returning nano if there's a nano string otherwise micro?
Didn't see one in a quick pass. Will add where appropriate.
+1
```python
if unit in ["s", "ms"]:
    # TODO: should _cast_pointwise_result attempt to preserve unit?
    xp = xp.dt.as_unit("us")
```
What is meant here by "preserve unit"? It seems to me the current rule is to surface the lowest resolution of the inputs, which I believe is sensible.
Consider:

```python
series_with_dt64_ms_unit.map(lambda x: pd.Timestamp("2016-01-01"))
```
What dtype would you expect the result to have? In main it will be "s". After this PR it would be "us". This comment is suggesting that since the original has "ms" unit, we might try to retain "ms" in the mapped result.
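To make the scenario above concrete, here is a minimal sketch (the variable names are illustrative; the resulting unit depends on the pandas version, as described above):

```python
import pandas as pd

# Build a Series with an explicit "ms" unit.
ser = pd.Series([pd.Timestamp("2016-01-01")]).astype("datetime64[ms]")

# Map every element to a fresh Timestamp; the result's unit comes from
# how pandas infers the callable's return values, not from ser's "ms" unit.
result = ser.map(lambda x: pd.Timestamp("2016-01-01"))
print(result.dtype)  # datetime64[...] -- exact unit varies by pandas version
```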
I don't think we should let the input influence the result here. If we did, then the result of
```python
pd.Series([pd.Timestamp("2016-01-01").as_unit("ms")]).map(lambda x: x.as_unit("us"))
```

would also be "ms".
OK. Will remove this comment next time I push.
This is a bit of a different case, though?
In your example with lambda x: pd.Timestamp("2016-01-01"), the return value is an actual timestamp with a unit, that we might ignore to coerce to the original unit (but agreed we shouldn't do that in a context of map).
But the test case here is about combining a timestamp and a string. At that point, I think it could make sense to be flexible in how the string is parsed to a timestamp, so as to preserve the unit. Although no strong opinion about it.
That said, what I was actually wondering in this case: if we are combining timestamps and strings, shouldn't that give object dtype as a result instead?
Like concat of timestamps and strings also gives object. Or do we consider combine_first more like a setitem operation? But then it would actually make sense to preserve the unit of the left side, as we would do for left[mask] = right[mask]
> That said, what I was actually wondering in this case: if we are combining timestamps and strings, shouldn't that give object dtype as a result instead?
Yah, this is driven by a line that I think is weird/bad in Series.combine_first:
```python
if this.dtype.kind == "M" and other.dtype.kind != "M":
    # TODO: try to match resos?
    other = to_datetime(other)
```
when we get here in this test we have `other = Series({1: "2011"})`, i.e. `dtype='str'`
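A minimal reproduction of the path being discussed (the values are illustrative; note the `to_datetime` call here is slated for deprecation in #62931):

```python
import pandas as pd

# left has datetime64 dtype; right is a string Series, as in the test case above
left = pd.Series([pd.Timestamp("2016-01-01"), pd.NaT])
right = pd.Series({1: "2011"})

# combine_first currently runs `right` through to_datetime before filling,
# so the result stays datetime64 rather than becoming object dtype
out = left.combine_first(right)
print(out.dtype)
```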
opened #62931 to deprecate this to_datetime call
@jbrockmendel can you clarify a bit more what the PR exactly does? The title mentions "to_datetime strings", so does that mean other input stays as it was?
Updated OP, LMK if it is still unclear.
Thanks! For numeric input we can't just change the default resolution with which it is interpreted (since that would change the result, not just the resolution), but we could still cast the resulting nanoseconds to microseconds (if not out-of-bounds). But fine leaving that for a separate PR / discussion.
Very happy to leave that for separate discussion.
I think the "natural" resolution on a date object is "D", so it makes sense to treat this like a
So my only concern here is if someone has code that converts time objects to integers and assumes that those integers represent nanoseconds. e.g. If we go forward with this, then we need to say something about this in the docs, and demonstrate via an example what the best practice is for converting such code.
I'm wondering if in user_guide/timeseries.rst, we should have some text that says that the integer representation of the underlying Timestamp should NOT be used in computation, and provide a recommendation of what to do instead. What's not clear to me is: if you have a Timestamp and you use Timestamp.value, how do you know what the unit of the value is, in case you do want to do arithmetic on the integer representation?
Timestamp.value already converts to nano
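A quick sketch of that point (assuming pandas >= 2.0, where non-nanosecond units and `Timestamp.as_unit` exist):

```python
import pandas as pd

ts = pd.Timestamp("2016-01-01 00:00:01").as_unit("s")
print(ts.unit)   # "s"
print(ts.value)  # nanoseconds since the epoch, regardless of ts.unit
```

So `.value` has a fixed meaning (nanoseconds), as long as the timestamp fits within nanosecond bounds.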
Unless it's out of bounds, it converts to seconds. So is the real issue here that if a user has … Or should we warn on … I guess I don't fully understand what kind of user code will break with this change.
Where are you seeing this? I don't think it's accurate.
I argued for that long ago and lost that argument. |
```python
>>> pd.Series([pd.Timestamp("3000-10-31 8:30")]).astype(int)
0    32529889800
dtype: int64
>>> pd.Series([pd.Timestamp("3000-10-31 8:30:02.04")]).astype(int)
0    32529889802040
dtype: int64
```

This is without this PR (using main). So it seems the integer value you get depends on the underlying unit as determined when the string is converted.
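The same unit dependence can be shown with in-bounds values by setting the unit explicitly (a sketch assuming pandas >= 2.0; `.dt.as_unit` is used to normalize before the integer conversion):

```python
import pandas as pd

# astype(int) returns the raw integer value in the underlying unit:
ser = pd.Series([pd.Timestamp("2016-01-01")]).astype("datetime64[s]")
print(ser.astype("int64").iloc[0])                   # 1451606400 (seconds)
print(ser.dt.as_unit("ns").astype("int64").iloc[0])  # 1451606400000000000 (nanoseconds)
```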
Maybe we need to revisit? Especially considering my example above.
OK, that is about Series.astype(np.int64). The previous statement was about Timestamp.value.
I'm not opposed to that being discussed separately, but am averse to expanding the scope here. I'm getting pretty burnt out over here. Going to tag out on this.
I don't think you have to expand scope. I just think we need to document that doing … Would like to see what @rhshadrach thinks.
Yes - this is a breaking change. It seems to me this is worthwhile to do, and that there is no way to deprecate it that is worth the noise. As you've already demonstrated, the unit that one gets with strings is fragile on 2.3.x - it depends on the number of decimal digits in the input. If users are relying on … Also, …
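One way to make user code robust to the unit-inference change (a suggested pattern, not part of this PR: pin the unit explicitly before integer arithmetic):

```python
import pandas as pd

ser = pd.to_datetime(pd.Series(["2016-01-01"]))
# Whatever resolution string parsing inferred, normalize it before
# converting to integers, so the values always mean nanoseconds:
ns = ser.dt.as_unit("ns").astype("int64")
print(ns.iloc[0])  # 1451606400000000000
```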
Posted my comment prior to seeing @Dr-Irv's request - no objection to adding a line in the documentation.
I’ve already added documentation to that effect. If you want more, someone else needs to tag in.
Note that this is only because the series is object dtype. If you have an actual datetime64 series (eg …)

Now, we are repeating the discussion from #58989 though. We know that the choice to default to microseconds is a breaking change. Previously we decided to use a slower change cycle, by only doing the breaking change in 4.0 and adding an opt-in for it in 3.x. Here we are keeping the breaking change directly in 3.0. We just need to make a decision on that.

If it is a matter of better documentation of the breaking change / best practices about what to do instead, I don't think that needs to be included in / block this PR. We can do follow-up PRs to improve the docs (current main already has breaking behaviour as well anyway).
As discussed in last week's dev call, this PR changes the to_datetime, DatetimeIndex, and Timestamp constructor behavior when they see strings: a "us" unit is inferred in cases where they would previously (in main, not a released version) infer either "s" or "ms". Cases with nano precision stay nano.
Non-string cases are unchanged (again, unchanged from main): np.datetime64 objects retain their resolution (or the nearest supported one), pydatetime objects stay "us", and ints and floats still end up with "ns".
For all-string cases this is similar to the OP suggestion from #58989. Most users will end up with a "us" unit when calling e.g. read_csv.