Guessing Date Format For Many Identically-formatted Dates In Python
Solution 1:
Check out https://github.com/jeffreystarr/dateinfer
It seems a little abandoned, but maybe it will fit your needs.
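If that library is not an option, the underlying idea can be sketched with the standard library alone: since every string shares one format, try a list of candidate formats and keep the ones that parse all of the strings. This is only a rough sketch, not how dateinfer works internally; the candidate list and the helper name `consistent_formats` are assumptions made up for illustration.

```python
from datetime import datetime

# Hypothetical candidate list; extend it with whatever formats your data might use.
CANDIDATE_FORMATS = [
    "%m/%d/%y %H:%M",  # month first, two-digit year
    "%d/%m/%y %H:%M",  # day first, two-digit year
    "%Y-%m-%d %H:%M",  # ISO-like
]


def consistent_formats(date_strings, candidates=CANDIDATE_FORMATS):
    """Return the candidate formats that parse every string in date_strings."""
    matches = []
    for fmt in candidates:
        try:
            for s in date_strings:
                datetime.strptime(s, fmt)
        except ValueError:
            continue  # at least one string does not fit this format
        matches.append(fmt)
    return matches


the_dates = ["7/1/13 0:45", "12/2/14 1:38", "4/30/13 12:12"]
print(consistent_formats(the_dates))  # ['%m/%d/%y %H:%M']
```

The more strings you feed in, the more likely it is that one of them (here "4/30/13") rules out the wrong reading. If more than one format survives, the data is genuinely ambiguous and you need outside knowledge to choose.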
Solution 2:
Have you tried using dateutil.parser.parse on the tokenized time strings from the set?
It is robust across a wide range of formats, and when it does fail, the errors usually make it obvious how to massage your data into a form that it can handle.
In [10]: import dateutil.parser
In [11]: dateutil.parser.parse("7/1/13 0:45")
Out[11]: datetime.datetime(2013, 7, 1, 0, 45)
Do take care of ambiguities in the data. For example, it doesn't look like your time stamps use a 24-hour clock; instead they would report "3:00 pm" and "3:00 am" identically on the same date. Unless you have some way of assigning am/pm to the data, no parser can get you out of that problem.
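To make the am/pm point concrete, here is a small standard-library sketch. The strings with "AM"/"PM" markers are invented for illustration and are not from the question's data.

```python
from datetime import datetime

# Without an am/pm marker, "3:00" can only be read one way: 03:00.
no_marker = datetime.strptime("7/1/13 3:00", "%m/%d/%y %H:%M")

# With a marker, %I (12-hour clock) plus %p (AM/PM) tells the two apart.
morning = datetime.strptime("7/1/13 3:00 AM", "%m/%d/%y %I:%M %p")
afternoon = datetime.strptime("7/1/13 3:00 PM", "%m/%d/%y %I:%M %p")

print(no_marker.hour, morning.hour, afternoon.hour)  # 3 3 15
```

If the marker was never recorded, the afternoon reading is lost and cannot be recovered by any format string.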
If your date strings are stored in an iterable, then you can use map to apply the parse function to all of the strings:
In [12]: the_dates = ["7/1/13 0:45", "12/2/14 1:38", "4/30/13 12:12"]
In [13]: list(map(dateutil.parser.parse, the_dates))  # list() is needed on Python 3, where map returns an iterator
Out[13]:
[datetime.datetime(2013, 7, 1, 0, 45),
datetime.datetime(2014, 12, 2, 1, 38),
datetime.datetime(2013, 4, 30, 12, 12)]
And if you need some of the extra arguments to dateutil.parser.parse that indicate the formatting to use, you can use functools.partial to first bind those keyword arguments, and then use map as above to apply the partial function.
For example, suppose you wanted to be extra careful that DAY is treated as the first number. You could always call parse with the extra argument dayfirst=True, or you could pre-bind this argument and treat the result as a new function that always has this property.
In [42]: import functools
In [43]: new_parse = functools.partial(dateutil.parser.parse, dayfirst=True)
In [44]: list(map(new_parse, the_dates))  # list() is needed on Python 3, where map returns an iterator
Out[44]:
[datetime.datetime(2013, 1, 7, 0, 45),
datetime.datetime(2014, 2, 12, 1, 38),
datetime.datetime(2013, 4, 30, 12, 12)]
In [45]: new_parse.keywords
Out[45]: {'dayfirst': True}
In [46]: new_parse.func
Out[46]: <function dateutil.parser.parse>
(Note that in this example, the third date cannot be parsed day-first, since neither 30 nor 13 can be a month, so the parser quietly falls back to reading 4 as the month for that string.)
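Because of that quiet fallback, a list parsed with dayfirst=True can end up mixing day-first and month-first readings. If you would rather get an error than a mixed result, one option is to parse with a single explicit format via datetime.strptime. The following is a minimal sketch; the format string and the helper name `parse_all_strict` are assumptions for illustration.

```python
from datetime import datetime

the_dates = ["7/1/13 0:45", "12/2/14 1:38", "4/30/13 12:12"]
DAY_FIRST = "%d/%m/%y %H:%M"  # assumed format, chosen for illustration


def parse_all_strict(date_strings, fmt):
    """Parse every string with one fixed format; raise ValueError on the first misfit."""
    return [datetime.strptime(s, fmt) for s in date_strings]


try:
    parse_all_strict(the_dates, DAY_FIRST)
except ValueError as exc:
    # "4/30/13 12:12" does not fit a day-first format, so we end up here.
    print("Not all dates are day-first:", exc)
```

Since all of your dates share one format, a single failure like this is good evidence that the whole set is month-first.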