Ignore NaN-values in guessdatetime-function, but raise ValueError when date-string cannot be converted
I'm using a function to convert different string datetime formats to the same datetime format. I want the code to raise an error (ValueError) when there is a datetime format not be
Solution 1:
Given the described setup, you could check for type str, which would return False for np.nan. I took the liberty of modifying the function slightly so you can simply apply it:
import datetime
import pandas as pd

def guess_date(string):
    # np.nan is a float, not a str, so missing values end up here
    if not isinstance(string, str):
        return pd.NaT
    for fmt in ["%Y/%m/%d", "%Y-%m-%d", "%d%m%Y", "%d%b%Y"]:
        try:
            return datetime.datetime.strptime(string, fmt).date()
        except ValueError:
            continue
    else:
        # the loop finished without returning: no format matched
        raise ValueError(f"incompatible string {string}")

df2['Date'].apply(guess_date)
# 0    2016-01-01
# 1    2019-03-25
# 2           NaT
# 3    2018-01-01
# 4    2017-01-01
# 5           NaT
# 6    2013-01-01
# 7    2016-01-01
# 8    2019-01-01
# 9    2014-01-01
# Name: Date, dtype: object
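The output above only shows the case where every string matches. To see the other branch, here is a small sketch that feeds the function a value it cannot parse. The function is repeated so the block runs on its own; the sample Series and the string "01.02.2016" are made up for illustration and do not come from the question's data.

```python
import datetime

import numpy as np
import pandas as pd

FORMATS = ["%Y/%m/%d", "%Y-%m-%d", "%d%m%Y", "%d%b%Y"]

def guess_date(string):
    # Same logic as the function above, with the raise placed after the loop.
    if not isinstance(string, str):
        return pd.NaT
    for fmt in FORMATS:
        try:
            return datetime.datetime.strptime(string, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"incompatible string {string}")

# np.nan is a float, so the isinstance check sends it down the NaT branch.
nan_result = guess_date(np.nan)

# Hypothetical sample: the last entry matches none of the four formats.
bad = pd.Series(["2016-01-01", np.nan, "01.02.2016"])
message = None
try:
    bad.apply(guess_date)
except ValueError as err:
    message = str(err)
    print(message)
```

The NaN entry is skipped silently, while the unparseable string stops the whole apply call with a ValueError that names the offending value.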
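Note, though, that this is the same result you get from
    pd.to_datetime(df2['Date']).dt.date
which is probably more efficient. (On pandas 2.0 and later, a column with mixed formats like this one needs format="mixed" in that call.) So the function's only real purpose is to reject strings in formats you have not defined.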
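If you want the speed of pd.to_datetime but still want an error for unknown formats, one option is to try each format in turn with errors="coerce" and then check which non-null inputs are still unparsed. This is only a sketch under the assumption that the same four formats apply; parse_dates_strict is a hypothetical helper name, not a pandas function.

```python
import numpy as np
import pandas as pd

FORMATS = ["%Y/%m/%d", "%Y-%m-%d", "%d%m%Y", "%d%b%Y"]

def parse_dates_strict(series, formats=FORMATS):
    """Hypothetical helper: NaN stays NaT, unknown formats raise ValueError."""
    parsed = None
    for fmt in formats:
        # Strings that do not match this format become NaT instead of raising.
        attempt = pd.to_datetime(series, format=fmt, errors="coerce")
        # Keep earlier successes; fill remaining gaps from this attempt.
        parsed = attempt if parsed is None else parsed.fillna(attempt)
    # Non-null on input but still NaT after all formats: nothing matched.
    unparsed = series.notna() & parsed.isna()
    if unparsed.any():
        raise ValueError(f"incompatible strings: {series[unparsed].tolist()}")
    return parsed

dates = pd.Series(["2016-01-01", "25Mar2019", np.nan, "2018/01/01"])
result = parse_dates_strict(dates)
print(result)

raised = False
try:
    parse_dates_strict(pd.Series(["2016-01-01", "not a date"]))
except ValueError:
    raised = True
```

The result holds Timestamps rather than datetime.date objects; append .dt.date if you need the same output type as guess_date.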
Solution 2:
To be honest, I would refactor the code in the long run. But here is a quick fix to your code so that it accepts NaN values:
import pandas as pd
import numpy as np
import datetime

df = pd.DataFrame({"ID": [12,96,73,84,87,64,11,34],
                   "Date": ['2016-01-01', '25Mar2019', '2018/01/01', '2017-01-01', '2013-01-01', '2016-01-01', '2019-01-01', '2014-01-01']})
print(df)

df2 = pd.DataFrame({"ID": [12,96,20,73,84,26,87,64,11,34],
                    "Date": ['2016-01-01', '25Mar2019', np.nan, '2018/01/01', '2017-01-01', np.nan, '2013-01-01', '2016-01-01', '2019-01-01', '2014-01-01']})
print(df2)

def guess_date(string):
    # pass missing values through unchanged
    if pd.isnull(string):
        return string
    for fmt in ["%Y/%m/%d", "%Y-%m-%d", "%d%m%Y", "%d%b%Y"]:
        try:
            return datetime.datetime.strptime(string, fmt).date()
        except ValueError:
            continue
    # no format matched
    raise ValueError(string)

# convert each row in place
for i in range(len(df.Date)):
    df.loc[i, 'Date'] = guess_date(df.loc[i, 'Date'])
print(df.Date)

for i in range(len(df2.Date)):
    df2.loc[i, 'Date'] = guess_date(df2.loc[i, 'Date'])
print(df2.Date)