Reading A Comma-delimited File With A Date Object And A Float With Python
Solution 1:
I had to do something like this:
>>> import numpy as np
>>> from datetime import datetime
>>> wind = np.loadtxt("ws425.log.test", delimiter=",", usecols=(0,4), dtype=object,
... converters={0: lambda x: datetime.strptime(x, "%Y-%m-%d %H:%M:%S.%f"),
... 4: float})
>>>
>>> wind
array([[datetime.datetime(2013, 12, 11, 23, 0, 27, 3293), 5.8],
[datetime.datetime(2013, 12, 11, 23, 0, 28, 295), 5.5],
[datetime.datetime(2013, 12, 11, 23, 0, 29, 295), 4.0],
[datetime.datetime(2013, 12, 11, 23, 0, 30, 3310), 4.9]], dtype=object)
For time series data, though, I've switched to using pandas, because it makes a lot of things much easier:
>>> import pandas as pd
>>> df = pd.read_csv("ws425.log.test", parse_dates=[0], header=None, usecols=[0, 4])
>>> df
0 4
0 2013-12-11 23:00:27.003293 5.8
1 2013-12-11 23:00:28.000295 5.5
2 2013-12-11 23:00:29.000295 4.0
3 2013-12-11 23:00:30.003310 4.9
[4 rows x 2 columns]
>>> df[0][0]
Timestamp('2013-12-11 23:00:27.003293', tz=None)
Solution 2:
I am not sure what is wrong with the numpy approach, but the csv module works well:
>>> import time
>>> import csv
>>> with open('t.txt') as f:
...     r = csv.reader(f)
...     w = [[time.strptime(i[0], '%Y-%m-%d %H:%M:%S.%f')]+i[1:] for i in r]
...
>>> w
[[time.struct_time(tm_year=2013, tm_mon=12, tm_mday=11, tm_hour=23, tm_min=0, tm_sec=27, tm_wday=2, tm_yday=345, tm_isdst=-1), '$PAMWV', '291', 'R', '005.8', 'M', 'A*36'], [time.struct_time(tm_year=2013, tm_mon=12, tm_mday=11, tm_hour=23, tm_min=0, tm_sec=28, tm_wday=2, tm_yday=345, tm_isdst=-1), '$PAMWV', '284', 'R', '005.5', 'M', 'A*3F'], [time.struct_time(tm_year=2013, tm_mon=12, tm_mday=11, tm_hour=23, tm_min=0, tm_sec=29, tm_wday=2, tm_yday=345, tm_isdst=-1), '$PAMWV', '273', 'R', '004.0', 'M', 'A*33'], [time.struct_time(tm_year=2013, tm_mon=12, tm_mday=11, tm_hour=23, tm_min=0, tm_sec=30, tm_wday=2, tm_yday=345, tm_isdst=-1), '$PAMWV', '007', 'R', '004.9', 'M', 'A*3B']]
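The same csv approach can also be sketched with datetime.strptime, which keeps the microseconds that struct_time drops, and with the wind-speed column converted to a float. This is only a minimal sketch: the sample rows are reconstructed from the output above, io.StringIO stands in for the real log file, and parse_rows is a hypothetical helper name.

```python
import csv
import io
from datetime import datetime

# Sample rows reconstructed from the output above; StringIO stands in for the file.
SAMPLE = (
    "2013-12-11 23:00:27.003293,$PAMWV,291,R,005.8,M,A*36\n"
    "2013-12-11 23:00:28.000295,$PAMWV,284,R,005.5,M,A*3F\n"
)

def parse_rows(fileobj):
    """Return (datetime, float) pairs built from columns 0 and 4 of each CSV row."""
    fmt = "%Y-%m-%d %H:%M:%S.%f"
    return [(datetime.strptime(row[0], fmt), float(row[4]))
            for row in csv.reader(fileobj)]

rows = parse_rows(io.StringIO(SAMPLE))
print(rows[0])
```

With a real file, the StringIO object would be replaced by the handle returned from open().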
Solution 3:
time.strptime() expects a string such as '2013-12-11 23:00:30.003310', but you are giving it a string representation of an array instead:
['2013-12-12 00:00:02.251311', '2013-12-12 00:00:03.255296', ...]
The minimal fix is to parse one item at a time:
ts = [time.strptime(s, '%Y-%m-%d %H:%M:%S.%f') for s in wind[:,0]]
Or you could use the converters parameter of loadtxt:
from datetime import datetime
import numpy as np
def str2timestamp(timestr, epoch=datetime.fromtimestamp(0)):
    """Convert local time string into seconds since epoch (float)."""
    # np.datetime64 API is experimental so use datetime instead
    # NOTE: local time may be ambiguous, non-monotonic
    dt = datetime.strptime(timestr, '%Y-%m-%d %H:%M:%S.%f')
    return (dt - epoch).total_seconds()
wind = np.loadtxt('input.csv', usecols=(0, 4), delimiter=',',
                  converters={0: str2timestamp})
print(wind)
Output:
[[ 1.38679203e+09 5.80000000e+00]
[ 1.38679203e+09 5.50000000e+00]
[ 1.38679203e+09 4.00000000e+00]
[ 1.38679203e+09 4.90000000e+00]]
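The NOTE in the converter above flags that local time can be ambiguous. If the logger is known to write UTC (an assumption; the question does not say), a timezone-aware variant avoids depending on the machine's local zone. The following is a rough sketch, and str2utc_timestamp is a hypothetical name:

```python
from datetime import datetime, timezone

def str2utc_timestamp(timestr):
    """Convert a UTC time string into seconds since the Unix epoch (float)."""
    dt = datetime.strptime(timestr, "%Y-%m-%d %H:%M:%S.%f")
    # Assumption: the timestamps are UTC, so attach that zone explicitly.
    return dt.replace(tzinfo=timezone.utc).timestamp()

t0 = str2utc_timestamp("2013-12-11 23:00:27.003293")
t1 = str2utc_timestamp("2013-12-11 23:00:28.000295")
print(t1 - t0)  # elapsed seconds between the two samples
```

Because the zone is attached explicitly, the result does not change when the script runs on a machine in a different timezone.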
Solution 4:
You just have some errors in your NumPy loadtxt call where you define the dtype. It should be dtype=[('date', 'str', 26), ('wind', 'float')]; you must specify the size of the string. Now you can reference the date field by name, e.g. wind['date']. Your strptime format is fine, but you want the datetime class from Python's datetime module, not the time module.
import numpy as np
from datetime import datetime
wind = np.loadtxt("/disk2/Wind/ws425.log.test", dtype=[('date', 'str', 26), ('wind', 'float')], delimiter=',', usecols=(0,4))
ts = [datetime.strptime(d, '%Y-%m-%d %H:%M:%S.%f') for d in wind['date']]
This returns the following:
[datetime.datetime(2013, 12, 11, 23, 0, 27, 3293),
datetime.datetime(2013, 12, 11, 23, 0, 28, 295),
datetime.datetime(2013, 12, 11, 23, 0, 29, 295),
datetime.datetime(2013, 12, 11, 23, 0, 30, 3310)]
Maybe you want to feed that back into your NumPy array?
wind['date'] = np.array(ts, dtype='datetime64[s]')
This yields:
array([('2013-12-11T23:00:27Z', 5.8), ('2013-12-11T23:00:28Z', 5.5),
('2013-12-11T23:00:29Z', 4.0), ('2013-12-11T23:00:30Z', 4.9)],
dtype=[('date', 'S26'), ('wind', '<f8')])
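Note that in the output above the date field is still a 26-character string, and the values were cut to whole seconds. One possible alternative, sketched here with values typed inline rather than read from the log, is a structured array whose date field is datetime64[us], so the microseconds survive and date arithmetic works. The field names follow the answer; everything else is illustrative.

```python
import numpy as np
from datetime import datetime

# Two parsed timestamps and wind speeds, typed inline for illustration.
ts = [datetime(2013, 12, 11, 23, 0, 27, 3293),
      datetime(2013, 12, 11, 23, 0, 28, 295)]
speeds = [5.8, 5.5]

# Structured array with a native datetime field instead of a string field.
wind = np.empty(len(ts), dtype=[('date', 'datetime64[us]'), ('wind', 'float')])
wind['date'] = np.array(ts, dtype='datetime64[us]')
wind['wind'] = speeds

# Subtracting datetime64 values gives a timedelta64 value.
print(wind['date'][1] - wind['date'][0])
```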
Solution 5:
The real problem here is that the time module cannot give you microseconds: time.struct_time has no field for a fractional second, so the %f part is lost even where time.strptime accepts it.
What you want is datetime.strptime, which does support the %f format character for microseconds.
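To see the difference concretely, here is a small sketch that parses one timestamp from the log with both functions. It assumes a CPython version where time.strptime accepts %f, as the csv answer above suggests.

```python
import time
from datetime import datetime

stamp = "2013-12-11 23:00:30.003310"
fmt = "%Y-%m-%d %H:%M:%S.%f"

# datetime keeps the fractional part in its microsecond attribute.
dt = datetime.strptime(stamp, fmt)
print(dt.second, dt.microsecond)

# struct_time only goes down to whole seconds (tm_sec).
st = time.strptime(stamp, fmt)
print(st.tm_sec)
```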