Reading A Comma-delimited File With A Date Object And A Float With Python

April 28, 2023 Post a Comment

I have a file with entries that look like 2013-12-11 23:00:27.003293,$PAMWV,291,R,005.8,M,A*36 2013-12-11 23:00:28.000295,$PAMWV,284,R,005.5,M,A*3F 2013-12-11 23:00:29.000295,$PAM

Solution 1:

I had to do something like

>>> import numpy as np
>>> from datetime import datetime
>>> wind = np.loadtxt("ws425.log.test", delimiter=",", usecols=(0,4), dtype=object,
...                   converters={0: lambda x: datetime.strptime(x, "%Y-%m-%d %H:%M:%S.%f"),
...                               4: np.float})
>>> 
>>> wind
array([[datetime.datetime(2013, 12, 11, 23, 0, 27, 3293), 5.8],
       [datetime.datetime(2013, 12, 11, 23, 0, 28, 295), 5.5],
       [datetime.datetime(2013, 12, 11, 23, 0, 29, 295), 4.0],
       [datetime.datetime(2013, 12, 11, 23, 0, 30, 3310), 4.9]], dtype=object)

For time series data, though, I've switched to using pandas, because it makes a lot of things much easier:

>>> import pandas as pd
>>> df = pd.read_csv("ws425.log.test", parse_dates=[0], header=None, usecols=[0, 4])
>>> df
                           0    4
0 2013-12-11 23:00:27.003293  5.8
1 2013-12-11 23:00:28.000295  5.5
2 2013-12-11 23:00:29.000295  4.0
3 2013-12-11 23:00:30.003310  4.9

[4 rows x 2 columns]
>>> df[0][0]
Timestamp('2013-12-11 23:00:27.003293', tz=None)

Solution 2:

I am not sure what is wrong with numpy; but with csv it works great:

>>> import time
>>> import csv
>>> with open('t.txt') as f:
...    r = csv.reader(f)
...    w = [[time.strptime(i[0], '%Y-%m-%d %H:%M:%S.%f')]+i[1:] for i in r]
...
>>> w
[[time.struct_time(tm_year=2013, tm_mon=12, tm_mday=11, tm_hour=23, tm_min=0, tm_sec=27, tm_wday=2, tm_yday=345, tm_isdst=-1), '$PAMWV', '291', 'R', '005.8', 'M', 'A*36'], [time.struct_time(tm_year=2013, tm_mon=12, tm_mday=11, tm_hour=23, tm_min=0, tm_sec=28, tm_wday=2, tm_yday=345, tm_isdst=-1), '$PAMWV', '284', 'R', '005.5', 'M', 'A*3F'], [time.struct_time(tm_year=2013, tm_mon=12, tm_mday=11, tm_hour=23, tm_min=0, tm_sec=29, tm_wday=2, tm_yday=345, tm_isdst=-1), '$PAMWV', '273', 'R', '004.0', 'M', 'A*33'], [time.struct_time(tm_year=2013, tm_mon=12, tm_mday=11, tm_hour=23, tm_min=0, tm_sec=30, tm_wday=2, tm_yday=345, tm_isdst=-1), '$PAMWV', '007', 'R', '004.9', 'M', 'A*3B']]

Solution 3:

time.strptime() expects a string such as '2013-12-11 23:00:30.003310' but you are giving it a string representation of an array instead:

['2013-12-12 00:00:02.251311', '2013-12-12 00:00:03.255296', ...]

The minimal fix is to parse one item at a time:

ts = [time.strptime(s, '%Y-%m-%d %H:%M:%S.%f') for s in wind[:,0]]

Or you could use converters parameter for loadtxt:

from datetime import datetime
import numpy as np

def str2timestamp(timestr, epoch=datetime.fromtimestamp(0)):
    """Convert local time string into seconds since epoch (float)."""
    # np.datetime64 API is experimental so use datetime instead
    #NOTE: local time may be ambiguous, non-monotonous
    dt = datetime.strptime(timestr, '%Y-%m-%d %H:%M:%S.%f')
    return (dt - epoch).total_seconds()

wind = np.loadtxt('input.csv', usecols=(0, 4), delimiter=',',
                  converters={0: str2timestamp})
print(wind)

Output

[[  1.38679203e+09   5.80000000e+00]
 [  1.38679203e+09   5.50000000e+00]
 [  1.38679203e+09   4.00000000e+00]
 [  1.38679203e+09   4.90000000e+00]]

Solution 4:

You just have some errors in your NumPy loadtxt call where you define the dtype. It should be dtype=[('date', 'str', 26), ('wind', 'float')]; you must specify the size of the string. Now you can reference the date field using its name, EG: wind['date']. Your strptime format is fine, but you want the datetime module from Python's datetime package, not time.

import numpy as np
from datetime import datetime
wind = loadtxt("/disk2/Wind/ws425.log.test", dtype=[('date', 'str', 26), ('wind', 'float')], delimiter=',', usecols=(0,4))
ts = [datetime.strptime(d, '%Y-%m-%d %H:%M:%S.%f') for d in wind['date']]

This returns the following:

[datetime.datetime(2013, 12, 11, 23, 0, 27, 3293),
 datetime.datetime(2013, 12, 11, 23, 0, 28, 295),
 datetime.datetime(2013, 12, 11, 23, 0, 29, 295),
 datetime.datetime(2013, 12, 11, 23, 0, 30, 3310)]

Maybe you want to feed that back into your NumPy array?

wind['date'] = np.array(ts, dtype='datetime64[s]')

this yields

array([('2013-12-11T23:00:27Z', 5.8), ('2013-12-11T23:00:28Z', 5.5),
   ('2013-12-11T23:00:29Z', 4.0), ('2013-12-11T23:00:30Z', 4.9)],
  dtype=[('date', 'S26'), ('wind', '<f8')])

Solution 5:

Oh the real problem here is that time.strptime does not support %f for microseconds, see here for a list of formatting characters supported by time.strptime and time.strftime.

What you do want is datetime.strptime which does support the %f formatting character for microseconds.

Python Tutorial for Beginners