I have this number: 19576.4125. I want to store it in a NumPy array, and my thinking is that the lower the bit count, the better. Is this right?
I tried to save it as a half and as a single, but I don't understand why the number changes:
19576.4125
19580.0
19576.412
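As the comments note, this is a floating-point precision limit, not a bug: near 19576 the gap between adjacent half values is 16, and between adjacent single values about 0.002, so the trailing .4125 cannot survive either conversion. A quick sketch using NumPy's `spacing()` to see the gaps:

```python
import numpy as np

x = 19576.4125

# half (float16): ~3 significant decimal digits; spacing near 19576 is 16
print(np.float16(x), np.spacing(np.float16(x)))

# single (float32): ~7 significant digits; spacing near 19576 is ~0.002
print(np.float32(x), np.spacing(np.float32(x)))

# double (float64): ~15-16 significant digits; keeps all the digits shown
print(np.float64(x))
```

Only the double preserves the value; the half and single round it to the nearest representable number.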
This number is generated by a method I created to convert a datetime to a float. I could use timestamp, but I don't need seconds or milliseconds, so I wrote my own method that keeps only the date, hours and minutes. (My database doesn't accept datetimes and timedeltas.)
This is my generator method:
from datetime import datetime

def get_timestamp() -> float:
    now = datetime.now()
    # note: replace() returns a new datetime; the result must be assigned
    now = now.replace(microsecond=0, second=0)
    _1970 = datetime(1970, 1, 1, 0, 0, 0)
    td = now - _1970
    days = td.days
    hours, remainder = divmod(td.seconds, 3600)
    minutes, _ = divmod(remainder, 60)  # leftover seconds are discarded
    timestamp = days + hours / 24 + minutes / 1440
    return round(timestamp, 4)
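For what it's worth, the hours/minutes arithmetic collapses to "whole minutes of the day divided by 1440", so the method can be sketched more compactly (a rewrite for illustration, not the original):

```python
from datetime import datetime

def get_timestamp2(now=None) -> float:
    # same idea: days since 1970 plus whole minutes of the day / 1440
    now = now or datetime.now()
    td = now - datetime(1970, 1, 1)
    return round(td.days + (td.seconds // 60) / 1440, 4)
```

Since hours / 24 + minutes / 1440 == (hours * 60 + minutes) / 1440, both versions produce the same float.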
How I'm creating the array:
from numpy import array, half, single
__td = get_timestamp()
print(__td)
__array = array([__td], dtype=half)
print(type(__array[0]))
print(__array[0])
__array = array([__td], dtype=single)
print(type(__array[0]))
print(__array[0])
EDITED 08/07 11:02 AM
Hello, as the comments said, I think this number can't be stored in a half or single type. So how do I save this number with the best performance? Is it better to save it as an int multiplied by 10000, as a float64, or as a string?
And no, I don't want a better way to save datetimes; I want a better way to save this float number. But thank you for the other replies.
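One of the options from the edit, sketched: since 19576.4125 × 10000 fits comfortably in an int32, you can store the scaled value exactly in 4 bytes and divide on the way out (the 10000 factor is the one suggested in the edit, not something NumPy requires):

```python
import numpy as np

ts = 19576.4125

# store exactly as a 4-byte scaled integer
scaled = np.array([round(ts * 10000)], dtype=np.int32)

# recover the float on the way out (division yields float64)
back = scaled / 10000.0
print(scaled[0], back[0])
```

This is exact because the scaled value is an integer, whereas a 4-byte float (single) cannot hold the value, as shown above.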
I modified your function to take a datetime argument:
In [48]: def get_timestamp(now) -> float:
...: #now = datetime.now()
...: now.replace(microsecond=0, second=0)
...
...: return round(timestamp, 4)
...:
and made a list of dates:
In [49]: alist = [datetime.now() for _ in range(1000)]
In [50]: timeit alist = [datetime.now() for _ in range(1000)]
885 µs ± 2.27 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
And timed your function, to make an array:
In [51]: arr = np.array([get_timestamp(d) for d in alist])
In [52]: timeit arr = np.array([get_timestamp(d) for d in alist])
7.7 ms ± 16.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [53]: arr.nbytes
Out[53]: 8000
and did the same, but using numpy's own conversion to an 8 byte element:
In [54]: barr = np.array(alist,dtype='datetime64[m]')
In [55]: barr.nbytes
Out[55]: 8000
In [56]: timeit barr = np.array(alist,dtype='datetime64[m]')
7.87 ms ± 38.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
So it's basically the same conversion time. In terms of computation and memory, your function is just as good.
Saving as a 4-byte element (float or int) would cut the memory use in half, but unless you are hitting memory errors with millions of values, this effort is rarely worth it.
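If you do want 4 bytes per element, one hedged sketch: keep whole minutes since the epoch as an int32 (which fits for roughly 4000 years of minutes) and cast back to datetime64[m] when you need dates again:

```python
import numpy as np
from datetime import datetime

barr = np.array([datetime(2023, 8, 7, 9, 7)], dtype='datetime64[m]')

mins = barr.astype('int64').astype(np.int32)  # whole minutes since 1970, 4 bytes each
restored = mins.astype('datetime64[m]')       # cast back to datetimes

print(mins.nbytes, restored[0])
```

The int32 array is exact (no float rounding at all), and the round trip through datetime64[m] is lossless at minute resolution.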
datetime64 has already worked out the conversion both ways. I imagine the interface to pandas is also good, though pandas appears to have its own datetime formats and tricks. After all, it's designed to handle time series.
In [64]: import pandas as pd
In [65]: df = pd.DataFrame({'a':arr, 'b':barr})
In [66]: df
Out[66]:
a b
0 19576.3799 2023-08-07 09:07:00
1 19576.3799 2023-08-07 09:07:00
2 19576.3799 2023-08-07 09:07:00
3 19576.3799 2023-08-07 09:07:00
4 19576.3799 2023-08-07 09:07:00
.. ... ...
995 19576.3799 2023-08-07 09:07:00
996 19576.3799 2023-08-07 09:07:00
997 19576.3799 2023-08-07 09:07:00
998 19576.3799 2023-08-07 09:07:00
999 19576.3799 2023-08-07 09:07:00
[1000 rows x 2 columns]
In [67]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 a 1000 non-null float64
1 b 1000 non-null datetime64[s]
dtypes: datetime64[s](1), float64(1)
memory usage: 15.8 KB
Interestingly, if I save the datetime list directly to a dataframe, it's faster:
In [81]: df = pd.DataFrame({'c':alist})
In [82]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 c 1000 non-null datetime64[ns]
dtypes: datetime64[ns](1)
memory usage: 7.9 KB
In [83]: timeit df = pd.DataFrame({'c':alist})
5.29 ms ± 22.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
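And a vectorized sketch for going from such a pandas datetime64[ns] column to the same float-days value without a Python loop (assuming naive timestamps; nanoseconds since the epoch floor-divided by 60e9 gives whole minutes):

```python
import pandas as pd

df = pd.DataFrame({'c': pd.to_datetime(['2023-08-07 09:07:30'])})

ns = df['c'].astype('int64')            # nanoseconds since the epoch
days = (ns // 60_000_000_000) / 1440    # whole minutes -> fractional days
print(days.round(4).iloc[0])
```

The floor division drops seconds and sub-seconds, matching your method's minute resolution, and the whole pipeline stays in NumPy/pandas vectorized code.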