Tags: python, numpy, datetime, urllib

Why does datetime.strptime not work with numpy, giving "float() argument must be a string or a number, not 'datetime.datetime'"?


I'm trying to get the datetime type to be used in the numpy array (datep) here. I have tried two approaches for the function bytespdates2num.

First is:

    def bytespdates2num(fmt,encoding = 'utf-8'):
        def bytesconverter(b):
            s = b.decode(encoding)
            return mdate.datestr2num(s)
        return bytesconverter

Second is:

    def bytespdates2num(fmt, encoding = 'utf-8'):
        def bytesconverter(b):
            s = b.decode(encoding)
            return datetime.datetime.strptime(s)
        return bytesconverter

My code is:

    import urllib.request
    import matplotlib.pyplot as plt
    import matplotlib.dates as mdates
    import numpy as np
    import ssl
    import requests
    import json
    import datetime

    # First or second approach
    def bytespdates2num(fmt, encoding='utf-8'):
        def bytesconverter(b):
            # decode the UTF-8 bytes into a string
            s = b.decode(encoding)
            return datetime.datetime.strptime(s, fmt)
        return bytesconverter

    url1 = 'https://pythonprogramming.net/yahoo_finance_replacement'
    data = urllib.request.urlopen(url1, context=None).read().decode()
    stockprices = list()
    stocksplitdata = data.split('\n')
    for line in stocksplitdata:
        stockprices.append(line)

    date, openp, highp, lowp, closep, adjust, vol = np.loadtxt(
        stockprices[1:],
        delimiter=',',
        unpack=True,
        converters={0: bytespdates2num('%Y-%m-%d')})

The first approach works, and I can go on to plot the matplotlib graph using datep as the x-axis. The second approach fails with "float() argument must be a string or a number, not 'datetime.datetime'". However, while debugging, running the datetime.datetime.strptime line on the command line returns the datetime object for s. Why is this happening? The datetime approach also turns a date string into a datetime object, and it seems more straightforward.


Solution

  • Even though you specify a converter, you still need to specify a dtype.

    I tried to recreate your case with a simple input (WHY DIDN'T YOU DO THIS FOR US???)

    In [20]: txt = """2011-01-23 
        ...: 2020-03-23"""
    

    Your second converter (one of yours is missing the fmt):

    In [21]: def bytespdates2num(fmt, encoding = 'utf-8'): 
        ...:         def bytesconverter(b): 
        ...:             s = b.decode(encoding) 
        ...:             return datetime.datetime.strptime(s, fmt) 
        ...:         return bytesconverter 
        ...:                                                                                       
    

    Your run, with FULL TRACEBACK!

    In [22]: np.loadtxt(txt.splitlines(), converters={0:bytespdates2num('%Y-%m-%d')})              
    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-22-56c1854d614f> in <module>
    ----> 1 np.loadtxt(txt.splitlines(), converters={0:bytespdates2num('%Y-%m-%d')})
    
    /usr/local/lib/python3.6/dist-packages/numpy/lib/npyio.py in loadtxt(fname, dtype, comments, delimiter, converters, skiprows, usecols, unpack, ndmin, encoding, max_rows)
       1159         for x in read_data(_loadtxt_chunksize):
       1160             if X is None:
    -> 1161                 X = np.array(x, dtype)
       1162             else:
       1163                 nshape = list(X.shape)
    
    TypeError: float() argument must be a string or a number, not 'datetime.datetime'
    

    The default dtype for loadtxt is float. At this point it has read the data (as a list of lists) and is trying to convert it into an array using that default dtype. That conversion fails because a datetime object cannot be turned into a float.
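The same failure can be reproduced without loadtxt at all. The following is a minimal sketch, assuming only that the converter has already produced a list of datetime objects; the names `rows`, `float_error`, and `as_object` are made up for illustration:

```python
import datetime
import numpy as np

# Stand-in for what the converter hands back to loadtxt.
rows = [datetime.datetime(2011, 1, 23), datetime.datetime(2020, 3, 23)]

# Building a float array from datetime objects raises TypeError,
# which is the same step that fails inside loadtxt.
try:
    np.array(rows, dtype=float)
    float_error = None
except TypeError as exc:
    float_error = exc

print(type(float_error).__name__, float_error)

# An object array simply stores the datetime objects unchanged.
as_object = np.array(rows, dtype=object)
print(as_object.dtype)
```

So the converter itself is fine; the error comes from the dtype the results are packed into afterwards.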

    If instead I specify object as dtype:

    In [23]: np.loadtxt(txt.splitlines(), converters={0:bytespdates2num('%Y-%m-%d')}, dtype=object)
        ...:                                                                                       
    Out[23]: 
    array([datetime.datetime(2011, 1, 23, 0, 0),
           datetime.datetime(2020, 3, 23, 0, 0)], dtype=object)
    

    Or I could specify a datetime64 dtype:

    In [24]: np.loadtxt(txt.splitlines(), converters={0:bytespdates2num('%Y-%m-%d')}, dtype='datetime64[D]')                                                                             
    Out[24]: array(['2011-01-23', '2020-03-23'], dtype='datetime64[D]')
    

    Sorry for the caps, but I get tired of asking for tracebacks and sample inputs. Providing those should be mandatory for SO questions.

    With mdates (note the correction in the code):

    In [30]:     def bytespdates2num(fmt,encoding = 'utf-8'): 
        ...:         def bytesconverter(b): 
        ...:             s = b.decode(encoding) 
        ...:             return mdates.datestr2num(s) 
        ...:         return bytesconverter 
        ...:                                                                                       
    In [31]: np.loadtxt(txt.splitlines(), converters={0:bytespdates2num('%Y-%m-%d')})              
    Out[31]: array([734160., 737507.])
    

    Evidently that's returning a number rather than a datetime object.
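If you want to keep strptime and still use the default float dtype for the other columns, one option is to have the converter return a number itself. This is only a sketch: `date_to_ordinal` and the two-column `sample` rows are hypothetical, and it assumes that a day count from `toordinal()` is acceptable for your plot. The ordinals happen to match the values in Out[31] above, but newer Matplotlib versions use a different default epoch, so do not rely on them matching `mdates.datestr2num` everywhere.

```python
import datetime
import numpy as np

def date_to_ordinal(fmt, encoding="utf-8"):
    """Build a loadtxt converter that returns a float day number."""
    def converter(value):
        # Older NumPy passes bytes to converters; newer versions may pass str.
        text = value.decode(encoding) if isinstance(value, bytes) else value
        parsed = datetime.datetime.strptime(text.strip(), fmt)
        # toordinal() counts days from 0001-01-01, so the result is a plain number.
        return float(parsed.toordinal())
    return converter

# Made-up sample rows: date, closing price.
sample = ["2011-01-23,10.5", "2020-03-23,12.25"]

datep, closep = np.loadtxt(
    sample,
    delimiter=",",
    unpack=True,
    converters={0: date_to_ordinal("%Y-%m-%d")},
)

print(datep)
print(closep)
```

Because every column ends up as a float, `unpack=True` keeps working and no dtype argument is needed. The trade-off is that datep holds day numbers, not datetime objects.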