I'm pretty new to Algorithmia, but I've used scikit-learn a bit, and I know how to persist my machine learning model with joblib after training it:
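import joblib  # older scikit-learn versions: from sklearn.externals import joblib
from sklearn.ensemble import RandomForestRegressor

model = RandomForestRegressor()
# Train the model, etc.
joblib.dump(model, "prediction/model/model.pkl")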
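For reference, here is a self-contained version of that persistence step that round-trips a model through a local file. The toy data, `n_estimators=5`, `random_state=0` and the temporary directory are illustrative choices, not part of the question; the point is only that `joblib.load` works when given a path that exists on the local filesystem.

```python
import os
import tempfile

import joblib  # older scikit-learn versions: from sklearn.externals import joblib
from sklearn.ensemble import RandomForestRegressor

# Tiny toy dataset, just to have something to fit.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0.0, 1.0, 2.0, 3.0]

model = RandomForestRegressor(n_estimators=5, random_state=0)
model.fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pkl")
    joblib.dump(model, path)      # write the fitted model to disk
    restored = joblib.load(path)  # read it back from the same local path

# The restored forest should make exactly the same predictions.
same_predictions = (model.predict(X) == restored.predict(X)).all()
print(same_predictions)
```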
Now I want to host my ML model and call it as a service using Algorithmia, but I can't figure out how to read the model back. I've created a collection in Algorithmia called "testcollection" with a file called "model.pkl" that is the result of the joblib.dump call. According to the docs, this means my file should be located at
data://(username)/testcollection/model.pkl
I want to read in that model from the file using joblib.load. Here's my current algorithm in Algorithmia:
import Algorithmia
import joblib

def apply(input):
    client = Algorithmia.client()
    f = client.file("data://(username)/testcollection/model.pkl")
    print(f.path)
    print(f.url)
    print(f.getName())
    model = joblib.load(f.url)  # Or f.path; neither works
    return "empty"
Here's the output:
(username)/testcollection/model.pkl
/v1/data/(username)/testcollection/model.pkl
model.pkl
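The printed values line up with the pieces of the data:// URI: f.path is the URI without its scheme, and f.url is a relative Data API URL, so neither names a file on the machine running the algorithm. A minimal sketch of that mapping, using hypothetical helpers (`parse_data_uri`, `api_path`) that are not part of the Algorithmia client and only reproduce the output shown above:

```python
def parse_data_uri(uri):
    """Split a data:// URI into (owner, collection, filename).

    Hypothetical helper for illustration; not part of the Algorithmia client.
    """
    prefix = "data://"
    if not uri.startswith(prefix):
        raise ValueError("not a data:// URI: %r" % uri)
    owner, collection, filename = uri[len(prefix):].split("/", 2)
    return owner, collection, filename


def api_path(uri):
    """Build the relative Data API path, in the shape printed for f.url."""
    owner, collection, filename = parse_data_uri(uri)
    return "/v1/data/%s/%s/%s" % (owner, collection, filename)


uri = "data://(username)/testcollection/model.pkl"
print(parse_data_uri(uri))  # ('(username)', 'testcollection', 'model.pkl')
print(api_path(uri))        # /v1/data/(username)/testcollection/model.pkl
```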
It fails at the joblib.load line with "No such file or directory" (followed by whatever path I pass in).
Here are all the paths/URLs I've tried passing to joblib.load:
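That error is what joblib.load raises whenever its argument is a string that does not exist as a local file: it tries to open the string as a path on the local filesystem and does not know about data:// URIs or API URLs. A small local sketch using the two strings printed above (run outside Algorithmia, where neither path exists):

```python
import os

import joblib

# The values printed above: a scheme-less data path and a relative API URL.
candidates = [
    "(username)/testcollection/model.pkl",
    "/v1/data/(username)/testcollection/model.pkl",
]

errors = []
for candidate in candidates:
    # Neither string names a file on the local filesystem...
    assert not os.path.exists(candidate)
    try:
        joblib.load(candidate)
    except FileNotFoundError as exc:
        # ...so joblib.load fails when it tries to open it for reading.
        errors.append(type(exc).__name__)

print(errors)  # ['FileNotFoundError', 'FileNotFoundError']
```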
How do I load a model in from a file using joblib? Am I going about this the wrong way?
There are a few ways to access data through the Data API.
Here are four different ways to read files with the Python client:
import Algorithmia

client = Algorithmia.client("<YOUR_API_KEY>")
# File object for the downloaded file
dataFile = client.file("data://<USER_NAME>/<COLLECTION_NAME>/<FILE_NAME>").getFile()
# Contents as a string
dataText = client.file("data://<USER_NAME>/<COLLECTION_NAME>/<FILE_NAME>").getString()
# Contents parsed as JSON
dataJSON = client.file("data://<USER_NAME>/<COLLECTION_NAME>/<FILE_NAME>").getJson()
# Raw contents as bytes
dataBytes = client.file("data://<USER_NAME>/<COLLECTION_NAME>/<FILE_NAME>").getBytes()
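Because getBytes() returns the raw contents, another option is to wrap those bytes in io.BytesIO and pass that to joblib.load, which avoids depending on a file name at all. This is a sketch under the assumption that getBytes() returns the file's bytes unchanged; `load_model_from_bytes` is a hypothetical helper, and the small dictionary dumped locally stands in for the real model.pkl.

```python
import io

import joblib


def load_model_from_bytes(raw):
    """Unpickle a joblib-dumped object held entirely in memory.

    Hypothetical helper. In an Algorithmia algorithm, `raw` is assumed to come from
    client.file("data://<USER_NAME>/<COLLECTION_NAME>/<FILE_NAME>").getBytes().
    """
    return joblib.load(io.BytesIO(raw))


# Local stand-in for the downloaded bytes: dump a small object into memory.
buffer = io.BytesIO()
joblib.dump({"weights": [0.5, 1.5], "bias": 0.1}, buffer)

restored = load_model_from_bytes(buffer.getvalue())
print(restored)  # {'weights': [0.5, 1.5], 'bias': 0.1}
```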
Since joblib.load expects a path to the model file, the easiest way to get one is through a file object (i.e. dataFile).
According to the official Python 2.7 documentation, when a file object is created with the open() function, its name attribute holds the name of the file, i.e. the path it was opened with.
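A quick local check of that behaviour (shown in Python 3, where it is the same for open()): the name attribute of the opened file equals the path that was passed in. The temporary file and its placeholder contents exist only for the demonstration.

```python
import os
import tempfile

# Create a real file on disk so there is something to open().
with tempfile.NamedTemporaryFile(suffix=".pkl", delete=False) as tmp:
    tmp.write(b"placeholder bytes")
    path = tmp.name

with open(path, "rb") as f:
    # For a file object created by open(), .name is the name it was opened
    # with; here that is the absolute path chosen by tempfile.
    name_matches = f.name == path

os.remove(path)
print(name_matches)  # True
```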
In this case, you would need to write something like this:
import Algorithmia
import joblib

def apply(input):
    # You don't need to pass your API key if you're editing in the web editor
    client = Algorithmia.client()
    modelFile = client.file("data://(username)/testcollection/model.pkl").getFile()
    modelFilePath = modelFile.name
    model = joblib.load(modelFilePath)
    return "empty"
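To see why modelFile.name works where f.path and f.url did not, here is a local sketch of the pattern the answer relies on, assuming getFile() saves the downloaded object to a local temporary file and returns a file object opened on it. `fake_get_file` is hypothetical and only stands in for the real client call; the dictionary plays the role of the trained model.

```python
import io
import os
import tempfile

import joblib


def fake_get_file(raw):
    """Hypothetical stand-in for client.file(...).getFile().

    Assumed behaviour: save the downloaded bytes to a local temporary file
    and return a file object opened on it.
    """
    with tempfile.NamedTemporaryFile(suffix=".pkl", delete=False) as tmp:
        tmp.write(raw)
    return open(tmp.name, "rb")


# Serialize a placeholder object in memory to play the role of model.pkl.
buffer = io.BytesIO()
joblib.dump({"n_estimators": 10}, buffer)

model_file = fake_get_file(buffer.getvalue())
model_file_path = model_file.name     # a real local path, unlike f.path or f.url
model = joblib.load(model_file_path)  # joblib opens the path itself
model_file.close()
os.remove(model_file_path)
print(model)  # {'n_estimators': 10}
```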
But according to the official scikit-learn model persistence documentation, joblib.load also accepts file-like objects instead of file names.
Hence, we can skip extracting the file name and pass the modelFile object directly:
import Algorithmia
import joblib

def apply(input):
    # You don't need to pass your API key if you're editing in the web editor
    client = Algorithmia.client()
    modelFile = client.file("data://(username)/testcollection/model.pkl").getFile()
    model = joblib.load(modelFile)
    return "empty"
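When passing a file object instead of a path, keep in mind that joblib pickles are binary data, so the object should be opened in binary mode ("rb"). A local sketch with a placeholder list instead of a trained model; if a file object you receive was opened in text mode, reopening its .name in binary mode, as in the first approach, is a simple fallback.

```python
import os
import tempfile

import joblib

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pkl")
    joblib.dump([1, 2, 3], path)

    # joblib.load accepts an open file object as well as a filename.
    # Pickled data is binary, so open the file in "rb" mode.
    with open(path, "rb") as model_file:
        model = joblib.load(model_file)

print(model)  # [1, 2, 3]
```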
Full disclosure: I work as an Algorithm Engineer at Algorithmia.