I've created an SVMlight file containing a single line from a pandas DataFrame:
import numpy as np
from sklearn.datasets import dump_svmlight_file, load_svmlight_file

dump_svmlight_file(toy_data.drop(["Output"], axis=1), toy_data["Output"],
                   "../data/oneline_pid.txt", query_id=toy_data["EventID"])
The result in the file looks like this:
0 qid:72048431380967004 0:1440446648 1:72048431380967004 2:236784985 3:1477 4:26889 5:22 6:36685162242798766 8:1919947 10:22 11:48985 12:1840689
When I try to load the file with query_id=True, I get an overflow error:
train = load_svmlight_file("../data/oneline_pid.txt", dtype=np.uint64, query_id=True)
OverflowError: signed integer is greater than maximum
If I load the file with query_id=False, no error is raised, but the value of the query ID (which is also stored as feature 1) comes back wrong. This is the output:
[[ 1440446648 72048431380967008 236784985 1477
26889 22 36685162242798768 0
1919947 0 22 48985
1840689]]
72048431380967004 now appears as 72048431380967008.
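The changed digits look like what happens when an integer this large passes through a 64-bit float. Here is a minimal sketch of that effect in plain Python. It assumes (I have not confirmed this in the scikit-learn source) that the loader parses values as doubles before casting to the requested dtype; `through_float64` is just a helper name made up for this illustration.

```python
# Values copied from the SVMlight line above.
values = [72048431380967004, 36685162242798766, 1440446648]

# A double has a 53-bit significand, so not every integer above 2**53
# can be represented exactly.
EXACT_LIMIT = 2 ** 53


def through_float64(n):
    """Round-trip an integer through a Python float (an IEEE 754 double)."""
    return int(float(n))


for v in values:
    result = through_float64(v)
    status = "exact" if result == v else "rounded"
    print(v, "->", result, status, "(above 2**53)" if v > EXACT_LIMIT else "")
```

The two large values come back as 72048431380967008 and 36685162242798768, which matches the loaded array, while the small one is unchanged.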
How do I avoid this error? The maximum value of np.int64 is 9223372036854775807 (and np.uint64 goes up to 18446744073709551615), so there should be no overflow error.
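For reference, the integer limits can be checked with `np.iinfo`. The 32-bit limit is included only for comparison: that the loader stores query IDs in a 32-bit signed integer in this version is my guess, not something I have verified.

```python
import numpy as np

qid = 72048431380967004  # query ID from the file above

int32_max = np.iinfo(np.int32).max    # 2147483647
int64_max = np.iinfo(np.int64).max    # 9223372036854775807
uint64_max = np.iinfo(np.uint64).max  # 18446744073709551615

print("fits in int32: ", qid <= int32_max)
print("fits in int64: ", qid <= int64_max)
print("fits in uint64:", qid <= uint64_max)
```

The query ID fits comfortably in both 64-bit types but not in a 32-bit signed integer.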
I have also tried loading with np.int64 as the data type, but the output is the same.
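As a possible workaround, a line can be parsed by hand so that every value stays an exact integer. This is only a rough sketch, not part of scikit-learn: `parse_svmlight_line_exact` is a hypothetical helper, and it assumes zero-based feature indices, an integer label, and a dense row small enough to hold in memory.

```python
import numpy as np


def parse_svmlight_line_exact(line, n_features):
    """Hypothetical helper: parse one SVMlight line without going through floats."""
    tokens = line.split()
    label = int(tokens[0])
    qid = None
    # Dense row of unsigned 64-bit integers; missing features stay 0.
    row = np.zeros(n_features, dtype=np.uint64)
    for token in tokens[1:]:
        key, value = token.split(":")
        if key == "qid":
            qid = int(value)  # Python ints are arbitrary precision
        else:
            row[int(key)] = int(value)
    return label, qid, row


line = ("0 qid:72048431380967004 0:1440446648 1:72048431380967004 "
        "2:236784985 3:1477 4:26889 5:22 6:36685162242798766 "
        "8:1919947 10:22 11:48985 12:1840689")

label, qid, row = parse_svmlight_line_exact(line, n_features=13)
print(qid)
print(row)
```

This trades the speed and sparse output of `load_svmlight_file` for exactness, so it is probably only reasonable for small files.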
scikit-learn version: 0.16.1, OS X Yosemite 10.10.5
The overflow error has been fixed in newer scikit-learn versions.