Tags: python, tensorflow, machine-learning, scikit-learn, skflow

How To Use tf.estimator.DNNClassifier (Scikit Flow?)


Could someone point me to a basic working example for tf.estimator.DNNClassifier (originally skflow)?

Since I'm familiar with sklearn, I was excited to read about Scikit Flow on this blog. The API especially looked pretty much the same as scikit-learn's.

However, I was having a problem getting the code from the blog to work.

Then I read on the Scikit Flow GitHub page that it had moved to tensorflow/tensorflow/contrib/learn/python/learn.

Upon further investigation, I found tf.contrib.learn.DNNClassifier moved to tf.estimator.DNNClassifier.

However, the estimator API now seems quite different from sklearn's classifier API.

I would appreciate it if someone could point me to a basic working example.

Here's the code from the blog above.

import tensorflow.contrib.learn as skflow
from sklearn import datasets, metrics

iris = datasets.load_iris()
classifier = skflow.TensorFlowDNNClassifier(hidden_units=[10, 20, 10], n_classes=3)
classifier.fit(iris.data, iris.target)
score = metrics.accuracy_score(iris.target, classifier.predict(iris.data))
print("Accuracy: %f" % score)

Solution

  • The API has changed a lot, so now you can do something like this (an official example is available here):

    import tensorflow as tf
    from sklearn import datasets, metrics
    
    # Input pipeline for training: shuffle, repeat indefinitely, and batch.
    def train_input_fn(features, labels, batch_size):
        dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))
        return dataset.shuffle(1000).repeat().batch(batch_size)
    
    iris = datasets.load_iris()
    # The estimator works with named features, so expose each iris column
    # under a string key.
    train_x = {
        '0': iris.data[:, 0],
        '1': iris.data[:, 1],
        '2': iris.data[:, 2],
        '3': iris.data[:, 3],
    }
    
    # One numeric feature column per named feature.
    my_feature_columns = []
    for key in train_x.keys():
        my_feature_columns.append(tf.feature_column.numeric_column(key=key))
    
    clf = tf.estimator.DNNClassifier(hidden_units=[10, 20, 10],
                                     feature_columns=my_feature_columns,
                                     n_classes=3)
    clf.train(input_fn=lambda: train_input_fn(train_x, iris.target, 32), steps=10000)
    
    # Predict with an input_fn that neither shuffles nor repeats, so the
    # predictions stay aligned with the labels and the generator terminates.
    def eval_input_fn(features, batch_size):
        dataset = tf.data.Dataset.from_tensor_slices(dict(features))
        return dataset.batch(batch_size)
    
    preds = [p['class_ids'][0]
             for p in clf.predict(input_fn=lambda: eval_input_fn(train_x, 32))]
    
    print(metrics.accuracy_score(iris.target, preds))
    

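    The Estimator API can also report metrics itself via clf.evaluate, so you don't have to compute accuracy by hand. A minimal sketch, assuming the clf, train_x, and iris objects defined above (labelled_input_fn is just an illustrative helper name):

    # Like eval_input_fn above, but also yields labels, as evaluate() expects.
    def labelled_input_fn(features, labels, batch_size):
        dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))
        return dataset.batch(batch_size)
    
    # Returns a dict with 'accuracy', 'average_loss', 'loss', 'global_step'.
    eval_result = clf.evaluate(input_fn=lambda: labelled_input_fn(train_x, iris.target, 32))
    print(eval_result['accuracy'])
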
    But nowadays it is better to use the tf.keras API, like this:

    import tensorflow as tf
    from sklearn import datasets, metrics
    
    iris = datasets.load_iris()
    
    clf = tf.keras.models.Sequential([
        tf.keras.layers.Dense(10, activation='sigmoid'),
        tf.keras.layers.Dense(20, activation='sigmoid'),
        tf.keras.layers.Dense(10, activation='sigmoid'),
        tf.keras.layers.Dense(3, activation='softmax'),
    ])
    # iris.target holds integer class ids (0, 1, 2), so use the sparse loss.
    clf.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    # A single epoch barely trains on 150 samples, so run a few more.
    clf.fit(iris.data, iris.target, batch_size=32, epochs=100)
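
    For completeness, here is a minimal sketch of how you could reproduce the sklearn-style accuracy check from the blog snippet with this Keras model, reusing clf, iris, and the metrics import from above (evaluating on the training data, as the question does):

    import numpy as np
    
    # Keras' built-in metric...
    loss, acc = clf.evaluate(iris.data, iris.target, verbose=0)
    print("Accuracy: %f" % acc)
    
    # ...or via sklearn, taking the argmax over the softmax outputs.
    preds = np.argmax(clf.predict(iris.data), axis=1)
    print("Accuracy: %f" % metrics.accuracy_score(iris.target, preds))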