There are many supervised classifier algorithms available in scikit-learn, but I couldn't find any information about their scalability with respect to large datasets. I know that, for instance, support vector machines don't behave well with huge datasets, but what about the others? Which supervised/semi-supervised classifier algorithms are best suited to large datasets?
If you are specifically looking for classifiers in sklearn, you can have a look at this link: Scaling Strategies for large datasets.
Generally, these classifiers do incremental learning on your dataset by processing it in mini-batches (via the partial_fit method). Here are some links for reference:
Incremental Learning links
You can have a look at these classifiers in sklearn for more info; a minimal mini-batch training sketch is shown below.
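As a rough illustration (the file name, chunk size and column names are assumptions, not taken from the links above), an out-of-core loop with SGDClassifier.partial_fit over pandas chunks could look like this:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import SGDClassifier

# Assumed: a large CSV with numeric feature columns and a "label" column.
classes = np.array([0, 1])   # all classes must be declared up front for partial_fit
clf = SGDClassifier()        # linear classifier trained with stochastic gradient descent

# Stream the file in mini-batches instead of loading it all into memory.
for chunk in pd.read_csv("big_dataset.csv", chunksize=10_000):
    X = chunk.drop(columns=["label"]).values
    y = chunk["label"].values
    clf.partial_fit(X, y, classes=classes)  # incremental update per mini-batch
```

The same pattern works with the other partial_fit-capable estimators (e.g. MultinomialNB, PassiveAggressiveClassifier), since only the estimator line changes.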
If your data arrives as a stream, you can have a look at Apache Spark Streaming and jump to MLlib in Apache Spark for more info; a rough sketch follows.
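A hedged sketch with PySpark's MLlib streaming classifier might look like the following (the directory paths, input line format, and feature dimension are assumptions):

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.mllib.linalg import Vectors
from pyspark.mllib.regression import LabeledPoint
from pyspark.mllib.classification import StreamingLogisticRegressionWithSGD

sc = SparkContext(appName="StreamingClassifier")
ssc = StreamingContext(sc, batchDuration=1)   # 1-second micro-batches

def parse(line):
    # Assumed line format: "label,feat1,feat2,feat3"
    parts = [float(x) for x in line.split(",")]
    return LabeledPoint(parts[0], Vectors.dense(parts[1:]))

# New files dropped into these directories are picked up as streams.
training = ssc.textFileStream("hdfs:///data/train").map(parse)
test = ssc.textFileStream("hdfs:///data/test").map(parse)

model = StreamingLogisticRegressionWithSGD()
model.setInitialWeights([0.0, 0.0, 0.0])      # must match the feature dimension

model.trainOn(training)                       # model is updated on every batch
model.predictOnValues(test.map(lambda lp: (lp.label, lp.features))).pprint()

ssc.start()
ssc.awaitTermination()
```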
You can also have a look at FeatureHasher in sklearn for large-scale feature hashing.
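For instance, a small sketch of hashing dict-style records into a fixed-width sparse matrix (the sample records below are made up):

```python
from sklearn.feature_extraction import FeatureHasher

# FeatureHasher maps arbitrarily many feature names into a fixed number of
# columns, so memory use stays constant no matter how many distinct features
# the full dataset contains.
hasher = FeatureHasher(n_features=2**18, input_type="dict")

raw_samples = [                       # made-up example records
    {"user": "alice", "device": "mobile", "clicks": 3},
    {"user": "bob", "device": "desktop", "clicks": 7},
]
X = hasher.transform(raw_samples)     # scipy sparse matrix, shape (2, 2**18)
print(X.shape)
```

This pairs naturally with the partial_fit loop above, since the hashed matrix has the same width for every mini-batch.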