Classification of data streams

  Classification is the process of predicting the class label of an unknown data instance based on a model constructed by learning from training instances [1]. Unlike traditional classification techniques, streaming classification algorithms do not have the entire data set available for partitioning into training and test sets; hence model construction from incoming instances and testing go hand in hand. Various classification algorithms for streaming data have been devised over the last decade, each with its own capabilities and key focus for averting the challenges of streaming data mining. Some of the available streaming data classification algorithms, along with their key features, are chronologically listed below.

- ITI [11]: requires large storage and is hence not suitable for large data streams.
- VFDT [1, 12]: requires less memory and can make predictions at any moment during training. It uses the Hoeffding bound to assess the minimum number of instances required to grow the decision tree.
- CVFDT [13, 14]: an advancement of VFDT that enables concept adaptation.
- Streaming Ensemble Algorithm [14, 15]: provides robustness and handles concept drift, but must be used carefully on high-speed data streams.
- OLIN [15]: requires less memory; uses the info-fuzzy network (IFN) for concept adaptation and adapts to the rate of concept change.
- Weighted Classifier Ensemble [16]: deals well with concept drift by using an ensemble of weighted classifiers over chunks of data instances rather than revising the model frequently (a time-consuming process).
- On Demand Classifier [17]: based on micro-clustering; dynamically adapts and/or selects the sliding window size for better performance and concept adaptation.
- Evolving Naïve Bayes [18]: an extended Naïve Bayes algorithm capable of learning from evolving data streams.
- IOLIN [19]: a variation of OLIN that keeps updating the existing model until a sufficient concept drift occurs, thereby saving significant computational effort.
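The Hoeffding bound that VFDT relies on can be stated concretely: with probability 1 − δ, the true mean of a random variable with range R lies within ε = sqrt(R² ln(1/δ) / 2n) of its sample mean over n observations. A minimal sketch of the resulting split test follows; the function names are illustrative, not taken from any published VFDT implementation:

```python
import math

def hoeffding_bound(value_range, delta, n):
    """Epsilon such that, with probability 1 - delta, the true mean of a
    random variable with the given range lies within epsilon of the
    sample mean over n observations."""
    return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n))

def should_split(best_gain, second_gain, value_range, delta, n):
    """VFDT-style test: split on the best attribute once its observed
    gain advantage over the runner-up exceeds the Hoeffding bound."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)
```

Because ε shrinks as n grows, a leaf that has seen few instances withholds the split, while a clear winner among the candidate attributes triggers it early.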
- ASHT Bagging [20]: uses Hoeffding trees of varying sizes, since small trees adapt quickly to changes whereas larger trees perform better when concepts change little.
- Random Forest Based Classification Algorithm [21]: handles evolving data streams in one pass, even when labeled instances arrive only intermittently; it also decides whether more labeled instances are required to update the model.
- Vertical Hoeffding Tree (VHT) [22]: a variation of VFDT that performs distributed parallel computation by vertically (attribute-wise) partitioning the data set.
- Similarity-based Data Stream Classifier (SimC) [23]: uses a new insertion/removal approach to quickly capture and represent changes in the data, improving performance; it also incorporates new class labels and discards obsolete ones during execution.
- Distance-Based Ensemble Online Classifier with Kernel Clustering [24]: uses a kernel-based clustering approach in which a new instance is supplied at each iteration and a prediction is made on it; an ensemble of classifiers is constructed on the basis of a portfolio of distance measures.
- Online Stream Classifier with incremental semi-supervised learning [35]: utilizes a selective self-training based semi-supervised learning approach to achieve on-par classification accuracy even when only a little labeled data is available.

References [1, 3] propose semi-supervised ensembles that learn with label propagation methods; both algorithms learn in batches to train on new data and to update the ensemble models. Most data stream classifiers assume fully labeled data, which is not viable, as data labeling is time-consuming and requires human input.
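The chunk-based weighted-ensemble idea running through several of the algorithms above (train one classifier per chunk, re-weight existing members by their accuracy on the newest chunk, keep only the best) can be sketched as follows. The class name, `capacity` parameter, and callable-classifier interface are assumptions for illustration, not taken from any of the cited algorithms:

```python
from collections import Counter

class ChunkEnsemble:
    """Ensemble of per-chunk classifiers, each weighted by its accuracy
    on the most recent chunk (illustrative sketch)."""

    def __init__(self, capacity=5):
        self.capacity = capacity
        self.members = []  # list of (classifier, weight) pairs

    def update(self, train_fn, chunk_X, chunk_y):
        # Re-weight existing members by their accuracy on the newest chunk,
        # so members trained on outdated concepts lose influence.
        reweighted = []
        for clf, _ in self.members:
            acc = sum(clf(x) == y for x, y in zip(chunk_X, chunk_y)) / len(chunk_y)
            reweighted.append((clf, acc))
        # Train a fresh classifier on the new chunk; it starts at full weight.
        reweighted.append((train_fn(chunk_X, chunk_y), 1.0))
        # Retain only the top-weighted members, bounding memory use.
        reweighted.sort(key=lambda pair: pair[1], reverse=True)
        self.members = reweighted[:self.capacity]

    def predict(self, x):
        # Weighted majority vote across the surviving members.
        votes = Counter()
        for clf, weight in self.members:
            votes[clf(x)] += weight
        return votes.most_common(1)[0][0]
```

After a concept drift, classifiers trained on the old concept score poorly on fresh chunks and are pushed out, so the ensemble adapts without retraining any individual model.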
In [37], a relational k-means based transfer semi-supervised SVM learning framework (RK-TS3VM) is proposed, which deals with both labeled and unlabeled examples to build prediction models. In [38], a semi-supervised approach is presented for handling concept-drifting data streams containing both labeled and unlabeled instances; if a drift occurs, the classifier is updated with the EM algorithm. In this paper, a semi-supervised data stream algorithm is proposed that allows learning from both labeled and unlabeled data, addressing the problem of limited labeling in data stream mining. The goal of this paper is to develop semi-supervised learning methods that exploit the hidden structure of the unlabeled data, called the informative unlabeled data, in order to improve the classification performance of the base learner on data streams. As addressed in Section 3, a modified self-training Naïve Bayes framework is proposed that is able to deal with limited labeled data in streams.
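A plain self-training loop around an incrementally updatable Naïve Bayes learner, in the spirit of the proposed framework, might look like the following. This is an illustrative sketch only: the class names, the Laplace-style smoothing, and the confidence threshold of 0.9 are assumptions, not the authors' exact method:

```python
import math
from collections import defaultdict, Counter

class DiscreteNB:
    """Minimal Naive Bayes over discrete features with additive
    smoothing; supports one-instance incremental updates."""

    def __init__(self):
        self.class_counts = Counter()
        # (feature index, class) -> Counter of observed feature values
        self.feat_counts = defaultdict(Counter)

    def update(self, x, y):
        self.class_counts[y] += 1
        for i, v in enumerate(x):
            self.feat_counts[(i, y)][v] += 1

    def predict_proba(self, x):
        total = sum(self.class_counts.values())
        log_scores = {}
        for y, cy in self.class_counts.items():
            logp = math.log(cy / total)  # class prior
            for i, v in enumerate(x):
                counts = self.feat_counts[(i, y)]
                # smoothed conditional probability of value v given class y
                logp += math.log((counts[v] + 1) / (cy + len(counts) + 1))
            log_scores[y] = logp
        # normalize log scores into probabilities
        m = max(log_scores.values())
        exps = {y: math.exp(s - m) for y, s in log_scores.items()}
        z = sum(exps.values())
        return {y: e / z for y, e in exps.items()}

    def predict(self, x):
        proba = self.predict_proba(x)
        return max(proba, key=proba.get)

def self_train_chunk(model, labeled, unlabeled, threshold=0.9):
    """One self-training step on a chunk: fit on the labeled pairs, then
    pseudo-label unlabeled instances whose top posterior exceeds the
    (assumed) confidence threshold and fold them into the model."""
    for x, y in labeled:
        model.update(x, y)
    for x in unlabeled:
        proba = model.predict_proba(x)
        y_hat = max(proba, key=proba.get)
        if proba[y_hat] >= threshold:
            model.update(x, y_hat)
    return model
```

The threshold is what keeps only the informative unlabeled instances: low-confidence predictions are simply skipped rather than risked as noisy pseudo-labels.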