Many organizations today have more than very large data-bases; they have databases that grow without limit at a rate of several million records per day. Mining these continuous data streams brings unique opportunities, but also new challenges. This paper describes and evaluates VFDT, an anytime system that builds decision trees using constant memory and constant time per example. VFDT can in-corporate tens of thousands of examples per second using the-shelf hardware. It uses Hoeffding bounds to guar-antee that its output is asymptotically nearly identical to that of a conventional learner. The study VFDT’s proper-ties and demonstrate its utility through an extensive set of experiments on synthetic data. To apply VFDT to mining the continuous stream of Web access data from the whole University of Washington main campus.
Support the magazine and subscribe to the content
This is premium stuff. Subscribe to read the entire article.