Classification and Clustering

We developed an algorithm that monitored a camera feed and cataloged the hats people were wearing.

Many problems involve classifying data into one of several groups. Examples include recognizing handwritten digits, filtering spam email, and classifying internet documents.

There are two broad kinds of classification problems: those where the computer is given a set of labeled data in advance (supervised learning) and those where it is not (unsupervised learning). In the first kind, the computer uses the labeled examples to learn a mapping from each input to the class it belongs to. When it is then given new data, it uses that mapping to predict which class each new data point belongs to.
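To make the supervised case concrete, here is a minimal sketch of one simple classifier, k-nearest neighbors, in Python. It is an illustration under stated assumptions, not the method used in any particular project: data points are assumed to be numeric tuples, similarity is plain Euclidean distance, and the function name `knn_predict` and the toy data are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Predict a label for `query` by majority vote of its k nearest labeled points."""
    # Rank training points by Euclidean distance to the query.
    order = sorted(range(len(train_points)), key=lambda i: math.dist(train_points[i], query))
    nearest = [train_labels[i] for i in order[:k]]
    # most_common(1) returns [(label, count)]; on a tie, Counter keeps first-seen
    # order, so the label of the closer neighbor wins.
    return Counter(nearest).most_common(1)[0][0]

# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(points, labels, (1.1, 1.0)))  # A
print(knn_predict(points, labels, (4.9, 5.0)))  # B
```

Here the labeled examples themselves serve as the learned mapping: a new point simply takes the majority label of its closest labeled neighbors.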

When the computer is not given labeled data, it has to discover the classes itself by grouping similar data points. This problem is usually called clustering. Sometimes the number of clusters is known in advance, but often it must also be determined from the data. The definition of “similar data points” (the distance or similarity measure) must be chosen with care, because it is critical to the success of the algorithm.
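For the case where the number of clusters k is known in advance, a common starting point is k-means (Lloyd's algorithm). The sketch below is a minimal Python version, assuming numeric tuples and Euclidean distance. The farthest-first seeding, the function names, and the toy data are illustrative choices, not a prescribed method; random restarts or k-means++ seeding are common alternatives.

```python
import math

def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def farthest_first(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add
    the point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids

def kmeans(points, k, max_iterations=100):
    """Cluster points into k groups. Returns (centroids, assignments)."""
    centroids = farthest_first(points, k)
    assignments = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_assignments = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_assignments == assignments:
            break  # no point changed cluster, so the algorithm has converged
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # leave a centroid in place if its cluster is empty
                centroids[c] = mean(members)
    return centroids, assignments

# Hypothetical unlabeled data with two obvious groups.
data = [(1.0, 1.0), (1.5, 2.0), (1.2, 0.8), (8.0, 8.0), (8.5, 7.5), (7.8, 8.2)]
centroids, labels = kmeans(data, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Swapping in a different notion of similarity changes which points end up together, which is why that choice matters. Note that the mean-based update step is tied to (squared) Euclidean distance; other similarity measures generally call for a different update rule or a different clustering algorithm.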

Many techniques are used for classification, and a mathematical result known as the “no free lunch” theorem shows that no single technique is best for all problems. Because of this, choosing a classifier is partly an art. Fortunately, there are many good classifiers to choose from and tune to a given problem, so there are often several good solutions.
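Since no classifier wins everywhere, a practical way to choose among candidates is to measure each one on data it was not trained on. The hypothetical Python sketch below compares two simple classifiers, 1-nearest-neighbor and nearest-centroid, using leave-one-out accuracy on a small made-up one-dimensional dataset; the function names and data are illustrative only.

```python
import math

def nearest_neighbor(train, labels, query):
    """1-NN: return the label of the single closest training point."""
    best = min(range(len(train)), key=lambda i: math.dist(train[i], query))
    return labels[best]

def nearest_centroid(train, labels, query):
    """Return the class whose mean (centroid) is closest to the query."""
    centroids = {}
    for label in set(labels):
        members = [p for p, lab in zip(train, labels) if lab == label]
        centroids[label] = tuple(sum(c) / len(members) for c in zip(*members))
    return min(centroids, key=lambda label: math.dist(centroids[label], query))

def leave_one_out_accuracy(classifier, points, labels):
    """Hold out each point in turn, train on the rest, and score the prediction."""
    correct = 0
    for i in range(len(points)):
        train = points[:i] + points[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        if classifier(train, train_labels, points[i]) == labels[i]:
            correct += 1
    return correct / len(points)

# Hypothetical 1-D data; class "B" contains an outlier at 20.
points = [(0.0,), (1.0,), (2.0,), (3.5,), (4.5,), (20.0,)]
labels = ["A", "A", "A", "B", "B", "B"]

print(leave_one_out_accuracy(nearest_neighbor, points, labels))            # 1.0
print(round(leave_one_out_accuracy(nearest_centroid, points, labels), 2))  # 0.67
```

On this toy data the nearest-neighbor rule wins because the outlying point at 20 drags class B's centroid away from its other members; on other data, for example with noisy labels, the ranking can reverse.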

If you need a computer program to classify and/or cluster data, we would be glad to help, so please contact us today!