In this article, Pedro Domingos attempts to reveal the "Folk Knowledge" and "Black Art" commonly used in machine learning.
First, Domingos provides some key insights about machine learning:
- Machine learning allows programs to generalize rules from examples, saving programmers from having to write those rules explicitly. The cost savings grow with the size of the data set.
- Using machine learning techniques successfully requires knowledge that is not easily found in formal sources; in other words, "Black Art" is abundant in the field.
The insights entailed in this "Black Art" are revealed by Domingos in 12 lessons:
1st Lesson: Learning = Representation + Evaluation + Optimization:
- Representation
- Set of Classifiers - Hypothesis Space
- The set of classifiers the learner can possibly learn; a classifier outside this hypothesis space cannot be learned
- Evaluation
- Evaluation Function / Scoring Function
- Scores candidate classifiers, distinguishing good ones from bad ones
- Optimization
- The optimizer searches the hypothesis space for the highest-scoring classifier; its choice determines how efficient the learner is and whether the best-scoring classifier is actually found (see the sketch after this list)
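A minimal sketch of these three components, with illustrative choices not taken from the article: decision stumps as the representation, accuracy as the evaluation function, and exhaustive search as the optimizer.

```python
import numpy as np

# Toy 1-D data: the true label is 1 when the feature exceeds 0.5.
rng = np.random.default_rng(0)
X = rng.random(200)
y = (X > 0.5).astype(int)

# Representation: the hypothesis space is all decision stumps "x > t".
thresholds = np.linspace(0, 1, 101)

def stump_predict(threshold, X):
    return (X > threshold).astype(int)

# Evaluation: score each candidate classifier by its accuracy.
def accuracy(y_true, y_pred):
    return np.mean(y_true == y_pred)

# Optimization: exhaustive search over the hypothesis space.
best_t = max(thresholds, key=lambda t: accuracy(y, stump_predict(t, X)))
print(f"best threshold: {best_t:.2f}")
```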
2nd Lesson: It's Generalization that Counts
Training data should not be used as test data: doing so 'contaminates' the test data and gives an unrealistically optimistic view of the classifier's performance. Optimizing the classifier for the training data alone can lead to overfitting/overspecialization. Instead, the classifier should generalize so that it is effective on a larger, more diverse set of data. We do not have access to the function the learning algorithm is really trying to optimize; we must instead approximate it from the data.
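A minimal sketch of keeping test data separate, assuming scikit-learn is available; the dataset and model here are placeholder choices, not from the article.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Hold out test data up front and never touch it while training or tuning.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

print("train accuracy:", model.score(X_train, y_train))  # optimistic
print("test accuracy :", model.score(X_test, y_test))    # honest estimate of generalization
```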
3rd Lesson: Data Alone Is Not Enough
It is often impossible to know and store every possible instance of every class. Without any further information about the data, random guessing is the best we can do; this is summed up by Wolpert's "no free lunch" theorem, which says that no learner can beat random guessing over all possible functions to be learned. Instead, the learner/classifier must make assumptions about what the data looks like. Knowledge acts as a lever for predictive power: learners combine knowledge with data to "grow" programs.
4th Lesson: Overfitting Has Many Faces
Overfitting is a generalization error that can be decomposed into bias and variance. Bias is the learner's tendency to consistently learn the same wrong thing; variance is the tendency to learn random things irrespective of the real signal. Different types of classifiers behave differently with respect to bias and variance, and there is often a tradeoff between the two. We can counteract overfitting with cross-validation or by adding a regularization term that prefers classifiers with less structure and complexity. Significance tests and controlling the false discovery rate help combat the problem of multiple testing.
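A short sketch of using cross-validation to choose a regularization strength, assuming scikit-learn; the dataset, the ridge model, and the grid of penalties are illustrative choices.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

X, y = load_diabetes(return_X_y=True)

# Larger alpha means stronger regularization: simpler, lower-variance models.
for alpha in [0.01, 0.1, 1.0, 10.0]:
    scores = cross_val_score(Ridge(alpha=alpha), X, y, cv=5)
    print(f"alpha={alpha:<5} mean CV R^2 = {scores.mean():.3f}")
```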
5th Lesson: Intuition Fails in High Dimensions
Humans are unable to see patterns in data of high dimensionality, which is part of the reason why machine learning is so useful. However, the same difficulty makes it harder to craft features for the classifier and to make sound generalizations and assumptions about the data. Similarity-based reasoning and the intuitions that work in low dimensions break down for data sets with high dimensionality.
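A small illustrative simulation (assuming numpy) of how distance-based similarity degrades as dimensionality grows: the nearest and farthest neighbors of a query point become almost equally far away.

```python
import numpy as np

rng = np.random.default_rng(0)

# As dimensionality grows, nearest and farthest distances from a query point
# become nearly indistinguishable, so "similar" carries little information.
for d in [2, 10, 100, 1000]:
    points = rng.random((1000, d))
    query = rng.random(d)
    dists = np.linalg.norm(points - query, axis=1)
    print(f"d={d:<5} nearest/farthest distance ratio = {dists.min() / dists.max():.3f}")
```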
6th Lesson: Theoretical Guarantees Are Not What They Seem
The number of examples needed to generalize well enough to make solid claims about the effectiveness of a classifier is bounded; however, in induction these guarantees are only probabilistic and require far more examples than in deduction. Furthermore, these bounds say nothing about how to construct a good hypothesis space. Theoretical guarantees are a useful source of insight for algorithm design, but they may not be indicative of the predictive power and effectiveness of a classifier in practice.
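As a concrete illustration, a standard PAC-style bound for a finite hypothesis space (consistent with the kind of guarantee the article discusses, though the numbers below are illustrative): with probability at least 1 − δ, any hypothesis consistent with n ≥ (1/ε)(ln|H| + ln(1/δ)) random examples has true error at most ε.

```python
import math

def sample_bound(hypothesis_space_size, epsilon, delta):
    """Examples sufficient so that, with probability >= 1 - delta, any hypothesis
    consistent with the training set has true error <= epsilon (finite |H|)."""
    return math.ceil((math.log(hypothesis_space_size) + math.log(1 / delta)) / epsilon)

# Even a modest hypothesis space and loose tolerances demand many examples.
print(sample_bound(hypothesis_space_size=2**20, epsilon=0.01, delta=0.05))  # 1686
```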
7th Lesson: Feature Engineering is the Key
Understanding the domain, gathering good data in large enough volume, and analyzing it adequately is often the most time-consuming and most important part of machine learning. Automating feature generation is possible, but the results are often poorer in quality than hand-crafted features.
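A tiny, hypothetical example of hand-crafted features: turning a raw timestamp into features a learner can actually exploit. The derived features are illustrative, not from the article.

```python
from datetime import datetime

def engineer_features(raw_timestamp):
    """Derive domain-informed features from a raw ISO-8601 timestamp string."""
    ts = datetime.fromisoformat(raw_timestamp)
    return {
        "hour": ts.hour,                       # captures time-of-day effects
        "day_of_week": ts.weekday(),           # 0 = Monday ... 6 = Sunday
        "is_weekend": int(ts.weekday() >= 5),  # weekend behavior often differs
        "month": ts.month,                     # coarse seasonality
    }

print(engineer_features("2012-10-15T14:30:00"))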
8th Lesson: More Data Beats a Cleverer Algorithm
Gains from using a more complex and clever algorithm are often minimal, while larger gains can be obtained with the same amount of effort by gathering more data. Start simple, and move to more complex learners and refine only if needed. Keep in mind that fixed-size learners can only take advantage of so much data.
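A rough sketch (assuming scikit-learn) of how a simple learner keeps improving just by seeing more examples; the dataset and the naive Bayes learner are placeholder choices.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# A simple learner often gains more from extra data than from extra cleverness.
for n in [50, 200, 500, len(X_train)]:
    model = GaussianNB().fit(X_train[:n], y_train[:n])
    print(f"n={n:<5} test accuracy = {model.score(X_test, y_test):.3f}")
```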
9th Lesson: Learn Many Models, Not Just One
Often a combination of classifiers, built through bagging, boosting, or stacking, gives better results than any single classifier. Model ensembles are now commonly used.
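A brief sketch (assuming scikit-learn) of the bagging idea: many trees trained on bootstrap samples and combined by voting usually beat a single tree. The dataset and estimator counts are illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

single_tree = DecisionTreeClassifier(random_state=0)

# Bagging: train many trees on bootstrap samples and vote (the default base
# estimator of BaggingClassifier is a decision tree).
bagged_trees = BaggingClassifier(n_estimators=100, random_state=0)

print("single tree :", cross_val_score(single_tree, X, y, cv=5).mean())
print("bagged trees:", cross_val_score(bagged_trees, X, y, cv=5).mean())
```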
10th Lesson: Simplicity Does Not Imply Accuracy
Rather, simpler hypotheses should be preferred because simplicity is a virtue in its own right, not because it implies accuracy.
11th Lesson: Representable Does Not Imply Learnable
Rather, the form, depth, and breadth of the representation and the distribution of the data may make a function difficult, if not impossible, to learn, even when it can be represented. The relationship between representation and learnability varies from learner to learner.
12th Lesson: Correlation Does Not Imply Causation
Rather, correlation *may* indicate the presence of underlying causal links. This relationship should be explored experimentally to uncover the actual link before any conclusion about causality is drawn.
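A small simulation (assuming numpy) of the point: a hidden common cause makes two variables strongly correlated even though neither causes the other. The variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hidden common cause (confounder).
weather = rng.normal(size=10_000)

# Neither variable causes the other; both respond to the confounder.
ice_cream_sales = weather + rng.normal(scale=0.5, size=10_000)
sunburn_cases   = weather + rng.normal(scale=0.5, size=10_000)

corr = np.corrcoef(ice_cream_sales, sunburn_cases)[0, 1]
print(f"correlation = {corr:.2f}")  # strong, yet there is no causal link between them
```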
This article is a great introduction to the pitfalls of machine learning. Some of the discussion is slightly too abstract and pedagogical to be operationally useful when the problems described are encountered in the field, but as a primer the article works well.
Pedro Domingos. 2012. A few useful things to know about machine learning. Commun. ACM 55, 10 (October 2012), 78-87. DOI=10.1145/2347736.2347755 http://doi.acm.org/10.1145/2347736.2347755