Computational Complexity of Machine Learning 论文

1990引用 229
Machine Learning and AlgorithmsComputability, Logic, AI Algorithms

摘要

This thesis is a study of the computational complexity of machine learning from examples in the distribution-free model introduced by L. G. Valiant (V84). In the distribution-free model, a learning algorithm receives positive and negative examples of an unknown target set (or concept) that is chosen from some known class of sets (or concept class). These examples are generated randomly according to a fixed but unknown probability distribution representing Nature, and the goal of the learning algorithm is to infer an hypothesis concept that closely approximates the target concept with respect to the unknown distribution. This thesis is concerned with proving theorems about learning in this formal mathematical model. We are interested in the phenomenon of efficient learning in the distribution-free model, in the standard polynomial-time sense. Our results include general tools for determining the polynomial-time learnability of a concept class, an extensive study of efficient learning when errors are present in the examples, and lower bounds on the number of examples required for learning in our model. A centerpiece of the thesis is a series of results demonstrating the computational difficulty of learning a number of well-studied concept classes. These results are obtained by reducing some apparently hard number-theoretic problems from cryptography to the learning problems. The hard-to-learn concept classes include the sets represented by Boolean formulae, deterministic finite automata and a simplified form of neural networks. We also give algorithms for learning powerful concept classes under the uniform distribution, and give equivalences between natural models of efficient learnability. This thesis also includes detailed definitions and motivation for the distribution-free model, a chapter discussing past research in this model and related models, and a short list of important open problems.