Yann LeCun's Home Page

My main research interests are Machine Learning, Computer Vision, Mobile Robotics, and Computational Neuroscience. I am also interested in Data Compression, Digital Libraries, the Physics of Computation, and all the applications of machine learning (Vision, Speech, Language, Document understanding, Data Mining, Bioinformatics).

News and Updates

We are working on a class of learning systems called Energy-Based Models, and Deep Belief Networks. We are also working on convolutional nets for visual recognition , and a type of graphical models known as factor graphs.

We have projects in computer vision, object detection, object recognition, mobile robotics, bio-informatics, biological image analysis, medical signal processing, signal processing, and financial prediction,....

Jump to my course page at NYU, and see course descriptions, slides, course material...

Proposal for a new publishing model in Computer Science

Many computer Science researchers are complaining that our emphasis on highly selective conference publications, and our double-blind reviewing system stifles innovation and slow the rate of progress of Science and technology.

This pamphlet proposes a new publishing model based on an open repository and open (but anonymous) reviews which creates a "market" between papers and reviewing entities.

Animals and humans can learn to see, perceive, act, and communicate with an efficiency that no Machine Learning method can approach. The brains of humans and animals are "deep", in the sense that each action is the result of a long chain of synaptic communications (many layers of processing). We are currently researching efficient learning algorithms for such "deep architectures". We are currently concentrating on unsupervised learning algorithms that can be used to produce deep hierarchies of features for visual recognition. We surmise that understanding deep learning will not only enable us to build more intelligent machines, but will also help us understand human intelligence and the mechanisms of human learning.

We are developing a new type of relational graphical models that can be applied to "structured regression problem". A prime example of structured regression problem is the prediction of house prices. The price of a house depends not only on the characteristics of the house, but also of the prices of similar houses in the neighborhood, or perhaps on hidden features of the neighborhood that influence them. Our relational regression model infers a hidden "desirability sruface" from which house prices are predicted.

The purpose of the LAGR project, funded by the US government, is to design vision and learning algorithms to allow mobile robots to navigate in complex outdoors environment solely from camera input.

My Lab, collaboration with Net-Scale Technologies is one of 8 participants in the program (Applied Perception Inc., Georgia Tech, JPL, NIST, NYU/Net-Scale, SRI, U. Penn, Stanford).

Each LAGR team received identical copies of the LAGR robot, built be the CMU/NREC.

The government periodically runs competitions between the teams. The software from each team is loaded and run by the goverment team on their robot.

The robot is given the GPS coordinates of a goal to which it must drive as fast as possible. The terrain is unknown in advance. The robot is run three times through the test course.

The software can use the knowledge acquired during the early runs to improve the performance on the latter runs.

Prior to the LAGR project, we worked on the DAVE project, an attempt to train a small mobile robot to drive autonomously in off-road environments by looking over the shoulder of a human operator.

Energy-Based Models (EBMs) capture dependencies between variables by associating a scalar energy to each configuration of the variables. Inference consists in clamping the value of observed variables and finding configurations of the remaining variables that minimize the energy. Learning consists in finding an energy function in which observed configurations of the variables are given lower energies than unobserved ones. The EBM approach provides a common theoretical framework for many learning models, including traditional discriminative and generative approaches, as well as graph-transformer networks, conditional random fields, maximum margin Markov networks, and several manifold learning methods.

Probabilistic models must be properly normalized, which sometimes requires evaluating intractable integrals over the space of all possible variable configurations. Since EBMs have no requirement for proper normalization, this problem is naturally circumvented. EBMs can be viewed as a form of non-probabilistic factor graphs, and they provide considerably more flexibility in the design of architectures and training criteria than probabilistic approaches.

The recognition of generic object categories with invariance to pose, lighting, diverse backgrounds, and the presence of clutter is one of the major challenges of Computer Vision.

I am developing learning systems that can recognize generic object purely from their shape, independently of pose and lighting.

See

The NORB dataset for generic object recognition is available for download.

Tired of Matlab? Lush is an easy-to-learn, open-source object-oriented programming language designed for researchers, experimenters, and engineers working in large-scale numerical and graphic applications.

Lush combines three languages in one: a very simple to use, loosely-typed interpreted language, a strongly-typed compiled language with the same syntax, and the C language, which can be freely mixed with the other languages within a single source file, and even within a single function.

Lush has a library of over 14,000 functions and classes, some of which are simple interfaces to popular libraries: vector/matrix/tensor algebra, linear algebra (LAPACK, BLAS), numerical function (GSL), 2D and 3D graphics (X, SDL, OpenGL, OpenRM, PostScipt), image processing, computer vision (OpenCV), machine learning (gblearning, Torch), regular expressions, audio processing (ALSA), and video grabbing (Video4linux).

If you do research and development in signal processing, image processing, machine learning, computer vision, bio-informatics, data mining, statistics, or artificial intelligence, and feel limited by Matlab and other existing tools, Lush is for you. If you want a simple environment to experiment with graphics, video, and sound, Lush is for you. Lush is Free Software (GPL) and runs under GNU/Linux, Solaris, and Irix.

VISIT THE LUSH HOME PAGE >>>>

My main research topic until I left AT&T was the DjVu project. DjVu is a document format, a set of compression methods and a software platform for distributing scanned and digitally produced documents on the Web. DjVu image files of scanned documents are typically 3-8 times smaller than PDF or TIFF-groupIV for bitonal and 5-10 times smaller than PDF or JPEG for color (at 300 DPI). DjVu versions of digitally produced documents are more compact and render much faster than the PDF or PostScript versions.

My main research interest is machine learning, particularly how it applies to perception, and more particularly to visual perception.

I am currently working on two architectures for gradient-based perceptual learning: graph transformer networks and convolutional networks.

Convolutional Nets are a special kind of neural net architecture designed to recognize images directly from pixel data. Convolutional Nets can be trained to detect, segment and recognize objects with excellent robustness to noise, and variations of position, scale, angle, and shape.

Have a look at the animated demonstrations of LeNet-5, a Convolutional Nets trained to recognize handwritten digit strings.

Convolutional nets and graph transformer networks are embedded in several high speed scanners used by banks to read checks. A system I helped develop reads an estimated 10 percent of all the checks written in the US.

Check out this page, and/or read this paper to learn more about Convolutional Nets and graph transformer networks.

The MNIST database contains 60,000 training samples and 10,000 test samples of size-normalized handwritten digits. This database was derived from the original NIST databases.

MNIST is widely used by researchers as a benchmark for testing pattern recognition methods, and by students for class projects in pattern recognition, machine learning, and statistics.

I have several interests beside my family (my wife and three sons) and my research:

My former group at AT&T (the Image Processing Research Department) and its ancestor (Larry Jackel's Adaptive Systems Research Department) made numerous contributions to Machine Learning, Image Compression, Pattern Recognition, Synthetic Persons (talking heads), and Neural-Net Hardware. Specific contributions not mentioned elsewhere on this site include the ever so popular Support Vector Machine, the PlayMail and Virt2Elle synthetic talking heads, the Net32K and ANNA neural net chips, and many others. Visit my former group's home page for more details.

[HOME]	[NEWS]	[PUBLICATIONS]
[RESEARCH]	[DOWNLOADS]
[LENET]	[MUSIC]	[PHOTOS]
[HOBBIES]	[FUN]	[LINKS]