Machine learning and deep learning is on a roll with many recent innovations and successes. In the past applying learning algorithms often meant spending a long time hand-engineering the input feature representation. This is true for many problems in vision, audio, NLP, robotics, and other areas.
Thus, innovative deep learning methods using deep convolutional neural networks, new system architectures, and unsupervised feature learning, which automatically learn a good representation of the input from unlabeled data, is encouraging. Yet more innovation is required to shift the burden from humans to smart machines.
Here are links to recent papers.
A recent 2015 paper entitled "Object Detectors Emerge in Deep Scene CNNs" discusses the use of new system architectures, deep convolutional neural networks and image databases to detect objects and classify scenes. They demonstrate how the same network can learn to recognize scenes and perform both scene recognition and object localization without being taught the notion of objects.
A recent 2015 paper entitled "Deep Image: Scaling up Image Recognition" details use of a highly optimized parallel algorithm, new data partitioning and communication methods, larger deep neural network models, innovative data augmentation approaches, and usage of multi-scale high-resolution images.
In 2015 "Large-scale Classification of Fine-Art Paintings - Learning The Right Metric on The Right Feature" introduced new machine learning techniques to train algorithms to recognize the artist and style of a fine-art painting with incredible accuracy outperforming humans. The results reveal connections between artists, and between entire painting styles, that art historians have labored for years to understand.
In 2014 "Learning Deep Features for Scene Recognition using Places Database" introduced a new scene-centric database called "Places" with over 7 million labeled pictures of scenes and proposed new methods to compare the density and diversity of image datasets and show that Places is as dense as other scene datasets and has more diversity. Using CNN, they learn deep features for scene recognition tasks, and establish new state-of-the-art results on several scene-centric datasets.
Deep Karaoke - Extracting Vocals from Musical Mixtures Using a Convolutional Deep Neural Network
Deep Visual-Semantic Alignments for Generating Image Descriptions
Deep Fragment Embeddings for Bidirectional Image Sentence Mapping
Learning Temporal Embeddings for Complex Video Analysis
Visual Noise from Natural Scene Statistics Reveals Human Scene Category Representations
VideoSET - Video Summary Evaluation through Text