Latest Posts

  1. A short introduction on Deep and Recurrent methods for Neural Networks

    Conventional machine learning techniques are limited in their ability to process raw data, and applying them often requires domain expertise and delicate feature engineering. Deep learning algorithms offer another way forward: representation learning allows suitable representations to be discovered directly from the raw data.

    The data is passed through multiple non-linear layers. Each layer takes the output of the layer below as its input and transforms it into a different representation. Thanks to the distributed encoding of the raw input, the multiple levels of representation, and the power of composition, deep networks have shown promising results in a variety of applications and have set new records in speech and image recognition.
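
    The layer-by-layer transformation described above can be sketched in a few lines. This is only an illustration under stated assumptions: the layer sizes, the tanh non-linearity, the random (untrained) weights, and the helper names `dense_layer`, `random_layer`, and `forward` are all hypothetical choices made for this example, not something taken from a particular network.

```python
import math
import random


def dense_layer(inputs, weights, biases):
    """One non-linear layer: tanh(W x + b), computed row by row."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def random_layer(n_in, n_out, rng):
    """Hypothetical helper: random weights and zero biases (no training)."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward(x, layers):
    """Feed x through every layer, keeping each intermediate representation."""
    representations = [x]
    for weights, biases in layers:
        # Each layer takes the output of the layer below as its input.
        representations.append(dense_layer(representations[-1], weights, biases))
    return representations


rng = random.Random(0)
layer_sizes = [4, 3, 2]  # arbitrary sizes, assumed for illustration
layers = [
    random_layer(n_in, n_out, rng)
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
]

raw_input = [0.5, -1.0, 0.25, 2.0]
reps = forward(raw_input, layers)
for depth, rep in enumerate(reps):
    print(depth, [round(v, 3) for v in rep])
```

    Each printed row is the representation at one depth: row 0 is the raw input, and every later row is computed only from the row before it. With trained rather than random weights, the deeper rows would be the learned features.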

    By pre-training successive layers of gradually more complicated feature extractors, the weights of the network can be initialised to “good” values. After adding an extra layer on top, the whole system can then be trained and fine-tuned with standard backpropagation. The hidden layers of a multilayer neural network learn to represent the network’s inputs in a way that makes it easier to predict the target outputs. This is nicely demonstrated by training a multilayer neural network to predict the next word in a sequence from a local context of preceding words.

    When trained to predict the next word in a news story, for example, the learned word vectors for Tuesday and Wednesday are very similar, as are the word vectors for Sweden and Norway. Such representations are called distributed representations because their elements (the features) are not mutually exclusive and their many configurations correspond to the variations seen in the observed data. These word vectors are composed of learned features that were not determined ahead of time by experts, but automatically discovered by the neural network. Vector representations of words learned from text are now very widely used in natural language applications.
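
    One common way to quantify how "similar" two word vectors are is cosine similarity. The sketch below uses made-up three-dimensional vectors purely to show the computation; they are not learned from any text, and real word vectors typically have far more dimensions. The function name `cosine_similarity` and the `toy_vectors` table are assumptions for this example.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between u and v (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical hand-picked values, NOT the output of a trained model.
toy_vectors = {
    "Tuesday": [0.9, 0.1, 0.0],
    "Wednesday": [0.8, 0.2, 0.1],
    "Sweden": [0.1, 0.9, 0.3],
}

sim_days = cosine_similarity(toy_vectors["Tuesday"], toy_vectors["Wednesday"])
sim_mixed = cosine_similarity(toy_vectors["Tuesday"], toy_vectors["Sweden"])

# With these toy values the two weekdays score higher than weekday vs. country.
print(round(sim_days, 3), round(sim_mixed, 3), sim_days > sim_mixed)
```

    Because the features are not mutually exclusive, similarity is a matter of degree across all dimensions at once rather than a match on a single category.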

    Another type of network that has shown interesting results is the Recurrent Neural Network (RNN). RNNs try to capture the temporal aspects of the data fed to them by considering multiple time steps of the data in their processing. Thanks to advances in their architecture [1, 2] and ways of training them [3, 4], RNNs have been found to be very good at predicting the next character in a text [5] or the next word in a sequence [6], but they can also be used for more complex tasks. For example, after reading an English sentence one word at a time, an English ‘encoder’ network can be trained so that the final state vector of its hidden units is a good representation of the thought expressed by the sentence.
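
    The idea of reading a sequence one element at a time and keeping the final hidden state can be sketched with a plain (Elman-style) recurrent step, h_t = tanh(W_hh h_{t-1} + W_xh x_t + b). This is a simplified stand-in, not the LSTM of [1], and the weights are random rather than trained; the names `rnn_step` and `encode`, the sizes, and the input vectors are all assumptions for this example.

```python
import math
import random


def rnn_step(h_prev, x, W_hh, W_xh, b):
    """One recurrent update: h_t = tanh(W_hh h_prev + W_xh x + b)."""
    return [
        math.tanh(
            sum(w * h for w, h in zip(W_hh[i], h_prev))
            + sum(w * xi for w, xi in zip(W_xh[i], x))
            + b[i]
        )
        for i in range(len(b))
    ]


def encode(sequence, W_hh, W_xh, b):
    """Read the sequence one element at a time; return the final state."""
    h = [0.0] * len(b)  # initial hidden state
    for x in sequence:
        h = rnn_step(h, x, W_hh, W_xh, b)
    return h


rng = random.Random(0)
hidden_size, input_size = 3, 2  # arbitrary sizes, assumed for illustration
W_hh = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size)] for _ in range(hidden_size)]
W_xh = [[rng.uniform(-0.5, 0.5) for _ in range(input_size)] for _ in range(hidden_size)]
b = [0.0] * hidden_size

# Two hypothetical input sequences of different lengths.
short_seq = [[1.0, 0.0], [0.0, 1.0]]
long_seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5], [0.0, 1.0]]

state_short = encode(short_seq, W_hh, W_xh, b)
state_long = encode(long_seq, W_hh, W_xh, b)
print(len(state_short), len(state_long))
```

    Both sequences end up as a state vector of the same size, regardless of how many time steps were read, because the same weights are reused at every step.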

    Despite their flexibility and power, deep neural networks (DNNs) can only be applied to problems whose inputs and targets can be sensibly encoded with vectors of fixed dimensionality. This is a significant limitation, since many important problems, such as speech recognition and machine translation, are best expressed with sequences whose lengths are not known a priori.


    1. Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput. 9, 1735–1780 (1997).
    2. El Hihi, S. & Bengio, Y. Hierarchical recurrent neural networks for long-term dependencies. In Proc. Advances in Neural Information Processing Systems 8 (1995).
    3. Sutskever, I. Training Recurrent Neural Networks. PhD thesis, Univ. Toronto (2012).
    4. Pascanu, R., Mikolov, T. & Bengio, Y. On the difficulty of training recurrent neural networks. In Proc. 30th International Conference on Machine Learning 1310–1318 (2013).
    5. Sutskever, I., Martens, J. & Hinton, G. E. Generating text with recurrent neural networks. In Proc. 28th International Conference on Machine Learning 1017–1024 (2011).
    6. Mikolov, T., Sutskever, I., Chen, K., Corrado, G. & Dean, J. Distributed representations of words and phrases and their compositionality. In Proc. Advances in Neural Information Processing Systems 26 (2013).