Many recent state-of-the-art solutions for speech understanding are probabilistic and rely on machine learning algorithms to train their models from large amounts of data. The difficulty lies not only in the cost and time of collecting and annotating such data, but also in updating existing models for new conditions, tasks and/or languages. In the present work, an approach to spoken language understanding based on zero-shot learning with word embeddings is investigated. This approach requires no dedicated annotated data. Instead, large amounts of unannotated and unstructured found data are used to learn a continuous vector-space representation of words, based on neural network architectures.
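The kind of continuous word representation the abstract refers to can be sketched with a minimal skip-gram model trained by negative sampling on a toy corpus. This is only an illustrative sketch, not the paper's actual embedding pipeline; real systems train word2vec-style architectures on large found corpora, and all hyperparameters below are arbitrary toy choices.

```python
import numpy as np

def train_skipgram(corpus, dim=16, window=2, lr=0.05, epochs=10, seed=0):
    """Minimal skip-gram with one negative sample per context pair.

    `corpus` is a list of tokenized sentences. Returns a dict mapping
    each word to its learned embedding vector. Toy sketch only: real
    embeddings are trained on large unannotated found data.
    """
    rng = np.random.default_rng(seed)
    vocab = sorted({w for sent in corpus for w in sent})
    idx = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    W_in = rng.normal(0.0, 0.1, (V, dim))   # target-word vectors
    W_out = rng.normal(0.0, 0.1, (V, dim))  # context-word vectors
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

    for _ in range(epochs):
        for sent in corpus:
            for i, w in enumerate(sent):
                t = idx[w]
                lo, hi = max(0, i - window), min(len(sent), i + window + 1)
                for j in range(lo, hi):
                    if j == i:
                        continue
                    c = idx[sent[j]]
                    v = W_in[t].copy()
                    # positive pair: push target and context together
                    g = sigmoid(v @ W_out[c]) - 1.0
                    W_in[t] -= lr * g * W_out[c]
                    W_out[c] -= lr * g * v
                    # one random negative sample: push apart
                    n = int(rng.integers(V))
                    if n != c and n != t:
                        gn = sigmoid(W_in[t] @ W_out[n])
                        vn = W_in[t].copy()
                        W_in[t] -= lr * gn * W_out[n]
                        W_out[n] -= lr * gn * vn
    return {w: W_in[idx[w]] for w in vocab}
```

The returned vectors place words that share contexts near each other in the embedding space, which is the property the zero-shot approach exploits.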
Only the ontological description of the target domain and the generic word embedding features are then required to derive the model used for decoding. In this paper, we extend this baseline with an online adaptive strategy that progressively refines the initial model with only light, adjustable supervision. We show that this proposal significantly improves the performance of the spoken language understanding module on the second Dialog State Tracking Challenge (DSTC2) datasets.
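The decoding idea, combining an ontology with generic embeddings, can be illustrated as follows: embed the utterance (here, by averaging its word vectors), embed each slot-value pair from the ontology via its label words, and pick the closest pair by cosine similarity. The embeddings, ontology entries, and slot names below are hand-crafted toy assumptions, not the paper's actual vectors or the DSTC2 ontology.

```python
import numpy as np

# Toy, hand-crafted vectors standing in for embeddings learned from found data.
EMB = {
    "cheap":       np.array([1.0, 0.1, 0.0]),
    "inexpensive": np.array([0.9, 0.2, 0.1]),
    "expensive":   np.array([-0.8, 0.1, 0.2]),
    "north":       np.array([0.0, 1.0, 0.1]),
    "food":        np.array([0.1, 0.0, 1.0]),
}

# Hypothetical ontology: (slot, value) pairs described by label words.
ONTOLOGY = {
    ("pricerange", "cheap"): ["cheap"],
    ("pricerange", "expensive"): ["expensive"],
    ("area", "north"): ["north"],
}

def embed(words, dim=3):
    """Average the embeddings of known words; zero vector if none known."""
    vs = [EMB[w] for w in words if w in EMB]
    return np.mean(vs, axis=0) if vs else np.zeros(dim)

def cosine(a, b):
    n = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / n) if n else 0.0

def decode(utterance):
    """Return the ontology slot-value pair closest to the utterance."""
    u = embed(utterance.lower().split())
    scores = {sv: cosine(u, embed(ws)) for sv, ws in ONTOLOGY.items()}
    return max(scores, key=scores.get)
```

Note that "inexpensive" is mapped to the value "cheap" even though it never appears in the ontology: the match comes entirely from embedding proximity, which is what makes the approach zero-shot. The online adaptation proposed in the paper would then refine such decisions from lightly supervised feedback.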