The Priberam Machine Learning Lunch Seminars are a series of informal meetings held every two weeks at Instituto Superior Técnico, in Lisbon. They work as a discussion forum involving different research groups, from IST and elsewhere. Participants are interested in areas such as (but not limited to) statistical machine learning, signal processing, pattern recognition, computer vision, natural language processing, computational biology, neural networks, control systems, reinforcement learning, or anything related (even if vaguely) to machine learning.

The seminars last for about one hour (including time for discussion and questions) and revolve around the general topic of Machine Learning. The speaker is a volunteer who decides the topic of her presentation. Past seminars have included presentations about state-of-the-art research, surveys and tutorials, practicing a conference talk, presenting a challenging problem and asking for help, and illustrating an interesting application of Machine Learning such as a prototype or finished product.

Presenters can have any background: undergrads, graduate students, academic researchers, company staff, etc. Anyone is welcome both to attend the seminars and to present. Occasionally we will have invited speakers. See below for a list of all seminars, including the speakers, titles and abstracts.

Note: The seminars are held at lunch-time, and include delicious free food.

Feel free to join our mailing list, where seminar topics are announced beforehand. You may also visit the mailing list webpage. Anyone can attend the seminars; no registration is necessary. If you would like to present something, please send us an email.

The seminars are usually held every other Tuesday, from 1 PM to 2 PM, at the IST campus in Alameda. This sometimes changes due to availability of the speakers, so check regularly!

Tuesday, June 9th 2015, 13h00 - 14h00

Sílvio Amir (INESC-ID, IST)

Towards Social Media Analysis for Low-Resource Languages

Anfiteatro do Complexo Interdisciplinar

Instituto Superior Técnico - Alameda


Modern web-based social networks have become platforms where individuals can express personal views and discuss relevant issues in real-time. The possibility of analysing this massive aggregation of thoughts and opinions has applications in several domains, ranging from finance and marketing to the social sciences. However, the development of social media analysis tools is still slow and expensive, as most of the current approaches depend on hand-crafted lexicons, extensive feature engineering and large amounts of labeled data. This is even more problematic for languages other than English, where annotated corpora and linguistic resources are either scarce or non-existent.

This talk addresses the issue of building sentiment analysis systems for social media with limited resources. We present two methods that leverage word embeddings computed from raw text to reduce the manual effort required to develop these applications. The first consists of training a predictive model to induce large-scale lexicons, using embeddings as features and pre-existing lexicons as labeled data. The second is an approach to jointly learn a classifier and task-specific features from unsupervised embeddings when only small and noisy labeled datasets are available. We estimate a projection to a small embedding subspace that captures the most relevant information for the task. This allows us to adapt all the word representations, even those of words that do not occur in the labeled data. At the same time, we reduce the number of parameters of the model, thus reducing the risk of overfitting. These methods were used to participate in the 2015 edition of the SemEval Twitter sentiment analysis benchmarks, attaining state-of-the-art results. We will present our participation in this challenge, and report on additional experiments that attest to the adequacy of the proposed approaches.
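
As a rough illustration of the first idea in the abstract (inducing a larger lexicon from a small one, with embeddings as features), here is a minimal sketch. It is not the speaker's implementation: the function name `induce_lexicon`, the hand-made 2-dimensional vectors, the choice of logistic regression, and all hyperparameters are assumptions made only for this example.

```python
import numpy as np


def induce_lexicon(embeddings, seed_lexicon, epochs=500, lr=0.5, l2=1e-3):
    """Fit a logistic regression on seed words, then score every word.

    embeddings:   dict mapping word -> 1-D numpy array (assumed precomputed)
    seed_lexicon: dict mapping word -> 1 (positive) or 0 (negative)
    Returns a dict mapping every word in `embeddings` to a score in [0, 1].
    """
    # Only seed words that have an embedding can be used for training.
    words = [w for w in seed_lexicon if w in embeddings]
    X = np.array([embeddings[w] for w in words])
    y = np.array([seed_lexicon[w] for w in words], dtype=float)

    weights = np.zeros(X.shape[1])
    bias = 0.0
    for _ in range(epochs):
        # Predicted probability of the positive class for each seed word.
        p = 1.0 / (1.0 + np.exp(-(X @ weights + bias)))
        # Gradient of the mean log-loss, plus L2 regularisation on weights.
        grad_w = X.T @ (p - y) / len(y) + l2 * weights
        grad_b = np.mean(p - y)
        weights -= lr * grad_w
        bias -= lr * grad_b

    # Score the whole vocabulary, including words absent from the seed lexicon.
    return {
        word: float(1.0 / (1.0 + np.exp(-(vec @ weights + bias))))
        for word, vec in embeddings.items()
    }


# Toy, hand-made vectors standing in for embeddings learned from raw text.
toy_embeddings = {
    "good": np.array([1.0, 0.2]),
    "great": np.array([0.9, 0.1]),
    "bad": np.array([-1.0, 0.1]),
    "awful": np.array([-0.9, 0.3]),
    "nice": np.array([0.8, 0.2]),       # not in the seed lexicon
    "terrible": np.array([-0.8, 0.2]),  # not in the seed lexicon
}
seed_lexicon = {"good": 1, "great": 1, "bad": 0, "awful": 0}

induced = induce_lexicon(toy_embeddings, seed_lexicon)
for word in ("nice", "terrible"):
    print(word, round(induced[word], 3))
```

In this toy setting, words missing from the seed lexicon still receive a polarity score because they lie close to labeled words in the embedding space. With real embeddings the same pattern would apply to a vocabulary far larger than the seed lexicon.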


Bio: Sílvio Amir is a PhD student at IST and is conducting research at INESC-ID Lisboa. His research interests include natural language processing for noisy domains, information retrieval and affective computing applied to social media analysis.