André Martins (Priberam/IT)
Turning on the Turbo in Turbo Parsing
In the first part of this talk, I will present AD^3 (Alternating Direction Dual Decomposition), a new decoding algorithm for approximate LP-MAP inference in constrained factor graphs. The LP-MAP approximation consists in ignoring global effects caused by the cycles of the graph, and can be seen as a linear relaxation of the original problem. The proposed algorithm can handle arbitrary first-order logic constraints and is suitable to massive decompositions, unlike previously proposed dual decomposition algorithms. As an intermediate step, it requires solving small quadratic programs, for which I provide closed form solutions or efficient procedures.
In the second part of the talk, I will apply this methodology to dependency syntax with rich-feature models. I will start by formulating dependency parsing as a concise integer linear program, which is relaxed for tractability. A constrained factor graph is then constructed for this problem and the relaxation is shown to be equivalent to LP-MAP inference in such graph. The resulting framework is called "turbo parsing," and includes as particular cases other parsers proposed in the literature. Finally, I will apply AD^3 for solving the relaxation.
Experiments in 14 languages yield state-of-art results, with parsing speeds ranging between 700 and 4,000 tokens per second.
This work was done in collaboration with Noah Smith, Mário Figueiredo, Eric Xing, Pedro Aguiar, and Miguel Almeida.
Bio: André Martins is a research scientist at Priberam Labs, in Lisbon. He received his dual PhD in Language Technologies from Instituto Superior Técnico and Carnegie Mellon University, in 2012. His research interests include statistical natural language processing, machine learning for structured data, and optimization algorithms. He received a best paper award at the ACL 2009 conference and the Portuguese IBM 2011 Scientific Prize.