Develop machine learning algorithms to predict numeric quantities about an event given a body of text in natural language.
A movie by a famous director has just premiered and is generating a lot of user content on social media (Twitter, Blogspot, Facebook, Google+). Many posts express a negative feeling about the movie. Can we predict from these posts that the movie's gross revenue will be modest?
The goal of text-driven forecasting is to build a system that can predict numeric quantities given a body of text in natural language. Examples include predicting the revenue of movies from Twitter posts, opinion polls from blogs, stock volatility from financial reports, and the number of external links to a news article. The goal of this project is to apply machine learning techniques, such as regression, to this task.
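As a rough illustration of what "regression on text" can look like, here is a minimal sketch in plain Python. It assumes bag-of-words count features and an L2-regularized linear model fitted by gradient descent; these choices, the function names, and the four toy reviews with their revenue figures are all hypothetical and only meant to show the shape of the task, not a recommended system.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocab(texts):
    """Map every distinct training token to a feature index."""
    vocab = {}
    for text in texts:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    return vocab


def featurize(text, vocab):
    """Bag-of-words count vector; tokens outside the vocabulary are ignored."""
    vec = [0.0] * len(vocab)
    for tok, count in Counter(tokenize(text)).items():
        if tok in vocab:
            vec[vocab[tok]] = float(count)
    return vec


def fit_ridge(X, y, l2=0.1, lr=0.1, epochs=500):
    """Linear regression with an L2 penalty, fitted by batch gradient descent.

    The bias is fixed to the mean target; only the word weights are learned.
    """
    n, d = len(X), len(X[0])
    bias = sum(y) / n
    w = [0.0] * d
    for _ in range(epochs):
        grad = [l2 * wj for wj in w]  # gradient of the L2 penalty
        for xi, yi in zip(X, y):
            err = bias + sum(wj * xj for wj, xj in zip(w, xi)) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj / n  # gradient of the mean squared error
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w, bias


def predict(text, vocab, w, bias):
    """Predicted numeric quantity for a new piece of text."""
    x = featurize(text, vocab)
    return bias + sum(wj * xj for wj, xj in zip(w, x))


# Hypothetical toy data: (post text, revenue in millions). Not real figures.
train = [
    ("great fun brilliant film", 90.0),
    ("brilliant great acting", 85.0),
    ("boring dull film", 20.0),
    ("dull awful plot", 15.0),
]
texts = [t for t, _ in train]
targets = [r for _, r in train]

vocab = build_vocab(texts)
X = [featurize(t, vocab) for t in texts]
w, bias = fit_ridge(X, targets)

pos = predict("brilliant great", vocab, w, bias)
neg = predict("dull boring", vocab, w, bias)
print(f"positive post: {pos:.1f}  negative post: {neg:.1f}")
```

On this toy data the model predicts a higher revenue for the positive post than for the negative one. A real system would need far more data, richer features, and evaluation on held-out examples.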
There are no mandatory prerequisites. Some programming experience (in languages like C/C++, Java, Python, Matlab, etc.) is preferred.
At the end of the project, the student should have created a system able to predict numeric quantities about an event given a body of text in natural language.
Movie Reviews and Revenues: An Experiment in Text Regression. Mahesh Joshi, Dipanjan Das, Kevin Gimpel, and Noah A. Smith. In Proceedings of the North American Chapter of the Association for Computational Linguistics Human Language Technologies Conference, Los Angeles, CA, June 2010.
Noah A. Smith. March 2010. arXiv.
Movie dataset: http://www.ark.cs.cmu.edu/movie$-data/