Monday, March 23, 2009

Writing update

Continuing to write up my submission for the Information Retrieval Journal's special issue on learning to rank. Submission will include results from the ECML paper as well as the most recent results from the nonrandom seeding experiments. I'll post the submission later this week.

Tuesday, March 10, 2009

A Meta-Learning Approach for Robust Rank Learning

Back from St. Louis. As promised, paper reviews.

Carvalho et. al. 2008. "A Meta-Learning Approach for Robust Rank Learning". In proceedings of SIGIR 2008 LR4IR - Workshop on Learning to Rank for Information Retrieval. [pdf]

The main crux of interest for me in this paper was its focus on how outliers affect learning ranking functions, specifically the effect of mislabeling on pairwise ranking functions. Mislabeling of a document pair propagates into a quadratic increase in the number of outliers, "outliers" in this case referring to mislabeled and non-relevant documents.

Central to the identification of outliers in this work is the pairwise decision score between two documents. This score is

P_t = z_ql * (s(d_iq) - s(d_jq)

which is difference in score given to a document pair d_i, d_j for a query q from a scoring system s(), multiplied by z_ql which is +1 or -1 depending on the true preference of d_iq to d_jq. P_t is positive if the documents are correctly ranked by s(), negative otherwise. The authors show that by training a model, calculating pairwise decision scores for the training data, and then retraining the model without documents with a P_t below some cutoff value, performance is increased. This is straightforward, and to be expected - if you remove mislabeled instances, the errors they introduce will not propogate to other pairs.

In order to capitalize on this finding, the authors propose a meta-learning method for optimizing linear rankers. After learning a base ranker (e.g. perceptron, RankSVM), the ranker is re-optimized by incorporating a sigmoid loss function in the optimization. This reduces the effects of the outliers on the learned model. By differentiating this new optimization, a gradient descent algorithm can be devised.

Experiments were carried out on a number of standard test corpora, showing performance for perceptron, RankSVM and ListNet, along with performance for these algorithms using the sigmoid meta-learning. The addition of the sigmoid resulted in superior performance in nearly every case, with around two-thirds of the performance gains being statistically significant.

Digging deeper into the results, we see that the significant performance gains come mostly from the perceptron and ListNet. Only half of the SVM gains were statistically significant, and even there the gains were not staggering; the greatest significant gain was from a MAP (mean average precision) of 0.472 to 0.480.

My main takeaway from the paper is that metalearning is a terrific thing to do if you're using a perceptron, a good thing to do if you're using ListNet, and an okay thing to do if you're using RankSVM. This is, of course, assuming that any performance gains are worth the extra time and resources expended in the optimization. I am unconvinced that doing so would be a good idea for my research. I do, however, think that I might use the pairwise decision score to try to identify potentially mislabeled documents for the user to look at again.