
The interquartile means clearly dominate the mean over all examples ("All"). This bump in performance makes the effect of using LETOR seeds more pronounced, as seen below.
Again, the graph shows NDCGs vs. total examples seen, in this case the interquartile (IQ) NDCGs for l10 and the baseline.
Later today or tomorrow I'm going to analyze the distribution of NDCGs across all queries. I'm hoping to show that the standard deviations are tight while the total range is very large, which would let me argue that discounting outliers is indeed valid.
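For concreteness, here is a minimal sketch of how the "All" mean, the IQ mean, and the spread statistics could be computed from a list of per-query NDCGs. It assumes the interquartile mean is the mean of the middle half of the sorted values, and it simplifies by dropping floor(n/4) values from each end rather than weighting fractional observations. The function names and the sample NDCGs are made up for illustration and are not taken from the experiments.

```python
from statistics import mean, pstdev


def interquartile_mean(values):
    """Mean of the middle half of the values.

    Simplification (an assumption, not necessarily what the experiments use):
    drop floor(n/4) values from each end of the sorted list.
    """
    ordered = sorted(values)
    k = len(ordered) // 4
    middle = ordered[k:len(ordered) - k]
    return mean(middle)


def summarize(ndcgs):
    """Summary statistics for a list of per-query NDCG scores."""
    return {
        "all_mean": mean(ndcgs),                  # mean over all queries ("All")
        "iq_mean": interquartile_mean(ndcgs),     # mean over the middle half
        "std_dev": pstdev(ndcgs),                 # population standard deviation
        "range": max(ndcgs) - min(ndcgs),         # total range
    }


# Hypothetical per-query NDCGs: one query scores 0, the rest score 0.5.
example_ndcgs = [0.0, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]

if __name__ == "__main__":
    for name, value in summarize(example_ndcgs).items():
        print(f"{name}: {value:.4f}")
```

In this made-up sample a single zero-scoring query pulls the "All" mean below the IQ mean, which is the kind of outlier effect the distribution analysis is meant to check.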