The document ranking problem

  • We have a collection of documents

  • A user issues a query

  • A list of documents needs to be returned

  • The ranking method is the core of an IR system:

    • In what order do we present the documents to the user?

    • We want the “best” document first, the second best second, and so on.

  • Idea: rank documents by their probability of relevance to the information need

    • $P(\text{relevant} \mid \text{document}_i, \text{query})$
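If an estimate of P(relevant | document_i, query) is already available for every document, producing the ranking comes down to a sort in descending order of that probability. The Python sketch below shows only that step, under stated assumptions: the function name `rank_by_relevance` and the probability values are hypothetical, and how the probabilities are estimated is not covered here.

```python
from typing import Dict, List, Tuple


def rank_by_relevance(p_relevant: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (doc_id, probability) pairs ordered best-first.

    Hypothetical helper: p_relevant maps each document id to an
    already-computed estimate of P(relevant | document_i, query)
    for one fixed query.
    """
    for doc_id, p in p_relevant.items():
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{doc_id}: {p} is not a probability")
    # Highest probability first; ties broken by doc id so the order is deterministic.
    return sorted(p_relevant.items(), key=lambda item: (-item[1], item[0]))


# Made-up estimates for a single query, for illustration only.
estimates = {"d1": 0.20, "d2": 0.85, "d3": 0.40, "d4": 0.85}
for rank, (doc_id, p) in enumerate(rank_by_relevance(estimates), start=1):
    print(rank, doc_id, p)
```

The tie-break by document id is an arbitrary choice made here so the output is reproducible. The probabilities alone do not determine an order between equally likely documents.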

