Alternative Models of Text Generation

Retrieval Using Language Models

  • Three ways to use language models for retrieval: query likelihood (1), document likelihood (2), and model comparison (3)

Query Likelihood

  • Rank documents by P(Q|Dm), the probability that the document model Dm generates the query Q

  • The major issue is estimating the document model

    • i.e. smoothing techniques instead of tf.idf weights

  • Good retrieval results

    • e.g. UMass, BBN, Twente, CMU

  • Difficulty handling relevance feedback, query expansion, and structured queries
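A minimal sketch of query-likelihood ranking follows. It assumes Dirichlet smoothing, which is one common smoothing technique; the slide does not name a specific one. The function names and the smoothing parameter `mu` are hypothetical. Smoothing replaces tf.idf weighting here: a document that lacks a query term still gets a non-zero probability for that term from the collection model.

```python
import math
from collections import Counter

def query_log_likelihood(query, doc, collection_tf, collection_len, mu=2000.0):
    """log P(Q|Dm) under a Dirichlet-smoothed document model (assumed smoothing choice)."""
    doc_tf = Counter(doc)
    score = 0.0
    for w in query:
        p_coll = collection_tf.get(w, 0) / collection_len
        if p_coll == 0.0:
            continue  # term absent from the whole collection: ignored (an assumption)
        # Mix document counts with the collection model so a missing term is not zero
        p_doc = (doc_tf[w] + mu * p_coll) / (len(doc) + mu)
        score += math.log(p_doc)
    return score

def rank_by_query_likelihood(query, docs, mu=2000.0):
    """Return document ids sorted by descending query log-likelihood."""
    collection_tf = Counter(w for d in docs.values() for w in d)
    collection_len = sum(collection_tf.values())
    scores = {
        doc_id: query_log_likelihood(query, d, collection_tf, collection_len, mu)
        for doc_id, d in docs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

# Toy usage (hypothetical data) with a tiny mu so the smoothing effect is visible
docs = {
    "d1": "language model retrieval smoothing".split(),
    "d2": "tf idf weights vector space".split(),
    "d3": "query likelihood language model".split(),
}
ranking = rank_by_query_likelihood("language model".split(), docs, mu=2.0)
```

With this toy data, d2 contains neither query term, so it ranks last. It still receives a finite score rather than a zero probability.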

Document Likelihood

  • Rank by likelihood ratio P(D|R)/P(D|NR)

    • treat as a generation problem

    • P(w|R) is estimated by P(w|Qm)

    • Qm is the query or relevance model

    • P(w|NR) is estimated by collection probabilities P(w)

  • The main issue is estimating the query model

    • Treat the query as generated by a mixture of a topic model and a background model

    • Estimate relevance model from related documents (query expansion)

    • Relevance feedback is easily incorporated

  • Good retrieval results

    • e.g. UMass at SIGIR 01

    • but results are inconsistent on heterogeneous document collections
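The document-likelihood approach can be sketched under stated assumptions. The sketch first estimates a relevance model P(w|R) from related documents. It then ranks each document by log P(D|R)/P(D|NR), taking P(w|NR) as the collection probability P(w). The relevance-model estimator is an assumed RM1-style weighting: each document's smoothed model is weighted by P(Q|D), with Dirichlet smoothing and a uniform document prior. All function names and `mu` are hypothetical, and this is not presented as the exact system behind the results cited above.

```python
import math
from collections import Counter

def collection_model(docs):
    """Maximum-likelihood collection model P(w), used as the estimate of P(w|NR)."""
    tf = Counter(w for d in docs for w in d)
    total = sum(tf.values())
    return {w: c / total for w, c in tf.items()}

def smoothed_doc_model(doc, p_coll, mu):
    """Dirichlet-smoothed P(w|D) over the whole collection vocabulary (assumed smoothing)."""
    tf = Counter(doc)
    return {w: (tf[w] + mu * pc) / (len(doc) + mu) for w, pc in p_coll.items()}

def relevance_model(query, feedback_docs, p_coll, mu=2000.0):
    """RM1-style P(w|R): sum over D of P(w|D) * P(Q|D), normalised; uniform P(D) assumed."""
    weighted = Counter()
    total = 0.0
    for doc in feedback_docs:
        p_doc = smoothed_doc_model(doc, p_coll, mu)
        # P(Q|D) as a plain product; long queries would need log-space arithmetic
        q_lik = math.prod(p_doc[w] for w in query if w in p_doc)
        for w, p in p_doc.items():
            weighted[w] += p * q_lik
        total += q_lik
    return {w: s / total for w, s in weighted.items()}

def log_likelihood_ratio(doc, p_rel, p_coll):
    """log P(D|R)/P(D|NR), treating each word occurrence in D as independently generated."""
    return sum(math.log(p_rel[w]) - math.log(p_coll[w]) for w in doc)

# Toy usage (hypothetical data): every document serves as a "related document"
docs = {
    "d1": "language model retrieval smoothing".split(),
    "d2": "tf idf weights vector space".split(),
    "d3": "query likelihood language model".split(),
}
p_coll = collection_model(docs.values())
p_rel = relevance_model("language model".split(), docs.values(), p_coll, mu=2.0)
ranking = sorted(docs, key=lambda n: log_likelihood_ratio(docs[n], p_rel, p_coll), reverse=True)
```

The relevance model shifts probability mass toward words that co-occur with the query terms. In this example those are "retrieval", "smoothing", "query" and "likelihood", which shows how query expansion falls out of the estimate. Feedback documents could be added to `feedback_docs` in the same way, so relevance feedback fits naturally. On this toy data, d2 ranks last.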

Creator: Tgbyrdmc


Licensed under the Creative Commons
Attribution ShareAlike CC-BY-SA license

This deck was created using SlideWiki.