Binary Independence Model

  • Traditionally used in conjunction with the Probability Ranking Principle (PRP)

  • “Binary” = Boolean: documents are represented as binary incidence vectors of terms (cf. lecture 1):

    • xi = 1 iff term i is present in document x.

  • “Independence”: terms occur in documents independently

  • Different documents can be modeled as the same vector

  • Bernoulli Naive Bayes model (cf. text categorization!)
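
A minimal sketch of the representation described above, assuming whitespace tokenization and a tiny hypothetical vocabulary (neither comes from the slides). It also shows two different documents collapsing to the same vector:

```python
def incidence_vector(text, vocabulary):
    """Return the binary term incidence vector of `text` over `vocabulary`."""
    # Term frequency and word order are discarded: only presence matters.
    tokens = set(text.lower().split())
    return tuple(1 if term in tokens else 0 for term in vocabulary)


# Hypothetical vocabulary and documents, for illustration only.
vocabulary = ["binary", "independence", "model", "ranking"]
doc_a = "binary model model model"
doc_b = "Model binary"

vec_a = incidence_vector(doc_a, vocabulary)
vec_b = incidence_vector(doc_b, vocabulary)

print(vec_a)           # (1, 0, 1, 0)
print(vec_a == vec_b)  # True: different documents, same vector
```

Because counts and order are dropped, the model cannot distinguish doc_a from doc_b.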

Binary Independence Model

  • Queries: binary term incidence vectors

  • Given query q,

    • for each document d need to compute p(R|q,d).

    • replace with computing p(R|q,x), where x is the binary term incidence vector representing d

  • Interested only in ranking

  • Will use odds and Bayes’ Rule:
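
A sketch of the standard odds step, assuming NR denotes non-relevance and writing the document vector as \vec{x}. Bayes' rule is applied to numerator and denominator, and the common factor p(\vec{x} | q) cancels:

```latex
O(R \mid q, \vec{x})
  = \frac{p(R \mid q, \vec{x})}{p(NR \mid q, \vec{x})}
  = \frac{p(R \mid q)}{p(NR \mid q)} \cdot
    \frac{p(\vec{x} \mid R, q)}{p(\vec{x} \mid NR, q)}
```

The first factor is O(R|q), which is constant for a given query, so only the second factor matters for ranking.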

Binary Independence Model

  • Using Independence Assumption:

  • So:
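
The two steps on this slide, written out in the usual form. The index n (the number of terms in the vocabulary) is an assumption of this sketch:

```latex
\frac{p(\vec{x} \mid R, q)}{p(\vec{x} \mid NR, q)}
  = \prod_{i=1}^{n} \frac{p(x_i \mid R, q)}{p(x_i \mid NR, q)}
\qquad\Longrightarrow\qquad
O(R \mid q, \vec{x})
  = O(R \mid q) \cdot \prod_{i=1}^{n} \frac{p(x_i \mid R, q)}{p(x_i \mid NR, q)}
```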

Binary Independence Model

  • Since xi is either 0 or 1:

  • Let pi = p(xi = 1 | R,q);  ri = p(xi = 1 | NR,q)

  • Assume, for all terms not occurring in the query (qi = 0), that pi = ri
    (this can be changed, e.g., in relevance feedback)

  • Then...
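
A sketch of how these steps are usually written out. The product over all terms is first split by the value of x_i; then p_i and r_i are substituted, and every factor with q_i = 0 equals 1 under the assumption p_i = r_i:

```latex
\begin{align*}
O(R \mid q, \vec{x})
  &= O(R \mid q)
     \cdot \prod_{x_i = 1} \frac{p(x_i = 1 \mid R, q)}{p(x_i = 1 \mid NR, q)}
     \cdot \prod_{x_i = 0} \frac{p(x_i = 0 \mid R, q)}{p(x_i = 0 \mid NR, q)} \\
  &= O(R \mid q)
     \cdot \prod_{x_i = q_i = 1} \frac{p_i}{r_i}
     \cdot \prod_{x_i = 0,\; q_i = 1} \frac{1 - p_i}{1 - r_i}
\end{align*}
```

The first product runs over matching terms, the second over query terms that are absent from the document.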

Binary Independence Model
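
A common next step is to regroup the products so that the second one runs over all query terms, whether or not they occur in the document. A sketch, using p_i = p(x_i = 1 | R,q) and r_i = p(x_i = 1 | NR,q):

```latex
\begin{align*}
O(R \mid q, \vec{x})
  &= O(R \mid q)
     \cdot \prod_{x_i = q_i = 1} \frac{p_i}{r_i}
     \cdot \prod_{x_i = 0,\; q_i = 1} \frac{1 - p_i}{1 - r_i} \\
  &= O(R \mid q)
     \cdot \prod_{x_i = q_i = 1} \frac{p_i (1 - r_i)}{r_i (1 - p_i)}
     \cdot \prod_{q_i = 1} \frac{1 - p_i}{1 - r_i}
\end{align*}
```

In the second line, O(R|q) and the product over all query terms are constant for a given query; only the product over matching terms varies from document to document.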

Binary Independence Model

  • Retrieval Status Value:
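
In the usual formulation, the RSV is the logarithm of the only document-dependent factor in the odds, the product over terms that occur in both query and document. A sketch, with c_i introduced as shorthand for the per-term log odds ratio:

```latex
RSV
  = \log \prod_{x_i = q_i = 1} \frac{p_i (1 - r_i)}{r_i (1 - p_i)}
  = \sum_{x_i = q_i = 1} \log \frac{p_i (1 - r_i)}{r_i (1 - p_i)}
  = \sum_{x_i = q_i = 1} c_i,
\qquad
c_i = \log \frac{p_i (1 - r_i)}{r_i (1 - p_i)}
```

Since the logarithm is monotonic, ranking by RSV gives the same order as ranking by the odds.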

Binary Independence Model

  • It all boils down to computing the RSV.

  • So, how do we compute the ci’s from our data?
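
A minimal sketch of the scoring step, assuming the usual definition ci = log[ pi(1 − ri) / (ri(1 − pi)) ] and assuming pi and ri are already known. The function names and all probabilities below are hypothetical, chosen only for illustration:

```python
import math


def term_weight(p_i, r_i):
    """Log odds ratio c_i = log[ p_i (1 - r_i) / (r_i (1 - p_i)) ]."""
    return math.log(p_i * (1.0 - r_i) / (r_i * (1.0 - p_i)))


def rsv(query_terms, doc_terms, p, r):
    """Sum c_i over the terms present in both query and document."""
    matching = set(query_terms) & set(doc_terms)
    return sum(term_weight(p[t], r[t]) for t in matching)


# Hypothetical probabilities: p[t] = p(x_t = 1 | R, q), r[t] = p(x_t = 1 | NR, q).
p = {"binary": 0.8, "model": 0.5}
r = {"binary": 0.2, "model": 0.5}

query = ["binary", "model"]
docs = {
    "d1": {"binary", "model", "ranking"},
    "d2": {"model"},
    "d3": {"ranking"},
}

scores = {name: rsv(query, terms, p, r) for name, terms in docs.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking[0])  # d1
```

A term with pi = ri (here "model") contributes a weight of 0, so d2 scores no higher than d3, which matches no query term at all.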

Binary Independence Model

  • Estimating RSV coefficients.

  • For each term i, look at this table of document counts:
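
A common form of this contingency table and of the resulting estimates. The symbols are assumptions of this sketch: N is the total number of documents, S the number of relevant documents, n the number of documents containing term i, and s the number of relevant documents containing term i:

```latex
\begin{array}{l|cc|c}
\text{Documents} & \text{Relevant} & \text{Non-relevant} & \text{Total} \\
\hline
x_i = 1 & s     & n - s         & n     \\
x_i = 0 & S - s & N - n - S + s & N - n \\
\hline
\text{Total} & S & N - S & N
\end{array}

p_i \approx \frac{s}{S},
\qquad
r_i \approx \frac{n - s}{N - S},
\qquad
c_i \approx \log \frac{s / (S - s)}{(n - s) / (N - n - S + s)}
```

The expression for c_i follows by substituting the two estimates into c_i = log[ p_i(1 − r_i) / (r_i(1 − p_i)) ]. In practice the counts are usually smoothed (for example by adding 0.5 to each cell) so that zero counts do not produce an undefined logarithm.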

Creator: Tgbyrdmc

Licensed under the Creative Commons Attribution-ShareAlike (CC BY-SA) license