### Binary Independence Model

• Traditionally used in conjunction with the Probability Ranking Principle (PRP)

• “Binary” = Boolean: documents are represented as binary incidence vectors of terms (cf. lecture 1):

• xi = 1 iff term i is present in document x.
• “Independence”: terms occur in documents independently

• Different documents can be modeled as the same vector

• Bernoulli Naive Bayes model (cf. text categorization!)
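The binary representation above can be illustrated with a minimal Python sketch. The helper name `incidence_vector`, the toy vocabulary, and the two example documents are all hypothetical; the whitespace tokenization is an assumption made only to keep the example short.

```python
def incidence_vector(text, vocabulary):
    """Return a tuple of 0/1 flags, one per vocabulary term."""
    # Term frequency and word order are discarded; only presence matters.
    tokens = set(text.lower().split())
    return tuple(1 if term in tokens else 0 for term in vocabulary)


# Hypothetical toy vocabulary and documents.
vocabulary = ["brutus", "caesar", "calpurnia"]
doc_a = "Caesar praised Brutus"
doc_b = "Brutus Brutus stabbed Caesar and Caesar fell"

vec_a = incidence_vector(doc_a, vocabulary)
vec_b = incidence_vector(doc_b, vocabulary)

print(vec_a)
print(vec_b)
print(vec_a == vec_b)  # different documents, same binary vector
```

Because counts and positions are dropped, the two documents collapse to the same vector, which is the sense in which different documents can be modeled identically.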

• Queries: binary term incidence vectors

• Given query q, for each document d we need to compute p(R|q,d)

• Replace with computing p(R|q,x), where x is the binary term incidence vector representing d

• Interested only in ranking

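• Will use odds and Bayes’ Rule:

$$O(R \mid q, x) = \frac{p(R \mid q, x)}{p(NR \mid q, x)} = \frac{p(R \mid q)}{p(NR \mid q)} \cdot \frac{p(x \mid R, q)}{p(x \mid NR, q)}$$

• Using the independence assumption:

$$\frac{p(x \mid R, q)}{p(x \mid NR, q)} = \prod_{i=1}^{n} \frac{p(x_i \mid R, q)}{p(x_i \mid NR, q)}$$

• So:

$$O(R \mid q, x) = O(R \mid q) \cdot \prod_{i=1}^{n} \frac{p(x_i \mid R, q)}{p(x_i \mid NR, q)}$$

• Since xi is either 0 or 1:

$$O(R \mid q, x) = O(R \mid q) \cdot \prod_{x_i = 1} \frac{p(x_i = 1 \mid R, q)}{p(x_i = 1 \mid NR, q)} \cdot \prod_{x_i = 0} \frac{p(x_i = 0 \mid R, q)}{p(x_i = 0 \mid NR, q)}$$

• Let pi = p(xi = 1 | R,q); ri = p(xi = 1 | NR,q)

• Assume, for all terms not occurring in the query (qi = 0), pi = ri (this can be changed, e.g., in relevance feedback)

• Then:

$$O(R \mid q, x) = O(R \mid q) \cdot \prod_{x_i = q_i = 1} \frac{p_i}{r_i} \cdot \prod_{x_i = 0,\, q_i = 1} \frac{1 - p_i}{1 - r_i}$$

$$= O(R \mid q) \cdot \prod_{x_i = q_i = 1} \frac{p_i (1 - r_i)}{r_i (1 - p_i)} \cdot \prod_{q_i = 1} \frac{1 - p_i}{1 - r_i}$$

• For a given query, O(R|q) and the product over all query terms are constant, so only the product over matching terms matters for ranking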

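• Retrieval Status Value:

$$RSV = \log \prod_{x_i = q_i = 1} \frac{p_i (1 - r_i)}{r_i (1 - p_i)} = \sum_{x_i = q_i = 1} \log \frac{p_i (1 - r_i)}{r_i (1 - p_i)}$$

• All boils down to computing RSV:

$$RSV = \sum_{x_i = q_i = 1} c_i; \qquad c_i = \log \frac{p_i (1 - r_i)}{r_i (1 - p_i)}$$

• So, how do we compute the ci’s from our data?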
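As a rough illustration of how an RSV could be computed, the sketch below sums a per-term weight over the terms present in both the query and the document. It assumes the standard BIM weight ci = log[pi(1 - ri) / (ri(1 - pi))]; the function names `term_weight` and `rsv` and all probability values are hypothetical, chosen only for the example.

```python
import math


def term_weight(p_i, r_i):
    """c_i = log of the odds ratio between relevant and non-relevant documents."""
    return math.log(p_i * (1 - r_i) / (r_i * (1 - p_i)))


def rsv(query_vec, doc_vec, p, r):
    """Sum c_i over terms with x_i = q_i = 1 (terms in both query and document)."""
    return sum(
        term_weight(p[i], r[i])
        for i, (q_i, x_i) in enumerate(zip(query_vec, doc_vec))
        if q_i == 1 and x_i == 1
    )


# Hypothetical per-term probabilities for a three-term vocabulary.
p = [0.5, 0.8, 0.5]  # p_i = p(x_i = 1 | R, q)
r = [0.5, 0.2, 0.1]  # r_i = p(x_i = 1 | NR, q)

query = (1, 1, 0)
doc1 = (1, 1, 0)  # matches both query terms
doc2 = (1, 0, 1)  # matches only the first query term

print(rsv(query, doc1, p, r))
print(rsv(query, doc2, p, r))
```

In this toy setup the first term has pi = ri, so it contributes a weight of zero: a term that is equally likely in relevant and non-relevant documents does not help ranking. The third term is ignored for both documents because it is not in the query.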

• Estimating RSV coefficients.

• For each term i look at this table of document counts:
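One common way to lay out such a table is shown here. The symbol names are assumptions of this sketch rather than notation confirmed by the slides: N is the number of documents in the collection, n the number containing term i, S the number of relevant documents, and s the number of relevant documents containing term i.

| Documents | Relevant | Non-relevant | Total |
|---|---|---|---|
| xi = 1 | s | n - s | n |
| xi = 0 | S - s | N - n - S + s | N - n |
| Total | S | N - S | N |

Under that layout, pi can be estimated as s/S and ri as (n - s)/(N - S). Taking ci = log[pi(1 - ri) / (ri(1 - pi))], this gives ci ≈ log[(s/(S - s)) / ((n - s)/(N - n - S + s))]. The Python sketch below computes this estimate. The function name `estimate_c` is hypothetical, and the add-k smoothing (default k = 0.5) is a commonly used safeguard against zero counts that is added here as an assumption.

```python
import math


def estimate_c(N, n, S, s, k=0.5):
    """Estimate c_i from document counts.

    N: documents in the collection, n: documents containing the term,
    S: relevant documents, s: relevant documents containing the term.
    k is an add-k smoothing constant (an assumption of this sketch);
    k = 0 gives the unsmoothed estimate.
    """
    p_odds = (s + k) / (S - s + k)              # approximates p_i / (1 - p_i)
    r_odds = (n - s + k) / (N - n - S + s + k)  # approximates r_i / (1 - r_i)
    return math.log(p_odds / r_odds)


# Hypothetical counts: the term appears in 15 of 20 relevant documents
# and in 100 of 1000 documents overall.
print(estimate_c(N=1000, n=100, S=20, s=15, k=0))
print(estimate_c(N=1000, n=100, S=20, s=15))
```

Without smoothing, a term that occurs in every relevant document (s = S) would cause a division by zero, which is the main reason for including k in this sketch.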
