### Binary Independence Model

• Traditionally used in conjunction with the Probability Ranking Principle (PRP)

• “Binary” = Boolean: documents are represented as binary incidence vectors of terms (cf. lecture 1):

• $x_i = 1$ iff term $i$ is present in document $x$.
• “Independence”: terms occur in documents independently

• Different documents can be modeled as same vector

• Bernoulli Naive Bayes model (cf. text categorization!)
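As a minimal sketch of the binary representation, the following shows how two different documents can collapse to the same vector. The function name `incidence_vector`, the toy vocabulary, and the whitespace tokenization are assumptions made for illustration only.

```python
def incidence_vector(doc_text, vocabulary):
    """Return the binary term incidence vector of a document.

    Assumption: terms are lowercased, whitespace-separated tokens.
    """
    tokens = set(doc_text.lower().split())
    return tuple(1 if term in tokens else 0 for term in vocabulary)


# Hypothetical three-term vocabulary; position i corresponds to term i.
vocabulary = ["brutus", "caesar", "killed"]

d1 = incidence_vector("Brutus killed Caesar", vocabulary)
d2 = incidence_vector("Caesar killed Brutus Caesar", vocabulary)
d3 = incidence_vector("Caesar spoke", vocabulary)

# Word order and term frequency are discarded, so d1 and d2 coincide.
print(d1, d2, d3)
```

Because only presence or absence is recorded, the model cannot distinguish `d1` from `d2`, which is the sense in which different documents share one vector.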

### Binary Independence Model

• Queries: binary term incidence vectors

• Given query q,

• for each document d need to compute p(R|q,d).

• replace with computing p(R|q,x), where x is the binary term incidence vector representing d

• Interested only in ranking

• Will use odds and Bayes’ Rule:
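Written out in the standard form of this step (a sketch, with $\vec{x}$ denoting the incidence vector x and NR denoting non-relevance), the odds of relevance and the application of Bayes' Rule to numerator and denominator would read:

```latex
O(R \mid q, \vec{x})
  = \frac{p(R \mid q, \vec{x})}{p(NR \mid q, \vec{x})}
  = \frac{\dfrac{p(R \mid q)\, p(\vec{x} \mid R, q)}{p(\vec{x} \mid q)}}
         {\dfrac{p(NR \mid q)\, p(\vec{x} \mid NR, q)}{p(\vec{x} \mid q)}}
  = \underbrace{\frac{p(R \mid q)}{p(NR \mid q)}}_{O(R \mid q)}
    \cdot \frac{p(\vec{x} \mid R, q)}{p(\vec{x} \mid NR, q)}
```

The first factor is constant for a given query, so it does not affect the ranking; only the second factor has to be estimated.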

### Binary Independence Model

• Using Independence Assumption:

• So :
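The two steps above can be sketched as follows, assuming a vocabulary of $n$ terms and writing $\vec{x} = (x_1, \ldots, x_n)$. The independence assumption factorizes the likelihood ratio over terms, and substituting this into the odds gives the second line:

```latex
\frac{p(\vec{x} \mid R, q)}{p(\vec{x} \mid NR, q)}
  = \prod_{i=1}^{n} \frac{p(x_i \mid R, q)}{p(x_i \mid NR, q)}

O(R \mid q, \vec{x})
  = O(R \mid q) \cdot \prod_{i=1}^{n} \frac{p(x_i \mid R, q)}{p(x_i \mid NR, q)}
```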

### Binary Independence Model

• Since $x_i$ is either 0 or 1:

• Let $p_i = p(x_i = 1 \mid R, q)$; $r_i = p(x_i = 1 \mid NR, q)$

• Assume, for all terms not occurring in the query ($q_i = 0$), that $p_i = r_i$ (this can be changed, e.g., in relevance feedback)

• Then...
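A sketch of how these steps fit together, taking $p_i$ and $r_i$ to be the probabilities that term $i$ is present in a relevant and a non-relevant document respectively. Splitting the product by the value of $x_i$ gives the first line. Under the assumption that $p_i = r_i$ whenever $q_i = 0$, every factor for a non-query term equals 1, which leaves only query terms in the second line:

```latex
O(R \mid q, \vec{x})
  = O(R \mid q)
    \cdot \prod_{x_i = 1} \frac{p(x_i = 1 \mid R, q)}{p(x_i = 1 \mid NR, q)}
    \cdot \prod_{x_i = 0} \frac{p(x_i = 0 \mid R, q)}{p(x_i = 0 \mid NR, q)}

O(R \mid q, \vec{x})
  = O(R \mid q)
    \cdot \prod_{x_i = q_i = 1} \frac{p_i}{r_i}
    \cdot \prod_{x_i = 0,\; q_i = 1} \frac{1 - p_i}{1 - r_i}
```

The first product runs over query terms that match the document, the second over query terms that do not.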

### Binary Independence Model

• Retrieval Status Value:
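In the standard formulation (sketched here, with $p_i$ and $r_i$ as the term presence probabilities for relevant and non-relevant documents), the odds are regrouped so that one product runs over all query terms and is therefore constant for a given query. The RSV is the logarithm of the remaining, document-dependent product:

```latex
O(R \mid q, \vec{x})
  = O(R \mid q)
    \cdot \prod_{x_i = q_i = 1} \frac{p_i (1 - r_i)}{r_i (1 - p_i)}
    \cdot \prod_{q_i = 1} \frac{1 - p_i}{1 - r_i}

RSV
  = \log \prod_{x_i = q_i = 1} \frac{p_i (1 - r_i)}{r_i (1 - p_i)}
  = \sum_{x_i = q_i = 1} \log \frac{p_i (1 - r_i)}{r_i (1 - p_i)}
  = \sum_{x_i = q_i = 1} c_i,
\qquad
c_i = \log \frac{p_i (1 - r_i)}{r_i (1 - p_i)}
```

Each $c_i$ is a log odds ratio for term $i$, and only terms present in both the query and the document contribute to the sum.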

### Binary Independence Model

• All boils down to computing RSV.

• So, how do we compute the $c_i$'s from our data?
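To illustrate why ranking reduces to the RSV, here is a minimal sketch that scores documents by summing per-term weights over the terms shared by query and document. The function name `rsv`, the weight values in `c`, and the toy documents are hypothetical and chosen only to make the example concrete.

```python
def rsv(query_terms, doc_terms, c):
    """Sum the weights c[t] over terms present in both query and document.

    Terms without a known weight contribute 0 (an assumption of this sketch).
    """
    return sum(c.get(t, 0.0) for t in set(query_terms) & set(doc_terms))


# Hypothetical term weights c_i; in practice these must be estimated.
c = {"brutus": 2.0, "caesar": 0.5}

# Documents as sets of terms, i.e. binary incidence information only.
docs = {
    "d1": {"brutus", "caesar"},
    "d2": {"caesar"},
    "d3": {"calpurnia"},
}
query = {"brutus", "caesar"}

scores = {name: rsv(query, terms, c) for name, terms in docs.items()}
ranking = sorted(docs, key=lambda name: scores[name], reverse=True)
print(scores, ranking)
```

Once the weights are available, scoring a document is a single pass over the matching query terms, so the whole problem moves to estimating the $c_i$.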

### Binary Independence Model

• Estimating RSV coefficients.

• For each term i look at this table of document counts:
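One common way to set up such a table, sketched here under assumptions: with N documents in total, n of them containing term i, S relevant documents, and s relevant documents containing term i, the four cells are s, S - s, n - s, and N - n - S + s. The estimates $p_i \approx s/S$ and $r_i \approx (n - s)/(N - S)$ then turn $c_i$ into a log odds ratio of the cells. The names `N`, `n`, `S`, `s`, the function `estimate_c`, and the additive smoothing constant `k` are assumptions of this sketch and do not come from the slide.

```python
import math


def estimate_c(N, n, S, s, k=0.5):
    """Estimate the term weight c_i from document counts.

    N: total documents         n: documents containing the term
    S: relevant documents      s: relevant documents containing the term
    k: additive smoothing to avoid zero cells (assumed; k=0 disables it)
    """
    rel_with = s                    # relevant, term present
    rel_without = S - s             # relevant, term absent
    nonrel_with = n - s             # non-relevant, term present
    nonrel_without = N - n - S + s  # non-relevant, term absent
    odds_rel = (rel_with + k) / (rel_without + k)
    odds_nonrel = (nonrel_with + k) / (nonrel_without + k)
    return math.log(odds_rel / odds_nonrel)


# Term concentrated in relevant documents: positive weight.
c_strong = estimate_c(N=100, n=20, S=10, s=8, k=0)

# Term equally common in relevant and non-relevant documents: zero weight.
c_neutral = estimate_c(N=100, n=20, S=10, s=2, k=0)
print(c_strong, c_neutral)
```

With `k=0` the function matches the unsmoothed ratio exactly but fails when any cell is zero, which is the reason a small positive `k` is used as the default here.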

Creator: Tgbyrdmc
