Data Integration

  • Data integration:
    • Combines data from multiple sources into a coherent store
  • Schema integration: e.g., A.cust-id ≡ B.cust-#
    • Integrate metadata from different sources
  • Entity identification problem:
    • Identify records in multiple data sources that refer to the same real-world entity, e.g., Bill Clinton = William Clinton
  • Detecting and resolving data value conflicts
    • For the same real-world entity, attribute values from different sources may differ
    • Possible reasons: different representations, different scales, e.g., metric vs. British (imperial) units
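
A minimal sketch of the three tasks above on two toy records may help make them concrete. Everything in it is an assumption made for illustration: the sample records, the height attributes, the one-entry nickname table, and the helper names normalize and same_entity. Real entity identification usually needs far more robust matching than a lookup table.

```python
# Hypothetical records from two sources; all names and values are illustrative.
source_a = {"cust-id": 1, "name": "Bill Clinton", "height_cm": 188.0}
source_b = {"cust-#": 1, "name": "William Clinton", "height_in": 74.0}

# Schema integration: map source-specific attribute names to one shared name.
SCHEMA_MAP = {"cust-id": "customer_id", "cust-#": "customer_id"}

# Entity identification: a tiny hand-made nickname table (an assumption).
NICKNAMES = {"bill": "william"}

CM_PER_INCH = 2.54


def normalize(record):
    """Return a copy of the record in the shared schema and metric units."""
    out = {SCHEMA_MAP.get(key, key): value for key, value in record.items()}

    # Build a comparison key with the first name mapped to its canonical form.
    first, _, rest = out["name"].lower().partition(" ")
    out["name_key"] = f"{NICKNAMES.get(first, first)} {rest}".strip()

    # Resolve a scale conflict: convert inches to centimetres.
    if "height_in" in out:
        out["height_cm"] = out.pop("height_in") * CM_PER_INCH
    return out


def same_entity(a, b, tolerance_cm=1.0):
    """Treat two normalized records as one entity if id, name key, and height agree."""
    return (
        a["customer_id"] == b["customer_id"]
        and a["name_key"] == b["name_key"]
        and abs(a["height_cm"] - b["height_cm"]) <= tolerance_cm
    )


a = normalize(source_a)
b = normalize(source_b)
print(same_entity(a, b))
```

The tolerance on the height comparison reflects that converted values rarely match exactly: 74 in becomes 187.96 cm, which is close to, but not equal to, the 188 cm stored in the other source.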

