---
_id: '11906'
abstract:
- lang: eng
  text: In the origin detection problem, an algorithm is given a set S of documents,
    ordered by creation time, and a query document D. It needs to output for every
    consecutive sequence of k alphanumeric terms in D the earliest document in S
    in which the sequence appeared (if such a document exists). Algorithms for the
    origin detection problem can, for example, be used to detect the "origin" of text
    segments in D and thus to detect novel content in D. They can also find the document
    from which the author of D has copied the most (or show that D is mostly original).
    We concentrate on solutions that use only a fixed amount of memory. We propose
    novel algorithms for this problem and evaluate them together with a large number
    of previously published algorithms. Our results show that (1) detecting the origin
    of text segments efficiently can be done with very high accuracy even when the
    space used is less than 1% of the size of the documents in S, (2) the precision
    degrades smoothly with the amount of available space, and (3) various estimation
    techniques can be used to increase the performance of the algorithms.
article_processing_charge: No
author:
- first_name: Ossama
  full_name: Abdel Hamid, Ossama
  last_name: Abdel Hamid
- first_name: Behshad
  full_name: Behzadi, Behshad
  last_name: Behzadi
- first_name: Stefan
  full_name: Christoph, Stefan
  last_name: Christoph
- first_name: Monika H
  full_name: Henzinger, Monika H
  id: 540c9bbd-f2de-11ec-812d-d04a5be85630
  last_name: Henzinger
  orcid: 0000-0002-5008-6530
citation:
  ama: 'Abdel Hamid O, Behzadi B, Christoph S, Henzinger MH. Detecting the origin
    of text segments efficiently. In: <i>18th International World Wide Web Conference</i>.
    Association for Computing Machinery; 2009:61-70. doi:<a href="https://doi.org/10.1145/1526709.1526719">10.1145/1526709.1526719</a>'
  apa: 'Abdel Hamid, O., Behzadi, B., Christoph, S., &#38; Henzinger, M. H. (2009).
    Detecting the origin of text segments efficiently. In <i>18th International World
    Wide Web Conference</i> (pp. 61–70). Madrid, Spain: Association for Computing
    Machinery. <a href="https://doi.org/10.1145/1526709.1526719">https://doi.org/10.1145/1526709.1526719</a>'
  chicago: Abdel Hamid, Ossama, Behshad Behzadi, Stefan Christoph, and Monika H Henzinger.
    “Detecting the Origin of Text Segments Efficiently.” In <i>18th International
    World Wide Web Conference</i>, 61–70. Association for Computing Machinery, 2009.
    <a href="https://doi.org/10.1145/1526709.1526719">https://doi.org/10.1145/1526709.1526719</a>.
  ieee: O. Abdel Hamid, B. Behzadi, S. Christoph, and M. H. Henzinger, “Detecting
    the origin of text segments efficiently,” in <i>18th International World Wide
    Web Conference</i>, Madrid, Spain, 2009, pp. 61–70.
  ista: 'Abdel Hamid O, Behzadi B, Christoph S, Henzinger MH. 2009. Detecting the
    origin of text segments efficiently. 18th International World Wide Web Conference.
    WWW: International Conference on World Wide Web, 61–70.'
  mla: Abdel Hamid, Ossama, et al. “Detecting the Origin of Text Segments Efficiently.”
    <i>18th International World Wide Web Conference</i>, Association for Computing
    Machinery, 2009, pp. 61–70, doi:<a href="https://doi.org/10.1145/1526709.1526719">10.1145/1526709.1526719</a>.
  short: O. Abdel Hamid, B. Behzadi, S. Christoph, M.H. Henzinger, in:, 18th International
    World Wide Web Conference, Association for Computing Machinery, 2009, pp. 61–70.
conference:
  end_date: 2009-04-24
  location: Madrid, Spain
  name: 'WWW: International Conference on World Wide Web'
  start_date: 2009-04-20
date_created: 2022-08-17T11:54:30Z
date_published: 2009-04-01T00:00:00Z
date_updated: 2023-02-17T14:56:47Z
day: '01'
doi: 10.1145/1526709.1526719
extern: '1'
language:
- iso: eng
month: '04'
oa_version: None
page: 61-70
publication: 18th International World Wide Web Conference
publication_identifier:
  isbn:
  - 978-160558487-4
publication_status: published
publisher: Association for Computing Machinery
quality_controlled: '1'
scopus_import: '1'
status: public
title: Detecting the origin of text segments efficiently
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
year: '2009'
...
