---
_id: '11685'
abstract:
- lang: eng
  text: We consider the problem of sampling URLs uniformly at random from the Web.
    A tool for sampling URLs uniformly can be used to estimate various properties
    of Web pages, such as the fraction of pages in various Internet domains or written
    in various languages. Moreover, uniform URL sampling can be used to determine
    the sizes of various search engines relative to the entire Web. In this paper,
    we consider sampling approaches based on random walks of the Web graph. In particular,
    we suggest ways of improving sampling based on random walks to make the samples
    closer to uniform. We suggest a natural test bed based on random graphs for testing
    the effectiveness of our procedures. We then use our sampling approach to estimate
    the distribution of pages over various Internet domains and to estimate the coverage
    of various search engine indexes.
article_processing_charge: No
article_type: original
author:
- first_name: Monika H
  full_name: Henzinger, Monika H
  id: 540c9bbd-f2de-11ec-812d-d04a5be85630
  last_name: Henzinger
  orcid: 0000-0002-5008-6530
- first_name: Allan
  full_name: Heydon, Allan
  last_name: Heydon
- first_name: Michael
  full_name: Mitzenmacher, Michael
  last_name: Mitzenmacher
- first_name: Marc
  full_name: Najork, Marc
  last_name: Najork
citation:
  ama: Henzinger MH, Heydon A, Mitzenmacher M, Najork M. On near-uniform URL sampling.
    <i>Computer Networks</i>. 2000;33(1-6):295-308. doi:<a href="https://doi.org/10.1016/s1389-1286(00)00055-4">10.1016/s1389-1286(00)00055-4</a>
  apa: Henzinger, M. H., Heydon, A., Mitzenmacher, M., &#38; Najork, M. (2000). On
    near-uniform URL sampling. <i>Computer Networks</i>. Elsevier. <a href="https://doi.org/10.1016/s1389-1286(00)00055-4">https://doi.org/10.1016/s1389-1286(00)00055-4</a>
  chicago: Henzinger, Monika H, Allan Heydon, Michael Mitzenmacher, and Marc Najork.
    “On Near-Uniform URL Sampling.” <i>Computer Networks</i>. Elsevier, 2000. <a href="https://doi.org/10.1016/s1389-1286(00)00055-4">https://doi.org/10.1016/s1389-1286(00)00055-4</a>.
  ieee: M. H. Henzinger, A. Heydon, M. Mitzenmacher, and M. Najork, “On near-uniform
    URL sampling,” <i>Computer Networks</i>, vol. 33, no. 1–6. Elsevier, pp. 295–308,
    2000.
  ista: Henzinger MH, Heydon A, Mitzenmacher M, Najork M. 2000. On near-uniform URL
    sampling. Computer Networks. 33(1–6), 295–308.
  mla: Henzinger, Monika H., et al. “On Near-Uniform URL Sampling.” <i>Computer Networks</i>,
    vol. 33, no. 1–6, Elsevier, 2000, pp. 295–308, doi:<a href="https://doi.org/10.1016/s1389-1286(00)00055-4">10.1016/s1389-1286(00)00055-4</a>.
  short: M.H. Henzinger, A. Heydon, M. Mitzenmacher, M. Najork, Computer Networks
    33 (2000) 295–308.
date_created: 2022-07-28T15:11:53Z
date_published: 2000-06-01T00:00:00Z
date_updated: 2022-09-12T09:09:13Z
day: '01'
doi: 10.1016/s1389-1286(00)00055-4
extern: '1'
intvolume: '        33'
issue: 1-6
keyword:
- URL sampling
- Random walks
- Internet domain distribution
- Search engine size
language:
- iso: eng
month: '06'
oa_version: None
page: 295-308
publication: Computer Networks
publication_identifier:
  issn:
  - 1389-1286
publication_status: published
publisher: Elsevier
quality_controlled: '1'
scopus_import: '1'
status: public
title: On near-uniform URL sampling
type: journal_article
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 33
year: '2000'
...
---
_id: '11687'
abstract:
- lang: eng
  text: "When using traditional search engines, users have to formulate queries to
    describe their information need. This paper discusses a different approach to
    Web searching where the input to the search process is not a set of query terms,
    but instead is the URL of a page, and the output is a set of related Web pages.
    A related Web page is one that addresses the same topic as the original page.
    For example, www.washingtonpost.com is a page related to www.nytimes.com, since
    both are online newspapers.\r\n\r\nWe describe two algorithms to identify related
    Web pages. These algorithms use only the connectivity information in the Web (i.e.,
    the links between pages) and not the content of pages or usage information. We
    have implemented both algorithms and measured their runtime performance. To evaluate
    the effectiveness of our algorithms, we performed a user study comparing our algorithms
    with Netscape's `What's Related' service (http://home.netscape.com/escapes/related/).
    Our study showed that the precision at 10 of our two algorithms is 73% and 51%
    better, respectively, than that of Netscape, even though Netscape uses both
    content and usage pattern information in addition to connectivity information."
article_processing_charge: No
article_type: original
author:
- first_name: Jeffrey
  full_name: Dean, Jeffrey
  last_name: Dean
- first_name: Monika H
  full_name: Henzinger, Monika H
  id: 540c9bbd-f2de-11ec-812d-d04a5be85630
  last_name: Henzinger
  orcid: 0000-0002-5008-6530
citation:
  ama: Dean J, Henzinger MH. Finding related pages in the world wide Web. <i>Computer
    Networks</i>. 1999;31(11-16):1467-1479. doi:<a href="https://doi.org/10.1016/s1389-1286(99)00022-5">10.1016/s1389-1286(99)00022-5</a>
  apa: Dean, J., &#38; Henzinger, M. H. (1999). Finding related pages in the world
    wide Web. <i>Computer Networks</i>. Elsevier. <a href="https://doi.org/10.1016/s1389-1286(99)00022-5">https://doi.org/10.1016/s1389-1286(99)00022-5</a>
  chicago: Dean, Jeffrey, and Monika H Henzinger. “Finding Related Pages in the World
    Wide Web.” <i>Computer Networks</i>. Elsevier, 1999. <a href="https://doi.org/10.1016/s1389-1286(99)00022-5">https://doi.org/10.1016/s1389-1286(99)00022-5</a>.
  ieee: J. Dean and M. H. Henzinger, “Finding related pages in the world wide Web,”
    <i>Computer Networks</i>, vol. 31, no. 11–16. Elsevier, pp. 1467–1479, 1999.
  ista: Dean J, Henzinger MH. 1999. Finding related pages in the world wide Web. Computer
    Networks. 31(11–16), 1467–1479.
  mla: Dean, Jeffrey, and Monika H. Henzinger. “Finding Related Pages in the World
    Wide Web.” <i>Computer Networks</i>, vol. 31, no. 11–16, Elsevier, 1999, pp. 1467–79,
    doi:<a href="https://doi.org/10.1016/s1389-1286(99)00022-5">10.1016/s1389-1286(99)00022-5</a>.
  short: J. Dean, M.H. Henzinger, Computer Networks 31 (1999) 1467–1479.
date_created: 2022-07-29T06:55:26Z
date_published: 1999-05-17T00:00:00Z
date_updated: 2022-09-12T09:12:21Z
day: '17'
doi: 10.1016/s1389-1286(99)00022-5
extern: '1'
intvolume: '        31'
issue: 11-16
keyword:
- Search engines
- Related pages
- Searching paradigms
language:
- iso: eng
month: '05'
oa_version: None
page: 1467-1479
publication: Computer Networks
publication_identifier:
  issn:
  - 1389-1286
publication_status: published
publisher: Elsevier
quality_controlled: '1'
scopus_import: '1'
status: public
title: Finding related pages in the world wide Web
type: journal_article
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 31
year: '1999'
...
---
_id: '11688'
abstract:
- lang: eng
  text: Recent research has studied how to measure the size of a search engine, in
    terms of the number of pages indexed. In this paper, we consider a different measure
    for search engines, namely the quality of the pages in a search engine index.
    We provide a simple, effective algorithm for approximating the quality of an index
    by performing a random walk on the Web, and we use this methodology to compare
    the index quality of several major search engines.
article_processing_charge: No
article_type: original
author:
- first_name: Monika H
  full_name: Henzinger, Monika H
  id: 540c9bbd-f2de-11ec-812d-d04a5be85630
  last_name: Henzinger
  orcid: 0000-0002-5008-6530
- first_name: Allan
  full_name: Heydon, Allan
  last_name: Heydon
- first_name: Michael
  full_name: Mitzenmacher, Michael
  last_name: Mitzenmacher
- first_name: Marc
  full_name: Najork, Marc
  last_name: Najork
citation:
  ama: Henzinger MH, Heydon A, Mitzenmacher M, Najork M. Measuring index quality using
    random walks on the web. <i>Computer Networks</i>. 1999;31(11-16):1291-1303. doi:<a
    href="https://doi.org/10.1016/s1389-1286(99)00016-x">10.1016/s1389-1286(99)00016-x</a>
  apa: Henzinger, M. H., Heydon, A., Mitzenmacher, M., &#38; Najork, M. (1999). Measuring
    index quality using random walks on the web. <i>Computer Networks</i>. Elsevier.
    <a href="https://doi.org/10.1016/s1389-1286(99)00016-x">https://doi.org/10.1016/s1389-1286(99)00016-x</a>
  chicago: Henzinger, Monika H, Allan Heydon, Michael Mitzenmacher, and Marc Najork.
    “Measuring Index Quality Using Random Walks on the Web.” <i>Computer Networks</i>.
    Elsevier, 1999. <a href="https://doi.org/10.1016/s1389-1286(99)00016-x">https://doi.org/10.1016/s1389-1286(99)00016-x</a>.
  ieee: M. H. Henzinger, A. Heydon, M. Mitzenmacher, and M. Najork, “Measuring index
    quality using random walks on the web,” <i>Computer Networks</i>, vol. 31, no.
    11–16. Elsevier, pp. 1291–1303, 1999.
  ista: Henzinger MH, Heydon A, Mitzenmacher M, Najork M. 1999. Measuring index quality
    using random walks on the web. Computer Networks. 31(11–16), 1291–1303.
  mla: Henzinger, Monika H., et al. “Measuring Index Quality Using Random Walks on
    the Web.” <i>Computer Networks</i>, vol. 31, no. 11–16, Elsevier, 1999, pp. 1291–303,
    doi:<a href="https://doi.org/10.1016/s1389-1286(99)00016-x">10.1016/s1389-1286(99)00016-x</a>.
  short: M.H. Henzinger, A. Heydon, M. Mitzenmacher, M. Najork, Computer Networks
    31 (1999) 1291–1303.
date_created: 2022-07-29T07:00:28Z
date_published: 1999-05-17T00:00:00Z
date_updated: 2022-09-12T09:13:55Z
day: '17'
doi: 10.1016/s1389-1286(99)00016-x
extern: '1'
intvolume: '        31'
issue: 11-16
keyword:
- Search engines
- Index quality
- Random walks
- PageRank
language:
- iso: eng
month: '05'
oa_version: None
page: 1291-1303
publication: Computer Networks
publication_identifier:
  issn:
  - 1389-1286
publication_status: published
publisher: Elsevier
quality_controlled: '1'
scopus_import: '1'
status: public
title: Measuring index quality using random walks on the web
type: journal_article
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 31
year: '1999'
...
