On near-uniform URL sampling
Henzinger MH, Heydon A, Mitzenmacher M, Najork M. 2000. On near-uniform URL sampling. Computer Networks. 33(1–6), 295–308.
Journal Article | Published | English
Scopus indexed
Author
Henzinger, Monika (ISTA);
Heydon, Allan;
Mitzenmacher, Michael;
Najork, Marc
Abstract
We consider the problem of sampling URLs uniformly at random from the Web. A tool for sampling URLs uniformly can be used to estimate various properties of Web pages, such as the fraction of pages in various Internet domains or written in various languages. Moreover, uniform URL sampling can be used to determine the sizes of various search engines relative to the entire Web. In this paper, we consider sampling approaches based on random walks of the Web graph. In particular, we suggest ways of improving sampling based on random walks to make the samples closer to uniform. We suggest a natural test bed based on random graphs for testing the effectiveness of our procedures. We then use our sampling approach to estimate the distribution of pages over various Internet domains and to estimate the coverage of various search engine indexes.
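The general idea in the abstract (walk the graph, then correct the walk's bias toward heavily visited pages) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the helper names `random_graph`, `random_walk`, and `near_uniform_sample`, the fixed out-degree test graph, the random-jump probability, and the choice of weighting each visited node inversely to its visit count.

```python
import random
from collections import Counter


def random_graph(n, out_degree, rng):
    """Hypothetical test bed: every node links to `out_degree` other random nodes."""
    return {
        u: rng.sample([v for v in range(n) if v != u], out_degree)
        for u in range(n)
    }


def random_walk(graph, steps, jump_prob, rng):
    """Walk the graph with occasional random jumps; return visit counts per node."""
    nodes = list(graph)
    visits = Counter()
    current = rng.choice(nodes)
    for _ in range(steps):
        visits[current] += 1
        # Jump to a random node with probability jump_prob, or at a dead end.
        if rng.random() < jump_prob or not graph[current]:
            current = rng.choice(nodes)
        else:
            current = rng.choice(graph[current])
    return visits


def near_uniform_sample(visits, k, rng):
    """Pick up to k distinct visited nodes, weighting each by 1 / visit count.

    Assumption: the visit count stands in for how strongly the walk is
    biased toward a node, so frequently visited nodes are down-weighted.
    """
    nodes = list(visits)
    weights = [1.0 / visits[u] for u in nodes]
    target = min(k, len(nodes))
    sample = set()
    while len(sample) < target:
        sample.add(rng.choices(nodes, weights=weights, k=1)[0])
    return sorted(sample)


rng = random.Random(0)
graph = random_graph(50, 3, rng)
visits = random_walk(graph, 200, 0.15, rng)
sample = near_uniform_sample(visits, 10, rng)
```

On a synthetic graph like this one, where every node is known, the sampled nodes can be compared against a truly uniform draw to judge how much bias remains.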
Publishing Year
2000
Date Published
2000-06-01
Journal Title
Computer Networks
Publisher
Elsevier
Volume
33
Issue
1-6
Page
295-308
ISSN
IST-REx-ID
Cite this
Henzinger MH, Heydon A, Mitzenmacher M, Najork M. On near-uniform URL sampling. Computer Networks. 2000;33(1-6):295-308. doi:10.1016/s1389-1286(00)00055-4
Henzinger, M. H., Heydon, A., Mitzenmacher, M., & Najork, M. (2000). On near-uniform URL sampling. Computer Networks. Elsevier. https://doi.org/10.1016/s1389-1286(00)00055-4
Henzinger, Monika H, Allan Heydon, Michael Mitzenmacher, and Marc Najork. “On Near-Uniform URL Sampling.” Computer Networks. Elsevier, 2000. https://doi.org/10.1016/s1389-1286(00)00055-4.
M. H. Henzinger, A. Heydon, M. Mitzenmacher, and M. Najork, “On near-uniform URL sampling,” Computer Networks, vol. 33, no. 1–6. Elsevier, pp. 295–308, 2000.
Henzinger, Monika H., et al. “On Near-Uniform URL Sampling.” Computer Networks, vol. 33, no. 1–6, Elsevier, 2000, pp. 295–308, doi:10.1016/s1389-1286(00)00055-4.