Decoding the structure of the WWW: a comparative analysis of web crawls
- Authors
- Serrano, Maria Angeles; Maguitman, Ana Gabriela; Boguña, Marian; Fortunato, Santo; Vespignani, Alessandro
- Year of publication
- 2007
- Language
- English
- Resource type
- article
- Status
- published version
- Description
The understanding of the immense and intricate topological structure of the World Wide Web (WWW) is a major scientific and technological challenge. This has been recently tackled by characterizing the properties of its representative graphs, in which vertices and directed edges are identified with Web pages and hyperlinks, respectively. Data gathered in large-scale crawls have been analyzed by several groups, resulting in a general picture of the WWW that encompasses many of the complex properties typical of rapidly evolving networks. In this article, we report a detailed statistical analysis of the topological properties of four different WWW graphs obtained with different crawlers. We find that, despite the very large size of the samples, the statistical measures characterizing these graphs differ quantitatively, and in some cases qualitatively, depending on the domain analyzed and the crawl used for gathering the data. This spurs the issue of the presence of sampling biases and structural differences of Web crawls that might induce properties not representative of the actual global underlying graph. In short, the stability of the widely accepted statistical description of the Web is called into question. In order to provide a more accurate characterization of the Web graph, we study statistical measures beyond the degree distribution, such as degree-degree correlation functions or the statistics of reciprocal connections. The latter appears to enclose the relevant correlations of the WWW graph and carry most of the topologica[…]
Affiliation: Serrano, Maria Angeles. Indiana University; United States. Institute for Scientific Interchange; Italy
Affiliation: Maguitman, Ana Gabriela. Universidad Nacional del Sur; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca; Argentina
Affiliation: Boguña, Marian. Universitat de Barcelona; Spain
Affiliation: Fortunato, Santo. Institute for Scientific Interchange; Italy. Indiana University; United States
Affiliation: Vespignani, Alessandro. Institute for Scientific Interchange; Italy. Indiana University; United States
- Subject
- World Wide Web
- Access level
- open access
- Terms of use
- https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
- Repository
- Institution
- Consejo Nacional de Investigaciones Científicas y Técnicas
- OAI identifier
- oai:ri.conicet.gov.ar:11336/81668
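The abstract contrasts the degree distribution with two finer statistics: degree-degree correlation functions and the statistics of reciprocal connections. A minimal Python sketch of how such measures can be computed follows, assuming the crawl is available as a list of (source page, target page) hyperlink pairs. The function names, the toy edge list, and the specific correlation measure (average in-degree of linked-to pages as a function of the linking page's out-degree) are hypothetical illustrations, not the estimators used by the authors.

```python
from collections import Counter, defaultdict


def degree_counts(edges):
    """Count in-degree and out-degree for every page in a directed edge list."""
    nodes = set()
    in_deg = Counter()
    out_deg = Counter()
    for src, dst in edges:
        out_deg[src] += 1  # hyperlink leaves src
        in_deg[dst] += 1   # hyperlink points to dst
        nodes.update((src, dst))
    return nodes, in_deg, out_deg


def degree_distribution(nodes, deg):
    """P(k): fraction of pages with degree k (pages with degree 0 are included)."""
    counts = Counter(deg[n] for n in nodes)
    return {k: c / len(nodes) for k, c in sorted(counts.items())}


def reciprocity(edges):
    """Fraction of directed links u->v whose reverse link v->u also exists.

    Self-links are ignored, since a page linking to itself would otherwise
    count as trivially reciprocated.
    """
    links = {(u, v) for u, v in edges if u != v}
    if not links:
        return 0.0
    mutual = sum(1 for u, v in links if (v, u) in links)
    return mutual / len(links)


def mean_target_in_degree(edges, in_deg, out_deg):
    """Average in-degree of linked-to pages, grouped by the out-degree of the linking page.

    This is one illustrative directed degree-degree correlation function
    (an assumption, not necessarily the definition used in the article).
    A flat result suggests no correlation; a trend suggests assortative or
    disassortative mixing.
    """
    totals = defaultdict(float)
    links = defaultdict(int)
    for src, dst in edges:
        k = out_deg[src]
        totals[k] += in_deg[dst]
        links[k] += 1
    return {k: totals[k] / links[k] for k in sorted(totals)}


# Hypothetical toy crawl: (source page, target page) pairs, no duplicate links assumed.
EDGES = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "d"),
         ("d", "c"), ("b", "c"), ("d", "a")]

nodes, in_deg, out_deg = degree_counts(EDGES)
print("P(k_in): ", degree_distribution(nodes, in_deg))
print("P(k_out):", degree_distribution(nodes, out_deg))
print("reciprocity:", reciprocity(EDGES))
print("mean target in-degree by k_out:", mean_target_in_degree(EDGES, in_deg, out_deg))
```

Running the script prints the two degree distributions, a reciprocity of 4/7 (four of the seven toy links belong to mutual pairs), and the correlation table keyed by out-degree.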
Full record metadata
purl_subject.fl_str_mv |
https://purl.org/becyt/ford/2.2 https://purl.org/becyt/ford/2 |
dc.date.none.fl_str_mv |
2007-08 |
dc.type.none.fl_str_mv |
info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion http://purl.org/coar/resource_type/c_6501 info:ar-repo/semantics/articulo |
dc.identifier.none.fl_str_mv |
http://hdl.handle.net/11336/81668 Serrano, Maria Angeles; Maguitman, Ana Gabriela; Boguña, Marian; Fortunato, Santo; Vespignani, Alessandro; Decoding the structure of the WWW: a comparative analysis of web crawls; Association for Computing Machinery; ACM Transactions on the Web; 1; 2; 8-2007; 1131-1155 1559-1131 CONICET Digital CONICET |
dc.relation.none.fl_str_mv |
info:eu-repo/semantics/altIdentifier/url/https://dl.acm.org/citation.cfm?id=1255438.1255442 info:eu-repo/semantics/altIdentifier/doi/10.1145/1255438.1255442 |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
Association for Computing Machinery |
dc.source.none.fl_str_mv |
reponame:CONICET Digital (CONICET) instname:Consejo Nacional de Investigaciones Científicas y Técnicas |
repository.mail.fl_str_mv |
dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar |