Empirical Analysis on OpenAPI Topic Exploration and Discovery to Support the Developer Community

Authors
Da Rocha Araujo, Leonardo Henrique; Rodríguez, Guillermo Horacio; Vidal, Santiago Agustín; Marcos, Claudia Andrea; Pereira Dos Santos, Rodrigo
Publication year
2022
Language
English
Resource type
article
Status
published version
Description
OpenAPI has become a dominant standard for documentation in the service-oriented software industry. OpenAPI is used in many analysis and reengineering approaches for RESTful services and microservice-based systems. An OpenAPI document has several components that are usually filled in by humans using natural language (e.g., the description of a certain functionality). Thus, subjectivity may lead to inconsistencies and ambiguities. Understanding what an API does is therefore challenging, and this could hinder developers from identifying the functionality of an API even after reading all its components. Along this line, we argue that developers should be provided with supportive tools to find the APIs that best suit their needs. In this paper, we take a step towards creating such tools by empirically analyzing a set of 2,000 OpenAPI documents with the goal of extracting the main topics of an API using three topic modeling algorithms. To this end, we focus on three tasks: i) determine which component of an OpenAPI document provides the most meaningful information, ii) compare three state-of-the-art topic modeling algorithms, and iii) determine the optimal number of topics to represent an API. Our findings show that the best results could be obtained from the Description component by using the Non-negative Matrix Factorization (NMF) or Latent Semantic Indexing (LSI) algorithms. To help developers find services in the OpenAPI directory, we also propose a prototype tool to explore the OpenAPI documents and analyze the extracted topics to assess whether the APIs meet developers' needs.
Affiliation: Da Rocha Araujo, Leonardo Henrique. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Tandil. Instituto Superior de Ingeniería del Software. Universidad Nacional del Centro de la Provincia de Buenos Aires. Instituto Superior de Ingeniería del Software; Argentina
Affiliation: Rodríguez, Guillermo Horacio. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Tandil. Instituto Superior de Ingeniería del Software. Universidad Nacional del Centro de la Provincia de Buenos Aires. Instituto Superior de Ingeniería del Software; Argentina
Affiliation: Vidal, Santiago Agustín. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Tandil. Instituto Superior de Ingeniería del Software. Universidad Nacional del Centro de la Provincia de Buenos Aires. Instituto Superior de Ingeniería del Software; Argentina
Affiliation: Marcos, Claudia Andrea. Universidad Nacional del Centro de la Provincia de Buenos Aires. Facultad de Ciencias Exactas. Instituto de Sistemas Tandil; Argentina
Affiliation: Pereira Dos Santos, Rodrigo. Universidade Federal do Estado do Rio de Janeiro; Brazil
Subjects
APIS
DOCUMENTATION
OPENAPI
RESTFUL WEB SERVICES
TOPIC COHERENCE
TOPIC MODELING
Access level
open access
Terms of use
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
Repository
CONICET Digital (CONICET)
Institution
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier
oai:ri.conicet.gov.ar:11336/215064

Subject classification
https://purl.org/becyt/ford/1.2
https://purl.org/becyt/ford/1
Publication date
2022-02
Publisher
Slovak Acad Sciences Inst Informatics
Citation
Da Rocha Araujo, Leonardo Henrique; Rodríguez, Guillermo Horacio; Vidal, Santiago Agustín; Marcos, Claudia Andrea; Pereira Dos Santos, Rodrigo; Empirical Analysis on OpenAPI Topic Exploration and Discovery to Support the Developer Community; Slovak Acad Sciences Inst Informatics; Computing And Informatics; 40; 6; 2-2022; 1345-1369
ISSN
1335-9150
Handle
http://hdl.handle.net/11336/215064
Publisher URL
https://www.cai.sk/ojs/index.php/cai/article/view/2021_6_1345
DOI
10.31577/cai_2021_6_1345
Format
application/pdf
Repository contact
dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar