Empirical Analysis on OpenAPI Topic Exploration and Discovery to Support the Developer Community
- Authors
- Da Rocha Araujo, Leonardo Henrique; Rodríguez, Guillermo Horacio; Vidal, Santiago Agustín; Marcos, Claudia Andrea; Pereira Dos Santos, Rodrigo
- Publication year
- 2022
- Language
- English
- Resource type
- article
- Status
- published version
- Description
- OpenAPI has become a dominant standard for documenting APIs in the service-oriented software industry, and it underpins many analysis and reengineering approaches for RESTful services and microservice-based systems. Several components of an OpenAPI document are usually filled in by humans using natural language (e.g., the description of a given functionality), so subjectivity may introduce inconsistencies and ambiguities. As a consequence, understanding what an API does is challenging, and developers may struggle to identify an API's functionality even after reading all of its components. We therefore argue that developers should be provided with supporting tools to find the APIs that best suit their needs. In this paper, we take a step towards such tools by empirically analyzing a set of 2,000 OpenAPI documents with the goal of extracting the main topics of an API using three topic modeling algorithms. To this end, we focus on three tasks: i) determining which component of an OpenAPI document provides the most meaningful information, ii) comparing three state-of-the-art topic modeling algorithms, and iii) determining the optimal number of topics to represent an API. Our findings show that the best results are obtained from the Description component using the Non-negative Matrix Factorization (NMF) or Latent Semantic Indexing (LSI) algorithms. To help developers find services in the OpenAPI directory, we also propose a prototype tool to explore OpenAPI documents and analyze the extracted topics to assess whether the APIs meet developers' needs.
Affiliation: Da Rocha Araujo, Leonardo Henrique. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Tandil. Instituto Superior de Ingeniería del Software. Universidad Nacional del Centro de la Provincia de Buenos Aires. Instituto Superior de Ingeniería del Software; Argentina
Affiliation: Rodríguez, Guillermo Horacio. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Tandil. Instituto Superior de Ingeniería del Software. Universidad Nacional del Centro de la Provincia de Buenos Aires. Instituto Superior de Ingeniería del Software; Argentina
Affiliation: Vidal, Santiago Agustín. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Tandil. Instituto Superior de Ingeniería del Software. Universidad Nacional del Centro de la Provincia de Buenos Aires. Instituto Superior de Ingeniería del Software; Argentina
Affiliation: Marcos, Claudia Andrea. Universidad Nacional del Centro de la Provincia de Buenos Aires. Facultad de Ciencias Exactas. Instituto de Sistemas Tandil; Argentina
Affiliation: Pereira Dos Santos, Rodrigo. Universidade Federal do Estado do Rio de Janeiro; Brazil
- Subjects
- APIS, DOCUMENTATION, OPENAPI, RESTFUL WEB SERVICES, TOPIC COHERENCE, TOPIC MODELING
- Access level
- open access
- Terms of use
- https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
- Repository
- CONICET Digital (CONICET)
- Institution
- Consejo Nacional de Investigaciones Científicas y Técnicas
- OAI identifier
- oai:ri.conicet.gov.ar:11336/215064
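The abstract summarizes a pipeline in which the natural-language Description component of each OpenAPI document is mined for topics with algorithms such as NMF. The record does not include the authors' code, so the following is only a minimal, dependency-free Python sketch under stated assumptions: it takes the "Description component" to be the `info.description` field of an OpenAPI document, uses a hypothetical stop-word list and a hypothetical four-document toy corpus, and factors a raw term-document count matrix with the standard Lee-Seung multiplicative updates for NMF. It is not the preprocessing, weighting, or NMF configuration used in the paper.

```python
import json
import random
import re

# Hypothetical, deliberately tiny stop-word list used only for this sketch.
STOP_WORDS = {"the", "and", "for", "with", "this", "that", "api", "your", "from"}


def description_of(doc: dict) -> str:
    """Return info.description of a parsed OpenAPI document, or "" if absent."""
    return (doc.get("info") or {}).get("description") or ""


def tokenize(text: str) -> list:
    """Lower-case, keep alphabetic tokens of 3+ letters, drop stop words."""
    return [t for t in re.findall(r"[a-z]{3,}", text.lower()) if t not in STOP_WORDS]


def term_document_matrix(token_lists):
    """Build a terms x documents matrix V of raw term counts."""
    vocab = sorted({t for tokens in token_lists for t in tokens})
    row = {t: i for i, t in enumerate(vocab)}
    V = [[0.0] * len(token_lists) for _ in vocab]
    for j, tokens in enumerate(token_lists):
        for t in tokens:
            V[row[t]][j] += 1.0
    return vocab, V


def matmul(A, B):
    """Plain-Python matrix product A @ B."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(r, c)) for c in cols] for r in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Factor V ~ W @ H (all entries >= 0) with Lee-Seung multiplicative
    updates for the squared Frobenius loss. Column a of W holds the term
    weights of topic a; column j of H holds the topic mix of document j."""
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + eps for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[a][j] * num[a][j] / (den[a][j] + eps) for j in range(n)] for a in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)] for i in range(m)]
    return W, H


def top_terms(W, vocab, n_top=3):
    """For each topic (column of W), the n_top highest-weighted terms."""
    k = len(W[0])
    return [[vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -W[i][a])[:n_top]]
            for a in range(k)]


if __name__ == "__main__":
    # Toy corpus standing in for OpenAPI documents (hypothetical data).
    raw = [
        '{"openapi": "3.0.0", "info": {"title": "A", "description": "Weather forecast and temperature data"}}',
        '{"openapi": "3.0.0", "info": {"title": "B", "description": "Hourly weather forecast with rain alerts"}}',
        '{"openapi": "3.0.0", "info": {"title": "C", "description": "Payment processing and invoice billing"}}',
        '{"openapi": "3.0.0", "info": {"title": "D", "description": "Create invoice and track payment status"}}',
    ]
    docs = [json.loads(s) for s in raw]
    vocab, V = term_document_matrix([tokenize(description_of(d)) for d in docs])
    W, H = nmf(V, k=2)
    for topic, terms in enumerate(top_terms(W, vocab)):
        print(f"topic {topic}: {terms}")
```

A real pipeline might instead use TF-IDF weighting and a library implementation such as scikit-learn's `NMF`; the plain-Python version is kept only so the sketch runs without dependencies. The number of topics `k` is the parameter whose best value the paper's third task investigates.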
Full record metadata
| Field | Value |
|---|---|
| repository_id_str | 3498 |
| purl_subject.fl_str_mv | https://purl.org/becyt/ford/1.2 https://purl.org/becyt/ford/1 |
| dc.date.none.fl_str_mv | 2022-02 |
| dc.type.none.fl_str_mv | info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion http://purl.org/coar/resource_type/c_6501 info:ar-repo/semantics/articulo |
| url | http://hdl.handle.net/11336/215064 |
| identifier_str_mv | Da Rocha Araujo, Leonardo Henrique; Rodríguez, Guillermo Horacio; Vidal, Santiago Agustín; Marcos, Claudia Andrea; Pereira Dos Santos, Rodrigo; Empirical Analysis on OpenAPI Topic Exploration and Discovery to Support the Developer Community; Slovak Acad Sciences Inst Informatics; Computing And Informatics; 40; 6; 2-2022; 1345-1369; ISSN 1335-9150 |
| dc.relation.none.fl_str_mv | info:eu-repo/semantics/altIdentifier/url/https://www.cai.sk/ojs/index.php/cai/article/view/2021_6_1345 info:eu-repo/semantics/altIdentifier/doi/10.31577/cai_2021_6_1345 |
| dc.rights.none.fl_str_mv | info:eu-repo/semantics/openAccess https://creativecommons.org/licenses/by-nc-sa/2.5/ar/ |
| dc.format.none.fl_str_mv | application/pdf application/pdf application/pdf |
| dc.publisher.none.fl_str_mv | Slovak Acad Sciences Inst Informatics |
| dc.source.none.fl_str_mv | reponame:CONICET Digital (CONICET) instname:Consejo Nacional de Investigaciones Científicas y Técnicas |
| repository.name.fl_str_mv | CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicas |
| repository.mail.fl_str_mv | dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar |