Empirical analysis on OpenAPI topic exploration and discovery to support the developer community
- Autores
- Rocha Araujo, Leonardo; Rodríguez, Guillermo Horacio; Vidal, Santiago; Marcos, Claudia A.; Santos, Rodrigo P.
- Año de publicación
- 2022
- Idioma
- inglés
- Tipo de recurso
- documento de conferencia
- Estado
- versión publicada
- Descripción
- OpenAPI has become a dominant standard for documentation in the service-oriented software industry. OpenAPI is used in many analysis and reengineering approaches for RESTful service and microservice-based systems. An OpenAPI document has several components that are usually filled by humans using natural language (e.g. description of a certain functionality). Thus, subjectivity may lead to inconsistencies and ambiguities. Understanding what an API does is a challenging question. As a consequence, this issue could hinder developers from identifying the functionality of APIs, after reading all its components. Along this line, we argue that developers will be provided with supportive tools to find those APIs that better suit their needs. In this paper, we propose a step towards creating these kinds of tools by empirically analyzing a set of 2,000 OpenAPI documents with the goal of extracting the main topics of an API using three topic modeling algorithms. To address this issue, we focus on three tasks: i) determine which component of an OpenAPI document provides the most meaningful information, ii) compare three state-of-the-art topic modeling algorithms, and iii) determine the optimal number of topics to represent an API. Our findings show that the best results could be obtained from the Description component by using the Non-negative Matrix Factorization (NMF) or Latent Semantic Indexing (LSI) algorithms. To help developers find services in the OpenAPI directory, we also propose a prototype tool to explore the OpenAPI documents and analyze extracted topics to assess if the APIs meet developer’s needs.
Sociedad Argentina de Informática e Investigación Operativa - Materia
-
Ciencias Informáticas
Microservices
OpenAPI
Migration
Legacy systems
Topic modeling - Nivel de accesibilidad
- acceso abierto
- Condiciones de uso
- http://creativecommons.org/licenses/by-nc-sa/4.0/
- Repositorio
- Institución
- Universidad Nacional de La Plata
- OAI Identificador
- oai:sedici.unlp.edu.ar:10915/151639
Ver los metadatos del registro completo
id |
SEDICI_c266c1f7b4c57d3e34202a9e9fa10581 |
---|---|
oai_identifier_str |
oai:sedici.unlp.edu.ar:10915/151639 |
network_acronym_str |
SEDICI |
repository_id_str |
1329 |
network_name_str |
SEDICI (UNLP) |
spelling |
Empirical analysis on OpenAPI topic exploration and discovery to support the developer communityRocha Araujo, LeonardoRodríguez, Guillermo HoracioVidal, SantiagoMarcos, Claudia A.Santos, Rodrigo P.Ciencias InformáticasMicroservicesOpenAPIMigrationLegacy systemsTopic modelingOpenAPI has become a dominant standard for documentation in the service-oriented software industry. OpenAPI is used in many analysis and reengineering approaches for RESTful service and microservice-based systems. An OpenAPI document has several components that are usually filled by humans using natural language (e.g. description of a certain functionality). Thus, subjectivity may lead to inconsistencies and ambiguities. Understanding what an API does is a challenging question. As a consequence, this issue could hinder developers from identifying the functionality of APIs, after reading all its components. Along this line, we argue that developers will be provided with supportive tools to find those APIs that better suit their needs. In this paper, we propose a step towards creating these kinds of tools by empirically analyzing a set of 2,000 OpenAPI documents with the goal of extracting the main topics of an API using three topic modeling algorithms. To address this issue, we focus on three tasks: i) determine which component of an OpenAPI document provides the most meaningful information, ii) compare three state-of-the-art topic modeling algorithms, and iii) determine the optimal number of topics to represent an API. Our findings show that the best results could be obtained from the Description component by using the Non-negative Matrix Factorization (NMF) or Latent Semantic Indexing (LSI) algorithms. To help developers find services in the OpenAPI directory, we also propose a prototype tool to explore the OpenAPI documents and analyze extracted topics to assess if the APIs meet developer’s needs.Sociedad Argentina de Informática e Investigación Operativa2022-10info:eu-repo/semantics/conferenceObjectinfo:eu-repo/semantics/publishedVersionResumenhttp://purl.org/coar/resource_type/c_5794info:ar-repo/semantics/documentoDeConferenciaapplication/pdf68-69http://sedici.unlp.edu.ar/handle/10915/151639enginfo:eu-repo/semantics/altIdentifier/url/https://publicaciones.sadio.org.ar/index.php/JAIIO/article/download/301/250info:eu-repo/semantics/altIdentifier/issn/2451-7496info:eu-repo/semantics/openAccesshttp://creativecommons.org/licenses/by-nc-sa/4.0/Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)reponame:SEDICI (UNLP)instname:Universidad Nacional de La Platainstacron:UNLP2025-09-03T11:11:05Zoai:sedici.unlp.edu.ar:10915/151639Institucionalhttp://sedici.unlp.edu.ar/Universidad públicaNo correspondehttp://sedici.unlp.edu.ar/oai/snrdalira@sedici.unlp.edu.arArgentinaNo correspondeNo correspondeNo correspondeopendoar:13292025-09-03 11:11:05.727SEDICI (UNLP) - Universidad Nacional de La Platafalse |
dc.title.none.fl_str_mv |
Empirical analysis on OpenAPI topic exploration and discovery to support the developer community |
title |
Empirical analysis on OpenAPI topic exploration and discovery to support the developer community |
spellingShingle |
Empirical analysis on OpenAPI topic exploration and discovery to support the developer community Rocha Araujo, Leonardo Ciencias Informáticas Microservices OpenAPI Migration Legacy systems Topic modeling |
title_short |
Empirical analysis on OpenAPI topic exploration and discovery to support the developer community |
title_full |
Empirical analysis on OpenAPI topic exploration and discovery to support the developer community |
title_fullStr |
Empirical analysis on OpenAPI topic exploration and discovery to support the developer community |
title_full_unstemmed |
Empirical analysis on OpenAPI topic exploration and discovery to support the developer community |
title_sort |
Empirical analysis on OpenAPI topic exploration and discovery to support the developer community |
dc.creator.none.fl_str_mv |
Rocha Araujo, Leonardo Rodríguez, Guillermo Horacio Vidal, Santiago Marcos, Claudia A. Santos, Rodrigo P. |
author |
Rocha Araujo, Leonardo |
author_facet |
Rocha Araujo, Leonardo Rodríguez, Guillermo Horacio Vidal, Santiago Marcos, Claudia A. Santos, Rodrigo P. |
author_role |
author |
author2 |
Rodríguez, Guillermo Horacio Vidal, Santiago Marcos, Claudia A. Santos, Rodrigo P. |
author2_role |
author author author author |
dc.subject.none.fl_str_mv |
Ciencias Informáticas Microservices OpenAPI Migration Legacy systems Topic modeling |
topic |
Ciencias Informáticas Microservices OpenAPI Migration Legacy systems Topic modeling |
dc.description.none.fl_txt_mv |
OpenAPI has become a dominant standard for documentation in the service-oriented software industry. OpenAPI is used in many analysis and reengineering approaches for RESTful service and microservice-based systems. An OpenAPI document has several components that are usually filled by humans using natural language (e.g. description of a certain functionality). Thus, subjectivity may lead to inconsistencies and ambiguities. Understanding what an API does is a challenging question. As a consequence, this issue could hinder developers from identifying the functionality of APIs, after reading all its components. Along this line, we argue that developers will be provided with supportive tools to find those APIs that better suit their needs. In this paper, we propose a step towards creating these kinds of tools by empirically analyzing a set of 2,000 OpenAPI documents with the goal of extracting the main topics of an API using three topic modeling algorithms. To address this issue, we focus on three tasks: i) determine which component of an OpenAPI document provides the most meaningful information, ii) compare three state-of-the-art topic modeling algorithms, and iii) determine the optimal number of topics to represent an API. Our findings show that the best results could be obtained from the Description component by using the Non-negative Matrix Factorization (NMF) or Latent Semantic Indexing (LSI) algorithms. To help developers find services in the OpenAPI directory, we also propose a prototype tool to explore the OpenAPI documents and analyze extracted topics to assess if the APIs meet developer’s needs. Sociedad Argentina de Informática e Investigación Operativa |
description |
OpenAPI has become a dominant standard for documentation in the service-oriented software industry. OpenAPI is used in many analysis and reengineering approaches for RESTful service and microservice-based systems. An OpenAPI document has several components that are usually filled by humans using natural language (e.g. description of a certain functionality). Thus, subjectivity may lead to inconsistencies and ambiguities. Understanding what an API does is a challenging question. As a consequence, this issue could hinder developers from identifying the functionality of APIs, after reading all its components. Along this line, we argue that developers will be provided with supportive tools to find those APIs that better suit their needs. In this paper, we propose a step towards creating these kinds of tools by empirically analyzing a set of 2,000 OpenAPI documents with the goal of extracting the main topics of an API using three topic modeling algorithms. To address this issue, we focus on three tasks: i) determine which component of an OpenAPI document provides the most meaningful information, ii) compare three state-of-the-art topic modeling algorithms, and iii) determine the optimal number of topics to represent an API. Our findings show that the best results could be obtained from the Description component by using the Non-negative Matrix Factorization (NMF) or Latent Semantic Indexing (LSI) algorithms. To help developers find services in the OpenAPI directory, we also propose a prototype tool to explore the OpenAPI documents and analyze extracted topics to assess if the APIs meet developer’s needs. |
publishDate |
2022 |
dc.date.none.fl_str_mv |
2022-10 |
dc.type.none.fl_str_mv |
info:eu-repo/semantics/conferenceObject info:eu-repo/semantics/publishedVersion Resumen http://purl.org/coar/resource_type/c_5794 info:ar-repo/semantics/documentoDeConferencia |
format |
conferenceObject |
status_str |
publishedVersion |
dc.identifier.none.fl_str_mv |
http://sedici.unlp.edu.ar/handle/10915/151639 |
url |
http://sedici.unlp.edu.ar/handle/10915/151639 |
dc.language.none.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
info:eu-repo/semantics/altIdentifier/url/https://publicaciones.sadio.org.ar/index.php/JAIIO/article/download/301/250 info:eu-repo/semantics/altIdentifier/issn/2451-7496 |
dc.rights.none.fl_str_mv |
info:eu-repo/semantics/openAccess http://creativecommons.org/licenses/by-nc-sa/4.0/ Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) |
eu_rights_str_mv |
openAccess |
rights_invalid_str_mv |
http://creativecommons.org/licenses/by-nc-sa/4.0/ Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) |
dc.format.none.fl_str_mv |
application/pdf 68-69 |
dc.source.none.fl_str_mv |
reponame:SEDICI (UNLP) instname:Universidad Nacional de La Plata instacron:UNLP |
reponame_str |
SEDICI (UNLP) |
collection |
SEDICI (UNLP) |
instname_str |
Universidad Nacional de La Plata |
instacron_str |
UNLP |
institution |
UNLP |
repository.name.fl_str_mv |
SEDICI (UNLP) - Universidad Nacional de La Plata |
repository.mail.fl_str_mv |
alira@sedici.unlp.edu.ar |
_version_ |
1842260614952517632 |
score |
13.13397 |