Empirical analysis on OpenAPI topic exploration and discovery to support the developer community

Authors: Rocha Araujo, Leonardo; Rodríguez, Guillermo Horacio; Vidal, Santiago; Marcos, Claudia A.; Santos, Rodrigo P.
Year of publication: 2022
Language: English
Resource type: conference document
Status: published version
Description: OpenAPI has become a dominant standard for documentation in the service-oriented software industry. OpenAPI is used in many analysis and reengineering approaches for RESTful service and microservice-based systems. An OpenAPI document has several components that are usually filled in by humans using natural language (e.g., the description of a certain functionality). Thus, subjectivity may lead to inconsistencies and ambiguities. Understanding what an API does is challenging. As a consequence, developers may be unable to identify the functionality of an API even after reading all of its components. Along this line, we argue that developers should be provided with supportive tools to find the APIs that best suit their needs. In this paper, we take a step towards creating such tools by empirically analyzing a set of 2,000 OpenAPI documents with the goal of extracting the main topics of an API using three topic modeling algorithms. To this end, we focus on three tasks: i) determine which component of an OpenAPI document provides the most meaningful information, ii) compare three state-of-the-art topic modeling algorithms, and iii) determine the optimal number of topics to represent an API. Our findings show that the best results could be obtained from the Description component by using the Non-negative Matrix Factorization (NMF) or Latent Semantic Indexing (LSI) algorithms. To help developers find services in the OpenAPI directory, we also propose a prototype tool to explore the OpenAPI documents and analyze the extracted topics to assess whether the APIs meet developers' needs.
Sociedad Argentina de Informática e Investigación Operativa
Subject: Computer Science
Microservices
OpenAPI
Migration
Legacy systems
Topic modeling
Access level: open access
Terms of use: http://creativecommons.org/licenses/by-nc-sa/4.0/
Repository
Institution: Universidad Nacional de La Plata
OAI identifier: oai:sedici.unlp.edu.ar:10915/151639
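
As an illustration of the first task named in the description (deciding which component of an OpenAPI document to analyze), the following is a minimal Python sketch, not the authors' implementation. It assumes a small hand-written OpenAPI fragment (`SAMPLE_OPENAPI`), a hypothetical stop-word list, and hypothetical helper names (`collect_field`, `bag_of_words`). It gathers every Description string in the document and turns the text into term counts of the kind a topic model such as NMF or LSI would take as input.

```python
import json
import re
from collections import Counter

# Hypothetical, hand-written OpenAPI fragment used only for illustration.
SAMPLE_OPENAPI = json.loads("""
{
  "openapi": "3.0.0",
  "info": {
    "title": "Pet Store",
    "description": "Manage pets and orders in an online pet store."
  },
  "paths": {
    "/pets": {
      "get": {
        "summary": "List pets",
        "description": "Returns all pets available in the store."
      }
    }
  }
}
""")

# Assumed stop-word list; a real study would use a fuller one.
STOPWORDS = {"a", "all", "an", "and", "in", "of", "the", "to"}


def collect_field(node, field):
    """Recursively gather every string stored under `field` in a parsed document."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == field and isinstance(value, str):
                found.append(value)
            else:
                found.extend(collect_field(value, field))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_field(item, field))
    return found


def bag_of_words(texts):
    """Lowercase, tokenize on letters, drop stop words, and count the terms."""
    tokens = re.findall(r"[a-z]+", " ".join(texts).lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


descriptions = collect_field(SAMPLE_OPENAPI, "description")
terms = bag_of_words(descriptions)
```

Calling `collect_field(SAMPLE_OPENAPI, "summary")` instead would gather the Summary fields, which is one way different components could be compared under the same pipeline.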

id	SEDICI_c266c1f7b4c57d3e34202a9e9fa10581
oai_identifier_str	oai:sedici.unlp.edu.ar:10915/151639
network_acronym_str	SEDICI
repository_id_str	1329
network_name_str	SEDICI (UNLP)
dc.title.none.fl_str_mv	Empirical analysis on OpenAPI topic exploration and discovery to support the developer community
dc.creator.none.fl_str_mv	Rocha Araujo, Leonardo Rodríguez, Guillermo Horacio Vidal, Santiago Marcos, Claudia A. Santos, Rodrigo P.
dc.subject.none.fl_str_mv	Ciencias Informáticas Microservices OpenAPI Migration Legacy systems Topic modeling
publishDate	2022
dc.date.none.fl_str_mv	2022-10
dc.type.none.fl_str_mv	info:eu-repo/semantics/conferenceObject info:eu-repo/semantics/publishedVersion Resumen http://purl.org/coar/resource_type/c_5794 info:ar-repo/semantics/documentoDeConferencia
dc.identifier.none.fl_str_mv	http://sedici.unlp.edu.ar/handle/10915/151639
dc.language.none.fl_str_mv	eng
dc.relation.none.fl_str_mv	info:eu-repo/semantics/altIdentifier/url/https://publicaciones.sadio.org.ar/index.php/JAIIO/article/download/301/250 info:eu-repo/semantics/altIdentifier/issn/2451-7496
dc.rights.none.fl_str_mv	info:eu-repo/semantics/openAccess http://creativecommons.org/licenses/by-nc-sa/4.0/ Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
dc.format.none.fl_str_mv	application/pdf 68-69
dc.source.none.fl_str_mv	reponame:SEDICI (UNLP) instname:Universidad Nacional de La Plata instacron:UNLP
repository.name.fl_str_mv	SEDICI (UNLP) - Universidad Nacional de La Plata
repository.mail.fl_str_mv	alira@sedici.unlp.edu.ar
