Hash2Vec: Feature Hashing for Word Embeddings

Authors
Argerich, Luis; Cano, Matías J.; Torre Zaffaroni, Joaquín
Year of publication
2016
Language
English
Resource type
conference paper
Status
published version
Description
In this paper we propose applying feature hashing to create word embeddings for natural language processing. Feature hashing has been used successfully to create document vectors in related tasks such as document classification. In this work we show that feature hashing can be applied to obtain word embeddings in time linear in the size of the data. The results show that this algorithm, which requires no training, is able to capture the semantic meaning of words. We compare the results against GloVe and show that they are similar. As far as we know, this is the first application of feature hashing to the word embedding problem, and the results indicate that it is a scalable technique with practical results for NLP applications.
Sociedad Argentina de Informática e Investigación Operativa (SADIO)
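The abstract does not spell out the algorithm, so the following is only an illustrative sketch of how feature hashing could yield word vectors in a single pass, not the authors' implementation. It assumes each word's vector accumulates signed, hashed counts of the words in a small context window; the function names, the `dim` and `window` parameters, and the use of MD5 as the hash are all hypothetical choices made for the example.

```python
import hashlib
from collections import defaultdict


def _stable_hash(token: str, salt: str) -> int:
    # Python's built-in hash() is salted per process, so a digest is used
    # here to keep the bucket assignment reproducible (an assumption of
    # this sketch, not something stated in the abstract).
    digest = hashlib.md5((salt + token).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little")


def hashed_embeddings(tokens, dim=16, window=1):
    """Build one fixed-size vector per word from hashed context words.

    Each context word is mapped to a bucket in [0, dim) and to a sign
    (+1 or -1); the signed count is added to the target word's vector.
    The corpus is read once, so the cost grows linearly with its length.
    """
    vectors = defaultdict(lambda: [0.0] * dim)
    for i, word in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j == i:
                continue  # a word is not its own context
            context = tokens[j]
            bucket = _stable_hash(context, "bucket") % dim
            sign = 1.0 if _stable_hash(context, "sign") % 2 == 0 else -1.0
            vectors[word][bucket] += sign
    return dict(vectors)


# Toy corpus: "cat" and "dog" appear in identical contexts.
corpus = "the cat sat . the dog sat .".split()
vecs = hashed_embeddings(corpus, dim=16, window=1)
```

In this toy corpus "cat" and "dog" share exactly the same neighbours, so they receive identical vectors, which is the intuition behind capturing word similarity without a training step. No parameters are learned; only counts are accumulated.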
Subject
Computer Science
feature hashing
word embedding
Natural Language Processing
Access level
open access
Terms of use
http://creativecommons.org/licenses/by-sa/3.0/
Repository
SEDICI (UNLP)
Institution
Universidad Nacional de La Plata
OAI identifier
oai:sedici.unlp.edu.ar:10915/56977

Date issued
2016-09
Handle
http://sedici.unlp.edu.ar/handle/10915/56977
Alternate identifiers
URL: http://45jaiio.sadio.org.ar/sites/default/files/ASAI-10_0.pdf
ISSN: 2451-7585
Format
application/pdf
Pages
33-40