Hash2Vec: Feature Hashing for Word Embeddings
- Authors
- Argerich, Luis; Cano, Matías J.; Torre Zaffaroni, Joaquín
- Year of publication
- 2016
- Language
- English
- Resource type
- conference paper
- Status
- published version
- Description
- In this paper we propose the application of feature hashing to create word embeddings for natural language processing. Feature hashing has been used successfully to create document vectors in related tasks like document classification. In this work we show that feature hashing can be applied to obtain word embeddings in time linear in the size of the data. The results show that this algorithm, which does not need training, is able to capture the semantic meaning of words. We compare the results against GloVe, showing that they are similar. As far as we know, this is the first application of feature hashing to the word embeddings problem, and the results indicate this is a scalable technique with practical results for NLP applications.
- Sociedad Argentina de Informática e Investigación Operativa (SADIO)
- Subject
- Computer Science; feature hashing; word embedding; Natural Language Processing
- Access level
- open access
- Terms of use
- http://creativecommons.org/licenses/by-sa/3.0/
- Repository
- Institution
- Universidad Nacional de La Plata
- OAI identifier
- oai:sedici.unlp.edu.ar:10915/56977
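
The description above says that feature hashing yields word embeddings without training, in time linear in the size of the data. The following is only a minimal sketch of that general idea, not the authors' implementation: it assumes a symmetric context window, an unweighted count per context word, and a signed hash derived from MD5. The names `hash_embeddings`, `_hash`, `dim`, and `window` are hypothetical and do not come from the paper.

```python
import hashlib
from collections import defaultdict


def _hash(token: str, salt: str) -> int:
    """Deterministic 64-bit hash of a token (hypothetical helper)."""
    digest = hashlib.md5((salt + token).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def hash_embeddings(tokens, dim=16, window=2):
    """Build one fixed-size vector per word in a single pass over the tokens.

    Assumption: each context word within `window` positions adds +1 or -1
    to one hashed coordinate of the target word's vector.
    """
    vectors = defaultdict(lambda: [0.0] * dim)
    for i, word in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j == i:
                continue  # a word is not its own context
            ctx = tokens[j]
            index = _hash(ctx, "index") % dim  # coordinate for this context word
            sign = 1.0 if _hash(ctx, "sign") % 2 == 0 else -1.0  # signed hashing
            vectors[word][index] += sign
    return dict(vectors)


if __name__ == "__main__":
    corpus = "the cat sat on the mat while the dog sat on the rug".split()
    for word, vec in hash_embeddings(corpus, dim=8, window=2).items():
        print(word, vec)
```

Each token touches at most `2 * window` context positions, so the work grows linearly with the number of tokens, and the vector size `dim` is fixed in advance regardless of vocabulary size.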
id | SEDICI_b6a750cf0868f0a19fa50351c730f57c |
---|---|
oai_identifier_str | oai:sedici.unlp.edu.ar:10915/56977 |
network_acronym_str | SEDICI |
repository_id_str | 1329 |
network_name_str | SEDICI (UNLP) |
dc.title.none.fl_str_mv | Hash2Vec: Feature Hashing for Word Embeddings |
dc.creator.none.fl_str_mv | Argerich, Luis; Cano, Matías J.; Torre Zaffaroni, Joaquín |
author | Argerich, Luis |
author_role | author |
author2 | Cano, Matías J.; Torre Zaffaroni, Joaquín |
author2_role | author; author |
dc.subject.none.fl_str_mv | Computer Science; feature hashing; word embedding; Natural Language Processing |
publishDate | 2016 |
dc.date.none.fl_str_mv | 2016-09 |
dc.type.none.fl_str_mv | info:eu-repo/semantics/conferenceObject; info:eu-repo/semantics/publishedVersion; Conference object; http://purl.org/coar/resource_type/c_5794; info:ar-repo/semantics/documentoDeConferencia |
format | conferenceObject |
status_str | publishedVersion |
dc.identifier.none.fl_str_mv | http://sedici.unlp.edu.ar/handle/10915/56977 |
dc.language.none.fl_str_mv | eng |
dc.relation.none.fl_str_mv | info:eu-repo/semantics/altIdentifier/url/http://45jaiio.sadio.org.ar/sites/default/files/ASAI-10_0.pdf; info:eu-repo/semantics/altIdentifier/issn/2451-7585 |
dc.rights.none.fl_str_mv | info:eu-repo/semantics/openAccess; http://creativecommons.org/licenses/by-sa/3.0/; Creative Commons Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0) |
eu_rights_str_mv | openAccess |
dc.format.none.fl_str_mv | application/pdf; 33-40 |
dc.source.none.fl_str_mv | reponame:SEDICI (UNLP); instname:Universidad Nacional de La Plata; instacron:UNLP |
instname_str | Universidad Nacional de La Plata |
instacron_str | UNLP |
repository.name.fl_str_mv | SEDICI (UNLP) - Universidad Nacional de La Plata |
repository.mail.fl_str_mv | alira@sedici.unlp.edu.ar |
_version_ | 1842260249001590784 |
score | 13.13397 |