tagsets {NLP}    R Documentation
Tag sets frequently used in Natural Language Processing.
Penn_Treebank_POS_tags
Brown_POS_tags
Universal_POS_tags
Universal_POS_tags_map
Penn_Treebank_POS_tags and Brown_POS_tags provide, respectively, the Penn Treebank POS tags (https://catalog.ldc.upenn.edu/docs/LDC95T7/cl93.html, Table 2) and the POS tags used for the Brown corpus (http://www.hit.uib.no/icame/brown/bcm.html), both as data frames with the following variables:
a character vector with the POS tags
a character vector with short descriptions of the tags
a character vector with examples for the tags
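A common use of such a tag table is looking up what a tag means. The sketch below builds a hand-made miniature data frame with the same three kinds of information rather than loading the package data. The column names `entry`, `description` and `examples` are assumptions for illustration, the descriptions follow the Penn Treebank glosses, and the example words are made up; `describe_tag()` is a hypothetical helper, not part of NLP.

```r
# Hand-made miniature tag table (NOT the package data).
# Assumed column names: entry (tag), description, examples.
tags <- data.frame(
  entry       = c("NN", "NNS", "VB", "JJ"),
  description = c("Noun, singular or mass", "Noun, plural",
                  "Verb, base form", "Adjective"),
  examples    = c("dog", "dogs", "run", "green"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the description for each tag in `tag`,
# matching against the tag column; unknown tags yield NA.
describe_tag <- function(tag, tagset) {
  tagset$description[match(tag, tagset$entry)]
}

describe_tag(c("NNS", "JJ", "XYZ"), tags)
```

Using `match()` keeps the result aligned with the input, so the lookup works for a whole tagged sentence at once and flags unknown tags as `NA` instead of silently dropping them.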
Universal_POS_tags provides the universal POS tagset introduced by Slav Petrov, Dipanjan Das, and Ryan McDonald (https://arxiv.org/abs/1104.2086), as a data frame with character variables entry and description.
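The universal tagset of Petrov, Das, and McDonald has twelve tags. As a minimal sketch, assuming one wants to check that a tagged sequence uses only universal tags, the vector below writes the tags out by hand; with the package loaded, `Universal_POS_tags$entry` would presumably serve the same purpose. `invalid_tags()` is a hypothetical helper.

```r
# The 12 universal POS tags, written out by hand for a self-contained example
universal_tags <- c("NOUN", "VERB", "ADJ", "ADV", "PRON", "DET",
                    "ADP", "NUM", "CONJ", "PRT", ".", "X")

# Hypothetical helper: return the distinct tags not in the given tagset
invalid_tags <- function(tags, tagset = universal_tags) {
  unique(tags[!tags %in% tagset])
}

# "NN" is a Penn Treebank tag, not a universal one, so it is reported
invalid_tags(c("DET", "NOUN", "VERB", "NN", "."))
```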
Universal_POS_tags_map is a named list of mappings from language- and treebank-specific POS tagsets to the universal POS tags, with elements named en-ptb and en-brown giving the mappings for the Penn Treebank and Brown POS tags, respectively.
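Applying such a mapping amounts to a table lookup. The sketch below assumes that a mapping is stored as a named character vector whose names are the source tags and whose values are the universal tags; that layout, the excerpt of Penn Treebank entries, and the helper `to_universal()` are assumptions for illustration. If the elements of Universal_POS_tags_map do have this form, `Universal_POS_tags_map[["en-ptb"]]` could be passed in place of the hand-written excerpt.

```r
# Hand-written excerpt of a Penn Treebank -> universal mapping,
# assumed layout: names = source tags, values = universal tags.
ptb_to_universal <- c(
  NN = "NOUN", NNS = "NOUN", VB = "VERB", VBZ = "VERB", JJ = "ADJ",
  RB = "ADV", DT = "DET", IN = "ADP", PRP = "PRON", CC = "CONJ",
  CD = "NUM", "." = "."
)

# Hypothetical helper: index the mapping by tag name and drop the names;
# tags missing from the mapping come back as NA.
to_universal <- function(tags, map) unname(map[tags])

to_universal(c("DT", "NN", "VBZ", "."), ptb_to_universal)
```

Indexing by name maps a whole tag sequence in one vectorized step, which is why a named vector is a convenient representation for this kind of one-to-one tag conversion.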
Sources: https://catalog.ldc.upenn.edu/docs/LDC95T7/cl93.html, http://www.hit.uib.no/icame/brown/bcm.html, https://code.google.com/p/universal-pos-tags/.
library("NLP")
## Penn Treebank POS tags
dim(Penn_Treebank_POS_tags)
## Inspect first 20 entries:
write.dcf(head(Penn_Treebank_POS_tags, 20L))
## Brown POS tags
dim(Brown_POS_tags)
## Inspect first 20 entries:
write.dcf(head(Brown_POS_tags, 20L))
## Universal POS tags
Universal_POS_tags
## Available mappings to universal POS tags
names(Universal_POS_tags_map)