stem.corpus {textreg} | R Documentation |
Given a tm-package VCorpus of original text, returns a VCorpus of stemmed text with '+' appended to all stemmed words.
stem.corpus(corpus, verbose = TRUE)
corpus | Original text, as a tm-package VCorpus. |
verbose | TRUE means print a text progress bar so you can watch progress. |
This is non-optimized code that is expensive to run. First the stemmer chops words. Then this method passes through and adds a "+" to all chopped words, and builds a list of stems. Finally, the method passes through and adds a "+" to all stems found without a suffix.
So, e.g., goblins and goblin will both be transformed to "goblin+".
Adding the '+' makes stemmed text more readable.
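The three passes described above can be sketched in base R on a plain character vector. This is only an illustration, not the package's implementation: `mark_stems` is a hypothetical helper, and `toy_stem` is a deliberately crude stand-in for the real stemmer that the package reaches through tm and SnowballC.

```r
# Toy stand-in for a real stemmer (an assumption for illustration only):
# strips a trailing "ing" or "s" from words longer than four characters.
toy_stem <- function(words) {
  ifelse(nchar(words) > 4, sub("(ing|s)$", "", words), words)
}

# Hypothetical helper sketching the three passes on a vector of words.
mark_stems <- function(words, stem_fun = toy_stem) {
  stemmed <- stem_fun(words)               # pass 1: the stemmer chops words
  chopped <- stemmed != words              # words that lost a suffix
  stems   <- unique(stemmed[chopped])      # pass 2: build the list of stems
  bare    <- !chopped & words %in% stems   # pass 3: stems found without a suffix
  ifelse(chopped | bare, paste0(stemmed, "+"), words)
}

words  <- c("goblins", "goblin", "texting", "text", "the", "dagger")
marked <- mark_stems(words)
print(marked)
```

Here "goblins" and "goblin" both become "goblin+", while "dagger" is left unmarked because no suffixed form of it appears in the input. A real stemmer would make different chopping decisions than the toy one, so the marked words would differ accordingly.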
Code based on code from Kevin Wu, UC Berkeley Undergrad Thesis 2014.
Requires, via the tm package, the SnowballC package.
Warning: Do not use this on a textreg.corpus object. Apply it to the text before building the textreg.corpus object.
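library( tm )
library( textreg )  # provides stem.corpus

# Three short example documents
texts <- c("texting goblins the dagger",
           "text these goblins",
           "texting 3 goblins appl daggers goblining gobble")
corpus <- VCorpus(VectorSource(texts))

# Stem the corpus without printing the progress bar
stemmed_corpus <- stem.corpus(corpus, verbose = FALSE)

# Look at the second stemmed document
inspect( stemmed_corpus[[2]] )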