bind_tf_idf {tidytext} | R Documentation
Calculate the term frequency and inverse document frequency of a tidy text dataset, along with their product, tf-idf, and bind them to the dataset. Each of these values is added as a column.
bind_tf_idf(tbl, term_col, document_col, n_col)
bind_tf_idf_(tbl, term_col, document_col, n_col)
tbl | A tidy text dataset with one row per term per document
term_col | Column containing terms
document_col | Column containing document IDs
n_col | Column containing document-term counts
bind_tf_idf is given bare names, while bind_tf_idf_ is given strings and is therefore suitable for programming with.
If the dataset is grouped, the groups are ignored during the calculation but are retained in the result.
The dataset must have exactly one row per document-term combination for this to work.
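To illustrate why the one-row-per-combination requirement matters, here is a minimal base R sketch of the three quantities on a toy table. It assumes the common definitions (term frequency as a term's share of its document's total count, inverse document frequency as the natural log of the number of documents divided by the number of documents containing the term). It is not taken from the package source, and the data frame `counts` and its values are made up for illustration.

```r
# Toy counts: one row per (document, term) pair. All values are invented.
counts <- data.frame(
  doc  = c("a", "a", "b", "b"),
  term = c("the", "cat", "the", "dog"),
  n    = c(3, 1, 2, 2),
  stringsAsFactors = FALSE
)

# Term frequency: a term's count divided by the total count in its document.
doc_totals <- tapply(counts$n, counts$doc, sum)
counts$tf <- counts$n / as.numeric(doc_totals[counts$doc])

# Inverse document frequency: ln(number of documents / number of documents
# containing the term). Counting rows per term only equals the document
# frequency when each document-term pair appears exactly once.
n_docs <- length(doc_totals)
doc_freq <- table(counts$term)
counts$idf <- log(n_docs / as.numeric(doc_freq[counts$term]))

# tf-idf is the product of the two.
counts$tf_idf <- counts$tf * counts$idf
counts
```

Under these assumptions, "the" appears in both documents and so gets an idf of zero, while "cat" and "dog" each appear in only one document and get a positive tf-idf. A duplicated row for the same document and term would inflate the row count used for the document frequency, which is why duplicates need to be summed before the calculation.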
library(dplyr)
library(tidytext)
library(janeaustenr)

# count how often each word occurs in each book
book_words <- austen_books() %>%
  unnest_tokens(word, text) %>%
  count(book, word, sort = TRUE) %>%
  ungroup()

book_words

# find the words most distinctive to each document
book_words %>%
  bind_tf_idf(word, book, n) %>%
  arrange(desc(tf_idf))