calc_bigrams_network.Rd
For a given labelled text, find and count the most frequently occurring bigrams (stop words excluded) for the given class(es).
calc_bigrams_network(
  x,
  target_col_name,
  text_col_name,
  filter_class = NULL,
  bigrams_prop
)
Argument | Description |
---|---|
x | A data frame with one or more columns: the column with the classes (if |
target_col_name | A string with the column name of the target variable. Defaults to |
text_col_name | A string with the column name of the text variable. |
filter_class | A string or vector of strings with the name(s) of the class(es) for which bigrams are to be created and counted. Defaults to NULL. |
bigrams_prop | A numeric in (0, 100] indicating the percentage of the most frequent bigrams to keep. |
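To make the roles of the stop-word filter and `bigrams_prop` more concrete, here is a minimal base R sketch of the general technique. It is not the package's implementation: the helper name `count_bigrams_sketch`, the tokenisation rule, the stop-word vector, and the reading of `bigrams_prop` as "keep the top percentage of distinct bigrams, rounded up" are all assumptions made for illustration.

```r
# Hypothetical helper (NOT the package's code): count bigrams per document,
# drop bigrams containing a stop word, keep the top `bigrams_prop` percent.
count_bigrams_sketch <- function(text, stop_words, bigrams_prop) {
  stopifnot(bigrams_prop > 0, bigrams_prop <= 100)
  empty <- data.frame(word1 = character(), word2 = character(),
                      n = integer(), stringsAsFactors = FALSE)

  # Tokenise each document separately so that no bigram spans two documents
  per_doc <- lapply(strsplit(tolower(text), "[^a-z']+"), function(tokens) {
    tokens <- tokens[nzchar(tokens)]
    if (length(tokens) < 2) return(NULL)
    data.frame(word1 = tokens[-length(tokens)], word2 = tokens[-1],
               stringsAsFactors = FALSE)
  })
  pairs <- do.call(rbind, per_doc)
  if (is.null(pairs)) return(empty)

  # Remove any bigram in which either word is a stop word
  keep <- !(pairs$word1 %in% stop_words) & !(pairs$word2 %in% stop_words)
  pairs <- pairs[keep, , drop = FALSE]
  if (nrow(pairs) == 0) return(empty)

  # Count occurrences of each distinct bigram, most frequent first
  pairs$n <- 1L
  counts <- aggregate(n ~ word1 + word2, data = pairs, FUN = sum)
  counts <- counts[order(-counts$n, counts$word1, counts$word2), ]
  rownames(counts) <- NULL

  # Assumed meaning of bigrams_prop: share of distinct bigrams to keep
  n_keep <- ceiling(nrow(counts) * bigrams_prop / 100)
  head(counts, n_keep)
}

# Toy input; "the" is the only (assumed) stop word
text <- c("the quick brown fox",
          "quick brown dogs chase the quick brown fox")
res <- count_bigrams_sketch(text, stop_words = "the", bigrams_prop = 50)
print(res)
```

With this toy input there are four distinct bigrams after stop-word removal, so `bigrams_prop = 50` keeps the two most frequent ones ("quick brown" and "brown fox"). How the real function breaks ties or rounds the cut-off is not stated in this page.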
A data frame with three columns: first word of bigram; second word of bigram; and bigram count.
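Because the returned data frame has one row per word pair plus a count, it can be read as a weighted edge list. The sketch below illustrates that reading in base R, using four rows copied from the example output on this page; the object names `bigrams`, `adjacency` and `out_strength` are hypothetical and nothing here is part of the package.

```r
# Four rows taken from the example output on this page
bigrams <- data.frame(
  word1 = c("miss", "frank", "miss", "jane"),
  word2 = c("woodhouse", "churchill", "fairfax", "fairfax"),
  n     = c(162L, 132L, 109L, 96L),
  stringsAsFactors = FALSE
)

# Weighted adjacency matrix: rows are first words, columns are second words
adjacency <- xtabs(n ~ word1 + word2, data = bigrams)
print(adjacency)

# Total count of bigrams starting with each word (a weighted out-degree)
out_strength <- tapply(bigrams$n, bigrams$word1, sum)
print(out_strength)
```

A dedicated graph package could consume the same three columns directly; base R is used here only to keep the sketch dependency-free.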
When supplying more than one class in filter_class, the returned data frame will NOT separate the results for the different classes. If separation is desired, then run the function for each class separately or do something like this:
# Assuming that the class and text columns are called "label" and
# "feedback" respectively
x %>%
  split(.$label) %>%
  purrr::map(
    ~ calc_bigrams_network(., target_col_name = NULL,
                           text_col_name = "feedback",
                           filter_class = NULL,
                           bigrams_prop = 50)
  )
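The per-class approach returns a list with one element per class. If a single data frame with a class column is preferred, a base R sketch along the following lines may help. The helper `apply_by_class` and the output column name `class` are hypothetical, and the demonstration uses a stand-in function so that the block runs without the package installed.

```r
# Hypothetical helper: apply `fun` to the rows of each class and stack the
# results, adding a `class` column that records which class each row came from.
apply_by_class <- function(x, class_col, fun) {
  pieces <- split(x, x[[class_col]], drop = TRUE)
  results <- lapply(names(pieces), function(cls) {
    out <- fun(pieces[[cls]])
    # rep() keeps this valid even when `out` has zero rows
    data.frame(class = rep(cls, nrow(out)), out, stringsAsFactors = FALSE)
  })
  do.call(rbind, results)
}

# Stand-in demonstration: count the rows in each class
toy <- data.frame(
  label    = c("a", "a", "b"),
  feedback = c("good service", "good service", "long wait"),
  stringsAsFactors = FALSE
)
res <- apply_by_class(toy, "label", function(d) data.frame(n_rows = nrow(d)))
print(res)

# Intended use with the package function (requires experienceAnalysis):
# apply_by_class(x, "label", function(d) {
#   calc_bigrams_network(d, target_col_name = NULL,
#                        text_col_name = "feedback",
#                        filter_class = NULL, bigrams_prop = 50)
# })
```

Passing the function as an argument keeps the helper independent of the package, so the same pattern works for any function that returns a data frame.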
library(experienceAnalysis)

books <- janeaustenr::austen_books() # Jane Austen books
emma <- paste(books[books$book == "Emma", ], collapse = " ") # String with whole book
pp <- paste(books[books$book == "Pride & Prejudice", ], collapse = " ") # String with whole book
# Note: paste() is applied to whole data-frame rows, so the `book` factor
# column is pasted in along with the text; this is presumably why the numeric
# bigrams "4 4" and "2 2" head the output below.

# Make data frame with books Emma and Pride & Prejudice
x <- data.frame(
  text = c(emma, pp),
  book = c("Emma", "Pride & Prejudice")
)

# Bigrams for both books
calc_bigrams_network(x, target_col_name = "book", text_col_name = "text",
                     filter_class = NULL, bigrams_prop = 3)
#>       word1       word2     n
#>  1        4           4 16234
#>  2        2           2 13029
#>  3     miss   woodhouse   162
#>  4    frank   churchill   132
#>  5     miss     fairfax   109
#>  6     miss       bates   103
#>  7     lady   catherine   100
#>  8     jane     fairfax    96
#>  9     miss     bingley    72
#> 10     miss      bennet    60
#> 11     john   knightley    56
#> 12     miss       smith    51
#> 13     miss      taylor    40
#> 14      sir     william    38
#> 15       de      bourgh    35
#> 16     miss       darcy    34
#> 17     dear        emma    31
#> 18     dear        miss    31
#> 19    maple       grove    31
#> 20    cried        emma    27
#> 21  harriet       smith    27
#> 22   robert      martin    27
#> 23  colonel     forster    26
#> 24  colonel fitzwilliam    25
#> 25    cried   elizabeth    24
#> 26     dear         sir    23
#> 27     miss       lucas    23
#> 28 thousand      pounds    23
#> 29     dear        jane    22
#> 30  colonel    campbell    21
#> 31     miss          de    20
#> 32    frank churchill's    19
#> 33      box        hill    18
#> 34     lady       lucas    18
#> 35  replied   elizabeth    18

# Bigrams for Emma
calc_bigrams_network(x, target_col_name = "book", text_col_name = "text",
                     filter_class = "Emma", bigrams_prop = 3)
#>       word1       word2     n
#>  1        4           4 16234
#>  2     miss   woodhouse   162
#>  3    frank   churchill   132
#>  4     miss     fairfax   109
#>  5     miss       bates   103
#>  6     jane     fairfax    96
#>  7     john   knightley    56
#>  8     miss       smith    51
#>  9     miss      taylor    40
#> 10     dear        emma    31
#> 11    maple       grove    31
#> 12    cried        emma    27
#> 13  harriet       smith    27
#> 14   robert      martin    27
#> 15     dear        miss    25
#> 16  colonel    campbell    21
#> 17    frank churchill's    19

# Bigrams for Pride & Prejudice
calc_bigrams_network(x, target_col_name = "book", text_col_name = "text",
                     filter_class = "Pride & Prejudice", bigrams_prop = 3)
#>       word1       word2     n
#>  1        2           2 13029
#>  2     lady   catherine   100
#>  3     miss     bingley    72
#>  4     miss      bennet    60
#>  5      sir     william    38
#>  6       de      bourgh    35
#>  7     miss       darcy    34
#>  8  colonel     forster    26
#>  9  colonel fitzwilliam    25
#> 10    cried   elizabeth    24
#> 11     miss       lucas    23
#> 12     miss          de    20
#> 13 thousand      pounds    20
#> 14     lady       lucas    18
#> 15  replied   elizabeth    18