For a given text and class, calculate indicators of "net positive" and "net negative" sentiment using different sentiment dictionaries.

calc_net_sentiment_per_tag(x, target_col_name = NULL, text_col_name)

Arguments

x

A data frame with two columns: the column with the classes; and the column with the text. Any other columns will be ignored.

target_col_name

A string with the column name of the target variable. Defaults to NULL.

text_col_name

A string with the column name of the text variable.

Value

A data frame with four or five columns: the column with the classes (if any); the net sentiment; the method (dictionary) used; the total negative sentiment; and the total positive sentiment. The last two columns are NA for AFINN, which assigns numeric scores rather than "negative"/"positive" labels (see Details).

Details

The dictionaries of Hu and Liu (2004; labelled "Minging & Liu" in the output) and Mohammad and Turney (2013; known as NRC) assign sentiment labels to words, e.g. "negative" or "positive". The net sentiment is therefore calculated as the number of words with a positive sentiment minus the number of words with a negative sentiment. AFINN, the dictionary of Nielsen (2013), instead assigns numeric sentiment scores to words, so the net sentiment is the sum of those scores. See Silge and Robinson (2017).
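The two calculations described above can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: the lexicons, words, scores and example text are hypothetical toy values rather than entries from the real dictionaries, and the tokenization is deliberately crude.

```r
# Toy lexicons (hypothetical words and scores, NOT the real dictionaries)
categorical_lexicon <- data.frame(
  word = c("happy", "good", "sad", "awful"),
  sentiment = c("positive", "positive", "negative", "negative"),
  stringsAsFactors = FALSE
)
score_lexicon <- data.frame(
  word = c("happy", "good", "sad", "awful"),
  value = c(3, 2, -2, -3),
  stringsAsFactors = FALSE
)

text <- "A good day, a happy day, but a sad and awful ending. Good grief."

# Split the text into lower-case word tokens
tokens <- strsplit(tolower(text), "[^a-z']+")[[1]]
tokens <- tokens[tokens != ""]

# Label-based dictionaries: count of positive words minus count of negative words
matched <- categorical_lexicon$sentiment[match(tokens, categorical_lexicon$word)]
positive <- sum(matched == "positive", na.rm = TRUE)
negative <- sum(matched == "negative", na.rm = TRUE)
net_categorical <- positive - negative

# Score-based dictionary: net sentiment is the sum of the matched word scores
net_score <- sum(score_lexicon$value[match(tokens, score_lexicon$word)], na.rm = TRUE)

print(c(positive = positive, negative = negative, net = net_categorical))
print(net_score)
```

Words that do not appear in a lexicon contribute nothing to either total, which is why the label-based result can report separate positive and negative counts while the score-based result has only a single sum.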

References

Hu M. & Liu B. (2004). Mining and summarizing customer reviews. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD-2004), Seattle, Washington, USA, Aug 22-25, 2004.

Mohammad S.M. & Turney P.D. (2013). Crowdsourcing a Word–Emotion Association Lexicon. Computational Intelligence, 29(3):436-465.

Nielsen F.A. (2013). A new ANEW: Evaluation of a word list for sentiment analysis in microblogs. Proceedings of the ESWC2011 Workshop on 'Making Sense of Microposts': Big things come in small packages. CEUR Workshop Proceedings, 718:93-98. https://arxiv.org/abs/1103.2903.

Silge J. & Robinson D. (2017). Text Mining with R: A Tidy Approach. Sebastopol, CA: O’Reilly Media. ISBN 978-1-491-98165-8.

Examples

library(experienceAnalysis)
books <- janeaustenr::austen_books() # Jane Austen books
emma <- paste(books[books$book == "Emma", ], collapse = " ") # String with whole book
pp <- paste(books[books$book == "Pride & Prejudice", ], collapse = " ") # String with whole book

# Make data frame with books Emma and Pride & Prejudice
x <- data.frame(
  text = c(emma, pp),
  book = c("Emma", "Pride & Prejudice")
)

# Net sentiment in each book for each dictionary, sorted in descending order
calc_net_sentiment_per_tag(x, target_col_name = "book", text_col_name = "text")
#> # A tibble: 6 x 5
#>   book              sentiment method        negative positive
#>   <chr>                 <dbl> <chr>            <dbl>    <dbl>
#> 1 Emma                   5837 AFINN               NA       NA
#> 2 Pride & Prejudice      3955 AFINN               NA       NA
#> 3 Emma                   2348 Minging & Liu     4809     7157
#> 4 Emma                   4998 NRC               4473     9471
#> 5 Pride & Prejudice      1400 Minging & Liu     3652     5052
#> 6 Pride & Prejudice      3802 NRC               3641     7443

# Net sentiment in each book for each dictionary, by dictionary and book name
calc_net_sentiment_per_tag(x, target_col_name = "book", text_col_name = "text") %>%
  dplyr::arrange(method, book)
#> # A tibble: 6 x 5
#>   book              sentiment method        negative positive
#>   <chr>                 <dbl> <chr>            <dbl>    <dbl>
#> 1 Emma                   5837 AFINN               NA       NA
#> 2 Pride & Prejudice      3955 AFINN               NA       NA
#> 3 Emma                   2348 Minging & Liu     4809     7157
#> 4 Pride & Prejudice      1400 Minging & Liu     3652     5052
#> 5 Emma                   4998 NRC               4473     9471
#> 6 Pride & Prejudice      3802 NRC               3641     7443