I am studying the quanteda package for R, and I cannot find in the documentation what the Types column returned by summary(immig_corp) means. I load the packages first:
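library(quanteda)  # provides corpus(), summary(), ntoken(), ntype() and data_char_ukimmig2010
library(readtext)  # loaded in my session; not actually needed for this example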
Now I create the corpus:
immig_corp <- corpus(data_char_ukimmig2010,
                     docvars = data.frame(party = names(data_char_ukimmig2010)))
Now I would like to display some information about the corpus I have just created. Types is one of the columns that summary() always reports for a corpus.
summary(immig_corp)
This returns the following:
Corpus consisting of 9 documents:
        Text Types Tokens Sentences        party
         BNP  1125   3280        88          BNP
   Coalition   142    260         4    Coalition
Conservative   251    499        15 Conservative
      Greens   322    679        21       Greens
      Labour   298    683        29       Labour
      LibDem   251    483        14       LibDem
          PC    77    114         5           PC
         SNP    88    134         4          SNP
        UKIP   346    723        27         UKIP
Let's concentrate on the corpus without the docvars, immig_corp <- corpus(data_char_ukimmig2010). Calling summary(immig_corp) on it returns the following:
Corpus consisting of 9 documents:
        Text Types Tokens Sentences
         BNP  1125   3280        88
   Coalition   142    260         4
Conservative   251    499        15
      Greens   322    679        21
      Labour   298    683        29
      LibDem   251    483        14
          PC    77    114         5
         SNP    88    134         4
        UKIP   346    723        27
Text is the document name, Sentences is the number of sentences in the document, Tokens is the number of tokens in the text, and Types is the number of unique tokens in the text. So for BNP there are 1125 unique tokens, 3280 tokens, and 88 sentences.
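To make the tokens/types distinction concrete, here is a toy sketch in base R. It assumes a made-up sentence (txt) and a naive whitespace split; quanteda's own tokenizer is more sophisticated, so this only illustrates the idea and will not reproduce the numbers in the summary above.

```r
# Toy illustration only: `txt` is a made-up sentence, and splitting on
# whitespace is a simplification of what quanteda's tokenizer does.
txt <- "the cat sat on the mat and the dog sat too"

# Tokens: every running word, repeats included.
toks <- strsplit(txt, "\\s+")[[1]]

n_tokens <- length(toks)          # total number of tokens
n_types  <- length(unique(toks))  # number of distinct tokens (types)

n_tokens  # 11
n_types   # 8
```

Here "the" occurs three times and "sat" twice, so 11 tokens collapse to 8 types.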
You can recreate the counts as follows:
# Sentences
nsentence(immig_corp)
         BNP    Coalition Conservative       Greens       Labour       LibDem           PC          SNP         UKIP
          88            4           15           21           29           14            5            4           27
# Tokens
ntoken(immig_corp)
         BNP    Coalition Conservative       Greens       Labour       LibDem           PC          SNP         UKIP
        3280          260          499          679          683          483          114          134          723
# Types
ntype(immig_corp)
         BNP    Coalition Conservative       Greens       Labour       LibDem           PC          SNP         UKIP
        1125          142          251          322          298          251           77           88          346