I'm new to RStudio. For class, I have pulled the US Census 2016 election dataset and want to run a series of t-tests on it. Some specifics on the dataset: Citizenship is coded 1 through 4, each code representing a citizenship status. I want to see whether various factors affect the likelihood of voting (either 1 = Yes or 2 = No).
Here's the code:
factor <- c("Age", "Fathers_country_of_birth", "Mothers_country_of_birth","Highest_level_of_School_completed", "Country_of_birth")
citizen <- c("NATIVE, BORN IN THE UNITED STATES", "NATIVE, BORN IN PUERTO RICO OR OTHER U.S. ISLAND AREAS", "NATIVE, BORN ABROAD OF AMERICAN PARENT OR PARENTS", "FOREIGN BORN, U.S. CITIZEN BY NATURALIZATION")
for (f in factor) {
  print(f)
  for (i in 1:4) {
    print(paste("Citizenship is", citizen[i]))
    query <- paste("select * from result2 where Citizenship = ", i)
    sample <- sqldf(query)
    print(t.test(f ~ Vote_in_Election, data = sample, var.equal = FALSE))
  }
}
And it throws a 'variable lengths differ' error:
> [1] "Age"
> [1] "Citizenship is NATIVE, BORN IN THE UNITED STATES"
> Error in model.frame.default(formula = f ~ Vote_in_Election, data = sample) :
>   variable lengths differ (found for 'Vote_in_Election')
If I take out the outer loop, it runs just fine, but then I have to put in the values in 'factor' one by one, of course.
I'm running RStudio version 1.1.463 with R 3.5.2 on Windows 10.
Because there will be different rows of data when I iterate over i, I tried setting paired = FALSE and it still yelled at me.
I've looked through SO but haven't found a solution. What am I missing?
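Inside the loop, f holds a character string such as "Age", but the formula f ~ Vote_in_Election refers to the variable f itself (a length-one character vector), not to the column it names. That is why the lengths differ. To build a formula dynamically, paste together a string version of the formula and convert it with as.formula: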
t.test(as.formula(paste(f, "~ Vote_in_Election")), data=sample, var.equal = FALSE)
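A minimal sketch of the difference, assuming a made-up data frame `toy` whose column names mirror the question (the values are invented for illustration, not taken from the Census data):

```r
# Toy data: hypothetical values, only the column names come from the question.
toy <- data.frame(
  Age = c(23, 35, 41, 52, 29, 64, 38, 47),
  Vote_in_Election = c(1, 2, 1, 2, 1, 2, 1, 2)
)

f <- "Age"

# Written literally, the left-hand side is the symbol f, not its value.
literal <- f ~ Vote_in_Election
all.vars(literal)

# Pasting first and converting yields a formula that names the column.
built <- as.formula(paste(f, "~ Vote_in_Election"))
all.vars(built)

# The built formula can be handed to t.test() as usual.
res <- t.test(built, data = toy, var.equal = FALSE)
res$estimate
```
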
Or use reformulate:
t.test(reformulate("Vote_in_Election", response=f), data=sample, var.equal = FALSE)
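Putting it together, the nested loop from the question might look like the sketch below. It assumes a small invented stand-in for `result2` (two citizenship codes, two factors) and uses base-R subsetting in place of the sqldf query so that it runs without extra packages. The names `factors`, `grp`, `fml`, and `results` are illustrative, not from the original code.

```r
# Toy stand-in for result2: hypothetical values for illustration only.
result2 <- data.frame(
  Citizenship = rep(1:2, each = 20),
  Vote_in_Election = rep(c(1, 2), times = 20),
  Age = c(seq(20, 58, by = 2), seq(25, 82, by = 3)),
  Highest_level_of_School_completed = rep(c(31, 35, 39, 40, 43), times = 8)
)

factors <- c("Age", "Highest_level_of_School_completed")
results <- list()

for (f in factors) {
  for (i in sort(unique(result2$Citizenship))) {
    # Base-R subset instead of sqldf("select * from result2 where ...")
    grp <- result2[result2$Citizenship == i, ]
    # Build "<f> ~ Vote_in_Election" from the string held in f
    fml <- reformulate("Vote_in_Election", response = f)
    # Store each test so it can be inspected after the loop
    results[[paste(f, i, sep = "_")]] <-
      t.test(fml, data = grp, var.equal = FALSE)
  }
}

names(results)
results[["Age_1"]]
```

Collecting the tests in a list rather than printing them inside the loop is a design choice: it keeps p-values and estimates available afterwards. Naming the objects `factors` and `grp` also avoids reusing the names of the base functions `factor` and `sample`, which makes the code easier to read.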