I have a problem with a database about dengue. It contains several variables, among them "Cases", which gives the number of dengue cases in a given period. I want to apply a logistic regression model to these data, so I need to turn this integer variable into a binary one: for places that had no dengue cases in that period I want to replace the count with 0, and for places that had cases, with 1. As there are 35628 rows, I want to do this in an automated way rather than manually. Does anyone have an idea how to put this into practice? I'm new to programming and I'm trying to implement it in R. If you know of a package that does this, that would help a lot. Each neighborhood is identified by a number.
I appreciate any help and thank you very much.
neighborhood | Dates | Cases | precipitation | Temperature |
---|---|---|---|---|
0 | Jan/14 | 10 | 149,6 | 33,25 |
1 | Fev/14 | 0 | 254 | 30,1 |
2 | Mar/14 | 6 | 150 | 25,4 |
3 | Apr/14 | 0 | 244,1 | 32,5 |
4 | May/14 | 3 | 44,3 | 33,2 |
Pick one of these:

```r
dat$CasesBin1 <- (dat$Cases > 0)   # logical: TRUE where at least one case was recorded
dat$CasesBin2 <- +(dat$Cases > 0)  # unary + turns the logical into integer 1/0
dat
#   neighborhood  Dates Cases precipitation Temperature CasesBin1 CasesBin2
# 1            0 Jan/14    10         149.6       33.25      TRUE         1
# 2            1 Fev/14     0         254.0       30.10     FALSE         0
# 3            2 Mar/14     6         150.0       25.40      TRUE         1
# 4            3 Apr/14     0         244.1       32.50     FALSE         0
# 5            4 May/14     3          44.3       33.20      TRUE         1
```
In R at least, most logistic regression tools I've used work fine with either an integer (0/1) or a logical response, but you may need to verify that with the tools you will use.
Data:

```r
dat <- structure(list(neighborhood = 0:4,
                      Dates = c("Jan/14", "Fev/14", "Mar/14", "Apr/14", "May/14"),
                      Cases = c(10L, 0L, 6L, 0L, 3L),
                      precipitation = c(149.6, 254, 150, 244.1, 44.3),
                      Temperature = c(33.25, 30.1, 25.4, 32.5, 33.2)),
                 class = "data.frame", row.names = c(NA, -5L))
```
The same idea in Python with pandas:

```
In [13]: dat
Out[13]:
   neighborhood   Dates  Cases  precipitation  Temperature
0             0  Jan/14     10          149.6        33.25
1             1  Fev/14      0          254.0        30.10
2             2  Mar/14      6          150.0        25.40
3             3  Apr/14      0          244.1        32.50
4             4  May/14      3           44.3        33.20

In [17]: dat['CasesBin1'] = dat['Cases'].apply(lambda x: (x > 0))

In [18]: dat['CasesBin2'] = dat['Cases'].apply(lambda x: int(x > 0))

In [19]: dat
Out[19]:
   neighborhood   Dates  Cases  ...  Temperature  CasesBin1  CasesBin2
0             0  Jan/14     10  ...        33.25       True          1
1             1  Fev/14      0  ...        30.10      False          0
2             2  Mar/14      6  ...        25.40       True          1
3             3  Apr/14      0  ...        32.50      False          0
4             4  May/14      3  ...        33.20       True          1

[5 rows x 7 columns]
```
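The `.apply(lambda ...)` calls above work, but pandas can also compare the whole column in one vectorized operation. That form is more idiomatic and is typically faster on a frame with tens of thousands of rows, like the 35628-row one in the question. The following is a minimal sketch that assumes pandas is installed and uses only the example rows from the question:

```python
import pandas as pd

# The example rows from the question (only the columns needed here)
dat = pd.DataFrame({
    "neighborhood": [0, 1, 2, 3, 4],
    "Dates": ["Jan/14", "Fev/14", "Mar/14", "Apr/14", "May/14"],
    "Cases": [10, 0, 6, 0, 3],
})

# One comparison over the whole column instead of a Python-level
# lambda call per row
dat["CasesBin1"] = dat["Cases"] > 0                # bool: True/False
dat["CasesBin2"] = (dat["Cases"] > 0).astype(int)  # int: 1/0

print(dat)
```

The comparison already returns a boolean Series, so `CasesBin1` needs no conversion. Only the 0/1 version needs `.astype(int)`.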
Data:
```
In [11]: js
Out[11]: '[{"neighborhood":0,"Dates":"Jan/14","Cases":10,"precipitation":149.6,"Temperature":33.25},{"neighborhood":1,"Dates":"Fev/14","Cases":0,"precipitation":254,"Temperature":30.1},{"neighborhood":2,"Dates":"Mar/14","Cases":6,"precipitation":150,"Temperature":25.4},{"neighborhood":3,"Dates":"Apr/14","Cases":0,"precipitation":244.1,"Temperature":32.5},{"neighborhood":4,"Dates":"May/14","Cases":3,"precipitation":44.3,"Temperature":33.2}]'

In [12]: dat = pd.read_json(js)
```
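Recent pandas versions (2.1 onward, as far as I know) deprecate passing a literal JSON string to `pd.read_json` and emit a `FutureWarning`. The suggested workaround is to wrap the string in `io.StringIO`. The following is a self-contained sketch that rebuilds `dat` from the same JSON as above, assuming pandas is installed:

```python
import io

import pandas as pd

# The example rows from the question, serialized as a JSON array of records
js = (
    '[{"neighborhood":0,"Dates":"Jan/14","Cases":10,"precipitation":149.6,"Temperature":33.25},'
    '{"neighborhood":1,"Dates":"Fev/14","Cases":0,"precipitation":254,"Temperature":30.1},'
    '{"neighborhood":2,"Dates":"Mar/14","Cases":6,"precipitation":150,"Temperature":25.4},'
    '{"neighborhood":3,"Dates":"Apr/14","Cases":0,"precipitation":244.1,"Temperature":32.5},'
    '{"neighborhood":4,"Dates":"May/14","Cases":3,"precipitation":44.3,"Temperature":33.2}]'
)

# Wrapping the text in StringIO makes read_json treat it as a file-like
# object, which avoids the deprecation warning for literal JSON strings
dat = pd.read_json(io.StringIO(js))
print(dat)
```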