I am trying to create a dummy variable to identify the next five observations after a selection of cutoffs. The first method in the code below works, but it looks a bit messy and I'd like to be able to adjust the number of observations I'm creating dummies for without typing out the same expression 30 times (usually a sign I'm doing something the hard way).
Every time I put a macro into the indexing, e.g. [_n-`i'], I get the following error:
_= invalid name
r(198);
I'd be very grateful for some advice.
sysuse auto.dta, clear
global cutoffs 3299 4424 5104 5788 10371
This works
sort price
gen A=0
foreach x in $cutoffs {
replace A=1 if price==`x'
replace A=1 if price[_n-1]==`x'
replace A=1 if price[_n-2]==`x'
replace A=1 if price[_n-3]==`x'
replace A=1 if price[_n-4]==`x'
replace A=1 if price[_n-5]==`x'
}
This doesn't.
foreach x in $cutoffs {
forval `i' = 0/25 {
replace A=1 if price[_n-`i']==`x'
}
}
Any advice as to why?
In Stata terms no loops are needed here at all, except those tacit in generate and replace. You want to set a counter going each time immediately after you hit a cutoff, and then identify counter values between 1 and 5. Here's some technique:
sysuse auto.dta, clear
global cutoffs 3299,4424,5104,5788,10371
sort price
gen counter = 0 if inlist(price, $cutoffs)
replace counter = counter[_n-1] + 1 if missing(counter)
gen wanted = inrange(counter, 1, 5)
list price counter wanted
+---------------------------+
| price counter wanted |
|---------------------------|
1. | 3,291 . 0 |
2. | 3,299 0 0 |
3. | 3,667 1 1 |
4. | 3,748 2 1 |
5. | 3,798 3 1 |
|---------------------------|
6. | 3,799 4 1 |
7. | 3,829 5 1 |
8. | 3,895 6 0 |
9. | 3,955 7 0 |
10. | 3,984 8 0 |
|---------------------------|
11. | 3,995 9 0 |
12. | 4,010 10 0 |
13. | 4,060 11 0 |
14. | 4,082 12 0 |
15. | 4,099 13 0 |
|---------------------------|
16. | 4,172 14 0 |
17. | 4,181 15 0 |
18. | 4,187 16 0 |
19. | 4,195 17 0 |
20. | 4,296 18 0 |
|---------------------------|
21. | 4,389 19 0 |
22. | 4,424 0 0 |
23. | 4,425 1 1 |
24. | 4,453 2 1 |
25. | 4,482 3 1 |
|---------------------------|
26. | 4,499 4 1 |
27. | 4,504 5 1 |
28. | 4,516 6 0 |
29. | 4,589 7 0 |
30. | 4,647 8 0 |
|---------------------------|
31. | 4,697 9 0 |
32. | 4,723 10 0 |
33. | 4,733 11 0 |
34. | 4,749 12 0 |
35. | 4,816 13 0 |
|---------------------------|
36. | 4,890 14 0 |
37. | 4,934 15 0 |
38. | 5,079 16 0 |
39. | 5,104 0 0 |
40. | 5,172 1 1 |
|---------------------------|
41. | 5,189 2 1 |
42. | 5,222 3 1 |
43. | 5,379 4 1 |
44. | 5,397 5 1 |
45. | 5,705 6 0 |
|---------------------------|
46. | 5,719 7 0 |
47. | 5,788 0 0 |
48. | 5,798 1 1 |
49. | 5,799 2 1 |
50. | 5,886 3 1 |
|---------------------------|
51. | 5,899 4 1 |
52. | 6,165 5 1 |
53. | 6,229 6 0 |
54. | 6,295 7 0 |
55. | 6,303 8 0 |
|---------------------------|
56. | 6,342 9 0 |
57. | 6,486 10 0 |
58. | 6,850 11 0 |
59. | 7,140 12 0 |
60. | 7,827 13 0 |
|---------------------------|
61. | 8,129 14 0 |
62. | 8,814 15 0 |
63. | 9,690 16 0 |
64. | 9,735 17 0 |
65. | 10,371 0 0 |
|---------------------------|
66. | 10,372 1 1 |
67. | 11,385 2 1 |
68. | 11,497 3 1 |
69. | 11,995 4 1 |
70. | 12,990 5 1 |
|---------------------------|
71. | 13,466 6 0 |
72. | 13,594 7 0 |
73. | 14,500 8 0 |
74. | 15,906 9 0 |
+---------------------------+
In fact, your text says "the next five observations after", but your code flags not only those but the cutoff observation too. For the latter, use inrange(counter, 0, 5).
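Since you also want to adjust how many observations get flagged, here is a minimal sketch of the same counter technique with the window length held in a local macro. The names n, after_only and with_cutoff are only illustrative and are not from your code; the cutoffs are stored comma-separated so that they can be passed straight to inlist().

```stata
sysuse auto.dta, clear
sort price

* cutoffs are comma-separated because inlist() expects a comma-separated list
local cutoffs 3299, 4424, 5104, 5788, 10371
* n is an illustrative name for the number of following observations to flag
local n 5

* start a counter at 0 on each cutoff and let it run until the next cutoff
gen counter = 0 if inlist(price, `cutoffs')
replace counter = counter[_n-1] + 1 if missing(counter)

* flag only the n observations after each cutoff ...
gen after_only = inrange(counter, 1, `n')
* ... or the cutoff observation as well
gen with_cutoff = inrange(counter, 0, `n')
```

Changing the window then means changing only the value of n. Note that local macros vanish when the do-file or interactive session that defined them ends, so the whole block should be run together.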
Understanding the principles explained here is crucial for this question.
For inrange() and inlist() see their help entries and/or this paper.
So, what did you do wrong?
This line
forval `i' = 0/25 {
is illegal unless you have previously defined the local macro i (and rather odd style even then). You perhaps meant
forval i = 0/25 {
although where the 25 comes from, given your problem statement, is unclear to me. The error message isn't especially helpful, but Stata is struggling to make sense of code with a hole in it, given that the local macro implied by your code is not defined.
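To make the "hole" concrete: a local macro that has not been defined expands to an empty string, so with i undefined the loop header is read as forval = 0/25 { with no macro name at all. As a rough sketch of what the loop version could look like once the header is fixed, the code below names the macro i directly and holds the window length in a local macro n (an illustrative name, set to 5 to match your problem statement rather than 25):

```stata
sysuse auto.dta, clear
sort price

local cutoffs 3299 4424 5104 5788 10371
* n is an illustrative name for the number of following observations to flag
local n 5

gen A = 0
foreach x of local cutoffs {
    * forvalues takes the macro NAME (i); the quoted form `i' is used only
    * inside the loop body, where it expands to the current value
    forvalues i = 0/`n' {
        replace A = 1 if price[_n-`i'] == `x'
    }
}
```

As in your working code, i = 0 flags the cutoff observation itself. References such as price[_n-1] in the first observation point outside the data and evaluate to missing, so they never match a cutoff.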