I have a text file, which contains only numbers.
For example:
2001 31110
199910 311
Its layout can be explained as follows:
1~4th numbers : Year
5~6th numbers : Month
7~8th numbers : Day
9th number : Sex
10th number : Married
However, I can't decide how to import this file into Stata.
For instance, if I use the command:
import delimited input.txt, delimiter(??)
What should I write in delimiter
?
I don't necessarily need to use the above. I just want to import the data using whatever method.
The answer depends on what you want to do with the data later.
My understanding is that the spaces indicate a single digit for date-related numbers and that in the text file, only month or day can be single digit but not both. In addition, sex
and married
are binary indicators taking values 0
and 1
.
Assuming the above are correct and the data below are included in a file data.txt
:
2001 31110
199910 311
1983 41201
2012121500
Here's one way to do it:
clear
import delimited data.txt, delimiter(" ") stringcols(_all)
list
+--------------------+
| v1 v2 |
|--------------------|
1. | 2001 31110 |
2. | 199910 311 |
3. | 1983 41201 |
4. | 2012121500 |
+--------------------+
replace v2 = "0" + v2 if v2 != ""
generate v3 = v1 + v2
generate year = substr(v3, 1, 4)
generate month = substr(v3, 5, 2)
generate day = substr(v3, 7, 2)
generate date = substr(v3, 1, 8)
generate sex = substr(v3, 9, 1)
generate married = substr(v3, 10, 1)
list
+----------------------------------------------------------------------------------+
| v1 v2 v3 year month day date sex married |
|----------------------------------------------------------------------------------|
1. | 2001 031110 2001031110 2001 03 11 20010311 1 0 |
2. | 199910 0311 1999100311 1999 10 03 19991003 1 1 |
3. | 1983 041201 1983041201 1983 04 12 19830412 0 1 |
4. | 2012121500 2012121500 2012 12 15 20121215 0 0 |
+----------------------------------------------------------------------------------+
You basically import everything in a maximum of two string variables, with a single space " "
acting as a separator. The single-digit months or days are changed to two digits by adding a 0
at the front. Then, after you extract the relevant parts of the strings using the substr()
function, you can simply convert the resulting variables to numeric as needed.
For example:
destring year month day sex married, replace
generate date2 = daily(date, "YMD")
format date2 %tdDD-NN-CCYY
. list date2
+------------+
| date2 |
|------------|
1. | 11-03-2001 |
2. | 03-10-1999 |
3. | 12-04-1983 |
4. | 15-12-2012 |
+------------+
If in your text file both month and day contain single digits, you follow the same logic as above but you will need to deal with a third variable as well after you import the data.