I have a .txt file with relatively odd delimiters. Data looks something like this:
|ABC4|,|Name1|,|NameRaw1|,|y|,|XY1|,10000.0,| |,|FOURTH QUARTER REPORT|,||
|ABC5|,|Name2, extraname|,|NameRaw2|,,|XY2|,266539.0,|pac |,|MID-YEAR REPORT|,||
|ABC6|,|Name3|,|NameRaw3|,|y|,|X,Y3|,60000.0,|name |,|YEAR-END REPORT|,|XYZ|
So there is the issue of having some variables without any pipes like the sixth variable here which is just an amount without pipes and some variables only come without pipes if they are empty like the fourth variable here which is either ,,
or ,|y|,
. Some of the variables also have a comma, so I can not use a comma as the delimiter. So basically there is two problems:
I am looking for a way to solve this in Stata. Does anyone have an idea how to?
If the full dataset is messier than this example, I really don't want to know. But this seems to make some sense of it.
* Example generated by -dataex-. To install: ssc install dataex
clear
input str100 whatever
"|ABC4|,|Name1|,|NameRaw1|,|y|,|XY1|,10000.0,| |,|FOURTH QUARTER REPORT|,||"
"|ABC5|,|Name2, extraname|,|NameRaw2|,,|XY2|,266539.0,|pac |,|MID-YEAR REPORT|,||"
"|ABC6|,|Name3|,|NameRaw3|,|y|,|X,Y3|,60000.0,|name |,|YEAR-END REPORT|,|XYZ|"
end
gen work = whatever
replace work = subinstr(work, ",,", ",||,", .)
forval j = 1/5 {
gen work`j' = substr(work, 1, strpos(work, "|,") + 1)
replace work = subinstr(work, work`j', "", 1)
}
gen work6 = substr(work, 1, strpos(work, ","))
replace work = subinstr(work, work6, "", 1)
forval j = 7/8 {
gen work`j' = substr(work, 1, strpos(work, "|,") + 1)
replace work = subinstr(work, work`j', "", 1)
}
gen work9 = work
drop work
forval j = 1/9 {
replace work`j' = trim(subinstr(work`j', "|", "", .))
replace work`j' = substr(work`j', 1, length(work`j') - 1) if substr(work`j', -1, 1) == ","
}
list
+-----------------------------------------------------------------------------------+
1. | whatever |
| |ABC4|,|Name1|,|NameRaw1|,|y|,|XY1|,10000.0,| |,|FOURTH QUARTER REPORT|,|| |
|-----------------------------------------------------------------------------------|
| work1 | work2 | work3 | work4 | work5 | work6 | work7 |
| ABC4 | Name1 | NameRaw1 | y | XY1 | 10000.0 | |
|-----------------------------------------------------------------------------------|
| work8 | work9 |
| FOURTH QUARTER REPORT | |
+-----------------------------------------------------------------------------------+
+-----------------------------------------------------------------------------------+
2. | whatever |
| |ABC5|,|Name2, extraname|,|NameRaw2|,,|XY2|,266539.0,|pac |,|MID-YEAR REPORT|,|| |
|-----------------------------------------------------------------------------------|
| work1 | work2 | work3 | work4 | work5 | work6 | work7 |
| ABC5 | Name2, extraname | NameRaw2 | | XY2 | 266539.0 | pac |
|-----------------------------------------------------------------------------------|
| work8 | work9 |
| MID-YEAR REPORT | |
+-----------------------------------------------------------------------------------+
+-----------------------------------------------------------------------------------+
3. | whatever |
| |ABC6|,|Name3|,|NameRaw3|,|y|,|X,Y3|,60000.0,|name |,|YEAR-END REPORT|,|XYZ| |
|-----------------------------------------------------------------------------------|
| work1 | work2 | work3 | work4 | work5 | work6 | work7 |
| ABC6 | Name3 | NameRaw3 | y | X,Y3 | 60000.0 | name |
|-----------------------------------------------------------------------------------|
| work8 | work9 |
| YEAR-END REPORT | XYZ |
+-----------------------------------------------------------------------------------+