When I export public weather data from https://www1.ncdc.noaa.gov/pub/data/uscrn/products/subhourly01/2017/CRNS0101-05-2017-TX_Austin_33_NW.txt, as soon as solar radiation > 9, all of my data for the remaining columns gets lumped into a single column, as shown below. I have tried uploading as txt and csv and the problem still exists in excel, sheets, and dataprep.
Why is this happening?
Is there a programmatic way to fix this so that the data populates as intended, with 1 value per column?
It is likely because the initial data structure is not detected correctly. This can happen if the first rows of your dataset have a different structure than the remaining rows.
To solve this problem in Dataprep, you can indicate how the dataset should be structured by following these steps:
splitrows col: column1 on: '\n'
/\s+/
)
splitpatterns col: column1 type: on on: /\s+/ limit: 22
(you can copy and paste the following command inside the search input when you create a new step)
Note: it is also possible to prevent the initial structure detection when importing a dataset. See https://cloud.google.com/dataprep/docs/html/Remove-Initial-Structure_136154971