Read a file and put every character in a separate column

I have a huge file (a square data file of a sequence alignment) and want to put every position into a separate column. However, readr::read_delim, for instance, can't take an empty delimiter, and readr::read_fwf seems to require that you specify every position. I have more than 35000 positions.

Example input file:

EIGMEYRTVSGVAGPLVILDKVKGPKYQEI.....
EIGMEYRTVSGVAGPLVILDKVKGPKYQEI.....
EIGMEYRTVSGVAGPLVILDKVKGPKYQEI.....

Output:

col1 col2 col3 col4 col5 col6....
E I G M E Y.....
E I G M E Y.....
E I G M E Y.....

Solution

readr::read_fwf has a few different ways you can specify the field widths using the col_positions argument. Here's a test file, test.txt:

Hdvsmf
Dfhjds
Dfhjkd
Dfklds
Dkjffd
Dsfjkd
fkldsf

Assuming you know how many fields there are, you can specify a vector of field widths (1 character wide, repeated 6 times because each line of this test file has six characters):

read_fwf('test.txt', col_positions = fwf_widths(rep(1, 6)))

This is probably easier than specifying start and end positions for each field. You can also provide a character vector of column names to fwf_widths, like:

fwf_widths(rep(1, 6), paste0('col', 1:6))
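With tens of thousands of positions you probably don't want to count the fields by hand. Here is a minimal sketch that takes the width from the first line of the file and passes it to fwf_widths. The temporary file and the names path, n_fields and aln are made up for illustration, and it assumes every line has the same length. Forcing character columns is a precaution: readr's type guessing may otherwise read a column that contains only T and F as logical.

```r
library(readr)

# Hypothetical demo file standing in for the real alignment
path <- tempfile(fileext = ".txt")
writeLines(c("Hdvsmf", "Dfhjds", "Dfhjkd"), path)

# The number of characters in the first line is the number of one-character fields
n_fields <- nchar(readLines(path, n = 1))

aln <- read_fwf(
  path,
  col_positions = fwf_widths(rep(1, n_fields), paste0("col", seq_len(n_fields))),
  # keep every column as character instead of letting readr guess the type
  col_types = cols(.default = col_character())
)
```

For the real file, replace path with the path to your alignment.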

If you don't know how many fields you have, you can also bring it in as a single column and then use tidyr::separate to extract your columns (the sep argument can take a vector of numeric positions, not just delimiters):

library(readr)
library(tidyr) # also provides %>%

# a data frame with everything in one column named blah
df1 = read_csv('test.txt', col_names = 'blah')
# nchar() counts the characters in the first line; length() would just return 1 here
field_count = nchar(df1$blah[1]) # assuming the lines are all the same length!

# nb: parentheses for field_count - 1 are super important! you will spend forever debugging this if you miss it
df1 = df1 %>% separate(blah, into = paste0('col', 1:field_count), sep = 1:(field_count - 1))
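If you'd rather avoid the extra packages, the same split can be sketched in base R with strsplit, which breaks a string into single characters when given an empty pattern. This is an alternative to the separate approach, not part of it. The lines vector below is a stand-in for readLines('test.txt'), and the names chars, mat and df2 are only illustrative. It again assumes all lines have the same length and stops if they don't.

```r
# Stand-in for: lines <- readLines('test.txt')
lines <- c("Hdvsmf", "Dfhjds", "Dfhjkd")

# An empty pattern splits each line into its individual characters
chars <- strsplit(lines, "")

# Guard the assumption that the alignment is rectangular
stopifnot(length(unique(lengths(chars))) == 1)

# One row per line, one column per position
mat <- do.call(rbind, chars)
colnames(mat) <- paste0("col", seq_len(ncol(mat)))

# Optional: convert to a data frame if you need one
df2 <- as.data.frame(mat, stringsAsFactors = FALSE)
```

For a very wide alignment it may be worth keeping the character matrix mat rather than converting, since a data frame with tens of thousands of columns can be slow to work with.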