Search code examples
pythonloopsawkcapitalize

Capitalize specific indices of string using awk or python


I have an input file where each line contains 99 lowercase letters,

bccdddcdccddddddabcdabcabdbacbdcaaccbbcabacbccabcacbcdcccbdbacdcbbcbcbcccacadaaccababadbcbaabbbccbb 
bccdddcdcddddcddabcdabcabdbacbddaacdbbcabacbcdbbcacbcccccbdbacdbbbcbcbacbacacaacccbabadbcbaabbbccbb 
bccdddcdcddddccdabcdabcabdbacbddaaddbbcabacbcdbbcacbcccccbdbacdbbbcbcbaccacadaaccbbabadbccacbbbccbb 
bccdddcdccdddccdabcdabcdbdbacbdcaaddcbcabacbccabcacbcdcccbdbacdbbbcbcbbccacadaaccbbabadbccaaabbccbb 

I have a list of positions, for example p = [10, 14, 89, 99].

I'd like to capitalize the letters at these positions in my input file.

Desired output:

bccdddcdcCdddDddabcdabcabdbacbdcaaccbbcabacbccabcacbcdcccbdbacdcbbcbcbcccacadaaccababadbCbaabbbccbB 
bccdddcdcDdddCddabcdabcabdbacbddaacdbbcabacbcdbbcacbcccccbdbacdbbbcbcbacbacacaacccbabadbCbaabbccbB 
bccdddcdcDdddCcdabcdabcabdbacbddaaddbbcabacbcdbbcacbcccccbdbacdbbbcbcbaccacadaaccbbabadbCcacbbccbbB 
bccdddcdcCdddCcdabcdabcdbdbacbdcaaddcbcabacbccabcacbcdcccbdbacdbbbcbcbbccacadaaccbbabadbCcaaabbccbB 

I'm using this awk command:

awk -vFS= -vOFS= '{$10=toupper($10)}1' input > output

But I'm not sure how to loop this over all the positions.


Solution

  • You can use a generator expression with .upper() and enumerate() to capitalize only the specified indices:

    p = [10, 14, 89, 99] # or use set([10, 14, 89, 99]) for faster lookup
    with open('in.txt') as file:
        for line in file:
            line = line.rstrip()
            result = ''.join(c.upper() if i + 1 in p else c for i, c in enumerate(line))
            print(result)
    

    This outputs:

    bccdddcdcCdddDddabcdabcabdbacbdcaaccbbcabacbccabcacbcdcccbdbacdcbbcbcbcccacadaaccababadbCbaabbbccbB
    bccdddcdcDdddCddabcdabcabdbacbddaacdbbcabacbcdbbcacbcccccbdbacdbbbcbcbacbacacaacccbabadbCbaabbbccbB
    bccdddcdcDdddCcdabcdabcabdbacbddaaddbbcabacbcdbbcacbcccccbdbacdbbbcbcbaccacadaaccbbabadbCcacbbbccbB
    bccdddcdcCdddCcdabcdabcdbdbacbdcaaddcbcabacbccabcacbcdcccbdbacdbbbcbcbbccacadaaccbbabadbCcaaabbccbB