
How to read lines, find a specific word, and change the word next to it in the same file in Python


First, I tried to find "ifrs_Revenue" and change the word located next to it, but it failed:

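# input_path is defined elsewhere; the file name means roughly
# "Q1 report _03_ statement of comprehensive income (consolidated)"
with open(input_path + '/CIS' + "/16_1분기보고서_03_포괄손익계산서_연결_2212.txt") as f:
    line = f.readline()  # read the first line before testing it in the loop
    while line:
        if "ifrs_Revenue" in line:
            s_line = line.split("\t")
            idx = s_line.index("ifrs_Revenue")
            value = s_line[idx+1]
            # '매출액' means "sales revenue". This only rebinds the local
            # variable `value`; nothing is written back to the file.
            value = value.replace(value, '매출액')
            break
        line = f.readline()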
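Reassigning `value` only changes a local variable, so the file itself never changes. Below is a minimal sketch of the same line-by-line idea that writes the result back. It assumes tab-separated fields, UTF-8 text (Korean files may instead need an encoding such as `"cp949"`), and that the word to change is the field immediately after the key. `replace_next_field` is a hypothetical helper, not an existing API:

```python
def replace_next_field(path, key, new_value, sep="\t", encoding="utf-8"):
    """Replace the field right after `key` on every line of a delimited text file.

    Assumes fields are separated by `sep` (tab by default) and that the file
    can be decoded with `encoding`. Returns the number of lines changed.
    """
    # Read everything first, so we never read and write the same file at once.
    with open(path, encoding=encoding, newline="") as f:
        lines = f.readlines()

    changed = 0
    for i, line in enumerate(lines):
        # Split off the line ending so it can be restored unchanged.
        body = line.rstrip("\r\n")
        ending = line[len(body):]
        fields = body.split(sep)
        if key in fields:
            idx = fields.index(key)
            if idx + 1 < len(fields):  # only if there is a field after the key
                fields[idx + 1] = new_value
                lines[i] = sep.join(fields) + ending
                changed += 1

    # Write the (possibly modified) lines back to the same file.
    with open(path, "w", encoding=encoding, newline="") as f:
        f.writelines(lines)
    return changed
```

For example, `replace_next_field(path, "ifrs_Revenue", "매출액")` would write 매출액 ("sales revenue") into the cell after each `ifrs_Revenue`.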

Then I found another way to replace specific words throughout a file, all at once:

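import os

def inplace_change(filename, old_string, new_string):
    # Read the whole file into memory.
    with open(filename) as f:
        s = f.read()
        if old_string not in s:
            print('"{old_string}" not found in {filename}.'.format(**locals()))
            return

    # Rewrite the file with every occurrence of old_string replaced.
    with open(filename, 'w') as f:
        print('Changing "{old_string}" to "{new_string}" in {filename}'.format(**locals()))
        s = s.replace(old_string, new_string)
        f.write(s)

# (old label, new label); the leading spaces are part of the labels in the files
replacements = [
    ('   지배기업의 소유주에게 귀속되는 당기순이익(손실)', '당기순이익(지배)'),  # profit (loss) attributable to owners of the parent -> net profit (parent)
    ('수익(매출액)', '매출액'),                # revenue (sales) -> sales revenue
    ('영업수익', '매출액'),                    # operating revenue -> sales revenue
    ('영업이익(손실)', '영업이익'),            # operating profit (loss) -> operating profit
    ('관리비및판매비', '판매비와관리비'),      # administrative and selling expenses -> selling and administrative expenses
    ('영업관리비용(수익)', '판매비와관리비'),  # operating/administrative expenses (income) -> selling and administrative expenses
    ('   지배기업의 소유주지분', '당기순이익(지배)'),  # owners of the parent's share -> net profit (parent)
]

# input_path is defined elsewhere
folder = os.path.join(input_path, 'CIS')
for blist in os.listdir(folder):
    for old, new in replacements:  # apply every pair to every file
        inplace_change(os.path.join(folder, blist), old_string=old, new_string=new)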
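Calling `inplace_change` once per pair re-reads and rewrites the file for every pair, and an earlier replacement can feed into a later one if a new string contains another old string. A minimal alternative sketch, not taken from the question, does all replacements in a single pass with a `re` alternation, trying longer keys first so a longer label wins over a shorter one it contains. `replace_all` and `replace_all_in_file` are hypothetical helper names:

```python
import re


def replace_all(text, mapping):
    """Replace every key of `mapping` in `text` with its value, in one pass.

    Longer keys are tried first, so a key containing another key wins.
    Replaced text is never scanned again, so replacements cannot chain.
    """
    if not mapping:
        return text
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: mapping[m.group(0)], text)


def replace_all_in_file(path, mapping, encoding="utf-8"):
    """Apply `replace_all` to a file; rewrite it only if something changed."""
    with open(path, encoding=encoding, newline="") as f:
        text = f.read()
    new_text = replace_all(text, mapping)
    if new_text != text:
        with open(path, "w", encoding=encoding, newline="") as f:
            f.write(new_text)
    return new_text != text
```

With the question's label pairs turned into a dict, each file would be read and written at most once.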

What I want is to change, in every file, the word that comes right after a specific word (for example, the cell next to "ifrs_Revenue") to the same new label. I searched extensively but couldn't find a way, so I came here. English is not my first language and I am using a translator, so I appreciate your understanding.
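For that goal, a minimal sketch using the standard `csv` module with a tab delimiter might look like this. It assumes the files are tab-separated, that each key such as `ifrs_Revenue` appears as a whole cell, and that the new label goes into the cell immediately to its right. `relabel_rows`, `relabel_folder`, and the `.txt` filter are hypothetical. Caveats: rewritten files use `\n` line endings, and because `csv` treats `"` as a quote character, cells containing quotes may be written back slightly differently.

```python
import csv
import os


def relabel_rows(rows, mapping):
    """Return (new_rows, count) with the cell after each mapping key replaced."""
    out = []
    count = 0
    for row in rows:
        row = list(row)
        # Iterate over a copy; the last cell has no right-hand neighbour.
        for i, cell in enumerate(row[:-1]):
            if cell in mapping:
                row[i + 1] = mapping[cell]
                count += 1
        out.append(row)
    return out, count


def relabel_folder(folder, mapping, encoding="utf-8", suffix=".txt"):
    """Apply relabel_rows to every `suffix` file in `folder`, in place."""
    total = 0
    for name in sorted(os.listdir(folder)):
        if not name.endswith(suffix):
            continue
        path = os.path.join(folder, name)
        with open(path, encoding=encoding, newline="") as f:
            rows = list(csv.reader(f, delimiter="\t"))
        new_rows, count = relabel_rows(rows, mapping)
        if count:  # only rewrite files that actually changed
            with open(path, "w", encoding=encoding, newline="") as f:
                csv.writer(f, delimiter="\t", lineterminator="\n").writerows(new_rows)
            total += count
    return total
```

A call such as `relabel_folder(os.path.join(input_path, "CIS"), {"ifrs_Revenue": "매출액"})` would then relabel every matching file in that folder.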

I have attached a picture to help explain; the non-English words in it are Korean.


Solution

  • I made a simple example file that mimics your data. All fields are separated by tab characters ("\t"):

    col1    col2    col3
    randomwords ifrs_Revenue    replaceme
    morerandomwords ifrs_CostOfSales    this_should_stay_the_same
    asdfasdfasdf    ifrs_Revenue    alsoreplaceme
    jajajajajaja    ifrsGrossProfit this_should_not_be_replaced
    

    I then use the pandas module to find every row where "col2" == "ifrs_Revenue". In your case, replace "col2" with the name of the column that holds the key, and "col3" with the name of the column whose value you want to replace. The code is as follows:

    import pandas as pd
    
    df = pd.read_csv("example.txt", sep="\t")  #  read in data
                                               #  NOTE: make sure to replace "example.txt" with your own filename
    print(df.head())
    
    mask = df.col2 == "ifrs_Revenue"  # create mask that finds all rows with "ifrs_Revenue"
    
    df.loc[mask, "col3"] = "REPLACED_VALUE"  # "REPLACED_VALUE" is the value you want to use as the replacement
                                             # also replace "col3" with the column you are replacing
    
    print("=" * 50)
    print(df.head())
    
    # save the results; change "results.tsv" to whatever file you want to write.
    # index=False stops pandas from adding the row numbers as an extra column.
    df.to_csv("results.tsv", sep="\t", index=False)
    

    These are the results:

        col1    col2    col3
    0   randomwords ifrs_Revenue    REPLACED_VALUE
    1   morerandomwords ifrs_CostOfSales    this_should_stay_the_same
    2   asdfasdfasdf    ifrs_Revenue    REPLACED_VALUE
    3   jajajajajaja    ifrsGrossProfit this_should_not_be_replaced
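If the real file has no header row, or several labels need relabelling at once, a variation of the same mask idea might look like this. It is a sketch under assumptions: every row has the same number of fields, the key sits in the second column and the value to overwrite in the third (positions 1 and 2, since `header=None` numbers columns from 0), and `relabel_df`, `relabel_file`, and the mapping are hypothetical. Writing back with `index=False` and `header=False` avoids adding a row-number column and a numeric header line that the original file did not have.

```python
import pandas as pd


def relabel_df(df, key_col, value_col, mapping):
    """Return a copy of df with df[value_col] overwritten where df[key_col] is a mapping key."""
    df = df.copy()
    mask = df[key_col].isin(list(mapping))  # rows whose key is one we relabel
    df.loc[mask, value_col] = df.loc[mask, key_col].map(mapping)
    return df


def relabel_file(path, mapping, key_col=1, value_col=2, sep="\t"):
    """Read a header-less delimited file, relabel it, and overwrite it."""
    # header=None: treat every line as data; dtype=str and keep_default_na=False
    # keep all cells as plain text (empty cells stay "" instead of NaN).
    df = pd.read_csv(path, sep=sep, header=None, dtype=str, keep_default_na=False)
    df = relabel_df(df, key_col, value_col, mapping)
    df.to_csv(path, sep=sep, index=False, header=False)
```

Using a dict such as `{"ifrs_Revenue": "매출액"}` lets one pass handle every key, instead of building a separate mask per label.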
    

    Let me know if you need any clarification!