python data-science search-engine

text to list of tuples


I have a text dataset. The content of the text looks as follows.

.I 1\n.T\nPreliminary Report-International Algebraic Language\n.B\nCACM December,
 .I 2\n.T\nExtraction of Roots by,5\t3\n .I 3\n.T\nTechniquI 4\n.T\nGlossary of Computer 

This is the description of the dataset:

.I 1, .I 2, .I 3 are the document ids, and the rest of the text is the content of each document. The task is to create a list of tuples: [(doc_id, content)]. Any help or suggestion is highly appreciated!


Solution

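  • FILENAME = "your filename"
    lst = []
    # Read all lines; the context manager closes the file afterwards.
    with open(FILENAME) as f:
        lines = f.read().splitlines()
    # Take the first two lines of every group of three as (doc_id, content);
    # the third line of each group is skipped. Stopping at len(lines) - 1
    # avoids an IndexError when the last group is incomplete.
    for i in range(0, len(lines) - 1, 3):
        lst.append((lines[i], lines[i + 1]))

    print(lst)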

    ->>>[('.i 1', 'lipsum lipsum lipsum lipsum B. lipsum lipsum '), ('.i 2', 'lipsum lipsum '), ('.i 3', 'lipsum lipsum lipsum lipsum')]
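The loop above relies on a fixed three-line layout per record. For the `.I n` layout shown in the question, where a document can span a variable number of lines, splitting on the id markers may be a better fit. Here is a minimal sketch, assuming the `\n` in the excerpt are real newlines and every document starts with a line containing only `.I <number>`. The function name `parse_documents` and the `sample` string (a shortened, made-up version of the excerpt) are only for illustration.

```python
import re


def parse_documents(text):
    """Split text on '.I <number>' marker lines into (doc_id, content) tuples."""
    # With one capturing group, re.split returns
    # [text_before_first_marker, id_1, body_1, id_2, body_2, ...].
    parts = re.split(r"^[ \t]*\.I[ \t]+(\d+)[ \t]*$", text, flags=re.MULTILINE)
    docs = []
    # Pair every captured id with the body that follows it.
    for doc_id, body in zip(parts[1::2], parts[2::2]):
        docs.append((int(doc_id), body.strip()))
    return docs


# Hypothetical sample, shortened from the excerpt in the question.
sample = ".I 1\n.T\nPreliminary Report\n.B\nCACM December\n .I 2\n.T\nExtraction of Roots\n"
print(parse_documents(sample))
```

The content part still contains the field markers such as `.T` and `.B`. If you only need the plain text, you would have to strip those in a further step. The id is converted to an `int` here, which is also just a choice; keep `doc_id` as a string if you prefer.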