
Convert .txt to PUZ file


How to properly convert the .txt to PUZ?

I have a .txt file that looks like the one below, and I am looking for a way to convert it to PUZ.

<xpuzzle>
<across>
<clue><xnum>1</xnum><text><p>Was cut despite permission to enter the Corps Diplomatique. (7)</p></text></clue>
<clue><xnum>5</xnum><text><p>The grand mal? Not in the least.</p></text></clue>
<clue><xnum>9</xnum><text><p>Outrages of blackouts (')are eliminated. (7)</p></text></clue>
<clue><xnum>10</xnum><text><p>Anthony, the commercial jingle specialist? (7)</p></text></clue>
<clue>...</clue>
<clue><xnum>27</xnum><text><p>Dead Red, still feared. (7)</p></text></clue>
</across>
<down>
<clue><xnum>1</xnum><text><p>&ldquo;Nobody goes there anymore. It's too  --&rdquo;--Yogi Berra. (7)</p></text></clue>
<clue><xnum>2</xnum><text><p>Sings the praises of (7) the old phone rates.</p></text></clue>
<clue><xnum>3</xnum><text><p>One whose diet would be eat vinegar to others.(10)</p></text></clue>
<clue>...</clue>
</down></xpuzzle>

<xpuzzle>
<across>
<answer><xnum>1.</xnum><text><p>C-leave-D;</p></text></answer>
<answer><xnum>5.</xnum><text><p>Mini-mal;</p></text></answer>
<answer><xnum>9.</xnum><text><p>OUT(r)AGES;</p></text></answer>
<answer><xnum>10.</xnum><text><p>Ad-verse;</p></text></answer>
<answer>...</answer>
</across>

<down>
<answer><xnum>1.</xnum><text><p>Crowded;</p></text></answer>
<answer><xnum>2.</xnum><text><p>EX-tolls;</p></text></answer>
<answer><xnum>3.</xnum><text><p>VEG-eta-RIAN;</p></text></answer>
<answer>..</answer>
</down>
</xpuzzle>

Where am I stuck?

This is my partially updated .txt file before loading it into the Across Lite application:

I need suggestions on how to complete the <GRID> section.

UPDATE:

Based on your answer, a few clarifications:

  1. In the 6th grid line (which, per your comment, is built by adding the aligned 6th characters from the DOWN CLUES), E.L.A.E.D.A.C.E, how did you add the characters D, A, C, and E? The down answer MOAT has no 6th character.
  2. In the 8th grid line, ....I.S.V.O...., how is the letter V added? Is the next line also counted as one character, i.e. 4 characters from MOAT + 1 for the next line = 5 in total? Is V the 3rd character of Navy?
  3. Again in the 8th grid line, how is the letter O added?

Are there any rules to follow while building characters from the DOWN CLUES?


Solution

  • Programmatically converting a .PDF into a .PUZzle is not easy, for several reasons, but the solution :-) is simple once you sort out the clues.
    You will likely need a lot of patience while tuning your custom program. There are much simpler ways to shortcut some of those steps.

    Here is an overview of a one-click "Uni-extract" of the PDF into both HTML and text, where the program's limitations are clearly visible. The HTML has not put the numbers into cells correctly (a common problem with grid extraction from PDF). Likewise the text, as usual with PDF, is not in the order needed for parsing. However, for one simple click it's "A Maze-ing" how useful those outputs can be as a starting point for a script, across (or down).


    The PUZ-zle format is Copyright 1995-2016 © Literate Software LLC, but .PUZ files can be generated very simply in their flagship product, Across Lite, from parsed .TXT files:

    How To: Create an Across puzzle
    To create a crossword puzzle that you can distribute as an Across puzzle, you will need to do the following:
    
    1. Write the crossword content in a text file using a text editor (such as Windows Notepad) or a word-processor that can save the file as a plain text file (not a formatted file). The puzzle must be written in a specific syntax (or format). You must name the text file when you save it with the extension .TXT.
    
    2. Open the Text puzzle file in Across Lite. In the file open dialog box that comes up when you select Open command in the File menu, select Across TEXT format in the file types list to display all files with extension .TXT. Select the file you created in Step 1 and click on OK. If there are no errors in the puzzle description, the puzzle will be read in and displayed in the program.
    
    3. Save the puzzle using the Save command in the File menu. The puzzle will be saved as an Across puzzle with .puz file name extension. This puzzle can be read from any copy of Across Lite program.
    

    So what a programmatic approach needs is a text-to-text converter in which the PDF output is completely reassembled into PUZ input.

    For guidance on how to script that, see https://www.litsoft.com/across/docs/AcrossTextFormat.pdf
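
As a starting point for such a converter, here is a minimal Python sketch that pulls the numbered entries out of one <xpuzzle> block like those in the question. It uses regular expressions rather than an XML parser, on the assumption that the extracted text is not reliably well-formed XML (the &ldquo; entities and the <clue>...</clue> placeholders, for instance). The name parse_xpuzzle is made up for illustration.

```python
import html
import re

# One numbered entry, either a clue or an answer; the number may end with "."
ENTRY_RE = re.compile(
    r"<(clue|answer)><xnum>(\d+)\.?</xnum><text><p>(.*?)</p></text></\1>",
    re.DOTALL,
)
# The <across> and <down> sections of a block
SECTION_RE = re.compile(r"<(across|down)>(.*?)</\1>", re.DOTALL)


def parse_xpuzzle(block):
    """Return {'across': [(number, text), ...], 'down': [...]} for one block."""
    result = {"across": [], "down": []}
    for direction, body in SECTION_RE.findall(block):
        for _tag, num, text in ENTRY_RE.findall(body):
            # html.unescape turns entities such as &ldquo; into real characters
            result[direction].append((int(num), html.unescape(text).strip()))
    return result
```

Placeholder entries without an <xnum> are simply skipped, so the result only contains entries that can be numbered.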

    Here I reassembled your sample by hand to show how the text needs to be reordered:

    1. As you have done, replace the 1st <xpuzzle> (NOTE: it's 15x15) with
    <ACROSS PUZZLE>
    <TITLE>
    title
    <AUTHOR>
    author
    <COPYRIGHT>
    2023
    <SIZE>
    15x15
    <GRID>
    
    2. Move the answers here as grid content, stripping out ( - ), the ##. numbers and ; as well as the prefix <answer><xnum>, the middle </xnum><text><p> and the suffix </p></text></answer>. The logic for D.A.C.E: since MOAT was terminated, a new down word was likely to start below it, and the next numerically unused answers so far were "Devastated" followed by "Agodoflove". C and E already existed. However, these are simple assumptions, so those words need to be proven by matching them against the placement of the across answers on the following lines, and some sliding about is often needed. Your source includes the word lengths, so it "should" be easier to build ground rules.
    ClEaveD.Minimal
    R.X.E.I.O.A.O.A // This line is built by adding aligned 2nd characters from the DOWN CLUES
    OUTAGES.Adverse
    W.O.E.T.T.Y.O.R // Similar
    DilAToRY...Scot
    E.L.A.E.D.A.C.E
    Distressedgoods
    ....I.S.V.O....
    BAggagehandlers
    L.O.N.D.S.O.L.A
    UFOs...STIFFarm
    B.D.W.D.A.L.P.O
    Bigtime.TROTsky
    E.U.L.N.E.V.E.E
    ROYALty.DREaded
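
That "proving" step can be partly automated. The Python sketch below (all helper names are hypothetical) cleans the raw answers and reads the down words back out of the hand-built grid above, so each down answer can be checked against a column. Dropping bracketed letters is an assumption, inferred from OUT(r)AGES appearing as OUTAGES in the grid.

```python
import re

GRID = [
    "ClEaveD.Minimal", "R.X.E.I.O.A.O.A", "OUTAGES.Adverse",
    "W.O.E.T.T.Y.O.R", "DilAToRY...Scot", "E.L.A.E.D.A.C.E",
    "Distressedgoods", "....I.S.V.O....", "BAggagehandlers",
    "L.O.N.D.S.O.L.A", "UFOs...STIFFarm", "B.D.W.D.A.L.P.O",
    "Bigtime.TROTsky", "E.U.L.N.E.V.E.E", "ROYALty.DREaded",
]


def clean_answer(raw):
    """'OUT(r)AGES;' -> 'OUTAGES', 'VEG-eta-RIAN;' -> 'VEGETARIAN'."""
    no_parens = re.sub(r"\([^)]*\)", "", raw)  # assumption: drop (r) entirely
    return re.sub(r"[^A-Za-z]", "", no_parens).upper()


def down_words(grid_rows):
    """Read each column top to bottom and split it on '.' (black squares)."""
    words = []
    for column in zip(*grid_rows):
        words += [w.upper() for w in "".join(column).split(".") if len(w) > 1]
    return words


def missing_down_answers(grid_rows, raw_answers):
    """Return the cleaned answers that do not appear in any grid column."""
    found = set(down_words(grid_rows))
    return [a for a in map(clean_answer, raw_answers) if a not in found]


print(missing_down_answers(GRID, ["Crowded;", "EX-tolls;", "VEG-eta-RIAN;"]))  # []
```

Reading column 9 this way gives MOAT followed by DEVASTATED, and column 11 gives NAVY followed by AGODOFLOVE, which is where the D, V, A and O in grid lines 6 and 8 come from: the black square ends one down word, and the next word starts directly below it.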
    
    3. Remove <clue><xnum># and (#)</p></text></clue> etc.
    <ACROSS>
    Was cut despite permission to enter the Corps Diplomatique.
    The grand mal? Not in the least.
    Outrages of blackouts (')are eliminated.
    Anthony, the commercial jingle specialist?
    Characteristic of our tardy oil and energy program.
    ...
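
The three steps above can then be assembled by script. A minimal Python sketch, assuming the clue texts and grid rows have already been extracted; the function names are hypothetical, and the <DOWN> section (not shown in the excerpt above) follows the linked AcrossTextFormat document.

```python
import re


def clean_clue(text):
    """Drop enumerations such as ' (7)' or '(10)'; other brackets are kept."""
    return re.sub(r"\s*\(\d+\)", "", text).strip()


def build_across_text(title, author, copyright_, size, grid_rows, across, down):
    """Return the text-format puzzle as one string, ready to save as .TXT."""
    lines = [
        "<ACROSS PUZZLE>",
        "<TITLE>", title,
        "<AUTHOR>", author,
        "<COPYRIGHT>", copyright_,
        "<SIZE>", size,
        "<GRID>",
    ]
    lines += list(grid_rows)
    lines += ["<ACROSS>"] + [clean_clue(c) for c in across]
    lines += ["<DOWN>"] + [clean_clue(c) for c in down]
    return "\n".join(lines) + "\n"
```

Only purely numeric brackets are removed, so a clue such as "Outrages of blackouts (')are eliminated. (7)" keeps its (') while losing the (7). The clues must be passed in numerical order, since the text format carries no clue numbers.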
    


    Here it is in PUZ binary format
