php, json, regex, text-parsing

Parsing a text file and converting it to JSON?


First, thanks for reading this! Second, I have no control over the .txt file that I am getting my data from. I'd like to take the data, separate it into Completed Matches and Upcoming Matches, and display each group in turn.

For Example:

Most Recent Completed Matches

W6: John Smith vs. Joe Smith, 0-0
L7: Doug Smith vs. Jason Smith, 1-2
and so on

Upcoming Matches

W1: Joan Smith vs. Bruce Smith, 0-0
W1: Jenn Smith vs. David Smith, 0-0
and so on

This is a sample file: https://drive.google.com/file/d/11dW2pOK-MXHGRzQhJWb37tlWMqazj8UZ/view?usp=sharing

Sample Data from above link:

  <|>  Most Recent Completed Matches:  W6: Jason Bertolo vs. Ed Gabele, 0-0 | W6: Ed Gabele vs. Jason Bertolo, 0-0 | L7: Doug Metcalf vs. Jason Bertolo, 1-2 | L6: Tom Walsh vs. Jason Bertolo, 0-2 | L5: Jason Bertolo vs. Kevin Kronk, 2-0 | L5: Tom Walsh vs. Zach Dotson, 2-0 | L4: Kevin Kronk vs. Bruce Patete, 2-1 | L4: Zach Dotson vs. Tom Dotson, 0-0 | L3: Bruce Patete vs. Joseph Hornback, 2-0 | L3: Kevin Kronk vs. Matt Smith, 2-1 | L3: Matt Borders vs. Tom Dotson, 1-2 | L3: Zach Dotson vs. Mick Pillar, 2-0 | L2: David Mosel vs. Joseph Hornback, 0-2 | L2: Antoine Tucker vs. Matt Smith, 0-0 | L2: Tom Dotson vs. Mike Conley, 2-1 | L2: Mick Pillar vs. Hannah Fields, 2-0 | L1: Mike Minor vs. Joseph Hornback, 0-2 | L1: David Mosel vs. Tom Johnson, 2-1 | L1: Matt Smith vs. Jennifer Hamilton, 2-1 | L1: Lori Fields vs. Mike Conley, 1-2 | L1: Owen Miller vs. Tom Dotson, 1-2 | L1: Ken Kronk vs. Hannah Fields, 0-2 | W5: Ed Gabele vs. Doug Metcalf, 3-1 | W4: Doug Metcalf vs. Tom Walsh, 3-2 | W4: Jason Bertolo vs. Ed Gabele, 1-3 | W3: Tom Walsh vs. Kevin Kronk, 3-0 | W3: Bruce Patete vs. Doug Metcalf, 1-3 | W3: Ed Gabele vs. Zach Dotson, 3-1 | W3: Matt Borders vs. Jason Bertolo, 0-3 | W2: Kevin Kronk vs. Ken Kronk, 3-2 | W2: Mick Pillar vs. Tom Walsh, 2-3 | W2: Doug Metcalf vs. Owen Miller, 3-2 | W2: Lori Fields vs. Bruce Patete, 1-3 | W2: Matt Smith vs. Zach Dotson, 1-3 | W2: Antoine Tucker vs. Ed Gabele, 0-3 | W2: David Mosel vs. Jason Bertolo, 0-3 | W2: Matt Borders vs. Mike Minor, 3-1 | W1: Joseph Hornback vs. Ken Kronk, 2-3 | W1: Tom Walsh vs. Tom Johnson, 3-0 | W1: Jennifer Hamilton vs. Bruce Patete, 0-3 | W1: Zach Dotson vs. Mike Conley, 3-2 | W1: Ed Gabele vs. Tom Dotson, 3-0 | W1: Mike Minor vs. Hannah Fields, 3-1  <|>  Upcoming Matches:    <|>  

The code I've tried so far is:

<?php
 $text = file_get_contents('https://drive.google.com/uc?export=download&id=11dW2pOK-MXHGRzQhJWb37tlWMqazj8UZ');
 $json = (explode('|',$text));
 //var_dump ($json);
 //print "\n";
 $trimmed = trim($text);
 print_r ($trimmed);
 ?>

The output looks good, but for the life of me I can't figure out how to list each of the values that sit between the "|" characters.

Any ideas on what I can do to manipulate this text in such a way that I can get to my desired result?

Additionally, you should know that this data will change every 60 seconds during an actual tournament. I don't know if that matters, but I wanted to make sure you know that this data isn't static. The only constants are the <|> markers that introduce the Most Recent Completed Matches and Upcoming Matches sections, and the fact that each result always sits between two | characters.

Thanks again for your time!


Solution

  • Here's a start at what you need. There are ways to make it more compact, but I'm going for clarity. You should add error-checking on the result of the preg_match call, so you know if it fails for some reason.

    <?php
    $text = file_get_contents('sample_data');
    
    // The sample data has two sections,
    // one for completed matches and one for upcoming matches.
    // Extract the 'variable' text of these two sections:
    preg_match('/^\s*<\|>\s*Most Recent Completed Matches:(.*?)<\|>\s*Upcoming Matches:(.*?)<\|>\s*$/s', $text, $reg_matches);
    
    $completed_text = $reg_matches[1];
    $upcoming_text  = $reg_matches[2];
    
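    // Within the text of a section, the matches are separated by '|',
    // so split the section-text on that character, trim the padding
    // around each match, and drop any empty entries:
    $completed_array = array_filter(array_map('trim', explode("|", $completed_text)), 'strlen');
    $upcoming_array  = array_filter(array_map('trim', explode("|", $upcoming_text)), 'strlen');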
    
    echo "\n";
    echo "Most Recent Completed Matches\n";
    echo "\n";
    foreach ( $completed_array as $match_text )
    {
        echo "$match_text\n";
    }
    
    echo "\n";
    echo "Upcoming Matches\n";
    echo "\n";
    foreach ( $upcoming_array as $match_text )
    {
        echo "$match_text\n";
    }
    
    ?>
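
Since the title asks for JSON, here is a minimal sketch of one way to get there, not a definitive implementation. It assumes the format shown in the sample data: sections introduced by `<|>`, a section title ending in a colon, and matches separated by `|`. The function name `parse_matches` and the field names (`label`, `player1`, `player2`, `score1`, `score2`) are made up for illustration, and the per-match pattern is a guess based on the sample lines. Any entry that doesn't fit the pattern is kept as raw text rather than dropped.

```php
<?php
// Hypothetical helper: split the feed into named sections of match records.
function parse_matches(string $text): array
{
    $sections = [];

    // Sections are delimited by the literal "<|>" marker.
    foreach (explode('<|>', $text) as $chunk) {
        $chunk = trim($chunk);
        if ($chunk === '') {
            continue; // the leading and trailing markers leave empty chunks
        }

        // A section looks like "Title: match | match | ...".
        // Split on the first colon only, because matches contain colons too.
        $parts = explode(':', $chunk, 2);
        $title = trim($parts[0]);
        $body  = trim($parts[1] ?? '');

        $matches = [];
        if ($body !== '') {
            foreach (explode('|', $body) as $item) {
                $item = trim($item);
                if ($item === '') {
                    continue;
                }
                // Assumed shape: "W6: Name One vs. Name Two, 0-0"
                $pattern = '/^(\S+):\s*(.+?)\s+vs\.\s+(.+?),\s*(\d+)-(\d+)$/';
                if (preg_match($pattern, $item, $m) === 1) {
                    $matches[] = [
                        'label'   => $m[1],
                        'player1' => $m[2],
                        'player2' => $m[3],
                        'score1'  => (int) $m[4],
                        'score2'  => (int) $m[5],
                    ];
                } else {
                    // Unexpected format: keep the text so nothing is lost.
                    $matches[] = ['raw' => $item];
                }
            }
        }
        $sections[$title] = $matches;
    }

    return $sections;
}

// Shortened copy of the sample data from the question.
// In real use, $text would come from file_get_contents(), which returns
// false on failure, so check for that before parsing.
$text = '  <|>  Most Recent Completed Matches:  W6: Jason Bertolo vs. Ed Gabele, 0-0 | L7: Doug Metcalf vs. Jason Bertolo, 1-2  <|>  Upcoming Matches:    <|>  ';

$data = parse_matches($text);
echo json_encode($data, JSON_PRETTY_PRINT), "\n";
```

With the sample above, the empty Upcoming Matches section comes out as an empty array. Because the data changes every 60 seconds, you would simply re-fetch and re-parse the file each time; nothing here depends on how many matches a section holds.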