Reading contents from text/csv document with inconsistencies in data


I am trying to import data from a source that is not a CSV or TXT file, although my code can read it as if it were text/CSV.

The problem is that some "data records" do not follow the same pattern as the rest. Roughly 70% of the document parses correctly, but I think I am missing something in the data that throws off the remaining results.

I would appreciate it if you could take a look at the code and the file and help me figure out why some of the data does not parse like the rest of the document. I suspect the cause is an odd number of delimiter characters (~ and/or >) in one of the fields, or a start/stop sequence that is slightly different for some of the records.

<?php
header("Content-Type:text/html");

$file = "data.txt";
if (($handle = fopen($file, "r")) !== FALSE) 
    {
        fgetcsv($handle, 1000, ">~Yn");
        $imports = array();

            while (($data = fgetcsv($handle, 1000, ">")) !== FALSE) 
            {
                if(strpos($data[4],'<') !== false)
                    {
                        echo "<br /><strong>Section:</strong> " . $data[5];
                        echo "<br /><strong>Row:</strong> " . $data[6];
                        echo "<br /><strong>Qty:</strong> " . $data[7];
                        echo "<br /><strong>Price:</strong> " . $data[8];
                        echo "<br /><strong>Notes:</strong> " . $data[10];
                    }
                else
                    {
                        echo "error: ";
                        print_r($data);
                    }
                echo "<br /><br /><br /><br />";
            }

            fclose($handle);
    }
?>

The sample data can be found here: Sample Data


Solution

  • I found a solution that works better than the method I originally attempted. Loading the file as a CSV was not giving reliable results, and I then realized that I had been missing the common delimiter between records, ">~". So I split the contents into records on ">~" and then split each record into fields on ">". I also ignore the first and last match because their data does not line up with the other records.

    $file = "data.txt";
    $content = file_get_contents($file);
    if($content === false)
        {
            die("Could not read " . $file);
        }
    //explode() splits on a literal string; split() no longer exists as of PHP 7
    $lines = explode(">~", $content); //one record per ">~"
    foreach($lines as $line)
        {
            $data = explode(">", $line); //one field per ">"
    
            if(!isset($data[5]) || strpos($data[5],'.') !== false) //record too short, or the section is a price
                {
                    //the first match is ignored
                }
            elseif(empty($data[7])) //if Qty is empty
                {
                    //the last match is ignored
                }
            else
                {
                    echo "<br><br><br>";
                    echo $data[5] . " (Section) <br>";
                    echo $data[6] . " (Row) <br>";
                    echo $data[7] . " (Qty) <br>";
                    echo ($data[8] ?? '') . " (Price) <br>";
                    //use the data
                }
        }
    

    This resulted in a much more accurate and thorough data collection!
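
If the records need to be used rather than echoed, the same two-step split can be wrapped in a function that returns an array. This is only a minimal sketch: the function name parseRecords and the sample string are made up for illustration (the real file's layout may differ), and the nine-field guard is an added assumption so that short fragments are skipped instead of raising undefined-index warnings.

```php
<?php
// Hypothetical helper, not part of the original solution: splits the raw
// content into records on ">~" and each record into fields on ">".
function parseRecords(string $content): array
{
    $records = [];
    foreach (explode(">~", $content) as $line) {
        $data = explode(">", $line);

        // Assumption: a usable record has at least fields 0..8.
        if (count($data) < 9) {
            continue;
        }
        // Same filters as the solution: skip when the Section slot holds
        // a price (contains "."), or when Qty is empty.
        if (strpos($data[5], '.') !== false || empty($data[7])) {
            continue;
        }
        $records[] = [
            'section' => $data[5],
            'row'     => $data[6],
            'qty'     => $data[7],
            'price'   => $data[8],
        ];
    }
    return $records;
}

// Made-up input with four records: the first has a price in the Section
// slot and the last has an empty Qty, so only the middle two are kept.
$sample = "a>b>c>d>e>12.50>x>y>z>~"
        . "a>b>c>d>e>101>A>2>45.00>~"
        . "a>b>c>d>e>102>B>4>60.00>~"
        . "a>b>c>d>e>103>C>>";

$records = parseRecords($sample);
print_r($records);
```

Returning an array keeps the parsing separate from the output, so the records can be inserted into a database or validated further without touching the splitting logic.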