Search code examples
oledbvisual-foxpro

VFPOLEDB Too large files


I had the assignment of writing several queries to read in some .dbf files. The files are Visual FoxPro files and the big problem here is when I start using joins, because this makes the temporary files extremely big, eventually resulting in errors like:

File c:\users\me\appdata\local\temp\00001kjd000a.tmp is too large.

Furthermore the querying takes a really long time and this is not what I want. I tried accessing this data using sqlserver and also using c# code, but it is extremely slow.

The sizes of the databases are about 350mb, 100mb and 10mb. I have these files locally to "speed up" the process, however a triple join from these takes more than 15min...

and i know i will have to use another one which is 2gb

code i'm using:

string connStr = @"Provider=VFPOLEDB.1;Data Source=D:\data\B. Mayer Real\;";
            string qryStr = @"
                select top 100 *
                from db1 a, db2 b, db3 c
                where a.id = b.id
                and b.id = c.id
                order by a.id
                ";

            OleDbConnection conn = new OleDbConnection(connStr);

            conn.Open();

            OleDbCommand cmd = new OleDbCommand(qryStr, conn);
            OleDbDataAdapter adapter = new OleDbDataAdapter(cmd);

            DataSet data = new DataSet();

            adapter.Fill(data);

            DataTable table = data.Tables[0];

            foreach (DataRow myDataRow in table.Rows)
            {
                Console.WriteLine("Found data {0}", myDataRow[1]);
            }

            Console.ReadLine();

            conn.Close();

EDIT:

The biggest problem is simply going through the files... If I do something like this:

SELECT *
FROM [CARATLOCAL]...[lzarb]
where la_nummer = 364999

This already takes 30 seconds

This query took 38 minutes! (it's just one line)

select max(la_datum + convert(datetime, la_time, 108)) as book_datetime, la_nummer, la_index from [CARATLOCAL]...[lzarb]
where la_datum is not null and la_time is not null and la_nummer = 364999
group by la_nummer, la_index

Solution

  • Since the "Order by" is the basis of the query, and you are only looking for the most recent 100 (hence my order by is DESCENDING), I would suggest trying to prequery just the ID as a PREQUERY, then join to the rest such as... Make sure you have an index on each table on the "ID" column.

    select 
          PreQuery.*, 
          b.*, 
          c.*
       from
          ( select top 100 a.*
               from db1 a
               order by a.ID DESC ) PreQuery
          JOIN db2 b on PreQuery.ID = b.ID
          JOIN db3 c on PreQuery.ID = c.ID
    

    if that doesn't cut it, and the "a.ID" is a type of auto-increment column, you may want to get the max() ID and further strip from that.

    ADDITIONALLY... if table "B" and "C" have multiple records per ID, then you are probably running into a Cartesian result... So, if B has 10 entries for each "A" ID, and C has 20 entries for same "A" ID, then you now have 200 entries for that one ID... Do this for 100 entries and you have 20,000 entries. Not a problem with that small a set, but something to consider.