Reading .dbf file with Rust throws invalid character error


I am new to Rust and am creating a proof of concept that converts a .dbf file to CSV. I am reading the .dbf file using the Rust library dbase.

The issue is that the code works fine when I create a sample .dbf file using dbfview, but when I use the .dbf file I will actually be working with, I get the following error.

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: InvalidFieldType('M')', src/libcore/result.rs:999:5

Here is the code I am using, taken from the given link.

use dbase::FieldValue;
let records = dbase::read("tests/data/line.dbf").unwrap();
for record in records {
    for (name, value) in record {
        println!("{} -> {:?}", name, value);
        match value {
            FieldValue::Character(string) => println!("Got string: {}", string),
            FieldValue::Numeric(value) => println!("Got numeric value of  {}", value),
            _ => {}
        }
    }
}

I think the ^M shows the character appended by Windows. What can I do to handle this error and read the file successfully? Any help will be much appreciated.


Solution

  • The short answer to your question is no, you will not be able to read this file with dbase-rs (or any current library) and you'll most likely have to rework this file to not contain a memo field.


    A deep dive into the DBF file format

    The InvalidFieldType error points at a structural feature of the file that your library cannot handle - a Memo field. We're going to deep-dive into the file to figure out why that is, and whether there is anything we can do to fix it.

    This is the header definition:

    [image: the file's header]

    Of particular importance is byte 28 (offset 0000010, byte 0C), which is a bitmask of table flags, most notably:

    • 0x01 if the file comes with an associated .cdx file
    • 0x02 if it contains a memo
    • 0x04 if the file is actually a .dbc file (a database)

    At 0x03, your file both comes with an associated .cdx file and contains a memo. Since dbase-rs does not handle memo fields, the memo is looking increasingly likely to be the cause.
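To make the bitmask concrete, here is a minimal sketch of decoding byte 28, assuming the raw header bytes have already been read into a slice. The names `TableFlags` and `table_flags` are made up for illustration and are not part of dbase-rs.

```rust
/// Flags stored in byte 28 of the DBF header, as listed above.
/// (Hypothetical type, not part of dbase-rs.)
#[derive(Debug, PartialEq, Eq)]
pub struct TableFlags {
    pub has_cdx: bool,  // 0x01: an associated .cdx file exists
    pub has_memo: bool, // 0x02: the table contains a memo
    pub is_dbc: bool,   // 0x04: the file is actually a .dbc database
}

/// Decode the flags byte. Returns `None` if the buffer is too short
/// to contain byte 28.
pub fn table_flags(header: &[u8]) -> Option<TableFlags> {
    let flags = *header.get(28)?;
    Some(TableFlags {
        has_cdx: flags & 0x01 != 0,
        has_memo: flags & 0x02 != 0,
        is_dbc: flags & 0x04 != 0,
    })
}
```

For a header whose byte 28 is 0x03, this reports both `has_cdx` and `has_memo` as true, which matches the reading of your file above.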

    Let's keep looking. After the header, each field is described by a 32-byte record.

    Here are your fields:

    [image: the file's field descriptors]

    Bytes 0-10 contain the field name and byte 11 is the type. Because the library you wanted to use can only parse certain field types, we only really care about byte 11.

    In order of appearance, marked by whether the library can parse them:

    • [x] CALL_ID (integer)
    • [x] CONTACT_ID (integer)
    • [x] CALL_DATE (DateTime)
    • [x] SUBJECT (char[])
    • [ ] NOTES (memo)

    The last field is the problematic one. Looking into the library itself, this field type is not supported, so reading the file yields an Err, which your code then unwrap()s. This is the source of your panic.
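The descriptor layout described above (name in bytes 0-10, type in byte 11, 32 bytes per field) can be sketched as a small scanner that lists the fields and their type bytes without going through the library. This is only an illustration: `FieldInfo` and `field_descriptors` are hypothetical names, and the sketch assumes the descriptors start right after a 32-byte header and end at a 0x0D terminator byte, which is not stated above.

```rust
/// Name and type byte of one field descriptor.
/// (Hypothetical type, not part of dbase-rs.)
#[derive(Debug, PartialEq, Eq)]
pub struct FieldInfo {
    pub name: String,
    pub type_code: char,
}

/// Walk the 32-byte field descriptors that follow the file header.
/// Assumption: the header is 32 bytes long and the descriptor list
/// is terminated by a single 0x0D byte.
pub fn field_descriptors(data: &[u8]) -> Vec<FieldInfo> {
    const HEADER_LEN: usize = 32;
    const DESCRIPTOR_LEN: usize = 32;
    const TERMINATOR: u8 = 0x0D;

    let mut fields = Vec::new();
    let mut offset = HEADER_LEN;
    // Stop when fewer than 32 bytes remain or the terminator is reached.
    while let Some(chunk) = data.get(offset..offset + DESCRIPTOR_LEN) {
        if chunk[0] == TERMINATOR {
            break;
        }
        // Bytes 0-10: field name, padded with NUL bytes.
        let name_bytes = &chunk[..11];
        let end = name_bytes.iter().position(|&b| b == 0).unwrap_or(11);
        let name = String::from_utf8_lossy(&name_bytes[..end]).into_owned();
        // Byte 11: the field type, e.g. b'M' for a memo.
        fields.push(FieldInfo { name, type_code: chunk[11] as char });
        offset += DESCRIPTOR_LEN;
    }
    fields
}
```

Running something like this over a file before handing it to the library would let you check for an 'M' type up front and report it cleanly instead of panicking.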

    There are three ways around it:

    • The "long" way is to patch the library to handle memo fields. This sounds easy, but in practice it really isn't. As the memos are stored in another file (typically a dbt file in the same folder), you're going to have to make that library read both files and reference both. The point of the memo type itself is to store more than 255 bytes of data in a field. You are the only one able to evaluate whether this work is worth the effort.
    • If your data is less than 255 bytes in size, you can replace that memo field with a char field, and dbfview should allow you to do this.
    • If your field is longer than 255 bytes and you are able to run sub-processes (i.e. via std::process::Command), you can convert it behind the scenes using a library in another language that can process memo fields. This Node.js library can, for example, though it is read-only.
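As a rough sketch of the sub-process route, the following uses `std::process::Command` from the standard library. The program name `node` and the script `convert_memo.js` are hypothetical placeholders for whatever external converter you end up using; only the `Command` calls themselves are real.

```rust
use std::path::Path;
use std::process::Command;

/// Build the command that hands the .dbf file to an external converter.
/// `node` and `convert_memo.js` are hypothetical placeholders.
pub fn converter_command(dbf: &Path, csv: &Path) -> Command {
    let mut cmd = Command::new("node");
    cmd.arg("convert_memo.js").arg(dbf).arg(csv);
    cmd
}

/// Run the converter and wait for it to finish, returning an error
/// message instead of panicking if it cannot start or exits non-zero.
pub fn convert(dbf: &Path, csv: &Path) -> Result<(), String> {
    let status = converter_command(dbf, csv)
        .status()
        .map_err(|e| format!("could not start converter: {}", e))?;
    if status.success() {
        Ok(())
    } else {
        Err(format!("converter exited with {}", status))
    }
}
```

Keeping the command construction separate from running it makes it easy to inspect or log the exact invocation, and returning a `Result` avoids the same unwrap() panic you hit with the memo field.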