Search code examples
sqlitefull-text-searchfts3

How best to import multiple text files into SQLite FTS virtual table?


I have multiple text files that I would like to import into a virtual FTS table in SQLite. The documentation here mentions inserting ~500,000 files from the Enron email dataset into a single table.

I do not understand how this was done.

Oreilly here says just use INSERT, UPDATE, DELETE, but it is not clear to me how to get content from multiple external files this way.

Is an external script that calls SQLite from the command line with .insert the best way?

If so, how do I prevent arbitrary text in the files from being interpreted as a new column??

Many thanks in advance for your help!


Solution

  • Inserting is to be done with normal INSERT statements like this:

    INSERT INTO enrondata(content) VALUES('From: [email protected]
    To: [email protected]
    Subject: Expense Reports
    Date: Tue, 5 Feb 2002 09:06:41 -0800 (PST)
    
    In anticipation of the imminent closing of the UBSW Energy transaction, please have all outstanding employee expenses in your possession filed through Enron''s XMS system as soon as possible.
    ...');
    

    You have to use some script to convert the emails into valid SQL commands, i.e., duplicate all quotes ('), and add the INSERT statement around the files' contents.

    Alternatively, use some script to read all the data and insert it directly; for example, in Python:

    import sqlite3
    db = sqlite3.connect("enron.db")
    cursor = db.cursor()
    cursor.execute("CREATE TABLE ...")
    for each mail ...:
        content = ...
        cursor.execute("INSERT INTO enrondata(content) VALUES(?)", [content])
    cursor.close()
    db.commit()
    db.close()