Search code examples
pythonsqlsqlalchemypython-elixir

SQLAlchemy model definition at execution


I'm writing an ORM using Elixir and SQLAlchemy to handle moving spreadsheet-like data into SQL. In general, the content of the spreadsheet-like data is unknown, and pyparsing parses (meta) data about the spreadsheet-like data from a text file

(example: the census publishes the Current Population Survey in fixed with flat files, accompanied by a txt file which describes contents of the data, including column specification and documentation for each column in data)

As I've imagined it, the ORM looks something like this

class DataSet(entity)
    """a collection of many spreadsheet-like files"""
class DataFile(entity)
    """describes a single spreadsheet-like file"""
class Variable(entity)
    """describes a single column in spreadsheet-like file"""

So, this model describes the contents of a bunch of flat files sitting on a hard drive. Now, if I want to take those flat files to SQL, should I

  1. try to write the SQL as a string and substitute information from the model written above

  2. try to define a new Elixir/SQLAlchemy entity

  3. some third option

At the end of the day, what I think I want is all of the spreadsheet like data files in SQL as spreadsheet-like tables, and Elixir/SQLAlchemy magic for handling all the meta data

I've read a lot of SQLAlchemy docs and what not, but they all seem to be written for 'so you want to write a blog' type applications, or at least applications where the structure of the data is completely identified before writing code. I guess I'm trying to write a model which is agnostic about its column specifications.


Solution

  • My first thought would be that elixir won't benefit you much towards a solution.

    My advice would be to go with 2) in that you try to build up a table based on the metadata that you have. (re)read the schema docs to see how you'd add columns programmatically and then create the table:

    http://www.sqlalchemy.org/docs/core/schema.html

    e.g.

    sqla_metadata = sqlalchemy.schema.MetaData()
    
    type_mapping = {'int': Integer, 'text': String} # etc.
    cols = []
    for (col_name, col_type) in your_parsed_metadata.fields:
        cols.append(Column(col_name, type_mapping[col_type]))
    
    cols.append(Column('datafile_id', Integer, ForeignKey("datafile.datafile_id"), nullable=False),)
    new_table = Table(your_parsed_metadata.tablename, sqla_metadata, *cols)
    sqla_metadata.create_all(engine)
    

    Then you can start inserting into your newly created table. You'll also want to keep track of the mapping between generated tables datafiles. You could reuse a generated table for another datafile if it's schema matched.