Quite new to SQL, and working with flask and sqlalchemy here is my issue (I hope it's not too long)
I have a SQL table structured like this:
name vector axis value unit ref
----------------------------------------------------------------
name1 v1 W 46504 psi ref1
name1 v1 L 51757 psi ref1
name1 v2 W 127 psi another ref
name1 v2 L 403 psi ref1
name2 ...
My aim is to "unstack" the results such as I could have one row per name
as long as unit
and ref
are the same.
e.g. I wish to get something like:
name v1-L v2-W v1-L v2-W unit ref
--------------------------------------------------------------
name1 46504 127 403 psi ref1
name1 127 psi another ref
name2...
so far, I tried to join the same table based on 'name' --for now, no check on unit
neither ref
--:
from sqlalchemy import create_engine
from sqlalchemy import Column, Integer, String, Table, Text, Date, Float
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker, aliased
engine = create_engine('sqlite://') #, echo=True)
Session = sessionmaker(bind=engine)
session = Session()
Base = declarative_base()
class Test(Base):
__tablename__ = 'test'
id = Column(Integer, primary_key=True)
name = Column(String(32))
vector = Column(String(32))
axis = Column(String(1))
value = Column(Float)
unit = Column(String(16), default='psi')
ref = Column(String(32))
Base.metadata.create_all(engine)
# some data to play with
data = [{'name':'name1', 'vector':'v1', 'axis':'W', 'value':'46504', 'unit':'psi', 'ref':'ref1'},
{'name':'name1', 'vector':'v1', 'axis':'L', 'value':'51757', 'unit':'psi', 'ref':'ref1'},
{'name':'name1', 'vector':'v2', 'axis':'W', 'value':'127', 'unit':'psi', 'ref':'another ref'},
{'name':'name1', 'vector':'v2', 'axis':'L', 'value':'403', 'unit':'psi', 'ref':'ref1'},
{'name':'name2', 'vector':'v1', 'axis':'L', 'value':'23000', 'unit':'psi', 'ref':'ref1'},
{'name':'name2', 'vector':'v1', 'axis':'W', 'value':'27000', 'unit':'psi', 'ref':'ref1'},
{'name':'name2', 'vector':'v2', 'axis':'L', 'value':'523', 'unit':'psi', 'ref':'ref1'},
{'name':'name2', 'vector':'v2', 'axis':'W', 'value':'217', 'unit':'psi', 'ref':'ref1'},]
for dic in data:
t = Test(**dic)
session.add(t)
session.commit()
test_alias = aliased(Test)
q = session.query(Test.id, Test.name, Test.value).filter(Test.vector == 'v1').\
join(test_alias, Test.name).filter(test_alias.vector == 'v2')
print q
Here is what I would get using pandas
library.
import pandas as pd
q = session.query(Test).order_by(Test.id) # that is the default table
row2dict = lambda r: {c.name: getattr(r, c.name) for c in r.__table__.columns}
df = pd.DataFrame([row2dict(i) for i in q])
df = df.drop(['id'], axis=1)
df = df.set_index(['ref', 'unit', 'name', 'vector', 'axis']).sort()
df = df.unstack(level=-2).unstack(level=-1)['value'].reset_index()
print(df)
vector ref unit name v1 v2
axis L W L W
0 another ref psi name1 NaN NaN NaN 127
1 ref1 psi name1 51757 46504 403 NaN
2 ref1 psi name2 23000 27000 523 217
...which is not so far from my expect.
So does it make sense to do this in SQL language? Thus my following question is: using Flask
framework, does it make sense to do this data munging using pandas? or should I stick to SQL-language?
Pandas is probably better for this kind of thing. There might be some more exotic SQL functions that do a transformation like this, but I'm not sure. Below is your example doing the most simplistic approach, which is just to JOIN a view of each one together. The way this comes to mind is that since you are producing rows that have "virtual" columns derived from data, that suggests a composite row is being created from more fundamental rows. So the approach below breaks out a set of rows corresponding to each of the four conditions, v1/W, v1/L, v2/W, v2/L. Now if in practice, there's any number of "vN", that's where the transformational ability of Pandas is probably more appropriate.
from sqlalchemy import *
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import Session, aliased
Base = declarative_base()
class Test(Base):
__tablename__ = 'test'
id = Column(Integer, primary_key=True)
name = Column(String(32))
vector = Column(String(32))
axis = Column(String(1))
value = Column(Float)
unit = Column(String(16), default='psi')
ref = Column(String(32))
engine = create_engine('sqlite://', echo=True)
session = Session(engine)
Base.metadata.create_all(engine)
# some data to play with
data = [{'name':'name1', 'vector':'v1', 'axis':'W', 'value':'46504', 'unit':'psi', 'ref':'ref1'},
{'name':'name1', 'vector':'v1', 'axis':'L', 'value':'51757', 'unit':'psi', 'ref':'ref1'},
{'name':'name1', 'vector':'v2', 'axis':'W', 'value':'127', 'unit':'psi', 'ref':'another ref'},
{'name':'name1', 'vector':'v2', 'axis':'L', 'value':'403', 'unit':'psi', 'ref':'ref1'},
{'name':'name2', 'vector':'v1', 'axis':'L', 'value':'23000', 'unit':'psi', 'ref':'ref1'},
{'name':'name2', 'vector':'v1', 'axis':'W', 'value':'27000', 'unit':'psi', 'ref':'ref1'},
{'name':'name2', 'vector':'v2', 'axis':'L', 'value':'523', 'unit':'psi', 'ref':'ref1'},
{'name':'name2', 'vector':'v2', 'axis':'W', 'value':'217', 'unit':'psi', 'ref':'ref1'},]
for dic in data:
t = Test(**dic)
session.add(t)
session.commit()
axis_w = session.query(Test).filter(Test.axis == "W")
axis_l = session.query(Test).filter(Test.axis == "L")
axis_v1_w = axis_w.filter(Test.vector == "v1").subquery()
axis_v1_l = axis_l.filter(Test.vector == "v1").subquery()
axis_v2_w = axis_w.filter(Test.vector == "v2").subquery()
axis_v2_l = axis_l.filter(Test.vector == "v2").subquery()
def join_axes(left, right):
return and_(
left.c.unit == right.c.unit,
left.c.ref == right.c.ref
)
name_unit_ref = session.query(Test.name, Test.unit, Test.ref).distinct().subquery()
q = session.query(name_unit_ref.c.name,
axis_v1_w.c.value.label('v1_w'),
axis_v1_l.c.value.label('v1_l'),
axis_v2_w.c.value.label('v2_w'),
axis_v2_l.c.value.label('v2_l'),
name_unit_ref.c.unit,
name_unit_ref.c.ref
).\
outerjoin(axis_v1_w, join_axes(name_unit_ref, axis_v1_w)).\
outerjoin(axis_v1_l, join_axes(name_unit_ref, axis_v1_l)).\
outerjoin(axis_v2_w, join_axes(name_unit_ref, axis_v2_w)).\
outerjoin(axis_v2_l, join_axes(name_unit_ref, axis_v2_l))
for row in q:
print row