Search code examples
visual-foxprofoxpro

Data Calculation for joining two tables


I am studying Foxpro to create a simple application for manipulating data from two tables A and B (size of tableB >> size of tableA). The data from an Excel spreadsheet is imported into these two tables.

tableA
id            balance    load    state     
1             10         null    l
2             22         null    l 
3             31         null    l

tableB  
Load id     id      ord        fact   type   1st value  rounded value   state
    1        1        1        0.09      1      null        null         l
    2        1        2        0.02      0      null        null         l
    3        1        3        0.13      1      null        null         l
    4        1        4       -0.05      0      null        null         l
    5        2        1        0.01      1      null        null         l
    6        2        2        0.092     1      null        null         l
    7        2        3        0.03      0      null        null         l
    8        3        1        0.14      1      null        null         l
    9        3        2        0.12      0      null        null         l
   10        3        3       -0.02      0      null        null         l

My friend wants me to write a Foxpro code to do the following things: first, create empty tableA and tableB containing the columns shown above. Each columns will be loaded by (hundreds of thousands) of data from an excel spreadsheet everyday. Second, for each unique id, the code updates the 3 columns 1st value, rounded value and load with given formulas:

1st value[i] = If(Type[i]=0, load[i-1]*fact[i], load[i-1]*fact[i]/(1-fact[i]))

1st value[1] = If(Type[1]=0, balance[1]*fact[1], balance[1]*fact[1]/(1-fact[1]))

rounded value[i] = If(1st value[i]>0, rounddown(1st value[i], 1), roundup(1st value[i],2)

load[i+1] = load[i] + rounded value[i+1] (i >= 1)

load[1] = balance[1] + rounded value[1]

I think I have to create a table like the following to store the calculation above for this step:

Calculation Table
  balance     id      ord    1st value  rounded value    load    
  10          1        1      0.989         0.90        10.9 (= 10 + 0.9)
  10.9        1        2      0.218         0.20        11.1 (= 10.9 + 0.2)
  11.1        1        3      1.658         1.60        12.7 (= 11.1 + 1.6)
  11.06       1        4     -0.635        -0.64        11.06 (=12.7 + (-0.64))

Desired output

Using results in Calculation Table, we update the original tableA and tableB as follows:

tableB    
Load id     id      ord       1st value  rounded value   state
    1        1        1        0.989          0.90       calculated        
    2        1        2        0.218          0.20       calculated     
    3        1        3        1.658          1.60       calculated 
    4        1        4       -0.635         -0.64      calculated 
    5        2        1        ...            ....      calculated 
    6        2        2        ...            ....      calculated 

tableA (Note: for each value in `load id`, the `load` column only stores the **last** value in the `calculation` table which corresponds to maximum `ord`)
id            balance    load    state     
1             10         9.5     calculated
2             22         ...     calculated 
3             31         ...     calculated

Can anyone please help me with the syntax for creating tableB, computing and store results for columns 1st value, rounded value and load into a calculation table with Inner Join function on id column between tableA and tableB , and update tableB?

My attempt:

First step (Creating two tables A and B with column fields shown above)

CREATE TABLE tableA; 
(     id    int,    ;
      balance   double, ;
      load   C(240), ;
      state  C(240), ;)

CREATE TABLE tableB; 
(     Load id   int, ;
      id        int, ;
      ord       int, ;
      fact      double,  ;
      type      binary (not sure....)  ;
     1st value  C(240),;
      rounded value  C(240), ;
      state     C(240), ;)

Solution

  • (adding as another answer just because others got too long to read)

    can you try your code with this dataset (drive.google.com/open?id=1uCWwt5ubd2_F8w2gsh3v4VDpibWz7PAz) to see if you will get the two output tables from your code, each similar to the one shown in the previous Excel worksheet I uploaded for you?

    I downloaded that spreadsheet and here is what I needed to change: Your ranges were C8:F35 and H8:O62 for tableA and B. Also your "balance" was named "base". New code (downloaded to d:\temp\workbook2.xlsx) edited to match ranges and "balance" to "base":

    * Get the data from given excel filename and ranges
    * first range is tableA, second one is tableB
    GetDataFromExcel("d:\temp\WorkBook2.xlsx", "Sheet1$C8:F35", "Sheet1$H8:O62")
    
    * Now data is in cursors csrA and crsB do the calculation in these
    DoCalculation()
    
    * Done. Show the results selecting and browsing the crsA and B
    Select crsA
    Browse
    Select crsB
    Browse
    
    * Get specific fields only from crsB
    Select loadId, id, ord, firstVal, roundedVal, state ;
    from crsB ;
    into cursor crsBCustom ;
    nofilter
    browse
    
    * Check data from both cursors (join)
    * I chose the fields as I see fit
    * ta and tb are local aliases for crsA and crsB
    * helping to write shorter SQL in this case
    
    Select tb.LoadId, tb.Id, ta.base, ta.load, ; 
           tb.firstValue, tb.roundVal, ;
           ta.State as StateA, tb.State as StateB ;    
    from crsA ta ;
    inner join crsB tb on ta.Id = tb.Id ;
    order by tb.Id, tb.Ord ;
    into cursor crsBoth ;
    NoFilter
    browse
    
    
    * Does the specific calculations on specific data
    Procedure DoCalculation
        *1st value[1] = If(Type[1]=0, Base[1]*fact[1], Base[1]*fact[1]/(1-fact[1]))
    
        *rounded value[i] = If(1st value[i]>0, rounddown(1st value[i], 1), roundup(1st value[i],2)
        *rounded value[1] = If(1st value[1]>0, rounddown(1st value[1], 1), roundup(1st value[1],2)
    
        *load[1] = Base[1] + rounded value[1]
    
        * i > 1 - ord > 1
        *1st value[i] = If(Type[i]=0, load[i-1]*fact[i], load[i-1]*fact[i]/(1-fact[i]))
    
        *rounded value[i] = If(1st value[i]>0, rounddown(1st value[i], 1), roundup(1st value[i],2)
    
        *load[i+1] = load[i] + rounded value[i+1] (i >= 1)
    
        *declare local variable
        Local lnBase
    
        * select crsB and create an index there
        Select CrsB
        Index On Padl(Id,10,'0')+Padl(ord,10,'0') Tag ALinkB
    
        * select crsA as parent and link to crsB 
        * using the "id" part of index
        Select crsA
        Set Relation To Padl(Id,10,'0') Into CrsB
    
        * start looping the rows
        Scan
            * working with a new Id (1, 2, ...)
            * save base value to m.lnBase
            lnBase = crsA.Base
    
            * select crsB and start looping the rows there
            * because of the index in effect and the relation created
            * pointer would be on the first crsB row with a matching Id 
            * and since Ord is also part of the index the first row of 
            * given Id
            * Limit the looping in crsB (child table) to Id in crsA
            * using WHILE clause 
            Select CrsB
            Scan While Id = crsA.Id
                * do replacing starting on first row of this Id (Ord=1)
                * we don't have any scope clauses in replace, thus 
                * we are doing "single row" updates
    
                Replace ;
                    firstValue With m.lnBase*fact / Iif(!Type, 1, 1-fact),  ;
                    roundVal With Iif(firstValue > 0, ;
                    roundDown(firstValue,1), ;
                    roundUp(firstValue, 2))
                * after each replace update m.lnBase value 
                * to use in next row   
                lnBase = m.lnBase + CrsB.roundVal
            Endscan
            * completed updating crsB
            * select crsA and also update crsA.base with final 'load' value
            Select crsA
            Replace Load With m.lnBase
        Endscan
        * Update state to 'Calculated'
        Update crsA set state = 'Calculated'
        Update crsB set state = 'Calculated'
    Endproc
    
    * Get data from excel with given filename and ranges
    * This code is not generic and expects the 
    * data to be in a specific format.
    * Does not do any error check 
    Procedure GetDataFromExcel(tcExcelFileName, tcTableARange, tcTableBRange)
        * declare and define the connection string to excel
        Local lcConStr
        lcConStr = ;
            'Provider=Microsoft.ACE.OLEDB.12.0;'+;
            'Data Source='+Fullpath(m.tcExcelFileName)+';'+;
            'Extended Properties="Excel 12.0;HDR=Yes"'
    
        * Declare and define the 2 SQL needed to get data for A and B
        * rename the fields in SQL for easier handling
        Local lcSQLA, lcSQLB
        TEXT to lcSQLA textmerge noshow
    Select [id], [base], [load], [state]
    from [<< m.tcTableARange >>]
        ENDTEXT
    
        TEXT to m.lcSQLB textmerge noshow
    select
       [Load Id] as LoadId,
       [Id], [Ord], [Fact], [Type],
       [1st value] as firstValue,
       [Rounded value] as roundVal,
       [State]
    from [<< m.tcTableBRange >>]
        ENDTEXT
    
        * Execute the queries and place results in given cursors 
        ADOQuery(m.lcConStr, m.lcSQLA, "crsTableA")
        ADOQuery(m.lcConStr, m.lcSQLB, "crsTableB")
    
        * Sanitize the cursors a bit
        * (OledB query would assign rather generic datatypes)
        Select Cast(Id As Int) As Id, Cast(Base As Double) As Base, ;
            Cast(Load As Double) As Load, Cast(State As c(50)) As State ;
            from crsTableA ;
            into Cursor crsA ;
            readwrite
        Select Cast(LoadId As Int) As LoadId, ;
            Cast(Id As Int) As Id, Cast(ord As Int) As ord, ;
            Cast(fact As Double) As fact, Cast(Type As logical) As Type, ;
            Cast(firstValue As Double) As firstValue, ;
            Cast(roundVal As Double) As roundVal, ;
            Cast(State As c(50)) As State From crsTableB ;
            into Cursor CrsB ;
            readwrite
        Use In (Select('crsTableA'))
        Use In (Select('crsTableB'))
    Endproc
    
    * roundUp and down custom functions
    
    * RoundUp and Down excel style
    * Not correct math wise IMHO
    Procedure roundUp(tnValue, tnPlaces)
        Local lnResult, lnValue
        lnValue = Abs(m.tnValue)
        If Round(m.lnValue, m.tnPlaces) != m.lnValue
            lnValue = Round(m.lnValue+((10^-(m.tnPlaces+1))*5), m.tnPlaces)
        Endif
        Return Sign(m.tnValue) * m.lnValue
    Endproc
    
    Procedure roundDown(tnValue, tnPlaces)
        Local lnResult, lnValue
        lnValue = Abs(m.tnValue)
        If Round(m.lnValue, m.tnPlaces) != m.lnValue
            lnValue = Round(m.lnValue-((10^-(m.tnPlaces+1))*5), m.tnPlaces)
        Endif
        Return Sign(m.tnValue) * m.lnValue
    Endproc
    
    
    * Generic function to query a given data source
    * and place results in a cursor  
    Procedure ADOQuery(tcConStr,tcQuery,tcCursorName)
        Local oConn As 'ADODB.Connection'
        Local oRS As ADODB.RecordSet
        oConn = Createobject('ADODB.Connection')
        oConn.Mode= 1  && adModeRead
        oConn.Open( m.tcConStr )
        oRS = oConn.Execute(m.tcQuery)
        RS2Cursor(oRS,m.tcCursorName)
        oRS.Close
        oConn.Close
    Endproc
    
    * Helper function to ADOQuery to convert
    * an ADODB.Recordset to a VFP cursor
    Procedure RS2Cursor(toRS, tcCursorName) && simple single cursor - not intended for complex ones
        tcCursorName = Iif(Empty(m.tcCursorName),'ADORs',m.tcCursorName)
        Local xDOM As 'MSXML.DOMDocument'
        xDOM = Createobject('MSXML.DOMDocument')
        toRS.Save(xDOM, 1)
        Xmltocursor(xDOM.XML, m.tcCursorName)
    Endproc
    

    This is the whole code. Just changing the filepath and name to yours, select all the code, right click and execute selection to see results. Or save it as a prg, say ImportMyExcel.prg and run it:

    ImportMyExcel()
    

    You could see the results I have so I didn't upload any results.

    Also, is Procedure RS2Cursor(toRS, tcCursorName) intended to generate the 2 output tables? Why do we need this procedure though: Procedure ADOQuery(tcConStr,tcQuery,tcCursorName)?

    Well those procedures are a little tricky for a newcomer (maybe not). I think you should know the history of VFP, cursors, cursor adapters, converting ADO recordset to a cursor etc (probably advanced level). I don't know, those were the procedures I came up with and published also on the foxite link that I gave to you. Just think they are black boxed (like a built-in one) functions doing they are work. ADOQuery's work is to simply query an OLEDB source and return the result as a cursor. With a cursorAdapter you might not need such a procedure but that procedure was designed before CursorAdapter existence.

    Two more questions please: 1) where does the m come from in m.lnBalance?

    m. explicitly notifies the compiler that it is a memory variable. It is referred to as MDOT. There are developers who claim it is not needed and generally it leads to long running discussions (and likely you would find my name in those discussions). Up until today nobody could show and\or demonstrate me why we shouldn't or we don't need to use it. If you believe me it is not a preference but a thing that you should use.

    2) Don't we need to define crsTableA? Or you meant we can use the CREATE Table tableA in your previous code to make crsTableA valid?

    No. There is no table in that code. We read the data from excel into a cursor (crsTableA and crsTableB initially) and then sanitize into 2 cursors crsA and crsB. All of them are cursors. Cursors are like tables but are not persisted on disk. They may even spend all their life in memory and are gone when you close them. Here I preferred cursors because without harming any real data you could run N times and check your results. When you are satisfied persisting the data is as simple as a "Select ... into" or "insert into ..." (there are more ways too) a table. Even in the case of a table you don't need to use "Create Table ...". A "select Into ..." command can select the data from a source and save it to a table by creating it (like a combined 'create table ...' and then 'insert into ...').

    Also, I saw that B9:E12 does not match the range of tableA or tableB in the Excel spreadsheet I uploaded for you before. Am I missing something here?

    It matched your original samples if you think data starts at B9 and G9 respectively.

    I have another question: can you please clarify on what these lines do: Select CrsB Index On Padl(Id,10,'0')+Padl(ord,10,'0') Tag ALinkB Select crsA Set Relation To Padl(Id,10,'0') Into CrsB.

    I think I explained this part in the previous question. I will soon comment the code itself.