Search code examples
mysqltalend

TOS DI variable in tMySqlInput


I'm relatively new to Talend OSDI. I managed to do simple request in MySql with tMySqlInput component. However today I have a more ambitious request and have some trouble to make it work. Indeed I need a request where the result depends on the previous line. I made it on MySQLWorkbench but not on Talend. Exemple : delay time between two dates. Here is the request :

SET @var = NULL;
SELECT id, start_date, end_date, @var precedent, UNIX_TIMESTAMP(TIMEDIFF(start_date,@var)) AS diff, @var:=start_date AS temp
FROM ma_table
ORDER BY start_date;

and errors are :

  • You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'SELECT id, start_date, end_date, id_process_type, @var precedent, UNIX_TIMESTAMP' at line 2

...Not very usefull, Is this syntax forbidden on Talend ? Do it exists others solutions to do such requests on Talend ? (for delay time between two dates for examples) or other component maybe ? I am searching with tMysqlRow.

Thanks for ideas !


Solution

  • As @Gabriele B mentions, you might want to consider doing this in a more "Talend" way.

    I'd personally make use of the tMemorizeRows component to do this though.

    Job layout

    To simplify this I've gone and made the start and end dates as integers but it should be trivial to handle this using proper dates.

    Sample data

    If we have some data that shows the start and end date of a process and we want to work out the delay between finishing the last one and starting the next process we can read all of the data in and then use the tMemorizeRows component to remember the last 2 rows:

    tMemorizeRows component configuration

    We then access the memorized data by looking at the array index. So here we go to a tJavaRow component that has an extra output column, startdelay. We then calculate it by comparing the current process' start day minus the last process' end date:

    output_row.id = input_row.id;
    output_row.startdate = input_row.startdate;
    output_row.enddate = input_row.enddate; 
    if (id_tMemorizeRows_1[0] != 1) {
        output_row.startDelay = startdate_tMemorizeRows_1[0] - enddate_tMemorizeRows_1[1];
    } else {
        output_row.startDelay = 0;
    }
    

    The conditional statement it to avoid null pointer errors on the first run of the data as the enddate_tMemorizeRows_1[1] will be null at that point. You could handle the null in other ways of course.

    This process is reasonably easy to understand and maintain (although there is that small bit of Java code in there) and has the benefits of only needing the load the data once and only keep a small part of it in memory at any one time. It should also be very fast.