Search code examples
c#azureazure-data-lakeu-sql

Add new column with a specific row value in U-SQL


I have to extract records from a text file in U-SQL. The first row is different from other rows and it contains date. I have to skip the first row but I have to copy the date from first row and paste it into a new column for all rows. So this way in my final u-sql output query, the first column of every row will contain same data which was copied from the first line of the file. Please see the attached image file for more information.

enter image description here

Please suggest me the correct u-sql query to complete this task.


Solution

  • Here's another way. I used the SearchLog.tsv in the Samples to demonstrate this. At the top of the file, I added the row FROM 01JAN17 TO 31JAN17.

    //Skip the first row and read all the other rows
    @searchlog = 
        EXTRACT UserId          int, 
                Start           DateTime, 
                Region          string, 
                Query           string, 
                Duration        int, 
                Urls            string, 
                ClickedUrls     string
        FROM @"/Samples/Data/SearchLogWithHeader.tsv"
        USING Extractors.Tsv(skipFirstNRows: 1);
    
    //Extract all the text in the same file but don't parse out the individual columns
    @searchlogAllText = 
        EXTRACT rowText string
        FROM @"/Samples/Data/SearchLogWithHeader.tsv"
        USING Extractors.Text(delimiter: '\n');
    
    //Find a pattern that works for you and use the .NET expressions that match the string
    @searchlogHeaderDate = 
        SELECT rowText.Split(' ')[1] AS FromDate FROM @searchlogAllText WHERE rowText.StartsWith("FROM");                                    
    
    @output = SELECT * FROM @searchlogHeaderDate CROSS JOIN @searchlog;
    
    OUTPUT @output 
        TO @"/Samples/Output/SearchLog_output.tsv"
        USING Outputters.Tsv();
    

    Input:

    FROM 01JAN17 TO 31JAN17
    399266  2/15/2012 11:53:16 AM   en-us   how to make nachos  73  www.nachos.com;www.wikipedia.com    NULL
    382045  2/15/2012 11:53:18 AM   en-gb   best ski resorts    614 skiresorts.com;ski-europe.com;www.travelersdigest.com/ski_resorts.htm   ski-europe.com;www.travelersdigest.com/ski_resorts.htm
    382045  2/16/2012 11:53:20 AM   en-gb   broken leg  74  mayoclinic.com/health;webmd.com/a-to-z-guides;mybrokenleg.com;wikipedia.com/Bone_fracture   mayoclinic.com/health;webmd.com/a-to-z-guides;mybrokenleg.com;wikipedia.com/Bone_fracture
    106479  2/16/2012 11:53:50 AM   en-ca   south park episodes 24  southparkstudios.com;wikipedia.org/wiki/Sout_Park;imdb.com/title/tt0121955;simon.com/mall   southparkstudios.com
    906441  2/16/2012 11:54:01 AM   en-us   cosmos  1213    cosmos.com;wikipedia.org/wiki/Cosmos:_A_Personal_Voyage;hulu.com/cosmos NULL
    351530  2/16/2012 11:54:01 AM   en-fr   microsoft   241 microsoft.com;wikipedia.org/wiki/Microsoft;xbox.com NULL
    640806  2/16/2012 11:54:02 AM   en-us   wireless headphones 502 www.amazon.com;reviews.cnet.com/wireless-headphones;store.apple.com www.amazon.com;store.apple.com
    304305  2/16/2012 11:54:03 AM   en-us   dominos pizza   60  dominos.com;wikipedia.org/wiki/Domino's_Pizza;facebook.com/dominos  dominos.com
    460748  2/16/2012 11:54:04 AM   en-us   yelp    1270    yelp.com;apple.com/us/app/yelp;wikipedia.org/wiki/Yelp,_Inc.;facebook.com/yelp  yelp.com
    354841  2/16/2012 11:59:01 AM   en-us   how to run  610 running.about.com;ehow.com;go.com   running.about.com;ehow.com
    354068  2/16/2012 12:00:33 PM   en-mx   what is sql 422 wikipedia.org/wiki/SQL;sqlcourse.com/intro.html;wikipedia.org/wiki/Microsoft_SQL    wikipedia.org/wiki/SQL
    674364  2/16/2012 12:00:55 PM   en-us   mexican food redmond    283 eltoreador.com;yelp.com/c/redmond-wa/mexican;agaverest.com  NULL
    347413  2/16/2012 12:11:55 PM   en-gr   microsoft   305 microsoft.com;wikipedia.org/wiki/Microsoft;xbox.com NULL
    848434  2/16/2012 12:12:35 PM   en-ch   facebook    10  facebook.com;facebook.com/login;wikipedia.org/wiki/Facebook facebook.com
    604846  2/16/2012 12:13:55 PM   en-us   wikipedia   612 wikipedia.org;en.wikipedia.org;en.wikipedia.org/wiki/Wikipedia  wikipedia.org
    840614  2/16/2012 12:13:56 PM   en-us   xbox    1220    xbox.com;en.wikipedia.org/wiki/Xbox;xbox.com/xbox360    xbox.com/xbox360
    656666  2/16/2012 12:15:55 PM   en-us   hotmail 691 hotmail.com;login.live.com;msn.com;en.wikipedia.org/wiki/Hotmail    NULL
    951513  2/16/2012 12:17:00 PM   en-us   pokemon 63  pokemon.com;pokemon.com/us;serebii.net  pokemon.com
    350350  2/16/2012 12:18:17 PM   en-us   wolfram 30  wolframalpha.com;wolfram.com;mathworld.wolfram.com;en.wikipedia.org/wiki/Stephen_Wolfram    NULL
    641615  2/16/2012 12:19:55 PM   en-us   kahn    119 khanacademy.org;en.wikipedia.org/wiki/Khan_(title);answers.com/topic/genghis-khan;en.wikipedia.org/wiki/Khan_(name) khanacademy.org
    321065  2/16/2012 12:20:03 PM   en-us   clothes 732 gap.com;overstock.com;forever21.com;footballfanatics.com/college_washington_state_cougars   footballfanatics.com/college_washington_state_cougars
    651777  2/16/2012 12:20:33 PM   en-us   food recipes    183 allrecipes.com;foodnetwork.com;simplyrecipes.com    foodnetwork.com
    666352  2/16/2012 12:21:03 PM   en-us   weight loss 630 en.wikipedia.org/wiki/Weight_loss;webmd.com/diet;exercise.about.com webmd.com/diet
    

    Output:

    "01JAN17"   399266  2012-02-15T11:53:16.0000000 "en-us" "how to make nachos"    73  "www.nachos.com;www.wikipedia.com"  "NULL"
    "01JAN17"   382045  2012-02-15T11:53:18.0000000 "en-gb" "best ski resorts"  614 "skiresorts.com;ski-europe.com;www.travelersdigest.com/ski_resorts.htm" "ski-europe.com;www.travelersdigest.com/ski_resorts.htm"
    "01JAN17"   382045  2012-02-16T11:53:20.0000000 "en-gb" "broken leg"    74  "mayoclinic.com/health;webmd.com/a-to-z-guides;mybrokenleg.com;wikipedia.com/Bone_fracture" "mayoclinic.com/health;webmd.com/a-to-z-guides;mybrokenleg.com;wikipedia.com/Bone_fracture"
    "01JAN17"   106479  2012-02-16T11:53:50.0000000 "en-ca" "south park episodes"   24  "southparkstudios.com;wikipedia.org/wiki/Sout_Park;imdb.com/title/tt0121955;simon.com/mall" "southparkstudios.com"
    "01JAN17"   906441  2012-02-16T11:54:01.0000000 "en-us" "cosmos"    1213    "cosmos.com;wikipedia.org/wiki/Cosmos:_A_Personal_Voyage;hulu.com/cosmos"   "NULL"
    "01JAN17"   351530  2012-02-16T11:54:01.0000000 "en-fr" "microsoft" 241 "microsoft.com;wikipedia.org/wiki/Microsoft;xbox.com"   "NULL"
    "01JAN17"   640806  2012-02-16T11:54:02.0000000 "en-us" "wireless headphones"   502 "www.amazon.com;reviews.cnet.com/wireless-headphones;store.apple.com"   "www.amazon.com;store.apple.com"
    "01JAN17"   304305  2012-02-16T11:54:03.0000000 "en-us" "dominos pizza" 60  "dominos.com;wikipedia.org/wiki/Domino's_Pizza;facebook.com/dominos"    "dominos.com"
    "01JAN17"   460748  2012-02-16T11:54:04.0000000 "en-us" "yelp"  1270    "yelp.com;apple.com/us/app/yelp;wikipedia.org/wiki/Yelp,_Inc.;facebook.com/yelp"    "yelp.com"
    "01JAN17"   354841  2012-02-16T11:59:01.0000000 "en-us" "how to run"    610 "running.about.com;ehow.com;go.com" "running.about.com;ehow.com"
    "01JAN17"   354068  2012-02-16T12:00:33.0000000 "en-mx" "what is sql"   422 "wikipedia.org/wiki/SQL;sqlcourse.com/intro.html;wikipedia.org/wiki/Microsoft_SQL"  "wikipedia.org/wiki/SQL"
    "01JAN17"   674364  2012-02-16T12:00:55.0000000 "en-us" "mexican food redmond"  283 "eltoreador.com;yelp.com/c/redmond-wa/mexican;agaverest.com"    "NULL"
    "01JAN17"   347413  2012-02-16T12:11:55.0000000 "en-gr" "microsoft" 305 "microsoft.com;wikipedia.org/wiki/Microsoft;xbox.com"   "NULL"
    "01JAN17"   848434  2012-02-16T12:12:35.0000000 "en-ch" "facebook"  10  "facebook.com;facebook.com/login;wikipedia.org/wiki/Facebook"   "facebook.com"
    "01JAN17"   604846  2012-02-16T12:13:55.0000000 "en-us" "wikipedia" 612 "wikipedia.org;en.wikipedia.org;en.wikipedia.org/wiki/Wikipedia"    "wikipedia.org"
    "01JAN17"   840614  2012-02-16T12:13:56.0000000 "en-us" "xbox"  1220    "xbox.com;en.wikipedia.org/wiki/Xbox;xbox.com/xbox360"  "xbox.com/xbox360"
    "01JAN17"   656666  2012-02-16T12:15:55.0000000 "en-us" "hotmail"   691 "hotmail.com;login.live.com;msn.com;en.wikipedia.org/wiki/Hotmail"  "NULL"
    "01JAN17"   951513  2012-02-16T12:17:00.0000000 "en-us" "pokemon"   63  "pokemon.com;pokemon.com/us;serebii.net"    "pokemon.com"
    "01JAN17"   350350  2012-02-16T12:18:17.0000000 "en-us" "wolfram"   30  "wolframalpha.com;wolfram.com;mathworld.wolfram.com;en.wikipedia.org/wiki/Stephen_Wolfram"  "NULL"
    "01JAN17"   641615  2012-02-16T12:19:55.0000000 "en-us" "kahn"  119 "khanacademy.org;en.wikipedia.org/wiki/Khan_(title);answers.com/topic/genghis-khan;en.wikipedia.org/wiki/Khan_(name)"   "khanacademy.org"
    "01JAN17"   321065  2012-02-16T12:20:03.0000000 "en-us" "clothes"   732 "gap.com;overstock.com;forever21.com;footballfanatics.com/college_washington_state_cougars" "footballfanatics.com/college_washington_state_cougars"
    "01JAN17"   651777  2012-02-16T12:20:33.0000000 "en-us" "food recipes"  183 "allrecipes.com;foodnetwork.com;simplyrecipes.com"  "foodnetwork.com"
    "01JAN17"   666352  2012-02-16T12:21:03.0000000 "en-us" "weight loss"   630 "en.wikipedia.org/wiki/Weight_loss;webmd.com/diet;exercise.about.com"   "webmd.com/diet"