Search code examples
sqlsql-serverdatabasesplitdata-integrity

Split tables in SQL Server


I have a table called Students that stores all the basic information of students and trainings that they have attended to(The table has more than 15 columns and more than 5000 records). a sample part of the table is like this:

St_id  St_Name  St_University  SoftSkillTraining  StartDate   EndDate     ComputerTraining  StartDate   EndDate 
---------------------------------------------------------------------------------------------------------------
 1      x        x             True               12/02/2017  12/03/2017  False             -           -       
 2      y        x             True               25/05/2016  25/06/2016  True              01/08/2017          

However, The table is not normalized and I need to split the student table into three specific tables (in the form of many to many relations)

  1. Student table that contains basic information of students like:
 St_id    St_Name     St_University     St_Faculty
--------------------------------------------------
1        X           Some University    Law
2        y           Some University    IT
  1. Training table that store 'Training name', 'start date' and 'end date' columns

Training table should be:

TrainingId TrainingName  StartDate   EndDate     TrainingLocation
-----------------------------------------------------------------
1         SoftSkill      12/02/2017  12/03/2017   Some Location
2         SoftSkill      25/02/2016  25/06/2016   Some Location
3         CMOA           01/08/2017  01//09/2017  some location
  1. An intersection table that stores participants of the trainings and stores only primary keys of Student and Training tables as foreign key like below:
st_id      training_id
-----------------------
1           1
2           2
2           1

How can I transfer data from student into Training Table as you can see data of different column from student table should appear as a row in training table using stored procedure ?


Solution

  • You have quite a task to perform, but normalizing that table is the right thing to do. In your sample of the old table I notice you have both [StartDate] & [EndDate] repeating. This isn't possible in SQL Sever, all column names must be unique in a table. I'm hoping this is just a glitch in the sample because it is going to be very important.

    Below I use a method to "unpivot" a student row into multiple shorter rows which represents an interim step to reach your goal. This method uses CROSS APPLY and VALUES. Note here that you will need to prepare this VALUES section manually, but you might be able to get the list of fields from a query against your information schema (this query not provided).

    See a working model of this at SQL Fiddle

    MS SQL Server 2014 Schema Setup:

    CREATE TABLE Student
        ([St_id] int, [St_Name] varchar(1), [St_University] varchar(1)
         , [SoftSkillTraining] varchar(4), [StartDate1] datetime, [EndDate1] datetime
         , [ComputerTraining] varchar(5), [StartDate2] datetime, [EndDate2] datetime)
    ;
    
    INSERT INTO Student
        ([St_id], [St_Name], [St_University]
         , [SoftSkillTraining], [StartDate1], [EndDate1]
         , [ComputerTraining], [StartDate2], [EndDate2])
    VALUES
        (1, 'x', 'x', 'True', '2017-02-12 00:00:00', '2017-03-12 00:00:00', 'False', NULL, NULL),
        (2, 'y', 'x', 'True', '2016-05-25 00:00:00', '2016-06-25 00:00:00', 'True', '2017-08-01', NULL)
    ;
    

    This is the most important Query it "unpivots" source data to multiple rows

    Note how it needs an id to be assigned for each training course, and that the column groups such as [SoftSkillTraining], [StartDate1], [EndDate1] must be specified row by row in the values area. Each row here will result in a new row of output, so the "layout" of the values area basically determines what the final output will be. It is in this area you will need to collect all the column names carefully and arrange them accurately.

    select
        St_id, ca.TrainingId, ca.TrainingName, ca.isEnrolled, ca.StartDate, ca.EndDate
        into training_setup
    from Student
    cross apply (
      values
         (1, 'SoftSkillTraining', [SoftSkillTraining], [StartDate1], [EndDate1])
        ,(2, 'ComputerTraining', [ComputerTraining], [StartDate2], [EndDate2])
      ) ca (TrainingId,TrainingName,isEnrolled, StartDate,EndDate)
    where ca.isEnrolled = 'True'
    ;
    

    Query 2:

    select
    *
    from training_setup
    

    Results:

    | St_id | TrainingId |      TrainingName | isEnrolled |            StartDate |              EndDate |
    |-------|------------|-------------------|------------|----------------------|----------------------|
    |     1 |          1 | SoftSkillTraining |       True | 2017-02-12T00:00:00Z | 2017-03-12T00:00:00Z |
    |     2 |          1 | SoftSkillTraining |       True | 2016-05-25T00:00:00Z | 2016-06-25T00:00:00Z |
    |     2 |          2 |  ComputerTraining |       True | 2017-08-01T00:00:00Z |               (null) |
    

    Query 3:

    -- this can be the basis for table [Training]
    select distinct TrainingId,TrainingName, StartDate,EndDate
    from training_setup
    

    Results:

    | TrainingId |      TrainingName |            StartDate |              EndDate |
    |------------|-------------------|----------------------|----------------------|
    |          1 | SoftSkillTraining | 2016-05-25T00:00:00Z | 2016-06-25T00:00:00Z |
    |          1 | SoftSkillTraining | 2017-02-12T00:00:00Z | 2017-03-12T00:00:00Z |
    |          2 |  ComputerTraining | 2017-08-01T00:00:00Z |               (null) |
    

    NOTE I have reservations about the consistency of this data, notice the start/end dates differ for one course. I don't have a simple solution for that. You may need to scrub your data to minimize thee discrepancies and/or you may need an additional step that matches by the id we used in the cross apply plus the start/end date pairs to arrive at a better version of the training_id by updating the training_setup staging table before moving on.

    Query 4:

    -- this can be the basis for table [Student_Training]
    select St_id, TrainingId
    from training_setup
    

    Results:

    | St_id | TrainingId |
    |-------|------------|
    |     1 |          1 |
    |     2 |          1 |
    |     2 |          2 |