I have a table called Students that stores all the basic information of students and trainings that they have attended to(The table has more than 15 columns and more than 5000 records). a sample part of the table is like this:
St_id St_Name St_University SoftSkillTraining StartDate EndDate ComputerTraining StartDate EndDate
---------------------------------------------------------------------------------------------------------------
1 x x True 12/02/2017 12/03/2017 False - -
2 y x True 25/05/2016 25/06/2016 True 01/08/2017
However, The table is not normalized and I need to split the student table into three specific tables (in the form of many to many relations)
Student
table that contains basic information of students like:St_id St_Name St_University St_Faculty -------------------------------------------------- 1 X Some University Law 2 y Some University IT
Training
table that store 'Training name', 'start date' and 'end date' columnsTraining
table should be:
TrainingId TrainingName StartDate EndDate TrainingLocation ----------------------------------------------------------------- 1 SoftSkill 12/02/2017 12/03/2017 Some Location 2 SoftSkill 25/02/2016 25/06/2016 Some Location 3 CMOA 01/08/2017 01//09/2017 some location
intersection
table that stores participants of the trainings and stores only primary keys of Student
and Training
tables as foreign key like below:st_id training_id ----------------------- 1 1 2 2 2 1
How can I transfer data from student
into Training
Table as you can see data of different column from student
table should appear as a row in training
table using stored procedure ?
You have quite a task to perform, but normalizing that table is the right thing to do. In your sample of the old table I notice you have both [StartDate] & [EndDate] repeating. This isn't possible in SQL Sever, all column names must be unique in a table. I'm hoping this is just a glitch in the sample because it is going to be very important.
Below I use a method to "unpivot" a student row into multiple shorter rows which represents an interim step to reach your goal. This method uses CROSS APPLY
and VALUES
. Note here that you will need to prepare this VALUES
section manually, but you might be able to get the list of fields from a query against your information schema (this query not provided).
See a working model of this at SQL Fiddle
MS SQL Server 2014 Schema Setup:
CREATE TABLE Student
([St_id] int, [St_Name] varchar(1), [St_University] varchar(1)
, [SoftSkillTraining] varchar(4), [StartDate1] datetime, [EndDate1] datetime
, [ComputerTraining] varchar(5), [StartDate2] datetime, [EndDate2] datetime)
;
INSERT INTO Student
([St_id], [St_Name], [St_University]
, [SoftSkillTraining], [StartDate1], [EndDate1]
, [ComputerTraining], [StartDate2], [EndDate2])
VALUES
(1, 'x', 'x', 'True', '2017-02-12 00:00:00', '2017-03-12 00:00:00', 'False', NULL, NULL),
(2, 'y', 'x', 'True', '2016-05-25 00:00:00', '2016-06-25 00:00:00', 'True', '2017-08-01', NULL)
;
This is the most important Query it "unpivots" source data to multiple rows
Note how it needs an id to be assigned for each training course, and that the column groups
such as [SoftSkillTraining], [StartDate1], [EndDate1]
must be specified row by row in the values area. Each row here will result in a new row of output, so the "layout" of the values area basically determines what the final output will be. It is in this area you will need to collect all the column names carefully and arrange them accurately.
select
St_id, ca.TrainingId, ca.TrainingName, ca.isEnrolled, ca.StartDate, ca.EndDate
into training_setup
from Student
cross apply (
values
(1, 'SoftSkillTraining', [SoftSkillTraining], [StartDate1], [EndDate1])
,(2, 'ComputerTraining', [ComputerTraining], [StartDate2], [EndDate2])
) ca (TrainingId,TrainingName,isEnrolled, StartDate,EndDate)
where ca.isEnrolled = 'True'
;
Query 2:
select
*
from training_setup
| St_id | TrainingId | TrainingName | isEnrolled | StartDate | EndDate |
|-------|------------|-------------------|------------|----------------------|----------------------|
| 1 | 1 | SoftSkillTraining | True | 2017-02-12T00:00:00Z | 2017-03-12T00:00:00Z |
| 2 | 1 | SoftSkillTraining | True | 2016-05-25T00:00:00Z | 2016-06-25T00:00:00Z |
| 2 | 2 | ComputerTraining | True | 2017-08-01T00:00:00Z | (null) |
Query 3:
-- this can be the basis for table [Training]
select distinct TrainingId,TrainingName, StartDate,EndDate
from training_setup
| TrainingId | TrainingName | StartDate | EndDate |
|------------|-------------------|----------------------|----------------------|
| 1 | SoftSkillTraining | 2016-05-25T00:00:00Z | 2016-06-25T00:00:00Z |
| 1 | SoftSkillTraining | 2017-02-12T00:00:00Z | 2017-03-12T00:00:00Z |
| 2 | ComputerTraining | 2017-08-01T00:00:00Z | (null) |
NOTE I have reservations about the consistency of this data, notice the start/end dates differ for one course. I don't have a simple solution for that. You may need to scrub your data to minimize thee discrepancies and/or you may need an additional step that matches by the id we used in the cross apply plus the start/end date pairs to arrive at a better version of the training_id by updating the training_setup staging table before moving on.
Query 4:
-- this can be the basis for table [Student_Training]
select St_id, TrainingId
from training_setup
| St_id | TrainingId |
|-------|------------|
| 1 | 1 |
| 2 | 1 |
| 2 | 2 |