I am working on creating a DWH where I am loading the data in Staging DB and before loading them into final DB I apply all the udfs that I have created on the data.
I was not processing anything on staging to have a quick load.
Question: is it quicker to apply any udfs when the data is in staging itself or should it be done when loading the data to final DB.
Below facility_cd
is a float
value and I am passing it to a function emr_get_code_Description
to get the corresponding description. The table where it's getting the description from is in the final DB. udf_replace_special_char
is a simple function which is replacing a few special characters with NULL.
LTRIM(RTRIM([Dest_DWH].[dbo].udf_replace_special_char([Dest_DWH].[dbo].[emr_get_code_Description](Stg_ap.Facility_cd))))
In general what should be a better practice? Should I be updating this in staging and then load the data after all conversions to Final DB.
Function definitions :
Function 1 :
USE [PROD_DWH]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER function [dbo].[emr_get_code_Description](@cv int)
returns varchar(80)
as begin
-- Returns the code value display
declare @ret varchar(80)
select @ret = cv.DESCRIPTION
from PROD_DWH.DBO.table cv
where cv.code_value = @cv
and cv.active_ind = 1
return isnull(@ret, 0)
end;
Function 2 :
USE [PROD_DWH]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER function [dbo].[udf_replace_special_char](@var varchar(1000))
returns varchar(1000)
as begin
-- Returns the code value display
declare @return_var varchar(1000)
set @return_var = @var
set @return_var = replace(@return_var,CHAR(13),'')
set @return_var = replace(@return_var,CHAR(10),'')
set @return_var = replace(@return_var,CHAR(09),'')
set @return_var = replace(@return_var,CHAR(34),CHAR(39))
return isnull(@return_var, 0)
end;
First of all, as @Nick.McDermaid mentioned in the comments: Best practice is to avoid using User defined functions. There are many links containing information about the functions effects on query performance.
There is not ideal answer for these question, it is related to the case you are working with, but i can give some tips that you can take into consideration:
...
After posting functions in your question, you can replace function 2 with a Derived Column Transformation in your SSIS package:
ISNULL([Column]) ? "" : REPLACE(REPLACE(REPLACE(REPLACE([Column],CHAR(10),""),CHAR(13),""),CHAR(09),""),CHAR(34),CHAR(39))
Also you can replace Function 1 with a Lookup Transformation in SSIS package or with a LEFT JOIN in SQL query.