I have one table which has two columns with data code points like . These code points need to be changed with japanese characters. I have a lookup table of these code points with the japanese characters. But the problem is in both of the columns there are multiple code points in single rows.
Main table:-
Id body subject
1 <U+9876> Hi <U+1234>No <U+6543> <U+9876> Hi <U+1234>No <U+6543>
2 <U+9826> <U+5678><U+FA32> data <U+9006> <U+6502>
Lookup table :-
char value
<U+9876> だ
<U+9826> づ
I tried creating an update query with like operator in inner join but it is taking lot of time as we have 14k rows in main table and 6K values in lookup tables.
If the performance really matters you need to materialized the data in advanced. This can be done creating separate table and using a trigger or modifying the routine that is populating the original table. If you are records are not inserted/updated on batches you will not harm the CRUD execution time.
You can easy create a good looking short T-SQL statement for building dynamic code for performing 6K updates, so you can give this a shot, too - don't use LIKE
or complex conditions - just simple UPDATE-REPLACE
statements for each lookup value.
In some cases, I am using SQL CLR functions for such replaces. For example:
DECLARE @Main TABLE
(
[id] TINYINT
,[body] NVARCHAR(MAX)
,[subject] NVARCHAR(MAX)
);
DECLARE @Lookup TABLE
(
[id] TINYINT -- you can use row_number to order
,[char] NVARCHAR(32)
,[value] NVARCHAR(32)
);
INSERT INTO @Main ([id], [body], [subject])
VALUES (1, '<U+9876> Hi <U+1234>No <U+6543>', '<U+9876> Hi <U+1234>No <U+6543>')
,(2, '<U+9826> <U+5678><U+FA32> data', '<U+9006> <U+6502>');
INSERT INTO @Lookup ([id], [char], [value])
VALUES (1, '<U+9876>', N'だ')
,(2, '<U+9826>', N'づ');
DECLARE @Pattern NVARCHAR(MAX)
,@Replacement NVARCHAR(MAX);
SELECT @Pattern = [dbo].[ConcatenateWithOrderAndDelimiter] ([id], [char], '|')
,@Replacement = [dbo].[ConcatenateWithOrderAndDelimiter] ([id], [value], '|')
FROM @Lookup;
UPDATE @Main
SET [body] = [dbo].[fn_Utils_ReplaceStrings] ([body], @Pattern, @Replacement, '|')
,[subject] = [dbo].[fn_Utils_ReplaceStrings] ([subject], @Pattern, @Replacement, '|');
SELECT [id]
,[body]
,[subject]
FROM @Main;
I am showing you the code behind below, but this is just an idea. You are free to implement something on your own, which satisfy your performance requirements.
Here, you can see how SQL CLR function is created. Here, is a variant of aggregate function concatenating with order:
[Serializable]
[
Microsoft.SqlServer.Server.SqlUserDefinedAggregate
(
Microsoft.SqlServer.Server.Format.UserDefined,
IsInvariantToNulls = true,
IsInvariantToDuplicates = false,
IsInvariantToOrder = false,
IsNullIfEmpty = false,
MaxByteSize = -1
)
]
/// <summary>
/// Concatenates <int, string, string> values defining order using the specified number and using the given delimiter
/// </summary>
public class ConcatenateWithOrderAndDelimiter : Microsoft.SqlServer.Server.IBinarySerialize
{
private List<Tuple<int, string>> intermediateResult;
private string delimiter;
private bool isDelimiterNotDefined;
public void Init()
{
this.delimiter = ",";
this.isDelimiterNotDefined = true;
this.intermediateResult = new List<Tuple<int, string>>();
}
public void Accumulate(SqlInt32 position, SqlString text, SqlString delimiter)
{
if (this.isDelimiterNotDefined)
{
this.delimiter = delimiter.IsNull ? "," : delimiter.Value;
this.isDelimiterNotDefined = false;
}
if (!(position.IsNull || text.IsNull))
{
this.intermediateResult.Add(new Tuple<int, string>(position.Value, text.Value));
}
}
public void Merge(ConcatenateWithOrderAndDelimiter other)
{
this.intermediateResult.AddRange(other.intermediateResult);
}
public SqlString Terminate()
{
this.intermediateResult.Sort();
return new SqlString(String.Join(this.delimiter, this.intermediateResult.Select(tuple => tuple.Item2)));
}
public void Read(BinaryReader r)
{
if (r == null) throw new ArgumentNullException("r");
int count = r.ReadInt32();
this.intermediateResult = new List<Tuple<int, string>>(count);
for (int i = 0; i < count; i++)
{
this.intermediateResult.Add(new Tuple<int, string>(r.ReadInt32(), r.ReadString()));
}
this.delimiter = r.ReadString();
}
public void Write(BinaryWriter w)
{
if (w == null) throw new ArgumentNullException("w");
w.Write(this.intermediateResult.Count);
foreach (Tuple<int, string> record in this.intermediateResult)
{
w.Write(record.Item1);
w.Write(record.Item2);
}
w.Write(this.delimiter);
}
}
Here is one variant of function performing replacement:
[SqlFunction(DataAccess = DataAccessKind.None, IsDeterministic = true)]
public static SqlString ReplaceStrings( SqlString input, SqlString pattern, SqlString replacement, SqlString separator ){
string output = null;
if(
input.IsNull == false
&& pattern.IsNull == false
&& replacement.IsNull == false
){
StringBuilder tempBuilder = new StringBuilder( input.Value );
if( separator.IsNull || String.IsNullOrEmpty( separator.Value ) ){
tempBuilder.Replace( pattern.Value, replacement.Value );
}
else{
//both must have the exact number of elements
string[] vals = pattern.Value.Split( new[]{separator.Value}, StringSplitOptions.None ),
newVals = replacement.Value.Split( new[]{separator.Value}, StringSplitOptions.None );
for( int index = 0, count = vals.Length; index < count; index++ ){
tempBuilder.Replace( vals[ index ], newVals[ index ] );
}
}
output = tempBuilder.ToString();
}
return output;
}
or this one but using regex:
[SqlFunction(DataAccess = DataAccessKind.None, IsDeterministic = true, Name = "RegexReplaceStrings")]
public static SqlString ReplaceStrings(SqlString sqlInput, SqlString sqlPattern, SqlString sqlReplacement, SqlString sqlSeparator)
{
string returnValue = "";
// if any of the input parameters is "NULL" no replacement is performed at all
if (sqlInput.IsNull || sqlPattern.IsNull || sqlReplacement.IsNull || sqlSeparator.IsNull)
{
returnValue = sqlInput.Value;
}
else
{
string[] patterns = sqlPattern.Value.Split(new string[] {sqlSeparator.Value}, StringSplitOptions.None);
string[] replacements = sqlReplacement.Value.Split(new string[] { sqlSeparator.Value }, StringSplitOptions.None);
var map = new Dictionary<string, string>();
// The map structure is populated with all values from the "patterns" array as if no corresponding value exists
// in the "replacements" array the current value from the "pattern" array is used a a replacement value. The
// result is no replacement is done in the "sqlInput" string if the given "pattern" is matched.
for (int index = 0; index < patterns.Length; index++)
{
map[patterns[index]] = index < replacements.Length ? replacements[index] : patterns[index];
}
returnValue = Regex.Replace(sqlInput.Value, String.Join("|", patterns.Select(patern => Regex.Replace(patern, @"\(|\)|\||\.", @"\$&")).OrderByDescending(patern => patern.Length).ToArray()), match =>
{
string currentValue;
if (!map.TryGetValue(match.Value, out currentValue))
{
currentValue = match.Value;
}
return currentValue;
});
}
return new SqlString(returnValue);
}