Disclaimer: I'm not a SQL expert. I'm trying to insert records into a child table before inserting them into the parent table. (After saying that I'm starting to wonder if this is even a good idea.) The parent table record holds a reference to the child table record, and said-reference cannot be null. This necessitates me inserting into the child table first, then linking to the parent table during a secondary insert.
Anyway for some reason, this code randomly produces orphaned records in the IdentifyingData (child) table e.g., they have no entry in the FraudScore (parent) table even though they should.
Here's why I'm confused. In trying to resolve this issue, I started dumping the contents of the @tempFraudScore table into a physical audit table so I can see exactly what's going on during data transformation. When I switch the below code that inserts into FraudScore from @tempFraudScore to insert from the audit table, all the child records successfully get a parent record created. This makes no sense to me.
insert into IdentifyingData (EntryDateTime, IdentifyingDataTypeId, Value, Source)
select distinct GETDATE(), tfs.IdentifyingDataTypeId, tfs.Value, 'SSIS'
from @tempFraudScore tfs
where not exists (
select id.IdentifyingDataTypeId, id.Value
from IdentifyingData id
where tfs.IdentifyingDataTypeId = id.IdentifyingDataTypeId
and tfs.Value = id.Value
);
update tfs
set tfs.IdentifyingDataId = id.Id
from @tempFraudScore tfs
inner join IdentifyingData id on
tfs.Value = id.Value and
tfs.IdentifyingDataTypeId = id.IdentifyingDataTypeId;
insert into FraudScore (EntryDateTime, FraudCriteriaId, AccountId, IdentifyingDataId, Score, Source)
select distinct
GETDATE() EntryDateTime,
tfs.FraudCriteriaId,
tfs.AccountId,
tfs.IdentifyingDataId,
tfs.Score,
'SSIS'
from @tempFraudScore tfs
inner join FraudCriteria fc on
tfs.FraudCriteriaId = fc.Id
and fc.UniqueEntryPeriod = 0
where not exists (
select fs.AccountId, fs.FraudCriteriaId, fs.IdentifyingDataId
from FraudScore fs
where tfs.AccountId = fs.AccountId
and tfs.FraudCriteriaId = fs.FraudCriteriaId
and tfs.IdentifyingDataId = fs.IdentifyingDataId
);
@tempFraudScore comes pre-populated with all of the necessary fields except for IdentifyingDataId; that has to be created by first inserting into IdentifyingData, then updating the variable table with the created ID. Below is the structure of the variable table:
declare @tempFraudScore table(
FraudCriteriaId int,
AccountId bigint,
IdentifyingDataId bigint,
IdentifyingDataTypeId smallint,
Value varchar(100),
Score int
);
Could someone please tell me what could be causing these orphaned IdentifyingData records? Should I reconsider how the relationships between these two tables are structured? I'm trying to make things so that once a certain IdentifyingData record is put into the system, it won't get duplicated; it'll simply be referenced by newly created FraudScore records.
Edit Attached is a screenshot from the audit table that shows the progress of data transformation for a single value (the Value column is the same value for these records; I'm blurring it out for privacy's sake). Note that despite the message "Post-FraudScore Insert", the record in question was never actually inserted into the FraudScore table.
Edit2 (2/6/2018): I've added the following code to the stored procedure in trying to troubleshoot this issue. I had a value (99999) that appeared in the _Audit table's Value column, but not the second table's Value column despite the code simply dumping all data into these two tables from the same source! I'm not sure if it matters, but this stored procedure gets kicked off from an SSIS package's Execute SQL Task with an IsolationLevel of "Serializable". That-said, I'm not explicitly using transactions anywhere in the code, and the TransactionOption for that Execute SQL Task is set as "Supported". I have no clue if this would have anything to do with the issue.
insert into FraudScoreIdentifyingData_Audit
select 'Post-IdentifyingData Update', GETDATE(), FraudCriteriaId, AccountId, IdentifyingDataId, IdentifyingDataTypeId, Value, Score
from @tempFraudScore;
insert into FraudScoreIdentifyingData
select GETDATE(), FraudCriteriaId, AccountId, IdentifyingDataId, IdentifyingDataTypeId, Value, Score, 1
from @tempFraudScore;
Here are the two tables' schemas:
Turns out there was a single delete statement buried in one of my large stored procedures that was written incorrectly that was causing the problem.
In searching for the cause of this issue, I also had a DBA sit with me and he identified a part of my SSIS process that was reorganizing indexes; but it was doing so as the package continued to run and populate all the necessary underlying tables (including the one with the orphaned records). According to him, reorganizing or rebuilding indexes on tables while simultaneously attempting to add or remove records to those tables could also cause this problem; although in my specific case it was the incorrectly written, single delete statement.