Search code examples
sqlsql-serverleft-joininner-joinrecord

seeking SQL join example that removes offending records (row-removing row-cleansing SQL join)


The SQL engine in question is SSMS (SQL Server Management Studio V18.5), although I doubt the version is relevant for this inquiry.

I have a table A that's an information source. Also there's a table B which is a log of results from a complex query X containing about six joins. A tool processes output from that query to accomplish a work effort in an external tool, and then logs those work details to table C.

So we have:

  • Table-A: data source
  • Complex-Query-X: multi-join query producing work-item list (Table-A is one of many sources for it)
  • Table-B: list of output results from Complex-Query-X over time
  • Table-C: list of work items out of Complex-Query-X that have a work-effort performed upon them, work done, results of that work, etc

The query I seek to create has the goal of removing rows from the complex query above if they have already been processed (meaning they appear in Table-C). The hard goal at this point is to reduce work because there are hundreds of thousands of records being processed and we only want to process records one time, so using the result set to restrict the work-to-do list is an obvious target.

I'm not new to SQL but I've never tried to do record-removal with joins before. A bevy of references online deal with SQL joins, the hows and the whys of them, but none of them detail removing data in a way I've seen that conforms with our goals.

It also occurred to me that I'm old and I might be missing something obvious, should I only be adding data to my big join query, hence my inquiry here (eg -- so it wouldn't need removing). Is that where I faltered?


Solution

  • The answer ended up being a collection of options from which I chose the third:

    • limit original query so 'data to remove' wasn't present after the core query ran -- that option was a bit complicated to contemplate given the way the data is put together (hundreds of thousands of rows, about seven (7) sources), however it was at least conceivable.
    • a left-join criteria late in the query join-structure that forces null-key-identifiers upon non-matching-rows (making them easy to ignore by tools that consume said query)
    • a where clause that automatically drops target rows -- this was the simplest and was also at least comparatively performant in the analysis with both other query options -- so as stated above this was in fact our chosen direction along with a left-join for reasons that will not be detailed here

    My personal thanks to everyone for both input and critiques, such as they were. My personal apologies for not being more forthcoming about what precisely is happening with actual source and data samples, the institution that employs me is quite secretive about that sort of thing and I was too busy / lazy to fudge enough of an example when I had other options.

    Regardless, questions of this type are really more academic and shouldn't require hard examples, and I'm saying that knowing that if I were in the shoes of a reader that I myself would actually prefer an example if I had the option. I did actually draw up a quick line-schema for the party that ultimately cultured my final selection, but even that's more than I'd prefer to post in a public context.