Search code examples
t-sqlpowershellfqdn

Need to go from hostname to base domain


I need a function:

f(fqdn,suffix) -> basedomain

with these example inputs and outputs:

f('foobar.quux.somedomain.com','com') -> 'somedomain.com'
f('somedomain.com','com') -> 'somedomain.com'
f('foobar.quux.somedomain.com.br','com.br') -> 'somedomain.com.br'
f('somedomain.com.br','com.br') -> 'somedomain.com.br'

In plain English, if the suffix has n segments, take the last n+1 segments. Find the base domain for the FQDN, allowing for the fact that some FQDNs have more than one suffix element.

The suffixes I need to match are here. I've already got them in my SQL database.

I could write this in C#; it might not be the most elegant but it would work. Unfortunately I would like to have this function in either T-SQL, where it is closest to the data, or in Powershell, which is where the rest of the utility that consumes this data is going to be. I suppose it would be ok to do it in C#, compile to an assembly and then access it from T-SQL, or even from Powershell ... if that would be the fastest executing. If there's some reasonably clever alternative in pure T-SQL or simple Powershell, I'd like that.

EDIT: One thing I forgot to mention explicitly (but which is clear when reviewing the suffix list, at my link above) is that we must pick the longest matching suffix. Both "br" and "com.br" appear in the suffix list (with similar things happening for uk, pt, etc). So the SQL has to use a window function to make sure the longest matching suffix is found.

Here is how far I got when I was doing the SQL. I had gotten lost in all the substring/reverse functions.

SELECT Domain, suffix
FROM (
    SELECT SD.Domain, SL.suffix, 
       RN=ROW_NUMBER() OVER (
           PARTITION BY sd.Domain ORDER BY LEN(SL.suffix) DESC)
    FROM SiteDomains SD
    INNER JOIN suffixlist SL ON SD.Domain LIKE '%.'+SL.suffix
) AS X
WHERE RN=1

This works ok for finding the right suffix. I'm a little concerned about its performance though.


Solution

  • The following demonstrates matching FQDNs with TLDs and extracting the desired n + 1 domain name segments:

    -- Sample data.
    declare @SampleTLDs as Table ( TLD VarChar(64) );
    insert into @SampleTLDs ( TLD ) values
      ( 'com' ), ( 'somedomain.com' ), ( 'com.br' );
    declare @SampleFQDNs as Table ( FQDN VarChar(64) );
    insert into @SampleFQDNs ( FQDN ) values
      ( 'foobar.quux.somedomain.com' ), ( 'somedomain.com' ),
      ( 'foobar.quux.somedomain.com.br' ), ( 'somedomain.com.br' );
    select * from @SampleTLDs;
    select * from @SampleFQDNs;
    
    -- Fiddle about.
    select FQDN, TLD,
      case
        when DotPosition = 0 then FQDN
        else Reverse( Left( ReversedPrefix, DotPosition - 1) ) + '.' + TLD
        end as Result
      from (
        select FQDNs.FQDN, TLDs.TLD,
          Substring( Reverse( FQDNs.FQDN ), Len( TLDs.TLD ) + 2, 100 ) as ReversedPrefix,
          CharIndex( '.', Substring( Reverse( FQDNs.FQDN ), Len( TLDs.TLD ) + 2, 100 ) ) as DotPosition
          from @SampleFQDNs as FQDNs inner join
            @SampleTLDs as TLDs on FQDNs.FQDN like '%.' + TLDs.TLD or FQDNs.FQDN = TLDs.TLD  ) as Edna;
    
    -- To select only the longest matching TLD for each FQDN:
    with
      ExtendedFQDNs as (
        select FQDNs.FQDN, TLDs.TLD, Row_Number() over ( partition by FQDN order by Len( TLDs.TLD ) desc ) as TLDLenRank,
          Substring( Reverse( FQDNs.FQDN ), Len( TLDs.TLD ) + 2, 100 ) as ReversedPrefix,
          CharIndex( '.', Substring( Reverse( FQDNs.FQDN ), Len( TLDs.TLD ) + 2, 100 ) ) as DotPosition
          from @SampleFQDNs as FQDNs inner join
            @SampleTLDs as TLDs on FQDNs.FQDN like '%.' + TLDs.TLD or FQDNs.FQDN = TLDs.TLD )
      select FQDN, TLD,
        case
          when DotPosition = 0 then FQDN
          else Reverse( Left( ReversedPrefix, DotPosition - 1) ) + '.' + TLD
          end as Result
        from ExtendedFQDNs
        where TLDLenRank = 1;