
RDAP query returns fewer results than whois for google.com?


When I do a simple domain whois lookup for Google.com, I get the following results:

[...]
Registrant Organization: Google LLC
Registrant State/Province: CA
Registrant Country: US
Registrant Email: Select Request Email Form at https://domains.markmonitor.com/whois/google.com
Admin Organization: Google LLC
Admin State/Province: CA
Admin Country: US
Admin Email: Select Request Email Form at https://domains.markmonitor.com/whois/google.com
Tech Organization: Google LLC
Tech State/Province: CA
Tech Country: US
[...]

But when I use RDAP, for example through the following website:

https://client.rdap.org/?type=domain&object=google.com

The resulting JSON does not contain any data that points to Google LLC. Is this because I used RDAP the wrong way, or because the RDAP entry for Google simply does not contain the registrant/admin/tech organization data?


Solution

  • TL;DR: the registrar for the domain you chose as an example is not following the rules: it does not show contact data through RDAP even though it shows it through whois. This is not what is supposed to happen and should be fixed at some point. It is not a defect of the protocol, just one actor not following the specifications. If you try other names (at other registrars), you should get better results.

    But since your problem may also have other causes, here are more explanations.

    This problem is not specific to RDAP: you have exactly the same one with whois for .COM/.NET, because these are run as thin registries, which means the registry holds no contact data.

    whois clients typically emulate redirects (which do not exist in the whois protocol): they first show the registry whois reply (no contacts there for a .COM name) and then continue with the registrar whois reply (which has contacts).

    Unless you pay attention, you do not notice these two steps with whois clients, as it is an operational detail.

    RDAP, being structured, gives you the links and lets you follow them, but it is up to your client to do so.

    Let us start from scratch and follow a methodology that works in all cases, manually emulating an RDAP client with wget and jq.
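    The referral step that whois clients emulate can be sketched roughly as below. This is a hypothetical helper, not how any particular whois client is implemented. It assumes the registry reply carries a "Registrar WHOIS Server:" line, as .COM registry replies do, and the sample text is an abbreviated illustration rather than a verbatim capture.

```python
from typing import Optional


def registrar_whois_server(registry_reply: str) -> Optional[str]:
    """Return the host named on the 'Registrar WHOIS Server:' line, if any."""
    for line in registry_reply.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key.strip().lower() == "registrar whois server":
            return value.strip() or None
    return None


# Abbreviated, illustrative registry reply (assumption: not a verbatim capture).
SAMPLE_REGISTRY_REPLY = """\
   Domain Name: GOOGLE.COM
   Registrar WHOIS Server: whois.markmonitor.com
   Registrar: MarkMonitor Inc.
"""

print(registrar_whois_server(SAMPLE_REGISTRY_REPLY))
```

    A client would then open a second whois connection to the returned host and show that reply too, which is where the contacts come from.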

    1) Finding authoritative RDAP server

    The process is basically outlined by RFC 7484, but let us do it manually.

    IANA is the authoritative source here, so if you go to http://data.iana.org/rdap/dns.json you find the authoritative RDAP server for .COM, which is: https://rdap.verisign.com/com/v1/
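    As a rough sketch of that lookup, the function below searches a bootstrap document of the same shape as the IANA file (a "services" array whose entries pair a list of TLDs with a list of base URLs) for the longest matching suffix. The name find_rdap_base and the trimmed-down sample are assumptions made for illustration; a real client would download the full file and might also prefer HTTPS URLs when several are listed.

```python
from typing import Optional


def find_rdap_base(bootstrap: dict, domain: str) -> Optional[str]:
    """Return the first RDAP base URL for the longest matching suffix of domain."""
    labels = domain.lower().rstrip(".").split(".")
    # Index every listed suffix by name for quick lookup.
    index = {}
    for entry in bootstrap.get("services", []):
        suffixes, urls = entry[0], entry[1]
        for suffix in suffixes:
            index[suffix.lower()] = urls
    # Try the longest suffix first: "a.b.com", then "b.com", then "com".
    for start in range(len(labels)):
        urls = index.get(".".join(labels[start:]))
        if urls:
            return urls[0]
    return None


# Trimmed-down stand-in for the IANA file, in the same shape (assumption).
SAMPLE_BOOTSTRAP = {
    "version": "1.0",
    "services": [
        [["com"], ["https://rdap.verisign.com/com/v1/"]],
    ],
}

print(find_rdap_base(SAMPLE_BOOTSTRAP, "google.com"))
```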

    2) Querying registry RDAP server

    Per RDAP specifications, from the base URL above you know you need to use https://rdap.verisign.com/com/v1/domain/google.com as your first step (i.e. concatenation of base URL, then domain, then the domain name you are after).

    You can emulate it manually by something like wget -O - https://rdap.verisign.com/com/v1/domain/google.com | jq .
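    The same step can be sketched in Python with only the standard library. The function names are made up for this example, and fetch_rdap needs network access, so only the URL construction is exercised here.

```python
import json
import urllib.request


def domain_query_url(base_url: str, domain: str) -> str:
    """Join an RDAP base URL, the 'domain' path segment, and the name."""
    return base_url.rstrip("/") + "/domain/" + domain


def fetch_rdap(url: str, timeout: float = 10.0) -> dict:
    """GET an RDAP URL and return the parsed JSON body (needs network access)."""
    request = urllib.request.Request(
        url, headers={"Accept": "application/rdap+json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


print(domain_query_url("https://rdap.verisign.com/com/v1/", "google.com"))
```

    Calling fetch_rdap on the printed URL should return the same document that the wget command above pipes into jq.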

    You will get a lot of data but nothing about contacts, for the reasons outlined above. This has nothing to do with the fact that you are using RDAP; the registry simply does not have the contact data.

    But the reply tells you where to go next to get the missing data. If you look closely at the returned JSON, you will find this part:

      "links": [
        {
          "value": "https://rdap-core.vrsn.com/com/v1/domain/GOOGLE.COM",
          "rel": "self",
          "href": "https://rdap-core.vrsn.com/com/v1/domain/GOOGLE.COM",
          "type": "application/rdap+json"
        },
        {
          "value": "https://rdap.markmonitor.com/rdap/domain/GOOGLE.COM",
          "rel": "related",
          "href": "https://rdap.markmonitor.com/rdap/domain/GOOGLE.COM",
          "type": "application/rdap+json"
        }
      ],
    

    Pay close attention to the rel property. The first link (links is an array in the response) has rel=self, which means it gives the canonical URL of the object you just got a reply for. Using it again should give you the exact same reply (if the object did not change, of course), and it is useful to keep the source URL in the document itself. The fact that its base URL differs from the one we used, which is the one listed at IANA, is just an operational detail without consequences here.

    But look at the second one, with rel=related. According to the RDAP specifications and ICANN rules, this is the link to follow to get more data, that is, the registrar part in split registry/registrar models, as in all gTLDs.

    So we should use that link for the next step.
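    Picking that link out programmatically could look like the sketch below. The helper name link_by_rel is hypothetical, and the fragment is the links array shown above. Links that declare a type other than application/rdap+json are skipped, which is a design choice of this sketch rather than something the reply above requires.

```python
from typing import Optional

RDAP_MEDIA_TYPE = "application/rdap+json"


def link_by_rel(rdap_object: dict, rel: str) -> Optional[str]:
    """Return the href of the first RDAP link with the given rel, if any."""
    for link in rdap_object.get("links", []):
        if link.get("rel") != rel or not link.get("href"):
            continue
        # Accept links with no declared type, or with the RDAP media type.
        if link.get("type", RDAP_MEDIA_TYPE) == RDAP_MEDIA_TYPE:
            return link["href"]
    return None


# The links array from the registry reply shown above.
REGISTRY_REPLY_FRAGMENT = {
    "links": [
        {
            "value": "https://rdap-core.vrsn.com/com/v1/domain/GOOGLE.COM",
            "rel": "self",
            "href": "https://rdap-core.vrsn.com/com/v1/domain/GOOGLE.COM",
            "type": "application/rdap+json",
        },
        {
            "value": "https://rdap.markmonitor.com/rdap/domain/GOOGLE.COM",
            "rel": "related",
            "href": "https://rdap.markmonitor.com/rdap/domain/GOOGLE.COM",
            "type": "application/rdap+json",
        },
    ]
}

print(link_by_rel(REGISTRY_REPLY_FRAGMENT, "related"))
```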

    3) Querying registrar RDAP server

    Running wget -O - https://rdap.markmonitor.com/rdap/domain/GOOGLE.COM | jq . and searching for the entities part, where contacts are located, we get:

      "entities": [
        {
          "objectClassName": "entity",
          "handle": "292",
          "events": [
            {
              "eventAction": "registrar expiration",
              "eventDate": "2020-09-14T04:00:00.000+0000"
            }
          ],
          "roles": [
            "registrar"
          ],
    
    ...
    

    And indeed there is no other entity, that is, no role other than registrar. This registrar's RDAP server did not return any contact data, contrary to its whois service. This is obviously against the specification, and this server is not compliant under current ICANN rules.

    Unfortunately, there is probably nothing you can do at your level to change that. It will change, as ICANN will at some point start to enforce the rules, but until then you will need to live with such broken cases, and there are several others.
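    A quick way to check a reply for this situation is to collect the roles of all entities, including nested ones, and see whether any contact role is present. This is a minimal sketch with made-up function names; the sample is the part of the registrar reply shown above, with the elided remainder left out.

```python
def entity_roles(rdap_object: dict) -> set:
    """Collect every role found on entities, including nested ones."""
    roles = set()
    stack = list(rdap_object.get("entities", []))
    while stack:
        entity = stack.pop()
        roles.update(entity.get("roles", []))
        stack.extend(entity.get("entities", []))
    return roles


def has_contact_entities(rdap_object: dict) -> bool:
    """True if any registrant, administrative, or technical entity is present."""
    contact_roles = {"registrant", "administrative", "technical"}
    return bool(entity_roles(rdap_object) & contact_roles)


# Only the fields shown in the registrar reply above; the rest is omitted.
REGISTRAR_REPLY_FRAGMENT = {
    "entities": [
        {
            "objectClassName": "entity",
            "handle": "292",
            "roles": ["registrar"],
        }
    ]
}

print(entity_roles(REGISTRAR_REPLY_FRAGMENT))
print(has_contact_entities(REGISTRAR_REPLY_FRAGMENT))
```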

    4) Same for another domain, better results

    If you repeat the above with another name, say stackoverflow.com, you reach another registrar, and in the final reply you can see:

      "entities": [
    
    ...
    
       {
          "objectClassName": "entity",
          "handle": "",
          "vcardArray": [
            "vcard",
            [
              [
                "version",
                [],
                "text",
                "4.0"
              ],
              [
                "org",
                {
                  "type": "work"
                },
                "text",
                "Stack Exchange, Inc."
              ],
              [
                "adr",
                [],
                "text",
                [
                  "",
                  "",
                  "",
                  "",
                  "NY",
                  "",
                  "US"
                ]
              ]
            ]
          ],
          "roles": [
            "registrant"
          ],
          "remarks": [
            {
              "title": "REDACTED FOR PRIVACY",
              "type": "object truncated due to authorization",
              "description": [
                "Some of the data in this object has been removed."
              ]
            }
          ]
        },
    

    As the registrant value in roles shows, this structure describes registrant data. However, due to GDPR and the resulting ICANN Temporary Specification, most of the data is redacted and in fact not there. You basically have just the registrant organization, state, and country in the vCard part.
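    The vcardArray layout is awkward to read by eye, so here is a small sketch that pulls out the few fields that survive redaction. The function names are hypothetical. It relies on each jCard property being a list of name, parameters, value type, then value, and on adr having seven components with the region fifth and the country last.

```python
def jcard_properties(entity: dict) -> dict:
    """Map each jCard property name to the list of its values."""
    properties = {}
    vcard_array = entity.get("vcardArray", [])
    if len(vcard_array) < 2:
        return properties
    # Each property is [name, parameters, value type, value, ...].
    for prop in vcard_array[1]:
        properties.setdefault(prop[0], []).extend(prop[3:])
    return properties


def registrant_summary(entity: dict) -> dict:
    """Extract organization, region, and country from an entity's jCard."""
    props = jcard_properties(entity)
    adr = list(props.get("adr", [[]])[0])
    adr += [""] * (7 - len(adr))  # pad in case components are missing
    return {
        "org": props.get("org", [None])[0],
        "region": adr[4] or None,
        "country": adr[6] or None,
    }


# The registrant entity from the reply shown above (remarks omitted).
REGISTRANT_ENTITY = {
    "objectClassName": "entity",
    "handle": "",
    "vcardArray": [
        "vcard",
        [
            ["version", [], "text", "4.0"],
            ["org", {"type": "work"}, "text", "Stack Exchange, Inc."],
            ["adr", [], "text", ["", "", "", "", "NY", "", "US"]],
        ],
    ],
    "roles": ["registrant"],
}

print(registrant_summary(REGISTRANT_ENTITY))
```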

    5) Summary

    Three points to remember here:

    • one of the advantages of RDAP over whois is precisely that it conveys clear links on where to go next to get more information; this is the process outlined above
    • for now this applies only to COM/NET names, as these TLDs are run under a thin registry model, one where the registry does not have contact data; note that this is bound to disappear: even though the process has been postponed multiple times at ICANN, it is still pending, and at some point COM/NET will work like any other gTLD, with the registry holding all contact data
    • all of the above is heavily influenced by GDPR, which restricts the amount of data shown in whois nowadays, specifically about contacts. As the future model of tiered access is not yet known, we may still end up with a multi-step query process to get more contact data, depending on who requests it.
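
    To tie the steps together, the multi-step process can be sketched as a loop that keeps following rel=related links. This is only an illustration under stated assumptions: follow_related is a made-up name, the fetch function is injected so the sketch runs without network access, and the .example URLs and canned replies are hypothetical stand-ins for a registry and a registrar.

```python
from typing import Callable, List


def follow_related(
    start_url: str, fetch: Callable[[str], dict], max_hops: int = 3
) -> List[dict]:
    """Fetch start_url, then keep following rel=related links; return all replies."""
    replies = []
    seen = set()
    url = start_url
    while url and url not in seen and len(replies) < max_hops:
        seen.add(url)
        reply = fetch(url)
        replies.append(reply)
        # Next hop: the first rel=related link, or None to stop.
        url = next(
            (
                link.get("href")
                for link in reply.get("links", [])
                if link.get("rel") == "related"
            ),
            None,
        )
    return replies


# Hypothetical canned replies standing in for a registry and a registrar.
FAKE_REPLIES = {
    "https://registry.example/domain/example.com": {
        "links": [
            {"rel": "related", "href": "https://registrar.example/domain/example.com"}
        ]
    },
    "https://registrar.example/domain/example.com": {
        "entities": [{"roles": ["registrant"]}]
    },
}

replies = follow_related(
    "https://registry.example/domain/example.com", FAKE_REPLIES.__getitem__
)
print(len(replies))
```

    In real use the injected fetch would perform the HTTP request, and the last reply in the list is the one to inspect for contact entities.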