Using the Data Explorer (SEDE), I would like to find which users have more than 200000 reputation on Stack Overflow, and then find details for any accounts they have on other Stack Exchange sites.
Here is the query which provides the list with this threshold:
Select id, reputation, accountid
From users
Where reputation > 200000
AccountId
is the key for all Stack Exchange sites.
I have found this query for aggregating across SEDE databases, but how is it possible to do that based on the dynamic results of the previous/baseline query?
Here is the kind of output I'm aiming for:
id_so, reputation_so, accounted, other_stackexchange_site_name, reputation_othersite, number_of_answers_other_site, number_of_questions_other_site
1, 250000, 23, serverfault, 500, 5, 1
1, 250000, 23, superuser, 120, 1, 0
2, 300000, 21, serverfault, 300, 3, 2
2, 300000, 21, webmasters, 230, 1, 1
3, 350000, 20, NA, NA, NA, NA
#the case with id 3 has an SO profile with reputation but it has no other profile in other Stack Exchange site
To run non-trivial queries across databases, based on an initial query:
AccountId
(which is a user's Stack-Exchange-wide Id).Create your initial query to feed that key into a temp table. In this case:
CREATE TABLE #UsersOfInterest (AccountId INT)
INSERT INTO #UsersOfInterest
SELECT u.AccountId
FROM Users u
Where u.Reputation > 200000
Determine the query, to run on each site, that gets the info you want. EG:
SELECT u.AccountId, u.DisplayName, u.Reputation, u.Id
, numQst = (SELECT COUNT(q.Id) FROM Posts q WHERE q.OwnerUserId = u.Id AND q.PostTypeId = 1)
, numAns = (SELECT COUNT(q.Id) FROM Posts q WHERE q.OwnerUserId = u.Id AND q.PostTypeId = 2)
FROM Users u
WHERE u.AccountId = ##seAccntId##
Use a system query to get the appropriate databases. For the Data Explorer (SEDE), a query of this type:
SELECT name
FROM sys.databases
WHERE CASE WHEN state_desc = 'ONLINE'
THEN OBJECT_ID (QUOTENAME (name) + '.[dbo].[PostNotices]', 'U')
END
IS NOT NULL
Create a cursor on the above query and use it to step through the databases.
For each database:
sp_executesql
.When the cursor is done, perform the final query on the temp table from step 3.
Refer to this other answer, for a working template for querying all of the Stack Exchange sites.
Putting it all together, results in the following query, which you can run live on SEDE:
-- MinMasterSiteRep: User's must have this much rep on whichever site this query is run against
-- MinRep: User's must have this much rep on all other sites
CREATE TABLE #UsersOfInterest (
AccountId INT NOT NULL
, Reputation INT
, UserId INT
, PRIMARY KEY (AccountId)
)
INSERT INTO #UsersOfInterest
SELECT u.AccountId, u.Reputation, u.Id
FROM Users u
Where u.Reputation > ##MinMasterSiteRep:INT?200000##
CREATE TABLE #AllSiteResults (
[Master Rep] INT
, [Mstr UsrId] NVARCHAR(777)
, AccountId NVARCHAR(777)
, [Site name] NVARCHAR(777)
, [Username on site] NVARCHAR(777)
, [Rep] INT
, [# Ans] INT
, [# Qst] INT
)
DECLARE @seDbName AS NVARCHAR(777)
DECLARE @seSiteURL AS NVARCHAR(777)
DECLARE @sitePrettyName AS NVARCHAR(777)
DECLARE @seSiteQuery AS NVARCHAR(max)
DECLARE seSites_crsr CURSOR FOR
WITH dbsAndDomainNames AS (
SELECT dbL.dbName
, STRING_AGG (dbL.domainPieces, '.') AS siteDomain
FROM (
SELECT TOP 50000 -- Never be that many sites and TOP is needed for order by, below
name AS dbName
, value AS domainPieces
, row_number () OVER (ORDER BY (SELECT 0)) AS [rowN]
FROM sys.databases
CROSS APPLY STRING_SPLIT (name, '.')
WHERE CASE WHEN state_desc = 'ONLINE'
THEN OBJECT_ID (QUOTENAME (name) + '.[dbo].[PostNotices]', 'U') -- Pick a table unique to SE data
END
IS NOT NULL
ORDER BY dbName, [rowN] DESC
) AS dbL
GROUP BY dbL.dbName
)
SELECT REPLACE (REPLACE (dadn.dbName, 'StackExchange.', ''), '.', ' ' ) AS [Site Name]
, dadn.dbName
, CASE -- See https://meta.stackexchange.com/q/215071
WHEN dadn.dbName = 'StackExchange.Mathoverflow.Meta'
THEN 'https://meta.mathoverflow.net/'
-- Some AVP/Audio/Video/Sound kerfuffle?
WHEN dadn.dbName = 'StackExchange.Audio'
THEN 'https://video.stackexchange.com/'
-- Ditto
WHEN dadn.dbName = 'StackExchange.Audio.Meta'
THEN 'https://video.meta.stackexchange.com/'
-- Normal site
ELSE 'https://' + LOWER (siteDomain) + '.com/'
END AS siteURL
FROM dbsAndDomainNames dadn
WHERE (dadn.dbName = 'StackExchange.Meta' OR dadn.dbName NOT LIKE '%Meta%')
-- Step through cursor
OPEN seSites_crsr
FETCH NEXT FROM seSites_crsr INTO @sitePrettyName, @seDbName, @seSiteURL
WHILE @@FETCH_STATUS = 0
BEGIN
SET @seSiteQuery = '
USE [' + @seDbName + ']
INSERT INTO #AllSiteResults
SELECT
uoi.Reputation AS [Master Rep]
, ''site://u/'' + CAST(uoi.UserId AS NVARCHAR(88)) + ''|'' + CAST(uoi.UserId AS NVARCHAR(88)) AS [Mstr UsrId]
, [AccountId] = ''https://stackexchange.com/users/'' + CAST(u.AccountId AS NVARCHAR(88)) + ''?tab=accounts|'' + CAST(u.AccountId AS NVARCHAR(88))
, ''' + @sitePrettyName + ''' AS [Site name]
, ''' + @seSiteURL + ''' + ''u/'' + CAST(u.Id AS NVARCHAR(88)) + ''|'' + u.DisplayName AS [Username on site]
, u.Reputation AS [Rep]
, (SELECT COUNT(q.Id) FROM Posts q WHERE q.OwnerUserId = u.Id AND q.PostTypeId = 2) AS [# Ans]
, (SELECT COUNT(q.Id) FROM Posts q WHERE q.OwnerUserId = u.Id AND q.PostTypeId = 1) AS [# Qst]
FROM #UsersOfInterest uoi
INNER JOIN Users u ON uoi.AccountId = u.AccountId
WHERE u.Reputation > ##MinRep:INT?200##
'
EXEC sp_executesql @seSiteQuery
FETCH NEXT FROM seSites_crsr INTO @sitePrettyName, @seDbName, @seSiteURL
END
CLOSE seSites_crsr
DEALLOCATE seSites_crsr
SELECT *
FROM #AllSiteResults
ORDER BY [Master Rep] DESC, AccountId, [Rep] DESC
It gives results like:
-- where the blue values are hyperlinked.
Note that a user must have 200 rep on a site for it to be "significant". That's also the rep needed for the site to be included in the Stack Exchange flair.