Finding reciprocal relationships

I'm trying to use the Stack Exchange Data Explorer (SEDE) to find cases where two distinct users on Stack Overflow have each accepted an answer from the other. For example:

Post A { Id: 1, OwnerUserId: "user1", AcceptedAnswerId: "user2" }

and

Post B { Id: 2, OwnerUserId: "user2", AcceptedAnswerId: "user1" }

I currently have a query that finds pairs of users who have collaborated as asker and answerer on more than one question, but it does not check whether the relationship is reciprocal:

SELECT user1.Id AS User_1, user2.Id AS User_2
FROM Posts p
INNER JOIN Users user1 ON p.OwnerUserId = user1.Id
INNER JOIN Posts p2 ON p.AcceptedAnswerId = p2.Id
INNER JOIN Users user2 ON p2.OwnerUserId = user2.Id
WHERE p.OwnerUserId <> p2.OwnerUserId
AND p.OwnerUserId IS NOT NULL
AND p2.OwnerUserId IS NOT NULL
AND user1.Id <> user2.Id
GROUP BY user1.Id, user2.Id HAVING COUNT(*) > 1;
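To see why this query does not capture reciprocity, here is a minimal sketch that runs it on invented toy data. It assumes SQLite (through Python's sqlite3 module) as a stand-in for SEDE's SQL Server, and a cut-down Posts/Users schema; the rows are made up for illustration. User 2 answers two of user 1's questions (one-directional), while users 3 and 4 accept one answer from each other (reciprocal).

```python
import sqlite3

# In-memory SQLite database standing in for SEDE (assumption: not SQL Server).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users (Id INTEGER PRIMARY KEY);
CREATE TABLE Posts (
    Id INTEGER PRIMARY KEY,
    PostTypeId INTEGER,        -- 1 = question, 2 = answer
    OwnerUserId INTEGER,
    AcceptedAnswerId INTEGER   -- Id of the accepted answer post, if any
);
INSERT INTO Users (Id) VALUES (1), (2), (3), (4);
INSERT INTO Posts VALUES
    (1, 1, 1, 3), (3, 2, 2, NULL),   -- user 1 accepts an answer by user 2
    (2, 1, 1, 4), (4, 2, 2, NULL),   -- user 1 accepts user 2 again
    (5, 1, 3, 6), (6, 2, 4, NULL),   -- user 3 accepts an answer by user 4
    (7, 1, 4, 8), (8, 2, 3, NULL);   -- user 4 accepts an answer by user 3
""")

# The query from the question, minus the redundant NULL checks.
rows = conn.execute("""
SELECT user1.Id AS User_1, user2.Id AS User_2
FROM Posts p
INNER JOIN Users user1 ON p.OwnerUserId = user1.Id
INNER JOIN Posts p2 ON p.AcceptedAnswerId = p2.Id
INNER JOIN Users user2 ON p2.OwnerUserId = user2.Id
WHERE user1.Id <> user2.Id
GROUP BY user1.Id, user2.Id
HAVING COUNT(*) > 1
""").fetchall()

print(rows)
```

On this data the query returns only the pair (1, 2), which is not reciprocal, and misses the pair (3, 4), which is: counting rows per ordered pair measures repetition, not direction.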

For anyone unfamiliar with the schema, here are the two relevant tables:

Posts
--------------------------------------
Id                      int
PostTypeId              tinyint
AcceptedAnswerId        int
ParentId                int
CreationDate            datetime
DeletionDate            datetime
Score                   int
ViewCount               int
Body                    nvarchar (max)
OwnerUserId             int
OwnerDisplayName        nvarchar (40)
LastEditorUserId        int
LastEditorDisplayName   nvarchar (40)
LastEditDate            datetime
LastActivityDate        datetime
Title                   nvarchar (250)
Tags                    nvarchar (250)
AnswerCount             int
CommentCount            int
FavoriteCount           int
ClosedDate              datetime
CommunityOwnedDate      datetime

and

Users
--------------------------------------
Id                      int
Reputation              int
CreationDate            datetime
DisplayName             nvarchar (40)
LastAccessDate          datetime
WebsiteUrl              nvarchar (200)
Location                nvarchar (100)
AboutMe                 nvarchar (max)
Views                   int
UpVotes                 int
DownVotes               int
ProfileImageUrl         nvarchar (200)
EmailHash               varchar (32)
AccountId               int

5 answers

The query in its simplest form (one that does not time out on 16M questions) would be:

WITH accepter_acceptee(a, b) AS (
    SELECT q.OwnerUserId, a.OwnerUserId
    FROM Posts AS q
    INNER JOIN Posts AS a ON q.AcceptedAnswerId = a.Id
    WHERE q.PostTypeId = 1 AND q.OwnerUserId <> a.OwnerUserId
), collaborations(a, b, type) AS (
    SELECT a, b, 'a accepter b' FROM accepter_acceptee
    UNION ALL
    SELECT b, a, 'a acceptee b' FROM accepter_acceptee
)
SELECT a, b, COUNT(*) AS [collaboration count]
FROM collaborations
GROUP BY a, b
HAVING COUNT(DISTINCT type) = 2
ORDER BY a, b

Result:

Comment by Brock Adams: The results look plausible.
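The UNION ALL technique from this answer can be tried on a small scale. The sketch below is an assumption-laden illustration, not the SEDE run: it uses SQLite through Python's sqlite3 module instead of SQL Server, a cut-down Posts table, and invented rows in which user 2 answers user 1 twice (one-directional) and users 3 and 4 accept one answer from each other.

```python
import sqlite3

# In-memory SQLite database standing in for SEDE (assumption: not SQL Server).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Posts (
    Id INTEGER PRIMARY KEY,
    PostTypeId INTEGER,        -- 1 = question, 2 = answer
    OwnerUserId INTEGER,
    AcceptedAnswerId INTEGER
);
INSERT INTO Posts VALUES
    (1, 1, 1, 3), (3, 2, 2, NULL),   -- user 1 accepts an answer by user 2
    (2, 1, 1, 4), (4, 2, 2, NULL),   -- user 1 accepts user 2 again
    (5, 1, 3, 6), (6, 2, 4, NULL),   -- user 3 accepts an answer by user 4
    (7, 1, 4, 8), (8, 2, 3, NULL);   -- user 4 accepts an answer by user 3
""")

# Each accepted answer is listed in both directions with a direction label;
# a pair is reciprocal only if both labels occur for it.
rows = conn.execute("""
WITH accepter_acceptee(a, b) AS (
    SELECT q.OwnerUserId, a.OwnerUserId
    FROM Posts AS q
    INNER JOIN Posts AS a ON q.AcceptedAnswerId = a.Id
    WHERE q.PostTypeId = 1 AND q.OwnerUserId <> a.OwnerUserId
), collaborations(a, b, type) AS (
    SELECT a, b, 'a accepter b' FROM accepter_acceptee
    UNION ALL
    SELECT b, a, 'a acceptee b' FROM accepter_acceptee
)
SELECT a, b, COUNT(*) AS collaboration_count
FROM collaborations
GROUP BY a, b
HAVING COUNT(DISTINCT type) = 2
ORDER BY a, b
""").fetchall()

print(rows)
```

Only the pair of users 3 and 4 survives the HAVING filter, once in each order; the repeated but one-directional pair (1, 2) is dropped because it only ever carries one of the two labels.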

Here is how I would go about it. Some simplified data:

if object_id('tempdb.dbo.#Posts') is not null drop table #Posts
create table #Posts
(
    PostId char(1),
    OwnerUserId int,
    AcceptedAnswerUserId int
)

insert into #Posts
values
('A', 1, 2),
('B', 2, 1),
('C', 2, 3),
('D', 2, 4),
('E', 3, 1),
('F', 4, 1)

For our purposes we don't really care about PostId; our starting point is the set of ordered pairs of question owners (OwnerUserId) and accepted answerers (AcceptedAnswerUserId).

(Although it isn't necessary, you can visualize that set like this)

select distinct OwnerUserId, AcceptedAnswerUserId
from #Posts

Now we want to find all records in this set for which a record with those two fields swapped also exists, i.e. where the owner of one post is the accepted answerer of another. So where one pair is (1, 2), we want to find (2, 1).

I did this with a left join so that you can see the rows that have no match, but changing it to an inner join restricts the output to the set you described. You can harvest the information however you like (either drop the columns you don't need or, if you want each pair on one row, return both columns from one of the tables).

select 
    u1.OwnerUserId, 
    u1.AcceptedAnswerUserId, 
    u2.OwnerUserId, 
    u2.AcceptedAnswerUserId
from #Posts u1
left outer join #Posts u2
    on u1.AcceptedAnswerUserId = u2.OwnerUserId
        and u1.OwnerUserId = u2.AcceptedAnswerUserId

Edit: If you want to exclude self-answers, just add "and u1.AcceptedAnswerUserId != u1.OwnerUserId" to the ON clause.
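The self-join above can be checked against the sample rows from this answer. This is a minimal sketch, assuming SQLite through Python's sqlite3 module rather than SQL Server; the table name posts_sample is a hypothetical stand-in for the temp table #Posts, and the ORDER BY and DISTINCT are added here only to make the output deterministic.

```python
import sqlite3

# In-memory SQLite database; posts_sample stands in for the #Posts temp table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts_sample (
    PostId TEXT,
    OwnerUserId INTEGER,
    AcceptedAnswerUserId INTEGER
);
INSERT INTO posts_sample VALUES
    ('A', 1, 2), ('B', 2, 1), ('C', 2, 3),
    ('D', 2, 4), ('E', 3, 1), ('F', 4, 1);
""")

# Left join: every pair is kept; unmatched pairs get NULLs on the right side.
left_rows = conn.execute("""
SELECT u1.OwnerUserId, u1.AcceptedAnswerUserId,
       u2.OwnerUserId, u2.AcceptedAnswerUserId
FROM posts_sample u1
LEFT OUTER JOIN posts_sample u2
    ON u1.AcceptedAnswerUserId = u2.OwnerUserId
   AND u1.OwnerUserId = u2.AcceptedAnswerUserId
ORDER BY u1.OwnerUserId, u1.AcceptedAnswerUserId
""").fetchall()

# Inner join: only pairs whose swapped counterpart exists, self-answers excluded.
reciprocal = conn.execute("""
SELECT DISTINCT u1.OwnerUserId, u1.AcceptedAnswerUserId
FROM posts_sample u1
INNER JOIN posts_sample u2
    ON u1.AcceptedAnswerUserId = u2.OwnerUserId
   AND u1.OwnerUserId = u2.AcceptedAnswerUserId
   AND u1.AcceptedAnswerUserId != u1.OwnerUserId
ORDER BY u1.OwnerUserId, u1.AcceptedAnswerUserId
""").fetchall()

unmatched = [row for row in left_rows if row[2] is None]
print(left_rows)
print(reciprocal)
```

With the inner join only (1, 2) and (2, 1) remain, since posts A and B are the only ones whose owner and accepted answerer are swapped in another row; the other four pairs show up in the left join with NULLs on the right.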

As a personal note, I've always found it funny that, given SQL's deep roots in set theory and relational algebra, set operations like this feel so clunky in SQL. Mostly this is because, to preserve the lack of order, you have to represent the members in a single group. But then, to compare set members in SQL, you need to represent the members of the groups as separate columns.

Now think about this: how would you extend it to triples of users who commented on the same post?


ETA: Oops. I misread the question; the OP wants reciprocal accepted answers, whereas the following covers any reciprocal answers. (It's easy to modify, but I'm more interested in this version anyway.)


Because the dataset is very large (and to avoid timing out SEDE), I chose to limit the sets as much as possible and build from there.

So, this query:

  1. Only returns any rows if there is a reciprocal relationship.
  2. Returns all such Q&A pairs.
  3. Excludes self answers.
  4. Leverages SEDE's query parameters and magic columns for usability.

See it live in SEDE.

-- UserA: Enter ID of user A
-- UserB: Enter ID of user B
WITH possibleAnswers AS (
    SELECT
                a.Id                AS aId
                , a.ParentId        AS qId
                , a.OwnerUserId   
                , a.CreationDate
    FROM        Posts a
    WHERE       a.PostTypeId        = 2  --  answers
    AND         a.OwnerUserId       IN (##UserA:INT##, ##UserB:INT##)
),
possibleQuestions AS (
    SELECT
                q.Id                AS qId
                , q.OwnerUserId   
                , q.Tags
    FROM        Posts q
    INNER JOIN  possibleAnswers pa  ON q.Id = pa.qId
    WHERE       q.PostTypeId        = 1  --  questions
    AND         q.OwnerUserId       IN (##UserA:INT##, ##UserB:INT##)
    AND         q.OwnerUserId       != pa.OwnerUserId  --  No self answers
)
SELECT 
            pa.OwnerUserId          AS [User Link]
            , 'answers'             AS [Action]
            , pq.OwnerUserId        AS [User Link]
            , pa.CreationDate       AS [at]
            , pq.qId                AS [Post Link]
            , pq.Tags
FROM        possibleQuestions pq
INNER JOIN  possibleAnswers pa      ON pq.qId = pa.qId
WHERE       pq.OwnerUserId          =  ##UserB:INT##
AND         EXISTS (SELECT * FROM possibleQuestions pq2  WHERE pq2.OwnerUserId =  ##UserA:INT##)

UNION ALL SELECT 
            pa.OwnerUserId          AS [User Link]
            , 'answers'             AS [Action]
            , pq.OwnerUserId        AS [User Link]
            , pa.CreationDate       AS [at]
            , pq.qId                AS [Post Link]
            , pq.Tags
FROM        possibleQuestions pq
INNER JOIN  possibleAnswers pa      ON pq.qId = pa.qId
WHERE       pq.OwnerUserId          =  ##UserA:INT##
AND         EXISTS (SELECT * FROM possibleQuestions pq2  WHERE pq2.OwnerUserId =  ##UserB:INT##)

ORDER BY    pa.CreationDate

It produces results like this:

[Figure: query results]


For a list of all such user pairs, see this SEDE query.


This uses the technique from Salman A's answer, with improved sorting and a few useful columns added.

Combined with the queries in my other answer, it shows some interesting relationships.

See it in SEDE.

WITH QandA_users AS (
    SELECT      q.OwnerUserId   AS userQ
                , a.OwnerUserId AS userA
    FROM        Posts q
    INNER JOIN  Posts a         ON q.AcceptedAnswerId = a.Id
    WHERE       q.PostTypeId    = 1
),
pairsUnion (user1, user2, whoAnswered) AS (
    SELECT  userQ, userA, 'usr 2 answered'
    FROM    QandA_users
    WHERE   userQ <> userA
    UNION ALL
    SELECT  userA, userQ, 'usr 1 answered'
    FROM    QandA_users
    WHERE   userQ <> userA
),
collaborators AS (
    SELECT      user1, user2, COUNT(*) AS [Reciprocations]
    FROM        pairsUnion
    GROUP BY    user1, user2
    HAVING COUNT (DISTINCT whoAnswered) > 1
)
SELECT
            'site://u/' + CAST(c.user1 AS NVARCHAR) + '|Usr ' + u1.DisplayName      AS [User 1]
            , 'site://u/' + CAST(c.user2 AS NVARCHAR) + '|Usr ' + u2.DisplayName    AS [User 2]
            , c.Reciprocations                                                      AS [Reciprocal Accptd posts]
            , (SELECT COUNT(*)  FROM QandA_users qau  WHERE qau.userQ = c.user1)    AS [Usr 1 Qstns wt Accptd]
            , (SELECT COUNT(*)  FROM QandA_users qau  WHERE qau.userQ = c.user1  AND qau.userA = c.user2) AS [Accptd Ansr by Usr 2]
            , (SELECT COUNT(*)  FROM QandA_users qau  WHERE qau.userA = c.user2)    AS [Usr 2 Ttl Accptd Answrs]
FROM        collaborators c
INNER JOIN  Users u1        ON u1.Id = c.user1
INNER JOIN  Users u2        ON u2.Id = c.user2
ORDER BY    c.Reciprocations DESC
            , u1.DisplayName
            , u2.DisplayName

Results:

[Figure: query results]


One CTE and simple inner joins will do the job. There is no need for as much code as I've seen in the other answers. Note the comments in mine.

Link to the saved query on StackExchange Data Explorer

with questions as ( -- this is needed so that we have ids of users asking and answering
select
   p1.owneruserid as question_userid
 , p2.owneruserid as answer_userid
 --, p1.id -- to view sample ids
from posts p1
inner join posts p2 on -- to fetch answer post
  p1.acceptedanswerid = p2.id
)
select distinct -- unique pairs
    q1.question_userid as userid1
  , q1.answer_userid as userid2
  --, q1.id, q2.id -- to view sample ids
from questions q1
inner join questions q2 on
      q1.question_userid = q2.answer_userid -- accepted answer from someone
  and q1.answer_userid = q2.question_userid -- who also accepted our answer
  and q1.question_userid <> q1.answer_userid -- and we aren't self-accepting

This produces sample results.

However, StackExchange may time you out because of the large dataset and the distinct part. If you just want to see some data, remove distinct and add top N at the start:

with questions as (
...
)
select top 3 ...