Saturday, December 7, 2019

SQL - Intersect



INTERSECT (Transact-SQL)
INTERSECT returns distinct rows that are output by both the left and right input queries operator.

INTERSECT ALL operator that is not implemented in SQL Server.
The INTERSECT operator removes the duplicate rows from the final intersected result set values.
The SQL INTERSECT clause/operator is used to combine two SELECT statements, but returns rows only from the first SELECT statement that are identical to a row in the second SELECT statement.



CREATE TABLE t1
(
       Col1 INT,
       Col2 INT,
       Col3 INT
)
GO

-- Create the 2nd table
CREATE TABLE t2
(
       Col1 INT,
       Col2 INT
)
GO

-- Insert some records into both tables
INSERT INTO t1 VALUES (1, 1, 1), (2, 2, 2), (2, 2, 2), (2, 2, 3), (2, 2, 4), (3, 3, 3) , (4, 4, 4) , (4, 4, 4), (2, 2, 5)
INSERT INTO t2 VALUES (2, 2), (2, 2), (2, 2), (3, 3), (4, 4),  (4, 4), (6, 6), (2, 2)
GO


For table t2, the 2nd table consists of a duplicate record – we able to see the record with the values 2 appears 4 times in the table. When you now perform an INTERSECT between both tables, the records with the values of 2 just appear once in the intersect  result set. The duplicate row was just eliminated from intersect result set!


WITH IntersectRemoveDuplicate AS
(
       SELECT       
              Col1,
              Col2
       FROM t1

       INTERSECT

       SELECT       
              Col1,
              Col2
       FROM t2
)
SELECT Col1, Col2 FROM IntersectRemoveDuplicate
GO

If you want to keep duplicate rows in the table (IntersectAll), you have to make them unique with using the ROW_NUMBER(). With that function you just generate a unique row number for every duplicate record. So,  duplicate records become unique, and the “duplicate” row is returned 4 times as expected.


WITH IntersectAllWithDuplicate AS
(
       SELECT
              ROW_NUMBER() OVER (PARTITION BY Col1, Col2
              ORDER BY (SELECT 0))
              AS RowNumber,
              Col1,
              Col2
       FROM t1

       INTERSECT

       SELECT
              ROW_NUMBER() OVER (PARTITION BY Col1, Col2
              ORDER BY (SELECT 0))
               AS RowNumber,
              Col1,
              Col2
       FROM t2
)
SELECT Col1, Col2, RowNumber FROM IntersectAllWithDuplicate
Order By Col1, Col2
GO


Practical Example :
We want to query from table customer and table vendor to find out who are our customer and vendor as well.

 -- Create the 1st table
CREATE TABLE Vendor
(
       VendorID INT,
       VendorName nvarchar(100), 
       VendorPurchase decimal
)
GO

-- Create the 2nd table
CREATE TABLE Customer
(
       CustomerID INT,
       CustomerName nvarchar(100),
       CustomerOrder decimal
)
GO

-- Insert some records into both tables
INSERT INTO Vendor VALUES (1, 'CompanyA', 100),
                          (2, 'CompanyB', 200),
                          (2, 'CompanyB', 230),
                          (2, 'CompanyB', 300),
                          (2, 'CompanyB', 400),
                          (3, 'CompanyC', 330),
                          (4, 'CompanyD', 450),
                          (4, 'CompanyD', 40),
                          (5, 'CompanyE', 480),
                          (2, 'CompanyB', 500)
INSERT INTO Customer VALUES (2, 'CompanyB', 120),
                            (2, 'CompanyB', 200),
                            (2, 'CompanyB', 250),
                            (3, 'CompanyC', 300),
                            (4, 'CompanyD', 140), 
                            (4, 'CompanyD', 240),
                            (6, 'CompanyF', 600),                                                                        (2, 'CompanyB', 200)
GO


WITH IntersectRemoveDuplicate AS
(
       SELECT       
              VendorID As ID,
              VendorName As Name
       FROM Vendor

       INTERSECT

       SELECT       
              CustomerID As ID,
              CustomerName As Name
       FROM Customer
)
SELECT ID, Name FROM IntersectRemoveDuplicate
GO

Output :
2          CompanyB
3          CompanyC
4          CompanyD


WITH IntersectAllWithDuplicate AS
(
       SELECT
              ROW_NUMBER() OVER (PARTITION BY VendorID, VendorName
              ORDER BY (SELECT 0)) AS RowNumber,
              VendorID As ID,
              VendorName As Name
       FROM Vendor

       INTERSECT

       SELECT
              ROW_NUMBER() OVER (PARTITION BY CustomerID, CustomerName
              ORDER BY (SELECT 0)) AS RowNumber,
              CustomerID As ID,
              CustomerName As Name
       FROM Customer
)
SELECT ID, Name, RowNumber FROM IntersectAllWithDuplicate
Order By ID, Name
GO

Output : 
2              CompanyB           1
2              CompanyB           2
2              CompanyB           3
2              CompanyB           4
2              CompanyB           5
2              CompanyB           6
2              CompanyB           7
2              CompanyB           8
3              CompanyC           1
4              CompanyD           1
4              CompanyD           2
4              CompanyD           3
4              CompanyD           4




No comments:

Post a Comment