Wednesday, February 19, 2020

SQL - STRING_SPLIT

ProductId   Name                 Tags
1           Full-Finger Gloves   clothing,road,touring,bike
2           LL Headset           bike
3           HL Mountain Frame    bike,mountain
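
STRING_SPLIT is a table-valued function that splits a string into rows of substrings based on a separator character. A minimal setup sketch so the examples below can be run locally (the table name and data come from the post; the column types are assumptions):

CREATE TABLE Product (
    ProductId INT PRIMARY KEY,
    Name      NVARCHAR(100),
    Tags      NVARCHAR(400)
);

INSERT INTO Product (ProductId, Name, Tags) VALUES
    (1, 'Full-Finger Gloves', 'clothing,road,touring,bike'),
    (2, 'LL Headset', 'bike'),
    (3, 'HL Mountain Frame', 'bike,mountain');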

SELECT ProductId, Name, value 
FROM Product 
CROSS APPLY STRING_SPLIT(Tags, ','); 

ProductId   Name                 value
1           Full-Finger Gloves   clothing
1           Full-Finger Gloves   road
1           Full-Finger Gloves   touring
1           Full-Finger Gloves   bike
2           LL Headset           bike
3           HL Mountain Frame    bike
3           HL Mountain Frame    mountain

Aggregation by values
SELECT value AS tag, COUNT(*) AS [Number of articles] 
FROM Product 
    CROSS APPLY STRING_SPLIT(Tags, ',')
GROUP BY value 
HAVING COUNT(*) > 2 
ORDER BY COUNT(*) DESC;
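
Against the three sample rows above, only the bike tag appears more than twice (it occurs three times), so the query would return:

tag    Number of articles
bike   3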


Search by tag value

To find products with a single tag (clothing):
SELECT ProductId, Name, Tags 
FROM Product 
WHERE 'clothing' IN (SELECT value FROM STRING_SPLIT(Tags, ',')); 

Find products that match either of two specified tags (clothing or road):
SELECT ProductId, Name, Tags 
FROM Product 
WHERE EXISTS (SELECT *
    FROM STRING_SPLIT(Tags, ',')
    WHERE value IN ('clothing', 'road'));

Find rows by list of values

Developers must create a query that finds rows by a list of IDs. They can use the following query:
SELECT ProductId, Name, Tags 
FROM Product 
JOIN STRING_SPLIT('1,2,3',',')
    ON value = ProductId; 

The preceding STRING_SPLIT usage is a replacement for a common anti-pattern: building a dynamic SQL string in the application layer or in Transact-SQL, or matching the list with the LIKE operator, as in the following example SELECT statement:
SELECT ProductId, Name, Tags 
FROM Product 
WHERE ',1,2,3,' LIKE '%,' + CAST(ProductId AS VARCHAR(20)) + ',%'
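
For contrast, a hypothetical sketch of the dynamic-SQL form of the same anti-pattern (the variable names are illustrative only, and this is shown as the pattern to avoid):

DECLARE @ids NVARCHAR(100) = N'1,2,3';
DECLARE @sql NVARCHAR(MAX) =
    N'SELECT ProductId, Name, Tags FROM Product WHERE ProductId IN (' + @ids + N');';
-- String concatenation makes the statement hard to parameterize and opens the door to SQL injection
EXEC sp_executesql @sql;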


Split comma-separated value string
Parse a comma-separated list of values and return all non-empty tokens:
DECLARE @tags NVARCHAR(400) = 'clothing,road,,touring,bike' 
 
SELECT value 
FROM STRING_SPLIT(@tags, ',')
WHERE RTRIM(value) <> '';


Thursday, February 13, 2020

SQL - Order Of Execution Of A Query


Order Of Execution Of A Query

SELECT DISTINCT column, AGG_FUNC(column_or_expression), …
FROM mytable
    JOIN another_table
      ON mytable.column = another_table.column
    WHERE constraint_expression
    GROUP BY column
    HAVING constraint_expression
    ORDER BY column ASC/DESC
    LIMIT count OFFSET count;

Because each part of the query is executed sequentially, it's important to understand the order of execution so that you know what results are accessible where.

Query order of execution
1. FROM and JOINs
2. WHERE
3. GROUP BY
4. HAVING
5. SELECT
6. DISTINCT
7. ORDER BY
8. LIMIT / OFFSET
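
One practical consequence of this order, sketched against the Product table from the earlier post (a hedged illustration, not part of the original): an alias defined in SELECT cannot be referenced in WHERE, because WHERE is evaluated first, while ORDER BY runs after SELECT and can use it.

-- Fails: WHERE runs before SELECT, so the alias does not exist yet
-- SELECT ProductId, Name AS ProductName FROM Product WHERE ProductName = 'LL Headset';

-- Works: ORDER BY runs after SELECT and can reference the alias
SELECT ProductId, Name AS ProductName
FROM Product
WHERE Name = 'LL Headset'
ORDER BY ProductName;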

Conclusion
Not every query needs to have all the parts; because the clauses are optional, developers and data analysts can quickly manipulate data just by combining the clauses above, without having to write additional code.

Sunday, February 2, 2020

SQL - ETL


What is ETL?
ETL stands for Extract, Transform and Load, which is a process used to collect data from various sources, transform the data depending on business rules/needs and load the data into a destination database.
The need to use ETL arises from the fact that in modern computing, business data resides in multiple locations and in many incompatible formats. For example, business data might be stored on the file system in various formats (Word docs, PDF, spreadsheets, plain text, etc.), kept in email files, or held in various database servers such as MS SQL Server, Oracle and MySQL. Handling all this business information efficiently is a great challenge, and ETL plays an important role in solving this problem.

Extract, Transform and Load
The ETL process has 3 main steps, which are Extract, Transform and Load.
Extract – The first step in the ETL process is extracting the data from the various sources. Each of the source systems may store its data in a completely different format from the rest. The sources are usually flat files or RDBMSs, but almost any data store can be used as a source for an ETL process.
Transform – Once the data has been extracted and converted into the expected format, it’s time for the next step in the ETL process: transforming the data according to a set of business rules. The data transformation may include operations such as filtering, sorting, aggregating, joining data, cleaning data, generating calculated values from existing ones, validating data, etc.
Load – The final ETL step involves loading the transformed data into the destination target, which might be a database or data warehouse.
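
A minimal, hypothetical T-SQL sketch of the three steps (the table names SourceSales, StagingSales and FactSales and their columns are invented for illustration; FactSales is assumed to already exist):

-- Extract: pull raw rows from a source system into a staging table
SELECT OrderId, Amount, OrderDate
INTO StagingSales
FROM SourceSales;

-- Transform: apply business rules (drop NULL amounts, roll totals up by day)
-- Load: write the transformed result into the destination table
;WITH Cleaned AS (
    SELECT CAST(OrderDate AS DATE) AS OrderDay, Amount
    FROM StagingSales
    WHERE Amount IS NOT NULL
)
INSERT INTO FactSales (OrderDay, TotalAmount)
SELECT OrderDay, SUM(Amount)
FROM Cleaned
GROUP BY OrderDay;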

ETL Tools
IBM (IBM InfoSphere DataStage)
Oracle (Oracle Warehouse Builder)
Microsoft (SQL Server Integration Services - SSIS)

SQL - List Jobs in System For Current Owner


SELECT s.name AS JobName, l.name AS JobOwner
FROM msdb..sysjobs s
LEFT JOIN master.sys.syslogins l ON s.owner_sid = l.sid
WHERE l.name IS NOT NULL
ORDER BY l.name;
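
If the goal is literally to list only the jobs owned by the login running the query (as the title suggests), a hedged variant using SUSER_SNAME() would be:

SELECT s.name AS JobName, l.name AS JobOwner
FROM msdb.dbo.sysjobs s
JOIN master.sys.syslogins l ON s.owner_sid = l.sid
WHERE l.name = SUSER_SNAME()   -- only jobs owned by the current login
ORDER BY s.name;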



What does “syspolicy_purge_history” do?
To answer this question, we have to go back to SQL Server 2008, which introduced a feature called Policy-Based Management. For example, one could define a policy that says all tables must have a clustered index. Once defined, the policy can be evaluated against targets to check for compliance. Each compliance evaluation is recorded in tables in msdb:
i.syspolicy_policy_execution_history_details_internal,
ii.syspolicy_policy_execution_history_internal, and
iii.syspolicy_policy_category_subscriptions_internal.

The evaluation conditions are built on top of object facets, which are predefined by Microsoft.
The purpose of the syspolicy_purge_history job is to remove old records from those tables so the history does not pile up and take up space unnecessarily.

It has 3 steps:
1. Check whether Policy-Based Management is enabled and exit if it is not.
2. Run the msdb.dbo.sp_syspolicy_purge_history system stored procedure to clear the history (the sketch below shows this step run by hand).
3. Use PowerShell to clear the system health phantom records.
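
A hedged sketch of inspecting and purging the history manually, using the procedure and one of the tables named above (read access to the internal msdb tables is assumed):

-- How many evaluation-history rows are currently stored?
SELECT COUNT(*) AS HistoryRows
FROM msdb.dbo.syspolicy_policy_execution_history_internal;

-- Run the same purge that the job performs in step 2
EXEC msdb.dbo.sp_syspolicy_purge_history;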