Wednesday, February 19, 2020

SQL - STRING_SPLIT

ProductId   Name                 Tags
1           Full-Finger Gloves   clothing,road,touring,bike
2           LL Headset           bike
3           HL Mountain Frame    bike,mountain
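
STRING_SPLIT is a table-valued function that splits a string into rows of substrings based on a separator character. A minimal setup sketch so the examples below can be run locally (the table name and data come from the post; the column types are assumptions):

CREATE TABLE Product (
    ProductId INT PRIMARY KEY,
    Name      NVARCHAR(100),
    Tags      NVARCHAR(400)
);

INSERT INTO Product (ProductId, Name, Tags) VALUES
    (1, 'Full-Finger Gloves', 'clothing,road,touring,bike'),
    (2, 'LL Headset', 'bike'),
    (3, 'HL Mountain Frame', 'bike,mountain');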

SELECT ProductId, Name, value 
FROM Product 
CROSS APPLY STRING_SPLIT(Tags, ','); 

ProductId   Name                 value
1           Full-Finger Gloves   clothing
1           Full-Finger Gloves   road
1           Full-Finger Gloves   touring
1           Full-Finger Gloves   bike
2           LL Headset           bike
3           HL Mountain Frame    bike
3           HL Mountain Frame    mountain

Aggregation by values
SELECT value AS tag, COUNT(*) AS [Number of articles] 
FROM Product 
    CROSS APPLY STRING_SPLIT(Tags, ',')
GROUP BY value 
HAVING COUNT(*) > 2 
ORDER BY COUNT(*) DESC;
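
Against the three sample rows above, only the bike tag appears more than twice (it occurs three times), so the query would return:

tag    Number of articles
bike   3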


Search by tag value

To find products with a single tag (clothing):
SELECT ProductId, Name, Tags 
FROM Product 
WHERE 'clothing' IN (SELECT value FROM STRING_SPLIT(Tags, ',')); 

Find products that match either of two specified tags (clothing or road):
SELECT ProductId, Name, Tags 
FROM Product 
WHERE EXISTS (SELECT *
    FROM STRING_SPLIT(Tags, ',')
    WHERE value IN ('clothing', 'road'));

Find rows by list of values

Developers must create a query that finds rows by a list of IDs. They can use the following query:
SELECT ProductId, Name, Tags 
FROM Product 
JOIN STRING_SPLIT('1,2,3',',')
    ON value = ProductId; 

The preceding STRING_SPLIT usage is a replacement for a common anti-pattern: building a dynamic SQL string in the application layer or in Transact-SQL, or matching the list with the LIKE operator, as in the following example SELECT statement:
SELECT ProductId, Name, Tags 
FROM Product 
WHERE ',1,2,3,' LIKE '%,' + CAST(ProductId AS VARCHAR(20)) + ',%'
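
For contrast, a hypothetical sketch of the dynamic-SQL form of the same anti-pattern (the variable names are illustrative only, and this is shown as the pattern to avoid):

DECLARE @ids NVARCHAR(100) = N'1,2,3';
DECLARE @sql NVARCHAR(MAX) =
    N'SELECT ProductId, Name, Tags FROM Product WHERE ProductId IN (' + @ids + N');';
-- String concatenation makes the statement hard to parameterize and opens the door to SQL injection
EXEC sp_executesql @sql;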


Split comma-separated value string
Parse a comma-separated list of values and return all non-empty tokens:
DECLARE @tags NVARCHAR(400) = 'clothing,road,,touring,bike' 
 
SELECT value 
FROM STRING_SPLIT(@tags, ',')
WHERE RTRIM(value) <> '';


Thursday, February 13, 2020

SQL - Order Of Execution Of A Query


Order Of Execution Of A Query

SELECT DISTINCT column, AGG_FUNC(column_or_expression), …
FROM mytable
    JOIN another_table
      ON mytable.column = another_table.column
    WHERE constraint_expression
    GROUP BY column
    HAVING constraint_expression
    ORDER BY column ASC/DESC
    LIMIT count OFFSET count;

Because each part of the query is executed sequentially, it's important to understand the order of execution so that you know what results are accessible where.

Query order of execution
1. FROM and JOINs
2. WHERE
3. GROUP BY
4. HAVING
5. SELECT
6. DISTINCT
7. ORDER BY
8. LIMIT / OFFSET
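
One practical consequence of this order, sketched against the Product table from the earlier post (a hedged illustration, not part of the original): an alias defined in SELECT cannot be referenced in WHERE, because WHERE is evaluated first, while ORDER BY runs after SELECT and can use it.

-- Fails: WHERE runs before SELECT, so the alias does not exist yet
-- SELECT ProductId, Name AS ProductName FROM Product WHERE ProductName = 'LL Headset';

-- Works: ORDER BY runs after SELECT and can reference the alias
SELECT ProductId, Name AS ProductName
FROM Product
WHERE Name = 'LL Headset'
ORDER BY ProductName;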

Conclusion
Not every query needs to have all the parts; because the clauses are optional, developers and data analysts can quickly manipulate data just by combining the clauses above, without having to write additional code.

Sunday, February 2, 2020

SQL - ETL


What is ETL?
ETL stands for Extract, Transform and Load, which is a process used to collect data from various sources, transform the data depending on business rules/needs and load the data into a destination database.
The need to use ETL arises from the fact that in modern computing, business data resides in multiple locations and in many incompatible formats. For example, business data might be stored on the file system in various formats (Word docs, PDF, spreadsheets, plain text, etc.), kept in email files, or held in various database servers such as MS SQL Server, Oracle and MySQL. Handling all this business information efficiently is a great challenge, and ETL plays an important role in solving this problem.

Extract, Transform and Load
The ETL process has 3 main steps, which are Extract, Transform and Load.
Extract – The first step in the ETL process is extracting the data from the various sources. Each of the source systems may store its data in a completely different format from the rest. The sources are usually flat files or RDBMSs, but almost any data store can be used as a source for an ETL process.
Transform – Once the data has been extracted and converted into the expected format, it’s time for the next step in the ETL process: transforming the data according to a set of business rules. The data transformation may include operations such as filtering, sorting, aggregating, joining data, cleaning data, generating calculated values from existing ones, validating data, etc.
Load – The final ETL step involves loading the transformed data into the destination target, which might be a database or data warehouse.
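
A minimal, hypothetical T-SQL sketch of the three steps (the table names SourceSales, StagingSales and FactSales and their columns are invented for illustration; FactSales is assumed to already exist):

-- Extract: pull raw rows from a source system into a staging table
SELECT OrderId, Amount, OrderDate
INTO StagingSales
FROM SourceSales;

-- Transform: apply business rules (drop NULL amounts, roll totals up by day)
-- Load: write the transformed result into the destination table
;WITH Cleaned AS (
    SELECT CAST(OrderDate AS DATE) AS OrderDay, Amount
    FROM StagingSales
    WHERE Amount IS NOT NULL
)
INSERT INTO FactSales (OrderDay, TotalAmount)
SELECT OrderDay, SUM(Amount)
FROM Cleaned
GROUP BY OrderDay;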

ETL Tools
IBM (IBM InfoSphere DataStage)
Oracle (Oracle Warehouse Builder)
Microsoft (SQL Server Integration Services - SSIS)

SQL - List Jobs in System For Current Owner


SELECT s.name AS JobName, l.name AS JobOwner
FROM msdb..sysjobs s
LEFT JOIN master.sys.syslogins l ON s.owner_sid = l.sid
WHERE l.name IS NOT NULL
ORDER BY l.name;
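
If the goal is literally to list only the jobs owned by the login running the query (as the title suggests), a hedged variant using SUSER_SNAME() would be:

SELECT s.name AS JobName, l.name AS JobOwner
FROM msdb.dbo.sysjobs s
JOIN master.sys.syslogins l ON s.owner_sid = l.sid
WHERE l.name = SUSER_SNAME()   -- only jobs owned by the current login
ORDER BY s.name;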



What does “syspolicy_purge_history” do?
To answer this question, we have to go back to SQL Server 2008, which introduced a feature called Policy-Based Management. For example, one could define a policy that says all tables must have a clustered index. Once defined, the policy can be evaluated against targets to check for compliance. Each compliance evaluation is recorded in tables in msdb:
i.syspolicy_policy_execution_history_details_internal,
ii.syspolicy_policy_execution_history_internal, and
iii.syspolicy_policy_category_subscriptions_internal.

The evaluation conditions are built on top of object facets, which are predefined by Microsoft.
The purpose of the syspolicy_purge_history job is to remove old records from those tables so the history does not pile up and take up space unnecessarily.

It has 3 steps:
1. Check whether Policy-Based Management is enabled and exit if it is not.
2. Run the msdb.dbo.sp_syspolicy_purge_history system stored procedure to clear the history (the sketch below shows this step run by hand).
3. Use PowerShell to clear the system health phantom records.
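
A hedged sketch of inspecting and purging the history manually, using the procedure and one of the tables named above (read access to the internal msdb tables is assumed):

-- How many evaluation-history rows are currently stored?
SELECT COUNT(*) AS HistoryRows
FROM msdb.dbo.syspolicy_policy_execution_history_internal;

-- Run the same purge that the job performs in step 2
EXEC msdb.dbo.sp_syspolicy_purge_history;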