Tuesday, April 14, 2020

SQL - BULK INSERT


BULK INSERT in SQL Server

The BULK INSERT statement in SQL Server (the T-SQL counterpart of the bcp command-line utility) is very helpful for quickly transferring a large amount of data from a text file or CSV file into a SQL Server table or view.

The following example transfers the data present in a text file to a SQL Server table:

-- Example for BULK INSERT in SQL Server
BULK INSERT [SQL Server Tutorials].[dbo].[DimGeography]
      FROM 'F:\MS BI\FILE EXAMPLES\Geography.txt'
  WITH 
    (
       DATAFILETYPE    = 'char',
       FIELDTERMINATOR = ',',
       ROWTERMINATOR   = '\n'
    );


BULK INSERT imports a data file into a database table or view in a user-specified format in SQL Server:

BULK INSERT Sales.Orders
FROM '\\SystemX\DiskZ\Sales\data\orders.dat';

BULK INSERT Sales.Orders
FROM '\\SystemX\DiskZ\Sales\data\orders.csv'
WITH ( FORMAT='CSV');

database_name Is the database name in which the specified table or view resides. If not specified, this is the current database.
schema_name Is the name of the table or view schema. schema_name is optional if the default schema for the user performing the bulk-import operation is the schema of the specified table or view. If schema_name is not specified and the default schema of the user performing the bulk-import operation is different from that of the specified table or view, SQL Server returns an error message, and the bulk-import operation is canceled.
table_name Is the name of the table or view to bulk import data into. Only views in which all columns refer to the same base table can be used. For more information about the restrictions for loading data into views, see INSERT (Transact-SQL).
' data_file ' Is the full path of the data file that contains data to import into the specified table or view. BULK INSERT can import data from a disk (including a network drive, floppy disk, hard disk, and so on) or from Azure Blob storage. data_file must specify a valid path from the server on which SQL Server is running. If data_file is a remote file, specify the Universal Naming Convention (UNC) name. A UNC name has the form \\Systemname\ShareName\Path\FileName.
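
For example, using a hypothetical server and share name:

BULK INSERT Sales.Orders
FROM '\\SystemX\DiskZ\Sales\data\orders.dat';
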
' data_source_name ' Applies to: SQL Server 2017 (14.x) CTP 1.1 and Azure SQL Database. Is a named external data source pointing to the Azure Blob storage location of the file that will be imported. The external data source must be created using the TYPE = BLOB_STORAGE option added in SQL Server 2017 (14.x) CTP 1.1. 
BATCHSIZE = batch_size Specifies the number of rows in a batch. Each batch is copied to the server as one transaction; if a batch fails, SQL Server rolls back only that batch's transaction, and batches that have already committed are kept. By default, all data in the specified data file is one batch. For information about performance considerations, see "Remarks," later in this topic.
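
A minimal sketch (table and file names are hypothetical) that commits the load in 50,000-row transactions:

BULK INSERT dbo.StageOrders              -- hypothetical target table
FROM 'F:\loads\orders.dat'               -- hypothetical data file
WITH (BATCHSIZE = 50000);                -- one transaction per 50,000 rows
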
CHECK_CONSTRAINTS Specifies that all constraints on the target table or view must be checked during the bulk-import operation. Without the CHECK_CONSTRAINTS option, any CHECK and FOREIGN KEY constraints are ignored, and after the operation, the constraint on the table is marked as not-trusted.
At some point, you must examine the constraints on the whole table. If the table was non-empty before the bulk-import operation, the cost of revalidating the constraint may exceed the cost of applying CHECK constraints to the incremental data.
A situation in which you might want constraints disabled (the default behavior) is if the input data contains rows that violate constraints. With CHECK constraints disabled, you can import the data and then use Transact-SQL statements to remove the invalid data.
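
A sketch (hypothetical names) that enforces constraints row by row during the load; without it, you would typically revalidate afterwards, for example with ALTER TABLE dbo.Orders WITH CHECK CHECK CONSTRAINT ALL:

BULK INSERT dbo.Orders                   -- hypothetical target table
FROM 'F:\loads\orders.dat'               -- hypothetical data file
WITH (CHECK_CONSTRAINTS);                -- validate CHECK and FOREIGN KEY constraints
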
CODEPAGE = { 'ACP' | 'OEM' | 'RAW' | 'code_page' } Specifies the code page of the data in the data file. CODEPAGE is relevant only if the data contains char, varchar, or text columns with character values greater than 127 or less than 32. For an example, see "Specifying a code page" below.
DATAFILETYPE = { 'char' | 'native' | 'widechar' | 'widenative' } Specifies that BULK INSERT performs the import operation using the specified data-file type value.
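
For instance, a sketch (hypothetical names) that imports a Unicode (UTF-16) data file:

BULK INSERT dbo.Products                 -- hypothetical target table
FROM 'F:\loads\products_unicode.dat'     -- hypothetical UTF-16 data file
WITH (DATAFILETYPE = 'widechar');        -- treat the file as Unicode character data
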
ERRORFILE ='file_name' Specifies the file used to collect rows that have formatting errors and cannot be converted to an OLE DB rowset. These rows are copied into this error file from the data file "as is."
The error file is created when the command is executed. An error occurs if the file already exists. Additionally, a control file that has the extension .ERROR.txt is created. This references each row in the error file and provides error diagnostics. As soon as the errors have been corrected, the data can be loaded. Applies to: SQL Server 2017 (14.x) CTP 1.1. Beginning with SQL Server 2017 (14.x), the error_file_path can be in Azure blob storage.
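
A sketch (hypothetical paths) that diverts malformed rows to an error file instead of failing the whole load; MAXERRORS, described below, caps how many bad rows are tolerated:

BULK INSERT dbo.Orders                    -- hypothetical target table
FROM 'F:\loads\orders.dat'                -- hypothetical data file
WITH (ERRORFILE = 'F:\loads\orders.err',  -- must not already exist
      MAXERRORS = 100);                   -- cancel the load after 100 bad rows
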
'errorfile_data_source_name' Applies to: SQL Server 2017 (14.x) CTP 1.1. Is a named external data source pointing to the Azure Blob storage location of the error file that will contain errors found during the import. The external data source must be created using the TYPE = BLOB_STORAGE option added in SQL Server 2017 (14.x) CTP 1.1. For more information, see CREATE EXTERNAL DATA SOURCE.
FIRSTROW = first_row Specifies the number of the first row to load. The default is the first row in the specified data file. FIRSTROW is 1-based.
FIRE_TRIGGERS Specifies that any insert triggers defined on the destination table execute during the bulk-import operation. If triggers are defined for INSERT operations on the target table, they are fired for every completed batch.
If FIRE_TRIGGERS is not specified, no insert triggers execute.
FORMATFILE_DATA_SOURCE = 'data_source_name' Applies to: SQL Server 2017 (14.x) CTP 1.1. Is a named external data source pointing to the Azure Blob storage location of the format file that will define the schema of imported data. The external data source must be created using the TYPE = BLOB_STORAGE option added in SQL Server 2017 (14.x) CTP 1.1. For more information, see CREATE EXTERNAL DATA SOURCE.
KEEPIDENTITY Specifies that identity value or values in the imported data file are to be used for the identity column. If KEEPIDENTITY is not specified, the identity values for this column are verified but not imported and SQL Server automatically assigns unique values based on the seed and increment values specified during table creation. If the data file does not contain values for the identity column in the table or view, use a format file to specify that the identity column in the table or view is to be skipped when importing data; SQL Server automatically assigns unique values for the column. For more information, see DBCC CHECKIDENT (Transact-SQL).
KEEPNULLS Specifies that empty columns should retain a null value during the bulk-import operation, instead of having any default values for the columns inserted. 
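
A sketch (hypothetical names) that preserves the identity values supplied in the file and leaves empty columns as NULL instead of applying column defaults:

BULK INSERT dbo.Customers                -- hypothetical table with an identity column
FROM 'F:\loads\customers.dat'            -- hypothetical data file
WITH (KEEPIDENTITY,                      -- use identity values from the file
      KEEPNULLS);                        -- empty fields stay NULL; defaults are not applied
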
KILOBYTES_PER_BATCH = kilobytes_per_batch Specifies the approximate number of kilobytes (KB) of data per batch as kilobytes_per_batch. By default, KILOBYTES_PER_BATCH is unknown. For information about performance considerations, see "Remarks," later in this topic.
LASTROW = last_row Specifies the number of the last row to load. The default is 0, which indicates the last row in the specified data file.
MAXERRORS = max_errors Specifies the maximum number of syntax errors allowed in the data before the bulk-import operation is canceled. Each row that cannot be imported by the bulk-import operation is ignored and counted as one error. If max_errors is not specified, the default is 10.
ORDER ( { column [ ASC | DESC ] } [ ,... n ] ) Specifies how the data in the data file is sorted. Bulk import performance is improved if the data being imported is sorted according to the clustered index on the table, if any. If the data file is sorted in a different order (that is, other than the order of the clustered index key), or if there is no clustered index on the table, the ORDER clause is ignored. The column names supplied must be valid column names in the destination table. By default, the bulk insert operation assumes the data file is unordered. For optimized bulk import, SQL Server also validates that the imported data is sorted.
n Is a placeholder that indicates that multiple columns can be specified.
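
A sketch (hypothetical names), assuming the target table has a clustered index on OrderID and the file is pre-sorted by that key:

BULK INSERT dbo.Orders                   -- hypothetical table clustered on OrderID
FROM 'F:\loads\orders_sorted.dat'        -- hypothetical file, pre-sorted by OrderID
WITH (ORDER (OrderID ASC));              -- declare that the file order matches the index
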
ROWS_PER_BATCH = rows_per_batch Indicates the approximate number of rows of data in the data file.
By default, all the data in the data file is sent to the server as a single transaction, and the number of rows in the batch is unknown to the query optimizer. If you specify ROWS_PER_BATCH (with a value > 0), the server uses this value to optimize the bulk-import operation. The value specified for ROWS_PER_BATCH should be approximately the same as the actual number of rows. For information about performance considerations, see "Remarks," later in this topic.
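
A sketch (hypothetical names) that gives the optimizer a row-count hint for a file of roughly a million rows; KILOBYTES_PER_BATCH works analogously in kilobytes:

BULK INSERT dbo.Orders                   -- hypothetical target table
FROM 'F:\loads\orders.dat'               -- hypothetical file with ~1,000,000 rows
WITH (ROWS_PER_BATCH = 1000000);         -- a hint only; the load is still one transaction
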
TABLOCK Specifies that a table-level lock is acquired for the duration of the bulk-import operation. A table can be loaded concurrently by multiple clients if the table has no indexes and TABLOCK is specified. By default, locking behavior is determined by the table option table lock on bulk load. Holding a lock for the duration of the bulk-import operation reduces lock contention on the table and, in some cases, can significantly improve performance. For information about performance considerations, see "Remarks," later in this topic.
For a columnstore index, the locking behavior is different because the index is internally divided into multiple rowsets. Each thread loads data exclusively into its own rowset by taking an X lock on that rowset, allowing parallel data loads from concurrent sessions. Using the TABLOCK option causes the thread to take an X lock on the table (unlike the BU lock taken for traditional rowsets), which prevents other concurrent threads from loading data.
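
A sketch (hypothetical names) that holds a table-level lock for the whole load:

BULK INSERT dbo.Orders                   -- hypothetical target table
FROM 'F:\loads\orders.dat'               -- hypothetical data file
WITH (TABLOCK);                          -- take a bulk-update (BU) table lock for the load
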
A. Using pipes to import data from a file
BULK INSERT AdventureWorks2012.Sales.SalesOrderDetail
   FROM 'f:\orders\lineitem.tbl'
   WITH
      ( 
         FIELDTERMINATOR = ' |'
         , ROWTERMINATOR = ' |\n'
      );

B. Using the FIRE_TRIGGERS argument

BULK INSERT AdventureWorks2012.Sales.SalesOrderDetail
   FROM 'f:\orders\lineitem.tbl'
   WITH
     (
         FIELDTERMINATOR = ' |'
         , ROWTERMINATOR = ':\n'
         , FIRE_TRIGGERS
      );

C. Using line feed as a row terminator

DECLARE @bulk_cmd varchar(1000);
SET @bulk_cmd = 'BULK INSERT AdventureWorks2012.Sales.SalesOrderDetail
FROM ''<drive>:\<path>\<filename>''
WITH (ROWTERMINATOR = '''+CHAR(10)+''')';
EXEC(@bulk_cmd);

D. Specifying a code page

BULK INSERT MyTable
FROM 'D:\data.csv'
WITH
( CODEPAGE = '65001'
   , DATAFILETYPE = 'char'
   , FIELDTERMINATOR = ','
);

E. Importing data from a CSV file

BULK INSERT Sales.Invoices
FROM '\\share\invoices\inv-2016-07-25.csv'
WITH (FORMAT = 'CSV'
      , FIRSTROW=2
      , FIELDQUOTE = '"'
      , FIELDTERMINATOR = ';'
      , ROWTERMINATOR = '0x0a');

F. Importing data from a file in Azure blob storage

--> Optional: a MASTER KEY is only needed to protect a DATABASE SCOPED CREDENTIAL; skip both when the blob is configured for public (anonymous) access.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'YourStrongPassword1';
GO
--> Optional: a DATABASE SCOPED CREDENTIAL is only needed when the blob is not configured for public (anonymous) access.
CREATE DATABASE SCOPED CREDENTIAL MyAzureBlobStorageCredential
 WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
 SECRET = '******srt=sco&sp=rwac&se=2017-02-01T00:55:34Z&st=2016-12-29T16:55:34Z***************';

 -- NOTE: Make sure that you don't have a leading ? in SAS token, and
 -- that you have at least read permission on the object that should be loaded srt=o&sp=r, and
 -- that expiration period is valid (all dates are in UTC time)

CREATE EXTERNAL DATA SOURCE MyAzureBlobStorage
WITH ( TYPE = BLOB_STORAGE,
          LOCATION = 'https://****************.blob.core.windows.net/invoices'
          , CREDENTIAL= MyAzureBlobStorageCredential --> CREDENTIAL is not required if a blob is configured for public (anonymous) access!
);

BULK INSERT Sales.Invoices
FROM 'inv-2017-12-08.csv'
WITH (DATA_SOURCE = 'MyAzureBlobStorage');


--> Optional: a MASTER KEY is only needed to protect a DATABASE SCOPED CREDENTIAL; skip both when the blob is configured for public (anonymous) access.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'YourStrongPassword1';
GO
--> Change to using Managed Identity instead of SAS key
CREATE DATABASE SCOPED CREDENTIAL msi_cred WITH IDENTITY = 'Managed Identity';
GO
CREATE EXTERNAL DATA SOURCE MyAzureBlobStorage
WITH ( TYPE = BLOB_STORAGE,
          LOCATION = 'https://****************.blob.core.windows.net/curriculum'
          , CREDENTIAL= msi_cred --> CREDENTIAL is not required if a blob is configured for public (anonymous) access!
);

BULK INSERT Sales.Invoices
FROM 'inv-2017-12-08.csv'
WITH (DATA_SOURCE = 'MyAzureBlobStorage');


