Use OPENROWSET(BULK)
One
approach is to use OPENROWSET(BULK) instead. If we create a complex
XML format file that describes our CSV File contents like so (without
the /* … */ of course)…
/* <?xml version="1.0"?> <BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <RECORD> <FIELD ID="1" xsi:type="CharTerm" TERMINATOR="," MAX_LENGTH="10"/> <FIELD ID="2" xsi:type="CharTerm" TERMINATOR="," MAX_LENGTH="20"/> <FIELD ID="3" xsi:type="CharTerm" TERMINATOR="\r\n" MAX_LENGTH="20"/> </RECORD> <ROW> <COLUMN SOURCE="1" NAME="TerritoryID" xsi:type="SQLINT"/> <COLUMN SOURCE="2" NAME="TotalSales" xsi:type="SQLDECIMAL"/> <COLUMN SOURCE="3" NAME="TotalCost" xsi:type="SQLDECIMAL"/> </ROW> </BCPFORMAT> */
…then we can reference that format file as a parameter in a call to OPENROWSET(BULK):
select TerritoryID ,TotalSales ,TotalCost from openrowset(bulk 'C:\CSV Files\Territory Data.CSV' ,formatfile='C:\CSV Files\Territory Data.FMT' ,firstrow=2) CSVInput /* TerritoryID TotalSales TotalCost ----------- ----------- ----------- 1 5767341.98 4746522.45 2 3857163.63 3174445.67 3 4677108.27 3849260.11 4 8351296.74 6873117.22 5 2851419.04 2346717.87 6 6917270.88 5692913.94 7 3899045.69 3208914.61 8 2481039.18 2041895.24 9 1977474.81 1627461.77 10 3514865.91 2892734.64 */
Note that we had to specify FIRSTROW=2 to skip over the header row in the CSV file that had the column names.
This
approach works fine, but it’s kind of a pain to have to set up the
format file. For any new CSV File that we wanted to process, we would
have to create one of these format files. No thanks.
Another approach… much easier… is to use BULK INSERT, like so:
if object_id('tempdb..#TerrData','U') is not null drop table #TerrData create table #TerrData ( TerritoryID int ,TotalSales decimal(12,2) ,TotalCost decimal(12,2) ) bulk insert #TerrData from 'C:\CSV Files\Territory Data.CSV' with (fieldterminator=',' ,rowterminator='\n' ,firstrow=2) select * from #TerrData /* TerritoryID TotalSales TotalCost ----------- ----------- ----------- 1 5767341.98 4746522.45 2 3857163.63 3174445.67 3 4677108.27 3849260.11 4 8351296.74 6873117.22 5 2851419.04 2346717.87 6 6917270.88 5692913.94 7 3899045.69 3208914.61 8 2481039.18 2041895.24 9 1977474.81 1627461.77 10 3514865.91 2892734.64 */
It can’t get much more simple than that!
But, of course, there’s a catch…
The
BULK INSERT and the OPENROWSET(BULK) approaches simply won’t work in
certain cases. Let’s say we had data in Excel that looked like this:

If we save that as a CSV file, its contents would look like this (without the /* … */):
/* CustomerID,CompanyName,Street 1,John's Lovely Meat Pies,"2838 ""A"" Street" 2,"Candies ""R"" Us",167 Broad Ave 3,"Fork & Spoon, Inc",28 Grover Blvd 4,"""Cakes,"" She Said","732 Arch St, Suite ""D""" 5,Soup Is On Me,93573 Vista Ave */
You
can see that any string column that contained a double-quote character
or a comma is surrounded (or delimited) with double-quote characters…
and the embedded double-quote characters are doubled up.
If we try to read that in, we certainly don’t get what we want:
if object_id('tempdb..#CustData','U') is not null drop table #CustData create table #CustData ( CustomerID int ,CompanyName varchar(40) ,Street varchar(40) ) bulk insert #CustData from 'C:\CSV Files\Customer Data.CSV' with (fieldterminator=',' ,rowterminator='\n' ,firstrow=2) select * from #CustData /* CustomerID CompanyName Street ---------- ----------------------- --------------------------------------- 1 John's Lovely Meat Pies "2838 ""A"" Street" 2 "Candies ""R"" Us" 167 Broad Ave 3 "Fork & Spoon Inc",28 Grover Blvd 4 """Cakes "" She Said","732 Arch St, Suite ""D""" 5 Soup Is On Me 93573 Vista Ave */
We
get the surrounding double-quotes and the embedded double double-quotes
as part of our data… and the commas within our string columns are
interpreted as being column separators.
Yuk!
I guess this
is why tab-delimited files were invented. But, interestingly enough,
Excel insists on doing the same double-quote delimiting when you save
data as a tab-delimited file as well.
So what now?
|