设为首页收藏本站

 找回密码
 注册

QQ登录

只需一步,快速开始

BCM 门户 IT世界 应用开发 查看内容

Read CSV Files with double quotes and comma into SQL DB

2012-6-12 13:04| 发布者: Test| 查看: 3008| 评论: 0|原作者: Brad Schulz|来自: SELECT Blog FROM Brad.Schulz

摘要: In this post, I’d like to talk about Comma-Separated Values in Files (i.e. CSV Files).I had a client with a need to read in a CSV file and do some subsequent processing with it. This was a very small ...
Use OPENROWSET(BULK)

One approach is to use OPENROWSET(BULK) instead. If we create a complex XML format file that describes our CSV File contents like so (without the /* … */ of course)…

/*
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="1" xsi:type="CharTerm" TERMINATOR="," MAX_LENGTH="10"/>
<FIELD ID="2" xsi:type="CharTerm" TERMINATOR="," MAX_LENGTH="20"/>
<FIELD ID="3" xsi:type="CharTerm" TERMINATOR="\r\n" MAX_LENGTH="20"/>
</RECORD>
<ROW>
<COLUMN SOURCE="1" NAME="TerritoryID" xsi:type="SQLINT"/>
<COLUMN SOURCE="2" NAME="TotalSales" xsi:type="SQLDECIMAL"/>
<COLUMN SOURCE="3" NAME="TotalCost" xsi:type="SQLDECIMAL"/>
</ROW>
</BCPFORMAT>
*/
…then we can reference that format file as a parameter in a call to OPENROWSET(BULK):

select TerritoryID
,TotalSales
,TotalCost
from openrowset(bulk 'C:\CSV Files\Territory Data.CSV'
,formatfile='C:\CSV Files\Territory Data.FMT'
,firstrow=2) CSVInput
/*
TerritoryID TotalSales TotalCost
----------- ----------- -----------
1 5767341.98 4746522.45
2 3857163.63 3174445.67
3 4677108.27 3849260.11
4 8351296.74 6873117.22
5 2851419.04 2346717.87
6 6917270.88 5692913.94
7 3899045.69 3208914.61
8 2481039.18 2041895.24
9 1977474.81 1627461.77
10 3514865.91 2892734.64
*/
Note that we had to specify FIRSTROW=2 to skip over the header row in the CSV file that had the column names.

This approach works fine, but it’s kind of a pain to have to set up the format file. For any new CSV File that we wanted to process, we would have to create one of these format files. No thanks.

Another approach… much easier… is to use BULK INSERT, like so:

if object_id('tempdb..#TerrData','U') is not null drop table #TerrData
create table #TerrData
(
TerritoryID int
,TotalSales decimal(12,2)
,TotalCost decimal(12,2)
)

bulk insert #TerrData
from 'C:\CSV Files\Territory Data.CSV'
with (fieldterminator=','
,rowterminator='\n'
,firstrow=2)

select * from #TerrData
/*
TerritoryID TotalSales TotalCost
----------- ----------- -----------
1 5767341.98 4746522.45
2 3857163.63 3174445.67
3 4677108.27 3849260.11
4 8351296.74 6873117.22
5 2851419.04 2346717.87
6 6917270.88 5692913.94
7 3899045.69 3208914.61
8 2481039.18 2041895.24
9 1977474.81 1627461.77
10 3514865.91 2892734.64
*/
It can’t get much more simple than that!

But, of course, there’s a catch…

The BULK INSERT and the OPENROWSET(BULK) approaches simply won’t work in certain cases. Let’s say we had data in Excel that looked like this:

Customer Data in Excel

If we save that as a CSV file, its contents would look like this (without the /* … */):

/*
CustomerID,CompanyName,Street
1,John's Lovely Meat Pies,"2838 ""A"" Street"
2,"Candies ""R"" Us",167 Broad Ave
3,"Fork & Spoon, Inc",28 Grover Blvd
4,"""Cakes,"" She Said","732 Arch St, Suite ""D"""
5,Soup Is On Me,93573 Vista Ave
*/
You can see that any string column that contained a double-quote character or a comma is surrounded (or delimited) with double-quote characters… and the embedded double-quote characters are doubled up.

If we try to read that in, we certainly don’t get what we want:

if object_id('tempdb..#CustData','U') is not null drop table #CustData
create table #CustData
(
CustomerID int
,CompanyName varchar(40)
,Street varchar(40)
)

bulk insert #CustData
from 'C:\CSV Files\Customer Data.CSV'
with (fieldterminator=','
,rowterminator='\n'
,firstrow=2)

select * from #CustData
/*
CustomerID CompanyName Street
---------- ----------------------- ---------------------------------------
1 John's Lovely Meat Pies "2838 ""A"" Street"
2 "Candies ""R"" Us" 167 Broad Ave
3 "Fork & Spoon Inc",28 Grover Blvd
4 """Cakes "" She Said","732 Arch St, Suite ""D"""
5 Soup Is On Me 93573 Vista Ave
*/
We get the surrounding double-quotes and the embedded double double-quotes as part of our data… and the commas within our string columns are interpreted as being column separators.

Yuk!

I guess this is why tab-delimited files were invented. But, interestingly enough, Excel insists on doing the same double-quote delimiting when you save data as a tab-delimited file as well.

So what now?

鲜花

握手

雷人

路过

鸡蛋

相关阅读

最新评论

手机版|小黑屋|BC Morning Website ( Best Deal Inc. 001 )  

GMT-8, 2025-12-13 16:15 , Processed in 0.013313 second(s), 18 queries .

Supported by Best Deal Online X3.5

© 2001-2025 Discuz! Team.

返回顶部