
Read CSV Files with double quotes and comma into SQL DB

2012-6-12 13:04 | Published by: Test | Views: 3011 | Comments: 0 | Original author: Brad Schulz | Source: SELECT Blog FROM Brad.Schulz

Abstract: In this post, I’d like to talk about Comma-Separated Values in Files (i.e. CSV Files). I had a client with a need to read in a CSV file and do some subsequent processing with it. This was a very small ...
A generic way of handling a CSV file in T-SQL

I was stubborn. I wasn’t going to let these limitations stand in my way. I was determined to come up with a fairly generic way of handling a CSV file in T-SQL.

So I wrote a procedure called usp_ProcessCSVFile. It accepts the (fully-qualified) name of a CSV file, appropriately handles the double-quote delimiters (and any embedded commas and doubled double-quotes) in the data, and produces rows whose columns are all of datatype VARCHAR(100). We can simply INSERT the procedure’s results into a destination table and the columns will be converted appropriately.

For example…

if object_id('tempdb..#CustData','U') is not null drop table #CustData
create table #CustData
(
CustomerID int
,CompanyName varchar(40)
,Street varchar(40)
)

insert #CustData
exec usp_ProcessCSVFile 'C:\CSV Files\Customer Data.CSV'

select * from #CustData
/*
CustomerID CompanyName             Street
---------- ----------------------- ----------------------
1          John's Lovely Meat Pies 2838 "A" Street
2          Candies "R" Us          167 Broad Ave
3          Fork & Spoon, Inc       28 Grover Blvd
4          "Cakes," She Said       732 Arch St, Suite "D"
5          Soup Is On Me           93573 Vista Ave
*/
The procedure also accepts an optional second parameter (@HeaderRowExists), a BIT parameter that defaults to 1. If your CSV file does not contain a header row, pass 0 for this parameter and the procedure will treat the first row as actual data instead of a header row.
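For example, a call for a headerless file might look like this (the file name below is just a hypothetical example; the parameter name is as described above):

-- Hypothetical example: the file has no header row, so its first row is data
insert #CustData
exec usp_ProcessCSVFile 'C:\CSV Files\Customer Data NoHeader.CSV', @HeaderRowExists = 0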

I would have liked to create a function rather than a procedure to process CSV files, but that was impossible: because OPENROWSET(BULK), as I mentioned earlier, only accepts string literals, I had to use Dynamic SQL to handle a file name supplied in a variable/parameter, and functions cannot execute Dynamic SQL.
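To make that concrete, here is a minimal sketch of the kind of dynamic SQL involved (this is not the actual procedure code; the SINGLE_CLOB option and the variable names are assumptions for illustration):

declare @FileName varchar(255)
declare @CSVContents varchar(max)
declare @SQL nvarchar(max)

set @FileName = 'C:\CSV Files\Customer Data.CSV'

-- OPENROWSET(BULK) will not accept a variable for the file name, so the path
-- has to be spliced into a dynamic SQL string (doubling any embedded quotes)
set @SQL = N'select @Contents = BulkColumn
             from openrowset(bulk ''' + replace(@FileName,'''','''''') + ''', single_clob) as F'

exec sp_executesql @SQL
    ,N'@Contents varchar(max) output'
    ,@Contents = @CSVContents output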

In short, the procedure reads the contents of the CSV file (using OPENROWSET(BULK)) into a single VARCHAR(MAX) variable called @CSVContents. It then converts the data from comma-separated to tab-separated, at the same time taking care of any special string columns that contain double-quotes or commas. Next it inserts XML tags (<row></row> and <col></col>) into @CSVContents to designate the rows and columns, and finally it CASTs @CSVContents to XML and shreds it… something like this (for a 3-column CSV file):

select [Column1]=XMLNode.value('(col[1]/text())[1]','varchar(100)')
      ,[Column2]=XMLNode.value('(col[2]/text())[1]','varchar(100)')
      ,[Column3]=XMLNode.value('(col[3]/text())[1]','varchar(100)')
from (select XMLData=cast(@CSVContents as xml).query('.')) F_XML
cross apply XMLData.nodes('/row') F_Node(XMLNode)
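The tagging step works roughly like this (a simplified sketch, not the procedure’s exact code), once the data has been converted to tab-separated values with CR/LF row terminators; real data would also need XML-special characters such as & and < escaped before the CAST:

-- Wrap each row in <row>...</row> and each column in <col>...</col>
set @CSVContents = '<row><col>'
    + replace(replace(@CSVContents
                     ,char(13)+char(10), '</col></row><row><col>')   -- row breaks
             ,char(9), '</col><col>')                                -- column breaks (tabs)
    + '</col></row>'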
There were a few challenges in making this work, but you can see how I accomplished it in the comments of the code, which you can download from my SkyDrive. (Editor: source displayed below)
