
Read CSV Files with double quotes and comma into SQL DB

2012-6-12 13:04 | Published by: Test | Views: 3011 | Comments: 0 | Original author: Brad Schulz | Source: SELECT Blog FROM Brad.Schulz

Abstract: In this post, I’d like to talk about Comma-Separated Values in Files (i.e. CSV Files). I had a client with a need to read in a CSV file and do some subsequent processing with it. This was a very small ...
A generic way of handling a CSV file in T-SQL

I was stubborn. I wasn’t going to let these limitations stand in my way. I was determined to come up with a fairly generic way of handling a CSV file in T-SQL.

So I wrote a procedure called usp_ProcessCSVFile. It accepts the (fully-qualified) name of a CSV file, appropriately handles the double-quote delimiters (and any embedded commas and doubled double-quotes) in the data, and produces rows whose columns are all of datatype VARCHAR(100). We can simply INSERT the procedure’s results into a destination table and the columns will be converted appropriately.

For example…

if object_id('tempdb..#CustData','U') is not null drop table #CustData
create table #CustData
(
CustomerID int
,CompanyName varchar(40)
,Street varchar(40)
)

insert #CustData
exec usp_ProcessCSVFile 'C:\CSV Files\Customer Data.CSV'

select * from #CustData
/*
CustomerID CompanyName             Street
---------- ----------------------- ----------------------
1          John's Lovely Meat Pies 2838 "A" Street
2          Candies "R" Us          167 Broad Ave
3          Fork & Spoon, Inc       28 Grover Blvd
4          "Cakes," She Said       732 Arch St, Suite "D"
5          Soup Is On Me           93573 Vista Ave
*/
The procedure also accepts an optional second parameter (@HeaderRowExists), a BIT parameter that defaults to 1. If your CSV file does not contain a header row, pass 0 for this parameter and the procedure will treat the first row as actual data instead of a header row.
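For example, a call for a headerless file might look like this (the file name below is just a hypothetical example; the parameter name is as described above):

-- Hypothetical example: the file has no header row, so its first row is data
insert #CustData
exec usp_ProcessCSVFile 'C:\CSV Files\Customer Data NoHeader.CSV', @HeaderRowExists = 0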

I would have liked to create a function rather than a procedure to process CSV files, but that was impossible: because OPENROWSET(BULK), as I mentioned earlier, only accepts string literals, I had to use Dynamic SQL to handle a file name supplied in a variable/parameter, and functions cannot execute Dynamic SQL.
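To make that concrete, here is a minimal sketch of the kind of dynamic SQL involved (this is not the actual procedure code; the SINGLE_CLOB option and the variable names are assumptions for illustration):

declare @FileName varchar(255)
declare @CSVContents varchar(max)
declare @SQL nvarchar(max)

set @FileName = 'C:\CSV Files\Customer Data.CSV'

-- OPENROWSET(BULK) will not accept a variable for the file name, so the path
-- has to be spliced into a dynamic SQL string (doubling any embedded quotes)
set @SQL = N'select @Contents = BulkColumn
             from openrowset(bulk ''' + replace(@FileName,'''','''''') + ''', single_clob) as F'

exec sp_executesql @SQL
    ,N'@Contents varchar(max) output'
    ,@Contents = @CSVContents output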

In short, the procedure reads the contents of the CSV file (using OPENROWSET(BULK)) into a single VARCHAR(MAX) variable called @CSVContents. It then converts the data from comma-separated to tab-separated, at the same time taking care of any special string columns that contain double-quotes or commas. Next it inserts XML tags (<row></row> and <col></col>) into @CSVContents to designate the rows and columns, and finally it CASTs @CSVContents to XML and shreds it… something like this (for a 3-column CSV file):

select [Column1]=XMLNode.value('(col[1]/text())[1]','varchar(100)')
      ,[Column2]=XMLNode.value('(col[2]/text())[1]','varchar(100)')
      ,[Column3]=XMLNode.value('(col[3]/text())[1]','varchar(100)')
from (select XMLData=cast(@CSVContents as xml).query('.')) F_XML
cross apply XMLData.nodes('/row') F_Node(XMLNode)
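The tagging step works roughly like this (a simplified sketch, not the procedure’s exact code), once the data has been converted to tab-separated values with CR/LF row terminators; real data would also need XML-special characters such as & and < escaped before the CAST:

-- Wrap each row in <row>...</row> and each column in <col>...</col>
set @CSVContents = '<row><col>'
    + replace(replace(@CSVContents
                     ,char(13)+char(10), '</col></row><row><col>')   -- row breaks
             ,char(9), '</col><col>')                                -- column breaks (tabs)
    + '</col></row>'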
There were a few challenges in making this work, but you can see how I accomplished it in the comments of the code, which you can download from my SkyDrive. (Editor: source displayed below)
