|
楼主 |
发表于 2012-5-22 10:02:21
|
显示全部楼层
本帖最后由 demo 于 2012-5-28 23:14 编辑
Part 2. Text Parsing
The next point is extraction of strings from a given text by using a Regular Expression pattern.
This is the task of getting a table, i.e., the result of function execution is a table with some data. Let it be the start index in the text, the length, and the value of the result string. Altogether, there are three columns in the result table. The CLR represents a streaming model for table-valued functions which ensures that results can be consumed immediately after the first row is available, instead of waiting for the entire table to be populated. It gives a lot of advantages for the end user and for performance, and it, of course, causes additional complexity in the implementation. User-defined table-valued functions require the implementation of two public static methods: one is the master method that is called by MS SQL Server and returns an objects enumeration (IEnumerable), and the other is the secondary method that is called by the first to fill the table rows. All will be clear if we look at the code below:
- /// <summary>
- /// Class that allows to support regular expressions
- /// in MS SQL Server 2005/2008
- /// </summary>
- public partial class SqlRegularExpressions
- {
- // this is place of Like() method
-
- /// <summary>
- /// Gets matches from text using pattern
- /// </summary>
- /// <param name="text">text to parse</param>
- /// <param name="pattern">regular expression pattern</param>
- /// <returns>MatchCollection</returns>
- [SqlFunction(FillRowMethodName="FillMatch")]
- public static IEnumerable GetMatches(string text, string pattern)
- {
- return Regex.Matches(text, pattern);
- }
- /// <summary>
- /// Parses match-object and returns its parameters
- /// </summary>
- /// <param name="obj">Match-object</param>
- /// <param name="index">TThe zero-based starting
- /// position in the original string where the captured
- /// substring was found</param>
- /// <param name="length">The length of the captured substring.</param>
- /// <param name="value">The actual substring
- /// that was captured by the match.</param>
- public static void FillMatch(object obj, out int index,
- out int length, out SqlChars value)
- {
- Match match = (Match)obj;
- index = match.Index;
- length = match.Length;
- value = new SqlChars(match.Value);
- }
- }
复制代码 GetMatches returns all the matched elements as a MatchCollection object, and FillMatch is called for each object (Match) obtained before to determine the table row fields based on the data from it. We can see it from the list of parameters: first is the object reference and the rest is the variables marked by the out attribute. This 'rest' determines the possible columns' nature.
Be accurate! You have to indicate the Fill method name in the SqlFunction.FillRowMethodName property; on the other hand, it allows some flexibility.
Now, we build the assembly, register it in MS SQL Server again because its strong name was changed by the building, and create a target table-valued function:
- CREATE FUNCTION
- --function signature
- RegExpMatches(@text nvarchar(max), @pattern nvarchar(255))
- RETURNS TABLE
- ([Index] int, [Length] int, [Value] nvarchar(255))
- AS
- --external name
- EXTERNAL NAME SqlRegularExpressions.SqlRegularExpressions.GetMatches
- GO
复制代码
And now, we can extract the strings from some text by a certain pattern. For instance, let's get all the words from the text that starts with a lower-case 'a':
- -- RegExpMatches sample
- DECLARE @Text nvarchar(max);
- DECLARE @Pattern nvarchar(255);
-
- SET @Text =
- 'This is comprehensive compendium provides a broad and thorough investigation of all '
- + 'aspects of programming with ASP.Net. Entirely revised and updated for the 2.0 '
- + 'Release of .Net, this book will give you the information you need to master ASP.Net '
- + 'and build a dynamic, successful, enterprise Web application.';
- SET @Pattern = '\b(a\S+)'; --get all words that start from 'a'
- select * from dbo.RegExpMatches(@Text, @Pattern)
- GO
复制代码
and you get the result below:

Please give attention to that fact that the script does not return any 'ASP.NET' token because they start with upper-case 'A'. If you want to ignore case when matching occurs, you have to have a new function, or just add one more parameter to the Regex.Matches() method as shown:
- [SqlFunction(FillRowMethodName="FillMatch")]
- public static IEnumerable GetMatches(string text, string pattern)
- {
- return Regex.Matches(text, pattern, RegexOptions.IgnoreCase);
- }
复制代码
And now, RegExpMatches returns all 'a'-starting words including 'ASP.NET'.
This removes the script for the RegExpMatches function:
- DROP FUNCTION RegExpMatches
复制代码
SQL Regular Expressions, Version 2
The SqlServerProject download consists of a 'SQL Server Project' type solution instead of a previously created 'Class Library' type solution. It is a more natural type for developing objects for a database. There are a couple of files in the project. We use them because it was boring to do monotonous actions to drop/create assembly/function during the development in the 'Class Library' type project. The 'SQL Server Project' type assumes all these actions without any human participation. Also, its IDE consists of special menu items to perform database objects tasks. But, you can enjoy it if you only have VS Pro/Team.
Easy deploying. We have to do three steps to deploy Regular Expressions to our SQL Server:
- Setup the project to reference the target database.
- Build the project.
- Deploy the assembly that supports Regular Expressions in SQL Server to our database engine.
And now, step by step:
To setup a project to reference a database where it would be good to use Regular Expressions power, we do this (see image below):

- We right click on the project in the Solution Explorer window.
- We choose the 'Properties' menu item. The assembly properties window will be expanded.
- On the left side, we choose a bookmark with a 'Database' title.
- We click on the 'Browse...' button. The 'Add database reference' window will appear.
- Next, we choose an existing database reference from the list or add new. To do this, we click on the 'Add New Reference...' button. The 'New Database Reference' dialog will be shown.
- We choose a server name and a database name from the respective dropdown lists.
- Then, we click 'OK' - and the 'Add New Reference' dialog will be closed.
- We click 'OK' again, the 'Add Database Reference' dialog will be closed and a new target for the assembly deploying is set.
Now it is time for the final actions:
- Build the project by selecting 'Build SqlRegularExpressions' from the 'Build' menu (we may skip this step if changes are absent, because the assembly is already compiled).
- Select 'Deploy SqlRegularExpressions' from the same 'Build' menu.
This is the result we get:
========== Deploy: 1 succeeded, 0 failed, 0 skipped ==========
In case you have not got Visual Studio IDE, you may install the Regular Expressions functionality as described in the 'Use of Code' section for the RegExpLike function. The assembly DLL is located in the 'Bin' folder of the SqlServerProject package. One more thing, this is the script for the RegexMatch function:
- --function signature
- CREATE FUNCTION RegexMatch(@Text nvarchar(max),
- @Pattern nvarchar(255), @Options INT) RETURNS BIT
- --function external name
- AS EXTERNAL NAME SqlRegularExpressions.UserDefinedFunctions.RegexMatch
复制代码
By db_developer
|
|