Thursday, 14 December 2006

SQL Character Analysis

I love working with a development team. The great thing about it is you can tell other people about the cool things you figure out and how silly you are when you do something wrong. (As a rule, I like to take the blame for everything. Hey, it happens sometimes that things break because of me!) Also you can complain about unexpected "features" using different technologies or approaches.

I wrote the following bit of SQL script for a problem that was brought to my attention by a friend a while ago. It involved a string that had some unprintable characters that were causing the string to be difficult to parse. The following SQL generates a random string of 8000 characters (with character codes between 1 and 255) (for use within this example) and it also rips apart the random string to tell you the character composition of the string - by ascii code, character, and frequency of the character within the string (count).


/*

   COPYRIGHT © 2006 George Zabanah

   Please feel free to use this code and distribute,

   but leave this header at the top.

   No warranty of any kind is implied by using this code.

   Use at your own risk.

*/



DECLARE @RandomNumber   
float

DECLARE @MaxValue       
        
int

DECLARE @MinValue       
        
int

DECLARE @RandomString   
varchar(8000)

DECLARE @StringLength   
int

DECLARE @count  
        
        
int

DECLARE @singlechar     
        
char(1)



-- SET INITIAL VARIABLES

SELECT

        
@MaxValue = 255,

        
@StringLength = 8000,

        
@MinValue = 1,

        
@count = 1,

        
@RandomString = ''



-- GENERATE A RANDOM STRING THAT IS 8000 CHARACTERS

WHILE(@count <= @StringLength)

BEGIN

        
SELECT

        
@RandomNumber   
= RAND(),

        
@RandomString   
= @randomstring + CHAR(((@MaxValue + 1) - @MinValue) * @RandomNumber + @MinValue),

        
@count  
        
        
= @count + 1

END



DECLARE @tablevar TABLE (onechar char(1))

-- DECONSTRUCT THE STRING INTO SINGLE CHARACTERS AND INSERT INTO A TABLE VARIABLE

SET @count = 1

WHILE(@count <= @StringLength)

BEGIN

        
SELECT

        
@singlechar     
        
= LEFT(@RandomString,1),

        
@RandomString   
= RIGHT(@RandomString,LEN(@RandomString) - 1),

        
@count  
        
        
= @count + 1

        
SET NOCOUNT ON

        
INSERT @tablevar values(@singlechar)

        
SET NOCOUNT OFF

END



-- RETURN THE RESULT SET OF CHARACTER ANALYSIS

SELECT ASCII(onechar) AsciiCode, onechar SingleCharacter, count(*) CharacterCount

FROM @tablevar

GROUP BY ASCII(onechar),onechar

ORDER BY ASCII(onechar),onechar


No comments: