Thursday 14 December 2006

SQL Character Analysis

I love working with a development team. The great thing about it is you can tell other people about the cool things you figure out and how silly you are when you do something wrong. (As a rule, I like to take the blame for everything. Hey, it happens sometimes that things break because of me!) Also you can complain about unexpected "features" using different technologies or approaches.

I wrote the following bit of SQL script for a problem that was brought to my attention by a friend a while ago. It involved a string that had some unprintable characters that were causing the string to be difficult to parse. The following SQL generates a random string of 8000 characters (with character codes between 1 and 255) (for use within this example) and it also rips apart the random string to tell you the character composition of the string - by ascii code, character, and frequency of the character within the string (count).


/*

   COPYRIGHT © 2006 George Zabanah

   Please feel free to use this code and distribute,

   but leave this header at the top.

   No warranty of any kind is implied by using this code.

   Use at your own risk.

*/



DECLARE @RandomNumber   
float

DECLARE @MaxValue       
        
int

DECLARE @MinValue       
        
int

DECLARE @RandomString   
varchar(8000)

DECLARE @StringLength   
int

DECLARE @count  
        
        
int

DECLARE @singlechar     
        
char(1)



-- SET INITIAL VARIABLES

SELECT

        
@MaxValue = 255,

        
@StringLength = 8000,

        
@MinValue = 1,

        
@count = 1,

        
@RandomString = ''



-- GENERATE A RANDOM STRING THAT IS 8000 CHARACTERS

WHILE(@count <= @StringLength)

BEGIN

        
SELECT

        
@RandomNumber   
= RAND(),

        
@RandomString   
= @randomstring + CHAR(((@MaxValue + 1) - @MinValue) * @RandomNumber + @MinValue),

        
@count  
        
        
= @count + 1

END



DECLARE @tablevar TABLE (onechar char(1))

-- DECONSTRUCT THE STRING INTO SINGLE CHARACTERS AND INSERT INTO A TABLE VARIABLE

SET @count = 1

WHILE(@count <= @StringLength)

BEGIN

        
SELECT

        
@singlechar     
        
= LEFT(@RandomString,1),

        
@RandomString   
= RIGHT(@RandomString,LEN(@RandomString) - 1),

        
@count  
        
        
= @count + 1

        
SET NOCOUNT ON

        
INSERT @tablevar values(@singlechar)

        
SET NOCOUNT OFF

END



-- RETURN THE RESULT SET OF CHARACTER ANALYSIS

SELECT ASCII(onechar) AsciiCode, onechar SingleCharacter, count(*) CharacterCount

FROM @tablevar

GROUP BY ASCII(onechar),onechar

ORDER BY ASCII(onechar),onechar


Wednesday 13 December 2006

Xml Compare and Excel Routines

A few days ago I mentioned the Xml Compare and Excel Routines that I use for testing. I'm posting them here ... they are quite handy routines. Both are written in C#.

Xml Compare:


/*

COPYRIGHT © 2006 George Zabanah

Please feel free to use this code and distribute,

but leave this header at the top.

No warranty of any kind is implied by using this code.

Use at your own risk.

*/




/// <summary>

/// Displays the difference between two xml strings

/// </summary>

/// <param name="expected">expected result</param>

/// <param name="actual">actual result</param>

private void XmlCompare(string expected,string actual)

{

XmlDocument xExpected = new XmlDocument();

xExpected.LoadXml(expected);



XmlDocument xActual = new XmlDocument();

xActual.LoadXml(actual);

System.Xml.XPath.XPathNavigator xeNav = xExpected.CreateNavigator();

System.Xml.XPath.XPathNavigator xaNav = xActual.CreateNavigator();



// Move to the first non-comment element.

xeNav.MoveToChild(System.Xml.XPath.XPathNodeType.Element);

xaNav.MoveToChild(System.Xml.XPath.XPathNodeType.Element);



System.Xml.XPath.XPathNodeIterator xeNodeIterator = xeNav.SelectDescendants(System.Xml.XPath.XPathNodeType.Element,true);

System.Xml.XPath.XPathNodeIterator xaNodeIterator = xaNav.SelectDescendants(System.Xml.XPath.XPathNodeType.Element,true);

TestContext.WriteLine(string.Empty.PadLeft(80, '='));

TestContext.WriteLine("XML COMPARISON");

TestContext.WriteLine(string.Empty.PadLeft(80, '='));

while (xeNodeIterator.MoveNext() && xaNodeIterator.MoveNext())

{



if (xeNodeIterator.Current.Name != xaNodeIterator.Current.Name)

{

TestContext.WriteLine("Actual Node Name: {0} Expected Node Name: {1}", xaNodeIterator.Current.Name, xeNodeIterator.Current.Name);

// return;

}

if ((xeNodeIterator.Current.Value == xeNodeIterator.Current.InnerXml xaNodeIterator.Current.Value == xaNodeIterator.Current.InnerXml) && xeNodeIterator.Current.Value != xaNodeIterator.Current.Value)

{

TestContext.WriteLine(String.Empty.PadLeft(80, '-'));

TestContext.WriteLine("Actual Node \t{0}: {1} \r\nExpected Node \t{2}: {3}", xaNodeIterator.Current.Name,xaNodeIterator.Current.Value, xeNodeIterator.Current.Name,xeNodeIterator.Current.Value);

TestContext.WriteLine(String.Empty.PadLeft(80, '-'));

}

}



}






Excel Routines:


/*

COPYRIGHT © 2006 George Zabanah

Please feel free to use this code and distribute,

but leave this header at the top.

No warranty of any kind is implied by using this code.

Use at your own risk.

*/




public static class ExcelFunctions

{

public static void WriteToExcel(DataSet ds, String MyFileName, bool Append)

{

string connect3 = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=\"" + MyFileName + "\";User ID=Admin;Password=;Extended Properties=\"Excel 8.0;HDR=YES\";";

OleDbConnection _to = new OleDbConnection(connect3);

bool blnFileExists = File.Exists(MyFileName);

if (blnFileExists && !Append) File.Delete(MyFileName);

_to.Open();

foreach (DataTable dt in ds.Tables)

{

if (!Append !blnFileExists)

{

String table = "CREATE TABLE " + dt.TableName + " (";

foreach (DataColumn dc in ds.Tables[dt.TableName].Columns)

{

String typename = " [" + dc.ColumnName + "] " + (dc.DataType.ToString() == "System.Int64" dc.DataType.ToString() == "System.Double" dc.DataType.ToString() == "System.Int32" ? "NUMERIC" : "TEXT");

table += typename + (dc == ds.Tables[dt.TableName].Columns[ds.Tables[dt.TableName].Columns.Count - 1] ? ")" : ",");

}

OleDbCommand cmd = new OleDbCommand(table, _to);

try

{

cmd.ExecuteNonQuery();

}

catch (Exception e)

{

_to.Close();

_to.Dispose();

Console.Error.WriteLine(e.Message);

Console.Error.WriteLine(e.StackTrace);

return;

}

}

OleDbDataAdapter daTo = new OleDbDataAdapter("select * from [" + dt.TableName + "$]", _to);

OleDbCommandBuilder x = new OleDbCommandBuilder(daTo);

x.QuotePrefix = "[";

x.QuoteSuffix = "]";

daTo.InsertCommand = x.GetInsertCommand();

try

{

daTo.Update(ds, dt.TableName);

}

catch (Exception e)

{

Console.Error.WriteLine(e.Message);

Console.Error.WriteLine(e.StackTrace);

}



}

_to.Close();

_to.Dispose();

}

public static DataSet ReadFromExcel(string MyFileName)

{

string connect3 = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=\"" + MyFileName + "\";User ID=Admin;Password=;Extended Properties=\"Excel 8.0;HDR=YES\";";

OleDbConnection _to = new OleDbConnection(connect3);

DataSet ds = new DataSet();

_to.Open();

DataTable schemaTable = _to.GetOleDbSchemaTable(OleDbSchemaGuid.Tables,

new object[] { null, null, null, "TABLE" });

foreach (DataRow dr in schemaTable.Rows)

{

OleDbDataAdapter daTo = new OleDbDataAdapter(dr["TABLE_NAME"].ToString(), _to);

daTo.SelectCommand.CommandType = CommandType.TableDirect;

daTo.AcceptChangesDuringFill = false;

if (dr["TABLE_NAME"].ToString().IndexOf("$") == dr["TABLE_NAME"].ToString().Length - 1)

{

daTo.Fill(ds, dr["TABLE_NAME"].ToString());

ds.Tables[dr["TABLE_NAME"].ToString()].TableName = dr["TABLE_NAME"].ToString().Replace("$", "");

}

}

_to.Close();

return ds;



}



}






Also, I have used this handy little utility to generate the HTML (from the code) by doing the following.

1) Install RTF to HTML utility found here (freeware).
2) Open the code in VS2005.
3) Run the RTFtoHTML utility. Convert to HTML and post in your blog.

I'd be interested to know if someone has a better way of doing this.

Monday 11 December 2006

The Ingredients of Good Manager/Developer relationships

Take a look at this fantastic letter. In my travels, I've encountered a number of
managers who have broken these rules more often than not. The managers that are
excellent are the ones I still keep in touch with. They live by the "ingredients" I outline below.

A manager's role is to foster the following characteristics within the working environment, thereby building "cohesive leadership and participation" between team members. The first step for a manager to achieve this is by solidifying the manager/developer relationship. A team member's role is to show commitment as an active participant within the team.

The following is my brief list of essential ingredients for healthy developer/manager
relationships:

  • have open two-way communication
  • have/give trust
  • show integrity and have mutual respect
  • have constant reality checks and remove unrealistic expectations
  • encourage participation in healthy discussions/debates - "the best idea wins"
  • praise in public/adjust behaviours in private
  • show sincerity and empathy
  • be approachable and considerate of other people's contributions

Although these points seem surprisingly simple -- they can only be put into effective practice
by people who live these values in their everyday lives. In order to make your team
follow you and want to follow you (as well as replicate your behaviour) -
a manager must live out these values and lead by example.

Also, a manager who believes that he/she is better than everyone else (or even just has an ego/disposition to suggest this) is worse than a team member who believes the same - a manager should be smart enough to know better. Most people can tell a good manager from a bad one.

A manager's responsiblity is to help remove obstacles from the people around them (by managing downwards to team members, sideways to colleagues, and upwards to upper management).

Contrary to what some people may say, employee turnover isn't always
due to restructuring, "grass is greener syndrome", or a fact of life. Employee turnover
can be a telltale sign of poor management/IT planning. Poor management/IT planning is
a definite result of ignoring the ingredients of good developer/manager relationships.

If you have read this far - take a look at the letter now. It is definitely a good read.

My thanks to the good managers out there.

Saturday 9 December 2006

One of my favourite SQL Routines

Looking through a few of the sql routines I wrote within the past year, I dug this one up because I want to write a version which works with Oracle. The following bit of code will go through all view/tables within the current database. It will report in one table the following combination:

TableName
ColumnName
TotalRowCount
DistinctCount
NullCount

The only limitation this script has is the limitation on dynamic sql (8000 characters). I get around this a little bit by using temporary procedures. I have avoided using cursors to keep it simple and efficient.


/*

ANALYZE TABLES/VIEWS in current database by rowcount, distinct count

and null count.

August 8, 2006 Initial Version

COPYRIGHT © 2006 George Zabanah

Please feel free to use this code and distribute,

but leave this header at the top.

No warranty of any kind is implied by using this code.

Use at your own risk.

*/

-- Table to hold column information

CREATE TABLE #ColumnData (TableName sysname,

ColumnName sysname,

TOTALROWS bigint,

DistinctCount bigint,

NullCount bigint)

GO

-- Temp procedure for a particular table, column combination

CREATE PROCEDURE #C (@TableName sysname, @ColumnName sysname, @CastVarchar bit = 0)

AS

declare @mysql varchar(4000)

set @mysql = 'insert #ColumnData select ''' + @TableName + ''' TABLENAME, ''' +

@COLUMNNAME + ''' COLUMNNAME, COUNT(*) TOTALROWS, ' +

'COUNT(DISTINCT(' + (CASE WHEN @CastVarchar = 1 then 'CAST(' else '' end) +

'[' + @COLUMNNAME + ']' +

(CASE WHEN @CastVarchar = 1 then ' as varchar(8000))' else '' end) +

')) DISTINCTCOUNT, SUM(CASE WHEN [' +

@COLUMNNAME + '] is NULL THEN 1 ELSE 0 END) NULLCOUNT from [' + @TableName + ']'

RAISERROR('%s.%s',10,1,@TableName,@ColumnName) WITH NOWAIT

exec(@mysql)

GO

-- Temp procedure for a particular table

CREATE PROCEDURE #T (@TableName sysname)

AS

declare @mysql varchar(4000)

set @mysql = 'declare @t varchar(8000) set @t = ''' + @TableName + ''''

select @mysql = @mysql COLLATE DATABASE_DEFAULT + ' exec #C @t,''' + COLUMN_NAME + '''' +

(CASE WHEN DATA_TYPE like '%Text' then ',1' else '' end)

from information_schema.columns where table_name = @TableName

RAISERROR('-- TABLE: %s --',10,1,@TableName) WITH NOWAIT

exec('SET NOCOUNT ON ' + @mysql + ' SET NOCOUNT OFF')

GO

Declare @mysql varchar(4000)

-- TABLES

set @mysql = ''

select @mysql = @mysql + ' exec #T ''' + table_name + ''''

from information_schema.tables where table_type = 'BASE TABLE'

exec(@mysql)

-- VIEWS

set @mysql = ''

select @mysql = @mysql + ' exec #T ''' + table_name + ''''

from information_schema.tables where table_type = 'VIEW'

exec(@mysql)

select * from #ColumnData

drop table #ColumnData

drop procedure #T

drop procedure #C




A few noteworthy points:

  • the #T temporary procedure iterates through each table/view and the #C temporary procedure iterates through the columns of each table/view.
  • I use the simple set-based variable assignment in SQL to concatenate one big dynamic SQL statement for each table/view.

Like I said before, this script is one of my favourites.

More on Test Driven Development

My last post really didnt finish on what TDD means to me. True, I mentioned automated testing which is a useful part of TDD. But bcwhite (thanks for the comment) mentioned the distinction of writing your tests up front vs automating your tests. I do write some of my tests up front ... but you can't always write all of them. I find myself adding more tests as requirements change (or as I think of them).

TDD means writing your code while being mindful of the need to test it. You write your code to expose properties or return values that can help you test. This way you can simulate Input in/Output out and check whether an Exception occurred yes/no, etc.

Also, for your "composite" functions which have connections to units that you don't want to test, (i.e. a component which communicates with a singleton object that you don't need/don't want to test, or a component which uses threading or sockets) you factor your code in such a way that you can test your "units" in isolation of everything else. If you can't easily separate out your unit from everything else, you expose a boolean property called "TestMode" and only set this during automated testing. For each line of code in your unit that you don't want to test during unit testing, you check this boolean first i.e.

If (not TestMode)
PassObjectToSingletonQueue

OR

If(not TestMode)
PassDataToOpenSocket

The candidate function I mention above is a component which does have inputs/outputs and exceptions thrown but also communicates with a singleton object that passes data to and from an open socket connection. In this case, I want to test the inputs and outputs and exceptions (without communicating with the singleton object).

This makes it much easier to have as much of your code unit tested as possible. Anything beyond a unit no longer becomes unit testing. Thinking this way makes it easier for you to test your code and helps guard your code against the likelihood that changes in requirements will yield undesirable results. In the case where your some or all of your unit tests fail, you know then and there that you have some work to do.

Friday 8 December 2006

Test Driven Development

My first blog ... time to dive in. Here goes.

Automated Unit testing has saved me many times over. For the time you spend, versus the time you save it is a solid investment. Whether you use nunit or Visual Studio Team Framework - it is up to you. But it is well worth the time up front. How many times do you think - oh the requirements won't change (or if they change it won't be that difficult to keep track of or fix my code). For example ... take the case when you are comparing lines upon lines of xml output. Or even worse, let's say you have to use fairly complex (1000+ characters) regular expressions to match multi-delimiter delimited data and use that to generate xml. A few lines of code ... and you save heaps of time ... I have written a very simple function (called XmlCompare) that uses XPathNodeIterator to perform the following Xml Walkthrough of Expected and Actual:

1) Expected Node Name vs. Actual Node Name
2) Expected Node Value vs. Actual Node Value
3) Expected Self-Closing Empty Node vs. Actual Empty Node
4) Expected Empty Node vs. Actual Self-Closing Empty Node

It's worked exceptionally well. In addition, I've written a function to split my very long (necessarily so) and complex regular expressions so that I can tell if the regular expression is matching the data properly.

Combine this with a function that uses OleDbConnection object using Jet4.0 to read test data from Excel into a dataset and you have a beautiful thing. By iterating through the rows in the dataset (I have a spreadsheet for each function I want to test and each spreadsheet has quite a few variations of test data), I can test each of the functions I have with multiple variations (literally at the click of a button).

Developers are such optimistic creatures ... with test driven development, developers can have a reason for optimism.