Friday 12 January 2007

More on Regular Expression patterns

Regular expression patterns are powerful. I have recently been using Log4Net to log system activity, and I created a regular expression similar to the one below to parse text logs whose entries can span multiple lines.

Sample Text:

2007-01-01 00:00:00 Test log message
Another line
Yet one more line
2002-01-01 00:01:01 Another test log message

Regular Expression Pattern:

^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})[ ]*((?:(?!\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}).*(?:(?!\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}).*\n?)*))

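The above regular expression pattern captures the Date in the first group and the LogText in the second. The LogText includes any additional lines that don't begin with a Date. It needs the Multiline option turned on, so that ^ matches at the start of every line and not just at the start of the text.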
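
As a rough sketch of the pattern in use, here it is run over the sample text in Python. This is only an illustration: the pattern itself is unchanged, but Python's re module and its re.MULTILINE flag stand in for the .NET engine and its Multiline option, and the names DATE, LOG_ENTRY and parse_log_entries are made up for this example.

```python
import re

# Timestamp shape used in the sample log: YYYY-MM-DD HH:MM:SS
DATE = r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}"

# The pattern from the post, assembled from DATE to keep it readable.
# re.MULTILINE makes ^ match at the start of every line.
LOG_ENTRY = re.compile(
    rf"^({DATE})[ ]*((?:(?!{DATE}).*(?:(?!{DATE}).*\n?)*))",
    re.MULTILINE,
)


def parse_log_entries(text):
    """Return (date, log_text) pairs; log_text keeps its continuation lines."""
    return [
        # Group 2 ends with the newline before the next entry, so trim it.
        (m.group(1), m.group(2).rstrip("\n"))
        for m in LOG_ENTRY.finditer(text)
    ]


sample = (
    "2007-01-01 00:00:00 Test log message\n"
    "Another line\n"
    "Yet one more line\n"
    "2002-01-01 00:01:01 Another test log message\n"
)

for date, log_text in parse_log_entries(sample):
    print(date, "|", repr(log_text))
```

The first entry comes back with all three of its lines in the LogText, and the second entry starts where the next Date begins.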

Tuesday 2 January 2007

Happy New Year!

It is a New Year. It is a time to look ahead and a time to look behind. Back to a favorite topic of mine: regular expressions. Today I discovered a cool way to do a selective search and replace on a substring. Take the following example:

Sample Text:

i before e, except after c
Man very early made jars stand up! Nearly perfect

Regular Expression Pattern:

(?<=^[^,!]*)e

Regular Expression Replace String:

E

Use a tool like Expresso to work with regular expressions. Other free tools are RegEx Designer or The Regulator. The result you will get when you run the replace with the above Regular Expression is as follows:

i bEforE E, except after c
Man vEry Early madE jars stand up! Nearly perfect

Notice the selective replacement of e with a capital E. How does this work? Take a look at the analysis for the above regular expression pattern:

Match a prefix but exclude it from the capture. [^[^,!]*]
    ^[^,!]*
        Beginning of line or string
        Any character that is not in this class: [,!], any number of repetitions
e


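What this effectively does is look behind each letter e, all the way back to the beginning of the line, and match the e only if there is no , or ! anywhere in between. In effect, that allows us to limit the replacement to the occurrences of e that come before the first , or ! on each line. The Multiline option has to be on for this to work on the second line too, since that is what lets ^ match at the beginning of every line. Pretty cool, no?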
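
For comparison, here is a rough sketch of the same replacement in Python. Python's built-in re module does not accept a variable-length look-behind like the one above, so this is not the same technique. It is a swapped-in equivalent that matches the prefix of each line up to the first , or ! and replaces every e inside that prefix. The names PREFIX and capitalize_e_before_delimiter are made up for this example.

```python
import re

# The part of each line before the first ',' or '!'.
# \n is excluded as well so the prefix never runs on into the next line.
# re.MULTILINE makes ^ match at the start of every line.
PREFIX = re.compile(r"^[^,!\n]*", re.MULTILINE)


def capitalize_e_before_delimiter(text):
    """Replace e with E, but only before the first ',' or '!' on each line."""
    return PREFIX.sub(lambda m: m.group(0).replace("e", "E"), text)


sample = (
    "i before e, except after c\n"
    "Man very early made jars stand up! Nearly perfect"
)

print(capitalize_e_before_delimiter(sample))
```

Run on the sample text, this gives the same two lines as the look-behind version.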