Introduction to Regular Expressions in VBA


Regular-Expressions (RegExp) is something I bumped into when my string manipulation requirements jumped to an advanced level. Before I started using this powerhouse, all my string manipulation functions involved maneuvering through a dozen for-loops; and tackling hundreds of Left-Right-Mid-InStr functions. RegExp is a pattern matching system that allows you to perform advanced string manipulations very easily. It may take a while to get used to it, but once you get the hang of it, the possibilities are endless.

To tap into this super power, a Reference to “Microsoft VBScript Regular Expressions 5.5” has to be added to the VBA Project. Click here to see how to add the reference to your project. Unfortunately, the version implemented in Visual Basics, using VBScript Scripting Library, does not include Regular Expressions in its entirety. Some of the new features have not been implemented yet. Click here to check out which features are available.

RegExp in VBA is fashioned as an Object. Pattern, Global and IgnoreCase are  the properties you might need to use. Replace is the one method that I use very often. Please remember that the primary purpose of this post is the ease you into getting the hang of the Replace method.

Steps to using the Replace Method:

  1. Declare and Initialize the RegExp Object
  2. Set the IgnoreCase property to False, if you do not want the pattern to be case sensitive. It is set to True by default
  3. Set the Global property to True, to replace all instances of matched pattern. It is set to False by default i.e. only the first instance is replaced.
  4. This is the most important step. Set the Pattern property. I advise going through the links I have posted below to get the hang of using this.
  5. Finally, we use the Replace method and assign it to a string variable.

It is a good idea to wrap this into a nice neat function, so you do not have to declare new objects or change the properties of existing ones explicitly. This function can be used within VBA and can also be used in the Excel Formula Bar; we will see examples of both. I can’t imagine doing string manipulations in Excel without RegExp. I intend to post a lot of other user defined functions in the future that use the following function:

'====================================================================================
'Wrapper function to impliment the Replace method of the RegExp object
'Author : Ejaz Ahmed - ejaz.ahmed.1989@gmail.com
'Date: 08 December 2013
'Website : https://strugglingtoexcel.wordpress.com/
'====================================================================================
Function RegExpReplace(ByVal WhichString As String, _
                        ByVal Pattern As String, _
                        ByVal ReplaceWith As String, _
                        Optional ByVal IsGlobal As Boolean = True, _
                        Optional ByVal IsCaseSensitive As Boolean = True) As String
'Declaring the object
Dim objRegExp As Object
'Initializing an Instance
Set objRegExp = CreateObject("vbscript.regexp")
'Setting the Properties
objRegExp.Global = IsGlobal
objRegExp.Pattern = Pattern
objRegExp.IgnoreCase = Not IsCaseSensitive
'Execute the Replace Method
RegExpReplace = objRegExp.Replace(WhichString, ReplaceWith)

End Function

This function can be wrapped inside other functions to perform string manipulations easily. Below are some very simple examples of how the above function could be used in your VBA functions.

Function to extract only numbers in a string:

'====================================================================================
'Function to extract only the numbers in a string using RegExp
'Author : Ejaz Ahmed - ejaz.ahmed.1989@gmail.com
'Date: 08 December 2013
'Website : https://strugglingtoexcel.wordpress.com/
'====================================================================================
Function OnlyNumbers(ByVal WhichString As String) As Variant
    OnlyNumbers = CDbl(RegExpReplace(WhichString, _
                        "[^0-9]", vbNullString, True))
End Function

The screenshot below contains few examples of RegExpReplace() being used in the Excel interface. I would suggest experimenting with the Pattern and Replace-With strings to test your ideas; and finally wrap them into user defined function of your own.

Regular Expression
Example : Regular Expression

Please feel free to share your ideas here. Finally, I found the following links very useful and I hope they help you too:

Advertisements

14 thoughts on “Introduction to Regular Expressions in VBA

  1. Hi. I have a very limited experience with VBA, but I know speed is important when using anything in VBA and regular expressions are incredibly helpful. So… Could you please explain why my function takes a lot of time to calculate?

    Function FindTags(my_phrase As String, my_tags_range As Range, list_separator As String, negative_result_text As String) As String

    Dim cell As Range

    On Error GoTo ErrHandling

    FindTags = “”

    For Each cell In my_tags_range
    ‘ If InStr(UCase(my_phrase), UCase(cell.Value)) <> 0 Then FindTags = FindTags & cell.Value & list_separator & ” ” ‘this finds “know” in “unknown” and “acknowledge” so I’m using the following line instead, which solves this problem, but now it takes a lot of time to calculate
    If reFindWord(cell.Value, my_phrase, False) <> 0 Then FindTags = FindTags & cell.Value & list_separator & ” ”
    Next cell

    If FindTags = “” Then
    FindTags = negative_result_text
    Else
    FindTags = Left(FindTags, Len(FindTags) – 2)
    End If

    Exit Function

    ErrHandling:
    FindTags = “!!!! ERROR !!!!”

    End Function

    Function reFindWord(FindWord As String, SearchText As String, Optional MatchCase As Boolean = False) As Boolean ‘function written by Ron Rosenfeld / found on stackoverflow
    ‘False = case insensitive
    Dim RE As Object
    Dim sPattern As String
    Set RE = CreateObject(“vbscript.regexp”)
    sPattern = “\b” & FindWord & “\b”
    With RE
    .Pattern = sPattern
    .ignorecase = Not MatchCase
    reFindWord = .test(SearchText)
    End With
    End Function

    Like

      • Hi, Ejaz.

        I have a list of things to learn (>1000) and these things are described with sentences containing keywords (tags). The list of keywords is in a separate worksheet in a table with a single column. There can be more than one keyword in a cell. For example: “Define the terms copyright, intellectual property. Recognize the need to acknowledge sources and/or seek permission as appropriate.” >> “define, recognize”, but not “know”. The keywords are not always the first word in the sentence.

        The function should identify which keywords exist in a cell and if there are none it should assume it is a practical thing to be done like “Change the column, bar, line, pie slice colors in the chart.” and thus marking it with “do” (=negative_result_text).

        The function is placed in an Add-In and it is used as =FindTags(skills[[#This Row],[TASK ITEM]],tags,”,”,”do”).
        I’ve just noticed I put the arguments in a strange order FindTags(my_phrase, my_tags_range…) = “in my_phrase find my_tags” instead of “find my_tags in my_phrase”.

        There are keywords where the function will fail (for example “outline” in “Enter text into a placeholder in standard, outline view.” >> “do”, not “outline” vs. “Outline the types of software licenses: proprietary, open source, trial version, shareware, freeware” >> “outline”), but my interest right now is the speed factor. Why using InStr is fast and using RegEx object is slow? Is there something to be added or changed so it becomes fast?

        Like

        • I am a little busy at the moment. I haven’t gotten around to seeing your code yet. Let me get back to you on this. Feel free to use the contact page in my blog to remind me if I forget.

          Like

  2. Hi Ejaz,
    Found this post while trying to figure out how to integrate regex into the following code which I use as a macro in word. I teach English as a second language and use the code to explore student texts to see how well they are using certain kinds of grammar. So I have a separate Excel file with a list of words to look for like “did not” or “didn’t”. Sentences get extracted with those strings in them. By way of example, in the case of “didn’t” and “did not”, it lets me see how well they are doing with Simple Past negative forms.

    The ability to use REGEX searches instead of simple string searchs would greating decrease the number different searches needed.

    My hope was to be able to do something simple like changing the code:
    Execute(Findtext:=strSearch(var)
    to something like
    Execute(FindobjRegExp:=strSearch(var)

    Of course it is not that easy. Below is the script that I am using which I found at VBAExpress through the kind contributions of a coder there although it needed updating from VB to VBA. I changed appXL to appExcel throughout.

    Any help you might be able to offer would be appreciated. As you can see I am not an experienced programmer.

    Tx Ejaz.

    Sub SentencesToExcel()
    Dim appExcel As Object
    Dim objSheet As Object
    Dim wbkXLNew As Object

    Dim strSearch() As String 
    Dim var 
    Dim r As Word.Range 
    Dim j As Long 
    
    Set appExcel = CreateObject("Excel.Application") 
    Set wbkXLSource = appExcel.Workbooks.Open(FileName:="c:\temp\wordlist.xls") 
    Set wbkXLNew = appExcel.Workbooks.Add 
    
    For var = 0 To 300 
         ' build the array of search words from the source Excel file
        Redim Preserve strSearch(var) 
        strSearch(var) = wbkXLSource.Worksheets("Sheet1").Cells(var + 1, 1).Value 
    Next 
     ' close the source Excel file as do not need it
    wbkXLSource.Close 
     ' destroy its object
    Set wbkXLSource = Nothing 
    j = 1 
     ' for each search words
    For var = 0 To UBound(strSearch()) 
         ' make a range ovbject of the whole document
        Set r = ActiveDocument.Range 
        With r.Find 
             ' with each Found
            Do While .Execute(Findtext:=strSearch(var), Forward:=True) _ 
                = True 
                 ' expand to the sentence
                r.Expand Unit:=wdSentence 
                 ' and put in the next cell in the new Excel file
                wbkXLNew.Worksheets("Sheet1").Cells(j, 1).Value = r.Text 
                   With wbkXLNew.Worksheets("Sheet1").Cells(j, 1) 
                    ChrStart = InStr(wbkXLNew.Worksheets("Sheet1").Cells(j, 1), strSearch(var)) 
                    If ChrStart &gt; 1 Then 
                        With .Characters(Start:=ChrStart, Length:=Len(strSearch(var))).Font 
                            .FontStyle = "Bold" 
                        End With 
                        ' With .Characters(Start:=ChrStart, Length:=Len(strSearch(var))).Font
                         '   .Color = RGB(255, 0, 0)
                       'End With
                    End If 
                End With
    
    
    
                j = j + 1 
                r.Collapse 0 
                wbkXLNew.Worksheets("Sheet1").Cells(j, 1).Value = "break" 
            Loop 
        End With 
    Next 
    appExcel.Visible = True 
    
     'wbkXLNew.SaveAs FileName:="valid_path"
     'Set wbkXLNew = Nothing
     'appXL.Quit
     'Set appXL = Nothing
    

    End Sub

    Liked by 1 person

    • Hi Mel. Firstly, Kudos for caring about your students this much. I hope to be a teacher some day, and I’d strive to be more like you.

      I have not used the Execute method before, I need to look it up. This sounds like an Interesting project and I will try my best to come up with a solution.

      Recently, I did a text mining project that involved importing a bunch of resumes, which were stored as word documents in a folder, into individual cells in excel. I think I can do something like that to import all your student’s essays into excel.

      If you have some kind of standard file naming convention, I think you can list all the sentences along with the student’s names. I think that would let you get a better picture, wouldn’t it?

      I will work on that project over the weekend.

      If you need to rename all the filenames, check out my Excel based batch file renamer. https://strugglingtoexcel.wordpress.com/2014/01/01/excel-batch-renamer/

      Like

    • Hi Mel,

      I have written a code that turns a word document into an array of sentences. Once that is done, I can use Regular Expressions to match patterns in each of those sentences. If a match exists, I can list the sentence on an excel spreadsheet, along with the word document’s name. That way, you can put your students’ essays into one folder and analyze them as a whole.

      In Summary:
      1. You put all the essays, as word documents into a folder
      2. You browse for the the folder in the model I would make for you.
      3. You specify the patterns you’d like to match in a sheet.
      4. Once that is done, you hit a button to list out the matches in all the documents in the folder you chose above. This will list out the file names and matched sentences in another sheet.
      5. Ill add in a couple of buttons to clear the inputs and stuff to increase productivity.

      How does that sound?

      Like

      • Ejaz,
        Thanks so much for giving this some thought. Sorry I haven’t gotten back to you a little quicker. I have spent a bit of time refining the process.

        Much as I like the idea of working with all the essays stored as files in a folder, it is probably a future tool for me even though I am not adverse to seeing that code. The reality is that I get the student Word files one at a time online through a teaching site using a tool called Peer Review. The word file opens and there is a list of criteria that I use to grade it. So I end up having to do them one at a time as I develop the feedback for the student and paste it into a field which I submit to their file and then go onto the next one as managed by the online submission process used by the students. Not that I am not open to eventually changing that process or the tool used depending on how things develop here.

        The best would be if I could send you some files that have the process which I have refined quite a bit quite a bit. In fact, the more I use it, the more automation I add. It is supposed to improve the feedback and save me time at the same time. The remaining problem at this point is the inability to search using regular expressions. Without them, I end up with too many sentences. For example, to find the Present Progressive, I search for ing (followed by a space). However that finds “thing”, “ring” “have been going” and more. What I want to do is to parse looking for /[s|re][n’t| not]? ?[always|usually|likely]? *ing/ which is more precise pulling out “is working”, “isn’t working” “are filling”, “are likely filling” etc.

        Let me know how to send you those files if you are still open to pursuing this question.
        Tx,
        Mel

        Like

        • Ahhh, I am an Excel Fan boy, and my mind thinks only Excel 😀

          Anyway, feel free to send those files over to StrugglingToExcel@outlook.com

          Do add a sample essay. Some search phrases and what you’d like to do once you find a match.

          Word search also has WildCard search facility, you may want to look into that alos, before deciding regular expression is the way to go.

          Like

  3. Ejaz – This is a great post. Thanks for creating this clear and concise introduction. Nice ‘About’ page, too. Teaching and mentoring, especially in the STEM subjects is critically needed. Warm regards – Henry

    Liked by 1 person

    • Hi Henry,

      I am glad you found it useful. And thanks for reading my about page and taking the time to let me know you liked my post. Means a lot to me. I hope you would take some time to check out my other posts; you might find something useful.

      Like

    • Thanks Newton. I am glad you found it useful. I intended this to be just that, a starting point. Good to know the post is doing its job well. Infact I am a little proud that it ranks well in Google.

      I do hope you’d check out a few other posts in my blog. Thanks for stopping by.

      Liked by 2 people

  4. Pingback: Clean a string for Range Names – Regular Expressions | Struggling To Excel

If you liked it, let me know. If you didn't make sure you let me know!

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s