Wednesday, April 15, 2015

How to share Windows Explorer window in Lync

This is not really an Excel solution, but I thought I'd share it anyway.

Lync allows you to share your desktop or individual applications with the person or group with whom you are chatting. However, what if you want to share just a Windows Explorer window, in order to demonstrate the contents of an individual folder, or how to navigate through folders to find a specific file?

At least in Lync 2010, even when Windows Explorer is open, it does not appear in the list of programs you can share.



My kludge is:

1) to share an empty Office document, whether Excel, Word, or PowerPoint.
2) then start to Open (Ctrl-O)
3) Depending on your version of Word, this might launch the Open dialog box immediately.
4) But if instead it brings you to Backstage view, click on Computer and then Browse to launch the Open dialog box:



5) Since you are sharing Word, you will also automatically share the launched Open dialog box.

That dialog box is fully functional, meaning that you can resize it, type a new directory name in the address bar, navigate between directories, create and copy and delete directories and files, and so on. In other words, it is the equivalent of Windows Explorer except it is shareable in Lync.

Wednesday, March 25, 2015

Minimum Edit Distance, part 2

In part one, we introduced Minimum Edit Distance and showed how to implement (as a dynamic programming approach) using just three Excel formulae. Then, we implemented it in VBA, as a UDF -- a user defined function that can be called from any cell. I ended that first post with the following note:
However, I've discovered that when there are a lot of calls to MinEditDistance in a workbook, then it can slow to a crawl. 
This is because the performance of MinEditDistance is O(n * m), where n and m are the lengths of the source and destination strings. Perhaps in a later post we can consider how to improve this performance.
We could try to improve the performance of the algorithm, and achieve some minor speed-up by tweaking the VBA. But, I am not going to focus on such minor tweaks here.

On other projects, when I've needed to run Minimum Edit Distance on millions of pairs of words, I ordered the words and then, based on shared prefixes, reused already partially-filled matrices. This achieves a more significant speed-up. In this case, Minimum Edit Distance is to be used as a UDF, and we cannot readily control the order in which pairs of words are presented to the function, and so this is not doable.

I've come up with an algorithmic improvement, in which, assuming the cost function for insertion, deletion and substitution are the same, we could change the runtime from O(n * m) to something usually much smaller. However, we won't delve into such algorithmic improvements here. Even if we can improve this particular algorithm, there will be other algorithms which we cannot improve in like manner.

Instead, we can use the same memoization technique already used in the dynamic programming approach. That is, MinEditDistance on these many pairs of words will be slow once. But once these results have been computed, we will store the results somewhere. Any time there is a cause to recompute a pair (e.g. an explicit call to recompute via pressing F9, or on workbook save), the UDF will first check if it has computed the result already. If it has, then it will reuse that result. Only if it has not already computed the result will it run the code for Minimum Edit Distance.

The difficulty with this is that we need someplace to store the computed results. We cannot store them on a hidden worksheet (writing something like ShtHidden.[B5].Value = ComputedResult) because MinEditDistance is a UDF. Recall that a UDF, which is able to be invoked from any cell, is not allowed to make any change to a workbook.

Instead, we will use the Scripting Dictionary. This dictionary is implemented as a Hashmap, such that lookup is O(1).

To use the scripting dictionary, we need to make use of the Microsoft Scripting Library, and the Scripting.Dictionary object within that library. We can add a Reference to the Microsoft Scripting Library via the Tools menu, References... menu item, and checking off the checkbox for this library:


This will allow us to declare our variable using the Dictionary data type.
Option Explicit

Dim dictResults As Dictionary

Sub foo()
    Set dictResults = New Dictionary
    Call dictResults.Add("foo", "bar")
    Debug.Print dictResults("foo")
End Sub
Alternatively, we could just use CreateObject as follows:

Option Explicit

Dim dictResults As Object

Sub foo()
    Set dictResults = CreateObject("Scripting.Dictionary")
    Call dictResults.Add("foo", "bar")
    Debug.Print dictResults("foo")
End Sub
The former is better because it makes use of early binding. Also, we can use Intellisense to guide us while programming.

We can assign an entry to a Dictionary because any changes to the Dictionary occurs in the code, rather than on the sheet.

Here is the new version of the UDF, with the lines number put in only for the new lines which handle memoization.
Option Explicit

Dim dictResults As Dictionary

Function MinEditDistance(ByVal Source As String, ByVal Destination As String) As Long
10  If dictResults Is Nothing Then
20      Set dictResults = New Dictionary
30  End If
    
40  Dim key As String
50  key = Source & "@@@@@@@" & Destination
60  If dictResults.Exists(key) Then
70      MinEditDistance = dictResults(key)
80      Exit Function
90  End If

    Const DeletionCost = 1
    Const InsertionCost = 1
    Const SubstitutionCost = 1
        
    Dim matrix() As Long
    ReDim matrix(Len(Source), Len(Destination))
    
    ' initialize first column and row of array
    Dim c As Long, r As Long
    For c = 0 To Len(Destination)
        matrix(0, c) = c
    Next c
    
    For r = 0 To Len(Source)
        matrix(r, 0) = r
    Next r
    
    ' dynamic programming to fill matrix
    For r = 1 To Len(Source)
        For c = 1 To Len(Destination)
            matrix(r, c) = WorksheetFunction.Min(matrix(r - 1, c) + DeletionCost, matrix(r, c - 1) + InsertionCost, matrix(r - 1, c - 1) + IIf(Mid(Source, r, 1) <> Mid(Destination, c, 1), SubstitutionCost, 0))
        Next c
    Next r
    
    MinEditDistance = matrix(Len(Source), Len(Destination))
100 dictResults.Add key, MinEditDistance
End Function
Outside the function, we have declared dictResults as a reference to a Dictionary object, though we do not create the actual Dictionary object.

Then,
10  If dictResults Is Nothing Then
20      Set dictResults = New Dictionary
30  End If
We check if a dictResults has not been created (or if it was set to nothing). If so, we create it.

Then,
40  Dim key As String
50  key = Source & "@@@@@@@" & Destination
60  If dictResults.Exists(key) Then
70      MinEditDistance = dictResults(key)
80      Exit Function
90  End If
We use as our key the Source string followed by seven @ symbols followed by the Destination string. The hope is that the string  "@@@@@@@" is unique enough that it will not occur in either the source or destination string. If this key already exists in the dictionary, then we have stored the result of the MinEditDistance computation, and so we retrieve it from the dictionary (an O(1) operation) and then return it. Otherwise, we proceed with the calculation.

Finally,
100 dictResults.Add key, MinEditDistance
At the very end of the function, besides assigning MinEditDistance (the return code) the result of the computation, we add it to the Dictionary under that key as well.

Note that the Minimum Edit Distance of Source to Destination is the same as that of Destination to Source. If so, we could save on space in our dictionary by making the key the smaller of the strings followed by the larger of the strings:
Dim key As String
If Source < Destination Then
    key = Source & "@@@@@@@" & Destination
Else
    key = Destination & "@@@@@@@" & Source
End If
Because in many cases we cannot be certain that the string "@@@@@@@" or any other selected string will not occur in either the Source or the Destination strings, we can instead maintain a Dictionary of Dictionaries. That is, the key of the first Dictionary will be just the Source string. Lookup by the Source key will yield another Dictionary, in which the key is the Destination string and the value is the calculated MinEditDistance.

The VBA code is as follows, with the line numbers put in only for the lines we have added:
Option Explicit

Dim dictResults As Dictionary

Function MinEditDistance(ByVal Source As String, ByVal Destination As String) As Long
    If dictResults Is Nothing Then
        Set dictResults = New Dictionary
    End If
    
10  Dim key1 As String, key2 As String
20  key1 = Source
30  key2 = Destination
    
40  If dictResults.Exists(key1) Then
50      Dim dict As Dictionary
60      Set dict = dictResults(key1)
70      If dict.Exists(key2) Then
72          MinEditDistance = dict(key2)
75          Exit Function
80      End If
90  Else
100     dictResults.Add key1, New Dictionary
110 End If

    Const DeletionCost = 1
    Const InsertionCost = 1
    Const SubstitutionCost = 1
        
    Dim matrix() As Long
    ReDim matrix(Len(Source), Len(Destination))
    
    ' initialize first column and row of array
    Dim c As Long, r As Long
    For c = 0 To Len(Destination)
        matrix(0, c) = c
    Next c
    
    For r = 0 To Len(Source)
        matrix(r, 0) = r
    Next r
    
    ' dynamic programming to fill matrix
    For r = 1 To Len(Source)
        For c = 1 To Len(Destination)
            matrix(r, c) = WorksheetFunction.Min(matrix(r - 1, c) + DeletionCost, matrix(r, c - 1) + InsertionCost, matrix(r - 1, c - 1) + IIf(Mid(Source, r, 1) <> Mid(Destination, c, 1), SubstitutionCost, 0))
        Next c
    Next r
    
    MinEditDistance = matrix(Len(Source), Len(Destination))
120 dictResults(key1).Add key2, MinEditDistance
End Function
We now go through these lines in detail. First, we now have two keys, key1 and key2.

10  Dim key1 As String, key2 As String
20  key1 = Source
30  key2 = Destination

Key1 is the Source string, used to look into the primary Dictionary, dictResults. Key2 is the Destination string, used to look into one of the Dictionaries stored in the primary dictionary.

Next,

40  If dictResults.Exists(key1) Then
50      Dim dict As Dictionary
60      Set dict = dictResults(key1)
70      If dict.Exists(key2) Then
72          MinEditDistance = dict(key2)
75          Exit Function
80      End If
90  Else
100     dictResults.Add key1, New Dictionary
110 End If

If key1 exists in the primary Dictionary, then there has been a secondary Dictionary stored there. And so we assign that Dictionary to the dict variable. We then check if key2 exists in that secondary Dictionary. If it does, then we have matched both Source and Destination strings, and so at some time in the past, we have already computed MinEditDistance for those two strings. Therefore, we retrieve that computed value and return.

A second possibility is that key1 does not exist in the primary dictionary. If so, that particular Source string has never been used in a MinEditDistance operation. We create a new, blank, secondary dictionary and store it in the primary dictionary under the key key1. In this way, in the future, and later in the code, both primary and secondary dictionary will exist.

A third possibility is that key1 is found in the primary dictionary but key2 is not found in its secondary dictionary. If so, there is nothing to do. The the primary and secondary dictionaries exist, and we fall through this code without doing anything.

Finally,

120 dictResults(key1).Add key2, MinEditDistance

We have changed the memoization step here. We confidently look up the secondary dictionary via a call to dictResults(key1). After all, if we have reached this point in the code, an entry in the primary dictionary for key1 either existed from before, as in line 40, or else was added in line 100. With that secondary dictionary in hand, we add an entry under the key key2, with the computed value MinEditDistance.

One last modification is, as before, to save on space. Earlier we said:
Note that the Minimum Edit Distance of Source to Destination is the same as that of Destination to Source. If so, we could save on space in our dictionary by making the key the smaller of the strings followed by the larger of the strings:
We do the same, as follows:
Dim key1 As String, key2 As String
If Source < Destination Then
    key1 = Source
    key2 = Destination
Else
    key1 = Destination
    key2 = Source
End If
The rest of the code follows as before.

This memoization makes it so that we are not calling this time-consuming operation millions of times unnecessarily.

Yet, there is room for improvement. A Dictionary object resides in memory, rather than on the sheet. That means that if something were to wipe out that Dictionary, all the results would need to be recomputed. This would happen if there were some VBA error which caused a crash, if the user clicked on the Stop button in the VBA editor. Or, quite commonly, if the user closes the workbook. We need some way to make this persistent. Perhaps in a part three of this series we will cover this.

Sunday, March 1, 2015

Minimum Edit distance, part 1

Minimum Edit distance can be used to perform fuzzy string matching.

Instead of calculating whether a string S and a string T are identical, you can use minimum edit distance to see how close two strings are to each other.

For instance, KITCHEN and MITTEN are not identical, but they share several common letters, in order. The minimum edit distance is the minimum number of insert, delete, and substitute operations which will transform the source string into the destination string.

This can be used, together with a threshold, to discover if two strings are close enough to each other that you would consider them identical.

(You can modify this basic minimum edit distance in various ways. For instance, here, the cost of each insertion, deletion, or substitution operation is 1. You can define a cost function so that these operations have different costs.)

I constructed the rather neat spreadsheet below, which calculates Minimum Edit Distance using only three formulas. Perhaps first read the Wikipedia entry on Minimum Edit Distance works.


 

In cell B3 is the value 0.
In C3 (and then copied to the right) is =B3+1
In B4 (and then copied down) is =B3+1
In C4 (and then copied down and to the right) is =MIN(C3+1, B4+1, B3+(C$2<>$A4))

The way this works is as follows. KITCHEN is the source word and SMITTEN is the destination word. Row 3 shows the cost of inserting each letter of the destination word where the source word (in the third row) is still null. Column B shows the cost of deleting each letter of the source word where the destination (in column B) is still null.

The formula is C4 is where all the calculation occurs. That formula, again, is:
=MIN(C3+1, B4+1, B3+(C$2<>$A4))

That is, take the minimum of three values:

  • C3+1: same column, one row up, plus the deletion cost of 1
  • B4+1: same row, one column left, plus the insertion cost of 1
  • B3+(C$2<>$A4): one column left, one row up, plus the substitution cost. The substitution cost will either be 0 or 1, as follows. If C$2 (the current letter of the destination word, with a mixed reference of $2 fixing the row) <> $A4 (the current letter of the source word, with a mixed reference of $A fixing the column), then 1. Otherwise, 0.

--------------------

We could also code the same in VBA, as a UDF:

Option Explicit

Function MinEditDistance(ByVal Source As String, ByVal Destination As String) As Long
    Const DeletionCost = 1
    Const InsertionCost = 1
    Const SubstitutionCost = 1
    Dim matrix() As Long
    ReDim matrix(Len(Source), Len(Destination))
    
    ' initialize first column and row of array
    Dim c As Long, r As Long
    For c = 0 To Len(Destination)
        matrix(0, c) = c
    Next c
    
    For r = 0 To Len(Source)
        matrix(r, 0) = r
    Next r
    
    ' dynamic programming to fill matrix
    For r = 1 To Len(Source)
        For c = 1 To Len(Destination)
            matrix(r, c) = WorksheetFunction.Min(matrix(r - 1, c) + DeletionCost, matrix(r, c - 1) + InsertionCost, matrix(r - 1, c - 1) + IIf(Mid(Source, r, 1) <> Mid(Destination, c, 1), SubstitutionCost, 0))
        Next c
    Next r
    
    MinEditDistance = matrix(Len(Source), Len(Destination))
End Function

------------------

However, I've discovered that when there are a lot of calls to MinEditDistance in a workbook, then it can slow to a crawl.

This is because the performance of MinEditDistance is O(n * m), where n and m are the lengths of the source and destination strings. Perhaps in a later post we can consider how to improve this performance.
to be continued...

Saturday, April 4, 2009

How many sheets do I have in my workbook?

Saw this question here:
Great job as usual.  I'm sorry that I did not make myself clear.  I
have several workbooks that contain several sheets each and while I do
have tabs at the bottom, I wanted to see a number perhaphs at the
bottom representing how many sheets the workbook had upon opening it.
If that's not possible, that's okay.  I really appreciate the
shortcut.  It's going to come in very handy. Thanks
The simple answer is to create a UDF, which reads:
Public Function NumSheets()
NumSheets = ThisWorkbook.Sheets.Count
End Function

Then, in a cell, type =NumSheets()

Sometimes it is a trivial answer.

Wednesday, April 1, 2009

Generating a Word Count of an Excel Workbook

That is the question at this forum:

Someone suggests: Save as a CSV file.
Another gives some VBA code he created. See inside.

How would I do it? Well, VBA is certainly one way to go, but why do this when a simple formula will accomplish the same thing? If you make it a macro, then you need to enable macros for that spreadsheet, and some folks are wary of macros.

How do you get a word count of words in an individual cell? Well, a similar way to the VBA approach, the annoying kludge of substitute of space for null followed by a difference in length. There are other ways to avoid this kludge, but for now we will not:

That is, imagine the text is in A1.
The formula in A2: =TRIM(A1)
This will remove any spaces before and after the text, as well as runs of spaces inside the text.
A3: =SUBSTITUTE(A2, " ", "")
A4: =1 + LEN(A1) - LEN(A3)
Since we replaced all " " with nothings, each replacement will make it one character less. The difference between these two lengths are the number of spaces. That number + 1 is the number of words.

That is for a single cell. The combined formula would be:
=1+LEN(TRIM(A1))-LEN(SUBSTITUTE(TRIM(A1)," ",""))

Now what about for a range of cells? We just use an array formula, and take the SUM. Also, if the cell does not have text, we don't want to count it. For the range A1:D4, type the following in some other cell:
=SUM(IF(ISTEXT(A1:D4), 1+LEN(TRIM(A1:D4))-LEN(SUBSTITUTE(TRIM(A1:D4)," ","")), 0))
and press Ctrl-Enter. Thus will give you a count of all words in that range.

How do we do it for all cells on a given worksheet? Make the reference be 1:1048576.
=SUM(IF(ISTEXT(1:1048576), 1 + LEN(TRIM(1:1048576)) - LEN(SUBSTITUTE(TRIM(1:1048576), " ","")), 0))

Of course, this would not work, because Excel would run out of resources before calculating that value, and will give us an error message to that effect. Besides, if it encompasses the entire spreadsheet, you will get a circular reference. But use a smaller range and it should work fine.

Which is better, the VBA or this? I don't know.

Tuesday, March 31, 2009

Calling Python from an Excel UDF

Just to start of this blog, I'll post some interesting code I found, which lets you run python within VBA. I got this from here. I'll comment on it afterward.

Global sc As New MSScriptControl.ScriptControl

Public Function os_getcwd()
sc.Language = "python"
sc.ExecuteStatement ("import os")
os_getcwd = sc.Eval("os.getcwd()")
End Function

With this you can set your Excel formula to =os_getcwd()

For me it returns "C:\Documents and Settings\Administrator\My Documents",
which I needed to know at the time so I didn't have to screw around with
the ever annoying pythonpath.
Read it all. We can do other things in it. For example, I would suggest the following:
sc.ExecuteStatement ("x = 7 + 6")
os_getcwd = sc.Eval("x")
and we would get 13. This is just a proof of concept. Practical examples which make use of this would be useful may be forthcoming if I am interested enough. In particular, the memory this retains plus the built in set and list formers may lead to some very powerful built-in functions. Clever!

One such example: How do I center text in a string, by padding with spaces before and after, such that the entire length of the string is 100. I could take the string length, figure out have many spaces to append before, how many after, and so on. Or else I could use this UDF:

Public Function CenterText(text, width)
text = Chr(34) & text & Chr(34) ' add quotes around "text"
sc.ExecuteStatement ("import string")
sc.ExecuteStatement ("x = string.center(" & text & ", " & width & ")")
CenterText = sc.Eval("x")
End Function

See how I first build the command using string concatenation, and then run it.
I don't actually need to store in a temporary in the meantime. I could write:

Public Function CenterText(text, width)
text = Chr(34) & text & Chr(34) ' add quotes around "text"
sc.ExecuteStatement ("import string")
CenterText = sc.Eval("string.center(" & text & ", " & width & ")")
End Function

And of course, the "import string" should come much earlier, on the load of the workbook, so that we don't need to import every single time.

Is there anything you would like to accomplish in Excel, but don't know how? Ask away!