Cleaning up the dirtyPath

One of our recent projects involved creating Excel functionality inside a custom application we built for a client. After a few iterations where we tried to emulate most of the basic features of Excel (including enabling the pasting of tables through a lot of parsing and basic table functionalities), it became clear that we would not be able to arrive at a user-friendly solution without completely re-creating Excel.

So we came up with a different approach. Instead of letting the user manipulate pasted Excel content inside the application itself, we made it easy for the user to get back to the original table inside Excel. We accomplished this by pasting an image instead of the table data, which is supported by Excel by default (copy as image) and then associating the pasted image with the original Excel file that can be opened from our application.

Now it would be nice to know the location of the original Excel file from the clipboard and associate that with the image automatically, right? Here’s how we did it.

The clipboard provides you with several types and formats of information, some more or less interesting. You can get a full list of all formats with this:

Clipboard.GetDataObject().GetFormats()

In this case we are only interested in the “Link Source”. There are several ways to get to it, but I chose to get a plain ASCII string:

if (!Clipboard.ContainsData("Link Source"))
 {
   return string.Empty;
 }
var mstream = (MemoryStream) Clipboard.GetData("Link Source");
var dirtyPath = Encoding.ASCII.GetString(mstream.ToArray());

This gives us a pretty dirty string with a lot of chars that don’t mean much (e.g., \0 – lots of them). The first time I worked on this I tried to turn everything into HEX and then find patterns. That was a terrible idea, especially if you are using anything but apps like Word, notepad++, web browser, plain paper, etc. Since the encoding might differ slightly from app to app, you could miss the important parts.

After some more digging I found a solution. In the dirtyPath there are Unicode characters that will tell us everything we need to know about the string we are reading. Here is a list http://unicode-table.com/en/.

Here are the two important ones for us:

  • \u0004 = End of Transmission
  • \u0003 = End of Text

We also have \u0002 that stands for Start of Text, but I didn’t need it for this issue (but it was helpful to know).

Here is an example of the “dirtyPath”:

dirtypathdark

If you look closely there are two paths inside this string: one is the temporary location for the Office file (.xls) and one is the actual location within the sheet (.xlsx). But the second path is pretty obstructed.

dirtypathlight

I needed the second path so here is what I did. I parsed the string for the last “End of Transmission” Unicode, which is the last place before we are getting non-path related info (the current selection of cells and book inside our Excel worksheet).

var endOfTransmissionIndex = dirtyPath.LastIndexOf('\u0004');
if (endOfTransmissionIndex < 0)
{
   return string.Empty;
}
dirtyPath = dirtyPath.Substring(0, endOfTransmissionIndex);

Now the dirtyPath looks like this:

dirtypath2

It’s just a little shorter now, and we removed the extra info from the end. We can safely strip off the other front now by finding the last “End of Text” Unicode character.

var endOfTextIndexForPath = dirtyPath.LastIndexOf('\u0003');
if (endOfTextIndexForPath < 0) 
{ 
   return string.Empty; 
} 
dirtyPath = dirtyPath.Substring(endOfTextIndexForPath + 1);

This time the dirty path is a lot shorter:

dirtypath3

Now let’s just get rid of all the weird characters.

public static string StripWeirdChars(string source) 
{ 
   string res = ""; 
   foreach (char c in source) if ((int)c >= 32) res += c; 
   return res; 
}
var cleanPath = StripWeirdChars(dirtyPath);

Our final clean path looks like this:

C:\\Users\\[username]\\Documents\\Tester.xlsx

And that’s it. You can strip anything out of the clipboard this way as long as the file you are pasting from is saved on your machine or local network.

If the file is not saved you will still get the current selection info. That means the original dirtyPath will look like this:

dirtypath4

Cleaned up:

dirtypath5

NOTES:

  1. All pasted strings are from Visual Studio 2015 and might look different in other environments. The Unicode and “weird” characters are not displayed everywhere, so be careful.
  2. File to be copied from must be saved and not in read-only mode to extract a path from the clipboard.
  3. I have tested this on Windows 10, Windows 8 & 8.1, Windows 7, and with Office 2007 through Office 365.