Levenshtein distance

March 28th, 2006

We usu­ally want to talk about Lev­en­shtein dis­tance when we want to com­pare two dif­fer­ent strings. Prat­i­cally, the Lev­en­shtein dis­tance (as its creator’s name, Vladimir Lev­en­shtein) is a mea­sure of the sim­i­lar­ity between two strings. If you want to look at the algo­rithm in detail, then you should take a look at this page avail­able on wikipedia.

The Lev­en­shtein dis­tance is noth­ing but else than the number of steps needed to trans­form a string into another one. For exam­ple, if we want to go from “cat” to “cats”, then we have a Lev­en­shtein dis­tance equal to 1 (since we added only a ‘s’, so one step).

Here is the code I done (tested with gcc, after a strip --strip-unneeded is only 4.5kb large):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

#define MAX_STR_LEN     100

typedef unsigned short int Matrix;

static int levenshtein(const char *, const char *);
static int min(const int, const int, const int);
static char *strlower(char *);

int
main(int argc, char **argv)
{
        unsigned short int i, ss, c_sens = 1;
        char str[1][MAX_STR_LEN];

if (argc < 3) {
                printf("Usage: %s [-i] <string1> <string2>n", argv[0]);
                return EXIT_FAILURE;
        }

ss = 0;
        for (i = 1; i < argc; i++)
        {
                if (!strcmp(argv[i], "-i")) {
                        c_sens = 0;
                } else {
                        strcpy(str[ss], argv[i]);
                        ss++;
                }
        }

if (!c_sens) {
                strlower(str[0]);
                strlower(str[1]);
        }

printf("[%s]/[%s]:%dn", str[0], str[1], levenshtein(str[0], str[1]));

return EXIT_SUCCESS;
}

static int
levenshtein(const char *str1, const char *str2)
{
        unsigned short int n, m, i, j, cost;
        Matrix matrix[MAX_STR_LEN][MAX_STR_LEN];

n = strlen(str1);
        m = strlen(str2);

if (n == 0)
                return m;
        if (m == 0)
                return n;

for (i = 0; i <= n; i++)
                matrix[i][0] = i;
        for (i = 0; i <= m; i++)
                matrix[0][i] = i;

for (i = 1; i <= n; i++)
        {
                for (j = 1; j <= m; j++)
                {
                        cost = ((str1[i - 1] == str2[j - 1]) ? 0 : 1);
                        matrix[i][j] = min(     matrix[i - 1][j] + 1,
                                                matrix[i][j - 1] + 1,
                                                matrix[i - 1][j - 1] + cost);
                }
        }

return matrix[n][m];
}

static int
min(const int a, const int b, const int c)
{
        unsigned short int min = a;

if (b < min)
                min = b;

if (c < min)
                min = c;

return min;
}

static char *
strlower(char *a)
{
        unsigned short int i, len = strlen(a);

for (i = 0; i < len; i++)
                *(a + i) = tolower(*(a + i));

return a;
}

If you want a case insen­si­tive com­pu­ta­tion, use the -i switch on the com­mand line.
This ver­sion could be improved by using point­ers when threat­ing the matrix, but maybe I would do it for a newer release.

0 Comments, tagged with Coding

What’s about IOCCC?

March 17th, 2006

I guess you already know about IOCCC. But if you don’t, here is an exam­ple appli­ca­tion I wrote. That surely isn’t one of the most well writ­ten exam­ples, but since it is my first obfus­cated c pro­gram, I think it is also enough :)
I didn’t obfus­cated it using #define’s since by using -E gcc’s option you can easily retrieve the start­ing code. This is really a obfus­cated, chaotic pro­gram. And, of course, I won’t give you the solu­tion (but if you under­stand what it does, please let me know)!
There’s an issue: this code doesn’t fit into the blogspot layout. I really have to make a per­son­al­ized layout that allow me to include code snippets.

#include <stdio.h>
#include <string.h>
int bl(const char *) ;; int main (int na , char **a) { if (na != 2) return -1;;
printf("%cn", bl (*(a + 1)) + '0'); return 0; } int bl(const char *expr) { int
len = strlen(expr) * 2, i = 0, b = 0; enum S { $S, BO, BC, $E }; enum S s = $S;
char x; while (s != $E && i < (len / 2)){ x = *(expr + i); if (x == ' ') { i++;
continue; } switch (s) { case $S: if (x == 'x28') { s = BO; b++;} else s = $E;
break; case BO: case BC: if (x == 'x28') { s = BO; b++;} else if (x == 'x29')
{ --b; if (b == 0) { s = $S;; } else s = BC; } else { s = $E; } break; case $E:
default:return 1;break; } i++; } if ((b != 0) || (s == $E)) return 1;return 0;}

1 Comment, tagged with Coding

Javascript script execution in innerHTML: the revenge

March 7th, 2006

The execJS() I posted some time ago have some prob­lems (and it, yes, was just a modify to the AHAH ver­sion). I didn’t really real­ized what exactly it is, but I found a simple (?) solu­tion. The prob­lem, as far as I can under­stand, is that eval() doesn’t always exe­cute the code. So here it is the workaround: look for <script> tags, take its con­tent and create a new ele­ment into the <head> with createElement/appendChild. In this way we should also be more standard-​compliant than before:

function execJS(node)
{
  var bSaf = (navigator.userAgent.indexOf('Safari') != -1);
  var bOpera = (navigator.userAgent.indexOf('Opera') != -1);
  var bMoz = (navigator.appName == 'Netscape');

  if (!node) return;

  /* IE wants it uppercase */
  var st = node.getElementsByTagName('SCRIPT');
  var strExec;

  for(var i=0;i<st.length; i++)
  {
    if (bSaf) {
      strExec = st[i].innerHTML;
      st[i].innerHTML = "";
    } else if (bOpera) {
      strExec = st[i].text;
      st[i].text = "";
    } else if (bMoz) {
      strExec = st[i].textContent;
      st[i].textContent = "";
    } else {
      strExec = st[i].text;
      st[i].text = "";
    }

    try {
      var x = document.createElement("script");
      x.type = "text/javascript";

      /* In IE we must use .text! */
      if ((bSaf) || (bOpera) || (bMoz))
        x.innerHTML = strExec;
      else x.text = strExec;

      document.getElementsByTagName("head")[0].appendChild(x);
    } catch(e) {
      alert(e);
    }
  }
};

I tested it only under fire­fox, but it should work on other browsers too. If it doesn’t, let me know.

Update (Octo­ber 1st, 2006): now it works under Inter­net Explorer too (if you want to inject some text in a script ele­ment in IE you can’t use .inner­HTML, but you have to use .text!)

26 Comments, tagged with Javascript

Gimp tutorials

March 2nd, 2006

There was a time when I was younger that I wanted to learn how to use The Gimp. Look­ing for tuto­ri­als has always been an issue: there isn’t a lot of mate­ri­als about Gimp, surely a lot less than Pho­to­shop.
So here I am: now that I have a dis­creet knowl­edge of Gimp I would give a hand. How? Making video tuto­ri­als (with audio, maybe). There are some prob­lems: how should I record the video (I need to get the mouse pointer vis­i­bile)? How could I add the audio? How to encode it? Where to put the .avi?
I can’t say exactly how much band­width I need: if this thing would have suc­cess, then I’d need some­thing like a 500Mb/month of guar­an­tee band­width. Oth­er­wise a 100Mb host­ing space on AOL could be enough.
There’s another issue: I need of some­one who cor­rect my eng­lish (and speak­ing is a lot more dif­fi­cult) :)
I’ll let you know.

0 Comments, tagged with Graphic

Microblogging

  • Funny thing: yesterday night I had an idea about a good blog post I could make. But now I completely forgot what that idea was about. 13 hours ago #
  • I think pownce has a little issue with caching since if I delete a message and I write a new one, it doesn't appear in my homepage. Nov 16, 6:34pm #
  • I didn't know that something like [(x,y) for x in range(10) for y in range(x)] was possible in Python. Nov 16, 3:45pm #
  • I'm about to go to the local LUG dinner: pizza for everyone. Nov 14, 9:15pm #
  • Lately I've been very interested in fast data structures with minimum memory usage. Just surprised to find out that list comprehension in Python are sometimes slower for large quantities of data than classic for loops. Still trying to understand why (if someone has a clue, please let me know). Nov 12, 12:44pm #
  • So wordpress was silently modifying HTTP request headers and I was getting a 400 when fetching Pownce RSS. Now everything works as expected on my blog, shame on WP. Nov 9, 3:58pm #
  • Experimenting with document language identification. Nov 6, 10:23pm #
  • So looks like I finally found an interesting topic apart from web development: information retrieval. Nov 3, 5:13pm #
  • Planning a trip to Bologna in December Nov 1, 5:24pm #
  • After today, I want to go as far as I can from Italy. Oct 29, 11:49am #

Search


« Authored by Giuliani Vito Ivan »