TinTin++ Mud Client The TinTin++ message board

 
 FAQFAQ   SearchSearch   MemberlistMemberlist   UsergroupsUsergroups   RegisterRegister 
 ProfileProfile   Log in to check your private messagesLog in to check your private messages   Log inLog in 
TinTin++ Mud Client

Patch for a fast indexing system for Map Descriptions

 
Post new topic   Reply to topic    The TinTin++ message board Forum Index -> Development
View previous topic :: View next topic  
Author Message
Slysven



Joined: 10 Apr 2011
Posts: 365
Location: As "Jomin al'Bara" in WoTMUD or Wiltshire, UK

PostPosted: Sat Jan 12, 2013 6:54 pm    Post subject: Patch for a fast indexing system for Map Descriptions Reply with quote

I have been using this code for some months now that uses an hash function on non-null room descriptions. By linking it with #ACTIONs that captures the room description as I move around (brief mode OFF for some MUD servers) I can identify where I am in a previously explored map in a fraction of a second for a map of more than 10K rooms - provided the description has not changed. Theoretically by using the 4 Hex digit hash value - possibly with the room name - it ought to be possible to pass location information to other parties even if they are using a different map file (or if #MAP VNUM has been used by either) with just this information - no need to send a multi-line description!!! The only other ways currently to search for a room with a matching description are:
  • use a #MAP {LIST} {roomdesc} {<description>}
  • scan the Map data manually by searching each room VNUM with a #MAP {GET} {ROOMAREA} {temp_variable} {VNUM} and then doing an #IF {"<temp_variable>"=="<description>"}
Unfortunately at least the second method has been unreliable when <description> is a multiple-line text string in the recent past; also, both these methods require a scan through the entire Map room_list data which is relatively very slow compared to the index system I offer here.

The hash value for each description is generated automatically and is available for those interested with a #MAP GET {ROOMHASH}; utilisation is by a new #MAP command #MAP {INDEX} in the manner of:
Code:
#MAP {INDEX} {MAKE} {<description>} [{<variable>}]
Given a text description - possibly from an #ACTION that captures the output from the MUD, displays (or stores in the given variable) the corresponding Hash value.
Code:
#MAP {FIND} {<hashvalue>|<description>} [{variable}]
Given a hexadecimal value comprised of exactly 4 digits (0-9,A-F,a-f) ONLY, or a text description as before, display or return as a semicolon separated list in a variable (which can be parsed with a #FORALL {$<variable>} {...} to examine each value) a list of rooms with a matching hash value. The value used here is 2^16-1 or a hash value range of 64K so the chance of hash collisions should be minimal if I have chosen a good function to use and even if there are collisions the number of rooms that will have to be examined further (for exits or name matches) should be a very small number for a map of thousands of rooms.

This patch was built against the 2.00.9 beta release code as 12 Jan 2012.
Code:
diff -u -p ./tt_2_00_9_beta_20130112/src/help.c ./tt_2_00_9_beta_sly/src/help.c
--- ./tt_2_00_9_beta_20130112/src/help.c   2012-12-26 21:50:43.000000000 +0000
+++ ./tt_2_00_9_beta_sly/src/help.c   2013-01-13 00:07:41.000000000 +0000
@@ -904,6 +904,13 @@ struct help_type help_table[] =
       "         #map insert <direction> [roomflag]: Insert a room in the given\n"
       "                  direction. Most useful for inserting void rooms.\n"
       "\n"
+      "         #map index make <description> [variable]: display [or put in variable\n"
+      "                  the hash value for the given description text.\n"
+      "\n"
+      "         #map index find <hash>|<description> [variable]: display [or put in\n"
+      "                  variable] a list of rooms that has a matching hash (exactly\n"
+      "                  4-hex digits OR a text description.\n"
+      "\n"
       "         #map jump <x> <y> <z>: Jump to the given coordinate, which is relative\n"
       "                  to your current room.\n"
       "\n"
diff -u -p ./tt_2_00_9_beta_20130112/src/mapper.c ./tt_2_00_9_beta_sly/src/mapper.c
--- ./tt_2_00_9_beta_20130112/src/mapper.c   2013-01-01 20:51:45.000000000 +0000
+++ ./tt_2_00_9_beta_sly/src/mapper.c   2013-01-12 22:14:54.000000000 +0000
@@ -77,6 +77,7 @@ DO_COMMAND(do_map)
       tintin_printf2(ses, "#map flag     <map flag>               (set map wide flags)");
       tintin_printf2(ses, "#map get      <option>     <variable>  (get various values)");
       tintin_printf2(ses, "#map goto     <location> [exits]       (moves you to given room)");
+      tintin_printf2(ses, "#map index    <find|index> <h|d> [<v>] (find room by/make index of desc)");
       tintin_printf2(ses, "#map leave                             (leave the map, return with goto)");
       tintin_printf2(ses, "#map legend   <symbols>                (sets the map legend)");
       tintin_printf2(ses, "#map link     <direction>  <room name> (links 2 rooms)");
@@ -725,6 +726,7 @@ DO_MAP(map_get)
       tintin_printf2(ses, "   roomdesc: %s", room->desc);
       tintin_printf2(ses, "  roomexits: %d", get_map_exits(ses, room->vnum));
       tintin_printf2(ses, "  roomflags: %d", room->flags);
+      tintin_printf2(ses, "   roomhash: %04x", room->hash);
       tintin_printf2(ses, "   roomname: %s", room->name);
       tintin_printf2(ses, "   roomnote: %s", room->note);
       tintin_printf2(ses, " roomsymbol: %s", room->symbol);
@@ -753,6 +755,10 @@ DO_MAP(map_get)
       {
          set_nest_node(ses->list[LIST_VARIABLE], arg2, "%d", room->flags);
       }
+      else if (is_abbrev(arg1, "roomhash"))
+      {
+         set_nest_node(ses->list[LIST_VARIABLE], arg2, "%04x", room->hash);
+      }
       else if (is_abbrev(arg1, "roomname"))
       {
          set_nest_node(ses->list[LIST_VARIABLE], arg2, "%s", room->name);
@@ -825,6 +831,93 @@ DO_MAP(map_goto)
    }
 }
 
+DO_MAP(map_index)
+{
+   char arg3[BUFFER_SIZE], buff[BUFFER_SIZE], temp[BUFFER_SIZE];
+   struct desc_hash_data *ptr;
+   size_t j, hash;
+
+   *buff='\0';
+   *temp='\0';
+
+   arg = sub_arg_in_braces(ses, arg, arg1, GET_ONE, SUB_VAR|SUB_FUN); //action to carry out
+   arg = sub_arg_in_braces(ses, arg, arg2, GET_ONE, SUB_VAR|SUB_FUN); //roomdesc/roomhash
+   arg = sub_arg_in_braces(ses, arg, arg3, GET_ALL, SUB_VAR|SUB_FUN); //optional variable
+
+   if (*arg1=='\0'||*arg2=='\0')
+   {
+      tintin_printf2(ses, "MAP INDEX {FIND} {<roomdesc/hash>} [{<variable>}]  list rooms matching given description/hash value");
+      tintin_printf2(ses, "                                                   or store in a given variable.");
+      tintin_printf2(ses, "MAP INDEX {MAKE} {<roomdesc>} {[<variable>]}       display or store hash value for given description");
+      return;
+   }
+
+   if (is_abbrev(arg1, "make"))
+   {
+      hash=make_desc_hash(arg2);
+
+      if (*arg3)
+      {
+         add_nest_node(ses->list[LIST_VARIABLE], arg3, "%04X", hash);
+      }
+      else
+      {
+         tintin_printf2(ses, "MAP INDEX {MAKE}: Hash of given description: %04X", hash);
+      }
+   }
+   else if (is_abbrev(arg1, "find"))
+   {
+      if (strlen(arg2)==4)
+      {
+         if(isxdigit((unsigned int)arg2[0])&&isxdigit((unsigned int)arg2[1])&&isxdigit((unsigned int)arg2[2])&&isxdigit((unsigned int)arg2[3]))
+         {
+            hash=(size_t)strtol(arg2,NULL,16);
+         }
+         else
+         {
+            hash=make_desc_hash(arg2);
+         }
+      }
+      else
+      {
+         hash=make_desc_hash(arg2);
+      }
+      ptr=&ses->map->hash_index[hash];
+
+      if (ptr->count>0)
+      {
+         for (j=0; j<ptr->count; j++)
+         {
+            if (*buff)
+            {
+               snprintf(temp, BUFFER_SIZE, "%s;%d", buff, ptr->room[j]);
+            }
+            else
+            {
+               snprintf(temp, BUFFER_SIZE, "%d", ptr->room[j]);
+            }
+            strncpy(buff, temp, BUFFER_SIZE);
+         }
+      }
+
+      if (*buff!='\0')
+      {
+         if (*arg3)
+         {
+            add_nest_node(ses->list[LIST_VARIABLE], arg3, "%s", buff);
+         }
+         else
+         {
+            tintin_printf2(ses, "MAP INDEX {FIND}: Rooms that match given hash or hash of given description: %s", buff);
+         }
+      }
+   }
+   else
+   {
+      tintin_printf2(ses, "MAP INDEX: Invalid option, use either {FIND} <roomdesc/hash> [<variable>] or {MAKE} <roomdesc> [<variable>]");
+   }
+}
+
 DO_MAP(map_info)
 {
    int room, cnt, exits;
@@ -1606,6 +1699,7 @@ DO_MAP(map_set)
       tintin_printf2(ses, " roomsymbol: %s", room->symbol);
       tintin_printf2(ses, "roomterrain: %s", room->terrain);
       tintin_printf2(ses, " roomweight: %.3f", room->weight);
+      tintin_printf2(ses, "   roomhash: %04x", room->hash);
    }
    else
    {
@@ -1626,7 +1720,25 @@ DO_MAP(map_set)
       }
       else if (is_abbrev(arg1, "roomdesc"))
       {
+      /*Need to remove old hash entry for this description*/
+         if (room->desc!=NULL)
+         {
+            if (strcmp(room->desc,""))
+               remove_desc_hash(ses, room->vnum, room->hash);
+         }
+         room->hash = 0;
          RESTRING(room->desc, arg2);
+      /*And add new one?*/
+         if (room->desc!=NULL)
+         {
+            if (strcmp(room->desc,""))
+            {
+               room->hash=make_desc_hash(room->desc);
+               if (room->hash==0)
+                  tintin_printf2(ses, "#MAP SET ROOMAREA: UM, GOT A ROOM WITH A DESCRIPTION \"%s\"BUT THE HASH COMES TO ZERO!", room->desc);
+               add_desc_hash(ses, room->vnum, room->hash);
+            }
+         }
          show_message(ses, -1, "#MAP SET: roomdesc set to: %s", arg2);
       }
       else if (is_abbrev(arg1, "roomflags"))
@@ -2044,6 +2156,8 @@ void create_map(struct session *ses, cha
    ses->map->path_color = strdup("<138>");
    ses->map->room_color = strdup("<178>");
 
+   init_desc_hash(ses);  /*Ought to test return value...!*/
+
    create_room(ses, "%s", "{1} {0} {} {} { } {} {} {} {} {} {1.0}");
 
    if (!HAS_BIT(ses->flags, SES_FLAG_UTF8))
@@ -2140,6 +2254,14 @@ int create_room(struct session *ses, cha
 
    ses->map->room_list[newroom->vnum] = newroom;
 
+   if (newroom->desc)
+   {
+      if (strcmp(newroom->desc,""))
+      {
+         newroom->hash=make_desc_hash(newroom->desc);
+         add_desc_hash(ses, newroom->vnum, newroom->hash);
+      }
+   }
    show_message(ses, -1, "#MAP CREATE ROOM %5d {%s}.", newroom->vnum, newroom->name);
 
    return newroom->vnum;
@@ -2155,6 +2277,12 @@ void delete_room(struct session *ses, in
       delete_exit(ses, room, ses->map->room_list[room]->f_exit);
    }
 
+   if (ses->map->room_list[room]->desc!=NULL)
+   {
+      if(strcmp(ses->map->room_list[room]->desc,""))
+         remove_desc_hash(ses, room, ses->map->room_list[room]->hash);
+   }
+
    free(ses->map->room_list[room]->name);
    free(ses->map->room_list[room]->color);
    free(ses->map->room_list[room]->symbol);
@@ -3782,3 +3910,162 @@ void explore_path(struct session *ses, i
       path_run(ses, arg2);
    }
 }
+
+int init_desc_hash(struct session *ses)
+{
+   size_t i;
+
+   ses->map->hash_size=HASH_SIZE;
+   if (ses->map)
+   {
+      ses->map->hash_index=(struct desc_hash_data *)calloc(ses->map->hash_size, sizeof(struct desc_hash_data));
+      if (ses->map->hash_index==NULL)
+      {
+         tintin_printf2(ses, "PANIC: Out of memory whilst initilising indexes!");
+         return -1;
+      }
+      else
+      {
+         for (i=0; i<ses->map->hash_size; i++)
+         {
+            ses->map->hash_index[i].count=0;
+            ses->map->hash_index[i].room=NULL;
+         } /*This is probably redundent but can't hurt*/
+      }
+   }
+   return 0;
+}
+
+size_t make_desc_hash(char *string)
+{
+   char *ptr;
+   size_t        hash, count;
+
+   ptr=string;
+   count=1;
+   hash=0;
+   while (*ptr!='\0')
+   {
+      hash += count++ *(size_t)(*ptr)
+      hash %= HASH_SIZE;
+      ptr++;
+   }
+
+   return hash;
+}
+
+void remove_desc_hash(struct session *ses, size_t vnum, size_t hash)
+{
+   size_t *pti, *pto, *new, i;
+   ptrdiff_t found;
+
+   pti=ses->map->hash_index[hash].room;
+   found=-1;
+
+   /*Need to check that this vnum IS in this list already...*/
+   if (ses->map->hash_index[hash].count)
+   {
+      for (i=0; i<ses->map->hash_index[hash].count; i++)
+      {
+         if (ses->map->hash_index[hash].room[i]==vnum)
+         {
+            found=i;
+         }
+      }
+   }
+
+   if (found==-1)
+   {
+      tintin_printf2(ses, "remove_desc_hash(): I think that room %i is NOT in index list with the %i hash value.", vnum, hash);
+      return; /*Abort straight out, vnum NOT in this hash index*/
+   }
+
+   ses->map->hash_index[hash].count--;
+   if (ses->map->hash_index[hash].count==0)
+   {
+      free(ses->map->hash_index[hash].room);
+      ses->map->hash_index[hash].room=NULL;
+      return; /*No values to retain so just delete last value*/
+   }
+
+   new=(size_t *)calloc(ses->map->hash_index[hash].count, sizeof(size_t *));
+   if (new==NULL)
+   {
+      tintin_printf2(ses, "PANIC: Out of memory whilst manipulating (deleting from) indexes!");
+      return;
+   }
+
+   pto=new;
+   for (i=0; i<=ses->map->hash_index[hash].count; i++)
+   {
+      if(*pti!=vnum)
+      {
+         *pto=*pti;
+         pto++;
+      }
+      pti++;
+   }
+
+   free(ses->map->hash_index[hash].room);
+   ses->map->hash_index[hash].room=new;
+}
+
+void add_desc_hash(struct session *ses, size_t vnum, size_t hash)
+{
+   size_t *pti, *pto, temp, *new, i;
+
+   pti=ses->map->hash_index[hash].room;
+
+   /*Need to check that this vnum is NOT in this list already, found a glitch elsewhere that might produce this...*/
+
+   if (ses->map->hash_index[hash].count)
+   {
+      for (temp=0; temp<ses->map->hash_index[hash].count; temp++)
+      {
+         if (ses->map->hash_index[hash].room[temp]==vnum)
+         {
+            tintin_printf2(ses, "add_desc_hash(): I think that room %1 is already in the index list with the %i hash value.", vnum, hash);
+            return; /*Abort straight out, vnum already in this hash index*/
+         }
+      }
+   }
+
+   temp=vnum;
+   ses->map->hash_index[hash].count++;
+   new=(size_t *)calloc(ses->map->hash_index[hash].count, sizeof(size_t *));
+   if (new==NULL)
+   {
+      tintin_printf2(ses, "PANIC: Out of memory whilst manipulating (adding to) indexes!");
+      return;
+   }
+
+   pto=new;
+   if (pti==NULL)
+   {
+      *pto=vnum;
+   }
+   else
+   {
+      for (i=0; i<ses->map->hash_index[hash].count-1; i++)
+      {
+         if (*pti<vnum||temp==0)
+         {
+            *pto=*pti;
+            pti++;
+            pto++;
+         }
+         else
+         {
+            *pto=vnum;
+            pto++;
+            temp=0;
+            *pto=*pti;
+            pti++;
+            pto++;
+         }
+      }
+      *pto=vnum;
+      free(ses->map->hash_index[hash].room); /*Crashing occurs here!!!*/
+   }
+   ses->map->hash_index[hash].room=new;
+}
diff -u -p ./tt_2_00_9_beta_20130112/src/tables.c ./tt_2_00_9_beta_sly/src/tables.c
--- ./tt_2_00_9_beta_20130112/src/tables.c   2012-12-26 21:48:17.000000000 +0000
+++ ./tt_2_00_9_beta_sly/src/tables.c   2013-01-12 22:02:38.000000000 +0000
@@ -472,6 +472,7 @@ struct map_type map_table[] =
    {     "FLAG",             map_flag,            1    },
    {     "GET",              map_get,             2    },
    {     "GOTO",             map_goto,            1    },
+   {     "INDEX",            map_index,           1    },
    {     "INFO",             map_info,            1    },
    {     "INSERT",           map_insert,          2    },
    {     "JUMP",             map_jump,            2    },
diff -u -p ./tt_2_00_9_beta_20130112/src/tintin.h ./tt_2_00_9_beta_sly/src/tintin.h
--- ./tt_2_00_9_beta_20130112/src/tintin.h   2012-12-28 03:31:58.000000000 +0000
+++ ./tt_2_00_9_beta_sly/src/tintin.h   2013-01-12 22:59:20.000000000 +0000
@@ -121,6 +121,42 @@
 #define NUMBER_SIZE                    100
 #define LIST_SIZE                        2
 
+#define HASH_SIZE     65536
+/* This value is used to define an index of room descriptions for the map where
+ * a non-null description is converted to an unsigned short (0 to HASH_SIZE-1)
+ * hash value derived from the roomdesc by taking the ASCII value of each
+ * character, multiplying it by its position in the description and taking the
+ * modulus HASH_SIZE of it added to the running total.  By then making an array
+ * of this number of elements where each element holds a sorted list of room
+ * vnumber which have THAT value (and a size of that list) it is intended that
+ * this can be used to find rooms with matching descriptions very much faster
+ * than by checking each room in term.  The current search processes
+ * when involving multi-line room descriptions just did not work well and is not
+ * fast enough to use in real-time tracking of location in maps of more than a
+ * few hundred of rooms.  It is suggested that when using a multi-line
+ * capture to gather room description text the end of (all but the last) line
+ * string (2+NULL character) of "\n" is used - then it can be displayed on
+ * screen with an #ECHO.
+ * Data is accessed using:
+ * #MAP INDEX {FIND} {description OR hash} [{variable}]:
+ *     get a list of matching vnums give a full description or the hash value
+ * in the form of 4 hex digits (without a 0x or other prefix!), displaying the
+ * room numbers or placing them in a semicolon separated list in the varibale
+ * name given.
+ * #MAP INDEX {MAKE} {description} [{variable}]:
+ * produces a hash value in the same manner for a description caught by an
+ * #ACTION so that it could be processed by the user e.g. to check for
+ * duplicates, to confim map position is correct or the map description has not
+ * changed or (provided the map has indeed not changed) to pass to someone else
+ * (who would not necessarily have the same room numbers if either have used
+ * #map renum or theoretically if the technique is used by other clients
+ * even the same Client!)
+ * It is hoped that for a typical MUD the chance of a hash collision is small
+ * enough with the choosen setting that this is a viable system, and even WITH a
+ * collision the user only have to check a few room numbers rather than the
+ * ENTIRE list.
+ */
+
 #define CLIENT_NAME              "TinTin++"
 #define CLIENT_VERSION           "2.00.9  "
 
@@ -874,6 +910,13 @@ struct str_hash_index_data
    struct str_hash_data    * l_node;
 };
 
+struct desc_hash_data
+{
+   size_t                  count;
+   size_t                * room;
+};
+/*One of these for each hash value, actually will be built into a fixed size array*/
+
 struct map_data
 {
    struct room_data     ** room_list;
@@ -899,6 +942,10 @@ struct map_data
    short                   display_stamp;
    short                   nofollow;
    char                    legend[17][100];
+   size_t                  hash_size;
+   /* Will be (fixed) number of entries in this array:*/
+   struct desc_hash_data * hash_index;
+   /* 2^8 pointers, though that isn't defined here*/
 };
 
 struct room_data
@@ -919,6 +966,7 @@ struct room_data
    char                    * note;
    char                    * terrain;
    char                    * data;
+   unsigned int              hash;
 };
 
 struct exit_data
@@ -1165,6 +1213,7 @@ extern DO_MAP(map_find);
 extern DO_MAP(map_flag);
 extern DO_MAP(map_get);
 extern DO_MAP(map_goto);
+extern DO_MAP(map_index);
 extern DO_MAP(map_info);
 extern DO_MAP(map_insert);
 extern DO_MAP(map_jump);
@@ -1217,7 +1266,10 @@ extern int tunnel_void(struct session *s
 extern int find_coord(struct session *ses, char *arg);
 extern int spatialgrid_find(struct session *ses, int vnum, int x, int y, int z);
 extern void show_vtmap(struct session *ses);
-
+extern int    init_desc_hash(struct session *ses);
+extern size_t make_desc_hash(char * string);
+extern void   remove_desc_hash(struct session *ses, size_t vnum, size_t hash);
+extern void   add_desc_hash(struct session *ses, size_t vnum, size_t hash);
 #endif
 
 #ifndef __MATH_H__
@@ -1547,7 +1599,8 @@ extern DO_COMMAND(do_end);
 extern DO_COMMAND(do_forall);
 extern DO_COMMAND(do_info);
 extern DO_COMMAND(do_nop);
-extern DO_COMMAND(do_run);extern DO_COMMAND(do_send);
+extern DO_COMMAND(do_run);
+extern DO_COMMAND(do_send);
 extern DO_COMMAND(do_showme);
 extern DO_COMMAND(do_snoop);
 extern DO_COMMAND(do_suspend);
I hope this is as useful for others as I have found it. Enjoy Coffee
Back to top
View user's profile Send private message
Scandum
Site Admin


Joined: 03 Dec 2004
Posts: 3770

PostPosted: Sat Jan 12, 2013 11:17 pm    Post subject: Reply with quote

What is the speed difference compared to an unhashed search?

I think a better (general purpose) approach would be to pre-compile the regular expressions, which would increase performance by about 500%, and probably more as the current approach is horribly inefficient.

The proper approach would be to fully compile the search query, next do whatever search needs to be done. I'll put this on my todo list.
Back to top
View user's profile Send private message Send e-mail
Display posts from previous:   
Post new topic   Reply to topic    The TinTin++ message board Forum Index -> Development All times are GMT - 5 Hours
Page 1 of 1

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
Get TinTin++ Mud Client at SourceForge.net. Fast, secure and Free Open Source software downloads Get TinTin++ Mud Client at SourceForge.net. Fast, secure and Free Open Source software downloads
TinTin++ Homepage

Powered by phpBB © 2001, 2002 phpBB Group