TinTin++ Mud Client The TinTin++ message board

 
 FAQFAQ   SearchSearch   MemberlistMemberlist   UsergroupsUsergroups   RegisterRegister 
 ProfileProfile   Log in to check your private messagesLog in to check your private messages   Log inLog in 
TinTin++ Mud Client

Hashing function in tt++ proper

 
Post new topic   Reply to topic    The TinTin++ message board Forum Index -> General Discussion
View previous topic :: View next topic  
Author Message
perlsaiyan



Joined: 21 Sep 2014
Posts: 25

PostPosted: Mon Jan 01, 2018 11:14 am    Post subject: Hashing function in tt++ proper Reply with quote

Doing this in C would be a huge performance increase over doing it in tintin scripting:

http://tintin.sourceforge.net/board/viewtopic.php?t=2272&view=next&sid=6396bccf48e987742958320ce70a1105

The idea is to be able to pass a string and generate an (ideally) n-bit hash, but anything sufficiently large (24-bit or better) would work for my purposes Cool
Back to top
View user's profile Send private message
Scandum
Site Admin


Joined: 03 Dec 2004
Posts: 3796

PostPosted: Mon Jan 01, 2018 3:22 pm    Post subject: Reply with quote

What exactly would you need a hashing function for?

TinTin++ tables have extremely fast lookup rates, and if you use #write so the table is properly sorted the #read loading time is extremely fast as well.


By hashing you would create an unnecessary additional step because TinTin++ does not provide simple arrays. #list under the hood works with a table.
Back to top
View user's profile Send private message Send e-mail
Scandum
Site Admin


Joined: 03 Dec 2004
Posts: 3796

PostPosted: Mon Jan 01, 2018 4:56 pm    Post subject: Reply with quote

Take the forum post you linked to for example, creating a table to handle this would be quick and easy.

It'd take a little bit more memory, but it would be faster than any other scripted solution, and I wouldn't be surprised if it'd be competitive with a hash table directly implemented in tt++.
Back to top
View user's profile Send private message Send e-mail
lugnut



Joined: 01 Jan 2018
Posts: 3

PostPosted: Tue Jan 02, 2018 12:41 am    Post subject: Reply with quote

I think the idea is not to create a hash table but to generate a key for a table of a specified length from an arbitrary string.

I can think of a use case for that as well.

For instance if your mud does not expose room ids or numbers and you want to trigger something happening in a specific room without trying to come up with an action for every room you could create a key in a table for the room and then show the text in your table based on that key.

You could put the entire room description as the key but if you have a mud with 50000 + rooms that becomes a problem. Instead if you hash the room description to a unique string and use that as the key the tintin table functions become useful.

The problem is generating a repeable unique key across a large dataset quickly based on a large random input.
Back to top
View user's profile Send private message
Scandum
Site Admin


Joined: 03 Dec 2004
Posts: 3796

PostPosted: Tue Jan 02, 2018 9:14 am    Post subject: Reply with quote

If the average room description is 4 lines that's about 300 bytes, times 50000 makes for 15 MB.

A 32 bit hash would be about 0.5 MB.

Tintin nodes have an overhead of 32 bytes, so that comes out at 2 MB for the hash table and 16.5 MB for the room description table.

If collision detection is added the complexity and size of the hash table will further increase, to about 4 MB.

The hash lookup is going to be slower, especially if written in tintin, though I'm not sure by how much.


I'm however open to adding a hashing function, I've got a make-shift one, but I don't know if it's very good.

Code:

unsigned short generate_hash_key(char *str)
{
        unsigned int h;

        for (h = str_hash_len = 0 ; *str != 0 ; str++, str_hash_len++)
        {
                h -= (4 - (h >> 7)) * *str;
        }

        h += str_hash_len;

        return h % MAX_STR_HASH;
}


That's the function I use for the tintin scrollback buffer. I'm not sure how it compares against other hashing routines.
Back to top
View user's profile Send private message Send e-mail
lugnut



Joined: 01 Jan 2018
Posts: 3

PostPosted: Sat Jan 06, 2018 9:02 am    Post subject: Reply with quote

Here is a diff for adding a hashing function using Jenkins one at a time hash. This should generate 32 bit hashes of any string passed in hexidecimal format. This function is pretty fast. There are faster out there but they become considerably more complex. It handles collisions very well. [url]https://en.wikipedia.org/wiki/Jenkins_hash_function[/url]

I patched against the latest source code. If would would just prefer a tarball of the files let me know.


diff --new-file tt-2.01.4_orig/tt/src/hash.c tt-2.01.4_wh/tt/src/hash.c
0a1,70
> /******************************************************************************
> * TinTin++ *
> * Copyright (C) 2004 (See CREDITS file) *
> * *
> * This program is protected under the GNU GPL (See COPYING) *
> * *
> * This program is free software; you can redistribute it and/or modify *
> * it under the terms of the GNU General Public License as published by *
> * the Free Software Foundation; either version 2 of the License, or *
> * (at your option) any later version. *
> * *
> * This program is distributed in the hope that it will be useful, *
> * but WITHOUT ANY WARRANTY; without even the implied warranty of *
> * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the *
> * GNU General Public License for more details. *
> * *
> * You should have received a copy of the GNU General Public License *
> * along with this program; if not, write to the Free Software *
> * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA *
> ******************************************************************************/
>
> /******************************************************************************
> * (T)he K(I)cki(N) (T)ickin D(I)kumud Clie(N)t *
> * *
> * coded by Benjamin Porter 2018 *
> ******************************************************************************/
>
> #include "tintin.h"
>
> unsigned int oneAtATimeHash(char *key)
> {
> char *p = key;
> unsigned int h = 0;
> size_t i, len;
>
> len = strlen(key);
> for (i = 0; i < len; i++)
> {
> h += p[i];
> h += (h << 10);
> h ^= (h >> 6);
> }
>
> h += (h << 3);
> h ^= (h >> 11);
> h += (h << 15);
>
> return h;
> }
>
> DO_COMMAND(do_hash)
> {
> char arg1[BUFFER_SIZE], arg2[BUFFER_SIZE], result[9];
> unsigned int hashValue;
>
> arg = get_arg_in_braces(ses, arg, arg1, GET_NST);
> arg = sub_arg_in_braces(ses, arg, arg2, GET_ONE, SUB_VAR|SUB_FUN);
>
> if (*arg1 == 0 || *arg2 == 0)
> {
> show_error(ses, LIST_VARIABLE, "#SYNTAX: #HASH {VARIABLE} {string}");
>
> return ses;
> }
>
> hashValue = oneAtATimeHash(arg2);
> sprintf(result, "%08x", hashValue);
> set_nest_node(ses->list[LIST_VARIABLE], arg1, result);
> return ses;
> }
diff --new-file tt-2.01.4_orig/tt/src/help.c tt-2.01.4_wh/tt/src/help.c
620a621,630
> "HASH",
> "<178>Command<078>: #hash <178>{<078>variable name<178>}<078> <178>{<078>string<178>}<078>\n"
> "\n"
> " The hash command computes a 32 bit hash in hexidecimal of the passed in string\n"
> " and stores it in the passed variable.\n"
> "\n"
> "<178>Example<078>: #hash {temp} {The quick brown fox jumped over the lazy dog.}\n"
> " sets the variable temp to the value 37b1a2e4\n"
> },
> {
diff --new-file tt-2.01.4_orig/tt/src/Makefile.in tt-2.01.4_wh/tt/src/Makefile.in
67c67
< data.o nest.o advertise.o ssl.o port.o
---
> data.o nest.o advertise.o ssl.o port.o hash.o
diff --new-file tt-2.01.4_orig/tt/src/tables.c tt-2.01.4_wh/tt/src/tables.c
62a63
> { "hash", do_hash, TOKEN_TYPE_COMMAND },
diff --new-file tt-2.01.4_orig/tt/src/tintin.h tt-2.01.4_wh/tt/src/tintin.h
1022a1023,1029
> #ifndef __HASH_H__
> #define __HASH_H__
>
> extern DO_COMMAND(do_hash);
>
> #endif
>


[/url]
Back to top
View user's profile Send private message
Scandum
Site Admin


Joined: 03 Dec 2004
Posts: 3796

PostPosted: Sat Jan 06, 2018 9:50 am    Post subject: Reply with quote

Thanks for the code, already updated the beta code 4 days ago though. The hashing was as good as Jenkin's last time I checked, and a little bit faster.

I reused the hash code from buffer.c and added a %H option to #format. It creates a 32 bit unsigned hash.

http://tintin.sf.net/download/tintin-beta.tar.gz

Example:

#format hash %H bliblablo
Back to top
View user's profile Send private message Send e-mail
Display posts from previous:   
Post new topic   Reply to topic    The TinTin++ message board Forum Index -> General Discussion All times are GMT - 5 Hours
Page 1 of 1

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
Get TinTin++ Mud Client at SourceForge.net. Fast, secure and Free Open Source software downloads Get TinTin++ Mud Client at SourceForge.net. Fast, secure and Free Open Source software downloads
TinTin++ Homepage

Powered by phpBB © 2001, 2002 phpBB Group