Hash and Binary Tree

HASHING & HASH TABLE

HASHING

Hashing is a technique used for storing and retrieving keys in a rapid manner.
In hashing, a string of characters are transformed into a usually shorter length value or key that represents the original string.
Hashing is used to index and retrieve items in a database because it is faster to find item using shorter hashed key than to find it using the original value.
Hashing can also be defined as a concept of distributing keys in an array called a hash table using a predetermined function called hash function.

HASH TABLE

Hash table is a table (array) where we store the original string. Index of the table is the hashed key while the value is the original string.
The size of hash table is usually several orders of magnitude lower than the total number of possible string, so several string might have a same hashed-key.
For example, there are 267 (8,031,810,176) string of length 7 character consists of lowercase only.

Please take a look at the following example :

Supposed we want to store 5 string: define, float, exp, char, atan into a hash table with size 26. The hash function we will use is “transform the first character of each string into a number between 0..25”

(a will be 0, b will be 1, c will be 2, …, z will be 25).

This is the hash table for those strings.

h [ ]	Value
0	atan
1
2	char
3	define
4	exp
5	float
6
…
25

atan is stored in h[0] because a is 0.

char is stored in h[2] because c is 2.

define is stored in h[3] because d is 3.

and so on..

We only consider the first character of each string.

HASH FUNCTION

There are many ways to hash a string into a key. The following are some of the important methods for constructing hash functions :

Mid-square
Division (most common)
Folding
Digit Extraction
Rotating Hash

MID-SQUARE

Square the string/identifier and then using an appropriate number of bits from the middle of the square to obtain the hash-key.
If the key is a string, it is converted to a number.
Steps :

Square the value of the key. (k^2)
Extract the middle r bits of the result obtained in Step 1

Function : h(k) = s

k = key

s = the hash key obtained by selecting r bits from k²

example :

Here the entire key participates in generating the address so that there is a better chance that different addresses are generated even for keys close to each other. For example :

Key	Squared Value	Middle Part
3121	9740641	406
3122	9746884	468
3123	9753129	531

DIVISION

Divide the string/identifier by using the modulus operator.

It’s the most simple method of hashing an integer x.

Function: h(z) = z mod M

z = key

M = the value using to divide the key, usually using a prime number, the table size or the size of memory used

example :

Suppose the table is to store strings. A very simple hash function would be to add up ASCII values of all the characters in the string and take modulo of table size, say 97.

“COBB” would be stored at the location

( 64+3 + 64+15 + 64+2 + 64+2) % 97 = 88

“HIKE” would be stored at the location

( 64+8 + 64+9 + 64+11 + 64+5) % 97 = 2

“PPQQ” would be stored at the location

( 64+16 + 64+16 + 64+17 + 64+17) % 97 = 35

“ABCD” would be stored at the location

( 64+1 + 64+2 + 64+3 + 64+4) % 97 = 76

FOLDING

The Folding method works in two steps :

Divide the key value into a number of parts where each part has the same number of digits except the last part which may have lesser digits than the other parts.
Add the individual parts. That is obtain the sum of part1 + part2 + ... + part n. The hash value produced by ignoring the last carry, if any.

example :

Given a hash table 100 locations, calculate the hash value for key 5678 and 34567

Key

5678

34567

Parts

56 and 78

34, 56, and 7

Sum

134

97

Hash Value

34 (ignore the last carry)

97

DIGIT EXTRACTION

A predefined digit of the given number is considered as the hash address.

example :

Consider x = 14,568
If we extract the 1st, 3rd, and 5th digit, we will get a hash code of : 158.

ROTATING HASH

Use any hash method (such as division or mid-square method)
After getting the hash code/address from the hash method, do rotation
Rotation is performed by shifting the digits to get a new hash address.

example :

Given hash address = 20021
Rotation result: 12002 (fold the digits)

COLLISION

What happened if we want to store these strings using the previous hash function (use the first character of each string)
define, float, exp, char, atan, ceil, acos, floor.
There are several strings which have the same hash-key, it’s float and floor (hash-key: 5), char and ceil (hash-key: 2).
It’s called a collision. How can we handle this?
There are two general ways to handle collisions:

Linear Probing

Search the next empty slot and put the string there.
Linear probing has a bad search complexity if there are many collisions.
The table “step” on the right describe how many loop/step needed to find the string.

Supposed we want to find ceil, we compute the hash key and found 2. But ceil is not there so we should iterate until we found ceil

h[ ]	Value	Step
0	atan	1
1	acos	2
2	char	1
3	define	1
4	exp	1
5	float	1
6	ceil	5
7	floor	3
…

void linear_probing(item, h[]) {

              hash_key = hash(item);

              i = has_key;

              while ( strlen(h[i] != 0 ) {

                             if ( strcmp(h[i], item) == 0 ) {

                                           // DUPLICATE ENTRY
                             }

                             i = (i + 1) % TABLE_SIZE;

                             if ( i == hash_key ) {

                                           // TABLE IS FULL
                             }
              }

              h[i] = item;
}

Chaining

Put the string in a slot as a chained list (linked list).
In chaining, we store each string in a chain (linked list).

So if there is collision, we only need to iterate on that chain.

h[ ]	Value
0	atan à acos
1	NULL
2	char à ceil
3	define
4	exp
5	float à floor
6	NULL
7	NULL
…

void chaining(item, h[]) {

              hash_key = hash(item);

              trail = NULL, lead = h[hash_key];

              while ( lead ) {

                             if ( strcmp(lead->item, item) == 0 ) { // DUPLICATE ENTRY }

                             trail = lead; lead = lead->next;
              }

              p = malloc new node;

              p->item = item;

              p->next = NULL;

              if ( trail == 0 ) h[hash_key] = p; else trail->next = p;

}

BINARY TREE

TREE :

Node at the top is called as root.
A line connecting the parent to the child is edge.
Nodes that do not have children are called leaf.
Nodes that have the same parent are called sibling.
Degree of node is the total sub tree of the node.
Height/Depth is the maximum degree of nodes in a tree.
If there is a line that connects p to q, then p is called the ancestor of q, and q is a descendant of p.

BINARY TREE :

Binary tree is a rooted tree data structure in which each node has at most two children.
Those two children usually distinguished as left child and right child.
Node which doesn’t have any child is called leaf.

example :

A sample of binary tree of 9 nodes, rooted on node which contains 18.
Leaves are nodes which contain 9, 12, 10 and 23.

type of binary tree :

PERFECT binary tree is a binary tree in which every level are at the same depth.
COMPLETE binary tree is a binary tree in which every level, except possibly the last, is completely filled, and all nodes are as far left as possible. A perfect binary tree is a complete binary tree.
SKEWED binary tree is a binary tree in which each node has at most one child.
BALANCED binary tree is a binary tree in which no leaf is much farther away from the root than any other leaf (different balancing scheme allows different definitions of “much farther”).

property of binary tree :

Maximum number of nodes on level k of a binary tree is 2k.
Maximum number of nodes on a binary tree of height h is 2h+1 - 1.
Height of a tree is the total number of levels (excluding the root) or the highest level h (if the levels of a tree are: 0, 1, 2, …, h)
The following figure below has the height of: 3
Minimum height of a binary tree of n nodes is 2log(n).
Maximum height of a binary tree of n nodes is n - 1.

representation of binary tree :

implementation using array

Index on array represents node number
Index 0 is Root node
Index Left Child is 2p + 1, where p is parent index
Index Right Child is 2p + 2
Index Parent is (p-1)/2

implementation using linked list

struct node {

              int data;

              struct node *left;

              struct node *right;

              struct node *parent;

};

struct node *root = NULL;

Threaded Binary Tree Concept

In one way threading, a thread will appear either in the right field or the left field of the node.

A one way threaded tree is also called a single threaded tree.

If the thread appears in the left field, then the left field will be made to point to the in-order predecessor of the node.

Such a one way threaded tree is called a left threaded binary tree.

On the contrary, if the thread appears in the right field, then it will point to the in-order successor of the node. Such a one way threaded tree is called a right threaded binary tree.

Hashing Implementation in Blockchain

Hashing in blockchain refers to the process of having an input item of whatever length reflecting an output item of a fixed length. If we take the example of blockchain use in cryptocurrencies, transactions of varying lengths are run through a given hashing algorithm, and all give an output that is of a fixed length.

Search This Blog

Vincent's Blog

Hash and Binary Tree

Comments

Post a Comment

Popular posts from this blog

AVL Tree and Binary Tree

Key	5678	34567
Parts	56 and 78	34, 56, and 7
Sum	134	97
Hash Value	34 (ignore the last carry)	97