Hash and Binary Tree
HASHING & HASH TABLE
HASHING
- Hashing is a technique used for storing and retrieving keys in a rapid manner.
- In hashing, a string of characters are transformed into a usually shorter length value or key that represents the original string.
- Hashing is used to index and retrieve items in a database because it is faster to find item using shorter hashed key than to find it using the original value.
- Hashing can also be defined as a concept of distributing keys in an array called a hash table using a predetermined function called hash function.
HASH TABLE
- Hash table is a table (array) where we store the original string. Index of the table is the hashed key while the value is the original string.
- The size of hash table is usually several orders of magnitude lower than the total number of possible string, so several string might have a same hashed-key.
- For example, there are 267 (8,031,810,176) string of length 7 character consists of lowercase only.
Supposed we want to store 5 string: define, float, exp, char, atan into a hash table with size 26. The hash function we will use is “transform the first character of each string into a number between 0..25”
(a will be 0, b will be 1, c will be 2, …, z will be 25).
This is the hash table for those strings.
This is the hash table for those strings.
h [ ]
|
Value
|
0
|
atan
|
1
| |
2
|
char
|
3
|
define
|
4
|
exp
|
5
|
float
|
6
| |
…
| |
25
|
atan is stored in h[0] because a is 0.
char is stored in h[2] because c is 2.
define is stored in h[3] because d is 3.
and so on..
We only consider the first character of each string.
HASH FUNCTION
There are many ways to hash a string into a key. The following are some of the important methods for constructing hash functions :
- Mid-square
- Division (most common)
- Folding
- Digit Extraction
- Rotating Hash
MID-SQUARE
- Square the string/identifier and then using an appropriate number of bits from the middle of the square to obtain the hash-key.
- If the key is a string, it is converted to a number.
- Steps :
- Square the value of the key. (k^2)
- Extract the middle r bits of the result obtained in Step 1
Function : h(k) = s
k = key
s = the hash key obtained by
selecting r bits from k2
|
example :
- Here the entire key participates in generating the address so that there is a better chance that different addresses are generated even for keys close to each other. For example :
Key
|
Squared Value
|
Middle Part
|
3121
|
9740641
|
406
|
3122
|
9746884
|
468
|
3123
|
9753129
|
531
|
DIVISION
Function: h(z) = z mod M
z
= key
M = the value
using to divide the key, usually using a prime number, the table size or the
size of memory used
|
example :
- Suppose the table is to store strings. A very simple hash function would be to add up ASCII values of all the characters in the string and take modulo of table size, say 97.
“COBB” would be stored at the
location
(
64+3 + 64+15 + 64+2 + 64+2) % 97 = 88
“HIKE” would be stored at the
location
(
64+8 + 64+9 + 64+11 + 64+5) % 97 = 2
“PPQQ” would be stored at the
location
(
64+16 + 64+16 + 64+17 + 64+17) % 97 = 35
“ABCD” would be stored at the
location
(
64+1 + 64+2 + 64+3 + 64+4) % 97 = 76
|
- The Folding method works in two steps :
- Divide the key value into a number of parts where each part has the same number of digits except the last part which may have lesser digits than the other parts.
- Add the individual parts. That is obtain the sum of part1 + part2 + ... + part n. The hash value produced by ignoring the last carry, if any.
- Given a hash table 100 locations, calculate the hash value for key 5678 and 34567Key567834567Parts56 and 7834, 56, and 7Sum13497Hash Value34 (ignore the last carry)97
DIGIT EXTRACTION
- A predefined digit of the given number is considered as the hash address.
- Consider x = 14,568
- If we extract the 1st, 3rd, and 5th digit, we will get a hash code of : 158.
ROTATING HASH
- Use any hash method (such as division or mid-square method)
- After getting the hash code/address from the hash method, do rotation
- Rotation is performed by shifting the digits to get a new hash address.
- Given hash address = 20021
- Rotation result: 12002 (fold the digits)
COLLISION
- What happened if we want to store these strings using the previous hash function (use the first character of each string)
- define, float, exp, char, atan, ceil, acos, floor.
- There are several strings which have the same hash-key, it’s float and floor (hash-key: 5), char and ceil (hash-key: 2).
- It’s called a collision. How can we handle this?
- There are two general ways to handle collisions:
- Linear Probing
- Search the next empty slot and put the string there.
- Linear probing has a bad search complexity if there are many collisions.
- The table “step” on the right describe how many loop/step needed to find the string.
- Supposed we want to find ceil, we compute the hash key and found 2. But ceil is not there so we should iterate until we found ceilh[ ]ValueStep0atan11acos22char13define14exp15float16ceil57floor3…
- void linear_probing(item, h[]) {hash_key = hash(item);i = has_key;while ( strlen(h[i] != 0 ) {if ( strcmp(h[i], item) == 0 ) {// DUPLICATE ENTRY
}i = (i + 1) % TABLE_SIZE;if ( i == hash_key ) {// TABLE IS FULL
}
}h[i] = item;
}
- Chaining
- Put the string in a slot as a chained list (linked list).
- In chaining, we store each string in a chain (linked list).
- So if there is collision, we only need to iterate on that chain.h[ ]Value0atan à acos1NULL2char à ceil3define4exp5float à floor6NULL7NULL…
- void chaining(item, h[]) {hash_key = hash(item);trail = NULL, lead = h[hash_key];while ( lead ) {if ( strcmp(lead->item, item) == 0 ) { // DUPLICATE ENTRY }trail = lead; lead = lead->next;
}p = malloc new node;p->item = item;p->next = NULL;if ( trail == 0 ) h[hash_key] = p; else trail->next = p;}
BINARY TREE
TREE :
- Node at the top is called as root.
- A line connecting the parent to the child is edge.
- Nodes that do not have children are called leaf.
- Nodes that have the same parent are called sibling.
- Degree of node is the total sub tree of the node.
- Height/Depth is the maximum degree of nodes in a tree.
- If there is a line that connects p to q, then p is called the ancestor of q, and q is a descendant of p.
BINARY TREE :
- Binary tree is a rooted tree data structure in which each node has at most two children.
- Those two children usually distinguished as left child and right child.
- Node which doesn’t have any child is called leaf.
- A sample of binary tree of 9 nodes, rooted on node which contains 18.
- Leaves are nodes which contain 9, 12, 10 and 23.
type of binary tree :
- PERFECT binary tree is a binary tree in which every level are at the same depth.
- COMPLETE binary tree is a binary tree in which every level, except possibly the last, is completely filled, and all nodes are as far left as possible. A perfect binary tree is a complete binary tree.
- SKEWED binary tree is a binary tree in which each node has at most one child.
- BALANCED binary tree is a binary tree in which no leaf is much farther away from the root than any other leaf (different balancing scheme allows different definitions of “much farther”).
- Maximum number of nodes on level k of a binary tree is 2k.
- Maximum number of nodes on a binary tree of height h is 2h+1 - 1.
- Height of a tree is the total number of levels (excluding the root) or the highest level h (if the levels of a tree are: 0, 1, 2, …, h)
- Minimum height of a binary tree of n nodes is 2log(n).
- Maximum height of a binary tree of n nodes is n - 1.
- implementation using array
- Index on array represents node number
- Index 0 is Root node
- Index Left Child is 2p + 1, where p is parent index
- Index Right Child is 2p + 2
- Index Parent is (p-1)/2
struct node {
int data;
struct node *left;
struct node *right;
struct node *parent;
};
struct node *root = NULL;
|
Threaded Binary Tree Concept
- In one way threading, a thread will appear either in the right field or the left field of the node.
- A one way threaded tree is also called a single threaded tree.
- If the thread appears in the left field, then the left field will be made to point to the in-order predecessor of the node.
- Such a one way threaded tree is called a left threaded binary tree.
- On the contrary, if the thread appears in the right field, then it will point to the in-order successor of the node. Such a one way threaded tree is called a right threaded binary tree.
Hashing Implementation in Blockchain
Hashing in blockchain refers to the process of having an input item of whatever length reflecting an output item of a fixed length. If we take the example of blockchain use in cryptocurrencies, transactions of varying lengths are run through a given hashing algorithm, and all give an output that is of a fixed length.









Comments
Post a Comment