Hash Table – Definition and Explanations
Introduction
In computing, a hash table is a data structure that associates keys with elements, i.e. an implementation of the abstract data type known as a symbol table.
Each element of the table is accessed through its key. The table is an unordered array (an array being indexed by integers). An element is accessed by converting its key to a hash value (or simply hash) via a hash function. The hash is a number that locates the element in the array; typically, the hash is the index of the element in the array. A cell of the array is called a slot.
Like ordinary arrays, hash tables provide O(1) access on average, regardless of the number of elements in the table. In the worst case, however, access time can be O(n). Compared to other associative arrays, hash tables are most useful when the number of entries is very large.
The position of the elements in the hash table is pseudo-random. This structure is therefore not suitable for accessing data in sorted order. Other data structures such as balanced trees are generally slower (O(log n) per access) and more complex to implement, but they keep their elements in order.
Deriving a hash from the key can cause a problem known as a collision: for two different keys, the hash function can return the same hash value and therefore point to the same position in the array. The hash function should be chosen carefully to minimize the risk of collisions.
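To make the collision problem concrete, here is a minimal Python sketch. It uses a deliberately weak, hypothetical hash function (`toy_hash`, the sum of the character codes) so that two different keys are guaranteed to collide, and it resolves the collision by chaining, i.e. keeping a short list of entries per slot. Chaining is only one possible resolution strategy, picked here for illustration; the names `toy_hash` and `ChainedTable` and the table size of 8 are assumptions, not part of the text above.

```python
def toy_hash(key: str) -> int:
    """Deliberately weak hash: sum of character codes (anagrams collide)."""
    return sum(ord(ch) for ch in key)


class ChainedTable:
    """Illustrative hash table that resolves collisions by chaining."""

    def __init__(self, num_slots: int = 8) -> None:
        # Each slot holds a list of (key, value) pairs.
        self.slots = [[] for _ in range(num_slots)]

    def _index(self, key: str) -> int:
        # Reduce the hash value to a valid position in the array.
        return toy_hash(key) % len(self.slots)

    def put(self, key: str, value) -> None:
        chain = self.slots[self._index(key)]
        for i, (existing_key, _) in enumerate(chain):
            if existing_key == key:
                chain[i] = (key, value)  # overwrite an existing key
                return
        chain.append((key, value))

    def get(self, key: str):
        # Only the chain of one slot is searched, not the whole table.
        for existing_key, value in self.slots[self._index(key)]:
            if existing_key == key:
                return value
        raise KeyError(key)


table = ChainedTable()
table.put("ab", 1)
table.put("ba", 2)  # different key, same hash value, same slot
```

Both keys remain retrievable, but every lookup in that slot now has to walk a two-element chain, which is how collisions erode the O(1) average.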
Choosing a good hash function
A good hash function is critical to performance. Because collisions are usually resolved by some form of linear search, a bad hash function, i.e. one that generates too many collisions, will severely slow down lookups. On the other hand, the hash function itself should not be too expensive to compute.
The hash calculation is sometimes done in two steps:
- An application-specific hash function is used to produce an integer from the original data.
- This integer is converted to a possible table position, usually by computing the remainder modulo the table size.
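The two steps above can be sketched in Python as follows. The polynomial hash with multiplier 31 stands in for the "application-specific" function and is only an example; the function names and the table size of 13 are assumptions made for this sketch.

```python
def string_to_int(data: str) -> int:
    """Step 1: application-specific hash that turns the data into an integer.

    A simple polynomial hash with multiplier 31, used here only as an example.
    """
    h = 0
    for ch in data:
        h = h * 31 + ord(ch)
    return h


def to_slot(hash_value: int, table_size: int) -> int:
    """Step 2: reduce the integer to a valid table position."""
    return hash_value % table_size


TABLE_SIZE = 13  # a prime, chosen arbitrarily for this sketch
slot = to_slot(string_to_int("ab"), TABLE_SIZE)  # 3105 % 13 == 11
```

Keeping the two steps separate means the table can be resized without touching the application-specific part: only the modulus in step 2 changes.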
Hash table sizes are often prime numbers, to avoid common-divisor problems that can cause a large number of collisions. An alternative is to use a power of two, which allows the modulo operation to be performed with simple bit operations such as shifts and is therefore faster.
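A small Python experiment illustrates both points. The key set (multiples of 4) is a hypothetical worst case chosen to share a factor with a table of size 8, and the power-of-two reduction is written here as a bit mask, one common way of realizing it; the helper names are assumptions of this sketch.

```python
from collections import Counter


def slot_usage(keys, table_size):
    """Count how many keys land in each slot under `key % table_size`."""
    return Counter(key % table_size for key in keys)


keys = range(0, 400, 4)  # 100 hypothetical keys, all multiples of 4

used_with_8 = len(slot_usage(keys, 8))  # size shares the factor 4 with the keys
used_with_7 = len(slot_usage(keys, 7))  # prime size, no common divisor


def mod_power_of_two(hash_value: int, table_size: int) -> int:
    """Modulo by a power of two, done with a single bit mask."""
    if table_size <= 0 or table_size & (table_size - 1) != 0:
        raise ValueError("table_size must be a power of two")
    return hash_value & (table_size - 1)
```

With size 8 the 100 keys crowd into only 2 slots, while the prime size 7 spreads them over all 7. The mask gives the same result as `%` for non-negative integers, so the speed gain of a power-of-two size comes at the price of depending more heavily on the quality of the hash function.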
A frequent and surprising problem is clustering: hash values end up side by side in the table, forming clusters. This heavily penalizes collision resolution methods based on open addressing. Hash functions that distribute hashes uniformly are therefore best, but they are difficult to find in practice.
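The cost of a cluster under open addressing can be seen in a short Python sketch. It assumes linear probing (trying the next slot until a free one is found), integer keys hashed by `key % size`, and a table of 11 slots; all of these are illustrative choices rather than details given above.

```python
def insert_linear(table: list, key: int) -> int:
    """Insert `key` with linear probing; return the number of slots probed."""
    size = len(table)
    index = key % size
    for probes in range(1, size + 1):
        if table[index] is None:
            table[index] = key
            return probes
        index = (index + 1) % size  # slot taken: try the next one
    raise RuntimeError("table is full")


table = [None] * 11
# Keys 0, 1, 2 and 3 hash to adjacent slots and form a cluster in slots 0..3.
probe_counts = [insert_linear(table, key) for key in (0, 1, 2, 3)]
# Key 11 hashes to slot 0 and has to walk past the whole cluster.
probes_for_11 = insert_linear(table, 11)
```

Each of the first four keys is placed with a single probe, but key 11 needs five, and its insertion makes the cluster one slot longer, so later keys landing anywhere in that run pay even more.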
In environments where an adversary may try to degrade lookup performance by supplying a large number of entries that collide, one solution is universal hashing, which selects the hash function at random at the start of execution. The adversary then has no way of knowing which inputs will cause collisions.
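As an illustration, the following Python sketch draws a hash function at random from one classic universal family, h(x) = ((a·x + b) mod p) mod m. The text above does not say which family is meant, so this choice is an assumption, as are the prime `PRIME`, the helper name `make_universal_hash`, and the restriction to non-negative integer keys smaller than `PRIME`.

```python
import random

PRIME = 2_147_483_647  # 2**31 - 1, a prime assumed larger than any key


def make_universal_hash(table_size: int, rng: random.Random):
    """Pick h(x) = ((a*x + b) mod p) mod m at random from the family."""
    a = rng.randrange(1, PRIME)  # a in [1, p-1]
    b = rng.randrange(0, PRIME)  # b in [0, p-1]

    def h(key: int) -> int:
        return ((a * key + b) % PRIME) % table_size

    return h


# Chosen once at program start; SystemRandom draws from the OS entropy
# source, so the parameters a and b cannot be predicted from a seed.
hash_fn = make_universal_hash(table_size=64, rng=random.SystemRandom())
```

Because `a` and `b` change on every run, a set of keys crafted to collide under one choice of parameters is unlikely to collide under the next.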