Associative array

From Infogalactic: the planetary knowledge core
Jump to: navigation, search

<templatestyles src="Module:Hatnote/styles.css"></templatestyles>

<templatestyles src="Module:Hatnote/styles.css"></templatestyles>

In computer science, an associative array, map, symbol table, or dictionary is an abstract data type composed of a collection of (key, value) pairs, such that each possible key appears just once in the collection.

Operations associated with this data type allow:[1][2]

  • the addition of pairs to the collection
  • the removal of pairs from the collection
  • the modification of the values of existing pairs
  • the lookup of the value associated with a particular key

The dictionary problem is a classic computer science problem: the task of designing a data structure that maintains a set of data during 'search' 'delete' and 'insert' operations.[3] A standard solution to the dictionary problem is a hash table; in some cases it is also possible to solve the problem using directly addressed arrays, binary search trees, or other more specialized structures.[1][2][4]

Many programming languages include associative arrays as primitive data types, and they are available in software libraries for many others. Content-addressable memory is a form of direct hardware-level support for associative arrays.

Associative arrays have many applications including such fundamental programming patterns as memoization and the decorator pattern.[5]

Operations

In an associative array, the association between a key and a value is often known as a "binding", and the same word "binding" may also be used to refer to the process of creating a new association.

The operations that are usually defined for an associative array are:[1][2]

  • Add or insert: add a new (key, value) pair to the collection, binding the new key to its new value. The arguments to this operation are the key and the value.
  • Reassign: replace the value in one of the (key, value) pairs that are already in the collection, binding an old key to a new value. As with an insertion, the arguments to this operation are the key and the value.
  • Remove or delete: remove a (key, value) pair from the collection, unbinding a given key from its value. The argument to this operation is the key.
  • Lookup: find the value (if any) that is bound to a given key. The argument to this operation is the key, and the value is returned from the operation. If no value is found, some associative array implementations raise an exception.

In addition, associative arrays may also include other operations such as determining the number of bindings or constructing an iterator to loop over all the bindings. Usually, for such an operation, the order in which the bindings are returned may be arbitrary.

A multimap generalizes an associative array by allowing multiple values to be associated with a single key.[6] A bidirectional map is a related abstract data type in which the bindings operate in both directions: each value must be associated with a unique key, and a second lookup operation takes a value as argument and looks up the key associated with that value.

Example

Suppose that the set of loans made by a library is to be represented in a data structure. Each book in a library may be checked out only by a single library patron at a time. However, a single patron may be able to check out multiple books. Therefore, the information about which books are checked out to which patrons may be represented by an associative array, in which the books are the keys and the patrons are the values. For instance (using notation from Python, or JSON (JavaScript Object Notation), in which a binding is represented by placing a colon between the key and the value), the current checkouts may be represented by an associative array

{
    "Great Expectations": "John",
    "Pride and Prejudice": "Alice",
    "Wuthering Heights": "Alice"
}

A lookup operation with the key "Great Expectations" in this array would return the name of the person who checked out that book, John. If John returns his book, that would cause a deletion operation in the associative array, and if Pat checks out another book, that would cause an insertion operation, leading to a different state:

{
    "Pride and Prejudice": "Alice",
    "The Brothers Karamazov": "Pat",
    "Wuthering Heights": "Alice"
}

In this new state, the same lookup as before, with the key "Great Expectations", would raise an exception, because this key is no longer present in the array.

Implementation

For dictionaries with very small numbers of bindings, it may make sense to implement the dictionary using an association list, a linked list of bindings. With this implementation, the time to perform the basic dictionary operations is linear in the total number of bindings; however, it is easy to implement and the constant factors in its running time are small.[1][7]

Another very simple implementation technique, usable when the keys are restricted to a narrow range of integers, is direct addressing into an array: the value for a given key k is stored at the array cell A[k], or if there is no binding for k then the cell stores a special sentinel value that indicates the absence of a binding. As well as being simple, this technique is fast: each dictionary operation takes constant time. However, the space requirement for this structure is the size of the entire keyspace, making it impractical unless the keyspace is small.[4]

The most frequently used general purpose implementation of an associative array is with a hash table: an array of bindings, together with a hash function that maps each possible key into an array index. The basic idea of a hash table is that the binding for a given key is stored at the position given by applying the hash function to that key, and that lookup operations are performed by looking at that cell of the array and using the binding found there. However, hash table based dictionaries must be prepared to handle collisions that occur when two keys are mapped by the hash function to the same index, and many different collision resolution strategies have been developed for dealing with this situation, often based either on open addressing (looking at a sequence of hash table indices instead of a single index, until finding either the given key or an empty cell) or on hash chaining (storing a small association list instead of a single binding in each hash table cell).[1][2][4][8]

Another common approach is to implement an associative array with a (self-balancing) red-black tree.[9]

Dictionaries may also be stored in binary search trees or in data structures specialized to a particular type of keys such as radix trees, tries, Judy arrays, or van Emde Boas trees, but these implementation methods are less efficient than hash tables[citation needed] as well as placing greater restrictions on the types of data that they can handle. The advantages of these alternative structures come from their ability to handle operations beyond the basic ones of an associative array, such as finding the binding whose key is the closest to a queried key, when the query is not itself present in the set of bindings.

Language support

<templatestyles src="Module:Hatnote/styles.css"></templatestyles>

Associative arrays can be implemented in any programming language as a package and many language systems provide them as part of their standard library. In some languages, they are not only built into the standard system, but have special syntax, often using array-like subscripting.

Built-in syntactic support for associative arrays was introduced by SNOBOL4, under the name "table". MUMPS made multi-dimensional associative arrays, optionally persistent, its key data structure. SETL supported them as one possible implementation of sets and maps. Most modern scripting languages, starting with AWK and including Rexx, Perl, Tcl, JavaScript, Python, Ruby, and Lua, support associative arrays as a primary container type. In many more languages, they are available as library functions without special syntax.

In Smalltalk, Objective-C, .NET,[10] Python, REALbasic, and Swift they are called dictionaries; in Perl, Ruby and Seed7 they are called hashes; in C++, Java, Go, Clojure, Scala, OCaml, Haskell they are called maps (see map (C++), unordered_map (C++), and Map); in Common Lisp and Windows PowerShell, they are called hash tables (since both typically use this implementation). In PHP, all arrays can be associative, except that the keys are limited to integers and strings. In JavaScript (see also JSON), all objects behave as associative arrays. In Lua, they are called tables, and are used as the primitive building block for all data structures. In Visual FoxPro, they are called Collections. The D language also has support for associative arrays [11]

Permanent storage

<templatestyles src="Module:Hatnote/styles.css"></templatestyles>

Most programs using associative arrays will at some point need to store that data in a more permanent form, like in a computer file. A common solution to this problem is a generalized concept known as archiving or serialization, which produces a text or binary representation of the original objects that can be written directly to a file. This is most commonly implemented in the underlying object model, like .Net or Cocoa, which include standard functions that convert the internal data into text form. The program can create a complete text representation of any group of objects by calling these methods, which are almost always already implemented in the base associative array class.[12]

For programs that use very large data sets, this sort of individual file storage is not appropriate, and a database management system (DB) is required. Some DB systems natively store associative arrays by serializing the data and then storing that serialized data and the key. Individual arrays can then be loaded or saved from the database using the key to refer to them. These key-value stores have been used for many years and have a history as long as that as the more common relational database (RDBs), but a lack of standardization, among other reasons, limited their use to certain niche roles. RDBs were used for these roles in most cases, although saving objects to a RDB can be complicated, a problem known as object-relational impedance mismatch.

After about 2010, the need for high performance databases suitable for cloud computing and more closely matching the internal structure of the programs using them led to a renaissance in the key-value store market. These systems can store and retrieve associative arrays in a native fashion, which can greatly improve performance in common web-related workflows.

See also

References

  1. 1.0 1.1 1.2 1.3 1.4 Lua error in package.lua at line 80: module 'strict' not found.
  2. 2.0 2.1 2.2 2.3 Lua error in package.lua at line 80: module 'strict' not found.
  3. Lua error in package.lua at line 80: module 'strict' not found.
  4. 4.0 4.1 4.2 Lua error in package.lua at line 80: module 'strict' not found..
  5. Goodrich & Tamassia (2006), pp. 597–599.
  6. Goodrich & Tamassia (2006), pp. 389–397.
  7. Lua error in package.lua at line 80: module 'strict' not found.
  8. Lua error in package.lua at line 80: module 'strict' not found..
  9. Joel Adams and Larry Nyhoff. "Trees in STL". quote: "The Standard Template library ... some of its containers -- the set<T>, map<T1, T2>, multiset<T>, and multimap<T1, T2> templates -- are generally built using a special kind of self-balancing binary search tree called a red-black tree."
  10. Lua error in package.lua at line 80: module 'strict' not found.
  11. Lua error in package.lua at line 80: module 'strict' not found.
  12. "Archives and Serializations Programming Guide", Apple Inc., 2012

External links