Talk:Array (data structure)
This is the talk page for discussing improvements to the Array (data structure) article. This is not a forum for general discussion of the article's subject. |
Article policies
|
Find sources: Google (books · news · scholar · free images · WP refs) · FENS · JSTOR · TWL |
This level-5 vital article is rated C-class on Wikipedia's content assessment scale. It is of interest to the following WikiProjects: | ||||||||||||||||||
|
Text and/or other creative content from this version of Index (computer science) was copied or moved into Array data structure with this edit on 17:50, 10 January 2012. The former page's history now serves to provide attribution for that content in the latter page, and it must not be deleted as long as the latter page exists. |
array
[edit]Is there benefit of arrays?
- Good point; perhaps the headings could be refactored to include the word "benefits", for better style or compatability with other articles. --Mike Van Emmerik 21:42, 5 September 2005 (UTC)
--- How is an array also known as a list? Surely this is just wrong. An array is indexable; a list is not. --Mike Van Emmerik 02:33, 5 October 2005 (UTC)
'of a specific data type'
[edit]"In computer programming, a group of homogeneous elements of a specific data type is known as an array, one of the simplest data structures" is false for Ruby; "Ruby's arrays can accomodate diverse object types". -Ruby User's Guide
Is the definition of arrays inaccurate, or are Ruby arrays not strictly arrays? --Wootery 20:44, 3 October 2006 (UTC)
- "Array" is one of those slippery terms. The usual definition is from C terminology, where, of course, arrays are always homogenous. As far as I know, 'array' originaly was more or less synonymous with 'list' in the computer science literature to mean a collection of things ordered by index. I'm not sure where computer science comes down on this issue today. --Apantomimehorse 04:59, 10 October 2006 (UTC)
- Ruby's arrays can be seen as arrays of variant types. The homogenity of type is not quite as essential as the uniformity of element size, which is necessary for ordinary address computation. It appears someone has added this in addition to the existing wording, which says "usually" and is now redundant. Argh. Deco 09:39, 10 October 2006 (UTC)
Zero based never faster in practice?
[edit]From the article: "Supporters of zero-based indexing often criticise one-based and n-based arrays for being slower. While this is true, a one-based or n-based array access can be easily optimized, for example either with a very simple Common subexpression elimination or the use of a well defined Dope vector - to name only two options available. So in real-life applications one-based and n-based arrays are just as fast as zero-based arrays."
- Surely this is only true when the base of the address is known, e.g. a global or a stack local array. Arrays passed as parameters, or references to arrays, would surely not be as fast to use if they are 1-based or n-based. --Mike Van Emmerik 09:04, 12 December 2005 (UTC)
- I agree with the anon who made this edit. There may be some instances where they are slower, but almost all of the time, you can extract the computation of the base address outside the inner loop of the computation with simple CSE and loop invariant hoisting (the only impediment being if this involves crossing non-inlined function call boundaries). Also, when iterating over arrays in order, which is the norm, inductive variable analysis eliminates indexing altogether. Moreover, if the array lower bound is fixed at compile-time, any additional calculations can often be simplified away entirely, even in an interprocedural scenario.
- Even in the rare case that an additional operation per index is necessary, the impact this would have is miniscule beside the time needed for the memory access. Modern compiler technology and CPU-memory performance ratios have simply made any claim of the efficiency of zero-based arrays obsolete. Deco 09:12, 12 December 2005 (UTC)
Managed arrays
[edit]I'm looking for a place to rant and rave (as objectively as possible of course :-)) about the C++ vector<bool> specialisation standard-that-is-not-really-a-standard. Where would I go about writing such an article? Or do I just write a vector<bool> article and let someone else move it / splice it later? --The Extremist 09:14, 6 January 2006 (UTC)
- How about bit array? Deco 06:53, 14 June 2006 (UTC)
Differentiation
[edit]QUESTION: A company is seeking to determine the optimum length of the guarantee period on its product, a device whose reliability and propensity to failure can be predicted with a high degree of accuracy. It is known that the failure rate of the device can be adequately approximated in terms of a constant function of time i.e. about 10% of the devices sold will fail each year until all have failed at the end of 10 years. The company makes a profit of $2.80 on each unit sold. If it offers no guarantee, it can expect to sell 100 units annually. The length of the guarantee is of some importance to customers however, and some empirical studies have suggested that its effect on sales may be shown by the following equation;
Q=100+5t
Where Q=Quantity sold (in units) And T=Length of guarantee (in years)
The company sustained a cost of $1.00 for each unit which it had to repair under terms of the guarantee. In view of this and in view of the fact that 10% of Q would be expected to fail each year, it was believed that the cost of making guaranteed repairs under a guarantee period of length t would be;
C=$1.00X0.1QT =0.1QT
Profit in this analysis can be regarded merely as the gross profit from sales less the cost of making guaranteed repairs. It will be noted that certain simplifying assumptions have been made in the foregoing exposition. The assumed constant failure rate has already been mentioned. Also, it should be noted that constant sales volume has been assumed, and the effect on sales of all variables other than that of the guarantee period has been disregarded.
RFEQUIRED: Determine the optimum length of the guarantee period in order to maximize profits.
- Congratulations. You win the most bizarre question asked on a computer science talk page award. Deco 06:52, 14 June 2006 (UTC)
C & C++ have Dimension = 1?
[edit]In the "Array system cross-reference list", why are C and C++ listed as having Dimension = 1? What do they lack compared to languages listed under Dimension = n? Certainly both multidimensional arrays (arrays of arrays) and arrays of references (Iliffe vectors) are used in C and C++. --Spoon! 03:25, 25 July 2006 (UTC)
- Two dimensional slices like
Slice (5 .. 20, 5 .. 20)
- of course since C/C++ don't support array slices to begin with you won't notice the difference. --Krischik T 11:53, 30 January 2008 (UTC)
- The article goes far enough in allowing that c/c++ supports arrays. Essentially, "c arrays" are just pointers (with weak, compile-time type support) to chunks of memory. —Preceding unsigned comment added by 188.126.207.212 (talk) 12:46, 16 March 2011 (UTC)
"1-Dimensional or Linear Array" makes no sense
[edit]- 1-Dimensional or Linear Array
- An array that consists of elements which can be sorted either horizontally or vertically is known as a linear array or 1-dimensional array. The elements are sorted in only one direction (horizontal or vertical), that means the variable can hold only a single value.
Does this make any sense to anyone? What does "sorted" have to do with 1-dimensional arrays? (maybe it's "stored"?) And what does "horizontal" or "vertical" have to do with 1-dimensional arrays? --Spoon! 11:30, 25 August 2006 (UTC)
- That segment is now deleted. It basically repeated what the introduction said about arrays, using confusing language.
137.112.141.152 07:34, 19 October 2006 (UTC)
PHP Arrays?
[edit]Howcome PHP isn't on the Array system cross-reference list? -Dan Leveille 22:09, 17 November 2006 (UTC)
- Perhaps because PHP arrays aren't actually arrays, but a bizarre array-hashtable hybrid data structure that can be used to emulate traditional arrays. JulesH 19:45, 17 February 2007 (UTC)
Programming language array attributes table - show row- or column-major?
[edit]Don't know if this is a big deal, but MATLAB is column-major as well. Should we update the table of programming languages/array characteristics to indicate whether that language is row- or column-major?--Andrew Blackburn 16:20, 01 December 2006 (UTC)
Linear time insertion/deletion
[edit]So the article used to talk about how insertion/deletion in an array requires linear time - an editor recently pointed out that insertion/deletion in an array doesn't really make sense, as it's fixed in size. While I'm sure there's no issue with describing insertion/deletion in the context of dynamic arrays, I've still seen many algorithm books and references describe arrays as requiring linear time for insertion and deletion in contexts such as insertion sort, where a list of bounded size is stored inside an array. I attempted to explain this with the following paragraph, but it may not have been sufficiently clear:
- Use of arrays in an algorithm (eg, sorting) does not alter the behavior of the arrays access (ie, it has constant time for all elements of a static array) to be that of the algorithm. Inserting a new element into an already sorted list requires a variable amount of time (the proportionality factor will depend on the algorithm used). This is a property of the algorithm, not of the array access. Dynamic array access will depend on the algorithm used to implement dynamic array accesses. Derek farn 00:43, 27 July 2007 (UTC)
- If there is an upper bound on the size of a list, the list may be stored in a subrange of an array, typically at the beginning. This enables constant-time insertions and deletions at the end of the list, but requires Ω(n) time to insert or delete in an arbitrary location, whereas a linked list can do this in constant time.
I would appreciate any feedback on how better to organize and express this point. Dcoetzee 23:38, 26 July 2007 (UTC)
I think that the lead of this article should only be about fixed-size arrays, with a small section about dynamic arrays. Your example of using a fixed-size array to implement a list whose size is bounded is no different than using a dynamic array having first asked the data structure to allocate enough memory. So, I think that the best way to organize this information is to phrase like that and to put this information into the dynamic array page, if it's not already there. MisterSheik 23:45, 26 July 2007 (UTC)
- I think the article should cover both statically and dynamically sized arrays, after all it is about arrays. However, it should point out that statically sized is by far the most common and so spend more time discussing this case. There are commonly used languages (eg, awk) where all arrays are dynamically sized. The article currently contains general statements that are in fact only true for certain languages. These need to be corrected. In fact the whole article could do with a rewrite. Derek farn 00:43, 27 July 2007 (UTC)
- Dynamic arrays are not the same thing as dynamically-sized (runtime allocated) arrays - they're an entirely separate data structure built on top of fixed-size arrays (which include both static- and dynamically-sized arrays). We should not conflate these distinct concepts. Dcoetzee 00:52, 27 July 2007 (UTC)
- All arrays are dynamic in that storage for them is allocated during program execution (ok, this might be during program startup). I guess there are two kinds of arrays that might be called dynamic, rather than static. Arrays whose size is only known when the storage is allocated (ie, it is not known at compile time) and arrays whose size can increase as more elements are added (some languages use a different term to describe these arrays). Arrays whose size is known at compile time are often called static arrays (of course, just to make life complicated, in C++ the term static array is used to refer to a particular kind of fixed sized array). Derek farn 10:53, 27 July 2007 (UTC)
- Yeah, there'a bunch of conflicting terminology usage common to different language communities, as evidenced by the multitude of names for the dynamic array and the whole static/dynamic mix-up. I'm leaning towards agreeing with MisterSheik that it's more appropriate to compare linked lists with dynamic arrays, since they actually support the same operation set, and that the storage of a list in a subarray is what dynamic arrays do also. I will add a bit of text to link appropriately. Dcoetzee 00:44, 28 July 2007 (UTC)
More cleanup
[edit]Hoped I could fit this all in an edit summary. The attempted merge of array and dynamic array, followed by an incomplete cleanup, left the article in an odd state. I reverted and merged in unrelated changes. Then I discovered the article was still attempting to assert the nonstandard position that "dynamic arrays are a type of array", which is controversial at best, so I've eliminated any claim one way or the other and added a suitable disclaimer. I'm not happy with a number of other changes, like the new informal-to-the-point-of-misleading characterization of multidimensional arrays, but one problem at a time. The modified section title "comparison with linked lists" was modified again to fit my changes and the bizarre introduction of out-of-bounds accesses as only a "source of bugs" was moved and rewritten. Dcoetzee 10:06, 1 August 2007 (UTC)
Dynamic array edit war
[edit]So it seems there's been a recent series of drastic changes to Array and Dynamic array based primarily in a terminology conflict: basically, whether or not a dynamic array is a kind of array or not. I hope we're past the point of attempting to merge Dynamic array into Array, but there's still the issue of how the two ought to be related on their own pages.
First of all I want to avoid referring to fixed-size arrays as static arrays for the reason stated in the article: it's too easily confused with statically-allocated arrays. I also don't like the terms "statically/dynamically-sized", as these aren't widely used and are even more easily confused with the fixed-size versus growable distinction. On reflection I wish I'd titled Dynamic array as Growable array or something like that to prevent confusion with dynamically-allocated arrays - maybe it ought to be moved.
Next, is a dynamic array an array? Sort of and sort of not. You can see it as a kind of array or an array variant that is extended to support efficient resizing and insertion and deletion at the end. On the other hand, you can also see it as a more complex data structure built on top of ordinary arrays. NIST defines a dynamic array as "an array whose size may change over time" and appears to include dynamic arrays as a special case in its liberal definition of arrays, whereas others restrict the term "array" to fixed-size arrays. I don't believe this wording was intended to include arrays with expensive resizing operations like using realloc() in C - conflating dynamically-allocated and growable arrays would only produce confusion. The most useful conceptual distinction to make is between fixed-size and growable arrays, which have significantly different implementations and operation sets. I think the current wording in both articles is consistent can at least be interpreted in a way that's consistent with NIST, which is at least one authoritative source, and it creates useful distinctions. Is anyone unhappy with the way things look now?
As a gentle reminder, please look carefully at the entire content of a set of changes before changing them. I have had unrelated changes reverted on a number of occasions. I hope we can reach a reasonable consensus that reflects not only our understanding of these terms but wide usage in the literature and education. Dcoetzee 11:34, 1 August 2007 (UTC)
- I may be biased because I wrote a large chunk of the wording that has just been reverted by Dcoetzee again, but I think the original wording is more concise, technically correct and captures the essentials of what needs to be said. Perhaps somebody will venture an opinion? Derek farn 20:11, 1 August 2007 (UTC)
- I also think the original wording is more concise. MisterSheik 20:46, 1 August 2007 (UTC)
- As I mentioned, I object to the terminology "statically-sized" and "dynamically-sized" as these are both nonstandard (can you name any source that uses these terms?) and easily confused with "static" and "dynamic". The terms I see used in C-related material are "statically-allocated" and "dynamically-allocated." I'll change just these words. My wording was intended to also avoid a false dichotomy: there are many other data structures that may fit the general definition of an array, and not all arrays can be classified as static or dynamic. I will try to clarify this with the smallest possible change. Dcoetzee 06:19, 2 August 2007 (UTC)
- Your objection to "statically sized" is certainly valid (the usage is certainly wider than C, but not meaningful for a reader not familiar with programming). I think we should use "fixed size". I think "dynamically sized" is in general usage (Google returns 100K hits, so it must be ;-) Derek farn 10:02, 2 August 2007 (UTC)
- Okay. I still object to the false dichotomy created by the "classified as" terminology, which I mentioned and fixed twice. But it's good like it is. Thanks for your useful discussion. Dcoetzee 14:17, 2 August 2007 (UTC)
- Either the array can grow, or it can't. How can there be another option? MisterSheik 17:00, 2 August 2007 (UTC)
- The issue is that I consider static and dynamic arrays as defined here to be concrete data structures implying a particular implementation, and so they fail to encompass all fixed-size and variable-size array data structures. The general array data structure definition requires primarily that it support constant-time indexing, and it could be be implemented in a completely nonconventional manner, like some kind of linked data structure. Again it's a question of how you're defining the terms. Dcoetzee 22:04, 2 August 2007 (UTC)
- I agree with MisterSheik that "grow" or "can't grow" cover all possible arrays. But allow me to point out that in C, there are 3 points along the "fixed" vs "growable" spectrum. For example, if you wanted a structure to store (among other things) a person's name, one might use (based on an example from the C FAQ)
- (1)
struct employee{ ... char namestr[80]; }; ... void editname( char namestr[80] ){ char buf[80]; ... }
an array with a fixed size set at compile time. - (2)
struct employee{ ... int namelen; char namestr[]; }; ... void editname( char namestr[], int length ){ char buf[length]; ... }
an array whose size is "fixed" in that it cannot be grown after it is created, although some people still call it a variable-length array since size is not set at compile time. (I think this was added in C99). - (3)
struct employee{ ... int namelen; char *namep; }; ... void editname( char **namestr, int *length ){ char *buf; ... }
an "array" that is growable using realloc().
- (1)
- Is this a quirk of C, or are these 3 types generic enough to mention all 3 in this array article?
- --68.0.124.33 (talk) 08:04, 30 March 2008 (UTC)
- I agree with MisterSheik that "grow" or "can't grow" cover all possible arrays. But allow me to point out that in C, there are 3 points along the "fixed" vs "growable" spectrum. For example, if you wanted a structure to store (among other things) a person's name, one might use (based on an example from the C FAQ)
"Array" section of article
[edit]The first section of the article was rather bizarre:
==Arrays== Variables normally only store a single value but, in some situations, it is useful to have a variable that can store a series of related values - using an array. For example, suppose a program is required that will calculate the average age among a group of six students. The ages of the students could be stored in six integer variables in C programming language: <source lang="c"> int age1; int age2; int age3; ... </source> However, a better solution would be to declare a six-element array: <source lang="c">int age[6];</source> This creates a six element array; the elements can be accessed as <code>age[0]</code> through <code>age[5]</code> in C. (Note: in [[Visual Basic .NET]] the similar declaration <code>Dim age(6) as Integer</code> will create a ''seven'' element array, accessed as <code>age(0)</code> through <code>age(6)</code>.) ===Implicit array=== A new array object is implicitly created when an array initializer expression is evaluated; this can occur when a class or interface is initialized, when a new instance of a class is created, or when a local variable declaration statement is executed.
At best, this (excluding paragraph before and after the subsection "Implicit array") is an elementary example of use of an array, but it's probably not complete enough to be useful for anyone who doesn't already know what it seeks to explain. (A more realistic example could be to compute the average score over 6 tests.) If the section is reinserted, it should probably be named "Example" or similar.
The Visual Basic .NET paragraph hints at the split (similar to 0/1/n-based indexing) over whether the number in an array declaration should be the size of the array or the index of the last element. This would be useful to mention in the article, but not in such a specialised context.81.231.33.217 (talk) 12:48, 30 June 2008 (UTC)
Remove Unreferenced tag?
[edit]The article contains one reference, so is it appropriate to remove the Unreferenced tag?
Also, would it be appropriate to increase its quality rating? It seems to meet C-class criteria easily. 81.231.34.61 (talk) 21:10, 10 September 2008 (UTC)
Index of the first element / example in french elevators
[edit]The article states:
- "The 0-based/1-based debate is not limited to just programming languages. For example, the ground-floor of a building is elevator button "0" in France, but elevator button "1" in the USA."
Which is wrong. In french, there is a word rez-de-chaussée meaning "groung floor", while étages are levels over the ground floor. Thus, on elevator buttons, étages are pointed to by ordinals from 1 up, while the unique rez-de-chaussée is (rather logically) referenced either by "RdC" or "0". In other words, what is adressed by the article's statement is a difference of word semantic scope. While it obviously tried to state that : "There are cases in real life where zero is used as an ordinal". Which may be right (?) -- yet precisely not in the case cited as a typical instance!
For couter-arguments, just watch languages. In english, ordinals and cardinals are equally called "numbers" -- compare with german Nummer/Zahl or french numéro/nombre. Yet one may still notice that zero's novelty is witnessed by the everyday use (actually un-use): people don't say "zero apple", "zero person", or "zero time"..., do they? Rather "no apple", "nobody", "never"... --Denispir (talk) 21:31, 8 October 2008 (UTC)
modulation
[edit]In Array#Index_of_the_first_element I'd like to add the point that zero-base is also more natural for applications where an arbitrary subscript is reduced to the appropriate range with a mod operator; but I don't see where it will fit well. Comments? —Tamfang (talk) 08:12, 1 November 2008 (UTC)
- Do you have a good example? Rgrig (talk) 01:24, 24 April 2009 (UTC)
- For example, some kinds of hash table often reduces a number to the appropriate range of array index (the range varies over time with dynamic resizing) with a mod operator. --DavidCary (talk) 08:32, 23 January 2021 (UTC)
Proposal to split the article
[edit]I did an extensive cleanup of the article, trying to clearly distinguish the two concepts "array data structure" (language-independent stuff) and "array data type" (language-dependent stuff). While much remains to be done, I now think that the article should be split along that line. The two concepts are clearly separable and each has enough meterial to fill a large article. The separation should also help reduce editorial conflicts, that often seem to arise from confusion between the two concepts. What do you think? All the best, --Jorge Stolfi (talk) 04:53, 12 May 2009 (UTC)
- Just to clarify, would this be a fair characterization of the 2 topics?: Array-the-datatype is the C-like low-level datatype implemented as a sequential block of memory. Array-the-data-structure is the more abstract notion of an ordered collection of items accessible by integer indices (without regard to the underlying implementation; e.g. it is represented using an associative array in some languages, though this is uncommon); these sorts of arrays are often also Lists. --Cybercobra (talk) 05:44, 12 May 2009 (UTC)
- No, "array data type" would be about the many ways that programming languages define array-like *types*, and (broadly) how they implement them. In that page one would discuss 0-based vs 1-based vs n-based, A(i) vs A[i], syntax for array declarations, slicing and array construcutors, parameter passing, static vs. runtime-sized arrays, etc. That page would be the present "Array data types" section, duly expanded.
"Array data strcuture" would be mostly about the concrete data strcuture that consists of a collection of elements whose addresses are computed from integer index tuples — without mentioning programming languages, compilers, libraries, type systems, etc.. (That is, stuff that Von Neumann could have written back in 1954. 8-) It may still retain a section on "Abstract array data structure" which is what you mentioned above (like the section in the present article), but probably augmented with slicing operations and performance requirements that will practially exclude hash table implementations.
In short, one is about the high-level-language-programmer's view, the other is about the machine-language-programmer's view. Makes sense? --Jorge Stolfi (talk) 07:08, 12 May 2009 (UTC)- Ok, that seems reasonable but I'm a bit iffy on the proposed page titles/nomenclature. I would have thought that the datatype article would cover the low-level view and the structure article would cover the high-level view, since datatypes are generally lower-level than data structures.--Cybercobra (talk) 09:13, 12 May 2009 (UTC)
- Funny, I have exaclty the opposite view ( perhaps for being in the other hemisphere? 8-). To me, data structures are low-level, language-independent concepts that existed well before high-level languages and type systems, and can be perfectly described, implemented and analyzed without ever using the word "type". (That was indeed a conscious decision that Knuth made in the ACP books. Perhaps not good for marketing, but made for a more concrete and precise view.) Data types are high-level concepts that exist only within a specific programming language and its compiler, and were invented precisely to hide the implementation details of data strcutures such as integers, arrays, records, strings, etc.. Thus, for example, the distinction between signed and unsigned integers is meaningful mostly when discussin data types in particular languages; as data structures, they are the same (N-bit arrays packed into a word so that the ADD instruction can operate on them). When implementing a data structure in a high level language, one must necessarily do so through that language's type abstractions, but that does not mean that the data strcuture is built "from" the data types -- rather, it is built "through" them, or even "emulated" with them.
Perhaps you are thinking of "abstract data type" and "abstract data structure", which are the same concept (as far as I can tell) but quite distinct from both "data type" and "data structure". ADTs are theoretical concepts, much as real numbers or linear operators. All the best, --Jorge Stolfi (talk) 18:17, 12 May 2009 (UTC)- I agree with Jorge Stolfi. "Data type" tells you what operations are supported, but not how. So you can have a data type of finite sets, supporting element insertion and deletion and test for membership. This data type has many possible implementations, which have different efficiency characteristics. It is possible that a processor directly realizes the operations of a given data type, but that is only exceptionally the case. A "data structure" specifies a specific way of laying out the storage of a container data type in linear RAM memory using pointers and such, thereby giving an implementation for that data type. For example, the article Heap (data structure) tells us: "The several variants of heaps are the prototypical most efficient implementations of the abstract data type priority queues." --Lambiam 21:49, 17 May 2009 (UTC)
- Funny, I have exaclty the opposite view ( perhaps for being in the other hemisphere? 8-). To me, data structures are low-level, language-independent concepts that existed well before high-level languages and type systems, and can be perfectly described, implemented and analyzed without ever using the word "type". (That was indeed a conscious decision that Knuth made in the ACP books. Perhaps not good for marketing, but made for a more concrete and precise view.) Data types are high-level concepts that exist only within a specific programming language and its compiler, and were invented precisely to hide the implementation details of data strcutures such as integers, arrays, records, strings, etc.. Thus, for example, the distinction between signed and unsigned integers is meaningful mostly when discussin data types in particular languages; as data structures, they are the same (N-bit arrays packed into a word so that the ADD instruction can operate on them). When implementing a data structure in a high level language, one must necessarily do so through that language's type abstractions, but that does not mean that the data strcuture is built "from" the data types -- rather, it is built "through" them, or even "emulated" with them.
- Ok, that seems reasonable but I'm a bit iffy on the proposed page titles/nomenclature. I would have thought that the datatype article would cover the low-level view and the structure article would cover the high-level view, since datatypes are generally lower-level than data structures.--Cybercobra (talk) 09:13, 12 May 2009 (UTC)
- No, "array data type" would be about the many ways that programming languages define array-like *types*, and (broadly) how they implement them. In that page one would discuss 0-based vs 1-based vs n-based, A(i) vs A[i], syntax for array declarations, slicing and array construcutors, parameter passing, static vs. runtime-sized arrays, etc. That page would be the present "Array data types" section, duly expanded.
Abstract array axioms
[edit]The abstract array axioms in the current version are:
- get(set(A,I, V), I) = V
- get(set(A,I, V), J) = get(A, J) if I ≠ J
- for any array state A, any value V, and any tuples I, J for which the operations are defined.
Methink that this is not OK because:
- To write an sbtract algorithm that scans an array, one needs to know the set of valid indices. This requires additional operations and possibly other axioms.
- These axioms say nothing about time and space costs, so they allow implementations where the cost of get/set is proportional to the size of A (or worse). This is not good when analyzing the performance of algorithms that use arrays.
- The axioms also allow an Awk-style array that creates any element that gets assigned to, but fails when reading a non-existent element.
- In fact they are satisfied by a "pea-brained" Awk array where each set(A,I,V) creates A[I] and destroys all other elements.
My idea for fixing the last two bugs was to introduce a "FAIL" value to be the result of accessing an invalid index, and postulate that get(FAIL,I) = set(FAIL,I,V)=FAIL for all I,V; and then remove the phrase "for which the operations are defined". However then the first axiom is violated if I is an invalid index.
Does the paper say anythng about these problems? --Jorge Stolfi (talk) 02:54, 13 May 2009 (UTC)
- The axioms are pretty standard (at least in the research communities dealing with program correctness), except the names used are typically not get and set. Let's take your complaints one by one. (1) Yes, you will need additional axioms to specify particular types of arrays. These axioms only capture the essence of an array. (2) Yes, correctness and efficiency are orthogonal issues. (Although there are attempts to analyze efficiency using logic, but they are rather clumsy at the moment.) (3) Yes. Again, they are supposed to be general enough to describe the essence of what "array" means. (4) No, you are mistaken and I do not see how you reached this conclusion. Finally, regarding your fix: While introducing a special value is certainly possible, note that the current axioms simply cannot be applied to deduce that something is read from a location that was not written, and as such it cannot be proved using this axioms that the read value from an unwritten location is constrained in any way. Just Google for "array axioms" and you will find a plethora of resources, pretty much all using the same axioms. Rgrig (talk) 16:58, 10 September 2009 (UTC)
- Also, looking at the history of the article I see that at some point it said that I stands for a tuple of integers (!!!) and other such nonsensical things. Perhaps it's better it was removed in the end. Rgrig (talk) 17:08, 10 September 2009 (UTC)
Requested move
[edit]- The following discussion is an archived discussion of a requested move. Please do not modify it. Subsequent comments should be made in a new section on the talk page. No further edits should be made to this section.
The result of the move request was not moved Aervanath (talk) 18:10, 25 May 2009 (UTC)
Swap Array data structure and Array data type — Data types are primitive (e.g. Machine data types), whereas data structures are higher-level and more abstract; therefore, this article, which describes the more low-level concept, is misnamed. See also the section "Proposal to split the article" above. --Cybercobra (talk) 05:45, 16 May 2009 (UTC)
Survey
[edit]- Feel free to state your position on the renaming proposal by beginning a new line in this section with
*'''Support'''
or*'''Oppose'''
, then sign your comment with~~~~
. Since polling is not a substitute for discussion, please explain your reasons, taking into account Wikipedia's naming conventions.
- Support per my above rationale. See also data type and note how its classes include comparatively low-level types/concepts. --Cybercobra (talk) 05:44, 16 May 2009 (UTC)
- Against data structures are language-independent, relatively concrete concepts ("ways to organize data in a computer's memory") whereas data types are abstractions that "exist" only within a programming language (cf. type declarations, type systems, type theory, etc.) That how data type defines data types, BTW. --Jorge Stolfi (talk) 06:23, 16 May 2009 (UTC)
- Against as well. A data type can be primitive or complex. In any case, it's a more abstract term than a data structure. For example, a class is a data type and its instances can contain several data structures. Jafeluv (talk) 10:05, 16 May 2009 (UTC)
- Oppose a programming language may not have an array data type, but that does not mean you cannot create an array data structure. The only primitive machine data types are register types. 76.66.202.139 (talk) 20:52, 16 May 2009 (UTC)
- Oppose per my comment above at #Proposal to split the article. --Lambiam 21:55, 17 May 2009 (UTC)
Discussion
[edit]- Any additional comments:
- I say, merge the articles. Whether you call them types or structures is an implementation detail that varies from language to language. Plenty to be said besides that; whether arrays are treated as a primitive or compound by syntax is not an important enough distinction to merit two different articles. -- Beland (talk) 06:39, 21 May 2009 (UTC)
- But the point is that "data structure" is a concept that could (and has been) defined and discussed without ever referring to "programming language", "syntax", or "type system"; whereas "data type" is a concept that exists only in the so-caled "higher programming languages" and cannot be discussed independently of them. This distinction is clear, for example, in Knuth's The Art of Computer Programming, which is still a basic reference in the field, and in Paul E. Black's NIST Dictionary of Algorithms and Data Structures (which does not even have an entry for "type"). Data types are abstractions that those languages provide so that programmers can handle memory words and (concrete) data structures with safety and convenience — much as those languages substitute abstract "commands" for "CPU instructions". And "abstract data type" (or "abstract data structure") is an even more abstract thing — a mathematical concept (like a group or graph) that can be used to model both data structures and data types. Sure, often programmers use "data type" and "data structure" interchangeably, to mean any of those three terms, much as one may say "a wine bottle" when one actually means "a wine bottle full of wine", or describe sake as "rice wine". Merging array data type and array data structure (or data type and data structure) is as appropriate as merging wine and wine bottle. As the old philosphers would say:
- When I use my finger to point at the Moon, no one would mistake my finger for the Moon. But why is it then, that when I use a word to point to an idea, people so often mistake the word for the idea? -- attributed to Zhuangzi, c. 350 BC.
- All the best, --Jorge Stolfi (talk) 19:57, 21 May 2009 (UTC)
- But the point is that "data structure" is a concept that could (and has been) defined and discussed without ever referring to "programming language", "syntax", or "type system"; whereas "data type" is a concept that exists only in the so-caled "higher programming languages" and cannot be discussed independently of them. This distinction is clear, for example, in Knuth's The Art of Computer Programming, which is still a basic reference in the field, and in Paul E. Black's NIST Dictionary of Algorithms and Data Structures (which does not even have an entry for "type"). Data types are abstractions that those languages provide so that programmers can handle memory words and (concrete) data structures with safety and convenience — much as those languages substitute abstract "commands" for "CPU instructions". And "abstract data type" (or "abstract data structure") is an even more abstract thing — a mathematical concept (like a group or graph) that can be used to model both data structures and data types. Sure, often programmers use "data type" and "data structure" interchangeably, to mean any of those three terms, much as one may say "a wine bottle" when one actually means "a wine bottle full of wine", or describe sake as "rice wine". Merging array data type and array data structure (or data type and data structure) is as appropriate as merging wine and wine bottle. As the old philosphers would say:
- I say, merge the articles. Whether you call them types or structures is an implementation detail that varies from language to language. Plenty to be said besides that; whether arrays are treated as a primitive or compound by syntax is not an important enough distinction to merit two different articles. -- Beland (talk) 06:39, 21 May 2009 (UTC)
- The above discussion is preserved as an archive of a requested move. Please do not modify it. Subsequent comments should be made in a new section on this talk page. No further edits should be made to this section.
Array structure?
[edit]These articles about arrays have become a collection of ORIGINAL RESEARCH which is not the purpose of WP. An array is called just an 'array' and hardly ever called an 'array structure'. The hair-splitting activity w/r/t to 'array data structure' and 'array data type' is similarly obscure. Someone please provide definitive reference to the legitimacy of the terms and their definition. The new set of articles is confusing and almost incomprehensible even for knowledgeable readers. Side-by-side comparison of the original with the new clearly reveals that the original array article was superior article for encyclopedic use. Kbrose (talk) 06:33, 16 May 2009 (UTC)
- MOVED MATERIAL FROM PERSONAL TALK PAGE: Hi, you removed the Template:distinguish:Array data type from array data structure and added "original research" and "expert attention needed" in addition to the "unreferenced" and "cleanup needed" tags that I had already put there. Now, I won't call myself an expert, but I have been programming data strcutures for 40 years, and I am quite familiar with language design, type theory, etc.., from COBOL to Modula-3. So, Yes, those articles still need a lot of checking and editing; but I dare say that they are a bit better than the original Array article. The main problem of the latter (which is still present in data type and other related articles) was a pervasive confusion between "data strcutures", "data types", and "abstract data types", which are three very different things. What I did was basically separate the first two. Besides, much of the material in the two new articles came from the old Array article. So, if you feel uncomfortable about some parts, please be more specific,and we may discuss them. All the best, --Jorge Stolfi (talk) 07:56, 16 May 2009 (UTC)
- I did not remove that template, it appears you did. I added the extra tags for expert verify and original research. You should not remove them without providing references for the edits and changes in terminology. If you don't call yourself an expert you should further not remove the requested expert verify tag. As to confusion, it appears to be greater now than ever. The articles are literally incomprehensible now because of use of obscure poorly defined terms. Within your logic someone could write articles about any datatype, say integer, with a theoretical discussion of integer structure, integer storage, integer data type and possibly others, which is absurd in the context of a general audience encyclopedia. Lastly, you should have discussed this on the talk page where I pointed out the problems, not here. Kbrose (talk) 16:43, 16 May 2009 (UTC)
- No, it is quite the opposite. My view is that distinct concepts should be in separate articles. But one needs at some point to discuss connections and constrasts between the two concepts. The lead section may not be the best place for that, and/or the writing may be improved. But would you agree that the split was correct? --Jorge Stolfi (talk) 17:02, 16 May 2009 (UTC)
- I did not remove that template, it appears you did. I added the extra tags for expert verify and original research. You should not remove them without providing references for the edits and changes in terminology. If you don't call yourself an expert you should further not remove the requested expert verify tag. As to confusion, it appears to be greater now than ever. The articles are literally incomprehensible now because of use of obscure poorly defined terms. Within your logic someone could write articles about any datatype, say integer, with a theoretical discussion of integer structure, integer storage, integer data type and possibly others, which is absurd in the context of a general audience encyclopedia. Lastly, you should have discussed this on the talk page where I pointed out the problems, not here. Kbrose (talk) 16:43, 16 May 2009 (UTC)
- Sorry, what part of it is "original research"?
The distinction between "data structure" and "data type"?
Do you dispute that data types are defined within a specific language, and that one can do data strcutures (but no data types) in assembly language? (The Knuth book is still a reference for data structures, but all coded in "pseudo-assembler" with not a hint of data types)
The lead sections of the two articles could be much improved, agreed. But please note that the confusion was inherited from the lead section of the old "Array" article, which got very dense and confusing because it had to explain the two senses of the word. That was one of the reason why the article had to be split.
Surely, 99% of the time people say just "array" because it is clear (or it does not matter) which of the two senses is meant. Also 99% of programmers have only handled array data structures through array data types in high-level languages, and many have never thought that the two are different things. But when writing a WP article about "car" you must separate "car (automoble)" and "car (train wagon)" and "car (lisp functiion)", even though people (including train conductors and Lisp programmers) say just "car", even when talking about two senses at the same time.
In this case, the abstraction that a particular language calls "array type" may OR MAY NOT be implemened by an array structure. Conversely, what a language calls "list type" may be actually implemented as an array structure.
Discussing the "array" data types of all languages in the same article together with the array structure (and any other structures that can be used to implement them) is like discussing clothing fashions in the "human body" article (together with dogs, scarecrows, and all other things that one can put clothes on). Just because human bodies are hardly seen without clothes, and clothes are made to fit the human body, it does not mean that they should be discussed together. All the best, --Jorge Stolfi (talk) 16:42, 16 May 2009 (UTC)- PS. Indeed, now that the articles have been split, after the lead section one can write just "array" — which was not the case before the split. --Jorge Stolfi (talk) 16:52, 16 May 2009 (UTC)
- If datatypes are language-specific, then surely something requiring such low-level use of memory as this article's topic requires language-level support? --Cybercobra (talk) 01:06, 17 May 2009 (UTC)
- Many languages that provide array types will translate them to "true" arrays as defined in this article, so this article is relevant to all of them. (Programmers need to be aware fo the addressing formula to understand what A[-1,-1]=0 does, why the cache is/isn't working, why 2D arrays are transposed when calling a FORTRAN routine from Pascal, etc.).
The "systems programming languages" will let you do pointer arithmetic, so you don't even have to use the language's array types to work with array structures. (C/C++ have array types, but many C/C++ programs out there create and manipulate d-dimensional arrays, with multilinear addressing, without using a single square bracket. And assembly languages are often used for array processing, even though they have no data types at all.)
In languages that provide "true" arrays but only of rank 1, you can get matrices and more by computing part of the indexing formula explicitly, e.g A[i*c + j*d] to get A[i,j]. (I can't recall any notable language in this class — not surprising, since multiple indices are very easy to compile.)
Even those few languages that actually prevent the programmer from using "true" arrays (such as Awk, whose "arrays" are all hashtables; or some dialects of Lisp, where all you have are S-expressions) still use "true" arrays internally. Obviously their interpreters/compilers must be written in some other language (usually C, C++, or assembly language), at least in part.
So, "language independent" means that the topic of this article is relevant for all programmers, irrespective of which languages they can/must use, whether those languages have "array" types, and what those types actually mean. All the best, --Jorge Stolfi (talk) 02:54, 17 May 2009 (UTC)
- Many languages that provide array types will translate them to "true" arrays as defined in this article, so this article is relevant to all of them. (Programmers need to be aware fo the addressing formula to understand what A[-1,-1]=0 does, why the cache is/isn't working, why 2D arrays are transposed when calling a FORTRAN routine from Pascal, etc.).
- The continuous rename or move actions and requests show that the terminology used here is not clear and is non-standard. Please provide reliable references for all this. Kbrose (talk) 16:32, 16 May 2009 (UTC)
- Sorry, what "continuous rename or move actions"? There was only *one* split of "Array" into two articles (and a consequent move of "Array (disambiguation)" to "Array") --Jorge Stolfi (talk) 16:42, 16 May 2009 (UTC)
- I daresay I am somewhat of an expert in this field. I have a Ph.D. in computer science with a dissertation on the design of programming languages. I was the secretary of the Ada Distinguished Reviewers committee which reviewed language design decisions. I have participated in the design of multiple languages and on the design and implementation of multiple compilers and interpreters, including the design of their runtime architectures (i.e. the concrete data structures used to implement the language data and control structures). I have reviewed a variety of obscure programming languages for the DoD. And of course I have used a large variety of programming languages.
- I agree with Stolfi that there are several concepts of 'array' which are often conflated. However, I disagree strenuously with his initiative to make array into a disambiguation page -- the array data structure, the array data type, and associative arrays are not three applications of a very broad concept in completely distinct disciplines (like say phased antenna arrays and DNA microarrays), but three very closely related concepts in the same field -- which is why, in fact, it is often useful and sensible to conflate them.
- There is a central core concept of 'array' -- a structure whose components are denoted by an index and its various implementations -- which can usefully be discussed in the mother article. Then, additional detail can be added in subsidiary articles. --macrakis (talk) 00:06, 18 May 2009 (UTC)
- At best, the terms used to separate the articles aren't particularly well-established or canonical. At worst, they are arbitrary, possibly original research, and possibly incorrect. --Cybercobra (talk) 00:29, 18 May 2009 (UTC)
It looks like the definition of Array Data Structure cites Dictionary of Algorithms and Data Structures definition of array, which says, "Array (data structure) An assemblage of items that are randomly accessible by integers, the index." This matches the WP definition of "Array Data Type", and not so much the WP "Array Data Structure" definition, and says nothing about indexing by physical address (itself a questionable link in this day of virtual memory) by formula. Additionally the ADT information from the DADS entry appears on the Array Data Type page, not here, but is cited here, not there.
AOCP1 is also cited, and Knuth does have a simple formula on p 244 where he talks about: Linear Lists (which are 1D arrays per p 4 & 629) with Sequential Allocation. A similar formula appears for multidimensional arrays on p 299 with Sequential Allocation, where Sequential Allocation is just a straight-up allocation of a block of words (i.e. a normal array data structure). However, Knuth goes on to then discuss "arrays" with Linked Allocation (i.e. a linked list) on pages 254 and 301. That is to say, he is speaking of arrays in the WP "Array Data Structure" sense when he is talking about Sequential Allocation, but in the WP "Array Data Type" sense when he is talking about Linked Allocation. But since they're all "arrays" to him, WP is making a terminology distinction that Knuth does not seem to be making in the same way.
So I'm not sure the definitions we're using here are well-backed by their citations.
Plus, the two pages are very confusing for reasons I don't quite understand (and I'm a pretty experienced computer guy). Arrays as an ADT and as an implementation whether by "sequential allocation", "linked allocation", or otherwise, are a very simple concept to present, so there is no reason why the articles should be anything other than crystal clear. I'm willing to take a stab at it, but there's obviously a lot of history here and I'm reluctant to get involved. So just put this down as a vote for "needs work". Beej71 (talk) 12:29, 7 January 2010 (UTC)
iliffe array
[edit]The implementation of the iliffe array presented here makes the assumption that each row is allocated separately. However, Numerical Recipes in C provide a version of such vector with better locality by allocating the data as a contiguous memory block and have the indexing array points to beginning of rows in this block. Tests show few to no overhead between this and a traditional Dope vector. Would it be worthwhile to speak of this distinction ? Joel falcou (talk) 18:27, 4 January 2012 (UTC)
Arrays not analogous to vectors or matrices
[edit](Referring to the 3rd paragraph) Arrays are not analogous to mathematical vectors, matrices or tensors, but rather to the objects called tuples in mathematical terminology. I appreciate that they have often been named this way, and there may even be cases where it is helpful to explain them like this, but the statement as it stands is not correct. I believe at the least there should be a mention of tuples. I am not aware of the mathematical term for multidimensional tuples.
Example: Vectors and the related objects are linear objects. An array of customer details is valid in computer science, but it has no meaning to act on it using coordinate transformations, linear functions, inner products or any of the other machinery associated with linear mathematics. Conversely Maxwell's equations contain vectors but do not contain any arrays (or tuples). When a vector can be represented as a tuple/array, it must have elements that are members of a field, this is not the case for a general array.
I see that the paragraph doesn't have a reference, so unless there is some disagreement or discussion I will delete it, or possibly edit it.
David Drakard — Preceding unsigned comment added by 86.182.183.171 (talk) 22:20, 2 October 2012 (UTC)
linked list O(1) insertion/deletion in the middle?
[edit]I know this is a common myth, but am surprised to see it in a Wikipedia article. Moreover, how can we read both O(n) for indexed access and O(1) for insertion/deletion in the middle, since to insert or delete item #i in a list, one must first get there, meaning perform indexed access. Can someone expose this better? I would happily do it, but since I have no reference, edits would probably be quickly removed by ll fans, and I don't want to start an edit war (better abstain).
denis 'spir' (talk) 18:09, 9 December 2012 (UTC)
"A programmer (or a sophisticated compiler) may use this information to choose between ...
[edit]... row- or column-major layout for each array."
Very nice, but in what actual programming language is that possible"
- Several languages and libraries for numeric computing, e.g. NumPy. I'm not aware of compilers that do this. QVVERTYVS (hm?) 10:51, 25 February 2016 (UTC)
A Commons file used on this page has been nominated for deletion
[edit]The following Wikimedia Commons file used on this page has been nominated for deletion:
Participate in the deletion discussion at the nomination page. —Community Tech bot (talk) 23:38, 26 July 2019 (UTC)
Discuss delayed and manifest arrays
[edit]We should have a section that discusses different conceptual representations of arrays i.e. manifest and delayed. On a high level:
Manifest arrays
- exists in memory;
- contain trash from the memory without initialisation
- can be hidden behind functions to act as delayed arrays
Delayed arrays
- do not have to actually exist;
- are represented by a function of type (!! :: ix -> a) where ix is the index type and a is the type of the values of the array;
- Multidimensional array access translates to function composition
Delayed arrays seem to become more and more prominent in languages such as Haskell, Accelerate, Single Assignment C, Halide, Repa, Massiv and Futhark.