🎉 Celebrating 25 Years of GameDev.net! 🎉

Not many can claim 25 years on the Internet! Join us in celebrating this milestone. Learn more about our history, and thank you for being a part of our community!

05.01 - Data Structuring and Organization

Started by
3 comments, last by Teej 23 years ago
Problems and Solutions The essence of programming is, when given a problem, devise and implement a solution. One's ability to come up with clear, proper and efficient solutions is directly proportional to their knowledge of data structures, code organization, speed/efficiency issues, and of course the programming language itself. It basically all comes down to having a lot of tools available to you (i.e. knowledge of different programming techniques), and knowing which tools to use for the job. If, for instance, you don't have a decent amount of powerful tools under your belt, you'll be hard-pressed to come up with a viable solution to a given problem. As well, using the wrong tool increases the complexity of the problem or creates unreasonable speed/organizational penalties. Let's look at a simple example. If I asked you to write a routine that displayed the numbers from 1 to 5 to the standard output, what possible solutions are there? Which is the best? Look here:
int n = 1;
while (n < 6)
{
    printf("%d\n", n);
    n++;
}
 
int n = 1;
do
{
    printf("%d\n", n);
    n++;
} while (n < 6);
 
for (int n = 1; n < 6; n++)
{
    printf("%d\n", n);
}
 
int n = 1;
display:
    printf("%d\n", n);
    n++;
    if (n < 6) goto display;   
Actually, here's the best solution:
printf("1\n2\n3\n4\n5\n");   
There are two points I'm trying to make here:
  • There's always more than one way to do something
  • The ideal solution is a mixture (or tradeoff) between processing speed, readability/intuitiveness and simplicity
The sad truth is that there's always a single ideal solution to a problem, but you won't always know what that is, so you have to make comprimises in order to come up with the best solution that you can. Of these comprimises, the one that you can eradicate altogether is a lack of knowledge of applicable techniques. Oh, and by the way, that what's were here for. Here's a quick example of a case in point: if you're writing a simple phone book application, you're ultimately going to have to store data. Assuming that all records must exist in memory, your solution is going to be based upon your familiarity with memory allocation. Quite simply, if you don't know how to implement a linked-list or other such dynamic memory allocation scheme, you're stuck with the traditional array. So, what's the difference? The size of an array has to be specified at compile-time, which means that you're going to have to either (a) limit the number of entries in your phone book, or (b) choose some arbitrarily large number that will likely lead to wasted memory. If I lifted the 'in-memory' restriction, those with the 'know-how' could implement a phone book that could hold as many entries as the user's hard-disk would allow. And I haven't even gotten into access/search times... The same is true with game development -- even more so! Folks, this isn't a Visual Basic form we're writing here, so we always have to be intimately aware of speed and memory. Speed ties directly into our frame rate, so we're always 'on the clock', which means our rendering, AI, physic, etc. all have to be extremely fast. Memory is a finite resource which has to be managed properly, in the video card and sound card as well as in the system itself. Even the nature of today's processors can influence how we code, as we need to be aware of data type conversion speeds, mathematical operation speeds, onboard cache behaviour, special instructions and other optimizations. The Heart of Game Development I don't know how better to emphasize this point, other than to beg you to read it carefully : The most important aspect of game development is the organization of data . Data structures are the heart of a game, and the logic that manipulates that data comprise the arteries. Your choice of data structures is going to decide the outcome of your entire game -- if it ever gets finished, if it works properly, how long it took you to write it, how robust it is, and how upgradeable it is. If you choose your data structures wisely, your game will almost write itself. If you choose poorly, you'll either be constantly hitting brick walls and coding around them (making your game too slow, code-ugly and unmanageable), or you simply won't be able to continue without scrapping the entire project and restarting. Don't believe me? It's too bad if you don't, because proper data organization is the mark of a good programmer; one who finishes projects in the shortest time with the best results. Here's the reason: When you structure your data properly, you and the code are mentally synchronized -- you both have the same visualization of the game in terms of the data that is to be manipulated. As you program the logic surrounding this data, your code follows a manageable and intuitive path because the data actually dictates how the rest of the game is written. If you think about it, that's exactly the reason why people love C++ so much -- data and logic are connected by their organization. There's no need to harp-on about data structuring, as you'll be gaining first-hand knowledge on this most important of lessons as we practice designing games. I'm going to end this section with two of the most important statements I can make about game development, or programming anything for that matter. These are more important than you may realize, so let them sink in: Your data structures dictate how the rest of your game is written. If you choose your data structures wisely, your game will almost write itself. Fundamental Data Structures Data structures are nothing more than collections of data. It is not the data itself that is important, but rather the method in which the data is organized. But why are there different methods of organizing data? Well, in order to answer this question, we need to take an evolutionary journey; one of necessity. Let's start with nothing but the actual language itself. Keep in mind of course that the fundamental data structures that we're going to 'discover' are independant of the language they're implemented on, and are used universally in software development. Generally speaking, at its lowest level a programming language gives us the facility of assigning areas of memory for our own use. In the C language, this is analogous to declaring variables. A given language specifies possible data types, which serve to define memory sizes, and what the contents of that memory actually represents. These are called atomic data types, and (in C) are int , char and float . From these three atomic data types, all other data types must originate. It is the art of mixing, matching and manipulating the existing data types and language features that produces the variety of common data structures in use everywhere. (1) Fixed Data Structures Fixed data structures are those whose memory requirements are explicitly stated. In other words, if you know how much memory you need ahead of time, these are a good choice. (1.1) Arrays An array is a data structure used to store a collection of data items that are all the same type. Each data item in the array is called an element, and elements in an array are implicitly numbered (starting at zero). Arrays can also contain arrays as elements, allowing them to abstractly represent dimensions (for example 2d and 3d arrays). Folks, arrays are your friends. Strike that -- arrays are your best friends. More often then not, any programming problem facing you can be described in terms of arrays. This should be self-evident when you think of the generic abstract concept of grouping and ordering data, both properties of arrays. An array may not always be the right data structure for the job, but it's definitely the most versatile, so don't underestimate it! As we start looking at developing actual games, you'll see time and time again the array's uncanny ability to represent things efficiently -- especially when you start manipulating arrays of multiple dimensions containing structures, themselves containing arrays, and other such crazy mixtures. (1.2) Structures As with arrays, structures are aggregate data types -- collections. A structure is simply a collection of data items. Each data item in the structure can be of any type, and all items become the 'property' of the structure:
struct PERSON
{
    char firstName[80];
    char lastName[80];
    char emailAddress[80];
}
 
PERSON me;   
Here, me is a variable of type PERSON. This variable contains three fields, which are accessable like so:
strcpy(me.firstName, "Tim");
strcpy(me.lastName, "Boston");
strcpy(me.emailAddress, "teej@gdnmail.net");   
Structures are indespensible as data organizers. We use structures to logically group data items, and since structures themselves define a data type, can be used in turn in arrays and other structures. They are analogous to fields in tables of a database, and are intuitively used in that fashion. It should be noted that arrays and structures, when used together, can be formed to represent any data storage requirement. This isn't to say that they're the ideal solution all of the time, because you have to take speed and memory requirements into account. (1.3) Unions Unions are very similar to structures, with one major exception: members of a union are mutually exclusive. Take a look at this example:
union PERSON
{
    char name[80];
    int  idNumber;
}   
Here, PERSON contains two fields, but only one of them can be used. In other words, the compiler sets aside enough memory for the largest single field, and no more. In this example, either a name or an ID number is needed to identify a person, but not both. Here's another possible use for a union:
struct MESSAGE
{
    int type;
    
    union
    {
        int commandCode;
        char szData[64];
    }
} 
Here, we can look at the type variable in order to determine whether the commandCode variable contains a message command, or the szData variable contains some generic data that needs to be stored. Of course, I'm just making this example up, but you should note that it's up to you to decide which union field is valid -- the computer won't keep track of it for you. (2) Dynamic Data Structures Dynamic data structures make use of the operating system's memory allocation facilities to provide memory as it's needed. These data structures are designed to use only as much memory as is needed, making them ideal if your needs aren't fixed. Note that with dynamic data structures, the term 'node' is used to refer to a data item, or more specifically, a data item containing actual data. These descriptions do not include implementation details -- see 03.03 - Selected C Topics for material related to implementing these data structures. (2.1) Linear Data Structures The word linear, in this context, can be taken to mean 'in succession'. These data structures are all based on the fact that data items are stored sequentially, which has its benefits and down-sides, depending on what they're used for. (2.1.1) Linked Lists A linked list can be thought of as an array of data items, but the number of items stored in the array can grow or shrink as needed. Each node in a linked list contains not only the data item(s) to be stored, but one or more pointers to other nodes which allows the nodes to be 'chained' together (thus the term 'linked list'). A linked list always has a first node, called the head or root. Successive nodes are attached to each other from the head of the list, and the entire list can be traversed (just like going through the elements of an array) by following node pointers. As a new node is needed, memory is allocated for it, and the node is attached. When a node is no longer needed, it is removed from the list, with the remaining nodes (if any) reattached to each other to fill the gap. There are other linked list variations, one being the doubly-linked list, which chains nodes together in both directions so that the list can be traversed either way. Other variations use additional pointers to nodes in the linked list, just as bookmarks are used in books. Just as with its fixed counterpart the array, linked lists are a common staple of software developers, and are used wherever sequential storage is desired and memory requirements aren't fixed or may vary greatly. (2.1.2) Stacks If you know anything about assembly programming, you definitely know what a stack is. If you don't, think of a stack of plates in a cafeteria -- you know, where the plates are spring-loaded in a table. Stacks are called LIFO (Last In, First Out) data structures, and are described in terms of 'pushing' and 'popping' data nodes. Using our stack of plates, pushing a data node would be similar to placing a plate onto the stack, and popping a data node implies removing the top plate from the stack. If you push three data nodes (or plates) onto the stack, and then pop one out, you'll get the third (the last) node (or plate) which is on the top -- hence it's called a LIFO data structure. Guess how local variable memory is organized in C? Yup, stacks. The same is true for function parameters. As a matter of fact, it's good to think of a stack as temporary 'memory' for a program, as that's what it's good at emulating. For instance, if you are processing some piece of data in a program, and that particular data leads you to process something else, you can push this data onto the stack, go off processing somewhere else, and then pop the data off again when you're ready to continue. A good example of this is in scripting languages where there are expressions that need to be evaluated: 13 + (6 * (3 + x)) If your program needs to process an expression like this, it can use a stack to hold intermediate values: STACK (bottom to top): (empty) EXPRESSION: 13 + (6 * (3 + x)) When the first bracket is reached in the expression, we need to evaluate expressions in brackets first, so we push 13 onto the stack for later retrieval: STACK: (bottom to top): 13+ EXPRESSION: 6 * (3 + x) Here again we need to temporarily put away a value to tackle more brackets: STACK (bottom to top) 13+, 6* EXPRESSION: 3 + x Once we access the variable x (say it's 7), we now have STACK (bottom to top): 13+, 6* EXPRESSION: 10 Since 10 is a value (and not an expression), we now work backwards by popping values from the stack: STACK (bottom to top): 13+ EXPRESSION: 6 * 10 And finally, STACK (bottom to top): (empty) EXPRESSION: 13 + 60 = 73 We know we have the final value because our stack is empty. Granted, stacks are tricky little devils in that it's hard to visualize when they're the answer to a problem, but I assure you that they do have their uses...just think of them in terms of depth (levels), path-following and temporary 'memory', and you'll be able to conjure up a use for one when you're stuck with a seemingly wierd programming problem when the time comes. It's also interesting to note that a stack can be used to implement an algorithm instead of recursion -- the two are interchangeable. A stack can be implemented in a number of ways, with a linked list variation being the most common. Here, a pointer is maintained for both the front (head) and rear (tail) of the linked list, serving as positions for stack pushes and pops. (2.1.3) Queues A queue is also a linear data structure which is used something like a stack (i.e. pushing and popping), but is FIFO (First In, First Out), which means that items are popped from a queue in the same order that they're pushed. A great way to think of queues is to think of people standing in line for something -- 'first come, first served '. Quite simply, a queue can be used wherever items are stored and processed independently. A good example would be a process running two threads, where one thread receives incoming messages from a socket connection and places them into a message queue for the main thread to process. As far as implementation is concerned, a queue is a minor variation of a linked list where nodes are 'popped' from the front (head) of the list, and 'pushed' to the rear. (2.2) Heirarchial Data Structures Data structures that fall under this category usually take the shape of inverted trees, where one data node is considered the root (at the top), and branches stem from this root containing nodes which in turn branch to other nodes. These tree stuctures make use of a set of popular terms, with which you should become familiar:
  • root : the top-most (head) node.
  • parent : any node which branches off to other nodes. Every node has a parent except for the root.
  • child : any node stemming from a parent node.
  • sibling : nodes that share the same parent node.
  • leaf : any node that has no children.
  • level : all nodes that are the same number of 'generations' away from the root node.
  • depth : the number of levels in a tree structure.
(2.2.1) General (K-ary) Trees Tree data structures are hierarchial, which means that nodes are related to other nodes in terms of ownership or descendency. Because trees can be utilized in so many different ways, one often labels the type of tree data structure in question by the number of possible branches a node can have. For instance, a tree having nodes with two possible branches is called a binary tree, a 3-ary tree has nodes with three branches, and a K-ary tree generally speaking has nodes with K branches. Because of the one-to-many relationship between nodes, there are various techniques for 'walking around' in a tree structure in order to extract node data; these are called preorder, inorder and postorder traversals. Basically, traversal starts at the root node, and then proceeds to visit each of the nodes in the tree in a specific order until all tree nodes have been visited exactly once. Nodes are added and removed from tree structures in an implementation-specific manner, but is similar to that of the linked list insofar as maintaining node pointers is concerned. It's worthy to note that tree structures tend to make heavy use of recursion in their implementation; this simplifies and shortens the amount of code needed, and works intuitively with the concept of trees and subtrees. The two driving forces that would have one utilize trees in a program are hierarchial organization of data and search speed. As far as organization is concerned, you'll know exactly when a tree structure fits your program's needs -- it's a rather common relationship that data elements have. As for speed, if your data is placed into the tree a certain way, searching can be amazingly fast. In the next section, we'll look at the speed-demon search tree... (2.2.2) Binary Search Trees A BST (Binary Search Tree) is nothing more than a binary tree (i.e. a tree with two possible children per node), but have the special designation as being lightning-fast for data searches. For this reason, people use binary search trees for all types of data -- they don't need to have a hierarchial relationship. So long as your data can be ordered (sorted), you can harness the power of the BST. The trick to using BSTs is in the placement of data nodes. Starting at the root node, you compare the data you'd like to add, to the data in the root node. If your data is logically 'before' or 'less than' the root node's data, you follow the left branch downwards; otherwise take the right branch. You then continue to perform comparisons until you find an empty node, and place your data there. When it comes time to search the tree structure for a particular data item, the search is logarithmic -- you can find a particular data item in a collection of 1,000,000 in less than 20 tries (assuming that the tree is balanced)! Think about it: every time you follow a branch, you are effectively ruling out half of the entire tree, so it doesn't take long to narrow down your search to the exact data item you're looking for. (2.3) Graphs Graphs resemble tree structures in that there are nodes with relationships to other nodes, but graphs have two differences:
  • graphs don't necessarily have a root node
  • nodes can be connected to any other nodes
Like with the general tree structure, graphs have their particular uses, and these only become evident when confronted with a certain type of programming problem. Graph structures are meant to mimic physical graph diagrams as an intuitive aid to the programmer when attempting to describe such a scenario in 'computer terms'. Perhaps you're writing a space exploration game, and would like to store information on planets, solar systems and travel routes -- this is the type of information graphs can handle for you. An adventure game with rooms, dungeons and pathways is another candidate for a graph data structure, as the graph itself can be used to determine the contents of a room (data) as well as viable travel routes to other rooms (pointers to other data nodes). Much of the time, you'll have a data structure in your program that's technically a graph, and you won't even realize it because of the loose specification of what a graph is. Basically, if you've got data nodes with pointers to other data nodes, it's considered a graph. Technically, a tree is an instance of a graph as well; there are just some rules imposed on the structure itself. A Quick Reference Here's a quick reference of the data structures covered above:
  • Fixed Data Structures
    • Array
    • Structure
    • Union
  • Dynamic Data Structures
    • Linear:
      • Linked List
      • Stack
      • Queue
    • Hierarchial:
      • Tree
      • BST (Binary Search Tree)
    • Graph
Arm Yourself When you think about it, data structures are nothing more than rules that are imposed on how data is organized in a program. The fixed data structures that we discussed are part of the C language itself, and the dynamic data structures all make use of pointers, memory allocation and other fixed structures (notably structs). There are many different ways of structuring data, and this list is not meant to be exhaustive, but it does fairly represent the core data structures at your disposal, so you'd do well to be familiar with each of them. When you come across a situation that calls for one of these data structures, you can either implement the data structure directly into your code, or take advantage of the collection of data structures that comes with your compiler, which are part of the STL (Standard Template Library), and also provided as MFC classes. Take a look at 03.03 - Selected C Topics for material on some of these data structures and information on how they can be implemented. If I haven't gotten around to writing up an article on one and you have a question, feel free to ask; many here are already well-versed in using and implementing these (and others). The important thing to realize is that these data structures are common because they address recurring problems in software development. There's always more than one way to do things when programming, but it's smart to take advantage of the culmination of quality, reusable data structures which have proven themselves. Finally, you should be aware of the tradeoffs inherent in choosing a data structure. For instance, an array provides sequential data item access, is very fast (if you know the index of the data item you want), but needs to be declared in advance (i.e. at compile time) and has a relatively slow linear search. Linked lists (and others that use them) grow dynamically, but involve some overhead in terms of memory allocation and pointer maintenance, and are also linearly searched. Trees are fast for searches, but insertions are slow and trees may be severly unbalanced, penalizing their search speed. But hey, what's the best way to get comfortable with the fundamental data stuctures? By using them, of course. I did say that we were going to be writing a game in this section, so let's take our first stab at designing a game with our new weapons ready... Questions? Comments? Please reply to this topic. Edited by - Teej on July 4, 2001 5:42:21 PM
Advertisement
I was wondering... I know vectors are from the C++ side of the STL. so you might not want to go into them. But what kind of efficiency do they have, compared to arrays. I love using vectors because I don''t need to set aside memory for the whole thing at initialization, and they are great when I don''t know how many objects will be put on one. Is it slower to access individual elements, either sequentially or randomly, and as the vector grows, does it really reinitialize itself in a new memory block whenever it out grows the one it is in, and anyway, does any of this really matter?
While I''m at it, what about list, also from the C++ STL, compared to linked lists... any idea of the efficieny here? I''ve read conflicting articles about using the C++ STL, some say it is about as efficient as you can get, especially if you are trying to write your own containers, while others say they are slow and not to bother with them.
Anyhow, keep up the good work Teej, and don''t fret too much about your supposed procrastination... I think that what you have already accomplished is way more than any of us should have hoped for, especially considering the price/effort we need to put into getting this info. To have so much in one place, and at, what I believe, everyone''s favorite price, FREE You really only should concern yourself with our praise, and take heart that you have helped many on a long and ardorous path, one lit by your nuggets of wisdom and marked by your signs of knowledge and ability to convey in a meaningful, yet introductory way. With these, hopefully we all will find our way to the end, wherever that may be.
Well.. enough of the ass kissing.. Let''s program this puppy up. I''m feeling tetris-bound
I found this forum this evening and have read through to 4.00. This is by far the best game tutorial I have seen. Although I have not been able to download the BaseCode1.zip file. Could someone help me obtian that file?

Thanks...


I just wanna write my signature under those well choosen words from Mitijea above.

There´s growin some serious liking around here

Crox
Mitijea:
STL Vectors are basically arrays that take care of the busywork for you. They guarantee fast random-access, but can be slower when inserting and deleting items in the begining or middle. They automatically resize themselves when they outgrow their memory, I think the default is just to double in size every time that happens.

STL Lists are doubly-linked lists. They are really fast for inserting and deleting anywhere, and accessing the first or last element, but are slow for random-access.

(Vectors and lists have mostly inverse effeciency, what one''s good at, the other isn''t.)

http://www.sgi.com/tech/stl/ - seems to be a fairly definitive guide

-j

This topic is closed to new replies.

Advertisement