Data Oriented Design – The Numbers

There’s recently been an interesting discussion on Twitter about Data Oriented Design.  I’ve followed the arguments, seen the presentations, but I wanted to try it out for myself.  More importantly, as a programmer obsessed with optimisation, I wanted to see if it can really make that big a difference to performance.  I mean seriously, after over 30 years of x86 processor development, can a virtual call or a couple more bytes in a structure still matter?
To kick things off, I’m going to declare a couple of objects, a Car and a Truck.  Using classic (good, according to uni) object oriented design, both of these can be moved, so let’s inherit both from a common Movable class.  I’m then going to update all my Movable objects, as would be done once per frame in a game.
typedef ... vec_type;

class Movable
{
	vec_type pos;
	vec_type speed;

public:
	virtual void update(float timestep)
	{
		pos += speed * timestep;
	}
};

class Car : public Movable
{
	void* model;
	void* wheels[4];
	void* damage;
	void* driver;
	int type;

public:
	virtual void update(float timestep)
	{
		Movable::update(timestep);
	}
};

class Truck : public Movable
{
	void* model;
	void* wheels[6];
	void* damage;
	void* driver;
	int type;

public:
	virtual void update(float timestep)
	{
		Movable::update(timestep);
	}
};
So, nothing too complex here. I’ve padded the Car and Truck with a few extra fields we’d likely see on them, just to get some useless data to fill up the cache when we update each object.
Now onto the UpdateManager classes. I’ve written several of these to test out the different methods of updating all the Movable objects. The worst one I could think of was a std::vector of pointers to the Cars and Trucks. All the objects are also “newed” so that they shouldn’t all be contiguous in memory, i.e., I’m going to get lots and lots of cache misses.
#define NUM_OBJECTS 1000000

class VectorUpdateManager
{
	vector<Movable<vec_type>*> objects;
	/* constructors, etc */
	void update(float timestep)
	{
		for (auto it = objects.cbegin(); it != objects.cend(); ++it)
		{
			auto object = *it;
			object->update(timestep);
		}
	}
};
Next up, a fixed-sized array instead of a vector, so that we don’t have the overhead of the iterator doing the update. We still have pointers so the data won’t necessarily be contiguous.
class VectorUpdateManagerPointerArray
{
	Movable<vec_type>* objects[NUM_OBJECTS];
	/* constructors, etc */
	void update(float timestep)
	{
		for (int i = 0; i < NUM_OBJECTS; i++)
		{
			objects[i]->update(timestep);
		}
	}
};
Then a fixed-sized array of each of Car and Truck so we have no virtual calls. As the array is of objects and not pointers, we also now guarantee data is contiguous. This should avoid a lot of the slowdown with cache misses.
class VectorUpdateManagerObjectArrays
{
	Car<vec_type> cars[NUM_OBJECTS / 2];
	Truck<vec_type> trucks[NUM_OBJECTS / 2];
	/* constructors, etc */
	void update(float timestep)
	{
		for (int i = 0; i < NUM_OBJECTS / 2; i++)
		{
			cars[i].update(timestep);
		}
		for (int i = 0; i < NUM_OBJECTS / 2; i++)
		{
			trucks[i].update(timestep);
		}
	}
};
And finally, the Data Oriented Design. Here we have an array of just the Movable class so no virtual calls, and the data is as small as necessary for an update.
class VectorUpdateManagerDataOriented
{
	Movable<vec_type> objects[NUM_OBJECTS];
	/* constructors, etc */
	void update(float timestep)
	{
		for (int i = 0; i < NUM_OBJECTS; i++)
		{
			objects[i].update(timestep);
		}
	}
};

Results

So, 4 cases. Vector of pointers, array of pointers, 2 arrays of objects, and the data oriented approach. The times, in µs, are:

Vector of pointers: 86961
Array of pointers: 77188
2 Object arrays: 71645
Data oriented: 37521

Going from std::vector to an array gets us about a 10% speedup. An array of pointers to an array of objects, another 10%. But losing all that extra bloat and going data oriented gives us roughly a 100% speedup. I’m stunned by this. Like I said earlier, I’ve seen the presentations, but to see the numbers for myself amazes me. I really thought that the enormous effort Intel and AMD have put into caches, memory buses, etc., would lessen the effect of bad coding, but I was wrong. Sure, this was a simple example, and the update method in real games would be more complex, but that would only dilute the virtual function overhead, and from these numbers it’s the cache misses that dictate performance.

Next up: I can get this down to 6100 µs. Yes, that’s about another 6× speedup over my best result here. And all it took was SSE and for Visual Studio or Intel to do something I still don’t understand.
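
Just to give a flavour of the direction (this is only a sketch, not the code behind that number), imagine vec_type is a 4-float SSE register and the update loop uses intrinsics directly:

#include <xmmintrin.h>

// Sketch only - a hypothetical MovableSSE where vec_type is an __m128.
struct MovableSSE
{
	__m128 pos;
	__m128 speed;
};

void update_all(MovableSSE* objects, int count, float timestep)
{
	const __m128 t = _mm_set1_ps(timestep);
	for (int i = 0; i < count; i++)
	{
		// pos += speed * timestep, four floats per instruction
		objects[i].pos = _mm_add_ps(objects[i].pos, _mm_mul_ps(objects[i].speed, t));
	}
}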


34 Responses to Data Oriented Design – The Numbers

  1. Dino Dini says:

    One thing to consider: at the moment your figures of 100% speedup are based on a very small amount of processing: a single line in fact. Most of the time, objects have a lot more going on: maybe some physics, collisions, animation and so on. Additionally, there is all the processing required to actually draw the object.

    It would not take many such lines of code to change the significance of the result. At the moment, the overhead is probably greater than the actual executed update code. If, however, we increase the complexity of the executed code to even just ten times its current time… this would diminish the value of your optimisations significantly.

    You may find that in the real world, one million objects being updated doing something more significant than a one line addition would take… let’s guess… if each object’s functioning code took even just one microsecond (which is pretty damn fast), the entire update would be 1 second, of which in your worst case scenario 86 milliseconds out of 1000 would be taken up in overhead, or 8.6%. Your saving would be about 4%.

    This is not to diminish your point, but I felt it should be put into perspective. I argue that some small abstractions (not necessarily OOD-led) would easily give you flexibility to make large and small scale optimisations (one advantage of data decoupling and abstraction… if you do not believe me, think of the concept of a cache… it relies on some kind of abstraction interface between the caller and the data). Such optimisation opportunities often result in very large speed gains, which would wipe out such a tiny performance loss as 4%.

    • petecubano says:
      Hi Dino Dini

      Thanks for the interesting comments and analysis. You’re right that I’ve done an unrealistically small amount of work per update here. I’ve pushed things to an extreme that perhaps gives too big a win when going data oriented. I’ve improved the update function slightly to add acceleration and some very, very simple boundary collisions (let’s assume a 2D world or something for now); there’s a rough sketch of it at the end of this comment. The results greatly reduce the significance of data oriented design. The new results are:

      Vector of pointers: 118566
      Array of pointers: 100268
      2 Object arrays: 87025
      Data oriented: 70832

      Now understandably, all of my variants take longer, but the benefit of data oriented design is nowhere near 100% any more, and this is still very simple code. However, I’m still convinced this approach can give big wins in performance. The draw case you mentioned is one I want to look into because draw only needs the position field of a Movable object, not its speed, acceleration, etc, and so a data oriented draw method will demonstrate far better cache locality. On x86 this might not be a big deal because of the DirectX overhead, but on PS3 much of the draw work is writing memory buffers directly and issuing DMA, which has potentially lower overhead.
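
      For anyone curious, the tweaked update looks roughly like this. It’s a sketch from memory rather than the exact code I timed, and the accel field, the x/y members on vec_type and the WORLD_MIN/WORLD_MAX bounds are just stand-ins:

      // Rough sketch of the beefed-up update, not the exact code I timed.
      void update(float timestep)
      {
      	// acceleration first, then integrate position
      	speed.x += accel.x * timestep;
      	speed.y += accel.y * timestep;
      	pos.x += speed.x * timestep;
      	pos.y += speed.y * timestep;
      	// very crude boundary "collision": bounce off the world edges
      	if (pos.x < WORLD_MIN || pos.x > WORLD_MAX) speed.x = -speed.x;
      	if (pos.y < WORLD_MIN || pos.y > WORLD_MAX) speed.y = -speed.y;
      }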

      • Dino Dini says:

        How many objects are you trying to manipulate at 60 FPS? By my calculation, your modified version takes 0.07 microseconds per object.

        There are 16666 microseconds per frame at 60 FPS, meaning that you could update 235298 objects. With your worst case example, this is reduced to ‘only’ 140563 objects.

        It is hard to think of a game that would require 1000 objects active per frame, but let’s go with that for the moment.

        At 60 FPS, this would allow about 16 microseconds per object. From your measurements, the saving through use of completely data focused design is about 0.05 microseconds per object, making the difference as a percentage about 0.3%.

        Although I have seen statements elsewhere by other people to the effect that the use of virtual functions would cause frame drop problems with as few as 400 objects, I think the math proves that at the per object level, the costs of supposed ‘non data orientated programming’ are in fact so small as to be irrelevant compared to other areas of optimisation in a game engine.

        Your statement: “I really thought that the enormous effort Intel and AMD have put into caches, memory buses, etc, would lessen the effect of bad coding, but i was wrong.” is misleading. Clearly, if the realistic performance cost of an abstracted object system is, for example, 0.3%, it could be that the programmer chooses to take that hit in order to gain the many benefits of good program structure… and this would not be “bad coding”. And I repeat again that the modularity and flexibility awarded by a modest abstraction layer provides many opportunities for optimisations which could not be done otherwise.

        A good example of such optimisations is that when the engine is presented with a complete database of the scene to be drawn, it is possible to apply optimisations to the entire scene, such as culling, with full knowledge of the scene to be drawn.

        Thus I do not consider the techniques I use, which involve a thin abstraction layer between the object level and the data level, “Bad programming”. Quite the opposite.

      • petecubano says:

        I agree that my numbers are unrealistic in current games, but as YE has said, games are updating 10000s of objects already. And even with your numbers, updating 140000 objects per frame is only possible if you have a whole core available which is allowed to take a whole frame to update all the objects. In reality, many PS3 developers are stuck with the majority of the code squeezed into just 2 PPU threads. Especially towards the end of projects, I imagine many developers would be more willing to refactor code to avoid virtual functions or cache misses than learn SPU programming to get the framerate they need.

        Also, I’m running a 2GHz x86 core. It may not be fast these days, but as YE pointed out, so many developers are writing for a 700MHz in-order processor and may need all the help they can get to hit 60fps.

        As for the “Bad programming” comment. Apologies. It wasn’t meant to be offensive. I use OO programming in all my programs. My point was meant to be that for certain situations there are alternatives to OO which may be worth considering. As part of this I’m eager to work out when those situations arise and, equally, how easy it is to alter the code to deal with them. An abstraction layer is preferable when organising the code and is easier for most programmers to understand, but if an abstracted piece of code is in the critical path of a game loop then I think it’s important to know of ways to speed it up. Ultimately, a programmer may choose to lose the abstraction and readability of code if it gets them the performance they need.

    • petecubano says:

      Oops. Looks like we both replied at the same time. Luckily we do appear to have agreed that while OOP has a major and valid role in programming, alternatives are important to consider too.

      • Dino Dini says:

        Excellent! That’s all I ask for. I advocate balance and the right tool for the job, and am asking for clarity in this debate, which often appears to be a form of virtual function bashing for the sake of it. Mike Acton, for instance, when I challenged the idea that DOD was the solution for everything in programming, would not budge an inch. So I am a little trigger happy on trying to address the balance. My efforts to keep perspective on the middle ground and the bigger picture get me labelled an extremist by extremists, but that’s just the same old story that’s been part of human nature since forever.

        As for the 10,000 objects per frame active at once, can you give me a breakdown? When I think of objects, I am thinking of players, enemies, weapon fire and so on. 10,000 objects with full behavior/AI all at once would seem impractical.

  2. Gregory says:

    Just a side note, is Movable templatized or not? At first it isn’t, then it is.

    • petecubano says:

      Yeah, sorry about the confusion. It is. I meant to save the templating for later, as I’ve also done work on SSE for the vector type to see how much performance that gives me.

  3. Gregory says:

    Also between vector of pointers and array of pointers, I find the “overhead of the iterator doing the update” explanation a bit blurry.

    I would have said it’s not about the iterators themselves. Even if std::vector’s iterators are not typedefs to pointers, they should provide the same efficiency in release builds (in debug, iterators might assert correctness).

    Maybe the compiler failed to optimize the cbegin() and cend() calls out of the loop? And since you’re using prefix increment on the iterator, I really wouldn’t have blamed the use of iterators just by reading the code without actual numbers.

    Or maybe it’s auto that causes the overhead?

    • petecubano says:

      I was thinking the same thing. Using iterators really shouldn’t give an overhead, and this was a release build.

      This could be a combination of what you’re saying and Dino Dini’s comments. The update method is just too simple, so if the iterator contributed even 1 or 2 instructions more to the update loop than the array method then I’d see significant differences in performance. I didn’t see any major differences in the assembly generated by each way of iterating, but I’ll have another look just in case I missed anything.

      I’m also going to improve the update method to do a more realistic amount of work and redo the numbers to see what’s going on.

      As for auto, it’s a front-end language feature and should have no impact on performance. But that’s also worth a little test just for completeness.

      • Rachel Blum says:

        I might be wrong, but IIRC the extra price you’re paying for the iterators might be due to the fact that you call cend() for each loop comparison.

        As with all micro-benchmarks, a look at the generated assembly is crucial to understanding the results – otherwise, we’re practicing hand-waving and voodoo coding.

      • petecubano says:

        This is very confusing. I thought it might be the cend() call, but having done what you suggested and looked at the generated code, it’s almost identical! I’ve made the cend() change you suggested and here’s the code. Apologies if this doesn’t look good, but I don’t think WordPress can do assembly source code. Oh, and I’m just posting the loop section, not the pre/post loop instructions.

        std::vector iterator with cend() change:
        00C81564 mov ecx,dword ptr [ebx]
        00C81566 fld dword ptr [timestep]
        00C81569 mov eax,dword ptr [ecx]
        00C8156B push ecx
        00C8156C fstp dword ptr [esp]
        00C8156F call dword ptr [eax]
        00C81571 add ebx,4
        00C81574 cmp ebx,edx
        00C8157A jne update<VectorUpdateManager >+64h (0C81564h)

        for (int i = 0; i < …):
        01031659 mov ecx,dword ptr [ebx]
        0103165B fld dword ptr [timestep]
        0103165E mov eax,dword ptr [ecx]
        01031660 push ecx
        01031661 fstp dword ptr [esp]
        01031664 call dword ptr [eax]
        01031666 add ebx,4
        01031669 cmp ebx,offset ___native_startup_state (1404DC0h)
        0103166F jl update<VectorUpdateManagerPointerArray >+61h (1031659h)

        The only instruction that differs is the compare, but that’s understandable. Both loops still iterate 1 million times. And the virtual call is to the same function. I made sure of that just in case I’d done something stupid there.

      • Rachel Blum says:

        (Would’ve replied to your asm comment, but WP doesn’t show a link…)

        This is showing the pitfalls of micro-benchmarks. There are many other reasons for the performance difference, one of which could be the simple layout of your functions – if one of the virtual functions called shares a cache-line with the main loop, there’s another reason for things going wrong. Next up, VTune or cachegrind 🙂

        (Or you can try forcing each function into its own section, which – on Windows, at least – mandates some sort of alignment. At least that way, cache misses are guaranteed and consistent across tests)

  4. Jon W says:

    It’s unfortunate that iterators still seem to have a cost, even in release mode. The whole point of the design is that they shouldn’t.

    When it comes to data driven design, you can also think of it as aspect oriented design. There’s a “Movable” aspect, so all Movables are updated together. Then there’s a “Posable” aspect, so all Posables are updated together. Keep going for other aspects. This benefits both caches and CPU branch prediction. Even when code gets more complex, branch prediction that works correctly does help, and should keep the performance edge of data driven design.

    • petecubano says:

      The iterators really are strange. I even used the constant ones just to give the compiler as much hint as possible that it can optimize. I’d like to try this out with different compilers and architectures to see if that improves the situation. Perhaps GCC or the EA STL instead of Microsoft’s code will give more expected results.

      Something that really interests me with this example, and with your comments about Movables and Posables, is just how far to go with data driven design. Should the speed and position vectors in a Movable be in their own arrays in memory, or is leaving those 2 fields together OK? I guess it all comes down to the use in the game. Drawing all the objects will probably benefit from all the positions being in their own array, unless the game has motion blur, in which case we need the speed available in the renderer anyway.
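
      To make that concrete, splitting the fields apart would look something like this. Just a sketch, and the struct/field names are made up:

      // Positions and speeds in separate arrays, so a draw pass only ever
      // touches the positions (ignoring the motion blur case).
      struct MovableArrays
      {
      	vec_type pos[NUM_OBJECTS];
      	vec_type speed[NUM_OBJECTS];

      	void update(float timestep)
      	{
      		for (int i = 0; i < NUM_OBJECTS; i++)
      		{
      			pos[i] += speed[i] * timestep;
      		}
      	}
      };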

  5. YE says:

    Except that recent versions of Visual Studio default to a “checked STL implementation” even in release builds, which leads to real iterator overhead.
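
    If memory serves, the macro involved is _SECURE_SCL; defining it to 0 before any standard header (and consistently across the whole project, otherwise you break the ABI between translation units) removes that overhead:

    // Disable VC's checked iterators; must be defined identically in every
    // translation unit, before any standard library header is included.
    #define _SECURE_SCL 0
    #include <vector>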

    @Dino: you’re resorting to several logical fallacies in your arguments.

    First, many games have to work with 1000s of objects simultaneously; think of something like Geometry Wars, for instance. Even in our admittedly pedestrian 3rd person action games, just the renderer considers on the order of 10 000 objects per frame (before deciding to actually render, say, 3000-5000, using 1000-3000 drawcalls).

    Second, a game would typically do much more to each of its thousands of objects than += of positions; if you are a good OOP citizen and do visibility, frustum culling, physics update, renderer update, render draw via separate methods – and possibly, by inheriting separate virtual interfaces for each of the aspects of the objects, you will do many virtual function calls and cache misses per object per frame.

    So the 0.3% hit you come up with is far from reality.

    There will soon be 100 million deployed current-generation game consoles in the world with virtually the same crappy ~700-clock-L2-cache-miss, branch-mispredict-on-every-virtual-call PowerPC core. If you can sustain your business while ignoring them, more power to you – I can’t. We’ve made enormous progress in the last few months on performance by sifting through our codebase and removing the fossilized remains of some 90s OOP textbooks. The code is becoming smaller, clearer, and much simpler; the possible “future extensions” in the name of which all that OOP was written years ago never came, despite shipping 7 games in the meantime. The need for more performance, in contrast, came, and was very real.

    • Dino Dini says:

      Erm, OK… so let’s say your game has 10,000 objects. Playing a lot of Halo Reach recently, by the way. I don’t see 1000 objects, do you? Particle effects, by nature, ought to be grouped into one abstraction, i.e. “Put a blast here”. Animations such as swaying trees and things, similarly. Geometry Wars RE 2, which I play a lot of, has again lots of particles (which, if implemented as individual fragments instead of an expanding texture, might require special treatment).

      That would make a performance gain of 3%. So I really do not see what your beef is. I still do not see 3% as a problem, especially as high level optimisations will wipe that out, generally. Additionally, all the code aside from the wrapper being measured here would reduce the relative cost of the wrapper. So I am not sure why you bring that up, it reinforces my argument.

      Now having said that, I quite agree that textbook OOP is bad in certain contexts. But, please, don’t forget the bigger picture. That is all. Use the right tool for the job. Too many people are not being clear about the context when arguing the evils of things like virtual functions.

      • YE says:

        Yeah, fully agree with the “bigger picture” part. Updating AI behavior once a second? Be my guest, make a virtual function call. Initializing a new enemy that will live for thousands of frames? No problem, spend some time that will make its subsequent updates faster.

        But I draw the line at “per-object, per-frame”; at that particular frequency, on our particular CPUs, virtual function calls and random memory access patterns are a measurably Bad Thing.

      • Rachel Blum says:

        “Grouped into one abstraction” is exactly what DOD is about.

        And 3% is quite large, especially towards the end of the project. Also, keep in mind that the “abstracted object system” you’d like to see in games will most likely not only have one update call per object, but many. So suddenly you’re quite likely to burn 5% to 10% on cache misses caused by bad data organization.

        Not all data has the potential to behave that badly, but if you have large amounts of data, it would be entirely irresponsible to ignore that upfront, arguing that you can always fix it later. That is simply ignoring reality – you *will* have to fix it later, so let’s do it right the first time.

        I know somebody is going to come out claiming YAGNI, but this is a clear case where it doesn’t apply – if you know upfront that things are going to be too expensive for the full data set, you DO need to optimize & design for that.

  6. Pal-Kristian Engstad says:

    Interesting! I have long had a hunch that x86 processors also could benefit from DoD from a performance perspective, but I wouldn’t have guessed it would be that much. Even your more “realistic” experiment shows a gain, after all.

    Now, the main reason that people interested in optimizing code like DoD is not necessarily that the code automagically gets faster (though it seems it does), but that it makes it optimizable. If performance is still not enough, wizards can go in and do things that you would not think possible. For instance, perhaps your loop could be run using the GPU! It is possible when the code is written using DoD.

    I would also think that the code would execute a lot faster than the original on current generation consoles. It would be interesting to measure the difference.

    Finally, there was a comment regarding the importance of optimizing this specific loop. Dino is quite wrong – you won’t have a budget of 16 msec for all of the objects for this little piece. There’s a lot of other processing that needs to take place. This update loop “should” take only a small fraction of the frame-budget, perhaps 1-2% of the 60 Hz frame. If 1%, that means 1406 objects max – and right there you see the problem. DoD does matter!

    • petecubano says:

      For the purposes of being fully open and honest, I am running a dual-core Pentium T4200 laptop. So not the fastest chip out there, but it still has a fairly big cache and out-of-order execution. I’m really eager to get into work and try this out on a PS3.

      An SN Systems presentation (link) said that a virtual function call has a worst case cost of 1000 cycles if it misses both caches. 1000 cycles isn’t much, I’ll admit, but once you scale up to 1000s of objects it becomes an issue. Maybe that makes my example even more artificial, as the virtual function overhead will probably be exaggerated when my code runs on PS3, but it’s still interesting to see.

      Interesting idea to run this on a GPU. There’s a rumour (or is it confirmed?) that Microsoft are going with an AMD Fusion processor for the next Xbox. If that’s the case then finding critical code sections like this to offload to the Fusion cores will be very important. Doing it automatically, or with a wizard, would be great! DoD code like this would be relatively easy for such a tool to spot and move to the GPU, but if code is too abstracted then it becomes an issue. As with anything, over time such tools will be improved to deal with greater complexity in terms of functionality and abstraction. In the current generation, porting even this simple example from PPU to SPU when virtual functions are in place usually involves patching v-tables, and that’s not something to attempt lightly.

    • Dino Dini says:

      I think you are missing the point of my argument, and declaring it wrong without comprehending it. There are 16 ms per frame. The analysis here is regarding *Specifically* the overhead of the object update system, nothing else.

      In a system with 1406 objects, the optimisation described would save a grand total of 49440 microseconds for a million objects, which works out as 0.04944 microseconds per object. Multiply that by 1406 and you get about 70 microseconds saved, which is 0.4%, and that’s the difference between the worst case and the best case examples.

      • Pal-Kristian Engstad says:

        No, I understood you perfectly. See – we’re suggesting to use DoD wherever it matters, and not just in one place. It all adds up, and the interesting thing is that, apparently, it comes for not much sweat!

  7. Dino Dini says:

    I guess it is important to be clear what an “object” is. And that’s part of the problem. Unless we are clear about what we mean, we cannot be sure to communicate what we mean.

    Generally there are two sides to objects in game designs. There are objects that are complicated in behavior, and objects that are simple in behavior. Examples: particles are simple and you need a lot of them. Enemies are complicated and you need not so many of them.

    It is good news that when things are complex, they tend to be less numerous. This means that you can manage the complexity with some loss of performance (often claimed back through high level optimisations; in other words: no net loss of performance, and usually a gain).

    When you have objects that are simple and numerous, they also tend to be objects that have no side effects. Particles are usually for show. So they have no architectural implication and can be hacked together any old way that is just fast.

    • Pal-Kristian Engstad says:

      It is true that it is important to define what an “object” is. Well, actually – in DoD, we rather say that an object is the sum of all its data. And you are also correct in assuming that a more complicated object is less likely to show any coherency in code-paths. For instance, the “player” object in a third person action/adventure game is likely to be very specialized.

      However, even if you limit yourself to a small number of “complicated” enemy characters, there may be ample room for DoD! An example is vision checks. All enemies want to cast a number of visibility checks towards enemies & friendlies. Batch it up and run this as one process.

      The more you think of these opportunities, the more likely you are to see parallelism and thereby gain speedups.
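
      As a sketch of what I mean by batching (the struct and the raycast_clear call here are invented for illustration), the checks could be gathered into flat arrays and run in one tight loop:

      struct VisionCheck
      {
      	vec_type from;   // eye position of the enemy doing the check
      	vec_type to;     // position of the target being checked
      };

      // One pass over contiguous data, no per-enemy virtual calls; on a
      // console the whole batch could be shipped off to another core/SPU.
      void run_vision_checks(const VisionCheck* checks, bool* results, int count)
      {
      	for (int i = 0; i < count; i++)
      	{
      		results[i] = raycast_clear(checks[i].from, checks[i].to);
      	}
      }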

      • Dino Dini says:

        The funny thing, and a point I stress as much as it is ignored, is that I don’t need an education in DOD. DOD was the way I learnt to program. I moved away from “all coding should have a DOD focus” in the search for better software design. A search for reusable code. Portable code. Cross platform development. Modularity. You know what I mean?

        Now I code with a balance. I create architectures that allow me to be fluid in how I apply DOD techniques, while still achieving these bigger aims. Yes I want it all. No compromise. Ironic that.

      • petecubano says:

        I read something with a similar point recently. It said that DoD was necessary when reading data meant accessing a tape drive, and this style of programming continued to be commonplace until DRAM speeds and caches mitigated much of the slowdown.

        As much as I approach this from looking at x86 and PS3 PPU, and in those cases it’s fair to argue on both sides of the DoD debate, I also do SPU optimisation work, and that does benefit from simpler arrays of data. But that doesn’t mean all data should necessarily be reorganised for DoD; it just means that in some specific cases, when offloading work to SPUs, it may be worth first processing the data on the PPU to get it into a better form for the SPUs. As always, it depends on the data and on the constraints of the game. You need enough time to do the PPU-side data arrangement, and enough memory to have a PPU buffer to organise it into.

      • Dino Dini says:

        Indeed: one way of thinking about it is car design. The engine has lots of fast moving parts. It is complicated and efficiency is everything. All kinds of tricks are applied to make it work reliably and well.

        But the driver sees none of this. In comparison, the car controls are slow. The controls are lightly used. There is actually not a lot that is exciting about the car interior. And yet, the perfect fusion of these two aspects of the design is fundamental to a successful car.

        So it is with software design, if you think about it.

  8. Dino Dini says:

    “No, I understood you perfectly. See – we’re suggesting to use DoD wherever it matters, and not just in one place. It all adds up, and the interesting thing is that, apparently, it comes for not much sweat!” –

    Erm… thanks for the clarification of what *you personally* are suggesting. What *I* am suggesting is that the small cost of an abstraction at the complex object level is worth it, and I believe I have demonstrated that to be the case. If you disagree, fine, but to me the maths does not lie.

    • Pal-Kristian Engstad says:

      I think we are talking over our heads… To clarify: I do actually agree with you, sometimes the abstraction overhead is acceptable. For things that execute once, or rarely – you can write your code in whatever suits you, slow object-oriented C++, Python, Haskell, C#, what have you – it doesn’t matter as long as you are within your budget! Another point is that for some code, ease of development as well as rapid iteration and safe code trumps other concerns.

      Having said that, I find it interesting that merely changing the data representation in something as simple as this provides a benefit even on an x86. That was surprising and new to me, and suggests that one should try to see if DoD can be used even on the game-object side of things.

      • Dino Dini says:

        I am glad we are coming to agreement. But I would also add that the optimisation opportunities afforded by abstraction are something completely ignored by many. Not once has that point been addressed or acknowledged by any of the ‘nouveau DOD advocates’ in any of the debates so far.

  9. Andrew Richards says:

    How are the objects ordered? Does the vector have Cars first, then Trucks? Or, are they interleaved? If they are interleaved, then you’re thrashing the I-Cache and branch target predictors. If they’re Cars first, then Trucks, then the caches and branch target predictors should make the virtual calls very cheap.

    • petecubano says:

      Currently it’s half Cars, then half Trucks. I was thinking I should interleave them and probably also allocate them in a more random order, as successive calls to “new” are likely to return contiguous memory anyway.
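
      Something along these lines is what I have in mind; just a sketch, I haven’t rerun the numbers with it yet:

      #include <algorithm>
      #include <vector>

      // Allocate Cars and Trucks in an interleaved order, then shuffle the
      // pointer array so the traversal order no longer matches the
      // allocation order either.
      std::vector<Movable*> make_shuffled_objects()
      {
      	std::vector<Movable*> objects;
      	objects.reserve(NUM_OBJECTS);
      	for (int i = 0; i < NUM_OBJECTS; i++)
      	{
      		if (i & 1)
      			objects.push_back(new Truck);
      		else
      			objects.push_back(new Car);
      	}
      	std::random_shuffle(objects.begin(), objects.end());
      	return objects;
      }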

  10. Pingback: Data Oriented Design vs. Object Oriented Programming « Joshua Jacobs
