interesting read, thanks! Indeed the test wasn’t complete.
Yes using a class instead of a struct was a fault I made. Out of curiosity I updated the test. First I added a field which is used inside the functions (still a class though). I’m on a different (much stronger) PC now so all results are better, so lets compare ratio:
Test array=00:00:03.6223599 [total calls=1000000000]
Test tree=00:00:03.5463660 [total calls=1000000000]
Test recursion=00:00:05.2259963 [total calls=1000000000]
Looks like recursive calls is a little less efficient here for some reason.
Now lets replace with proper structs and test only the array iteration, comparing it to the results above.
Code:
using System;
using System.Diagnostics;
struct Test
{
//public Test child;
public int testfield;
// random action
public int DoSomething()
{
int a = 2 + 5;
return a + testfield;
}
// random action recursive
public int DoSomethingRec(ref int total)
{
//if (child != null) { child.DoSomethingRec(ref total); }
total++;
return 1;
}
}
namespace ConsoleApplicationTest
{
class Program
{
static void Main(string[] args)
{
// create the array / tree
int arraySize = 10000000;
Test[] test1 = new Test[arraySize];
for (int i = 0; i < arraySize; ++i)
{
test1[i].testfield = i;
}
// method 1 - iterate array
{
int total = 0;
Stopwatch sw = new Stopwatch();
sw.Start();
for (int x = 0; x < 100; ++x)
{
for (int i = 0; i < arraySize; ++i)
{
total++;
test1[i].DoSomething();
}
}
sw.Stop();
Console.WriteLine("Test array={0} [total calls={1}]", sw.Elapsed, total);
}
}
}
}
Output:
Test array=00:00:03.4940627 [total calls=1000000000]
Summary of all runs:
Test array=00:00:03.6223599 [total calls=1000000000]
Test tree=00:00:03.5463660 [total calls=1000000000]
Test recursion=00:00:05.2259963 [total calls=1000000000]
Test array with structs=00:00:03.4940627 [total calls=1000000000]
Still pretty much the same. Different runs give different results, each time different method is winning. Maybe it still doesn’t emulate cache miss correctly? Not sure.
Anyway I still think this optimization (assuming it works) doesn’t worth uglifying the code for, but I’m genuinely curious about seeing the difference between cache miss or not.
If you have any ideas or manage to make it work let me know, I think the experiment is still somehow faulty, because I do expect to see some difference.
Edit:
I also tried replacing the field with a class instance and bloat the original class/struct with longs, something like:
public long a;
public long b;
public long c;
public long d;
public long e;
public long f;
public long g;
public long h;
public long i;
public long j;
public long k;
Still same ratio, but the variations between different runs became bigger. Maybe the struct just have to be really big for it to make a difference.