Wednesday, November 30, 2011

PA4 – Implicit Conversions, RVOs and Proxy Objects!


Hello once again! I am trying to keep the pace here with the posts and hopefully you are enjoying them. Today I will talk about Assignment #4 (PA4) – C++ Efficiency. This assignment was a combination of three important topics about Efficiency in C++ programming: Implicit Conversions, Return Value Optimizations and Proxy Objects. I will give you a summary of each one in the following paragraphs:

  • Implicit Conversions: the whole idea behind this trick/optimization is to prevent coders from passing wrong-typed parameters to functions. For example, imagine that you have a set function for a particular member of a class and you do not want the user to pass a value that has a different type, how can you achieve this? Aside from turning up the compiler settings (warnings), there is one trick you could use:

Class Diego {
public:
          void setX( const  float  inX );
private:
          template <typename T> void setX( const T inX );
}

When this trick is applied, anything but a float variable will give a compilation error, therefore successfully preventing wrong-typed values to be passed as parameters. Sexy or what?

  • Return Value Optimizations: I will start by saying this: temporaries kill performance big time! (quoting famous philosopher Ed Keenan). The idea behind RVO is to remove as much temporary variables as possible in order to improve performance. By removing these temporaries we are also removing unnecessary calls to constructors, destructors, copy constructors, etc. Let’s look at the following example to better understand the idea:

With temporaries (No RVO):
Vect2D Vect2D::operator + ( const Vect2D &tmp ) const {
    Vect2D sum;
    sum.x = this->x + tmp.x;
    sum.y = this->y + tmp.y;
    return ( sum );
};

Without temporaries (RVO):
Vect2D Vect2D::operator + ( const Vect2D &tmp ) const {
    return Vect2D( this->x + tmp.x, this->y + tmp.y );
};

You might think this change does not improve performance a lot, but in my assignment I had a stress test in which a bunch of math operations were performed and repeated 5 million times, and just by using RVO I improved the time to 301ms down from 939ms originally. That’s a hell of an improvement in my opinion!

  • Proxy objects: this one is another trick to remove/reduce temporaries and copies from function calls. This method involves (ab)using operator overloading to: replace operations that involve more than two operands, to compare vector length using squared distance instead of square root, etc. I will show you guys an example for your entertainment and pleasure, try running/debugging it so you really understand  what is going on (it took me a while to get it):

struct VaddV {
    const Vect2D &v1;
    const Vect2D &v2;

    VaddV( const Vect2D &t1, const Vect2D &t2 )
        : v1(t1), v2(t2) {};

    operator Vect2D() {
        return Vect2D( v1.x + v2.x, v1.y + v2.y );
    }
};

inline Vect2D::VaddV operator + ( const Vect2D &a1, const Vect2D &a2 ) {
    return Vect2D::VaddV(a1, a2);
};

Try performing an addition operation between three (3) vectors and you will see this baby up and running.

Well my fellow readers, I hope these tricks are useful for you, and if not try using them as opening lines in your next date (not recommended). See you later! Adios!

Tuesday, November 29, 2011

PA3 – Another memory system!


Hey guys! Long time no see, eeeeeh..... not!

Another assignment submitted, and if you have been following this blog for a long time (seriously?) you will know that I have “been there, done that” with this assignment. This one is Assignment #3 (PA3) – Memory System.

The whole idea behind this assignment was to build my own Memory System to replace the standard one that comes with C++, in other words, I had to create my own version of malloc() and free() functions. In order to achieve this I allocated one big block of memory (50KB + alignment padding), which we call the heap, then moved the initial block address so that it was 16-byte aligned, placed the heap header on this address and created a free block with the remaining space of the original block. The 16-byte alignment is not essential to the functionality of this project, but it will be useful later (SIMD).

The heap header is the Memory System part that holds all the pieces together. It has pointers to the first used and free blocks, and has an internal structure that keeps updated stats about the system usage. All blocks in the system are connected by pointers (next and previous) to form doubly-linked lists for easier insertion and removal of new blocks. Now, let me talk briefly about the two functions implemented:

  • malloc(): this function first rounds up the requested size to the next multiple of 16 (to preserve the 16-byte alignment), then checks if there is enough space available in the heap, if that’s the case it proceeds to find the first free block that can hold the required amount. When the block is found, if its size is bigger than the requested one it is split into a used block and a free block, then both blocks are added to their respective lists. Finally, the stats are updated and the address of the used block is returned to the user.
  •  free(): this functions takes the address of the block to be freed as a parameter, then converts this used block into a free one and adds it to the respective list. When this is done, the function also checks if the blocks above and below the new free block are not used, and if that is the case, it coalesces them together into a bigger free block. As was the case before, the stats are updated at the end.


Beside this, PA3 also had another part related to using the Placement New operator. The big difference between this operator and the regular New is that this one takes a pointer to a previously allocated memory location as a parameter, and then uses it to create the new object there. On the other hand, the traditional new first allocates the blocks (malloc) and then creates the object there.

I was going to add some diagrams to this post, but I am kind of lazy today. Anyway, I hope this information was useful. See you later! Adios!

Monday, November 28, 2011

PA2 - Keep the hot side hot and the cold side cold


What up dudes!

Long hiatus, I know, sorry about that! But here I am again, this time I will explain Assignment #2 (PA2) – The Hot/Cold data structure.

This Hot/Cold thing refers to the process of splitting a data structure in two (2) parts to improve performance, but some conditions should be met in order for this approach to work efficiently: you need to have some kind of linked list, your data structure is bloated with members inside (size of data structure is on the large side), each data object has some kind of identifier (a key member) and a lot of searching/traversing back and forward over the linked list is done.

So, what we want to do is split the bloated data structure in two (2) parts. The first part (and small one) is called the Hot Node and it contains: the pointers (links) to the other nodes in the list, the key/ID that identifies the structure and a pointer to its Cold Node. The other part (the big one) is named the Cold Node and it carries the rest of the members from the original data structure. Therefore, if you combine each Hot Node with its corresponding Cold Node you get the original bloated data structure.


Ideally you would like to have all Hot Nodes stored close to each other in memory (data locality), that way traversing and searching the list is not that expensive (there should not be many cache misses). On the other hand, the Cold Nodes could be store anywhere in memory since they should only be fetch when the corresponding Hot Node was found.

 If you still don’t understand what all this I just talked about means, I suggest you watch the following video, it will explain it very clearly. Enjoy it!


See you soon! Adios!