Question Everything: 01/01/2012

Thursday, January 12, 2012

Why does C++ not allow overloading ‘.’ , ‘.*’ , ‘::’ and ‘?:’ operators?

Stroustrup wanted programmers to be able to use operators with user-defined data types as well. He therefore added operator overloading to C++, so that programmers can define what an operator does for their own types. One restriction is that the operators ‘.’ , ‘.*’ , ‘::’ and ‘?:’ cannot be overloaded. These operators are not ordinary operations on values of a data type; they exist to expose features of the language itself:

“.” The direct member access operator is used to access a member variable or member function.

“.*” The pointer-to-member operator is used to dereference a pointer to a class member on a given object.

“::” The scope resolution operator is used, for example, to access a global variable and to define a method outside its class.

“?:” The conditional operator behaves like an if-else condition, so there is little reason to overload it.

“sizeof” The sizeof operator is used to get the size of an object or type. It can’t be overloaded because built-in operations, such as incrementing a pointer into an array, implicitly depend on it. Consider:

X a[10];

X* p = &a[3];

X* q = &a[3];

p++; // p points to a[4]

// thus the integer value of p must be

// sizeof(X) larger than the value of q

Thus, sizeof(X) could not be given a new and different meaning by the programmer without violating basic language rules.
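
The dependency between pointer arithmetic and sizeof can be checked directly. The following is a minimal sketch, not taken from the book: the struct X and the function names are hypothetical stand-ins chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the type X used in the text.
struct X {
    int id;
    double weight;
};

// Returns the number of bytes a pointer to X moves when incremented once.
std::size_t stride_in_bytes() {
    X a[10] = {};
    X* q = &a[3];
    X* p = &a[3];
    p++;  // p now points to a[4]
    // Viewing both addresses as char* measures the distance in bytes.
    return static_cast<std::size_t>(reinterpret_cast<char*>(p) -
                                    reinterpret_cast<char*>(q));
}

// Usage: the stride equals sizeof(X), whatever padding X contains.
void check_stride() {
    assert(stride_in_bytes() == sizeof(X));
}
```

If a programmer could redefine sizeof(X), the check in check_stride() would no longer be guaranteed to hold.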

Because the operators above are close to the core of the language, allowing them to be overloaded could cause many problems and much confusion without any real benefit. (The “->” operator, by contrast, can be overloaded, which is what makes smart pointer classes possible.)
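
To illustrate the smart pointer remark, here is a minimal sketch of a class that overloads “->”. The names Widget and OwningPtr are hypothetical and exist only for this example; real code would normally use std::unique_ptr instead.

```cpp
#include <cassert>

// Hypothetical payload type for the example.
struct Widget {
    int value;
    int doubled() const { return value * 2; }
};

// A tiny owning pointer: deletes the Widget when it goes out of scope.
class OwningPtr {
public:
    explicit OwningPtr(Widget* p) : ptr_(p) {}
    ~OwningPtr() { delete ptr_; }

    // Copying is disabled so the Widget is never deleted twice.
    OwningPtr(const OwningPtr&) = delete;
    OwningPtr& operator=(const OwningPtr&) = delete;

    // The overloaded operators forward access to the owned object.
    Widget* operator->() const { return ptr_; }
    Widget& operator*() const { return *ptr_; }

private:
    Widget* ptr_;
};

// Usage: p->doubled() goes through OwningPtr::operator->().
int doubled_via_smart_pointer(int v) {
    OwningPtr p(new Widget{v});
    return p->doubled();
}

void check_smart_pointer() {
    assert(doubled_via_smart_pointer(21) == 42);
}
```

Note that there is no way to make “p.doubled()” forward to the Widget in the same manner, because “.” cannot be overloaded.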

Reference: The Design and Evolution of C++ by Bjarne Stroustrup

Thursday, January 5, 2012

How is default argument to a method implemented in C++?

C++ allows a function to declare a default value for an argument, which is used when the caller does not supply that argument. The third argument of the following function will be “0” when the caller does not pass it.

int sum(int num1, int num2, int num3 = 0)

{

return num1 + num2 + num3;

}

When a method is called with the stack-based calling convention shown here (32-bit x86), all the arguments are pushed on the stack, and the method reads them from the stack as its formal arguments.

Let’s see the dis-assembly code generated for a method call with all the arguments:

sum (1,2,3);

// Here is the dis-assembly code for passing arguments to sum method

push 3 //push value 3 on stack

push 2 //push value 2 on stack

push 1 //push value 1 on stack

call sum (41123Fh) // call sum method

Here, we can see that all the three values are pushed on the stack.

Now see the dis-assembly code generated for a method call when no value is supplied for the last argument:

sum(1,2);

// Here is the dis-assembly code for passing arguments to sum method

push 0 //Here 0 as default value for last argument

push 2 //push value 2 on stack

push 1 //push value 1 on stack

call sum (41123Fh) // call sum method

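We can see that the “push 0” instruction pushes “0” (the default specified in the method’s declaration) as the value of the last argument “num3”. The compiler inserts the default value at the call site; the method itself always receives three arguments.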
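
The same behavior can be observed from C++ without reading the dis-assembly. The sketch below reuses the sum function from above; the helper names sum_via_pointer and check_default_argument are hypothetical additions for illustration.

```cpp
#include <cassert>

// Same function as in the text: the default lives in the declaration.
int sum(int num1, int num2, int num3 = 0) {
    return num1 + num2 + num3;
}

// Calls sum through a plain function pointer, which carries no defaults.
int sum_via_pointer(int a, int b, int c) {
    int (*fp)(int, int, int) = sum;
    // fp(a, b) would not compile: all three arguments are required here.
    return fp(a, b, c);
}

// Usage: omitting the argument is the same as passing the default.
void check_default_argument() {
    assert(sum(1, 2) == sum(1, 2, 0));
    assert(sum(1, 2) == 3);
    assert(sum_via_pointer(1, 2, 0) == 3);
}
```

The function pointer case shows that the default value is not part of the function itself: only call sites that see the declaration get it filled in.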

Wednesday, January 4, 2012

'this' pointer implementation

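In C++, the ‘this’ keyword yields a pointer to the object on which a member function was called; the pointer itself cannot be modified. In the implementation examined here, it behaves like a local variable defined in each non-static member function, including constructors and destructors, which is initialized with the object’s address passed by the caller.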

Whenever a non-static member function is called on an object, the object’s address is passed to the member function, which then copies this address into its ‘this’ variable. Since the ‘this’ variable is created on the stack, each member function call has its own ‘this’ variable. Every data member is then accessed in the method through the ‘this’ pointer.

There are two common ways to pass the object's address to the member function: 1) by pushing the address on the stack, or 2) by copying the address into a register. The compiler can use either of these methods, or any other method.

Let’s take an example:

class test {

private:

int data;

public:

int public_data;

test() { data = public_data = 0; }

void display()

{ printf("\ndata = %d, public_data = %d", data, public_data); }

};

Let’s see the dis-assembly code generated for public method call on object:

test obj;

// Object’s address is getting copied in ECX register to supply it

// as input to constructor.

lea ecx,[obj]

call test::test (411195h) // constructor is getting called

obj.display();

// Object’s address is getting copied in ECX register to supply it

// as input to display method

lea ecx,[obj]

call test::display (411235h)

Now let's see the dis-assembly code of display method:

void display()

{

........

// Here ECX register contain object's address. Its value is getting copied to

// 'this' variable

mov dword ptr [ebp-8],ecx

printf("\ndata = %d, public_data = %d", data, public_data);

mov esi,esp

mov eax,dword ptr [this] // getting object's address

mov ecx,dword ptr [eax+8] //accessing 'public_data' value using 'this' pointer

push ecx

mov edx,dword ptr [this]

mov eax,dword ptr [edx+4] //accessing 'data' value using 'this' pointer

push eax

push offset string "\ndata = %d, public_data = %d" (415B10h)

call dword ptr [__imp__printf (4192D4h)]

......

}

In the above dis-assembly code, we can see that the 'this' pointer is initialized with the object's address supplied by the caller, and that the data members are accessed via the 'this' pointer.
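
One way to picture the hidden argument is to write the same logic twice: once as a member function, and once as a free function that takes the object's address explicitly. This is only a sketch; Counter, PlainCounter and plain_add are hypothetical names, and the compiler is not required to generate the two versions identically.

```cpp
#include <cassert>

// Member-function version: 'this' is supplied implicitly by the caller.
class Counter {
public:
    void add(int n) { this->total_ += n; }
    int total() const { return total_; }
    // Exposes 'this' so it can be compared with the object's address.
    const Counter* self() const { return this; }

private:
    int total_ = 0;
};

// Free-function version: the object's address is an explicit parameter.
struct PlainCounter {
    int total = 0;
};

void plain_add(PlainCounter* self, int n) { self->total += n; }

// Usage: both versions update the object whose address they receive.
void check_this_pointer() {
    Counter c;
    c.add(5);
    assert(c.total() == 5);
    assert(c.self() == &c);  // 'this' is the address of the object

    PlainCounter p;
    plain_add(&p, 5);
    assert(p.total == 5);
}
```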

Tuesday, January 3, 2012

How does compiler achieve runtime binding/polymorphism?

The compiler (together with the linker) fixes the address of every variable and method when the binary is built, not while it executes. This applies to virtual methods too; calls into dynamic link libraries go through an import table that the loader fills in. So the question is: if addresses are determined at compile time, how does run-time binding work?

Here is the answer:

Whenever a method is called, the compiler emits a machine-level ‘call’ instruction and supplies the method’s address to it. Let’s take a simple example to understand this:

class test {

public:

void compile_time_binding_method()

{ printf("\nIn compile_time_binding_method() method"); }

virtual void run_time_binding_method()

{ printf("\nIn run_time_binding_method() method"); }

};

Here, the ‘test’ class contains a virtual method ‘run_time_binding_method()’ and a non-virtual method ‘compile_time_binding_method()’.

Let’s create an object, its reference and a pointer to point to the created object:

test a; // Created a object of test class

test &refA = a; // Reference of object ‘a’

test *ptrA = &a; // Pointer of object ‘a’

Let’s call methods using object:

Do you think calling a virtual method directly on an object will use run-time binding? If your answer is no, you are correct. When a method is called on the object itself (not through a pointer or reference), the compiler knows the object's exact type and therefore which method to call. So a dynamic call is not required here, even if the method is virtual. You can verify this by reviewing the generated dis-assembly code.

a.compile_time_binding_method();

lea ecx,[a] // Dis-assembly code

call test::compile_time_binding_method (41118Bh)

a.run_time_binding_method();

lea ecx,[a] // Dis-assembly code

call test::run_time_binding_method (4110EBh)

In the above dis-assembly code, we can see that the addresses of both methods are hard coded (determined while building the binary) to resolve the calls. These hard coded addresses never change in the binary unless you modify and re-compile the source code. Such binding/linking is known as static binding/linking (or compile time binding).

These calls are static calls because the address is hard coded in the machine code generated by the compiler (the same as ‘call <address>’ in assembly language). Suppose that, instead of using a hard coded address, the compiler emits machine code for ‘call EAX’, where the EAX register holds the address of the method to call. The value of the EAX register can then be changed at any time, so any method can be called at run time by putting its address in EAX. This is how run-time binding is implemented. See the dis-assembly code generated for the following calls:

Let’s call methods using its pointer:

For non-virtual methods:

ptrA->compile_time_binding_method();

mov ecx,dword ptr [ptrA]

call test::compile_time_binding_method (41118Bh)

For virtual methods:

ptrA->run_time_binding_method();

mov eax,dword ptr [ptrA] //getting object's address

mov edx,dword ptr [eax] //getting VTABLE's address

mov esi,esp

mov ecx,dword ptr [ptrA]

//The following line gets the address of

//run_time_binding_method() from VTABLE

mov eax,dword ptr [edx]

call eax // will call run_time_binding_method

cmp esi,esp

call @ILT+370(__RTC_CheckEsp) (411177h)

Let’s call methods using its reference:

As a reference is nothing but an implicit pointer to the object, method calls via a reference are the same as method calls via a pointer:

refA.compile_time_binding_method();

mov ecx,dword ptr [refA]

call test::compile_time_binding_method (41118Bh)

refA.run_time_binding_method();

mov eax,dword ptr [refA]

mov edx,dword ptr [eax]

mov esi,esp

mov ecx,dword ptr [refA]

mov eax,dword ptr [edx]

call eax

cmp esi,esp

call @ILT+370(__RTC_CheckEsp) (411177h)
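
The indirect call seen in the dis-assembly (load the VTABLE address from the object, load the method address from the VTABLE, then call through a register) can be imitated by hand with function pointers. This is only a sketch of the idea; Shape, ShapeVTable and the other names are hypothetical, and real compilers are free to lay out their tables differently.

```cpp
#include <cassert>
#include <string>

struct Shape;

// Hand-written "VTABLE": one slot holding the address of a method.
struct ShapeVTable {
    const char* (*name)(const Shape*);
};

// Every object stores a pointer to the table of its concrete type.
struct Shape {
    const ShapeVTable* vptr;
};

const char* circle_name(const Shape*) { return "circle"; }
const char* square_name(const Shape*) { return "square"; }

const ShapeVTable circle_vtable = {circle_name};
const ShapeVTable square_vtable = {square_name};

// The call site knows no concrete type: it reads the method address
// from the object's table at run time and calls through it.
std::string call_name(const Shape* s) {
    return s->vptr->name(s);
}

// Usage: the same call site reaches different methods.
void check_dispatch() {
    Shape c{&circle_vtable};
    Shape s{&square_vtable};
    assert(call_name(&c) == "circle");
    assert(call_name(&s) == "square");
}
```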

As I am not an author by profession, I might not have explained this in the best way :). Please help me improve it by raising your questions/doubts.

Monday, January 2, 2012

Reference vs. Pointer

A reference is an alternate name for an object/variable. It is implemented as an implicit constant pointer to that variable. It cannot be bound directly to an arbitrary memory location such as 0x1000. Pointers, on the other hand, can point to any location in memory, so to access an arbitrary address we need to use pointers instead of references.

Here is a simple example of reference/pointer.

int i = 10;

mov dword ptr [i],0Ah // dis-assembly code

Reference:

int &ref = i;

lea eax,[i] // disassembly code

mov dword ptr [ref],eax // dis-assembly code

int j = ref;

mov eax,dword ptr [ref] // dis-assembly code

mov ecx,dword ptr [eax] // dis-assembly code

mov dword ptr [j],ecx // dis-assembly code

Pointer:

int *ptr = &i;

lea eax,[i] // disassembly code

mov dword ptr [ptr],eax // dis-assembly code

j = *ptr;

mov eax,dword ptr [ptr] // dis-assembly code

mov ecx,dword ptr [eax] // dis-assembly code

mov dword ptr [j],ecx // dis-assembly code

If you compare the dis-assembly code generated for the reference and for the pointer, you can see that it is the same. This means that, implementation-wise, they are the same.
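
Although the generated code is the same, the language rules differ: a reference always stays bound to the variable it was initialized with, while a pointer can be made to point elsewhere. The sketch below shows this; the function names are hypothetical and chosen only for this example.

```cpp
#include <cassert>

// A reference has the same address as the variable it names.
bool reference_shares_address() {
    int i = 10;
    int& ref = i;
    return &ref == &i;
}

// "ref = k" copies k's value into i; it does not rebind ref to k.
int assign_through_reference() {
    int i = 10;
    int k = 20;
    int& ref = i;
    ref = k;   // i becomes 20
    k = 30;    // does not affect i or ref
    return ref;
}

// A pointer can be reseated to point at a different variable.
int reseat_pointer() {
    int i = 10;
    int k = 20;
    int* ptr = &i;
    ptr = &k;  // ptr now points at k
    k = 30;    // visible through ptr
    return *ptr;
}

void check_reference_vs_pointer() {
    assert(reference_shares_address());
    assert(assign_through_reference() == 20);
    assert(reseat_pointer() == 30);
}
```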

Use references wherever you can, since you then don’t need the confusing & and * operators :).

Sunday, January 1, 2012

What is ORG (origin) directive in assembly level language?

The origin directive tells the assembler the address at which the instructions and data that follow are assumed to be loaded in memory. It sets the assembler's location counter (sometimes called the program counter) to the value specified by the expression in the operand field. Subsequent statements are assembled into memory locations starting with the new location counter value. If no ORG directive is encountered in a source program, the location counter is initialized to zero.

The assembler uses an internal variable called LC (Location Counter) to store the current offset address of the statement being processed. When it encounters a variable declaration statement, it puts the value of LC in its symbol table as the variable’s address.

For example:

; Initial value of LC is 0

MOV AX, BX ; Here LC = 0

MOV CX, DX ; Now LC = LC + size of above statement i.e. LC = 0 + 2 = 2

A db 0 ; LC = LC + size of above statement i.e. LC = 2 + 2 = 4.

; So the address of “A” will be 4 as LC = 4 when the variable definition appears.

MOV DX, A ; LC = LC + size of above statement i.e. LC = 4 + 1 = 5

; In above statement “A” will be replaced with the address of “A” which is 4.

; At end LC = 5 + 4 = 9

This program will work when it is loaded at offset 0 in the segment pointed to by the DS register. That is, loading this program at address 200:00h (Segment : Offset) or 700:00h will work, as the offset address is 00h.

What if we need to load this program at 200:300h address? Here DS = 200h and offset = 300h (offset != 0), the variable “A” is physically located at “200:304h” address. But the program will try reading its value from 200:04h address. It is obvious that we will not get expected result as the program is not reading variable from its actual address (200:304h).

This program would have worked if the initial value of LC were 300h, wouldn’t it?

So we need a directive that instructs the assembler to initialize LC with a specific value such as 300h. The “ORG” directive does this. In such scenarios, we need an “ORG XXh” statement at the beginning of the program to initialize LC with the value XXh.
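
The effect of the initial LC value can be simulated in a few lines. The sketch below is not an assembler; it only replays the bookkeeping described above, using the statement sizes from the example (2, 2, 1 and 4 bytes). The names Statement, assign_addresses and example_program are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// One source statement: an optional label and its size in bytes.
struct Statement {
    std::string label;  // empty when the statement defines no symbol
    unsigned size;
};

// Walks the program the way the text describes: record LC for each
// label, then advance LC by the size of the statement.
std::map<std::string, unsigned> assign_addresses(
    unsigned origin, const std::vector<Statement>& program) {
    std::map<std::string, unsigned> symbols;
    unsigned lc = origin;  // ORG sets this starting value
    for (const Statement& s : program) {
        if (!s.label.empty()) {
            symbols[s.label] = lc;
        }
        lc += s.size;
    }
    return symbols;
}

// The example from the text: two 2-byte MOVs, "A db 0", a 4-byte MOV.
std::vector<Statement> example_program() {
    return {{"", 2}, {"", 2}, {"A", 1}, {"", 4}};
}

// Usage: the same program with LC starting at 0 and at 300h.
void check_origin() {
    assert(assign_addresses(0x0, example_program()).at("A") == 0x4);
    assert(assign_addresses(0x300, example_program()).at("A") == 0x304);
}
```

With the origin set to 300h, the symbol table holds 304h for “A”, which matches the physical location of the variable when the program is loaded at 200:300h.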

The bottom line is that we should use the “ORG” directive when the DS (Data Segment) register does not point to the first variable in the data segment (when the program has separate code and data segments) or to the first instruction (when the program has only one segment for both code and data).

This directive is very useful when writing boot loaders, device drivers, viruses, antivirus software and OS components, because these programs need to be loaded at a particular offset address.
