Addresses and Pointers

Number Systems

The decimal number system we normally use for representing numbers is a positional number system in which any natural number may be uniquely represented by use of the ten symbols 0, 1, 2, ..., 9. In this system these ten symbols, also referred to as decimal digits, represent the numbers zero, one, two, . . ., nine, respectively. A unique representation of any natural number m can be given in the form

$$d_n d_{n-1} d_{n-2} \ldots d_1 d_0$$

where each $d_i$ is one of the decimal digits 0 through 9. The same natural number m can also be represented in the form

$$d_n \times 10^n + d_{n-1} \times 10^{n-1} + d_{n-2} \times 10^{n-2} + \cdots + d_1 \times 10^1 + d_0 \times 10^0$$

For example, the number one hundred twenty-three may be represented by

123

or by

$$1 \times 10^2 + 2 \times 10^1 + 3 \times 10^0$$

The decimal number system is also called the base ten system since ten digits are utilized in the number representations in this system. There is, however, nothing sacred about the base ten since the notion of a positional number system can easily be generalized to any given base b where b is a natural number greater than or equal to two.

For example, we can also represent the number one hundred twenty-three in the base two or binary number system. The symbols 0 and 1 are chosen to represent zero and one, just as ten symbols were selected in the base ten system to represent zero, one, two, ..., nine. Then, since

$$123 = 1 \times 2^6 + 1 \times 2^5 + 1 \times 2^4 + 1 \times 2^3 + 0 \times 2^2 + 1 \times 2^1 + 1 \times 2^0$$

the number one hundred twenty-three would be represented in the binary number system by

1111011

So, any natural number that we can represent in base ten can also be represented in base two. It's also much easier to represent two distinct states in a physical system like a computer than it is to represent ten distinct states. You can use the two stable states of a flip-flop, two positions of an electrical switch, two distinct voltage or current levels allowed by a circuit, two distinct levels of light intensity, two directions of magnetization or polarization, etc.

The fact that it's easy to represent binary numbers with hardware has made base two the number system of choice in digital devices such as computers.

Bits and Bytes

Computer memory (often called RAM, "random access memory") is measured in several different units, much as length is measured in inches, feet, and yards. The most common are:

bit (binary digit): Each bit can hold a 0 or a 1, nothing more. A 2 is too big.
byte: A group of 8 bits, which can hold an unsigned decimal value from 0 to 255.
word: Size depends on the system, usually 2, 4, or 8 bytes. 2 bytes can hold an unsigned value from 0 to 65,535. 4 bytes can hold a value up to about 4.3 billion. 8 bytes can hold a value up to about 1.8 x 10¹⁹.

We can represent many different types of data using just groups of bits:

As outlined above, groups of bits can be used to represent the larger integer numbers we might want to store in a computer program.
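One bit can be used for the number's sign (0 = positive, 1 = negative), which allows us to represent negative values. (Most computers actually store negative integers in a form called two's complement, but the high-order bit still serves as the sign bit.)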
Character data can be represented using a numeric code such as ASCII or Unicode that assigns a distinct integer representation to each character.
Boolean values can also be easily represented as an integer (0 = false, 1 = true).
Floating-point numbers can be represented (with some loss of precision) as two integer values packed together, one for the fraction and one for the exponent (similar to scientific notation).

The variables you define in a C++ program and the code for the functions you write all occupy space in the computer's memory when the program is run. That means they occupy some number of bytes. For a variable, the number of bytes occupied depends on the variable's data type and the system the program is compiled and run on. Here are the sizes of some different data types on our Unix system:

char = 1 byte (-128 to 127 decimal if char is signed, 0 to 255 if it is unsigned; which one you get depends on the system)
bool = 1 byte
short int = 2 bytes (-32,768 to 32,767 decimal)
unsigned short int = 2 bytes (0 to 65,535 decimal)
int = 4 bytes (-2,147,483,648 to 2,147,483,647)
unsigned int = 4 bytes (0 to 4,294,967,295)
long int = 8 bytes (-9,223,372,036,854,775,808 to 9,223,372,036,854,775,807)
unsigned long int = 8 bytes (0 to 18,446,744,073,709,551,615)
float = 4 bytes (plus or minus 10³⁸, limited to ~ 6 significant digits)
double = 8 bytes (plus or minus 10³⁰⁸, limited to ~ 15 significant digits)
long double = 16 bytes (plus or minus 10³⁰⁸, limited to ~ 31 significant digits)
C string = depends on number of chars in the array
object type = at least the sum of the sizes of the individual data members (the compiler may add padding bytes between or after members)

Note that these sizes may be different on a different computer. To find the size of a data type or a variable if you don't know it, use the sizeof operator:

    sizeof(int)    // An expression that evaluates to 4 on our Unix system.

    sizeof(x)      // Evaluates to the amount of memory that x occupies.

(Note: sizeof looks a bit like a function, but it's not, really. It's an operator built into the C++ language.)

The uncertainty of the size of various data types in C and C++ is a problem. The C and C++ standards specify only minimum sizes for these types (an int, for example, must be at least 16 bits). The exact sizes are left to each compiler and system, and it's too late to change that now. More modern languages such as Java standardize the size (and therefore the range) of numeric data types so that there is no uncertainty.

Addresses

Bytes in the computer's memory are assigned consecutive increasing numbers starting with the number 0. Thus, storage may be pictorially represented as

bbbbbbbb
 byte 0

bbbbbbbb
 byte 1

bbbbbbbb
 byte 2

...

where each b represents a bit and the number assigned to a given byte is called the address of that byte. Addresses range from 0 up to the highest address available on the computer. Addresses are binary numbers, but they are usually printed in hexadecimal (base 16) because that is more compact. You can print an address in decimal by casting it to an integer type large enough to hold it.

The address of a variable is the address of the first byte of storage that it occupies. Similarly, the address of a function is the address of the first byte of storage that the function's code occupies. We rarely have to know the actual address of a variable or function, but we do need to understand the idea of addresses and the fact that variables take up a certain amount of space in memory.

To obtain the address of a non-array variable, we can use the & operator.

This is not the same operator as the & used when declaring a reference variable. It also has nothing to do with the && operator used in compound conditions. This is confusing, but you just have to keep in mind the context in which you're using the &.

Table 1: Five uses of the & symbol in C++

Symbol	Context	Means	Example
&	In the declaration of a data type (variable declaration, function return data type, function parameter)	This data type is a reference type	`int& x = num;`
&	As a unary operator (variable or function name to the right of the operator, no whitespace), usually in an assignment statement or function call	"Address of" operator	`cout << &num;`
&	As a binary operator (variable or literal on both sides of the operator), usually in an assignment statement	Bitwise AND operator	`num = num & 5;`
&&	As a binary operator, usually in a decision or loop condition	Logical AND operator	`if (num >= 5 && num <= 10)`
&&	In the declaration of a data type (variable declaration, function return data type, function parameter)	This data type is an "r-value reference" (an advanced data type used in C++ "move semantics")	`string&& other`

We can use the & operator to obtain the address of a variable and print it (in either hexadecimal or decimal) in a program:

#include <iostream>

using std::cout;
using std::endl;

int main()
{
    int num = 5;
   
    cout << "Value of num is " << num << endl;                             // Prints 5, the value of num.
    cout << "Address of num (hexadecimal) is " << &num << endl;            // Prints the address of num in base 16.
    cout << "Address of num (decimal) is " << reinterpret_cast<long int>(&num) << endl;   // Prints the address of num in base 10.
   
    return 0;
}

Running this program on turing produced the following output:

Value of num is 5
Address of num (hexadecimal) is 0x7ffdf03df4cc
Address of num (decimal) is 140728634045644

You can try running this code yourself. Since the actual address of num is determined by your computer when you run the program, you may (or may not) get different numbers for the addresses.

Figure 1 shows the relationship between the variable num's name, value, and address.

Figure 1: Relationship between a variable's name, value, and address

Since num is an int variable, it actually occupies four contiguous bytes, with the addresses 140728634045644 through 140728634045647 (0x7ffdf03df4cc through 0x7ffdf03df4cf in hexadecimal). The only address we have any reason to care about, though, is the address of the first byte, which is also the address of the variable.

Pointer Variables

A pointer variable is a special type of C++ variable that can hold the address of another variable (or as we'll see later, the address of a function).

For every non-array data type in C++ (char, int, long int, float, double, etc.), including programmer-defined types such as Date or CreditAccount, you can create a pointer variable that holds the address of another variable of that data type.

The general syntax to declare a pointer variable is

    data-type-to-point-to* variable-name

For example:

int* p;     // p is a pointer variable that can hold the address of an int variable.

Note this syntax carefully. The int* denotes a new data type, one that can hold the address of an int.

We read this right to left as "p is a pointer to an int" or "p holds the address of an int".

There are several equally valid ways to write this declaration in C++:

int* p;      // Valid
int * p;     // Valid
int *p;      // Valid
int*p;       // Valid, but not recommended

We can declare a pointer variable to any non-array data type. C++ considers all of these pointer types to be different data types; a pointer declared as int* and a pointer declared as char* are not the same data type.

float* floatPtr;     // floatPtr is a pointer to a float.

char* first;         // first is a pointer to a char.

Date* datePtr;       // datePtr is a pointer to a Date object.

double* x, * y;      // x and y are both pointers to double.

We can use the & operator to put the address of a variable into the appropriate type of pointer variable.

int num = 5;

int* p = &num;

Now we can say "p contains the address of num" or "p points to num". Figure 2 illustrates the relationship we've established with these two lines of code.

Figure 2: Pointer to an int

Note that since p is itself a variable, it occupies some number of bytes in the computer's memory and has its own address (14072863404556 in this example).

What good does this do? Some of the most important uses will come later, but for right now, we can use this together with one more new idea to create a new way to access the value in a variable.

We know we can access the value in num by using num itself; for example:

cout << num << endl;

Now we can use the pointer variable to get to num's value (assuming as above that p points to num). This will prove very useful soon.

But how can we access the value stored in num by using the pointer p? We dereference the pointer. "Dereference a pointer" means "access the value of the variable that the pointer points to".

The dereference operator is the *. Write the * before the pointer and you have an expression that refers to the "value pointed to" by the pointer.

So given the declarations and assignments above, we can code:

cout << num << endl;     // Prints 5.
cout << *p << endl;      // Also prints 5.

In the first line of code, we print the value stored in variable num.
In the second line of code, we print the value stored in the int variable pointed to by p (which is the value in num).

They are the same thing.

Notice another possible source of confusion:

In a declaration, you write:

int* p;

This declares p to be a variable that can hold the address of an integer variable. The data type of p is int* (pointer to an integer).

In contrast, in an executable statement you might write:

x = *p;

Here, *p refers to "the value in the variable whose address is stored in the pointer variable p" or, more briefly, "the value pointed to by p".

In all, there are three different contexts in which you might use the character * in C++.

Table 2: Three uses of the * symbol in C++

Symbol	Context	Means	Example
*	In the declaration of a data type (variable declaration, function return data type, function parameter)	This data type is a pointer type	`int* p;`
*	As a unary operator, with a pointer variable name or pointer arithmetic expression to the right of the operator	Dereference operator	`cout << *p;`
*	As a binary operator	Multiplication operator	`num = num * 5;`

Pointers to Objects

We can create pointers to objects in the same fashion as pointers to built-in types like int and double. Pointers to objects have some additional syntax associated with them when it comes to accessing members of the object:

Expression	Meaning
`pointer-name`	Address of the object pointed to by `pointer-name`
`*pointer-name`	Value of the object pointed to by `pointer-name`
`(*pointer-name).member-name`	Syntax to access the data member `member-name` of the object pointed to by `pointer-name`
`pointer-name->member-name`	Alternative syntax to access the data member `member-name` of the object pointed to by `pointer-name`
`(*pointer-name).member-function-name(arguments)`	Syntax to call the member function `member-function-name()` of the object pointed to by `pointer-name`
`pointer-name->member-function-name(arguments)`	Alternative syntax to call the member function `member-function-name()` of the object pointed to by `pointer-name`