Algorithm Analysis

A review of some mathematics:

Exponents:

Xa * Xb = Xa+b

Xa / Xb = Xa-b

(Xa)b = Xab

Xn + Xn = 2Xn

2n + 2n = 2n+1

Logarithms:

The logarithm of a number to a given base is the power or exponent to which the base must be raised in order to produce the number. So:

XA = B if and only if logX (B) = A

In computer science, the base is 2 unless otherwise specified. This definition leads to some convenient equalities/theorems.

Theorem 1.1:

logA B = logC B / logC A where A, B, C > 0 and A != 1

Proof of theorem 1.1:

Let X = logC B, Y = logC A, and Z = logA B. The definition of a logarithm gives:

X = logC B     =>     CX = B
Y = logC A     =>     CY = A
Z = logA B     =>     AZ = B

AZ    =  B
(CY)Z =  CX
CYZ   =  CX
YZ    =  X
Z     =  X / Y
logA B = logC B / logC A

Theorem 1.2:

log AB = log A + log B where A, B > 0

Proof of theorem 1.2:

Let X = log A, Y = log B, and Z = log AB. Using a base value of 2 results in:

X = log A     =>     2X = A
Y = log B     =>     2Y = B
Z = log AB    =>     2Z = AB

2Z  =  AB
2Z  =  2X2Y
2Z  =  2X+Y
Z   =  X + Y
log AB = log A + log B

Some other useful formulas, which can be derived in a similar manner:

log A/B = log A - log B

log (AB) = B log A

log X < X for all X > 0

Big-Oh Notation

The efficiency of an algorithm will be defined in terms of asymptomatic complexity, which is how running times scale as a function of the size of the input.

In other words, if the input is of size n, then the running time is a function of n:

n n log n n2 n3 18 + 3n( log n2) + 5n3

The most common notation for asymptomatic complexity is the Big-Oh Notation, which is an estimated upper bound for the running time. In essence, the result is a guarantee that the program will terminate within a certain time period. The program may stop earlier, but never later.

To simplify the analysis, the convention that there is no particular unit of time is adopted. This allows for:

A simple program to calculate i3 from i = 1 through N:

int sum( int N )
{
int partialSum;

partialSum = 0;                  /* line 1 */

for( int i = 1; i <= N; i++ )    /* line 2 */
  partialSum += i * i * i;       /* line 3 */

return partialSum;               /* line 4 */
}

Lines 1 and 4 each count for 1 time unit.

Line 2 counts for 2N + 2 time units because of the initialization (1), condition checking (N + 1), and incrementing (N)

Line 3 counts for 4 units per time executed because of 2 multiplications, 1 addition, and 1 assignment. It's executed N times, therefore it's 4N units.

1 + (2N + 2) + 4N + 1 = 6N + 4 => O(6N + 4) => O(N)

Some general rules when determining Big-Oh Notation:

Maximum Subsequence Sum Problem

Given integers A1, A2, ..., AN. Find the maximum value of Ak where k = i through j (Note: for ease, the maximum subsequence sum is 0 if all of the integers are negative.)

For example: -2  11  -4  13  -5  -2

The maximum subsequence is 20 for values A2 through A4.

Solution 1:

The Brute Force method: given n integers, what is the maximum number of different sums that can be formed, assuming that each number appears only once in each sum.

For example: 1  2  3  4

The possible sums include: 1, 1+2, 1+2+3, 1+2+3+4, 2, 2+3, 2+3+4, 3, 3+4, 4

int maxSubSum1( const vector<int> &a )
{
int maxSum = 0;                       // line 1

for( int i = 0; i < a.size(); i++ )   // line 2
 {
 for( int j = i; j < a.size(); j++ )  // line 3
   { 
   int thisSum = 0;                   // line 4

   for( int k = i; k <= j; k++ )      // line 5
     thisSum += a[k];                 // line 6

   if( thisSum > maxSum )             // line 7 
     maxSum = thisSum;                // line 8
   } 
 } 

return maxSum;                        // line 9
}

Line 1 is O(1).

The loop at line 2 is of size N.

The loop at line 3 has size N - i, which could be small but could also be of size N.

The loop at line 5 has size j - i + 1, which could be of size N.

Lines 7 and 8 take O(N2) since they are inside of two for loops.

Therefore, O(1) + O(N * N * N) + O(N2) = O(1) + O(N3) + O(N2) = O(1 + N3 + N2) = O(N3)

Solution 2:

The Brute Force algorithm can be improved by recognizing that Ak where k = i through j is equal to Aj + Ak where k = i through j-1

int maxSubSum2( const vector<int> &a )
{
int maxSum = 0;                       // line 1

for( int i = 0; i < a.size(); i++ )   // line 2
 {
 int thisSum = 0;                     // line 3

 for( int j = i; j < a.size(); j++ )  // line 4
   {
   thisSum += a[j];                   // line 5

   if( thisSum > maxSum )             // line 6 
     maxSum = thisSum;                // line 7
   } 
 } 

return maxSum;                        // line 8
}

This solution will be O(N2).

Solution 3:

A divide and conquer strategy can be adopted to solve the problem. The "divide" is to split the problem into two roughly equal sub-problems, which are then solved recursively. The "conquer" is to piece together the solutions of the sub-problems (plus a little additional work) to arrive at the solution for the whole problem.

Dividing the list of numbers in half gives three possible solutions:

Left Half Right Half
4 -3 5 -2 -1 2 6 -2

The maximum subsequence for the left half is 6, while the maximum for the right half is 8.

The third case can be solved by finding the largest sum in the left half that includes the last element in the left half, and the largest sum in the right half that includes the first element in the right half. The two sums are then added together.

The maximum of the left half is 4. The maximum of the right half is 7. Therefore the overall maximum is 11.

int maxSubSum3( const vector<int> &a )
{
return maxSumRec( a, 0, a.size() - 1 );
}


int maxSumRec( const vector<int> &a, int left, int right )
{
if ( left == right )                  //line 1: base case
  if ( a[left] > 0 )                  //line 2
    return a[left];                   //line 3
  else
    return 0;                         //line 4

int center = (left + right) / 2;               //line 5

int maxLeftSum = maxRecSum(a, left, center);   //line 6

int maxRightSum = maxRecSum(a, center+1, right);   //line 7

int maxLeftBorderSum = 0, leftBorderSum = 0;   //line 8

for( int i = center; i >= left; i++ )          //line 9
 {
 leftBorderSum += a[i];                        //line 10

 if( leftBorderSum > maxLeftBorderSum )        //line 11
   maxLeftBorderSum = leftBorderSum;           //line 12
 } 

int maxRightBorderSum = 0, rightBorderSum = 0;   //line 13

for( int j = center + 1; j <= right; j++ )       //line 14
 {
 rightBorderSum += a[j];                         //line 15

 if( rightBorderSum > maxRightBorderSum )        //line 16
   maxRightBorderSum = rightBorderSum;           //line 17
 } 

return max( maxLeftSum, maxRightSum, maxLeftBorderSum + maxRightBorderSum );  //line 18
}

Let T(N) be the time that it takes to solve a maximum subsequence sum problem of size N. If N = 1, then the program takes some constant amount of time to execute lines 1 to 4, which can be considered 1 time unit. Thus, T(1) = 1.

Otherwise, the program must perform 2 recursive calls, the 2 for loops, and some "extras" such as lines 5 and 18.

The 2 for loops will examine ever element in the subarray, and this is constant work inside the loops, so the time for lines 9 to 17 is O(N).

The code in lines 1 to 5, 8, 13, and 18 is all a constant time and can therefore be disregarded.

The recursive calls solve two subsequence problems of size N/2. Therefore, those 2 lines take T(N/2) units of time each for a total of 2T(N/2).

The total time for the problem is then 2T(N/2) + O(N). This gives the equations:

T(1) = 1
T(N) = 2T(N/2) + O(N)   which can be written as T(N) = 2T(N/2) + N

So:

T(1) = 1
T(N) = 2T(N/2) + N

T(1) = 1
T(2) = 2T(2/2) + 2  =  2T(1) + 2  =  2*1 + 2  =  2 + 2  =  4   =  2 * 2
T(4) = 2T(4/2) + 4  =  2T(2) + 4  =  2*4 + 4  =  8 + 4  =  12  =  4 * 3
T(8) = 2T(8/2) + 8  =  2T(4) + 8  =  2*12 + 8 =  24 + 8 =  32  =  8 * 4
...

If N = 2k, then T(N) = N * (k+1) = N * (log N + 1) = N log N + N = O( N log N + N ) = O( N log N )

Solution 4:

This solution can improve on solution 2 by recognizing that no negative value (or subsequence) can be:

int maxSubSum4( const vector<int> &a )
{
int maxSum = 0, thisSum = 0;          //line 1

for( int j = 0; j < a.size(); j++ )   //line 2
 {
 thisSum += a[j];                     //line 3

 if( thisSum > maxSum )               //line 4
   maxSum = thisSum;                  //line 5
 else if( thisSum < 0 )               //line 6
   thisSum = 0;                       //line 7
 } 

return maxSum;                        //line 8
}

The running time is O(N).

In solutions 1 and 2, j represents the end of the current sequence, while i represents the start of the current sequence. i  can be optimized out of the solution if we do not need to know where the actual best sequence is. A few observations: