12.23 Some fundamental algorithms on sequences

Algorithms on sequences (arrays)

Sequence shuffling
Sequence sorting
Sequence searching
Sequence merging

Algorithms and Pseudo-Code

It is a finite sequence of rigorous well-defined instructions, whose aim is to solve a specific problem.
Algorithms describe computations.
They can be implemented as programs in a programming language.
They can also be abstractly described using pseudo-code.
- Pseudo-code is an informal notation to abstractly describe programs.
- Strong syntactic rules of programming languages are omitted.
- Types, operators, and control structures are used in a flexible way.
- Subtasks can be referred to as functions/procedures (straightforward tasks, or complex tasks to be further refined/implemented later on).

Pseudo-Code Example

isPrime(int x) -> boolean {
  assume x is positive
  if x is 1, then return false
  else {
    for each i in [2, 3, ..., (x-1)] do {
      if i divides x then return false
    }
  }
  return true
}

Shuffling a Sequence: The Fisher-Yates Algorithm

Problem: Randomly shuffle the elements of a sequence.

Input: A sequence s = [e1, ..., en] of elements.
Output: A random permutation of s.

Fisher-Yates Algorithm:

Produces an unbiased random permutation (every possible output permutation is equally likely).
Shuffles sequence elements "in place" (without the need for additional space).

Shuffling a Sequence: The Fisher-Yates Algorithm

Pseudo-Code:

shuffle(seq s) {
  for i from length(s)-1 to 1 do {
    choose random j in [0..i]
    swap s[i] and s[j]
  }
}

C Implementation:

void shuffle(int array[], int size) {
    srand(time(NULL));
    for (int i = 0; i < size; i++) {
        int j = (rand() % ((size - 1) - i + 1)) + i;
        int aux = array[i];
        array[i] = array[j];
        array[j] = aux;
    }
}

Myself version:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int read(int seq[]) {
    int i = 0;
    while(scanf("%d", &seq[i])) {
        i++;
    }
    return i;
}

void shuff(int seq[], int size) {
    srand(time(NULL));
    for(int i = 0; i < size; i++) {
        int j = rand() % (size - i) + i;
        int temp = seq[i];
        seq[i] = seq[j];
        seq[j] = temp;
    }
}

void print(int seq[], int size) {
    for(int i = 0; i < size; i++) {
        printf("%d", seq[i]); 
    }
}

int main() {
    int seq[128];
    int size = read(seq);
    shuff(seq, size);
    print(seq, size);
    return 0;
}

Sorting a Sequence: An Important CS Problem

Problem: Sorting a sequence of elements.
Input: A sequence s = [e1, ..., en] of sortable elements, i.e., elements for which there exists a defined order.
Output: A permutation of s that is increasingly sorted.
There exist many alternative algorithms to solve this problem, with different characteristics:
Efficiency
Memory requirements
Performance on particular data structures
Performance on partially sorted sequences, etc.

Sorting a Sequence with Insertion Sort

Key Idea:
Maintain a sorted prefix of the sequence.
At each step, insert the first element of the unsorted part into its sorted position of the sorted prefix, thus extending the sorted prefix by one.
When the unsorted part becomes empty, the sequence is sorted.

Sorting a Sequence with Insertion Sort

Pseudo-Code:

‍

insertionSort(seq s) {
  for each i in [1..length(s)-1] do {
    j = i
    while (j > 0 and s[j-1] > s[j]) {
      swap s[j] and s[j-1]
      j = j - 1
    }
  }
}

C Implementation:

void insertionSort(int array[], int size) {
    for (int i = 1; i < size; i++) {
        int j = i;
        while (j > 0 && array[j - 1] > array[j]) {
            int aux = array[j];
            array[j] = array[j - 1];
            array[j - 1] = aux;
            j--;
        }
    }
}

Sorting a Sequence with Selection Sort

Key Idea:

Maintain a sorted prefix of the sequence, with all elements smaller than the unsorted part.
At each step, select the minimum of the unsorted part, and exchange it with the first element in the unsorted part, thus extending the sorted prefix by one.
When the unsorted part becomes empty, the sequence is sorted.

Pseudo-Code:

selectionSort(seq s) {
  for each i in [0..length(s)-2] do {
    min_index = i
    for j from i+1 to length(s)-1 do {
      if s[j] < s[min_index] then {
        min_index = j
      }
    }
    swap s[i] and s[min_index]
  }
}

Sorting a Sequence with Selection Sort

C Implementation:

void selectionSort(int array[], int size) {
    for (int i = 0; i < size - 1; i++) {
        int min_index = i;
        for (int j = i + 1; j < size; j++) {
            if (array[j] < array[min_index]) {
                min_index = j;
            }
        }
        int aux = array[i];
        array[i] = array[min_index];
        array[min_index] = aux;
    }
}

Searching in a Sequence

Problem: Given a sequence s and an element e, check if e belongs to s.
Input: A sequence s = [e1, ..., en] of elements and an element x.
Output:
If x belongs to s, the index i such that ei is x.
-1 if x does not belong to s.

Searching in a Sequence: Linear Search

Pseudo-Code:

linearSearch(seq s, elem x) -> int {
  x_index = -1
  i = 0
  while x_index == -1 and i < length(s) {
    if s[i] == x then x_index = i
    i = i + 1
  }
  return x_index
}

Searching in a Sorted Sequence

Problem: Given a sorted sequence s and an element e, check if e belongs to s.
Input: A sorted sequence s = [e1, ..., en] of elements and an element x.
Output:
If x belongs to s, the index i such that ei is x.
-1 if x does not belong to s.
Observation:
Linear search still works!
But can we exploit the fact that s is sorted to speed up the search?

Searching in a Sorted Sequence: Binary Search

Key Idea:

Look up the mid-point of the sequence s.
If the element in the mid-point of s is x, return the mid-point.
If the element is smaller than the mid-point of s, continue searching in the first half.
If the element is greater than the mid-point of s, continue searching in the second half.
Repeat the process until the element is found, or the subsequence to explore is empty.

Pseudo-Code:

binarySearch(seq s, elem x) -> int {
  low = 0
  high = length(s) - 1
  while low <= high {
    mid = (high - low) / 2
    if s[mid] == x then return mid
    else {
      if s[mid] < x then low = mid + 1
      else high = mid - 1
    }
  }
  return -1
}

Merging Sorted Sequences

Problem: Given two sorted sequences s1 and s2, create a sorted sequence s3 that merges s1 and s2.
Input: Sorted sequences s1 and s2.
Output: A sorted sequence s3 that is the permutation of the concatenation s1 + s2.

Merging Sorted Sequences

Naïve algorithm:
Copy both s1 and s2 into s3 in a contiguous way (i.e., s3 is the concatenation of s1 and s2).
Sort s3.
Key question:
Can we take advantage of the fact that s1 and s2 are already sorted to make the process more efficient?

Merging Sorted Sequences: Linear Merge

Key Idea:

Maintain in s3 a sorted merge of two prefixes of s1 and s2.
s1 and s2 have corresponding "non-treated" parts.
At each step:
Compare the first elements of the non-treated parts of s1 and s2.
Extend s3 with the smaller of the two, and reduce the untreated part of the sequence contributing the element to s3.

Pseudo-Code:

merge(seq s1, seq s2) -> seq {
  output = []
  i = 0
  j = 0

  while i < length(s1) and j < length(s2) {
    if s1[i] <= s2[j] then {
      output = output + s1[i]
      i = i + 1
    } else {
      output = output + s2[j]
      j = j + 1
    }
  }

  while i < length(s1) {
    output = output + s1[i]
    i = i + 1
  }

  while j < length(s2) {
    output = output + s2[j]
    j = j + 1
  }

  return output
}

‍