5.19 Lecture 11

The compilation process

image

The hello world example

/* main.c */
#include <unistd.h>

extern char* hello(void);
extern int messageLength(void);

int main(void) {
    write(STDOUT_FILENO, hello(), messageLength());
    return 0;
}

/* hello.c */
#define hi "Hello world\n"
#define hiLength 12

char *hello(void) {
    return hi;
}

int messageLength(void) {
    return hiLength;
}

/* main.i */
/* ... the declarations from <unistd.h> are pasted here by the preprocessor ... */

extern char* hello(void);
extern int messageLength(void);

int main(void) {
  write(1, hello(), messageLength());
  return 0;
}

/* hello.i */
char *hello(void) {
  return "Hello world\n";
}

int messageLength(void) {
  return 12;
}

image

image

Introduction to memory

Memory, a sequence of Bytes

image

  • Memory is byte-addressable.
  • Different data types (like unsigned char, char, int) require different amounts of memory space.
  • The data type determines how the sequence of bytes at a given address is interpreted to represent a value.

image

  • A process's memory is segmented for different purposes: code, initialized data, uninitialized data, heap, stack, and shared libraries/memory.
  • The program's structure on disk (e.g., ELF sections like .text​, .data​, .bss​) directly influences how it's laid out in memory.
  • Statically linked libraries contribute their code and data directly into the main executable's segments.
  • Dynamically linked libraries are loaded into a separate "Shared Memory" region.
  • The heap and stack are dynamic regions that grow in opposite directions to maximize address space utilization.
  • The kernel maintains its own protected memory space.

image

  • Shows FUNCTION A​ calling FUNCTION B​.
  • The stack grows from the "Higher Memory Address" end (the base where the stack starts, like a table surface on which items are piled) towards the "Lower Memory Address" end (a common convention, e.g., on x86 architectures).
  • Two activation records are shown, one for Function A (below) and one for Function B (on top, as it was called by A).

  1. int * ptrToValue = foo(32);

    • foo(32)​ is called. Inside foo​, myLocalVariable​ becomes 32. Since 32 is even, myLocalVariable​ becomes 32 * 2 = 64​.
    • foo​ returns the address of its myLocalVariable​ (which held 64).
    • ptrToValue​ now holds this address.
    • Crucial Point: After foo​ returns, its stack frame (including myLocalVariable​) is deallocated/popped. The memory location ptrToValue​ points to is no longer guaranteed to hold 64; it's considered "garbage" or available for reuse. ptrToValue​ is now a dangling pointer.
  2. printf("Printing foo's local variable: %d\n", *ptrToValue);

    • This attempts to dereference the dangling pointer. The behavior is undefined.
    • It might print 64 by sheer luck, if the memory hasn't been overwritten yet (for example, by the printf call itself).
    • It could print garbage or crash the program.
  3. printf("Printing bar's return value: %d\n", bar(32));

    • bar(32)​ is called. Inside bar​, myLocalVariable​ becomes 32. Since 32 is even, myLocalVariable​ becomes 32 / 2 = 16​.
    • bar​ returns the value 16. This is safe and predictable.
  4. printf("Re-printing foo's local variable: %d\n", *ptrToValue);

    • This again attempts to dereference ptrToValue​.
    • Now, it's highly probable that the memory location ptrToValue​ points to has been overwritten by the execution of bar​ and the previous printf​.
    • The output will almost certainly be garbage or cause a crash. It will not reliably be 64.

image

  • Memory Used: These p1​, p2​, ..., pN​ variables are typically stored on the Stack.
  • Limitations:

    • Highly inflexible: What if you need N+1 products? You'd have to change the code and recompile.
    • Impractical for many items: You can't write code for thousands of individual variables.
    • Stack size limit: The stack has a limited size. Too many large local variables can cause a "stack overflow."

image

  • Memory Used: This is where The Heap comes in. The memory for products​ is allocated from the heap.
  • Advantages:

    • Flexibility: Size is determined at runtime.
    • Handles large data: The heap is generally much larger than the stack.
  • Responsibility: You, the programmer, are responsible for freeing this memory when you're done with it (using delete[] in C++). Failure to do so leads to a "memory leak."

image

  • Memory Used: The Heap. These structures request memory from the heap as they need to grow.
  • Advantages:

    • Even more flexible: Can grow and shrink easily.

image

Heap

  • It's a region of free memory available to the program at runtime.
  • Memory is allocated from here using functions/operators like new​ (in C++).
  • It typically "grows" upwards (towards higher memory addresses in many systems, though the diagram shows it growing downwards conceptually towards the stack). The key is that it's a flexible pool.
  • The programmer must explicitly deallocate (release) memory from the heap using delete​ (in C++).

  • Stack:

    • Used for local variables, function arguments, and return addresses.
    • Memory is allocated and deallocated automatically as functions are called and return (LIFO - Last In, First Out).
    • It typically "grows" downwards (towards lower memory addresses in many systems).
    • The stack and heap grow towards each other. If they collide, your program runs out of memory.
  • Kernel Space: Memory used by the operating system; user programs cannot access it directly.

What is "The Heap" then?

  1. A Region of Memory: The heap is a specific area in your program's address space, separate from the stack, code, and static data.

  2. For Dynamic Allocation: Its primary purpose is to provide memory that can be allocated (and deallocated) by the program while it's running (at runtime), rather than having its size fixed at compile time.

  3. Flexible Sizing: You can request chunks of memory of varying sizes from the heap. This is essential when you don't know how much data you'll need beforehand (like in Options 3 and 4 for the Products Manager).

  4. Controlled Lifetime: Memory allocated on the heap remains allocated until it is explicitly freed by the programmer (or until the program terminates). This is different from stack memory, which is automatically freed when a function returns.

  5. Programmer Responsibility:

    • Allocation: You must ask for memory (e.g., new Product[size]​).
    • Deallocation: You must give it back when you're done (e.g., delete[] products​). Failure to do so causes memory leaks, where your program consumes more and more memory over time. Using memory after it has been freed leads to dangling pointers and undefined behavior (often crashes).
  6. Why Use It?

    • When the amount of memory needed is unknown at compile time.
    • When you need data to persist longer than the function call that created it.
    • For large data structures that might overflow the stack.

Pointers

What are pointers, and what are the related operations?

  • Definition: Variables that hold a memory address as a value.
  • Dereference Operator (*):

    • Allows access to the value stored at the memory address the pointer holds.
  • Address-of Operator (&):

    • Allows you to get the memory address of a variable.
  • Pointer Arithmetic:

    • Operations like increments (++), decrements (--), additions, and subtractions on pointers are "type-aware."
    • They automatically adjust based on the size of the data type being pointed to.
    • This is how array indexing (e.g., array[i]) works without needing to know the element size explicitly, and how pointer++ moves to the next element of the correct type.

    Type-Aware Operations

    • Core Idea: Pointers know the type of data they point to (e.g., int*, char*, MyStruct*).
    • Size Matters: The compiler uses sizeof(pointed-to-type) for calculations.
    • Operations (scaled by sizeof):

      • pointer++ or ++pointer:

        • Moves pointer to the next element.
        • Actual address change: current_address + 1 * sizeof(type).
      • pointer - N:

        • Calculates the address of the N-th element before pointer.
        • Actual address: current_address - N * sizeof(type).
      • pointer2 - pointer1:

        • Result is the number of elements between them (not bytes).
        • Requires pointer1 and pointer2 to be of the same type and to point into the same array.
        • Calculation: (address_in_pointer2 - address_in_pointer1) / sizeof(type).
    • Array Indexing (array[i]):

      • This is a direct application of pointer arithmetic.
      • array[i] is equivalent to *(array + i).
      • array (in this context) decays to a pointer to its first element.
      • The + i part automatically scales by element size.
    • Why "Type-Aware" is Key:

      • Abstraction: Programmer works with "elements," not raw byte offsets.
      • Portability: Code remains correct even if sizeof(type) varies across platforms.
      • Readability: ptr++ intuitively means "next item."
  • Addressable Entities:

    • Almost everything in a program has a memory address, including:

      • String literals
      • Functions
      • Variables
      • Arrays
      • Dynamically allocated memory
      • etc.

An evolution of C

  1. "C++ is a multiparadigm language, as Python."

    • Multiparadigm: This means C++ supports multiple ways of thinking about and structuring programs. Key paradigms in C++ include:

    • Procedural Programming: (Inherited from C) Organizing code into procedures or functions.

    • Object-Oriented Programming (OOP): Organizing code around "objects," which bundle data (attributes) and functions that operate on that data (methods). This involves concepts like classes, inheritance, and polymorphism.
    • Generic Programming: Writing code that can work with different data types without being rewritten for each type (e.g., using templates).
    • Functional Programming (to some extent): Features like lambdas and algorithms that operate on ranges allow for a more functional style.
  2. "Overloading, the idea is simple, for every overloading function create a new symbol/name depending on the types of inputs the function takes."

    • Function Overloading: Allows you to have multiple functions with the same name but different parameter lists (different types of arguments, different number of arguments, or both).
    #include <stdio.h>
    #include <assert.h>
    
    int add() {
        return 0;
    }
    
    int add(int a) {
        return a;
    }
    
    int add(int a, int b) {
        return a + b;
    }
    
    int add(int size, int values[]) {
        assert(size >= 0);
        int result = 0;
        for (int i = 0; i < size; i++) {
            result += values[i];
        }
        return result;
    }
    
    int main() {
        int my_values[] = {1, 2, 3, 4, 5, 6, 7, 8, 9};
        int result = add(add(add(), add(10)), add(9, my_values));
        printf("Result is: %d\n", result);
        return 0;
    }
    
  3. "Operator overloading, this is similar to the previous case, but symbols/names for operators are really hard to find once the program is compiled."

    • Operator Overloading: Allows you to define how standard operators (like +​, -​, *​, <<​, ==​, []​, etc.) behave with objects of your custom classes (structs).
    • Example: You could define the +​ operator for a Vector2D​ class to perform vector addition.
    #include <iostream>
    #include <string>
    
    std::string operator*(const std::string& str, int n) {
        std::string result = "";
        for (int i = 0; i < n; ++i) {
            result += str;
        }
        return result;
    }
    
    int main() {
        std::string text = "Hello";
        std::string multiplied_text = text * 3;
        std::cout << multiplied_text << std::endl;
        return 0;
    }
    

    std:: is a prefix that tells the compiler, "The thing that follows (like cout or string) is part of the C++ Standard Library's namespace." It's crucial for organization and avoiding naming conflicts in C++ programs.

    • "symbols/names for operators are really hard to find": Similar to function overloading, overloaded operators also undergo name mangling. The mangled names for operators can be even more obscure than for regular functions (e.g., operator+ might become something very cryptic). Debuggers often help demangle these names to make them human-readable.

  4. "Semi-Automatic memory management of objects created in the stack, i.e.:"

    • This refers to RAII (Resource Acquisition Is Initialization), a fundamental C++ concept.
    • Objects on the Stack: When you declare an object (an instance of a class or struct) as a local variable within a function, its memory is allocated on the stack.
    void myFunction() {
        MyClass myObject; // Created on the stack
        // ... do stuff with myObject ...
    } // myObject is automatically destroyed here
    

    When myObject goes out of scope (e.g., at the end of myFunction), its destructor is automatically called.

    • "An object can be created as both: a value in the stack, a value in the heap..."

    • Value in the stack (Automatic Storage Duration):

      MyClass obj1; // obj1 lives on the stack. Constructor called.
                    // Destructor automatically called when obj1 goes out of scope.
      
    • Value in the heap (Dynamic Storage Duration):

      MyClass* ptrObj = new MyClass(); // ptrObj (the pointer) lives on the stack.
                                      // The MyClass object itself lives on the heap.
                                      // Constructor called.
      // ... use ptrObj ...
      delete ptrObj; // YOU MUST MANUALLY call delete.
                     // This calls the destructor, then frees memory.
                     // If you forget 'delete', it's a memory leak.
      

      The slide's "the second case creates a pointer to the object and it is necessary to call the free function" is C-style terminology. In C++, you use new and delete (not malloc and free) for class objects, because new/delete handle the constructor/destructor calls.

    • "You still need to implement a destructor for your class (It is the opposite of a constructor)."

    • Destructor: A special method (name is ~ClassName​) called when an object is about to be destroyed. Its primary purpose is to release any resources the object acquired during its lifetime (e.g., free memory allocated on the heap, close files, release network connections).

      #include <iostream>
      
      class Complex {
          private:
              float real;
              float imaginary;
              int id;
      
          //Complex(float real, float imaginary): real(real), imaginary(imaginary) {};
          public:
              Complex(float real, float imaginary) {
                  this->real = real;
                  this->imaginary = imaginary;
                  id = 0;
              }
      
              Complex(const Complex& other) {
                  this->real = other.real;
                  this->imaginary = other.imaginary;
                  this->id = other.id; // copy the id too, otherwise it is left uninitialized
              }
              /*
              This special member function is automatically called 
              when a Complex object is about to be destroyed 
              (e.g., when it goes out of scope, or when delete is called on a pointer 
              to a dynamically allocated Complex object).
              */
              ~Complex() {
                  std::cout << "I'm destroyed! (" << id << ")" << std::endl;
              }
      
              void set_id(int id) {
                  this->id = id;
              }
      
              float get_real() {
                  return real;
              }
      
              float get_imaginary() {
                  return imaginary;
              }
      
              void set_real(float new_real) {
                  real = new_real;
              }
      
              void set_imaginary(float new_imaginary) {
                  imaginary = new_imaginary;
              }
      
              //We use friend so the + operator can access private fields
              friend Complex operator+(Complex const& c1, Complex const& c2);
              friend std::ostream& operator<< (std::ostream& out, const Complex& c);
      
      };
      
      Complex operator+(Complex const& c1, Complex const& c2) {
          return Complex(c1.real + c2.real, c1.imaginary + c2.imaginary);
      }
      
      std::ostream& operator<< (std::ostream& out, const Complex& c) {
          out << c.real << " + " << c.imaginary << "i (" << c.id << ")";
          return out;
      }
      
      int main() {
          Complex c1(1.0f, 2.0f);
          c1.set_id(1);
          Complex * c2 = new Complex(3.0f, 4.0f);//allocate memory on the heap using new
          c2->set_id(2);
          Complex c1PlusC2 = c1 + *(c2);
          c1PlusC2.set_id(3);
          std::cout << "(" << c1 << ") + (" << *(c2) << ") is (" << c1PlusC2 << ")" << std::endl;
          delete(c2);
          return 0;
      }
      
      Output:
      (1 + 2i (1)) + (3 + 4i (2)) is (4 + 6i (3))
      I'm destroyed! (2)
      I'm destroyed! (3)
      I'm destroyed! (1)
      
    • RAII in action: If MyClass internally allocates memory with new in its constructor, its destructor should delete that memory. If you have a MyClass object on the stack, its destructor will be called automatically when it goes out of scope, thus automatically cleaning up the heap memory it manages. This is "semi-automatic" because you write the destructor logic, but its invocation for stack objects is automatic.

We return to "declaration before mention", but we need the definition to create an executable.

#include <cstdio> // printf is declared here, not in <iostream>

extern int add(int a, int b); //declaration

int main() {
  printf("add(%d, %d) is %d\n", 1, 2, add(1, 2));
  return 0;
}
//definition
int add(const int a, int b) {
  return a + b;
}
  • Namespaces: remember scopes? Namespaces are named scopes. They are used to organize code and prevent name collisions in large projects.

    For example, in large projects, different programmers or libraries might accidentally use the same name for different things (e.g., two different List classes). Namespaces solve this by allowing you to qualify names, like MyProject::List and StandardLibrary::List.
  • Classes: we can define classes in C++:

    • We have inheritance, and in fact multiple inheritance. This leads to the diamond problem: we inherit from two classes that have a common ancestor. C++ gives us a way to solve this problem (virtual inheritance).

      Diamond Problem: This is a specific issue with multiple inheritance. If Class D inherits from Class B and Class C, and both B and C inherit from a common Class A, then D might end up with two copies of A's members (one via B, one via C), leading to ambiguity.
    • Access modifiers (by section): these are checked only at compile time; once the program is compiled, nothing enforces them, and we can do whatever we want.

      • public: Members are accessible from anywhere.
      • private: Members are only accessible from within the class itself.
      • protected: Members are accessible from within the class and by its derived classes.
    • We can define virtual functions/methods that can be overridden. Pure virtual methods are the equivalent of abstract methods: derived classes need to implement them.
    • We have not only constructors but also destructors.
  • Classes in C++, when and how to divide declarations and definitions:

  • In general the declaration of a class (name and signatures of methods) should be in a header file (.hpp). And its definition in a C++ file (.cpp).

  • The exception is a generic class, which uses templates: here we must write everything in a header file (.hpp). We will see templates in the next class.
  • We can inherit a class and apply an access restriction on the inheritance.

    When a class Derived​ inherits from a class Base​, you can specify the type of inheritance:

    • class Derived : public Base { ... };​ (Public inheritance)
    • class Derived : protected Base { ... };​ (Protected inheritance)
    • class Derived : private Base { ... };​ (Private inheritance)

image

image

Circle c = Circle(0, 0, 1); creates c on the stack.

Figure* figures[2] = {new Circle(0,0,1), new Rectangle(0,0,1,1)};

  • Figure* figures[2]​: Declares an array named figures​ of size 2. Each element of this array is a pointer to a Figure​ (i.e., Figure*​).
  • new Circle(0,0,1)​:

  • new​: Dynamically allocates memory on the heap for a Circle​ object.


Now let's answer the questions:

1. Why an array of pointers to Figure?

  • Polymorphism: This is the primary reason. It allows you to treat different types of objects (like Circle and Rectangle) uniformly through a common base class interface (Figure*). You can iterate through the array and call draw() on each element, and the correct draw() method for the specific shape will be executed.
  • Heterogeneous Collection: You can store pointers to different derived types (which are all "Figures") in the same collection. If you tried to create an array of Figure objects directly (e.g., Figure figures[2];), you would encounter "object slicing." When you assign a Circle or Rectangle to a Figure object, only the Figure part of the derived object is copied, and all derived-specific information (and behavior) is lost. Pointers (or references) avoid this.
  • Dynamic Allocation: The objects are created with new, meaning they reside on the heap. Pointers are used to manage heap-allocated memory.

2. We cannot define a variable of class Figure. Why?

This strongly suggests that Figure is an abstract class.

  • An abstract class is a class that cannot be instantiated directly (you can't create objects of its type).
  • A class becomes abstract if it has at least one pure virtual function. A pure virtual function is declared like this in the base class: virtual void draw() = 0;
  • The = 0 signifies that the base class provides no implementation for this function and expects derived classes to provide their own. This makes Figure an interface or a contract.

3. Which draw is being called in line 6?

Because draw() is virtual, the call is dispatched on the dynamic type of the object the pointer refers to, not on the pointer's static type Figure*. Therefore, the Rectangle::draw() method is called.

4. What happens with c and r?

They are allocated on the stack. Their memory is automatically reclaimed.

Namespace example

#include <cstdio> // printf is declared here, not in <iostream>

namespace my_namespace {
    int variable_a = 2;
}

namespace my_other_namespace {
    int variable_a = 3;
}

int main() {
    printf("variable_a from my_namespace: %d\n", my_namespace::variable_a);
    printf("variable_a from my_other_namespace: %d\n", my_other_namespace::variable_a);
    return 0;
}
---
#include <cstdio>
#include <iostream>
#include <string>

int main() {
    std::string my_string = "Hello";
    printf("%s\n", my_string.c_str());
    std::cout << my_string << std::endl;
    return 0;
}
---
#include <cstdio>
#include <iostream>
#include <string>

using namespace std;

int main() {
    string my_string = "Hello";
    printf("%s\n", my_string.c_str());
    cout << my_string << endl;
    return 0;
}

Abstract class

figure.hpp
#include <iostream>

/*
    Declarations of classes:
    1. Figure, an abstract class representing a 2D Figure with a center; a drawing function and a function to calculate its area
    2. Circle, Rectangle, and Triangle, specific 2D Figures
    3. Point, a Point in a 2D space, this is a Circle with radius 0
*/

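class Figure {
    public:
        // Virtual destructor, so that delete through a Figure* destroys the derived object
        virtual ~Figure() {}
        // Pure virtual methods. Subclasses must implement them
        virtual void draw() = 0;
        virtual float area() = 0;
    protected:
        int center_x;
        int center_y;
};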

class Circle : public Figure {
    public:
        Circle(int x, int y, float radius);
        void draw();
        float area();
    protected:
        float radius;
};
---
circle.cpp
#include <iostream>
#include "figure.hpp" // quotes: a project header, not a system header

# define PI 3.14159265358979323846

// Circle implementation
Circle::Circle(int x, int y, float r) : radius(r) {
    center_x = x;
    center_y = y;
}

void Circle::draw() {
    std::cout << "Circle@(" 
        << center_x << ", " 
        << center_y << ") with radius " 
        << radius;
}

float Circle::area() {
    return PI * radius * radius;
}

  1. C++ doesn't have a built-in clone()​ method like some other languages (e.g., Java) to duplicate objects. Instead, it uses a special constructor called the copy constructor.

  2. "(this uses pass by reference, the reason is passing by value to the copy constructor will result in a circular requirement)": This is crucial. A copy constructor typically takes its argument as a constant reference (e.g., MyClass(const MyClass& other)).

    • Why by reference? If it took the argument by value (e.g., MyClass(MyClass other)), then to call the copy constructor, you'd first need to make a copy of the argument to pass it by value. This would require calling the copy constructor again, leading to infinite recursion (a "circular requirement"). So, it must be by reference; compilers reject the by-value form.
  3. The assignment operator (=) for objects isn't as straightforward as for simple data types (like integers).

    "In C++ the assignment operator for objects will make a shallow copy... this means it will copy the value of each field from one object to another."

    • "This can lead to issues, like double free." If an object holds a pointer to heap memory, a shallow copy leaves both objects pointing to the same memory. If both objects' destructors then try to delete (free) that shared memory, the program will crash or have undefined behavior (a "double free" error).
    • Another issue is a dangling pointer: if one object is destroyed and frees the memory, the other object's pointer now points to invalid memory.

Example of deleting an array

#include <iostream>

using namespace std;

struct my_type {
  ~my_type() { cout << "destructor" << endl; };
};

int main() {
  my_type v[5];
  my_type* array = new my_type[4];
  delete[] array;
  cout << "delete finished" << endl;
  return 0;
}

The array v lives on the stack, so it is destroyed automatically when main returns (after return 0), calling the destructor 5 times.

The heap array (array) has its destructor called 4 times when delete[] runs.

output:

destructor
destructor
destructor
destructor
delete finished
destructor
destructor
destructor
destructor
destructor

Strings and Strings

  • In C++ we have two kinds of strings: the usual char pointer or char array ending with a 0 (a C string), and the string class (std::string), which also represents a string.
  • The string class represents string objects that are mutable: you can change a string in C++!
  • The char pointers or char arrays ending with a 0 are the usual C strings. These are only mutable if they are stored on the stack or the heap. String literals, e.g. "Hello World", are placed in a read-only section of the executable, so they are not mutable.
  • The << operator is overloaded to work with both C strings and string objects, but printf or puts (or similar functions) only work with C strings; to use them with a string object, call its c_str() function.
#include <cstdio>
#include <iostream>
#include <string>

using namespace std;

int main() {
    string s1 = "Hello world!";
    cout << s1 << endl;
    const char * s2 = "Goodbye!";
    cout << s2 << endl;
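    // Wrong on purpose: %s expects a char*, so passing a string object is
    // undefined behavior (some compilers reject this line outright)
    printf("My first string is %s\n", s1);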
    printf("My second string is %s\n", s2);
    printf("The correct way to print my first string is %s\n", s1.c_str());
    return 0;
}
/*
output
Hello world!
Goodbye!
My first string is (garbage characters; the exact output is unpredictable)
My second string is Goodbye!
The correct way to print my first string is Hello world!
*/

Example of copy constructor

#include <iostream>

class Point {
private:
    int x, y;

public:
    // (1) Constructor
    Point(int x_val, int y_val)
        : x(x_val), y(y_val)
    {}

    // (2) Copy constructor
    Point(const Point& other)
        : x(other.x), y(other.y)
    {
        std::cout << "Copy constructor called\n";
    }

    // Accessors for demonstration
    int getX() const { return x; }
    int getY() const { return y; }
};

int main() {
    Point p1(3, 4);
    std::cout << "p1: (" << p1.getX() << ", " << p1.getY() << ")\n";

    // This invokes the copy constructor
    Point p2 = p1;
    std::cout << "p2 (after copy): (" << p2.getX() << ", " << p2.getY() << ")\n";

    return 0;
}