Programming Languages and Compilers

About Me

Professor Craton

Anything you want to know?

Introductions

  • Name
  • What are you hoping to learn from this class?

Syllabus

Link

Quizzes

  • Given regularly at beginning and end of class
  • Should be easy for you
  • Allow me to confirm that everyone is keeping up with course material

Course Overview

Introduction

  • 1.1 - 1.3 Why programming languages?
  • 1.4 - 1.6 Compilation vs Interpretation

Foundations

  • 2.1 - 2.2 Lexing (aka Tokenizing, Scanning)
  • 2.3 - 2.5 Parsing - ASTs, Language Hierarchy
  • 3.1 - 3.2 Binding and Lifetime
  • 3.3 - 3.6 Scope and Closures
  • 5 - Target Machine Architecture

Language Design

  • 6 - Control Flow
  • 7 - Type systems
  • 8 - Composite Types
  • 9 - Subroutines and Control Abstraction

Programming Paradigms

  • 10 - Object Oriented Programming (Python)
  • 11 - Functional Programming (JS)
  • 12 - Declarative Programming (SQL)

Course Software

  • You will need access to an x64 Debian-based Linux system for several of the assignments in this class
  • This can be installed directly on most laptops, or you can run it in a VM
  • We’ll cover VM setup in the first lab

Introduction

What is a computer?

A bunch of rocks

History of Programming

  • 25th Century BC - Abacus (Babylon)
  • 5th Century BC - Pāṇinian grammar (India)
  • 1st Century BC - Antikythera mechanism (Greece)
  • 12th Century - Castle Clock (Arab World)
First computer program? (Ada Lovelace, 1840)

In Babbage’s world his engines were bound by number…what Ada Byron saw—was that number could represent entities other than quantity. So once you had a machine for manipulating numbers, if those numbers represented other things, letters, musical notes, then the machine could manipulate symbols of which number was one instance

A Bombe used for Enigma decryption

“Programming” scene from The Imitation Game

ENIAC - first general purpose computer
Punch card with Fortran statement

Why Languages?

“Hello World” binary

Editing Binaries Manually

objdump -d hello
hello:     file format elf64-x86-64

Disassembly of section .text:

00000000004000d4 <.text>:
  4000d4:   48 c7 c0 01 00 00 00    mov    $0x1,%rax
  4000db:   48 c7 c7 01 00 00 00    mov    $0x1,%rdi
  4000e2:   48 c7 c6 02 01 40 00    mov    $0x400102,%rsi
  4000e9:   48 c7 c2 0e 00 00 00    mov    $0xe,%rdx
  4000f0:   0f 05                   syscall
  4000f2:   48 c7 c0 3c 00 00 00    mov    $0x3c,%rax
  4000f9:   48 c7 c7 00 00 00 00    mov    $0x0,%rdi
  400100:   0f 05                   syscall

mov $0x1, %rax

48 c7 c0 01 00 00 00

Memory hierarchy

16 x64 Registers

  • RAX
  • RBX
  • RCX
  • RDX
  • RSI
  • RDI
  • RSP - Stack Pointer
  • RBP - Base Pointer
  • R8 - R15

System Calls

  • Provide a way to call a function in kernel space
  • Examples
    • read
    • write
    • exit

Example of syscall

# exit(0)
mov $60,%rax # exit
mov $0,%rdi # exit code 0
syscall

Compilation and Interpretation 1.4

Compilation

Code is converted to machine code or another executable form

Three stage compiler architecture

Interpretation

  • Code is executed by another program
  • Python REPL example
  • JS browser REPL example

Compilation and Interpretation

  • JITs (e.g. JS)
  • Compiled languages with runtimes (e.g. Go)
  • VM bytecode (e.g. Java)

C Compilation Steps

  1. Preprocessing
  2. Compilation
  3. Assembly
  4. Linking

Viewing Intermediate Output

gcc hello.c -save-temps -o hello
  1. Preprocessor - hello.i
  2. Compilation - hello.s
  3. Assembly - hello.o
  4. Linking - hello

Preprocessor Output

hello.i is plain text C code and can be inspected.

Compiler Output

hello.s is plain text Assembly and can be inspected

Assembler Output

  • hello.o is x64 machine code
  • It can be viewed in a hex editor e.g. hexedit hello.o
  • We can also view it’s content using objdump -d hello.o

Linker Output

  • hello is x64 machine code
  • It can be viewed in a hex editor e.g. hexedit hello
  • We can also view it’s content using objdump -d hello

make

make is a build automation tool

makefiles

Contain a set of directives that instruct make how to create a target.

Example makefile

all: double.txt

double.txt: original.txt
  cat $< $< > $@

Lab 1 Solution

Lexing 2.2

Definition

In computer science, lexical analysis, lexing or tokenization is the process of converting a sequence of characters (such as in a computer program or web page) into a sequence of tokens

Tokens

Commonly stored as (name, value) tuples. Example:

my_variable = 17

Becomes:

(identifier, "my_variable")
(operator, "=")
(integer, 17)

Calculator Example

Supported operations

  • 1 + 2
  • 4 - 2
  • 2 * 3
  • 8 / 4

Goal

  • Scan a string such as “1 + 2 * 3” and create a list of (type, value) tuples:
(number, 1)
(operator, +)
(number, 2)
(operator, *)
(number, 3)

Chomsky Hierarchy

Grammar Automaton (Computer)

  1. Unrestricted Turing Machines
  2. Context Free Pushdown Automata
  3. Regular Finite State Automata

Regular Languages

  • Are insufficient to parse most programming languages
  • Are useful for parsing tokens
  • Can be processed by a simple finite automaton

Deterministic Finite Automaton (DFA)

  • Finite set of states \(Q\)
  • Finite set of input symbols called the alphabet \(\Sigma\)
  • Transition function \(\delta : Q \times \Sigma \rightarrow Q\)
  • Initial or start state \(q_0 \in Q\)
  • Set of accept states \(F \subseteq Q\)

Drawing DFAs

  • States are nodes on the graph
  • Start state indicated by arrow
  • Accept states indicated by double border
  • Transitions indicated as labeled arrows
DFA to accept string containing an even number of zeroes

Regular Expressions

A string used as a search pattern for a member of a regular language.

RE Basics

  • Characters are generally matched literally.
import re

s = "the quick brown fox jumped over the lazy dog"
re.findall("dog", s)
['dog']

RE Boolean Or

  • Pipes (|) can be used for boolean or
import re

s = "the quick brown fox jumped over the lazy dog"
re.findall("dog|fox", s)
['fox', 'dog']

RE Quantifiers

  • ? - zero or one occurrences of preceding element
  • * - zero or more occurrences of preceding element
  • + - one or more occurrences of preceding element
  • {n} - exactly n occurrences of preceding element
import re

s = "the quick brown fox jumped over the lazy dog"
re.findall("ov?", s)
['o', 'o', 'ov', 'o']

RE Grouping

  • Parens can be used for grouping
import re

s = "the quick brown fox jumped over the lazy dog"
re.findall("(d|f)o(g|x)", s)
[('fox', 'f', 'x'), ('dog', 'd', 'g')]

RE Bracket Expressions

  • Brackets [] may be used to match a single character against a set of characters
import re

s = "the quick brown fox jumped over the lazy dog"
re.findall("[df]o[gx]", s)
['fox', 'dog']

RE Character Classes

Several special character classes are provided:

  • \w - alphanumeric characters
  • \d - digits
  • \s - whitespace characters
  • . - anything
import re

s = "the quick brown fox jumped over the lazy dog"
re.findall("\s...\s", s)
[' fox ', ' the ']

DFA RE Equivalence

  • A DFA can be created to match any regular expression
  • A regular expression can be created to match a language accepted by any DFA

Match Even Number of Zeroes

import re

inputs = [
    "", "0", "00", "11", "1", "010",
    "110110110", "00100", "1000000",
]

for text in inputs:
    print(f"{text:<9}",
        re.fullmatch(r"1*(01*01*)*", text) != None)

Simple Lexer in Python

import sys
import re

def lex_one(value):
    if re.match('\d+', value):
        return ('number', value)
    elif re.match('[\+\-\*\/]', value):
        return ('operator', value)
    else:
        raise ValueError(f'Invalid token: {value}')

def lex(text):
    return [lex_one(v) for v in text.split()]

Lexing lab questions?

Parsing 2.3

Parsing or syntax analysis is the process of analyzing a string of symbols conforming to the rules of a formal grammar.

Calculator Example

Supported operations

  • 1 + 2
  • 4 - 2
  • 2 * 3
  • 8 / 4

Goal

  • Scan a string such as “1 + 2 * 3” and determine if it is a valid operation, and order it so it can be executed.

Chomsky Hierarchy

Grammar Automaton (Computer)

Unrestricted Turing Machines Context Free Pushdown Automata Regular Finite State Automata

Context-free Languages

Big Idea

  • Grammar rules are applied recursively to generate members of the language.
  • Grammar rules can be applied in reverse to parse a string to determine language membership.
  • The parsing process can result in a parse tree showing token structure.

English Example

  • S -> NP VP
  • NP -> Adj Noun
  • NP -> Noun
  • VP -> Verb Adv
  • VP -> Verb
Parse tree

Implementing a parser

Parse tree example

Top-down parsing

  • Begins with overall structure (sometimes guessed or assumed) and then determines details
  • Top-down refers to order in which nodes in the final parse tree are determined
Top-down parse

Bottom-up parsing

  • Determines low-level details first then builds our surrounding structure.
Bottom-up parse

Shift-reduce parsing

  • Bottom-up
  • Shifts tokens onto a stack
  • Reduces them when they match against a rule

LR Parsing

  • Shift-reduce, bottom-up parse
  • Left-to-right, Rightmost derivation first
  • Deterministic algorithm
  • Linear time performance

Calculator Grammar

  • expr -> number
  • expr -> expr operator expr

Shift-Reduce Trace

Parsing “1 + 2”

Shift:  1
Reduce: (expr 1)
Shift:  +
Shift:  2
Reduce: (expr 2)
Reduce: (expr (expr 1) + (expr 2))

Result: (expr (expr 1) + (expr 2))

Binding

Section 3.1

Names

  • A mnemonic character string used to represent something else
  • Can refer to variables, constants, types, functions, etc
  • Allow programmers to abstract complexity (e.g. memory addresses)

Abstraction

  • Hiding details to create clearer interfaces

Control Abstractions

  • Hide complex executable code
  • Functions, subroutines, and certain macros are examples

Data Abstractions

  • Hide data representation details behind simpler operations
  • Examples are classes and structs

Binding Time

  • A binding is an association between two things
  • Binding time refers to the point when the association is created

Possible Binding Times

  • Language design time
  • Program writing time
  • Compile time
  • Link time
  • Run time

Performance

  • As a general rule, making decisions earlier will make final programs more efficient
  • Earlier binding times generally improve efficiency

Run Time

  • Bindings are not finalized by the time the program begins running
  • Bindings may be altered at runtime

Static and Dynamic Binding

  • Static - any binding determined before runtime
  • Dynamic - any binding determined during runtime

Object Lifetime and Storage

Section 3.2

Key Events

  • Creation and destruction of objects
  • Creation and destruction of bindings
  • Deactivation and reactivation of bindings

Lifetime

  • The period between creation and destruction
  • Object lifetime and binding lifetime may or may not coincide

References

  • Values passed by reference typically have a longer lifetime than their binding
  • Dangling references can occur when a reference outlives the object it points to

Storage Allocation Mechanisms

  • Static - Absolute addresses and exist throughout execution
  • Stack - Allocated and deallocated in LIFO order with matching function calls
  • Heap - Allocated and deallocated at arbitrary times

Static Allocation

  • Global variables
  • Machine code
  • Variables declared static
  • Can be used for local variables in certain languages (e.g. early Fortran without recursion)

Stack Allocation

  • Must be used for local variables to support recursion
  • Function instances get their own stack frame for local variables and other data
#include <stdio.h>

int factorial(int n) {
  if (n == 1) return 1;

  return n * factorial(n - 1);
}

int main(void) {
  for (int i = 1; i < 11; i++) {
    printf("%d! = %d\n", i, factorial(i));
  }
}

Calling Sequence

  • Prolog - Executed initially to prepare the stack
  • Epilog - Executed to put state back as it was before the call

Accessing Stack Variables

  • Addresses are not known at compile time
  • A frame pointer is created to point to the current stack frame
  • Local variable references are offsets to this pointer
Upward-growing call stack

Heap Allocation

  • Required for dynamically allocated data structures
    • Linked data structures (linked lists, trees, graphs, etc)
    • Objects of arbitrary size (documents, images, etc)
  • Heap management is a complex topic

Language Differences

  • Allocation is typically explicit
  • Deallocation is explicit in some language (C,Rust,etc)
  • Some languages specify that deallocation should happen automatically at runtime using garbage collection

Garbage Collection

  • Automatically cleans up objects in memory when they are no longer needed
  • Prevents many common classes of memory bugs
  • Will have some runtime performance cost

Issues Mitigated by Garbage Collection

  • Use after free (dangling references box example)
  • Memory leaks
  • Hard to debug as the problem does not trigger an exception at the root cause, but causes problems later

Heap Allocation in C

  • malloc returns a pointer to allocated heap memory
  • free frees the memory

Example

char* ptr;

// Allocate 100 bytes
ptr = (char*) malloc(100 * sizeof(char));

// Do some things using the memory
// ...

// Free the memory
free(ptr);

Exam in lab, Thursday, March 5th

Scope and Closures

Scope

(Section 3.3)

The textual region of a program where a binding is active

Scope

Can also be used to refer to a region of a program where no bindings are changing

  • Blocks e.g. { … }
  • Functions
  • Modules

Referencing Environment

The set of active bindings

Inspecting the Referencing Environment

import inspect

def main():
  a = 1

  def sub(b):
    c = 3
    print('Locals:', inspect.currentframe().f_locals)
    print('Parent:', inspect.currentframe().f_back.f_locals)

  sub(2)

main()
Locals: {'c': 3, 'b': 2}
Parent: {'sub': <function main.<locals>.sub at 0x7f89903ce268>, 'a': 1}

Static (Lexical) Scope

Binding can be determined at compile time

Early Basic

  • Global-only scope
  • No variable declarations
  • Variable names could be only 2 characters (letter + digit)
10 FOR X = 1 TO 10
20 PRINT "HELLO, WORLD"
30 NEXT X

Fortran 77

  • Global and subroutine scope
  • Subroutines can’t be nested
  • Variable declarations are optional (assumed to be integer if name begins with I-N or real otherwise)

static variables

  • Local variable with lifetime covering entire program execution
  • aka save in Fortran, own in Algol
/* `static` example in C */

#include <stdio.h>

void count() {
  int value = 0; // Try with `static`
  printf("%d\n", value++);
}

int main(void) {
  for (int i = 0; i<10; i++) { count(); }
}

Nested Subroutines

  • Introduced in Algol 60
  • Allow scopes to nest

Closest Nested Scope Rule

  • A name is known in its scope and in nested scopes unless it is hidden by another name in a nested scope
#include <stdio.h>

int main(void) {
  char* level = "Grandparent";
  {
    printf("%s\n", level);
    {
      char* level = "Local";

      printf("%s\n", level);
    }
  }
}

Built-ins

  • Many languages define built-ins at an outermost scope level
  • These can be overwritten in certain languages
# Don't do this...
def len(str):
  return 42

print(len("Hello, World"))

Declaration Order

  • Some languages require the definition exist prior to it being accessed in a block
  • Some languages require all definitions to be present at the start of the block
#include <stdio.h>

int main(void) {
  printf("%d\n", i); // Error

  int i;
}

Section 3.5

Overloading

Some names can refer to multiple objects at a given time

Polymorphism

Allows a subroutine to perform differently based on the types involved

#include <stdio.h>

void print(int val) {
  printf("%d\n", val);
}

void print(char* val) {
  printf("%s\n", val);
}

int main() {
  print((char*)"Hello, world!");
  print((int)10);
}

Modules

  • Provide a way to bundle object such as subroutines, variables, and classes together
  • May not be available unless explicitly exported from one module and imported into another

Python Module Example

Section 3.6

First-class Functions

  • Functions themselves can be used just like any other value
  • They can be passed as a parameter, returned from a function, etc
def call_twice(fn):
  fn()
  fn()

def say_hello():
  print("Hello, world!")

call_twice(say_hello)

Lambda Expressions

  • Also called anonymous functions, as they may not have a name
nums = [1,2,3,4,5,6,7,8,9,10]

squares = map(lambda n: n*n, nums)

print(list(squares))

Shallow and Deep Binding

  • Shallow binding - A referencing environment is bound when the function is finally called
  • Deep binding - A referencing environment is bound when the reference is created

Closures

  • A function and its referencing environment created when a reference to a function is created
def pair(first, second):
  return lambda idx: second if idx else first

p = pair(7,32)

print(p(0))
print(p(1))

import inspect
print(inspect.getclosurevars(p))

Control Flow

Chapter 6

Control Flow

Dictates the order of program execution

Unstructured Flow

  • Control flow is achieved using low-level constructs such as conditional branches and gotos
  • Common in assembly languages and some other early languages
if (A .lt. B) goto 10
...

10 ...

Structured Flow

  • Gotos begin to be considered evil in the 60s
  • We seek better tools to solve control flow problems
  • Examples include if, then, else and for

Multi-level Returns

  • One “advantage” of goto is that it can jump anywhere in a program, not simply return to the caller
  • Sometimes we want to jump outside of our immediate context, and multilevel returns can allow this

Exceptions

  • Exception handling is one example of a non-local return
  • Execution moves from the point where the exception was raised to the point where the exception is handled
def assert_positive(num):
  if num <= 0:
    raise ValueError

def assert_all_positive(nums):
  for n in nums:
    assert_positive(n)

try:
  assert_all_positive([1,2,3,-1])
except ValueError:
  print('All numbers are not positive')

Sequencing

  • Core to imperative programming
  • Determines order of side effects such as assignment

Compound Statement

  • An ordered list of statements that can be used where statements are expected
  • Sometimes called blocks

Selection

  • Provide a way to select one set of statements based on a condition
  • e.g. if or switch

Short-circuit Evaluation

  • Most modern languages will skip evaluating unneeded arguments in boolean expressions
  • This creates more efficient programs
  • This can be used for control flow
function check_value(obj) {
  if (obj && obj.val) {
    // Confirms that obj is defined
    console.log('Value is set')
  } else {
    console.log('Value is unset')
  }
}

check_value()
check_value({ val: 1 })

Switch Statements

  • More efficient in hardware than nested conditionals
  • Creates a jump table that transfers execution in a single instruction
#include <stdio.h>

void count_down(int n) {
  switch (n) {
    case 3:
      printf("Three\n");
    case 2:
      printf("Two\n");
    case 1:
      printf("One\n");
  }
}

int main(void) {
  count_down(3);
}

Iteration

  • Allows computers to perform the same task repeatedly
  • Makes computers useful for more than fixed-size tasks
  • Without some mechanism of iteration or recursion, runtime is linearly coupled to program size

Logically Controlled Loops

  • Run a block of code multiple times
  • A condition may be used to stop execution
  • May be pre-test, post-test, or midtest
  • Post-test is most common
#include <stdio.h>

int main(void) {
  printf("Press 'q' to quit\n");
  while (getchar() != 'q') {
    printf("Not quitting\n");
  }
  printf("Quitting\n");
}

Enumeration-Controlled Loops

  • Executed once for every value in a finite set
  • e.g. for loop
do i = 1, 10, 2
  ...
for (int i = 1; i < 10; i+=2) {
  ...
for i in range(1, 10, 2):
  ...

For loop code generation

mov first, r1
mov step, r2
mov last, r3
body:
  # Loop body
  # Increment loop counter
  add r1, r2
check:
  # Check loop condition
  cmp r1, r3
  jl body

Semantic Complications

  • Many languages e.g. C allow complex expressions in for loop control conditions
  • Machine code for for loops must often be more complex, as loop count can’t be easily predicted

Recursion

  • We can use nested function calls to perform tasks repeatedly
  • This can be more readable than iterative solutions
  • This can be less performant than iterative solutions
def prime_factors(num):
  for i in range(2, num + 1):
    if not num % i:
      return [i] + prime_factors(int(num/i))

  return []

print(prime_factors(780))

Recursive optimizations

  • Some compilers will optimize recursion into iteration removing the need for new stack frames
  • In this case, recursion is very efficient
  • Tail recursion makes optimization simpler
def factorial(n):
  if n == 1: return n

  return n * factorial(n-1)

print(factorial(5))

Tail-recursive version:

def factorial(n, accum=1):
  if n == 1: return accum

  return factorial(n-1, n*accum)

print(factorial(5))

Tail Recursion

  • No computation takes place after the recursive call
  • Can be computed using constant space
  • Can be optimized to simple iteration
  • More info

Iterators

  • Provide a construct to yield successive items
  • Also called enumerators
for i in range(10):
  print(i)
for i in [1,2,3]:
  print(i)
for character in "Hello, World!":
  print(character)

“True” Iterators

  • Many languages provide a way to iterate over objects
  • A true iterator provides a way to iterate without storing intermediate data structures
# Never creates a list of items in memory
for i in range(10):
  print(i)

# Creates a list of items in memory, then iterates over them
for i in [0,1,2,3,4,5,6,7,8,9]:
  print(i)

Python Iterator

  • Includes an __iter__ method that returns an iterator
  • Includes a __next__ method that returns the next item

Duck Typing

  • We can identify the type of an object by its properties
  • If it walks like a duck and quacks like a duck, it is a duck
  • We can implement and iterator without inheriting from an iterator class
def range(start, stop=None, step=1):
    if stop == None:
        stop = start
        start = 0

    current = start

    results = []

    while current < stop:
        results.append(current)
        current += step

    return results

Generators

  • Provide a generalized construct for creating iterators
  • We can yield successive values
def range(start, stop=None, step=1):
    if stop == None:
        stop = start
        start = 0
    
    current = start
    
    while current < stop:
        yield current
        current += step
def squares():
  i = 1
  while(True):
    yield i*i
    i += 1

results = squares()
for i in range(20):
  print(next(results))
def primes():
  current = 2

  while True:
    if is_prime(current):
      yield current
    current += 1

Type Systems

Chapters 7 & 8

Purpose

  1. Provide implicit context for operations
  2. Limit the set of valid operations
  3. Explicit types can make code easier to read
  4. Types known at compile time can drive performance optimizations

Constructs with Types

  • Anything with a value
    • Constants
    • Variables
    • Parameters
    • Literals

A type system

  1. A mechanism to define types
  2. Set of rules for
    • Type equivalence
    • Type compatibility
    • Type inference

Type Checking

  • The process of ensuring that language type rules are followed
  • Violations are type clashes

Strong Typing

Strictly enforces valid types and operations

Static Typing

Type rules are enforced at compile time

Example Languages

  • Javascript - weak/dynamic
  • Python - strong/dynamic
  • C - weak/static (types can be coerced, pointer casting, bounds checking, etc)
  • Java - strong/static

JavaScript

  • Weakly typed
  • Dynamically typed
// None of these are runtime errors
"Six" * 6 // NaN
true * 6 // 6
[1,2] * 6 // NaN
(() => 0) * 6 // NaN
// Results can be different than we might expect
1 + 2 + 3 // 6
1.0 + 2 + 3 // 6
1 + '2' + 3 // '123'
1 + 2 + '3' // '33'
1 - '2' + 3 // 2
1 + '2' - 3 // 9
1 + true // 2

Python

  • Strongly typed
  • Dynamically typed
# We don't need to declare types explicitly
name = "Prof Craton" # Name is a string
# We will get errors for invalid types
score = name + 1 # TypeError

C

  • Weakly typed
  • Statically typed
// This program will not compile
#include <stdio.h>

int main(void) {
  int num = 2; // int type is required

  printf(num); // type error (needs format string)
}
// This program will likely crash
#include <stdio.h>

int main(void) {
  long int num = 2; // int type is required

  printf((char*)num); // type error (needs format string)
}

Polymorphism

  • We can design code to work with different types
  • This can be achieved in C++ using generics (templates)
#include <iostream>
#include <cstring>

template <typename Type>
Type get_min(Type a, Type b) {
  return a < b ? a : b;
}

template <>
const char* get_min<const char*>(const char* a, const char* b) {
  return std::strcmp(a, b) < 0 ? a : b;
}

int main() {
  std::cout << get_min(2, 1) << std::endl;
  std::cout << get_min("hello", "world") << std::endl;
}

Null

  • We often want a way to specify something other than the regular return value
  • e.g. popping from and empty stack, dividing by zero, etc
  • null is one option, but it can be used differently in different contexts

Rust Option

Type Option represents an optional value: every Option is either Some and contains a value, or None, and does not. Option types are very common in Rust code

docs

fn divide(numerator: f64, denominator: f64) -> Option<f64> {
    if denominator == 0.0 {
        None
    } else {
        Some(numerator / denominator)
    }
}

fn main() {
  match divide(2.0, 1.0) {
      Some(x) => println!("Result: {}", x),
      None    => println!("Cannot divide by 0"),
  }
}

Classification of Types

  • Booleans
  • Numerics
  • Enums
  • Subrange
  • Composite

Enums

Introduced in Pascal:

type weekday = (sun, mon, tue, wed, thu, fri, sat);

Subrange

Also introduced in Pascal:

type score = 0..100;

Composite Types

  • Records (structs) - introduced by COBOL
  • Arrays
  • Sets
  • Lists
  • Files

Composite Types

Records (structs)

  • Allows different types to be stored and accessed together
  • Individual fields are typically accessed using . notation.

Struct Packing

On most systems, the layout of items in a struct can change the storage space requirements due to alignment constraints.

#include <stdio.h>

int main(void) {
  typedef struct {
    char id;
    int hp;
    char weapon;
  } actor1;

  typedef struct {
    char id;
    char weapon;
    int hp;
  } actor2;

  printf("%lu %lu\n", sizeof(actor1), sizeof(actor2));
}

Arrays

  • Used to group a number of items of the same type

Strings

  • Most typically an array of characters

Sets

  • Store an arbitrary number of distinct values of the same type
print({1,2,3}) # {1, 2, 3}
print({1,2,3,3,2,1}) # {1, 2, 3}

Lists

  • Hold a number of elements of the same type
  • Typically differ from arrays in their allocation mechanism
  • Lists will scale dynamically while arrays are fixed

Have you ever needed to write code to remove duplicate elements from a list?

The difference between a bad programmer and a good one is whether they consider their code or their data structures more important. Bad programmers worry about the code. Good programmers worry about data structures and their relationships.

  • Linus Torvalds

Homogeneous types

  • Languages may differ in whether they allow different types to be used in certain composite types (e.g. lists and sets)
  • ML and Haskell are homogeneous, many others are heterogeneous

Type Checking

  • Type equivalence - Types are the same
  • Type compatibility - Types can be used interchangeably
  • Type Inference - Determining types automatically

Type Equivalence

  • Structural equivalence - based on content
  • Name equivalence - based on lexical occurrence of type definitions
#include <stdio.h>

typedef struct {
  int inner;
} ContainerA;

typedef struct {
  int inner;
} ContainerB;

void increment(ContainerB* container) {
  container->inner += 1;
}

int main() {
  ContainerA container = {0};
  increment(&container);

  printf("Number %d\n", container.inner);
}

Type Compatibility

  • Types don’t need to always be fully equivalent to be used
  • Coercion is used to convert from one type to another
print(1 + 2) # 3
print(1 + 2.0) # 3.0
print(1 / 2) # 0.5 in Python 3, 0 in Python 2

Universal Reference Types

  • Most languages provide a universal reference that can point to any object
  • Examples in include void in C, Object in Java, and object in C#

Let’s revisit our previous C container example to use void.

#include <stdio.h>

typedef struct {
  int inner;
} ContainerA;

typedef struct {
  int inner;
} ContainerB;

// Use void to accept any pointer
void increment(void* container) {
  // Cast pointer to ContainerB type
  ((ContainerB*)container)->inner += 1;
}

int main() {
  ContainerA container = {0};
  increment(&container);

  printf("Number %d\n", container.inner);
}

Type Inference

  • We can determine types without specifying them
1 + 2 // int
1.0 + 2.0 // float

C++ provides the auto keyword to infer types

#include <iostream>

int main() {
  // Type automatically determined
  auto pi = 3.14;

  std::cout << pi << "\n";
}

Some languages are statically typed while providing type inference such that explicit types are rarely required. Examples are ML, Haskell, and Rust.

fn main() {
    let elem = 5u8;
    // The compiler knows that `elem` has type u8

    let mut vec = Vec::new();
    // The compiler doesn't yet know the exact type of `vec`
    // It just knows that it's a vector (`Vec<_>`)

    vec.push(elem);
    // Aha! Now the compiler knows that
    // `vec` is a vector of `u8`s (`Vec<u8>`)

    println!("{:?}", vec);
}

Equality Testing

  • How do we know if you two values are equal?
  • Compare values
  • Compare references
a = {} // Empty object
b = {} // Another empty object
a == b // `false` because they aren't the same empty object

Shallow vs Deep Compare

  • Shallow - We check to see if references point to the same object
  • Deep - We check to see if objects hold the same values
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

void main(void) {
  char * a = malloc(6);
  char * b = malloc(4);
  strcpy(a, "Alice");
  strcpy(b, "Bob");

  // Shallow compare
  printf("%d\n", a == b); // False
  printf("%d\n", a == a); // True

  // Undefined behavior, typical results provided
  printf("%d\n", a == "Alice"); // False
  printf("%d\n", a != "Bob"); // True

  // Deep compare
  printf("%d\n", strcmp(a, a) == 0); // True
  printf("%d\n", strcmp(a, b) == 0); // False
  printf("%d\n", strcmp(a, "Alice") == 0); // True
}

Enum lab questions?

Free Vaccines Tomorrow

  • Over 350 students and faculty are signed up
  • You can still sign up (Walkins welcome)
  • 10am to 1pm in Reardon lobby

It is a great vaccine. It is a safe vaccine, and it is something that works.

– Donald Trump (source, also CDC)

Control Abstraction

Chapter 9

Abstraction

  • Allows us to hide complexity behind simpler interfaces

Data Abstraction

  • Create complex data structures with simple interfaces to pass around (e.g. records, structs, dictionaries, etc)

Control Abstraction

  • Create organized sections of executable code that execute in well-understood ways and can be reused

Subroutine

  • Performs operations for the caller while the caller waits
  • Arguments, or actual parameters, are passed to subroutines and mapped to formal parameters from the subroutine definition
  • Subroutines that can return values can be called functions

Call Stack

  • Memory space for functions to use to store local variables, return addresses, and other data
  • Function calls push a new frame to the stack
  • Returns pop their frame when they are finished with it

Call stack review resources

Parameter Passing

Pass by Value

  • Actual parameter values are made available to the subroutine
  • Modification of values will not impact the caller
  • This may involved making a copy in memory

Pass by Reference

  • Actual parameters are references to data from the caller
  • Modification of values will impact caller
  • Should not involve copying data

C

  • Pass by value
#include <stdio.h>

void increment(int i) {
  // This will not change the callers value
  i = i + 1;
}

int main(void) {
  int i = 0;
  increment(i);
  printf("%d\n", i);
}
  • If we want to be able to modify values, we need to pass values by reference
  • We can’t do this is C, but we can use pointers passed by value to emulate it
#include <stdio.h>

void increment(int* i) {
  // This will change the callers value
  *i = *i + 1;
}

int main(void) {
  int i = 0;
  increment(&i);
  printf("%d\n", i);
}

If we allow C++, we can use a proper pass by reference mode.

#include <stdio.h>

void increment(int &i) {
  // This will change the callers value
  i = i + 1;
}

int main(void) {
  int i = 0;
  increment(i);
  printf("%d\n", i);
}

Python

  • Pass by object reference
  • Numbers and strings are passed by value
  • Composite types such as objects are passed as a reference
def increment(i):
  // Only modifies local value
  i = i + 1

i = 0
increment(i)
print(i)

Callers will be able to see modifications to mutable objects

def append(mylist, value):
  mylist.append(value)

mylist = []
append(mylist, 1)
print(mylist)

Callers will still hold a reference to the exact same object

def empty(mylist):
  # The caller won't be impacted by this reassignment
  mylist = []

mylist = [1,2]
empty(mylist)
print(mylist)

Immutable objects (such as tuples or strings) can’t be modified for the caller

Optional Parameters

  • Some languages allow us to mark parameters as optional
  • We may be able to provide a default value
  • In some languages all parameters are optional

Python allows us to annotate optional parameters by providing default values

def remove_character(mystring, character=" "):
  return mystring.replace(character, '')

print(remove_character("Hello, World!"))

All parameters are optional in Javascript

function print(a) {
  console.log(a)
}

print('Hello, World!')
print() // This is not an error in JS

Named Parameters

  • We’ve been exploring positional parameters
  • Some languages allow them to be used by name
  • This can be very helpful, especially combined with optional parameters
def make_vehicle(type='car', color='red', max_speed=55):
  return (type, color, max_speed)

print(make_car(color='blue'))

Variable Numbers of Arguments

  • It can sometimes be helpful to accept a different number of arguments
  • One example of this is printf in C
int printf ( const char * format, ... );

Function Returns

  • End the function
  • Return some value
  • In languages without separate statements, the return is simply the value of the function body
  • Some languages allow for implicit returns
fn get_num() -> u8 {
  42
}

fn main() {
  println!("{}", get_num());
}

Events

  • Event happens outside of program at unpredictable times
  • Running programs wants to respond

Blocking

  • We simply wait for an event to complete
import urllib.request
import json

junk_foods = [
  'Pizza',
  'Popcorn',
  'Hamburger',
  'Pepsi',
  'Potato_chip',
  'Cake',
]

url = 'https://en.wikipedia.org/w/api.php?action=parse&format=json&page='

for food in junk_foods:
  contents = urllib.request.urlopen(f'{url}{food}').read()
  print(f"{food}: {json.loads(contents)['parse']['properties'][0]['*']}")
const https = require('https')

junk_foods = ['Pizza', 'Popcorn', 'Hamburger', 'Pepsi', 'Potato_chip', 'Cake']

url = 'https://en.wikipedia.org/w/api.php?action=parse&format=json&page='

junk_foods.forEach(food => {
  https.get(
    {
      host: 'en.wikipedia.org',
      path: '/w/api.php?action=parse&format=json&page=' + food,
    },
    function (res) {
      let body = ''
      res.on('data', function (d) {
        body += d
      })
      res.on('end', function () {
        console.log(`${food}: ${JSON.parse(body).parse.properties[0]['*']}`)
      })
    },
  )
})

Object-Oriented Programming

Chapter 10

Abstraction

  • Allows us to hide complexity behind simpler interfaces

Data Abstraction

  • Create complex data structures with simple interfaces to pass around (e.g. records, structs, dictionaries, etc)

Control Abstraction

  • Create organized sections of executable code that execute in well-understood ways and can be reused

Classes

  • Allow programmers to define a group of related abstractions (objects)
  • Programming techniques built around classes are considered object-oriented
  • Combine control and data abstraction

Data Members

  • Provide data abstraction
  • Hold data used by objects
  • Also known as fields or properties

Subroutine Members

  • Provide control abstraction
  • Subroutines and functions that objects can run
  • Also known as methods

Constructors

  • Called to instantiate class object
class Asset():
  def __init__(self, value, appreciation=.05, maintenance=.02):
    self.value = value
    self.appreciation = appreciation
    self.maintenance = maintenance

this and self

  • Provide explicit reference to class instance

Why Classes?

  • We can define new abstractions as extensions or refinements of existing abstractions via inheritance

Liskov Substitution Principle

Objects in a program should be replaceable with instances of their subtypes without altering the correctness of that program.

Inheritance Example

class Asset:
    """
    Represents some asset and provides methods to operate on it

    >>> bitcoin = Asset(1, appreciation=1)
    >>> int(bitcoin.future_value(years=10))
    1024
    >>> int(bitcoin.change_in_value(years=10))
    1023
    """

    def __init__(self, value, appreciation):
        self.value = value
        self.appreciation = appreciation

    def future_value(self, years):
        return self.value * (1 + self.appreciation) ** years

    def change_in_value(self, years):
        return self.future_value(years) - self.value


class House(Asset):
    """ Represents a real estate asset

    >>> house = House(100000, appreciation=.01, maintenance=.02, tax=.03, utilities=100)
    >>> int(house.operating_cost(10))
    53311

    >>> int(house.tco(10))
    42848
    """

    def __init__(self, value, appreciation=0.02, maintenance=0.02, tax=0.01, utilities=200):
        super().__init__(value, appreciation)
        self.maintenance = maintenance
        self.tax = tax
        self.utilities = utilities

    def operating_cost(self, years):
        maintenance = sum(self.future_value(y) * self.maintenance for y in range(years))
        tax = sum(self.future_value(y) * self.tax for y in range(years))
        return maintenance + tax + self.utilities * years

    def tco(self, years):
        return -self.change_in_value(years) + self.operating_cost(years)


class Vehicle(Asset):
    """
    Represents a vehicle of some type

    >>> car = Vehicle(16000, appreciation=-.2, maintenance=500, mpg=22)
    >>> int(car.tco(years=5))
    20075
    >>> car = Vehicle(16000, appreciation=-.2, maintenance=500, mpg=38)
    >>> int(car.tco(years=5))
    17204
    """

    def __init__(self, value=16000, appreciation=-0.25, maintenance=300, mpg=25):
        super().__init__(value, appreciation)
        self.maintenance = maintenance
        self.mpg = mpg

    def operating_cost(self, years, miles_per_year=10000, gas_price=3.0):
        return self.maintenance * years + miles_per_year / self.mpg * gas_price * years

    def tco(self, years):
        return -self.change_in_value(years) + self.operating_cost(years)

if __name__ == '__main__':
    house = House(100000, appreciation=0.04, tax=.01, maintenance=.02)
    print("House", house.tco(10))

    car = Vehicle(35000, appreciation=-0.20, mpg=22)
    print("Car", car.tco(10))

    car = Vehicle(8000, appreciation=-0.10, mpg=35)
    print("Budget car", car.tco(10))

Favor composition over inheritance

Inheritance

  • Represents an is-a relationship
  • Encourages chains of inheritance and attempting to fit arbitrary problems into an inheritance model

Composition

  • Represents a has-a relationship
  • Encourages combining multiple classes as members of one another

Trait Example

Functional Languages

Chapter 11

  • First-class functions and higher order functions
  • Extensive Polymorphism
  • List types and operations
  • Garbage Collection

In modern languages, the lines between language families and often have a lot of overlap.

First-class Functions

  • Functions can be passed as values

Higher Order Functions

Functions that accept functions as parameters

setTimeout(() => console.log('1 second has passed'), 1000)

Each

  • Executes a function on each item in a list
  • Returns nothing
;[1, 2, 3].forEach(i => console.log(i))
for (i of [1, 2, 3]) {
  console.log(i)
}

Map

  • Executes a function on each item in a list
  • Returns a new list of the results
// Square a list
;[1, 2, 3].map(i => i * i)

Filter

  • Executes a function on each item in a list
  • Returns a new list containing only items where the function evaluated to true
// Remove negative numbers
;[-1, 0, 1, 2].filter(i => i >= 0)

Every

  • Executes a function on every item in a list
  • Returns true if the function always evaluates to true
// Confirm that all values are positive
;[1, 2, 3].every(i => i > 0)

Reduce

  • Executes a function on each item in a list and accumulates a single value
  • Returns the final accumulated value
// Sum values in a list
;[1, 2, 3].reduce((i, a) => i + a, 0)

Purity

  • A function is said to be “pure” if it has no side effects
  • Side effects may include changing global state or performing I/O
  • Benefits include consistency and testability
# A pure function
def fib(n):
    if n <= 1:
        return n

    return fib(n-1) + fib(n-2)

Memoization

  • We can sometimes optimize the performance of pure functions by caching their outputs
from functools import lru_cache

# By caching results, we can improve the performance of our algorithm
@lru_cache()
def fib(n):
    if n <= 1:
        return n

    return fib(n-1) + fib(n-2)

History

Lambda Calculus

  • Formal mathematical basis for computation introduced by Alonozo Church
  • Provably equivalent to a Turing Machine (see Church-Turing thesis)
  • Not a programming language, but provides framework for computation and functional programming

Lisp

  • First functional language (1958)
  • Designed by John McCarthy and implemented by others (Steve Russell created the first working interpreter)
  • Lisp is still used today in various forms (Scheme, Common Lisp, Clojure, etc)
  • Lisp influenced many other modern languages (JavaScript, Python, Ruby, etc)

Miranda and Haskell

  • Emerged in the 80s as modern functional languages
  • Haskell is open and used for functional programming research

Python

  • Emerged in 1991 and was heavily influenced by Haskell
  • Multi-paradigm language incorporating ideas from various programming languages

JavaScript

  • Created in 1995
  • Brendan Eich was hired by Mozilla to implement Scheme for use in web pages
  • This project quickly shifted to the creation of Javascript
  • Glue language with Java syntax
  • Includes first-class functions and closures

JS Example

Declarative Languages

Declarative Programming

We express the logic of our program without specifying its control flow

  • Declarative - Focuses on the what
  • Imperative - Focuses on the how

Examples

  • Logic languages
  • Purely functional languages
  • Query languages
  • Domain specific languages

Logic Programming

  • Uses formal logic to solve problems
  • Programs are made up of facts and rules
  • Logic languages are covered in detail in Chapter 12
factorial(0,1).

factorial(N,F) :-
   N>0,
   N1 is N-1,
   factorial(N1,F1),
   F is N * F1.

Query Languages

  • You’ve likely already encountered SQL
  • This languages allows us to frame requests for data in terms of the data we want, not how it will be retrieved
users = [(1, 'Alice'), (2, 'Bob')]
messages = [(1, 'Hey'), (2, 'Hi'), (1, 'Bye')]

for (msguser, text) in messages:
  for (userid, name) in users:
    if userid == msguser:
      print(f"{name}: {text}")
/* Create users */
create table users (
  userid integer primary key,
  name text
);
insert into users values
  (1,'Alice'),
  (2,'Bob'),
  (3,'Carol'),
  (4,'Dan'),
  (5,'Eve'),
  (6,'Frank');

/* Create chat message */
create table messages (
  userid integer,
  content text
);
insert into messages values
  (2,'Does anyone know how modify a record in a SQL DB?'),
  (6,'Everyone knows how to do this'),
  (3,'Have you tried `UPDATE`?'),
  (2,'Nope. What does that do?'),
  (3,'It updates a record in a SQL DB...'),
  (2,'Oh cool. I will try that.'),
  (3,'Good luck!'),
  (5,'wat up, nerds!'),
  (4,'Is there any way to link records between two tables?'),
  (6,'Of course there is'),
  (1,'I think that you are looking for a `JOIN`'),
  (4,'Ah. Thanks!'),
  (1,'No problem.');

select "
  Display messages in order";
select userid, content
  from messages;


select "
  Display messages with usernames";

select name, content
  from messages
  natural join users;

select "
  Display messages with usernames in reverse order";

select name, content
  from messages
  natural join users
  order by messages.rowid desc;

select "
  Show Dan's messages using a `where` clause";

select name, content
  from messages
  natural join users
  where name='Dan';

select "
  Show query plan (imperative program to complete query)";
explain query plan
select name, content
  from messages
  natural join users
  where name='Dan';

select "
  Create indexes to improve query plan";
create index messages_userid_idx on messages(userid);
create index users_name_idx on users(name);

select "
  Show new query plan";
explain query plan
select name, content
  from messages
  natural join users
  where name='Dan';


select "
  Change Dan's name to Dave";

update users
  set name = 'Dave'
  where name = 'Dan';

select name, content
  from messages
  natural join users;

select "
  Delete Eve's messages";

delete from messages
  where userid = (
    select userid
    from users
    where name = 'Eve'
  );

select name, content
  from messages
  natural join users;


/*
Lab Excercises

You should be able to complete these using the same techniqes
as above, but you if need a place to start to refresh you SQL
knowledge, you may want to start with an overview like this:

https://en.wikipedia.org/wiki/SQL_syntax#Queries
*/

select "
  Display messages with usernames sorted by username";

select "
  Change Alice's name to Abby";

select "
  Add a new message";

select "
  Delete Frank's messages";

/*
You have nothing to do after this point

We just show all messages here so you can easily see your changes
*/

select "
  Final message list";
select name, content
  from messages
  natural join users;

Domain Specific Languages

Make

  • The language that you’ve used to create makefiles is declarative
  • You aren’t telling make what to do to build your program
  • You are providing a list of targets and their dependencies, and make determines how to resolve the dependencies
all: main

main: main.c
  gcc main.c -o main

CSS

  • Declarative DSL used to provide style information
  • Describes what style an element should have without specifying how to apply it.
body {
  max-width: 960px;
}
h1 {
  font-size: 36px;
}
nav {
  background-color: blue;
}

Hardware Description Languages

Language used to describe the behavior of electronic circuits.

entity adder is
       port (i0, i1 : in std_logic; ci : in std_logic; s : out std_logic; co : out std_logic);
     end adder;

     architecture rtl of adder is
     begin
        s <= (i0 xor i1 xor ci) after 35 ps;
        co <= (i0 and i1) or (i0 and ci) or (i1 and ci) after 70 ps;
     end rtl;

Closing Thoughts and Review

  • Final take-home exam
  • Presentation peer reviews