1 introduction

1.1 autonomy

what is “autonomy”?

we see various examples of it…

1.1.1 what are the aspects of autonomy?

perception → how do you “see” the world around you?
sensing → various ways to perceive the world around you (e.g., camera, LiDAR)
compute → what do you “do” with the information about the world?
motion → do your computations result in any “physical” changes?
actuation → what “actions”, if any, do you take for said physical changes?
planning → can you do some “higher order” thinking (i.e., not just your immediate next move)?

1.2 let us define autonomy

Autonomy is the ability to
perform given tasks
based on the system’s perception
without human intervention

1.3 autonomous systems

cyber
physical

hence, they fall under the class of systems → cyber-physical systems

1.4 sensors and actuators…

…are everywhere!

the embedded components → interactions with the real world

1.5 sensing and actuation in the real world

consider the following example of two cars… (figure: two cars, one behind the other, top view)

the second car is approaching the first (figure: two cars, top view, with an arrow to the left above the car on the right)

sensors → constantly gathering data/sensing

  1. periodic sensing

on detection (of other car) → quickly compute what to do

  1. periodic sensing
  2. computation

take physical action (actuation) → say by braking in time

  1. periodic sensing
  2. computation
  3. actuation

  1. periodic sensing
  2. computation
  3. actuation
  4. control

Remember this → on detection (of other car) → quickly compute what to do

“quickly” compute → complete computation/actuation → before a deadline

This is a real-time system.

1.5.1 Come back to sensing

Multiple sensors in an autonomous vehicle → need to combine them somehow

sensor fusion

Once we have information from the sensors (fused or otherwise)…

We need state estimation (e.g., Kalman filter, extended Kalman filter (EKF)).

1.6 Overview/Architecture of Autonomous Systems

So far, we have (briefly) talked about…

Sensing:

Actuation:

But the system includes…an operating system (OS) in there

and it includes real-time mechanisms.

We have briefly discussed the EKF:

note: the EKF is versatile; it can be used for sensor fusion, SLAM, etc.

All of it integrates with…control:

There are some real-time functions in there…

like braking, engine control.

Question: if we design such a system…

is it “autonomous”?

We are missing some “higher order” functions from the perspective of the autonomous system:

  • where am I?
  • where do I need to go?
  • how do I get there?
  • what obstacles may I face?
  • how do I avoid them?

let us not forget the most important question of all…

why is gamora?

1.6.1 high-order functions

In order to answer these questions, we need additional functionality. Let us go through what that might be.

1.6.2 slam

Simultaneous localization and mapping → figure out where we are.

1.6.3 waypoint detection

Understand how to move in the right direction at the micro level, i.e., find waypoints.

1.6.4 yolo

Is it “you only live once”? Actually this stands for: “you only look once”. It is an object detection model that uses convolutional neural networks (CNNs).

1.6.5 object avoidance

The objective is to avoid objects in the immediate path.

1.6.6 path planning

i.e., how to get to destination at the macro level → uses waypoints.

1.6.7 compute platform

To run all of these functions, we need low-power, embedded platforms.

1.6.8 still some non-functional requirements remain

any guesses what they could be?

1.6.9 safety!

Essentially the safety of → operator, other people, the vehicle, the environment. This is a cross-cutting issue → affected by all parts of the system.

1.6.10 security

Security is another cross-cutting issue → can affect all components.

1.6.11 Course Structure

Hence this figure is a (loose) map of this course:

2 Embedded Architectures

Just like “autonomy”, describing an “embedded system” is hard. What (typically) distinguishes it from other types of computer systems (e.g., laptops, servers or even GPUs) is that such systems are typically created for specific functionality and often remain fixed and operational for years, even decades.

Embedded systems often trade off performance against other considerations such as power (or battery life), less memory, fewer peripherals, limited applications, a smaller operating system (OS) and so on. There are numerous reasons for this – chief among them is predictability – designers need to guarantee that the system works correctly, and remains safe, all the time. Hence, it must be easy to certify the entire system. This process ensures that the system operates safely.

2.1 The wcet problem

One piece of information that is required to ensure predictability and guarantee safety is the worst-case execution time (WCET). The WCET (and its counterpart, the best-case execution time or BCET) is the longest (shortest) execution time possible for a program on a specific hardware platform – and it has to consider all possible inputs. WCET is necessary to establish the “schedulability”, resource requirements and performance limits of embedded and real-time programs. There are many approaches to computing the WCET, e.g.,

  • dynamic/empirical analysis → run the program lots of times (thousands, millions?) on the platform and measure it
  • static analysis → analyze the program at compile time to compute the worst-case paths through the program
  • hybrid → a combination of the two
  • probabilistic → a combination of dynamic analysis+statistical methods
  • ML-based methods → applying machine-learning to the problem
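
To make the dynamic/empirical approach concrete, here is a minimal measurement harness – a sketch, not a real WCET tool. It times a task body over many runs using a POSIX clock and keeps the maximum observed time. Note that this only ever produces a lower bound on the true WCET: no finite number of runs can guarantee that the worst-case path (and worst-case hardware state) was exercised.

#include <stdio.h>
#include <time.h>

static volatile int sink;           /* defeats dead-code elimination */

static void task(void)              /* stand-in for the task under analysis */
{
    int sum = 0;
    for (int i = 0; i < 1000; ++i)
        sum += i;
    sink = sum;
}

int main(void)
{
    long long max_ns = 0;
    for (int run = 0; run < 1000000; ++run) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        task();
        clock_gettime(CLOCK_MONOTONIC, &t1);
        long long ns = (long long)(t1.tv_sec - t0.tv_sec) * 1000000000LL
                     + (t1.tv_nsec - t0.tv_nsec);
        if (ns > max_ns)
            max_ns = ns;
    }
    printf("max observed execution time: %lld ns\n", max_ns);
    return 0;
}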

At a high-level, the execution time distributions of applications look like:

WCET analysis is a very active area of research and hundreds of papers have been written about it, since it directly affects the safety of many critical systems (aircraft, power systems, nuclear reactors, space vehicles and…autonomous systems).

There are structural challenges (both in software and hardware) that prevent the computation of proper wcet for anything but trivial examples. For instance, consider,

void main()
{
    int max = 15 ;
    int sum = 0;
    for( int i = 0 ; i < max ; ++i)
        sum += i ;
}

How do you compute the WCET for this code? Say running on some known processor, P?

Well, there’s some information we need,

  • how long each instruction takes to execute on P
  • how many loop iterations?
  • what is the startup/cleanup times for the program on P?

Let’s assume (from the manual for P), we get the following information,

1   void main()         // startup cost = 100 cycles
2   {
3       int max = 15 ;  // 10 cycles
4       int sum = 0;    // 10 cycles 
5       for( int i = 0 ; i < max ; ++i) // 5 cycles, once
6            sum += i ; // 20 cycles each iteration
7   }                   // cleanup cost = 120 cycles

So, based on this, we can calculate the total time to execute this program:

wcet = startup_cost + line_3 + line_4 + loop_startup_cost + (line_6 * max)  [1]

wcet = 100 + 10 + 10 + 5 + (20 * 15)

wcet = 425 cycles

Now consider this slight change to the above code:

void main( int argc, char* argv[] )
{
    int max = atoi( argv[1] ) ;     // convert the command line arg to max
    int sum = 0; 
    for( int i = 0 ; i < max ; ++i) // how many iterations?
        sum += i ;
}

The problem is that equation [1] above fails since we no longer know the value of max. Hence the program can run for an arbitrary amount of time, depending on the given input! Note that none of the aforementioned WCET methods will help in this case since the input can be completely arbitrary. Hence, the structure of the software code can affect WCET calculations.
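
One common remedy – a sketch, assuming the application can tolerate a cap on the input – is to clamp externally supplied values to a static bound. The loop count is then known again, and equation [1] applies with max = MAX_ITERATIONS in the worst case:

#include <stdlib.h>

#define MAX_ITERATIONS 1000        /* static bound, known to the analysis */

void main( int argc, char* argv[] )
{
    int max = atoi( argv[1] ) ;
    if( max > MAX_ITERATIONS )     /* clamp: restores a known upper bound */
        max = MAX_ITERATIONS ;
    int sum = 0;
    for( int i = 0 ; i < max ; ++i)
        sum += i ;
}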

Another problem is that of hardware (and interactions between hardware and software). Now consider if we modify the original code as,


#define VERY_LARGE_ARRAY_SIZE (1 << 18)

void main()
{
    int first_array[VERY_LARGE_ARRAY_SIZE] ;
    int second_array[VERY_LARGE_ARRAY_SIZE] ;

    int sum_first = 0;
    int sum_second = 0;
    for( int i = 0 ; i < VERY_LARGE_ARRAY_SIZE * 2 ; ++i)
    {
        if( i%2 )
            sum_first += first_array[i/2] ;
        else
            sum_second += second_array[i/2] ;
    }
}

Now, while we can compute the WCET from the code perspective using equation [1] (since we know the loop runs for VERY_LARGE_ARRAY_SIZE * 2 iterations), there will be significant, non-obvious hardware issues in the cache. Each iteration accesses a different large array, so it loads the cache with lines from that array; in the very next iteration the other array is accessed, also missing in the cache. For instance,

iteration   operation                  cache state   reason
1           first_array loaded         miss          evicts whatever was previously in cache
2           second_array loaded        miss          evicts first_array due to lack of space
3           first_array loaded again   miss          evicts second_array due to lack of space

Hence, this program will constantly suffer cache misses, and since cache misses (and reloads) are expensive (in terms of time), the loop’s execution time will balloon out of control! Hence, even though we fixed the code issue (an upper bound on the number of iterations), hardware artifacts can change the WCET calculations. So now, we need to model cache behavior for each program and data variable! This is notoriously complicated even for the simplest of programs.
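
A sketch of one software-side fix: sweep each array in its own loop, so that consecutive iterations reuse the cache lines just loaded instead of evicting them:

/* same work as before, but each array is swept sequentially, so a
   fetched cache line is fully used before the other array is touched */
int sum_first = 0;
int sum_second = 0;

for( int i = 0 ; i < VERY_LARGE_ARRAY_SIZE ; ++i)
    sum_first += first_array[i] ;

for( int i = 0 ; i < VERY_LARGE_ARRAY_SIZE ; ++i)
    sum_second += second_array[i] ;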

Other hardware designs further complicate matters, e.g.,

  • processor pipelining
  • prefetching
  • branch prediction
  • multithreading
  • multicore systems
  • memory buses
  • networks-on-chip
  • and too many others to recount here…

Any contemporary processor design feature that improves performance turns out to be bad for WCET analysis. So, the fewer (or simpler versions of) these features, the better it is for the (eventual) safety and certification of the system.

This is one of the main reasons why embedded (and especially real-time) systems prefer simpler processors (simple pipelines, fewer complex features, simpler memory/cache architectures, if any) since they’re easier to analyze. In fact, many critical systems (e.g., aircraft, cars, etc.) use older processors (often designed in the 1980s and 1990s) – even in the ones being designed today!

2.2 Embedded Processors

Just as embedded systems are varied, embedded processors come in a myriad of shapes and sizes as well. From the very small and simple (e.g., DSPs) to the very large and complex (modern multicore chips, some with GPUs!). Here is a (non-exhaustive) list of the types of embedded processors/architectures in use today:

  1. Microcontrollers
  2. Digital Signal Processors (DSPs)
  3. Microprocessors of various designs and architectures (e.g., ARM, x86)
  4. System-on-a-Chip (SoC)
  5. Embedded accelerators
  6. ASICs and FPGAs

2.2.1 Microcontrollers

According to Wikipedia,

“A microcontroller (MC, UC, or μC) or microcontroller unit (MCU) is a small computer on a single integrated circuit.”

These may be among the most common types of “processors” used in embedded systems. According to many studies, more than 55% of the world’s processors are microcontrollers! Microcontrollers are typically used in small, yet critical, systems such as car engine control, implantable medical devices, thermal monitoring, and fault detection and classification, among millions of other applications.

Microcontroller hardware features typically include,

component                                 details
one (sometimes more) CPU cores            typically simple 4- or 8-bit chips
small pipelined architectures             sometimes 2- or 4-stage pipelines
some limited memory                       typically a few hundred kilobytes, perhaps in the form of EEPROM or FLASH
some programmable I/O                     to interact with the real world
low operating frequencies                 e.g., 4 MHz; simpler/older processors, yet more predictable
low power consumption                     in the milliwatt or microwatt range; might even be nanowatts when the system is sleeping
interrupts (some programmable)            often real-time (fixed/low latency)
several general-purpose I/O (GPIO) pins   for I/O
timers                                    e.g., a programmable interval timer (PIT)


There are some additional features found on some microcontrollers, viz.,

component                                            details
analog-to-digital convertor (ADC)                    to convert incoming (real-world, sensor) data to a digital form that the μC can operate on
digital-to-analog convertor (DAC)                    to do the opposite, i.e., convert digital signals to analog form to send outputs
universal asynchronous receiver/transmitter (UART)   to receive/send data over a serial line
pulse width modulation (PWM)                         so that the CPU can control motors (significant for us in autonomous/automotive systems), power systems, resistive loads, etc.
JTAG interface                                       debugging interface


Some examples of popular microcontroller families: Atmel ATmega, Microchip Technology, Motorola (Freescale), NXP.

Microcontroller programs and data,

  • are small → must fit in memory (since very little expandable memory exists)
  • often directly programmed in assembly!
    • sometimes the assembly code might need hand tuning → for both performance and fitting into the limited memory
  • C is another popular language
  • no operating systems (or very rarely)!
  • sometimes have their own special-purpose programming languages or instructions

2.2.2 Digital Signal Processors (DSPs)

DSPs are specialized microcontrollers optimized for digital signal processing. They find wide use in audio processing, radar and sonar, speech recognition systems, image processing, satellites, telecommunications, mobile phones, televisions, etc. Their main goals are to isolate, measure, compress and filter analog signals in the real world. They often have stringent real-time constraints.

The Texas Instruments TMS320 series of DSP chips is one of the most famous examples of this type of system:

Typical digital signal processing (of any kind) requires repetitive mathematical operations over a large number of samples, in real-time, viz.,

  • analog-to-digital conversion
  • manipulation (the core algorithm)
  • digital-to-analog conversion

Often, the entire process must be completed with low latency, even within a fixed deadline. They also have low power requirements since DSPs are often used in battery-constrained devices such as mobile phones. Hence, the proliferation of specialized DSP chips (instead of pure software implementations, which also exist; MATLAB has an entire DSP System Toolbox).

Typical DSP architecture/flow (credit: Wikipedia):

These types of chips typically have custom instructions for optimizing certain (mathematical) operations (apart from the typical add, subtract, multiply and divide), e.g.,

  • saturate → caps the minimum or maximum value that can be held in a fixed-point representation
  • ed → Euclidean distance
  • accumulate instructions → for multiply-and-accumulate operations, i.e., a ← a + (b * c)

See the Microchip instruction set details for more information on a typical DSP ISA.
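
What a DSP does in one cycle can be sketched in portable C. For example, a saturating multiply-and-accumulate on 32-bit fixed-point values (illustrative only; a real DSP implements this as a single instruction):

#include <stdint.h>

/* saturating multiply-and-accumulate: a ← sat(a + (b * c)) */
static int32_t mac_sat(int32_t a, int32_t b, int32_t c)
{
    int64_t wide = (int64_t)a + (int64_t)b * (int64_t)c;
    if (wide > INT32_MAX) return INT32_MAX;   /* saturate high */
    if (wide < INT32_MIN) return INT32_MIN;   /* saturate low */
    return (int32_t)wide;
}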

DSPs require optimization of streaming data and hence,

  • require optimized memories and caches → fetch multiple data elements at the same time
  • code may need to be aware of, and explicitly manipulate, caches
  • may have a rudimentary OS but no virtual memory

2.2.3 Microprocessors

Microprocessors are, then, general-purpose chips (as opposed to microcontrollers and DSPs) that are also used extensively in embedded systems. They are used in systems that need more heavy-duty computing/memory and/or more flexibility in terms of programming and management of the system. They use a number of commodity processor architectures (e.g., ARM, Intel x86).

Main features of microprocessors:

component            details
cores                single or multicore; powerful
pipelines            more complex pipelines; better performance, harder to analyze (e.g., WCET)
clock speeds         higher clock speeds; 100s of MHz, even GHz
ISA                  common ISA; well understood, not custom
memory               significant memory; megabytes, even gigabytes
cache hierarchies    multiple levels, optimized
power consumption    much higher, but can be reduced (e.g., via voltage and frequency scaling)
size, cost           often higher
interrupts, timers   more varied, easily programmable
I/O                  more interfaces, including commodity ones like USB
security             often includes additional hardware security features, e.g., ARM TrustZone

The ARM Cortex-M85 embedded microprocessor architecture:

When compared to microcontrollers (or even SoCs), most microprocessors do not include components such as DSPs, ADCs, DACs, etc. It is possible to augment the microprocessor to include this functionality → usually by connecting one or more microcontrollers to it!

On the software side, microprocessors typically have the most flexibility:

  • general purpose operating systems (e.g., Linux, Android, Windows, UNIX, etc.)
  • most programming languages and infrastructures (even Docker!)
  • large number of tooling, analysis, debugging capabilities
  • complex code can run, but increases analysis difficulty

Due to their power (and cost) these types of systems are only used when really necessary or in higher-end systems such as mobile phones and autonomous cars.

2.2.4 System-on-a-Chip (SoC)

An SoC integrates most components in and around a processor into a single circuit, viz.,

  • processor/chip → could be a microcontroller or even a microprocessor
  • memory and memory interfaces
  • I/O devices
  • buses (memory and I/O)
  • storage (e.g., flash) and sometimes even secondary storage
  • radio modems
  • (sometimes) accelerators such as GPUs

All of these are placed on a single substrate.

SoCs are often designed in C++, MATLAB, SystemC, etc. Once the hardware architectures are defined, additional hardware elements are written in hardware description languages, e.g., at the register transfer level (RTL).

Additional components could include,

  • DAC
  • ADC
  • radio and signal processing
  • wireless modems
  • programmable logic
  • networks-on-chip (NoC)

In some sense, an SoC is an integration of a processor with peripherals.

Some examples of modern SoCs:

Broadcom SoC from a Raspberry Pi

Apple M1 SoC

The integration of all hardware components has some interesting side-effects:

effect                      benefit                               problems
tight integration           better performance, fewer latencies   cannot replace individual components
custom code/firmware        better use of hardware                not reusable in other systems
custom software libraries   easier programming of SoC             reduces code reusability in other systems
low power consumption       better battery life, less heat        (potentially) slower

Depending on the processor/microcontroller that sits at the center of the SoC, the software stack/capabilities can vary. Many common SoCs exhibit the following software properties:

  • usually use contemporary operating systems, though optimized for embedded/SoC systems → e.g., Raspberry Pi OS (formerly Raspbian). Hence, they can handle multiprocessing, virtual memory, different scheduling policies, etc.
  • can be programmed using most common programming languages → C, C++, Python, Java, even Lisp!

The Raspberry Pi is a common example of a system that uses a Broadcom BCM series of SoCs. We use the BCM2711 SoC in our course for the Raspberry Pi 4-B.

2.2.5 Embedded Accelerators (e.g., GPU-enabled systems)

There are hardware platforms that include accelerators in embedded systems, e.g., GPUs, AI-enabled silicon, extra programmable FPGA fabric, security features, etc. The main idea is that certain computation can be offloaded to these accelerators while the main CPU continues to process other code/requests. The accelerators are specialized for certain computations (e.g., parallel matrix multiplications on GPUs, AES encryption). Some chips include FPGA fabric where the designer/user can implement their own custom logic/accelerators.

In a loose sense, the Navio2 can be considered as a hardware coprocessor for the Raspberry Pi.

The NVidia Jetson Orin is a good example of an AI/GPU-focused embedded processor:


This system’s specifications:

  • 1300 MHz clock speeds
  • 64 GB Memory
  • 256 bit memory bus
  • 204 GB/s bandwidth
  • supports a variety of graphics features (DirectX, OpenGL, OpenCL, CUDA, Vulkan and Shader Models )
  • maximum of 60W power
  • 275 trillion operations/s (TOPS)!

These systems are finding a lot of use in autonomous systems since they pack so much processing power into such a small form factor.

2.2.6 ASICs and FPGAs

Application-specific integrated circuits (ASICs) and field-programmable gate arrays (FPGAs) combine the advantages of both hardware (speed) and software (flexibility/programmability). They are similar, yet different. Both are semiconductor devices that include programmable logic gates, but an ASIC is static – i.e., once the board has been “programmed” it cannot be changed – while an FPGA, as the name implies, allows for “reprogramming”.

ASICs are custom-designed for specific applications and provide high efficiency and performance. FPGAs are reprogrammable devices that provide significant flexibility. Many designers also use them for prototyping hardware components (before they are eventually included either in processors or in custom ASICs). The choice between ASICs and FPGAs depends entirely on the application requirements and other factors such as cost.

(figures: an ASIC; a Xilinx Spartan FPGA)

2.2.6.1 ASICs

These are specialized semiconductor devices – built to implement a custom function, e.g., cryptocurrency mining, nuclear reactor control, televisions. ASICs are tailored to their specific applications. Once created, an ASIC cannot be reprogrammed or modified. ASICs are created using a process known as photolithography, which allows components to be “etched” onto a silicon wafer.

The ASIC design process, while expensive and time consuming, becomes valuable for high-volume products as the per-unit cost decreases when production numbers increase.

advantages                         disadvantages
high performance                   lack of flexibility
low power consumption              high initial costs
small form factor                  long development time
IP protection                      obsolescence risk
good for mass production           risks with manufacturing yields
can integrate multiple functions   design complexity

2.2.6.2 FPGAs

These are also semiconductor devices, but they can be reprogrammed to implement various circuits and functions. Designers can change the functionality even after the circuits have been embossed onto the hardware. Hence, they’re good for systems that might require changes at design time and for rapid prototyping. An FPGA is a collection of programmable logic and interconnects. They include lookup tables (LUTs) and other parts that can be used to implement multiple, fairly wide-ranging functions. The programmable blocks can be connected to each other via the interconnects. Some FPGAs even come with additional flash memory.

FPGAs are programmed using hardware description languages such as Verilog/VHDL.

advantages                       disadvantages
flexibility                      lower performance
shorter development time         higher power consumption
upgradability                    high design complexity
lower (initial) costs            higher per-unit costs
better processing capabilities   design complexity
lower obsolescence risks         larger form factor

2.3 Communication and I/O

Embedded systems need to communicate and/or interface with various elements:

  • the physical world via sensors and actuators
  • computers for programming (of the embedded system) or for data transfer
  • with other embedded systems/nodes
  • handheld devices
  • with the internet (either public or to access back end servers)
  • satellites?

Hence a large number of communication standards and I/O interfaces have been developed over the years. Let’s look at a few of them:

  1. serial (UART) → e.g., RS-232
  2. synchronous → I2C, SPI
  3. general-purpose I/O → GPIO
  4. debugging interface → JTAG
  5. embedded internal communication → CAN
  6. other broadly used protocols → USB, Ethernet/WiFi, Radio, Bluetooth

2.3.1 UART | RS-232

Serial communication standards are used extensively across many domains, mainly due to their simplicity and low hardware overheads. The most common among these are the asynchronous serial communication systems.

From Wikipedia:

Asynchronous serial communication is a form of serial communication in which the communicating endpoints’ interfaces are not continuously synchronized by a common clock signal. Instead of a common synchronization signal, the data stream contains synchronization information in form of start and stop signals, before and after each unit of transmission, respectively. The start signal prepares the receiver for arrival of data and the stop signal resets its state to enable triggering of a new sequence.

The following figure shows a communication sample that demonstrates these principles:

We see that each byte has a start bit, a stop bit and eight data bits. The last data bit is often used as a parity bit. All of these “standards” (i.e., the start/stop/parity bits) must be agreed upon ahead of time.

A universal asynchronous receiver-transmitter (UART), then, is a peripheral device for such asynchronous communication; the data format and transmission speeds are configurable. It sends data bits one by one (from least significant to most). The precise timing is handled by the communication channel.
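
A bit-banged transmit routine makes the framing concrete. This is a sketch: uart_tx_pin_write() and bit_delay() are hypothetical board-specific helpers (set the TX line level; wait one bit period):

extern void uart_tx_pin_write(int level);   /* hypothetical pin helper */
extern void bit_delay(void);                /* hypothetical: one bit period */

/* transmit one byte: start bit, 8 data bits (LSB first), even parity, stop bit */
void uart_send_byte(unsigned char b)
{
    int ones = 0;

    uart_tx_pin_write(0);            /* start bit: line pulled low */
    bit_delay();

    for (int i = 0; i < 8; ++i) {    /* data bits, least significant first */
        int bit = (b >> i) & 1;
        ones += bit;
        uart_tx_pin_write(bit);
        bit_delay();
    }

    uart_tx_pin_write(ones & 1);     /* even parity: make the count of 1s even */
    bit_delay();

    uart_tx_pin_write(1);            /* stop bit: line returns to idle high */
    bit_delay();
}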

The electric signalling levels are handled by an external driver circuit. Common signal levels:

Here we will focus on the RS-232 standard since it is the most widely used UART signaling level standard today. The full name of the standard is: “EIA/TIA-232-E Interface Between Data Terminal Equipment and Data Circuit-Terminating Equipment Employing Serial Binary Data Interchange” (“EIA/TIA” stands for the Electronic Industries Association and the Telecommunications Industry Association). It was introduced in 1962 and has since been updated four times to meet evolving needs.

The RS-232 is a complete standard in that it specifies,

  • (common) voltage and signal levels
  • (common) pin and wiring configurations
  • (minimal) control information between host/peripherals

The RS-232 specifies the electrical, functional and mechanical characteristics to meet all of the above criteria.

For instance, the electrical characteristics are defined in the following figure:

Details:

  • high level [logical 0] (aka “spacing”) → +5V to +15V (realistically +3V to +15V)
  • low level [logical 1] (aka “marking”) → -5V to -15V (realistically -3V to -15V)

Other properties also defined, e.g., “slew rate”, impedance, capacitive loads, etc.

The standard also defines the mechanical interfaces, i.e., the pin connector:

While the official standard calls for a 25-pin connector, it is rarely used. Instead, the 9-pin connector (shown on the right in the above figure) is in common use.

You can read more details about the standard here: RS 232

2.3.2 Synchronous | I2C and SPI

Synchronous Serial Interfaces (SSIs) are widely used in industrial applications between a master device (e.g., controller) and a slave device (e.g., sensor). SSI is based on the RS-422 standard and has high protocol efficiency, as well as multiple hardware implementations.

SSI properties:

  • differential signalling
  • simplex (i.e., unidirectional communication only)
  • non-multiplexed
  • point-to-point and
  • uses time-outs to frame the data.

2.3.2.1 I2C

The Inter-Integrated Circuit (I2C, also written IIC or I²C) is a synchronous, multi-controller/multi-target (historically termed multi-master/multi-slave), single-ended, serial communication bus. I2C systems are used for attaching low-power integrated circuits to processors and microcontrollers – usually for short-distance or intra-board communication.

I2C components are found in a wide variety of products, e.g.,

  • EEPROMs
  • VGA/DVI/HDMI connectors
  • NVRAM chips
  • real-time clocks
  • reading hardware monitors and sensors
  • controlling actuators
  • DAC/ADC
  • controlling LCD/OLEDs displays
  • changing computer display settings (contrast, brightness, etc.)
  • controlling speaker volume
  • and many many more

The main advantage of I2C is that a microcontroller can control a network of chips with just two general-purpose I/O pins (serial data line and a serial clock line) and software. A controller device can communicate with any target device through a unique I2C address sent through the serial data line. Hence the two signals are:

line                      voltage   description
serial data line (SDA)    +5V       transmit data to or from target devices
serial clock line (SCL)   +3V       synchronously clock data in or out of the target device

Both are bidirectional and pulled up with resistors.

Here is a typical implementation of I2C:

An I2C chip example (used for controlling certain TV signals):

I2C is half-duplex communication where only a single controller or a target device is sending data on the bus at a time. In comparison, the serial peripheral interface (SPI) is a full-duplex protocol where data can be sent to and received back at the same time. An I2C controller device starts and stops communication, which removes the potential problem of bus contention. Communication with a target device is sent through a unique address on the bus. This allows for both multiple controllers and multiple target devices on the I2C bus.

I2C communication details (initiated from the controller device):

condition   description
I2C START   the controller device first pulls SDA low and then pulls SCL low
I2C STOP    the controller releases SCL high and then releases SDA high


I2C communication is split into frames. Communication starts when one controller sends an address frame after a START. This is followed by one or more data frames, each consisting of one byte. Each frame also has an acknowledgement bit. An example of two I2C communication frames:
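
The START/STOP conditions from the table above can be sketched as bit-banged code. sda_drive() and scl_drive() are hypothetical open-drain helpers (0 = pull the line low, 1 = release it so the pull-up resistor takes it high):

extern void sda_drive(int level);   /* hypothetical open-drain helpers */
extern void scl_drive(int level);

void i2c_start(void)
{
    sda_drive(1); scl_drive(1);     /* bus idle: both lines high */
    sda_drive(0);                   /* SDA falls while SCL is high... */
    scl_drive(0);                   /* ...then SCL is pulled low: START */
}

void i2c_stop(void)
{
    sda_drive(0); scl_drive(0);
    scl_drive(1);                   /* SCL releases high... */
    sda_drive(1);                   /* ...then SDA releases high: STOP */
}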


You can read more at: I2C.

2.3.2.2 SPI

The Serial Peripheral Interface (SPI) has become the de facto standard for synchronous serial communication. It is used in embedded systems, especially between microcontrollers and peripheral ICs such as sensors, ADCs, DACs, shift registers, SRAM, etc.

The main aspect of SPI is that one main device orchestrates communication with one or more sub/peripheral devices by driving the clock and chip-select signals.

SPI interface properties:

  • synchronous
  • full duplex
  • main-subnode (formerly called “master-slave”)
  • data from the main or the subnode is synchronized on the rising or falling clock edge
  • main and subnode can transmit data at the same time
  • interface can be 3- or 4-wire (the 4-wire version is more popular)

(figure: a basic SPI interface, from Microchip)

The SPI interface contains the following wires:

signal   description            function
SCLK     serial clock           clock signal from main
CS       chip/serial select     to select which subnode (target) to communicate with
MOSI     main out, subnode in   serial data out (SDO) for main-to-subnode communication
MISO     main in, subnode out   serial data in (SDI) for subnode-to-main communication

The main node generates the clock signal. Data transmissions between the main and sub nodes are synchronized by that clock signal. SPI devices support much higher clock frequencies than I2C. The CS signal is used to select the subnode; note that this is an active-low signal, i.e., low (0) selects and high (1) disconnects. SPI is a full-duplex interface: both main and subnode can send data at the same time via the MOSI and MISO lines respectively. During SPI communication, data is simultaneously transmitted (shifted out serially onto the MOSI/SDO line) and received (the data on the MISO/SDI line is sampled, or read in).
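
The shift-register behavior is easiest to see in a software (“bit-banged”) SPI transfer. A sketch for SPI mode 0, with hypothetical pin helpers (sclk_write, mosi_write, miso_read):

extern void sclk_write(int level);   /* hypothetical pin helpers */
extern void mosi_write(int level);
extern int  miso_read(void);

/* exchange one byte, MSB first: every bit shifted out on MOSI is
   matched by a bit sampled on MISO -- full duplex by construction */
unsigned char spi_transfer(unsigned char out)
{
    unsigned char in = 0;
    for (int i = 7; i >= 0; --i) {
        mosi_write((out >> i) & 1);      /* present next output bit */
        sclk_write(1);                   /* rising edge: both sides sample */
        in = (unsigned char)((in << 1) | miso_read());
        sclk_write(0);                   /* falling edge: shift */
    }
    return in;
}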

Example: the following example demonstrates the significant savings and simplification in systems design (reduce the number of GPIO pins required).

Consider the ADG1412 switch being managed by a microcontroller as follows:

Now, as the number of switches increases, the requirement on GPIO pins also increases significantly. A 4x4 configuration requires 16 GPIO pins, thus reducing the number of pins available to the microcontroller for other tasks, as follows:

One approach to reduce the number of pins would be to use a serial-to-parallel convertor:

This reduces the pressure on the number of GPIO pins but still introduces additional circuitry.

Using an SPI-enabled microcontroller reduces the number of GPIOs required and eliminates the overhead of needing additional chips (the serial-to-parallel convertor):

In fact, using a different SPI configuration (“daisy-chain”), we can optimize the GPIO count even further!

You can read more about SPI here.

2.3.3 General-Purpose I/O (GPIO)

A GPIO is a signal pin on an integrated circuit or board that can be used to perform digital I/O operations. By design, it has no predefined purpose → can be used by hardware/software developers to perform functions they choose, e.g.,

  • GPIO pins can be enabled or disabled.
  • GPIO pins can be configured to be input or output.
  • input values are readable, often with a 1 representing a high voltage, and a 0 representing a low voltage.
  • input GPIO pins can be used as “interrupt” lines, which allow a peripheral board connected via multiple pins to signal to the primary embedded board that it requires attention.
  • output pin values are both readable and writable.

GPIOs can be implemented in a variety of ways,

  • as a primary function of the microcontrollers, e.g., Intel 8255
  • as an accessory to the chip

While microcontrollers may use GPIOs as their primary external interface, many a time the pins may be capable of other functions as well. In such instances, it may be necessary to configure the pins to select among those functions.

Some examples of chips with GPIO pins:

chip        Intel 8255   PIC microchip   ASUS Tinker
GPIO pins   24           29              28

GPIOs are used in a diverse variety of applications, limited only by the electrical and timing specifications of the GPIO interface and the ability of software to interact with GPIOs in a sufficiently timely manner.

Some “properties”/applications of GPIOs:

  • GPIOs use standard logic levels and cannot supply significant current to output loads
  • high-current output buffers or relays can be used to control high-power devices
  • input buffers, relays, or opto-isolators translate incompatible signals to GPIO logic levels
  • GPIOs can control or monitor other circuitry on a board, such as enabling/disabling circuits, reading switch states, and driving LEDs
  • multiple GPIOs can implement bit banging communication interfaces like I²C or SPI
  • GPIOs can control analog processes via PWM, adjusting motor speed, light intensity, or temperature
  • PWM signals from GPIOs can be converted to analog control voltages using RC filters

GPIO interfaces vary widely. Most commonly, they’re simple groups of pins that can switch between input and output. On the other hand, each pin can be set up differently → to accept/source different voltages, drive strengths, pull-ups and pull-downs.

Programming the GPIO:

  • usually pin states are exposed via different interfaces, e.g., memory-mapped I/O peripherals or dedicated I/O port instructions
  • input values can be used as interrupts (IRQs)
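
On the Raspberry Pi, for instance, the GPIO block is exposed as memory-mapped registers. A minimal sketch (register offsets follow the BCM2835/BCM2711 peripheral documentation; error handling is elided, so treat this as illustrative):

#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    /* map the GPIO register block (Raspberry Pi OS exposes /dev/gpiomem) */
    int fd = open("/dev/gpiomem", O_RDWR | O_SYNC);
    volatile uint32_t *gpio = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                                   MAP_SHARED, fd, 0);

    /* GPFSEL1 (index 1) holds 3 function-select bits per pin for pins 10-19;
       set pin 17 (bits 21-23) to 001 = output */
    gpio[1] = (gpio[1] & ~(7u << 21)) | (1u << 21);

    gpio[7]  = 1u << 17;   /* GPSET0 (offset 0x1C): drive pin 17 high */
    sleep(1);
    gpio[10] = 1u << 17;   /* GPCLR0 (offset 0x28): drive pin 17 low */
    return 0;
}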

For more information on programming/using GPIOs, read these: GPIO setup and use, Python scripting the GPIO in Raspberry Pis, general purpose I/O, GPIO setup in Raspberry Pi.

2.3.4 JTAG Debugging Interface

The JTAG standard (named after the “Joint Test Action Group”), technically the IEEE Std 1149.1-1990 IEEE Standard Test Access Port and Boundary-Scan Architecture, is an industry standard for testing and verification of printed circuit boards, after manufacture.

“JTAG”, depending on the context, could stand for one or more of the following:

  • implementation of IEEE 1149.x for Board Test, or Boundary Scan testing
  • appliance used to program on board flash or eeprom devices on a circuit board
  • hardware device used to debug microprocessor software
  • hardware device used to test a board using Boundary Scan

The basic building block of a JTAG OCD is the Test Access Port or TAP controller. This allows access to all the custom features within a specific processor, and must support a minimum set of commands. On-chip debugging is a combination of hardware and software.

type       description
hardware   on-chip debug (OCD)
software   in-circuit emulator (ICE)/JTAG emulator

The off-chip parts are actually PC peripherals that need corresponding drivers running on a separate computer. On most systems, JTAG-based debugging is available from the very first instruction after CPU reset, letting it assist with development of early boot software which runs before anything is set up. The JTAG emulator allows developers to access the embedded system at the machine code level if needed! Many silicon architectures (Intel, ARM, PowerPC, etc.) have built entire infrastructures and extensions around JTAG.

A high-level overview of the JTAG architecture/use:


JTAG now allows for,

  • processors can be halted, single-stepped or run freely
  • can set code breakpoints for both, code in RAM as well as ROM/flash
  • data breakpoints are available
  • bulk data download to RAM
  • access to registers and buses, even without halting the processors!
  • complex logic routines, e.g., ignore the first seven accesses to a register from one particular subroutine

JTAG device-programmer hardware allows for transferring data into the internal, non-volatile memory of the system! Hence, we can use JTAG to program devices such as FPGAs. In fact, many memory chips also have JTAG interfaces. Some modern chips also allow access to the (internal and external) data buses via JTAG.

JTAG interface: depending on the actual interface, JTAG has 2/4/5 pins. The 4/5 pin versions are designed so that multiple chips on a board can have their JTAG lines daisy-chained together if specific conditions are met.

Schematic Diagram of a JTAG enabled device:

The various pin signals in the JTAG TAP are:

signal description
TCK synchronizes the internal state machine operations
TMS sampled at the rising edge of TCK to determine the next state
TDI data shifted into the device’s test or programming logic; sampled at the rising edge of TCK when the internal state machine is in the correct state
TDO represents the data shifted out of the device’s test or programming logic and is valid on the falling edge of TCK when the internal state machine is in the correct state
TRST optional pin which, when available, can reset the TAP controller’s state machine

The TAP controller implements the following state machine:


To use the JTAG interface,

  • host is connected to the target’s JTAG signals (TMS, TCK, TDI, TDO, etc.) through some kind of JTAG adapter
  • adapter connects to the host using some interface such as USB, PCI, Ethernet, etc.
  • host communicates with the TAPs by manipulating TMS and TDI in conjunction with TCK
  • host reads results through TDO (which is the only standard host-side input)
  • TMS/TDI/TCK output transitions create the basic JTAG communication primitive on which higher layer protocols build
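
That last primitive can be sketched as a single clock step; set_tms()/set_tdi()/set_tck()/read_tdo() are hypothetical functions driving the adapter’s pins:

extern void set_tms(int v);   /* hypothetical adapter pin helpers */
extern void set_tdi(int v);
extern void set_tck(int v);
extern int  read_tdo(void);

/* one JTAG clock cycle: present TMS/TDI, pulse TCK, sample TDO */
int jtag_clock(int tms, int tdi)
{
    set_tms(tms);
    set_tdi(tdi);
    set_tck(1);          /* rising edge: target samples TMS and TDI */
    set_tck(0);          /* falling edge: target drives TDO */
    return read_tdo();
}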


For more information about JTAG, read: Intel JTAG Overview, Raspberry Pi JTAG programming, Technical Guide to JTAG and the JTAG Wikipedia Entry is quite detailed.

2.3.5 Controller Area Network (CAN)

CAN is a vehicle bus standard to enable efficient communication between electronic control units (ECUs). CAN is,

  • broadcast-based
  • message-oriented
  • uses arbitration → for data integrity/prioritization

CAN does not need a host controller. ECUs connected via the CAN bus can easily share information with each other. All ECUs are connected on a two-wire bus consisting of a twisted pair: CAN high and CAN low. The wires are often color coded:

CAN high   yellow
CAN low    green

(figures: CAN wiring; a multi-ECU CAN setup)


An ECU in a vehicle consists of:

component         details
microcontroller   interprets/sends out CAN messages
CAN controller    ensures all communication adheres to CAN protocols
CAN transceiver   connects the CAN controller to the physical wires

Any ECU can broadcast on the CAN bus and the messages are accepted by all ECUs connected to it. Each ECU can either choose to ignore the message or act on it.

what are the implications for security?

While there is no “standard” CAN connector (each vehicle may use different ones), the CAN Bus DB9 connector has become the de facto standard:

The above figure shows the various pins and their signals.


CAN Communication Protocols: CAN is split into two layers:

layer       relation to OSI stack
data link   CAN frame formats, error handling, data transmission, data integrity
physical    cable types, electrical signal levels, node requirements, cable impedance, etc.


All communication over the CAN bus is done via CAN frames. The standard CAN frame (with an 11-bit identifier) is shown below:
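
On Linux, the SocketCAN API exposes this exact frame layout to software. A minimal sketch of sending one standard (11-bit identifier) frame – the interface name “can0” and the ID/data values are assumptions:

#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/ioctl.h>
#include <net/if.h>
#include <linux/can.h>
#include <linux/can/raw.h>

int main(void)
{
    int s = socket(PF_CAN, SOCK_RAW, CAN_RAW);   /* raw CAN socket */

    struct ifreq ifr;
    strcpy(ifr.ifr_name, "can0");                /* assumed interface name */
    ioctl(s, SIOCGIFINDEX, &ifr);

    struct sockaddr_can addr = { .can_family  = AF_CAN,
                                 .can_ifindex = ifr.ifr_ifindex };
    bind(s, (struct sockaddr *)&addr, sizeof(addr));

    struct can_frame frame = {0};
    frame.can_id  = 0x123;                       /* 11-bit identifier */
    frame.can_dlc = 2;                           /* data length code */
    frame.data[0] = 0xAB;
    frame.data[1] = 0xCD;

    write(s, &frame, sizeof(frame));             /* broadcast on the bus */
    return 0;
}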


While the lower-level CAN protocols described so far work on the two lowest layers of the OSI networking stack, they are still limiting. For instance, the CAN standard doesn’t discuss how to,

  • decode RAW data
  • handle larger data (more than 8 bytes)

Hence, some higher-order protocols have been developed, viz.,

protocol description
OBD2 on-board diagnostics in cars/trucks for diagnostics, maintenance, emissions tests
UDS Unified Diagnostic Services (UDS) used in automotive ECUs for diagnostics, firmware updates, routine testing
CCP/XCP used in embedded control/industrial automation for off-the-shelf interoperability between CAN devices
SAE J1939 for heavy-duty vehicles
NMEA 2000  used in maritime industry for connecting e.g. engines, instruments, sensors on boats
ISOBUS used in agriculture and forestry machinery to enable plug and play integration between vehicles/implements, across brands

There also exist other higher-order protocols (numbering in the thousands), the most prominent of which are: ARINC, UAVCAN, DeviceNet, SafetyBUS p, MilCAN, HVAC CAN.


More details about CAN and its variants: CAN Bus Explained.

2.3.6 Other Broadly Used Protocols

Autonomous (and other embedded systems) use a variety of other communication protocols in order to interface with the external world and/or other systems (either other nodes in the system or external components such as back end clouds).

Note that since many of these are well known and publicly documented, we won’t elaborate much here.

Here are some of the well known communication protocols, also used in embedded systems:

protocol links
USB How USB works: part 1, part 2, part 3; USB in a Nutshell (very detailed).
Ethernet Reliable Embedded Ethernet, Embedded Ethernet and Internet (book, online)
WiFi WiFi Sensing on the Edge (paper)
Bluetooth Bluetooth Basics, Bluetooth Low Energy
Radio Embedded Development with GNU Radio

2.4 Raspberry Pi and Navio2

Let us look at the two architectures we use extensively in this course:

The high-level architecture of the Pi shows many of the components we have discussed so far:

In particular, the Pi has,

component description/details
processor Broadcom BCM2711, Quad core Cortex-A72 (ARM v8) 64-bit SoC at 1.8GHz
memory 1GB, 2GB, 4GB or 8GB LPDDR4-3200 SDRAM
network WiFi (2.4/5.0 GHz), Gigabit Ethernet, Bluetooth/BLE
I/O 40 pin GPIO, USB 3.0/2.0/C
storage Micro-SD Card
misc micro-hdmi, stereo audio/video, displayport, camera port, power
os Raspberry Pi OS (formerly called Raspbian)


Read more about the Raspberry Pi: Raspberry PI – A Look Under the Hood


The Navio2 is a “hat” that adds the following to a Raspberry Pi:

  • autopilot functionality
  • multiple sensors

The high-level architecture,

As the figure shows, the Navio2 adds the following components:

component description/details
GNSS receiver for GPS signals
high-precision barometer for measuring pressure (and altitude)
(dual) IMU two 9-DOF IMUs, each with a gyroscope, accelerometer and magnetometer
RC I/O co-processor PWM, ADC, SBUS, PPM
extension ports ADC, I2C, UART
power supply triple redundant


More details about the Navio2 and how to program it: Navio2 Documentation.



2.5 References

3 Sensors and Sensing

An embedded/autonomous system perceives the physical world via sensors – either to gather information about its environment or to model its own state. Hence, sensing is a critical component in the sensing → planning → actuation loop and in the design of embedded and autonomous systems.

Modern autonomous systems use a wide array of sensors. This is necessary because:

  • there is a need to measure different quantities, e.g., position (via GPS), velocity, objects, etc.
  • sensor measurements often have errors → hence, we need multiple sensors, often using different physical properties to measure the same thing; e.g., LiDAR and cameras can both be used to detect objects in front of, and around, an autonomous vehicle.

At its core,

a sensor captures a physical/chemical/environmental quantity and converts it to a digital quantity.

(hence the need for an Analog-to-Digital Convertor (ADC) as we shall see later)

By definition, sensors generate signals. A signal, s, is defined as a mapping from the time domain to a value domain:

s : Dt ↦ Dv

where,

symbol description
Dt continuous or discrete time domain
Dv continuous or discrete value domain

Note: remember that computers require discrete sequences of physical values. Hence, we need to convert the above into the discrete domain. The way to achieve this: sampling:

The figure shows a continuous signal being sampled (in red arrows). We will discuss sampling and related issues later in this topic.

3.1 Types of Sensors

Sensors come in various shapes and sizes. Usually, designers of autonomous systems will develop a “sensor plan” that considers,

  • required functionality
  • sensor range(s)
  • cost

Hence, each autonomous system will likely have its own set of sensors (or sensor plan). Typical sensors found on modern autonomous systems can be classified based on the underlying physics used:

physical property sensor
internal measurements IMU
external measurements GPS
“bouncing” electromagnetic waves LiDAR, RADAR, mmWave Radar
optical cameras, infrared sensors
acoustic ultrasonic sensors

Some of the above can be combined to generate other sensing patterns, e.g., stereo vision using multiple cameras or camera+LiDAR.

We will go over some of these sensors and their underlying physical principles.

3.1.1 Inertial Measurement Units (IMU)

These sensors measure the movement of a vehicle along the three axes, in addition to other behaviors like acceleration and directionality. An IMU typically includes the following sensors:

As we see from the first picture above, an IMU also has a CPU (typically a microcontroller) to manage/collect/process the data from the sensors.

The functions of the three sensors are:

  1. gyroscope: an inertial sensor that measures an object’s angular rate with respect to an inertial reference frame. It measures the following movements: “yaw”, “pitch” and “roll”.

IMUs come in all shapes and sizes. These days they’re very small, but the original IMUs were really large, as evidenced by the one used in the Apollo space missions:


  2. accelerometer: the primary sensor responsible for measuring inertial acceleration, or the change in velocity over time.

  3. magnetometer: measures the strength and direction of the magnetic field – to find magnetic north.

3.1.2 Bouncing of Electromagnetic Waves | LiDAR and mmWave

A very common principle for measuring surroundings is to bounce electromagnetic waves off nearby objects and measure the round-trip times. Shorter times indicate closer objects while longer times indicate objects that are farther away. RADAR is a classic example of this type of sensor and its (basic) operation is shown in the following image (courtesy NOAA):

While many autonomous vehicles use RADAR, we will focus on other technologies that are more prevalent and provide much higher precision, viz.,

  1. LiDAR
  2. millimeter Wave RADAR (mmWave)

3.1.2.1 Light Detection and Ranging (LiDAR)

LiDAR is a sensor that uses (eye-safe) laser beams for mapping surroundings and creating a 3D representation of the environment. So lasers are used for,

  • imaging
  • detection
  • ranging

We can use LiDAR to measure the distance, angle, as well as the radial velocity of objects – all relative to the autonomous system (rather, to the sensor). So, in practice, this is how it operates:

We define a roundtrip time, τ, as the time between when a pulse is sent out from the transmitter (TX) to when light reflected from the object is detected at the receiver (RX).

So, the target range (i.e., the distance to the object), R, is measured as:

R = cτ / 2

where, c is the speed of light.
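
As a quick numeric check (a sketch): a roundtrip time of 1 microsecond corresponds to a target roughly 150 m away.

#include <stdio.h>

#define SPEED_OF_LIGHT 299792458.0          /* c, in m/s */

/* target range from LiDAR roundtrip time tau (in seconds): R = c*tau/2 */
static double lidar_range(double tau)
{
    return SPEED_OF_LIGHT * tau / 2.0;
}

int main(void)
{
    printf("range: %.1f m\n", lidar_range(1e-6));   /* prints ~149.9 m */
    return 0;
}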

More details (from Mahalati): > Lasers used in lidars have frequencies in the 100s of Terahertz. Compared to RF waves, lasers have significantly smaller wavelengths and can hence be easily collected into narrow beams using lenses. This makes DOA estimation almost trivial in lidar and gives it significantly better resolution than MIMO imaging radar.

The end product of LiDAR is essentially a point cloud, defined as:

a collection of points generated by a sensor. Such collections can be very dense and contain billions of points, which enables the creation of highly detailed 3D representations of an area.

In reality, point cloud representations around autonomous vehicles end up looking like:

Point clouds provide valuable information, viz.,

  • 3D coordinates, (x, y, z)
  • strength of returned signal → provides valuable information about the density of the object (or even material composition)!
  • additional attributes: return number, scan angle, scan direction, point density, RGB color values, and time stamps → each can be used for refining the scan.

There are two types of scene illumination techniques for LiDAR:

type             illumination method                                              detector
flash lidar      entire scene using a wide laser                                  receives all echoes on a photodetector array
scanning lidar   very narrow laser beams; scan the illumination spot with a laser beam scanner   single photodetector to sequentially estimate τ for each spot


               flash lidar                                                     scan lidar
architecture   (see figures)                                                   (see figures)
resolution     determined by photodetector array pixel size (like a camera)   laser beam size and spot fixing
frame rates    higher (up to 100 fps)                                          lower (< 30 fps)
range          shorter (quick beam divergence, like photography)               longer (100m+)
use            less common                                                     most common


Now, consider the following scene (captured by a camera):



Compare this to the LiDAR images captured by the two methods:

flash lidar scan lidar (16 scan lines) scan lidar (32 scan lines)


A “LiDAR scan line” refers to a single horizontal line of laser pulses emitted by a LiDAR sensor, essentially capturing a cross-section of the environment at a specific angle as the sensor rotates, creating a 3D point cloud by combining multiple scan lines across the field of view; it’s the basic building block of a LiDAR scan, similar to how a single horizontal line is a building block of an image.

Potential Problems:

Atmospheric/environmental conditions can negatively affect the quality of the data captured by the LiDAR. For instance, fog can scatter the laser photons resulting in false positives.

As we see from the above image, the scattering due to the fog results in the system “identifying” multiple objects even though there is only one person in the scene.

Here are additional examples from the Velodyne VLP-32C sensor:

  1. light fog (camera vs LiDAR)

The LiDAR does a good job isolating the main subject with very few false positives.

  2. heavy fog (camera vs LiDAR)

The LiDAR struggles to isolate the main subject, with a very high number of false positives.

In spite of these issues, LiDAR is one of the most popular sensors used in autonomous vehicles. LiDARs are getting smaller and more precise by the day; decreasing costs also mean that we will see a proliferation of these types of sensors in many autonomous systems.

For an in-depth study on LiDARs, check this out: Stanford EE 259 LiDAR Lecture.

3.1.2.2 Millimeter Wave Radar [mmWave]

Short wavelengths like the millimeter wave (mmWave) band of the electromagnetic spectrum allow for:

  • smaller antennae
  • integration of entire RADAR circuitry in a single chip!
  • spectrum of 10 millimeters (30 GHz) to 1 millimeter (300 GHz)

As we see from the above images, the sensors can be very small, yet very precise → some can detect movements as small as 4 millionths of a meter!

Advantages of mmWave:

advantage                              description
small antenna caliber                  narrow beam gives high tracking accuracy; high resolution; high resistance to interference; high antenna gain; smaller object detection
large bandwidth                        high information rate; details structural features of the target; reduces multipath and enhances anti-interference ability; overcomes mutual interference; high distance resolution
high doppler frequency                 good detection and recognition of slow-moving and vibrating targets; can work in snow conditions
good anti-blanking performance         works on the most-used stealth materials
robustness to atmospheric conditions   such as dust, smoke, and fog, compared to other sensors
operation under different lights       radar can operate under bright lights, dazzling lights, or no lights
insusceptible to ground clutter        allows for close-range observations; low reflectivity can be measured using mmWave radar
fine spatial resolution                for the same range, mmWave radar offers finer spatial resolution than microwave radar


mmWave is also used for in-cabin monitoring of drivers!


Limitations:

  • line of sight operations
  • affected by water content, gases in environments
  • affected by contaminated environment and physical obstacles


Resources:

For a more detailed description of mmWave RADAR, read: Understanding mmWave RADAR, its Principle & Applications

For programming a LiDAR, see: how to program a LiDAR with an Arduino.

3.1.3 Ultrasonic

Much like LiDAR, we can use reflected sound waves to detect objects. Ultrasonic sensors work by emitting high-frequency sound waves, typically above human hearing, and then listening for the echoes that bounce back from nearby objects. The sensor calculates the distance based on the time it takes for the echo to return, using the speed of sound. Popular modules like the HC-SR04 (used in Lab 2) are easy to integrate with microcontrollers such as Arduino and Raspberry Pi. These sensors are widely used in robotics for obstacle avoidance, automated navigation, and liquid-level sensing.
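
A sketch of the usual HC-SR04 read sequence, with hypothetical helpers (gpio_write()/gpio_read() for pin access, micros() for a microsecond timestamp, delay_us() for busy-waiting; TRIG_PIN and ECHO_PIN are board-specific placeholders):

extern void gpio_write(int pin, int level);   /* hypothetical helpers */
extern int  gpio_read(int pin);
extern unsigned long micros(void);
extern void delay_us(unsigned long us);

#define TRIG_PIN 23   /* placeholder pin numbers */
#define ECHO_PIN 24

double hcsr04_distance_m(void)
{
    gpio_write(TRIG_PIN, 1);              /* 10 us trigger pulse */
    delay_us(10);
    gpio_write(TRIG_PIN, 0);

    while (gpio_read(ECHO_PIN) == 0)      /* wait for the echo pulse to start */
        ;
    unsigned long start = micros();
    while (gpio_read(ECHO_PIN) == 1)      /* echo stays high for the roundtrip */
        ;
    unsigned long elapsed = micros() - start;

    /* distance = (roundtrip time * speed of sound) / 2; ~343 m/s at 20 C */
    return (elapsed * 1e-6) * 343.0 / 2.0;
}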

However, unlike optical (electromagnetic wave) detectors, ultrasonic sensors, while useful for basic distance measurements, cannot replicate the functionality of LiDAR systems, due to several key limitations. Unlike LiDAR, which employs laser beams to generate high-resolution, three-dimensional point clouds, ultrasonic sensors emit sound waves that provide only limited, single-point distance data with lower precision. LiDAR offers greater accuracy and longer range, enabling detailed mapping and object recognition essential for applications like autonomous vehicles and advanced robotics. Additionally, LiDAR systems can cover a wider field of view and operate effectively in diverse environments by rapidly scanning multiple directions, whereas ultrasonic sensors typically have a narrow detection cone and struggle with complex or cluttered scenes. Furthermore, LiDAR’s ability to capture data at high speeds allows for real-time processing and dynamic obstacle detection, which ultrasonics cannot match. This is because, comparatively, sound waves take far longer to return, since they travel much slower than light (~343 m/s vs 299,792,458 m/s). These differences in data richness, accuracy, and versatility make ultrasonic sensors unsuitable substitutes for the sophisticated capabilities offered by LiDAR technology.

We’ll be using ultrasonic distance finders in future MPs to stop our rovers from colliding with objects. Since our rovers don’t move too fast and the complexity is relatively low, an ultrasonic sensor alone will suffice.

3.2 Errors in Sensing

Since sensors deal with and measure the physical world, errors will creep in over time.

Some typical errors in the use of physical sensors:

error type description
sensor drift over time the sensor measurements will “drift”, i.e., a gradual change in its output → away from average values (e.g., due to wear and tear)
constant bias bias of an accelerometer is the offset of its output signal from the actual acceleration value. A constant bias error causes an error in position which grows with time
calibration errors ‘calibration errors’ refers to errors in the scale factors, alignments and linearities of the gyros. Such errors tend to produce errors when the device is turning. These errors can result in additional drift
scale factor scale factor is the relation of the accelerometer input to the actual sensor output for the measurement. Scale factor, expressed in ppm, is therefore the linear growth of input variation to actual measurement
vibration rectification errors vibration rectification error (VRE) is the response of an accelerometer to current rectification in the sensor, causing a shift in the offset of the accelerometer. This can be a significant cumulative error, which propagates with time and can lead to over compensation in stabilization
noise random variations in the sensor output that do not correspond to the actual measured value


Each error type must be dealt with differently, though one common way to prevent sensor errors from causing harm to autonomous systems → sensor fusion, i.e., use information from multiple sensors before making any decisions. We will discuss sensor fusion later in this course.
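To see why even a tiny constant bias matters (see the table above), here is a toy sketch that dead-reckons position from a biased accelerometer; all numbers are illustrative:

#include <stdio.h>

// toy dead-reckoning: integrate a (biased) accelerometer twice to get position
int main(void) {
    const double dt   = 0.01;   // 100 Hz samples
    const double bias = 0.02;   // constant accelerometer bias, in m/s^2 (illustrative)
    double v = 0.0, x = 0.0;

    for (int i = 0; i < 100 * 60; i++) {   // one minute of samples
        double a = 0.0 + bias;  // true acceleration is zero; we measure only the bias
        v += a * dt;            // integrate once: velocity error grows linearly
        x += v * dt;            // integrate twice: position error grows quadratically
    }
    printf("position error after 60 s: %.1f m\n", x);  // ~0.5 * bias * t^2 = 36 m
    return 0;
}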

3.3 Analog to Digital Converters (ADCs)

As mentioned earlier, a sensor maps a physical quantity from the time domain to the value domain,

s : Dt ↦ Dv

where,

symbol description
Dt continuous or discrete time domain
Dv continuous or discrete value domain

Remember that computers require discrete sequences of physical values since microcontrollers cannot read values unless they are digital data. Microcontrollers can only see “levels” of voltage, the granularity of which depends on the resolution of the ADC and the system voltage.

Hence, we need to convert the above into the discrete domain, i.e., we require Dv to be composed of discrete values.

According to Wikipedia,

A discrete signal or discrete-time signal is a time series consisting of a sequence of quantities. Unlike a continuous-time signal, a discrete-time signal is not a function of a continuous argument; however, it may have been obtained by sampling from a continuous-time signal. When a discrete-time signal is obtained by sampling a sequence at uniformly spaced times, it has an associated sampling rate.


A visual representation of the sampling rate and how it correlates to the sampling of an analog signal:

(figure: an analog signal, the sampling rate and the resulting samples)

Hence, a device that converts analog signals to digital data values is called → an analog-to-digital converter (ADC). This is one of the most common circuits in embedded (and hence, autonomous) systems. Any sensor that measures a physical property must pass its values through an ADC so that the sensor values can be used by the system (the embedded processor/microcontroller, really).

This is best described using an example:

The analog signal is discretized into the digital signal after passing through an ADC.

ADCs follow a sequence:

  • sample the signal
  • quantize it to determine the resolution of the signal
  • set binary values
  • send it to the system to read the digital signal

Hence, two important aspects of an ADC are:

3.3.1 ADC Sampling Rate

The sampling rate (aka Sampling Frequency) is measured in samples per second (SPS or S/s). It dictates how many samples (data points) are taken in one second. If an ADC records more samples, then it can handle higher frequencies.

The sampling rate, fs, is defined as,

$$ f_s = \frac{1}{T} $$

where,

symbol definition
fs sampling rate/frequency
T period of the sample

Hence, in the previous example,

symbol value
fs 20 Hz
T 50 ms

While this looks slow (20 Hz), the digital signal tracks the original analog signal quite faithfully → the original signal itself is quite slow (1 Hz).

Now, if the sampling rate is considerably slower than the analog signal, then we lose fidelity and we see aliasing, where the reconstructed signal (the digital one in this case) differs from the original. Consider the following example of such a case:

As we see from the above figure, the digital output is nothing like the original. Hence, this (digital) output will not be of much use to the system.


Nyquist-Shannon Sampling Theorem:

to accurately reconstruct a signal from its samples, the sampling rate must be at least twice the highest frequency component present in the signal

If the sampling frequency is less than the Nyquist rate, then aliasing starts to creep in.

Hence,

fNyquist = 2 * fmax

where,

symbol definition
fNyquist Nyquist sampling rate/frequency
fmax the maximum frequency that appears in the signal

For instance, if your analog signal has a maximum frequency of 50 Hz then your sampling frequency must be at least, 100 Hz. If this principle is followed, then it is possible to accurately reconstruct the original signal and its values.

Note that sometimes noise can introduce additional (high) frequencies into the system, but we don’t want to sample those (for obvious reasons). Hence, it is a good idea to add anti-aliasing filters to the analog signal before it is passed to the ADC.
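The folding effect of aliasing is easy to compute: a tone at frequency f, sampled at rate fs, appears at the distance from f to the nearest multiple of fs. A minimal sketch (the frequencies are illustrative):

#include <stdio.h>
#include <math.h>

// apparent (aliased) frequency of a tone at f when sampled at fs:
// the tone "folds" to its distance from the nearest multiple of fs
static double alias(double f, double fs) {
    return fabs(f - fs * round(f / fs));
}

int main(void) {
    double fs = 100.0;   // sampling rate (SPS); Nyquist limit is fs/2 = 50 Hz
    printf("40 Hz sampled at %g SPS appears as %g Hz\n", fs, alias(40.0, fs));  // 40: safe
    printf("90 Hz sampled at %g SPS appears as %g Hz\n", fs, alias(90.0, fs));  // 10: aliased!
    return 0;
}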

3.3.2 ADC Resolution

An ADC’s resolution is directly related to the precision of the ADC, determined by its bit length. The following example shows the fidelity of the reconstruction for various bit lengths:

With increasing bit lengths, the digital signal more closely represents the analog one.

There exists a correlation between the bit length and the voltage of the signal. Hence, the true resolution of the ADC is calculated using the bit length and the voltage as follows:

$$ StepSize = \frac{V_{ref}}{N} $$

where,

symbol definition
StepSize resolution of each level in terms of voltage
Vref voltage reference/range of voltages
N = 2^n total “size” of the ADC
n bit size

This is easier to understand with a concrete example:

consider a sine wave with a voltage of 5 V that must be digitized.

If our ADC precision is 12 bits, then we get
N = 2^12 = 4096

Hence, StepSize = 5 V / 4096 ≈ 0.00122 V (or 1.22 mV)

Hence, the system can tell when a voltage level changes by 1.22 mV!

(Repeat the exercise for say, bit length, n = 4)
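A quick sketch of the step-size arithmetic, including the reverse mapping from a raw reading back to volts (the raw reading is a made-up value):

#include <stdio.h>

int main(void) {
    const double v_ref  = 5.0;           // reference voltage (V)
    const int    n_bits = 12;            // ADC bit length
    const long   N      = 1L << n_bits;  // number of levels: 2^12 = 4096

    double step = v_ref / N;             // 5 / 4096 ~= 1.22 mV per level
    printf("step size: %.5f V (%.2f mV)\n", step, step * 1e3);

    int raw = 2048;                      // a hypothetical raw ADC reading
    printf("raw %d -> %.4f V\n", raw, raw * step);   // ~2.5 V, i.e., mid-scale

    // repeating with n_bits = 4: N = 16, step = 5/16 = 0.3125 V -- far coarser!
    return 0;
}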


Visual Example:

The above may be intuitively understood as follows:

Consider the following signal:

Now, if we want to sample this signal, we can obtain measurements at:


The figure shows 9 measurements.

Suppose the ADC registers have a width of 2 bits. Hence they can represent at most 4 values.

Since it is not possible to store 9 values in 2 bits, we must select only 4 values on the digital side.

We then get the following representation:


which, to be honest, is not really a good representation of the original signal!

Now, consider the case where the ADC registers have a bit width of 4 bits → 16 values! Hence, we can easily store all 9 values.

So, we can get a digital representation as follows:


We see that this is a better representation, but still not exact. We can increase the bit length but at this point we are limited by the sampling as well. Since we only have 9 samples, adding more bits won’t help.

Hence, to get a better fidelity representation of the original signal, we see that sampling frequency and resolution need to be increased, since they determine the quality of output we get from an ADC.

Resources

4 Real-Time Operating Systems

Real-Time Operating Systems (RTOS) are specialized operating systems designed to manage hardware resources, execute applications and process data in a predictable manner. The main aim of this focus on “predictability” is to ensure that critical tasks complete in a timely fashion. Unlike general-purpose operating systems (GPOS) like Windows or Linux, which prioritize multitasking and user experience, RTOS focuses on meeting strict timing constraints, ensuring that tasks are completed within defined deadlines. This makes RTOS essential for systems where timing accuracy and reliability are critical, such as in embedded systems, autonomous driving, industrial automation, automotive systems, medical devices and aerospace applications, among others.

Hence, real-time systems (RTS), and RTOSes in general, have two criteria for “correctness”:

criteria description
functional correctness the system should work as expected, i.e., carry out its intended function without errors
temporal correctness the functionally correct operations must be completed within a predefined timing constraint (deadline)


To place ourselves in the context of this course, this is where we are:


We haven’t looked at the actuation part but we will come back to it later.

4.0.1 Key characteristics for RTOS

characteristic description
determinism primary feature of an RTOS is its ability to perform tasks within guaranteed time frames; this predictability ensures that high-priority tasks are executed without delay, even under varying system loads
task scheduling RTOS uses advanced scheduling algorithms (e.g., priority-based, round-robin or earliest-deadline-first) to manage task execution; RT tasks are often assigned priorities and the scheduler ensures that higher-priority tasks preempt lower-priority ones when necessary
low latency RTOS minimizes interrupt response times and context-switching overhead, enabling rapid task execution and efficient handling of time-sensitive operations (e.g., Linux spends many milliseconds handling interrupts such as disk access!)
resource management RTOS provides mechanisms for efficient allocation and management of system resources, such as memory, CPU and peripherals, to ensure optimal performance
scalability RTOS is often lightweight and modular, making it suitable for resource-constrained environments like microcontrollers and embedded systems
reliability and fault tolerance many RTOS implementations include features to enhance system stability, such as error detection, recovery mechanisms and redundancy

4.1 Kernels in RTOS

As with most operating systems, the kernel provides the essential services in an RTOS. In hard real-time systems, the kernel must guarantee predictable and deterministic behavior to ensure that all tasks meet their deadlines. In this chapter we focus on kernel aspects that are specific to RTS.

The RTOS kernel deals with,

  1. task management
  2. communication and synchronization
  3. memory management
  4. timer and interrupt handling
  5. performance metrics

4.1.1 Tasks, Jobs, Threads

The design of RTOSes (and RTS in general) deals with tasks, jobs and, for implementation-specific details, threads.

A real-time task, τi is defined using the following parameters: (ϕi, pi, ci, di) where,

Symbol Description
ϕi Phase (offset for the first job of a task)
pi Period
ci Worst-case execution time
di Deadline

Hence, a real-time task set (of size n) is a collection of such tasks, i.e., τ = {τ1, τ2, ..., τn}. Given a real-time task set, the first step is to check if the task set is schedulable, i.e., check whether all jobs of a task will meet their deadlines (a job is an instance of a task). For this purpose, multiple schedulability tests have been developed, each depending on the scheduling algorithm being used.

  • remember that a task is a set of parameters.
  • We “release” multiple “jobs” of each task, each with its own deadline
  • if all jobs of all tasks meet their deadlines, then the system remains safe.

A thread, then, is an implementation of a task/job – depending on the actual OS, it could be either, or both.

At a high level, here is a comparison between tasks, jobs and threads (note: these details may vary depending on the specific RTOS):

aspect task job thread
definition a task is a unit of work that represents a program or function executing in the RTOS a job is a specific instance or execution of a task, often tied to a particular event or trigger a thread is the smallest unit of execution within a task, sharing the task’s resources
granularity coarse-grained; represents a complete function or program fine-grained; represents a single execution of a task fine-grained; represents a single flow of execution within a task
resource ownership owns its resources (e.g., stack, memory, state) does not own resources; relies on the task’s resources shares resources (e.g., memory, address space) with other threads in the same task
scheduling scheduled by the RTOS kernel based on priority or scheduling algorithm not directly scheduled; executed as part of a task’s execution scheduled by the RTOS kernel, often within the context of a task
concurrency tasks run concurrently, managed by the RTOS scheduler jobs are sequential within a task but may overlap across tasks threads run concurrently, even within the same task
state management maintains its own state (e.g., ready, running, blocked) state is transient and tied to the task’s execution maintains its own state but shares the task’s overall context
isolation high isolation; tasks do not share memory or resources by default ++ no isolation; jobs are part of a task’s execution low isolation; threads share memory and resources within a task
overhead higher overhead due to separate stacks and contexts minimal overhead, as it relies on the task’s resources moderate overhead, as threads share resources but require context switching
use case used to model independent functions or processes (e.g., control loops) used to represent a single execution of a task (e.g., processing a sensor reading) used to parallelize work within a task (e.g., handling multiple i/o operations)
example a task for controlling a motor a job for processing a specific motor command a thread for reading sensor data while another thread logs the data

(++ sometimes tasks do contend for resources, so we need to mitigate access to them, via locks, semaphores, etc. and then have to deal with thorny issues such as priority inversions)

A task is often described using a task control block (TCB):
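The figure aside, a (much simplified) TCB might look like the following sketch; the field names are illustrative and not taken from any particular RTOS:

#include <stdint.h>

typedef enum { TASK_READY, TASK_RUNNING, TASK_BLOCKED, TASK_IDLE } task_state_t;

// illustrative task control block -- real RTOSes add many more fields
typedef struct tcb {
    uint32_t     *stack_ptr;     // saved stack pointer (top of the saved context)
    uint32_t     *stack_base;    // base of this task's stack
    task_state_t  state;         // current scheduling state
    uint8_t       priority;      // scheduling priority
    uint32_t      period_ticks;  // period, for periodic tasks
    uint32_t      wakeup_tick;   // when an IDLE task should next be released
    struct tcb   *next;          // linkage for the kernel's ready/wait queues
} tcb_t;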

Tasks typically cycle through a set of states, for instance (taken from the FreeRTOS real-time OS):


While the READY, RUNNING and BLOCKED states are similar to those in general-purpose operating systems (GPOS), RTOSes that support periodic tasks must introduce an additional state: IDLE or SUSPENDED:

  • a periodic task enters this state when one of its jobs completes its execution → it has to wait for the beginning of the next period
  • to be awakened by the timer (i.e., to launch the next instance/job), the task must notify the end of its cycle by executing a specific system call, end cycle → puts the job in the IDLE state and assigns the processor to another ready job
  • at the right time, each periodic task in IDLE state → awakened by kernel and inserted in the ready queue

This operation is carried out by a routine activated by a timer → it verifies, at each tick, whether some task (job) has to be awakened.

TCBs are usually managed in kernel queues (the implementation details may vary depending on the particular RTOS).

Context Switch Overheads:

One of the main issues with multitasking and preemption is that of context switch overheads, i.e., the time and resources required to switch from one task to another. For instance, consider this example of two tasks running on an ARM Cortex-M4:

void Task1(void) {
    while(1) {
        // Task 1 operations
        LED_Toggle();
        delay_ms(100);
    }
}

and

void Task2(void) {
    while(1) {
        // Task 2 operations
        ReadSensor();
        delay_ms(200);
    }
}

When switching between Task1 and Task2, an RTOS might need to:

  • save 16 general-purpose registers
  • save the program counter and stack pointer
  • update the memory protection unit settings
  • load the new task’s context (program into memory, registers, cache, etc.)

So, on the ARM Cortex-M4,

effect cost
basic context switch 200-400 CPU cycles
cache and pipeline effects, total overhead 1000+ cycles
frequent switching (e.g., every 1 ms) could consume 1-2% of CPU time!

These costs can add up, especially if the system has,

  • many RT tasks and frequent preemption
  • high-frequency/short period jobs that execute frequently
  • tasks that contend with each other for shared resources

Hence, an RTOS must not only be cognizant of such overheads but also actively manage/mitigate them. Some strategies could include:

  1. better task/schedule design: e.g., group related operations to reduce context switches
void Task_Sensors(void) {
    while(1) {
        // Handle multiple sensors in one task
        ReadTemperature();
        ReadPressure();
        ReadHumidity();
        delay_ms(500);
    }
}
  1. priority-based scheduling: e.g., high priority task gets more CPU
void CriticalTask(void) {
    // Set high priority
    setPriority(HIGH_PRIORITY);
    while(1) {
        ProcessCriticalData();
        delay_ms(50);
    }
}
  1. optimizing memory layouts: e.g., align task stacks to cache line boundaries
#define STACK_SIZE 1024
static __attribute__((aligned(32))) 
uint8_t task1_stack[STACK_SIZE];

Note: these are not comprehensive and other strategies could be followed, for instance avoiding multitasking altogether! All functions could be implemented in a single process that runs a giant, infinite loop known as a cyclic executive. Newer RTOSes shun the cyclic executive in favor of the multitasking model, since the latter provides more flexibility, control and adaptability, but many critical systems (especially older, long-running ones) still use the cyclic executive; for instance, nuclear reactors, chemical plants, etc.

In any case, a precise understanding of these overheads is crucial for:

  • setting appropriate task priorities
  • determining minimum task periods
  • calculating worst-case execution times
  • meeting real-time deadlines
  • optimizing system performance

There is significant (ongoing) work, both in industry as well as academia, on how to get a handle on context switch overheads while still allowing for flexibility and modularity in the development of RTS.

4.1.2 (Inter-Task) Communication and Synchronization

RTOSes use various mechanisms like semaphores, mutexes, message queues and event flags for communication and synchronization between tasks. Here are some examples:

  1. Semaphores:
  • binary semaphores: work like a mutex, with values 0 or 1
  • counting semaphores: can have multiple values, useful for managing resource pools
// Example of binary semaphore usage
semaphore_t sem;
sem_init(&sem, 1);  // Initialize with 1

void TaskA(void) {
    while(1) {
        sem_wait(&sem);
        // Critical section
        accessSharedResource();
        sem_post(&sem);
    }
}
  1. Mutexes (mutual exclusion):
  • mutexes provide exclusive access to shared resources
  • they include priority inheritance to prevent priority inversion
mutex_t mutex;
mutex_init(&mutex);

void TaskB(void) {
    mutex_lock(&mutex);
    // Protected shared resource access
    updateSharedData();
    mutex_unlock(&mutex);
}
  1. Message Queues:
  • they allow ordered data transfer between tasks
  • provide for buffering capabilities
queue_t msgQueue;
queue_create(&msgQueue, MSG_SIZE, MAX_MSGS);

void SenderTask(void) {
    message_t msg = prepareMessage();
    queue_send(&msgQueue, &msg, TIMEOUT);
}

void ReceiverTask(void) {
    message_t msg;
    queue_receive(&msgQueue, &msg, TIMEOUT);
    processMessage(&msg);
}
  1. Event Flags:
  • enable multiple tasks to wait for one or more events
  • support AND/OR conditions for event combinations
event_flags_t events;
#define EVENT_SENSOR_DATA 0x01
#define EVENT_USER_INPUT  0x02

void TaskC(void) {
    // Wait for both events
    event_wait(&events, EVENT_SENSOR_DATA | EVENT_USER_INPUT, 
               EVENT_ALL, TIMEOUT);
    processEvents();
}
  1. Condition Variables:
  • tasks can wait for specific conditions
  • used with mutexes for complex synchronization
mutex_t mutex;
cond_t condition;

void ConsumerTask(void) {
    mutex_lock(&mutex);
    while(bufferEmpty()) {
        cond_wait(&condition, &mutex);
    }
    processData();
    mutex_unlock(&mutex);
}


Each mechanism has specific use cases:

mechanism use case
semaphores resource management and simple synchronization
mutexes exclusive access to shared resources
message queues data exchange and task communication
event flags multiple event synchronization
condition variables complex state-dependent synchronization

Common considerations:

  1. Priority Inversion Prevention: a high-priority (HP) task is indirectly preempted by a lower-priority (LP) task; HP → needs resource (R); R held by → LP, LP preempted by medium-priority (MP) task. So HP waits for MP → inversion of priorities! We will discuss solutions (priority inheritance/priority ceiling) later.
  2. Deadlock Avoidance: tasks are **permanently blocked** waiting on resources from each other; τ1 holds resource RA and waits for RB; τ2 holds resource RB and waits for RA.
  3. Timeout Handling: every synchronization mechanism should have a timeout to avoid indefinite blocking of critical tasks.
  4. Error Handling: detecting errors and handling them in a robust manner is critical to maintain system reliability; RTOSes use retry mechanisms, logging and, most importantly, have clear recovery procedures for failure scenarios.

These considerations are crucial for ensuring system reliability, maintaining real-time performance, preventing system deadlocks, managing system resources effectively and handling error conditions gracefully.

4.1.3 Memory Management

Real-time systems require predictable memory allocation and deallocation to avoid delays or fragmentation. Hence, they often use limited memory management techniques, eschewing even dynamic memory allocation in favor of static memory allocation. For instance, many RTS don’t use malloc() or new (i.e., no heap-allocated memory) and very often avoid garbage collection. The main goal is tight control of memory management → this makes timing behavior more predictable. Hence, the following become easier:

  • wcet analysis
  • schedulability and other analyses
  • runtime monitoring and management
  • recovery/restart

Some goals for memory management in RTOSes:

  1. predictable execution times for memory operations
  2. fast allocation/deallocation
  3. minimal fragmentation, if any
  4. protection mechanisms between tasks

In fact, to achieve these goals, many RTSes don’t even use caches since they can be a major source of non-determinism in terms of timing behavior, e.g.,

if we cannot exactly calculate when some data/code will hit/miss in cache, then we cannot estimate its true timing behavior, leading to a lot of uncertainty → bad!

Some RTSes use scratchpads since they provide cache-like performance but have higher predictability since the data onboarding/offloading is explicitly managed (either by the program or the RTOS).

Some common memory-management techniques for RTOSes:

  1. static memory allocation: all memory used is allocated/deallocated at compile time.
  2. memory pools: fixed-size blocks are pre-allocated for specific purposes → avoids fragmentation and provides deterministic allocation times (see the sketch after this list)
  3. careful stack management: careful sizing/placing/management of the stack
  4. limited heap memory: using “safe” versions of malloc() for instance
  5. memory protection: using hardware such as memory protection units (MPUs)
  6. memory partitioning: explicitly partition memory/caches so that tasks cannot read/write in each others’ memory regions
  7. runtime mechanisms: such as memory usage monitoring, leak detection and managing the fragmentation

Of course, each of these mechanisms has its own problems, and a deliberation on those is left as an exercise for the reader.
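As an illustration of technique 2, here is a minimal fixed-block memory pool sketch; both allocation and free are O(1), hence predictable (a real version would guard the list operations with a critical section):

#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE  64   // every block is the same size -> no external fragmentation
#define NUM_BLOCKS  32

// statically allocated storage: no heap involved at all
static _Alignas(void *) uint8_t pool_mem[NUM_BLOCKS][BLOCK_SIZE];
static void *free_list = NULL;

void pool_init(void) {
    // thread every block onto a singly linked free list
    for (int i = 0; i < NUM_BLOCKS; i++) {
        *(void **)pool_mem[i] = free_list;
        free_list = pool_mem[i];
    }
}

void *pool_alloc(void) {        // O(1): pop the head of the free list
    void *blk = free_list;
    if (blk) free_list = *(void **)blk;
    return blk;
}

void pool_free(void *blk) {     // O(1): push the block back onto the free list
    *(void **)blk = free_list;
    free_list = blk;
}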

4.1.4 Timer and Interrupt Management

Timer and interrupt management are crucial components of an RTOS, ensuring that tasks are executed at precise intervals and that the system responds promptly to (internal and) external events. Timers and interrupts are closely related, since together they provide the most basic timing mechanism in RTOSes (from Hard Real-Time Computing Systems: Predictable Scheduling Algorithms and Applications):

to generate a time reference, a timer circuit is programmed to interrupt the processor at a fixed rate and the internal system time is represented by an integer variable, which is reset at system initialization and is incremented at each timer interrupt. The interval of time with which the timer is programmed to interrupt defines the unit of time in the system; that is, the minimum interval of time handled by the kernel (time resolution). The unit of time in the system is also called a system tick.


Timers, in general, play important roles in such systems, viz.,

role description
task scheduling enable periodic execution of tasks
timeout management prevent indefinite blocking of resources
event timing measure intervals between events
system timing maintain system clock and timestamps
watchdog functions monitor system health and detect lockups


Typically these systems have the following three types of timers:

type properties
hardware - direct access to hardware timing resources
- highest precision and accuracy
- limited in number (hardware dependent)
- used for critical timing functions
software - implemented in software, using hardware timer as base
- more flexibility, less precise
- limited only by memory
- more suitable for non-critical timing functions
system tick - core timer for RTOS
- drives task scheduling
- fixed frequency


There are various design considerations for timers in an RTOS, viz.,

  1. resolution → the finer the resolution, the higher the system/hardware/software/runtime overheads
  2. accuracy → need to understand and manage drift and jitter; timers may need to be calibrated often++
  3. power consumption → the more accurate/high-precision a timer, the higher the power consumption; also, the tick can result in significant power consumption if not implemented/managed well

(++ drift indicates a gradual, long-term change in the timer’s frequency over time, whereas jitter refers to short-term, random fluctuations in the timing of individual clock pulses)

Interrupt Latency → the time from when an interrupt occurs to when the corresponding interrupt service routine (ISR) starts executing. As interrupts are integral to the operation of an RTOS, from the implementation of the system tick to notifications of internal (watchdog timers) and external events (new sensor data), it is important to minimize interrupt latencies.

Optimization Techniques (to minimize latencies):

  • minimize interrupt frequency → often an RTOS will disable interrupts in critical sections
  • efficient timer and interrupt queue management → e.g., “nesting” interrupts
  • power-aware timing strategies → “tickless” operating systems have been tried
  • optimize ISRs → keep them short, defer work using other methods (deferred procedure calls or “bottom halves”); see the sketch below
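For the last point, a common pattern is to keep the ISR tiny and defer the heavy lifting to a task. A sketch using the generic semaphore API from the earlier examples (sem_post_from_isr and the sensor helpers are assumed names; e.g., FreeRTOS calls its equivalent xSemaphoreGiveFromISR):

semaphore_t data_ready;

// ISR: do the bare minimum, then signal the waiting task
void SensorISR(void) {
    clearInterruptFlag();            // acknowledge the hardware
    sem_post_from_isr(&data_ready);  // wake the handler task; no heavy work here
}

// "bottom half": blocks until the ISR signals, then does the slow processing
void SensorTask(void) {
    while (1) {
        sem_wait(&data_ready);
        readSensorRegisters();       // the time-consuming part runs at task level
        processSensorData();
    }
}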

4.1.5 Kernel Performance Metrics

Essentially, the kernel must be designed to minimize jitter and ensure that all operations have bounded and predictable execution times.

Hence, we can try to evaluate whether an RTOS kernel meets these goals using the following metrics (note: not exhaustive):

metric description
interrupt latency the time taken to respond to an interrupt
context switch time time to switch between tasks
dispatch latency time difference between task being ready and when it starts executing
throughput number of tasks/operations the kernel can handle per unit time

4.2 Examples of RTOS

name description features
FreeRTOS a widely used, open-source RTOS for embedded systems small footprint, portable, supports a wide range of microcontrollers
VxWorks commercial RTOS used in aerospace and defense applications high reliability, real-time performance, and support for multi-core processors
QNX a commercial RTOS known for its reliability and use in automotive and medical systems microkernel architecture, high security, support for posix apis
Zephyr open-source RTOS designed for IoT and Edge devices modular, scalable, supports a wide range of hardware architectures


Why is Linux not on the list? While it has a growing list of real-time features, it is still far from a hard real-time system, mainly due to its complexity. It is difficult to analyze WCETs on Linux or to completely control its timers → the list is endless. It still sees use in many real-time and embedded systems, and we will (briefly) explore its real-time capabilities soon.

4.2.1 FreeRTOS

As mentioned earlier, FreeRTOS is one of the most popular open-source RTOS options, widely used in embedded systems due to its simplicity, portability and extensive community support. It supports,

  • creation of multiple tasks, each with its own priority
  • preemptive and cooperative scheduling
  • mechanisms like queues, semaphores, and mutexes for communication and synchronization between tasks
  • several memory management schemes, including heap_1, heap_2, heap_3, heap_4, and heap_5, to suit different application requirements
  • highly portable and supports a wide range of microcontrollers and development boards, including ARM Cortex-M, ESP32 and STM32
  • a large and active community, with extensive documentation, tutorials and examples available online

Here is an example that uses FreeRTOS to blink the LEDs on a microcontroller:

#include <FreeRTOS.h>
#include <task.h>
#include <gpio.h>

// Task to blink an LED
void vBlinkTask(void *pvParameters) {
    while (1) {
        GPIO_TogglePin(LED_PIN);  // Toggle LED
        vTaskDelay(pdMS_TO_TICKS(500));  // Delay for 500ms
    }
}

int main(void) {
    // Initialize hardware
    GPIO_Init(LED_PIN, GPIO_MODE_OUTPUT);

    // Create the blink task
    xTaskCreate(vBlinkTask, "Blink", configMINIMAL_STACK_SIZE, NULL, 1, NULL);

    // Start the scheduler
    vTaskStartScheduler();

    // The program should never reach here
    for (;;);
}

Resources:

  1. FreeRTOS Documentation
  2. FreeRTOS Tutorials
  3. Raspberry Pi and FreeRTOS [GitHub Repo]

4.2.2 Linux+Real-Time

As mentioned earlier, Linux, as a general-purpose operating system, is not inherently a real-time operating system (RTOS). However, it does provide several features and mechanisms that can be used to achieve real-time performance, especially when combined with real-time patches or specialized configurations.

Some of the real-time features of Linux include:

  • Preempt-RT Patch: a set of patches that convert the Linux kernel into a fully preemptible kernel, reducing latency and improving real-time performance; the Preempt-RT patch achieves this by:

    • making almost all kernel code preemptible: allows higher-priority tasks to preempt lower-priority tasks, even when the lower-priority tasks are executing kernel code
    • converting interrupt handlers to kernel threads: reduces time spent with interrupts disabled, for better predictability and lower latency
    • implementing priority inheritance: helps prevent priority inversion by temporarily elevating priority of lower-priority tasks holding a resource needed by higher-priority tasks
    • reducing non-preemptible sections: minimizes time during which preemption is disabled, further reducing latency
    • enhancing timer granularity: allows for more precise timing and scheduling of tasks, crucial for real-time applications

    the Preempt-RT patch is widely used in industries such as telecommunications, industrial automation and audio processing. It is actively maintained and supported by the Linux Foundation’s Real-Time Linux (RTL) collaborative project

  • Real-Time scheduling policies: support for real-time scheduling policies such as SCHED_FIFO and SCHED_RR, which provide deterministic scheduling behavior (see the sketch after this list)

  • High-resolution timers: support for high-resolution timers that allow for precise timing and scheduling of tasks

  • basic priority inheritance: mechanism to prevent priority inversion by temporarily elevating the priority of lower-priority tasks holding a resource needed by higher-priority tasks

  • CPU isolation: ability to isolate CPUs from the general scheduler, dedicating them to specific real-time tasks; also pinning processes to certain cores

  • Threaded interrupts: support for handling interrupts in kernel threads, reducing interrupt latency and improving predictability

  • Memory management techniques: such as memory locking to prevent pages from being swapped, the use of “huge” pages and memory pre-allocation

  • Control groups (cgroups): mechanism to allocate CPU, memory and I/O resources to specific groups of tasks, ensuring resource availability for real-time tasks

These features, when properly configured, can help achieve real-time performance on Linux, making it suitable for certain real-time and embedded applications.
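As a minimal sketch of the first two mechanisms in action, here is a Linux thread requesting the SCHED_FIFO policy and locking its memory (this needs root or CAP_SYS_NICE, and error handling is trimmed to the bare minimum):

#define _GNU_SOURCE   // for sched_setscheduler() on glibc
#include <sched.h>
#include <stdio.h>
#include <sys/mman.h>

int main(void) {
    // lock all current and future pages: no page faults mid-deadline
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
        perror("mlockall");

    // request the SCHED_FIFO real-time policy at a high priority (1-99)
    struct sched_param sp = { .sched_priority = 80 };
    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0)
        perror("sched_setscheduler");

    // ... real-time work goes here ...
    return 0;
}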

4.2.3 Raspberry Pi OS+Real-Time

The Raspberry Pi OS can also be made “real-time” in the same manner as described above, since it is a Linux variant.

There are also some attempts at getting the Pi to behave in a real-time fashion, e.g., 1, 2, 3.


4.3 Robot Operating System (ROS)

ROS is an open source middleware framework built for robotics applications. The main goal → develop standards for robotic software. ROS provides many reusable modules for developing robotic applications.

Embedded/autonomous programs that do simple tasks (or operate with a single sensor/motor) are relatively easy to program. As more sensing, actuation and functionality is added (consider a large industrial robot or even an autonomous car), programs quickly become quite complex – coordination of the data and system states becomes challenging.


ROS helps to develop and scale such applications and also manages communications between various parts of the software. As mentioned earlier, ROS is middleware:

Middleware is a software layer that connects the operating system to applications, data, and users. It provides common services and capabilities, like single-sign on (SSO), easy communication/coordination (like ROS) or application programming interface (API) management. Developers can rely on middleware to provide consistent, simplified integrations between application components. This frees up developers to build core features of applications, rather than spend time connecting those features to different endpoints and environments, including legacy systems.

At a high level, ROS,

  • creates a separation of code blocks → into reusable blocks
  • provides tools → easy communication between sub-programs
  • is language agnostic → allows different components to be written in, say, Python and C and yet communicate using the ROS communication protocol

A simple example: control of a robotic arm+camera:


To write a ROS application to control this robotic arm, we first create a few subprograms:

  • one for the camera → node
  • another for → motion planning
  • one for → hardware drivers
  • finally one for → joystick

Now we use ROS → communication between these nodes.

ROS even provides plug and play libraries for designing your system, e.g., inverse kinematics libraries, trajectory planning for robotic arms, etc.

4.3.1 ROS Components

Some important components of ROS:


  1. node
  • a process that performs computation (a program/subprogram)
  • combined together into a graph
  • communicate via “topics”
  • operate at a fine-grained scale
  • a full system will have multiple nodes, e.g.,
    • one node controls a laser range-finder
    • one node controls the robot’s wheel motors
    • one node performs localization
    • one node performs path planning
    • one node provides a graphical view of the system

The use of nodes has several benefits such as fault tolerance, reduced complexity and modularity.

  1. topics
  • they’re named buses over which nodes exchange “messages”
  • anonymous publish/subscribe semantics → decouples production of information from its consumption
  • nodes are not aware of who they are communicating with
  • nodes that are interested in data subscribe to the relevant topic
  • nodes that generate data publish to the relevant topic
  • can be multiple publishers and subscribers to a topic
  • topic is strongly typed by publisher → nodes can only receive messages with a matching type

Topics are meant for unidirectional, streaming communication. ROS includes other mechanisms such as services and [parameter servers](http://wiki.ros.org/Parameter%20Server) for different types of communications.

  1. messages
  • nodes communicate with each other by publishing messages to topics
  • described using simple text files
  • simple data structure → typed fields
  • support standard primitives (int, float, boolean)
  • can include arbitrarily nested structs and arrays
  • nodes can exchange → request and response messages

A simple ROS message:

std_msgs/Header header
  uint32 seq
  time stamp
  string frame_id
geometry_msgs/Point point
  float64 x
  float64 y
  float64 z

Example: our first ROS message.

  1. ROS Master
  • provides naming and registration services to the rest of the nodes in the ROS system
  • also runs the parameter server → a shared, multi-variate dictionary that is accessible via network APIs, used by nodes to store/retrieve parameters
  • tracks publishers and subscribers to topics as well as services
  • enables individual ROS nodes to locate one another
  • once located, they communicate in a peer-to-peer fashion

Example:

  1. consider two nodes → camera node and image_viewer node
  2. camera notifies master → wants to publish images on the topic, images

  1. no one is subscribing to the topic, yet → no images sent
  2. image viewer → subscribe to images topic

  1. topic, images has both → publisher and subscriber
  2. master notifies both → of each others’ existence

  1. both start communicating with each other, directly


A more intricate example of the same:

  1. ROS transform
  • robotic system typically has many 3D coordinate frames that change over time
    • e.g., world frame, base frame, gripper frame, head frame, etc.
  • lets the user keep track of multiple coordinate frames over time
  • maintains the relationship between coordinate frames → manages spatial relationships
  • in a tree structure buffered in time
  • lets the user transform points, vectors, etc. → at any desired point in time
  • distributed → coordinate frames of robot available to all ROS components on any computer in the system
  • used for sensor fusion, motion planning and navigation
  • organizes all coordinate frames and their relationships into a transform tree

An example of a ROS transform and tree:

4.3.2 ROS and Real-time?

ROS (the first version) is not real-time. Hence, systems that require hard real-time guarantees should not use it. But ROS can be integrated into systems that require some latency guarantees. If needed, ROS can be run on top of the RT_PREEMPT real-time patch on Linux. In addition, specific nodes can be designed to handle real-time functions or programmed to behave as real-time control systems.

If better real-time guarantees are required, then ROS 2 is your best bet.

Resources: more information on real-time and ROS2 can be found at RT ROS2 Xilinx and RT ROS Github.

4.3.3 ROS+Navio2

We use ROS (the original version, not ROS 2) in our class. Note: while ROS has no real-time capabilities, on some embedded systems it is fast enough that we can use it to control safety-critical systems such as drones and other small autonomous systems.

In fact, the basic Raspbian image comes installed with ROS. We can use it to communicate between the Navio2 and the controller running on the Pi to exchange critical information, e.g., sensor data.

Resources: please read the step-by-step instructions on how to connect/use the Navio2 and the Pi using ROS.

5 Scheduling for Real-Time Systems

Consider an engine control system that cycles through the various phases of operation for an automotive engine:


This system periodically cycles through multiple tasks, viz.,

  1. air intake
  2. pressure
  3. fuel injection+combustion
  4. exhaust

If we correlate this to task “activations”, then we may see the following:

This is a (somewhat) simple execution model known as the cyclic executive that we shall return to later. Hence, there is a direct correlation between the physical aspects of a real-time and autonomous system and issues such as scheduling.

5.1 Real-Time Models

Most real-time systems can be classified into:

property hard soft
“usefulness” of results results are useless (or even dangerous) after the deadline results retain some (diminished) value after the deadline
optimality criterion all deadlines must be satisfied most deadlines must be met
examples sensor readings, vehicular control video decoding, displaying messages

There are many ways to model a real-time system:

  1. workload model → describes applications supported by system, using
    • functional parameters
    • temporal parameters
    • precedence constraints and dependencies
  2. resource model → describes system resources available to applications
    • modeling resources, e.g., processors, network cards, etc.
    • resource parameters
  3. algorithms → defines how application uses resources at all times
    • scheduling policies
    • other resource management algorithms

5.1.1 Workload Model

We have already discussed tasks vs. jobs vs. threads earlier, so we understand how to model computation. We also discussed how actual execution is modeled (via threads).

Let us focus on jobs and some of their properties:

note → deadlines and periods don’t have to match, but they usually do, i.e., D = P

property/parameters description
temporal timing constraints and behavior
functional intrinsic properties of the job
resource resource requirements
interconnection how it depends on other jobs and how other jobs depend on it

As discussed earlier, we need to get a good understanding of the wcet of jobs.

5.1.2 Utilization

One of the important ways to understand the workload requirements for a task is to compute,

how much utilization is taken up by all jobs of the task?

One might ask: if there are (potentially) an infinite number of jobs for each task, since they’re periodic++ and long running, then how can one compute the utilization?

Recall that a periodic function is of the type → f(t) = f(t + T)

where T is the “period”

Note: utilization is not the time taken by a task (or its jobs). It is,

the fraction of the processor’s total available utilization that is soaked up by the jobs of a task

Hence, the utilization for a single task is,

$$ U_i = \frac{c_i}{T_i} $$

where,

symbol description
ci wcet of the task
Ti period of the task

Now, we can compute the utilization for the entire task set,

$$ U = \sum_{i=1}^{n} U_i = \sum_{i=1}^{n} \frac{c_i}{T_i} $$

Simple Exercise: what is the total utilization for the following task set?

Task c T
τ1 1 4
τ2 2 5
τ3 5 17

what does it mean if U > 1?
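For reference (spoiler alert for the exercise), working through the arithmetic:

$$ U = \frac{1}{4} + \frac{2}{5} + \frac{5}{17} = 0.25 + 0.4 + 0.294 \approx 0.944 $$

Since U ≤ 1, the jobs fit within the processor’s capacity. If U > 1, the tasks demand more than 100% of the processor → no scheduler (on a single processor) can possibly meet all deadlines.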

5.1.2.1 Precedence Constraints

Jobs can be either precedence constrained or independent → we can use directed acyclic graphs (DAGs) to specify/capture these constraints.

For instance, $ J_a J_b$ implies,

  • Ja is a predecessor of Jb
  • Jb is a successor of Ja

Ja → Jb implies

  • Ja is an immediate predecessor of Jb

Consider the following example:

dag relationships
J1 ≺ J2
J1 → J2
J1 ≺ J4
J1 ↛ J4

Here is a more realistic example → the precedence constraints in an autonomous driving system:

5.1.3 Resource Model

A “resource” is some structure used by a task → to advance its execution.

Type of resources:

  • active → e.g., processors (they execute jobs)
    • every job → one or more processors
    • same type if functionally identical and used interchangeably
  • passive → e.g., files, network sockets, etc.
  • private vs shared

We have already discussed resource sharing and synchronization earlier → mutexes, locks, etc. Essentially, there is a need for mutual exclusion, which is guaranteed via one of these methods or by the use of critical sections.

5.1.4 Scheduling and Algorithms

Before we proceed, we need to understand → preemption:

preemption is the process of suspending a running task in favor of a higher priority task.

Most OSes (real-time & non-real-time) allow preemption, since it enables,

  • greater flexibility in constructing schedules
  • exception handling
  • interrupt servicing

Preemptions, among other things, help define a variety of scheduling events; essentially, these are the times when the scheduler is invoked:

  • in a fully preemptive system → task arrival, task completion, resource request, resource release, interrupts and exceptions, etc.
  • in a non-preemptive system → task completion
  • in limited preemptive systems → predefined preemption points, e.g., POSIX thread cancellation points (pthread_testcancel())

5.1.4.1 Task Schedule

We have been talking about “scheduling” all this while so it is time to formally define what a schedule is.

Given → a set of jobs, J = {J1, J2, ..., Jn}

A schedule → an assignment of jobs to the CPU, so that each task can execute till completion.

5.2 Schedulers

Finally, we get to the main topic at hand → schedulers and scheduling! Historically there have been many scheduling policies developed for a variety of systems, e.g., FIFO, round robin, time sharing, etc.

Here is a good comparison of the various types of (historical) CPU/OS schedulers: CPU Scheduling in Operating Systems.

In the realm of real-time systems, to formally define a scheduling problem, we need to specify:

  1. a set of tasks, τ
  2. a set of processors, P
  3. a set of resources, R

Hence, the general scheduling problem is,

assign processors and resources to tasks, such that the following constraints are met:

  • timing constraints
  • precedence constraints
  • resource constraints

There is a large body of literature in the domain of real-time scheduling algorithms. In this chapter, we will focus on a few of them, viz., cyclic executives and priority-based schedulers such as RM and EDF.

One of the main problems with the scheduling problem, as defined above (and in general), is that many variants of the problem are intractable, i.e., NP-Complete or even NP-Hard.

Recall that an NP-Hard problem is one for which no polynomial-time algorithm is known. So, all known algorithms are super-polynomial or, usually, of exponential time complexity!

[Will leave it to the reader to recall or look up the definition of NP-Complete.]

Since the scheduling problems may not be tractable (or “solvable” in a realistic time frame), we need to find heuristics but they can be “sub-optimal”. Luckily, we have a couple of provably optimal real-time schedulers (in the single core domain).

Additional, important definitions:

concept definition
feasibility schedule is feasible if all tasks can be completed by satisfying a set of specified constraints
schedulable set of tasks is schedulable if there exists at least one algorithm that can produce a feasible schedule
schedulability analysis analysis to confirm that deadlines for a set of threads can be guaranteed using a given scheduling policy

5.2.1 Cyclic Executives

In the automotive engine example from earlier, we see that for each cycle, the same set of tasks repeats (i.e., “periodic behavior”). Note though that the tasks need not execute in parallel – rather, they must execute sequentially for this application. Usually such applications use a scheduling mechanism known as a “cyclic executive”.

Consider this simple example with three tasks:

task c
T1 1
T2 2
T3 3

How would we schedule this? Assume a single processor (hence a single timeline).

Well, the simplest mechanism is to just use a sequential schedule,

If, as in the case of the engine control example we saw earlier, the tasks repeat ad infinitum, then we see the pattern also repeating…

Cyclic executives were common in many critical RTS, since they’re simple and deterministic. An implementation could look like,

while(1)    // an infinite loop
{
    // Some Initialization

    Task_T1() ;

    // Some processing, maybe

    Task_T2() ;

    // Some other processing, maybe

    Task_T3() ;

    // Cleanup
}

Question: what problems, if any, can happen due to cyclic executives?

The very simplicity of such systems can also be their biggest weakness.

  1. lack of flexibility: as the example and code above demonstrate, once a pattern of executions is set, it cannot be changed, unless the system is stopped, redesigned/recompiled and restarted! This may not be possible for critical applications. Even for the engine control application in cars, this doesn’t just mean stopping and restarting the car, but re-flashing the firmware for the engine, which is quite an involved task.

  2. scalability: along similar lines, it is difficult to scale the system to deal with additional issues or add functionality.

  3. priority: there is no way to assign priority or preemption since all tasks essentially execute at the same priority. Hence, if we want to deal with higher-priority events (e.g., read a sensor) or even atypical (aperiodic/sporadic) events, such as sudden braking in an autonomous car, then a cyclic executive is not the right way to go about it.

  4. resource management: certain tasks can corral resources and hold on to them while others may starve – leading to the system becoming unstable. For instance, even in the simple example, we see that T3 can dominate the execution time on the CPU:

Since the system is one giant executable, it is difficult to stop a “runaway task” – the entire system must be stopped and restarted, which can lead to serious problems.

5.2.2 Frames

One way to mitigate some of the problems with cyclic executives is to split the resource allocation into “frames” → fixed chunks of time when a task can claim exclusive access to a resource, e.g., the processor:

  • once a frame starts, the task gets to execute uninterrupted
  • at the end of the frame, the task must give up the resource → regardless of whether it was done or not

So, if we revisit our simple example and break the processor schedule into frame sizes of 2 units each,

why 2? Well, it is arbitrary for now. But, as we shall see later, we can calculate a “good” frame size

Now, our schedule looks like,

As we see from this picture, task T1 doesn’t end up using its entire frame and hence can waste resources (one of the pitfalls of this method).

Continuing further,

Task T3 is forced to relinquish the processor at t=6, even though it has some execution left → on account of the frame ending. Now T1 resumes in its own frame. T3 has to wait until t=10 to resume (and complete) its execution:
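A frame-based executive is often implemented as a table of function pointers dispatched on a timer tick. A minimal sketch (wait_for_frame_boundary is a hypothetical helper that blocks until the next frame interrupt, and the frame order is illustrative):

#include <stddef.h>

typedef void (*task_fn)(void);

extern void Task_T1(void);
extern void Task_T2(void);
extern void Task_T3(void);
extern void wait_for_frame_boundary(void);  // hypothetical: blocks until the next frame tick

// the frame table: which task owns each fixed-size frame (illustrative order)
static task_fn frame_table[] = { Task_T1, Task_T2, Task_T3, Task_T1, Task_T2, Task_T3 };
#define NUM_FRAMES (sizeof(frame_table) / sizeof(frame_table[0]))

void frame_executive(void) {
    size_t frame = 0;
    while (1) {
        wait_for_frame_boundary();   // the frame starts on the timer tick
        frame_table[frame]();        // the owner runs; it must yield (or be cut off)
                                     // by the next boundary, finished or not
        frame = (frame + 1) % NUM_FRAMES;
    }
}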

Other Static/Table-Driven Scheduling:

Cyclic executives are an example of schedulers where the tasks are fixed, ahead of time, and all that a scheduler has to do is to dispatch them one at a time in the exact same order. Often the tasks are stored in a lookup table (hence the name “table-driven”). Other examples of such systems (with some prioritization and other features built in) have been built, e.g., weighted round robin → also finds use in cloud computing and networking load balancing, etc.

5.2.3 Priority-Based Schedulers

One method that overcomes the disadvantages of a completely static method is to assign priorities to jobs as they arrive. Hence, when a job is released, it gets assigned a priority and the scheduler then dispatches/schedules the job accordingly. If it is the highest priority job so far, it gets scheduled right away, preempting any currently running tasks. If, on the other hand, it is not the highest priority job, then it is inserted into the ready queue at the right position (priority level).

To deal with this, we need an online scheduler, i.e., one that is always available to make scheduling decisions – when tasks arrive, complete, miss their deadlines, etc.

Before we go any further, let’s update the task model a little, to make matters simple for us.

  • as before, we have a task set comprised of n periodic tasks, τ = {τ1, τ2, ..., τn}
  • deadline is equal to period, i.e., T = D; task periods are T1, T2, ..., Tn
  • all tasks are independent → no precedence or resource constraints exist
  • tasks cannot suspend themselves (or others)
  • tasks are preemptible by the OS → at any time, the highest priority ready task executes (preemptive scheduling)
  • execution time of each task is bounded → wcet (c1, c2, ..., cn)
  • tasks are released (i.e., placed into the ready queue) as soon as they arrive
  • all kernel overheads (e.g., context switches) → assumed to be zero

While these may seem to be overly simplifying, they still fit the model of many systems and help us develop fundamental results. And we can always add them back one-by-one and still retain the correctness of the theoretical results we develop, while making the system more realistic.

Now, in the realm of online, priority-driven schedulers, we have two options:

priority assignment algorithms
static Rate-Monotonic (RM), Deadline-Monotonic (DM)
dynamic Earliest-Deadline First (EDF), Least-Slack Time (LST)


Let’s look at one of each (the most popular ones), viz., the Rate-Monotonic (RM) and Earliest-Deadline First (EDF) schedulers. Note that both were first described and analyzed in a seminal computer science paper that has since become one of the most cited and influential papers in the field: Scheduling Algorithms for Multiprogramming in a Hard-Real-Time Environment by Liu and Layland.

Interestingly, both of these algorithms are provably optimal, i.e., no other static or dynamic algorithm can do better than RM and EDF respectively! Not bad for the first paper in the area – talk about setting a high bar, or rather bound!

5.2.3.1 Rate-Monotonic Scheduler (RM)

The Rate-Monotonic priority assignment algorithm assigns priority based on the period of the task → the shorter the period, the higher the priority!

Consider the following example:

task c T
τ1 2 4
τ2 2 5
τ3 1 10

So, based on the RM algorithm, the priority will be:

τ1 > τ2 > τ3

since, T1 < T2 < T3.

The question now is: is the above task set schedulable? Let us use our previous utilization-based check, i.e.,

$$U = \sum_{i=1}^{n} \frac{c_i}{T_i}$$

So, plugging in the numbers, we get,

$$ U = \frac{2}{4} + \frac{2}{5} + \frac{1}{10} = 0.5 + 0.4 + 0.1 = 1.0 $$

Our test was: U ≤ 1. So, this task set is…schedulable? Let us see – by plotting it on the timeline:

As we see, task τ3 misses its deadline! In fact, with the other two tasks, τ1 and τ2 constantly executing, τ3 will never get to execute and all of its jobs will miss their deadlines!

“Wait!”, you say. OK, one job has missed its deadline, maybe two (from the picture). So how can I make the bold claim that all will miss their deadlines?

If you pay close attention to the figure, I have only checked for 10 time steps – half of 20, which happens to be the LCM (lowest common multiple) of all the task periods: 4, 5 and 10.

Why do we care about the LCM? Turns out, in real-time scheduling, the LCM of the task periods has a special significance: if we construct the schedule for one LCM number of time units, then the schedule repeats exactly after that! Hence, the exact same schedule repeats every LCM number of units.

The LCM of the constituent (periodic) tasks in the system is referred to as → the hyperperiod. So, we only need to check our schedule for one hyperperiod. If the task set is schedulable within one hyperperiod, then it is schedulable forever; if not, it never will be.

So, for this example, I can state, with confidence, that all jobs of τ3 will miss their deadlines since within half of the hyperperiod, one job has missed its deadline!

So, coming back to our analysis: we started with our utilization test, U ≤ 1, which this task set passed, yet it **failed to schedule**! So, it seems we need something better.

Turns out the Liu and Layland paper figured this out and created another test, one based on a utilization upper bound. Since RM is a static priority assignment mechanism, there is a theoretical limit on the utilization for a task set, which depends on the number of tasks, n, in the system.

So, we derive another check for utilization (I leave out the details here; check the Liu and Layland paper for more),

$$ U = \sum_{i=1}^n \frac{c_i}{T_i} \le n \cdot (2^{\frac{1}{n}} - 1) $$

If the total utilization of the system is below this bound, then the task set is schedulable by RM. Note that this is a sufficient but not necessary test (more on that later).

As we see from above, the value of the right hand side will change with the number of tasks in the system. Hence, with more tasks, the upper bound for U will reduce.

let’s open up the simulator-plotter for checking this for various values of n and see for n = 1, 2, ...
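In lieu of the simulator, a few lines of C produce the same numbers:

#include <stdio.h>
#include <math.h>

// Liu & Layland utilization upper bound for n tasks: n * (2^(1/n) - 1)
int main(void) {
    int ns[] = { 1, 2, 3, 4, 5, 10, 100, 1000 };
    for (int i = 0; i < 8; i++) {
        int n = ns[i];
        printf("n = %4d  ->  U_ub = %.4f\n", n, n * (pow(2.0, 1.0 / n) - 1.0));
    }
    return 0;
}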

So, we see that

$$ n = 3 \implies U_{ub} = 3 \cdot (2^{\frac{1}{3}} - 1) \approx 0.78 $$

The utilization for our task set was: 1.0 which is significantly higher! No wonder our task set wasn’t schedulable!

Here is a plot that shows the values for different numbers of tasks:

As we see, the value keeps reducing. Does it keep going all the way down to zero? What if I schedule 100 tasks? A 1000?

Turns out, we can check! With the exact same equation.

let’s open up the simulator-plotter for checking this for various values of n and see for n = 100, 1000, etc.

As we see from the figure (and the active plotting), the values plateau out and converge…to 0.69! So, for any real-time system scheduled using the RM assignment mechanism, if the total utilization is under 0.69 then it is schedulable, regardless of the number of tasks.

Optimality: as mentioned earlier, RM is optimal, i.e.,

  • if a task set is schedulable by some static priority assignment → it is also schedulable by RM, i.e., no other static algorithm can do better (in terms of utilization)
  • if a task set is not schedulable by RM → then there is no other static algorithm that can schedule it
5.2.3.1.1 Exact (Response Time) Analysis

Now, let’s go back to one of our earlier examples (from the cyclic executive chapter):

| task | c | T |
|------|---|----|
| τ1   | 1 | 4  |
| τ2   | 2 | 6  |
| τ3   | 3 | 12 |

We added some period information to the tasks.

We know that, using the naive utilization test, U ≈ 0.833. But recall that the utilization bound test for n = 3 tasks requires U ≤ 0.78. So, this task set must be unschedulable, right? Let’s draw it out on the timeline and see:

Wait, this is schedulable? But it fails our test!

This is why I referred to it as a sufficient but not necessary test. Hence, for the Liu and Layland utilization bound test,

| result of test | schedulable? |
|---|---|
| pass, i.e., $U \le U_{ub}$ | yes |
| fail, i.e., $U > U_{ub}$ | unsure |

So we need a better test → Response Time Analysis (RTA):

  • if all jobs of a task are able to complete before their respective deadlines → the task set is schedulable
  • caveat → we must account for the interference (hence, delays) that each job encounters from all higher priority jobs

Let’s look at it in more detail:

  1. the worst-case response time of a task, τi, is

Ri = ci + Ii

where Ii is the interference faced by τi’s job from all higher priority jobs until it completes.

  2. For each higher priority task, τj, the number of its jobs released during the time interval Ri is:

$$\left\lceil \frac{R_i}{T_j} \right\rceil$$

since each period of task τj results in a new job being released. And since RM gives higher priority to shorter periods, those released jobs will execute ahead of the current task, τi.

  3. Since $\left\lceil \frac{R_i}{T_j} \right\rceil$ jobs of τj execute before τi, the interference caused by all of them is:

$$I_j = \left\lceil \frac{R_i}{T_j} \right\rceil \cdot c_j$$

  4. the total interference, then, is the sum of the individual interference from each of the higher priority tasks, i.e.,

$$I_i = \sum_{j\in hp(i)}\left\lceil \frac{R_i}{T_j} \right\rceil \cdot c_j $$

  5. Finally, the response time for task τi must combine its own (worst-case) execution time with the total interference from all higher priority tasks,

Ri = ci + Ii

$$ R_i = c_i + \sum_{j\in hp(i)}\left\lceil \frac{R_i}{T_j} \right\rceil \cdot c_j $$

Since Ri appears on both sides, we carry out the above analysis iteratively for each task (starting with $R_i^0 = c_i$) → stop when consecutive iterations produce the same value.

  6. At each stage, check if the response time for a task is less than or equal to its deadline (here, equal to its period):

∀τi : Ri ≤ Ti

If the above test passes for all tasks, then the task set is schedulable, even if the earlier tests fail. Hence, this test is both necessary and sufficient.

Example [contd.]: Now, applying this to our errant example:

  1. assign priorities: τ1 > τ2 > τ3

  2. response time calculations

For each task, we calculate the response time using the above formula via iterative analysis.

| task | iteration | calculations | Ri ≤ Ti? |
|------|-----------|--------------|----------|
| τ1 | 1 | $R_1 = c_1 = 1$ | yes [ 1 < 4 ] |
| τ2 | 1 | $R_2^0 = c_2 = 2$; $R_2^1 = c_2 + \lceil\frac{R_2^0}{T_1}\rceil c_1 = 2 + \lceil\frac{2}{4}\rceil \cdot 1 = 3$ | |
| | 2 | $R_2^2 = 2 + \lceil\frac{3}{4}\rceil \cdot 1 = 3$ [stop] | yes [ 3 < 6 ] |
| τ3 | 1 | $R_3^0 = c_3 = 3$; $R_3^1 = c_3 + \lceil\frac{R_3^0}{T_1}\rceil c_1 + \lceil\frac{R_3^0}{T_2}\rceil c_2 = 3 + \lceil\frac{3}{4}\rceil \cdot 1 + \lceil\frac{3}{6}\rceil \cdot 2 = 6$ | |
| | 2 | $R_3^2 = 3 + \lceil\frac{6}{4}\rceil \cdot 1 + \lceil\frac{6}{6}\rceil \cdot 2 = 7$ | |
| | 3 | $R_3^3 = 3 + \lceil\frac{7}{4}\rceil \cdot 1 + \lceil\frac{7}{6}\rceil \cdot 2 = 9$ | |
| | 4 | $R_3^4 = 3 + \lceil\frac{9}{4}\rceil \cdot 1 + \lceil\frac{9}{6}\rceil \cdot 2 = 10$ | |
| | 5 | $R_3^5 = 3 + \lceil\frac{10}{4}\rceil \cdot 1 + \lceil\frac{10}{6}\rceil \cdot 2 = 10$ [stop] | yes [ 10 < 12 ] |

Since the response times of all tasks are within their respective deadlines under RM scheduling, the task set is schedulable.

The response time analysis algorithm:


There exist many other static assignment algorithms, for instance the Deadline Monotonic Scheduler, where the priorities assigned to processes are inversely proportional to the lengths of their deadlines.

Resources:

  1. to read more about the schedulability analysis details or other types of schedulers, see: Priority-Driven Scheduling.
  2. A Review of Priority Assignment in Real-Time Systems by Davis et al.

5.2.3.2 Earliest-Deadline First Scheduler (EDF)

The problem with static priority assignments is that they are typically non-optimal. Wait, but we said earlier that RM is optimal? Well, among static priority assignment algorithms, RM is optimal but, as we saw from the analyses and the graphs, the utilization bound tapers off at U ≈ 0.69. While we can create task sets with higher utilization that still schedule (as the response-time analysis shows us), it takes a lot of manual effort to craft such task sets. This means that, in the common case, we are wasting nearly 31% of the available utilization!

A dynamic assignment of priorities can help mitigate these problems and ensure that we make better use of the system resources. Many dynamic scheduling algorithms have been proposed, e.g., FIFO and Least Slack Time among others.

In this section we will focus on the Earliest-Deadline First (EDF) priority assignment mechanism for real-time systems which, actually, is available as one of the mainline schedulers in Linux, named SCHED_DEADLINE.

In a nutshell, as the name implies, EDF assigns priority based on the absolute deadline of the job. From Wikipedia:

> Whenever a scheduling event occurs (task finishes, new task released, etc.) the queue will be searched for the process closest to its deadline. This process is the next to be scheduled for execution.

While the exact priority of a job depends on its absolute deadline, the latter is calculated statically (at release time). Hence, even though different jobs of the same task may have different priorities, each individual job has only one priority level. As jobs complete, the priorities of the remaining (or new, incoming) jobs change relative to each other.

EDF Schedulability Test: is the same as the first test we discussed, i.e.,

$$ U = \sum_{i=1}^{n} \frac{c_i}{T_i} \le 1 $$

As long as we’re following the task model described earlier, this is a very simple check to test for the validity of the task set for EDF.

Let’s revisit our example from RM that was not schedulable, viz.,

| task | c | T |
|------|---|----|
| τ1   | 2 | 4  |
| τ2   | 2 | 5  |
| τ3   | 1 | 10 |

Before we proceed, we need to decide on some policies:

  • two jobs have the same deadlines → pick the one that was released first
  • same deadline and released at the same time++ → pick the one with the smaller index

These are some simple rules that help in simplifying the decision-making process. We can change them, as long as it helps us make forward progress and we remain consistent.

++ in this scenario, ideally we should redesign the system to combine these two tasks – if they always run together, then why pay the extra costs of preemption/context-switch, etc.?

For the above task set, we see that,

$$U = \frac{2}{4} + \frac{2}{5} + \frac{1}{10} = 0.5 + 0.4 + 0.1 = 1 \le 1$$

Hence, this task set is schedulable using EDF (just barely though since its utilization is 1!). At least theoretically! Remember that this task set failed for RM.

The schedule diagram for this task set looks like,

So far, it looks schedulable. It is left as an exercise to the reader to check until the hyperperiod (20).

Optimality: EDF is an optimal scheduling algorithm → if the task set is schedulable by some algorithm, it is also schedulable under EDF.

EDF can definitely squeeze out the maximum from a system – as we just saw, it can even schedule task sets with the theoretical utilization bound (1)!

5.2.3.3 RM vs. EDF

So, let’s compare the two superstars of the real-time scheduling world, RM and EDF. The ✓ symbol indicates which one is better.

| parameter | RM | EDF |
|---|---|---|
| optimality | ✓ (static) | |
| context switch overheads | | |
| preemptions | | |
| analysis overheads | | |
| utilization | | |
| implementation ease | | |
| predictability | | |
| total | 4 | 4 |


Other Issues: note that we have mostly considered a simple task model. But these may vary in real world systems, e.g., …

6 Control Theory

Consider a simple problem → how do you balance a ball?


I guess that’s more complicated than what we wanted! So, let’s make it really simple and try in a one dimensional plane, as follows:

We want to balance the ball in the middle of the table. And the ball moves either left or right, based on how we tilt the table.

As we see from this picture, a naive attempt at balancing the ball can quickly make it “unstable”. But, our objective, is to make sure that,

  • the ball remains stable and
  • it is in the middle of the table

The options that are available to us are:

  1. tilt the table down on the left (anti-clockwise)
  2. tilt the table down on the right (clockwise)

We also have the ability to control the speed at which the table tilts to either side. We can actually combine these, as we shall see.

Hence, the parameters for the problem are:

| type | options |
|---|---|
| inputs | speed (clockwise, anticlockwise) |
| output | ball velocity, acceleration |

Somehow, we need to control the outputs by modifying the inputs to the system. Enter control theory.

6.1 Control Theory | Introduction

Control theory is a multidisciplinary field at the intersection of applied mathematics and engineering. Engineering fields that heavily utilize control theory include mechanical, aerospace, electrical and chemical engineering. Control theory has been applied to the biological sciences, finance, you name it.

Anything that you,

  • want to control and
  • can develop a model of,

you can develop a “controller” for managing it, using the tools of control theory.

In our everyday life, we interact with technologies that utilize control theory. They appear in applications like adaptive cruise control, thermostats, ovens and even lawn sprinkler systems. The field of control theory is huge and there’s a wide range of subdisciplines.

The basic idea behind control theory is to understand a process or a system by developing a model for it that represents,

the relationships between inputs and outputs for the system

We then use this model to adjust the inputs → to get desired outputs. The relationship between the inputs and outputs is usually obtained through empirical analysis, viz.,

  1. make changes to the input
  2. wait for the system to respond
  3. observe changes in the output.

Even if the model is based on an equation from physics, the parameters within the model are still identified experimentally or through computer simulations.

We repeat the experiments/simulations as needed to “understand” the system as well as we can, in order to develop the model. Once the model has been developed, we develop a control model that can be used to tune the input → output relationship.

In effect, we are inverting the original model (input → output) to develop,

control model: input ← output

To better understand this, consider the example of a light bulb and switch:

Even if we didn’t know the relationship between the switch and bulb, we can conduct a few experiments to figure out the following:

| switch state (input) | bulb state (output) |
|---|---|
| off | off |
| on | on |

Now we have our “model” of input (switch state) → output (lightbulb state). This model works as long as no external disturbances occur (power failure or bulb burn out).

But, this is not our control model. For that, we need to invert the model we’ve built so far.

So, we start with the desired output state, e.g., the “lightbulb must be on”. Then, we reason backwards to: “what should the input be to achieve this desired state?”. Should the switch be on or off?

From our original model (and experiments), we have created the I/O relationship table above. Hence, it stands to reason that we can “invert” it as:

| desired output (lightbulb state) | corresponding input (switch state) |
|---|---|
| on | on |
| off | off |


Now, let’s formalize things a little.

Consider the following mathematical model that describes the behavior of a system:

$$y = 2u$$

The model says that if we change the input u, the output y will change to be twice the value of the input.

Now, in control theory, we are concerned about how to get to a specific output. Hence, if we want to reach a specific value of y, say → y*, we need to manipulate the model to now create a “control model”, i.e.,

$$u = \frac{y^*}{2}$$

This model says for any value of the output y* that we want, we can identify the input u → essentially dividing y* by 2. Notice that this equation is now in terms of u → we have our control law! Restating the obvious, this is an “inverse” of the original model of the system.

We have just developed our first controller! The desired value, y* is referred to as the setpoint. We get to the setpoint by picking the right value for u.

Developing a control law, in a sense, is inverting your model to rewrite the output in terms of the input so that you can get the output you want. More complex systems lead to more complicated control laws. Practitioners have developed methods for deriving control laws for systems whose models cannot be cleanly inverted, e.g., nonlinear systems or systems with dead times.

For context, this is where we are in this course map:

6.1.1 Open-Loop vs Closed-Loop Control

For the control law we just developed, if our model is accurate and there are no disturbances then,

y = y*

However, note that there is nothing ensuring that the value of y = y*. We just assume (rather, expect) that it would be the case. This is known as an open loop controller. You desire a certain output and hope that the controller actually gets there.

So, the open loop controller is depicted as:

What we really want is to somehow ensure that the controller gets to its setpoint. How do we do that?

The problem is that while the input drives the output, there is no way to guarantee that the controller will get to the set point.

What we really need, is a closed-loop controller → one that uses feedback to,

  • adjust u
  • ensure that we get to y* (or, at least as close to it as possible).

The feedback typically comes from the output of the controller model that we created. So,

Note that the feedback can be positive or negative.

[The above description is a distillation of the excellent description found here.]

6.2 Feedback Control

[The following material is thanks to Prof. Sathish Gopalakrishan, Univ. of British Columbia].

Consider the following problem (that we increasingly encounter in the real world):

how do you ensure that a car remains in the center of its lane?

So, we have a car moving on the road, thus:


The blue arrow shows the direction of motion of the car. Hence, for the car to remain in the center of the lane, we need to apply a correction to its direction of motion,


There are some questions that come up:

  • how do we apply the corrections?
  • how much and
  • when do we stop?

Enter feedback control:

  • compare system state to the desired state
  • apply a change to the system inputs → counteract the deviations
  • repeat until desired outcome → setpoint

Example: let’s see how feedback control can be applied to a temperature control of a room.

Given a “desired” room temperature (as input to a thermostat), what do we need to consider while attempting to achieve this temperature?

The thermostat needs to control/provide inputs to a furnace/AC,

which then affects the temperature in the room:

Easy! Done…right?

Except, the real world is far from ideal. We have to deal with disturbances…


Well not that kind of disturbance, but pesky issues like heat loss, bad insulation and physical problems in general:

As we see from the picture, we may not get to the expected behavior due to external factors. So, as before, just the input will not suffice to reach the setpoint.

So, we provide “feedback” to the controller:

Essentially the temperature reading of the room, after the thermostat and furnace/AC have completed their operations based on the original inputs (desired temperature).

Let’s introduce some generic technical terms for each of these components:

The “controller” is based on the “control model” that we developed earlier. It sends commands (“actuation signals”) to an actuator and then affects the process under control. Finally, the process variable (the “output” from the earlier discussions) is what we want to drive towards the set point.

Note: in the case that feedback is not possible, there is work on → open-loop control and feedforward control.

Another example → cruise control.

Note how the feedback reaches the controller in this case.

So, at a high-level, a closed-loop feedback control system looks like,

Some of these inputs/edges have specific names:

Note: the main goal → error is minimized (ideally 0).

A more formal definition of the same quantities,

quantity definition
r(t) reference/set point
e(t) error
u(t) control signal/“input”
y(t) (expected/final) output
$\overline{y(t)}$ “feedback”/estimate

Now let’s apply this feedback control model to the earlier problem of lane following.

6.3 Feedback Control Applied to Lane Following

Recall that we want to keep the car in the center of its lane:

But here’s a question → how do you find the center of the lane?

Consider a road with lane markings on either side,

Now, let’s assume that some system (say, a camera), can track the yellow lines to an extent. We need to find the center of the lane, as marked in the figure:

Based on the two points marked up ahead (that we can detect to be on the same plane), we can calculate,

$$x_{center} = \frac{x_{left\_end}+x_{right\_end}}{2}$$

Now, a car need not be in the actual center of the lane,

Now, assuming that the camera is mounted at the center of the car,

The car’s position can be calculated as:

$$x_{car} = \frac{width}{2}$$

From this we can calculate the cross-track error (CTE),

$$CTE = x_{car} - x_{center}$$

What happens when,

  • CTE > 0
  • CTE < 0?

Now, back to our original problem → keeping the car in the center of the lane. We do this by → keeping the CTE as small as possible and applying corrections,

The $64,000 question is: how?

Answer: feedback control!

Recall the various components of the feedback control:


Now, let’s map the lane following components on to this:

As we see from the figure, the lane following controller sends the control/actuation signal to the steering unit. Sensors (perhaps a camera equipped with vision algorithms in this case) provide feedback to the controller ($\overline y_t$). Mapping this back to the lane variables and CTE,

This figure shows the important part → the CTE is the feedback for the lane following controller! The input then is the negative error, i.e., the goal is to reduce the CTE. Also note that the output of the controller is the steering input.

So, let’s focus in on how the controller operates, i.e., this part:


Problem statement:

given the CTE, how do we compute the control signal so that the car stays in the middle of the lane?

The final “corrections”, when applied, may look something like this:

6.4 PID Control

Let’s start with one goal → lateral position control:

| quantity | notation | meaning |
|---|---|---|
| process variable | y(t) | y position at time, t |
| goal | y = 0 | keep the car at position 0 |
| control signal | u(t) | steering |

Let’s say we have the car’s start and end position,

And we know the relationship between u(t) and y(t):

Basically we want u(t) to be negative → so that y tends towards its eventual goal, y = 0.

So, what should our control input, e(t), be?


As we see below, we want the input to be a decreasing value of the feedback,

e(t) = −y(t)


This is called proportional (P) control.

6.4.1 Proportional (P) Control

The correction is proportional to the size of the error, i.e.,

$$u(t) = K_p\, e(t)$$

So, going back to our example of lateral control, let us try to apply the proportional control methodology to it:

We have the following choices:

  • Kp > 0
  • Kp < 0

We pick the Kp > 0 since we want the following relationship to hold (following from e(t) = −y(t)):

Kpe(t) = −Kpy(t)


We see that this proportional controller helps us move the car towards our goal, y = 0.

Now, let’s consider a few situations:

  1. what happens if Kp is too small (small “gain”)?


The response is too slow/gradual. We may never get to the goal in this case.

  2. what happens if Kp is too large (large “gain”)?


The response is too sudden. The system may overshoot the goal!

So, the next question that comes up → can the car be stabilized at y = 0? This is unlikely using just the proportional control method since,

| gain | effect |
|---|---|
| small | steady-state error |
| large | oscillations |

The question then becomes → how can we reduce oscillation, i.e., can we get to the following situation (smoother, actual approach to the goal)?

6.4.2 Derivative (D) Control

Derivative control improves the dynamic response of the system by,

  • studying the rate of change of the error and
  • decreasing oscillation


So, what does this mean, in practice? What we’re measuring and trying to control, is the rate of change, i.e., the change from,

y(t − 1) → y(t)


Typically, proportional and derivative control are used together, as a way to counteract each other’s influences. So, we get:

$$u(t) = K_p\, e(t) + K_d\, \frac{de(t)}{dt}$$
As with the proportional controller, we have two options for the derivative as well. Should,

  • $\frac{dy(t)}{dt} < 0$
  • $\frac{dy(t)}{dt} > 0$

Note that the derivative controller’s job is to steer away from the reference line, to act as a counter to the proportional controller pushing in the opposite direction. Hence, we pick $\frac{dy(t)}{dt} < 0$.

The derivative controller acts like a brake and counteracts the correctional force. It reduces overshoot by slowing the correctional factor → as the reference goal approaches.

We see numerous uses of this combination, known as P-D controllers, in everyday life, from the smallest systems to the largest, even rocket ships!

6.4.2.1 Tuning P-D Controllers

Consider the following values for the two coefficients, Kp and Kd:

As we see, tuning Kd has a significant impact! The oscillations pretty much go away and we quickly get to the reference line with very little oscillation. Of course, the car overshoots a little but the combination of P-D brings it back soon enough.

An interesting question arises → what if we increase the value of Kd – eventually making it very large?

While we see from the first graph (Kp = 0.2, Kd = 4.0) that the oscillations have gone away, increasing Kd further makes the situation worse – it drives the car away from the reference goal!

How do we deal with this?

Well, by tuning the parameters, of course! For instance, in the last case, if we make the change Kp = 3.0,

We see a quick, “smooth” path to the reference!

In fact, a lot of the design of control systems involves the tuning of such parameters, depending on the system, to get things “just right”.

6.4.3 Integral (I) Control

Are we done? Not quite. Let’s take a closer look at the results:

As we see from this image, even though we reached the reference, the behavior is not smooth! There could be many reasons for this, such as steering drift, caused by the mechanical design of the system:


There are a variety of unmodeled disturbances and systemic errors (“bias”):

  • actuators and processes → not ideal
  • friction, steering, drift, changing workloads, misalignments, etc.

Hence, the signal may never reach the set points! It will end up “settling” near the reference, which is not always ideal.

To deal with this, we need integral (I) control.

First, let’s define steady state error (SSE):

difference between the reference and the steady-state process variable

Hence, when time goes to infinity,

$$SSE = \lim_{t \to \infty}\left[r(t) - y(t)\right]$$

The correction must be proportional to both → the error and the duration of the error. Essentially, it sums the error over time:

$$u(t) = K_i \int_0^t e(\tau)\, d\tau$$


Note: unless the error is zero, this term will grow! In fact, the error keeps adding up, so the correction must also increase → this drives the steady state error to zero.

In many instances, integral control is used in conjunction with the P and D controllers,


This is known as PID Control:

$$u(t) = K_p\, e(t) + K_i \int_0^t e(\tau)\, d\tau + K_d\, \frac{de(t)}{dt}$$
Let’s look at some examples of tuning the various parameters of a PID controller, as applied to our problem:


Let’s increase Kp now and see the effect:

We see that the system has stabilized well around the reference point and, if we zoom in, we will see fewer disturbances.

Now, let’s keep increasing Kp,

Wait, the signal oscillates? The main reason is that the I term is not zero when crossing the reference, y(t) = 0, and it takes a little while to wash out the cumulative error.

In summary:

  • P is required
  • depending on the system, one or both of I/D is combined with P
    • PI, PD or PID

Tuning P, I, D gains: there is no “optimal” way to tune the PID gains, but a common recipe is to:

  1. start with → Kp = 0, Kd = 0, Ki = 0
  2. slowly increase Kp until → system oscillates around set point
  3. slowly increase Kd until → system settles around set point
  4. if steady-state error exists → slowly increase Ki until corrected without causing additional oscillations

6.4.4 Some additional feedback control applications:

  1. Segway balance control
  2. Drone control
  3. Motor speed control

References:

  1. Control theory introductions: 1, 2, 3
  2. Map of Control Theory
  3. What is PID Control by Mathworks
  4. Control Theory Lectures by Brian Douglas.
  5. Introduction to Control Theory and Application to Computing Systems by Abdelzaher et al.
  6. Automotive Control Systems
  7. PID Controller Design
  8. Introduction to PID controllers
  9. Construction and theoretical study of a ball balancing platform by Frank and TJERNSTRÖM.
  10. Understanding PID Control: 2-DOF Ball Balancer Experiments
  11. Control Engineering for Industry from Stanford University.
  12. A Line Following Robot Using PID Controller
  13. Ball and Beam: System Modeling
  14. Open Loop Control System

7 Actuation

A controller will typically generate a control signal which, in many physical systems, is used to “actuate” a physical component – i.e., make it move.

An actuator, then, is a part of a device or machine that helps it to achieve physical movements by converting energy, such as electrical, air or hydraulic, into mechanical force. Simply put, it is the component in any machine that enables movement. They’re like muscles on a human body – converting energy to physical action. Actuators are present in almost every machine around us, from simple electronic access control systems, the vibrator on your mobile phone and household appliances to vehicles, industrial devices, and robots. Common examples of actuators include electric motors, stepper motors, jackscrews, electric muscular stimulators in robots, etc.

Defined simply, an actuator is a device that converts energy, which may be electric, hydraulic, pneumatic, etc., into mechanical energy in such a way that it can be controlled. The quantity and the nature of the input depend on:

  • the kind of energy to be converted and
  • function of the actuator.


Electric and piezoelectric actuators, for instance, work on the input of electric current or voltage; for hydraulic actuators, the input is an incompressible liquid; and for pneumatic actuators, the input is air. The output is always mechanical energy.

They are responsible for ensuring a device such as a robotic arm is able to move when electric input is provided. A car uses actuators in the engine control system to regulate air flaps for torque and optimization of power, idle speed and fuel management for ideal combustion.

An actuator requires,

  • a control device (which provides control signal) and
  • a source of energy.

The displacement achieved is commonly linear or rotational, as exemplified by

  • linear motors and
  • rotary motors.

Another broad classification of actuators separates them into two types:

  1. continuous-drive actuators and
  2. incremental-drive actuators (e.g., stepper motors).

Brushed DC motors move continuously when DC voltage is applied to their terminals.

Stepper motors are a variant of brushless motors that rotate in a series of small and discrete angular steps. Stepper motors can be set to any given step position without needing a position sensor for feedback. The step position can be rapidly increased or decreased to create continuous rotation, or the motor can be ordered to actively hold its position at one given step. Motors vary in size, speed, step resolution and torque.

The stepper motor is known for its property of converting a train of input pulses (typically square waves) into a precisely defined increment in the shaft’s rotational position. Each pulse rotates the shaft through a fixed angle.

[From Wikipedia: Animation of a simplified stepper motor turned on, attracting the nearest teeth of the gear-shaped iron rotor - with the teeth aligned to electromagnet 1, they will be slightly offset from right electromagnet (2) - Frame 2: The top electromagnet (1) is turned off, and the right electromagnet (2) is energized, pulling the teeth into alignment with it. This results in a rotation of 3.6° in this example. - Frame 3: The bottom electromagnet (3) is energized; another 3.6° rotation occurs. - Frame 4: The left electromagnet (4) is energized, rotating again by 3.6°.

When the top electromagnet (1) is again enabled, the rotor will have rotated by one tooth position; since there are 25 teeth, it will take 100 steps to make a full rotation in this example.]

Motor Control: motor speed and direction are dictated by the voltage applied – change or reverse the polarity of the voltage and the motor will respond in a similar fashion. Voltage can be changed by raising the series resistance within the electrical circuit, which in turn lowers the current through the motor. This change in voltage can be accomplished by series resistors, potentiometers or rheostats. While these devices may be effective for small changes in voltage, the power and torque of the motor are decreased as the current drops. In addition, significant resistance from these devices can produce a lot of heat which could damage other devices within the electrical system.

A more efficient way to vary voltage is to use a PWM controller.

7.1 Pulse Width Modulation

Digital Signals are represented as 0 and 1. Analog signals, on the other hand, have a greater range of values, often continuous in nature (as we saw in the bit about ADCs). To control a physical/“analog” device using a microcontroller, we need to do the opposite → convert from digital to analog signal.

Some microcontrollers have an onboard digital-to-analog converter (DAC) to output a true analog signal in order to control analog devices and we can even use an external DAC. But a DAC is relatively expensive to produce in terms of cost and it also takes up a lot of silicon area. To overcome these issues and to easily achieve the functionality of a DAC in a much more cost-efficient way, we use pulse-width modulation (PWM).

PWM is a method to control analog devices using digital signals. We output an “analog-like signal” from the microcontroller that can then control motors, lights and various actuators.

Note: the PWM is not a true analog signal, just a digital one modified to behave like one. It is essentially a rectangular wave with varying “duty cycle” and periods.

In the following example, an idealized inductor is driven by a set of voltage pulses (in blue) that result in a sine-wave-like current (in red).

PWM is useful for controlling the average power or amplitude delivered by an electrical signal. The average value of voltage (and current) fed to the load is controlled by switching the supply between 0 and 100% at a rate faster than it takes the load to change significantly. The longer the switch is on, the higher the total power supplied to the load.

Example: Consider the following analogy. Imagine you have a ceiling fan that has just an off-on switch, i.e., it is either stationary or goes to 100%.

What if I say: I want the fan to operate at 50%? The only “control” you have is the on-off switch. Can you do it?

Solution:

  • turn on switch
  • wait till fan reaches 50% (or close to it)
  • turn it off
  • when it starts to slow down → turn it on again
  • repeat fast enough and you get close to the 50%
  • the faster you do this → closer to the desired value (aka setpoint)

Ideally I don’t recommend doing this…


So a PWM “wave” looks like:

| quantity | term |
|---|---|
| on-off switching | pulse |
| duration for which the pulse is held at a high state | pulse width |
| T | period |

A PWM wave has two important properties that need to be tuned:

  1. duty cycle
  2. period → one complete cycle of the signal/pulse.

7.1.1 Duty Cycle

Recall that logic high → on (or off, depending on the system, but pick one for consistency). To represent the on time, we use the concept of the duty cycle, defined as:

duty cycle describes the proportion of ‘on’ time to the regular interval or ‘period’ of time.

Duty cycles are represented as percentages (of time that the signal is on, relative to its period).

The duty cycle can be calculated as:

$$D = \frac{T_{on}}{T} \times 100$$

where,

| symbol | meaning |
|---|---|
| D | duty cycle (percentage) |
| Ton | duration when the signal is on |
| T | total period |

Consider the periodic pulse wave, f(t), with a low value, ymin, high value, ymax, and constant duty cycle, D, as shown below:


The average value of a wave is,

$$\bar{y} = \frac{1}{T}\int^T_0f(t)\,dt$$

Since f(t) is a pulse wave, its value is,

$$f(t) = \begin{cases} y_{max}, & 0 < t < D \cdot T\\ y_{min}, & D \cdot T < t < T \end{cases}$$

Now, we can expand the above expression as,

$$ \begin{align*} \bar{y} &= \frac{1}{T} \left(\int_0^{DT} y_\text{max}\,dt + \int_{DT}^T y_\text{min}\,dt\right)\\ &= \frac{1}{T} \left(D \cdot T \cdot y_\text{max} + T\left(1 - D\right) y_\text{min}\right)\\ &= D\cdot y_\text{max} + \left(1 - D\right) y_\text{min} \end{align*} $$

So we can now compute for how long the signal should be at ymax and for how long at ymin.

7.1.2 PWM Period

The period (or frequency, recall that $f=\frac{1}{T}$) is another important parameter that defines a PWM signal. It essentially determines the number of times a signal repeats per second. The choice of T depends heavily on the application. For instance, when controlling an LED,

the frequency of a PWM signal should be sufficiently high if we wish to see a proper dimming effect while controlling LEDs. A duty cycle of 20% at 1 Hz will be noticeable to the human eye that the LED is turning ON and OFF. However, if we increase the frequency to 100Hz, we’ll get to see the proper dimming of the LED.

7.1.3 PWM Sampling Theorem

This is directly related to the Nyquist-Shannon Sampling Theorem discussed earlier. A simple summary:

number of pulses in the waveform is equal to the number of Nyquist samples.

Recall that the Nyquist rate is a value equal to twice the highest frequency.

7.1.4 Example | Servo Motor Control

Servos (also RC servos) are small, cheap, mass-produced servomotors or other actuators used for radio control and small-scale robotics.

They’re controlled by sending the servo a PWM (pulse-width modulation) signal, a series of repeating pulses of variable width where either the width of the pulse (most common modern hobby servos) or the duty cycle of a pulse train (less common today) determines the position to be achieved by the servo.

7.1.5 PWM Generation in Microcontrollers

We can use the built-in PWM components in many microcontrollers or timer ICs. Using Arduino, generating a PWM is as simple as writing out a few lines of code!

analogWrite(pin, value)

Note that not all pins of an Arduino can generate a PWM signal. In the case of the Arduino Uno, there are only 6 I/O pins (3, 5, 6, 9, 10, 11) that support PWM generation; they are marked with a tilde (~) in front of their pin number on the board.


Examples of various duty cycles:

analogWrite(PWM_PIN, 64);   // 25% Duty Cycle or 25% of max speed
analogWrite(PWM_PIN, 127);  // 50% Duty Cycle or 50% of max speed
analogWrite(PWM_PIN, 191);  // 75% Duty Cycle or 75% of max speed
analogWrite(PWM_PIN, 255);  // 100% Duty Cycle or full speed


The Raspberry Pi also has a variety of GPIO pins that can be used for generating PWM signals:


Consider this code for controlling the brightness of an LED:

import RPi.GPIO as IO        # GPIO access on the Raspberry Pi
import time                  # for delays
IO.setwarnings(False)        # do not show any warnings
IO.setmode(IO.BCM)           # use BCM pin numbering (physical pin 35 is GPIO19)
IO.setup(19, IO.OUT)         # initialize GPIO19 as an output
p = IO.PWM(19, 100)          # GPIO19 as PWM output, with 100 Hz frequency
p.start(0)                   # start the PWM signal at a 0% duty cycle
while True:                  # loop forever
    for x in range(50):      # x goes 0 -> 49: ramp the duty cycle up
        p.ChangeDutyCycle(x)        # increase the brightness of the LED
        time.sleep(0.1)             # sleep for 100 ms
    for x in range(50):      # ramp the duty cycle back down: 50 -> 1
        p.ChangeDutyCycle(50 - x)   # decrease the brightness of the LED
        time.sleep(0.1)             # sleep for 100 ms

References:

Additional reading/examples/etc. if you want to learn more about PWMs, programming, etc.:

  1. What is a PWM Signal?
  2. Servo Motors programing
  3. What is a DAC and why do you need one anyways?
  4. An Engineer’s Primer on Actuators
  5. What is a Linear Actuator? [YouTube Video]
  6. Understanding the Basics of PWM
  7. Raspberry Pi: PWM Outputs with Python (Fading LED)
  8. Raspberry Pi PWM tutorial
  9. Actuation and Avionics

  1. TBD↩︎

  2. https://dl.acm.org/doi/10.5555/244522.244548↩︎

  3. https://www.cecs.uci.edu/~papers/compendium94-03/papers/2002/date02/pdffiles/05a_1.pdf ↩︎