what is “autonomy”?
we see various examples of it…
concept | description |
---|---|
perception | how do you “see” the world around you? |
sensing | various ways to perceive the world around you (e.g., camera, LiDAR) |
compute | what do you “do” with the information about the world? |
motion | do your computations result in any “physical” changes? |
actuation | what “actions”, if any, do you take for said physical changes? |
planning | can you do some “higher order” thinking (i.e., not just your immediate next move)? |
Autonomy is the ability to perform given tasks based on the system’s perception.
(figure: the “cyber” and “physical” components of such systems)
hence, they fall under the class of systems → cyber-physical systems
…are everywhere!
the embedded components → interactions with the real world
consider the following example of two cars…
the second car is approaching the first
sensors → constantly gathering data/sensing
on detection (of other car) → quickly compute what to do
take physical action (actuation) → say by braking in time
“control”
Remember this → on detection (of other car) → “quickly” compute → complete computation/actuation → before a deadline
This is a real-time system.
Multiple sensors in an autonomous vehicle → need to combine them somehow
sensor fusion
Once we have information from the sensors (fused or otherwise)…
We need state estimation (Kalman filter, EKF).
So far, we have (briefly) talked about…
Sensing:
Actuation:
But the system includes…an operating system (OS) in there
and it includes real-time mechanisms.
We have briefly discussed the EKF:
note: the EKF is versatile; it can be used for sensor fusion, SLAM, etc.
All of it integrates with…control:
There are some real-time functions in there…
like braking, engine control.
Question: if we design such a system…
is it “autonomous”?
We are missing some “higher order” functions from the perspective of the autonomous system:
let us not forget the most important question of all…
why is gamora?
In order to answer the following, we need additional functionality. Let us go through what that might be.
Simultaneous localization and mapping (SLAM) → figure out where we are.
Understand how to move in the right direction at the micro level, i.e., find waypoints.
Is it “you only live once”? Actually, it stands for “you only look once” (YOLO). It is an object detection model that uses convolutional neural networks (CNNs).
The objective is to avoid objects in the immediate path.
i.e., how to get to destination at the macro level → uses waypoints.
To run all of these functions, we need low power, embedded platforms.
any guesses what they could be?
Essentially, safety of → the operator, other people, the vehicle and the environment. Safety is a cross-cutting issue → all components are affected. Security is another cross-cutting issue → it, too, affects the entire system.
Hence this figure is a (loose) map of this course:
Just like “autonomy”, describing an “embedded system” is hard. What (typically) distinguishes it from other types of computer systems (e.g., laptops, servers, even GPUs) is that such systems are created for specific functionality and often remain fixed and operational for years, even decades.
Embedded systems often trade performance for other considerations: lower power (or longer battery life), less memory, fewer peripherals, limited applications, a smaller operating system (OS) and so on. There are numerous reasons for this – chief among them is predictability – designers need to guarantee that the system works correctly, and remains safe, all the time. Hence, it must be easy to certify¹ the entire system. This process ensures that the system operates safely.
One piece of information that is required to ensure predictability and guarantee safety is the worst-case execution time (WCET). The WCET (and its counterpart, the best-case execution time, BCET) is the longest (respectively, shortest) execution time possible for a program on a specific hardware platform – and it has to consider all possible inputs. WCET is necessary to determine the “schedulability”, resource requirements and performance limits of embedded and real-time programs. There are lots of approaches to computing the WCET, e.g.,
At a high-level, the execution time distributions of applications look like:
WCET analysis is a very active area of research and hundreds of papers have been written about it, since it directly affects the safety of many critical systems (aircraft, power systems, nuclear reactors, space vehicles and…autonomous systems).
There are structural challenges (both in software and hardware) that prevent the computation of a proper WCET for anything but trivial examples. For instance, consider,

```c
void main()
{
    int max = 15 ;
    int sum = 0;
    for( int i = 0 ; i < max ; ++i)
        sum += i ;
}
```
How do you compute the WCET for this code? Say running on some known processor, P?
Well, there’s some information we need,
Let’s assume (from the manual for P), we get the following information,
```c
1  void main()                          // startup cost = 100 cycles
2  {
3      int max = 15 ;                   // 10 cycles
4      int sum = 0;                     // 10 cycles
5      for( int i = 0 ; i < max ; ++i)  // 5 cycles, once
6          sum += i ;                   // 20 cycles each iteration
7  }                                    // cleanup cost = 120 cycles
```
So, based on this, we can calculate the total time to execute this program:
wcet = startup_cost + line_3 + line_4 + loop_startup_cost + (line_6 * max) + cleanup_cost [1]
wcet = 100 + 10 + 10 + 5 + (20 * 15) + 120
wcet = 545 cycles
Now consider this slight change to the above code:
```c
#include <stdlib.h> // for atoi

void main( int argc, char* argv[] )
{
    int max = atoi( argv[1] ) ; // convert the command line arg to max
    int sum = 0;
    for( int i = 0 ; i < max ; ++i) // how many iterations?
        sum += i ;
}
```
The problem is that equation [1] above fails since we no longer know the value of max. Hence the program can run for an arbitrary amount of time, depending on the given input! Note that none of the aforementioned WCET methods will help in this case since the input can be completely arbitrary. Hence, the structure of the software code can affect WCET calculations.
Another problem is that of hardware (and interactions between hardware and software). Now consider if we modify the original code as,
```c
#define VERY_LARGE_ARRAY_SIZE (1<<18)

void main()
{
    int first_array[VERY_LARGE_ARRAY_SIZE] ;
    int second_array[VERY_LARGE_ARRAY_SIZE] ;
    int sum_first = 0;
    int sum_second = 0;

    // alternate between the two arrays on every iteration
    for( int i = 0 ; i < VERY_LARGE_ARRAY_SIZE * 2 ; ++i)
    {
        if( i%2 )
            sum_first += first_array[i/2] ;
        else
            sum_second += second_array[i/2] ;
    }
}
```
Now, while we can compute the WCET from the code perspective using equation [1] (since we know the loop runs for VERY_LARGE_ARRAY_SIZE * 2 iterations), there will be significant, non-obvious hardware issues in the cache. Each iteration accesses a different large array. Hence, it will load the cache with lines from that array, and in the very next iteration the other array will be loaded, also missing in the cache. For instance,
iteration | operation | cache state | reason |
---|---|---|---|
1 | first_array loaded | miss | evicts whatever was previously in cache |
2 | second_array loaded | miss | evicts first_array due to lack of space |
3 | first_array loaded again | miss | evicts second_array due to lack of space |
… | | | |
Hence, this program will constantly suffer cache misses, and since cache misses (and reloads) are expensive (in terms of time), the loop’s execution time will balloon out of control! Hence, even though we fixed the code issue (an upper bound on the number of iterations), hardware artifacts can change the WCET calculations. So now, we need to model cache behavior for each program and data variable! This is notoriously complicated even for the simplest of programs.
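One way to avoid this particular thrashing pattern (a sketch, assuming the same arrays and sums as above) is to restructure the loop so that each array is traversed on its own, letting each cache line be fully consumed before it is evicted:

```c
/* traverse each array separately: consecutive accesses now hit the same
   cache lines instead of ping-ponging between the two arrays */
for( int i = 0 ; i < VERY_LARGE_ARRAY_SIZE ; ++i)
    sum_first += first_array[i] ;

for( int i = 0 ; i < VERY_LARGE_ARRAY_SIZE ; ++i)
    sum_second += second_array[i] ;
```

The sums computed are the same; only the interleaving changes, which makes the cache behavior far easier to model.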
Other hardware designs further complicate matters, e.g.,
Any contemporary processor design that improves performance turns out to be bad for WCET analysis. So, the fewer (or simpler versions of) these features, the better it is for the (eventual) safety and certification of the system.

This is one of the main reasons why embedded (and especially real-time) systems prefer simpler processors (simple pipelines, fewer complex features, simpler memory/cache architectures, if any) since they’re easier to analyze. In fact, many critical systems (e.g., aircraft, cars, etc.) use older processors (often designed in the 1980s and 1990s) – even the ones being designed today!
Just as embedded systems are varied, embedded processors come in a myriad of shapes and sizes as well. From the very small and simple (e.g., DSPs) to the very large and complex (modern multicore chips, some with GPUs!). Here is a (non-exhaustive) list of the types of embedded processors/architectures in use today:
According to Wikipedia,
“A microcontroller (MC, UC, or μC) or microcontroller unit (MCU) is a small computer on a single integrated circuit.”
These may be among the most common type of “processors” used in embedded systems. According to many studies, more than 55% of the world’s processors are microcontrollers! Microcontrollers are typically used in small, yet critical, systems such as car engine control, implantable medical devices, thermal monitoring, and fault detection and classification, among millions of other applications.
Microcontroller hardware features typically include:
component | details |
---|---|
one (sometimes more) CPU cores | typically simple 4- or 8-bit chips |
small pipelined architectures | sometimes 2 or 4 stage pipelines |
some limited memory | typically a few hundred kilobytes, perhaps in the form of EEPROMs or FLASH |
some programmable I/O | to interact with the real world |
low operating frequencies | e.g., 4 kHz; simpler/older processors, yet more predictable |
low power consumption | in the milliwatts or microwatts ranges; might even be nanowatts when the system is sleeping |
interrupts (some programmable) | often real-time (fixed/low latency) |
several general-purpose I/O (GPIO) pins | for I/O |
timers | e.g., a programmable interval timer (PIT) |
There are some additional features found on some microcontrollers, viz.,
component | details |
---|---|
analog to digital (ADC) signal convertors | to convert incoming (real-world, sensor) data to a digital form that the uC can operate on |
digital-to-analog (DAC) convertor | to do the opposite, convert from digital to analog signals to send outputs in that form |
universal asynchronous transmitter/receiver (UART) | to receive/send data over a serial line |
pulse width modulation (PWM) | so that the CPU can control motors (significant for us in autonomous/automotive systems), power systems, resistive loads, etc. |
JTAG interface | debugging interface |
Some examples of popular microcontroller families: Atmel ATmega, Microchip Technology, Motorola (Freescale) and NXP.
Microcontroller programs and data,
DSPs are specialized microcontrollers optimized for digital signal processing. They find wide use in audio processing, radar and sonar, speech recognition systems, image processing, satellites, telecommunications, mobile phones, televisions, etc. Their main goals are to isolate, measure, compress and filter analog signals in the real world. They often have stringent real-time constraints.
The Texas Instruments TMS320 series of DSP chips is one of the most famous examples of this type of system:
Typical digital signal processing (of any kind) requires repetitive mathematical operations over a large number of samples, in real-time, viz.,
- analog to digital conversion
- manipulation (the core algorithm)
- digital to analog conversion
Often, the entire process must be completed with low latency, even within a fixed deadline. They also have low power requirements since DSPs are often used in battery-constrained devices such as mobile phones. Hence, the proliferation of specialized DSP chips (instead of pure software implementations, which also exist; MATLAB has an entire DSP System Toolbox).
Typical DSP architecture/flow (credit: Wikipedia):
These types of chips typically have custom instructions for optimizing certain (mathematical) operations (apart from the typical add, subtract, multiply and divide), e.g.,
- saturate: caps the minimum or maximum value that can be held in a fixed-point representation
- ed: Euclidean distance
- accumulate instructions: for multiply-and-accumulate operations, i.e., a ← a + (b * c)
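To see why multiply-and-accumulate gets its own instruction, here is a minimal sketch of the FIR-filter inner loop that dominates many DSP workloads (the names `NUM_TAPS`, `coeff` and `sample` are illustrative, not from a particular ISA):

```c
#include <stdint.h>

#define NUM_TAPS 8

/* FIR filter inner loop: one multiply-and-accumulate (a <- a + b*c) per tap;
   a DSP's MAC instruction performs each such step in a single cycle */
int32_t fir_output(const int16_t coeff[NUM_TAPS], const int16_t sample[NUM_TAPS])
{
    int32_t acc = 0;
    for( int k = 0 ; k < NUM_TAPS ; ++k )
        acc += (int32_t)coeff[k] * (int32_t)sample[k];
    return acc;
}
```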
See the Microchip instruction set details for more information for a typical DSP ISA.
DSPs require optimization of streaming data and hence,
- require optimized memories and caches → fetch multiple data elements at the same time
- code may need to be aware of, and explicitly manipulate, caches
- may have a rudimentary OS but no virtual memory
Microprocessors are, then, general-purpose chips (as opposed to microcontrollers and DSPs) that are also used extensively in embedded systems. They are used in systems that need more heavy-duty computing/memory and/or more flexibility in terms of programming and management of the system. They use a number of commodity processor architectures (e.g., ARM, Intel x86).
Main features of microprocessors:
component | details |
---|---|
cores | single or multicore; powerful |
pipelines | more complex pipelines; better performance, harder to analyze (e.g., wcet) |
clock speeds | higher clock speeds; 100s of MHz, even GHz |
ISA | common ISA; well understood, not custom |
memory | significant memory; megabytes, even gigabytes |
cache hierarchies | multiple levels, optimized |
power consumption | much higher, but can be reduced (e.g., via voltage and frequency scaling) |
size, cost | often higher |
interrupts, timers | more varied, easily programmable |
I/O | more interfaces, including commodity ones like USB |
security | often includes additional hardware security features, e.g., ARM TrustZone. |
The ARM Cortex-M85 embedded microprocessor architecture:
When compared to microcontrollers (or even SoCs), most microprocessors do not include components such as DSPs, ADCs, DACs, etc. It is possible to augment the microprocessor to include this functionality → usually by connecting one or more microcontrollers to it!
On the software side, microprocessors typically have the most flexibility:
Due to their power (and cost) these types of systems are only used when really necessary or in higher-end systems such as mobile phones and autonomous cars.
An SoC integrates most components in and around a processor into a single circuit, viz.,
All of these are placed on a single substrate.
SoCs are often designed in C++, MATLAB, SystemC, etc. Once the hardware architectures are defined, additional hardware elements are written in hardware description languages, e.g., at the register transfer level (RTL)².
Additional components could include,
In some sense, an SoC is an integration of a processor with peripherals and new hardware elements.
Some examples of modern SoCs:
Broadcom Soc from Raspberry Pi
Apple M1 SoC
The integration of all hardware components has some interesting side-effects:
effect | benefit | problems |
---|---|---|
tight integration | better performance, fewer latencies | cannot replace individual components |
custom code/firmware | better use of hardware | not reusable in other systems |
custom software libraries | easier programming of SoC | reduces code reusability in other systems |
low power consumption | better battery life, less heat | (potentially) slower |
Depending on the processor/microcontroller that sits at the center of the SoC, the software stack/capabilities can vary. Many common SoCs exhibit the following software properties:
- support for a wide variety of programming languages: C, C++, Python, Java, even Lisp!

The Raspberry Pi is a common example of a system that uses a Broadcom BCM series of SoCs. We use the BCM2711 SoC in our course for the Raspberry Pi 4-B.
There are hardware platforms that include accelerators in embedded systems, e.g., GPUs, AI-enabled silicon, extra programmable FPGA fabric, security features, etc. The main idea is that certain computation can be offloaded to these accelerators while the main CPU continues to process other code/requests. The accelerators are specialized for certain computations (e.g., parallel matrix multiplications on GPUs, AES encryption). Some chips include FPGA fabric where the designer/user can implement their own custom logic/accelerators.
In a loose sense, the Navio2 can be considered as a hardware coprocessor for the Raspberry Pi.
The NVidia Jetson Orin is a good example of an AI/GPU-focused embedded processor:
This system’s specifications:
These systems are finding a lot of use in autonomous systems since they pack so much processing power into such a small form factor.
Application-specific integrated circuits (ASICs) and field programmable gate arrays (FPGAs). These platforms combine the advantages of both hardware (speed) and software (flexibility/programmability). They are similar, yet different. Both are semiconductor devices that include programmable logic gates, but an ASIC is static – i.e., once the board has been “programmed” it cannot be changed – while an FPGA, as the name implies, allows for “reprogramming”.

ASICs are custom-designed for specific applications and provide high efficiency and performance. FPGAs are reprogrammable devices that provide significant flexibility. Many designers also use them for prototyping hardware components (before they are eventually included either in processors or custom ASICs). The choice between ASICs and FPGAs depends entirely on the application requirements and other factors such as cost.
(figures: an ASIC; a Xilinx Spartan FPGA)
These are specialized semiconductor devices used to implement a custom function, e.g., cryptocurrency mining, nuclear reactor control, televisions. ASICs are tailored to their specific applications. Once created, an ASIC cannot be reprogrammed or modified. ASICs are created using a process known as photolithography, which uses light to transfer patterns onto a silicon wafer, allowing components to be “etched” onto it.
The ASIC design process, while expensive and time consuming, becomes valuable for high-volume products, as the per-unit cost decreases when production numbers increase.
advantages | disadvantages |
---|---|
high performance | lack of flexibility |
low power consumption | high initial costs |
small form factor | long development time |
IP protection | obsolescence risk |
good for mass production | risks with manufacturing yields |
can integrate multiple functions | design complexity |
These are also semiconductor devices but they can be programmed (and re-programmed) to implement various circuits and functions. Designers can change the functionality after the circuits have been embossed onto the hardware. Hence, they’re good for systems that might require changes at design time and rapid prototyping. An FPGA is a collection of programmable logic and interconnects. They include lookup tables (LUTs) and other parts that can be used to develop multiple, fairly wide-ranging, functions. The programmable blocks can be connected to each other via the interconnects. Some FPGAs even come with additional flash memory.
FPGAs are programmed using hardware description languages such as Verilog/VHDL.
advantages | disadvantages |
---|---|
flexibility | lower performance |
shorter development time | higher power consumption |
upgradability | high design complexity |
lower (initial) costs | higher per-unit costs |
better processing capabilities | design complexity |
lower obsolescence risks | larger form factor |
Embedded systems need to communicate and/or interface with various elements:
Hence a large number of communication standards and I/O interfaces have been developed over the years. Let’s look at a few of them:
Serial communication standards are used extensively across many domains, mainly due to their simplicity and low hardware overheads. The most common among these are the asynchronous serial communication systems.
From Wikipedia:
Asynchronous serial communication is a form of serial communication in which the communicating endpoints’ interfaces are not continuously synchronized by a common clock signal. Instead of a common synchronization signal, the data stream contains synchronization information in form of start and stop signals, before and after each unit of transmission, respectively. The start signal prepares the receiver for arrival of data and the stop signal resets its state to enable triggering of a new sequence.
The following figure shows a communication sample that demonstrates these principles:
We see that each byte has a start bit, a stop bit and eight data bits. The last bit is often used as a parity bit. All of these “standards” (i.e., the start/stop/parity bits) must be agreed upon ahead of time.
A universal asynchronous receiver-transmitter (UART), then, is a peripheral device for such asynchronous communication; the data format and transmission speeds are configurable. It sends data bits one by one (from least significant to most significant). The precise timing is handled by the communication channel.
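As a sketch of what the UART hardware does internally (the pin and timing helpers `set_tx_pin()` and `delay_one_bit()` are hypothetical), a bit-banged 8N1 (eight data bits, no parity, one stop bit) transmitter would look like:

```c
#include <stdint.h>

/* hypothetical helpers: drive the TX pin and wait one bit period */
extern void set_tx_pin(int level);
extern void delay_one_bit(void);

void uart_send_byte(uint8_t byte)
{
    set_tx_pin(0);                      /* start bit */
    delay_one_bit();
    for( int i = 0 ; i < 8 ; ++i ) {    /* data bits, least significant first */
        set_tx_pin((byte >> i) & 1);
        delay_one_bit();
    }
    set_tx_pin(1);                      /* stop bit */
    delay_one_bit();
}
```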
The electric signalling levels are handled by an external driver circuit. Common signal levels:
Here we will focus on the RS-232 standard since it is the most widely used UART signaling level standard today. The full name of the standard is: “EIA/TIA-232-E Interface Between Data Terminal Equipment and Data Circuit-Termination Equipment Employing Serial Binary Data Interchange” (“EIA/TIA” stands for the Electronic Industry Association and the Telecommunications Industry Association). It was introduced in 1962 and has since been updated four times to meet evolving needs.
The RS-232 is a complete standard in that it specifies,
The RS-232 specifies the electrical, functional and mechanical characteristics to meet all of the above criteria.
For instance, the electrical characteristics are defined in the following figure:
Details:

- logical 1 (aka “marking”) → -5V to -15V (realistically -3V to -15V)
- logical 0 (aka “spacing”) → +5V to +15V (realistically +3V to +15V)

Other properties are also defined, e.g., “slew rate”, impedance, capacitive loads, etc.
The standard also defines the mechanical interfaces, i.e., the pin connector:
While the official standard calls for a 25-pin connector, it is rarely used. Instead, the 9-pin connector (shown on the right in the above figure) is in common use.
You can read more details about the standard here: RS 232
Synchronous Serial Interfaces (SSIs) are widely used in industrial applications between a master device (e.g., controller) and a slave device (e.g., sensor). SSI is based on the RS-422 standard and has high protocol efficiency, as well as multiple hardware implementations.
SSI properties:
The Inter-Integrated Circuit (I2C, IIC, I²C) bus is a synchronous, multi-controller/multi-target (historically termed multi-master/multi-slave), single-ended, serial communication bus. I2C systems are used for attaching low-power integrated circuits to processors and microcontrollers – usually for short-distance or intra-board communication.
I2C components are found in a wide variety of products, e.g.,
The main advantage of I2C is that a microcontroller can control a network of chips with just two general-purpose I/O pins (serial data line and a serial clock line) and software. A controller device can communicate with any target device through a unique I2C address sent through the serial data line. Hence the two signals are:
line | voltage | description |
---|---|---|
serial data line (SDA) | +5V | transmit data to or from target devices |
serial clock line (SCL) | +3V | synchronously clock data in or out of the target device |
Both are bidirectional and pulled up with resistors.
Here is a typical implementation of I2C:
An I2C chip example (used for controlling certain TV signals):
I2C is half-duplex communication where only a single controller or a target device is sending data on the bus at a time. In comparison, the serial peripheral interface (SPI) is a full-duplex protocol where data can be sent to and received back at the same time. An I2C controller device starts and stops communication, which removes the potential problem of bus contention. Communication with a target device is sent through a unique address on the bus. This allows for both multiple controllers and multiple target devices on the I2C bus.
I2C communication details (initiated from the controller device):
condition | description |
---|---|
I2C START | the controller device first pulls the SDA low and then pulls the SCL low |
I2C STOP | the SCL releases high and then the SDA releases high |
I2C communication is split into frames.
Communication starts when one controller sends an address frame after a START. This is followed by one or more data frames, each consisting of one byte. Each frame also has an acknowledgement bit. An example of two I2C communication frames:
You can read more at: I2C.
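Putting the pieces together, a controller-side write transaction might look like the following sketch (the `i2c_*` helpers and the target address `0x48` are hypothetical, not a specific driver API):

```c
#include <stdint.h>

/* hypothetical low-level helpers for a bit-banged I2C controller */
extern void i2c_start(void);           /* START: pull SDA low, then SCL low */
extern void i2c_stop(void);            /* STOP: release SCL high, then SDA high */
extern int  i2c_write_byte(uint8_t b); /* sends one frame; returns 1 if the target ACKed */

int write_one_byte(uint8_t value)
{
    i2c_start();
    /* address frame: 7-bit target address + R/W bit (0 = write) */
    if( !i2c_write_byte((0x48 << 1) | 0) ) {
        i2c_stop();
        return -1;                     /* no ACK: no target at this address */
    }
    i2c_write_byte(value);             /* one data frame (one byte) */
    i2c_stop();
    return 0;
}
```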
The Serial Peripheral Interface (SPI) has become the de facto standard for synchronous serial communication. It is used in embedded systems, especially between microcontrollers and peripheral ICs such as sensors, ADCs, DACs, shift registers, SRAM, etc.
The main aspect of SPI is that one main device orchestrates communication with one or more sub/peripheral devices by driving the clock and chip select signals.
SPI interface properties:
(figures: Microchip SPI; a basic SPI interface)
The SPI interface contains the following wires:
signal | description | function |
---|---|---|
SCLK | serial clock | clock signal from main |
CS | chip/serial select | to select the subnode to communicate with |
MOSI | main out, subnode in | serial data out (SDO) for main-to-subnode communication |
MISO | main in, subnode out | serial data in (SDI) for subnode-to-main communication |
The main node generates the clock signal. Data transmissions between main and sub nodes are synchronized by that clock signal. SPI devices support much higher clock frequencies than I2C. The CS signal is used to select the subnode; note that this is an active low signal, i.e., a low (0) is a selection and a high (1) is a disconnect. SPI is a full-duplex interface; both main and subnode can send data at the same time via the MOSI and MISO lines respectively. During SPI communication, data is simultaneously transmitted (shifted out serially onto the MOSI/SDO line) and received (the data on the MISO/SDI line is sampled, or read in).
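The full-duplex shift-register behavior can be sketched as a bit-banged transfer (SPI mode 0; the `set_*`/`read_*` pin helpers are hypothetical):

```c
#include <stdint.h>

/* hypothetical pin helpers */
extern void set_sclk(int level);
extern void set_mosi(int level);
extern int  read_miso(void);

uint8_t spi_transfer(uint8_t out)
{
    uint8_t in = 0;
    for( int i = 7 ; i >= 0 ; --i ) {             /* MSB first */
        set_mosi((out >> i) & 1);                 /* shift one bit out on MOSI... */
        set_sclk(1);                              /* ...clock it into the subnode */
        in = (uint8_t)((in << 1) | read_miso());  /* ...and sample MISO at the same time */
        set_sclk(0);
    }
    return in;                                    /* the byte shifted in from the subnode */
}
```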
Example: the following example demonstrates the significant savings and simplification in systems design (reducing the number of GPIO pins required).
Consider the ADG1412 switch being managed by a microcontroller as follows:
Now, as the number of switches increases, the requirement on GPIO pins also increases significantly. A 4x4 configuration requires 16 GPIO pins, thus reducing the number of pins available to the microcontroller for other tasks, as follows:
One approach to reduce the number of pins would be to use a serial-to-parallel convertor:
This reduces the pressure on the number of GPIO pins but still introduces additional circuitry.
Using an SPI-enabled microcontroller reduces the number of GPIOs required and eliminates the overhead of needing additional chips (the serial-to-parallel convertor):
In fact, using a different SPI configuration (“daisy-chain”), we can optimize the GPIO count even further!
You can read more about SPI here.
A GPIO is a signal pin on an integrated circuit or board that can be used to perform digital I/O operations. By design, it has no predefined purpose → can be used by hardware/software developers to perform functions they choose, e.g.,
GPIOs can be implemented in a variety of ways,
While microcontrollers may use GPIOs as their primary external interface, the pins are often capable of other functions as well. In such instances, it may be necessary to configure the pins to select the desired function.
Some examples of chips with GPIO pins:
Intel 8255 | PIC microchip | ASUS Tinker |
---|---|---|
24 GPIO pins | 29 GPIO pins | 28 GPIO pins |
GPIOs are used in a diverse variety of applications, limited only by the electrical and timing specifications of the GPIO interface and the ability of software to interact with GPIOs in a sufficiently timely manner.
Some “properties”/applications of GPIOs:
GPIO interfaces vary widely. Most commonly, they’re simple groups of pins that can switch between input/output. On the other hand, each pin can be set up differently → set up/accept/source different voltages/drive strengths/pull ups and downs.
Programming the GPIO:
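As a minimal sketch (the register addresses and bit layout below are hypothetical; real values come from the chip’s datasheet), memory-mapped GPIO programming typically looks like:

```c
#include <stdint.h>

/* hypothetical memory-mapped GPIO registers */
#define GPIO_DIR (*(volatile uint32_t *)0x40020000u)  /* direction: 1 = output */
#define GPIO_OUT (*(volatile uint32_t *)0x40020004u)  /* output levels */
#define GPIO_IN  (*(volatile uint32_t *)0x40020008u)  /* input levels */

void gpio_demo(void)
{
    GPIO_DIR |=  (1u << 5);            /* configure pin 5 as an output */
    GPIO_OUT ^=  (1u << 5);            /* toggle pin 5 (e.g., blink an LED) */

    GPIO_DIR &= ~(1u << 6);            /* configure pin 6 as an input */
    int pressed = (GPIO_IN >> 6) & 1;  /* read pin 6 (e.g., a button) */
    (void)pressed;
}
```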
For more information on programming/using GPIOs, read these: GPIO setup and use, Python scripting the GPIO in Raspberry Pis, general purpose I/O, GPIO setup in Raspberry Pi.
The JTAG standard (named after the “Joint Test Action Group”), technically the IEEE Std 1149.1-1990 IEEE Standard Test Access Port and Boundary-Scan Architecture, is an industry standard for testing and verification of printed circuit boards, after manufacture.
“JTAG”, depending on the context, could stand for one or more of the following:
The basic building block of a JTAG OCD is the Test Access Port, or TAP, controller. This allows access to all the custom features within a specific processor, and must support a minimum set of commands. On-chip debugging is a combination of hardware and software.
type | description |
---|---|
hardware | on chip debug (OCD) |
software | in-circuit-emulator (ICE)/JTAG emulator |
The off-chip parts are actually PC peripherals that need corresponding drivers running on a separate computer. On most systems, JTAG-based debugging is available from the very first instruction after CPU reset, letting it assist with development of early boot software which runs before anything is set up. The JTAG emulator allows developers to access the embedded system at the machine code level if needed! Many silicon architectures (Intel, ARM, PowerPC, etc.) have built entire infrastructures and extensions around JTAG.
A high-level overview of the JTAG architecture/use:
JTAG now allows for,
JTAG device programmer hardware allows for transferring data into the internal, non-volatile memory of the system! Hence, we can use JTAG to program devices such as FPGAs. In fact, many memory chips also have JTAG interfaces. Some modern chips also allow access to the (internal and external) data buses via JTAG.
JTAG interface: depending on the actual interface, JTAG has 2/4/5 pins. The 4/5 pin versions are designed so that multiple chips on a board can have their JTAG lines daisy-chained together if specific conditions are met.
Schematic Diagram of a JTAG enabled device:
The various pins signals in the JTAG TAP are:
signal | description |
---|---|
TCK | synchronizes the internal state machine operations |
TMS | sampled at the rising edge of TCK to determine the next state |
TDI | data shifted into the device’s test or programming logic; sampled at the rising edge of TCK when the internal state machine is in the correct state |
TDO | represents the data shifted out of the device’s test or programming logic; valid on the falling edge of TCK when the internal state machine is in the correct state |
TRST | optional pin which, when available, can reset the TAP controller’s state machine |
The TAP controller implements the following state machine:
To use the JTAG interface,

- connect to the signals (TMS, TCK, TDI, TDO, etc.) through some kind of JTAG adapter
- drive TMS and TDI in conjunction with TCK
- read results through TDO (which is the only standard host-side input)
- TMS/TDI/TCK output transitions create the basic JTAG communication primitive on which higher layer protocols build

For more information about JTAG, read: Intel JTAG Overview, Raspberry Pi JTAG programming, Technical Guide to JTAG; the JTAG Wikipedia Entry is quite detailed.
CAN is a vehicle bus standard to enable efficient communication between electronic control units (ECUs). CAN is,
CAN does not need a host controller. ECUs connected via the CAN bus can easily share information with each other. All ECUs are connected on a two-wire bus consisting of a twisted pair: CAN high and CAN low. The wires are often color coded:
CAN high | yellow |
CAN low | green |
CAN wiring | multi-ECU CAN setup |
An ECU in a vehicle consists of:
(figures: ECU components; internal architecture)
Any ECU can broadcast on the CAN bus and the messages are accepted by all ECUs connected to it. Each ECU can either choose to ignore the message or act on it.
what are the implications for security?
While there is no “standard” CAN connector (each vehicle may use different ones), the CAN Bus DB9 connector has become the de facto standard:
The above figure shows the various pins and their signals.
CAN Communication Protocols: CAN is split into:
(figure: CAN layers and their relation to the OSI stack)
All communication over the CAN bus is done via CAN frames. The standard CAN frame (with an 11-bit identifier) is shown below:
While the lower-level CAN protocols described so far work on the two lowest layers of the OSI networking stack, it is still limiting. For instance, the CAN standard doesn’t discuss how to,
Hence, some higher-order protocols have been developed, viz.,
protocol | description |
---|---|
OBD2 | on-board diagnostics in cars/trucks for diagnostics, maintenance, emissions tests |
UDS | Unified Diagnostic Services (UDS) used in automotive ECUs for diagnostics, firmware updates, routine testing |
CCP/XCP | used in embedded control/industrial automation for off-the-shelf interoperability between CAN devices |
SAE J1939 | for heavy-duty vehicles |
NMEA 2000 | used in maritime industry for connecting e.g. engines, instruments, sensors on boats |
ISOBUS | used in agriculture and forestry machinery to enable plug and play integration between vehicles/implements, across brands |
There also exist other higher-order protocols (numbering in the thousands) the most prominent of which are: ARINC, UAVCAN, DeviceNet, SafetyBUS p, MilCAN, HVAC CAN.
More details about CAN and its variants: CAN Bus Explained.
Autonomous (and other embedded systems) use a variety of other communication protocols in order to interface with the external world and/or other systems (either other nodes in the system or external components such as back end clouds).
Note that since many of these are well known and publicly documented, we won’t elaborate much here.
Here are some of the well known communication protocols, also used in embedded systems:
protocol | links |
---|---|
USB | How USB works: part 1, part 2, part 3; USB in a Nutshell (very detailed). |
Ethernet | Reliable Embedded Ethernet, Embedded Ethernet and Internet (book, online) |
WiFi | WiFi Sensing on the Edge (paper) |
Bluetooth | Bluetooth Basics, Bluetooth Low Energy |
Radio | Embedded Development with GNU Radio |
An embedded/autonomous system perceives the physical world via sensors – either to gather information about its environment or to model its own state. Hence, sensing is a critical component of the sensing → planning → actuation loop, and of the design of embedded and autonomous systems in general.
Modern autonomous systems use a wide array of sensors. This is necessary due to:
At its core,
a sensor captures a physical/chemical/environmental quantity and converts it to a digital quantity.
(hence the need for an Analog-to-Digital Convertor (ADC) as we shall see later)
By definition, sensors generate signals. A signal, s, is defined as a mapping from the time domain to a value domain:

s : Dt ↦ Dv
where,
symbol | description |
---|---|
Dt | continuous or discrete time domain |
Dv | continuous or discrete value domain |
Note: remember that computers require discrete sequences of physical values. Hence, we need to convert the above into the discrete domain. The way to achieve this: sampling:
The figure shows a continuous signal being sampled (in red arrows). We will discuss sampling and related issues later in this topic.
Sensors come in various shapes and sizes. Usually, designers of autonomous systems will develop a “sensor plan” that considers,
Hence, each autonomous system will likely have its own set of sensors (or sensor plan). Typical sensors found on modern autonomous systems can be classified based on the underlying physics used:
physical property | sensor |
---|---|
internal measurements | IMU |
external measurements | GPS |
“bouncing” electromagnetic waves | LiDAR, RADAR, mmWave Radar |
optical | cameras, infrared sensors |
acoustic | ultrasonic sensors |
Some of the above can be combined to generate other sensing patterns, e.g., stereo vision using multiple cameras or camera+LiDAR.
We will go over some of these sensors and their underlying physical principles.
These sensors define the movement of a vehicle, along the three axes, in addition to other behaviors like acceleration and directionality. An IMU typically includes the following sensors:
(figures: IMU hardware examples)
As we see from the first picture above, an IMU also has a CPU (typically a microcontroller) to manage/collect/process the data from the sensors.
The functions of the three sensors are:
(figures: “yaw” | “pitch” | “roll”)
IMUs come in all shapes and sizes. These days they’re very small, but the original IMUs were really large, as evidenced by the one used in the Apollo space missions:
accelerometer: is the primary sensor responsible for measuring inertial acceleration, or the change in velocity over time.
magnetometer: measures the strength and direction of the magnetic field – to find magnetic north
A very common principle for measuring surroundings is to bounce electromagnetic waves off nearby objects and measuring the round trip times. Shorter times indicate closer objects while longer times indicate objects that are farther away. RADAR is a classic example of this type of sensor and its (basic) operation is shown in the following image (courtesy NOAA):
While many autonomous vehicles use RADAR, we will focus on other technologies that are more prevalent and provide much higher precision, viz.,
LiDAR is a sensor that uses (eye safe) laser beams for mapping surroundings and creating a 3D representation of the environment. So lasers are used for,
We can use LiDAR to measure the distance and angle, as well as the radial velocity, of objects – all relative to the autonomous system (rather, the sensor). So, in practice, this is how it operates:
We define a roundtrip time, $\tau$, as the time between when a pulse is sent out from the transmitter (TX) and when light reflected from the object is detected at the receiver (RX).
So, the target range (i.e., the distance to the object), R, is measured as:

$R = \frac{c\tau}{2}$

where c is the speed of light.
More details (from Mahalati):

> Lasers used in lidars have frequencies in the 100s of terahertz. Compared to RF waves, lasers have significantly smaller wavelengths and can hence be easily collected into narrow beams using lenses. This makes DOA estimation almost trivial in lidar and gives it significantly better resolution than MIMO imaging radar.
The end product of LiDAR is essentially a point cloud, defined as:
a collection of points generated by a sensor. Such collections can be very dense and contain billions of points, which enables the creation of highly detailed 3D representations of an area.
In reality, point cloud representations around autonomous vehicles end up looking like:
Point clouds provide valuable information, viz.,
There are two types of scene illumination techniques for LiDAR:
type | illumination method | detector |
---|---|---|
flash lidar | entire scene using wide laser | receives all echoes on a photodetector array |
scanning lidar | very narrow laser beams, scan illumination spot with laser beam scanner | single photodetector to sequentially estimate $\tau$ for each spot |
 | flash lidar | scan lidar |
---|---|---|
architecture | (figure) | (figure) |
resolution determined by | photodetector array pixel size (like a camera) | laser beam size and spot fixing |
frame rates | higher (up to 100 fps) | lower (< 30 fps) |
range | shorter (quick beam divergence, like photography) | longer (100m+) |
use | less common | most common |
Now, consider the following scene (captured by a camera):
Compare this to the LiDAR images captured by the two methods:
(figures: flash lidar | scan lidar (16 scan lines) | scan lidar (32 scan lines))
A “LiDAR scan line” refers to a single horizontal line of laser pulses emitted by a LiDAR sensor, essentially capturing a cross-section of the environment at a specific angle as the sensor rotates, creating a 3D point cloud by combining multiple scan lines across the field of view; it’s the basic building block of a LiDAR scan, similar to how a single horizontal line is a building block of an image.
Potential Problems:
Atmospheric/environmental conditions can negatively affect the quality of the data captured by the LiDAR. For instance, fog can scatter the laser photons resulting in false positives.
As we see from the above image, the scattering due to the fog results in the system “identifying” multiple objects even though there is only one person in the scene.
Here are additional examples from the Velodyne VLP-32C sensor:
The LiDAR does a good job isolating the main subject with very few false positives.
The LiDAR struggles to isolate the main subject with very high false positives.
In spite of these issues, LiDAR is one of the most popular sensors used in autonomous vehicles. They’re getting smaller and more precise by the day; also decreasing costs means that we will see a proliferation of these types of sensors in many autonomous systems.
For an in-depth study on LiDARs, check this out: Stanford EE 259 LiDAR Lecture.
Short wavelengths, like the millimeter wave (mmWave) band of the electromagnetic spectrum, allow for small, high-precision sensors. The band spans wavelengths from 10 millimeters (30 GHz) down to 1 millimeter (300 GHz).
)![]() |
![]() |
As we see from the above images, the sensors can be very small, yet very precise → some can detect movements as small as 4 millionths of a meter!
Advantages of mmWave:
Advantage | Description |
---|---|
small antenna caliber | narrow beam gives high tracking, accuracy; high-level resolution, high-resistance interference performance of narrow beam; high antenna gain; smaller object detection |
large bandwidth | high information rate, details structural features of the target; reduces multipath, and enhances anti-interference ability; overcomes mutual interference; high-distance resolution |
high doppler frequency | good detection and recognition ability of slow-moving and vibrating targets; can work in snow conditions |
good anti-blanking performance | works on the most used stealth material |
robustness to atmospheric conditions | such as dust, smoke, and fog compared to other sensors |
operation under different lights | radar can operate under bright lights, dazzling lights, or no lights |
insusceptible to ground clutter | allowing for close-range observations; the low reflectivity can be measured using mmwave radar |
fine spatial resolution | for the same range, mmWave radar offers finer spatial resolution than microwave radar |
mmWave is also used for in-cabin monitoring of drivers!
Limitations:
Resources:
For a more detailed description of mmWave RADAR, read: Understanding mmWave RADAR, its Principle & Applications
For programming a LiDAR, see: how to program a LiDAR with an Arduino.
Much like LiDARs, we can use reflected sound waves to detect objects. Ultrasonic sensors work by emitting high-frequency sound waves, typically above human hearing, and then listening for the echoes that bounce back from nearby objects. The sensor calculates the distance based on the time it takes for the echo to return, using the speed of sound. Popular modules like the HC-SR04 (used in Lab#2) are easy to integrate with microcontrollers such as Arduino and Raspberry Pi. These sensors are widely used in robotics for obstacle avoidance, automated navigation and liquid level sensing.
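For the HC-SR04 specifically, the distance computation is a direct application of the speed of sound (a sketch; the `trigger_pulse_10us()` and `measure_echo_us()` timing helpers are hypothetical):

```c
#include <stdint.h>

/* hypothetical helpers: fire the TRIG pin and time the ECHO pulse */
extern void     trigger_pulse_10us(void);
extern uint32_t measure_echo_us(void);

float ultrasonic_distance_cm(void)
{
    trigger_pulse_10us();                  /* a 10 us pulse starts a measurement */
    uint32_t echo_us = measure_echo_us();  /* echo pulse width, in microseconds */

    /* sound travels ~0.0343 cm/us at room temperature;
       divide by 2 because the wave travels out and back */
    return (echo_us * 0.0343f) / 2.0f;
}
```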
However, unlike optical (electromagnetic wave) detectors, ultrasonic sensors, while useful for basic distance measurements, cannot replicate the functionality of LiDAR systems due to several key limitations. Unlike LiDAR, which employs laser beams to generate high-resolution, three-dimensional point clouds, ultrasonic sensors emit sound waves that provide only limited, single-point distance data with lower precision. LiDAR offers greater accuracy and longer range, enabling detailed mapping and object recognition essential for applications like autonomous vehicles and advanced robotics. Additionally, LiDAR systems can cover a wider field of view and operate effectively in diverse environments by rapidly scanning multiple directions, whereas ultrasonic sensors typically have a narrow detection cone and struggle with complex or cluttered scenes. Furthermore, LiDAR’s ability to capture data at high speeds allows for real-time processing and dynamic obstacle detection, which ultrasonics cannot match. This is because, comparatively, sound waves take much longer to return, being far slower than light (~343 m/s vs 299,792,458 m/s). These differences in data richness, accuracy and versatility make ultrasonic sensors unsuitable substitutes for the sophisticated capabilities offered by LiDAR technology.
We’ll be using ultrasonic distance finders in future MPs to stop our rovers from colliding into objects. Since our rovers don’t move too fast and the complexity is relatively low, an ultrasonic sensor alone will suffice.
Since sensors deal with and measure the physical world, errors will creep in over time.
Some typical errors in the use of physical sensors:
error type | description |
---|---|
sensor drift | over time the sensor measurements will “drift”, i.e., a gradual change in its output → away from average values (e.g., due to wear and tear) |
constant bias | bias of an accelerometer is the offset of its output signal from the actual acceleration value. A constant bias error causes an error in position which grows with time |
calibration errors | ‘calibration errors’ refers to errors in the scale factors, alignments and linearities of the gyros. Such errors tend to produce errors when the device is turning. These errors can result in additional drift |
scale factor | scale factor is the relation of the accelerometer input to the actual sensor output for the measurement. Scale factor, expressed in ppm, is therefore the linear growth of input variation to actual measurement |
vibration rectification errors | vibration rectification error (VRE) is the response of an accelerometer to current rectification in the sensor, causing a shift in the offset of the accelerometer. This can be a significant cumulative error, which propagates with time and can lead to over compensation in stabilization |
noise | random variations in the sensor output that do not correspond to the actual measured value |
Each error type must be dealt with in different ways, though one of the common ways to prevent sensor errors from causing harm to autonomous systems is → sensor fusion, i.e., use information from multiple sensors before making any decisions. We will discuss sensor fusion later in this course.
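As one small example of handling these errors in software (a sketch, assuming the device can be held stationary during startup), a constant accelerometer bias can be estimated at rest and subtracted from subsequent readings:

```c
/* estimate a constant bias by averaging samples taken while the sensor is at rest */
float estimate_bias(const float samples[], int n)
{
    float sum = 0.0f;
    for( int i = 0 ; i < n ; ++i )
        sum += samples[i];
    return sum / (float)n;
}

/* apply the correction to every subsequent raw reading */
float corrected_reading(float raw, float bias)
{
    return raw - bias;
}
```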
As mentioned earlier, a sensor maps a physical quantity from the time domain to the value domain,
s : Dt ↦ Dv
where,
symbol | description |
---|---|
Dt | continuous or discrete time domain |
Dv | continuous or discrete value domain |
Remember that computers require discrete sequences of physical values, since microcontrollers cannot read values unless they are digital data. Microcontrollers can only see “levels” of voltage, which depend on the resolution of the ADC and the system voltage.
Hence, we need to convert the above into the discrete domain, i.e., we require Dv to be composed of discrete values.
According to Wikipedia,
A discrete signal or discrete-time signal is a time series consisting of a sequence of quantities. Unlike a continuous-time signal, a discrete-time signal is not a function of a continuous argument; however, it may have been obtained by sampling from a continuous-time signal. When a discrete-time signal is obtained by sampling a sequence at uniformly spaced times, it has an associated sampling rate.
A visual representation of the sampling rate and how it correlates to the sampling of an analog signal:
(figures: analog signal | sampling rate | sampling)
Hence, a device that converts analog signals to digital data values is called → an analog-to-digital convertor (ADC). This is one of the most common circuits/microcontrollers in embedded (and hence, autonomous) systems. Any sensor that measures a physical property must pass its values through an ADC so that the sensor values can be used by the system (the embedded processor/microcontroller, really).
This is best described using an example:
The analog signal is discretized into the digital signal after passing through an ADC.
ADCs follow a sequence:
Hence, two important aspects of an ADC are:
The sampling rate (aka Sampling Frequency) is measured in samples per second (SPS or S/s). It dictates how many samples (data points) are taken in one second. If an ADC records more samples, then it can handle higher frequencies.
The sample rate, fs, is defined as,

$f_s = \frac{1}{T}$
where,
symbol | definition |
---|---|
fs | sampling rate/frequency |
T | period of the sample |
Hence, in the previous example,
symbol | value |
---|---|
fs | 20 Hz |
T | 50 ms |
While this looks slow (20 Hz), the digital signal tracks the original analog signal quite faithfully → the original signal itself is quite slow (1 Hz).
Now, if the sampling signal is considerably slower than the analog signal, then it loses fidelity and we see aliasing, where the reconstructed signal (the digital one in the case) differs from the original. Consider the following example of such a case:
As we see from the above figure, the digital output is nothing like the original. Hence, this (digital) output will not be of much use to the system.
Nyquist-Shannon Sampling Theorem:
to accurately reconstruct a signal from its samples, the sampling rate must be at least twice the highest frequency component present in the signal
If the sampling frequency is less than the Nyquist rate, then aliasing starts to creep in.
Hence,
fNyquist = 2 * fmax
where,
symbol | definition |
---|---|
fNyquist | Nyquist sampling rate/frequency |
fmax | the maximum frequency that appears in the signal |
For instance, if your analog signal has a maximum frequency of 50 Hz, then your sampling frequency must be at least 100 Hz. If this principle is followed, then it is possible to accurately reconstruct the original signal and its values.
Note that sometimes noise can introduce additional (high) frequencies into the system, but we don’t want to sample those (for obvious reasons). Hence, it is a good idea to add anti-aliasing filters to the analog signal before it is passed to the ADC.
An ADC’s resolution is directly related to the precision of the ADC, determined by its bit length. The following examples shows the fidelity of the reconstruction, based on various bit lengths:
With increasing bit lengths, the digital signal more closely represents the analog one.
There exists a correlation between the bit length and the voltage of the signal. Hence, the true resolution of the ADC is calculated using the bit length and the voltage as follows:
$StepSize = \frac{V_{ref}}{N}$
where,
symbol | definition |
---|---|
StepSize | resolution of each level in terms of voltage |
Vref | voltage reference/range of voltages |
N = 2n | total “size” of the ADC |
n | bit size |
This is easier to understand with a concrete example:

consider a sine wave with a voltage of 5 V that must be digitized. If our ADC precision is 12 bits, then we get N = 2¹² = 4096. Hence, StepSize = 5 V / 4096, which is 0.00122 V (or 1.22 mV).

Hence, the system can tell when a voltage level changes by 1.22 mV!

(Repeat the exercise for, say, bit length n = 4.)
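The same arithmetic in code (a sketch, using the 5 V / 12-bit values from the example above):

```c
#include <stdint.h>

#define VREF   5.0f          /* reference voltage (V) */
#define NBITS  12            /* ADC bit length */
#define LEVELS (1u << NBITS) /* N = 2^12 = 4096 */

/* resolution of each level: 5 V / 4096 = ~1.22 mV */
float adc_step_size(void)
{
    return VREF / (float)LEVELS;
}

/* convert a raw ADC code back to a voltage */
float adc_code_to_volts(uint16_t code)
{
    return (float)code * adc_step_size();
}
```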
Visual Example:
The above maybe intuitively understood as follows:
Consider the following signal:
Now, if we want to sample this signal, we can obtain measurements at the points shown below. The figure shows 9 measurements.

Suppose the ADC registers have a width of 2 bits; hence they can store at most 4 values. Since it is not possible to store 9 values in 2 bits, we must select only 4 values on the digital side. We then get the following representation:
which, to be honest, is not really a good representation of the original signal!
Now, consider the case where the ADC registers have a bit width of 4 bits → 16 values! Hence, we can easily store all 9 values.
So, we can get a digital representation as follows:
We see that this is a better representation, but still not exact. We can increase the bit length, but at this point we are limited by the sampling as well. Since we only have 9 samples, adding more bits won’t help.
Hence, to get a better fidelity representation of the original signal, we see that sampling frequency and resolution need to be increased, since they determine the quality of output we get from an ADC.
Resources
Real-Time Operating Systems (RTOS) are specialized operating systems designed to manage hardware resources, execute applications and process data in a predictable manner. The main aim of this focus on “predictability” is to ensure that critical tasks complete in a timely fashion. Unlike general-purpose operating systems (GPOS) like Windows or Linux, which prioritize multitasking and user experience, RTOS focuses on meeting strict timing constraints, ensuring that tasks are completed within defined deadlines. This makes RTOS essential for systems where timing accuracy and reliability are critical, such as in embedded systems, autonomous driving, industrial automation, automotive systems, medical devices and aerospace applications, among others.
Hence, real-time systems (RTS), and RTOSes in general, have two criteria for “correctness”:
criteria | description |
---|---|
functional correctness | the system should work as expected, i.e., carry out its intended function without errors |
temporal correctness | the functionally correct operations must be completed within a predefined timing constraint (deadline) |
To place ourselves in the context of this course, this is where we are:
We haven’t looked at the actuation part but we will come back to it later.
characteristic | description |
---|---|
determinism | primary feature of an RTOS is its ability to perform tasks within guaranteed time frames; this predictability ensures that high-priority tasks are executed without delay, even under varying system loads |
task scheduling | RTOS uses advanced scheduling algorithms (e.g., priority-based, round-robin or earliest-deadline-first) to manage task execution; RT tasks are often assigned priorities and the scheduler ensures that higher-priority tasks preempt lower-priority ones when necessary |
low latency | RTOS minimizes interrupt response times and context-switching overhead, enabling rapid task execution and efficient handling of time-sensitive operations (e.g., Linux spends many milliseconds handling interrupts such as disk access!) |
resource management | RTOS provides mechanisms for efficient allocation and management of system resources, such as memory, CPU and peripherals, to ensure optimal performance |
scalability | RTOS is often lightweight and modular, making it suitable for resource-constrained environments like microcontrollers and embedded systems |
reliability and fault tolerance | many RTOS implementations include features to enhance system stability, such as error detection, recovery mechanisms and redundancy |
As with most operating systems, the kernel provides the essential services in an RTOS. In hard real-time systems, the kernel must guarantee predictable and deterministic behavior to ensure that all tasks meet their deadlines. In this chapter we focus on kernel aspects that are specific to RTS.
The RTOS kernel deals with,
The design of RTOSes (and RTS in general) deal with tasks, jobs and, for implementation-specific details, threads.
A real-time task, τi is defined using the following parameters: (ϕi, pi, ci, di) where,
Symbol | Description |
---|---|
ϕi | Phase (offset for the first job of a task) |
pi | Period |
ci | Worst-case execution time |
di | Deadline |
Hence, a real-time task set (of size ‘n’) is a collection of such tasks, i.e., τ = {τ1, τ2, ...τn}. Given a real-time task set, the first step is to check if the task set is schedulable, i.e., check whether all jobs of a task will meet their deadlines (a job is an instance of a task). For this purpose, multiple schedulability tests have been developed, each depending on the scheduling algorithm being used; one example appears after the list below.
- remember that a task is a set of parameters.
- We “release” multiple “jobs” of each task, each with its own deadline
- if all jobs of all tasks meet their deadlines, then the system remains safe.
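As one concrete (and classic) example of such a test (a sketch, not from the notes above): for periodic tasks whose deadlines equal their periods, earliest-deadline-first (EDF) scheduling on one processor is feasible exactly when total utilization is at most 1 (the Liu & Layland result):

```c
/* a real-time task: (phase, period, wcet, deadline) */
typedef struct {
    float phase;     /* phi_i */
    float period;    /* p_i   */
    float wcet;      /* c_i   */
    float deadline;  /* d_i   */
} task_t;

/* EDF schedulability test: U = sum(c_i / p_i) <= 1,
   assuming d_i == p_i for every task */
int edf_schedulable(const task_t tasks[], int n)
{
    float u = 0.0f;
    for( int i = 0 ; i < n ; ++i )
        u += tasks[i].wcet / tasks[i].period;
    return u <= 1.0f;
}
```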
A thread, then, is an implementation of task/job – depending on the actual OS, it could be either, or both.
At a high level, here is a comparison between tasks, jobs and threads (note: these details may vary depending on the specific RTOS):
aspect | task | job | thread |
---|---|---|---|
definition | a task is a unit of work that represents a program or function executing in the RTOS | a job is a specific instance or execution of a task, often tied to a particular event or trigger | a thread is the smallest unit of execution within a task, sharing the task’s resources |
granularity | coarse-grained; represents a complete function or program | fine-grained; represents a single execution of a task | fine-grained; represents a single flow of execution within a task |
resource ownership | owns its resources (e.g., stack, memory, state) | does not own resources; relies on the task’s resources | shares resources (e.g., memory, address space) with other threads in the same task |
scheduling | scheduled by the RTOS kernel based on priority or scheduling algorithm | not directly scheduled; executed as part of a task’s execution | scheduled by the RTOS kernel, often within the context of a task |
concurrency | tasks run concurrently, managed by the RTOS scheduler | jobs are sequential within a task but may overlap across tasks | threads run concurrently, even within the same task |
state management | maintains its own state (e.g., ready, running, blocked) | state is transient and tied to the task’s execution | maintains its own state but shares the task’s overall context |
isolation | high isolation; tasks do not share memory or resources by default ++ | no isolation; jobs are part of a task’s execution | low isolation; threads share memory and resources within a task |
overhead | higher overhead due to separate stacks and contexts | minimal overhead, as it relies on the task’s resources | moderate overhead, as threads share resources but require context switching |
use case | used to model independent functions or processes (e.g., control loops) | used to represent a single execution of a task (e.g., processing a sensor reading) | used to parallelize work within a task (e.g., handling multiple i/o operations) |
example | a task for controlling a motor | a job for processing a specific motor command | a thread for reading sensor data while another thread logs the data |
(++ sometimes tasks do contend for resources, so we need to mitigate access to them, via locks, semaphores, etc. and then have to deal with thorny issues such as priority inversions)
A task is often described using a task control block (TCB):
Tasks typically cycle through a set of states, for instance (taken from the FreeRTOS real-time OS):
While the `READY`, `RUNNING` and `BLOCKED` states are similar to those in general-purpose operating systems (GPOS), RTOSes with periodic tasks must introduce an additional state: `IDLE` or `SUSPENDED`. An `end_cycle` operation puts the job in the `IDLE` state and assigns the processor to another ready job. This operation is carried out by a routine activated by a timer, which verifies, at each tick, whether some task (job) has to be awakened.
TCBs are usually managed in kernel queues (the implementation details may vary depending on the particular RTOS).
Context Switch Overheads:
One of the main issues with multitasking and preemption is that of context switch overheads, i.e., the time and resources required to switch from one task to another. For instance, consider this example of two tasks running on an ARM Cortex-M4:
```c
void Task1(void) {
    while(1) {
        // Task 1 operations
        LED_Toggle();
        delay_ms(100);
    }
}
```
and
```c
void Task2(void) {
    while(1) {
        // Task 2 operations
        ReadSensor();
        delay_ms(200);
    }
}
```
When switching between Task1 and Task2, an RTOS might need to save and restore the processor context, e.g., the 16 general-purpose registers. So, on the ARM Cortex-M4,
effect | cost |
---|---|
basic context switch | 200-400 CPU cycles |
cache and pipeline effects, total overhead | 1000+ cycles |
frequent switching (e.g., every 1 ms) | could consume 1-2% of CPU time! |
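As a quick sanity check on that last row: assume (hypothetically) a 100 MHz Cortex-M4 that switches tasks every 1 ms with a total overhead of ~1000 cycles per switch. Then,

$$\frac{1000 \text{ cycles/switch} \times 1000 \text{ switches/s}}{100 \times 10^6 \text{ cycles/s}} = 1\% \text{ of CPU time}$$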
These costs can add up, especially in systems with many tasks and frequent context switches.
Hence an RTOS must not only be cognizant of such overheads but also actively manage/mitigate them. Some strategies could include:
consolidating related work into fewer tasks (fewer tasks → fewer switches):

```c
void Task_Sensors(void) {
    while(1) {
        // Handle multiple sensors in one task
        ReadTemperature();
        ReadPressure();
        ReadHumidity();
        delay_ms(500);
    }
}
```
assigning priorities so that critical work preempts less important work only when needed:

```c
void CriticalTask(void) {
    // Set high priority
    setPriority(HIGH_PRIORITY);
    while(1) {
        ProcessCriticalData();
        delay_ms(50);
    }
}
```
statically allocating and aligning task stacks to make switch costs predictable:

```c
#define STACK_SIZE 1024
static __attribute__((aligned(32)))
uint8_t task1_stack[STACK_SIZE];
```
Note: these are not comprehensive and other strategies could be followed, for instance avoiding multitasking altogether! All functions could be implemented in a single process that runs a giant, infinite loop known as a cyclic executive. Newer RTOSes shun the cyclic executive in favor of the multitasking model, since the latter provides more flexibility, control and adaptability, but many critical systems (especially older, long-running ones) still use the cyclic executive: for instance, nuclear reactors, chemical plants, etc.
In any case, a precise understanding of these overheads is crucial for the design and analysis of RTS.
There is significant (ongoing) work, both in industry as well as academia, on how to get a handle on context switch overheads while still allowing for flexibility and modularity in the development of RTS.
RTOSes use various mechanisms like semaphores, mutexes, message queues and event flags for communication and synchronization between tasks. Here are some examples:
```c
// Example of binary semaphore usage
semaphore_t sem;
sem_init(&sem, 1); // Initialize with 1

void TaskA(void) {
    while(1) {
        sem_wait(&sem);
        // Critical section
        accessSharedResource();
        sem_post(&sem);
    }
}
```
```c
mutex_t mutex;
mutex_init(&mutex);

void TaskB(void) {
    mutex_lock(&mutex);
    // Protected shared resource access
    updateSharedData();
    mutex_unlock(&mutex);
}
```
```c
queue_t msgQueue;
queue_create(&msgQueue, MSG_SIZE, MAX_MSGS);

void SenderTask(void) {
    message_t msg = prepareMessage();
    queue_send(&msgQueue, &msg, TIMEOUT);
}

void ReceiverTask(void) {
    message_t msg;
    queue_receive(&msgQueue, &msg, TIMEOUT);
    processMessage(&msg);
}
```
Event flags let a task wait on AND/OR conditions for event combinations:

```c
event_flags_t events;
#define EVENT_SENSOR_DATA 0x01
#define EVENT_USER_INPUT  0x02

void TaskC(void) {
    // Wait for both events
    event_wait(&events, EVENT_SENSOR_DATA | EVENT_USER_INPUT,
               EVENT_ALL, TIMEOUT);
    processEvents();
}
```
```c
mutex_t mutex;
cond_t condition;

void ConsumerTask(void) {
    mutex_lock(&mutex);
    while(bufferEmpty()) {
        cond_wait(&condition, &mutex);
    }
    processData();
    mutex_unlock(&mutex);
}
```
Each mechanism has specific use cases:
mechanism | use case |
---|---|
semaphores | resource management and simple synchronization |
mutexes | exclusive access to shared resources |
message queues | data exchange and task communication |
event flags | multiple event synchronization |
condition variables | complex state-dependent synchronization |
Common considerations cut across all of these mechanisms: ensuring system reliability, maintaining real-time performance, preventing system deadlocks, managing system resources effectively and handling error conditions gracefully.
Real-time systems require predictable memory allocation and deallocation, to avoid delays or fragmentation. Hence, they often use limited memory-management techniques, eschewing even dynamic memory allocation in favor of static memory allocation. For instance, many RTS don’t even use malloc() or new (i.e., no heap-allocated memory) and very often avoid garbage collection. The main goal is tight control over memory management → this makes timing behavior more predictable. Hence, the following become easier:
Some goals for memory management in RTOSes:
In fact, to achieve these goals, many RTSes don’t even use caches since they can be a major source of non-determinism in terms of timing behavior, e.g.,
if we cannot exactly calculate when some data/code will hit/miss in cache, then we cannot estimate its true timing behavior, leading to a lot of uncertainty → bad!
Some RTSes use scratchpads since they provide cache-like performance but have higher predictability since the data onboarding/offloading is explicitly managed (either by the program or the RTOS).
Some common memory-management techniques for RTOSes therefore avoid general-purpose heap allocators (malloc(), for instance) in favor of statically allocated buffers and fixed-size memory pools. Of course, each of these mechanisms has its own problems and a deliberation on those is left as an exercise for the reader.
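To make the trade-offs concrete, here is a minimal sketch of one such technique, a statically allocated, fixed-size memory pool (all names are illustrative): allocation and deallocation are O(1) and fragmentation-free, at the cost of a fixed block size.

```c
#include <stdint.h>
#include <stddef.h>

#define BLOCK_SIZE 64
#define NUM_BLOCKS 16

// The pool itself is static: no heap, no fragmentation, and its
// footprint is fully accounted for at link time.
static uint8_t pool[NUM_BLOCKS][BLOCK_SIZE];
static uint8_t *free_list[NUM_BLOCKS];
static size_t free_count = 0;

void pool_init(void) {
    for (size_t i = 0; i < NUM_BLOCKS; i++)
        free_list[free_count++] = pool[i];
}

// O(1) allocate/free -> predictable timing behavior.
void *pool_alloc(void) {
    return (free_count > 0) ? free_list[--free_count] : NULL;
}

void pool_free(void *block) {
    if (block != NULL && free_count < NUM_BLOCKS)
        free_list[free_count++] = (uint8_t *)block;
}
```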
Timer and interrupt management are crucial components of an RTOS, ensuring that tasks are executed at precise intervals and that the system responds promptly to (internal and) external events. Timers and interrupts are closely related, since together they offer the very basic timing mechanism in RTOSes (from Hard Real-Time Computing Systems: Predictable Scheduling Algorithms and Applications):

> to generate a time reference, a timer circuit is programmed to interrupt the processor at a fixed rate and the internal system time is represented by an integer variable, which is reset at system initialization and is incremented at each timer interrupt. The interval of time with which the timer is programmed to interrupt defines the unit of time in the system; that is, the minimum interval of time handled by the kernel (time resolution). The unit of time in the system is also called a system tick.
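A minimal sketch of the mechanism the quote describes (the function names are illustrative):

```c
#include <stdint.h>

// Internal system time, in ticks; reset at initialization and
// incremented on every timer interrupt.
static volatile uint32_t sys_ticks = 0;

void timer_isr(void) {
    sys_ticks++; // advance the system time by one tick
    // at each tick, check whether any sleeping task (job) has to be
    // awakened, and invoke the scheduler if one becomes ready
}
```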
Timers, in general, play important roles in such systems, viz.,
role | description |
---|---|
task scheduling | enable periodic execution of tasks |
timeout management | prevent indefinite blocking of resources |
event timing | measure intervals between events |
system timing | maintain system clock and timestamps |
watchdog functions | monitor system health and detect lockups |
Typically these systems have the following three types of timers:
type | properties |
---|---|
hardware | - direct access to hardware timing resources - highest precision and accuracy - limited in number (hardware dependent) - used for critical timing functions |
software | - implemented in software, using a hardware timer as base - more flexibility, less precise - limited only by memory - more suitable for non-critical timing functions |
system tick | - core timer for RTOS - drives task scheduling - fixed frequency |
There are various design considerations for timers in an RTOS, e.g., drift and jitter++; note also that the periodic tick can result in significant power consumption if not implemented/managed well.

(++ drift indicates a gradual, long-term change in the timer’s frequency over time, whereas jitter refers to short-term, random fluctuations in the timing of individual clock pulses)
Interrupt Latencies → the time from when an interrupt occurs to when the corresponding interrupt service routine (ISR) starts executing. As interrupts are integral to the operation of an RTOS, from the implementation of the system tick to notifications of internal (watchdog timers) and external events (new sensor data), it is important to minimize interrupt latencies.
Optimization Techniques (to minimize latencies):
Essentially, the kernel must be designed to minimize jitter and ensure that all operations have bounded and predictable execution times.
Hence, we can try to evaluate whether an RTOS kernel meets these goals using the following metrics (note: not exhaustive):
metric | description |
---|---|
interrupt latency | the time taken to respond to an interrupt |
context switch time | time to switch between tasks |
dispatch latency | time difference between task being ready and when it starts executing |
throughput | number of tasks/operations the kernel can handle per unit time |
name | description | features |
---|---|---|
FreeRTOS | a widely used, open-source RTOS for embedded systems | small footprint, portable, supports a wide range of microcontrollers |
VxWorks | commercial RTOS used in aerospace and defense applications | high reliability, real-time performance and support for multi-core processors |
QNX | a commercial RTOS known for its reliability and use in automotive and medical systems | microkernel architecture, high security, support for POSIX APIs |
Zephyr | open-source RTOS designed for IoT and Edge devices | modular, scalable, supports a wide range of hardware architectures |
Why is Linux not on the list? While it has a (growing) list of real-time features, it is still far from a hard real-time system, mainly due to its complexity. It is difficult to analyze WCETs on Linux or to completely control its timers → the list is endless. Still, it sees use in many real-time and embedded systems and we will (briefly) explore its real-time capabilities soon.
As mentioned earlier, FreeRTOS is one of the most popular open-source RTOS options, widely used in embedded systems due to its simplicity, portability and extensive community support. It supports,
Here is an example that uses FreeRTOS to blink the LEDs on a microcontroller:
```c
#include <FreeRTOS.h>
#include <task.h>
#include <gpio.h>

// Task to blink an LED
void vBlinkTask(void *pvParameters) {
    while (1) {
        GPIO_TogglePin(LED_PIN);        // Toggle LED
        vTaskDelay(pdMS_TO_TICKS(500)); // Delay for 500ms
    }
}

int main(void) {
    // Initialize hardware
    GPIO_Init(LED_PIN, GPIO_MODE_OUTPUT);

    // Create the blink task
    xTaskCreate(vBlinkTask, "Blink", configMINIMAL_STACK_SIZE, NULL, 1, NULL);

    // Start the scheduler
    vTaskStartScheduler();

    // The program should never reach here
    for (;;);
}
```
Resources:
As mentioned earlier, Linux, as a general-purpose operating system, is not inherently a real-time operating system (RTOS). However, it does provide several features and mechanisms that can be used to achieve real-time performance, especially when combined with real-time patches or specialized configurations.
Some of the real-time features of Linux include:
Preempt-RT Patch: a set of patches that converts the Linux kernel into a fully preemptible kernel, reducing latency and improving real-time performance; the Preempt-RT patch achieves this by, among other things, making most kernel code preemptible, converting most spinlocks into preemptible (sleeping) locks and running interrupt handlers in kernel threads
the Preempt-RT patch is widely used in industries such as telecommunications, industrial automation and audio processing. It is actively maintained and supported by the Linux Foundation’s Real-Time Linux (RTL) collaborative project
Real-time scheduling policies: support for real-time scheduling policies such as SCHED_FIFO and SCHED_RR, which provide deterministic scheduling behavior (see the sketch after this list)
High-resolution timers: support for high-resolution timers that allow for precise timing and scheduling of tasks
basic priority inheritance: mechanism to prevent priority inversion by temporarily elevating the priority of lower-priority tasks holding a resource needed by higher-priority tasks
CPU isolation: ability to isolate CPUs from the general scheduler, dedicating them to specific real-time tasks; also pinning processes to certain cores
Threaded interrupts: support for handling interrupts in kernel threads, reducing interrupt latency and improving predictability
Memory management techniques: such as memory locking to prevent pages from being swapped, the use of “huge” pages and memory pre-allocation
Control groups (cgroups): mechanism to allocate CPU, memory and I/O resources to specific groups of tasks, ensuring resource availability for real-time tasks
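For instance, a standard way to request one of those real-time policies from user space is the POSIX sched_setscheduler() call (a minimal sketch; the priority value is an arbitrary example):

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void) {
    // Ask for the SCHED_FIFO real-time policy at priority 50
    // (requires root privileges or CAP_SYS_NICE).
    struct sched_param sp = { .sched_priority = 50 };
    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0) {
        perror("sched_setscheduler");
        return 1;
    }
    printf("now running under SCHED_FIFO\n");
    return 0;
}
```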
These features, when properly configured, can help achieve real-time performance on Linux, making it suitable for certain real-time and embedded applications.
The Raspberry Pi OS can also be made “real-time” in the same manner as described above, since it is a Linux variant.
Indeed, there are some attempts at getting the Pi to behave in a real-time fashion, e.g.: 1, 2, 3.
ROS is an open source middleware framework built for robotics applications. The main goal → develop standards for robotic software. ROS provides many reusable modules for developing robotic applications.
Embedded/autonomous programs that do simple tasks (or operate with a single sensor/motor) are relatively easy to program. As more sensing, actuation and functionality are added (consider a large industrial robot or even an autonomous car), programs quickly become quite complex – coordination of the data and system states becomes challenging.
ROS helps to develop and scale such applications and also manages communications between various parts of the software. As mentioned earlier, ROS is middleware:
Middleware is a software layer that connects the operating system to applications, data, and users. It provides common services and capabilities, like single-sign on (SSO), easy communication/coordination (like ROS) or application programming interface (API) management. Developers can rely on middleware to provide consistent, simplified integrations between application components. This frees up developers to build core features of applications, rather than spend time connecting those features to different endpoints and environments, including legacy systems.
At a high level, ROS,
A simple example: control of a robotic arm+camera:
To write a ROS application to control this robotic arm, we first create a few subprograms, each one a node:

- motion planning
- hardware drivers
- joystick
Now we use ROS → communication between these nodes.
ROS even provides plug and play libraries for designing your system, e.g., inverse kinematics libraries, trajectory planning for robotic arms, etc.
Some important components of ROS:
The use of nodes has several benefits such as fault tolerance, reduced complexity and modularity.
Topics are meant for unidirectional, streaming communication. ROS includes other mechanisms such as services and [parameter servers](http://wiki.ros.org/Parameter%20Server) for different types of communications.
Messages (the data exchanged over topics) can contain primitive types (int, float, boolean) as well as structs and arrays. Services, in contrast, are defined by a pair of request and response messages.

A simple ROS message:
```
std_msgs/Header header
  uint32 seq
  time stamp
  string frame_id
geometry_msgs/Point point
  float64 x
  float64 y
  float64 z
```
Example: our first ROS message.
Example: communication between a camera node and an image_viewer node:

- the camera node notifies the master → it wants to publish images on the topic, images
- the image_viewer node → subscribes to the images topic
- the topic images now has both → a publisher and a subscriber
- the master notifies both → of each others’ existence

A more intricate example of the same:
An example of a ROS transform and tree:
ROS (the first version) is not real-time. Hence, systems that require hard real-time guarantees should not use it. But ROS can be integrated into systems that require some latency guarantees. If needed, ROS can be run on top of the RT_PREEMPT real-time patch on Linux. In addition, specific nodes can be designed to handle real-time functions or programmed to behave as real-time control systems.
If better real-time guarantees are required on ROS, then ROS 2 is your best bet.
Resources: more information on real-time and ROS2 can be found at RT ROS2 Xilinx and RT ROS Github.
Consider an engine control system that cycles through the various phases of operation for an automotive engine:
This system periodically cycles through multiple tasks, viz.,
If we correlate this to task “activations”, then we may see the following:
![]() |
![]() |
This is a (somewhat) simple execution model known as the cyclic executive that we shall return to later. Hence, there is a direct correlation between the physical aspects of a real-time and autonomous system and issues such as scheduling.
Most real-time systems can be classified into:
property | hard | soft |
---|---|---|
“usefulness” of results | ![]() | ![]() |
optimality criterion | all deadlines must be satisfied | most deadlines must be met |
examples | sensor readings, vehicular control | video decoding, displaying messages |
There are many ways to model a real-time system:
We have already discussed tasks vs. jobs vs. threads earlier, so we understand how to model computation. We also discussed how actual execution is modeled (via threads).
Let us focus on jobs and some of their properties:
note → deadlines and periods don’t have to match, but they usually do, i.e., D = P
property/parameters | description |
---|---|
temporal | timing constraints and behavior |
functional | intrinsic properties of the job |
resource | resource requirements |
interconnection | how it depends on other jobs and how other jobs depend on it |
As discussed earlier, we need to get a good understanding of the wcet of jobs.
One of the important ways to understand the workload requirements for a task is to compute,
how much utilization is taken up by all jobs of the task?
One might ask: if there are (potentially) an infinite number of jobs for each task, since they’re periodic++ and long running, then how can one compute the utilization?
Recall that a periodic function is of the type → f(t) = f(t + T), where T is the “period”.
Note: utilization is not the time taken by a task (or its jobs). It is,
the fraction of the processor’s total available utilization that is soaked up by the jobs of a task
Hence, the utilization for a single task is,
$$ U_i = \frac{c_i}{T_i} $$
where,
symbol | description |
---|---|
ci | wcet of the task |
Ti | period of the task |
Now, we can compute the utilization for the entire task set,
$$ U = \sum_{i=1}^{n} U_i = \sum_{i=1}^{n} \frac{c_i}{T_i} $$
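A tiny helper to compute this (the parallel arrays of c and T values are an assumed layout):

```c
// Total utilization U = sum_i (c_i / T_i) for n tasks.
double total_utilization(const double *c, const double *T, int n) {
    double u = 0.0;
    for (int i = 0; i < n; i++)
        u += c[i] / T[i];
    return u;
}
```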
Simple Exercise: what is the total utilization for the following task set?
Task | c | T |
---|---|---|
τ1 | 1 | 4 |
τ2 | 2 | 5 |
τ3 | 5 | 17 |
what does it mean if U > 1?
Jobs can be either precedence constrained or independent → we can use directed acyclic graphs (DAGs) to specify/capture these constraints.
For instance, $J_a \prec J_b$ implies that $J_a$ precedes $J_b$ (it must complete before $J_b$ can start), while $J_a \rightarrow J_b$ implies that $J_a$ is an immediate predecessor of $J_b$.
Consider the following example:
dag | relationships |
---|---|
![]() | J1 ≺ J2, J1 → J2, J1 ≺ J4, J1 ↛ J4 |
Here is a more realistic example → the precedence constraints in an autonomous driving system:
A “resource” is some structure used by a task to advance its execution.
Types of resources:
We have already discussed resource sharing and synchronization earlier → mutexes, locks, etc. Essentially there is a need for mutual exclusion that is guaranteed via one of these methods or by the use of critical sections.
Before we proceed, we need to understand → preemption:
preemption is the process of suspending a running task in favor of a higher priority task.
Most OS (real-time & non real-time) allow preemption since they allow,
Preemptions, among others, help define a variety of scheduling events; essentially, these are the time when the scheduler is invoked:
(e.g., pthread_testcancel())

We have been talking about “scheduling” all this while, so it is time to formally define what a schedule is.
Given → a set of jobs, J = {J1, J2, ..., Jn}
A schedule → an assignment of jobs to the CPU, so that each task can execute till completion.
Finally, we get to the main topic at hand → schedulers and scheduling! Historically there have been many scheduling policies developed for a variety of systems, e.g., FIFO, round robin, time sharing, etc.
Here is a good comparison of the various types of (historical) CPU/OS schedulers: CPU Scheduling in Operating Systems.
In the realm of real-time systems, to formally define a scheduling problem, we need to specify:
Hence, the general scheduling problem is,
assign processors and resources to tasks, such that the following constraints are met:
- timing constraints
- precedence constraints
- resource constraints
There is a large body of literature in the domain of real-time scheduling algorithms. In this chapter, we will focus on a few of them, viz.,
One of the main problems with the scheduling problem, as defined above (and in general), is that many variants of the problem are intractable, i.e., NP-Hard or even NP-Complete.
Recall that an NP-Hard problem is one for which no known polynomial-time algorithm exists. So, all known algorithms are super-polynomial and, usually, of exponential time complexity!
[Will leave it to the reader to recall or look up the definition of NP-Complete.]
Since the scheduling problems may not be tractable (or “solvable” in a realistic time frame), we need to find heuristics but they can be “sub-optimal”. Luckily, we have a couple of provably optimal real-time schedulers (in the single core domain).
Additional, important definitions:
concept | definition |
---|---|
feasibility | schedule is feasible if all tasks can be completed by satisfying a set of specified constraints |
schedulable | set of tasks is schedulable if there exists at least one algorithm that can produce a feasible schedule |
schedulability analysis | analysis to confirm that deadlines for a set of threads can be guaranteed using a given scheduling policy |
In the automotive engine example from earlier, we see that for each cycle, the same set of tasks repeats (i.e., “periodic behavior”). Note though that the tasks need not execute in parallel – rather, they must execute sequentially for this application. Usually such applications use a scheduling mechanism known as a “cyclic executive”.
Consider this simple example with three tasks:
task | c |
---|---|
T1 | 1 |
T2 | 2 |
T3 | 3 |
How would we schedule this? Assuming a single processor (hence a single timeline).
Well, the simplest mechanism is to just use a sequential schedule,
If, as in the case of the engine control example we saw earlier, the tasks repeat ad infinitum, then we see the pattern also repeating…
Cyclic executives were common in many critical RTS, since they’re simple and deterministic. An implementation could look like,
```c
while(1) // an infinite loop
{
    // Some Initialization
    Task_T1();
    // Some processing, maybe
    Task_T2();
    // Some other processing, maybe
    Task_T3();
    // Cleanup
}
```
Question: what problems, if any, can happen due to cyclic executives?
The very simplicity of such systems can also be their biggest weakness.
lack of flexibility: as the example and code above demonstrate, once a pattern of executions is set, it cannot be changed, unless the system is stopped, redesigned/recompiled and restarted! This may not be possible for critical applications. Even for the engine control application in cars, this doesn’t just mean stopping and restarting the car, but re-flashing the firmware for the engine, which is quite an involved task.
scalability: along similar lines, it is difficult to scale the system to deal with additional issues or add functionality.
priority: there is no way to assign priority or preemption since all tasks essentially execute at the same priority. Hence, if we want to deal with higher-priority events (e.g., read a sensor) or even atypical (aperiodic/sporadic) events, such as sudden braking in an autonomous car, then a cyclic executive is not the right way to go about it.
resource management: certain tasks can corral resources and hold on to them while others may starve – leading to the system becoming unstable. For instance, even in the simple example, we see that T3 can dominate the execution time on the CPU:
Since the system is one giant executable, it is difficult to stop a “runaway task” – the entire system must be stopped and restarted, which can lead to serious problems.
One way to mitigate some of the problems with cyclic executives is to split the resource allocation into “frames” → fixed chunks of time when a task can claim exclusive access to a resource, e.g., the processor (see the sketch below):
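A sketch of what a frame-driven executive might look like (wait_for_frame_boundary() is an assumed timer primitive that blocks until the next frame starts):

```c
#include <stddef.h>

typedef void (*task_fn)(void);
extern void wait_for_frame_boundary(void);

// Run one task per fixed-size frame, repeating the table forever.
void frame_executive(task_fn table[], int num_frames) {
    for (int f = 0; ; f = (f + 1) % num_frames) {
        if (table[f] != NULL)
            table[f]();            // task slotted into this frame
        wait_for_frame_boundary(); // idle until the frame ends
    }
}
```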
So, if we revisit our simple example and break the processor schedule into frame sizes of 2 units each (why 2? it is arbitrary for now; as we shall see later, we can calculate a “good” frame size).
Now, our schedule looks like,
As we see from this picture, task T1 doesn’t end up using its entire frame and hence can waste resources (one of the pitfalls of this method).
Continuing further,
Task T3 is forced to relinquish the processor at t=6, even though it has some execution left, on account of the frame ending. Now T1 resumes in its own frame. T3 has to wait until t=10 to resume (and complete) its execution:
Other Static/Table-Driven Scheduling:
Cyclic executives are an example of schedulers where the tasks are fixed, ahead of time, and all that a scheduler has to do is to dispatch them one at a time in the exact same order. Often the tasks are stored in a lookup table (hence the name “table-driven”). Other examples of such systems (with some prioritization and other features built in) have been built, e.g., weighted round robin → also finds use in cloud computing and networking load balancing, etc.
One method that overcomes the disadvantages of a completely static method is to assign priorities to jobs as they arrive. Hence, when a job is released, it gets assigned a priority and the scheduler then dispatches/schedules the job accordingly. If it is the highest-priority job so far, it gets scheduled right away, preempting any currently running task. If, on the other hand, it is not the highest-priority job, then it is inserted into the ready queue at the right position (priority level).
To deal with this, we need an online scheduler, i.e., one that is always available to make scheduling decisions – when tasks arrive, complete, miss their deadlines, etc.
Before we go any further, let’s update the task model a little, to make matters simple for us.
While these may seem to be overly simplifying, they still fit the model of many systems and help us develop fundamental results. And we can always add them back one-by-one and still retain the correctness of the theoretical results we develop, while making the system more realistic.
Now, in the realm of online, priority-driven schedulers, we have two options:
priority assignment | algorithms |
---|---|
static | Rate-Monotonic (RM), Deadline-Monotonic (DM) |
dynamic | Earliest-Deadline First (EDF), Least-Slack Time (LST) |
Let’s look at one of each (the most popular ones), viz., the Rate-Monotonic (RM) and Earliest-Deadline First (EDF) schedulers. Note that both were first described and analyzed in a seminal computer science paper that has since become one of the most cited and influential papers in the field: Scheduling Algorithms for Multiprogramming in a Hard-Real-Time Environment by Liu and Layland.
Interestingly, both of these algorithms are provably optimal, i.e., no other static or dynamic algorithm can do better than RM and EDF respectively! Not bad for the first paper in the area – talk about setting a high bar, or rather bound!
The Rate-Monotonic priority assignment algorithm assigns priority based on the period of the task → the shorter the period, the higher the priority!
Consider the following example:
task | c | T |
---|---|---|
τ1 | 2 | 4 |
τ2 | 2 | 5 |
τ3 | 1 | 10 |
So, based on the RM algorithm, the priority will be:
τ1 > τ2 > τ3
since, T1 < T2 < T3.
The question now is whether the above task set is schedulable? Let us use our previous utilization-based check, i.e.,
$$U = \sum_{i=1}^{n} \frac{c_i}{T_i}$$
So, plugging in the numbers, we get,
$$ U = \frac{2}{4} + \frac{2}{5} + \frac{1}{10} = 0.5 + 0.4 + 0.1 = 1.0 $$
Our test was: U ≤ 1. So, this task set is…schedulable? Let us see – by plotting it on the timeline:
As we see, task τ3 misses its deadline! In fact, with the other two tasks, τ1 and τ2 constantly executing, τ3 will never get to execute and all of its jobs will miss their deadlines!
“Wait!”, you say. OK, one job has missed its deadline, maybe two (from the picture). So how can I make the bold claim that all will miss their deadlines?
If you pay close attention to the figure, I have only checked for 10 time steps. Why stop there? Well, it so happens that 20 is the LCM (lowest common multiple) of all the task periods, 4, 5 and 10, and, as we will see, checking up to the LCM suffices. Why do we care about the LCM? Turns out, in real-time scheduling, the LCM of the task periods has a special significance: if we construct the schedule for one LCM’s worth of time units, then the schedule repeats exactly after that! Hence, the exact same schedule repeats every LCM number of units.
The LCM of the constituent (periodic) tasks in the system is referred to as the → hyperperiod. So, we only need to check our schedule for one hyperperiod. If the task set is schedulable within that timeframe then it always will be and, if not, it never will be.
So, for this example, I can state, with confidence, that all jobs of τ3 will miss their deadlines since within half of the hyperperiod, one job has missed its deadline!
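Computing the hyperperiod is just an LCM over the task periods, e.g.:

```c
// Greatest common divisor, via Euclid's algorithm.
unsigned int gcd(unsigned int a, unsigned int b) {
    return (b == 0) ? a : gcd(b, a % b);
}

// Hyperperiod = LCM of all task periods.
unsigned int hyperperiod(const unsigned int *periods, int n) {
    unsigned int h = 1;
    for (int i = 0; i < n; i++)
        h = (h / gcd(h, periods[i])) * periods[i]; // lcm(h, period_i)
    return h; // e.g., {4, 5, 10} -> 20
}
```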
So, coming back to our analysis: we started with our utilization test, U ≤ 1, which this task set passed, yet it failed to schedule! So, it seems we need something better.
Turns out the Liu and Layland paper has figured this out. They created another test, one based on a utilization upper bound. Since RM is a static priority-assignment mechanism, there is a theoretical limit on the utilization for a task set, which depends on the number of tasks, n, in the system.
So, we derive (I leave out the details here. Check the Liu and Layland paper for more details) another check for utilization,
$$ U = \sum_{i=1}^n \frac{c_i}{T_i} \le n.(2^{\frac{1}{n}} -1) $$
If the total utilization of the system is below this bound, then the task set is schedulable by RM. Note that this is a sufficient but not necessary test (more on that later).
As we see from above, the value of the right hand side will change with the number of tasks in the system. Hence, with more tasks, the upper bound for U will reduce.
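Plugging small values of n into the bound:

$$n=1: 1.0 \qquad n=2: 2(\sqrt{2}-1) \approx 0.828 \qquad n=3: 3(2^{\frac{1}{3}}-1) \approx 0.78$$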
let’s open up the simulator-plotter for checking this for various values of n and see for n = 1, 2, ...
So, we see that

$$n = 3:\quad U_{ub} \approx 0.78 $$

The utilization for our task set was ≈ 1.0, which is significantly higher! No wonder our task set wasn’t schedulable!
Here is a plot that shows the values for different numbers of tasks:
As we see, the value keeps reducing. Does it keep going all the way down to zero? What if I schedule 100 tasks? A 1000? Turns out, we can check, with the exact same equation:

let’s open up the simulator-plotter for checking this for various values of n and see for n = 100, 1000, etc.
As we see from the figure (and the active plotting), the values plateau and converge…to 0.69 (ln 2 ≈ 0.693, to be precise)! So, for any real-time system scheduled using the RM assignment mechanism, if the utilization is under 0.69 then it is schedulable.
Optimality: as mentioned earlier, RM is optimal, i.e.,
Now, let’s go back to one of our earlier examples (from the cyclic executive chapter):
task | c | T |
---|---|---|
τ1 | 1 | 4 |
τ2 | 2 | 6 |
τ3 | 3 | 12 |
We added some period information to the tasks.
We know that using the naive utilization test, U ≈ 0.833. But recall that the utilization bound test, for n = 3 tasks, requires U ≤ 0.78. So, this task set must be unschedulable, right? Let’s draw it out on the timeline and see:
Wait, this is schedulable? But it fails our test!
This is why I referred to it as a sufficient but not necessary test. Hence, for the Liu and Layland utilization bound test,
result of test | schedulable? |
---|---|
pass, i.e., U ≤ Uub | yes |
fail, i.e., U > Uub | unsure |
So we need a better test → Response Time Analysis (RTA):

- if all jobs of a task are able to complete before their respective deadlines → the task set is schedulable
- caveat → we must account for the interference (hence, delays) that each job encounters from all higher-priority jobs
Let’s look at it in more detail:
Ri = ci + Ii
where Ii is the interference faced by that job from all higher-priority jobs until then.
The number of jobs of a higher-priority task, τj, that can be released within the window Ri is:

$$\left\lceil \frac{R_i}{T_j} \right\rceil$$

since each period of task τj results in a new job being released. Because RM gives higher priority to shorter periods, those released jobs will execute ahead of the current task, τi.
$$I_j = \left\lceil \frac{R_i}{T_j} \right\rceil .\ c_j$$
$$I = \sum_{j\in hp(i)}\left\lceil \frac{R_i}{T_j} \right\rceil .\ c_j $$
Ri = ci + Ii
$$ R_i = c_i + \sum_{j\in hp(i)}\left\lceil \frac{R_i}{T_j} \right\rceil .\ c_j $$
For each task, we carry out the above analysis → stop when consecutive iterations provide the same values.
∀τi : Ri < Ti
If the above test passes for all tasks, then the task set is schedulable, even if the earlier tests fail. Hence, this test is both necessary and sufficient.
Example [contd.]: Now, applying this to our errant example:
assign priorities: τ1 > τ2 > τ3
response time calculations
For each task, we calculate the response time using the above formula via iterative analysis.
task | iteration | calculations | Ri < Ti |
---|---|---|---|
τ1 | 1 | R1 = c1 = 1 | yes [ 1 < 4 ] |
τ2 | 1 | R20 = c2 = 2
$R_2^1 = c_2 + \lceil\frac{R_2^0}{T_1}\rceil c_1$ $R_2^1 = 2 + \lceil\frac{2}{4}\rceil \cdot 1 = 3$ |
|
2 | $R_2^2 = 2 + \lceil\frac{3}{4}\rceil \cdot 1 = 3$ [stop] | yes [ 3 < 6 ] | |
τ3 | 1 | R30 = c3 = 3
$R_3^1 = c_3 + \lceil\frac{R_3^0}{T_1}\rceil c_1 + \lceil\frac{R_3^0}{T_2}\rceil c_2$ $R_3^1 = 3 + \lceil\frac{3}{4}\rceil \cdot 1 + \lceil\frac{3}{6}\rceil \cdot 2 = 6$ |
|
2 | $R_3^2 = 3 + \lceil\frac{6}{4}\rceil \cdot 1 + \lceil\frac{6}{6}\rceil \cdot 2 = 7$ | |
3 | $R_3^3 = 3 + \lceil\frac{7}{4}\rceil \cdot 1 + \lceil\frac{7}{6}\rceil \cdot 2 = 9$ | |
4 | $R_3^4 = 3 + \lceil\frac{9}{4}\rceil \cdot 1 + \lceil\frac{9}{6}\rceil \cdot 2 = 10$ | |
5 | $R_3^5 = 3 + \lceil\frac{10}{4}\rceil \cdot 1 + \lceil\frac{10}{6}\rceil \cdot 2 = 10$ [stop] | yes [ 10 < 12 ] | |
Since the response times of all tasks meet their deadlines under RM scheduling, the task set is schedulable.
The response time analysis algorithm:
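As a minimal sketch (our own types; tasks are assumed to be sorted by RM priority, highest first), the iterative analysis can be coded as:

```c
#include <math.h>

typedef struct { double c, T, D; } task_t; // wcet, period, deadline

// Returns 1 if every task's response time R_i <= D_i, else 0.
int rta_schedulable(const task_t *tasks, int n) {
    for (int i = 0; i < n; i++) {
        double r = tasks[i].c, prev = 0.0;
        while (r != prev) {             // iterate to a fixed point
            prev = r;
            r = tasks[i].c;
            for (int j = 0; j < i; j++) // interference from higher-priority tasks
                r += ceil(prev / tasks[j].T) * tasks[j].c;
            if (r > tasks[i].D)
                return 0;               // deadline miss -> unschedulable
        }
    }
    return 1;
}
```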
There exist many other static assignment algorithms, for instance the Deadline-Monotonic scheduler, where the priorities assigned to processes are inversely proportional to the lengths of their deadlines.
Resources:
The problem with static priority assignments is that they typically cannot fully utilize the processor. Wait, but we said earlier that RM is optimal? Well, among static priority-assignment algorithms RM is optimal but, as we saw from the analyses and the graphs, the utilization bounds for task sets taper off at U = 0.69. While we can create task sets with higher utilization (as the response-time analysis shows us), it takes a lot of manual effort to create such task sets. This means that, in the common case, we are wasting nearly 31% utilization!
A dynamic assignment of priorities can help mitigate these problems and ensure that we make better use of the system resources. Many dynamic scheduling algorithms have been proposed, e.g., FIFO and Least Slack Time among others.
In this section we will focus on the Earliest-Deadline First (EDF) priority-assignment mechanism which, actually, underlies one of the mainline schedulers in Linux, named SCHED_DEADLINE.
In a nutshell, as the name implies, EDF assigns priority based on the absolute deadline of the job. From Wikipedia:

> Whenever a scheduling event occurs (task finishes, new task released, etc.) the queue will be searched for the process closest to its deadline. This process is the next to be scheduled for execution.
While the exact priority of a job depends on its deadline, the deadline itself is calculated statically (at release, as the release time plus the relative deadline). Hence, even though jobs of a task may have different priorities, each job has only one priority level. As jobs complete, the priorities of the remaining (or new, incoming) jobs change.
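A minimal sketch of the resulting dispatch decision (the types are illustrative): among all ready jobs, pick the one with the earliest absolute deadline.

```c
typedef struct {
    unsigned int abs_deadline; // release time + relative deadline
    int ready;                 // is this job ready to run?
} job_t;

// Returns the index of the job EDF would dispatch, or -1 if none is ready.
int edf_pick(const job_t *jobs, int n) {
    int best = -1;
    for (int i = 0; i < n; i++) {
        if (jobs[i].ready &&
            (best < 0 || jobs[i].abs_deadline < jobs[best].abs_deadline))
            best = i;
    }
    return best;
}
```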
EDF Schedulability Test: is the same as the first test we discussed, i.e.,
$$ U = \sum_{i=1}^{n} \frac{c_i}{T_i} \le 1 $$
As long as we’re following the task model described earlier, this is a very simple check to test for the validity of the task set for EDF.
Let’s revisit our example from RM that was not schedulable, viz.,
task | c | T |
---|---|---|
τ1 | 2 | 4 |
τ2 | 2 | 5 |
τ3 | 1 | 10 |
Before we proceed, we need to decide on some policies:
These are some simple rules that help in simplifying the decision-making process. We can change them, as long as it helps us make forward progress and we remain consistent.
++ in this scenario, ideally we should redesign the system to combine these two tasks – if they always run together, then why pay the extra costs of preemption/context-switch, etc.?
For the above task set, we see that,
$$U = \frac{2}{4} + \frac{2}{5} + \frac{1}{10} = 0.5 + 0.4 + 0.1 = 1 \le 1$$
Hence, this task set is schedulable
using EDF (just barely though since its utilization is
1
!). At least theoretically! Remember that this
task set failed for RM.
The schedule diagram for this task set looks like,
So far, it looks schedulable. It is left as an exercise to the reader to check until the hyperperiod (20).
Optimality: EDF is an optimal scheduling algorithm → if the task set is schedulable by some algorithm, it is also schedulable under EDF.
EDF can definitely squeeze out the maximum from a system – as we just saw, it can even schedule task sets at the theoretical utilization bound (1)!
So, let’s compare the two superstars of the real-time scheduling world, RM and EDF. The ✓ symbol indicates which one is better.
parameter | RM | EDF |
---|---|---|
optimality | ✓ (static) | ✓ |
context switch overheads | ✓ | ✗ |
preemptions | ✗ | ✓ |
analysis overheads | ✗ | ✓ |
utilization | ✗ | ✓ |
implementation ease | ✓ | ✗ |
predictability | ✓ | ✗ |
Other Issues: note that we have mostly considered a simple task model. But these may vary in real world systems, e.g.,
Consider a simple problem → how do you balance a ball?
I guess that’s more complicated than what we wanted! So, let’s make it really simple and try in a one dimensional plane, as follows:
We want to balance the ball in the middle of the table. And the ball moves either left or right, based on how we tilt the table.
As we see from this picture, a naive attempt at balancing the ball can quickly make it “unstable”. But our objective is to make sure that,
The options that are available to us are:
We also have the ability to control the speed at which the table tilts to either side. We can actually combine these, as we shall see.
Hence, the parameters for the problem are:
type | options |
---|---|
inputs | speed (clockwise, anticlockwise) |
output | ball velocity, acceleration |
Somehow, we need to control the outputs by modifying the inputs to the system. Enter control theory.
Control theory is a multidisciplinary field at the intersection of applied mathematics and engineering. Engineering fields that heavily utilize control theory include mechanical, aerospace, electrical and chemical engineering. Control theory has been applied to the biological sciences, finance, you name it.
Anything that you,
you can develop a “controller” for managing it, using the tools of control theory.
In our everyday life, we interact with technologies that utilize control theory. They appear in applications like adaptive cruise control, thermostats, ovens and even lawn sprinkler systems. The field of control theory is huge and there’s a wide range of subdisciplines.
The basic idea behind control theory is to understand a process or a system by developing a model for it that represents,
the relationships between inputs and outputs for the system
We then use this model to adjust the inputs → to get desired outputs. The relationship between the inputs and outputs is usually obtained through empirical analysis, viz.,
Even if the model is based on an equation from physics, the parameters within the model are still identified experimentally or through computer simulations.
We repeat the experiments/simulations as needed to “understand” the system as well as we can, in order to develop the model. Once the model has been developed, we develop a control model that can be used to tune the input → output relationship.
In effect, we are inverting the original model (input → output) to develop,
control model: input ← output
To better understand this, consider the example of a light bulb and switch:
Even if we didn’t know the relationship between the switch and bulb, we can conduct a few experiments to figure out the following:
switch state (input) |
bulb state (output) |
---|---|
off | off |
on | on |
Now we have our “model” of input (switch state) → output (lightbulb state). This model works as long as no external disturbances occur (e.g., a power failure or bulb burnout).
But, this is not our control model. For that, we need to invert the model we’ve built so far.
So, we start with the desired output state, e.g., “the lightbulb must be on”. Then, we reason backwards to: “what should the input be to achieve this desired state?” Should the switch be on or off?
From our original model (and experiments), we have created the I/O relationship table above. Hence, it stands to reason that we can “invert” it as:
desired output lightbulb state |
corresponding input switch state |
---|---|
on | on |
off | off |
Now, let’s formalize things a little.
Consider the following mathematical model that describes the behavior of a system:

$$y = 2u$$

The model says that if we change the input u, the output y will change to be twice the value of the input u.
Now, in control theory, we are concerned about how to get to a specific output. Hence, if we want to reach a specific value of y, say → y*, we need to manipulate the model to create a “control model”, i.e.,
$$u = \frac{y^*}{2}$$
This model says that for any value of the output, y*, that we want, we can identify the input u → by essentially dividing y* by 2. Notice that this equation is now in terms of u → we have our control law! Restating the obvious, this is an “inverse” of the original model of the system.
We have just developed our first controller! The desired value, y*, is referred to as the setpoint. We get to the setpoint by picking the right value for u.
Developing a control law, in a sense, is inverting your model to rewrite the output in terms of the input so that you can get the output you want. More complex systems lead to more complicated control laws. Practitioners have developed methods of developing control laws for systems whose models cannot be cleanly inverted, e.g., such as nonlinear systems or systems with dead times.
For context, this is where we are in this course map:
For the control law we just developed, if our model is accurate and there are no disturbances then,
y = y*
However, note that there is nothing ensuring that the value of y = y*. We just assume (rather, expect) that it would be the case. This is known as an open loop controller. You desire a certain output and hope that the controller actually gets there.
So, the open loop controller is depicted as:
What we really want is to somehow ensure that the controller gets to its setpoint. How do we do that?
The problem is that while the input drives the output, there is no way to guarantee that the controller will get to the set point.
What we really need is a closed-loop controller → one that uses feedback to adjust the input, u. The feedback typically comes from the output of the controller model that we created. So,

Note that the feedback can be positive or negative.
[The above description is distillation of the excellent description found here.]
[The following material is thanks to Prof. Sathish Gopalakrishan, Univ. of British Columbia].
Consider the following problem (that we increasingly encounter in the real world):
how do you ensure that a car remains in the center of its lane?
So, we have a car moving on the road, thus:
the blue arrow shows the direction of motion of the car. Hence, for the car to remain in the center of the lane, we need to apply a correction to its direction of motion,
There are some questions that come up:
Enter feedback control:
- compare system state to the desired state
- apply a change to the system inputs → counteract the deviations
- repeat until desired outcome → setpoint
Example: let’s see how feedback control can be applied to a temperature control of a room.
Given a “desired” room temperature (as input to a thermostat), what do we need to consider while attempting to achieve this temperature?
The thermostat needs to control/provide inputs to a furnace/AC,
which then affects the temperature in the room:
Easy! Done…right?
Except, the real world is far from ideal. We have to deal with disturbances…
Well not that kind of disturbance, but pesky issues like heat loss, bad insulation and physical problems in general:
As we see from the picture, we may not get to the expected behavior due to external factors. So, as before, just the input will not suffice to reach the setpoint.
So, we provide “feedback” to the controller:
Essentially the temperature reading of the room, after the thermostat and furnace/AC have completed their operations based on the original inputs (desired temperature).
Let’s introduce some generic technical terms for each of these components:
The “controller” is based on the “control model” that we developed earlier. It sends commands (“actuation signals”) to an actuator and then affects the process under control. Finally, the process variable (the “output” from the earlier discussions) is what we want to drive towards the set point.
Note: in the case that feedback is not possible, there is work on → open-loop control and feedforward control.
Another example → cruise control.
Note how the feedback reaches the controller in this case.
So, at a high-level, a closed-loop feedback control system looks like,
Some of these inputs/edges have specific names:
Note: the main goal → the error is minimized (ideally 0).
A more formal definition of the same quantities,
quantity | definition |
---|---|
r(t) | reference/set point |
e(t) | error |
u(t) | control signal/“input” |
y(t) | (expected/final) output |
$\overline{y(t)}$ | “feedback”/estimate |
Now let’s apply this feedback control model to the earlier problem of lane following.
Recall that we want to keep the car in the center of its lane:
But here’s a question → how do you find the center of the lane?
Consider a road with lane markings on either side,
Now, let’s assume that some system (say, a camera), can track the yellow lines to an extent. We need to find the center of the lane, as marked in the figure:
Based on the two points marked up ahead (that we can detect to be on the same plane), we can calculate,
$$x_{center} = \frac{x_{left\_end}+x_{right\_end}}{2}$$
Now, a car need not be in the actual center of the lane,
Now, assuming that the camera is mounted at the center of the car,
The car’s position can be calculated as:
$$x_{car} = \frac{width}{2}$$
From this we can calculate the cross-track error (CTE),
CTE = xcar − xcenter
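Putting the three formulas together (a sketch; assumes, as above, that the camera is mounted at the center of the car):

```c
// Cross-track error from the detected lane-boundary x-coordinates
// and the image width.
double compute_cte(double x_left_end, double x_right_end, double width) {
    double x_center = (x_left_end + x_right_end) / 2.0; // lane center
    double x_car    = width / 2.0;                      // car position
    return x_car - x_center;                            // CTE
}
```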
What happens when,
Now, back to our original problem → keeping the car in the center of the lane. We do this by → keep CTE as small as possible and applying corrections,
The $64,000 question is: how?
Answer: feedback control!
Recall the various components of the feedback control:
Now, let’s map the lane following components on to this:
As we see from the figure, the lane following controller sends the control/actuation signal to the steering unit. Sensors (perhaps a camera equipped with vision algorithms in this case) provide feedback to the controller ($\overline y_t$). Mapping this back to the lane variables and CTE,
This figure shows the important part → the CTE is the feedback for the lane following controller! The input then is the negative error, i.e., the goal is to reduce the CTE. Also note that the output of the controller is the steering input.
So, let’s focus in on how the controller operates, i.e., this part:
Problem statement:
given the CTE, how do we compute the control signal so that the car stays in the middle of the lane?
The final “corrections”, when applied, may look something like this:
Let’s start with one goal → lateral position control:
quantity | symbol | description |
---|---|---|
process variable | y(t) | y position at time, t |
goal | y = 0 | keep the car at position 0 |
control signal | u(t) | steering |
Let’s say we have the car’s start and end position,
And we know the relationship between u(t) and y(t):
Basically we want u(t) to be negative → so that y tends towards its eventual goal, y = 0.
So, what should our control input, e(t), be?
As we see below, we want the input to be a decreasing value of the feedback,
e(t) = −y(t)
This is called proportional (P) control.
The correction is proportional to the size of the error, i.e., the control signal is u(t) = Kpe(t).
So, going back to our example of lateral control, let us try to apply the proportional control methodology to it:
We have the following choices:
We pick Kp > 0 since we want the following relationship to hold (following from e(t) = −y(t)):
Kpe(t) = −Kpy(t)
We see that this proportional controller helps us move the car towards our goal, y = 0.
Now, let’s consider a few situations:
The response is too slow/gradual. We may never get to the goal in this case.
The response is too sudden. The system may overshoot the goal!
So, the next question that comes up → can the car be stabilized at y = 0? This is unlikely using just the proportional control method since,
gain | effect |
---|---|
small | steady-state error |
large | oscillations |
The question then becomes → how can we reduce oscillation, i.e., can we get to the following situation (smoother, actual approach to the goal)?
Derivative control improves the dynamic response of the system by,
So, what does this mean, in practice? What we’re measuring and trying to control, is the rate of change, i.e., the change from,
y(t − 1) → y(t)
Typically, proportional and derivative control are often used together, as a way to counteract each others’ influences. So, we get:
As with the proportional controller, we have two options for the derivative as well. Should,
Note that the derivative controller’s job is to steer away from the reference line, to act as a counter to the proportional controller pushing in the opposite direction. Hence, we pick $\frac{dy(t)}{dt} < 0$.
The derivative controller acts like a brake and counteracts the correctional force. It reduces overshoot by slowing the correctional factor → as the reference goal approaches.
We see numerous uses of this combination, known as the P-D controller, in everyday life, from the smallest systems to the largest, even rocket ships!
Consider the following values for the two coefficients, Kp and Kd:
As we see, tuning Kd has a significant impact! The oscillations pretty much go away and we quickly get to the reference line with very little oscillation. Of course, the car overshoots a little but the combination of P-D brings it back soon enough.
An interesting question arises → what if we increase the value of Kd – eventually making it very large?
While we see from the first graph (Kp = 0.2, Kd = 4.0) that the oscillations have gone away, increasing Kd further makes the situation worse – it drives the car away from the reference goal!
How do we deal with this?
Well, by tuning the parameters, of course! For instance, in the last case, if we make a change, Kp = 3.0 →
We see a quick, “smooth” path to the reference!
In fact, a lot of the design of control systems involves the tuning of such parameters, depending on the system, to get things “just right”.
Are we done? Not quite. Let’s take a closer look at the results:
As we see from this image, even though we reached the reference, the behavior is not smooth! There could be many reasons for this, such as steering drift, caused by the mechanical design of the system:
There are a variety of unmodeled disturbances and systemic errors (“bias”):
Hence, the signal may never reach the set points! It will end up “settling” near the reference, which is not always ideal.
To deal with this, we need integral (I) control.
First, let’s define steady state error (SSE):
difference between the reference and the steady-state process variable
Hence, as time goes to infinity,

$$SSE = \lim_{t \to \infty}\ [r(t) - y(t)]$$
The correction must be proportional to both → error and duration of the error. Essentially it sums the error over time.
Note: unless the error is zero, this term will grow! In fact, the error keeps adding up, so the correction must also increase → this drives the steady state error to zero.
In many instances, integral control is used in conjunction with the P and D controllers,
This is known as PID control:
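Putting the three terms together, the control signal takes the standard form $u(t) = K_p e(t) + K_i \int_0^t e(\tau)\,d\tau + K_d \frac{de(t)}{dt}$. A minimal discrete-time sketch (the gains and the timestep dt are assumptions, to be tuned per system):

```c
typedef struct {
    double kp, ki, kd;
    double integral;   // running sum of the error (the I term's state)
    double prev_error; // last error (for the D term)
} pid_state_t;

// One PID update: returns the control signal u, given the setpoint r
// and the measured process variable y, sampled every dt seconds.
double pid_update(pid_state_t *pid, double r, double y, double dt) {
    double error = r - y;                               // e(t)
    pid->integral += error * dt;                        // accumulate the I term
    double derivative = (error - pid->prev_error) / dt; // approximate de/dt
    pid->prev_error = error;
    return pid->kp * error + pid->ki * pid->integral + pid->kd * derivative;
}
```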
Let’s look at some examples of tuning the various parameters of a PID controller, as applied to our problem:
Let’s increase Kp now and see the effect:
We see that the system has stabilized well around the reference point and, if we zoom in, we will see fewer disturbances.
Now, let’s keep increasing Kp,
Wait, the signal oscillates? The main reason is that the I term is not zero when crossing the reference, y(t) = 0 and it takes a little while to wash out the cumulative error.
In summary:
Tuning the P, I and D gains: there is no “optimal” way to tune the PID gains.
References:
A controller will typically generate a control signal which, in many physical systems, is used to “actuate” a physical component – i.e., make it move.
An actuator, then, is a part of a device or machine that helps it to achieve physical movements by converting energy, such as electrical, air or hydraulic, into mechanical force. Simply put, it is the component in any machine that enables movement. They’re like muscles on a human body – converting energy to physical action. Actuators are present in almost every machine around us, from simple electronic access control systems, the vibrator on your mobile phone and household appliances to vehicles, industrial devices, and robots. Common examples of actuators include electric motors, stepper motors, jackscrews, electric muscular stimulators in robots, etc.
Defined simply, an actuator is a device that converts energy, which may be electric, hydraulic, pneumatic, etc., into mechanical energy in such a way that it can be controlled. The quantity and the nature of the input depend on the type of actuator:
Electric and piezoelectric actuators, for instance, work on an input of electric current or voltage; for hydraulic actuators, the input is an incompressible liquid; and for pneumatic actuators, the input is air. The output is always mechanical energy.
Actuators are responsible for ensuring that a device such as a robotic arm can move when electric input is provided. A car uses actuators in the engine control system to regulate air flaps (for torque and power optimization), idle speed, and fuel management for ideal combustion.
An actuator requires a source of energy and a control signal.
The displacement achieved is commonly linear or rotational, as exemplified by linear actuators and rotary motors such as the stepper motors discussed below.
Another broad classification of actuators separates them into two types: continuous-drive actuators (e.g., brushed DC motors) and incremental-drive actuators (e.g., stepper motors).
Brushed DC motors move continuously when DC voltage is applied to their terminals.
Stepper motors are a type of brushless motor that rotates in a series of small, discrete angular steps. Stepper motors can be set to any given step position without needing a position sensor for feedback. The step position can be rapidly increased or decreased to create continuous rotation, or the motor can be ordered to actively hold its position at one given step. Motors vary in size, speed, step resolution and torque.
The stepper motor is known for its property of converting a train of input pulses (typically square waves) into a precisely defined increment in the shaft’s rotational position. Each pulse rotates the shaft through a fixed angle.
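For example, a stepper motor with a step angle of 1.8° (a common value) needs 360°/1.8° = 200 pulses for one full revolution of the shaft.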
[From Wikipedia: Animation of a simplified stepper motor.

- Frame 1: The top electromagnet (1) is turned on, attracting the nearest teeth of the gear-shaped iron rotor. With the teeth aligned to electromagnet 1, they are slightly offset from the right electromagnet (2).
- Frame 2: The top electromagnet (1) is turned off, and the right electromagnet (2) is energized, pulling the teeth into alignment with it. This results in a rotation of 3.6° in this example.
- Frame 3: The bottom electromagnet (3) is energized; another 3.6° rotation occurs.
- Frame 4: The left electromagnet (4) is energized, rotating again by 3.6°.

When the top electromagnet (1) is again enabled, the rotor will have rotated by one tooth position; since there are 25 teeth, it will take 100 steps to make a full rotation in this example.]
Motor control: motor speed and direction are dictated by the applied voltage: lower the voltage and the motor slows; reverse the polarity and the motor reverses direction. Voltage can be changed by raising the series resistance within the electrical circuit, which in turn lowers the current through the motor. This can be accomplished with series resistors, potentiometers or rheostats. While these devices may be effective for small changes in voltage, the power and torque of the motor decrease as the current drops. In addition, significant resistance from these devices can produce a lot of heat, which could damage other components in the electrical system.
A more efficient way to vary voltage is to use a PWM controller.
Digital signals are represented as 0 and 1. Analog signals, on the other hand, have a greater range of values, often continuous in nature (as we saw in the bit about ADCs). To control a physical/“analog” device using a microcontroller, we need to do the opposite → convert from a digital to an analog signal.
Some microcontrollers have an onboard digital-to-analog converter (DAC) to output a true analog signal in order to control analog devices, and we can even use an external DAC. But a DAC is relatively expensive to produce and takes up a lot of silicon area. To overcome these issues and achieve the functionality of a DAC in a much more cost-efficient way, we use pulse-width modulation (PWM).
PWM is a method to control analog devices using digital signals. We output an “analog-like signal” from the microcontroller that can then control motors, lights and various actuators.
Note: the PWM is not a true analog signal, just a digital one modified to behave like one. It is essentially a rectangular wave with a varying “duty cycle” and period.
In the following example, an idealized inductor is driven by a set of voltage pulses (in blue) that result in a sine-wave-like current (in red).
PWM is useful for controlling the average power or amplitude delivered by an electrical signal. The average value of voltage (and current) fed to the load is controlled by switching the supply between 0 and 100% at a rate faster than it takes the load to change significantly. The longer the switch is on, the higher the total power supplied to the load.
Example: Consider the following analogy. Imagine you have a ceiling fan that has just an off-on switch, i.e., it is either stationary or goes to 100%.
What if I say: I want the fan to operate at 50%? The only “control” you have is the on-off switch. Can you do it?
Solution: flip the switch on, then off, then on again, and keep repeating, spending half the time in each state. If the switching is fast enough, the fan averages out to roughly 50% speed. (Ideally I don’t recommend doing this…)
So a PWM “wave” looks like:
on-off switching | pulse |
duration for which the pulse is held at a high state | pulse width |
T | period |
A PWM wave has two important properties that need to be tuned: the duty cycle and the period.
Recall that logic high → on (or off, depending on the system, but pick one for consistency). To represent the on time, we use the concept of the duty cycle, defined as:
The duty cycle describes the proportion of ‘on’ time to the regular interval or ‘period’ of the signal. Duty cycles are represented as percentages of the time that the signal is on, relative to its period.
The duty cycle can be calculated as:
$$D = \frac{T_{on}}{T} \times 100$$
where,
D | duty cycle (percentage) |
Ton | duration when signal is on |
T | total period |
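For example, a signal that is high for Ton = 5 ms in every T = 20 ms period has a duty cycle of D = (5/20) × 100 = 25%.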
Consider the periodic pulse wave, f(t), with a low value, ymin, high value, ymax, and constant duty cycle, D, as shown below:
The average value of a wave is,
$$\bar{y} = \frac{1}{T}\int^T_0f(t)\,dt$$
Since f(t) is a pulse wave, its value is,
$$f(t) = \begin{cases} y_\text{max}, & 0 < t < D \cdot T\\ y_\text{min}, & D \cdot T < t < T \end{cases}$$
Now, we can expand the above expression as,
$$ \begin{align*} \bar{y} &= \frac{1}{T} \left(\int_0^{DT} y_\text{max}\,dt + \int_{DT}^T y_\text{min}\,dt\right)\\ &= \frac{1}{T} \left(D \cdot T \cdot y_\text{max} + T\left(1 - D\right) y_\text{min}\right)\\ &= D\cdot y_\text{max} + \left(1 - D\right) y_\text{min} \end{align*} $$
So we can now compute how long the signal should be at ymax and how much at ymin.
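As a sanity check, the following snippet (a hypothetical example; the 0–5 V swing, 25% duty cycle and sample count are assumptions) averages a synthetic pulse wave numerically and compares it against the closed-form result derived above:

```python
import numpy as np

D, y_max, y_min, T = 0.25, 5.0, 0.0, 1.0       # assumed: 25% duty cycle, 0-5 V swing, 1 s period

t = np.linspace(0.0, T, 10_000, endpoint=False)
f = np.where(t < D * T, y_max, y_min)          # pulse wave: high for the first D*T seconds

print(f.mean())                                # numerical average: 1.25
print(D * y_max + (1 - D) * y_min)             # closed form:       1.25
```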
The period (or frequency, recall that $f=\frac{1}{T}$) is another important parameter that defines a PWM signal. It essentially determines the number of times a signal repeats per second. The choice of T depends heavily on the application. For instance, when controlling an LED,
the frequency of a PWM signal should be sufficiently high if we wish to see a proper dimming effect while controlling LEDs. At a duty cycle of 20% and a frequency of 1 Hz, the human eye will notice the LED turning ON and OFF. However, if we increase the frequency to 100 Hz, we see proper, steady dimming of the LED.
The choice of frequency is directly related to the Nyquist–Shannon sampling theorem discussed earlier. A simple summary: the number of pulses in the waveform is equal to the number of Nyquist samples. Recall that the Nyquist rate is twice the highest frequency component of the signal.
Servos (also RC servos) are small, cheap, mass-produced servomotors or other actuators used for radio control and small-scale robotics.
They’re controlled by sending the servo a PWM (pulse-width modulation) signal, a series of repeating pulses of variable width where either the width of the pulse (most common modern hobby servos) or the duty cycle of a pulse train (less common today) determines the position to be achieved by the servo.
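For a typical hobby servo (exact values vary by model), the pulses repeat roughly every 20 ms (50 Hz), and a pulse width of about 1 ms commands one end of the travel, 2 ms the other end, and 1.5 ms the neutral (center) position.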
We can use the built-in PWM components in many microcontrollers or timer ICs. Using Arduino, generating a PWM is as simple as writing out a few lines of code!
analogWrite(pin, value)
Note that not all pins of an Arduino can generate a PWM signal. In the case of Arduino Uno, there are only 6 I/O pins (3,5,6,9,10,11) that support PWM generation and they are marked with a tilde (~) in front of their pin number on the board.
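Here, value is an 8-bit quantity: 0 yields a 0% duty cycle and 255 yields 100%, so a desired duty cycle of D% corresponds to value ≈ 255 × D/100.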
Examples of various duty cycles:
analogWrite(PWM_PIN, 64); // 25% Duty Cycle or 25% of max speed
analogWrite(PWM_PIN, 127); // 50% Duty Cycle or 50% of max speed
analogWrite(PWM_PIN, 191); // 75% Duty Cycle or 75% of max speed
analogWrite(PWM_PIN, 255); // 100% Duty Cycle or full speed
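(Strictly speaking, 64/255 ≈ 25.1% and 191/255 ≈ 74.9%; the comments round to the nearest quarter.)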
The Raspberry Pi also has a variety of GPIO pins that can be used for generating PWM signals:
Consider this code for controlling the brightness of an LED:
```python
import RPi.GPIO as IO        # calling header file which helps us use GPIOs of PI
import time                  # calling time to provide delays in program

IO.setwarnings(False)        # do not show any warnings
IO.setmode(IO.BCM)           # program the GPIO by BCM pin numbers (PIN35 as 'GPIO19')
IO.setup(19, IO.OUT)         # initialize GPIO19 as an output
p = IO.PWM(19, 100)          # GPIO19 as PWM output, with 100Hz frequency
p.start(0)                   # generate PWM signal with 0% duty cycle

while 1:                     # execute loop forever
    for x in range(50):      # x increments from 0 to 49
        p.ChangeDutyCycle(x)          # raise duty cycle to increase LED brightness
        time.sleep(0.1)               # sleep for 100 ms
    for x in range(50):      # x increments from 0 to 49
        p.ChangeDutyCycle(50 - x)     # lower duty cycle to decrease LED brightness
        time.sleep(0.1)               # sleep for 100 ms
```
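Note that p.ChangeDutyCycle() takes a percentage between 0.0 and 100.0, so the loops above sweep the LED from 0% up to 49% duty cycle and back down.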