Smaller Vector Tables
Warning
Migrated from: https://cwiki.apache.org/confluence/display/NUTTX/Smaller+Vector+Tables
One of the largest OS data structures is the vector table,
g_irqvector[]
. This is the table that holds the vector
information when irq_attach()
is called and used to
dispatch interrupts by irq_dispatch()
. Recent changes
have made that table even larger, for 32-bit arm the
size of that table is given by:
nbytes = number_of_interrupts * (2 * sizeof(void *))
We will focus on the STM32 for this discussion to keep things simple. However, this discussion applies to all architectures.
The number of (physical) interrupt vectors supported by
the MCU hardwared given by the definition NR_IRQ
which
is provided in a header file in arch/arm/include/stm32
.
This is, by default, the value of number_of_interrupts
in the above equation.
For a 32-bit ARM like the STM32 with, say, 100 interrupt vectors, this size would be 800 bytes of memory. That is not a lot for high-end MCUs with a lot of RAM memory, but could be a show stopper for MCUs with minimal RAM.
Two approaches for reducing the size of the vector tables
are described below. Both depend on the fact that not all
interrupts are used on a given MCU. Most of the time,
the majority of entries in g_irqvector[]
are zero because
only a small number of interrupts are actually attached
and enabled by the application. If you know that certain
IRQ numbers are not going to be used, then it is possible
to filter those out and reduce the size to the number of
supported interrupts.
For example, if the actual number of interrupts used were 20, the the above requirement would go from 800 bytes to 160 bytes.
Software IRQ Remapping
[On March 3, 2017, support for this “Software IRQ Remapping” as included in the NuttX repository.]
One of the simplest way of reducing the size of
g_irqvector[]
would be to remap the large set of physical
interrupt vectors into a much small set of interrupts that
are actually used. For the sake of discussion, let’s
imagine two new configuration settings:
CONFIG_ARCH_MINIMAL_VECTORTABLE
: Enables IRQ mappingCONFIG_ARCH_NUSER_INTERRUPTS
: The number of IRQs after mapping.
Then it could allocate the interrupt vector table to be
size CONFIG_IRQ_NMAPPED_IRQ
instead of the much bigger
NR_IRQS
:
#ifdef CONFIG_ARCH_MINIMAL_VECTORTABLE
struct irq_info_s g_irqvector[CONFIG_ARCH_NUSER_INTERRUPTS];
#else
struct irq_info_s g_irqvector[NR_IRQS];
#endif
The g_irqvector[]
table is accessed in only three places:
irq_attach()
irq_attach()
receives the physical vector number along
with the information needed later to dispatch interrupts:
int irq_attach(int irq, xcpt_t isr, FAR void *arg);
Logic in irq_attach()
would map the incoming physical
vector number to a table index like:
#ifdef CONFIG_ARCH_MINIMAL_VECTORTABLE
int ndx = g_irqmap[irq];
#else
int ndx = irq;
#endif
where up_mapirq[]
is an array indexed by the physical
interrupt vector number and contains the new, mapped
interrupt vector table index. This array must be
provided by platform-specific code.
irq_attach()
would this use this index to set the g_irqvector[]
.
g_irqvector[ndx].handler = isr;
g_irqvector[ndx].arg = arg;
irq_dispatch()
irq_dispatch()
is called by MCU logic when an interrupt is received:
void irq_dispatch(int irq, FAR void *context);
Where, again irq is the physical interrupt vector number.
irq_dispatch()
would do essentially the same thing as
irq_attach()
. First it would map the irq number to
a table index:
#ifdef CONFIG_ARCH_MINIMAL_VECTORTABLE
int ndx = g_irqmap[irq];
#else
int ndx = irq;
#endif
Then dispatch the interrupt handling to the attached interrupt handler. NOTE that the physical vector number is passed to the handler so it is completely unaware of the underlying shell game:
vector = g_irqvector[ndx].handler;
arg = g_irqvector[ndx].arg;
vector(irq, context, arg);
irq_initialize()
irq_initialize()
: simply set the g_irqvector[]
table
a known state on power-up. It would only have to distinguish
the difference in sizes.
#ifdef CONFIG_ARCH_MINIMAL_VECTORTABLE
# define TAB_SIZE CONFIG_ARCH_NUSER_INTERRUPTS
#else
# define TAB_SIZE NR_IRQS
#endif
for (i = 0; i < TAB_SIZE; i++)
g_mapirq[]
An implementation of up_mapirq()
might be something like:
#include <nuttx/irq.h>
const irq_mapped_t g_irqmap[NR_IRQS] =
{
... IRQ to index mapping values ...
};
g_irqmap[]
is a array of mapped irq table indices. It
contains the mapped index value and is itself indexed
by the physical interrupt vector number. It provides
an irq_mapped_t
value in the range of 0 to
CONFIG_ARCH_NUSER_INTERRUPTS
that is the new, mapped
index into the vector table. Unsupported IRQs would
simply map to an out of range value like IRQMAPPED_MAX
.
So, for example, if g_irqmap[37] == 24
, then the hardware
interrupt vector 37 will be mapped to the interrupt vector
table at index 24. if g_irqmap[42] == IRQMAPPED_MAX
, then
hardware interrupt vector 42 is not used and if it occurs
will result in an unexpected interrupt crash.
Hardware Vector Remapping
[This technical approach is discussed here but is discouraged because of technical “Complications” and “Dubious Performance Improvements” discussed at the end of this section.]
Most ARMv7-M architectures support two mechanism for handling interrupts:
The so-called common vector handler logic enabled with
CONFIG_ARMV7M_CMNVECTOR=y
that can be found inarch/arm/src/armv7-m/
, andMCU-specific interrupt handling logic. For the STM32, this logic can be found at
arch/arm/src/stm32/gnu/stm32_vectors.S
.
The common vector logic is slightly more efficient, the MCU-specific logic is slightly more flexible.
If we don’t use the common vector logic enabled with
CONFIG_ARMV7M_CMNVECTOR=y
, but instead the more
flexible MCU-specific implementation, then we can
also use this to map the large set of hardware
interrupt vector numbers to a smaller set of software
interrupt numbers. This involves minimal changes to
the OS and does not require any magic software lookup
table. But is considerably more complex to implement.
This technical approach requires changes to three files:
A new header file at
arch/arm/include/stm32
, sayxyz_irq.h
for the purposes of this discussion. This new header file is like the other IRQ definition header files in that directory except that it defines only the IRQ number of the interrupts after remapping. So, instead of having the 100 IRQ number definitions of the original IRQ header file based on the physical vector numbers, this header file would defineonly
the small set of 20mapped
IRQ numbers in the range from 0 through 19. It would also setNR_IRQS
to the value 20.A new header file at
arch/arm/src/stm32/hardware
, sayxyz_vector.h
. It would be similar to the other vector definitions files in that directory: It will consist of a sequence of 100VECTOR
andUNUSED
macros. It will defineVECTOR
entries for the 20 valid interrupts and 80UNUSED
entries for the unused interrupt vector numbers. More about this below.Modification of the
stm32_vectors.S
file. These changes are trivial and involve only the conditional inclusion of the new, specialxyz_vectors.h
header file.
REVISIT: This needs to be updated. Neither the xyz_vector.h
files nor the stm32_vectors.S
exist in the current realization.
This has all been replaced with the common vector handling at
arch/arm/src/armv7-m
.
Vector Definitions
In arch/arm/src/stm32/gnu/stm32_vector.S
, notice that the
xyz_vector.h
file will be included twice. Before each
inclusion, the macros VECTOR
and UNUSED
are defined.
The first time that xyz_vector.h
included, it defines the
hardware vector table. The hardware vector table consists
of NR_IRQS
32-bit addresses in an array. This is
accomplished by setting:
#undef VECTOR
#define VECTOR(l,i) .word l
#undef UNUSED
#define UNUSED(i) .word stm32_reserved
Then including xyz_vector.h
. So consider the following
definitions in the original file:
...
VECTOR(stm32_usart1, STM32_IRQ_USART1) /* Vector 16+37: USART1 global interrupt */
VECTOR(stm32_usart2, STM32_IRQ_USART2) /* Vector 16+38: USART2 global interrupt */
VECTOR(stm32_usart3, STM32_IRQ_USART3) /* Vector 16+39: USART3 global interrupt */
...
Suppose that we wanted to support only USART1 and that
we wanted to have the IRQ number for USART1 to be 12.
That would be accomplished in the xyz_vector.h
header
file like this:
...
VECTOR(stm32_usart1, STM32_IRQ_USART1) /* Vector 16+37: USART1 global interrupt */
UNUSED(0) /* Vector 16+38: USART2 global interrupt */
UNUSED(0) /* Vector 16+39: USART3 global interrupt */
...
Where the value of STM32_IRQ_USART1
was defined to
be 12 in the arch/arm/include/stm32/xyz_irq.h
header
file. When xyz_vector.h
is included by stm32_vectors.S
with the above definitions for VECTOR
and UNUSED
, the
following would result:
...
.word stm32_usart1
.word stm32_reserved
.word stm32_reserved
...
These are the settings for vector 53, 54, and 55,
respectively. The entire vector table would be populated
in this way. stm32_reserved
, if called would result in
an “unexpected ISR” crash. stm32_usart1
, if called will
process the USART1 interrupt normally as we will see below.
Interrupt Handler Definitions
in the vector table, all of the valid vectors are set to
the address of a handler function. All unused vectors
are force to vector to stm32_reserved
. Currently, only
vectors that are not supported by the hardware are
marked UNUSED
, but you can mark any vector UNUSED
in
order to eliminate it.
The second time that xyz_vector.h
is included by
stm32_vector.S
, the handler functions are generated.
Each of the valid vectors point to the matching handler
function. In this case, you do NOT have to provide
handlers for the UNUSED
vectors, only for the used
VECTOR
vectors. All of the unused vectors will go
to the common stm32_reserved
handler. The remaining
set of handlers is very sparse.
These are the values of UNUSED
and VECTOR
macros on the
second time the xzy_vector.h
is included by stm32_vectors.S
:
.macro HANDLER, label, irqno
.thumb_func
label:
mov r0, #\irqno
b exception_common
.endm
#undef VECTOR
#define VECTOR(l,i) HANDLER l, i
#undef UNUSED
#define UNUSED(i)
In the above USART1 example, a single handler would be
generated that will provide the IRQ number 12. Remember
that 12 is the expansion of the macro STM32_IRQ_USART1
that is provided in the arch/arm/include/stm32/xyz_irq.h
header file:
.thumb_func
stm32_usart1:
mov r0, #12
b exception_common
Now, when vector 16+37 occurs it is mapped to IRQ 12 with no significant software overhead.
A Complication
A complication in the above logic has been noted by David Sidrane:
When we access the NVIC in stm32_irq.c
in order to enable
and disable interrupts, the logic requires the physical
vector number in order to select the NVIC register and
the bit(s) the modify in the NVIC register.
This could be handled with another small IRQ lookup table
(20 uint8_t
entries in our example situation above). But
then this approach is not so much better than the Software
Vector Mapping described about which does not suffer from
this problem. Certainly enabling/disabling interrupts in a
much lower rate operation and at least does not put the
lookup in the critical interrupt path.
Another option suggested by David Sidrane is equally ugly:
Don’t change the
arch/arm/include/stm32
IRQ definition file.Instead, encode the IRQ number so that it has both the index and physical vector number:
...
VECTOR(stm32_usart1, STM32_IRQ_USART1 << 8 | STM32_INDEX_USART1)
UNUSED(0)
UNUSED(0)
...
The STM32_INDEX_USART1 would have the value 12 and
STM32_IRQ_USART1 would be as before (53). This encoded
value would be received by irq_dispatch()
and it would
decode both the index and the physical vector number.
It would use the index to look up in the g_irqvector[]
table but would pass the physical vector number to the
interrupt handler as the IRQ number.
A lookup would still be required in irq_attach()
in
order to convert the physical vector number back to
an index (100 uint8_t
entries in our example). So
some lookup is unavoidable.
Based upon these analysis, my recommendation is that we do not consider the second option any further. The first option is cleaner, more portable, and generally preferable.is well worth that.
Dubious Performance Improvements
The intent of this second option was to provide a higher performance mapping of physical interrupt vectors to IRQ numbers compared to the pure software mapping of option 1. However, in order to implement this approach, we had to use the less efficient, non-common vector handling logic. That logic is not terribly less efficient, the cost is probably only a 16 bit load immediate instruction and branch to another location in FLASH (which will cause the CPU pipeline to be flushed).
The variant of option 2 where both the physical vector number
and vector table index are encoded would require even more
processing in irq_dispatch()
in order to decode the
physical vector number and vector table index.
Possible just AND and SHIFT instructions.
However, the minimal cost of the first pure software
mapping approach was possibly as small as a single
indexed byte fetch from FLASH in irq_attach()
.
Indexing is, of course, essentially free in the ARM
ISA, the primary cost would be the FLASH memory access.
So my first assessment is that the performance of both
approaches is the essentially the same. If anything, the
first approach is possibly the more performant if
implemented efficiently.
Both options would require some minor range checking in
irq_attach()
as well.
Because of this and because of the simplicity of the first option, I see no reason to support or consider this second option any further.
Complexity and Generalizability
Option 2 is overly complex; it depends on a deep understanding on how the MCU interrupt logic works and on a high level of Thumb assembly language skills.
Another problem with option 2 is that really only applies to the Cortex-M family of processors and perhaps others that support interrupt vectored interrupts in a similar fashion. It is not a general solution that can be used with any CPU architectures.
And even worse, the MCU-specific interrupt handling logic that this support depends upon is is very limited. As soon as the common interrupt handler logic was added, I stopped implementing the MCU specific logic in all newer ARMv7-M ports. So that MCU specific interrupt handler logic is only present for EFM32, Kinetis, LPC17, SAM3/4, STM32, Tiva, and nothing else. Very limited!
These are further reasons why option 2 is no recommended and will not be supported explicitly.