=====================
Smaller Vector Tables
=====================

.. warning:: 
    Migrated from: 
    https://cwiki.apache.org/confluence/display/NUTTX/Smaller+Vector+Tables 


One of the largest OS data structures is the vector table, 
``g_irqvector[]``. This table holds the vector information 
that is saved when ``irq_attach()`` is called and that 
``irq_dispatch()`` later uses to dispatch interrupts. Recent 
changes have made that table even larger. For 32-bit ARM, the 
size of that table is given by:

.. code-block:: c

    nbytes = number_of_interrupts * (2 * sizeof(void *))

We will focus on the STM32 for this discussion to keep 
things simple. However, this discussion applies to all 
architectures.

The number of (physical) interrupt vectors supported by 
the MCU hardware is given by the definition ``NR_IRQS``, which 
is provided in a header file in ``arch/arm/include/stm32``. 
This is, by default, the value of ``number_of_interrupts`` 
in the above equation.

For a 32-bit ARM like the STM32 with, say, 100 interrupt 
vectors, this size would be 800 bytes of memory. That is 
not a lot for high-end MCUs with plenty of RAM, 
but it could be a show stopper for MCUs with minimal RAM.

Two approaches for reducing the size of the vector tables 
are described below. Both depend on the fact that not all 
interrupts are used on a given MCU. Most of the time, 
the majority of entries in ``g_irqvector[]`` are zero because 
only a small number of interrupts are actually attached 
and enabled by the application. If you know that certain 
IRQ numbers are not going to be used, then it is possible 
to filter those out and reduce the size to the number of 
supported interrupts.

For example, if the actual number of interrupts used were 
20, then the above requirement would go from 800 bytes to 
160 bytes.
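The arithmetic above can be checked with a small host-side sketch. 
The structure below is a hypothetical stand-in for ``struct irq_info_s`` 
that uses fixed 32-bit fields, so the result matches a 32-bit ARM 
even when compiled on a 64-bit host. ``irq_table_bytes()`` is an 
illustrative helper and is not part of NuttX.

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical model of one vector table entry on a 32-bit target:
     * a handler address plus an argument pointer, each 4 bytes wide.
     */

    struct irq_info_model_s
    {
      uint32_t handler;  /* Stands in for the handler address */
      uint32_t arg;      /* Stands in for the handler argument pointer */
    };

    static_assert(sizeof(struct irq_info_model_s) == 8,
                  "one entry should occupy 2 * 4 bytes");

    /* Return the RAM needed for a table with the given number of entries */

    size_t irq_table_bytes(size_t nentries)
    {
      return nentries * sizeof(struct irq_info_model_s);
    }

With this model, 100 entries need 800 bytes and 20 entries need 160 bytes.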

Software IRQ Remapping
======================

`[On March 3, 2017, support for this "Software IRQ Remapping" 
was included in the NuttX repository.]`

One of the simplest ways of reducing the size of 
``g_irqvector[]`` would be to remap the large set of physical 
interrupt vectors into a much smaller set of interrupts that 
are actually used. For the sake of discussion, let's 
imagine two new configuration settings:

* ``CONFIG_ARCH_MINIMAL_VECTORTABLE``: Enables IRQ mapping
* ``CONFIG_ARCH_NUSER_INTERRUPTS``: The number of IRQs after mapping.

The OS could then allocate the interrupt vector table with 
``CONFIG_ARCH_NUSER_INTERRUPTS`` entries instead of the much 
larger ``NR_IRQS``:

.. code-block:: c 

    #ifdef CONFIG_ARCH_MINIMAL_VECTORTABLE
    struct irq_info_s g_irqvector[CONFIG_ARCH_NUSER_INTERRUPTS];
    #else
    struct irq_info_s g_irqvector[NR_IRQS];
    #endif

The ``g_irqvector[]`` table is accessed in only three places:

``irq_attach()``
----------------

``irq_attach()`` receives the physical vector number along 
with the information needed later to dispatch interrupts:

.. code-block:: c

    int irq_attach(int irq, xcpt_t isr, FAR void *arg);

Logic in ``irq_attach()`` would map the incoming physical 
vector number to a table index like:

.. code-block:: c 

    #ifdef CONFIG_ARCH_MINIMAL_VECTORTABLE
    int ndx = g_irqmap[irq];
    #else
    int ndx = irq;
    #endif

where ``g_irqmap[]`` is an array indexed by the physical 
interrupt vector number and contains the new, mapped 
interrupt vector table index. This array must be 
provided by platform-specific code.

``irq_attach()`` would then use this index to set the ``g_irqvector[]`` entry:

.. code-block:: c 

    g_irqvector[ndx].handler = isr;
    g_irqvector[ndx].arg     = arg;

``irq_dispatch()``
------------------

``irq_dispatch()`` is called by MCU logic when an interrupt is received:

.. code-block:: c 

    void irq_dispatch(int irq, FAR void *context);

where, again, ``irq`` is the physical interrupt vector number.

``irq_dispatch()`` would do essentially the same thing as 
``irq_attach()``. First it would map the irq number to 
a table index:

.. code-block:: c 

    #ifdef CONFIG_ARCH_MINIMAL_VECTORTABLE
    int ndx = g_irqmap[irq];
    #else
    int ndx = irq;
    #endif

Then dispatch the interrupt handling to the attached 
interrupt handler. NOTE that the physical vector 
number is passed to the handler so it is completely 
unaware of the underlying `shell` game:

.. code-block:: c 

    vector = g_irqvector[ndx].handler;
    arg    = g_irqvector[ndx].arg;
    
    vector(irq, context, arg);
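The attach and dispatch paths described above can be combined into a 
minimal host-side sketch. It assumes that the mapping is always enabled, 
uses tiny illustrative table sizes, and adds the range checks that a 
real implementation would need. The ``uint8_t`` width of ``irq_mapped_t``, 
the value of ``IRQMAPPED_MAX``, the helper ``irq_to_ndx()``, the counters, 
and ``demo_handler()`` are all assumptions made for this example. They 
are not taken from the NuttX sources.

.. code-block:: c

    #include <assert.h>
    #include <errno.h>
    #include <stddef.h>
    #include <stdint.h>

    #define NR_IRQS                       8          /* Physical vectors */
    #define CONFIG_ARCH_NUSER_INTERRUPTS  2          /* Mapped vectors */
    #define IRQMAPPED_MAX                 UINT8_MAX  /* Assumed "unmapped" marker */

    typedef uint8_t irq_mapped_t;   /* Assumed width of a mapped index */
    typedef int (*xcpt_t)(int irq, void *context, void *arg);

    struct irq_info_s
    {
      xcpt_t handler;
      void  *arg;
    };

    static_assert(CONFIG_ARCH_NUSER_INTERRUPTS <= IRQMAPPED_MAX,
                  "every valid index must differ from the marker");

    /* Only physical vectors 3 and 5 are used in this example */

    static const irq_mapped_t g_irqmap[NR_IRQS] =
    {
      IRQMAPPED_MAX, IRQMAPPED_MAX, IRQMAPPED_MAX, 0,
      IRQMAPPED_MAX, 1,             IRQMAPPED_MAX, IRQMAPPED_MAX
    };

    static struct irq_info_s g_irqvector[CONFIG_ARCH_NUSER_INTERRUPTS];
    static int g_last_irq = -1;   /* Physical vector last seen by a handler */
    static int g_unexpected;      /* Interrupts that had no usable entry */

    /* Map a physical vector number to a table index, or IRQMAPPED_MAX */

    static irq_mapped_t irq_to_ndx(int irq)
    {
      if (irq < 0 || irq >= NR_IRQS)
        {
          return IRQMAPPED_MAX;
        }

      return g_irqmap[irq];
    }

    int irq_attach(int irq, xcpt_t isr, void *arg)
    {
      irq_mapped_t ndx = irq_to_ndx(irq);

      if (ndx >= CONFIG_ARCH_NUSER_INTERRUPTS)
        {
          return -EINVAL;   /* Out of range or filtered out of the map */
        }

      g_irqvector[ndx].handler = isr;
      g_irqvector[ndx].arg     = arg;
      return 0;
    }

    void irq_dispatch(int irq, void *context)
    {
      irq_mapped_t ndx = irq_to_ndx(irq);

      if (ndx >= CONFIG_ARCH_NUSER_INTERRUPTS ||
          g_irqvector[ndx].handler == NULL)
        {
          g_unexpected++;   /* A real OS would report an unexpected interrupt */
          return;
        }

      /* The handler receives the physical number, not the table index */

      g_irqvector[ndx].handler(irq, context, g_irqvector[ndx].arg);
    }

    /* Example handler that records the IRQ number it was given */

    int demo_handler(int irq, void *context, void *arg)
    {
      (void)context;
      (void)arg;
      g_last_irq = irq;
      return 0;
    }

Attaching ``demo_handler()`` to physical vector 5 stores it at table 
index 1, and a later dispatch of vector 5 calls the handler with the 
value 5. Vector 4 is filtered out by the map, so attaching to it fails 
and dispatching it is counted as unexpected.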

``irq_initialize()``
--------------------

``irq_initialize()`` simply sets the ``g_irqvector[]`` table 
to a known state on power-up. It would only have to account 
for the difference in table sizes.

.. code-block:: c 

    #ifdef CONFIG_ARCH_MINIMAL_VECTORTABLE
    #  define TAB_SIZE CONFIG_ARCH_NUSER_INTERRUPTS
    #else
    #  define TAB_SIZE NR_IRQS
    #endif
    
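    /* Put every table entry into a known state */

    for (i = 0; i < TAB_SIZE; i++)
      {
        g_irqvector[i].handler = irq_unexpected_isr;
        g_irqvector[i].arg     = NULL;
      }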

``g_irqmap[]``
--------------

An implementation of ``g_irqmap[]`` might be something like:

.. code-block:: c 

    #include <nuttx/irq.h>

    const irq_mapped_t g_irqmap[NR_IRQS] =
    {
    ... IRQ to index mapping values ...
    };

``g_irqmap[]`` is an array of mapped IRQ table indices. It 
is indexed by the physical interrupt vector number and 
provides an ``irq_mapped_t`` value in the range of 0 through 
``CONFIG_ARCH_NUSER_INTERRUPTS - 1`` that is the new, mapped 
index into the vector table. Unsupported IRQs would 
simply map to an out-of-range value like ``IRQMAPPED_MAX``. 
So, for example, if ``g_irqmap[37] == 24``, then hardware 
interrupt vector 37 will be mapped to the interrupt vector 
table at index 24. If ``g_irqmap[42] == IRQMAPPED_MAX``, then 
hardware interrupt vector 42 is not used and, if it occurs, 
will result in an unexpected interrupt crash.
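Because the map is written by hand in platform-specific code, a 
consistency check can catch mistakes early. The sketch below is a 
hypothetical host-side validator, not an existing NuttX function. It 
assumes that ``irq_mapped_t`` is a ``uint8_t``, that ``IRQMAPPED_MAX`` 
is ``UINT8_MAX``, and that no two physical vectors may share a table 
index.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #define IRQMAPPED_MAX UINT8_MAX   /* Assumed "unmapped" marker */

    typedef uint8_t irq_mapped_t;     /* Assumed width of a mapped index */

    /* Return true if every entry is either IRQMAPPED_MAX or a unique
     * index in the range 0 through nuser - 1.
     */

    bool irqmap_is_valid(const irq_mapped_t *map, size_t nirqs, size_t nuser)
    {
      bool seen[IRQMAPPED_MAX] = { false };
      size_t irq;

      assert(map != NULL);

      if (nuser > IRQMAPPED_MAX)
        {
          return false;   /* Indices would collide with the marker */
        }

      for (irq = 0; irq < nirqs; irq++)
        {
          irq_mapped_t ndx = map[irq];

          if (ndx == IRQMAPPED_MAX)
            {
              continue;   /* Unsupported physical vector */
            }

          if (ndx >= nuser || seen[ndx])
            {
              return false;   /* Out of range or used twice */
            }

          seen[ndx] = true;
        }

      return true;
    }

    /* Two small sample maps: one consistent, one with a duplicate index */

    const irq_mapped_t g_good_map[4] = { 0, IRQMAPPED_MAX, 1, IRQMAPPED_MAX };
    const irq_mapped_t g_dup_map[4]  = { 0, 0, 1, IRQMAPPED_MAX };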

Hardware Vector Remapping
=========================

`[This technical approach is discussed here but is 
discouraged because of technical "Complications" and 
"Dubious Performance Improvements" discussed at the 
end of this section.]`

Most ARMv7-M architectures support two mechanisms for handling interrupts:

* The so-called `common` vector handler logic enabled with 
  ``CONFIG_ARMV7M_CMNVECTOR=y`` that can be found in 
  ``arch/arm/src/armv7-m/``, and
* MCU-specific interrupt handling logic. For the 
  STM32, this logic can be found at ``arch/arm/src/stm32/gnu/stm32_vectors.S``.

The `common` vector logic is slightly more efficient, while 
the MCU-specific logic is slightly more flexible.

If we don't use the `common` vector logic enabled with 
``CONFIG_ARMV7M_CMNVECTOR=y``, but instead the more 
flexible MCU-specific implementation, then we can 
also use this to map the large set of hardware 
interrupt vector numbers to a smaller set of software 
interrupt numbers. This involves minimal changes to 
the OS and does not require any magic software lookup 
table, but it is considerably more complex to implement.

This technical approach requires changes to three files:

* A new header file at ``arch/arm/include/stm32``, say 
  ``xyz_irq.h`` for the purposes of this discussion. 
  This new header file is like the other IRQ definition 
  header files in that directory except that it 
  defines only the IRQ number of the interrupts after 
  remapping. So, instead of having the 100 IRQ number 
  definitions of the original IRQ header file based on 
  the physical vector numbers, this header file would 
  define *only* the small set of 20 *mapped* IRQ numbers in 
  the range from 0 through 19. It would also set ``NR_IRQS`` 
  to the value 20.
* A new header file at ``arch/arm/src/stm32/hardware``, say 
  ``xyz_vector.h``. It would be similar to the other vector 
  definitions files in that directory: It will consist 
  of a sequence of 100 ``VECTOR`` and ``UNUSED`` macros. It will 
  define ``VECTOR`` entries for the 20 valid interrupts and 
  80 ``UNUSED`` entries for the unused interrupt vector numbers. 
  More about this below.
* Modification of the ``stm32_vectors.S`` file. These changes 
  are trivial and involve only the conditional inclusion 
  of the new, special ``xyz_vector.h`` header file.

**REVISIT**: This needs to be updated. Neither the ``xyz_vector.h`` 
files nor the ``stm32_vectors.S`` exist in the current realization. 
This has all been replaced with the common vector handling at 
``arch/arm/src/armv7-m``.

Vector Definitions
------------------

In ``arch/arm/src/stm32/gnu/stm32_vectors.S``, notice that the 
``xyz_vector.h`` file will be included twice. Before each 
inclusion, the macros ``VECTOR`` and ``UNUSED`` are defined.

The first time that ``xyz_vector.h`` is included, it defines the 
hardware vector table. The hardware vector table consists 
of ``NR_IRQS`` 32-bit addresses in an array. This is 
accomplished by setting:

.. code-block:: c 

    #undef VECTOR
    #define VECTOR(l,i) .word l
    
    #undef UNUSED
    #define UNUSED(i)   .word stm32_reserved

and then including ``xyz_vector.h``. Consider the following 
definitions in the original file:

.. code-block:: c

    ...
    VECTOR(stm32_usart1, STM32_IRQ_USART1) /* Vector 16+37: USART1 global interrupt */
    VECTOR(stm32_usart2, STM32_IRQ_USART2) /* Vector 16+38: USART2 global interrupt */
    VECTOR(stm32_usart3, STM32_IRQ_USART3) /* Vector 16+39: USART3 global interrupt */
    ...

Suppose that we wanted to support only USART1 and that 
we wanted to have the IRQ number for USART1 to be 12. 
That would be accomplished in the ``xyz_vector.h`` header 
file like this:

.. code-block:: c

    ...
    VECTOR(stm32_usart1, STM32_IRQ_USART1) /* Vector 16+37: USART1 global interrupt */
    UNUSED(0)                              /* Vector 16+38: USART2 global interrupt */
    UNUSED(0)                              /* Vector 16+39: USART3 global interrupt */
    ...

Where the value of ``STM32_IRQ_USART1`` was defined to 
be 12 in the ``arch/arm/include/stm32/xyz_irq.h`` header 
file. When ``xyz_vector.h`` is included by ``stm32_vectors.S`` 
with the above definitions for ``VECTOR`` and ``UNUSED``, the 
following would result:

.. code-block:: c 

    ...
    .word stm32_usart1
    .word stm32_reserved
    .word stm32_reserved
    ...

These are the settings for vectors 53, 54, and 55, 
respectively. The entire vector table would be populated 
in this way. ``stm32_reserved``, if called, would result in 
an "unexpected ISR" crash. ``stm32_usart1``, if called, will 
process the USART1 interrupt normally, as we will see below.

Interrupt Handler Definitions
-----------------------------

In the vector table, all of the valid vectors are set to 
the address of a `handler` function. All unused vectors 
are forced to vector to ``stm32_reserved``. Currently, only 
vectors that are not supported by the hardware are 
marked ``UNUSED``, but you can mark any vector ``UNUSED`` in 
order to eliminate it.

The second time that ``xyz_vector.h`` is included by 
``stm32_vectors.S``, the `handler` functions are generated. 
Each of the valid vectors points to the matching handler 
function. In this case, you do NOT have to provide 
handlers for the ``UNUSED`` vectors, only for the used 
``VECTOR`` vectors. All of the unused vectors will go 
to the common ``stm32_reserved`` handler. The remaining 
set of handlers is very sparse.

These are the values of the ``UNUSED`` and ``VECTOR`` macros the 
second time that ``xyz_vector.h`` is included by ``stm32_vectors.S``:

.. code-block:: asm

    .macro HANDLER, label, irqno
        .thumb_func
    label:
        mov r0, #\irqno
        b       exception_common
    .endm
    
    #undef VECTOR
    #define VECTOR(l,i) HANDLER l, i
    
    #undef UNUSED
    #define UNUSED(i)

In the above USART1 example, a single handler would be 
generated that will provide the IRQ number 12. Remember 
that 12 is the expansion of the macro ``STM32_IRQ_USART1`` 
that is provided in the ``arch/arm/include/stm32/xyz_irq.h`` 
header file:

.. code-block:: asm 

        .thumb_func
    stm32_usart1:
        mov r0, #12
        b       exception_common

Now, when vector 16+37 occurs it is mapped to IRQ 12 
with no significant software overhead.
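The double-inclusion technique can be mimicked in plain C on a host, 
which may make the mechanism easier to follow. This is only an analogy 
under stated assumptions: a macro list (``XYZ_VECTOR_LIST``, a 
hypothetical name) stands in for the ``xyz_vector.h`` header, the 
handlers simply return the mapped IRQ number instead of branching to 
``exception_common``, and the two passes run in the opposite order 
because C needs the handlers to be defined before the table refers to them.

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>

    /* Stand-in for xyz_vector.h: one used vector, then two unused ones */

    #define XYZ_VECTOR_LIST \
      VECTOR(stm32_usart1, 12) \
      UNUSED(0) \
      UNUSED(0)

    typedef int (*vector_t)(void);

    /* Shared target for every unused vector */

    int stm32_reserved(void)
    {
      return -1;
    }

    /* Handler pass: one function per used vector, nothing for unused ones */

    #define VECTOR(l, i) int l(void) { return (i); }
    #define UNUSED(i)
    XYZ_VECTOR_LIST
    #undef VECTOR
    #undef UNUSED

    /* Table pass: one entry per physical vector, used or not */

    #define VECTOR(l, i) l,
    #define UNUSED(i)    stm32_reserved,
    const vector_t g_vectors[] =
    {
      XYZ_VECTOR_LIST
    };
    #undef VECTOR
    #undef UNUSED

    static_assert(sizeof(g_vectors) / sizeof(g_vectors[0]) == 3,
                  "the table keeps one entry per physical vector");

The table still has three entries, but only one handler function was 
generated. That handler reports the mapped IRQ number 12.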

A Complication
--------------

A complication in the above logic has been noted by David Sidrane: 
When we access the NVIC in ``stm32_irq.c`` in order to enable 
and disable interrupts, the logic requires the physical 
vector number in order to select the NVIC register and 
the bit(s) to modify in the NVIC register.

This could be handled with another small IRQ lookup table 
(20 ``uint8_t`` entries in our example situation above). But 
then this approach is not much better than the Software 
IRQ Remapping described above, which does not suffer from 
this problem. Admittedly, enabling/disabling interrupts is a 
much lower rate operation, so at least this does not put the 
lookup in the critical interrupt path.
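A minimal sketch of such a reverse lookup is shown below. It assumes 
the usual ARMv7-M NVIC layout, in which external interrupt 0 is 
physical vector 16 and each 32-bit enable register covers 32 external 
interrupts. The table ``g_physvector[]``, the structure 
``nvic_bit_s``, and the function ``nvic_locate()`` are hypothetical 
names, and only the USART1 entry from the example above is filled in.

.. code-block:: c

    #include <assert.h>
    #include <stdint.h>

    #define NVIC_FIRST_EXTERNAL 16  /* Vectors 0-15 are processor exceptions */

    /* Hypothetical reverse table: mapped IRQ number -> physical vector.
     * Only the USART1 example is populated (IRQ 12 -> vector 16+37).
     */

    static const uint8_t g_physvector[20] =
    {
      [12] = 16 + 37
    };

    struct nvic_bit_s
    {
      unsigned int regndx;   /* Which 32-bit enable register to access */
      uint32_t     mask;     /* Bit to modify within that register */
    };

    struct nvic_bit_s nvic_locate(unsigned int mapped_irq)
    {
      struct nvic_bit_s loc;
      unsigned int extint;

      /* Reject IRQ numbers outside the table and unpopulated entries */

      assert(mapped_irq < sizeof(g_physvector));
      assert(g_physvector[mapped_irq] >= NVIC_FIRST_EXTERNAL);

      extint = g_physvector[mapped_irq] - NVIC_FIRST_EXTERNAL;

      loc.regndx = extint / 32;
      loc.mask   = (uint32_t)1 << (extint % 32);
      return loc;
    }

For the mapped IRQ 12, the physical vector is 53, which is external 
interrupt 37. That selects enable register 1 and bit 5.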

Another option suggested by David Sidrane is equally ugly:

* Don't change the ``arch/arm/include/stm32`` IRQ definition file.
* Instead, encode the IRQ number so that it has both 
  the index and physical vector number:

.. code-block:: c 

    ...
    VECTOR(stm32_usart1, STM32_IRQ_USART1 << 8 | STM32_INDEX_USART1)
    UNUSED(0)
    UNUSED(0)
    ...

``STM32_INDEX_USART1`` would have the value 12 and 
``STM32_IRQ_USART1`` would be as before (53). This encoded 
value would be received by ``irq_dispatch()``, which would 
decode both the index and the physical vector number. 
It would use the index to look up the entry in the ``g_irqvector[]`` 
table but would pass the physical vector number to the 
interrupt handler as the IRQ number.
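The encoding and decoding steps amount to a shift and a mask. The 
helpers below are a hypothetical sketch of that arithmetic. The 
function names are invented for this example, and the 8-bit index 
field follows the ``<< 8`` used in the ``VECTOR`` line above.

.. code-block:: c

    #include <assert.h>
    #include <stdint.h>

    /* Pack the physical vector number above an 8-bit table index */

    uint32_t irq_encode(unsigned int irq, unsigned int ndx)
    {
      assert(ndx <= 0xff);   /* The index must fit in the low 8 bits */
      return ((uint32_t)irq << 8) | ndx;
    }

    /* Recover the physical vector number passed to the handler */

    unsigned int irq_physical(uint32_t encoded)
    {
      return (unsigned int)(encoded >> 8);
    }

    /* Recover the index used to look up the vector table entry */

    unsigned int irq_index(uint32_t encoded)
    {
      return (unsigned int)(encoded & 0xff);
    }

For USART1, ``irq_encode(53, 12)`` produces a value from which 
``irq_physical()`` recovers 53 and ``irq_index()`` recovers 12.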

A lookup would still be required in ``irq_attach()`` in 
order to convert the physical vector number back to 
an index (100 ``uint8_t`` entries in our example). So 
some lookup is unavoidable.

Based upon this analysis, my recommendation is that 
we do not consider the second option any further. The 
first option is cleaner, more portable, and generally 
preferable.

Dubious Performance Improvements
--------------------------------

The intent of this second option was to provide a higher 
performance mapping of physical interrupt vectors to IRQ 
numbers compared to the pure software mapping of option 1. However, 
in order to implement this approach, we had 
to use the less efficient, non-common vector handling 
logic. That logic is not terribly less efficient; the 
cost is probably only a 16-bit load immediate instruction 
and a branch to another location in FLASH (which will cause 
the CPU pipeline to be flushed).

The variant of option 2 where both the physical vector number 
and vector table index are encoded would require even more 
processing in ``irq_dispatch()`` in order to decode the 
physical vector number and vector table index, 
possibly just AND and SHIFT instructions.

However, the minimal cost of the first pure software 
mapping approach was possibly as small as a single 
indexed byte fetch from FLASH in ``irq_attach()``. 
Indexing is, of course, essentially `free` in the ARM 
ISA; the primary cost would be the FLASH memory access. 
So my first assessment is that the performance of both 
approaches is essentially the same. If anything, the 
first approach is possibly the more performant if 
implemented efficiently.

Both options would require some minor range checking in 
``irq_attach()`` as well.

Because of this and because of the simplicity of the 
first option, I see no reason to support or consider 
this second option any further.

Complexity and Generalizability
-------------------------------

Option 2 is overly complex; it depends on a deep understanding 
of how the MCU interrupt logic works and on a high level of 
Thumb assembly language skills.

Another problem with option 2 is that it really only applies to 
the Cortex-M family of processors and perhaps others that 
support vectored interrupts in a similar fashion. 
It is not a general solution that can be used with any CPU 
architecture.

And even worse, the MCU-specific interrupt handling logic 
that this support depends upon is very limited. As soon 
as the common interrupt handler logic was added, I stopped 
implementing the MCU-specific logic in all newer ARMv7-M 
ports. So that MCU-specific interrupt handler logic is 
only present for EFM32, Kinetis, LPC17, SAM3/4, STM32, 
Tiva, and nothing else. Very limited!

These are further reasons why option 2 is not recommended and 
will not be supported explicitly.