Monday, February 25, 2013

Mutex implementation in ARM architecture


LDREX and STREX are the instructions on which the mutex implementation is built. Let's look briefly at these two instructions.


 
LDREX


LDREX loads data from memory.
  • If the physical address has the Shared TLB attribute, LDREX tags the physical address as exclusive access for the current processor, and clears any exclusive access tag for this processor for any other physical address.
  • Otherwise, it tags the fact that the executing processor has an outstanding tagged physical address.


STREX

STREX performs a conditional store to memory. The conditions are as follows:
  • If the physical address does not have the Shared TLB attribute, and the executing processor has an outstanding tagged physical address, the store takes place and the tag is cleared.
  • If the physical address does not have the Shared TLB attribute, and the executing processor does not have an outstanding tagged physical address, the store does not take place.
  • If the physical address has the Shared TLB attribute, and the physical address is tagged as exclusive access for the executing processor, the store takes place and the tag is cleared.
  • If the physical address has the Shared TLB attribute, and the physical address is not tagged as exclusive access for the executing processor, the store does not take place.


LDREX{size}{cond} Rd, {Rd2,} [Rn {, #offset}]
STREX{size}{cond} Rd, Rm, {Rm2,} [Rn {, #offset}]

Rd
is the destination register. After the instruction, this contains:
  • for LDREX, the data loaded from memory
  • for STREX, either:
    • 0: if the instruction succeeds
    • 1: if the instruction is locked out.


For more information about the above-mentioned ARM instructions, refer to the following link:
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0204g/Cihbghef.html


The following discussion assumes that you are familiar with GCC inline assembly. If you are not, check out the following links.
http://www.ethernut.de/en/documents/arm-inline-asm.html
http://www.ibiblio.org/gferg/ldp/GCC-Inline-Assembly-HOWTO.html



mutex_lock( ) in kernel/mutex.c

void __sched mutex_lock(struct mutex *lock)
{
        might_sleep();
        /*
         * The locking fastpath is the 1->0 transition from
         * 'unlocked' into 'locked' state.
         */
        __mutex_fastpath_lock(&lock->count, __mutex_lock_slowpath);
        mutex_set_owner(lock);
}

__mutex_lock_slowpath( ) is the function that places the current task on the wait queue; it is called only if the mutex cannot be acquired on the fastpath.

Let's look at __mutex_fastpath_lock( ) which is architecture specific (arch/arm/include/asm/mutex.h).

static inline void
__mutex_fastpath_lock(atomic_t *count, void (*fail_fn)(atomic_t *))
{
        int __ex_flag, __res;

        __asm__ (

                "ldrex  %0, [%2]        \n\t"
                "sub    %0, %0, #1      \n\t"
                "strex  %1, %0, [%2]    "

                : "=&r" (__res), "=&r" (__ex_flag)
                : "r" (&(count)->counter)
                : "cc","memory" );

        __res |= __ex_flag;
        if (unlikely(__res != 0))
                fail_fn(count);
}


In the above code, %1, i.e. __ex_flag, contains the result of the STREX. If the STREX instruction is locked out, __ex_flag contains 1, so __res becomes non-zero after __res |= __ex_flag;. __res is also non-zero if the decremented count is not 0, i.e. the mutex was already locked. In either case we did not acquire the lock, so fail_fn(), which is __mutex_lock_slowpath( ), is called to put the current task on the wait queue and go to sleep.

If the STREX instruction succeeds, __ex_flag contains 0. If, in addition, the mutex was available (the count went from 1 to 0, so __res is 0), __res stays zero after __res |= __ex_flag;. In this case we have acquired the lock, fail_fn() is not called, and the current task continues.



mutex_unlock( )  in kernel/mutex.c

void __sched mutex_unlock(struct mutex *lock)
{
        /*
         * The unlocking fastpath is the 0->1 transition from 'locked'
         * into 'unlocked' state:
         */
...
__mutex_fastpath_unlock(&lock->count, __mutex_unlock_slowpath);
}

__mutex_unlock_slowpath( ) is the function which wakes up the tasks that are waiting on this mutex.

Let's look at __mutex_fastpath_unlock( ) which is architecture specific (arch/arm/include/asm/mutex.h).

static inline void
__mutex_fastpath_unlock(atomic_t *count, void (*fail_fn)(atomic_t *))
{
        int __ex_flag, __res, __orig;

        __asm__ (

                "ldrex  %0, [%3]        \n\t"
                "add    %1, %0, #1      \n\t"
                "strex  %2, %1, [%3]    "

                : "=&r" (__orig), "=&r" (__res), "=&r" (__ex_flag)
                : "r" (&(count)->counter)
                : "cc","memory" );

        __orig |= __ex_flag;
        if (unlikely(__orig != 0))
                fail_fn(count);
}

Along the same lines as __mutex_fastpath_lock( ), fail_fn(), i.e. __mutex_unlock_slowpath( ), is called if __orig (the count before the increment) is not 0, that is, if the STREX failed or if the count was negative, meaning tasks may be waiting on the mutex and need to be woken up.

Check the following files for more information:

  • arch/arm/include/asm/mutex.h
  • kernel/mutex.c

