Atomic operations and Critical Sections in assembly...in x-86
We all know how atomic operations and semaphores work in higher-level languages, but very few actually know how they are implemented by the CPU!
I was working on a project in C++ where I had to make use of the atomic integers provided by the language. Immediately, I was struck with a realization of a void of knowledge in me regarding atomic instructions at the assembly level. So I did some research and stumbled upon the LOCK instruction prefix which does exactly that.
Let's understand the context first.
Critical Sections
A single-core processor executes a lot of instructions in a second, but it does so sequentially. It has circuitry which enables it to switch processes very quickly, almost giving an illusion of running multiple processes in parallel, but in reality, everything is sequential.
Now there can be multiple such CPU cores
in the system running different threads(a single process can have multiple threads) simultaneously, each executing with its own cache memory
and sharing a common RAM between them. Sometimes, it is hard to keep track of which code is executed by which thread and you may not want this uncertainty to bug your code at any point in time.
So people started using semaphores
and mutexes
in their code to create some kind of No Entry Zone in the code. This is called a Critical Section
. Only one thread at a time is allowed to execute a critical section.
For example, a critical section in a C code looks like this:
#include <pthread.h> pthread_mutex_t myMutex = PTHREAD_MUTEX_INITIALIZER; pthread_mutex_lock( &myMutex );//CRITICAL SECTION STARTS //code to be executed by a single thread at a time. pthread_mutex_unlock( &myMutex );//CRITICAL SECTION ENDS
In order to really create a critical section, we need to make sure of the following actions that core will take.
- CPU does not leave the current thread execution by doing a context switch to a different thread and starts executing the same critical section without the first thread ever completing its course.
- An interrupt should not force the core to stop processing the thread in the middle of the critical section.
What are atomic operations?
An atomic operation is the one that either completes executing fully or reverts back to its not even started state in case it could not finish.
It is very useful in many applications, the most important ones include financial transactions. You would not want your money to be deducted from your account while not showing up in the destination account. That would be a problem and in that case, you would want the refund of the money essentially returning your state to its original form before the faulty transaction ever happened.
Is a single instruction in assembly atomic?
Well, some instructions are atomic in nature, while some are not. Consider reading a large chunk of data, which may include several small steps to execute fully. There should be a way to tell the CPU about the instructions that are to run atomically. Well, there is, obviously, and that is the lock
instruction prefix.
The LOCK
prefix ensures that the CPU has exclusive ownership of the appropriate cache line for the duration of the operation, and provides certain additional ordering guarantees. This may be achieved by asserting a bus lock, but the CPU will avoid this where possible. If the bus is locked then it is only for the duration of the locked instruction.
An example will show how it is used in the assembly,
atomic_increment: movl 4(%esp), %ecx lock ; this is where the cpu is asked to disable interrupts and context switches for a while incl (%ecx) mov $0,%eax setne %al ret
In the above increment example, only the next instruction following the lock
is locked, in this case, incl (%ecx)
is executed atomically. There are limited instructions that can follow the lock
prefix. Some of them are: add, or, adc, sbb, and, sub, xor
. If lock
is used with an instruction it does not support a trap is raised in the CPU.
Assembly for Critical Sections -
Spinlocks
Now, critical sections are not as easy as using atomic operations. We can't use multiple instructions with the lock
keyword. Spinlocks
can be used to solve this problem. A Spinlock
is like someone waiting at the door, ringing the doorbell, at some point the door will open. We need to jump to the critical section only when we acquire the spinlock else keep looping and trying to acquire the spinlock.
Here it is how we use lock
to create a mutex spinlock in assembly,
;; A spinloop which keeps on looping till we check and set MUTEX ;; bts -- Bit Test and Set, tests if the bit is 0 and if yes set it to 1 spinloop: lock bts MUTEX, 0; carry flag will be set to 0 and the bit to 1 jc spinloop; ;; anything after this point is a critical section! mov MUTEX, 0 ;; Release the mutex
Setting the `MUTEX` to 0 releases the mutex as some other thread running the spinloop
will acquire the lock.
test-and-set
To be added later!!