By Piyush Shukla in Computers — Nov 4, 2021

Atomic operations and Critical Sections in assembly...in x-86

We all know how atomic operations and semaphores work in higher-level languages, but very few actually know how they are implemented by the CPU!

I was working on a project in C++ where I had to make use of the atomic integers provided by the language. Immediately, I was struck with a realization of a void of knowledge in me regarding atomic instructions at the assembly level. So I did some research and stumbled upon the LOCK instruction prefix which does exactly that.

Let's understand the context first.

Critical Sections

A single-core processor executes a lot of instructions in a second, but it does so sequentially. It has circuitry which enables it to switch processes very quickly, almost giving an illusion of running multiple processes in parallel, but in reality, everything is sequential.
Now there can be multiple such CPU cores in the system running different threads(a single process can have multiple threads) simultaneously, each executing with its own cache memory and sharing a common RAM between them. Sometimes, it is hard to keep track of which code is executed by which thread and you may not want this uncertainty to bug your code at any point in time.
So people started using semaphores and mutexes in their code to create some kind of No Entry Zone in the code. This is called a Critical Section. Only one thread at a time is allowed to execute a critical section.
For example, a critical section in a C code looks like this:

#include <pthread.h>
pthread_mutex_t myMutex = PTHREAD_MUTEX_INITIALIZER;

pthread_mutex_lock( &myMutex );//CRITICAL SECTION STARTS

	//code to be executed by a single thread at a time.

pthread_mutex_unlock( &myMutex );//CRITICAL SECTION ENDS

In order to really create a critical section, we need to make sure of the following actions that core will take.

CPU does not leave the current thread execution by doing a context switch to a different thread and starts executing the same critical section without the first thread ever completing its course.
An interrupt should not force the core to stop processing the thread in the middle of the critical section.

What are atomic operations?

An atomic operation is the one that either completes executing fully or reverts back to its not even started state in case it could not finish.
It is very useful in many applications, the most important ones include financial transactions. You would not want your money to be deducted from your account while not showing up in the destination account. That would be a problem and in that case, you would want the refund of the money essentially returning your state to its original form before the faulty transaction ever happened.

Is a single instruction in assembly atomic?

Well, some instructions are atomic in nature, while some are not. Consider reading a large chunk of data, which may include several small steps to execute fully. There should be a way to tell the CPU about the instructions that are to run atomically. Well, there is, obviously, and that is the lock instruction prefix.
The LOCK prefix ensures that the CPU has exclusive ownership of the appropriate cache line for the duration of the operation, and provides certain additional ordering guarantees. This may be achieved by asserting a bus lock, but the CPU will avoid this where possible. If the bus is locked then it is only for the duration of the locked instruction.

An example will show how it is used in the assembly,

atomic_increment:
    movl 4(%esp), %ecx
    lock ; this is where the cpu is asked to disable interrupts and context switches for a while
    incl (%ecx)
    mov $0,%eax
    setne %al
    ret

In the above increment example, only the next instruction following the lock is locked, in this case, incl (%ecx) is executed atomically. There are limited instructions that can follow the lock prefix. Some of them are: add, or, adc, sbb, and, sub, xor. If lock is used with an instruction it does not support a trap is raised in the CPU.

Assembly for Critical Sections -

`Spinlocks`

Now, critical sections are not as easy as using atomic operations. We can't use multiple instructions with the lock keyword. Spinlocks can be used to solve this problem. A Spinlock is like someone waiting at the door, ringing the doorbell, at some point the door will open. We need to jump to the critical section only when we acquire the spinlock else keep looping and trying to acquire the spinlock.

Here it is how we use lock to create a mutex spinlock in assembly,

;; A spinloop which keeps on looping till we check and set MUTEX
;; bts -- Bit Test and Set, tests if the bit is 0 and if yes set it to 1

spinloop:
	lock bts MUTEX, 0; carry flag will be set to 0 and the bit to 1
    	jc spinloop;
    ;; anything after this point is a critical section!
    
    
    mov MUTEX, 0 ;; Release the mutex

Setting the `MUTEX` to 0 releases the mutex as some other thread running the spinloop will acquire the lock.

`test-and-set`

To be added later!!