We have already seen in theory how all of the above differ and where each of them is applicable and useful. In this post, I will try to demonstrate the same concepts using practical examples. I could have spun up tens of devices in GNS3/EVE-NG, but I figured simply using manual delays to simulate the I/O latency is easier 🙂
Example 1 - Synchronous, the good old way
Assume we have TOTAL_DEVICES = 10 and each device takes 2 seconds to establish the connection and send the output of all show commands you want.
import time

def connect_and_fetch(i):
    # simulate sending commands and waiting for the output
    # assume it takes 2 seconds for each device to establish the connection and respond to your show commands
    time.sleep(2)
    print(f"Data saved for device {i}")

if __name__ == "__main__":
    TOTAL_DEVICES = 10
    start_time = time.time()
    # iterate over the devices one after the other
    for i in range(1, TOTAL_DEVICES + 1):
        connect_and_fetch(i)
    print("--- %s seconds ---" % (time.time() - start_time))
$ python3 synchronous.py
Data saved for device 1
Data saved for device 2
Data saved for device 3
Data saved for device 4
Data saved for device 5
Data saved for device 6
Data saved for device 7
Data saved for device 8
Data saved for device 9
Data saved for device 10
--- 20.02708387374878 seconds ---
Example 2 - Multi-threading
Keeping the number of devices the same, we now assign each device its own thread, and you can clearly see how much improvement you get. There is no point spawning more threads than the total number of devices, but if your device count is in the thousands, you will probably want to experiment to find the thread count that gives the best performance. Another observation: the order in which the function completes is not deterministic, because each thread handles its task independently. The threads are not truly executing in parallel (Python's GIL allows only one thread to run bytecode at a time), but for I/O-bound work they are close to it.
import time
import concurrent.futures

def connect_and_fetch(i):
    # simulate sending commands and waiting for the output
    # assume it takes 2 seconds for each device to establish the connection and respond to your show commands
    time.sleep(2)
    print(f"Data saved for device {i}")

if __name__ == "__main__":
    TOTAL_DEVICES = 10
    start_time = time.time()
    # one thread per device
    with concurrent.futures.ThreadPoolExecutor(max_workers=TOTAL_DEVICES) as executor:
        executor.map(connect_and_fetch, range(1, TOTAL_DEVICES + 1))
    print("--- %s seconds ---" % (time.time() - start_time))
$ python3 multithreading.py
Data saved for device 1
Data saved for device 4
Data saved for device 2
Data saved for device 9
Data saved for device 7
Data saved for device 3
Data saved for device 6
Data saved for device 8
Data saved for device 5
Data saved for device 10
--- 2.0046212673187256 seconds ---
Example 3 - Multi-processing
Now, instead of using a thread per device, we will divide the overall task of processing 10 devices across multiple CPU cores. I have an 8-core machine, but for the benefit of this example I am limiting the pool to 4 processes; if I don't, Python will by default spawn one worker per core.
The syntax is very similar to multi-threading.
import time
from multiprocessing import Pool

def connect_and_fetch(i):
    # simulate sending commands and waiting for the output
    # assume it takes 2 seconds for each device to establish the connection and respond to your show commands
    time.sleep(2)
    print(f"Data saved for device {i}")

if __name__ == "__main__":
    TOTAL_DEVICES = 10
    start_time = time.time()
    # limit the pool to 4 worker processes
    with Pool(processes=4) as pool:
        pool.map(connect_and_fetch, range(1, TOTAL_DEVICES + 1))
    print("--- %s seconds ---" % (time.time() - start_time))
$ python3 multi-processing.py
Data saved for device 1
Data saved for device 2
Data saved for device 3
Data saved for device 4
Data saved for device 5
Data saved for device 8
Data saved for device 6
Data saved for device 7
Data saved for device 9
Data saved for device 10
--- 6.256372928619385 seconds ---
If I don't specify the value of processes, the pool will split the tasks across all 8 cores of my CPU and the overall time will drop, but what is extremely important to understand is that within each worker process, everything is still synchronous and single-threaded.
Example: 12 devices, num_of_processes = 4, so each process handles 3 devices. But those 3 devices are handled sequentially, because each process is single-threaded by default. So for I/O-bound applications, multiprocessing is definitely not a better solution than multi-threading.
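One way around that limitation (not part of the original examples, just a sketch) is to give each worker process its own thread pool, so the devices assigned to a process are fetched concurrently instead of sequentially. The `handle_chunk` helper and the chunking scheme below are hypothetical:

```python
import time
from multiprocessing import Pool
from concurrent.futures import ThreadPoolExecutor

def connect_and_fetch(i):
    # simulate a 2-second connect + show-command round trip
    time.sleep(2)
    print(f"Data saved for device {i}")

def handle_chunk(devices):
    # each worker process runs its own thread pool over its chunk of devices
    with ThreadPoolExecutor(max_workers=len(devices)) as executor:
        executor.map(connect_and_fetch, devices)

if __name__ == "__main__":
    TOTAL_DEVICES = 12
    NUM_PROCESSES = 4
    devices = list(range(1, TOTAL_DEVICES + 1))
    # split 12 devices into 4 chunks of 3 (round-robin)
    chunks = [devices[p::NUM_PROCESSES] for p in range(NUM_PROCESSES)]
    start_time = time.time()
    with Pool(processes=NUM_PROCESSES) as pool:
        pool.map(handle_chunk, chunks)
    print("--- %s seconds ---" % (time.time() - start_time))
```

With 3 threads per process, each chunk finishes in roughly 2 seconds instead of 6, at the cost of the extra complexity of mixing the two models.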
Example 4 - Asynchronous single-threaded
Instead of spawning multiple threads, we can structure the Python script as coroutines in a way that eliminates the blocking nature of the code. The "await" keyword tells the event loop to carry on with other work while a response is pending, thereby removing the blocking element of the code. "await" can only be used inside a function defined with "async def"; basically, each await needs to live inside a corresponding async definition.
Control flow:
- The entry point is asyncio.run(main()) for Python >= 3.7. The syntax is different for older versions, but the concept is the same.
- main is an async function that creates a connect_and_fetch() coroutine for each device and puts them into a list we call coroutines. That name is just a convention; call it whatever you want.
- await asyncio.gather schedules all the coroutines on the event loop, while the await inside each connect_and_fetch() tells Python not to wait for that device's response and to move on to the next.
- gather finally returns the results in the order the coroutines were passed in.
import time
import asyncio

async def connect_and_fetch(i):
    # simulate sending commands and waiting for the output
    # assume it takes 2 seconds for each device to establish the connection and respond to your show commands
    await asyncio.sleep(2)
    print(f"Data saved for device {i}")

async def main():
    coroutines = [connect_and_fetch(i) for i in range(1, TOTAL_DEVICES + 1)]
    await asyncio.gather(*coroutines)

if __name__ == "__main__":
    TOTAL_DEVICES = 10
    start_time = time.time()
    asyncio.run(main())
    print("--- %s seconds ---" % (time.time() - start_time))
$ python3 async.py
Data saved for device 1
Data saved for device 2
Data saved for device 3
Data saved for device 4
Data saved for device 5
Data saved for device 6
Data saved for device 7
Data saved for device 8
Data saved for device 9
Data saved for device 10
--- 2.006028175354004 seconds ---
Even if I bump the number of devices to 1000, multi-threading and asyncio perform similarly, while the multiprocessing version lags far behind unless you run multiple threads inside each worker process. The true benefit of asyncio is realized when the device count reaches the thousands: on a MacBook Pro (i7, 16 GB), I saw asyncio win the race over multithreading at around 5000 devices.
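At that scale you usually don't want thousands of simultaneous connections open either. A common pattern (again just a sketch, with an arbitrarily chosen limit, and the 2-second delay shortened to 0.1 s so it runs quickly) is to bound concurrency with asyncio.Semaphore:

```python
import time
import asyncio

MAX_CONCURRENT = 100  # arbitrary cap on simultaneous "connections"; tune for your environment

async def connect_and_fetch(i, semaphore):
    # the semaphore lets at most MAX_CONCURRENT coroutines past this point at once
    async with semaphore:
        # simulated connect + show-command delay (shortened from 2s for this sketch)
        await asyncio.sleep(0.1)
        return i

async def main(total_devices):
    semaphore = asyncio.Semaphore(MAX_CONCURRENT)
    coroutines = [connect_and_fetch(i, semaphore) for i in range(1, total_devices + 1)]
    results = await asyncio.gather(*coroutines)
    print(f"Fetched {len(results)} devices")

if __name__ == "__main__":
    start_time = time.time()
    asyncio.run(main(1000))  # 1000 devices / 100 at a time -> roughly 10 batches
    print("--- %s seconds ---" % (time.time() - start_time))
```

The event loop still juggles all 1000 coroutines, but only 100 "connections" are in flight at any moment, which keeps memory and remote-end load under control.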
There may be a lot more to this but this is just my take and my understanding of this very intriguing topic.
Thanks for reading. I hope this was useful.
See AsyncIO for a network automation example