I have been using a multi-threaded version of the usual synchronous code for interacting with devices for over a year now, and I have a network of nearly 28,000 devices (routers / switches / WLCs / FWs) to manage. With that multi-threaded version, I was able to fetch the output of multiple show commands, which I would further process into structured data formats and build all kinds of analytics, reasonably fast; the synchronous version would have taken not hours but days. I am talking about 28,000 devices each sending the output of nearly 15 commands.
However, coming across Carl Montanari’s Scrapli introduced me to the concept of asynchronous operations, and it intrigued me a lot to find out why (or why not) I should rewrite my code to use asyncio instead of threading, when at the end of the day the performance of asyncio and multi-threading is quite similar.
This all led me to dig a little deeper into:
- What is Multiprocessing
- What is Multithreading
- What is Asynchronous
I will try to keep the examples relevant to the networking domain, as our scripts deal a lot with I/O, and that is where AsyncIO truly shines. I am no computer science student, so it’s highly likely that the explanation is not semantically precise, but it should help us non-programmers get a reasonably good understanding of all that we really care about. I understand it’s a lot of text, but there was no better way to get it across while still maintaining my sanity and yours 🙂
A few basic terminologies
1. What is a process?
The moment you execute your Python script, it spawns a process that lives for as long as the script runs. This set of instructions, along with the dependencies (memory, disk, scheduler, etc.) required to execute them, is a process.
2. What is a thread?
Instructions/applications in a process can be single-threaded or multi-threaded. Think of a thread as the component of the process that actually carries out the task you have given it, end to end. So when you execute a script, it spawns a process, which in turn creates a thread to get the work done.
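A minimal sketch of those two terms using the standard library: the running script is a process (it has a pid), and its code executes on the "main" thread, which can spawn further threads inside the same process.

```python
import os
import threading

# The running script itself is a process; its instructions run on the main thread
print(f"process id: {os.getpid()}")
print(f"running on:  {threading.current_thread().name}")  # the main thread

def work():
    # This function runs on a second thread inside the SAME process
    print(f"worker runs on: {threading.current_thread().name}")

# Spawning one extra thread in the process and waiting for it to finish
t = threading.Thread(target=work, name="worker-1")
t.start()
t.join()
```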
If you want to collect data from 50 devices, for example, you can run it the conventional, synchronous way, where the usual workflow is:
- Connect to device 1
- Send multiple commands to the device
- Wait for the device to produce the output of all commands
- Log the output to a file
- Connect to device 2
- Repeat the same process for all 50 devices
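The synchronous workflow above can be sketched as follows. `fetch_outputs` is a stand-in for a real SSH session (in practice you would use something like Netmiko or Scrapli); here a `time.sleep` simulates the I/O wait so the sketch stays self-contained, and 5 devices stand in for the 50.

```python
import time

DEVICES = [f"device-{n}" for n in range(1, 6)]  # 5 stand-in devices
COMMANDS = ["show version", "show ip interface brief"]

def fetch_outputs(device: str) -> dict:
    """Pretend to connect and run each command on one device."""
    outputs = {}
    for cmd in COMMANDS:
        time.sleep(0.1)  # the script sits idle here while the device "responds"
        outputs[cmd] = f"{device}: output of '{cmd}'"
    return outputs

start = time.perf_counter()
# One device at a time: total runtime ~= devices * commands * wait
results = {dev: fetch_outputs(dev) for dev in DEVICES}
elapsed = time.perf_counter() - start
print(f"{len(results)} devices in {elapsed:.1f}s")
```

With 5 devices and 2 commands at 0.1 s each, the waits add up to about a second; with 28,000 real devices the idle time dominates completely.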
If you look closely, during the time the script is waiting for the devices to send the output of commands, it is doing absolutely nothing, just sitting idle, and that is the area we can optimize. So instead of using a single thread and accessing your devices one by one, we could spawn multiple threads inside the same process, each thread responsible for, say, one device. Theoretically we are now 50 times faster than the single-threaded version. In reality it is somewhat less than that: the overall resources shared by all the threads are still the same resources available to the process they are part of, and managing multiple threads adds some (albeit minimal) overhead. It is still blazing fast compared to single-threaded execution, though.
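One way to sketch that multi-threaded version is with the standard library’s `ThreadPoolExecutor`; `fetch_outputs` again fakes the per-device I/O wait with `time.sleep`. While one thread sleeps on "its" device, the others make progress.

```python
import time
from concurrent.futures import ThreadPoolExecutor

DEVICES = [f"device-{n}" for n in range(1, 6)]  # 5 stand-in devices

def fetch_outputs(device: str) -> str:
    time.sleep(0.1)  # simulated I/O wait; real code would run show commands here
    return f"{device}: ok"

start = time.perf_counter()
# One thread per device; the waits now overlap instead of adding up
with ThreadPoolExecutor(max_workers=len(DEVICES)) as pool:
    results = dict(zip(DEVICES, pool.map(fetch_outputs, DEVICES)))
elapsed = time.perf_counter() - start
print(f"{len(results)} devices in {elapsed:.2f}s")
```

The five 0.1 s waits overlap, so the whole batch finishes in roughly 0.1 s instead of 0.5 s.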
While it may seem that you are achieving parallel task execution, in reality no two Python threads execute in parallel (the Global Interpreter Lock sees to that); the context switching between threads simply happens at such blazing-fast speeds that you perceive it as parallel execution.
In my opinion, even though you can create thousands and thousands of threads at once, there is always a sweet spot where you get maximum efficiency. Unfortunately, you will have to find that number by trial and error while still maintaining the reliability of operations, because the more parallel threads you run, the bigger the fight for the same resource pool, which can lead to further complications.
Loosely speaking, I think of threads as a multi-container Docker application: each container does its own thing, and at the end the results of their individual efforts are joined together to yield the final result.
3. What is multiprocessing?
A CPU has multiple cores. Each core can run multiple processes, and each process can further have multiple threads. In my head, multiprocessing is not a replacement for multi-threading, even though it gives you performance benefits over single-threaded synchronous operations.
Again taking an example: if you had 100 devices to pull data from, you could split the workload across 4 cores, each core handling a separate process of 25 devices. Right there you can see a performance benefit of roughly 4x, minus the overhead of handling multiple processes, so effectively somewhat less than 4x. But each process on each core is still single-threaded, so you don’t really get a whole lot of benefit unless you can run multiple threads in each of those processes as well. So why don’t we do it?
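The 4-worker split described above can be sketched with the standard library’s `multiprocessing.Pool`; the 100 devices are hypothetical names, and a short `time.sleep` stands in for the I/O. The `if __name__ == "__main__"` guard is required so that worker processes don’t re-execute the pool setup when they import the module.

```python
import os
import time
from multiprocessing import Pool

DEVICES = [f"device-{n}" for n in range(1, 101)]  # 100 hypothetical devices

def fetch(device: str) -> str:
    time.sleep(0.01)  # simulated I/O wait; each result records its worker's pid
    return f"{device}: ok (pid {os.getpid()})"

if __name__ == "__main__":
    # 4 worker processes, each handling roughly 25 devices
    with Pool(processes=4) as pool:
        results = pool.map(fetch, DEVICES)
    workers = {line.split("pid ")[1] for line in results}
    print(f"{len(results)} devices handled by {len(workers)} worker processes")
```

Each worker here is still a single-threaded process that sleeps through its own I/O waits, which is exactly the limitation the text points out.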
Clearly, the application we care about is not CPU-intensive; it is just slow because of the I/O wait while the devices respond with command outputs, coupled with network latency. So utilizing multiple cores does not give us enough benefit; it would if we were writing, say, a brute-force password-cracking application that does CPU-intensive work. What we really need is something that optimizes the way the waiting is handled, and that can be done in two ways:
- Either we use multiple threads, as discussed above, and run into resource-crunch and sharing issues across the threads.
- Or we somehow optimize the single-threaded version and tell the thread not to wait while the read/write channel I/O is idle, but instead move on to the next device, connect to it, not wait for its response either, and so on: essentially keep juggling between the different sub-tasks of the code to maximize efficiency.
This second point is essentially what is called asynchronous operation, and it fits our requirement really well. Instead of the CPU scheduler deciding how multiple threads are handled, it is your own code that decides what to execute and what to wait for, using the special keywords async and await.
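The juggling described above can be sketched with the standard library’s asyncio; `asyncio.sleep` stands in for reading the SSH channel. Every `await` is a point where your code explicitly hands control back to the event loop, so the single thread can start talking to the next device while this one "responds".

```python
import asyncio
import time

DEVICES = [f"device-{n}" for n in range(1, 6)]  # 5 stand-in devices

async def fetch_outputs(device: str) -> str:
    # 'await' yields control to the event loop for the duration of the I/O wait
    await asyncio.sleep(0.1)  # stands in for waiting on the device's output
    return f"{device}: ok"

async def main():
    # Schedule all devices at once; gather resumes when every coroutine is done
    return await asyncio.gather(*(fetch_outputs(d) for d in DEVICES))

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(f"{len(results)} devices in {elapsed:.2f}s")
```

Like the threaded version, the waits overlap and the batch finishes in roughly 0.1 s, but here it all happens on one process and one thread.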
What was missing from the options above is exactly this: “ASYNCHRONOUS — ONE PROCESS, ONE THREAD”.
To summarize the key takeaway: multiprocessing shines for CPU-intensive work, while for I/O-bound work like ours, multi-threading and asyncio are the tools of choice; asyncio gets there with a single thread whose own code decides what to execute and what to wait for.
In the next post, we will see how to use Carl Montanari’s Scrapli library with AsyncIO. Stay tuned.
Part 1 of Scrapli