Network Automation

My journey with Network & Cloud Automation


[Theory] Multithreading vs Multiprocessing vs AsyncIO

Posted on August 31, 2021 (updated September 1, 2021) by Gurpreet Kochar

For over a year now, I have been using a multithreaded version of the synchronous code we usually see for interacting with devices. I have a network of nearly 28k devices (routers / switches / WLCs / FWs) that I need to manage. With the multithreaded version, I was able to fetch the output of multiple show commands, process it into structured data formats, and build all kinds of analytics reasonably fast, in contrast to the synchronous version, which would take not hours but days. I am talking about 28,000 devices sending the output of nearly 15 commands.

However, coming across Carl Montanari’s Scrapli introduced me to the concept of asynchronous operations. It also made me curious about whether I should rewrite my code to use asyncio instead of threading when, at the end of the day, the performance of asyncio and multithreading is quite similar.

This all led me to learn a little more about:

  1. What is Multiprocessing
  2. What is Multithreading
  3. What is Asynchronous

I will try to keep the examples relevant to the networking domain, as our scripts deal a lot with I/O and that is where AsyncIO truly shines. I am no computer science student, so it’s highly likely that the explanation is not technically precise, but it should help us non-programmers get a reasonably good understanding of all that we really care about. I understand it’s a lot of text, but there was no better way to get it across while still maintaining my sanity and yours 🙂

A few basic terms

1. What is a process?

The moment you execute your Python script, it spawns a process that lives for however long the script runs. The set of instructions, along with the resources needed to execute them (memory, disk, scheduler, etc.), is a process. You can see this process in Activity Monitor or with the top command.
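To make this a little more concrete, here is a minimal sketch that asks the running script for its own process ID, which is the same number you would see in top or Activity Monitor. The function name `describe_process` is just something I made up for illustration.

```python
import os
import sys


def describe_process() -> dict:
    """Return basic facts about the process running this script."""
    return {
        "pid": os.getpid(),           # process ID, as listed by top / Activity Monitor
        "parent_pid": os.getppid(),   # the shell or IDE that launched the script
        "executable": sys.executable, # the Python interpreter backing this process
    }


info = describe_process()
print(f"Running as PID {info['pid']} (parent PID {info['parent_pid']})")
```

The PID changes on every run, because every run of the script is a new process.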

2. What is a thread?

The instructions in a process can be single-threaded or multi-threaded. Think of a thread as the component of the process that actually carries out the task you have given it, end to end. So when you execute a script, it spawns a process, which in turn creates a thread to get the work done.
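A small sketch of the same idea using the standard `threading` module: the script starts on one thread, and we start one extra thread by hand. The thread name "device-worker" and the `worker` function are illustrative only.

```python
import threading


def worker(results: list) -> None:
    # Record the name of the thread that actually ran this function.
    results.append(threading.current_thread().name)


results = []
# The thread the script started on (normally called "MainThread" when run as a script).
main_name = threading.current_thread().name

t = threading.Thread(target=worker, args=(results,), name="device-worker")
t.start()
t.join()  # wait for the extra thread to finish its work

print(f"main thread: {main_name}, extra thread: {results[0]}")
```

Both threads live inside the same process, so they share the same `results` list.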

3. Multi-threaded

If you want to collect data from 50 devices, for example, you can run it the conventional, synchronous way, where the usual workflow is:

  • Connect to device 1
  • Send multiple commands to the device
  • Wait for the device to produce the output of all commands
  • Log the output to a file
  • Connect to device 2
  • Repeat the same process for all 50 devices
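The workflow above can be sketched roughly like this. There is no real SSH here: `fetch_outputs` is a hypothetical stand-in that just sleeps to simulate waiting for a device, and I have left out the "log to a file" step to keep it short.

```python
import time


def fetch_outputs(device, commands, delay=0.05):
    """Hypothetical stand-in for: connect, send commands, wait for output."""
    time.sleep(delay)  # simulated wait for the device; the script sits idle here
    return {cmd: f"<output of '{cmd}' from {device}>" for cmd in commands}


def collect_sequentially(devices, commands):
    """Visit the devices one by one, exactly like the list above."""
    results = {}
    for device in devices:
        results[device] = fetch_outputs(device, commands)
    return results


devices = [f"device{n}" for n in range(1, 6)]
commands = ["show version", "show ip interface brief"]

start = time.perf_counter()
results = collect_sequentially(devices, commands)
elapsed = time.perf_counter() - start
print(f"{len(results)} devices in {elapsed:.2f}s")
```

The total time grows in a straight line with the number of devices, because every wait happens one after another.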

If you look closely, while the script is waiting for a device to send back the output of the commands, it is doing absolutely nothing. It is sitting idle, and that is the area we can optimize. So instead of using a single thread and accessing the devices one by one, we could spawn multiple threads inside the same process, where each thread is responsible for, let’s say, one device. Theoretically we are now 50 times faster than the single-threaded version. In practice that is not quite the case: all the threads share the resources available to the process they are part of, and managing multiple threads adds some overhead, although it is minimal. It is still blazing fast compared to single-threaded.
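Here is a rough sketch of the threaded idea using `ThreadPoolExecutor` from the standard library. Again, `fetch_outputs` is a made-up placeholder that sleeps instead of talking to a real device, and the `max_workers` value is an arbitrary choice for the demo, not a recommendation.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_outputs(device, commands, delay=0.1):
    """Hypothetical stand-in for: connect, send commands, wait for output."""
    time.sleep(delay)  # while this thread waits, other threads get to run
    return device, {cmd: f"<output of '{cmd}' from {device}>" for cmd in commands}


def collect_with_threads(devices, commands, max_workers=20):
    """Hand each device to a worker thread from a fixed-size pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(fetch_outputs, d, commands) for d in devices]
        # result() blocks until that particular device has finished.
        return dict(f.result() for f in futures)


devices = [f"device{n}" for n in range(1, 21)]
commands = ["show version"]

start = time.perf_counter()
results = collect_with_threads(devices, commands)
elapsed = time.perf_counter() - start
print(f"{len(results)} devices in {elapsed:.2f}s")
```

With 20 devices and a 0.1 second simulated wait, the sequential version would need about 2 seconds, while the pool finishes in roughly the time of a single wait plus some thread overhead.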

While it may seem that you are achieving parallel task execution, the reality in standard Python (CPython) is that the Global Interpreter Lock (GIL) lets only one thread execute Python code at any given moment. A thread that is waiting on I/O releases the lock so that another thread can run, and this switching between threads happens so fast that you perceive it as parallel task execution.

In my opinion, even though you can create thousands and thousands of threads at once, there is always a sweet spot where you get the maximum efficiency. Unfortunately, you will have to find the number that best suits your needs by trial and error while still keeping operations reliable, because the more threads run at once, the bigger the fight for the same resource pool, which can lead to further complications.

Loosely speaking, I think of threads as a multi-container Docker application: each container does its own thing, and at the end the results of their individual efforts are joined together to yield the final result.

Multithreading with Python for Network Engineers

4. What is multi-processing?

A CPU has multiple cores. Each core can run multiple processes, and each process can in turn have multiple threads. In my head, multiprocessing is not a replacement for multithreading, even though it gives you performance benefits over single-threaded synchronous operations.

Again taking an example, if you had 100 devices to pull data from, you could split the workload across 4 cores, with each core running a separate process that handles 25 devices. Right here you can see a performance benefit of up to 4 times, which in practice is somewhat less than 4 times because of the overhead of handling multiple processes. But each process on each core is still single-threaded, so you don’t really get a whole lot of benefit unless you run multiple threads in each process as well. So why don’t we do it?
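The split described above could look something like this sketch, using `ProcessPoolExecutor` from the standard library. The helper names (`split_into_chunks`, `collect_chunk`, `collect_with_processes`) are mine, and the device work is simulated with a short sleep.

```python
import time
from concurrent.futures import ProcessPoolExecutor


def split_into_chunks(items, n_chunks):
    """Split items into n_chunks lists of near-equal size."""
    return [items[i::n_chunks] for i in range(n_chunks)]


def collect_chunk(devices):
    """Runs inside one worker process and handles its devices one by one."""
    results = {}
    for device in devices:
        time.sleep(0.01)  # simulated wait for the device
        results[device] = f"<output from {device}>"
    return results


def collect_with_processes(devices, n_processes=4):
    """Give each worker process one chunk of devices and merge the results."""
    merged = {}
    with ProcessPoolExecutor(max_workers=n_processes) as pool:
        for partial in pool.map(collect_chunk, split_into_chunks(devices, n_processes)):
            merged.update(partial)
    return merged


devices = [f"device{n}" for n in range(1, 101)]
chunks = split_into_chunks(devices, 4)
print([len(chunk) for chunk in chunks])  # [25, 25, 25, 25]
```

To actually run the pool, call `collect_with_processes(devices)` from under an `if __name__ == "__main__":` guard, which Python needs on platforms that start worker processes by re-importing the script. Notice that each worker still walks through its 25 devices one at a time.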

Clearly, the application we are concerned with is not CPU intensive. It is slow because of the I/O wait times while the devices respond with the command outputs, coupled with network latency. So utilizing multiple cores does not give us enough benefit, unless we were writing something like a brute-force password cracker, which does CPU-intensive work. What we really need is something that optimizes the way threading is handled, which can be done in two ways:

  • a. Either we use multiple threads, as discussed above, and run into resource crunch and sharing issues across the threads.
  • b. Or we somehow optimize the single-threaded version and tell the thread not to wait while the read/write channel is idle on I/O, but instead to move on to the next device, connect to it, not wait for its response either, and so on. Essentially, keep juggling between the different subtasks of the code to maximize efficiency.

This second point is essentially what is called asynchronous operation, which fits our requirement really well. Instead of the operating system’s scheduler deciding how multiple threads are handled, in this case it is your own code that decides what to execute and what to wait for, using the special keywords async and await.
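A minimal sketch of that juggling with the standard `asyncio` module is below. It is not Scrapli code: `fetch_outputs` is a hypothetical coroutine in which `asyncio.sleep` stands in for waiting on a device, and a real script would use an async-capable SSH library at that point.

```python
import asyncio
import time


async def fetch_outputs(device, commands, delay=0.1):
    """Hypothetical stand-in for: connect, send commands, wait for output."""
    # 'await' hands control back to the event loop, which moves on to
    # another device instead of sitting idle.
    await asyncio.sleep(delay)
    return device, {cmd: f"<output of '{cmd}' from {device}>" for cmd in commands}


async def collect_async(devices, commands):
    """Start all device tasks and wait for every one of them to finish."""
    pairs = await asyncio.gather(*(fetch_outputs(d, commands) for d in devices))
    return dict(pairs)


devices = [f"device{n}" for n in range(1, 21)]
commands = ["show version"]

start = time.perf_counter()
results = asyncio.run(collect_async(devices, commands))
elapsed = time.perf_counter() - start
print(f"{len(results)} devices in {elapsed:.2f}s")
```

Everything here runs in one process and one thread. The speed-up comes purely from overlapping the waits, and it is the `await` points in your own code that decide when the switch to another device happens.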

What’s missing above is “ASYNCHRONOUS: ONE PROCESS, ONE THREAD”

To summarize the key takeaway

In the next post, we will see how to use Carl Montanari’s Scrapli library with AsyncIO. Stay tuned.

Part 1 of Scrapli

How to use Scrapli for Network Automation
