While reading about FastAPI, it became clear that I needed to understand Pydantic a little better in order to build a more robust API of my own. In this post, I will try to answer the questions I had on my journey to understand more about FastAPI.
What is Pydantic and What is it used for?
I will try to explain this using an example that is relatable to us as network engineers. Assume we have an Excel sheet with details about each device, such as hostname, IP, version, and so on, and we want to build a data model for each device from that sheet. Data modeling, validation, parsing, and conversion into different formats are exactly the tasks pydantic is built for.
Suppose we have information about a device in the below format.
facts = {
    'uptime': 29160,
    'vendor': 'Cisco',
    'os_version': 'Virtual XE Software (X86_64_LINUX_IOSD-UNIVERSALK9-M), Version 17.3.1a, RELEASE SOFTWARE (fc3)',
    'serial_number': '9ESGOBARV9D',
    'model': 'CSR1000V',
    'hostname': 'csr1000v-1',
    'fqdn': 'csr1000v-1.lab.devnetsandbox.local',
    'interface_list': ['GigabitEthernet1', 'GigabitEthernet2', 'GigabitEthernet3', 'Loopback1']
}
arp_table = [
    {'interface': 'GigabitEthernet1', 'mac': '00:50:56:BF:49:0F', 'ip': '10.10.20.28', 'age': 1.0},
    {'interface': 'GigabitEthernet1', 'mac': '00:50:56:BF:78:AC', 'ip': '10.10.20.48', 'age': -1.0},
    {'interface': 'GigabitEthernet1', 'mac': '00:50:56:BF:D6:36', 'ip': '10.10.20.254', 'age': 5.0},
    {'interface': 'GigabitEthernet2', 'mac': '00:50:56:BF:4E:A3', 'ip': '100.100.100.100', 'age': -1.0}
]
We now want to build a data validation and parsing framework to make sure the input data conforms to certain policies.
Our requirements:-
- uptime is always an integer
- vendor, os_version, serial_number, model, hostname, and fqdn are strings
- interface_list is a list of strings
- each arp_table entry has interface as a string, mac in a well-defined MAC address format, and ip always in IPv4 format
- We can ignore the fqdn and age fields for the purposes of this demo.
Let’s define a pydantic BaseModel for our requirements:-
from pydantic import BaseModel, Field

class DeviceFacts(BaseModel):
    # '...' (Ellipsis) as the default marks the field as required.
    # 'title' is used rather than 'alias' because an alias such as 'Hostname'
    # would make pydantic look for that capitalized key in the input dictionary.
    hostname: str = Field(..., title='Hostname')
    uptime: int = Field(..., title='Uptime')
    vendor: str = Field(..., title='Vendor')
    os_version: str = Field(..., title='OS_Version')

print(DeviceFacts(**facts))
print(DeviceFacts.parse_obj(facts))
# Both calls above produce the same result, shown below
hostname='csr1000v-1' uptime=29160 vendor='Cisco' os_version='Virtual XE Software (X86_64_LINUX_IOSD-UNIVERSALK9-M), Version 17.3.1a, RELEASE SOFTWARE (fc3)'
hostname='csr1000v-1' uptime=29160 vendor='Cisco' os_version='Virtual XE Software (X86_64_LINUX_IOSD-UNIVERSALK9-M), Version 17.3.1a, RELEASE SOFTWARE (fc3)'
Even though the facts dictionary has multiple keys, the data model automatically extracts only the values that are defined in the model and filters out the rest. On top of that, you can make fields optional too, like so:
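from typing import Optional

class DeviceFacts(BaseModel):
    # hostname may be omitted from the input; it then defaults to None
    hostname: Optional[str] = Field(None, title='Hostname')
    uptime: int = Field(..., title='Uptime')
    vendor: str = Field(..., title='Vendor')
    os_version: str = Field(..., title='Os_Version')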
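To make the two behaviours described above concrete, here is a minimal sketch of a trimmed-down model. The `partial_facts` dictionary is made up for illustration: it omits hostname and carries an extra serial_number key that the model does not define.

```python
from typing import Optional

from pydantic import BaseModel, Field


class DeviceFacts(BaseModel):
    hostname: Optional[str] = Field(None, title='Hostname')
    uptime: int = Field(..., title='Uptime')
    vendor: str = Field(..., title='Vendor')


# Hypothetical input: no hostname, plus a key the model does not know about
partial_facts = {'uptime': 29160, 'vendor': 'Cisco', 'serial_number': '9ESGOBARV9D'}

device = DeviceFacts(**partial_facts)
print(device.hostname)                   # None
print(hasattr(device, 'serial_number'))  # False
```

The missing optional field falls back to None, and the undefined key is dropped because pydantic ignores extra input keys by default.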
You could also wrap this inside a function and use a try/except block to return more meaningful error messages:
from typing import Type, Union
from pydantic import BaseModel, ValidationError

def validate_model(model: Type[BaseModel], data: dict) -> Union[BaseModel, str]:
    try:
        return model(**data)
    except ValidationError as e:
        # e.json() returns the validation errors as a JSON string
        return e.json()

print(validate_model(DeviceFacts, facts))
# Example output when the mandatory hostname key is missing from the facts dictionary
[
{
"loc": [
"hostname"
],
"msg": "field required",
"type": "value_error.missing"
}
]
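If you would rather have one short line per problem than a JSON document, the same exception also exposes the errors as a list of dictionaries through `errors()`. The sketch below is only one possible way to format them. The helper name `describe_errors` is made up for this post, and the exact wording of each message depends on the installed pydantic version.

```python
from pydantic import BaseModel, ValidationError


class DeviceFacts(BaseModel):
    hostname: str
    uptime: int


def describe_errors(model, data):
    """Return one 'field: message' string per validation error (hypothetical helper)."""
    try:
        model(**data)
    except ValidationError as e:
        return [
            '.'.join(str(part) for part in err['loc']) + ': ' + err['msg']
            for err in e.errors()
        ]
    return []


# hostname is deliberately left out of the input
for line in describe_errors(DeviceFacts, {'uptime': 29160}):
    print(line)
```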
We see that if a field is missing altogether, pydantic raises a missing-field error. But what if the field is not missing and simply has an empty value, like the blank hostname below?
facts = {
    'uptime': 29160,
    'vendor': 'Cisco',
    'os_version': 'Virtual XE Software (X86_64_LINUX_IOSD-UNIVERSALK9-M), Version 17.3.1a, RELEASE SOFTWARE (fc3)',
    'serial_number': '9ESGOBARV9D',
    'model': 'CSR1000V',
    'hostname': '',
    'fqdn': 'csr1000v-1.lab.devnetsandbox.local',
    'interface_list': ['GigabitEthernet1', 'GigabitEthernet2', 'GigabitEthernet3', 'Loopback1']
}
print(validate_model(DeviceFacts, facts))
hostname='' uptime=29160 vendor='Cisco' os_version='Virtual XE Software (X86_64_LINUX_IOSD-UNIVERSALK9-M), Version 17.3.1a, RELEASE SOFTWARE (fc3)'
Blank values like this are not desirable, because the end goal of pydantic models is to have normalized, validated data. A custom validator can reject them:
from pydantic import BaseModel, Field, IPvAnyAddress, ValidationError, validator

class DeviceFacts(BaseModel):
    hostname: str = Field(..., title='Hostname')
    uptime: int = Field(..., title='Uptime')
    vendor: str = Field(..., title='Vendor')
    os_version: str = Field(..., title='Os_Version')

    @validator('hostname', pre=True)
    def validate_hostname(cls, v):
        if not v:
            # Raise ValueError inside a validator; pydantic collects it into a ValidationError
            raise ValueError('Hostname is required')
        return v
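Here is a small sketch of how such a validator behaves when it is fed a blank hostname. It assumes the v1-style `validator` decorator used throughout this post (newer pydantic releases still accept it but emit a deprecation warning), and `hostname_errors` is just a helper name I made up.

```python
from pydantic import BaseModel, Field, ValidationError, validator


class DeviceFacts(BaseModel):
    hostname: str = Field(..., title='Hostname')
    uptime: int = Field(..., title='Uptime')

    @validator('hostname', pre=True)
    def validate_hostname(cls, v):
        if not v:
            raise ValueError('Hostname is required')
        return v


def hostname_errors(data: dict) -> list:
    """Return the error entries raised for the given input, or an empty list."""
    try:
        DeviceFacts(**data)
    except ValidationError as e:
        return e.errors()
    return []


blank = hostname_errors({'hostname': '', 'uptime': 29160})
valid = hostname_errors({'hostname': 'csr1000v-1', 'uptime': 29160})
print(len(blank), len(valid))  # 1 0
```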
Or, instead of just raising an error, you could replace empty strings with None or whatever you like, or even set a default value for the hostname field in case it is missing from the facts dictionary altogether.
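from typing import Optional

class DeviceFacts(BaseModel):
    # Optional so that None is accepted once an empty value has been replaced
    hostname: Optional[str] = Field(None, title='Hostname')
    uptime: int = Field(..., title='Uptime')
    vendor: str = Field(..., title='Vendor')
    os_version: str = Field(..., title='Os_Version')

    # '*' means this validator applies to all fields of the model
    @validator('*', pre=True)
    def replace_empty_with_null(cls, v):
        # Compare against '' explicitly so that a legitimate 0 is not turned into None
        return None if v == '' else v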
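The default-value option mentioned above could look something like the sketch below. The string 'unknown' is just a placeholder I picked for illustration; use whatever default makes sense for your inventory.

```python
from pydantic import BaseModel, Field


class DeviceFacts(BaseModel):
    # 'unknown' is an illustrative default, used only when hostname is absent
    hostname: str = Field('unknown', title='Hostname')
    uptime: int = Field(..., title='Uptime')


print(DeviceFacts(uptime=29160).hostname)                         # unknown
print(DeviceFacts(hostname='csr1000v-1', uptime=29160).hostname)  # csr1000v-1
```

Note that the default only applies when the key is missing altogether; an empty string supplied in the input is still accepted as-is unless a validator or constraint rejects it.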
You could also leverage built-in constraints for strings, such as min_length or max_length:
class DeviceFacts(BaseModel):
    hostname: str = Field(..., title='Hostname', min_length=1, max_length=15)

# You could also have it validated against a regex, like so
class DeviceFacts(BaseModel):
    hostname: str = Field(..., title='Hostname', min_length=1, max_length=15, regex='^[a-zA-Z0-9-]+$')

# You could write a validator to achieve the same end result, as demonstrated above, but it's
# good to know what's available out of the box.
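To see the length constraints in action, here is a minimal sketch. The helper `is_valid_hostname` is a made-up name. I left the regex constraint out of this sketch on purpose, because the keyword differs between pydantic releases (`regex` in v1, `pattern` in v2), whereas min_length and max_length are spelled the same in both.

```python
from pydantic import BaseModel, Field, ValidationError


class DeviceFacts(BaseModel):
    hostname: str = Field(..., title='Hostname', min_length=1, max_length=15)


def is_valid_hostname(name: str) -> bool:
    """Return True when the hostname satisfies the model's length constraints."""
    try:
        DeviceFacts(hostname=name)
        return True
    except ValidationError:
        return False


print(is_valid_hostname('csr1000v-1'))  # True
print(is_valid_hostname(''))            # False, shorter than min_length
print(is_valid_hostname('a' * 16))      # False, longer than max_length
```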
Let’s build a model for arp_table now
class ArpTable(BaseModel):
    interface: str = Field(..., title='Interface')
    mac: str = Field(..., title='Mac')
    # IPvAnyAddress also accepts IPv6; use ipaddress.IPv4Address to allow IPv4 only
    ip: IPvAnyAddress = Field(..., title='Ipv4')
    age: int = Field(..., title='Age')

print([ArpTable(**arp) for arp in arp_table])
# Output
[
ArpTable(interface='GigabitEthernet1', mac='00:50:56:BF:49:0F', ip=IPv4Address('10.10.20.28'), age=1),
ArpTable(interface='GigabitEthernet1', mac='00:50:56:BF:78:AC', ip=IPv4Address('10.10.20.48'), age=-1),
ArpTable(interface='GigabitEthernet1', mac='00:50:56:BF:D6:36', ip=IPv4Address('10.10.20.254'), age=5),
ArpTable(interface='GigabitEthernet2', mac='00:50:56:BF:4E:A3', ip=IPv4Address('100.100.100.100'), age=-1)
]
Just like hostname, we could write a validator for the MAC address, like below
import re

class ArpTable(BaseModel):
    interface: str = Field(..., title='Interface')
    mac: str = Field(..., title='Mac')
    ip: IPvAnyAddress = Field(..., title='Ipv4')
    age: int = Field(..., title='Age')

    @validator('mac', pre=True)
    def validate_mac(cls, v):
        # Six hex pairs, optionally separated by ':' or '-' (same separator throughout)
        if not re.match(r"[0-9a-f]{2}([-:]?)[0-9a-f]{2}(\1[0-9a-f]{2}){4}$", v.lower()):
            raise ValueError('Invalid MAC address format')
        return v

# Now if we supply the MAC address in any other format, it will throw an error message.
# We can make the validation logic more comprehensive to check all possible MAC formats
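As a rough idea of what more comprehensive MAC handling could look like, the sketch below uses only the standard library. It accepts three common notations (colon or hyphen separated pairs, Cisco-style dotted groups, and bare hex) and normalizes them to one form. The function name `normalize_mac` and the choice of lowercase colon-separated output are my own assumptions; a function like this could be called from inside a pydantic validator.

```python
import re

# Accepted notations, matched against the lowercased input
MAC_PATTERNS = [
    re.compile(r'^[0-9a-f]{2}([:-])[0-9a-f]{2}(\1[0-9a-f]{2}){4}$'),  # 00:50:56:bf:49:0f or 00-50-56-bf-49-0f
    re.compile(r'^[0-9a-f]{4}(\.[0-9a-f]{4}){2}$'),                   # 0050.56bf.490f
    re.compile(r'^[0-9a-f]{12}$'),                                    # 005056bf490f
]


def normalize_mac(value: str) -> str:
    """Validate a MAC address and return it as lowercase colon-separated pairs."""
    candidate = value.strip().lower()
    if not any(pattern.match(candidate) for pattern in MAC_PATTERNS):
        raise ValueError(f'Invalid MAC address: {value!r}')
    digits = re.sub(r'[^0-9a-f]', '', candidate)
    return ':'.join(digits[i:i + 2] for i in range(0, 12, 2))


print(normalize_mac('00:50:56:BF:49:0F'))  # 00:50:56:bf:49:0f
print(normalize_mac('0050.56BF.490F'))     # 00:50:56:bf:49:0f
```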
You could also ignore ARP entries that have a negative age, in other words any value less than 0, for whatever reason.
print([ArpTable(**arp) for arp in arp_table if arp['age'] >= 0])
Now comes the interesting part of combining the two data models into one. Let’s try to combine the models into a single model named DeviceDetails
from typing import List

class DeviceFacts(BaseModel):
    hostname: str = Field(..., title='Hostname', min_length=1, max_length=15, regex='^[a-zA-Z0-9-]+$')
    uptime: int = Field(..., title='Uptime')
    vendor: str = Field(..., title='Vendor')
    os_version: str = Field(..., title='Os_Version')

class ArpTable(BaseModel):
    interface: str = Field(..., title='Interface')
    mac: str = Field(..., title='Mac')
    ip: IPvAnyAddress = Field(..., title='Ipv4')
    age: int = Field(..., title='Age')

class DeviceDetails(BaseModel):
    facts: DeviceFacts
    arp_table: List[ArpTable]

print(DeviceDetails(facts=DeviceFacts(**facts), arp_table=[ArpTable(**arp) for arp in arp_table]))

# or to simplify
facts = DeviceFacts(**facts)
arp_table = [ArpTable(**arp) for arp in arp_table]
print(DeviceDetails(facts=facts, arp_table=arp_table))
facts=DeviceFacts(hostname='csr1000v-1', uptime=29160, vendor='Cisco', os_version='Virtual XE Software (X86_64_LINUX_IOSD-UNIVERSALK9-M), Version 17.3.1a, RELEASE SOFTWARE (fc3)')
arp_table=[ArpTable(interface='GigabitEthernet1', mac='00:50:56:BF:49:0F', ip=IPv4Address('10.10.20.28'), age=1), ArpTable(interface='GigabitEthernet1', mac='00:50:56:BF:78:AC', ip=IPv4Address('10.10.20.48'),
age=-1), ArpTable(interface='GigabitEthernet1', mac='00:50:56:BF:D6:36', ip=IPv4Address('10.10.20.254'), age=5), ArpTable(interface='GigabitEthernet2', mac='00:50:56:BF:4E:A3',
ip=IPv4Address('100.100.100.100'), age=-1)]
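One more thing worth knowing here: pydantic can build the nested models for you when the combined model is handed plain dictionaries and lists. The sketch below uses trimmed-down models and a made-up `raw` payload to show this. I named the entry model `ArpEntry` and used the standard library's `IPv4Address` type only to keep the sketch short and IPv4-only.

```python
from ipaddress import IPv4Address
from typing import List

from pydantic import BaseModel


class DeviceFacts(BaseModel):
    hostname: str
    uptime: int


class ArpEntry(BaseModel):
    interface: str
    mac: str
    ip: IPv4Address


class DeviceDetails(BaseModel):
    facts: DeviceFacts
    arp_table: List[ArpEntry]


# Illustrative raw input: nested dicts, no model instances created by hand
raw = {
    'facts': {'hostname': 'csr1000v-1', 'uptime': 29160},
    'arp_table': [
        {'interface': 'GigabitEthernet1', 'mac': '00:50:56:BF:49:0F', 'ip': '10.10.20.28'},
    ],
}

details = DeviceDetails(**raw)
print(type(details.facts).__name__)  # DeviceFacts
print(details.arp_table[0].ip)       # 10.10.20.28
```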
At this point we can access our normalized and validated data from our programs. For instance, if you want to write an API using something like FastAPI, which leverages pydantic under the hood, this will fit nicely into the overall scheme of things.
x = DeviceDetails(facts = facts, arp_table=arp_table)
print(x.facts.hostname)
print(x.arp_table[0].mac)
#Output
csr1000v-1
00:50:56:BF:49:0F
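Since converting data into different formats was one of the reasons for looking at pydantic in the first place, here is a short sketch of exporting a model to a dictionary and to JSON. It uses trimmed-down models for brevity and assumes the v1-style `dict()` and `json()` methods that match the rest of this post; pydantic v2 still accepts them but prefers `model_dump()` and `model_dump_json()`.

```python
import json
from typing import List

from pydantic import BaseModel


class ArpEntry(BaseModel):
    interface: str
    mac: str


class DeviceDetails(BaseModel):
    hostname: str
    arp_table: List[ArpEntry]


x = DeviceDetails(
    hostname='csr1000v-1',
    arp_table=[{'interface': 'GigabitEthernet1', 'mac': '00:50:56:BF:49:0F'}],
)

as_dict = x.dict()  # nested models become plain dictionaries
as_json = x.json()  # the same data serialized as a JSON string

print(as_dict['arp_table'][0]['mac'])    # 00:50:56:BF:49:0F
print(json.loads(as_json)['hostname'])   # csr1000v-1
```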
While I am still learning myself, I hope this documentation of my learning can help you in some way to push you on your journey to become a better network automation engineer.