How to run Cisco Modeling Labs (CML) in the cloud without incurring another mortgage payment

Liam Keegan
Aug 20, 2024

CML and Azure Spot Instances

I learned something about CML and support for Azure public cloud. Or lack thereof.

I have a project going on where I need to build out a bunch of IOS-XRv 9000s, which take a whopping 8 GB of memory per box. I thought I’d be super clever and run CML out on Azure, since my home lab PC isn’t beefy enough to do it.

I needed a machine that would support around 24 cores and 128 gig of RAM. Those instances are pretty expensive — the two that I screen captured here are anywhere from $1000 to $1500/mo.

So, I thought I’d be super clever and run it as an Azure spot instance. You get up to a 90% discount, with the caveat that Azure can shut the VM down with 30 seconds of notice. Since I only need it for a couple of weeks, spending $50-$75 would be much more palatable. Plus, since Azure bills compute and storage separately, you can turn off your VM and pay only for the data disks while it’s down.
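
As a rough check on that $50-$75 figure, here is a minimal sketch of the arithmetic. The 15-day run and the 30-day billing month are my assumptions for "a couple of weeks", the function name is made up for illustration, and disk storage costs are deliberately left out.

```python
def prorated_spot_cost(monthly_on_demand: float, discount: float,
                       days: int, days_in_month: int = 30) -> float:
    """Estimate compute-only cost of a discounted VM that runs for `days` days.

    Assumptions (not from Azure's pricing pages): a flat discount and a
    30-day month. Storage is billed separately and is not included.
    """
    return monthly_on_demand * (1 - discount) * days / days_in_month


# The two on-demand prices quoted above, at a 90% discount for 15 days.
for monthly in (1000, 1500):
    cost = round(prorated_spot_cost(monthly, 0.90, 15), 2)
    print(f"${monthly}/mo on-demand -> about ${cost} for 15 days on spot")
```

Real spot pricing floats with demand, so treat this as a ballpark rather than a quote.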

Cool, right?

Not so fast..

I’m a lowly Pay as You Go (PAYG) customer. Meaning, I don’t have an enterprise agreement or a subscription that’s purchased through an enterprise channel (e.g., a CSP).

For PAYG customers, Azure has a limit of 10 total vCPUs per region, as well as three spot instance vCPUs per region. With that VM profile, I needed 32.
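
To put numbers on the gap, here is a small sketch that compares the vCPUs I needed against the two PAYG limits mentioned above. The class and function names are hypothetical, and checking each limit on its own is a simplification of how Azure actually enforces quotas.

```python
from dataclasses import dataclass


@dataclass
class RegionalQuota:
    """Per-region vCPU limits, using the PAYG numbers quoted in this post."""
    total_vcpus: int
    spot_vcpus: int


def quota_shortfall(needed_vcpus: int, quota: RegionalQuota, spot: bool) -> int:
    """Return how many vCPUs short the request is (0 means it fits)."""
    limit = quota.spot_vcpus if spot else quota.total_vcpus
    return max(0, needed_vcpus - limit)


payg = RegionalQuota(total_vcpus=10, spot_vcpus=3)
print("Spot shortfall:", quota_shortfall(32, payg, spot=True))
print("On-demand shortfall:", quota_shortfall(32, payg, spot=False))
```

Either way, a 32-vCPU VM is far beyond what a default PAYG subscription allows, so a quota increase is unavoidable.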

Here’s where it gets not-so-great: once you submit a quota increase request, it gets auto-denied. You’re then prompted to open a case for an engineer to review. The good news is that the Azure support team got back to me within three minutes of opening my case. The bad news is that they immediately denied the request with a “SpotVMNotAllowedForPayGCustomer” reason code and the following message.

QMS Update - Status: ResourceType: crpCores
{
Quota Bucket: TotalLowPriorityCores
Status Description: Due to very high rates of Spot consumption,
Microsoft is unable to approve additional quota at this time
State: SpotVMNotAllowedForPayGCustomer
Current Quota: 3
New Quota: 20
}
Properties: [location, eastus]
}

Even though spot instances are advertised, they’re not really available to PAYG customers unless you only need three or fewer vCPUs. Kinda janky if you ask me.

What is the best way to run CML when you need more capacity?

There are a couple options:

First, I could buy a server. I don’t love this idea since the last thing I want to do is spend $800 for some server that’ll get used twice a year and collect dust in my basement. This is the perfect use case for “cloud”, or as we used to call it, “renting a server for a bit”.

My technical requirements are pretty simple:

  • ~18–24 cores on an Intel processor that supports virtualization. In this case, AMD won’t work because the VMs I need to run require an Intel chipset.
  • Anywhere from 96–128 GB of RAM.
  • Couple hundred gigs of SSD storage — if it has RAID, great, otherwise skip it.
  • ESXi as the hypervisor. Yes, you can run CML on bare metal, but I like having a hypervisor, and Cisco doesn’t offer any other options (yet) aside from the VMware product line.
  • Ideally, some way to remotely manage the server.
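
For a feel of why the RAM range matters, here is a back-of-the-envelope sizing sketch. The 8 GB per node comes from the IOS-XRv 9000 figure earlier in this post; the 16 GB set aside for ESXi and the CML controller is purely my assumption, and the function name is hypothetical.

```python
def max_xrv9k_nodes(host_ram_gb: int, per_node_gb: int = 8,
                    reserved_gb: int = 16) -> int:
    """Estimate how many 8 GB nodes fit in a host by memory alone.

    `reserved_gb` is an assumed allowance for the hypervisor and the CML
    controller VM; adjust it to match your own measurements.
    """
    usable_gb = max(0, host_ram_gb - reserved_gb)
    return usable_gb // per_node_gb


# The low and high ends of the RAM requirement listed above.
for ram_gb in (96, 128):
    print(f"{ram_gb} GB host: room for about {max_xrv9k_nodes(ram_gb)} nodes")
```

This ignores CPU, which can become the limit first once every node is booting at the same time.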

But, the non-technical requirements are also important:

  • No contract. I don’t want to buy a machine for a year. I just need a month to get this project done.
  • Ideally, no or minimal setup fees.
  • No wait to turn it on. I don’t want to make some poor person in a remote data center rack and stack a server. I want to pay and get access.

Finally, there’s always a push to get the latest and greatest CPUs, DDR5 memory and BillionBaseT networking. I don’t need anything new and fancy. Older processors with DDR4 memory paired with SSD will be plenty fast for what I’m doing.

I scoured… and I found a match!

WHO??!!

There are a ton of bare-metal providers out there, so this isn’t the only one, but GTHost.com (this is an affiliate link if you care to click on it — it’ll help pay for my cloud server addiction) has checked all the boxes for me. How?

  • Pick your server size (cores and memory) from any of their 18 data centers. They don’t sell brand-new systems; most of the processors are 2014–2018 vintage, but they’ll work perfectly when paired with fast storage.
  • No contracts. You can even rent the server for 10 days for a fixed per-day price. Auto renewal is OFF by default!
  • No setup fees and active within 15 minutes. An ESXi 6.7 trial gets installed by their control panel.
  • IPMI access so you can get to the server console via web browser.
  • One public IP address that you can allocate as you choose. I opted for a second IP address so that I could easily manage ESXi plus have my Ubuntu VM accessible via the Internet.

Here’s my “lab in a box” CML topology when all is said and done:

A couple of things about the configuration. I’m not running any sort of firewall or NAT gateway. Instead, I use the Cloudflare WARP Connector on an Ubuntu VM. This allows me (and anyone else who wants access) to connect directly to the internal LAN side via the Zero Trust client or just a web browser. Cloudflare Zero Trust is free for up to 50 users, and if you’re not using it, you should be. I don’t need to expose any ports to the Internet, so it’s about as braindead simple and secure as it gets. If I wanted to, I could hide the ESXi management interface on the internal LAN so it is inaccessible unless you have the right permissions.

The IPMI OOB is great because it gives me access to the ESXi console and allows me to get the Ubuntu VM set up. Once that’s done, I can move the IP to the Ubuntu box and then have full access via Zero Trust.

Finally, I copy all the ISOs for CML to the ESXi datastore and boom, done. I’ve done it manually so far, but if I keep redoing this, I’ll automate all of it with a set of Ansible playbooks.

Let’s take a look at a couple of configurations and cost.

Small CML Instance

For a quick-and-dirty but still beefy CML lab, an $84/mo. server is the way to go. It’s got an older Xeon E5-2650 v4 processor and a single 1.9TB SSD. Plenty to get the job done for a reasonably sized lab that includes IOS/NXOS devices plus Firepower/FTD/SD-WAN or third-party images.

GTHost.com’s pricing is less attractive for lower specs: a 64 GB machine costs right around this same amount, so if that’s all you need, you might save some dollars by looking elsewhere.

Medium CML Instance

This one includes a newer processor and double the memory and disk space at just under $150/month. Note that when ESXi installs, the hard drives will show as two separate datastores.

Large CML Instance

Finally, if you need a beefy server, you’re stepping up in price, but you get 88 cores (176 with hyperthreading), 384GB of RAM, and a bunch more disk space.

Closing Thoughts (One month in..)

I’ve been running this setup for a month, and it’s been fantastic. Zero issues, fast speeds (although I see more like 250Mbps than 300, but that hasn’t been a major factor), and no performance problems. Using Zero Trust is a game changer, since it allows transparent access just via IP or DNS. I can publish the portal via Cloudflare’s infrastructure and give anyone access to it.

The next time I do this, I’ll probably skip ESXi and have Debian loaded natively, then load KVM as the hypervisor and put CML on top of that. There isn’t a supported installer, but it’s a matter of replicating the scripts found in the Cisco Cloud CML repo.

Finally, here’s the utilization of my current lab. I’ve had the CPU as high as 106% and it’s run fine. Total cost after a month: $84.

Here’s the lab topology:

About

Liam Keegan is a long-time networking, infrastructure, security and automation nerd. He can be reached via Twitter and LinkedIn.

Written by Liam Keegan

Data center/security/collab hack, CCIE #5026, focusing on automation, programmability, operational efficiency and getting rid of technical debt.
