Capacity planning, sizing, and scaling

Capacity planning helps you determine the system resources needed to meet user demand, respond to peaks, and enable growth. It also helps you avoid having excess capacity sitting idle and driving up storage and maintenance costs.

The two primary tools in capacity planning are sizing and scaling. Sizing is the calculation of memory, hard-disk space, CPU processing power, and throughput capability needed to support your capacity. Scaling is the process of increasing or decreasing an Access Gateway cluster size to meet that calculation.

These topics describe how to assess capacity planning factors so you can determine the right strategy for your environment.

Capacity planning concepts

To determine what you need for your deployment, you can assess your environment's current statistics.

Okta recommends that you assess these factors in relation to each other, and not in isolation, when you perform your calculations:

  • Users: The total number of system users.
  • Accesses: The total number of times a user accesses a system in a given time period. Okta recommends using longer periods, such as weeks or months, for your analysis. This ensures that you see a more accurate pattern, including peaks and averages, for your environment.
  • Peak authentication or authorization rates: The highest expected levels of access. Peaks can occur at the end of months, quarters, or years, or in response to system upgrades, outages, and seasonal events.
  • Average authentication or authorization rates: The expected norm over a given time period.
  • Total number of applications protected by Access Gateway: The number of applications in your environment that Access Gateway protects. IT environments with fewer applications may require less capacity in your Access Gateway deployment. But if your environment has many users who authenticate frequently, you might need more capacity.

Other factors may limit capacity, such as network throughput or the back-end application performance. See Network interfaces.

Estimate access rates

Estimating access rates is the process of calculating how many accesses and authentications Access Gateway needs to support. You calculate average access rates, divide your users into groups of frequent, infrequent, and rare accesses and authentications, and determine the rates for each group and for all users.

The sample calculations use the following symbols for mathematical notation:

* multiplication
/ division
+ addition
- subtraction
= equals

Calculate average access rates

Average access rates represent a lower bound on how many accesses a given instance of Access Gateway needs to support.

You can estimate average access rates by looking at the sets of users that access the system.

Start by determining the following values, and then use them in the calculation examples:

  • Total users: This value represents the total number of users who might ever access Access Gateway.
  • Estimated daily users: The percentage of users who use an application on a specific day.
  • Estimated daily accesses: The number of times a specific user accesses an application on a specific day.
  • Page accesses for each session: For a given set of authenticated users, what is the expected number of page accesses during a single session?

Calculation examples: average and overall accesses

Average authentication rate

  • Average users = Total users * Estimated daily users
  • Average accesses = Average users * Estimated daily accesses

Overall accesses

  • Overall accesses = Average accesses * Page accesses
  • Overall average accesses = Average users * accesses per day
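As a rough illustration, the average access calculation can be sketched in Python. This sketch assumes that the estimated daily users value is a fraction of total users (for example, 0.5 for 50%) and is therefore multiplied, not divided. The function name, parameter names, and sample inputs are hypothetical and aren't part of Access Gateway.

```python
def average_access_rates(total_users, daily_user_fraction,
                         daily_accesses_per_user, pages_per_session):
    """Return (average users, average accesses, overall accesses) per day."""
    # Users expected on a given day: a fraction of the total population.
    average_users = total_users * daily_user_fraction
    # Accesses (sessions) that those users start each day.
    average_accesses = average_users * daily_accesses_per_user
    # Page-level requests generated by those sessions.
    overall_accesses = average_accesses * pages_per_session
    return average_users, average_accesses, overall_accesses


# Hypothetical inputs: 10,000 users, 50% active each day,
# 2 accesses per user per day, 10 page accesses per session.
users, accesses, overall = average_access_rates(10_000, 0.5, 2, 10)
print(users, accesses, overall)
```

Replace the sample inputs with figures from your own environment. Because these are averages, treat the result as a lower bound.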

Group by user access type

Group users according to the frequency with which they access the system:

  • Frequent users: Frequent users access the system regularly, typically multiple times each day.
  • Infrequent users: Infrequent users access the system on occasion but much less often than frequent users.
  • Rare users: Rare users access the system a maximum of one to three times a week.

Calculation examples: access rates for each user access type

These examples use a sample size of 10,000 total users. Use your own values for the sample size and number of accesses per day for each user access type.

  • Frequent users
    • Sample size: 5,000
    • Accesses per day: 5
    • Formula: Frequent users * accesses per day
    • Sample total: 5,000 * 5 = 25,000
  • Infrequent users
    • Sample size: 2,500
    • Accesses per day: 2
    • Formula: Infrequent users * accesses per day
    • Sample total: 2,500 * 2 = 5,000
  • Rare users
    • Sample size: 2,500
    • Accesses per day: 1
    • Formula: Rare users * accesses per day
    • Sample total: 2,500 * 1 = 2,500. These users might not access the system each day. If they access the system every other day, halve the accesses: 2,500 * 1 * 0.5 = 1,250
  • Total for all user access types
    • Sample size: 10,000
    • Accesses per day: 8
    • Formula: Frequent + Infrequent + Rare accesses
    • Sample total: 25,000 + 5,000 + 2,500 = 32,500
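The per-group totals can be reproduced with a short Python sketch. The group sizes and access counts come from the sample above. The function name and the rare_active_fraction parameter are hypothetical and model rare users who sign in only on some days.

```python
# (users, accesses per user per day) for each access type, from the sample.
GROUPS = {
    "frequent": (5_000, 5),
    "infrequent": (2_500, 2),
    "rare": (2_500, 1),
}


def daily_accesses(groups, rare_active_fraction=1.0):
    """Sum the daily accesses over all user access types."""
    total = 0
    for name, (users, per_day) in groups.items():
        accesses = users * per_day
        if name == "rare":
            # Rare users might not access the system every day.
            accesses *= rare_active_fraction
        total += accesses
    return total


print(daily_accesses(GROUPS))                            # all rare users active daily
print(daily_accesses(GROUPS, rare_active_fraction=0.5))  # rare users active every other day
```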

Estimate hardware sizing requirements

Hardware sizing refers to how much memory and hard-disk space, how many CPUs and cores, and the level of throughput you need to ensure optimal performance for your Access Gateway deployment.

Memory sizing

Access Gateway requires the following amounts of appliance memory:

  • Operating system, Access Gateway engine, and micro-services: 1.5 gigabytes is the minimum for production environments.
  • Cached Sessions: 128 megabytes minimum.

The operating system, core Access Gateway, and micro-services memory is fixed. As a result, determining memory requirements is primarily focused on cache session sizing. Calculate cache session sizing using these factors:

  • Total sessions: The maximum number of in-memory sessions at any given time. Total sessions are calculated from the following elements:
    • Number of users
    • Percentage of user sign-in events per day
    • Applications accessed
    • Formula: Total sessions = number of users * % sign-ins per day * applications accessed
  • Average session size: The average expected size of any given session. Session size is a function of these elements:
    • Application session and application attributes, with a default size of approximately 1024 bytes
    • Kerberos tickets (where applicable), which are approximately 1024 bytes but are often larger, depending on the number of IIS applications accessed

To calculate cache session sizing, use this formula: Session cache = Total sessions * (average session size * 2)

Calculation examples: web application session cache

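In each example, Total sessions = Users * % sign-ins per day * applications accessed.

  • 5,000 users, 50% sign-in events per day, 5 applications accessed:
    • Total sessions: 5,000 * 50% * 5 = 12,500
    • Session size: 1024 B * 2
    • Session cache: approx 25 MB
  • 10,000 users, 75% sign-in events per day, 10 applications accessed:
    • Total sessions: 10,000 * 75% * 10 = 75,000
    • Session size: 1024 B * 2
    • Session cache: approx 150 MB
  • 25,000 users, 50% sign-in events per day, 100 applications accessed:
    • Total sessions: 25,000 * 50% * 100 = 1,250,000
    • Session size: 1024 B * 2
    • Session cache: approx 2.5 GB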
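The session cache formula can be sketched in Python as follows. The function and parameter names are hypothetical. The 1024-byte default and the doubling factor come from the formula above, and the sample inputs are the first two examples.

```python
def session_cache_bytes(users, sign_in_fraction, apps_accessed,
                        avg_session_bytes=1024):
    """Session cache = total sessions * (average session size * 2)."""
    total_sessions = users * sign_in_fraction * apps_accessed
    return total_sessions * (avg_session_bytes * 2)


for users, fraction, apps in [(5_000, 0.50, 5), (10_000, 0.75, 10)]:
    cache = session_cache_bytes(users, fraction, apps)
    # Report in decimal megabytes (1 MB = 1,000,000 bytes).
    print(f"{users} users: {cache / 1_000_000:.1f} MB")
```

The exact results (25.6 MB and 153.6 MB) round to the approximate figures in the examples.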

Calculation examples: Kerberos apps

Kerberos adds caching requirements.

  • 10,000 users, 50% sign-in events per day, 5 IIS applications accessed:
    • Total sessions: Users * % of user sign-ins per day * applications accessed = 10,000 * 50% * 5 = 25,000
    • Session size: 1024 B
    • Session cache: approx 50 MB

Total appliance memory should then include at least 1.5 gigabytes for fixed requirements, plus the session cache and any Kerberos requirements.
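A minimal sketch of the total memory estimate follows. It assumes decimal units (1 GB = 1,000,000,000 bytes) and applies the 128-megabyte cached-session minimum as a floor on the combined cache. Both choices, along with the names used, are assumptions for illustration.

```python
FIXED_BYTES = 1_500_000_000    # operating system, engine, and micro-services
MIN_CACHE_BYTES = 128_000_000  # cached-session minimum, applied as a floor (assumption)


def total_memory_bytes(web_cache_bytes, kerberos_cache_bytes=0):
    """Fixed requirement plus web and Kerberos session caches."""
    cache = max(web_cache_bytes + kerberos_cache_bytes, MIN_CACHE_BYTES)
    return FIXED_BYTES + cache


# Example: a 153.6 MB web session cache plus a 51.2 MB Kerberos cache.
total = total_memory_bytes(153_600_000, 51_200_000)
print(f"{total / 1_000_000_000:.2f} GB")
```

Treat the result as a lower bound, and add headroom for peak session usage.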

Session considerations:

  • Sessions are cleared using a Least Recently Used (LRU) algorithm.
    When the cache is full and new sessions are created, the oldest idle session is removed.
  • The Session Monitoring logger raises alerts for cache-full and near-full conditions. You can find statistics in the Access Gateway Management console.
    Consider increasing the appliance memory to reduce the incidence of full caches.
  • Always consider peak session usage situations and plan accordingly.
    For example, consider the times when loads typically increase, such as the time of year when employees are enrolled, morning and after lunch sign-in events, and similar situations.
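To illustrate how LRU eviction behaves when the cache fills, here is a generic Python sketch. It isn't Access Gateway's implementation. The class and method names are hypothetical, and the capacity is counted in sessions instead of bytes for simplicity.

```python
from collections import OrderedDict


class SessionCache:
    """Generic LRU cache: evicts the least recently used session when full."""

    def __init__(self, max_sessions):
        # Assumes max_sessions is at least 1.
        self.max_sessions = max_sessions
        self._sessions = OrderedDict()  # oldest idle session first

    def get(self, session_id):
        if session_id not in self._sessions:
            return None
        # Reading a session marks it as most recently used.
        self._sessions.move_to_end(session_id)
        return self._sessions[session_id]

    def put(self, session_id, data):
        if session_id in self._sessions:
            self._sessions.move_to_end(session_id)
        elif len(self._sessions) >= self.max_sessions:
            # Cache is full: remove the oldest idle session.
            self._sessions.popitem(last=False)
        self._sessions[session_id] = data

    def __contains__(self, session_id):
        return session_id in self._sessions


cache = SessionCache(max_sessions=2)
cache.put("a", {"user": "alice"})
cache.put("b", {"user": "bob"})
cache.get("a")                     # "a" is now the most recently used
cache.put("c", {"user": "carol"})  # cache is full, so "b" is evicted
print("b" in cache, "a" in cache, "c" in cache)
```

A user whose session is evicted has to establish a new session, which is why frequent cache-full alerts suggest adding memory.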

Hard-disk sizing

Access Gateway requires hard disk space for the following elements. Determine the number and size of these elements over a given time period:

  • Software: Access Gateway software and the operating system. The software file size is usually small.
  • Backups: Performed nightly and retained for 30 days. Backup file size is usually small.
  • System Log: Log files are spooled to the local disk, and include Authentication, Authorization, Audit, Access, Session, and All Log files. Typically, there's one entry for each HTTP/HTTPS request. Consider the number of users, the number of times users access applications, and the number of page views they make for that application.
  • Log archives: These are maintained for 30 days, and are rolled and compressed.

Calculation examples: log entry sizes and growth

Use these calculations to estimate how much your disk size needs will grow over time. Replace the example figures with your own. A reasonable rule is to allocate twice the expected consumption plus additional overhead space for software updates, configuration, and backups. In this example, monthly growth is about 6.9 gigabytes, so the disk requirement would be approximately 14 gigabytes plus 10 to 20% extra, or roughly 15 to 17 gigabytes for each month of growth.

  • User base: 10,000
  • Access rate: 75%
  • Applications per user: 10
  • Accesses per day: User base * Access rate * Applications per user = 10,000 * 0.75 * 10 = 75,000
  • Typical log entry size (Access Gateway, authorization, and authentication sessions): 1024 bytes * 3 = 3072 bytes
  • Disk size growth per day: Accesses per day * typical log entry size = 75,000 * 3072 bytes = 230,400,000 bytes, or approximately 230.4 megabytes per day
  • Disk size growth per month (30 days): Disk size growth per day * 30 = 230.4 megabytes * 30 = 6.912 gigabytes
  • Disk size growth per year (365 days): Disk size growth per day * 365 = 230.4 megabytes * 365 = 84.096 gigabytes
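The log growth arithmetic can be checked with a short Python sketch. The function and parameter names are hypothetical. The inputs are the example figures, with sizes in decimal units (1 MB = 1,000,000 bytes).

```python
def log_growth_bytes_per_day(user_base, access_rate, apps_per_user,
                             entry_bytes=1024 * 3):
    """Daily log growth = accesses per day * typical log entry size."""
    accesses_per_day = user_base * access_rate * apps_per_user
    return accesses_per_day * entry_bytes


per_day = log_growth_bytes_per_day(10_000, 0.75, 10)
per_month = per_day * 30
per_year = per_day * 365

print(f"per day:   {per_day / 1e6:.1f} MB")
print(f"per month: {per_month / 1e9:.3f} GB")
print(f"per year:  {per_year / 1e9:.3f} GB")
# Rule of thumb from the text: allocate twice the expected consumption.
print(f"monthly allocation (2x): {2 * per_month / 1e9:.1f} GB")
```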

Hard Disk considerations:

  • Monitor logger alerts for low disk space, and size the disk for maximum or peak request loads so that you avoid low-disk warnings. The check runs hourly, and gives warnings at 70% usage and alerts at 90% usage.
  • Every HTTP request results in audit and access logs.
  • Faster disk IO improves throughput.
  • Session size affects log volume because authorization and audit logs contain session contents.
  • Don't be conservative when calculating hard-disk sizing. Allocate double the estimated disk requirements so that bursts and large page requests don't result in low disk space warnings.
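If you want to mirror the 70% and 90% thresholds in your own monitoring, a check could look like the following Python sketch. This isn't the Access Gateway logger. The function names are hypothetical, and treating each threshold as inclusive is an assumption.

```python
import shutil

WARNING_PERCENT = 70  # warning threshold from the text
ALERT_PERCENT = 90    # alert threshold from the text


def disk_status(used_percent):
    """Classify disk usage; thresholds are treated as inclusive (assumption)."""
    if used_percent >= ALERT_PERCENT:
        return "alert"
    if used_percent >= WARNING_PERCENT:
        return "warning"
    return "ok"


def check_path(path="/"):
    """Return (used percent, status) for the filesystem that contains path."""
    usage = shutil.disk_usage(path)
    used_percent = usage.used / usage.total * 100
    return used_percent, disk_status(used_percent)


if __name__ == "__main__":
    percent, status = check_path("/")
    print(f"{percent:.1f}% used: {status}")
```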

CPU sizing

The Access Gateway engine autoscales across CPUs, which results in one worker per CPU. Each additional core results in an additional thread, which allows additional processing.

  • More CPUs or cores improve capacity.

  • Network throughput is typically the processing bottleneck, not CPU processing.

Throughput sizing

Throughput is the rate of data delivery through the network. In Access Gateway, the data delivered through the network includes authentication, authorization, and return content data:

  • Authentications = SAML assertions processed (from Okta to each application).
  • Authorizations = Policy check per HTTP request (all HTTP requests).

Each authentication request and each authorization request uses approximately 1024 bytes, resulting in 2048 bytes of returned data. You can calculate network throughput using this formula:

  • Sign-ins per second * (authentications + authorizations per second + returned data size).

To calculate the average network bandwidth, use this formula:

Average response size * average request arrival rate

If the average response is approximately 20 kilobytes in size and requests arrive at a rate of 500 per second, the result becomes 20 kilobytes * 500, which equals 10 megabytes per second.
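The average bandwidth calculation can be sketched in Python as follows. The function name is hypothetical. The sketch assumes decimal units (1 KB = 1,000 bytes) and a request arrival rate expressed per second.

```python
def average_bandwidth_bytes_per_second(avg_response_bytes, requests_per_second):
    """Average bandwidth = average response size * average request arrival rate."""
    return avg_response_bytes * requests_per_second


bandwidth = average_bandwidth_bytes_per_second(20_000, 500)
megabytes_per_second = bandwidth / 1_000_000
megabits_per_second = bandwidth * 8 / 1_000_000
print(f"{megabytes_per_second:.0f} MB/s ({megabits_per_second:.0f} Mbps)")
```

Converting to megabits per second makes it easier to compare the result with the rated speed of a network interface, such as a 1 Gbps NIC.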

You can find exact data, including the time to perform a request and the size of the return data, in the AuthN and All logs.

Instance sizing

The number of users and authentications that each instance size can support depends on many complex factors. The relationship between total disk and memory size available on the server, processor and network connection speeds, and other factors all affect capacity. You can deploy Access Gateway in different configurations to suit your environment. You can set up one or two instances of Access Gateway on hardware with multiple cores and larger memory modules. Or you may deploy Access Gateway on more and smaller instances spread across multiple servers in a cluster. For help with determining how many users your Access Gateway instances can support, use the calculations on this page, or contact Okta Support for help with your unique environment.

The following entries describe the minimum hardware configurations required to support different Access Gateway instance sizes. The number of users and apps shown are provided as guidelines only. Actual performance depends on factors unique to your environment.

Proof of concept

  • Physical/virtual hardware: 1 instance with the following hardware:
    • 2 cores and 2 gigabytes of memory
    • 220-gigabyte (default) hard drive
    • Single 1 Gbps NIC
  • AWS equivalent: t2.medium

Small

  • Use:
    • 1,000-5,000 users
    • 1-10 apps
  • Physical/virtual hardware: 2 instances, each with the following hardware:
    • 2 cores and 4 gigabytes of memory
    • 220-gigabyte (default) hard drive
    • Single 1 Gbps NIC
  • AWS equivalent: t2.medium

Medium

  • Use:
    • 5,000-20,000 users
    • 10-100 apps
  • Physical/virtual hardware: 3 instances, each with the following hardware:
    • 2 cores and 8 gigabytes of memory
    • 500-gigabyte hard drive
    • Single 1 Gbps NIC
  • AWS equivalent: m4.large

Large

  • Use:
    • 20,000 users or more
    • 100 apps or more
  • Physical/virtual hardware: 3 instances, each with the following hardware:
    • 4 cores and 16 gigabytes of memory
    • 500-gigabyte hard drive
    • Single 1 Gbps NIC
  • AWS equivalent: m4.xlarge

See AWS Instance Types.
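As a starting point, the size guidelines can be expressed as a simple lookup in Python. The function name and the handling of boundary values are assumptions. Where the user count and app count point to different sizes, this sketch picks the larger one, and it returns Small for anything below the Medium range.

```python
# Size -> (number of instances, AWS equivalent), from the guidelines above.
SIZES = {
    "Small": (2, "t2.medium"),
    "Medium": (3, "m4.large"),
    "Large": (3, "m4.xlarge"),
}


def suggest_instance_size(users, apps):
    """Suggest a size from user and app counts; boundaries go to the larger size."""
    if users >= 20_000 or apps >= 100:
        return "Large"
    if users >= 5_000 or apps >= 10:
        return "Medium"
    return "Small"


size = suggest_instance_size(users=10_000, apps=50)
instances, aws_type = SIZES[size]
print(size, instances, aws_type)
```

Treat the result as a guideline only and validate it against the memory, disk, and throughput calculations for your environment.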

Estimate scaling requirements

Scaling is the process of increasing or decreasing an Access Gateway cluster size.

Clusters can be scaled vertically or horizontally:

  • Scaled vertically: Add or remove memory, disk, or CPUs from a given instance.
  • Scaled horizontally: Add or remove Access Gateway instances from a cluster.

Okta recommends that you deploy all Access Gateway high-availability cluster members with the same CPU, memory, and disk configurations.

When examining cluster performance, consider the following tips:

  • The best performance increases result from adding CPUs or using solid-state disks.
  • To improve overall cluster throughput, Okta recommends horizontal scaling, that is, adding Access Gateway instances. For example, if a two-node cluster handles 1500 requests, you can double the capacity by adding two more nodes with the same CPU, memory, and disk configuration.
  • In general, horizontal scaling is linear due to Access Gateway's use of sticky sessions (session affinity). Access Gateway doesn't share sessions between nodes.
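Under the assumption that horizontal scaling is linear, you can estimate the number of nodes for a target load with a sketch like this one. The function name is hypothetical, and the per-node capacity is derived from the two-node example above.

```python
import math


def nodes_needed(target_requests, requests_per_node):
    """Nodes required for a target load, assuming capacity scales linearly."""
    return math.ceil(target_requests / requests_per_node)


# A two-node cluster that handles 1500 requests gives 750 requests per node.
per_node = 1500 / 2
print(nodes_needed(3000, per_node))  # doubling capacity requires four nodes
```

Measure the per-node capacity in your own environment, because network throughput or back-end applications might limit capacity before Access Gateway does.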

Capacity may be limited by other factors not related to Access Gateway, such as network throughput or the back-end application performance.
See Network interfaces for more information on Access Gateway networking and expanding networking throughput.