mainframe, jcl

Proper way to define a few hundred GDGs at once


One of the practices at my site is that when a batch cycle kicks off, we allocate new generations of all of the GDGs that will be used in the entire run before running any programs.

This means we now have a scenario where we are allocating over 500 files before our processes even start. I have been tasked with finding ways to make this massive cycle more efficient. I was wondering which path I should take for these allocations:

  • Run 1 enormous IDCAMS step and make all of the new versions at once (current functionality)
  • Break the allocations down into multiple IDCAMS steps that only allocate a portion of the files

Is there much overhead in calling IDCAMS multiple times back to back?

I have a hunch that breaking these down into smaller steps could improve overall performance, but I don't really have a clear-cut way to test it. Our testing environment isn't a great place to gather metrics: our jobs usually end up with a low priority in JES, so they get bounced around a lot, which means elapsed time isn't a good indicator of what actually took place, and because these are IDCAMS allocations, the CPU stats are always low anyway.

TL;DR: Does anyone know which is more efficient, or how I can find out which is more efficient?


Solution

  • Truth is, defining several hundred datasets isn't something that should stress most modern z/OS systems if done properly. Each allocation goes through a predictable sequence of system services - catalog functions, allocation functions, security, SMF logging and so forth - and while there are certainly subtle differences, each takes a fairly similar amount of time no matter how you do it.

    As a rule of thumb, a typical new file allocation shouldn't take more than 100 milliseconds on a modern mainframe with average tuning, so 500 allocations at 100 milliseconds apiece works out to roughly 50 seconds even in the worst case. If it's taking more than maybe a minute to allocate your 500 datasets, you might have something wrong that has nothing to do with your use of IDCAMS.

    Just as an example, your job might fall into a low-priority class that gets starved for resources once it consumes a certain amount...in this case, it might just be waiting rather than being dispatched (a simple calculation of CPU time divided by elapsed time will tell you if this is the problem). If this is your problem, then a common way to "cheat" is to define the GDG in JCL rather than via IDCAMS...your JCL allocations take place at the priority of the batch initiator, which is usually higher than the job step itself. Keep in mind, though, that this means an error will result in a JCL error rather than the non-zero return code you might get from an error in IDCAMS.
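
    As a rough sketch of the JCL route (the dataset names, space, and DCB attributes here are invented - substitute your own), each new generation can be rolled in with an ordinary DD statement on a do-nothing step:

        //ALLOCGDG EXEC PGM=IEFBR14
        //* Each DD creates and catalogs a new (+1) generation;
        //* the allocation work happens under the initiator.
        //NEWGEN1  DD DSN=PROD.APP1.DAILY(+1),
        //            DISP=(NEW,CATLG,DELETE),
        //            UNIT=SYSDA,SPACE=(CYL,(10,5),RLSE),
        //            DCB=(RECFM=FB,LRECL=80)
        //NEWGEN2  DD DSN=PROD.APP2.DAILY(+1),
        //            DISP=(NEW,CATLG,DELETE),
        //            UNIT=SYSDA,SPACE=(CYL,(10,5),RLSE),
        //            DCB=(RECFM=FB,LRECL=80)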

    You might also want to check your GDG base definition - keeping huge numbers of generations tends to slow things down...perhaps you can come up with a better scheme that stores fewer total generations.
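
    For reference, the generation limit lives on the GDG base definition. A hedged example (the name and limit are placeholders; an existing base would need to be altered or deleted and redefined rather than defined again):

        //DEFGDG  EXEC PGM=IDCAMS
        //SYSPRINT DD SYSOUT=*
        //SYSIN    DD *
          DEFINE GDG (NAME(PROD.APP1.DAILY) -
                      LIMIT(30)             -
                      SCRATCH NOEMPTY)
        /*

    With SCRATCH and NOEMPTY, only the oldest generation is uncataloged and scratched as each new one rolls in past the limit.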

    One thing to do is to make sure your systems programmers have done a good job of tuning things properly, especially the catalog environment...there are many parameters that control caching, buffering and so forth, and having a properly tuned catalog is essential if you want good performance. There's a lot of good information in this IBM document. Most of the tasks require special authorization, so this is probably something you can't handle on your own.
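
    If it helps that conversation, the catalog address space can report on its own performance; your systems programmers can issue something like the following from a console (treat the exact operands as something to verify against your z/OS level):

        F CATALOG,REPORT,PERFORMANCE
        F CATALOG,REPORT,CACHE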

    If you're actually allocating disk space for the new datasets, you'll also want to make sure that your allocation parameters are good. For instance, if you're putting lots of datasets on the same disk volume, this would be a bad thing. Allocation does a lot of serialization at a volume level, so that means the more you can spread your datasets across multiple disk volumes, the less chance of contention. You can use tools like RMF (or whatever vendor product your site might have) to monitor enqueue delays and so on - this is often a culprit in slow allocation performance.
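
    As an illustration (the storage class and volume serial are made up), the difference is between pinning every allocation to one volume and letting SMS or a multi-volume esoteric spread the requests:

        //* Poor: every allocation serializes against the same volume
        //GDG1     DD DSN=PROD.APP1.DAILY(+1),DISP=(NEW,CATLG,DELETE),
        //            UNIT=3390,VOL=SER=PRD001,SPACE=(CYL,(10,5),RLSE)
        //* Better: let SMS choose placement across a pool of volumes
        //GDG2     DD DSN=PROD.APP2.DAILY(+1),DISP=(NEW,CATLG,DELETE),
        //            STORCLAS=SCBATCH,SPACE=(CYL,(10,5),RLSE)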

    It's an iterative process, and if you really want to be methodical about it, create a test job that allocates a bunch of your GDG files and collect performance statistics on it. Different allocation parameters and system settings will give you different throughput, and you'll want to home in on the best combinations rather than guess. No matter what your elapsed time, you can get service unit counts for CPU and I/O, and these are your best guide to figuring out what works best.
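
    A sketch of that kind of test harness (names and numbers are placeholders): keep the job tiny, vary one thing at a time, and compare the CPU and I/O service units recorded for each run in the SMF type 30 records or your shop's job-accounting reports.

        //* Run once per variation (different SPACE, STORCLAS, volume
        //* spread, number of DDs per step) and compare service units,
        //* not elapsed time, across the runs.
        //TESTALC EXEC PGM=IEFBR14
        //GDG0001  DD DSN=TEST.GDG.FILE001(+1),DISP=(NEW,CATLG,DELETE),
        //            STORCLAS=SCBATCH,SPACE=(CYL,(10,5),RLSE)
        //* ...repeat for a representative sample of your GDG bases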

    Once you convince yourself that the system is tuned properly and there are no unnecessary delays going on, the next choice is whether you want to trade CPU utilization for better elapsed time through techniques like parallelism. What you're doing is mostly I/O bound work, so assuming your system is tuned well, splitting your single job into multiple jobs, each handling a subset of the files, will use slightly more processor resources but will run much faster from an elapsed-time point of view. The benefit tops out when you either run out of processor engines or drive the catalog or your disks to high utilization.

    Just splitting your allocations into multiple jobs is a simple path to parallelism, assuming your site lets them run in parallel (that is, has enough batch initiators and so forth). If you do this and the elapsed time is no better than running one large job, then it's time to dig in and research where the contention is, as I explain above.
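
    A hedged sketch of that split (job names, classes, and the way the list is sliced are all invented): several small jobs, each owning a slice of the GDG list, submitted to a class with enough initiators to run them side by side.

        //ALLOC01 JOB (ACCT),'GDG SLICE 1',CLASS=A
        //STEP1   EXEC PGM=IEFBR14
        //* first slice of the GDG list
        //DD0001   DD DSN=PROD.APP1.DAILY(+1),DISP=(NEW,CATLG,DELETE),
        //            STORCLAS=SCBATCH,SPACE=(CYL,(10,5),RLSE)
        //* ...
        //ALLOC02 JOB (ACCT),'GDG SLICE 2',CLASS=A
        //STEP1   EXEC PGM=IEFBR14
        //* second slice, and so on for the remaining jobs
        //DD0126   DD DSN=PROD.APP9.DAILY(+1),DISP=(NEW,CATLG,DELETE),
        //            STORCLAS=SCBATCH,SPACE=(CYL,(10,5),RLSE)
        //* ...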

    If you're up for a little adventure, a nifty way to do lots of allocations in parallel is to use the UNIX Services shell and something like BPXWDYN instead of IDCAMS (be sure to specify the GDGNT flag to BPXWDYN). Done correctly, you can write yourself a shell script that launches any number of subprocesses, each doing a subset of your allocations. Configured properly, this has the advantage of running in one big address space, rather than batch jobs that would require multiple address spaces to achieve parallelism.
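
    A very rough sketch of that approach, with everything (file names, allocation keywords, slice files) as assumptions to adapt rather than a recipe: a small REXX exec reads GDG base names from a file and calls BPXWDYN for each one, and a shell script launches several copies of it in the background.

        /* rexx - allocgdg: allocate a (+1) generation for every   */
        /* GDG base named in the file passed as the first argument */
        parse arg listfile
        address syscall 'readfile (listfile) base.'
        do i = 1 to base.0
          dsn = strip(base.i)
          if dsn = '' then iterate
          req = "alloc dd(newgen) dsn('"dsn"(+1)') new catlg gdgnt",
                "cyl space(10,5) recfm(f,b) lrecl(80)"
          call bpxwdyn req        /* return code comes back in RESULT  */
          if result <> 0 then say 'Allocation failed for' dsn 'rc='result
          else call bpxwdyn 'free dd(newgen)'  /* free applies the catlg */
        end

    And a driver script, assuming the exec lives in the same directory, is marked executable, and each listN.txt file holds one slice of the GDG names:

        #!/bin/sh
        # launch one allocator per slice, then wait for all of them
        for slice in list1.txt list2.txt list3.txt list4.txt
        do
          ./allocgdg "$slice" &
        done
        wait

    The number of slices is the knob for how much parallelism you drive into the catalog, so start small and watch the contention indicators mentioned earlier.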