workload: a related set of tasks defined by the user (per dataset, project, experiment, etc.)
task: a distinct unit of work within a workload
job: an individually-scheduled set of compute work (may execute 1 or more tasks)
A workload runs as many independently-scheduled jobs (1 or more tasks per job).
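The definitions above can be made concrete with a small sketch (not from the talk; all names are illustrative): a workload of related tasks, split across independently-scheduled jobs.

```python
# Illustrative sketch of the workload/job/task hierarchy.
from dataclasses import dataclass

@dataclass
class Task:
    """A distinct unit of work (e.g. one input file to process)."""
    input_file: str

@dataclass
class Job:
    """An individually-scheduled set of compute work; runs 1+ tasks."""
    tasks: list

@dataclass
class Workload:
    """A user-defined set of related tasks, split across jobs."""
    jobs: list

# A workload of 6 tasks run as 3 independently-scheduled jobs:
tasks = [Task(f"input_{i}.dat") for i in range(6)]
jobs = [Job(tasks[i:i + 2]) for i in range(0, 6, 2)]
workload = Workload(jobs)
print(len(workload.jobs))                         # 3 jobs
print(sum(len(j.tasks) for j in workload.jobs))   # 6 tasks total
```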
Overall time increases significantly as individual tasks become longer AND as the number of tasks grows by orders of magnitude.
Parallelize, right?!?
Excellent for long simulations
- within-task parallelization
- interdependent sub-tasks
Require/benefit from:
shared filesystems
fast networking (e.g. Infiniband)
CPU speed and homogeneity
Scheduler must wait for and reserve multiple processors for the same duration.
Some software limits the number of processors to what is available on a single server. Even multi-server (MPI) software still limits users to a degree of parallelization dependent on the capacity and availability of a single cluster, filesystem, queue, and local network.
Expensive networking is poorly utilized.
Numerous small files and file-writes create problems for shared file systems.
The more scalability and parallelization desired...
- the greater the susceptibility to failures
- the greater the difficulty of recovery
presentation as dynamic Prezi at:
http://tinyurl.com/WhyHTC-Clemson
CPUs (don't have to be fast)
manage many compute jobs
match jobs with resources
break up workload
submit as independent jobs
understand resource needs
manage input and output
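The responsibilities above (break up the workload, submit pieces as independent jobs, manage input and output) can be sketched locally. This is a minimal stand-in, not a real scheduler: a thread pool plays the role of the cluster, and all function names are hypothetical.

```python
# HTC pattern sketch for an embarrassingly parallel workload:
# split the input, run each piece as an independent job, collect
# the outputs. A local pool stands in for the real scheduler.
from concurrent.futures import ThreadPoolExecutor

def run_job(chunk):
    """One independent job: process its own input, return its output."""
    return sum(x * x for x in chunk)   # stand-in computation

def split_workload(data, n_jobs):
    """Break the workload into n_jobs roughly equal input pieces."""
    size = (len(data) + n_jobs - 1) // n_jobs
    return [data[i:i + size] for i in range(0, len(data), size)]

data = list(range(100))
chunks = split_workload(data, 4)
with ThreadPoolExecutor(max_workers=4) as pool:
    outputs = list(pool.map(run_job, chunks))   # one output per job
print(sum(outputs))   # same answer as a serial run
```

Because the jobs share nothing, they could just as well run on different machines, which is the point of HTC.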
[Diagram: jobs 1, 2, 3, ... n running concurrently, each on its own core; axes: number of cores vs. time]
if order DOESN'T matter
break up input (as files)
port software, as needed
jobs short and/or resumable
readily transfer data/software
ability to backfill (kill&recover)
security measures
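"Jobs short and/or resumable" and "ability to backfill (kill&recover)" imply checkpointing: a killed job should restart from saved progress rather than from scratch. A minimal file-based sketch, assuming a simple JSON checkpoint (all names hypothetical):

```python
# File-based checkpoint sketch: the job records the last completed
# work item, so a killed-and-restarted job resumes there instead
# of starting over.
import json
import os
import tempfile

def run_resumable(items, ckpt_path):
    """Process items, checkpointing after each; resume if a checkpoint exists."""
    done = 0
    if os.path.exists(ckpt_path):
        with open(ckpt_path) as f:
            done = json.load(f)["done"]     # resume point
    results = []
    for i in range(done, len(items)):
        results.append(items[i] * 2)        # stand-in work
        with open(ckpt_path, "w") as f:
            json.dump({"done": i + 1}, f)   # checkpoint after each item
    return done, results

# Simulate a first run killed after 3 items; the restart picks up at item 3.
ckpt = os.path.join(tempfile.mkdtemp(), "job.ckpt")
with open(ckpt, "w") as f:
    json.dump({"done": 3}, f)
start, out = run_resumable(list(range(10)), ckpt)
print(start, len(out))   # resumed at 3, processed the remaining 7
```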
if order DOES matter
[Diagram: jobs 1, 2, 3, ... n run in sequence over time; axes: number of cores vs. time]
If you're a researcher:
more research, faster
better research
If you're a provider:
better utilization of existing hardware
better research outcomes for your organization
lmichael@wisc.edu
[Diagram: on 1 core, jobs 1, 2, 3, ... n run one at a time, each preceded by scheduling/wait time]
www.lasvegas2005.org