"Parallelizing" Computational Work

HTC-able

Research Examples

High Throughput Computing (HTC)

workload run as many independently-scheduled jobs (1 or more tasks per job)

task: a distinct unit of work within a workload

job: an individually-scheduled set of compute work (may execute 1 or more tasks)

workload: a related set of tasks defined by the user (per dataset, project, experiment, etc.)
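To make the three terms concrete, here is a minimal sketch of how a workload's tasks might be grouped into jobs. The function name `group_tasks_into_jobs`, the `sample_NN` task names, and the choice of four tasks per job are all hypothetical, picked only for illustration.

```python
from typing import Dict, List


def group_tasks_into_jobs(tasks: List[str], tasks_per_job: int) -> Dict[int, List[str]]:
    """Assign each task to a job; every job holds up to tasks_per_job tasks."""
    if tasks_per_job < 1:
        raise ValueError("tasks_per_job must be at least 1")
    jobs: Dict[int, List[str]] = {}
    for start in range(0, len(tasks), tasks_per_job):
        job_id = start // tasks_per_job
        jobs[job_id] = tasks[start:start + tasks_per_job]
    return jobs


# Hypothetical workload: 10 tasks, one per input sample.
workload = [f"sample_{i:02d}" for i in range(10)]
jobs = group_tasks_into_jobs(workload, tasks_per_job=4)

for job_id, job_tasks in jobs.items():
    print(job_id, job_tasks)
```

Each job would then be scheduled on its own; a job with several tasks simply runs them one after another.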

  • CPU speed and homogeneity become less important

  • number of concurrently running jobs becomes more important
  • easier to match to 1 CPU at a time than many (shorter wait)
  • easier recovery from failure
  • no special programming
  • managing quantity becomes important
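The point about easier recovery can be sketched as follows: because tasks are independent, only the ones that failed need to run again. This is an illustrative loop, not how any particular scheduler works; `run_workload`, `flaky_square`, and the simulated failure rule are assumptions made for the example.

```python
def run_workload(task_ids, run_task, max_rounds=3):
    """Run independent tasks; re-run only the failed ones, up to max_rounds."""
    results = {}
    pending = list(task_ids)
    for _ in range(max_rounds):
        failed = []
        for task_id in pending:
            try:
                results[task_id] = run_task(task_id)
            except Exception:
                # A failure affects this task only; the others keep their results.
                failed.append(task_id)
        pending = failed
        if not pending:
            break
    return results, pending


attempts = {}


def flaky_square(task_id):
    """Hypothetical task: every fourth task fails on its first attempt."""
    attempts[task_id] = attempts.get(task_id, 0) + 1
    if task_id % 4 == 0 and attempts[task_id] == 1:
        raise RuntimeError("simulated failure")
    return task_id * task_id


results, still_failed = run_workload(range(8), flaky_square)
print(len(results), "tasks finished;", len(still_failed), "still failing")
```

In this sketch, tasks 0 and 4 are retried once while the other six run exactly one time.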

Overall time will increase significantly as individual tasks become longer AND as the number of tasks grows in orders of magnitude.
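A rough back-of-the-envelope model shows how quickly total time grows, and how much the number of concurrently running jobs matters. The numbers (10,000 tasks of 2 hours each, 500 concurrent jobs) are invented for illustration, and the model assumes equal-length tasks that run in full "waves" with no scheduling overhead unless one is supplied.

```python
import math


def serial_hours(n_tasks, hours_per_task):
    """Total time when tasks run one after another on a single core."""
    return n_tasks * hours_per_task


def htc_hours(n_tasks, hours_per_task, concurrent_jobs, wait_hours_per_wave=0.0):
    """Simplified estimate: tasks run in waves of `concurrent_jobs` at a time."""
    waves = math.ceil(n_tasks / concurrent_jobs)
    return waves * (hours_per_task + wait_hours_per_wave)


# Hypothetical workload.
n_tasks = 10_000
hours_per_task = 2.0

serial = serial_hours(n_tasks, hours_per_task)
htc = htc_hours(n_tasks, hours_per_task, concurrent_jobs=500)

print(f"serial: {serial:.0f} hours")
print(f"500 concurrent jobs: {htc:.0f} hours")
```

Under these assumptions the serial run needs 20,000 hours (roughly 2.3 years), while 500 concurrent single-core jobs finish in about 40 hours.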

Parallelize, right?!?

Excellent for long simulations

- within-task parallelization

- interdependent sub-tasks

Require/benefit from:

shared filesystems

fast networking (e.g. InfiniBand)

CPU speed and homogeneity

Scheduler must wait for and reserve multiple processors for the same duration.

HPC approach often used for HTC-able work

Some software limits the number of processors to what is available on a single server. Even multi-server (MPI) software still limits users to an extent of parallelization that depends on the capacity and availability of a single cluster, filesystem, queue, and local network.

Expensive networking is poorly utilized.

Numerous small files and file-writes create problems for shared file systems.

The more scalability and parallelization desired...

greater susceptibility to failures

greater difficulty of recovery

What is High Throughput Computing (HTC)?

How to Do HTC Well

  • What is not HTC?
  • How to identify HTC-able research problems
  • What is needed to execute HTC well?

What does "High Throughput" mean?

At Scale ...

Where to Start?

Access to HTC Systems

Someone Else's (Free)

  • ask nicely for access
  • Open Science Grid
  • campus members
  • OSG Connect (individual researchers)

A Local HTC System

  • dedicated hardware
  • backfill another cluster and/or desktops

  • with HTCondor
  • combine the above!

Why High Throughput Computing?

presentation as dynamic Prezi at:

http://tinyurl.com/WhyHTC-Clemson

User

System/Scheduler

CPUs (don't have to be fast)

manage many compute jobs

match jobs with resources

break up workload

submit as independent jobs

understand resource needs

manage input and output
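One common way to break up a workload and manage its input is to split a single large input file into smaller files, one per job. The sketch below assumes a line-oriented text input; the file names (`all_inputs.txt`, `chunk_0000.txt`) and the chunk size are hypothetical.

```python
from pathlib import Path
import tempfile


def split_input(input_path: Path, out_dir: Path, lines_per_chunk: int) -> list:
    """Write consecutive groups of lines to separate chunk files, one per job."""
    out_dir.mkdir(parents=True, exist_ok=True)
    lines = input_path.read_text().splitlines(keepends=True)
    chunk_paths = []
    for index, start in enumerate(range(0, len(lines), lines_per_chunk)):
        chunk_path = out_dir / f"chunk_{index:04d}.txt"
        chunk_path.write_text("".join(lines[start:start + lines_per_chunk]))
        chunk_paths.append(chunk_path)
    return chunk_paths


# Demonstration in a temporary directory with a made-up 25-line input.
work_dir = Path(tempfile.mkdtemp())
input_file = work_dir / "all_inputs.txt"
input_file.write_text("".join(f"record {i}\n" for i in range(25)))

chunks = split_input(input_file, work_dir / "chunks", lines_per_chunk=10)
print([p.name for p in chunks])
```

Each chunk file can then be sent along with its own job, and the matching output file brought back when that job finishes.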

[Figure: tasks 1, 2, 3, ... n; axes: number of cores, time]

if order DOESN'T matter

break up input (as files)

port software, as needed

jobs short and/or resumable

readily transfer data/software

ability to backfill (kill and recover)

security measures

helps with both!

High Performance Computing (HPC)

if order DOES matter

[Figure: tasks 1, 2, 3, ... n; axes: number of cores, time]

Pay-for Services

  • Cycle Computing
  • Globus Genomics
  • others

parameter sweeps

statistical model optimization (MCMC, etc.)

text analysis

multi-start simulations

image analysis

If you're a researcher:

more research, faster

better research

If you're a provider:

better utilization of existing hardware

better research outcomes for your organization

lmichael@wisc.edu

scheduling/wait time

Serial Computing

[Figure: tasks 1, 2, 3, ... n on 1 core; axis: time]

www.lasvegas2005.org
