
What is High Throughput Computing at Clemson

Lauren Michael

Updated July 20, 2016

Transcript

"Parallelizing" Computational Work

HTC-able

Research Examples

High Throughput Computing (HTC)

workload run as many independently-scheduled jobs (1 or more tasks per job)

task: a distinct unit of work within a workload

job: an individually-scheduled set of compute work (may execute 1 or more tasks)

workload: a related set of tasks defined by the user (per dataset, project, experiment, etc.)
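The three terms above can be illustrated with a minimal Python sketch. The function name `group_tasks_into_jobs`, the file names, and the choice of four tasks per job are hypothetical and not part of the original presentation; the sketch only shows how one workload of independent tasks might be packed into separately scheduled jobs.

```python
from typing import List, Sequence


def group_tasks_into_jobs(tasks: Sequence[str], tasks_per_job: int) -> List[List[str]]:
    """Split a workload (all tasks) into jobs holding at most tasks_per_job tasks each."""
    if tasks_per_job < 1:
        raise ValueError("tasks_per_job must be at least 1")
    return [
        list(tasks[start:start + tasks_per_job])
        for start in range(0, len(tasks), tasks_per_job)
    ]


# Hypothetical workload: 10 independent tasks, one per input file.
workload = [f"input_{n:03d}.txt" for n in range(10)]

# Each inner list is one job that a scheduler could place independently.
jobs = group_tasks_into_jobs(workload, tasks_per_job=4)
for job_id, job_tasks in enumerate(jobs):
    print(f"job {job_id}: {job_tasks}")
```

With `tasks_per_job=1`, every task becomes its own job; larger values trade fewer jobs to manage against longer individual run times.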

CPU speed and homogeneity become less important

number of concurrently running jobs becomes more important

easier to match to 1 CPU at a time than many (shorter wait)
easier recovery from failure
no special programming
managing quantity becomes important

Overall time will increase significantly as individual tasks become longer AND as the number of tasks grows by orders of magnitude.
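A rough back-of-the-envelope estimate makes this concrete. The sketch below is an idealized model, not a measurement: the function names and the example numbers (10,000 tasks, 2 hours each, 500 concurrent slots) are assumptions chosen for illustration, and the model ignores scheduling wait time, data transfer, and failures.

```python
import math


def serial_hours(n_tasks: int, hours_per_task: float) -> float:
    """Total time when tasks run one after another on a single core."""
    return n_tasks * hours_per_task


def concurrent_hours(n_tasks: int, hours_per_task: float, n_slots: int) -> float:
    """Idealized time when up to n_slots independent tasks run at once.

    Tasks are assumed to run in full 'waves' of n_slots; wait time,
    data transfer, and failures are deliberately ignored.
    """
    if n_slots < 1:
        raise ValueError("n_slots must be at least 1")
    waves = math.ceil(n_tasks / n_slots)
    return waves * hours_per_task


# Hypothetical workload: 10,000 tasks of 2 hours each.
one_core = serial_hours(10_000, 2.0)               # 20000.0 hours
many_slots = concurrent_hours(10_000, 2.0, 500)    # 40.0 hours
print(f"serial: {one_core} h, 500 concurrent slots: {many_slots} h")
```

In this model the serial estimate grows linearly with both task length and task count, while the concurrent estimate depends mainly on how many jobs can run at the same time.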

Parallelize, right?!?

Excellent for long simulations

- within-task parallelization

- interdependent sub-tasks

Require/benefit from:

shared filesystems

fast networking (e.g., InfiniBand)

CPU speed and homogeneity

Scheduler must wait for and reserve multiple processors for the same duration.

HPC approach often used for HTC-able work

Some software limits the number of processors to what is available on a single server. Even multi-server (MPI) software still limits the extent of parallelization to the capacity and availability of a single cluster, filesystem, queue, and local network.

Expensive networking is poorly utilized.

Numerous small files and file-writes create problems for shared file systems.

The more scalability and parallelization desired...

greater susceptibility to failures

greater difficulty of recovery

What is High Throughput Computing (HTC)?

How to Do HTC Well

What is not HTC?
How to identify HTC-able research problems
What is needed to execute HTC well?

What does "High Throughput" mean?

At Scale ...

Where to Start?

Access to HTC Systems

Someone Else's (Free)

ask nicely for access
Open Science Grid
campus members
OSG Connect (individual researchers)

A Local HTC System

dedicated hardware
backfill another cluster and/or desktops

with HTCondor
combine the above!

Why High Throughput Computing?

presentation as dynamic Prezi at:

http://tinyurl.com/WhyHTC-Clemson

User

System/Scheduler

CPUs (don't have to be fast)

manage many compute jobs

match jobs with resources

break up workload

submit as independent jobs

understand resource needs

manage input and output

number of cores

time

if order DOESN'T matter

break up input (as files)

port software, as needed

jobs short and/or resumable

readily transfer data/software

ability to backfill (kill & recover)

security measures

helps with both!
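The "break up input (as files)" step listed above can be sketched as follows. This is only one possible approach, assuming a line-oriented text input where each line is an independent record; the names `split_input`, `write_chunk`, `all_records.txt`, and the `chunk_NNNN.txt` naming scheme are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import List


def write_chunk(out_dir: Path, index: int, lines: List[str]) -> Path:
    """Write one chunk of lines to its own numbered file and return the path."""
    path = out_dir / f"chunk_{index:04d}.txt"
    path.write_text("".join(lines))
    return path


def split_input(input_path: Path, out_dir: Path, lines_per_chunk: int) -> List[Path]:
    """Split a line-oriented input file into smaller files, one per job."""
    if lines_per_chunk < 1:
        raise ValueError("lines_per_chunk must be at least 1")
    out_dir.mkdir(parents=True, exist_ok=True)
    chunk_paths: List[Path] = []
    chunk: List[str] = []
    with input_path.open() as src:
        for line in src:
            chunk.append(line)
            if len(chunk) == lines_per_chunk:
                chunk_paths.append(write_chunk(out_dir, len(chunk_paths), chunk))
                chunk = []
    if chunk:  # leftover lines form a final, shorter chunk
        chunk_paths.append(write_chunk(out_dir, len(chunk_paths), chunk))
    return chunk_paths


# Demonstration on a throwaway 10-line input file.
with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    source = tmp_dir / "all_records.txt"
    source.write_text("".join(f"record {n}\n" for n in range(10)))
    chunks = split_input(source, tmp_dir / "chunks", lines_per_chunk=4)
    chunk_names = [p.name for p in chunks]
    chunk_sizes = [len(p.read_text().splitlines()) for p in chunks]
    print(chunk_names, chunk_sizes)
```

Each resulting file can then serve as the input of one independently submitted job, which only works when the order of records does not matter.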

High Performance Computing (HPC)

if order DOES matter

number of cores

time

Pay-for Services

Cycle Computing
Globus Genomics
others

parameter sweeps

statistical model optimization (MCMC, etc.)

text analysis

multi-start simulations

image analysis

If you're a researcher:

more research, faster

better research

If you're a provider:

better utilization of existing hardware

better research outcomes for your organization

lmichael@wisc.edu

scheduling/wait time

Serial Computing (diagram: 1 core, plotted against time)