

Concurrency vs parallelism

By Coldwast · Updated Jul 1, 2026 · 9 min read · #concurrency #parallelism #performance
A macro photograph of a printed circuit board. Modern chips carry multiple cores, and using them at the same time is what turns concurrency into real parallelism.

Concurrency and parallelism are two of the most confused words in programming, and the confusion is understandable: both are about "doing more than one thing", and in casual speech people use them interchangeably. But they answer different questions. Concurrency is about how you structure a program to handle many tasks that overlap in time; parallelism is about actually running multiple tasks at the same instant on multiple processors. One is a way of organising work, the other is a way of executing it. This guide draws the line clearly, shows how they relate, and works through real examples in Python and Haskell.

The short definitions

Concurrency is the ability of a program to deal with many things at once by interleaving them: it starts a task, pauses it while it waits, switches to another, comes back later. The tasks make progress over overlapping periods of time, but not necessarily at the same instant. Concurrency is fundamentally about structure and composition - breaking a program into independently progressing pieces.

Parallelism is doing many things at literally the same time, which requires hardware that can execute more than one instruction stream simultaneously: multiple CPU cores, multiple processors, or a GPU. Parallelism is about execution - putting several pieces of work onto several workers so they finish sooner.

Rob Pike's framing

The clearest widely-cited framing comes from Rob Pike, one of Go's designers, in his talk "Concurrency Is Not Parallelism". His central point is that concurrency is about dealing with lots of things at once, while parallelism is about doing lots of things at once - concurrency is a way to structure software, and parallelism is a property of execution that a good concurrent structure can then exploit. In other words, concurrency is a design tool; parallelism is a runtime outcome. A well-structured concurrent program may run in parallel if the hardware allows, or it may run on a single core by interleaving - and it is still correct either way.

A kitchen analogy

Imagine one cook preparing three dishes. A concurrent cook does not stand and stare at the oven while a cake bakes: they put the cake in, start chopping vegetables for the next dish, stir a sauce, then check the cake. Only one pair of hands is ever moving, but three dishes are all "in progress". That is concurrency on a single worker - interleaving tasks so waiting time is never wasted.

Parallelism is hiring three cooks who each make one dish at the same time. Now three things really are happening at the same instant, because there are three workers. Notice you can have concurrency with one cook (structure, no simultaneity) and parallelism with many cooks (simultaneity). The two ideas are independent, which is exactly why they need separate words.

How they relate

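Because they are independent, all four combinations exist: neither (a plain sequential program on one core), concurrency alone, parallelism alone, and both together.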

Concurrency without parallelism. A single CPU core rapidly switching between tasks (time-slicing) runs many things "at once" from the user's point of view, but only one instruction stream executes at any given moment. This is how a single-core machine ran dozens of programs, and how an async event loop juggles thousands of network connections on one thread.

Parallelism needs multiple cores. To genuinely execute two computations at the same instant, you need at least two hardware execution units. On a laptop that means multiple CPU cores; at scale it means many-core servers. Because the achievable parallel speedup is bounded by the number of cores you can throw at a workload, renting a bigger box is a practical lever - multi-core cloud servers such as high-CPU droplets give you the cores a parallel workload actually needs.

Parallelism without (explicit) concurrency. A numeric library that splits a big matrix multiply across cores gives you parallel execution without you writing any concurrent control flow yourself.

Both together. This is the common modern case: a concurrent design (many tasks) mapped onto parallel hardware (many cores).
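To make the single-core interleaving described above concrete, here is a minimal sketch of time-slicing on one thread, using Python generators as stand-ins for tasks. The names task and round_robin are hypothetical, chosen only for illustration; a real OS scheduler or event loop is far more involved.

```python
from collections import deque

def task(name, steps):
    """A toy task that does `steps` units of work, yielding after each one."""
    for i in range(steps):
        yield f"{name}{i}"   # yield = "pause me here, let another task run"

def round_robin(tasks):
    """Run the tasks one step at a time in turn; return the order steps ran in."""
    queue = deque(tasks)
    trace = []
    while queue:
        current = queue.popleft()
        try:
            trace.append(next(current))   # run one step of this task
            queue.append(current)         # not finished: back of the queue
        except StopIteration:
            pass                          # finished: drop it
    return trace

trace = round_robin([task("a", 2), task("b", 3), task("c", 1)])
print(trace)   # ['a0', 'b0', 'c0', 'a1', 'b1', 'b2']
```

All three tasks are "in progress" over the same period, yet only one step ever runs at a time: concurrency without parallelism.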

Source code on a screen. Whether that code runs concurrently, in parallel, or both depends on the constructs you choose and the hardware it runs on.

Threads vs async

These are two common ways to express concurrency, and the difference matters.

Threads are independent lines of execution managed by the operating system, sharing the same memory. On a multi-core machine the OS can schedule different threads onto different cores, so threads can give you real parallelism. The cost is that shared mutable memory makes correctness hard (see race conditions below), and each thread carries some overhead.

Async (an event loop with async/await) runs many tasks on a single thread by cooperative switching: a task voluntarily yields at an await point whenever it would otherwise wait - for a network reply, a disk read, a timer. While one task waits, the loop runs another. Async gives you concurrency with very low overhead, but on its own it does not give parallelism, because everything runs on one thread. It shines when tasks spend most of their time waiting rather than computing.

Examples in Python

Python's standard library offers both styles. Async with asyncio is ideal for many I/O-bound tasks on one thread:

import asyncio

async def fetch(name):
    await asyncio.sleep(1)      # stands in for a slow network call
    return name

async def main():
    # three tasks overlap; total wait is about 1s, not 3s
    results = await asyncio.gather(fetch("a"), fetch("b"), fetch("c"))
    print(results)

asyncio.run(main())

The three "fetches" overlap in time even though a single thread runs them, because each yields control while it waits. That is concurrency without parallelism.
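One way to check that claim, as a rough sketch: record the thread each task ran on and time the whole batch. The shorter 0.2 s delay and the thread_ids bookkeeping are assumptions added for this illustration, not part of the example above.

```python
import asyncio
import threading
import time

async def fetch(name, delay=0.2):
    await asyncio.sleep(delay)            # stands in for a slow network call
    return name, threading.get_ident()    # record which thread ran this task

async def main():
    start = time.perf_counter()
    results = await asyncio.gather(fetch("a"), fetch("b"), fetch("c"))
    elapsed = time.perf_counter() - start
    thread_ids = {tid for _, tid in results}
    return elapsed, thread_ids

elapsed, thread_ids = asyncio.run(main())
print(f"elapsed: {elapsed:.2f}s, distinct threads: {len(thread_ids)}")
```

The elapsed time should land near one delay rather than three, and the set of thread ids should contain a single entry, since every task ran on the event loop's thread.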

Threads look similar but involve an important nuance. The standard interpreter, CPython, has a Global Interpreter Lock (GIL): a mutex that lets only one thread execute Python bytecode at a time. So Python threads give real concurrency, and their I/O waits genuinely overlap because the GIL is released while a thread blocks in the OS, but they do not give parallel speedup for pure CPU-bound Python code, because the GIL serialises the bytecode. For CPU-bound parallelism the classic answer is multiprocessing, which runs separate processes each with its own interpreter and its own GIL, spread across cores:

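from concurrent.futures import ProcessPoolExecutor

def heavy(n):
    return sum(i * i for i in range(n))   # CPU-bound work

# The guard matters where worker processes are spawned (Windows, macOS):
# each worker re-imports this module and must not start its own pool.
if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        print(list(pool.map(heavy, [10_000_000] * 4)))   # runs across cores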

Note that the GIL is a CPython implementation detail, not part of the language spec. CPython 3.13 introduced an optional free-threaded build (PEP 703, experimental in that release) that can run without it, but the mainstream default build still has the GIL. Treat CPU-bound thread parallelism in Python as limited unless you have verified your build.
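The I/O side of that nuance can be sketched with a small thread pool. Here time.sleep stands in for a blocking network or disk call, and slow_io is a hypothetical helper named for this illustration; CPython releases the GIL during the sleep, so the waits overlap.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_io(name, delay=0.2):
    time.sleep(delay)     # blocking wait; the GIL is released while sleeping
    return name

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(slow_io, ["a", "b", "c", "d"]))
elapsed = time.perf_counter() - start

print(results, f"{elapsed:.2f}s")   # the four waits overlap rather than add up
```

Run one after another, the four calls would take about 0.8 s; with four threads the total should be close to a single delay. Swapping slow_io for a pure-Python computation would not show the same gain on a GIL build.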

Examples in Haskell

Haskell is well known for strong support of both, and it keeps the two ideas cleanly separate. For concurrency, forkIO spawns a lightweight green thread; the runtime multiplexes huge numbers of them onto a small pool of OS threads and can spread them across cores:

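import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

main :: IO ()
main = do
  done <- newEmptyMVar
  _ <- forkIO (putStrLn "from another thread" >> putMVar done ())
  putStrLn "from main"
  -- block until the forked thread signals; otherwise main can return
  -- and end the program before the other thread has run
  takeMVar done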

For deterministic parallelism, Haskell offers a separate mechanism. Because pure functions have no side effects, evaluating two of them in parallel cannot change the result - so you can hint the runtime to do so without any locks. The par combinator creates a spark: a suggestion that a value may be evaluated in parallel, which the runtime can pick up on a spare core.

import Control.Parallel (par, pseq)

-- hint that 'a' may be evaluated in parallel with 'b'
parAdd :: Int -> Int -> Int
parAdd a b = a `par` (b `pseq` (a + b))

The separation is the point: forkIO is a concurrency tool (structuring interacting threads, including for I/O), while par/sparks are a parallelism tool for speeding up pure computation. To use multiple cores, link with GHC's threaded runtime (the -threaded flag) and run the program with an RTS option such as +RTS -N; passing RTS options on the command line generally requires building with -rtsopts.

When each one matters

A useful rule of thumb turns on whether your bottleneck is waiting or computing.

I/O-bound work leans on concurrency. If tasks spend most of their time waiting on the network, disk, or a database, you do not need many cores - you need a structure that stops one waiting task from blocking the others. Async or a modest thread pool lets one core keep thousands of connections in flight. Adding cores barely helps, because the cores would just sit idle waiting too.

CPU-bound work leans on parallelism. If tasks are busy computing - image processing, numeric simulation, compiling, cryptography - the way to finish sooner is to split the work across cores and run it at the same time. Here concurrency alone buys nothing; you need genuine parallel execution, and your speedup is capped by how many cores you have and how much of the work is parallelisable.
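That cap is usually stated as Amdahl's law: if a fraction p of the work can be parallelised across n cores, the best possible speedup is 1 / ((1 - p) + p / n). A minimal sketch, where the function name amdahl_speedup and the 90% figure are illustrative assumptions:

```python
def amdahl_speedup(parallel_fraction, cores):
    """Upper bound on speedup when `parallel_fraction` of the work
    (between 0 and 1) is spread evenly over `cores` workers."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)

for cores in (1, 2, 4, 8, 64):
    print(cores, round(amdahl_speedup(0.9, cores), 2))
```

With 90% of the work parallelisable, 4 cores give at most about 3.08x and 64 cores about 8.77x; no number of cores can exceed 10x, because the serial 10% always remains.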

Common pitfalls

Concurrent and parallel programs usually share memory or other state between tasks, and that is where bugs breed.

Race conditions. When two threads read and write the same data without coordination, the result depends on unpredictable timing. The classic example is two threads each doing count = count + 1: the read, add, and write can interleave so one update is lost. Fixes include locks/mutexes, atomic operations, or avoiding shared mutable state entirely (message passing, immutable data).
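As a sketch of the lock-based fix, the counter below wraps its increment in a threading.Lock so the read, add, and write cannot interleave. The Counter class and worker function are hypothetical names for this illustration.

```python
import threading

class Counter:
    """A shared counter whose increment is protected by a lock."""
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:       # only one thread at a time runs this block
            self.value += 1    # read, add, write happen as one unit

def worker(counter, times):
    for _ in range(times):
        counter.increment()

counter = Counter()
threads = [threading.Thread(target=worker, args=(counter, 10_000))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter.value)   # 40000
```

Without the lock the final value may come out lower than 40000; whether a given run actually loses updates depends on the interpreter version and on timing, which is exactly what makes races hard to reproduce.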

Deadlocks. When threads wait on each other in a cycle - thread A holds lock 1 and wants lock 2, thread B holds lock 2 and wants lock 1 - nobody can proceed and the program freezes. Acquiring locks in a consistent global order is a standard way to prevent it.
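A minimal sketch of consistent lock ordering, assuming two hypothetical accounts and a transfer function that always locks them in name order regardless of the transfer's direction:

```python
import threading

class Account:
    def __init__(self, name, balance):
        self.name = name
        self.balance = balance
        self.lock = threading.Lock()

def transfer(src, dst, amount):
    # Always acquire locks in one fixed global order (here: by account name),
    # whatever the direction of the transfer, so no lock cycle can form.
    first, second = sorted((src, dst), key=lambda acc: acc.name)
    with first.lock:
        with second.lock:
            src.balance -= amount
            dst.balance += amount

def repeat_transfer(src, dst, times):
    for _ in range(times):
        transfer(src, dst, 1)

a = Account("A", 100)
b = Account("B", 100)

# Two threads transfer in opposite directions: the classic deadlock setup.
t1 = threading.Thread(target=repeat_transfer, args=(a, b, 1000))
t2 = threading.Thread(target=repeat_transfer, args=(b, a, 1000))
t1.start()
t2.start()
t1.join()
t2.join()

print(a.balance, b.balance)   # 100 100
```

If each thread instead locked its own source account first, one could hold A while waiting for B and the other hold B while waiting for A, which is the cycle described above.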

Other hazards include starvation (a task never gets scheduled), and the false assumption that more threads always means more speed - past the core count, extra threads add scheduling overhead without extra parallelism. Immutable data and pure functions, as in functional style, sidestep whole categories of these bugs, which is a large part of why languages like Haskell make concurrency and parallelism feel safer.

Concurrency and parallelism are the runtime side of ideas that also shape how we design programs: expressing work as independent, composable steps is close to how functional programming avoids shared mutable state, and both build on the same foundations as any algorithm and its use of recursion to break work into smaller pieces.

Frequently asked questions

What is the difference between concurrency and parallelism?

Concurrency is structuring a program to deal with many tasks that overlap in time by interleaving them; parallelism is executing multiple tasks at literally the same instant on multiple cores. Concurrency is about structure and composition, parallelism is about simultaneous execution. You can have concurrency without parallelism (one core time-slicing many tasks), and parallelism needs more than one hardware execution unit.

Can you have concurrency without parallelism?

Yes. A single CPU core can run many tasks concurrently by rapidly switching between them (time-slicing), or an async event loop can juggle thousands of connections on one thread. The tasks overlap in time and all make progress, but only one instruction stream executes at any given moment, so there is no parallel execution. Parallelism specifically requires multiple cores or processors running at the same instant.

Does Python's GIL prevent parallelism?

The GIL in CPython lets only one thread run Python bytecode at a time, so threads do not give parallel speedup for CPU-bound Python code. Threads still help I/O-bound work, because the GIL is released while a thread waits on the OS. For CPU-bound parallelism, use multiprocessing (separate processes, each with its own interpreter) or a C-extension that releases the GIL. The GIL is a CPython detail, not part of the language spec.

Should I use threads or async?

Use async for many I/O-bound tasks that spend most of their time waiting, because an event loop can handle thousands of them cheaply on one thread. Use threads (or processes) when you need work scheduled by the OS or spread across cores. For CPU-bound parallel work in Python specifically, prefer multiprocessing over threads because of the GIL. Match the tool to whether your bottleneck is waiting or computing.

Is I/O-bound work concurrency and CPU-bound work parallelism?

As a rule of thumb, yes. I/O-bound work benefits from concurrency: you want a structure that keeps other tasks progressing while one waits, and extra cores barely help. CPU-bound work benefits from parallelism: the way to finish sooner is to split the computation across multiple cores and run it simultaneously, so your speedup is bounded by core count and how parallelisable the work is.

Independent, community-maintained guide. coldwa.st is a programming-resources site; this article is new, original explanatory writing about concurrency and parallelism. Language behaviour (the CPython GIL, Haskell's runtime, thread scheduling) evolves and varies by version; verify against your language's current documentation.