MervCodes

Tech Reviews From A Programmer

How to Optimize Docker Image Size: From 1GB to Under 100MB

If you have ever pulled a Docker image and watched the download bar crawl past 1GB, you know the pain. Bloated images slow down CI/CD pipelines, eat through storage budgets, increase attack surface, and make deployments sluggish. The good news is that with a handful of proven techniques, you can often shrink a Docker image from over 1GB to well under 100MB without sacrificing functionality.

This guide walks through the practical strategies that make it possible, from choosing the right base image to advanced multi-stage build patterns.

Why Docker Image Size Matters

Before diving into the how, it is worth understanding the why. Smaller images deliver concrete benefits across the entire software delivery lifecycle:

  • Faster builds and deployments. A 100MB image pushes to a registry and pulls to a production node in a fraction of the time a 1GB image takes. In autoscaling scenarios, this directly affects how quickly new instances come online.
  • Lower storage and bandwidth costs. Registries, CI runners, and container hosts all store and transfer images. Multiply a 900MB saving across hundreds of builds per week and the numbers add up fast.
  • Reduced attack surface. Every package, library, and shell utility inside a container is a potential vulnerability. Fewer components mean fewer CVEs to patch and fewer vectors for attackers to exploit.
  • Simpler debugging. When a container only contains what it needs, there is less noise to sift through during incident response.

Start With the Right Base Image

The single highest-impact change you can make is choosing a smaller base image. Here is a rough comparison of common options for context:

Base image                          Approx. uncompressed size
ubuntu:24.04                        ~78MB
node:22                             ~1.1GB
node:22-slim                        ~220MB
node:22-alpine                      ~140MB
python:3.13                         ~1GB
python:3.13-slim                    ~150MB
python:3.13-alpine                  ~55MB
golang:1.23                         ~820MB
alpine:3.20                         ~7MB
gcr.io/distroless/static-debian12   ~2MB
scratch                             0MB

The pattern is clear. The default "full" images ship an entire OS userland. The -slim variants strip out documentation, dev headers, and rarely used tools. The -alpine variants replace the Debian base and its glibc with Alpine Linux, which is built on musl libc and BusyBox. And scratch is literally an empty filesystem.

Practical tip: Start with -slim or -alpine variants of your language runtime. If you hit compatibility issues with Alpine (musl vs glibc), fall back to -slim rather than the full image.

Use Multi-Stage Builds

Multi-stage builds are the most powerful technique for producing minimal production images. The idea is simple: use one stage to build your application and a second stage to run it, copying over only the compiled artifact.

# Stage 1: Build
FROM golang:1.23 AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o /app/server .

# Stage 2: Run
FROM alpine:3.20
RUN apk add --no-cache ca-certificates
COPY --from=builder /app/server /server
ENTRYPOINT ["/server"]

The build stage pulls in the entire Go toolchain (820MB+), but the final image only contains the ~7MB Alpine base plus your compiled binary and root certificates. The result is typically 15-25MB.

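For statically linked binaries, you can go even further and use scratch. Because scratch contains no files at all, copy the builder's CA certificate bundle if your program makes outbound TLS connections:

FROM scratch
# scratch has no CA bundle; copy the builder's so TLS certificates can be verified
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=builder /app/server /server
ENTRYPOINT ["/server"]

This produces an image that holds little more than your binary, often under 10MB.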

Multi-stage builds work for interpreted languages too. For a Node.js application, you can use a full Node image to install dependencies and build assets, then copy just the production node_modules and built files into a slim runtime image.

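FROM node:22 AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
# Build with dev dependencies, then drop them so only production packages get copied
RUN npm run build && npm prune --omit=dev

# -slim is Debian/glibc like the builder, so native modules compiled above still load
FROM node:22-slim
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
COPY package*.json ./
EXPOSE 3000
CMD ["node", "dist/index.js"]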

Minimize Layers and Clean Up in the Same Layer

Each RUN instruction creates a new layer in the image. If you install packages in one layer and clean up in the next, the installed files still exist in the first layer and contribute to the total image size. Always combine installation and cleanup in a single RUN statement.

# Bad: cleanup happens in a separate layer, saving nothing
RUN apt-get update && apt-get install -y build-essential
RUN apt-get clean && rm -rf /var/lib/apt/lists/*

# Good: single layer, cache is removed before the layer is committed
RUN apt-get update && \
    apt-get install -y --no-install-recommends build-essential && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

Notice the --no-install-recommends flag. On Debian-based images, it stops APT from automatically installing "recommended" packages alongside the ones you asked for, sometimes saving 100MB or more.
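The same single-layer rule applies on Alpine. A minimal sketch, assuming a Python app whose requirements.txt includes packages that must be compiled from source (app.py is a hypothetical entry point): apk's --virtual flag groups the build tools under one name so the same RUN that used them can delete them.

```dockerfile
FROM python:3.13-alpine
WORKDIR /app
COPY requirements.txt .
# Install compilers under the virtual name .build-deps, build the
# dependencies, then delete the compilers, all within one layer
RUN apk add --no-cache --virtual .build-deps build-base && \
    pip install --no-cache-dir -r requirements.txt && \
    apk del .build-deps
COPY . .
# app.py is a placeholder for your entry point
CMD ["python", "app.py"]
```

If a compiled extension links against a shared library at runtime (a PostgreSQL driver needing libpq, for instance), install that runtime package outside the virtual group so apk del does not remove it.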

Leverage .dockerignore

Before Docker sends the build context to the daemon, it packages everything in the build directory. Without a .dockerignore file, that includes node_modules, .git, test fixtures, local IDE configuration, and any other large files that have no place in your image.

Create a .dockerignore file at the root of your project:

.git
node_modules
dist
*.md
.env*
.vscode
__pycache__
*.pyc
tests
coverage

This speeds up the build and prevents accidentally copying secrets or unnecessary bulk into the image.

Only Install What You Need

It is tempting to install "just in case" tools like curl, vim, wget, or git inside production containers. Resist the urge. Each additional package increases size and attack surface.

If you need a tool only during the build phase, install it in the builder stage of a multi-stage build. It will not appear in the final image.

For Alpine-based images, prefer apk add --no-cache to avoid caching the package index:

RUN apk add --no-cache curl
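As a sketch of keeping build-only tools out of the final image, assume a Node.js app with native addons that node-gyp must compile, a hypothetical entry point at src/index.js, and a .dockerignore that excludes node_modules, so COPY . . does not overwrite the installed modules. The compiler toolchain exists only in the builder stage. Both stages use the same Alpine-based image, so the compiled addons match the runtime's C library.

```dockerfile
FROM node:22-alpine AS builder
WORKDIR /app
# Python, make, and a C++ compiler are needed only to build native addons
RUN apk add --no-cache python3 make g++
COPY package*.json ./
RUN npm ci --omit=dev

FROM node:22-alpine
WORKDIR /app
# Only the installed modules come across; the toolchain stays behind
COPY --from=builder /app/node_modules ./node_modules
COPY . .
EXPOSE 3000
CMD ["node", "src/index.js"]
```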

Optimize Language-Specific Artifacts

Node.js

  • Use npm ci --omit=dev (or yarn install --production with Yarn Classic) to skip dev dependencies.
  • If you use a bundler like esbuild or webpack, bundle your application into a single file and skip copying node_modules entirely.
  • Prune unnecessary files from node_modules with tools like node-prune.
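For the bundling approach, a minimal sketch might look like this. It assumes esbuild is listed in devDependencies, src/index.js is the entry point (hypothetical), and no dependency ships a native addon, since esbuild cannot inline compiled .node files.

```dockerfile
FROM node:22-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
# Inline the app and its dependencies into a single file targeting Node.js 22
RUN npx esbuild src/index.js --bundle --platform=node --target=node22 --outfile=dist/server.js

FROM node:22-alpine
WORKDIR /app
# No node_modules: everything the app imports is inside server.js
COPY --from=builder /app/dist/server.js ./server.js
EXPOSE 3000
CMD ["node", "server.js"]
```

Packages that read files from their own directory at runtime, or load optional dependencies dynamically, may need to be marked with esbuild's --external option and copied into the image separately.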

Python

  • Use pip install --no-cache-dir to avoid caching wheel files.
  • Pin dependencies in a requirements.txt and only install what is needed.
  • Consider using pip install --target to install into a specific directory, then copy only that directory in a multi-stage build.
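The --target approach could be sketched as follows, assuming a requirements.txt and a hypothetical app.py entry point. The builder uses the full python:3.13 image, which ships compilers for packages that must be built from source. Both tags track the same Python version and Debian release, so the installed packages stay compatible with the slim runtime.

```dockerfile
FROM python:3.13 AS builder
WORKDIR /app
COPY requirements.txt .
# Install dependencies into a self-contained directory
RUN pip install --no-cache-dir --target=/install -r requirements.txt

FROM python:3.13-slim
WORKDIR /app
# Copy only the installed packages and put them on Python's import path
COPY --from=builder /install /opt/deps
ENV PYTHONPATH=/opt/deps
COPY . .
CMD ["python", "app.py"]
```

Two caveats: console scripts installed by packages land in /opt/deps/bin, which is not on PATH, and any shared libraries a compiled package needs at runtime must still be installed in the final stage.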

Go

  • Build with CGO_ENABLED=0 for fully static binaries that run on scratch or distroless.
  • Use go build -ldflags="-s -w" to strip debug information and symbol tables, reducing binary size by 20-30%.
  • Consider using UPX to compress the binary further, though this adds startup decompression time.
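Combining these flags, the Go build from the multi-stage example could become the following sketch (same assumed layout: a main package at the module root). It also adds -trimpath, a standard go build flag not mentioned above that removes local file-system paths from the binary, and leaves UPX out.

```dockerfile
FROM golang:1.23 AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
# CGO_ENABLED=0: static binary; -s -w: drop the symbol table and DWARF debug info;
# -trimpath: strip local build paths from the binary
RUN CGO_ENABLED=0 GOOS=linux go build -trimpath -ldflags="-s -w" -o /app/server .

FROM scratch
# CA bundle for outbound TLS, since scratch has none
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=builder /app/server /server
ENTRYPOINT ["/server"]
```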

Java

  • Use jlink to create a custom JRE containing only the modules your application uses, cutting the JRE from ~300MB to 30-50MB.
  • Prefer Eclipse Temurin Alpine images as your runtime base.
  • Strip debug information and unnecessary metadata from JAR files.
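A jlink build might be sketched like this, assuming a plain (non-modular) executable JAR at the hypothetical path target/app.jar with its dependencies shaded into it. jdeps, which ships with the JDK, reports the modules the JAR uses. This sketch pairs the Temurin JDK with debian:bookworm-slim, a glibc base, rather than Alpine, because the custom runtime must run on the same C library as the JDK that produced it.

```dockerfile
FROM eclipse-temurin:21-jdk AS builder
WORKDIR /app
COPY target/app.jar app.jar
# Ask jdeps which JDK modules the JAR needs, then build a runtime with only those
RUN jlink \
      --add-modules "$(jdeps --ignore-missing-deps -q --multi-release 21 --print-module-deps app.jar)" \
      --strip-debug --no-man-pages --no-header-files \
      --output /opt/jre

FROM debian:bookworm-slim
ENV JAVA_HOME=/opt/jre
ENV PATH="${JAVA_HOME}/bin:${PATH}"
COPY --from=builder /opt/jre /opt/jre
WORKDIR /app
COPY --from=builder /app/app.jar app.jar
ENTRYPOINT ["java", "-jar", "app.jar"]
```

Frameworks that load classes reflectively can need modules jdeps does not detect; add those to --add-modules by hand.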

Use Docker BuildKit and Cache Mounts

Docker BuildKit, the default builder since Docker Engine 23.0, offers cache mounts that let you cache package manager data across builds without including it in the final layer:

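# Debian/Ubuntu images ship a docker-clean hook that deletes downloaded .debs; disable it so the cache is kept
RUN rm -f /etc/apt/apt.conf.d/docker-clean && \
    echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' > /etc/apt/apt.conf.d/keep-cache
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,target=/var/lib/apt/lists,sharing=locked \
    apt-get update && apt-get install -y --no-install-recommends build-essential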

This keeps build times fast (cached packages are reused) while keeping the image clean (the cache mount is not part of the layer).
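Cache mounts work with language package managers too. A sketch for pip, assuming the build runs as root (the default in the official Python images), so pip's cache lives in /root/.cache/pip; requirements.txt and app.py are placeholders:

```dockerfile
FROM python:3.13-slim
WORKDIR /app
COPY requirements.txt .
# pip's cache lives in the cache mount, not in the image layer,
# so --no-cache-dir is unnecessary here
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt
COPY . .
CMD ["python", "app.py"]
```

The same pattern applies to npm, whose cache defaults to /root/.npm when running as root.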

Consider Distroless Images

Google's distroless images contain only your application and its runtime dependencies. They have no shell, no package manager, and no other OS utilities. This makes them both small and secure.

FROM gcr.io/distroless/static-debian12
COPY --from=builder /app/server /server
ENTRYPOINT ["/server"]

Distroless images are available for Java, Python, and Node.js, while the static and base variants cover compiled languages such as Go. The tradeoff is that debugging becomes harder since you cannot shell into the container, but for production workloads, that is often an acceptable price.
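For Node.js, a sketch using the distroless Node.js 22 image might look like this, assuming the same dist/index.js build output as the earlier examples. The builder is pinned to a Debian 12 (bookworm) image because distroless is built on Debian 12, so native modules compiled in the builder link against a matching glibc. The distroless Node.js images already set node as the ENTRYPOINT, so CMD supplies only the script path.

```dockerfile
FROM node:22-bookworm-slim AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build && npm prune --omit=dev

FROM gcr.io/distroless/nodejs22-debian12
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/package.json ./
EXPOSE 3000
# ENTRYPOINT is the node binary, so only the script is named here
CMD ["dist/index.js"]
```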

Analyze and Audit Your Images

You cannot optimize what you do not measure. Use tools to understand where the size is coming from:

  • docker image history <image> shows the size contribution of each layer.
  • dive <image> is an excellent open-source tool that lets you interactively explore each layer and identify wasted space.
  • docker scout analyzes images for both size and security vulnerabilities.

Run these tools after each optimization pass to validate your changes and identify the next target.

A Real-World Example: Putting It All Together

Here is a before-and-after for a typical Node.js API:

Before (1.1GB):

FROM node:22
WORKDIR /app
COPY . .
RUN npm install
EXPOSE 3000
CMD ["node", "src/index.js"]

After (85MB):

FROM node:22-alpine AS builder
WORKDIR /app
COPY package*.json ./
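# Install all dependencies: the build step usually needs devDependencies
RUN npm ci
COPY . .
# Build, remove devDependencies, then prune non-runtime files from node_modules
RUN npm run build && npm prune --omit=dev && npx node-prune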

FROM node:22-alpine
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/package.json ./
EXPOSE 3000
CMD ["node", "dist/index.js"]

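The key changes: Alpine base, multi-stage build, production-only dependencies, built output only, and pruned node_modules. Treat the size figures as illustrative, though: the node:22-alpine base alone is roughly 140MB uncompressed, so an 85MB result is only possible as a compressed (registry) size, whereas the 1.1GB "before" figure is uncompressed. Compared like for like, the reduction is smaller than 92% but still substantial.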

FAQ

Does Alpine cause compatibility issues?

Alpine uses musl libc instead of glibc. Most applications work fine, but some native modules (especially in Node.js and Python) may fail to compile or behave differently. If you run into issues, use the -slim variant of your language image instead. It is still much smaller than the full image while maintaining glibc compatibility.

Will a smaller image be slower at runtime?

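Generally no. Image size affects pull time and storage, not how fast the code runs once the container is up. The exception is a change of C library: moving to Alpine swaps glibc for musl, and some workloads (allocation-heavy ones in particular) can run measurably slower on musl, so benchmark before and after if performance matters.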

Can I use scratch for interpreted languages like Python or Node.js?

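Not directly, because interpreted languages need a runtime. However, you can use distroless images that include only the interpreter and essential libraries. Alternatively, you can bundle your application into a standalone executable using tools like pkg for Node.js (now deprecated by its maintainers in favor of Node.js's built-in single executable applications) or PyInstaller for Python. Such executables are usually dynamically linked against the C library, so a minimal base that includes glibc, such as gcr.io/distroless/cc-debian12, is a more reliable target than scratch.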

How do I debug a distroless or scratch-based container?

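Because these images have no shell, docker exec has nothing to open. Instead, attach a separate debugging container that shares the target's process and network namespaces, for example docker run -it --pid=container:<name> --network=container:<name> busybox sh, or use ephemeral debug containers in Kubernetes (kubectl debug). Distroless also publishes :debug tags that add a BusyBox shell. You can also build a separate debug variant of your Dockerfile that uses an Alpine base for troubleshooting.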
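The debug variant can live in the same Dockerfile as a separate build target. A sketch reusing the Go builder from earlier; the stage names debug and production are arbitrary labels:

```dockerfile
FROM golang:1.23 AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o /app/server .

# Troubleshooting image: same binary on Alpine, with a shell and curl
FROM alpine:3.20 AS debug
RUN apk add --no-cache ca-certificates curl
COPY --from=builder /app/server /server
ENTRYPOINT ["/server"]

# Production image; as the last stage, it is what a plain docker build produces
FROM gcr.io/distroless/static-debian12 AS production
COPY --from=builder /app/server /server
ENTRYPOINT ["/server"]
```

With a hypothetical image name of myapp, docker build --target debug -t myapp:debug . builds the troubleshooting image, while docker build -t myapp . builds production. BuildKit skips the debug stage entirely when it is not needed for the requested target.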

Do compressed layers make image size less important?

Docker registries store and transfer compressed layers, so the transfer size is smaller than the uncompressed size. However, the uncompressed size still matters because it determines disk usage on every host that runs the container, and it correlates with attack surface. Optimizing the uncompressed size almost always reduces the compressed size as well, though not necessarily in the same proportion, since different files compress differently.

How often should I rebuild my base images?

Rebuild or update your base images at least monthly to pick up security patches. Use specific version tags rather than latest so your builds are reproducible, and automate the update process with tools like Dependabot or Renovate.

Is there a minimum viable image size I should target?

There is no universal target, but as a guideline: Go services can easily be under 20MB, Node.js and Python services under 100-150MB, and Java services under 200MB. If your image significantly exceeds these ranges, there is likely room to optimize.
