# What is a magic number?

> A magic number is a short fixed byte sequence at the start of a file that identifies its true format. Extensions are labels; magic numbers are the contract. This page documents 14 common signatures and the four verdicts our File Type Checker returns.

Source: <https://bousemutton.com/what-is-a-magic-number>

### Key facts

- **What is it?** A magic number is a fixed byte sequence (usually 2 to 8 bytes) at the start of a file that identifies the format.
- **Where does it live?** Almost always at byte offset 0. ISO 9660 disc images are the well-known exception: their `CD001` signature sits at offset 32769 (sector 16).
- **Extension or bytes?** The bytes win. Renaming `report.exe` to `report.pdf` changes the label, not the content. The magic bytes still read `4D 5A`.
- **What you get back** Our File Type Checker reports one of four verdicts: MATCH, MISMATCH, AMBIGUOUS, or UNKNOWN. Each is a deterministic answer about format identity, not safety.
- **Privacy** The free single-file flow runs entirely in your browser. The bytes never leave your device.
- **Safety scope** A magic-byte check identifies format, not malware. Use it as a first signal, not a virus verdict.
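
One way to picture the four verdicts is as a comparison between the filename's extension and the formats whose signatures matched the bytes. The sketch below is an assumption about how such a comparison could work, not the checker's actual code; the function name `verdict` and the shape of `candidates` are hypothetical.

```python
from pathlib import PurePath


def verdict(filename: str, candidates: list[set[str]]) -> str:
    """Compare a filename's extension with the formats its bytes matched.

    `candidates` holds one set of extensions per matching signature,
    e.g. [{".zip"}, {".docx"}] for a file that starts with 50 4B 03 04.
    """
    ext = PurePath(filename).suffix.lower()
    if not candidates:
        return "UNKNOWN"    # no known signature matched
    if len(candidates) > 1:
        return "AMBIGUOUS"  # several formats share this prefix
    return "MATCH" if ext in candidates[0] else "MISMATCH"
```

Under these assumptions, `verdict("report.pdf", [{".exe", ".dll", ".sys"}])` yields `MISMATCH`: the label says PDF, the bytes say Windows executable.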

### Magic-byte signature reference

| Format | Magic bytes (hex) | ASCII | Extensions | MIME type | Offset |
|--------|-------------------|-------|------------|-----------|--------|
| PDF | `25 50 44 46` | `%PDF` | .pdf | application/pdf | byte 0 |
| PNG | `89 50 4E 47 0D 0A 1A 0A` | `.PNG....` | .png | image/png | byte 0 |
| JPEG | `FF D8 FF` | `...` | .jpg, .jpeg | image/jpeg | byte 0 |
| GIF | `47 49 46 38 39 61` | `GIF89a` | .gif | image/gif | byte 0 |
| BMP | `42 4D` | `BM` | .bmp | image/bmp | byte 0 |
| ZIP | `50 4B 03 04` | `PK..` | .zip | application/zip | byte 0 |
| RAR (v5) | `52 61 72 21 1A 07 01 00` | `Rar!....` | .rar | application/vnd.rar | byte 0 |
| 7-Zip | `37 7A BC AF 27 1C` | `7z....` | .7z | application/x-7z-compressed | byte 0 |
| DOCX (Office Open XML) | `50 4B 03 04` | `PK..` | .docx | application/vnd.openxmlformats-officedocument.wordprocessingml.document | byte 0 |
| XLSX (Office Open XML) | `50 4B 03 04` | `PK..` | .xlsx | application/vnd.openxmlformats-officedocument.spreadsheetml.sheet | byte 0 |
| APK (Android package) | `50 4B 03 04` | `PK..` | .apk | application/vnd.android.package-archive | byte 0 |
| MP4 (ISO BMFF) | `00 00 00 20 66 74 79 70` | `....ftyp` | .mp4, .m4v | video/mp4 | byte 0 |
| Windows PE / EXE | `4D 5A` | `MZ` | .exe, .dll, .sys | application/vnd.microsoft.portable-executable | byte 0 |
| ISO 9660 | `43 44 30 30 31` | `CD001` | .iso | application/x-iso9660-image | byte 32769 (sector 16) |
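
The table translates naturally into a lookup over (name, offset, bytes) triples. The following is a minimal sketch, not the checker's implementation; `SIGNATURES` and `identify` are illustrative names. Two deliberate choices: the formats that share the ZIP prefix are grouped into one entry, and MP4 is matched on `ftyp` at offset 4 rather than on the full eight bytes.

```python
# (name, offset, signature bytes), based on the reference table.
SIGNATURES = [
    ("PDF", 0, bytes.fromhex("25504446")),
    ("PNG", 0, bytes.fromhex("89504E470D0A1A0A")),
    ("JPEG", 0, bytes.fromhex("FFD8FF")),
    ("GIF", 0, b"GIF89a"),
    ("GIF", 0, b"GIF87a"),
    ("BMP", 0, b"BM"),
    ("ZIP / DOCX / XLSX / APK", 0, bytes.fromhex("504B0304")),
    ("RAR (v5)", 0, bytes.fromhex("526172211A070100")),
    ("7-Zip", 0, bytes.fromhex("377ABCAF271C")),
    ("MP4", 4, b"ftyp"),  # assumption: match on the box type only
    ("Windows PE / EXE", 0, b"MZ"),
    ("ISO 9660", 32769, b"CD001"),
]


def identify(data: bytes) -> list[str]:
    """Return the names of all signatures found at their expected offsets."""
    return [
        name
        for name, offset, sig in SIGNATURES
        if data[offset:offset + len(sig)] == sig
    ]
```

Slicing past the end of `data` returns an empty byte string, so short inputs simply fail to match instead of raising an error.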

- **PDF** The four bytes spell `%PDF`. The PDF version follows immediately, e.g. `%PDF-1.7`.
- **PNG** Eight bytes. The trailing `0D 0A 1A 0A` (CR LF, Ctrl-Z, LF) is there so that a transfer which rewrites line endings corrupts the signature and the damage is detectable.
- **JPEG** `FF D8` is the Start of Image marker, and the `FF` that follows opens the next marker segment. The fourth byte distinguishes JFIF (`E0`) from EXIF (`E1`).
- **GIF** GIF89a is the modern variant. GIF87a (`47 49 46 38 37 61`) is the legacy spelling and is also valid.
- **BMP** Two bytes. Easy to spoof; pair with the 4-byte little-endian file size field at offset 2, which should equal the actual file size, to disambiguate.
- **ZIP** Local file header. Empty archives use `50 4B 05 06` (end-of-central-directory) instead.
- **RAR (v5)** RAR 5 signature. Older RAR 1.5 to 4.x files use a 7-byte signature ending in `00`.
- **7-Zip** Six bytes: `7z` in ASCII followed by four magic bytes (`BC AF 27 1C`).
- **DOCX (Office Open XML)** DOCX is a ZIP container. The signature alone cannot distinguish a Word doc from a generic ZIP. Look at the inner `[Content_Types].xml`.
- **XLSX (Office Open XML)** Same caveat as DOCX. Inspect the OOXML manifest to confirm the spreadsheet variant.
- **APK (Android package)** APK is a ZIP container with an Android manifest inside. ZIP signature alone is not sufficient proof.
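- **MP4 (ISO BMFF)** The first four bytes are the size of the `ftyp` box and vary from file to file (`00 00 00 20` is only one common value). The stable part is `ftyp` at offset 4, followed by the brand identifier. Common brands: `isom`, `mp42`, `iso5`.
- **Windows PE / EXE** Two bytes, the initials of Mark Zbikowski. The 4-byte little-endian value at offset 0x3C gives the position of the PE header; the actual `PE\0\0` magic sits at that position.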
- **ISO 9660** ISO 9660 places the volume descriptor at sector 16, so the signature lives at byte 32769 (16 * 2048 + 1).
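
The Windows PE note describes a two-step check: confirm `MZ`, then follow the pointer at 0x3C to the `PE\0\0` magic. A minimal sketch of that second tier follows; the function name `looks_like_pe` is hypothetical, and it assumes the whole header region is already in memory.

```python
import struct


def looks_like_pe(data: bytes) -> bool:
    """Check the MZ prefix, then the PE magic at the offset stored at 0x3C."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    # 4-byte little-endian offset of the PE header.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    # A slice past the end of the data is empty, so this is False, not an error.
    return data[pe_offset:pe_offset + 4] == b"PE\x00\x00"
```

A file that passes the `MZ` test but fails this one may be a plain DOS executable or a truncated file, which is why the two-byte signature alone is a weak signal.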

### Glossary

- **Magic number**: A fixed byte sequence at the start of a file used to identify its format. Synonyms: magic bytes, file signature.
- **File signature**: Another name for a magic number. Often used in forensics tooling and in the Wikipedia article on the topic.
- **File extension**: The trailing piece of a filename after the last dot, e.g. `.pdf`. A label, not a contract; trivial to change.
- **MIME type**: A two-part identifier (e.g. `application/pdf`) that describes a file format. Defined by RFC 6838 and registered with IANA.
- **Container format**: A file format that wraps other formats inside a single envelope. ZIP, MP4, and Matroska are containers; their magic number identifies the envelope, not the contents.
- **Polyglot file**: A single file that is valid as more than one format simultaneously, e.g. a PDF that is also a valid ZIP. Triggers an AMBIGUOUS verdict.
- **Double extension**: A filename with two extensions, e.g. `invoice.pdf.exe`, used to trick humans into thinking the file is the safer of the two formats.
- **Byte offset**: The position in the file where a value lives. Magic numbers are usually at offset 0; ISO 9660 is the rare exception with offset 32769.
- **PE / COFF**: Portable Executable / Common Object File Format. The Windows executable container format. A PE file starts with the `MZ` magic bytes and carries a second `PE\0\0` signature at the offset stored at byte 0x3C.
- **ftyp box**: The first box in an ISO Base Media File (MP4, MOV, HEIC). Its brand field is the second-tier signature that distinguishes MP4 variants.
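
The `ftyp` entry can be made concrete with a short sketch that pulls the brand out of the first 12 bytes. It assumes the `ftyp` box is the first box in the file and reads only the first brand field; `ftyp_major_brand` is an illustrative name.

```python
from typing import Optional


def ftyp_major_brand(head: bytes) -> Optional[str]:
    """Return the 4-character brand from a leading ftyp box, or None."""
    # Bytes 0-3: box size (varies), bytes 4-7: box type, bytes 8-11: brand.
    if len(head) < 12 or head[4:8] != b"ftyp":
        return None
    return head[8:12].decode("ascii", errors="replace")
```

For a file whose first bytes are `00 00 00 20 66 74 79 70 69 73 6F 6D`, this returns `isom`.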

### Frequently asked questions

#### Can renaming a file change its type?

No. The bytes inside the file do not change when you rename it. A `.pdf` filename does not turn an executable into a PDF. The magic-byte check reports the bytes as they actually are.
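
This is easy to verify for yourself. The sketch below writes some bytes to a temporary file, renames it, and reads the head back; the helper name `head_after_rename` is made up for the example.

```python
import os
import tempfile


def head_after_rename(content: bytes, old_name: str, new_name: str, n: int = 8) -> bytes:
    """Write content under old_name, rename to new_name, return the first n bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        old_path = os.path.join(tmp, old_name)
        new_path = os.path.join(tmp, new_name)
        with open(old_path, "wb") as f:
            f.write(content)
        os.rename(old_path, new_path)  # changes the label only
        with open(new_path, "rb") as f:
            return f.read(n)
```

Calling `head_after_rename(b"MZ\x90\x00", "report.exe", "report.pdf", 2)` returns `b"MZ"`: the file is now called a PDF, and the bytes still say executable.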

#### Why does a DOCX file start with `PK`?

DOCX is a ZIP archive containing XML and media files. ZIP archives start with `50 4B 03 04`; the first two bytes are `PK` in ASCII (the initials of Phil Katz, who designed PKZIP). The DOCX-specific marker lives inside the archive, not in the first bytes.
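
Looking inside the archive can be sketched with Python's standard `zipfile` module. This is a heuristic under stated assumptions, not the checker's method: it relies on the conventional part names `word/`, `xl/`, and `ppt/` instead of parsing `[Content_Types].xml`, and the function name `classify_zip_container` is hypothetical.

```python
import io
import zipfile


def classify_zip_container(data: bytes) -> str:
    """Guess which ZIP-based format a byte string is from its member names."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        names = set(archive.namelist())
    if "[Content_Types].xml" in names:
        # Assumption: conventional OOXML folder names identify the variant.
        if any(n.startswith("word/") for n in names):
            return "DOCX"
        if any(n.startswith("xl/") for n in names):
            return "XLSX"
        if any(n.startswith("ppt/") for n in names):
            return "PPTX"
        return "OOXML (unknown variant)"
    if "AndroidManifest.xml" in names:
        return "APK"
    return "ZIP"
```

A stricter version would read the content types declared in `[Content_Types].xml` itself, since folder names are a convention and can be spoofed.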

#### How many bytes do you actually need to read?

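For most formats, 4 to 8 bytes are enough. Our checker reads up to 4096 bytes from the head of the file because some formats hide the signature deeper in. ISO 9660 is the extreme case: its signature sits at offset 32769, past the end of a 4096-byte head read, so confirming it takes a read at that offset.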
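
For a deep signature such as ISO 9660, seeking is cheaper than reading everything before it. The following sketch shows the idea on any seekable binary stream; the names `ISO_SIGNATURE_OFFSET` and `has_iso9660_signature` are illustrative, and the in-memory "image" is a stand-in, not a real disc image.

```python
import io
from typing import BinaryIO

# Sector 16 starts at 16 * 2048; the signature follows a one-byte type field.
ISO_SIGNATURE_OFFSET = 16 * 2048 + 1


def has_iso9660_signature(stream: BinaryIO) -> bool:
    """Seek straight to the signature offset and compare five bytes."""
    stream.seek(ISO_SIGNATURE_OFFSET)
    return stream.read(5) == b"CD001"


# Usage with a fake image: zero padding up to the offset, then the signature.
fake_image = io.BytesIO(bytes(ISO_SIGNATURE_OFFSET) + b"CD001")
print(has_iso9660_signature(fake_image))  # True
```

Only five bytes are read in total, regardless of how large the image is.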

#### Do all files have a magic number?

No. Plain text files, source code, and CSV files have no fixed signature. They are detected by content classifiers (heuristics or, in our case, a small AI model) rather than by magic-byte lookup.
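
To illustrate the heuristic end of that spectrum, here is a deliberately crude sketch: treat a sample as text if it contains no NUL bytes and decodes as UTF-8. This is not the model the checker uses, and `looks_like_text` is a made-up name; real classifiers consider many more signals.

```python
import codecs


def looks_like_text(head: bytes) -> bool:
    """Crude heuristic: no NUL bytes and valid UTF-8 so far."""
    if b"\x00" in head:
        return False
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # final=False tolerates a multi-byte character cut off at the end.
        decoder.decode(head, final=False)
    except UnicodeDecodeError:
        return False
    return True
```

The incremental decoder matters because a fixed-size head read can end in the middle of a multi-byte character, which a plain `bytes.decode` call would reject.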

#### What does an AMBIGUOUS verdict mean?

Multiple known formats match the same prefix. Every Office Open XML document (DOCX, XLSX, PPTX) is a ZIP archive, so the magic bytes alone cannot distinguish them from each other or from a plain ZIP. The checker reports AMBIGUOUS and suggests inspecting the contents.

#### Is a magic-byte check the same as a malware scan?

No. A magic-byte check identifies format, not safety. A perfectly formed PDF can still contain malicious JavaScript. For malware verdicts you need an antivirus or EDR product. See the honest scope page for the boundary.

> Format identity is not safety. Use the magic-byte check as a first signal, then run a real malware scanner if the file came from somewhere you do not fully trust.
