Bytes, nibbles, and highlighting: writing your own TUI hex editor in Python

30 May, 2026

Many of us have opened a hex editor to analyze a file or to learn reverse engineering. One day I came across an interesting article on color-coding bytes. Without much hesitation, I decided to write an article — not just a tutorial on highlighting bytes, but a deep dive into the inner workings of a minimal hex editor. We'll explore how bytes become a hex dump, what approaches to highlighting exist, and why this matters for analyzing binary data.

Before we begin, I’d like to show you the result:

Example

What We See When Opening a Hex Editor
- Byte, Nibble, and Hex Decomposition
- The ASCII Column
Highlighting!
Project Architecture
Practice!
Conclusion

What We See When Opening a Hex Editor

Let's start with a brief overview of how a hex editor is structured. As you know, it allows you to open binary files.

The foundation is the editor window itself:

Structure of a Hex Editor

It displays data in a matrix format.

The first part is the address, or offset: the distance in bytes from the beginning of the file. The first line starts at zero, and each subsequent line adds the dump width, that is, the number of bytes per line. Each byte is written as two hex digits, and addresses use hexadecimal too, for a reason: a decimal address like 00000016 tells you nothing about boundaries, while a hex address like 00000010 immediately shows that we've crossed a 16-byte boundary, which is exactly one dump line.

The second part is the bytes themselves. Bytes are typically laid out 16 per line (though other powers of two are possible) in columns. Bytes are grouped in fours for readability.

And the third part is the ASCII panel — the same bytes interpreted as characters (ASCII range from 0x20 to 0x7E, from space to tilde). If a character is non-printable, a dot is usually displayed instead.

In fact, all of this is the same set of bytes in three different representations.

This basic representation is the foundation for any editor. Moving forward, as we build our own editor, we'll layer on color, structure, and context, but the layout always remains the same: address, hex, ASCII.
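To make the three-column layout concrete, here is a minimal sketch of rendering a single dump line. The name format_row and its parameters are my own, chosen for illustration rather than taken from the project. The grouping by four and the dot placeholder follow the description above.

```python
def format_row(offset: int, chunk: bytes, width: int = 16, group: int = 4) -> str:
    """Render one dump line: hex address, grouped hex bytes, ASCII column."""
    cells = []
    for i in range(width):
        # Pad a short final row with blanks so the ASCII column stays aligned.
        cells.append(f"{chunk[i]:02X}" if i < len(chunk) else "  ")
    # Join bytes with one space, and groups of `group` bytes with two spaces.
    groups = [" ".join(cells[i:i + group]) for i in range(0, width, group)]
    # Printable ASCII is shown as-is, everything else as a dot.
    text = "".join(chr(b) if 0x20 <= b <= 0x7E else "·" for b in chunk)
    return f"{offset:08X}    {'  '.join(groups)}    {text}"


print(format_row(0, b"\x89PNG\r\n\x1a\n", width=8))
# 00000000    89 50 4E 47  0D 0A 1A 0A    ·PNG····
```

All three columns are computed from the same chunk of bytes; only the formatting differs.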

Byte, Nibble, and Hex Decomposition

Our hex editor will feature byte highlighting, which we'll implement later. But first, some theory.

A hex editor works with three things: bytes, bits, and nibbles. A byte is the smallest addressable unit of memory, equal to 8 bits. A nibble is 4 bits, or half a byte. The term "nibble" comes from the English word meaning "to take a small bite" (a play on words: byte → bite → nibble). A nibble takes a value from 0 to 15 — that is, 16 possible values. And, conveniently, there are 16 hexadecimal digits. As a result, one nibble equals one hex digit.

So the byte 0x7F consists of two halves: the high nibble 7 and the low nibble F.

Moreover, nibbles allow us to interpret a byte's value as a color, which is exactly what we'll explore in this article.

Let's look at an example. Take the byte with the value 0xA5 and see what's inside:

Byte 0xA5 Internally

How do you get a byte's value from nibbles? Easily: the high nibble is multiplied by 16 and added to the low nibble: 0xA * 16 + 0x5 = 10 * 16 + 5 = 165. Or in binary: 1010 0101₂ = 165₁₀ = 0xA5.

The processor doesn't operate on nibbles directly, but for us humans, it's convenient to think in terms of nibbles when reading a hex dump. When you see a cell like A5 in the hex panel, you can subconsciously split it into two parts: "A is the high half, 5 is the low half."

Now, an important implication for the hex editor. When we display a byte in the hex panel, we do three things: extract the high nibble with (byte >> 4) & 0x0F, extract the low nibble with byte & 0x0F, and convert each into a character from '0' to 'F'. This takes literally a few bitwise operations:

Example
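In text form, the same steps can be sketched as follows. The helper names (split_nibbles, byte_to_hex, join_nibbles) are hypothetical and exist only for this illustration.

```python
HEX_DIGITS = "0123456789ABCDEF"


def split_nibbles(byte: int) -> tuple[int, int]:
    """Return (high nibble, low nibble) of a byte."""
    return (byte >> 4) & 0x0F, byte & 0x0F


def byte_to_hex(byte: int) -> str:
    """Convert a byte to its two-character hex cell."""
    high, low = split_nibbles(byte)
    return HEX_DIGITS[high] + HEX_DIGITS[low]


def join_nibbles(high: int, low: int) -> int:
    """Rebuild the byte value: high nibble times 16 plus low nibble."""
    return high * 16 + low


print(split_nibbles(0xA5))      # (10, 5)
print(byte_to_hex(0xA5))        # A5
print(join_nibbles(0xA, 0x5))   # 165
```

In everyday code, f"{byte:02X}" does the same job; the explicit version just shows what happens underneath.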

Why do we group four bytes in the hex panel? Because four bytes = 32 bits, the size of a machine word on 32-bit architectures. A group of four bytes is eight nibbles, or eight hex digits. A whole 32-bit word looks like a continuous sequence of eight characters, like 0000000D.

Understanding that a byte consists of two nibbles, and a nibble is one hex digit, is crucial for everything described later in this article. When we start coloring the hex dump, we can choose: color each digit separately (nibble-based highlighting) or the whole byte (gradients based on value). These are two different perspectives on the same data, each providing its own information.

The ASCII Column

The ASCII panel exists so we can see that the bytes 48 65 6C 6C 6F represent "Hello".

ASCII (American Standard Code for Information Interchange) is a 7-bit encoding adopted in 1963 and revised in 1968. It describes 128 characters: 33 control characters and 95 printable ones. Despite its age, it remains the foundation upon which UTF-8, JSON strings, HTTP headers, file names, and almost everything related to text in computer systems are built.

The readable characters are the 95 symbols from space (0x20) to tilde (0x7E): Latin letters in both cases, digits, punctuation marks, brackets, and mathematical symbols.

To display a byte in ASCII format, a single line of code suffices:

ch = chr(b) if 32 <= b <= 126 else "·"

Let's analyze the first line of a PNG file:

00000000    89 50 4E 47  0D 0A 1A 0A     ·PNG····

Byte 0x89 → outside ASCII → ·
Byte 0x50 → printable, P → P
Byte 0x4E → printable, N → N
Byte 0x47 → printable, G → G
Byte 0x0D → CR, control → ·
Byte 0x0A → LF, control → ·

Furthermore, the ASCII column can be highlighted, as in the example from our hex editor (Python, curses):

import curses

ASCII_PRINTABLE_START: int = 32
ASCII_PRINTABLE_END: int = 126
PAIR_ASCII_BASE: int = 270
PAIR_HEX_BASE: int = 10

def ascii_color(bval: int) -> int:
    if ASCII_PRINTABLE_START <= bval <= ASCII_PRINTABLE_END:
        return curses.color_pair(PAIR_ASCII_BASE + (bval - ASCII_PRINTABLE_START))
    return curses.color_pair(PAIR_HEX_BASE + bval)

Printable characters (0x20–0x7E) get their own grayscale: 95 shades from dark gray for low codes to nearly white for high ones. Non-printable and out-of-ASCII-range bytes inherit the gradient from the hex panel, which we'll explore later.

It's also worth noting that while UTF-8 is backward compatible with ASCII (bytes 0x00–0x7F mean exactly the same thing), multi-byte characters (Cyrillic, CJK characters) are encoded as sequences of bytes with values 0x80 and above. So Russian text will appear as a scattering of dots.
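A quick way to see this effect is to encode a mixed string and run it through the same one-line rule used for the ASCII column. The function name ascii_column is an assumption made for this sketch.

```python
def ascii_column(data: bytes) -> str:
    """Render bytes the way the ASCII panel does: printable or a dot."""
    return "".join(chr(b) if 32 <= b <= 126 else "·" for b in data)


raw = "Hi, Привет".encode("utf-8")
print(raw.hex(" "))        # every Cyrillic letter takes two bytes, all >= 0x80
print(ascii_column(raw))   # Hi, ············
```

The four ASCII characters survive, while the six Cyrillic letters turn into twelve dots.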

Highlighting!

After reading the article "Your hex editor should color-code bytes", I got the idea for this article: developing a hex editor with highlighting as its killer feature.

Here, I want to examine three levels of highlighting: basic nibble-based highlighting, gradient highlighting based on byte value, and, as a bonus feature, structural highlighting based on file format.

The project source code is available in my repository.

Nibble-Based Highlighting

The idea is to color both hex digits of a byte according to the value of its high nibble. Since a nibble has 16 possible values, this yields 16 color slots (the palette below reuses a few colors).

This is implemented in the codebase in the file simple_colored.py:

def color_for_high_nibble(byte_value: int) -> str:
    high_nibble = (byte_value >> 4) & 0x0F

    if byte_value == 0x00:
        return "\033[90m"
    if byte_value == 0xFF:
        return "\033[97m"

    colors = [              # Colors (you can choose others):
        "\033[91m",         # Bright Red
        "\033[38;5;208m",   # Orange (256 colors)
        "\033[93m",         # Bright Yellow
        "\033[92m",         # Bright Green
        "\033[38;5;82m",    # Bright Yellowish-Green (256 colors)
        "\033[96m",         # Bright Cyan
        "\033[94m",         # Bright Blue
        "\033[95m",         # Bright Magenta
        "\033[38;5;205m",   # Pinkish-Purple (256 colors)
        "\033[38;5;50m",    # Light Green-Cyan (256 colors)
        "\033[38;5;39m",    # Bright Blue (256 colors)
        "\033[95m",         # Repeat — Bright Magenta
        "\033[35m",         # Purple (non-bright)
        "\033[91m",         # Repeat — Bright Red
        "\033[90m",         # Dark Gray / Bright Black
        "\033[91m",         # Again Bright Red
    ]

    return colors[high_nibble]

The color is tied to the high nibble of the byte: bytes 0x00–0x0F get one color for the digit pair, 0x10–0x1F get another, and so on. Zero and 0xFF are handled separately.

This highlighting algorithm reveals recurring patterns. Two identical bytes yield the same color pair of two hex digits.

Nibble-Based Highlighting Example

Of course, there's a limitation: classification is based on the nibble, not on meaning. Bytes 0x00 (NUL) and 0x20 (SPACE) will receive different colors, even though both relate to control/separator characters. The color answers the question "What is the value of the high nibble?" rather than "What does this byte represent?".

Gradient Highlighting

Unlike discrete coloring by nibble, here the color continuously depends on the byte's value. In the current project, this is implemented in the gradient_colored.py example:

COLOR_RESET = "\033[0m"


def hsv_to_rgb(h: float, s: float, v: float) -> tuple[int, int, int]:
    h = h % 360.0
    c = v * s
    x = c * (1.0 - abs((h / 60.0) % 2.0 - 1.0))
    m = v - c

    if 0 <= h < 60:
        r, g, b = c, x, 0.0
    elif 60 <= h < 120:
        r, g, b = x, c, 0.0
    elif 120 <= h < 180:
        r, g, b = 0.0, c, x
    elif 180 <= h < 240:
        r, g, b = 0.0, x, c
    elif 240 <= h < 300:
        r, g, b = x, 0.0, c
    else:
        r, g, b = c, 0.0, x

    return int((r + m) * 255), int((g + m) * 255), int((b + m) * 255)


def gradient_color(byte_value: int) -> str:
    if byte_value == 0x00:
        return "\033[38;2;64;64;64m"

    if byte_value == 0xFF:
        return "\033[38;2;255;255;255m"

    hue = (byte_value / 255.0) * 360.0

    r, g, b = hsv_to_rgb(hue, 0.8, 0.9)

    return f"\033[38;2;{r};{g};{b}m"

Mapping the byte value (0–255) onto the HSV color wheel gives a continuous spectrum: hue runs from 0° to 360° proportionally to the value. I picked saturation 0.8 and brightness 0.9 by experiment to keep the colors comfortable to look at.

The result is a heatmap, where sharp boundaries between different colors hint at where the file's data characteristics change.

Gradient Highlighting Example

In the main editor, gradient highlighting is implemented in colors.py. The function _byte_to_rgb does exactly the same thing as gradient_color in the example, but returns RGB components instead of an ANSI escape sequence, since our main editor is built with curses. Curses is a TUI library that handles color through numbered color pairs (a foreground plus a background), so each byte value gets a pair of its own.

def _byte_to_rgb(bval: int) -> tuple[int, int, int]:
    if bval == BYTE_MIN:
        return DEFAULT_BYTE_RGB
    if bval == BYTE_MAX:
        return MAX_BYTE_RGB

    hue = (bval / 255.0) * 360.0
    return _hsv_to_rgb(hue, 0.8, 0.9)

During initialization, a separate color pair is created for each of the 256 possible byte values, but if the terminal doesn't support changing the palette, all bytes receive a white color.

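Gradient highlighting can already answer the question "What is the byte's value?". Unlike the nibble-based approach, here the color conveys information about magnitude: two adjacent values like 0x7F and 0x80 will be nearly the same shade, while distant values like 0x20 and 0x90 differ clearly. One caveat: hue is circular, so values just above 0x00 and just below 0xFF both land near red. The bytes 0x00 and 0xFF themselves are special-cased as dark gray and white, which keeps the two extremes distinguishable.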
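It is easy to check how the mapping behaves for neighboring values and near the ends of the range. The sketch below uses the standard colorsys module instead of the hand-written converter, and byte_rgb is a hypothetical name. It deliberately leaves out the special cases for 0x00 and 0xFF.

```python
import colorsys


def byte_rgb(value: int) -> tuple[int, int, int]:
    """Map a byte onto the hue circle with saturation 0.8 and brightness 0.9."""
    # colorsys expects hue in the range 0..1 rather than in degrees.
    r, g, b = colorsys.hsv_to_rgb(value / 255.0, 0.8, 0.9)
    return int(r * 255), int(g * 255), int(b * 255)


for value in (0x01, 0x7F, 0x80, 0xFE):
    print(f"0x{value:02X} -> {byte_rgb(value)}")
```

Adjacent values such as 0x7F and 0x80 come out almost identical. 0x01 and 0xFE also come out close to each other, both near red, because the hue circle closes on itself.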

Structural Highlighting by Format

Gradients undeniably make the user experience friendlier. But I couldn't shake the thought that it's possible to highlight file format signatures separately!

Gradients are great for a general overview, but they don't answer the question "What does this byte mean?". A PNG signature \x89PNG and an image width field — both just get a color based on their value. To give the editor context, a third layer was added: structural highlighting based on the file format.

Signature Example

When opening a file, our editor reads the first 1024 bytes and runs them through a format detection function. This function, in turn, iterates over all registered formats and checks them against a list of signatures. If everything matches, the format is identified.

The project includes four built-in formats. PNG looks for \x89PNG\r\n\x1a\n at offset 0. ELF looks for \x7fELF. JPEG looks for \xff\xd8\xff. ZIP looks for PK\x03\x04. Each is described as a FormatDef — a dataclass with a name, MIME type, list of signatures, and list of fields.

from dataclasses import dataclass, field

# FieldDef is shown in the next listing.
@dataclass
class FormatDef:
    name: str
    mime: str
    signatures: list[tuple[int, bytes]]
    fields: list[FieldDef]
    _index: dict[int, FieldDef] = field(default_factory=dict, init=False, repr=False)

And here is the list of built-in formats. I confess, I took a shortcut here to ease the workload, as this is routine and tedious work:

from enum import Enum, auto

class FieldType(Enum): # Field Type
    MAGIC = auto()
    SIZE = auto()
    OFFSET = auto()
    FLAGS = auto()
    CHECKSUM = auto()
    VERSION = auto()
    DATA = auto()
    RESERVED = auto()
    HEADER = auto()
    UNKNOWN = auto()


@dataclass
class FieldDef: # Field Definition
    offset: int
    length: int
    name: str
    ftype: FieldType


BUILTIN_FORMATS: list[FormatDef] = [
    FormatDef(
        name="PNG",
        mime="image/png",
        signatures=[(0, b"\x89PNG\r\n\x1a\n")],
        fields=[
            FieldDef(0, 8, "Signature", FieldType.MAGIC),
            FieldDef(8, 4, "IHDR Length", FieldType.SIZE),
            FieldDef(12, 4, "IHDR Chunk Type", FieldType.HEADER),
            FieldDef(16, 4, "Width", FieldType.SIZE),
            FieldDef(20, 4, "Height", FieldType.SIZE),
            FieldDef(24, 1, "Bit Depth", FieldType.FLAGS),
            FieldDef(25, 1, "Color Type", FieldType.FLAGS),
            FieldDef(26, 1, "Compression", FieldType.FLAGS),
            FieldDef(27, 1, "Filter", FieldType.FLAGS),
            FieldDef(28, 1, "Interlace", FieldType.FLAGS),
            FieldDef(29, 4, "CRC", FieldType.CHECKSUM),
        ],
    ),
    FormatDef(
        name="ELF",
        mime="application/x-elf",
        signatures=[(0, b"\x7fELF")],
        fields=[
            FieldDef(0, 4, "Magic", FieldType.MAGIC),
            FieldDef(4, 1, "Class", FieldType.VERSION),
            FieldDef(5, 1, "Endianness", FieldType.FLAGS),
            FieldDef(6, 1, "Version", FieldType.VERSION),
            FieldDef(7, 1, "OS/ABI", FieldType.FLAGS),
            FieldDef(8, 1, "ABI Version", FieldType.VERSION),
            FieldDef(9, 7, "Padding", FieldType.RESERVED),
            FieldDef(16, 2, "Type", FieldType.FLAGS),
            FieldDef(18, 2, "Machine", FieldType.FLAGS),
            FieldDef(20, 4, "ELF Version", FieldType.VERSION),
            FieldDef(24, 4, "Entry Point (32-bit)", FieldType.OFFSET),
            FieldDef(28, 4, "PH Offset (32-bit)", FieldType.OFFSET),
            FieldDef(32, 4, "SH Offset (32-bit)", FieldType.OFFSET),
            FieldDef(36, 4, "Flags", FieldType.FLAGS),
            FieldDef(40, 2, "Header Size", FieldType.SIZE),
            FieldDef(42, 2, "PH Entry Size", FieldType.SIZE),
            FieldDef(44, 2, "PH Count", FieldType.SIZE),
            FieldDef(46, 2, "SH Entry Size", FieldType.SIZE),
            FieldDef(48, 2, "SH Count", FieldType.SIZE),
            FieldDef(50, 2, "SH String Index", FieldType.OFFSET),
        ],
    ),
    FormatDef(
        name="JPEG",
        mime="image/jpeg",
        signatures=[(0, b"\xff\xd8\xff")],
        fields=[
            FieldDef(0, 2, "SOI Marker", FieldType.MAGIC),
            FieldDef(2, 2, "APP0 Marker", FieldType.MAGIC),  # FF E0, one 2-byte marker
            FieldDef(4, 2, "APP0 Length", FieldType.SIZE),
            FieldDef(6, 5, "JFIF Identifier", FieldType.HEADER),
            FieldDef(11, 2, "JFIF Version", FieldType.VERSION),
            FieldDef(13, 1, "Density Units", FieldType.FLAGS),
            FieldDef(14, 2, "X Density", FieldType.SIZE),
            FieldDef(16, 2, "Y Density", FieldType.SIZE),
            FieldDef(18, 1, "Thumbnail Width", FieldType.SIZE),
            FieldDef(19, 1, "Thumbnail Height", FieldType.SIZE),
        ],
    ),
    FormatDef(
        name="ZIP",
        mime="application/zip",
        signatures=[(0, b"PK\x03\x04")],
        fields=[
            FieldDef(0, 4, "Local File Signature", FieldType.MAGIC),
            FieldDef(4, 2, "Version Needed", FieldType.VERSION),
            FieldDef(6, 2, "Flags", FieldType.FLAGS),
            FieldDef(8, 2, "Compression Method", FieldType.FLAGS),
            FieldDef(10, 2, "Last Mod Time", FieldType.DATA),
            FieldDef(12, 2, "Last Mod Date", FieldType.DATA),
            FieldDef(14, 4, "CRC-32", FieldType.CHECKSUM),
            FieldDef(18, 4, "Compressed Size", FieldType.SIZE),
            FieldDef(22, 4, "Uncompressed Size", FieldType.SIZE),
            FieldDef(26, 2, "Filename Length", FieldType.SIZE),
            FieldDef(28, 2, "Extra Field Length", FieldType.SIZE),
        ],
    ),
]

Each field in a format has a type from the FieldType enumeration and a human-readable name. The type determines the color in both the hex panel and the ASCII column:

Field Type	Color	Purpose
MAGIC	Yellow	signatures and magic numbers
SIZE	Green	sizes of blocks, fields, files
OFFSET	Cyan	pointers, offsets
CHECKSUM	Red	checksums, CRC
VERSION	Blue	format versions
FLAGS	Magenta	bit flags, enumerations
HEADER	Black on Yellow background	section headers
RESERVED	Default foreground on White background	reserved fields
DATA	White	data area, payload
UNKNOWN	White	type not specified or unknown

Color pairs for field types are initialized in _init_field_pairs() using standard curses colors. These are separate pairs that don't overlap with the gradient slots.

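def _init_field_pairs() -> None:
    # -1 selects the terminal's default color; this only works after
    # curses.use_default_colors() has been called.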
    field_colors = [
        (PAIR_FIELD_MAGIC, curses.COLOR_YELLOW, -1),
        (PAIR_FIELD_SIZE, curses.COLOR_GREEN, -1),
        (PAIR_FIELD_OFFSET, curses.COLOR_CYAN, -1),
        (PAIR_FIELD_FLAGS, curses.COLOR_MAGENTA, -1),
        (PAIR_FIELD_CHECKSUM, curses.COLOR_RED, -1),
        (PAIR_FIELD_VERSION, curses.COLOR_BLUE, -1),
        (PAIR_FIELD_DATA, curses.COLOR_WHITE, -1),
        (PAIR_FIELD_RESERVED, -1, curses.COLOR_WHITE),
        (PAIR_FIELD_HEADER, curses.COLOR_BLACK, curses.COLOR_YELLOW),
        (PAIR_FIELD_UNKNOWN, curses.COLOR_WHITE, -1),
    ]
    for pair_id, fg, bg in field_colors:
        curses.init_pair(pair_id, fg, bg)

As you can see from the code, I tried to adhere to openness and extensibility. Built-in formats are stored in the BUILTIN_FORMATS list and registered at startup. If you want to add your own format, it's simple — either through a new FormatDef directly in the code, or via a JSON file. Since the field format was already defined, this was easy to implement.

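Here is an example of a user-defined format. Keep in mind that GIF is only a partial fit for fixed offsets: everything after the 13-byte header shifts when a global color table or extension blocks are present, so treat the later fields as illustrative: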

    {
        "name": "GIF",
        "mime": "image/gif",
        "signatures": [
            {"offset": 0, "hex": "47494638"}
        ],
        "fields": [
            {"offset": 0, "length": 3, "name": "Signature", "type": "MAGIC"},
            {"offset": 3, "length": 3, "name": "Version", "type": "VERSION"},
            {"offset": 6, "length": 2, "name": "Screen Width", "type": "SIZE"},
            {"offset": 8, "length": 2, "name": "Screen Height", "type": "SIZE"},
            {"offset": 10, "length": 1, "name": "Flags", "type": "FLAGS"},
            {"offset": 11, "length": 1, "name": "Background Color Index", "type": "DATA"},
            {"offset": 12, "length": 1, "name": "Pixel Aspect Ratio", "type": "FLAGS"},
            {"offset": 13, "length": 3, "name": "Image Descriptor", "type": "HEADER"},
            {"offset": 16, "length": 2, "name": "Image Left Position", "type": "OFFSET"},
            {"offset": 18, "length": 2, "name": "Image Top Position", "type": "OFFSET"},
            {"offset": 20, "length": 2, "name": "Image Width", "type": "SIZE"},
            {"offset": 22, "length": 2, "name": "Image Height", "type": "SIZE"},
            {"offset": 24, "length": 1, "name": "Image Flags", "type": "FLAGS"},
            {"offset": 25, "length": 1, "name": "LZW Minimum Code Size", "type": "FLAGS"},
            {"offset": 26, "length": 1, "name": "Trailer", "type": "CHECKSUM"}
        ]
    },
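How such an entry could be turned into Python objects is sketched below. This is a sketch under assumptions, not the project's loader: LoadedField and parse_format are hypothetical names, the field type stays a plain string instead of the FieldType enum, and the JSON is a shortened copy of the entry above.

```python
import json
from dataclasses import dataclass


@dataclass
class LoadedField:  # hypothetical stand-in for the project's FieldDef
    offset: int
    length: int
    name: str
    ftype: str  # kept as a string here; the project uses the FieldType enum


def parse_format(entry: dict) -> tuple[str, list[tuple[int, bytes]], list[LoadedField]]:
    """Convert one JSON entry into (name, signatures, fields)."""
    # The "hex" string becomes raw bytes that can be compared with the file.
    signatures = [(s["offset"], bytes.fromhex(s["hex"])) for s in entry["signatures"]]
    fields = [
        LoadedField(f["offset"], f["length"], f["name"], f["type"])
        for f in entry["fields"]
    ]
    return entry["name"], signatures, fields


entry = json.loads("""
{
    "name": "GIF",
    "mime": "image/gif",
    "signatures": [{"offset": 0, "hex": "47494638"}],
    "fields": [
        {"offset": 0, "length": 3, "name": "Signature", "type": "MAGIC"},
        {"offset": 3, "length": 3, "name": "Version", "type": "VERSION"}
    ]
}
""")
name, signatures, fields = parse_format(entry)
print(name, signatures)  # GIF [(0, b'GIF8')]
```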

New formats are added to the global FORMATS list, which detect_format iterates over when a file is opened. The order of registration matters — the last registered format wins, and built-in formats are loaded before custom ones, so a user can override a built-in format by registering their own with the same name after loading the JSON.
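The "last registered wins" rule can be sketched like this. The function detect and the tuple-based registry are simplifications made for this example; the real detect_format works with FormatDef objects, and its exact matching logic is not shown in the article.

```python
from typing import Optional

Signature = tuple[int, bytes]  # (offset, expected bytes)


def detect(header: bytes, formats: list[tuple[str, list[Signature]]]) -> Optional[str]:
    """Return the name of the most recently registered matching format."""
    # Walk the registry backwards so later registrations take priority.
    for name, signatures in reversed(formats):
        # A format matches only if every one of its signatures matches.
        if signatures and all(
            header[off:off + len(sig)] == sig for off, sig in signatures
        ):
            return name
    return None


registry = [
    ("PNG", [(0, b"\x89PNG\r\n\x1a\n")]),
    ("ZIP", [(0, b"PK\x03\x04")]),
]
# A user-defined format registered later shadows the built-in one.
registry.append(("PNG (custom)", [(0, b"\x89PNG\r\n\x1a\n")]))

print(detect(b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR", registry))  # PNG (custom)
```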

Actually, signatures are an extremely interesting thing to study if you want to get into reverse engineering. And a hex editor with format highlighting will help with that.

All three of these highlighting levels improve the UX in the most direct sense. Even I find it easier to use with highlighting; it helps avoid drowning in a sea of bytes.

Project Architecture

Alright, time to get to the sweetest part — the code itself and creating the hex editor! We'll write it in pure Python 3.14 using the built-in curses library. Windows users will have to suffer a bit: curses is not part of the standard Windows distribution, and you'll need to install windows-curses. But even then, nuances might arise — primarily due to colors. So functionality on Windows is not guaranteed; you can work through WSL.

The project is simple, so I didn't split it into many layers and abstractions and made do with 11 files:

src/cbhe/
├── __init__.py     # Entry point, main loop, command-line arguments
├── constants.py    # Configuration: color pair numbers, key layouts, dump widths
├── keys.py         # Key codes (wrapper over curses.KEY_*)
├── terminal.py     # Wrapper over curses: setup, read_key, screen_size
├── hexfile.py      # Reading, caching, writing, searching the file
├── state.py        # Editor state: mode, cursor, undo/redo, search
├── handlers.py     # Key handling for each mode
├── ui.py           # Rendering: dump lines, header, status, interpretation panel
├── colors.py       # Initialization of 350+ color pairs, byte color selection functions
├── formats.py      # Format descriptions, signature detection, loading from JSON
└── interpret.py    # Byte interpretation as numbers and strings

I tried to follow fundamental principles: DRY, KISS, and most importantly, SRP from SOLID (Single Responsibility Principle). The hexfile module doesn't know about curses, the ui module doesn't read keys. This will allow swapping the rendering library for another without rewriting several files, leaving the data logic and state untouched.

Why did I choose curses? It's all about ultimate simplicity. Textual provides beautiful layout but drags in Rich and a lot of boilerplate for such a small project. Curses natively operates with color pairs, of which we have 256 just for the gradient. And most importantly — curses requires no installation beyond Python's standard library.

Furthermore, I took into account that the terminal might not support changing colors, so if support is absent, the gradient color styling won't be present.

This same principle is applied to the interpretation panel: it's drawn only if the terminal width is sufficient. If the window is too narrow, the panel simply isn't shown, without breaking the layout.

Interface Screenshot

Let's move on to how I planned the editor's UI/UX. I decided to draw a little inspiration from VIM's modes, adapting them for my purposes. There are three modes:

- READ (r): the default mode. Viewing only, with scrolling and searching but no editing.
- HEX (h): hex panel mode. Allows not only scrolling but also moving the cursor across bytes. The e keybinding enters editing mode; you can change bytes by nibbles.
- ASCII (a): cursor in the ASCII panel. The e keybinding also enters editing mode; entering characters directly changes the bytes.

During editing, changed bytes are highlighted with a red background (dirty) until saved. Undo/redo works: u undoes the last change, Ctrl+R redoes it. History holds 1000 entries, with two separate stacks. Saving with Ctrl+S resets the dirty state and history.
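A two-stack history with a size limit can be sketched in a few lines. The History class below is a hypothetical illustration: it edits a plain bytearray, whereas the project keeps its own state object that is not shown here.

```python
from collections import deque


class History:
    """Undo/redo with two bounded stacks of (offset, old, new) entries."""

    def __init__(self, limit: int = 1000) -> None:
        # deque(maxlen=...) silently drops the oldest entry when full.
        self._undo: deque = deque(maxlen=limit)
        self._redo: deque = deque(maxlen=limit)

    def record(self, offset: int, old: int, new: int) -> None:
        self._undo.append((offset, old, new))
        self._redo.clear()  # a fresh edit invalidates the redo chain

    def undo(self, data: bytearray) -> bool:
        if not self._undo:
            return False
        offset, old, new = self._undo.pop()
        data[offset] = old
        self._redo.append((offset, old, new))
        return True

    def redo(self, data: bytearray) -> bool:
        if not self._redo:
            return False
        offset, old, new = self._redo.pop()
        data[offset] = new
        self._undo.append((offset, old, new))
        return True


buf = bytearray(b"\x00\x00")
history = History()
buf[0] = 0x41
history.record(0, 0x00, 0x41)
history.undo(buf)
print(buf)  # bytearray(b'\x00\x00')
history.redo(buf)
print(buf)  # bytearray(b'A\x00')
```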

Search comes in two flavors: / — ASCII search (string search), ? — hex search (enter ff d8 ff or FFD8FF). Found matches are highlighted with a yellow background. n — next match, N — previous match. The search wraps around: reaching the end of the file goes back to the beginning.
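Both pieces, parsing the hex pattern and wrapping around, fit in a short sketch. The names parse_hex_pattern and find_next are assumptions; the sketch searches an in-memory bytes object, while the editor searches the file.

```python
def parse_hex_pattern(text: str) -> bytes:
    """Accept both 'ff d8 ff' and 'FFD8FF'."""
    # bytes.fromhex ignores whitespace and is case-insensitive.
    return bytes.fromhex(text)


def find_next(data: bytes, pattern: bytes, start: int) -> int:
    """Find the next match at or after `start`, wrapping to the beginning."""
    pos = data.find(pattern, start)
    if pos == -1:
        pos = data.find(pattern, 0)  # wrap around; still -1 if no match at all
    return pos


pattern = parse_hex_pattern("ff d8 ff")
data = b"\xff\xd8\xff" + b"\x00" * 5 + b"\xff\xd8\xff"
print(find_next(data, pattern, 1))  # 8
print(find_next(data, pattern, 9))  # 0
```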

The i key toggles the interpretation panel on the right. For the bytes starting at the cursor, it shows: int8/uint8, int16/32/64 in LE and BE, float32/64, the bit representation, and a UTF-8 decoding of the next 4 and 8 bytes. The panel automatically hides if the terminal width is insufficient.
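Most of these interpretations map directly onto the standard struct module. Below is a minimal sketch covering a few of them; the function name interpret and the dictionary keys are made up for the example and do not mirror interpret.py.

```python
import struct


def interpret(data: bytes) -> dict[str, object]:
    """Interpret the bytes at the cursor in a few common ways."""
    out: dict[str, object] = {}
    if len(data) >= 1:
        out["uint8"] = data[0]
        out["int8"] = struct.unpack("b", data[:1])[0]
        out["bits"] = f"{data[0]:08b}"
    if len(data) >= 2:
        out["int16 LE"] = struct.unpack("<h", data[:2])[0]
        out["int16 BE"] = struct.unpack(">h", data[:2])[0]
    if len(data) >= 4:
        out["int32 LE"] = struct.unpack("<i", data[:4])[0]
        out["int32 BE"] = struct.unpack(">i", data[:4])[0]
        out["float32 LE"] = struct.unpack("<f", data[:4])[0]
    return out


# The four bytes 00 00 00 0D read as a big-endian integer give 13.
print(interpret(b"\x00\x00\x00\x0d")["int32 BE"])  # 13
print(interpret(b"\xa5")["bits"])                  # 10100101
```

The "<" and ">" prefixes select little-endian and big-endian byte order, which is why the same four bytes give different numbers in the two columns.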

The w key cycles the dump width: 8, 16, or 32 bytes per line. g — go to address.

The header line shows: mode, filename, size, dump width, recognized format, view percentage. The status line below the dump shows the offset in hex and dec, byte value in hex/dec/char, field name and type if the byte belongs to a known field. The right part of the status line shows messages: search results, undo/redo, errors.

The entry point is the __init__.py file. It parses command-line arguments, loads formats, initializes colors, and creates objects. The loop on each iteration renders the entire frame, reads a key, and dispatches it. A simple synchronous loop that updates the whole state.

By the way, regarding command-line arguments: I've laid them out in the table below:

Flag	Description
`-f`, `--formats`	JSON file with custom formats (can be used multiple times)
`-w`, `--width`	Initial dump width: 8, 16, or 32 (default 16)
`-m`, `--mode`	Initial mode: read, hex, or ascii (default read)
`--no-auto-detect`	Disable automatic format detection
`--format-dir`	Load all *.json from the specified directory as formats

Primarily, I added support for these because I wanted to support custom user-defined file formats for highlighting. Also, by tradition, I used the standard argparse module, although I initially considered installing click. But since the project has no external dependencies and the logic isn't complex, the standard argparse does the job.
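The flags from the table translate to argparse roughly as follows. This is a sketch, not the project's parser: the program name cbhe is taken from the package directory, and the positional file argument is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="cbhe")  # prog name is assumed
    parser.add_argument("file", help="file to open (assumed positional)")
    # action="append" lets the flag be given several times.
    parser.add_argument("-f", "--formats", action="append", default=[],
                        help="JSON file with custom formats")
    parser.add_argument("-w", "--width", type=int, choices=(8, 16, 32),
                        default=16, help="initial dump width")
    parser.add_argument("-m", "--mode", choices=("read", "hex", "ascii"),
                        default="read", help="initial mode")
    parser.add_argument("--no-auto-detect", action="store_true",
                        help="disable automatic format detection")
    parser.add_argument("--format-dir",
                        help="load all *.json files from this directory")
    return parser


args = build_parser().parse_args(["dump.bin", "-w", "32", "-f", "a.json", "-f", "b.json"])
print(args.width, args.mode, args.formats)  # 32 read ['a.json', 'b.json']
```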

HexFile: Working with the File

The editor shouldn't load the entire file into memory at once.

You could, of course, get by with the following construct:

with open(filename, 'rb') as f:
    data = f.read()

# or

with open('file.bin', 'rb') as f:
    data = bytearray(f.read())

But for a TUI this is a poor fit: the read blocks the interface until the whole file is in memory, and a multi-gigabyte file may not fit at all. Instead of reading the whole file at once, lazy loading in chunks should be used. HexFile does exactly this: it maintains an LRU (Least Recently Used) cache of rows, implemented via OrderedDict:

from collections import OrderedDict
from typing import Optional

class _LRURowCache:
    def __init__(self, capacity: int) -> None:
        self._cap = capacity
        self._store: OrderedDict[int, bytearray] = OrderedDict()

    def get(self, key: int) -> Optional[bytearray]:
        if key not in self._store:
            return None
        self._store.move_to_end(key)
        return self._store[key]

    def put(self, key: int, value: bytearray) -> None:
        if key in self._store:
            self._store.move_to_end(key)
        self._store[key] = value
        if len(self._store) > self._cap:
            self._store.popitem(last=False)

    def update(self, key: int, col: int, value: int) -> None:
        row = self._store.get(key)
        if row is not None and col < len(row):
            row[col] = value

    def clear(self) -> None:
        self._store.clear()

    def __contains__(self, key: int) -> bool:
        return key in self._store

OrderedDict was chosen because it preserves the insertion order of elements. Yes, since Python 3.7 the standard dict also preserves order, but using OrderedDict expresses the intent of ordering explicitly and also provides the move_to_end method, which requires less code than a standard dict would.

When the cache overflows, the least recently used row is evicted. The capacity is 8192 rows. With a dump width of 16 bytes, that's 128 KiB of raw data (8192 × 16 bytes), not counting Python's per-object overhead for each bytearray.

The update method allows modifying a cached row in place without removing it from the cache. This is needed for the dirty mechanic: when the user changes a byte, we immediately update both the cache and the dictionary of dirty offsets.

If the requested row isn't in the cache, the _load_region function exists:

def _load_region(self, anchor_row: int) -> None:
    row_start = max(0, anchor_row - self.PREFETCH_ROWS // 4)
    byte_start = row_start * self.width
    byte_len = min(self.PREFETCH_ROWS * self.width, self.size - byte_start)

    if byte_len <= 0:
        return

    raw = self._read_raw(byte_start, byte_len)

    for i in range(0, len(raw), self.width):
        r = row_start + i // self.width
        self._cache.put(r, bytearray(raw[i : i + self.width]))

This function reads a block of 512 lines (prefetch) from disk, anchored a quarter of the way above the requested line. This way, when scrolling forward, the data is already preloaded.

For files larger than 64 MB, regular reading via open().read() creates unnecessary copying of data from the kernel buffer to userspace. This is where mmap comes in handy — a syscall that allows mapping a file's or device's contents into the process's address space.

mmap maps the file into the process's virtual memory, and the operating system decides which pages to keep in physical memory. This provides two advantages: memory savings and native searching via self._mmap.find() without manual chunking.
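Searching through a mapped file looks like this in isolation. The helper find_in_file is a hypothetical name; it opens and closes the mapping on every call for simplicity, whereas the editor keeps its mapping open. Note that mmap raises ValueError for an empty file, which is why the project code below checks the size first.

```python
import mmap
import os
import tempfile


def find_in_file(path: str, pattern: bytes, start: int = 0) -> int:
    """Return the offset of the first match at or after `start`, or -1."""
    with open(path, "rb") as fh:
        # Length 0 maps the whole file; ACCESS_READ keeps it read-only.
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm.find(pattern, start)


# Demo on a temporary file: a ZIP signature hidden after 4096 zero bytes.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00" * 4096 + b"PK\x03\x04" + b"\x00" * 16)
print(find_in_file(tmp.name, b"PK\x03\x04"))  # 4096
os.remove(tmp.name)
```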

    def _open_mmap(self) -> None:
        if not self._use_mmap or self.size == 0:
            return
        try:
            self._mmap_fh = open(self.path, "rb")  # type: ignore
            self._mmap = mmap.mmap(self._mmap_fh.fileno(), 0, access=mmap.ACCESS_READ)  # type: ignore
        except (OSError, ValueError):
            self._mmap = None
            if self._mmap_fh:
                self._mmap_fh.close()
                self._mmap_fh = None

    def _close_mmap(self) -> None:
        if self._mmap is not None:
            try:
                self._mmap.close()
            except Exception:
                pass
            self._mmap = None
        if self._mmap_fh is not None:
            try:
                self._mmap_fh.close()
            except Exception:
                pass
            self._mmap_fh = None

When saving, mmap is closed, data is written via a regular file descriptor, and then mmap is reopened.

But please note, we don't write changes immediately to the file. Instead, they accumulate in a dictionary where the key is the absolute offset and the value is the new byte. During rendering, get_row overlays dirty bytes on top of the data from the cache:

    def get_row(self, row: int) -> Optional[bytearray]:
        if not (0 <= row < self.total_rows):
            return None

        cached = self._cache.get(row)
        if cached is None:
            self._load_region(row)
            cached = self._cache.get(row)

        if cached is None:
            return None

        data = bytearray(cached)
        start_offset = row * self.width
        for col in range(len(data)):
            off = start_offset + col
            if off in self._dirty:
                data[col] = self._dirty[off]

        return data

During saving, dirty offsets are grouped into consecutive blocks and written with a single fh.write(block) call. This is faster than seek + write for each byte.

def save(self) -> None:
    if not self._dirty:
        return

    groups = _group_consecutive(list(self._dirty.items()))

    self._close_mmap()

    with open(self.path, "r+b") as fh:
        for offset, block in groups:
            fh.seek(offset)
            fh.write(block)

    self._dirty.clear()
    self._cache.clear()
    self._use_mmap = self.size >= _LARGE_FILE_THRESHOLD
    self._open_mmap()
    self.file_format = None
    self._detect_format()
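The helper _group_consecutive is not shown in the article. A minimal sketch of what it might look like, assuming it takes (offset, value) pairs and returns (start offset, block) pairs, is given below. It sorts the pairs first, since a dict returns them in the order the edits were made rather than in offset order.

```python
def group_consecutive(items: list[tuple[int, int]]) -> list[tuple[int, bytes]]:
    """Merge (offset, value) pairs into runs of adjacent offsets."""
    groups: list[tuple[int, bytearray]] = []
    for offset, value in sorted(items):
        # Extend the last run if this offset directly follows its end.
        if groups and groups[-1][0] + len(groups[-1][1]) == offset:
            groups[-1][1].append(value)
        else:
            groups.append((offset, bytearray([value])))
    return [(start, bytes(block)) for start, block in groups]


dirty = {10: 0x41, 11: 0x42, 20: 0x43, 12: 0x44}
print(group_consecutive(list(dirty.items())))
# [(10, b'ABD'), (20, b'C')]
```

Four edited bytes collapse into two writes: one three-byte block at offset 10 and one single byte at offset 20.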

Let's also touch upon format detection. When a file is opened, the first 1024 bytes are read and passed to the detect_format function from formats.py. The result is stored in self.file_format and used during rendering for structural highlighting. But if the format isn't recognized, file_format remains None, and only gradient highlighting is active.

def _detect_format(self) -> None:
    try:
        header = self._read_raw(0, 1024)
        self.file_format = detect_format(bytes(header))
    except (IOError, OSError):
        self.file_format = None

The full source code is available at this link.

The Color System

The colors.py module initializes all the required color pairs for curses. There are six groups in total: base (address, cursor, dirty), gradient (256 byte values), field types (10 pairs), status bar, interpretation panel, search.

As we mentioned earlier, I integrated gradient highlighting: for each of the 256 possible byte values, a separate color pair is created. The byte value is mapped onto a hue from 0° to 360° via HSV, with saturation 0.8 and brightness 0.9. The null byte gets dark gray (64, 64, 64), 0xFF gets white (255, 255, 255).

def _byte_to_rgb(bval: int) -> tuple[int, int, int]:
    if bval == BYTE_MIN:
        return DEFAULT_BYTE_RGB
    if bval == BYTE_MAX:
        return MAX_BYTE_RGB

    hue = (bval / 255.0) * 360.0
    return _hsv_to_rgb(hue, 0.8, 0.9)

The _hsv_to_rgb function is a standard algorithm for converting HSV to RGB, covering six sectors of the color wheel:

def _hsv_to_rgb(h: float, s: float, v: float) -> tuple[int, int, int]:
    h = h % 360.0
    c = v * s
    x = c * (1.0 - abs((h / 60.0) % 2.0 - 1.0))
    m = v - c

    if h < 60:        r, g, b = c, x, 0.0
    elif h < 120:     r, g, b = x, c, 0.0
    elif h < 180:     r, g, b = 0.0, c, x
    elif h < 240:     r, g, b = 0.0, x, c
    elif h < 300:     r, g, b = x, 0.0, c
    else:             r, g, b = c, 0.0, x

    return (int((r + m) * 255), int((g + m) * 255), int((b + m) * 255))
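
For comparison, the standard library ships the same conversion as colorsys.hsv_to_rgb, which takes all three components as fractions in the 0.0–1.0 range. Below is a sketch of the gradient built on top of it, assuming the two special-case colors described above. The function byte_to_rgb here is an illustrative stand-in, not the project's function:

```python
import colorsys

DEFAULT_BYTE_RGB = (64, 64, 64)     # null byte: dark gray
MAX_BYTE_RGB = (255, 255, 255)      # 0xFF: white


def byte_to_rgb(bval: int, s: float = 0.8, v: float = 0.9) -> tuple[int, int, int]:
    """Map a byte value to an RGB triple in the 0..255 range."""
    if bval == 0x00:
        return DEFAULT_BYTE_RGB
    if bval == 0xFF:
        return MAX_BYTE_RGB
    # colorsys expects the hue as a fraction of the full circle, not degrees.
    r, g, b = colorsys.hsv_to_rgb(bval / 255.0, s, v)
    return (int(r * 255), int(g * 255), int(b * 255))


palette = [byte_to_rgb(b) for b in range(256)]
```

Low byte values land in the red sector, values around 0x55 in the green sector, and values around 0xAA in the blue one, so neighboring byte values get neighboring colors.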

Initialization checks whether the terminal supports changing the palette:

def _init_hex_pairs() -> None:
    rich = curses.can_change_color() and curses.COLORS > 16

    for bval in range(COLOR_SLOTS):
        slot = 16 + bval
        pair_id = PAIR_HEX_BASE + bval

        if rich and _init_color_slot(slot, *_byte_to_rgb(bval)):
            curses.init_pair(pair_id, slot, -1)
        else:
            curses.init_pair(pair_id, curses.COLOR_WHITE, -1)

If it does, the RGB values are written into color slots 16–271 via init_color, and each pair is created with its own slot. If the terminal doesn't support palette changes, all 256 pairs receive the color white; the same fallback applies to any single slot that can't be set. The editor works, just without highlighting.
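
One detail worth keeping in mind: curses.init_color takes each component on a 0–1000 scale rather than 0–255, so the RGB tuple has to be rescaled first. The _init_color_slot helper isn't shown here; the sketch below is only a guess at its shape, and both function names are illustrative:

```python
def to_curses_scale(r: int, g: int, b: int) -> tuple[int, int, int]:
    """Convert 0..255 components to the 0..1000 range used by init_color."""
    return (round(r * 1000 / 255), round(g * 1000 / 255), round(b * 1000 / 255))


def init_color_slot(slot: int, r: int, g: int, b: int) -> bool:
    """Try to redefine a palette slot; report failure instead of raising."""
    import curses  # imported lazily so the pure helper above needs no terminal

    try:
        curses.init_color(slot, *to_curses_scale(r, g, b))
        return True
    except (curses.error, ValueError):
        # Assumption: an unsupported or out-of-range slot surfaces as one of
        # these exceptions, depending on the Python and curses versions.
        return False
```

Returning a boolean instead of raising lets the caller fall back to a plain white pair for that one byte value.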

The hex_color function returns a ready-made pair:

def hex_color(bval: int) -> int:
    return curses.color_pair(PAIR_HEX_BASE + bval)

The only difference between the panels is what is displayed: hex digits or a character/dot. The color scheme is the same. The placeholder symbol · for non-printable bytes gets a color corresponding to the byte's gradient value — for example, a null byte will be a dark gray dot, a 0xFF byte will be a white dot.

Generally, all pairs are initialized like this:

def init_colors() -> None:
    curses.start_color()
    curses.use_default_colors()

    _init_base_pairs()
    _init_field_pairs()
    _init_hex_pairs()
    _init_extra_pairs()
    _init_interpret_pairs()

To avoid cluttering the article, I won't show all the code; it is available at this link.

Configuration: constants.py

I extracted the constants into a separate module so as not to scatter magic numbers across other files. It gathers everything that doesn't change during the editor's runtime but is used in multiple places.

Color pair numbers for curses start from one and go in blocks. Base pairs occupy the range 1–9: address, separators, header, hints, highlight, cursor, dirty. Gradient pairs go from 10 to 265, one per byte value. Format field pairs are from 366 to 375, status line and search are 376–378, the interpretation panel is 379–381.

The FIELD_TYPE_COLORS dictionary maps the string name of a field type to a color pair number.

FIELD_TYPE_COLORS = {
    "MAGIC": PAIR_FIELD_MAGIC,
    "SIZE": PAIR_FIELD_SIZE,
    "OFFSET": PAIR_FIELD_OFFSET,
    "FLAGS": PAIR_FIELD_FLAGS,
    "CHECKSUM": PAIR_FIELD_CHECKSUM,
    "VERSION": PAIR_FIELD_VERSION,
    "DATA": PAIR_FIELD_DATA,
    "RESERVED": PAIR_FIELD_RESERVED,
    "HEADER": PAIR_FIELD_HEADER,
    "UNKNOWN": PAIR_FIELD_UNKNOWN,
}

The EditorMode enum defines the three states the editor can be in. We'll use this in the next section, which is dedicated specifically to the editor's state.

class EditorMode(Enum):
    READ = auto()
    HEX = auto()
    ASCII = auto()

There are also keybinding hint configurations and color constants.

There's no need to dissect this module line by line; the code is very straightforward. The full file is available at this link.

Keys and Terminal: keys.py and terminal.py

These two modules are a thin abstraction over the specific rendering backend. Currently, the backend is curses, but if I want to rewrite the editor for Textual, for instance, I only need to replace keys.py, terminal.py, and ui.py.

keys.py simply re-exports key codes from curses and adds constants for keys that lack named identifiers:

import curses

KEY_UP = curses.KEY_UP
KEY_DOWN = curses.KEY_DOWN
KEY_LEFT = curses.KEY_LEFT
KEY_RIGHT = curses.KEY_RIGHT
KEY_HOME = curses.KEY_HOME
KEY_END = curses.KEY_END
KEY_PPAGE = curses.KEY_PPAGE
KEY_NPAGE = curses.KEY_NPAGE
KEY_BACKSPACE = curses.KEY_BACKSPACE
KEY_DC = curses.KEY_DC
KEY_RESIZE = curses.KEY_RESIZE

KEY_ESC = 27
KEY_CTRL_R = 18
KEY_CTRL_S = 19
KEY_BACKSPACE_ALT1 = 127
KEY_BACKSPACE_ALT2 = 8

GOTO_KEYS = {ord("g"), ord("G")}
SEARCH_ASCII_KEY = ord("/")
SEARCH_HEX_KEY = ord("?")
SEARCH_NEXT_KEY = ord("n")
SEARCH_PREV_KEY = ord("N")
INTERPRET_KEYS = {ord("i"), ord("I")}
QUIT_KEYS = {ord("q"), ord("Q")}

Three Backspace variants are needed because different terminals send different codes: classic Ctrl+H (8), DEL (127), and curses.KEY_BACKSPACE (usually 263). All three are handled identically — deleting the previous byte.
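
Handling the three codes identically boils down to a set membership test. A tiny sketch, assuming the typical ncurses value of 263 for curses.KEY_BACKSPACE; the names BACKSPACE_KEYS and is_backspace are hypothetical:

```python
# KEY_BACKSPACE (263 on typical ncurses builds), DEL, and Ctrl+H.
BACKSPACE_KEYS = frozenset({263, 127, 8})


def is_backspace(key: int) -> bool:
    """True for any key code that a terminal may send for Backspace."""
    return key in BACKSPACE_KEYS
```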

terminal.py wraps the handful of curses calls needed at startup and in the main loop in five small functions:

import curses
from typing import Any, Callable

def setup(stdscr: Any) -> None:
    curses.curs_set(0)
    stdscr.keypad(True)

def run_with_wrapper(fn: Callable[..., None], *args: Any) -> None:
    curses.wrapper(fn, *args)

def read_key(stdscr: Any) -> int:
    return stdscr.getch()

def clear(stdscr: Any) -> None:
    stdscr.clear()

def screen_size(stdscr: Any) -> tuple[int, int]:
    return stdscr.getmaxyx()

Extracting these two modules might seem excessive, but I did it deliberately so that the logic wouldn't know about curses and would go through abstractions instead. That makes a port to another library easier, without surprises such as __init__.py importing curses and working with it directly. That is actually how the first version of the project looked; when I saw that import, I realized curses had no business at the entry point.

Source files: keys.py and terminal.py.

Editor State: state.py

state.py is the module responsible for the editor's state. It doesn't read files or render the interface; it is the hub that stores the state and implements the logic for changing it. Essentially, it's a dataclass that mutates in response to user actions.

@dataclass
class _UndoEntry:
    row: int
    col: int
    old_val: int
    new_val: int


@dataclass
class SearchState:
    query: bytes = b""
    last_offset: int = -1
    match_len: int = 0
    is_hex: bool = False


@dataclass
class StatusMessage:
    text: str = ""
    is_error: bool = False


@dataclass
class EditorState:
    hf: HexFile
    top_row: int = 0
    mode: EditorMode = EditorMode.READ
    editing: bool = False
    cur_row: int = 0
    cur_col: int = 0
    hex_nibble: int = 0
    show_interpret: bool = False
    search: SearchState = field(default_factory=SearchState)
    status: StatusMessage = field(default_factory=StatusMessage)
    _undo_stack: deque[_UndoEntry] = field(
        default_factory=lambda: deque(maxlen=UNDO_LIMIT), init=False, repr=False
    )
    _redo_stack: deque[_UndoEntry] = field(
        default_factory=lambda: deque(maxlen=UNDO_LIMIT), init=False, repr=False
    )

The hf field is an instance of HexFile, through which all data operations go. top_row is the first visible line of the dump, and scrolling is calculated from it. mode and editing together define the current state: mode is READ, HEX, or ASCII, and within HEX/ASCII the editing flag says whether we're in editing mode.

The cursor property returns the cursor coordinates only if the mode is not READ — in READ mode there's no cursor, the user just scrolls the file:

@property
def cursor(self) -> Optional[tuple[int, int]]:
    return (self.cur_row, self.cur_col) if self.mode != EditorMode.READ else None

Cursor navigation is implemented in move_cursor. When moving past the left edge of a line, the cursor jumps to the last byte of the previous line; when past the right edge, to the first byte of the next line. File boundaries are checked via total_rows and _max_col:

def move_cursor(self, dr: int, dc: int) -> None:
    col = self.cur_col + dc
    row = self.cur_row + dr
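    w = self.hf.width
    last_row = max(0, self.hf.total_rows - 1)

    if col < 0:
        # Wrap to the end of the previous line; on the first line, stay at column 0
        # instead of jumping to the end of that same line.
        col, row = (w - 1, row - 1) if row > 0 else (0, row)
    elif col >= w:
        # Wrap to the start of the next line; on the last line, stay on the last byte.
        col, row = (0, row + 1) if row < last_row else (w - 1, row)

    row = max(0, min(row, last_row))
    col = min(col, self._max_col(row))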
    self.cur_row = row
    self.cur_col = col
    self.hex_nibble = 0

Scroll synchronization in sync_scroll ensures the cursor is always in the visible area. If the cursor moves above top_row, the top boundary is pulled up. If below the visible window, it's shifted down.

def sync_scroll(self, visible: int) -> None:
    if self.cur_row < self.top_row:
        self.top_row = self.cur_row
    elif self.cur_row >= self.top_row + visible:
        self.top_row = self.cur_row - visible + 1
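
To see the arithmetic in action, here is a small stand-alone sketch. Viewport is a hypothetical stand-in that carries only the two fields sync_scroll touches:

```python
from dataclasses import dataclass


@dataclass
class Viewport:
    """Stand-in for the two EditorState fields that sync_scroll uses."""
    top_row: int = 0
    cur_row: int = 0

    def sync_scroll(self, visible: int) -> None:
        if self.cur_row < self.top_row:
            self.top_row = self.cur_row
        elif self.cur_row >= self.top_row + visible:
            self.top_row = self.cur_row - visible + 1


vp = Viewport(top_row=0, cur_row=25)
vp.sync_scroll(visible=20)
after_down = vp.top_row   # rows 6..25 are visible, cursor on the last one

vp.cur_row = 3
vp.sync_scroll(visible=20)
after_up = vp.top_row     # cursor row becomes the first visible row
```

With 20 visible rows, moving the cursor to row 25 shifts the window so that it starts at row 6; moving back up to row 3 pulls the top boundary to row 3.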

Undo-Redo

Undo and redo are built on two deques, _undo_stack and _redo_stack, each created with maxlen=UNDO_LIMIT (1000 entries). Each entry is a _UndoEntry dataclass storing the row, column, old byte value, and new byte value.

The _record_write method performs every write. It reads the current byte value, writes the new one, saves the entry to the undo stack, and clears the redo stack, because any new change makes it impossible to redo old undone actions:

def _record_write(self, row: int, col: int, new_val: int) -> None:
    old_val = self.hf.read_byte(row * self.hf.width + col)
    self.hf.write_byte(row, col, new_val)
    entry = _UndoEntry(row=row, col=col, old_val=old_val, new_val=new_val)
    self._undo_stack.append(entry)
    self._redo_stack.clear()

undo pops an entry from the undo stack, reverts the byte to its old value, places the entry onto the redo stack, and moves the cursor to the changed byte. redo does the opposite. The status line receives a message with the hex value and offset.
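
The whole mechanism fits into a few dozen lines when reduced to a plain bytearray. The following is a simplified sketch, not the project's code: Edit and History are hypothetical names, and offsets replace the (row, col) pairs used in state.py:

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Edit:
    offset: int
    old_val: int
    new_val: int


class History:
    """Two bounded stacks over a bytearray: one for undo, one for redo."""

    def __init__(self, data: bytearray, limit: int = 1000) -> None:
        self.data = data
        self.undo_stack: deque[Edit] = deque(maxlen=limit)
        self.redo_stack: deque[Edit] = deque(maxlen=limit)

    def write(self, offset: int, new_val: int) -> None:
        self.undo_stack.append(Edit(offset, self.data[offset], new_val))
        self.data[offset] = new_val
        self.redo_stack.clear()  # a fresh edit invalidates the redo history

    def undo(self) -> bool:
        if not self.undo_stack:
            return False
        edit = self.undo_stack.pop()
        self.data[edit.offset] = edit.old_val
        self.redo_stack.append(edit)
        return True

    def redo(self) -> bool:
        if not self.redo_stack:
            return False
        edit = self.redo_stack.pop()
        self.data[edit.offset] = edit.new_val
        self.undo_stack.append(edit)
        return True


h = History(bytearray(b"1234"))
h.write(1, ord("1"))
after_write = bytes(h.data)
h.undo()
after_undo = bytes(h.data)
h.redo()
after_redo = bytes(h.data)
h.undo()
h.write(3, ord("9"))     # new edit after an undo: redo is no longer possible
can_redo = h.redo()
```

Because both deques have a maxlen, the oldest entries silently fall off once the limit is reached, so memory use stays bounded no matter how long the editing session is.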

Search

Search uses the separate SearchState dataclass shown earlier. The search_next and search_prev methods implement wrap-around search: if the end of the file is reached, the search restarts from the beginning, and vice versa. When wrapping, the status line shows "search wrapped to start" or "search wrapped to end". The found offset is saved in last_offset and the match length in match_len; ui.py uses these two fields to highlight the match with a yellow background.

def _apply_search_result(
    self, found: Optional[int], visible: int, wrapped_msg: str
) -> bool:
    if found is None:
        self.status = StatusMessage(
            f"not found: {self.search.query!r}", is_error=True
        )
        return False
    self.search.last_offset = found
    self.search.match_len = len(self.search.query)
    self.jump_to_offset(found, visible)
    return True

def search_next(self, visible: int) -> bool:
    if not self.search.query:
        return False
    start = self.cur_row * self.hf.width + self.cur_col + 1
    found = self.hf.find_bytes(self.search.query, start)
    if found is None:
        found = self.hf.find_bytes(self.search.query, 0)
        if found is not None:
            self.status = StatusMessage("search wrapped to start")
    return self._apply_search_result(found, visible, "search wrapped to start")

def search_prev(self, visible: int) -> bool:
    if not self.search.query:
        return False
    current = self.cur_row * self.hf.width + self.cur_col
    found = self.hf.find_bytes_backward(self.search.query, current)
    if found is None:
        found = self.hf.find_bytes_backward(self.search.query, self.hf.size)
        if found is not None:
            self.status = StatusMessage("search wrapped to end")
    return self._apply_search_result(found, visible, "search wrapped to end")
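
The find_bytes and find_bytes_backward methods of HexFile are not shown in this section. On an in-memory bytes object, the same wrap-around behavior can be sketched with bytes.find and bytes.rfind; find_next and find_prev are hypothetical names, and a non-empty query is assumed:

```python
from typing import Optional


def find_next(data: bytes, query: bytes, cursor: int) -> tuple[Optional[int], bool]:
    """Search forward from just after the cursor; wrap to the start if needed."""
    found = data.find(query, cursor + 1)
    wrapped = False
    if found == -1:
        found = data.find(query)          # restart from the beginning
        wrapped = found != -1
    return (None if found == -1 else found), wrapped


def find_prev(data: bytes, query: bytes, cursor: int) -> tuple[Optional[int], bool]:
    """Search backward for a match that starts before the cursor; wrap to the end."""
    # rfind only reports matches lying entirely inside data[0:end], so this
    # value of end admits exactly the matches that start before the cursor.
    end = cursor + len(query) - 1
    found = data.rfind(query, 0, end)
    wrapped = False
    if found == -1:
        found = data.rfind(query)         # restart from the end
        wrapped = found != -1
    return (None if found == -1 else found), wrapped


data = b"ab--ab--ab"
next_plain = find_next(data, b"ab", 0)
next_wrapped = find_next(data, b"ab", 8)
prev_plain = find_prev(data, b"ab", 4)
prev_wrapped = find_prev(data, b"ab", 0)
missing = find_next(data, b"zz", 0)
```

Returning the wrapped flag together with the offset lets the caller decide what to put in the status line only after it knows the search succeeded.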

The full source file is available at this link.

Keybindings: handlers.py

The handlers module turns key codes into calls to EditorState methods. The logic is split by mode, and for each mode a mapping table is built — a dictionary where the key is the key code and the value is a lambda with the action.

For READ mode, the table is built by _make_read_nav_table:

def _make_read_nav_table(state: EditorState, visible: int) -> dict[int, object]:
    return {
        KEY_DOWN: lambda: state.scroll(1, visible),
        KEY_UP: lambda: state.scroll(-1, visible),
        KEY_NPAGE: lambda: state.scroll(visible, visible),
        KEY_PPAGE: lambda: state.scroll(-visible, visible),
        KEY_HOME: lambda: setattr(state, "top_row", 0),
        KEY_END: lambda: setattr(
            state, "top_row", max(0, state.hf.total_rows - visible)
        ),
        ord("w"): state.cycle_width,
        ord("W"): state.cycle_width,
        ord("r"): lambda: state.set_mode(EditorMode.READ),
        ord("h"): lambda: state.set_mode(EditorMode.HEX),
        ord("a"): lambda: state.set_mode(EditorMode.ASCII),
    }

For normal HEX/ASCII mode, cursor navigation, entering edit mode, and undo are added. For editing mode, special keys (Esc to exit, Backspace/Delete for deletion, undo/redo) and navigation are added. The tables themselves are built in _make_panel_nav_table and _make_edit_special_table.

Duplication between handle_hex_edit and handle_ascii_edit is eliminated via the common function _handle_edit_common. It takes a predicate for valid characters and a writer function; everything else — navigation and special keys — is handled identically:

def _handle_edit_common(
    state: EditorState,
    key: int,
    visible: int,
    char_predicate: Callable[[int], bool],
    char_writer: Callable[[int], None],
) -> None:
    special = _make_edit_special_table(state, visible)
    nav_keys: dict[int, tuple[int, int]] = {
        KEY_DOWN: (1, 0), KEY_UP: (-1, 0),
        KEY_LEFT: (0, -1), KEY_RIGHT: (0, 1),
    }

    if key in special:
        special[key]()
    elif key in nav_keys:
        dr, dc = nav_keys[key]
        state.move_cursor(dr, dc)
        state.sync_scroll(visible)
    elif key == KEY_HOME:
        state.cur_col = 0
    elif key == KEY_END:
        state.cur_col = state._max_col(state.cur_row)
    elif char_predicate(key):
        char_writer(key)
        state.sync_scroll(visible)

Now handle_hex_edit and handle_ascii_edit are simply calls to this function with different predicates:

def handle_hex_edit(state, key, visible):
    _handle_edit_common(state, key, visible,
        char_predicate=lambda k: k in _HEX_CHARS,
        char_writer=lambda k: state.write_hex_nibble(_HEX_CHARS[k]))

def handle_ascii_edit(state, key, visible):
    _handle_edit_common(state, key, visible,
        char_predicate=lambda k: 32 <= k <= 126,
        char_writer=lambda k: state.write_ascii(chr(k)))

The _HEX_CHARS dictionary is defined at the module level — it's built once and maps character codes 0–9, a–f, A–F to numeric nibble values:

_HEX_CHARS: dict[int, int] = {
    **{ord(str(d)): d for d in range(10)},
    **{ord(c): v for c, v in zip("abcdef", range(10, 16))},
    **{ord(c): v for c, v in zip("ABCDEF", range(10, 16))},
}
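
The write_hex_nibble method itself isn't shown in this section. The bit arithmetic behind it can be sketched as follows, assuming hex_nibble is 0 for the high half of the byte and 1 for the low half; apply_nibble is a hypothetical helper name:

```python
def apply_nibble(byte: int, nibble: int, position: int) -> int:
    """Replace the high (position 0) or low (position 1) nibble of a byte."""
    if not (0 <= nibble <= 0xF):
        raise ValueError("nibble must be in range 0..15")
    if position == 0:
        return (nibble << 4) | (byte & 0x0F)   # keep the low half
    return (byte & 0xF0) | nibble              # keep the high half


step1 = apply_nibble(0x3A, 0xF, 0)   # type "f" on the high nibble of 3A
step2 = apply_nibble(step1, 0x7, 1)  # then type "7" on the low nibble
```

Typing two hex digits therefore rewrites the byte in two steps: 3A becomes FA after the first keystroke and F7 after the second.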

In addition to mode handling, the module contains functions for search and navigation. handle_goto prompts for a hex offset via an input prompt and calls jump_to_offset. On invalid input, an error is written to the status line.

handle_search_ascii prompts for a string, encodes it in UTF-8, saves the query in state.search, and searches via hf.find_ascii. On success, it jumps to the found offset; on failure, it reports an error.

handle_search_hex uses _parse_hex_query to parse the input. The user can enter a hex sequence in any format: ff d8 ff, FFD8FF, 0xff 0xd8 0xff. Tokens are split by spaces, the 0x prefix is removed, odd-length tokens are left-padded with zero, then everything is converted via bytes.fromhex. If at least one token is invalid, an error is returned specifying it.

def _parse_hex_query(raw: str) -> tuple[bytes | None, str]:
    tokens = raw.split()
    result = bytearray()
    for token in tokens:
        token = token.removeprefix("0x").removeprefix("0X")
        if len(token) % 2 != 0:
            token = "0" + token
        try:
            result.extend(bytes.fromhex(token))
        except ValueError:
            return None, f"invalid hex token: {token!r}"
    if not result:
        return None, "empty query"
    return bytes(result), ""
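
To check how the three input styles normalize, here is a condensed stand-alone variant of the same steps. It is a sketch for experimentation rather than the project's function: parse_hex_query returns None instead of an error message, and it strips the prefix by slicing:

```python
from typing import Optional


def parse_hex_query(raw: str) -> Optional[bytes]:
    """Condensed variant: returns None on invalid or empty input."""
    out = bytearray()
    for token in raw.split():
        if token[:2].lower() == "0x":
            token = token[2:]              # drop the 0x / 0X prefix
        if len(token) % 2:
            token = "0" + token            # "f" becomes "0f"
        try:
            out.extend(bytes.fromhex(token))
        except ValueError:
            return None
    return bytes(out) or None


samples = ["ff d8 ff", "FFD8FF", "0xff 0xd8 0xff", "f f", "zz", ""]
results = [parse_hex_query(s) for s in samples]
```

The first three inputs all produce the same three bytes, "f f" is padded to 0F 0F, and the last two inputs are rejected.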

One of the main functions is dispatch_key. It takes a key and returns False if the program should exit. The order of checks corresponds to priority: resize and save are handled before everything else; quit is only processed outside of editing mode; then editing, interpretation, goto, search, and finally, navigation depending on the mode.

def dispatch_key(state: EditorState, stdscr: Any, key: int, visible: int) -> bool:
    if key == KEY_RESIZE:
        return True

    if key == KEY_CTRL_S:
        state.hf.save()
        state.status.text = "saved"
        state.status.is_error = False
        return True

    if not state.editing and key in QUIT_KEYS:
        return False

    if state.editing:
        if state.mode == EditorMode.HEX:
            handle_hex_edit(state, key, visible)
        elif state.mode == EditorMode.ASCII:
            handle_ascii_edit(state, key, visible)
        return True

    if key in INTERPRET_KEYS:
        state.toggle_interpret()
        return True

    if key in GOTO_KEYS:
        handle_goto(state, stdscr, visible)
        return True

    if key == SEARCH_ASCII_KEY:
        handle_search_ascii(state, stdscr, visible)
        return True

    if key == SEARCH_HEX_KEY:
        handle_search_hex(state, stdscr, visible)
        return True

    if key == SEARCH_NEXT_KEY:
        state.search_next(visible)
        return True

    if key == SEARCH_PREV_KEY:
        state.search_prev(visible)
        return True

    if state.mode == EditorMode.READ:
        return handle_read(state, key, visible)

    return handle_panel_normal(state, key, visible)

The full source file is available at this link.

Rendering the UI

The rendering module is the most voluminous. It's responsible for everything we see on the screen: lines, panels, statuses, hints, prompts. The module doesn't store state — it receives EditorState and HexFile as parameters and renders them.

At the heart of everything lies the _addstr function: a wrapper over the window's addstr method that clips the string to the window width and swallows the curses.error raised when drawing past the boundaries:

def _addstr(win: Any, y: int, x: int, text: str, attr: int = 0) -> int:
    h, w = win.getmaxyx()
    if y >= h or x >= w:
        return x
    text = text[: w - x - 1]
    if text:
        try:
            win.addstr(y, x, text, attr)
        except curses.error:
            pass
    return x + len(text)

It returns the new x-coordinate, allowing output chains to be built without manually calculating offsets.

Also here is the integration with colors.py — the selection of color for each byte is implemented in _byte_attr. This is a pure function that returns a curses attribute based on the offset, byte value, and editor state. The order of checks strictly defines the priority:

def _byte_attr(
    offset: int,
    col: int,
    b: int,
    cursor_col: Optional[int],
    mirror_col: Optional[int],
    dirty_offsets: set[int],
    hf: HexFile,
    state: EditorState,
    use_hex_color: bool,
) -> int:
    if col == cursor_col:
        return curses.color_pair(PAIR_CURSOR)                 # 1. cursor
    if col == mirror_col:
        return curses.color_pair(PAIR_HIGHLIGHT)              # 2. mirror highlight
    if offset in dirty_offsets:
        return curses.color_pair(PAIR_DIRTY)                  # 3. changed byte
    if _is_search_match(offset, state):
        return curses.color_pair(PAIR_SEARCH_MATCH)           # 4. search match

    field_def = hf.get_field_at(offset)
    if field_def is not None:
        return field_color(field_def.ftype.name)              # 5. format field

    return hex_color(b) if use_hex_color else ascii_color(b)  # 6. gradient
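
The _is_search_match helper isn't shown. Given the last_offset and match_len fields of SearchState, it presumably reduces to a range check. A sketch under that assumption, with the state fields passed in as plain arguments:

```python
def is_search_match(offset: int, last_offset: int, match_len: int) -> bool:
    """True if offset falls inside the most recent search match."""
    # last_offset is -1 until a search has succeeded.
    return last_offset >= 0 and last_offset <= offset < last_offset + match_len
```

For a three-byte match found at 0x10, offsets 0x10, 0x11, and 0x12 are highlighted and 0x13 is not.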

Mirror highlighting (mirror_col) links the two panels. When the cursor is in the hex panel, the corresponding byte in the ASCII panel is highlighted with a yellow background — and vice versa. Mirror column coordinates are calculated in _resolve_cursor_cols:

def _resolve_cursor_cols(row, state):
    cursor = state.cursor
    if not cursor or cursor[0] != row:
        return None, None, None, None

    col = cursor[1]
    if state.mode == EditorMode.HEX:
        return col, None, None, col      # hex_cursor, ascii_cursor, hex_mirror, ascii_mirror
    if state.mode == EditorMode.ASCII:
        return None, col, col, None
    return None, None, None, None

If the cursor is in hex, the mirror in ASCII gets the same column — and _byte_attr for this column in the ASCII panel will return PAIR_HIGHLIGHT.

The draw_frame function renders the entire frame. Fun fact: in the first version, this function was in __init__.py, but for the sake of SRP I decided its place is in ui.py.

def draw_frame(stdscr: Any, state: EditorState) -> None:
    stdscr.erase()
    draw_header(stdscr, state.hf, state)
    draw_rows(stdscr, state)
    draw_interpret_panel(stdscr, state)
    draw_status(stdscr, state)
    draw_keybinds(stdscr, state)
    stdscr.refresh()

draw_hex_row renders one line: address, separator, hex panel, separator, ASCII panel. The hex panel is drawn in _draw_hex_part with grouping by 4 bytes and a separator ╌ between groups. In editing mode, the current nibble is underlined and bolded:

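if idx == cursor_col and editing:
    # hi holds the byte's two hex digits. Draw them in their natural order
    # and emphasize only the nibble that is currently being edited.
    active = attr | curses.A_UNDERLINE | curses.A_BOLD
    x = _addstr(win, y, x, hi[0], active if hex_nibble == 0 else attr)
    x = _addstr(win, y, x, hi[1], active if hex_nibble == 1 else attr)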

draw_rows iterates over all visible lines and calls draw_hex_row for each. The line that has focus (in READ — top_row, in other modes — cur_row) gets a highlighted address via PAIR_HIGHLIGHT.

The status header forms a string like:

  cbhe  HEX [I]    │  example.out  │  4.2 KiB  │  :16  │  PNG  │  1%

From left to right, it shows the mode, the interpretation marker [I] if the panel is on, a marker (*) indicating that the file is modified but not saved, the filename, the human-readable size, the dump width, the format name, and the view percentage. When editing, the header changes color to green.
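
The human-readable size in the header (4.2 KiB in the example) can be produced with a short loop over binary units. This is a sketch of one way to do it; human_size is a hypothetical name, and the project's own formatting may differ:

```python
def human_size(num_bytes: int) -> str:
    """Format a byte count using binary units (1 KiB = 1024 bytes)."""
    if num_bytes < 1024:
        return f"{num_bytes} B"
    size = float(num_bytes)
    unit = "B"
    for unit in ("KiB", "MiB", "GiB", "TiB"):
        size /= 1024
        if size < 1024:
            break
    return f"{size:.1f} {unit}"
```

For instance, a file of 4300 bytes comes out as 4.2 KiB, matching the header example above.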

Editing Mode Example

Additionally, there's a status bar:

  off:00000010  dec:16  val:00  dec:  0  chr:·  │  Width [SIZE]         saved

It shows the offset in hex and decimal, followed by the byte value in hex, in decimal, and as a character. If the byte belongs to a format field, the field name and its type are displayed in yellow after the separator. The right part shows the status message: errors in red, normal messages in a dimmed color.

draw_interpret_panel draws a fixed-width panel of 28 characters on the right with a pseudographic border. Inside is a list of label/value pairs obtained from interpret_at.

Interpretation Panel

There are also keybinding hints. draw_keybinds outputs hints in the last line of the screen, depending on the mode. For READ, KEYBINDS_READ is used; for normal HEX/ASCII mode, KEYBINDS_NORMAL; for editing, KEYBINDS_EDIT. Keys are bolded, descriptions are in a normal font.

The full source file is available at this link.

Byte Interpretation: interpret.py

I'll confess: I felt our editor was missing a feature besides highlighting. After a short brainstorm, I settled on byte interpretation. The interpret.py module answers the question "What could this byte and its neighbors represent?". It takes an offset in the file, reads several consecutive bytes, and interprets them as numbers of various sizes and byte orders, as a bit vector, and as a UTF-8 string.

Interpretation Panel

All interpretations are collected in the _STRUCT_FORMATS list — tuples of four elements: a short key, a human-readable label, a format string for struct.unpack, and a size in bytes:

_STRUCT_FORMATS: list[tuple[str, str, str, int]] = [
    ("i8", "int8", ">b", 1),
    ("u8", "uint8", ">B", 1),
    ("i16le", "int16le", "<h", 2),
    ("i16be", "int16be", ">h", 2),
    ("u16le", "uint16le", "<H", 2),
    ("u16be", "uint16be", ">H", 2),
    ("i32le", "int32le", "<i", 4),
    ("i32be", "int32be", ">i", 4),
    ("u32le", "uint32le", "<I", 4),
    ("u32be", "uint32be", ">I", 4),
    ("i64le", "int64le", "<q", 8),
    ("i64be", "int64be", ">q", 8),
    ("u64le", "uint64le", "<Q", 8),
    ("u64be", "uint64be", ">Q", 8),
    ("f32le", "float32le", "<f", 4),
    ("f32be", "float32be", ">f", 4),
    ("f64le", "float64le", "<d", 8),
    ("f64be", "float64be", ">d", 8),
]

For each format, a block of bytes of the required size is read via _read_raw:

def _read_raw(hf: HexFile, offset: int, length: int) -> Optional[bytes]:
    if offset + length > hf.size:
        return None
    chunks: list[int] = []
    for i in range(length):
        chunks.append(hf.read_byte(offset + i))
    return bytes(chunks)

Then struct.unpack is used with the corresponding format string.

def _interpret_struct(raw: bytes, fmt: str, is_float: bool) -> str:
    try:
        (v,) = struct.unpack(fmt, raw)
        return _fmt_float(v) if is_float else str(v)
    except struct.error:
        return "—"

Float values are formatted separately: _fmt_float handles NaN and ±Inf, and prints ordinary numbers with up to six significant digits.

def _fmt_float(v: float) -> str:
    if v != v:
        return "NaN"
    if v == float("inf"):
        return "+Inf"
    if v == float("-inf"):
        return "-Inf"
    return f"{v:.6g}"

After the numeric interpretations, the bit representation of the first byte and the UTF-8 decodings of the next 4 and 8 bytes are added. Non-printable characters in the decoded text are replaced with ·, the same as in the main ASCII panel; if the bytes are not valid UTF-8, a dash placeholder is shown instead.

def _interpret_utf8(raw: bytes) -> str:
    try:
        text = raw.decode("utf-8")
        printable = "".join(c if c.isprintable() else "·" for c in text)
        return repr(printable)
    except UnicodeDecodeError:
        return "—"
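
The _interpret_bits function used for the bit row isn't shown. One plausible way to produce the bit string, assuming a plain eight-digit binary rendering (the project's actual formatting, such as grouping into nibbles, may differ):

```python
def interpret_bits(raw: bytes) -> str:
    """Render the first byte as eight binary digits, most significant bit first."""
    return format(raw[0], "08b")


bits_a5 = interpret_bits(b"\xa5")
bits_01 = interpret_bits(b"\x01")
```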

The interpret_at function returns a list of (label, value) tuples — ready for rendering. The interpretation panel in ui.py simply iterates over this list and outputs in two columns.

def interpret_at(hf: HexFile, offset: int) -> list[InterpretRow]:
    rows: list[InterpretRow] = []

    for _key, label, fmt, size in _STRUCT_FORMATS:
        is_float = fmt[-1] in ("f", "d")
        raw = _read_raw(hf, offset, size)
        value = _interpret_struct(raw, fmt, is_float) if raw is not None else "—"
        rows.append((label, value))

    raw1 = _read_raw(hf, offset, 1)
    if raw1 is not None:
        rows.append(("bits(1B)", _interpret_bits(raw1)))

    raw4 = _read_raw(hf, offset, 4)
    if raw4 is not None:
        rows.append(("utf8(4B)", _interpret_utf8(raw4)))

    raw8 = _read_raw(hf, offset, 8)
    if raw8 is not None:
        rows.append(("utf8(8B)", _interpret_utf8(raw8)))

    return rows
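
To make the effect of byte order concrete, here is a small stand-alone example that unpacks the first four bytes of the PNG signature with a few of the format strings from _STRUCT_FORMATS:

```python
import struct

raw = bytes([0x89, 0x50, 0x4E, 0x47])   # first four bytes of a PNG file

u32le = struct.unpack("<I", raw)[0]     # least significant byte first
u32be = struct.unpack(">I", raw)[0]     # most significant byte first
i32be = struct.unpack(">i", raw)[0]     # same bits, read as a signed value
u16le = struct.unpack("<H", raw[:2])[0]

summary = {
    "uint32le": hex(u32le),
    "uint32be": hex(u32be),
    "int32be": i32be,
    "uint16le": hex(u16le),
}
```

The same four bytes read as 0x474E5089 in little-endian order and 0x89504E47 in big-endian order; because the top bit of 0x89 is set, the signed big-endian reading is negative.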

The full source file is available at this link.

Entry Point

I used the command uv init --package to create the project, so in __init__.py the main function serves as the entry point.

In pyproject.toml it looks like this:

[project.scripts]
cbhe = "cbhe:main"

The code itself resides at src/cbhe.

The __init__.py module does three things: parses command-line arguments, loads formats, and launches the main loop. There is no rendering or state logic here — just glue.

The parse_arguments function creates a parser and returns a namespace with the arguments.

def parse_arguments() -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Curses-based hex editor with interpretation and highlighting",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  %(prog)s file.bin                    # Open file with auto-detection
  %(prog)s --formats custom.json file.bin  # Load custom formats
  %(prog)s -f fmt1.json -f fmt2.json file.bin  # Multiple format files
  %(prog)s -w 32 file.bin              # Set initial width to 32
  %(prog)s -m hex file.bin             # Start in hex mode
        """,
    )

    parser.add_argument("file", help="File to open and edit")
    parser.add_argument(
        "-f",
        "--formats",
        action="append",
        dest="format_files",
        help="JSON file with custom format definitions (can be used multiple times)",
    )
    parser.add_argument(
        "-w",
        "--width",
        type=int,
        choices=[8, 16, 32],
        default=16,
        help="Initial bytes per row (default: 16)",
    )
    parser.add_argument(
        "-m",
        "--mode",
        choices=["read", "hex", "ascii"],
        default="read",
        help="Initial mode (default: read)",
    )
    parser.add_argument(
        "--no-auto-detect",
        action="store_true",
        help="Disable automatic format detection",
    )
    parser.add_argument(
        "--format-dir", help="Directory containing JSON format files (loads all *.json)"
    )

    return parser.parse_args()

Format loading happens before entering curses mode. First, built-in formats are registered via register_builtins(), then user-defined JSONs from the -f and --format-dir arguments are loaded. If --no-auto-detect is specified, the format for HexFile is forcibly nullified after creation.

def load_all_formats(args: argparse.Namespace) -> None:
    format_files: list[str] = []

    if args.format_files:
        format_files.extend(args.format_files)

    if args.format_dir and os.path.isdir(args.format_dir):
        json_files = glob.glob(os.path.join(args.format_dir, "*.json"))
        format_files.extend(json_files)
        print(f"Found {len(json_files)} format files in {args.format_dir}")

    register_builtins()

    if format_files:
        print(f"Loading formats from: {format_files}")
        load_custom_formats(format_files)

The main loop is synchronous and deliberately minimal: it clamps the scroll position, renders the frame, reads a key, and dispatches it.

def run(stdscr: Any, args: argparse.Namespace) -> None:
    init_colors()
    setup(stdscr)

    hf = HexFile(args.file, width=args.width)

    if args.no_auto_detect:
        hf.file_format = None

    state = EditorState(hf=hf)
    state.set_mode(_MODE_MAP[args.mode])

    while True:
        visible = _visible_rows(stdscr)
        state.clamp_top(visible)
        draw_frame(stdscr, state)

        key = stdscr.getch()

        if not dispatch_key(state, stdscr, key, visible):
            break

        if key == KEY_RESIZE:
            clear(stdscr)


def main() -> None:
    args = parse_arguments()

    if not os.path.isfile(args.file):
        print(f"File not found: {args.file}")
        sys.exit(1)

    load_all_formats(args)
    run_with_wrapper(run, args)

The full source file is available at this link.

Practice!

Finally, our editor works. To demonstrate a specific use case, I decided to take the simplest task from the reverse engineering world — changing a password in a program.

Let's write a simple C program that asks for a password and compares it to "1234":

#include <stdio.h>
#include <string.h>

int main() {
    char password[20];

    printf("Enter password: ");
    scanf("%19s", password);  /* cap the input so it fits the 20-byte buffer */

    if (strcmp(password, "1234") == 0) {
        printf("Access granted\n");
    } else {
        printf("Access denied\n");
    }

    return 0;
}

Then compile: gcc -o example example.c.

Let's run it:

 $ ./example
Enter password: 1111
Access denied

$ ./example
Enter password: 1234
Access granted

Now open the binary in our editor with cbhe example:

Opened Binary

Press / to start an ASCII search and type, for instance, password. The editor finds the string in the .rodata section and moves the cursor to it. Nearby in the hex panel we see 31 32 33 34 ("1234" in ASCII) and 41 63 63 65 73 73 ("Access", the start of "Access granted").
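Outside the editor, the same lookup can be sketched in a few lines of Python. The helper names find_ascii and hex_bytes are hypothetical, and the sample bytes only imitate what a string section might look like; a real binary will have a different layout.

```python
def find_ascii(data: bytes, text: str, start: int = 0) -> int:
    """Return the offset of the first occurrence of `text`, or -1 if absent."""
    return data.find(text.encode("ascii"), start)


def hex_bytes(data: bytes) -> str:
    """Format bytes the way the hex panel shows them: two hex digits per byte."""
    return " ".join(f"{b:02X}" for b in data)


# Made-up stand-in for a fragment of .rodata: NUL-terminated C strings in a row.
sample = b"Enter password: \x00%s\x001234\x00Access granted\x00Access denied\x00"

offset = find_ascii(sample, "1234")
print(f"{offset:08X}", hex_bytes(sample[offset:offset + 4]))
print(hex_bytes(b"Access"))
```

The second print shows why 41 63 63 65 73 73 covers only the word "Access": six bytes, six characters.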

Switch to ASCII mode with a, then press e to enter editing mode. With the cursor on the 1 of "1234", type 1 four times; the cursor advances after each keystroke. The string "1234" has become "1111", and the changed bytes are highlighted in red.

Typing...

Press Ctrl+S to save. Exit: q.

Done!

Run the patched binary:

$ ./example
Enter password: 1111
Access granted

Why does this work? String literals in C are placed by the compiler in a read-only section (.rodata in ELF, .rdata in PE). When run, the program doesn't check the integrity of this section — it simply reads the bytes and compares them. We changed those bytes on disk, and the program honestly compares the entered string with the new value.
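The same patch can be expressed as a short script, which also makes one constraint explicit: the replacement has to be exactly as long as the original, otherwise every byte after it would shift and the offsets stored in the binary would point to the wrong places. This is a sketch under assumptions: patch_same_length is a hypothetical helper, and the demo runs on a throwaway file with made-up contents rather than on a real executable.

```python
import os
import tempfile


def patch_same_length(path: str, old: bytes, new: bytes) -> int:
    """Overwrite the single occurrence of `old` with `new`; return its offset."""
    if len(old) != len(new):
        raise ValueError("replacement must have the same length as the original")
    with open(path, "r+b") as f:
        data = f.read()
        offset = data.find(old)
        if offset < 0:
            raise ValueError("pattern not found")
        if data.find(old, offset + 1) >= 0:
            raise ValueError("pattern is ambiguous: found more than once")
        f.seek(offset)
        f.write(new)  # overwrite in place, so the file size does not change
    return offset


# Demo on a temporary file that merely imitates a binary with C strings.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"\x7fELF....Enter password: \x001234\x00Access granted\x00")

patched_offset = patch_same_length(demo_path, b"1234\x00", b"1111\x00")
with open(demo_path, "rb") as f:
    patched = f.read()
os.remove(demo_path)
```

Including the terminating NUL in the pattern reduces the chance of matching "1234" somewhere it is not a complete string literal, and the ambiguity check refuses to guess when the pattern occurs more than once.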

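In reality, things are more complex. Optimizing compilers can inline a short string comparison so that the literal ends up as constants inside the code, and obfuscators can store strings encrypted. Modifying a signed binary (Windows Authenticode, macOS code signing) invalidates its signature, so the system may warn about it or refuse to run it. Packers such as UPX compress the sections, so the strings are not visible in plain form on disk. The patch also only works because "1111" is exactly as long as "1234"; a longer string would overwrite the terminator and the data that follows. But for programs compiled with a simple gcc without protection flags, this method works. And our editor, with ASCII search and direct editing, makes such a task trivial: no need to calculate offsets in your head or use a separate viewer and a separate hex editor.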

Conclusion

This has been an incredible journey. We've traveled from understanding how a byte decomposes into two nibbles to a working TUI editor with three layers of highlighting, automatic format detection, search, and an interpretation panel.

The code is written with extensibility in mind: new formats are added via JSON, the curses backend is isolated in two modules, and logic is separated from rendering. The project can be developed in several directions, from refactoring toward cleaner abstractions to adding new features. It can also be ported to C + ncurses or Rust + ratatui.

The source code is available in the repository; you can study it in more detail there.

If you spot problems in the code, bad patterns, or simply have your own opinion on the matter, I'd be glad to read your comments.

I wanted this article to be more than a "do this, and you'll get that" guide: an explanation of patterns and functionality, accompanied by the core logic rather than the entire codebase.

alexeev-dev notes
