Running Multiple Interpreters in Python Code — Incredible Speed
On June 5, 2025, PEP-0734 was accepted. Judging by the information on the official website, it is a continuation of PEP-0554. This PEP proposes adding a new interpreters module to support inspecting, creating, and running code in multiple interpreters within the current process. It also builds on PEP-0684, which proposed one GIL per interpreter.
Several full-fledged interpreters working side by side. What are the advantages?
- One process;
- One thread, but you can manually create more;
- Data is never shared directly between interpreters: simple types are passed through a lightweight cross-interpreter mechanism, and everything else through serialization similar to pickle;
- One GIL per interpreter, so you can get the benefits of true multi-core parallelism;
- Works with asyncio.
The GIL (Global Interpreter Lock) is a mechanism built into the standard Python implementation (CPython) that prevents simultaneous execution of Python bytecode by multiple threads.
Among the downsides — this PEP significantly changed the C code, and therefore the stability of C extensions is not always guaranteed. By the way, I talked about how to create them in my previous article.
There are several important non-technical aspects about the process of creating this feature:
- PEP-734 and Free-Threading do very similar things — they allow implementing true multitasking, but in different ways;
- Initially, subinterpreters were available only through the C API (the per-interpreter GIL arrived there in 3.12);
- There is a separate PyPI package (https://pypi.org/project/interpreters-pep-734/) with this code;
- The Python part in the form of PEP-734 was added to 3.14 after the feature freeze;
- It was originally planned to add it as the interpreters module, but at the last moment it became concurrent.interpreters; there's a lengthy discussion about this.
The PEP adds the interpreters module (concurrent.interpreters). This includes Interpreter objects representing the underlying interpreters. The module also provides a basic Queue class for communication between interpreters.
For users, there will be a simple API:
interp = interpreters.create()
try:
    interp.exec('print("Hello from PEP-554")')
finally:
    interp.close()
Right now, if you use Python 3.14, you can import the concurrent.interpreters package:
import concurrent.interpreters as interpreters

interp = interpreters.create()
a = 15
print(f"A in main: {a}")
try:
    interp.exec('print("Hello from PEP-554")\na = 10\nprint(f"A in subinterp: {a}")')
finally:
    interp.close()
Output when run:
A in main: 15
Hello from PEP-554
A in subinterp: 10
❯ Why is this PEP important?
The interpreters module will provide a high-level interface for the multiple interpreter functionality. The goal is to make the existing multiple-interpreters feature of CPython more accessible to Python code. This is especially relevant now that CPython has a per-interpreter GIL (PEP 684), and people are more interested in using multiple interpreters.
Without a stdlib module, users are limited to the C API, which restricts their ability to experiment with and take advantage of multiple interpreters.
The module will include a basic mechanism for communication between interpreters. Without it, multiple interpreters would be a much less useful feature.
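A minimal sketch of such communication, assuming Python 3.14+ where the module lives at concurrent.interpreters. The roundtrip helper and the inbox/outbox names are hypothetical, and the import guard is only there so the snippet still loads on older versions:

```python
try:
    from concurrent import interpreters  # available in Python 3.14+
except ImportError:
    interpreters = None  # older Python: the sketch becomes a no-op


def roundtrip(value):
    """Send `value` to a subinterpreter and return its doubled reply."""
    if interpreters is None:
        return None
    inbox = interpreters.create_queue()
    outbox = interpreters.create_queue()
    interp = interpreters.create()
    try:
        # Bind both queues as globals in the subinterpreter's __main__.
        interp.prepare_main(inbox=inbox, outbox=outbox)
        inbox.put(value)
        interp.exec("outbox.put(inbox.get() * 2)")
        return outbox.get()
    finally:
        interp.close()


reply = roundtrip(21)
print(reply)
```

The queues are the only link between the two interpreters here: no variable from the main interpreter is visible on the other side unless it is sent explicitly.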
❯ Architecture
Essentially, an "interpreter" is a collection of all the runtime state that Python threads need to share together.
Processes in Python can have one or more OS threads executing Python code (or interacting with the C API). Each of these threads works with the CPython runtime.
Interpreters are created through the C API using Py_NewInterpreterFromConfig() (or Py_NewInterpreter(), which is a lightweight wrapper around Py_NewInterpreterFromConfig()). This function does the following:
- Creates a new state;
- Creates a new thread state;
- Sets the thread state as current (current state is needed for interpreter initialization);
- Initializes the interpreter state using the thread state;
- Returns the thread state (still current).
When a Python process starts, it creates one interpreter state (the "main" interpreter) with one thread state for the current OS thread. Then the Python runtime is initialized using them.
After initialization, the script or module or REPL is executed using them. This execution happens in the interpreter's __main__ module.
When the process finishes executing the requested Python code or REPL in the main OS thread, the Python runtime is finalized in that thread using the main interpreter.
❯ C API
Inside, you can find many different C modules. Let's break them down in more detail.
This file contains the API for managing actions between isolated interpreters. The foundation, in general.
We'll skip some functions if they are minor (like _Py_GetMainfile), you can view them yourself.
Main functions:
- runpy_run_path: runs runpy with a path;
- set_exc_with_cause: creates an exception with a cause.
Interpreter management:
- _PyXI_NewInterpreter(): creates a new isolated interpreter;
- _PyXI_EndInterpreter(): terminates an interpreter;
- _Py_CallInInterpreter(): executes a function in another interpreter.
Cross-interpreter data:
- _PyXIData_t: structure for transferring data between interpreters;
- _PyObject_GetXIData(): converts an object to the cross-interpreter format;
- _PyXIData_NewObject(): recreates an object from cross-interpreter data.
Serialization:
- _PyPickle_GetXIData(): uses pickle for object serialization;
- _PyMarshal_GetXIData(): uses marshal for code serialization.
Session management:
- _PyXI_session: execution session in another interpreter;
- _PyXI_Enter(): begins a session in another interpreter;
- _PyXI_Exit(): ends a session.
Error handling:
- _PyXI_excinfo: stores exception information between interpreters;
- _PyXI_failure: unified failure handling.
Among the implementation features, we can highlight the isolation of the main (__main__) module for each interpreter.
All data is transferred safely. Simple data can be shared without pickle; complex objects, however, require serialization.
Asynchronous call execution and memory handling through the _Py_PENDING_RAWFREE flag are supported.
Additionally, don't forget about exception handling. Exception serialization occurs through _PyXI_excinfo, tracebacks are converted to TracebackException, and error propagation methods between interpreters are implemented.
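From Python code, this propagation can be observed roughly as follows. This is a sketch assuming Python 3.14+ and the ExecutionFailed exception from concurrent.interpreters; the failing_exec helper is hypothetical, and the exact text of the error message is not something to rely on:

```python
try:
    from concurrent import interpreters  # available in Python 3.14+
except ImportError:
    interpreters = None  # older Python: the sketch becomes a no-op


def failing_exec():
    """Run code that raises in a subinterpreter and return the error text."""
    if interpreters is None:
        return None
    interp = interpreters.create()
    try:
        interp.exec("raise ValueError('boom')")
    except interpreters.ExecutionFailed as exc:
        # The original exception object never crosses the boundary;
        # the caller only receives a summary of it.
        return str(exc)
    finally:
        interp.close()
    return "no error"


message = failing_exec()
print(message)
```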
This module is responsible for low-level access to interpreter primitives. The definition of the interpreters themselves.
The module provides a low-level API for working with Python interpreters, including:
- Creation and destruction of interpreters
- Managing isolation between interpreters
- Executing code in different interpreters
- Cross-interpreter data transfer
- Managing interpreter configuration
In the code, you can see the C function interp_create, which is the implementation of create(). It creates a new interpreter with the specified interpreter configuration. After that, you can see interp_destroy for destroying the interpreter object (in Python, this is destroy()). There's also list_all for getting a list of all interpreters in the current process, as well as the get_current and get_main methods for getting the current and main interpreter.
We can also highlight a separate set of functions for code execution: exec() for executing arbitrary code, run_string for executing a string of code, run_func() for executing a function body, and call() for calling a callable object (including callable classes) with arguments.
For cross-interpreter interaction, there's an implementation of safe buffer sharing between interpreters:
typedef struct {
    PyObject base;
    Py_buffer *view;
    int64_t interpid;
} xibufferview;
For serialization and deserialization of complex objects, there's the _PyXIData mechanism. It is used as an argument to functions where work with complex objects happens. There's also support for shared data objects through shared parameters.
Moreover, there are functions for managing interpreter states. Thread states are related to interpreter states in roughly the same way that OS threads and processes are related (on a high level). To begin with, the relationship is one-to-many. A thread state belongs to one interpreter (and stores a pointer to it). This thread state is never used for another interpreter. However, in the reverse direction, an interpreter can have zero or more thread states associated with it. An interpreter is considered active only in OS threads where one of its thread states is current.
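The one-to-many relationship can be pictured with a toy model. The InterpreterState and ThreadState classes below are purely illustrative Python stand-ins, not the real C structures:

```python
from dataclasses import dataclass, field


@dataclass
class InterpreterState:
    interp_id: int
    thread_states: list = field(default_factory=list)  # zero or more


@dataclass
class ThreadState:
    os_thread: str
    interp: InterpreterState  # exactly one owner, never reassigned

    def __post_init__(self):
        # Register this thread state with its owning interpreter.
        self.interp.thread_states.append(self)


main = InterpreterState(0)
sub = InterpreterState(1)
ThreadState("os-thread-A", main)
ThreadState("os-thread-B", main)

counts = {i.interp_id: len(i.thread_states) for i in (main, sub)}
print(counts)  # {0: 2, 1: 0}
```

Here the main interpreter has two thread states and the second interpreter has none, which matches the "zero or more" wording above.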
The set___main___attrs function sets attributes in the __main__ module, and capture_exception is needed for capturing exceptions to subsequently transfer them between interpreters. There's also the is_shareable method for checking the ability to share objects.
Among the features of this file, we can highlight safe file operations, state cleanup (module_clear, traverse_module_state), deallocators (xibufferview_dealloc), and the use of Py_buffer for working with shared buffers.
Interpreter objects are strictly isolated, and sessions are switched through _PyXI_Enter and _PyXI_Exit. Isolation errors are handled through unwrap_not_shareable.
The interpreter configuration is compatible with PyInterpreterConfig, and the config itself can be created through the new_config method.
Interpreters also have a managed lifecycle, implemented through readiness checks, deletion locks for the current interpreter, and reference counting.
The module is marked as Py_MOD_PER_INTERPRETER_GIL_SUPPORTED, which means that a separate GIL for each interpreter is supported. There are special exception types (InterpreterError, NotShareableError). Compatible with marshal for serializing complex objects.
This module is responsible for the message exchange queue between interpreters. Queues work between Python interpreters within a single process, use global memory for data storage, and support locks for synchronization.
There are several structures in this module.
The first one is _queueitem, a queue element that forms a singly linked list.
struct _queueitem;
typedef struct _queueitem {
    /* The interpreter that added the item to the queue.
       The actual bound interpid is found in item->data.
       This is necessary because item->data might be NULL,
       meaning the interpreter has been destroyed. */
    int64_t interpid;
    _PyXIData_t *data;
    unboundop_t unboundop;
    struct _queueitem *next;
} _queueitem;
The operating principle is that each queue element (_queueitem) contains:
- interpid: sender identifier
- data: data buffer (up to 256 KB without serialization)
- next: pointer to the next element (FIFO)
And data transfer is only done through queue.put. Synchronization primitives are borrowed from threading.Lock.
Next is the queue itself (FIFO — first in — first out).
typedef struct _queue {
    Py_ssize_t num_waiters;  // protected by global lock
    PyThread_type_lock mutex;
    int alive;
    struct _queueitems {
        Py_ssize_t maxsize;
        Py_ssize_t count;
        _queueitem *first;
        _queueitem *last;
    } items;
    struct _queuedefaults {
        xidata_fallback_t fallback;
        int unboundop;
    } defaults;
} _queue;
The fields are: the number of "waiters" (protected by the global lock), a mutex, an alive flag, the _queueitems substructure with the maximum size, the count, and the first and last elements, and the _queuedefaults substructure with default settings.
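To make the layout concrete, here is a toy Python model of the same idea: items linked through next, and a queue that tracks first, last, count, and maxsize. The QueueItem and ToyQueue classes are hypothetical illustrations, not the CPython implementation, and locking is left out:

```python
class QueueItem:
    """Toy stand-in for _queueitem: payload plus a link to the next item."""

    def __init__(self, interpid, data):
        self.interpid = interpid
        self.data = data
        self.next = None


class ToyQueue:
    """Toy stand-in for _queue: first/last pointers, count, and maxsize."""

    def __init__(self, maxsize=0):
        self.maxsize = maxsize  # 0 means unbounded in this sketch
        self.count = 0
        self.first = None
        self.last = None

    def put(self, interpid, data):
        if self.maxsize > 0 and self.count >= self.maxsize:
            raise OverflowError("queue is full")
        item = QueueItem(interpid, data)
        if self.last is None:
            self.first = item      # first item in an empty queue
        else:
            self.last.next = item  # link behind the current tail
        self.last = item
        self.count += 1

    def get(self):
        item = self.first
        if item is None:
            raise LookupError("queue is empty")
        self.first = item.next
        if self.first is None:
            self.last = None       # queue became empty
        self.count -= 1
        return item.interpid, item.data


q = ToyQueue(maxsize=2)
q.put(1, "a")
q.put(2, "b")
order = [q.get(), q.get()]
print(order)  # [(1, 'a'), (2, 'b')]
```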
Then comes _queueref — a reference to a queue:
struct _queueref;
typedef struct _queueref {
    struct _queueref *next;
    int64_t qid;
    Py_ssize_t refcount;
    _queue *queue;
} _queueref;
It contains the next reference object, queue ID, reference count, and the queue itself as a pointer.
And finally, the _queues structure:
typedef struct _queues {
    PyThread_type_lock mutex;
    _queueref *head;
    int64_t count;
    int64_t next_id;
} _queues;
It holds a mutex, the head of the linked list of queue references, the queue count, and next_id. _queues is the global registry of all queues.
This module also defines data management: _PyXIData_t as a data container, and mechanisms for serializing and deserializing objects. Additionally, you can see the policy for handling "unbound" objects when the source interpreter is destroyed.
The module is thread-safe (synchronization goes through PyThread_type_lock, queue operations are atomic). Custom global memory allocators are used, there's cleanup and reference counting.
And naturally, error handling: Python exceptions about queues, a system of codes, and conversion of C errors to Python exceptions.
In short, here's the module API:
- create() / destroy(): queue management
- put() / get(): basic operations
- bind() / release(): reference management
- Helper methods (get_count(), is_full(), and others)
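The registry and reference-management part of this API can be sketched as a toy model. ToyRegistry is a hypothetical name, and destroying a queue on its last release is an assumption of this sketch rather than a statement about the real module:

```python
import threading


class ToyRegistry:
    """Toy stand-in for _queues: hands out IDs and counts references."""

    def __init__(self):
        self._lock = threading.Lock()
        self._refs = {}  # qid -> reference count
        self._next_id = 1

    def create(self):
        with self._lock:
            qid = self._next_id
            self._next_id += 1
            self._refs[qid] = 0
            return qid

    def bind(self, qid):
        with self._lock:
            self._refs[qid] += 1

    def release(self, qid):
        # Assumption: the last release also removes the queue.
        with self._lock:
            self._refs[qid] -= 1
            if self._refs[qid] == 0:
                del self._refs[qid]

    def list_all(self):
        with self._lock:
            return sorted(self._refs)


reg = ToyRegistry()
a = reg.create()
b = reg.create()
reg.bind(a)
reg.bind(a)
reg.bind(b)
reg.release(b)
remaining = reg.list_all()
print(remaining)  # [1]
```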
Modules/_interpchannelsmodule.c
The final module, a set of primitives. This code implements a low-level cross-interpreter channel mechanism for CPython, providing data transfer between interpreters. Its core is the globals structure, which provides centralized management:
static struct globals {
    PyMutex mutex;       // Global mutex for synchronization
    int module_count;    // Active sub-interpreter counter
    _channels channels;  // Root channel container
} _globals = {0};
The global state is protected by a mutex, preventing race conditions when accessing the channel list. The _channels structure manages the lifecycle of all channels in the process:
typedef struct _channels {
    PyThread_type_lock mutex;  // Channel list mutex
    _channelref *head;         // Linked list of active channels
    int64_t numopen;           // Open channel counter
    int64_t next_id;           // Unique ID generator
} _channels;
Each channel is represented by a hierarchy of structures:
- _channelref: entry in the global registry
- _channel_state: main channel state
- _channelitem: data transfer element
The message queue element encapsulates the transmitted data and meta-information:
typedef struct _channelitem {
    int64_t interpid;           // Source interpreter ID
    _PyXIData_t *data;          // Cross-interpreter data
    _waiting_t *waiting;        // Synchronization semaphore
    unboundop_t unboundop;      // Unbound object handler
    struct _channelitem *next;  // Next element
} _channelitem;
The key structure _channel_state manages the internal state of a channel:
typedef struct _channel {
    PyThread_type_lock mutex;          // Local mutex
    _channelqueue *queue;              // Message queue (FIFO)
    _channelends *ends;                // Interpreter registry
    struct {
        unboundop_t unboundop;         // Standard object handler
        xidata_fallback_t fallback;    // Fallback serialization
    } defaults;
    int open;                          // State flag
    struct _channel_closing *closing;  // Closing state
} _channel_state;
Special attention is given to two-phase closing, fallback serialization, and automatic reference resolution. Together these ensure safe shutdown during parallel operations and handling of objects outside the standard XI format.
The exported channelid type provides an interface for Python:
typedef struct channelid {
    PyObject_HEAD
    int64_t cid;          // Unique channel ID
    int end;              // Endpoint role
    int resolve;          // Auto-resolution flag
    _channels *channels;  // Container reference
} channelid;
The mutexes form a hierarchy: from the global one, to the channel list, and down to the local per-channel mutex.
Operations are atomic (indivisible), and lock timeouts help prevent deadlocks.
And naturally, don't forget memory cleanup: garbage is collected automatically whenever an interpreter is destroyed.
❯ Transfer Operation Principle:
Sending:
_PyXIData_t *data = xi_data_serialize(obj); // Serialization
_channelitem *item = create_item(data); // Element creation
append_to_queue(queue, item); // Injection into queue
signal_receivers(waiting); // Receiver notification
Receiving:
_channelitem *item = pop_from_queue(queue); // Element extraction
if (!item) wait_with_timeout(mutex, timeout); // Block on empty queue
PyObject *obj = xi_data_deserialize(item->data); // Deserialization
Closing:
channel->open = 0; // Flag setting
broadcast_closing(channel->waiting); // Waiting thread notification
schedule_async_cleanup(channel); // Asynchronous cleanup
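The send, receive, and close steps above can be imitated in plain Python. This is only an analogy built on threading.Condition and pickle: ToyChannel is a hypothetical class, and the real module works on _PyXIData_t in C rather than on pickled bytes:

```python
import pickle
import threading
from collections import deque


class ToyChannel:
    def __init__(self):
        self._cond = threading.Condition()
        self._items = deque()
        self._open = True

    def send(self, obj):
        data = pickle.dumps(obj)      # serialization
        with self._cond:
            if not self._open:
                raise RuntimeError("channel closed")
            self._items.append(data)  # append to the FIFO queue
            self._cond.notify()       # wake one waiting receiver

    def recv(self, timeout=None):
        with self._cond:
            # Block while the channel is open but empty.
            ready = self._cond.wait_for(
                lambda: self._items or not self._open, timeout
            )
            if not ready:
                raise TimeoutError("no item arrived in time")
            if not self._items:
                raise RuntimeError("channel closed")
            data = self._items.popleft()
        return pickle.loads(data)     # deserialization

    def close(self):
        with self._cond:
            self._open = False        # set the flag
            self._cond.notify_all()   # wake every waiting receiver


ch = ToyChannel()
sender = threading.Thread(target=ch.send, args=({"n": 1},))
sender.start()
received = ch.recv(timeout=5)
sender.join()
ch.close()
print(received)  # {'n': 1}
```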
The error handling system converts system call codes (e.g., EAGAIN) into Python exceptions using the PyErr_SetFromErrno mechanism. For critical sections, the Py_BEGIN_CRITICAL_SECTION pattern is applied with guaranteed resource release. The implementation ensures strict interpreter isolation through object serialization into a GC-independent representation.
❯ About the Module
You can read about the motivation and how the PEP works at this link.
The interpreters module is available in Python 3.14, but its location has changed: it is now concurrent.interpreters.
In it, you can find the following methods:
- concurrent.interpreters.list_all(): returns a list of interpreter objects, one for each known one.
- concurrent.interpreters.get_current(): returns the interpreter object for the currently running one.
- concurrent.interpreters.get_main(): returns the interpreter object for the main interpreter.
- concurrent.interpreters.create(): initializes a new (idle) Python interpreter and returns an interpreter object for it.
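A short sketch of these four functions together, assuming Python 3.14+. The snapshot helper is hypothetical, and the import guard only keeps the snippet loadable on older versions:

```python
try:
    from concurrent import interpreters  # available in Python 3.14+
except ImportError:
    interpreters = None  # older Python: the sketch becomes a no-op


def snapshot():
    """Collect interpreter IDs while one extra interpreter is alive."""
    if interpreters is None:
        return None
    interp = interpreters.create()
    try:
        return {
            "main": interpreters.get_main().id,
            "current": interpreters.get_current().id,
            "new": interp.id,
            "all": [i.id for i in interpreters.list_all()],
        }
    finally:
        interp.close()


info = snapshot()
print(info)
```

When this runs in the main interpreter, the "main" and "current" IDs should coincide, and the newly created interpreter shows up in the "all" list until it is closed.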
You can read more about the objects on the documentation page.
Usage example:
import concurrent.interpreters as interpreters
from textwrap import dedent

interp = interpreters.create()

# Run in the current OS thread.
interp.exec('print("spam!")')

interp.exec("""if True:
    print('spam!')
""")

interp.exec(dedent("""
    print('spam!')
"""))

def run():
    print('spam!')

interp.call(run)

# Run in new OS thread.
t = interp.call_in_thread(run)
t.join()
For Python 3.12+, there's also the PyPI package interpreters-pep-734:
try:
    import interpreters
except ModuleNotFoundError:
    from interpreters_backports import interpreters

try:
    import interpreters.queues
except ModuleNotFoundError:
    import interpreters_backports.interpreters.queues
    from interpreters_backports import interpreters

try:
    from interpreters import channels
except ModuleNotFoundError:
    from interpreters_experimental.interpreters import channels

try:
    from concurrent.futures import ThreadPoolExecutor
except ModuleNotFoundError:
    from interpreters_backports.concurrent.futures import ThreadPoolExecutor
❯ More Details?
There is one process, but several interpreters, each with its own GIL. They all share the process's memory. To keep different interpreters from overwriting each other's data, objects are passed as copies (through pickle), which prevents the same memory from being mutated from different sources, so interpreters don't conflict.
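The protective effect of passing a pickled copy can be seen without any interpreters at all. In this sketch a plain pickle round trip stands in for the transfer, and the variable names are hypothetical:

```python
import pickle

original = {"counter": 0, "tags": ["a"]}

# What the receiving side would get: an independent copy, not a reference.
received_copy = pickle.loads(pickle.dumps(original))

received_copy["counter"] += 1
received_copy["tags"].append("b")

print(original)       # {'counter': 0, 'tags': ['a']}
print(received_copy)  # {'counter': 1, 'tags': ['a', 'b']}
```

Mutating the copy leaves the original untouched, which is why two interpreters cannot step on each other's objects this way.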
Fun fact: They can be used similarly to CSP from Golang (if you implement a scheduler).
About using immutable data in subinterpreters, you can look at Yury Selivanov's talk.
Also, at PyCon US 2024, the functionality of subinterpreters and free-threading was presented. The video of the talk can be viewed here.
Subinterpreters are not managed by the OS, they exist in a single process. OS threads can be bound to different interpreters, and if an interpreter uses its own GIL (PEP-684), its threads are not blocked by the GIL of other interpreters.

In fact, a subinterpreter is a separate namespace, isolated from other subinterpreters, that can have its own GIL.
Or it might not have one, depending on the configuration:
PyInterpreterConfig config = {
    .use_main_obmalloc = 0,
    .allow_fork = 0,
    .allow_exec = 0,
    .allow_threads = 1,
    .allow_daemon_threads = 0,
    .check_multi_interp_extensions = 1,
    .gil = PyInterpreterConfig_OWN_GIL,
};
The .gil field can be set to PyInterpreterConfig_SHARED_GIL instead of PyInterpreterConfig_OWN_GIL.
According to the PEP:
The interpreters module will provide a high-level interface to the multiple interpreter functionality. The goal is to make the existing multiple-interpreters feature of CPython more easily accessible to Python code. This is particularly relevant now that CPython has a per-interpreter GIL (PEP 684) and people are more interested in using multiple interpreters.
Using subinterpreters can give a speed boost thanks to the separate GIL. And so we smoothly transition to benchmarks.
❯ Benchmark
I took the measurement code from here (installation of pyperf and httpx is required).
The benchmark uses IO-bound and CPU-bound tasks. It compares a plain sequential version, threading (with and without the GIL), multiprocessing, and subinterpreters.
IO-bound task on Ryzen 7 5825u:
Regular: Mean +- std dev: 4.85 sec +- 0.48 sec
Threading: Mean +- std dev: 1.22 sec +- 0.19 sec
Multiprocessing: Mean +- std dev: 1.45 sec +- 0.26 sec
Subinterpreters: Mean +- std dev: 1.85 sec +- 0.30 sec
CPU-bound task on Ryzen 7 5825u:
Regular: Mean +- std dev: 60.2 ms +- 0.6 ms
Threading: Mean +- std dev: 22.6 ms +- 0.7 ms
Multiprocessing: Mean +- std dev: 153 ms +- 3 ms
Subinterpreters: Mean +- std dev: 120.8 ms +- 4 ms
For the purity of the experiment, results on another machine.
Here WORKLOADS were larger than in the first benchmark on the Ryzen, becoming:
WORKLOADS = [(1, 10000), (10001, 20000), (20001, 30000), (30001, 40000)]
CPU-bound task on M2 Pro:
Regular: Mean +- std dev: 163 ms +- 1 ms
Threading with GIL: Mean +- std dev: 168 ms +- 2 ms
Threading NoGIL: Mean +- std dev: 48.7 ms +- 0.6 ms
Multiprocessing: Mean +- std dev: 73.4 ms +- 1.5 ms
Subinterpreters: Mean +- std dev: 44.8 ms +- 0.5 ms
IO-bound task on M2 Pro:
Regular: Mean +- std dev: 1.45 sec +- 0.03 sec
Threading with GIL: Mean +- std dev: 384 ms +- 17 ms (~1/4 of 1.45s)
Threading NoGIL: Mean +- std dev: 373 ms +- 20 ms
Multiprocessing: Mean +- std dev: 687 ms +- 32 ms
Subinterpreters: Mean +- std dev: 547 ms +- 13 ms
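To make the M2 Pro CPU-bound figures easier to compare, the means reported above can be turned into speedups relative to the regular run. This is only arithmetic on the published numbers; the variable names are hypothetical:

```python
baseline_ms = 163.0  # "Regular" CPU-bound mean on M2 Pro

results_ms = {
    "Threading with GIL": 168.0,
    "Threading NoGIL": 48.7,
    "Multiprocessing": 73.4,
    "Subinterpreters": 44.8,
}

# Speedup = baseline time / variant time (values above 1 are faster).
speedups = {name: round(baseline_ms / ms, 2) for name, ms in results_ms.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor}x")
# Threading with GIL: 0.97x
# Threading NoGIL: 3.35x
# Multiprocessing: 2.22x
# Subinterpreters: 3.64x
```

With four workloads, the ideal speedup would be about 4x, so subinterpreters and free-threading both come close to it in this run.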
It might seem that this didn't yield that much performance. But we didn't account for the fact that you can use asynchrony and multithreading inside subinterpreters. In the end, it's excellent functionality for long tasks, when there is a lot of work to run or when isolation is important. Write your opinions in the comments.
You can see that subinterpreters fall behind threading in IO-bound tasks. Why does this happen?
In subinterpreters, each call to interp.exec() requires data serialization and session switching between interpreters, and every newly created interpreter has to set up its own runtime state and GIL. Creating interpreters is itself expensive, so they are a better fit for CPU-bound tasks, where they provide true parallelism on multiple cores.
Subinterpreters show an advantage in CPU-bound tasks only with truly parallel execution (when each has its own GIL) and when the work is large enough to amortize startup. Otherwise, they lose to threads due to the overhead of creation and data transfer.
Each call to interp.exec() requires converting data into a cross-interpreter format through _PyXIData_t. For simple types (int, str), this happens quickly, but when transferring complex objects (dictionaries, dataclasses), a serialization mechanism similar to pickle kicks in. In tests with transferring 1000 dictionaries of 1 KB size, serialization consumed 37% of execution time.
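The serialization part of that cost can be estimated in isolation. The sketch below times a plain pickle round trip for 1000 dictionaries of roughly 1 KB each; the payload shape is an assumption, and the snippet does not reproduce the 37% figure, which depends on the rest of the workload:

```python
import pickle
import time

# 1000 dictionaries, each carrying about 1 KB of string data (assumed shape).
payloads = [{"id": i, "blob": "x" * 1000} for i in range(1000)]

start = time.perf_counter()
restored = [pickle.loads(pickle.dumps(p)) for p in payloads]
elapsed = time.perf_counter() - start

print(f"round-tripped {len(restored)} dicts in {elapsed * 1000:.2f} ms")
```

Comparing this number with the total run time of a task gives a rough idea of how much of it goes to copying data rather than computing.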
❯ General Conclusions
I recommend watching the interview with CPython Core Developer Nikita Sobolev and subinterpreters module developer Eric Snow.
Diving into the history and implementation of subinterpreters, one thing is clear — it's a fairly fundamental shift in CPython's architecture. Essentially, you can achieve true parallelism through subinterpreters.
I'm particularly interested in the detail that we simply isolate interpreter states, giving each its own GIL. In that interview, Eric rightly noted that "isolation gives us a conceptual advantage."
But this very isolation was very difficult to achieve — all because of nuances in the form of:
- Immortal objects (PEP 683): Objects like None or small integers became "immortal": their reference count is fixed at an astronomically large value, which eliminates races between interpreters. By the way, this is exactly why (due to PEP 683) sys.getrefcount(X), where X is a number from -5 to 256 inclusive, shows sky-high values, while numbers outside this range show ordinary reference counts.
- Static types: The problem of mutable attributes (__dict__, __subclasses__) is solved by redirecting requests to per-interpreter storage.
- Extension modules: These require a transition to multi-phase initialization (PEP 489) and heap types. Libraries like OpenSSL (through the ssl module) are a special case, where splitting state between interpreters was problematic. But as is known, this problem has already been overcome.
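The immortal-object effect is easy to check yourself. The exact numbers depend on the CPython version and build, so the sketch below prints them without assuming particular values:

```python
import sys

# A small cached integer (immortal on CPython 3.12+) versus a large one.
small = sys.getrefcount(100)
large = sys.getrefcount(10**6 + 7)

print(f"refcount of 100: {small}")
print(f"refcount of 10**6 + 7: {large}")
```

On a recent CPython, the first value is expected to be far larger than the second, while older versions show ordinary counts for both.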
But don't forget that the technology has its downsides. Creating interpreters is quite an expensive pleasure — sometimes it's easier to get by with threads, multiprocessing, or asynchrony. But it all pays off if you know how to use it correctly.
Integration with asyncio is especially promising. Each interpreter has its own event loop, but there's no built-in synchronization between them yet.
Subinterpreters are not a new concept. Their roots go back to Python 1.5, where they emerged as an answer to the problem of global states. The idea of encapsulating interpreter data into separate structures resembles engineering practices for combating "global chaos." As Eric notes, this is a logical development: if threads got isolated states (thread state), then interpreters deserved the same. Historically, the inspiration came from TCL, but in Python, this feature remained "dormant" for decades due to inaccessibility from Python code and isolation violations.
Eric is skeptical about the mass use of subinterpreters. In principle, many will agree with him, since their niche is libraries for high-level abstractions, web frameworks, data processing. And also as an alternative to multiprocessing — OS resources are saved better when used correctly, communication within a process can be faster.
But the success of the subinterpreters technology also depends on the adaptation of C extensions.
I also recommend reading articles by my fellow writers: CPython — Immortal objects and PEP-734: Subinterpreters in Python 3.14.
❯ Conclusion
The example code and tests for working with PEP-0734 are available in my repository.
Exercising the right of a little self-promotion, I can suggest you subscribe to my blog on Telegram and also to the channel "Open Source Findings". If, of course, you liked the article and want to see a bit more.
If you liked the article, share it with friends. And better yet, come contribute to Python and other open-source projects. Good luck!