BZMol: Molecular Structure Format#

BZMol is the core molecular data structure in boltz-data, using NumPy arrays for efficient vectorized operations and ML applications.

What is BZMol?#

BZMol is a column-oriented, immutable, type-safe molecular representation. It comes in two variants: BZMol for small molecules, and BZBioMol for biomolecules, which adds residue and chain information.

Structure#

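Fields are grouped by prefix, much like columns in database tables. Every atom_* array has one entry per atom and every bond_* array has one entry per bond: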

# The total number of atoms
num_atoms: int

# Atom properties, with leading dimension of num_atoms
atom_element: np.ndarray
atom_charge: np.ndarray
...

# The total number of bonds
num_bonds: int

# Bond properties, with leading dimension of num_bonds
bond_atom: np.ndarray
bond_order: np.ndarray
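To make the column-oriented layout concrete, here is a minimal sketch that uses a frozen dataclass as a stand-in. TinyMol is a hypothetical class written for illustration, not the real BZMol, and the (num_bonds, 2) shape of bond_atom is an assumption rather than something stated above.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class TinyMol:
    """Hypothetical stand-in for a column-oriented molecule (not the real BZMol)."""

    num_atoms: int
    atom_element: np.ndarray  # shape (num_atoms,), atomic numbers
    atom_charge: np.ndarray   # shape (num_atoms,), formal charges
    num_bonds: int
    bond_atom: np.ndarray     # ASSUMED shape (num_bonds, 2): atom index pairs
    bond_order: np.ndarray    # shape (num_bonds,)


# Water: one oxygen bonded to two hydrogens
water = TinyMol(
    num_atoms=3,
    atom_element=np.array([8, 1, 1]),
    atom_charge=np.array([0, 0, 0]),
    num_bonds=2,
    bond_atom=np.array([[0, 1], [0, 2]]),
    bond_order=np.array([1, 1]),
)

# Every atom_* column shares the leading dimension num_atoms
assert water.atom_element.shape[0] == water.num_atoms

# A single vectorized expression answers a per-atom question for the whole molecule
num_hydrogens = int((water.atom_element == 1).sum())
```

Because each property is one contiguous array, per-atom questions become array expressions instead of Python loops over atom objects.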

Creating BZMol#

From SMILES#

from boltz_data.mol import bzmol_from_smiles

mol = bzmol_from_smiles("CN1C=NC2=C1C(=O)N(C(=O)N2C)C")

From Chemical Components#

from boltz_data.mol import bzmol_from_chemical_component

mol = bzmol_from_chemical_component("ATP")

From mmCIF Files#

from boltz_data.mol import bzmol_from_mmcif

bzmol = bzmol_from_mmcif("structure.cif")

From RDKit Molecules#

from boltz_data.mol import bzmol_from_rdmol

mol = bzmol_from_rdmol(rdmol)

From Gemmi Structures#

from boltz_data.mol import bzmol_from_structure

bzmol = bzmol_from_structure(structure)

Converting From BZMol#

To RDKit#

from boltz_data.mol import rdmol_from_bzmol

rdmol = rdmol_from_bzmol(mol)

To mmCIF#

from boltz_data.mol import mmcif_from_bzmol

structure = mmcif_from_bzmol(bzmol)
structure.make_mmcif_document().write_file("output.cif")

To SVG (Visualization)#

from boltz_data.mol import bzmol_to_svg, save_bzmol_svg

# Generate SVG string
svg = bzmol_to_svg(mol)

# Or save directly to file
save_bzmol_svg(mol, "molecule.svg")

Working with BZMol#

Accessing Properties#

# Boolean mask selecting carbon atoms (atomic number 6)
carbon_atoms = mol.atom_element == 6
# Coordinates of the carbon atoms only
carbon_coords = mol.atom_coordinates[carbon_atoms]

# Bonds and adjacency
adj_matrix = mol.atom_adjacency_matrix
adj_list = mol.atom_adjacency_list

# Rings
rings = mol.rings
aromatic_rings = mol.aromatic_rings
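The mask and adjacency ideas above can be reproduced with plain NumPy. The sketch below uses made-up arrays rather than a real BZMol, and it assumes bonds are stored as one (i, j) atom index pair per row. How the library actually computes atom_adjacency_matrix and atom_adjacency_list is not described here.

```python
import numpy as np

# Hypothetical data: the heavy atoms of ethanol, C-C-O
atom_element = np.array([6, 6, 8])
atom_coordinates = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [2.0, 1.3, 0.0]])
bond_atom = np.array([[0, 1], [1, 2]])  # ASSUMED layout: one (i, j) pair per bond

# A boolean mask followed by indexing selects the carbon coordinates
carbon_mask = atom_element == 6
carbon_coords = atom_coordinates[carbon_mask]

# Symmetric adjacency matrix built from the bond index pairs
num_atoms = atom_element.shape[0]
adjacency = np.zeros((num_atoms, num_atoms), dtype=bool)
adjacency[bond_atom[:, 0], bond_atom[:, 1]] = True
adjacency[bond_atom[:, 1], bond_atom[:, 0]] = True

# Adjacency list: the neighbour indices of each atom
adjacency_list = [np.flatnonzero(row).tolist() for row in adjacency]
```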

Working with BZBioMol#

Extracting Chains#

from boltz_data.mol import subset_bzmol

# Look up each atom's residue, then that residue's chain; chain index 0 is the first chain
chain_a_mask = bzmol.residue_chain[bzmol.atom_residue] == 0
chain_a = subset_bzmol(bzmol, chain_a_mask)
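The expression residue_chain[atom_residue] is a two-level lookup: indexing one column with another propagates a per-residue property down to every atom. A small sketch with made-up index arrays (not a real BZBioMol) shows what the mask looks like:

```python
import numpy as np

# Hypothetical biomolecule: 5 atoms in 3 residues across 2 chains
atom_residue = np.array([0, 0, 1, 2, 2])  # residue index of each atom
residue_chain = np.array([0, 0, 1])       # chain index of each residue

# Indexing the per-residue column with the per-atom column
# yields the chain index of every atom
atom_chain = residue_chain[atom_residue]

# Atoms belonging to the first chain
chain_a_mask = atom_chain == 0
```

The resulting mask has one entry per atom, which is the shape a per-atom subset operation needs.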

Modifying BZMol#

BZMol objects are immutable, so each operation below returns a new object and leaves its input unchanged:

Transformations#

from boltz_data.mol import transform_bzmol

transformed = transform_bzmol(bzmol, transformation_matrix)
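The shape expected for transformation_matrix is not stated above. As an illustration only, the sketch below assumes a 4x4 homogeneous matrix (rotation in the upper-left 3x3 block, translation in the last column) and applies it to an (N, 3) coordinate array with NumPy. The helper apply_homogeneous_transform is hypothetical and is not part of boltz-data.

```python
import numpy as np


def apply_homogeneous_transform(coords: np.ndarray, matrix: np.ndarray) -> np.ndarray:
    """Apply a 4x4 homogeneous transform to an (N, 3) coordinate array.

    Returns a new array; the input coordinates are left untouched.
    """
    rotation = matrix[:3, :3]
    translation = matrix[:3, 3]
    return coords @ rotation.T + translation


# 90 degree rotation about z, followed by a shift of +1 along x
matrix = np.array(
    [
        [0.0, -1.0, 0.0, 1.0],
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]
)
coords = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
moved = apply_homogeneous_transform(coords, matrix)
```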

Subsetting#

from boltz_data.mol import subset_bzmol

# Keep only heavy atoms (atomic number greater than 1, i.e. everything except hydrogen)
heavy_mol = subset_bzmol(mol, mol.atom_element > 1)
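Subsetting a column-oriented molecule takes more than masking the atom arrays: bonds to removed atoms must be dropped, and the surviving bonds renumbered. The following NumPy sketch shows one way to do that bookkeeping on made-up arrays. It assumes an (i, j) pair layout for bonds and is not a description of how subset_bzmol is implemented.

```python
import numpy as np

# Hypothetical fragment: C(0)-O(1), with H(2) on the carbon and H(3) on the oxygen
atom_element = np.array([6, 8, 1, 1])
bond_atom = np.array([[0, 1], [0, 2], [1, 3]])  # ASSUMED (i, j) pair layout

keep = atom_element > 1  # heavy atoms only

# Map old atom indices to new ones; dropped atoms get -1
new_index = np.full(atom_element.shape[0], -1)
new_index[keep] = np.arange(keep.sum())

# Keep bonds whose two endpoints both survive, then renumber them
bond_keep = keep[bond_atom].all(axis=1)
sub_bond_atom = new_index[bond_atom[bond_keep]]
sub_atom_element = atom_element[keep]
```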

Concatenating#

from boltz_data.mol import concat_bzmols

combined = concat_bzmols([mol1, mol2])
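Concatenation has a similar subtlety: the bond indices of every molecule after the first must be shifted by the number of atoms that precede it. A minimal NumPy sketch on made-up arrays, again assuming an (i, j) pair layout for bonds rather than describing the internals of concat_bzmols:

```python
import numpy as np

# Two hypothetical molecules, each given as (elements, bond index pairs)
elements_1, bonds_1 = np.array([6, 8]), np.array([[0, 1]])
elements_2, bonds_2 = np.array([7, 6, 6]), np.array([[0, 1], [1, 2]])

# Atom columns are simply stacked
combined_elements = np.concatenate([elements_1, elements_2])

# Bond indices of the second molecule are offset by the atom count of the first
offset = elements_1.shape[0]
combined_bonds = np.concatenate([bonds_1, bonds_2 + offset])
```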

Generating Coordinates#

from boltz_data.mol import generate_conformer, generate_depiction

mol_3d = generate_conformer(mol, seed=42)  # 3D
mol_2d = generate_depiction(mol)  # 2D for visualization

Validation#

from boltz_data.mol import validate_bzmol

errors = validate_bzmol(mol)

Biological Assemblies#

Generate biological assemblies from mmCIF files:

from boltz_data.cif import read_single_cif_from_file
from boltz_data.mol import bzmol_from_mmcif, iterate_assemblies

# Load asymmetric unit
mmcif = read_single_cif_from_file("structure.cif")
bzmol = bzmol_from_mmcif(mmcif)

# Iterate through all biological assemblies
for assembly_bzmol in iterate_assemblies(mmcif=mmcif, bzmol=bzmol):
    print(f"Assembly with {assembly_bzmol.num_chains} chains")
    print(f"Total atoms: {assembly_bzmol.num_atoms}")

What are assemblies? Crystallographic structures contain an asymmetric unit, which is often just part of the biologically relevant structure. iterate_assemblies() applies symmetry operations to generate complete biological assemblies.
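Conceptually, building an assembly means copying the asymmetric unit once per symmetry operation and stacking the copies. The sketch below illustrates that idea on bare coordinates with two hand-written operations (identity and a two-fold rotation about z). The operations and coordinates are made up for illustration, and this is not how iterate_assemblies reads them from an mmCIF file.

```python
import numpy as np

# Hypothetical asymmetric unit with two atoms
asu_coords = np.array([[1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])

# Each operation is a (rotation matrix, translation vector) pair
identity = np.eye(3)
two_fold_z = np.diag([-1.0, -1.0, 1.0])  # 180 degree rotation about z
operations = [(identity, np.zeros(3)), (two_fold_z, np.zeros(3))]

# Apply every operation to the asymmetric unit and stack the copies
assembly_coords = np.concatenate(
    [asu_coords @ rotation.T + shift for rotation, shift in operations]
)
```

The assembly here has twice as many atoms as the asymmetric unit, one copy per operation.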

Geometric Analysis#

Residue Bounding Spheres#

Calculate bounding spheres for each residue:

from boltz_data.mol import get_residue_bounding_spheres_around_centroid

# Get bounding spheres
spheres = get_residue_bounding_spheres_around_centroid(bzmol)

print(f"Centers shape: {spheres.center.shape}")  # (num_residues, 3)
print(f"Radii shape: {spheres.radius.shape}")    # (num_residues,)

# Find large residues
large_residues = spheres.radius > 5.0  # Residues whose bounding-sphere radius exceeds 5 Å

Use cases:

  • Spatial queries (finding nearby residues)

  • Interface detection

  • Coarse-grained representations
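As a rough illustration of what a centroid-based bounding sphere is and how it supports spatial queries, the sketch below computes per-residue centers and radii with NumPy and then uses them as a cheap overlap pre-filter. The arrays are made up, and the exact definition used by get_residue_bounding_spheres_around_centroid may differ.

```python
import numpy as np

# Hypothetical data: 5 atoms in 2 residues
atom_coordinates = np.array(
    [
        [0.0, 0.0, 0.0],   # residue 0
        [2.0, 0.0, 0.0],   # residue 0
        [10.0, 0.0, 0.0],  # residue 1
        [10.0, 4.0, 0.0],  # residue 1
        [10.0, 2.0, 0.0],  # residue 1
    ]
)
atom_residue = np.array([0, 0, 1, 1, 1])
num_residues = 2

# Center: centroid of each residue's atoms
sums = np.zeros((num_residues, 3))
np.add.at(sums, atom_residue, atom_coordinates)
counts = np.bincount(atom_residue, minlength=num_residues)
center = sums / counts[:, None]

# Radius: largest distance from the centroid to any atom of the residue
distances = np.linalg.norm(atom_coordinates - center[atom_residue], axis=1)
radius = np.zeros(num_residues)
np.maximum.at(radius, atom_residue, distances)

# Pre-filter: two residues can only be in contact if their spheres overlap
center_distance = np.linalg.norm(center[:, None] - center[None, :], axis=-1)
may_contact = center_distance <= radius[:, None] + radius[None, :]
```

Only residue pairs flagged in may_contact need an exact atom-level distance check, which is what makes the spheres useful for interface detection.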

Additional Conversions#

To CIF Block#

Convert to a gemmi CIF block (alternative to mmCIF):

from boltz_data.mol import bzcif_from_bzmol

cif_block = bzcif_from_bzmol(mol)

# Use with gemmi
doc = cif_block.make_mmcif_document()

To Structure Definition#

Extract a structure definition (topology without coordinates):

from boltz_data.mol import structure_from_bzmol
from boltz_data.ccd import chemical_component_dictionary_from_path

ccd = chemical_component_dictionary_from_path("ccd.pkl.gz")
structure = structure_from_bzmol(bzmol, chemical_component_dictionary=ccd)

# Inspect topology
for entity in structure.entities:
    if entity.type == "protein":
        print(f"Sequence: {entity.sequence}")

Use case: Extract sequences and topology for modification or analysis without coordinates.

Loading and Saving#

From Files#

from boltz_data.mol import bzmol_from_path

# Load from various formats (auto-detected by extension)
mol = bzmol_from_path("molecule.cbor.gz")  # Compressed CBOR
mol = bzmol_from_path("molecule.json")     # JSON
mol = bzmol_from_path("molecule.yaml")     # YAML
mol = bzmol_from_path("molecule.pkl")      # Pickle

To Files#

from boltz_data import fs

# Save in various formats
fs.write_object("molecule.cbor.gz", mol)  # Recommended
fs.write_object("molecule.json", mol)     # Human-readable
fs.write_object("molecule.yaml", mol)     # Configuration-friendly

See the File I/O guide for more details on serialization.

API Reference#

For detailed API documentation, see: