BZMol: Molecular Structure Format
BZMol is the core molecular data structure in boltz-data. It stores atom and bond data in NumPy arrays, which enables efficient vectorized operations and direct use in ML pipelines.
What is BZMol?
BZMol is a column-oriented, immutable, type-safe molecular representation. It comes in two variants: BZMol for small molecules, and BZBioMol for biomolecules, which additionally carries residue and chain information.
Structure
Fields are grouped by prefix (like database tables):
# The total number of atoms
num_atoms: int
# Atom properties, with leading dimension of num_atoms
atom_element: np.ndarray
atom_charge: np.ndarray
...
# The total number of bonds
num_bonds: int
# Bond properties, with leading dimension of num_bonds
bond_atom: np.ndarray
bond_order: np.ndarray
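To make the column layout concrete, here is a small self-contained sketch. `ToyMol` is a hypothetical stand-in, not the real BZMol class, and the `(num_bonds, 2)` shape of `bond_atom` is an assumption based on the field name. A frozen dataclass only approximates the immutability described above: it blocks reassigning fields, but the NumPy arrays themselves remain writable.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class ToyMol:
    """Hypothetical stand-in for BZMol's column layout (not the real class)."""

    num_atoms: int
    atom_element: np.ndarray  # (num_atoms,) atomic numbers
    atom_charge: np.ndarray  # (num_atoms,) formal charges
    num_bonds: int
    bond_atom: np.ndarray  # (num_bonds, 2) atom indices (assumed layout)
    bond_order: np.ndarray  # (num_bonds,) bond orders


# Ethanol heavy atoms: C-C-O
mol = ToyMol(
    num_atoms=3,
    atom_element=np.array([6, 6, 8]),
    atom_charge=np.array([0, 0, 0]),
    num_bonds=2,
    bond_atom=np.array([[0, 1], [1, 2]]),
    bond_order=np.array([1, 1]),
)

# Every column in a group shares the group's leading dimension.
assert mol.atom_element.shape[0] == mol.num_atoms
assert mol.bond_atom.shape[0] == mol.num_bonds

# Per-atom degree: count how often each atom index appears in bond_atom.
degree = np.bincount(mol.bond_atom.ravel(), minlength=mol.num_atoms)
# degree -> array([1, 2, 1])
```

Because all per-atom columns share one leading dimension, per-atom statistics such as degree reduce to a single vectorized call instead of a Python loop over atoms.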
Creating BZMol
From SMILES
from boltz_data.mol import bzmol_from_smiles
mol = bzmol_from_smiles("CN1C=NC2=C1C(=O)N(C(=O)N2C)C")
From Chemical Components
from boltz_data.mol import bzmol_from_chemical_component
mol = bzmol_from_chemical_component("ATP")
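From mmCIF Files
from boltz_data.cif import read_single_cif_from_file
from boltz_data.mol import bzmol_from_mmcif
# Parse the mmCIF file, then convert the parsed object
mmcif = read_single_cif_from_file("structure.cif")
bzmol = bzmol_from_mmcif(mmcif)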
From RDKit Molecules
from rdkit import Chem
from boltz_data.mol import bzmol_from_rdmol
rdmol = Chem.MolFromSmiles("CCO")  # an existing RDKit Mol, here ethanol
mol = bzmol_from_rdmol(rdmol)
From Gemmi Structures
import gemmi
from boltz_data.mol import bzmol_from_structure
structure = gemmi.read_structure("structure.cif")  # a gemmi.Structure
bzmol = bzmol_from_structure(structure)
Converting From BZMol
To RDKit
from boltz_data.mol import rdmol_from_bzmol
rdmol = rdmol_from_bzmol(mol)
To mmCIF
from boltz_data.mol import mmcif_from_bzmol
structure = mmcif_from_bzmol(bzmol)
structure.make_mmcif_document().write_file("output.cif")
To SVG (Visualization)
from boltz_data.mol import bzmol_to_svg, save_bzmol_svg
# Generate SVG string
svg = bzmol_to_svg(mol)
# Or save directly to file
save_bzmol_svg(mol, "molecule.svg")
Working with BZMol
Accessing Properties
# Atom arrays support NumPy boolean masks (atomic number 6 = carbon)
carbon_atoms = mol.atom_element == 6
carbon_coords = mol.atom_coordinates[carbon_atoms]
# Bonds and adjacency
adj_matrix = mol.atom_adjacency_matrix
adj_list = mol.atom_adjacency_list
# Rings
rings = mol.rings
aromatic_rings = mol.aromatic_rings
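The exact types returned by `atom_adjacency_matrix` and `atom_adjacency_list` are not spelled out here. As a rough sketch, assuming `bond_atom` holds `(num_bonds, 2)` atom-index pairs, both can be derived from the bond table with plain NumPy:

```python
import numpy as np

# Hypothetical toy data: atom 1 is bonded to atoms 0, 2 and 3.
num_atoms = 4
bond_atom = np.array([[0, 1], [1, 2], [1, 3]])  # (num_bonds, 2), assumed layout

# Dense symmetric adjacency matrix: mark both directions of every bond.
adj_matrix = np.zeros((num_atoms, num_atoms), dtype=bool)
adj_matrix[bond_atom[:, 0], bond_atom[:, 1]] = True
adj_matrix[bond_atom[:, 1], bond_atom[:, 0]] = True

# Adjacency list: the neighbor indices of each atom, read off the matrix rows.
adj_list = [np.flatnonzero(row).tolist() for row in adj_matrix]
# adj_list -> [[1], [0, 2, 3], [1], [1]]
```

A dense matrix costs O(num_atoms²) memory, which is fine for small molecules; for large biomolecules a neighbor list or a sparse matrix is usually the better choice.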
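Working with BZBioMol
Extracting Chains
from boltz_data.mol import subset_bzmol
# Map each atom to its residue, then each residue to its chain index.
# Index 0 is the first chain, which is not necessarily labeled "A".
first_chain_mask = bzmol.residue_chain[bzmol.atom_residue] == 0
first_chain = subset_bzmol(bzmol, first_chain_mask)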
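The chain mask relies on two chained NumPy lookups: `atom_residue` gives each atom's residue index, and indexing `residue_chain` with it copies each residue's chain index onto that residue's atoms. A toy example with made-up arrays (the real BZBioMol fields are only assumed to have these shapes):

```python
import numpy as np

# Hypothetical toy data: 5 atoms in 3 residues across 2 chains.
atom_residue = np.array([0, 0, 1, 2, 2])  # residue index of each atom
residue_chain = np.array([0, 0, 1])  # chain index of each residue

# Chain index per atom: look up each atom's residue, then that residue's chain.
atom_chain = residue_chain[atom_residue]
# atom_chain -> array([0, 0, 0, 1, 1])

chain_0_mask = atom_chain == 0
# chain_0_mask -> array([ True,  True,  True, False, False])
```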
Modifying BZMol
BZMol objects are immutable, so each operation below returns a new object rather than modifying its input:
Transformations
from boltz_data.mol import transform_bzmol
transformed = transform_bzmol(bzmol, transformation_matrix)
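The expected shape of `transformation_matrix` is not documented here. Assuming a 4x4 homogeneous matrix (rotation in the upper-left 3x3 block, translation in the last column), applying it to an `(N, 3)` coordinate array looks like the sketch below. `apply_homogeneous_transform` is a hypothetical helper; in keeping with the immutable design, it returns new coordinates and leaves the input untouched.

```python
import numpy as np


def apply_homogeneous_transform(coords: np.ndarray, matrix: np.ndarray) -> np.ndarray:
    """Apply a 4x4 homogeneous transform (assumed format) to (N, 3) coordinates."""
    rotation = matrix[:3, :3]
    translation = matrix[:3, 3]
    # Row-vector convention: x' = R x + t becomes coords @ R.T + t.
    return coords @ rotation.T + translation


# 90 degree rotation about z, followed by a shift of +1 along x.
matrix = np.array(
    [
        [0.0, -1.0, 0.0, 1.0],
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]
)
coords = np.array([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
new_coords = apply_homogeneous_transform(coords, matrix)
# new_coords -> [[1., 1., 0.], [-1., 0., 0.]]
```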
Subsetting
from boltz_data.mol import subset_bzmol
# Keep heavy atoms only (drop hydrogen, atomic number 1)
heavy_mol = subset_bzmol(mol, mol.atom_element > 1)
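Subsetting atoms involves more than masking the atom columns: bonds that touch a removed atom must be dropped, and the surviving bond endpoints must be renumbered. The hypothetical `subset_atoms` function below sketches that bookkeeping with plain arrays, assuming a `(num_bonds, 2)` bond layout; it illustrates the general technique and is not the `subset_bzmol` implementation.

```python
import numpy as np


def subset_atoms(atom_element, bond_atom, bond_order, mask):
    """Keep atoms where mask is True, drop bonds touching removed atoms,
    and renumber the surviving bond endpoints."""
    # Old atom index -> new atom index (-1 for removed atoms).
    new_index = np.full(mask.shape[0], -1)
    new_index[mask] = np.arange(mask.sum())
    # A bond survives only if both of its atoms survive.
    bond_mask = mask[bond_atom].all(axis=1)
    return (
        atom_element[mask],
        new_index[bond_atom[bond_mask]],
        bond_order[bond_mask],
    )


# Methanol with explicit hydrogens: C0, O1, H2-H4 on carbon, H5 on oxygen.
atom_element = np.array([6, 8, 1, 1, 1, 1])
bond_atom = np.array([[0, 1], [0, 2], [0, 3], [0, 4], [1, 5]])
bond_order = np.array([1, 1, 1, 1, 1])

elements, bonds, orders = subset_atoms(
    atom_element, bond_atom, bond_order, atom_element > 1
)
# elements -> [6, 8]; bonds -> [[0, 1]]; orders -> [1]
```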
Concatenating
from boltz_data.mol import concat_bzmols
combined = concat_bzmols([mol1, mol2])
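Concatenation needs the reverse bookkeeping: each molecule's bond indices must be shifted by the number of atoms in the molecules before it. A minimal sketch with a hypothetical `concat_bonds` helper, again assuming the `(num_bonds, 2)` bond layout:

```python
import numpy as np


def concat_bonds(bond_atoms, atom_counts):
    """Stack per-molecule (num_bonds, 2) bond arrays, shifting each molecule's
    atom indices by the number of atoms that precede it."""
    offsets = np.concatenate([[0], np.cumsum(atom_counts)[:-1]])
    return np.concatenate([b + off for b, off in zip(bond_atoms, offsets)])


# Hypothetical: a 3-atom molecule (2 bonds) followed by a 2-atom molecule (1 bond).
bonds_1 = np.array([[0, 1], [0, 2]])
bonds_2 = np.array([[0, 1]])
combined = concat_bonds([bonds_1, bonds_2], atom_counts=[3, 2])
# combined -> [[0, 1], [0, 2], [3, 4]]
```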
Generating Coordinates
from boltz_data.mol import generate_conformer, generate_depiction
mol_3d = generate_conformer(mol, seed=42) # 3D
mol_2d = generate_depiction(mol) # 2D for visualization
Validation
from boltz_data.mol import validate_bzmol
errors = validate_bzmol(mol)
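What `validate_bzmol` checks, and the exact form of its errors, isn't documented here. To illustrate the kind of internal consistency a column-oriented format depends on, the hypothetical `check_bonds` below flags bonds that point at nonexistent atoms or connect an atom to itself, and returns an empty list when it finds nothing wrong:

```python
import numpy as np


def check_bonds(num_atoms, bond_atom):
    """Return human-readable problems found in a (num_bonds, 2) bond table."""
    errors = []
    # Every bond endpoint must be a valid atom index.
    out_of_range = (bond_atom < 0) | (bond_atom >= num_atoms)
    for i in np.flatnonzero(out_of_range.any(axis=1)):
        errors.append(f"bond {i} references a missing atom: {bond_atom[i].tolist()}")
    # A bond must connect two different atoms.
    for i in np.flatnonzero(bond_atom[:, 0] == bond_atom[:, 1]):
        errors.append(f"bond {i} connects atom {bond_atom[i, 0]} to itself")
    return errors


errors = check_bonds(3, np.array([[0, 1], [1, 1], [2, 5]]))
# errors -> ['bond 2 references a missing atom: [2, 5]',
#            'bond 1 connects atom 1 to itself']
```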
Biological Assemblies
Generate biological assemblies from an mmCIF file:
from boltz_data.cif import read_single_cif_from_file
from boltz_data.mol import bzmol_from_mmcif, iterate_assemblies
# Load asymmetric unit
mmcif = read_single_cif_from_file("structure.cif")
bzmol = bzmol_from_mmcif(mmcif)
# Iterate through all biological assemblies
for assembly_bzmol in iterate_assemblies(mmcif=mmcif, bzmol=bzmol):
    print(f"Assembly with {assembly_bzmol.num_chains} chains")
    print(f"Total atoms: {assembly_bzmol.num_atoms}")
What are assemblies? The asymmetric unit stored in a crystallographic structure file is often only part of the biologically relevant complex (for example, one subunit of a homodimer). iterate_assemblies() applies the assembly operations (rotations and translations) defined in the file to build each complete biological assembly.
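Conceptually, each assembly operation is a rotation plus a translation (stored in mmCIF under `_pdbx_struct_oper_list`) applied to a set of chains, and the transformed copies are stacked together. The sketch below shows only that core step on a raw coordinate array. `expand_assembly` is hypothetical, and it ignores the per-chain selection that real assembly definitions (`_pdbx_struct_assembly_gen`) specify.

```python
import numpy as np


def expand_assembly(coords, operators):
    """Apply each (rotation, translation) operator to the asymmetric-unit
    coordinates and stack the copies into one assembly."""
    return np.concatenate([coords @ rot.T + trans for rot, trans in operators])


# Hypothetical asymmetric unit of 2 atoms and a 2-fold axis along z.
asu = np.array([[1.0, 0.0, 0.0], [2.0, 1.0, 0.5]])
identity = (np.eye(3), np.zeros(3))
two_fold_z = (np.diag([-1.0, -1.0, 1.0]), np.zeros(3))
assembly = expand_assembly(asu, [identity, two_fold_z])
# assembly.shape -> (4, 3); the second copy is [[-1, 0, 0], [-2, -1, 0.5]]
```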
Geometric Analysis
Residue Bounding Spheres
Calculate bounding spheres for each residue:
from boltz_data.mol import get_residue_bounding_spheres_around_centroid
# Get bounding spheres
spheres = get_residue_bounding_spheres_around_centroid(bzmol)
print(f"Centers shape: {spheres.center.shape}") # (num_residues, 3)
print(f"Radii shape: {spheres.radius.shape}") # (num_residues,)
# Find large residues
large_residues = spheres.radius > 5.0  # Residues whose bounding radius exceeds 5 Å
Use cases:
Spatial queries (finding nearby residues)
Interface detection
Coarse-grained representations
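Assuming "around centroid" means each sphere is centered on the mean of the residue's atom coordinates, with the radius set to the distance to its farthest atom, the computation and a typical neighbor prefilter can be sketched as follows. `residue_bounding_spheres` is hypothetical and assumes every residue has at least one atom. A centroid-centered sphere encloses all of the residue's atoms but is generally not the smallest possible enclosing sphere.

```python
import numpy as np


def residue_bounding_spheres(coords, atom_residue, num_residues):
    """Centroid of each residue and the distance to its farthest atom."""
    counts = np.bincount(atom_residue, minlength=num_residues)
    sums = np.zeros((num_residues, 3))
    np.add.at(sums, atom_residue, coords)  # unbuffered per-residue sum
    centers = sums / counts[:, None]
    dist = np.linalg.norm(coords - centers[atom_residue], axis=1)
    radii = np.zeros(num_residues)
    np.maximum.at(radii, atom_residue, dist)  # per-residue maximum distance
    return centers, radii


# Hypothetical toy data: two 2-atom residues 10 Å apart.
coords = np.array([[0.0, 0, 0], [2.0, 0, 0], [10.0, 0, 0], [12.0, 0, 0]])
atom_residue = np.array([0, 0, 1, 1])
centers, radii = residue_bounding_spheres(coords, atom_residue, num_residues=2)
# centers -> [[1, 0, 0], [11, 0, 0]]; radii -> [1, 1]

# Neighbor prefilter: residues i and j can only have atoms within `cutoff`
# of each other if their center distance is at most r_i + r_j + cutoff.
cutoff = 5.0
center_dist = np.linalg.norm(centers[:, None] - centers[None, :], axis=-1)
candidates = center_dist <= radii[:, None] + radii[None, :] + cutoff
# candidates -> [[True, False], [False, True]]
```

The prefilter is conservative: pairs that fail it can be skipped safely before any atom-level distance computation, which is what makes bounding spheres useful for spatial queries and interface detection.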
Additional Conversions
To CIF Block
Convert to a gemmi CIF block (alternative to mmCIF):
from boltz_data.mol import bzcif_from_bzmol
cif_block = bzcif_from_bzmol(mol)
# Use with gemmi
doc = cif_block.make_mmcif_document()
To Structure Definition
Extract structure definition (topology without coordinates):
from boltz_data.mol import structure_from_bzmol
from boltz_data.ccd import chemical_component_dictionary_from_path
ccd = chemical_component_dictionary_from_path("ccd.pkl.gz")
structure = structure_from_bzmol(bzmol, chemical_component_dictionary=ccd)
# Inspect topology
for entity in structure.entities:
    if entity.type == "protein":
        print(f"Sequence: {entity.sequence}")
Use case: Extract sequences and topology for modification or analysis without coordinates.
Loading and Saving
From Files
from boltz_data.mol import bzmol_from_path
# Load from various formats (auto-detected by extension)
mol = bzmol_from_path("molecule.cbor.gz") # Compressed CBOR
mol = bzmol_from_path("molecule.json") # JSON
mol = bzmol_from_path("molecule.yaml") # YAML
mol = bzmol_from_path("molecule.pkl") # Pickle
To Files
from boltz_data import fs
# Save in various formats
fs.write_object("molecule.cbor.gz", mol) # Recommended
fs.write_object("molecule.json", mol) # Human-readable
fs.write_object("molecule.yaml", mol) # Configuration-friendly
See the File I/O guide for more details on serialization.
API Reference
For detailed API documentation, see: