boltz_data.mol

Boltz molecule utilities.

Functions

`bzcif_from_bzmol`(bzmol, /)	Convert a BZMol to a CIF block representation.
`bzmol_from_chemical_component`(chemical_component)
`bzmol_from_chemical_components`(*, ...[, ...])
`bzmol_from_definition`(definition, /, *[, ...])	Create a BZMol from an entity definition.
`bzmol_from_mmcif`(mmcif, *[, ...])	Create a BZMol from an mmCIF file with coordinates.
`bzmol_from_path`(path, /)
`bzmol_from_rdmol`(rdmol, /, *[, conformer_id])	Convert an RDKit molecule to a BZMol.
`bzmol_from_smiles`(smiles, /)	Create a BZMol from a SMILES string using RDKit.
`bzmol_from_structure`(structure, /[, ...])	Create a BZMol from a structure definition.
`bzmol_to_svg`(mol, /, *[, box_width, ...])	Generate an SVG visualization of a BZMol structure.
`concat_bzmols`(*bzmols)	Concatenate multiple BZMol or BZBioMol objects into a single molecule.
`generate_conformer`(bzmol, /, *, seed)	Generate a 3D conformer for a BZMol using RDKit.
`generate_depiction`(bzmol, /, *[, match_3d])	Generate 2D coordinates for a BZMol using RDKit.
`get_molecular_interfaces`(bzmol, /[, threshold])	Find interfaces between chains using atoms within threshold distance.
`get_residue_bounding_spheres_around_centroid`(...)	Calculate bounding spheres for each residue centered at the residue centroid.
`iterate_assemblies`(*, mmcif, bzmol[, max_atoms])	Generate biological assemblies from an mmCIF file and asymmetric unit BZBioMol.
`mmcif_from_bzmol`(bzmol, /[, ...])	Convert a BZMol back to an mmCIF block.
`rdmol_from_bzmol`(bzmol, /)	Convert a BZMol or BZBioMol to an RDKit molecule.
`save_bzmol_svg`(mol, filepath, /, *[, ...])	Save a BZMol visualization as an SVG file.
`smiles_from_bzmol`(bzmol, /)	Convert a BZMol or BZBioMol to a SMILES string.
`structure_from_bzmol`(bzmol, /[, ...])	Convert a BZBioMol to a structure definition.
`subset_bzmol`(bzmol, *[, chain_ids])	Extract a subset of a BZBioMol by selecting specific chains.
`transform_bzmol`(bzmol, **kwargs)	Create a new BZMol with modified fields by replacing specified attributes.
`validate_bzmol`(mol, /)	Validate a BZBioMol instance for consistency.

Classes

`BZBioMol`(**data)	Biomolecular structure with residues and chains.
`BZMol`(**data)	Base representation of a molecular structure with atoms and bonds.

class boltz_data.mol.BZBioMol(**data)[source]#

Biomolecular structure with residues and chains.

Extends BZMol to include residue and chain information for proteins, DNA, RNA, and other biomolecules.

Parameters:

atom_element (ndarray[tuple[int, ...], dtype[uint8]])
atom_name (ndarray[tuple[int, ...], dtype[str_]])
atom_charge (ndarray[tuple[int, ...], dtype[int8]])
bond_atoms (ndarray[tuple[int, ...], dtype[uint32]])
bond_order (ndarray[tuple[int, ...], dtype[uint8]])
atom_resolved (ndarray[tuple[int, ...], dtype[bool]])
atom_coordinates (ndarray[tuple[int, ...], dtype[float32]])
atom_b_factor (ndarray[tuple[int, ...], dtype[float32]] | None)
atom_residue (ndarray[tuple[int, ...], dtype[uint32]])
residue_name (ndarray[tuple[int, ...], dtype[str_]])
residue_number (ndarray[tuple[int, ...], dtype[int32]] | None)
residue_chain (ndarray[tuple[int, ...], dtype[uint16]])
chain_id (ndarray[tuple[int, ...], dtype[str_]])
chain_description (ndarray[tuple[int, ...], dtype[str_]] | None)

property chain_atoms: list[slice | list[int]]#: Get atom indices for each chain as slices or lists.

property chain_residues: list[slice | list[int]]#: Get residue indices for each chain as slices or lists.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'frozen': True, 'validate_default': True}#: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

property num_chains: int#: Total number of chains in the molecule.

property num_residues: int#: Total number of residues in the molecule.

property residue_any_resolved: ndarray[tuple[int, ...], dtype[bool]]#: Get a boolean array of shape (n_residues,) indicating which residues have any resolved atoms.

property residue_atoms: list[slice | list[int]]#: Get atom indices for each residue as slices or lists.
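One plausible reading of "as slices or lists" is that a group whose indices are contiguous is returned as a slice and any other group as an explicit list. The sketch below illustrates that idea, along with a per-residue "any resolved" mask, using plain Python lists in place of NumPy arrays. The helper names `group_indices` and `any_resolved_per_residue` are hypothetical and not part of `boltz_data`, and the contiguity rule is an assumption rather than documented behavior.

```python
def group_indices(group_of_item, num_groups):
    """Return, for each group, a slice if its item indices are contiguous, else a list."""
    members = [[] for _ in range(num_groups)]
    for index, group in enumerate(group_of_item):
        members[group].append(index)  # indices arrive in increasing order

    result = []
    for indices in members:
        contiguous = bool(indices) and indices[-1] - indices[0] + 1 == len(indices)
        result.append(slice(indices[0], indices[-1] + 1) if contiguous else indices)
    return result


def any_resolved_per_residue(atom_residue, atom_resolved, num_residues):
    """Return one flag per residue: True if at least one of its atoms is resolved."""
    flags = [False] * num_residues
    for residue, resolved in zip(atom_residue, atom_resolved):
        flags[residue] = flags[residue] or resolved
    return flags


# Atom -> residue mapping in the spirit of atom_residue; atom 4 belongs to residue 0.
atom_residue = [0, 0, 1, 1, 0]
atom_resolved = [False, False, True, False, False]

residue_atoms = group_indices(atom_residue, num_groups=2)
residue_flags = any_resolved_per_residue(atom_residue, atom_resolved, num_residues=2)
```

Here residue 1 owns the contiguous atoms 2 and 3 and can be described by a slice, while residue 0 owns a non-contiguous set and needs a list.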

property residue_ordinal: ndarray[tuple[int, ...], dtype[uint16]]#: 0-indexed ordinal number of each residue within its chain.
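As an illustration of a 0-indexed ordinal within each chain, the following sketch derives it from a residue-to-chain mapping shaped like `residue_chain`. The function name `ordinal_within_chain` is hypothetical, and plain lists stand in for NumPy arrays.

```python
def ordinal_within_chain(residue_chain):
    """Return the 0-indexed position of each residue within its own chain."""
    seen = {}  # chain index -> number of residues already encountered
    ordinals = []
    for chain in residue_chain:
        ordinals.append(seen.get(chain, 0))
        seen[chain] = seen.get(chain, 0) + 1
    return ordinals


# Three residues in chain 0 followed by two residues in chain 1.
ordinals = ordinal_within_chain([0, 0, 0, 1, 1])
```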

atom_b_factor: Optional[Annotated[ndarray[tuple[int, ...], dtype[float32]]]]#: Optional array of shape (n_atoms,) with B-factors for each atom.

atom_residue: Annotated[ndarray[tuple[int, ...], dtype[uint32]]]#

residue_name: Annotated[ndarray[tuple[int, ...], dtype[str_]]]#: Residue names (e.g., ‘ALA’, ‘GLY’ for proteins, ‘A’, ‘G’ for nucleic acids).

residue_number: Optional[Annotated[ndarray[tuple[int, ...], dtype[int32]]]]#

residue_chain: Annotated[ndarray[tuple[int, ...], dtype[uint16]]]#

chain_id: Annotated[ndarray[tuple[int, ...], dtype[str_]]]#: Chain identifiers (e.g., ‘A’, ‘B’, ‘C’), one per chain.

chain_description: Optional[Annotated[ndarray[tuple[int, ...], dtype[str_]]]]#

class boltz_data.mol.BZMol(**data)[source]#

Base representation of a molecular structure with atoms and bonds.

This object can represent both single molecules and multiple disconnected molecules, in a similar manner to an RDKit Mol object.

Unlike a Mol object, information is stored by property, rather than by atom. For example, the atom_name property is an array of shape (n_atoms,) containing the names of all atoms.
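To make the property-wise layout concrete, here is a minimal sketch that stores ethanol's heavy atoms as one sequence per property. `ToyMol` is a hypothetical stand-in, not the real `BZMol`: it uses a frozen dataclass and tuples instead of a pydantic model and NumPy arrays, and only the field names mirror the documented ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyMol:
    """Hypothetical stand-in for BZMol: one sequence per property."""

    atom_element: tuple   # atomic numbers, one per atom
    atom_name: tuple      # atom names, one per atom
    bond_atoms: tuple     # (atom index, atom index) per bond
    bond_order: tuple     # 1=single, 2=double, 3=triple

    @property
    def num_atoms(self):
        return len(self.atom_element)

    @property
    def num_bonds(self):
        return len(self.bond_atoms)


# Ethanol heavy atoms (C, C, O) joined by two single bonds.
ethanol = ToyMol(
    atom_element=(6, 6, 8),
    atom_name=("C1", "C2", "O1"),
    bond_atoms=((0, 1), (1, 2)),
    bond_order=(1, 1),
)

# Property-wise access: filter all names at once using the element property.
carbon_names = [
    name for name, z in zip(ethanol.atom_name, ethanol.atom_element) if z == 6
]
```

Because each property is a single sequence, per-atom questions become element-wise operations over whole columns rather than loops over atom objects.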

Parameters:

atom_element (ndarray[tuple[int, ...], dtype[uint8]])
atom_name (ndarray[tuple[int, ...], dtype[str_]])
atom_charge (ndarray[tuple[int, ...], dtype[int8]])
bond_atoms (ndarray[tuple[int, ...], dtype[uint32]])
bond_order (ndarray[tuple[int, ...], dtype[uint8]])
atom_resolved (ndarray[tuple[int, ...], dtype[bool]])
atom_coordinates (ndarray[tuple[int, ...], dtype[float32]])

__init__(**data)[source]#

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:: data (Any)
Return type:: None

property angle_atoms: ndarray[tuple[int, ...], dtype[uint16]]#: Get array of shape (n_angles, 3) with atom indices for each angle.

property angle_resolved: ndarray[tuple[int, ...], dtype[bool]]#: Get boolean array of shape (n_angles,) indicating which angles have valid coordinates.
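One common way to derive angle triples from a bond list is to take every pair of neighbors around each central atom. The sketch below does that and marks an angle as resolved only when all three of its atoms are resolved. The function names, the ordering of the triples, and the "all three atoms" rule are assumptions for illustration; the library's actual enumeration may differ.

```python
from itertools import combinations


def enumerate_angles(num_atoms, bond_atoms):
    """Return (a, center, b) triples for every pair of neighbors of each atom."""
    neighbors = [[] for _ in range(num_atoms)]
    for i, j in bond_atoms:
        neighbors[i].append(j)
        neighbors[j].append(i)

    angles = []
    for center in range(num_atoms):
        for a, b in combinations(sorted(neighbors[center]), 2):
            angles.append((a, center, b))
    return angles


def angles_resolved(angles, atom_resolved):
    """An angle is treated as resolved only if all three atoms are resolved."""
    return [all(atom_resolved[i] for i in angle) for angle in angles]


# A four-atom chain 0-1-2-3 where the last atom has no coordinates.
chain_bonds = [(0, 1), (1, 2), (2, 3)]
chain_angles = enumerate_angles(4, chain_bonds)
chain_angle_mask = angles_resolved(chain_angles, [True, True, True, False])
```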

property aromatic_rings: list[list[int]]#

property atom_adjacency_list: list[list[int]]#: Get the atom-atom adjacency list based on the bonds.

property atom_adjacency_matrix: ndarray[tuple[int, ...], dtype[bool]]#: Get the atom-atom adjacency matrix based on the bonds.
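Both adjacency views can be derived directly from a bond list shaped like `bond_atoms`. The following sketch shows the idea with plain Python lists; the function names are hypothetical, and the real properties return NumPy-based structures rather than nested lists.

```python
def build_adjacency_list(num_atoms, bond_atoms):
    """Return, for each atom, the sorted indices of the atoms bonded to it."""
    neighbors = [[] for _ in range(num_atoms)]
    for i, j in bond_atoms:
        neighbors[i].append(j)
        neighbors[j].append(i)
    return [sorted(n) for n in neighbors]


def build_adjacency_matrix(num_atoms, bond_atoms):
    """Return a symmetric boolean matrix with True where two atoms share a bond."""
    matrix = [[False] * num_atoms for _ in range(num_atoms)]
    for i, j in bond_atoms:
        matrix[i][j] = True
        matrix[j][i] = True
    return matrix


# Ethanol heavy atoms: C0-C1 and C1-O2.
ethanol_bonds = [(0, 1), (1, 2)]
ethanol_neighbors = build_adjacency_list(3, ethanol_bonds)
ethanol_matrix = build_adjacency_matrix(3, ethanol_bonds)
```

The list form is compact for sparse molecular graphs, while the matrix form gives constant-time bonded-pair lookups at quadratic memory cost.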

property atom_is_aromatic: ndarray[tuple[int, ...], dtype[bool]]#

property atom_num_pi_electrons: ndarray[tuple[int, ...], dtype[uint8]]#

property bond_is_aromatic: ndarray[tuple[int, ...], dtype[bool]]#

property bond_length: ndarray[tuple[int, ...], dtype[float32]]#: Calculate bond lengths if coordinates are available.

property bond_resolved: ndarray[tuple[int, ...], dtype[bool]]#
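A bond length only makes sense when both endpoint atoms have coordinates. The sketch below computes Euclidean bond lengths from per-atom coordinates and a resolved mask. Returning NaN for unresolved bonds, and treating a bond as resolved when both atoms are resolved, are assumptions made for this example; the function names are hypothetical.

```python
import math


def bonds_resolved(bond_atoms, atom_resolved):
    """A bond is treated as resolved when both of its atoms are resolved."""
    return [atom_resolved[i] and atom_resolved[j] for i, j in bond_atoms]


def bond_lengths(bond_atoms, atom_coordinates, atom_resolved):
    """Return the Euclidean length of each bond, or NaN if an endpoint is unresolved."""
    lengths = []
    for i, j in bond_atoms:
        if atom_resolved[i] and atom_resolved[j]:
            lengths.append(math.dist(atom_coordinates[i], atom_coordinates[j]))
        else:
            lengths.append(math.nan)
    return lengths


coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 0.0)]
resolved = [True, True, False]  # the third atom has placeholder coordinates
bonds = [(0, 1), (1, 2)]

resolved_mask = bonds_resolved(bonds, resolved)
lengths = bond_lengths(bonds, coords, resolved)
```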

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'frozen': True, 'validate_default': True}#: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

property num_angles: int#

property num_atoms: int#: Total number of atoms in the molecule.

property num_bonds: int#: Total number of bonds in the molecule.

property rings: list[list[int]]#

to_dict()[source]#

Return type:: dict[str, Any]

atom_element: Annotated[ndarray[tuple[int, ...], dtype[uint8]]]#: Atomic numbers for each atom, stored as int array.

atom_name: Annotated[ndarray[tuple[int, ...], dtype[str_]]]#: The atom names (e.g., ‘CA’, ‘N’, ‘C’, ‘O’ for proteins).

atom_charge: Annotated[ndarray[tuple[int, ...], dtype[int8]]]#: Formal charges for each atom, stored as an int8 array with one entry per atom.

bond_atoms: Annotated[ndarray[tuple[int, ...], dtype[uint32]]]#: Array of shape (n_bonds, 2) with atom indices for each bond.

bond_order: Annotated[ndarray[tuple[int, ...], dtype[uint8]]]#: Bond orders (1=single, 2=double, 3=triple), stored as int array.

atom_resolved: Annotated[ndarray[tuple[int, ...], dtype[bool]]]#: Boolean array of shape (n_atoms,) indicating which atoms have valid coordinates.

atom_coordinates: Annotated[ndarray[tuple[int, ...], dtype[float32]]]#: Array of shape (n_atoms, 3) with xyz coordinates for each atom.

boltz_data.mol.bzcif_from_bzmol(bzmol, /)[source]#

Convert a BZMol to a CIF block representation.

Return type:: Block
Parameters:: bzmol (BZMol)

boltz_data.mol.bzmol_from_chemical_component(chemical_component)[source]#

Return type:: BZMol
Parameters:: chemical_component (ChemicalComponent | str)

boltz_data.mol.bzmol_from_chemical_components(*, chemical_components, chain_id, bonds=None, residue_numbers=None, description=None)[source]#

Return type:

BZBioMol

Parameters:

chemical_components (list[ChemicalComponent])
chain_id (str)
bonds (list[InternalBond] | None)
residue_numbers (list[int] | None)
description (str | None)

boltz_data.mol.bzmol_from_definition(definition, /, *, chemical_component_dictionary=None, chain_id, residue_numbers=None)[source]#

Create a BZMol from an entity definition.

Return type:

BZBioMol

Parameters:

definition (ProteinDefinition | RNADefinition | DNADefinition | LigandCCDDefinition | LigandSMILESDefinition | BranchedPolymerDefinition)
chemical_component_dictionary (Mapping[str, ChemicalComponent] | None)
chain_id (str)
residue_numbers (list[int] | None)

boltz_data.mol.bzmol_from_mmcif(mmcif, *, chemical_component_dictionary=None)[source]#

Create a BZMol from an mmCIF file with coordinates.

This function:

1. Parses entity definitions from the mmCIF.
2. Creates BZMols for each entity instance.
3. Concatenates them into a single structure.
4. Maps atom coordinates from the mmCIF to the BZMol.

Parameters:

mmcif (Block) – The mmCIF block containing structure data.
chemical_component_dictionary (Mapping[str, ChemicalComponent] | None) – Dictionary mapping component IDs to ChemicalComponent objects.

Return type:

BZBioMol

Returns:

A BZMol containing all atoms with their coordinates and a mask indicating which atoms have valid coordinates.

boltz_data.mol.bzmol_from_path(path, /)[source]#

Return type:: BZBioMol | BZMol
Parameters:: path (str | Path)

boltz_data.mol.bzmol_from_rdmol(rdmol, /, *, conformer_id=-1)[source]#

Convert an RDKit molecule to a BZMol.

This function extracts the molecular structure from an RDKit molecule to create a BZMol object. The resulting BZMol contains only atoms and bonds, without residue or chain information.

Parameters:

rdmol (Mol) – An RDKit Mol object.
conformer_id (int) – The ID of the conformer to extract coordinates from.

Return type:

BZMol

Returns:

A BZMol object containing atoms and bonds, without residues or chains. Coordinates, when available, are taken from the conformer selected by conformer_id.

Example

>>> from rdkit import Chem
>>> rdmol = Chem.MolFromSmiles("CCO")
>>> mol = bzmol_from_rdmol(rdmol)
>>> mol.num_atoms
3  # Without hydrogens

boltz_data.mol.bzmol_from_smiles(smiles, /)[source]#

Create a BZMol from a SMILES string using RDKit.

This function converts a SMILES string to an RDKit molecule, then converts it to a BZMol object. The resulting BZMol contains only atoms and bonds, without residue or chain information.

Parameters:: smiles (str) – A valid SMILES string representing the molecule.
Return type:: BZMol
Returns:: A BZMol object containing atoms and bonds without coordinates, residues, or chains.
Raises:: ValueError – If the SMILES string is invalid or cannot be parsed.

Example

>>> mol = bzmol_from_smiles("CCO")  # Ethanol
>>> mol.num_atoms
3  # Without hydrogens
>>> mol.num_bonds
2  # C-C and C-O

boltz_data.mol.bzmol_from_structure(structure, /, chemical_component_dictionary=None)[source]#

Create a BZMol from a structure definition.

Return type:

BZBioMol

Parameters:

structure (StructureDefinition)
chemical_component_dictionary (Mapping[str, ChemicalComponent] | None)

boltz_data.mol.bzmol_to_svg(mol, /, *, box_width=60, box_height=30, padding=5)[source]#

Generate an SVG visualization of a BZMol structure.

Each chain is shown as a separate section with residues as rows and atoms as columns.

Parameters:

mol (BZMol | BZBioMol) – The BZMol to visualize.
box_width (int) – Width of each atom box in pixels.
box_height (int) – Height of each atom box in pixels.
padding (int) – Padding around the entire diagram and between chains in pixels.

Return type:

str

Returns:

SVG string representing the BZMol structure.

boltz_data.mol.concat_bzmols(*bzmols)[source]#

Concatenate multiple BZMol or BZBioMol objects into a single molecule.

Parameters:: *bzmols (BZMol | BZBioMol) – Variable number of BZMol or BZBioMol objects to concatenate.
Return type:: BZMol | BZBioMol
Returns:: A single molecule containing all atoms, bonds, and (for BZBioMol inputs) residues from the inputs.
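When index-based properties such as `bond_atoms` are concatenated, the indices of every molecule after the first have to be shifted by the number of atoms that precede it. The sketch below shows that offsetting step on plain dictionaries; `concat_toy_mols` is a hypothetical helper, not the library's implementation, and it handles only two properties.

```python
def concat_toy_mols(*mols):
    """Concatenate atom names and bonds, shifting bond indices by the running atom count."""
    atom_name = []
    bond_atoms = []
    for mol in mols:
        offset = len(atom_name)  # atoms contributed by earlier molecules
        atom_name.extend(mol["atom_name"])
        bond_atoms.extend((i + offset, j + offset) for i, j in mol["bond_atoms"])
    return {"atom_name": atom_name, "bond_atoms": bond_atoms}


water = {"atom_name": ["O"], "bond_atoms": []}
ethanol = {"atom_name": ["C1", "C2", "O1"], "bond_atoms": [(0, 1), (1, 2)]}

combined = concat_toy_mols(water, ethanol)
```

The same offsetting would apply to any other index-valued property, for example an atom-to-residue or residue-to-chain mapping.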

boltz_data.mol.generate_conformer(bzmol, /, *, seed)[source]#

Generate a 3D conformer for a BZMol using RDKit.

Return type:

BZMol

Parameters:

bzmol (BZMol)
seed (int)

boltz_data.mol.generate_depiction(bzmol, /, *, match_3d=True)[source]#

Generate 2D coordinates for a BZMol using RDKit.

Return type:

BZMol

Parameters:

bzmol (BZMol)
match_3d (bool)

boltz_data.mol.get_molecular_interfaces(bzmol, /, threshold=5.0)[source]#

Find interfaces between chains using atoms within threshold distance.

Uses a two-pass approach:

1. Find residue pairs with overlapping bounding boxes (fast sweep-and-prune).
2. Check atom distances only for those residue pairs (KDTree per pair).

This leverages residue grouping to avoid O(n²) atom comparisons.
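The two-pass idea can be sketched in a few lines of standard-library Python. This simplified version is not the library's implementation: it swaps sweep-and-prune for an all-pairs box test and the KDTree for a brute-force distance check, and the names `bounding_box`, `boxes_within`, and `find_contacts` are hypothetical. It still shows why the cheap box test lets most residue pairs skip the atom-level comparison.

```python
import math
from itertools import combinations


def bounding_box(points):
    """Return (min corner, max corner) of an axis-aligned box around the points."""
    xs, ys, zs = zip(*points)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


def boxes_within(box_a, box_b, threshold):
    """True if the gap between two boxes is at most threshold along every axis."""
    (lo_a, hi_a), (lo_b, hi_b) = box_a, box_b
    return all(
        lo_a[d] - threshold <= hi_b[d] and lo_b[d] - threshold <= hi_a[d]
        for d in range(3)
    )


def find_contacts(residues, threshold=5.0):
    """residues: list of (chain_id, [xyz, ...]); returns cross-chain residue index pairs."""
    boxes = [bounding_box(points) for _, points in residues]
    contacts = []
    for a, b in combinations(range(len(residues)), 2):
        if residues[a][0] == residues[b][0]:
            continue  # only interfaces between different chains
        if not boxes_within(boxes[a], boxes[b], threshold):
            continue  # pass 1: cheap rejection using bounding boxes
        # Pass 2: exact atom-atom distance check for the surviving pair.
        if any(
            math.dist(p, q) <= threshold
            for p in residues[a][1]
            for q in residues[b][1]
        ):
            contacts.append((a, b))
    return contacts


toy_residues = [
    ("A", [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]),
    ("B", [(4.0, 0.0, 0.0)]),
    ("B", [(20.0, 0.0, 0.0)]),
    ("A", [(2.0, 0.0, 0.0)]),
]
toy_contacts = find_contacts(toy_residues, threshold=5.0)
```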

Return type:

list[Interface]

Parameters:

bzmol (BZBioMol)
threshold (float)

boltz_data.mol.get_residue_bounding_spheres_around_centroid(bzmol, /)[source]#

Calculate bounding spheres for each residue centered at the residue centroid.

Return type:: Spheres
Parameters:: bzmol (BZBioMol)
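A sphere centered at a centroid needs a radius large enough to reach the farthest point. The sketch below computes that for a single residue's atom positions. The function name is hypothetical, and whether the library adds any margin (such as an atomic radius) is not stated, so none is added here.

```python
import math


def bounding_sphere_around_centroid(points):
    """Return (centroid, radius) where radius reaches the point farthest from the centroid."""
    count = len(points)
    centroid = tuple(sum(p[d] for p in points) / count for d in range(3))
    radius = max(math.dist(centroid, p) for p in points)
    return centroid, radius


# Two atoms 2 units apart along x: centroid at x=1, radius 1.
center, radius = bounding_sphere_around_centroid([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)])
```

A centroid-centered sphere is generally not the minimal enclosing sphere, but it is cheap to compute and sufficient as a conservative bound.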

boltz_data.mol.iterate_assemblies(*, mmcif, bzmol, max_atoms=None)[source]#

Generate biological assemblies from an mmCIF file and asymmetric unit BZBioMol.

Return type:

Generator[BZBioMol]

Parameters:

mmcif (Block)
bzmol (BZBioMol)
max_atoms (int | None)

boltz_data.mol.mmcif_from_bzmol(bzmol, /, chemical_component_dictionary=None, name='pred')[source]#

Convert a BZMol back to an mmCIF block.

Parameters:

bzmol (BZBioMol) – The BZMol to convert.
chemical_component_dictionary (Mapping[str, ChemicalComponent] | None) – Dictionary mapping component IDs to chemical components.
name (str) – The name to give the mmCIF structure.

Return type:

Block

Returns:

An mmCIF block representing the structure in the BZMol.

boltz_data.mol.rdmol_from_bzmol(bzmol, /)[source]#

Convert a BZMol or BZBioMol to an RDKit molecule.

Return type:: RWMol
Parameters:: bzmol (BZMol | BZBioMol)

boltz_data.mol.save_bzmol_svg(mol, filepath, /, *, box_width=60, box_height=30, padding=5)[source]#

Save a BZMol visualization as an SVG file.

Parameters:

mol (BZMol) – The BZMol to visualize.
filepath (str | Path) – Path to save the SVG file.
box_width (int) – Width of each atom box in pixels.
box_height (int) – Height of each atom box in pixels.
padding (int) – Padding between boxes in pixels.

Return type:

None

boltz_data.mol.smiles_from_bzmol(bzmol, /)[source]#

Convert a BZMol or BZBioMol to a SMILES string.

Return type:: str
Parameters:: bzmol (BZMol)

boltz_data.mol.structure_from_bzmol(bzmol, /, chemical_component_dictionary=None)[source]#

Convert a BZBioMol to a structure definition.

Return type:

StructureDefinition

Parameters:

bzmol (BZBioMol)
chemical_component_dictionary (Mapping[str, ChemicalComponent] | None)

boltz_data.mol.subset_bzmol(bzmol, *, chain_ids=None)[source]#

Extract a subset of a BZBioMol by selecting specific chains.

Return type:

BZBioMol

Parameters:

bzmol (BZBioMol)
chain_ids (list[str] | None)
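Selecting chains from a hierarchical chain, residue, and atom layout means that every index-valued mapping has to be renumbered after filtering. The sketch below shows that renumbering on plain lists named after the documented fields. `subset_by_chain` is a hypothetical helper covering only three properties, not the library's implementation.

```python
def subset_by_chain(chain_id, residue_chain, atom_residue, keep_ids):
    """Keep the chains in keep_ids and renumber the residue and atom mappings."""
    keep_chains = [c for c, cid in enumerate(chain_id) if cid in keep_ids]
    chain_map = {old: new for new, old in enumerate(keep_chains)}

    keep_residues = [r for r, c in enumerate(residue_chain) if c in chain_map]
    residue_map = {old: new for new, old in enumerate(keep_residues)}

    keep_atoms = [a for a, r in enumerate(atom_residue) if r in residue_map]

    return {
        "chain_id": [chain_id[c] for c in keep_chains],
        "residue_chain": [chain_map[residue_chain[r]] for r in keep_residues],
        "atom_residue": [residue_map[atom_residue[a]] for a in keep_atoms],
        "kept_atoms": keep_atoms,  # original atom indices, useful for slicing other arrays
    }


subset = subset_by_chain(
    chain_id=["A", "B", "C"],
    residue_chain=[0, 0, 1, 2],
    atom_residue=[0, 1, 1, 2, 3, 3],
    keep_ids={"A", "C"},
)
```

The `kept_atoms` list can then be used to filter per-atom arrays such as coordinates, and bonds would need both endpoints remapped in the same way.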

boltz_data.mol.transform_bzmol(bzmol, **kwargs)[source]#

Create a new BZMol with modified fields by replacing specified attributes.

Return type:

TypeVar(TMol, bound= BZMol)

Parameters:

bzmol (TMol)
kwargs (Any)
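Because the models are configured with 'frozen': True, fields cannot be assigned in place, so changes are expressed by building a new object. The sketch below shows the same copy-with-replacement pattern using a frozen dataclass and `dataclasses.replace`. `FrozenToyMol` and `transform_toy` are hypothetical stand-ins; the real function operates on pydantic models.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class FrozenToyMol:
    """Hypothetical immutable molecule with two per-atom properties."""

    atom_name: tuple
    atom_charge: tuple


def transform_toy(mol, **kwargs):
    """Return a copy of mol with the given fields replaced; mol itself is unchanged."""
    return replace(mol, **kwargs)


original = FrozenToyMol(atom_name=("N", "CA"), atom_charge=(0, 0))
charged = transform_toy(original, atom_charge=(1, 0))

# In-place assignment is rejected on a frozen instance.
try:
    original.atom_charge = (1, 0)
    assignment_rejected = False
except FrozenInstanceError:
    assignment_rejected = True
```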

boltz_data.mol.validate_bzmol(mol, /)[source]#

Validate a BZBioMol instance for consistency.

Return type:: None
Parameters:: mol (BZBioMol)
