boltz_data.mol#

Boltz molecule utilities.

Functions

bzcif_from_bzmol(bzmol, /)

Convert a BZMol to a CIF block representation.

bzmol_from_chemical_component(chemical_component)

bzmol_from_chemical_components(*, ...[, ...])

bzmol_from_definition(definition, /, *[, ...])

Create a BZMol from an entity definition.

bzmol_from_mmcif(mmcif, *[, ...])

Create a BZMol from an mmCIF file with coordinates.

bzmol_from_path(path, /)

bzmol_from_rdmol(rdmol, /, *[, conformer_id])

Convert an RDKit molecule to a BZMol.

bzmol_from_smiles(smiles, /)

Create a BZMol from a SMILES string using RDKit.

bzmol_from_structure(structure, /[, ...])

Create a BZMol from a structure definition.

bzmol_to_svg(mol, /, *[, box_width, ...])

Generate an SVG visualization of a BZMol structure.

concat_bzmols(*bzmols)

Concatenate multiple BZMol or BZBioMol objects into a single molecule.

generate_conformer(bzmol, /, *, seed)

Generate a 3D conformer for a BZMol using RDKit.

generate_depiction(bzmol, /, *[, match_3d])

Generate 2D coordinates for a BZMol using RDKit.

get_molecular_interfaces(bzmol, /[, threshold])

Find interfaces between chains using atoms within threshold distance.

get_residue_bounding_spheres_around_centroid(...)

Calculate bounding spheres for each residue centered at the residue centroid.

iterate_assemblies(*, mmcif, bzmol[, max_atoms])

Generate biological assemblies from an mmCIF file and asymmetric unit BZBioMol.

mmcif_from_bzmol(bzmol, /[, ...])

Convert a BZMol back to an mmCIF block.

rdmol_from_bzmol(bzmol, /)

Convert a BZMol or BZBioMol to an RDKit molecule.

save_bzmol_svg(mol, filepath, /, *[, ...])

Save a BZMol visualization as an SVG file.

smiles_from_bzmol(bzmol, /)

Convert a BZMol or BZBioMol to a SMILES string.

structure_from_bzmol(bzmol, /[, ...])

Convert a BZBioMol to a structure definition.

subset_bzmol(bzmol, *[, chain_ids])

Extract a subset of a BZBioMol by selecting specific chains.

transform_bzmol(bzmol, **kwargs)

Create a new BZMol with modified fields by replacing specified attributes.

validate_bzmol(mol, /)

Validate a BZBioMol instance for consistency.

Classes

BZBioMol(**data)

Biomolecular structure with residues and chains.

BZMol(**data)

Base representation of a molecular structure with atoms and bonds.

class boltz_data.mol.BZBioMol(**data)[source]#

Biomolecular structure with residues and chains.

Extends BZMol to include residue and chain information for proteins, DNA, RNA, and other biomolecules.

Parameters:
  • atom_element (Annotated[ndarray[tuple[int, ...], dtype[uint8]]])

  • atom_name (Annotated[ndarray[tuple[int, ...], dtype[str_]]])

  • atom_charge (Annotated[ndarray[tuple[int, ...], dtype[int8]]])

  • bond_atoms (Annotated[ndarray[tuple[int, ...], dtype[uint32]]])

  • bond_order (Annotated[ndarray[tuple[int, ...], dtype[uint8]]])

  • atom_resolved (Annotated[ndarray[tuple[int, ...], dtype[bool]]])

  • atom_coordinates (Annotated[ndarray[tuple[int, ...], dtype[float32]]])

  • atom_b_factor (Annotated[ndarray[tuple[int, ...], dtype[float32]]] | None)

  • atom_residue (Annotated[ndarray[tuple[int, ...], dtype[uint32]]])

  • residue_name (Annotated[ndarray[tuple[int, ...], dtype[str_]]])

  • residue_number (Annotated[ndarray[tuple[int, ...], dtype[int32]]] | None)

  • residue_chain (Annotated[ndarray[tuple[int, ...], dtype[uint16]]])

  • chain_id (Annotated[ndarray[tuple[int, ...], dtype[str_]]])

  • chain_description (Annotated[ndarray[tuple[int, ...], dtype[str_]]] | None)

property chain_atoms: list[slice | list[int]]#

Get atom indices for each chain as slices or lists.

property chain_residues: list[slice | list[int]]#

Get residue indices for each chain as slices or lists.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'frozen': True, 'validate_default': True}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

property num_chains: int#

Total number of chains in the molecule.

property num_residues: int#

Total number of residues in the molecule.

property residue_any_resolved: ndarray[tuple[int, ...], dtype[bool]]#

Get a boolean array of shape (n_residues,) indicating which residues have any resolved atoms.
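
The per-residue flag can be pictured as a reduction over atoms. The sketch below is plain Python, not the boltz_data implementation; it assumes that atom_residue holds the residue index of each atom, and the function name is chosen only to mirror the property.

```python
def residue_any_resolved(atom_resolved, atom_residue, num_residues):
    """Return one flag per residue: True if any of its atoms is resolved."""
    flags = [False] * num_residues
    for resolved, residue in zip(atom_resolved, atom_residue):
        if resolved:
            flags[residue] = True
    return flags


# Five atoms in three residues; residue 1 has no resolved atoms.
atom_resolved = [True, False, False, False, True]
atom_residue = [0, 0, 1, 1, 2]  # assumption: residue index of each atom
flags = residue_any_resolved(atom_resolved, atom_residue, num_residues=3)
```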

property residue_atoms: list[slice | list[int]]#

Get atom indices for each residue as slices or lists.

property residue_ordinal: ndarray[tuple[int, ...], dtype[uint16]]#

0-indexed ordinal number of each residue within its chain.
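
As an illustration of what a per-chain ordinal means, the following plain-Python sketch derives it from a residue_chain-like sequence. It assumes residue_chain holds the chain index of each residue; it is not the library's code.

```python
def residue_ordinal(residue_chain):
    """Return the 0-indexed position of each residue within its own chain."""
    seen_per_chain = {}
    ordinals = []
    for chain in residue_chain:
        position = seen_per_chain.get(chain, 0)
        ordinals.append(position)
        seen_per_chain[chain] = position + 1
    return ordinals


# Three residues in chain 0, two in chain 1, one in chain 2.
ordinals = residue_ordinal([0, 0, 0, 1, 1, 2])
```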

atom_b_factor: Optional[Annotated[ndarray[tuple[int, ...], dtype[float32]]]]#

Optional array of shape (n_atoms,) with B-factors for each atom.

atom_residue: Annotated[ndarray[tuple[int, ...], dtype[uint32]]]#
residue_name: Annotated[ndarray[tuple[int, ...], dtype[str_]]]#

Residue names (e.g., ‘ALA’, ‘GLY’ for proteins, ‘A’, ‘G’ for nucleic acids).

residue_number: Optional[Annotated[ndarray[tuple[int, ...], dtype[int32]]]]#
residue_chain: Annotated[ndarray[tuple[int, ...], dtype[uint16]]]#
chain_id: Annotated[ndarray[tuple[int, ...], dtype[str_]]]#

Chain identifiers (e.g., ‘A’, ‘B’, ‘C’), one for each chain.

chain_description: Optional[Annotated[ndarray[tuple[int, ...], dtype[str_]]]]#
class boltz_data.mol.BZMol(**data)[source]#

Base representation of a molecular structure with atoms and bonds.

This object can represent both single molecules and multiple disconnected molecules, in a similar manner to an RDKit Mol object.

Unlike a Mol object, information is stored by property, rather than by atom. For example, the atom_name property is an array of shape (n_atoms,) containing the names of all atoms.
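
To make the property-wise layout concrete, here is a minimal sketch in plain Python. TinyMol is a hypothetical stand-in, not part of boltz_data; it mirrors a few of the documented field names and uses tuples where BZMol uses NumPy arrays.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TinyMol:
    """Hypothetical stand-in for BZMol: one sequence per property."""

    atom_element: tuple[int, ...]  # atomic numbers, length n_atoms
    atom_name: tuple[str, ...]  # atom names, length n_atoms
    bond_atoms: tuple[tuple[int, int], ...]  # atom index pairs, length n_bonds
    bond_order: tuple[int, ...]  # 1=single, 2=double, 3=triple

    @property
    def num_atoms(self) -> int:
        return len(self.atom_element)

    @property
    def num_bonds(self) -> int:
        return len(self.bond_atoms)


# Ethanol heavy atoms (C, C, O); hydrogens are omitted.
ethanol = TinyMol(
    atom_element=(6, 6, 8),
    atom_name=("C1", "C2", "O1"),
    bond_atoms=((0, 1), (1, 2)),
    bond_order=(1, 1),
)

# Per-atom information is read by indexing each property with the atom index.
atom_2 = (ethanol.atom_element[2], ethanol.atom_name[2])
```

Storing one array per property keeps whole-molecule operations (for example, selecting all oxygen atoms) as simple array operations, at the cost of having no per-atom object to pass around.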

Parameters:
  • atom_element (Annotated[ndarray[tuple[int, ...], dtype[uint8]]])

  • atom_name (Annotated[ndarray[tuple[int, ...], dtype[str_]]])

  • atom_charge (Annotated[ndarray[tuple[int, ...], dtype[int8]]])

  • bond_atoms (Annotated[ndarray[tuple[int, ...], dtype[uint32]]])

  • bond_order (Annotated[ndarray[tuple[int, ...], dtype[uint8]]])

  • atom_resolved (Annotated[ndarray[tuple[int, ...], dtype[bool]]])

  • atom_coordinates (Annotated[ndarray[tuple[int, ...], dtype[float32]]])

__init__(**data)[source]#

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

data (Any)

Return type:

None

property angle_atoms: Annotated[ndarray[tuple[int, ...], dtype[uint16]]]#

Get array of shape (n_angles, 3) with atom indices for each angle.
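
One way to picture an (n_angles, 3) array is to enumerate every pair of neighbors around each atom. The sketch below does that in plain Python; the column order (end, center, end) and the enumeration order are assumptions for illustration, since the source states only the shape.

```python
from itertools import combinations


def enumerate_angles(num_atoms, bond_atoms):
    """List (end, center, end) atom index triples for all bonded angles."""
    neighbors = [[] for _ in range(num_atoms)]
    for a, b in bond_atoms:
        neighbors[a].append(b)
        neighbors[b].append(a)

    angles = []
    for center in range(num_atoms):
        # Every unordered pair of neighbors defines one angle at this center.
        for first, last in combinations(sorted(neighbors[center]), 2):
            angles.append((first, center, last))
    return angles


# Atom 1 is bonded to atoms 0, 2 and 3, so it is the center of three angles.
angles = enumerate_angles(4, [(0, 1), (1, 2), (1, 3)])
```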

property angle_resolved: Annotated[ndarray[tuple[int, ...], dtype[bool]]]#

Get boolean array of shape (n_angles,) indicating which angles have valid coordinates.

property aromatic_rings: list[list[int]]#
property atom_adjacency_list: list[list[int]]#

Get the atom-atom adjacency list based on the bonds.

property atom_adjacency_matrix: ndarray[tuple[int, ...], dtype[bool]]#

Get the atom-atom adjacency matrix based on the bonds.
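
Both adjacency views follow directly from the bond list. The following plain-Python sketch builds them from a bond_atoms-like list of index pairs; nested lists stand in for the NumPy arrays, and the function names are illustrative only.

```python
def adjacency_list(num_atoms, bond_atoms):
    """Return, for each atom, the indices of the atoms it is bonded to."""
    neighbors = [[] for _ in range(num_atoms)]
    for a, b in bond_atoms:
        neighbors[a].append(b)
        neighbors[b].append(a)
    return neighbors


def adjacency_matrix(num_atoms, bond_atoms):
    """Return a symmetric n_atoms x n_atoms boolean matrix of bonded pairs."""
    matrix = [[False] * num_atoms for _ in range(num_atoms)]
    for a, b in bond_atoms:
        matrix[a][b] = True
        matrix[b][a] = True
    return matrix


bonds = [(0, 1), (1, 2)]  # a three-atom chain such as the heavy atoms of ethanol
as_list = adjacency_list(3, bonds)
as_matrix = adjacency_matrix(3, bonds)
```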

property atom_is_aromatic: Annotated[ndarray[tuple[int, ...], dtype[bool]]]#
property atom_num_pi_electrons: Annotated[ndarray[tuple[int, ...], dtype[uint8]]]#
property bond_is_aromatic: Annotated[ndarray[tuple[int, ...], dtype[bool]]]#
property bond_length: ndarray[tuple[int, ...], dtype[float32]]#

Calculate bond lengths if coordinates are available.
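
A bond length is the Euclidean distance between the two bonded atoms. The sketch below computes it in plain Python; returning NaN when an endpoint is unresolved is an assumption of this example, not documented behavior of the property.

```python
import math


def bond_lengths(atom_coordinates, atom_resolved, bond_atoms):
    """Return one length per bond; NaN where an endpoint has no coordinates."""
    lengths = []
    for a, b in bond_atoms:
        if atom_resolved[a] and atom_resolved[b]:
            lengths.append(math.dist(atom_coordinates[a], atom_coordinates[b]))
        else:
            # Assumption: bonds with an unresolved atom are reported as NaN.
            lengths.append(math.nan)
    return lengths


atom_coordinates = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 2.0, 0.0)]
atom_resolved = [True, True, False]
bond_atoms = [(0, 1), (1, 2)]
lengths = bond_lengths(atom_coordinates, atom_resolved, bond_atoms)
```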

property bond_resolved: ndarray[tuple[int, ...], dtype[bool]]#
model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'frozen': True, 'validate_default': True}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

property num_angles: int#
property num_atoms: int#

Total number of atoms in the molecule.

property num_bonds: int#

Total number of bonds in the molecule.

property rings: list[list[int]]#
to_dict()[source]#
Return type:

dict[str, Any]

atom_element: Annotated[ndarray[tuple[int, ...], dtype[uint8]]]#

Atomic numbers for each atom, stored as a uint8 array of shape (n_atoms,).

atom_name: Annotated[ndarray[tuple[int, ...], dtype[str_]]]#

The atom names (e.g., ‘CA’, ‘N’, ‘C’, ‘O’ for proteins).

atom_charge: Annotated[ndarray[tuple[int, ...], dtype[int8]]]#

Formal charges for each atom, stored as an int8 array of shape (n_atoms,).

bond_atoms: Annotated[ndarray[tuple[int, ...], dtype[uint32]]]#

Array of shape (n_bonds, 2) with atom indices for each bond.

bond_order: Annotated[ndarray[tuple[int, ...], dtype[uint8]]]#

Bond orders (1=single, 2=double, 3=triple), stored as a uint8 array of shape (n_bonds,).

atom_resolved: Annotated[ndarray[tuple[int, ...], dtype[bool]]]#

Boolean array of shape (n_atoms,) indicating which atoms have valid coordinates.

atom_coordinates: Annotated[ndarray[tuple[int, ...], dtype[float32]]]#

Array of shape (n_atoms, 3) with xyz coordinates for each atom.

boltz_data.mol.bzcif_from_bzmol(bzmol, /)[source]#

Convert a BZMol to a CIF block representation.

Return type:

Block

Parameters:

bzmol (BZMol)

boltz_data.mol.bzmol_from_chemical_component(chemical_component)[source]#
Return type:

BZMol

Parameters:

chemical_component (ChemicalComponent | str)

boltz_data.mol.bzmol_from_chemical_components(*, chemical_components, chain_id, bonds=None, residue_numbers=None, description=None)[source]#
Return type:

BZBioMol

Parameters:
boltz_data.mol.bzmol_from_definition(definition, /, *, chemical_component_dictionary=None, chain_id, residue_numbers=None)[source]#

Create a BZMol from an entity definition.

Return type:

BZBioMol

Parameters:
  • definition (ProteinDefinition | RNADefinition | DNADefinition | LigandCCDDefinition | LigandSMILESDefinition | BranchedPolymerDefinition)

  • chemical_component_dictionary (Mapping[str, ChemicalComponent] | None)

  • chain_id (str)

  • residue_numbers (list[int] | None)

boltz_data.mol.bzmol_from_mmcif(mmcif, *, chemical_component_dictionary=None)[source]#

Create a BZMol from an mmCIF file with coordinates.

This function:

  1. Parses entity definitions from the mmCIF.
  2. Creates BZMols for each entity instance.
  3. Concatenates them into a single structure.
  4. Maps atom coordinates from the mmCIF to the BZMol.

Parameters:
  • mmcif (Block) – The mmCIF block containing structure data.

  • chemical_component_dictionary (Mapping[str, ChemicalComponent] | None) – Dictionary mapping component IDs to ChemicalComponent objects.

Return type:

BZBioMol

Returns:

A BZBioMol containing all atoms with their coordinates and a mask indicating which atoms have valid coordinates.

boltz_data.mol.bzmol_from_path(path, /)[source]#
Return type:

BZBioMol | BZMol

Parameters:

path (str | Path)

boltz_data.mol.bzmol_from_rdmol(rdmol, /, *, conformer_id=-1)[source]#

Convert an RDKit molecule to a BZMol.

This function extracts the molecular structure from an RDKit molecule to create a BZMol object. The resulting BZMol contains only atoms and bonds, without residue or chain information.

Parameters:
  • rdmol (Mol) – An RDKit Mol object.

  • conformer_id (int) – The ID of the conformer to extract coordinates from.

Return type:

BZMol

Returns:

A BZMol object containing atoms and bonds, without residue or chain information.

Example

>>> from rdkit import Chem
>>> rdmol = Chem.MolFromSmiles("CCO")
>>> mol = bzmol_from_rdmol(rdmol)
>>> mol.num_atoms
3  # Without hydrogens
boltz_data.mol.bzmol_from_smiles(smiles, /)[source]#

Create a BZMol from a SMILES string using RDKit.

This function converts a SMILES string to an RDKit molecule, then converts it to a BZMol object. The resulting BZMol contains only atoms and bonds, without residue or chain information.

Parameters:

smiles (str) – A valid SMILES string representing the molecule.

Return type:

BZMol

Returns:

A BZMol object containing atoms and bonds without coordinates, residues, or chains.

Raises:

ValueError – If the SMILES string is invalid or cannot be parsed.

Example

>>> mol = bzmol_from_smiles("CCO")  # Ethanol
>>> mol.num_atoms
3  # Without hydrogens
>>> mol.num_bonds
2  # C-C and C-O
boltz_data.mol.bzmol_from_structure(structure, /, chemical_component_dictionary=None)[source]#

Create a BZMol from a structure definition.

Return type:

BZBioMol

Parameters:
boltz_data.mol.bzmol_to_svg(mol, /, *, box_width=60, box_height=30, padding=5)[source]#

Generate an SVG visualization of a BZMol structure.

Each chain is shown as a separate section with residues as rows and atoms as columns.

Parameters:
  • mol (BZMol | BZBioMol) – The BZMol to visualize.

  • box_width (int) – Width of each atom box in pixels.

  • box_height (int) – Height of each atom box in pixels.

  • padding (int) – Padding around the entire diagram and between chains in pixels.

Return type:

str

Returns:

SVG string representing the BZMol structure.

boltz_data.mol.concat_bzmols(*bzmols)[source]#

Concatenate multiple BZBioMol objects into a single BZBioMol.

Parameters:

*bzmols (BZMol | BZBioMol) – Variable number of BZMol or BZBioMol objects to concatenate.

Return type:

BZMol | BZBioMol

Returns:

A single BZMol or BZBioMol containing all atoms, bonds, and, where present, residues from the inputs.
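
Because bonds refer to atoms by index, concatenation has to shift the indices of every molecule after the first. The plain-Python sketch below shows only that offsetting step, under the assumption that atoms are appended in input order; it is not the library's implementation and ignores residues and chains.

```python
def concat_bonds(molecules):
    """Merge (num_atoms, bond_atoms) pairs, shifting bond indices as needed."""
    merged_bonds = []
    offset = 0
    for num_atoms, bond_atoms in molecules:
        # Atoms of this molecule occupy indices offset .. offset + num_atoms - 1.
        merged_bonds.extend((a + offset, b + offset) for a, b in bond_atoms)
        offset += num_atoms
    return offset, merged_bonds


three_atom_chain = (3, [(0, 1), (1, 2)])
two_atom_molecule = (2, [(0, 1)])
total_atoms, bonds = concat_bonds([three_atom_chain, two_atom_molecule])
```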

boltz_data.mol.generate_conformer(bzmol, /, *, seed)[source]#

Generate a 3D conformer for a BZMol using RDKit.

Return type:

BZMol

Parameters:
boltz_data.mol.generate_depiction(bzmol, /, *, match_3d=True)[source]#

Generate 2D coordinates for a BZMol using RDKit.

Return type:

BZMol

Parameters:
boltz_data.mol.get_molecular_interfaces(bzmol, /, threshold=5.0)[source]#

Find interfaces between chains using atoms within threshold distance.

Uses a two-pass approach:

  1. Find residue pairs with overlapping bounding boxes (fast sweep-and-prune).
  2. Check atom distances only for those residue pairs (KDTree per pair).

This leverages residue grouping to avoid O(n²) atom comparisons.
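
The two-pass idea can be sketched in plain Python as a cheap box test followed by an exact distance test. This example deliberately swaps in simpler techniques: it tests all residue pairs instead of sweep-and-prune, and uses brute-force distances instead of a KDTree. The input layout, a list of (chain, atom coordinates) tuples, is hypothetical.

```python
import math
from itertools import combinations


def bounding_box(points, margin):
    """Axis-aligned box around the points, grown by margin on every side."""
    xs, ys, zs = zip(*points)
    low = (min(xs) - margin, min(ys) - margin, min(zs) - margin)
    high = (max(xs) + margin, max(ys) + margin, max(zs) + margin)
    return low, high


def boxes_overlap(box_a, box_b):
    (low_a, high_a), (low_b, high_b) = box_a, box_b
    return all(low_a[i] <= high_b[i] and low_b[i] <= high_a[i] for i in range(3))


def contacting_residue_pairs(residues, threshold=5.0):
    """Return index pairs of residues in different chains with close atoms."""
    # Growing each box by threshold / 2 guarantees that any two atoms within
    # threshold of each other belong to residues whose boxes overlap.
    boxes = [bounding_box(points, threshold / 2) for _, points in residues]
    pairs = []
    for i, j in combinations(range(len(residues)), 2):
        if residues[i][0] == residues[j][0]:
            continue  # same chain: not an interface
        if not boxes_overlap(boxes[i], boxes[j]):
            continue  # pass 1: cheap rejection
        # Pass 2: exact atom-atom distances for the surviving pair only.
        if any(
            math.dist(p, q) <= threshold
            for p in residues[i][1]
            for q in residues[j][1]
        ):
            pairs.append((i, j))
    return pairs


residues = [
    ("A", [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]),
    ("A", [(2.0, 0.0, 0.0)]),
    ("B", [(4.0, 0.0, 0.0)]),
    ("B", [(20.0, 0.0, 0.0)]),
]
pairs = contacting_residue_pairs(residues)
```

The box test can only produce false positives, never false negatives, so the second pass decides the final answer while the first pass limits how many pairs reach it.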

Return type:

list[Interface]

Parameters:
boltz_data.mol.get_residue_bounding_spheres_around_centroid(bzmol, /)[source]#

Calculate bounding spheres for each residue centered at the residue centroid.

Return type:

Spheres

Parameters:

bzmol (BZBioMol)
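
A sphere centered at a centroid can be sketched as follows. This plain-Python example assumes the radius is the largest centroid-to-atom distance, which encloses every atom but is generally not the minimal enclosing sphere; the source does not spell out the radius definition or the layout of the Spheres return type.

```python
import math


def bounding_sphere_around_centroid(points):
    """Return (centroid, radius) of a sphere that contains all points."""
    count = len(points)
    centroid = tuple(sum(point[axis] for point in points) / count for axis in range(3))
    # Assumption: the radius reaches the atom farthest from the centroid.
    radius = max(math.dist(centroid, point) for point in points)
    return centroid, radius


# One residue with two atoms lying on the x axis.
centroid, radius = bounding_sphere_around_centroid([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)])
```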

boltz_data.mol.iterate_assemblies(*, mmcif, bzmol, max_atoms=None)[source]#

Generate biological assemblies from an mmCIF file and asymmetric unit BZBioMol.

Return type:

Generator[BZBioMol]

Parameters:
  • mmcif (Block)

  • bzmol (BZBioMol)

  • max_atoms (int | None)

boltz_data.mol.mmcif_from_bzmol(bzmol, /, chemical_component_dictionary=None, name='pred')[source]#

Convert a BZMol back to an mmCIF block.

Parameters:
  • bzmol (BZBioMol) – The BZMol to convert.

  • chemical_component_dictionary (Mapping[str, ChemicalComponent] | None) – Dictionary mapping component IDs to chemical components.

  • name (str) – The name to give the mmCIF structure.

Return type:

Block

Returns:

An mmCIF block representing the structure in the BZMol.

boltz_data.mol.rdmol_from_bzmol(bzmol, /)[source]#

Convert a BZMol or BZBioMol to an RDKit molecule.

Return type:

RWMol

Parameters:

bzmol (BZMol | BZBioMol)

boltz_data.mol.save_bzmol_svg(mol, filepath, /, *, box_width=60, box_height=30, padding=5)[source]#

Save a BZMol visualization as an SVG file.

Parameters:
  • mol (BZMol) – The BZMol to visualize.

  • filepath (str | Path) – Path to save the SVG file.

  • box_width (int) – Width of each atom box in pixels.

  • box_height (int) – Height of each atom box in pixels.

  • padding (int) – Padding between boxes in pixels.

Return type:

None

boltz_data.mol.smiles_from_bzmol(bzmol, /)[source]#

Convert a BZMol or BZBioMol to a SMILES string.

Return type:

str

Parameters:

bzmol (BZMol)

boltz_data.mol.structure_from_bzmol(bzmol, /, chemical_component_dictionary=None)[source]#

Convert a BZBioMol to a structure definition.

Return type:

StructureDefinition

Parameters:
boltz_data.mol.subset_bzmol(bzmol, *, chain_ids=None)[source]#

Extract a subset of a BZBioMol by selecting specific chains.

Return type:

BZBioMol

Parameters:
boltz_data.mol.transform_bzmol(bzmol, **kwargs)[source]#

Create a new BZMol with modified fields by replacing specified attributes.

Return type:

TypeVar(TMol, bound= BZMol)

Parameters:
  • bzmol (TMol)

  • kwargs (Any)
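
Since model_config sets frozen to True, a molecule cannot be modified in place, which is why a function that returns a modified copy is useful. The sketch below shows the same copy-with-replacement pattern using a frozen dataclass from the standard library; FrozenMol and transform are hypothetical stand-ins, not the boltz_data implementation.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class FrozenMol:
    """Hypothetical immutable molecule with two of the documented fields."""

    atom_name: tuple
    atom_charge: tuple


def transform(mol, **kwargs):
    """Return a copy of mol with the given fields replaced."""
    return replace(mol, **kwargs)


original = FrozenMol(atom_name=("N", "CA"), atom_charge=(0, 0))
charged = transform(original, atom_charge=(1, 0))

# Direct assignment is rejected because the instance is frozen.
try:
    original.atom_charge = (1, 0)
    mutation_allowed = True
except FrozenInstanceError:
    mutation_allowed = False
```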

boltz_data.mol.validate_bzmol(mol, /)[source]#

Validate a BZBioMol instance for consistency.

Return type:

None

Parameters:

mol (BZBioMol)