Structure Definitions#
The boltz_data.definition module provides classes for defining molecular structures programmatically, separate from coordinate data.
What are Structure Definitions?#
Structure definitions describe molecular topology—sequences, chains, bonds—without requiring 3D coordinates. They’re useful for:
Creating structures from scratch
Defining theoretical complexes before generating coordinates
Storing structure metadata independently
Round-trip conversions with BZMol
Core Classes#
EntityDefinition#
An entity is a distinct chemical component (protein chain, DNA strand, ligand, etc.).
Types:
ProteinDefinition - Protein sequence
DNADefinition - DNA sequence
RNADefinition - RNA sequence
LigandCCDDefinition - Small molecule from CCD
LigandSMILESDefinition - Small molecule from SMILES
BranchedPolymerDefinition - Glycans and branched structures
StructureDefinition#
A complete structure containing multiple entities and chains.
from boltz_data.definition import StructureDefinition, ChainDefinition
structure = StructureDefinition(
    entities=[...],  # List of EntityDefinition objects
    chains={...},    # Dict mapping chain IDs to ChainDefinition
    bonds=None       # Optional inter-chain bonds
)
Creating Protein Structures#
Simple Protein#
from boltz_data.definition import ProteinDefinition
# Define a protein entity
protein = ProteinDefinition(
    type="protein",
    sequence="MKFLKFSLLTAVLLSVVFAFSSCGDDDDTGYLPPSQAIQDLLKRM",
    description="Example protein"
)
Protein with Non-Standard Residues#
Use parentheses for non-standard residues:
protein = ProteinDefinition(
    type="protein",
    sequence="MKF(MSE)KF",  # Selenomethionine at position 4
    description="Protein with selenomethionine"
)
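To make the parenthesis notation concrete, the following is a minimal sketch of how such a string could be split into per-residue codes. The helper `split_residues` is hypothetical and not part of boltz_data; the library's own parser may treat edge cases differently.

```python
import re

# Hypothetical helper, not part of boltz_data: splits a sequence string into
# residue codes, treating "(XXX)" as a single non-standard residue.
_TOKEN = re.compile(r"\(([A-Za-z0-9]+)\)|([A-Za-z])")


def split_residues(sequence: str) -> list[str]:
    residues = []
    pos = 0
    while pos < len(sequence):
        match = _TOKEN.match(sequence, pos)
        if match is None:
            raise ValueError(f"Unexpected character at offset {pos}: {sequence[pos]!r}")
        # Group 1 holds a parenthesised code, group 2 a one-letter code
        residues.append(match.group(1) or match.group(2))
        pos = match.end()
    return residues


residues = split_residues("MKF(MSE)KF")
print(residues)                    # ['M', 'K', 'F', 'MSE', 'K', 'F']
print(residues.index("MSE") + 1)   # 4
```

Counting tokens rather than characters is what places the selenomethionine at position 4 even though its code starts at character offset 3 of the string.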
Protein with Custom Bonds#
from boltz_data.definition import ProteinDefinition, InternalBond
# Define a disulfide bond between the two cysteines
protein = ProteinDefinition(
    type="protein",
    sequence="CGGGC",
    bonds=[
        InternalBond(
            residue_index_1=0,  # First cysteine
            atom_name_1="SG",
            residue_index_2=4,  # Last cysteine
            atom_name_2="SG",
            bond_order=1
        )
    ]
)
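Before attaching a disulfide, it can help to confirm that both indices really point at cysteines. The sketch below uses 0-based indices, as in the example above, and assumes a sequence made only of one-letter codes. `check_disulfide` is a hypothetical helper, not a boltz_data function.

```python
# Hypothetical pre-flight check, independent of boltz_data: verifies that both
# ends of a proposed disulfide are cysteines, using 0-based residue indices.
# Assumes the sequence contains only one-letter codes (no parenthesised residues).
def check_disulfide(sequence: str, index_1: int, index_2: int) -> None:
    if index_1 == index_2:
        raise ValueError("A disulfide needs two distinct residues")
    for index in (index_1, index_2):
        if not 0 <= index < len(sequence):
            raise IndexError(f"Residue index {index} is outside the sequence")
        if sequence[index] != "C":
            raise ValueError(f"Residue {index} is {sequence[index]!r}, not cysteine")


check_disulfide("CGGGC", 0, 4)  # passes silently
```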
Creating Nucleic Acid Structures#
DNA#
from boltz_data.definition import DNADefinition
dna = DNADefinition(
    type="dna",
    sequence="ATCGATCG",
    description="Example DNA strand"
)
RNA#
from boltz_data.definition import RNADefinition
rna = RNADefinition(
    type="rna",
    sequence="AUCGAUCG",
    description="Example RNA strand"
)
Creating Ligand Structures#
From CCD#
from boltz_data.definition import LigandCCDDefinition
ligand = LigandCCDDefinition(
    type="ligand_ccd",
    comp_id="ATP",
    description="Adenosine triphosphate"
)
From SMILES#
from boltz_data.definition import LigandSMILESDefinition
ligand = LigandSMILESDefinition(
    type="ligand_smiles",
    smiles="CCO",
    description="Ethanol"
)
Creating Multi-Chain Structures#
Protein-Ligand Complex#
from boltz_data.definition import (
    StructureDefinition,
    ChainDefinition,
    ProteinDefinition,
    LigandCCDDefinition,
)
# Define entities
protein = ProteinDefinition(
    type="protein",
    sequence="MKFLKFSLLTAVLLSVVFAFSSCGDDDDTGYLPPSQAIQDLLKRM"
)
ligand = LigandCCDDefinition(
    type="ligand_ccd",
    comp_id="ATP"
)
# Create structure with two chains
structure = StructureDefinition(
    entities=[protein, ligand],
    chains={
        "A": ChainDefinition(entity_idx=0),  # Protein on chain A
        "B": ChainDefinition(entity_idx=1),  # Ligand on chain B
    }
)
Protein Dimer#
# Same entity used for both chains
structure = StructureDefinition(
    entities=[protein],  # Single entity
    chains={
        "A": ChainDefinition(entity_idx=0),  # First copy
        "B": ChainDefinition(entity_idx=0),  # Second copy (same entity)
    }
)
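Because chains refer to entities by their position in the entities list, a mistyped index is easy to miss. The following sketch shows one way to spot chains that point at a position that does not exist. It uses plain dicts in place of ChainDefinition objects, and `dangling_chains` is a hypothetical helper rather than a boltz_data API.

```python
# Hypothetical check using plain dicts in place of ChainDefinition objects:
# every chain must reference an entity position that exists in the list.
def dangling_chains(n_entities: int, chain_to_entity: dict[str, int]) -> list[str]:
    return sorted(
        chain_id
        for chain_id, entity_idx in chain_to_entity.items()
        if not 0 <= entity_idx < n_entities
    )


# Dimer: one entity, two chains that both point at position 0
print(dangling_chains(1, {"A": 0, "B": 0}))  # []
# Chain B points at a second entity that was never listed
print(dangling_chains(1, {"A": 0, "B": 1}))  # ['B']
```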
With Residue Numbers#
structure = StructureDefinition(
    entities=[protein],
    chains={
        "A": ChainDefinition(
            entity_idx=0,
            residue_numbers=list(range(1, 46))  # Custom numbering, one number per residue (45 here)
        ),
    }
)
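A quick length check helps keep the numbering in step with the sequence. The sketch below assumes that a chain needs exactly one residue number per residue and that the sequence uses only one-letter codes. `numbering_matches` is a hypothetical helper, not part of boltz_data.

```python
# Hypothetical consistency check, not a boltz_data API: assumes a chain needs
# exactly one residue number per residue of its entity, and that the sequence
# contains only one-letter codes.
def numbering_matches(sequence: str, residue_numbers: list[int]) -> bool:
    return len(residue_numbers) == len(sequence)


sequence = "MKFLKFSLLTAVLLSVVFAFSSCGDDDDTGYLPPSQAIQDLLKRM"
# Derive the numbering from the sequence instead of hard-coding the end point
residue_numbers = list(range(1, len(sequence) + 1))

print(len(sequence))                                  # 45
print(numbering_matches(sequence, residue_numbers))   # True
```

Deriving the range from `len(sequence)` avoids off-by-a-few errors when the sequence is later edited.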
Inter-Chain Bonds#
Connect different chains with InterChainBond:
from boltz_data.definition import InterChainBond
# protein1 and protein2 are ProteinDefinition objects created beforehand
structure = StructureDefinition(
    entities=[protein1, protein2],
    chains={
        "A": ChainDefinition(entity_idx=0),
        "B": ChainDefinition(entity_idx=1),
    },
    bonds=[
        InterChainBond(
            chain_id_1="A",
            residue_index_1=5,  # Residue ordinal in chain
            atom_name_1="SG",
            chain_id_2="B",
            residue_index_2=12,
            atom_name_2="SG",
            bond_order=1
        )
    ]
)
Converting to BZMol#
Convert a single entity definition, or a full structure definition, to a BZMol:
from boltz_data.mol import bzmol_from_definition
from boltz_data.ccd import chemical_component_dictionary_from_path
# Load CCD dictionary
ccd = chemical_component_dictionary_from_path("ccd.pkl.gz")
# Convert to BZMol
bzmol = bzmol_from_definition(
    protein,
    chemical_component_dictionary=ccd
)
# Or for full structures
from boltz_data.mol import bzmol_from_structure
bzmol = bzmol_from_structure(structure, ccd)
Converting from BZMol#
Extract structure definition from BZMol:
from boltz_data.mol import structure_from_bzmol
# Convert BZBioMol back to definition
structure = structure_from_bzmol(bzmol, chemical_component_dictionary=ccd)
# Now you can inspect or modify the definition
for entity in structure.entities:
    if entity.type == "protein":
        print(f"Protein sequence: {entity.sequence}")
Branched Polymers#
For glycans and other branched structures:
from boltz_data.definition import BranchedPolymerDefinition, InternalBond
glycan = BranchedPolymerDefinition(
    type="branched_polymer",
    comp_ids=["NAG", "NAG", "BMA", "MAN"],
    bonds=[
        InternalBond(
            residue_index_1=0,
            atom_name_1="O4",
            residue_index_2=1,
            atom_name_2="C1",
            bond_order=1
        ),
        InternalBond(
            residue_index_1=1,
            atom_name_1="O4",
            residue_index_2=2,
            atom_name_2="C1",
            bond_order=1
        ),
        # Branch point
        InternalBond(
            residue_index_1=2,
            atom_name_1="O6",
            residue_index_2=3,
            atom_name_2="C1",
            bond_order=1
        ),
    ]
)
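Since a branched polymer is held together only by its explicit bonds, one useful sanity check is that every residue is reachable from the first. The sketch below treats each bond as an undirected edge between residue indices. It works on plain index pairs rather than InternalBond objects, and `is_connected` is a hypothetical helper, not part of boltz_data.

```python
# Hypothetical sanity check, independent of boltz_data: treats each bond as an
# undirected edge between residue indices and confirms that every residue is
# reachable from residue 0.
def is_connected(n_residues: int, bonds: list[tuple[int, int]]) -> bool:
    if n_residues == 0:
        return True
    neighbours = {i: set() for i in range(n_residues)}
    for a, b in bonds:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen = {0}
    stack = [0]
    while stack:
        current = stack.pop()
        for nxt in neighbours[current] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return len(seen) == n_residues


comp_ids = ["NAG", "NAG", "BMA", "MAN"]
bonds = [(0, 1), (1, 2), (2, 3)]  # residue index pairs from the glycan above
print(is_connected(len(comp_ids), bonds))  # True
```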
Serialization#
Structure definitions are Pydantic models and can be serialized:
from boltz_data import fs
from boltz_data.definition import StructureDefinition
# Save as JSON
fs.write_json("structure.json", structure)
# Save as YAML
fs.write_yaml("structure.yaml", structure)
# Load back
structure = fs.read_object("structure.yaml", as_=StructureDefinition)
Use Cases#
1. Theoretical Structure Design#
Design a structure before generating coordinates:
# Design a fusion protein
fusion = ProteinDefinition(
    type="protein",
    sequence="MKFLK" + "GGS" + "AIQDK",  # Protein1 + linker + Protein2
)
# Generate coordinates later
from boltz_data.mol import bzmol_from_definition, generate_conformer
bzmol = bzmol_from_definition(fusion, chemical_component_dictionary=ccd)
bzmol = generate_conformer(bzmol, seed=42)
2. Structure Modification#
Modify sequences programmatically:
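from boltz_data.definition import ProteinDefinition, StructureDefinition
from boltz_data.mol import bzmol_from_structure, structure_from_bzmol
# Load structure
structure = structure_from_bzmol(bzmol, chemical_component_dictionary=ccd)
# Get entity (assumed here to be a protein)
protein_entity = structure.entities[0]
# Modify sequence: replace the residue at 0-based index 10 with alanine
mutated_sequence = protein_entity.sequence[:10] + "A" + protein_entity.sequence[11:]
mutated_entity = ProteinDefinition(
    type="protein",
    sequence=mutated_sequence
)
# Create new structure (reusing the chains assumes the original had a single entity)
new_structure = StructureDefinition(
    entities=[mutated_entity],
    chains=structure.chains
)
# Convert back to BZMol
new_bzmol = bzmol_from_structure(new_structure, ccd)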
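The slice-based edit above replaces the residue at 0-based index 10 and only works when every residue is a single character. The sketch below wraps that logic in a small helper with 1-based positions and explicit checks. `point_mutation` is a hypothetical helper, not part of boltz_data.

```python
# Hypothetical helper, not part of boltz_data: replaces one residue in a plain
# one-letter sequence. Positions are 1-based, so position 11 corresponds to
# the 0-based index 10 used in the slicing example.
def point_mutation(sequence: str, position: int, new_residue: str) -> str:
    if "(" in sequence:
        # String offsets no longer match residue positions in this case
        raise ValueError("Parenthesised residues need per-residue handling")
    if not 1 <= position <= len(sequence):
        raise IndexError(f"Position {position} is outside the sequence")
    index = position - 1
    return sequence[:index] + new_residue + sequence[index + 1:]


print(point_mutation("MKFLKFSLLTAVLL", 11, "G"))  # MKFLKFSLLTGVLL
```

Refusing sequences that contain parentheses is a deliberately conservative choice: with a residue such as (MSE) in the string, character offsets and residue positions diverge, and a plain slice would corrupt the sequence.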
3. Complex Assembly#
Build multi-component complexes:
# Define all components
protein = ProteinDefinition(type="protein", sequence="MKFLKF...")
dna = DNADefinition(type="dna", sequence="ATCGATCG")
ligand = LigandCCDDefinition(type="ligand_ccd", comp_id="ATP")
# Assemble into structure
complex_structure = StructureDefinition(
    entities=[protein, dna, ligand],
    chains={
        "A": ChainDefinition(entity_idx=0),  # Protein
        "B": ChainDefinition(entity_idx=1),  # DNA
        "C": ChainDefinition(entity_idx=2),  # Ligand
    }
)
# Generate 3D structure
bzmol = bzmol_from_structure(complex_structure, ccd)
API Reference#
For detailed API documentation, see: