boltz_data.sequence
Code for parsing and handling biological sequences.
Functions
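| Function | Description |
| --- | --- |
| cluster_sequences | Cluster sequences using MMseqs2. |
| residue_names_from_sequence | Convert a sequence string to a list of residue names. |
| sequence_from_residue_names | Convert a list of residue names to a sequence string. |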
- boltz_data.sequence.cluster_sequences(*, sequences, min_seq_id=0.4, polymer_type=None)
Cluster sequences using MMseqs2.
- Parameters:
  - sequences (Collection[str]) – Sequences to cluster.
  - min_seq_id (float) – Minimum sequence identity, as a fraction between 0 and 1, for two sequences to be placed in the same cluster. Defaults to 0.4.
  - polymer_type (Optional[Literal['protein', 'rna', 'dna']]) – Type of the sequences: "protein", "dna", or "rna". If None (the default), the type is inferred from the sequences.
- Return type:
- Returns:
List of cluster IDs, in the same order as the input sequences.
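To make the return format concrete (one cluster ID per input sequence, in input order), here is a minimal pure-Python sketch. It does not call MMseqs2: the identity measure (longest-common-subsequence length divided by the shorter sequence's length) and the greedy, longest-first assignment are simplifying assumptions chosen for illustration, and the names greedy_cluster, approx_identity and lcs_length are hypothetical, not part of boltz_data.

```python
from collections.abc import Collection


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def approx_identity(a: str, b: str) -> float:
    """Crude identity (an assumption, not MMseqs2's definition):
    LCS length divided by the length of the shorter sequence."""
    shorter = min(len(a), len(b))
    return lcs_length(a, b) / shorter if shorter else 0.0


def greedy_cluster(*, sequences: Collection[str], min_seq_id: float = 0.4) -> list[int]:
    """Hypothetical stand-in for cluster_sequences: returns one cluster ID per
    input sequence, in input order. Does not call MMseqs2."""
    seqs = list(sequences)
    # Visit longer sequences first so they become cluster representatives.
    order = sorted(range(len(seqs)), key=lambda i: len(seqs[i]), reverse=True)
    representatives: list[int] = []  # input index of each cluster's representative
    cluster_ids = [-1] * len(seqs)
    for i in order:
        for cluster_id, rep in enumerate(representatives):
            if approx_identity(seqs[i], seqs[rep]) >= min_seq_id:
                cluster_ids[i] = cluster_id  # join the first matching cluster
                break
        else:
            # No representative is similar enough: start a new cluster.
            cluster_ids[i] = len(representatives)
            representatives.append(i)
    return cluster_ids
```

Like the documented function, the sketch takes keyword-only arguments. For example, greedy_cluster(sequences=["ACDEFGHIK", "ACDEFGHIR", "WWWWWWWW"], min_seq_id=0.8) returns [0, 0, 1]: the first two sequences share 8 of 9 residues and land in the same cluster. Each sequence is compared against every representative with a quadratic-time LCS, so the sketch only suits small inputs.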
- boltz_data.sequence.residue_names_from_sequence(sequence, /, *, polymer_type)
Convert a sequence string to a list of residue names.
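As an illustration only, the sketch below shows one way such a conversion could work. It assumes PDB Chemical Component Dictionary names (three-letter codes such as ALA for protein, DA/DC/DG/DT for DNA, A/C/G/U for RNA) and treats a parenthesized token such as (MSE) as an explicit residue name. Both conventions and the function name residue_names_from_sequence_sketch are assumptions, not confirmed behavior of boltz_data.

```python
import re
from typing import Literal

PolymerType = Literal["protein", "dna", "rna"]

# Assumed one-letter code -> PDB Chemical Component Dictionary name.
PROTEIN_NAMES = dict(zip(
    "ARNDCQEGHILKMFPSTWYV",
    "ALA ARG ASN ASP CYS GLN GLU GLY HIS ILE LEU LYS MET PHE PRO SER THR TRP TYR VAL".split(),
))
DNA_NAMES = {"A": "DA", "C": "DC", "G": "DG", "T": "DT"}
RNA_NAMES = {"A": "A", "C": "C", "G": "G", "U": "U"}
NAME_MAPS = {"protein": PROTEIN_NAMES, "dna": DNA_NAMES, "rna": RNA_NAMES}

# A token is either "(NAME)", taken as an explicit residue name (assumption),
# or a single one-letter code.
TOKEN = re.compile(r"\(([^()]+)\)|(.)")


def residue_names_from_sequence_sketch(sequence: str, /, *, polymer_type: PolymerType) -> list[str]:
    """Hypothetical sketch of residue_names_from_sequence; not boltz_data's code."""
    mapping = NAME_MAPS[polymer_type]
    names: list[str] = []
    for match in TOKEN.finditer(sequence):
        bracketed, letter = match.groups()
        if bracketed is not None:
            names.append(bracketed)  # explicit name, e.g. "(MSE)" -> "MSE"
        elif letter.upper() in mapping:
            names.append(mapping[letter.upper()])
        else:
            raise ValueError(f"unknown {polymer_type} code: {letter!r}")
    return names
```

In the documented signature, sequence is positional-only and polymer_type is keyword-only, so a call takes the form residue_names_from_sequence(seq, polymer_type="protein"); the sketch mirrors that shape.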
- boltz_data.sequence.sequence_from_residue_names(residue_names, /, *, polymer_type, nonstandard_handling)
Convert a list of residue names to a sequence string.
- Parameters:
  - polymer_type (Literal['protein', 'dna', 'rna']) – Type of polymer: "protein", "dna", or "rna".
  - nonstandard_handling (Literal['X', 'error', 'parentheses']) – How to handle non-standard residues. One of:
    - "X": replace non-standard residues with 'X'.
    - "error": raise an error if a non-standard residue is encountered.
    - "parentheses": wrap non-standard residues in parentheses.
- Return type:
- Returns:
A string representing the sequence.
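The three nonstandard_handling modes can be illustrated with a minimal sketch. It assumes PDB Chemical Component Dictionary residue names (e.g. ALA, DA, U) and treats any name outside the 20 standard amino acids or the 4 standard nucleotides as non-standard. The lookup tables and the function name sequence_from_residue_names_sketch are hypothetical, not taken from boltz_data.

```python
from collections.abc import Iterable
from typing import Literal

PolymerType = Literal["protein", "dna", "rna"]
Handling = Literal["X", "error", "parentheses"]

# Assumed PDB Chemical Component Dictionary name -> one-letter code.
PROTEIN_LETTERS = dict(zip(
    "ALA ARG ASN ASP CYS GLN GLU GLY HIS ILE LEU LYS MET PHE PRO SER THR TRP TYR VAL".split(),
    "ARNDCQEGHILKMFPSTWYV",
))
DNA_LETTERS = {"DA": "A", "DC": "C", "DG": "G", "DT": "T"}
RNA_LETTERS = {"A": "A", "C": "C", "G": "G", "U": "U"}
LETTER_MAPS = {"protein": PROTEIN_LETTERS, "dna": DNA_LETTERS, "rna": RNA_LETTERS}


def sequence_from_residue_names_sketch(
    residue_names: Iterable[str],
    /,
    *,
    polymer_type: PolymerType,
    nonstandard_handling: Handling,
) -> str:
    """Hypothetical sketch of sequence_from_residue_names; not boltz_data's code."""
    if nonstandard_handling not in ("X", "error", "parentheses"):
        raise ValueError(f"invalid nonstandard_handling: {nonstandard_handling!r}")
    mapping = LETTER_MAPS[polymer_type]
    parts: list[str] = []
    for name in residue_names:
        if name in mapping:
            parts.append(mapping[name])  # standard residue: one-letter code
        elif nonstandard_handling == "X":
            parts.append("X")
        elif nonstandard_handling == "parentheses":
            parts.append(f"({name})")
        else:  # "error"
            raise ValueError(f"non-standard {polymer_type} residue: {name!r}")
    return "".join(parts)
```

With residue names ["ALA", "MSE", "GLY"], the "X" mode yields "AXG", "parentheses" yields "A(MSE)G", and "error" raises. The sketch raises ValueError in that case; the exception type raised by the real function is not documented here.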