Troubleshooting Guide¶
Common issues and their solutions when using GenAIRR.
Common Errors¶
1. "AttributeError" or "TypeError" when creating pipelines¶
Most common cause
This almost always means you are using old API syntax. See the Migration Guide for a full before/after comparison.
Problem: Using the old API or wrong constructor syntax.
Solution: Pass config to the Pipeline constructor and use keyword arguments for steps:
from GenAIRR import Pipeline, steps, HUMAN_IGH_OGRDB, S5F

pipeline = Pipeline(
    config=HUMAN_IGH_OGRDB,
    steps=[
        steps.SimulateSequence(S5F(min_mutation_rate=0.003, max_mutation_rate=0.25), productive=True),
        # ... more steps
    ]
)
2. "No functional sequences generated"¶
Problem: All generated sequences have stop codons or are out-of-frame.
Solutions:
- Set productive=True in SimulateSequence
- Lower your mutation rates
- Check that your data configuration contains functional alleles
# Use productive sequences
steps.SimulateSequence(S5F(min_mutation_rate=0.003, max_mutation_rate=0.25), productive=True)
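To see why a particular sequence is non-functional, it can help to locate its stop codons by hand. The sketch below is a rough stand-alone check and not GenAIRR's own productivity logic: the helper name `stop_codon_positions` is hypothetical, and reading from offset 0 is an assumption that may not match the true V(D)J reading frame of your sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_codon_positions(sequence, frame=0):
    """Return 0-based start positions of stop codons read in the given frame.

    Hypothetical helper: `frame` is the offset of the first codon and is an
    assumption, not something derived from the sequence annotations.
    """
    seq = sequence.upper()
    positions = []
    # Walk the sequence codon by codon, ignoring a trailing partial codon.
    for start in range(frame, len(seq) - 2, 3):
        if seq[start:start + 3] in STOP_CODONS:
            positions.append(start)
    return positions
```

Calling `stop_codon_positions(container.sequence)` on a generated sequence returns an empty list when no stop codon is found in that frame.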
3. "Empty mutations dictionary"¶
Problem: Using Uniform(0, 0) or very low mutation rates.
Solution: Increase mutation rates or use S5F for realistic mutations:
# Instead of this (no mutations)
steps.SimulateSequence(Uniform(min_mutation_rate=0, max_mutation_rate=0), productive=True)
# Use this for naive sequences with potential for mutations
steps.SimulateSequence(S5F(min_mutation_rate=0.001, max_mutation_rate=0.01), productive=True)
4. "Allele not found in data config"¶
Problem: Trying to use specific alleles that don't exist in your DataConfig.
Solution: Check available alleles first:
# List available V allele families
print("V allele families:", list(HUMAN_IGH_OGRDB.v_alleles.keys())[:5])
# Access specific allele properly (using family name)
v_allele = HUMAN_IGH_OGRDB.v_alleles['IGHVF1-G1'][0] # First allele in family
d_allele = HUMAN_IGH_OGRDB.d_alleles['IGHD1-1'][0]
j_allele = HUMAN_IGH_OGRDB.j_alleles['IGHJ1'][0]
# View all alleles in a family
for allele in HUMAN_IGH_OGRDB.v_alleles['IGHVF1-G1']:
    print(allele.name)
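When you know an allele name but not its family, a small lookup helper can search every family and suggest near matches on failure. This is a minimal sketch that assumes only what the snippet above shows: a dictionary mapping family names to lists of objects with a `name` attribute. The function `find_allele` is hypothetical and not part of GenAIRR.

```python
from difflib import get_close_matches


def find_allele(alleles_by_family, allele_name):
    """Return (family, allele) for an exact name match.

    Raises KeyError listing the closest known names when nothing matches.
    """
    known = {}
    for family, alleles in alleles_by_family.items():
        for allele in alleles:
            known[allele.name] = (family, allele)
    if allele_name in known:
        return known[allele_name]
    suggestions = get_close_matches(allele_name, list(known), n=3)
    raise KeyError(f"{allele_name!r} not found; closest names: {suggestions}")
```

For example, `find_allele(HUMAN_IGH_OGRDB.v_alleles, wanted_name)` either returns the family and allele object or fails with a list of similar names to check for typos.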
5. "Pipeline execution is very slow"¶
Potential causes and solutions:
- High mutation rates: Lower max_mutation_rate
- Complex pipelines: Remove unnecessary steps
- productive=True with high mutation rates: The library keeps retrying until it finds a functional sequence
# Faster execution
steps.SimulateSequence(S5F(min_mutation_rate=0.003, max_mutation_rate=0.05), productive=True)
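Before changing settings, it is worth measuring how long one execution takes so you can compare configurations. The helper below is a generic sketch using only the standard library; `time_calls` is a hypothetical name, and it works with any zero-argument callable.

```python
import time


def time_calls(func, repeats=20):
    """Call func() `repeats` times and return the mean seconds per call."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    start = time.perf_counter()
    for _ in range(repeats):
        func()
    return (time.perf_counter() - start) / repeats
```

For instance, `time_calls(pipeline.execute)` gives a mean time per sequence that you can compare before and after lowering max_mutation_rate or removing steps.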
Performance Optimization¶
Memory Usage¶
- Generate sequences in batches rather than all at once
- Use get_dict() only when needed (it creates copies)
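One way to generate in batches is a generator that yields fixed-size lists, so only one batch is held in memory at a time. This is a sketch, not a GenAIRR feature: `generate_in_batches` and its parameters are hypothetical, and `make_record` stands for any zero-argument callable that produces one result.

```python
def generate_in_batches(make_record, total, batch_size=1000):
    """Yield lists of at most `batch_size` records until `total` are produced."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    produced = 0
    while produced < total:
        size = min(batch_size, total - produced)
        # Build one batch, hand it to the caller, then let it be discarded.
        yield [make_record() for _ in range(size)]
        produced += size
```

With a pipeline, this could be driven as `for batch in generate_in_batches(pipeline.execute, total=10_000): ...`, writing each batch to disk before requesting the next.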
Speed Tips¶
Quick wins for faster generation
- Use built-in data configs (they're pre-optimized and lazily loaded)
- Avoid very high mutation rates (>0.3) unless necessary
- Use EnforceSequenceLength to cap sequence length
- Set productive=False if you don't need the productivity guarantee
Debugging Tips¶
Check Your Pipeline¶
# Print pipeline structure
for i, step in enumerate(pipeline.steps):
    print(f"Step {i}: {type(step).__name__}")
Inspect Container State¶
# After execution
container = pipeline.execute()
print("Sequence length:", len(container.sequence))
print("Mutations count:", len(container.mutations))
print("Is productive:", container.productive)
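Inspecting a single container can be misleading, so a summary over several runs is often more informative. The sketch below relies only on the `mutations` and `productive` attributes used above; the function name `summarize` is hypothetical.

```python
def summarize(containers):
    """Return count, fraction productive, and mean mutation count."""
    containers = list(containers)
    count = len(containers)
    if count == 0:
        return {"count": 0, "fraction_productive": None, "mean_mutations": None}
    productive = sum(1 for c in containers if c.productive)
    mutations = sum(len(c.mutations) for c in containers)
    return {
        "count": count,
        "fraction_productive": productive / count,
        "mean_mutations": mutations / count,
    }
```

For example, `summarize(pipeline.execute() for _ in range(100))` shows at a glance whether mutation counts and productivity match what your settings should produce.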
Validate Data Config¶
# Check if data config is loaded
print("V alleles count:", len(HUMAN_IGH_OGRDB.v_alleles))
print("Chain type:", HUMAN_IGH_OGRDB.metadata.chain_type)
Getting Help¶
- Check this troubleshooting guide first
- Review the parameter reference for correct usage
- Look at the examples in documentation
- Check if your issue is covered in existing GitHub issues
- Create a minimal reproducible example when reporting bugs
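A bug report is easier to act on when it includes version details alongside the minimal example. The snippet below is a sketch using only the standard library; `environment_report` is a hypothetical helper, and the distribution name "GenAIRR" is an assumption about how the package is installed in your environment.

```python
import platform
import sys
from importlib import metadata


def environment_report(package="GenAIRR"):
    """Return a short text block with Python, platform, and package versions."""
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        # The distribution name is an assumption; adjust it if yours differs.
        version = "not installed"
    return "\n".join([
        f"Python: {sys.version.split()[0]}",
        f"Platform: {platform.platform()}",
        f"{package}: {version}",
    ])
```

Paste the output of `print(environment_report())` at the top of the issue, followed by the shortest script that reproduces the problem.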