EbolaSeq is a command-line tool that simplifies the process of analyzing Ebola virus sequences. It automates the complete workflow from downloading sequences to creating phylogenetic trees. The tool retrieves Ebola virus sequences from NCBI GenBank, processes them according to user specifications, performs multiple sequence alignment, and generates phylogenetic trees. It also includes in silico screening of GP epitope changes relevant to mAb114 (Ebanga / ansuvimab), REGN-EB3 (Inmazeb), and MBP134 (ADI-15878 + ADI-23774) via the optional mAb escape report.
First, install conda if you haven’t already:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
Then, ensure you have the required channels:
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
Install EbolaSeq via Conda:
conda create -n ebolaseq -c conda-forge -c bioconda ebolaseq -y
conda activate ebolaseq
Alternatively, install from source:
conda create -n ebolaseq -c conda-forge -c bioconda python=3.9 pip mafft trimal iqtree=2.4.0 biopython minimap2 pal2nal
conda activate ebolaseq
git clone https://github.com/DaanJansen94/ebolaseq.git
cd ebolaseq
pip install .
# After `git pull`: overwrite old install (ensure correct version):
ebolaseq --upgrade
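To confirm that an installation is complete, the following sketch lists any of the expected command-line tools that cannot be found on PATH. It is a hypothetical helper, not part of EbolaSeq. The executable names iqtree2 and pal2nal.pl are assumptions about what the conda packages install, so adjust them if your environment differs.

```shell
# Hypothetical helper (not part of EbolaSeq): print each tool name that is
# not found on PATH, one per line. Empty output means everything was found.
# Assumption: iqtree2 and pal2nal.pl are the installed executable names.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Example, inside the activated environment:
# missing_tools ebolaseq mafft trimal iqtree2 minimap2 pal2nal.pl
```

If anything is printed, reinstall that package into the ebolaseq environment before running the pipeline.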
conda activate ebolaseq
ebolaseq -o OUTPUT_DIR [options]
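EbolaSeq can be run in two modes: interactive (give only -o and answer the prompts) or non-interactive (pass every choice as a command-line option, as listed below).
Required (non-interactive mode)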
-o, --output-dir — Output directory for results
--virus — Virus / species
--genome — Genome completeness
--completeness — Required when --genome=2
--host — Host filter
--metadata — Metadata filter
Optional
--beast — Required when --metadata is 2 or 3
--c_z, --c_s, --c_r, --c_b, --c_t — Consensus FASTA per species: path to a FASTA file (--c_z = Zaire, --c_s = Sudan, --c_r = Reston, --c_b = Bundibugyo, --c_t = Tai Forest)
--alignment, -a — Alignment type
--proteins, -pr — For --alignment 2 only: comma-separated list of proteins to align (e.g. L,NP)
--phylogeny, -p — Create phylogenetic tree from alignment
-m, --min-cds-fraction — For --alignment 2: minimum length a sequence must reach, as a fraction of the reference CDS length, to be kept (default 0.5). For example, 0.2 keeps more partial sequences, while 0.8 is stricter.
-t, --threads — Threads for minimap2 and MAFFT (default 1). E.g. -t 64 on a 64-core node. 0 = use all CPUs.
--mab-escape-report — Creates the GP mAb escape report in Escape/ for Ebanga, Inmazeb, and MBP134 (see docs).
--only — Consensus-only mode: skip downloading and run using only --c_z/--c_s/--c_r/--c_b/--c_t FASTA(s).
--remove — Path to file listing sequence IDs/headers to exclude
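When several consensus files are available, the sketch below assembles an --only command from whichever ones are present, after a basic check that each looks like FASTA (non-empty, first character '>'). It is a hypothetical helper, not part of EbolaSeq. It assumes the consensus_<species>.fasta naming used in the examples; the reston and taiforest file names are guesses. It only prints the command (a dry run), and paths containing spaces are not quoted.

```shell
# Hypothetical helper (not part of EbolaSeq).
# True if the file is non-empty and its first character is '>'.
looks_like_fasta() {
  [ -s "$1" ] && [ "$(head -n 1 "$1" | cut -c1)" = ">" ]
}

# Usage: build_only_cmd OUTDIR CONSENSUS_DIR [extra ebolaseq options...]
# Prints an ebolaseq --only command; returns 1 if no usable file is found.
build_only_cmd() {
  outdir="$1"
  dir="$2"
  shift 2
  cmd="ebolaseq -o $outdir --only"
  found=0
  # Map each --c_* flag suffix to an assumed file-name species key.
  for pair in z:zaire s:sudan r:reston b:bundibugyo t:taiforest; do
    flag=${pair%%:*}   # text before the colon, e.g. "z"
    name=${pair#*:}    # text after the colon, e.g. "zaire"
    file="$dir/consensus_$name.fasta"
    if [ -f "$file" ]; then
      if looks_like_fasta "$file"; then
        cmd="$cmd --c_$flag $file"
        found=1
      else
        echo "skipping $file: does not look like FASTA" >&2
      fi
    fi
  done
  if [ "$found" -eq 0 ]; then
    echo "no usable consensus_*.fasta files in $dir" >&2
    return 1
  fi
  if [ "$#" -gt 0 ]; then
    cmd="$cmd $*"      # extra options, e.g. --alignment 2 -pr GP
  fi
  echo "$cmd"
}
```

For example, `build_only_cmd my_analysis ./consensus --alignment 2 -pr GP --mab-escape-report` prints a command you can review before running it.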
# Interactive (prompts for all choices)
ebolaseq -o my_analysis
# Non-interactive: Zaire, complete genomes, human, location+date, whole-genome alignment + phylogeny
ebolaseq -o my_analysis --virus 1 --genome 1 --host 1 --metadata 3 --alignment 1 --phylogeny
# Pan-Ebola, protein alignment (L and NP), phylogeny per protein, consensus for Zaire and Sudan
ebolaseq -o my_analysis --virus 6 --genome 1 --host 3 --metadata 4 \
--c_z consensus_zaire.fasta --c_s consensus_sudan.fasta \
--alignment 2 -pr L,NP --phylogeny
# Consensus-only: no download, just analyze your own FASTA(s)
ebolaseq -o my_analysis --only --c_b consensus_bundibugyo.fasta --alignment 2 -pr GP --mab-escape-report
# Exclude specific sequences
ebolaseq -o my_analysis --virus 1 --genome 1 --host 1 --metadata 4 --remove exclude.txt
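One way to prepare the exclusion file is to collect the IDs of sequences whose FASTA header matches a keyword, as in the sketch below. It is a hypothetical helper, not part of EbolaSeq, and rests on two assumptions: the ID is the first whitespace-separated word of each header, and --remove accepts one ID per line. Headers do not always mention passage history, so review the resulting list by hand.

```shell
# Hypothetical helper (not part of EbolaSeq): print the IDs of sequences
# whose header matches PATTERN (case-insensitive), one per line.
# Assumption: the ID is the first word of the header, without the '>'.
ids_matching() {
  pattern="$1"
  fasta="$2"
  grep '^>' "$fasta" | grep -i -e "$pattern" | sed 's/^>//; s/[[:space:]].*//'
}

# Example (input file name is hypothetical):
# ids_matching 'cell culture' downloaded.fasta > exclude.txt
```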
location.txt.
FASTA/, MAFFT/, Trimmed/.
For protein alignment: pan/ (or the species name) with one subdirectory per protein (e.g. L/, NP/), each containing cds_aligned.fasta.
mAb escape report (Escape/): gp_mab_escape_report.html, mab_escape_data.xlsx, plus verification files; see docs.
Use --remove with a list of IDs to exclude cell-culture, lab-adapted, or other non-natural sequences.
If you use EbolaSeq in your research, please cite:
Jansen, D., & Vercauteren, K. (2025). EbolaSeq: A Command-Line Tool for Downloading, Processing, and Analyzing Ebola Virus Sequences for Phylogenetic Analysis (v0.2.2). Zenodo. https://doi.org/10.5281/zenodo.14851686
This project is licensed under the GNU General Public License v3.0 (GPL-3.0) - see the LICENSE file for details.
Contributions are welcome! Please feel free to submit a Pull Request.
If you encounter any problems or have questions, please open an issue on GitHub.