* Unzip these metagenomic reads into a new folder, which we will call :code:`reads`. Assuming you have downloaded and moved the above reads into your home directory, run
Loading reads from /Users/ssol0002/Documents/Projects/pam/test/data/reads/GCF_000005845.2_ASM584v2_genomic.fna.gz_2.fa, /Users/ssol0002/Documents/Projects/pam/test/data/reads/GCF_000005845.2_ASM584v2_genomic.fna.gz_1.fa...
Could not import ete3 plotting modules! Error raised:
Traceback (most recent call last):
File "/Users/ssol0002/Documents/Projects/pam/src/expam/tree/tree.py", line 622, in draw_tree
import ete3.coretype.tree
ModuleNotFoundError: No module named 'ete3'
Skipping plotting...
Could not import ete3 plotting modules! Error raised:
Traceback (most recent call last):
File "/Users/ssol0002/Documents/Projects/pam/src/expam/tree/tree.py", line 622, in draw_tree
import ete3.coretype.tree
ModuleNotFoundError: No module named 'ete3'
Skipping plotting...
.. note::
**expam** tried to plot the results on a phylotree but, because the ete3 module was not installed,
it skipped plotting. This is expected behaviour: the message tells you that **expam** could not
produce a graphical summary of your results.
* The phylogenetic classifications will be located at :code:`~/my_run/phy`, and will contain four files:
* :code:`~/my_run/raw` - raw read-wise classifications. There will be a single raw read-wise output file, :code:`~/my_run/raw/GCF_000005845.2_ASM584v2_genomic.gz_1.csv`.
.. code-block::
C R4825323246286034638 p2 302 p2:240
C R4280015672552393909 p10 302 p10:240
C R5925738157954038177 p10 302 p1:5 p10:16 p2:198 p10:16 p1:5
C R3237657389899545456 p10 302 p2:85 p10:31 p2:8 p10:31 p2:85
C R8975058804953044791 p10 302 p10:21 p2:59 p10:80 p2:59 p10:21
C R6052336354009855322 p10 302 p2:53 p10:31 p2:72 p10:31 p2:53
The sample summary file is a tab-separated document where the first element of each row is a phylogenetic node/clade, and the remaining values contain details of the raw and cumulative classifications and splits at that node.
The classified summary file is a tab-separated matrix where each row is a phylogenetic clade, each column is an input sample, and each cell holds the raw classified count at that clade. The split summary file is the analogous matrix of raw split counts at each clade. The two files always have the same row and column indices, in the same order.
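Because the two matrices share row and column indices, they can be combined cell by cell. The sketch below is only an illustration: the sample names, clade names, and counts are made up, and it assumes each file starts with a header row of sample names and holds integer counts, which the description above does not state.

```python
import csv
import io

# Made-up stand-ins for the classified and split summary files.
# Assumption: a header row of sample names, then one row per clade.
classified_text = "\tsample_a\tsample_b\np1\t5\t0\np2\t12\t3\n"
split_text = "\tsample_a\tsample_b\np1\t1\t0\np2\t0\t2\n"


def read_matrix(text):
    """Return (column names, row names, rows of integer counts) from a TSV matrix."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    columns = rows[0][1:]
    index = [row[0] for row in rows[1:]]
    values = [[int(cell) for cell in row[1:]] for row in rows[1:]]
    return columns, index, values


c_cols, c_index, c_values = read_matrix(classified_text)
s_cols, s_index, s_values = read_matrix(split_text)

# Same samples and clades, in the same order, so cells can be paired by position.
assert c_cols == s_cols and c_index == s_index

# Classified plus split count for every (clade, sample) cell.
totals = [
    [c + s for c, s in zip(c_row, s_row)]
    for c_row, s_row in zip(c_values, s_values)
]
for clade, row in zip(c_index, totals):
    print(clade, row)
```

With real output, the two strings would be replaced by the contents of the summary files in the results directory.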
The raw read-wise output is a sub-directory containing one kraken-formatted output file for each input sample, with one line per read.
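As a rough sketch of how these read-wise lines could be consumed, the snippet below splits each line into its fields. The field names are hypothetical labels chosen for this example, and reading the trailing :code:`clade:count` items as per-clade hit counts is an assumption based on the Kraken output convention, not something the sample output confirms.

```python
from typing import List, NamedTuple, Tuple


class ReadResult(NamedTuple):
    status: str                    # first column, "C" in the sample output
    read_id: str                   # read header
    assignment: str                # assigned clade
    length: int                    # reported read length
    hits: List[Tuple[str, int]]    # trailing clade:count items, in file order


def parse_read_line(line: str) -> ReadResult:
    """Split one whitespace-separated read-wise line into named fields."""
    fields = line.split()
    status, read_id, assignment, length = fields[:4]
    hits = []
    for item in fields[4:]:
        clade, _, count = item.rpartition(":")
        hits.append((clade, int(count)))
    return ReadResult(status, read_id, assignment, int(length), hits)


# Two lines copied from the sample output above.
sample = [
    "C R4825323246286034638 p2 302 p2:240",
    "C R5925738157954038177 p10 302 p1:5 p10:16 p2:198 p10:16 p1:5",
]
results = [parse_read_line(line) for line in sample]
for result in results:
    total = sum(count for _, count in result.hits)
    print(result.read_id, result.assignment, result.length, total)
```

Splitting on any whitespace keeps the sketch working whether the file uses tabs or spaces.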
A more comprehensive overview is given in :doc:`this tutorial <tutorials/classify>`.
Convert to taxonomy
-------------------
* First run :code:`expam download_taxonomy` to download the taxonomy for all sequences in the database. This requires an internet connection.
.. code-block:: console
$ expam download_taxonomy -db ~/test
Posting 6 UIDs to NCBI Entrez nuccore.
Received 6 response(s) for ESummary TaxID request!
Posting 6 UIDs to NCBI Entrez taxonomy.
Received 6 response(s) for EFetch Taxon request!
Taxonomic lineages written to ~/test/phylogeny/taxid_lineage.csv!
Taxonomic ranks written to ~/test/phylogeny/taxa_rank.csv!
* We saved our previous classification results to :code:`~/my_run`. This is the directory we pass to :code:`expam to_taxonomy` to convert phylogenetic classifications to taxonomy.
.. code-block:: console
$ expam to_taxonomy -db test --out ~/my_run
* Initialising node pool...
* Checking for polytomies...
Polytomy (degree=3) detected! Resolving...
* Finalising index...
* There will now be taxonomic output files located in :code:`~/my_run/tax/`, analogous to each of the files present in the phylogenetic output, with the exception of :code:`classified.tsv` and :code:`split.tsv` - only the sample summaries and raw read-wise output are converted.
* :code:`~/my_run/tax/raw/GCF_000005845.2_ASM584v2_genomic.gz_1.csv` - taxonomic read-wise output. The second column is the read header, the third is the assigned taxid, and the fourth is the length of the read. For paired-end reads this is the combined length of both mates, which is 302 in the output below.
.. code-block::
C R4825323246286034638 2 302
C R4280015672552393909 511145 302
C R5925738157954038177 511145 302
C R3237657389899545456 511145 302
C R6111671585932593081 511145 302
C R4574482278193488645 511145 302
C R8975058804953044791 511145 302
C R6052336354009855322 511145 302
C R4978825024774141837 2 302
C R7016203356160788326 511145 302
A comprehensive overview is given in :doc:`this tutorial <tutorials/classify>`.
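One simple way to summarise the taxonomic read-wise output is to count reads per assigned taxid. The sketch below does this for the ten lines shown above, taking the taxid from the third column as described. It is an illustration only and is not part of **expam**. In NCBI's taxonomy, taxid 2 is Bacteria and 511145 is *Escherichia coli* str. K-12 substr. MG1655.

```python
from collections import Counter

# The ten read-wise lines shown above, copied verbatim.
sample = """\
C R4825323246286034638 2 302
C R4280015672552393909 511145 302
C R5925738157954038177 511145 302
C R3237657389899545456 511145 302
C R6111671585932593081 511145 302
C R4574482278193488645 511145 302
C R8975058804953044791 511145 302
C R6052336354009855322 511145 302
C R4978825024774141837 2 302
C R7016203356160788326 511145 302
"""

counts = Counter()
for line in sample.splitlines():
    fields = line.split()
    if len(fields) < 4:
        continue  # skip blank or truncated lines
    counts[fields[2]] += 1  # third column is the assigned taxid

for taxid, n_reads in counts.most_common():
    print(taxid, n_reads)
```

For a real run, the string would be replaced by the lines of the file under :code:`~/my_run/tax/raw/`.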
@@ -90,6 +90,32 @@ Build a tree for the reference database
sequences --> 6
Build the database
------------------
* Now that we have a distance-tree for our added reference sequences, we can build the database.
.. code-block:: console
$ expam build -db test
Clearing old log files...
Importing phylogeny...
* Initialising node pool...
* Checking for polytomies...
Polytomy (degree=3) detected! Resolving...
* Finalising index...
Creating LCA matrix...
Extracting sequences from /Users/ssol0002/Documents/Projects/pam/test/data/sequences/GCF_000008725.1_ASM872v1_genomic.fna.gz...
Extracting sequences from /Users/ssol0002/Documents/Projects/pam/test/data/sequences/GCF_000007765.2_ASM776v2_genomic.fna.gz...
Extracting sequences from /Users/ssol0002/Documents/Projects/pam/test/data/sequences/GCF_000005845.2_ASM584v2_genomic.fna.gz...
Extracting sequences from /Users/ssol0002/Documents/Projects/pam/test/data/sequences/GCF_000006925.2_ASM692v2_genomic.fna.gz...
Extracting sequences from /Users/ssol0002/Documents/Projects/pam/test/data/sequences/GCF_000006945.2_ASM694v2_genomic.fna.gz...
Extracting sequences from /Users/ssol0002/Documents/Projects/pam/test/data/sequences/GCF_000006765.1_ASM676v1_genomic.fna.gz...
expam: 42.359643852s
PID - 65856 dying...
Running classifications
-----------------------
...
...
@@ -101,8 +127,6 @@ Running classifications
* These are paired reads, but for now we'll treat them as separate.
* We supply the :code:`-o` or :code:`--out` flag to tell *expam* where to save classification results.
* *expam* automatically creates a :code:`results` subdirectory in the database directory. This is a convenient place to keep classification results related to this database, but you are not required to use it.