Commit 74bc8cd3 authored by Sean Solari

Bug fixes for new factoring

parent 0e01e20b
......@@ -165,7 +165,7 @@ Run metagenomic reads against a succesfully built database. See :doc:`Tutorial 2
.. code-block:: console
$ expam run -db DB_NAME [args...]
$ expam classify -db DB_NAME [args...]
.. option:: -d <file path>, --directory <file path>
......@@ -226,7 +226,7 @@ Run metagenomic reads against a succesfully built database. See :doc:`Tutorial 2
.. code-block:: console
$ expam run ... --group #FF0000 sample_one sample_two
$ expam classify ... --group #FF0000 sample_one sample_two
.. option:: --alpha <float>
......@@ -260,7 +260,7 @@ Example
.. code-block:: console
$ expam run -db DB_NAME -d /path/to/paired/reads --paired --out ~/paired_reads_analysis --taxonomy
$ expam classify -db DB_NAME -d /path/to/paired/reads --paired --out ~/paired_reads_analysis --taxonomy
.. _download taxonomy:
......@@ -305,7 +305,7 @@ Translate phylogenetic classification output to NCBI taxonomy.
Plotting results on phylotree
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Results are automatically visualised on top of a phylogenetic tree when during the :code:`expam run` command,
Results are automatically visualised on top of a phylogenetic tree when during the :code:`expam classify` command,
but can also be done after classification using the :code:`phylotree` command.
.. code-block::
......@@ -393,7 +393,7 @@ Example
.. note::
The :code:`expam_limit` context works the same for any command. :code:`expam build`
can be replaced with :code:`expam run`, or any other command.
can be replaced with :code:`expam classify`, or any other command.
The following is an example of the (tab-separated) log file output:
......
......@@ -3,27 +3,27 @@
A programmatic API to interact with phylogenetic trees, particularly those used in reference databases.
expam.tree.Location
-------------------
expam.tree.location.Location
----------------------------
.. autoclass:: expam.tree.Location
.. autoclass:: expam.tree.location.Location
.. autofunction:: expam.tree.Location.__init__
.. autofunction:: expam.tree.location.Location.__init__
expam.tree.Index
----------------
expam.tree.tree.Index
---------------------
.. autoclass:: expam.tree.Index
.. autoclass:: expam.tree.tree.Index
.. autofunction:: expam.tree.Index.load_newick
.. autofunction:: expam.tree.tree.Index.load_newick
.. autofunction:: expam.tree.Index.from_newick
.. autofunction:: expam.tree.tree.Index.from_newick
Example loading an Index object from a Newick string.
.. code-block:: python
>>> from expam.tree import Index
>>> from expam.tree.tree import Index
>>> tree_string = "(B:6.0,(A:5.0,C:3.0,E:4.0):5.0,D:11.0);"
>>> leaves, index = Index.from_newick(tree_string)
* Initialising node pool...
......@@ -42,13 +42,13 @@ expam.tree.Index
>>> index['A'].coordinate
[0, 0, 1, 0]
.. autofunction:: expam.tree.Index.resolve_polytomies
.. autofunction:: expam.tree.tree.Index.resolve_polytomies
.. autofunction:: expam.tree.Index.coord
.. autofunction:: expam.tree.tree.Index.coord
.. autofunction:: expam.tree.Index.to_newick
.. autofunction:: expam.tree.tree.Index.to_newick
.. autofunction:: expam.tree.Index.yield_child_nodes
.. autofunction:: expam.tree.tree.Index.yield_child_nodes
.. code-block:: python
......@@ -70,9 +70,9 @@ expam.tree.Index
Internal node (branch) names can start with 'p', but this may also be neglected.
.. autofunction:: expam.tree.Index.yield_leaves
.. autofunction:: expam.tree.tree.Index.yield_leaves
.. autofunction:: expam.tree.Index.get_child_nodes
.. autofunction:: expam.tree.tree.Index.get_child_nodes
.. code-block:: python
......@@ -81,4 +81,4 @@ expam.tree.Index
>>> index.get_child_nodes('E')
['E']
.. autofunction:: expam.tree.Index.get_child_leaves
.. autofunction:: expam.tree.tree.Index.get_child_leaves
......@@ -93,7 +93,7 @@ Phylogenetic classification results
.. code-block:: console
$ expam run -db my_database -d /path/to/sample_one.fq --out sample_one
$ expam classify -db my_database -d /path/to/sample_one.fq --out sample_one
* In :code:`./sample_one`, there will be a :code:`phy` subdirectory containing three files:
......@@ -199,11 +199,11 @@ Taxonomic results
.. code-block:: console
$ expam run -d /path/to/reads --out example --taxonomy
$ expam classify -d /path/to/reads --out example --taxonomy
.. code-block:: console
$ expam run -d /path/to/reads --out example_one
$ expam classify -d /path/to/reads --out example_one
$ expam to_taxonomy --out example_one
* Where before the results directory contained only a :code:`phy` subdirectory, it will now also contain a :code:`tax` folder.
......
Graphical output
================
* When a :code:`run` or :code:`to_taxonomy` command is executed, raw summary files are produced (as described in :doc:`Classification <classify>`) and a phylotree is also produced as a graphical depiction of the sample summary.
* When a :code:`classify` or :code:`to_taxonomy` command is executed, raw summary files are produced (as described in :doc:`Classification <classify>`) and a phylotree is also produced as a graphical depiction of the sample summary.
* This graphical representation has some customisable features:
* Multiple samples can be plotted on the same tree, with different colours for different samples.
......@@ -61,8 +61,8 @@ Example of grouping
.. code-block:: console
$ expam run ... --group a1 a2 a3 --group b1 b2 b3
$ expam run ... --group "#FF0000" a1 a2 a3 --group "#00FF00" b1 b2 b3
$ expam classify ... --group a1 a2 a3 --group b1 b2 b3
$ expam classify ... --group "#FF0000" a1 a2 a3 --group "#00FF00" b1 b2 b3
.. note::
......@@ -82,7 +82,7 @@ Example of grouping
.. code-block:: console
$ expam run ... --paired --group a1_f a2_f --group b1_f b2_f
$ expam classify ... --paired --group a1_f a2_f --group b1_f b2_f
Visual flags
^^^^^^^^^^^^
......@@ -110,7 +110,7 @@ Example of colour list
.. code-block:: console
$ expam run ... --colour_list "#FF0000" "#00FF00" "#0000FF"
$ expam classify ... --colour_list "#FF0000" "#00FF00" "#0000FF"
.. _itol integration:
......@@ -126,7 +126,7 @@ folder containing two files:
* :code:`tree.nwk` - Newick format tree that can be inserted into iTOL.
* :code:`style.txt` - An iTOL formatted text document that contains all the information needed for iTOL to style the tree.
For instance, say we previously ran :code:`expam run --out my_run -d /some/samples`, and
For instance, say we previously ran :code:`expam classify --out my_run -d /some/samples`, and
now run :code:`expam phylotree --out my_run --itol`, the corresponding files
would be located at
......
......@@ -97,7 +97,7 @@ Running classifications
* :code:`../expam/test/data/reads/`
* We use the :code:`run` command to classify reads.
* We use the :code:`classify` command to classify reads.
* These are paired reads, but for now we'll treat them as separate.
* We supply the :code:`-o` or :code:`--out` flag to tell *expam* where to save classification results.
......@@ -105,7 +105,7 @@ Running classifications
.. code-block:: console
$ expam run -db test -d /Users/seansolari/Documents/expam/test/data/reads/ --out test/results/unpaired_test
$ expam classify -db test -d /Users/seansolari/Documents/expam/test/data/reads/ --out test/results/unpaired_test
Clearing old log files...
Results directory created at /Users/seansolari/Documents/Databases/test/results/unpaired_test.
Loading the map and phylogeny.
......@@ -202,7 +202,7 @@ Running paired data
.. code-block:: console
$ expam run -db test -d /Users/seansolari/Documents/expam/test/data/reads/ --out test/results/paired_test --paired
$ expam classify -db test -d /Users/seansolari/Documents/expam/test/data/reads/ --out test/results/paired_test --paired
Clearing old log files...
Results directory created at /Users/seansolari/Documents/Databases/test/results/paired_test.
Loading the map and phylogeny.
......
......@@ -99,13 +99,10 @@ def run_classifier(
if taxonomy:
# Attempt to update taxon ids.
tax_obj: TaxonomyNCBI = TaxonomyNCBI(database_config)
tax_obj.accession_to_taxonomy(db_dir)
tax_obj.accession_to_taxonomy()
tax_results_path = os.path.join(out_dir, output_config.tax)
os.mkdir(output_config.tax)
name_to_lineage, taxon_to_rank = tax_obj.load_taxonomy_map(db_dir)
results.to_taxonomy(name_to_lineage, taxon_to_rank, tax_results_path)
name_to_lineage, taxon_to_rank = tax_obj.load_taxonomy_map()
results.to_taxonomy(name_to_lineage, taxon_to_rank)
results.draw_results(itol_mode=itol_mode)
finally:
......@@ -469,7 +466,7 @@ class ClassificationResults:
self.tax_id_hierarchy = {"1": set()} # Map from tax_id -> immediate children.
self.tax_id_pool = ["1"] # Children must appear later than parent this list.
def to_taxonomy(self, name_to_lineage, taxon_to_rank, tax_dir):
def to_taxonomy(self, name_to_lineage, taxon_to_rank):
col_names = ["c_perc", "c_cumul", "c_count", "s_perc", "s_cumul", "s_count", "rank", "scientific name"]
class_counts = pd.read_csv(self.results_config.phy_classified, sep="\t", index_col=0, header=0)
......@@ -544,7 +541,7 @@ class ClassificationResults:
cutoff = max(self.cutoff, (total_counts / 1e6) * self.cpm)
df = df[(df['c_cumul'] > cutoff) | (df['s_cumul'] > cutoff) | (df.index == 'unclassified')]
df.to_csv(os.path.join(tax_dir, sample_name + ".csv"), sep="\t", header=True)
df.to_csv(os.path.join(self.results_config.tax, sample_name + ".csv"), sep="\t", header=True)
#
# Map raw read output to taxonomy.
......
......@@ -7,7 +7,7 @@ import shutil
import matplotlib.pyplot as plt
import numpy as np
from expam.utils import die, ls, make_path_absolute
from expam.utils import die, ls, make_path_absolute, parse_float, parse_int
ExpamOptions = namedtuple(
......@@ -166,42 +166,19 @@ class CommandGroup:
except AttributeError:
raise AttributeError("Command %s not found!" % command)
@staticmethod
def parse_ints(*params):
for param in params:
INVALID_PARAM_MSG = ("Invalid parameter (%s), must be integer!" % str(param))
if param is not None:
try:
# Convert to float.
param = float(param)
except ValueError:
die(INVALID_PARAM_MSG)
# Convert to int and see if the value changes.
new_param = int(param)
if new_param != param:
die(INVALID_PARAM_MSG)
param = new_param
yield param
@staticmethod
def parse_floats(*params):
for param in params:
INVALID_PARAM_MSG = ("Invalid parameter (%s), must be integer!" % str(param))
if param is not None:
try:
param = float(param)
except ValueError:
die(INVALID_PARAM_MSG)
@classmethod
def parse_ints(cls, *params):
if len(params) == 1:
return parse_int(params[0])
else:
return (parse_int(param) for param in params)
yield param
@classmethod
def parse_floats(cls, *params):
if len(params) == 1:
return parse_float(params[0])
else:
return (parse_float(param) for param in params)
@staticmethod
def get_user_confirmation(msg):
......
......@@ -8,7 +8,6 @@ import traceback
import numpy as np
import pandas as pd
from expam.tree import PHYLA_COLOURS
from expam.tree.location import Location
......
......@@ -80,3 +80,31 @@ def is_hex(string):
return True
def parse_int(param):
INVALID_PARAM_MSG = ("Invalid parameter (%s), must be integer!" % str(param))
if param is not None:
try:
# Convert to float.
param = float(param)
except ValueError:
die(INVALID_PARAM_MSG)
# Convert to int and see if the value changes.
new_param = int(param)
if new_param != param:
die(INVALID_PARAM_MSG)
else:
new_param = None
return new_param
def parse_float(param):
INVALID_PARAM_MSG = ("Invalid parameter (%s), must be float!" % str(param))
if param is not None:
try:
param = float(param)
except ValueError:
die(INVALID_PARAM_MSG)
return param
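
The refactored helpers above can be exercised in isolation. The following is a minimal sketch, not the project's code: ``die`` is a stand-in that assumes ``expam.utils.die`` reports the message and exits, and ``parse_ints`` is written as a plain function mirroring the ``CommandGroup.parse_ints`` classmethod from this commit.

```python
import sys


def die(msg):
    # Stand-in for expam.utils.die (assumption: it reports the message and exits).
    print(msg, file=sys.stderr)
    sys.exit(1)


def parse_int(param):
    """Return param as an int, or None if param is None; exit if not integral."""
    if param is None:
        return None

    invalid_msg = "Invalid parameter (%s), must be integer!" % str(param)

    try:
        # Go through float first so that strings such as "4.0" are accepted.
        as_float = float(param)
    except ValueError:
        die(invalid_msg)

    # Reject values that change when truncated, e.g. "2.5".
    as_int = int(as_float)
    if as_int != as_float:
        die(invalid_msg)

    return as_int


def parse_ints(*params):
    # One argument: return the parsed value directly.
    if len(params) == 1:
        return parse_int(params[0])
    # Several arguments: return a lazy generator of parsed values.
    return (parse_int(param) for param in params)


# Single argument: a plain value comes back.
k = parse_ints("31")

# Several arguments: unpacking consumes the generator and forces parsing.
n, s = parse_ints("4", None)
```

Because the multi-argument form is a generator, an invalid value is only reported once the result is unpacked or iterated, so callers that want early validation should consume it immediately.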