Commit a0f6358e authored by Sean Solari

Updates to documentation. Bug fix for input genomes as symlinks and removing sequences from database.
parent b19d2b6e
......@@ -4,10 +4,26 @@
#### From Bioconda (Recommended)
Conda installation is recommended. Create a new environment before installing expam.
```console
$ conda install -c bioconda expam
```
#### From PyPI
You may need to update g++ resources on your local machine. On Linux, you can run the following.
```console
user@computer:~$ pip install expam
$ apt update
$ apt-get install build-essential
```
Then install from PyPI.
```console
$ python3 -m pip install --upgrade pip
$ python3 -m pip install expam
```
#### From GitLab source
......@@ -16,14 +32,26 @@ To install from source, you need a local installation of `Python >=3.8`, as well
and `cython`. There are some commonly encountered problems when installing on Linux, the
most common of which are outlined in the FAQ section below.
You may need to update g++ resources on your local machine. On Linux, you can run the following.
```console
$ apt update
$ apt-get install build-essential
```
First download the source code from the GitHub repository.
```console
user@computer:~$ git clone git@github.com:seansolari/expam.git
$ git clone https://github.com/seansolari/expam.git
```
This can then be installed locally by executing the following commands, starting from the
directory that contains the cloned repository:
```console
user@computer:~$ python3 setup.py install
$ cd expam
$ python3 -m pip install --upgrade pip
$ python3 -m pip install -r requirements.txt
$ python3 setup.py install
```
<hr style="border:1px solid #ADD8E6"> </hr>
......@@ -49,7 +77,7 @@ See the Quick Start Tutorial for a guide to expam's basic usage and download lin
This is simply a matter of updating the compiler.
```bash
> sudo apt-get install build-essential
$ sudo apt-get install build-essential
```
<hr>
......@@ -58,7 +86,7 @@ This is simply a matter of updating the compiler.
This simply means you need to install/update the Python development files for version 3.
```bash
> sudo apt-get install python3-dev
$ sudo apt-get install python3-dev
```
(Reference - [SO](https://stackoverflow.com/questions/21530577/fatal-error-python-h-no-such-file-or-directory/21530768))
......@@ -75,17 +103,17 @@ collection of circumstances.
First update the local installation of Qt.
```bash
> sudo apt-get install qt5-default
$ sudo apt-get install qt5-default
```
Now double-check which version of Qt has been installed.
```bash
> dpkg -l | grep "pyqt5"
$ dpkg -l | grep "pyqt5"
```
Install the corresponding Python interface to Qt.
```bash
> pip3 install pyqt5==5.12
$ pip3 install pyqt5==5.12
```
### OOM Killer
......@@ -97,20 +125,20 @@ To prevent this occurring, make prudent use of the `expam_limit` functionality (
If you suspect that OOM killer has been invoked, this can be confirmed using the following command:
```bash
dmesg -T | egrep -i 'killed process'
$ dmesg -T | egrep -i 'killed process'
```
In the event OOM killer has been called, it is prudent to check
how much shared memory is currently being used by the system.
```bash
df -h /dev/shm
$ df -h /dev/shm
```
If the amount of shared memory used is higher than you would expect, you can first check if there are any residual resources that need to be cleaned up.
```bash
ls -lah /dev/shm
$ ls -lah /dev/shm
```
If there are files starting with 'psm' and owned by you, these may be residual files that need to be cleaned up. Contact your systems administrator to remove these files.
......@@ -120,13 +148,13 @@ It may also be the case that OOM killer has killed some child process, leaving t
To check for sleeping (expam) processes, run
```bash
sudo lsof /dev/shm | grep "expam"
$ sudo lsof /dev/shm | grep "expam"
```
These sleeping processes should then be killed by running
```bash
kill -9 <PID>
$ kill -9 <PID>
```
Confirm that the leaked memory has been freed by running `df -h /dev/shm`.
......@@ -139,7 +167,8 @@ Confirm that the leaked memory has been freed by running `df -h /dev/shm`.
A complete list of available commands can be found using the `-h`/`--help`
flags.
```console
user@computer:~$ expam --help
$ expam --version
$ expam --help
...
```
......
......@@ -61,24 +61,45 @@ Welcome to the **expam** documentation!
Installation
------------
It is highly recommended that you use a virtual environment (venv, conda, etc.) when installing and executing expam, so as
to isolate its dependencies from the rest of your system.
See the :doc:`dependencies <dependencies>` tutorial for more detailed instructions for creating
a virtual environment and managing expam's code and dependencies.
To confirm an installation, you can run
.. code-block:: console
$ expam --version
$ expam --help
Conda
^^^^^
Conda installation is recommended.
.. code-block:: console
$ conda install expam
$ conda install -c bioconda expam
Python Package Index (pip)
^^^^^^^^^^^^^^^^^^^^^^^^^^
You may need to update *g++* resources on your local machine. On Linux, you can run the following.
.. code-block:: console
$ pip install expam
$ apt update
$ apt-get install build-essential
Then install from PyPI.
.. code-block:: console
$ python3 -m pip install --upgrade pip
$ python3 -m pip install expam
From GitHub source
......@@ -88,16 +109,26 @@ To install from source, you need a local installation of Python >=3.8, as well a
There are some commonly encountered problems when installing on Linux, the most common of which are
outlined in the FAQ section on the `GitHub page <https://github.com/seansolari/expam>`_.
You may need to update *g++* resources on your local machine. On Linux, you can run the following.
.. code-block:: console
$ apt update
$ apt-get install build-essential
First download the source code from the GitHub repository.
.. code-block:: console
$ git clone git@github.com:seansolari/expam.git
$ git clone https://github.com/seansolari/expam.git
This can then be installed locally by executing the following commands, starting from the directory that contains the cloned repository.
.. code-block:: console
$ cd expam
$ python3 -m pip install --upgrade pip
$ python3 -m pip install -r requirements.txt
$ python3 setup.py install
......
......@@ -148,11 +148,11 @@ Phylogenetic classification
C R8975058804953044791 p10 302 p10:21 p2:59 p10:80 p2:59 p10:21
C R6052336354009855322 p10 302 p2:53 p10:31 p2:72 p10:31 p2:53
The sample summary file is a tab-separated document where the first element of each row is a phylogenetic node/clade, and the corresponding values are contain details of the raw and cumulative classifications and splits at this particular node.
The sample summary file is a tab-separated document where the first element of each row is a phylogenetic node/clade, and the corresponding values contain details of the raw and cumulative classifications and splits at this particular node.
The classified summary file is a tab-separated matrix where each row is a phylogenetic clade, each column is an input sample, and the cell value is the raw counts at this clade. The split summary file is an analogous file that contains the raw split count at any given clade. These two files are formatted such that they will always have the same column and row indices, and in the same order.
The raw read-wise output is a sub-directory containing one output file for each input sample, the kraken-formatted read-wise output.
The raw read-wise output is a sub-directory containing one output file for each input sample, outlining read-wise output in kraken format.
A more comprehensive overview is given in :doc:`this tutorial <tutorials/classify>`.
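Because the classified and split summary files share the same row and column indices in the same order, they can be combined element-wise. The following is a minimal sketch rather than part of expam: it assumes each file has a header row naming the samples, a first column naming the clade, and integer counts. The function names ``read_count_matrix`` and ``add_matrices`` are hypothetical, and the inline data stands in for real output files.

```python
import csv
import io


def read_count_matrix(handle):
    """Parse a tab-separated count matrix into (sample names, {clade: counts}).

    Assumes a header row of sample names and integer cell values.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]
    counts = {row[0]: [int(v) for v in row[1:]] for row in reader if row}
    return samples, counts


def add_matrices(classified, splits):
    """Add classified and split counts clade by clade, sample by sample."""
    return {
        clade: [c + s for c, s in zip(classified[clade], splits[clade])]
        for clade in classified
    }


# Illustrative data only; real files would be opened with open(path, newline="").
classified_tsv = "clade\tsampleA\tsampleB\np2\t10\t0\np10\t5\t7\n"
split_tsv = "clade\tsampleA\tsampleB\np2\t1\t2\np10\t0\t3\n"

samples, classified = read_count_matrix(io.StringIO(classified_tsv))
_, splits = read_count_matrix(io.StringIO(split_tsv))
combined = add_matrices(classified, splits)
```

Keeping the clade as the dictionary key relies only on both files listing the same clades, which the shared row index provides.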
......@@ -160,7 +160,7 @@ A more comprehensive overview is given :doc:`this tutorial <tutorials/classify>`
Convert to taxonomy
-------------------
* First run :code:`expam download_taxonomy` download the taxonomy for all sequences in the database. This will require an internet connection.
* First run :code:`expam download_taxonomy` to download the taxonomy for all sequences in the database. This will require an internet connection.
.. code-block:: console
......
......@@ -4,9 +4,9 @@ Classifying metagenomic samples
The classification algorithm
----------------------------
* There are three classes of results:
* There are three types of results;
1. confident classifications,
1. single-lineage (SL) classifications,
2. *split* classifications,
3. reads that are not like anything currently in the database.
......@@ -49,7 +49,7 @@ The classification algorithm
of the k-mer distribution. This should ignore those lineages in the k-mer distribution that contain
too few k-mers and are most likely due to sequencing error.
.. image:: includes/figure1better.png
.. image:: includes/figure1.png
:width: 500
:align: center
:alt: Classification figure.
......@@ -76,14 +76,14 @@ What do I do with splits?
be filtered out before interpreting the prevalence of clades and species in your sample.
The :code:`--cutoff` flag sets a minimum count that any clade/taxa needs to reach before it is included in the classification results.
The :code:`--cpm` flag sets the same cutoff, but in **count per million** as opposed to a flat cutoff number.
The :code:`--cpm` flag sets the same cutoff, but as a rate of **count required per million reads in the sample**, as opposed to a flat cutoff number.
When both are supplied, the higher of the two cutoffs is taken. *By default*, **expam** *requires each node to have at least
100 counts per million input reads*.
* With both these mechanisms in place, we can be more confident that high split counts in a particular region of the phylogeny are suggestive of novel sequence in the biological sample.
* The algorithm for classifying splits takes a conservative approach - **those that are interested only in a general profile can feel comfortable simply adding classification and split counts together to produce an overall profile.**
* *Splits* can also be used as a marker for genome discovery however - samples reported with a high split counts are potential targets for culturing novel isolates, a useful tool for groups culturing capability.
* *Splits* can also be used as a marker for genome discovery: samples reported with high split counts are potential targets for culturing novel isolates.
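The interaction between a flat cutoff and a counts-per-million cutoff, as described in the list above, can be sketched as follows. This is an illustration under stated assumptions, not expam's implementation: the function names are hypothetical, the flat cutoff is assumed to default to zero, and no rounding is applied to the per-million threshold.

```python
def effective_cutoff(total_reads, cutoff=0, cpm=100.0):
    """Return the minimum count a clade needs, taking the higher of two cutoffs.

    `cutoff` is a flat count; `cpm` is a count required per million input reads.
    """
    cpm_cutoff = cpm * total_reads / 1_000_000
    return max(cutoff, cpm_cutoff)


def filter_counts(counts, total_reads, cutoff=0, cpm=100.0):
    """Keep only the clades whose count reaches the effective cutoff."""
    threshold = effective_cutoff(total_reads, cutoff=cutoff, cpm=cpm)
    return {clade: n for clade, n in counts.items() if n >= threshold}


# A sample with two million reads: 100 counts per million means 200 reads.
kept = filter_counts({"p1": 250, "p2": 199, "p3": 200}, total_reads=2_000_000)
```

For a small sample the flat cutoff dominates: with 100,000 reads, 100 counts per million is only 10 reads, so a flat cutoff of 50 would be the one applied.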
Phylogenetic classification results
......
......@@ -63,14 +63,15 @@ Build a tree for the reference database
$ expam set -db test -s 1000
* We'll use :code:`RapidNJ` to make a tree from the :code:`sourmash` distances (see :doc:`here <../dependencies>` to install).
* Run the :code:`tree` command to build the tree.
* We'll first ensure that :code:`sourmash` is installed, before running the :code:`tree` command to build the tree.
.. code-block:: console
$ python3 -m pip install sourmash
$ expam tree -db test --sourmash
...
* :code:`print` and match with my output.
* :code:`print` and match with the following output:
.. code-block:: console
......
......@@ -189,7 +189,7 @@ class BuildCommand(CommandGroup):
self.validate_database()
for directory in self.files:
self.add_sequences(directory)
self.remove_sequences(directory)
def remove_sequences(self, path):
self._modify_config(path, add=False)
......
......@@ -144,6 +144,10 @@ def sort_by_size(dirs):
return os.stat(file_dir).st_size
def _get_file_size(file_dir):
# Check if file is symlink.
if os.path.islink(file_dir):
file_dir = os.path.realpath(file_dir)
for suffix in _file_suffixes:
if _suffix_check(file_dir, suffix):
return _gzip_size(file_dir)
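The symlink check above matters because a link's own name need not carry the compressed-file suffix of its target. A minimal, self-contained sketch of the same idea follows. It is not expam's code: ``GZIP_SUFFIXES`` and ``estimated_size`` are hypothetical names, and reading the gzip trailer is an assumption about how an uncompressed size might be obtained (the last four bytes of a single-member gzip file store the uncompressed length modulo 2**32).

```python
import os
import struct

# Assumption: the suffixes that mark a gzip-compressed sequence file.
GZIP_SUFFIXES = (".gz",)


def estimated_size(path):
    """Return the uncompressed size in bytes of the file `path` refers to."""
    # Resolve symlinks first so the suffix check sees the real file name.
    if os.path.islink(path):
        path = os.path.realpath(path)

    if path.endswith(GZIP_SUFFIXES):
        # Read the 4-byte little-endian ISIZE field from the gzip trailer.
        with open(path, "rb") as handle:
            handle.seek(-4, os.SEEK_END)
            return struct.unpack("<I", handle.read(4))[0]

    return os.stat(path).st_size
```

Without the resolution step, a link named ``genome_link`` pointing at ``genome.fna.gz`` would fall through to the plain-file branch and report the compressed size.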
......