# miscellaneous

This page provides additional documentation about amdirt that is not directly related to how the amdirt commands themselves work.

## viewer

### Downloading the sequencing data of selected libraries

amdirt provides three different methods to download the sequencing data of selected libraries from public archives:

- direct download from the FTP server using curl
- direct download via the FASP protocol using ASPERA
- indirect download via the Nextflow pipeline nf-core/fetchngs
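Before choosing a method, it can help to see which of the required tools are already on your `PATH`. The following is a minimal sketch, not part of amdirt: `check_tool` is a hypothetical helper name, and the tool names (`curl`, `ascp`, `nextflow`) are the executables used by the three methods above.

```bash
# Report whether a command-line tool is available on the PATH.
# `check_tool` is a hypothetical helper name, not part of amdirt.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
    else
        echo "$1: not found"
        return 1
    fi
}

# Check the executable behind each of the three download methods.
for tool in curl ascp nextflow; do
    check_tool "$tool" || true
done
```

Any tool reported as `not found` needs to be installed before the corresponding download method can be used.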

#### Downloading via curl

[cURL](https://curl.se/) is a well-established and popular tool for command-line or script-based data transfer. It is found on most modern UNIX-based operating systems, and is therefore the default download tool in amdirt. However, it is the slowest of the three options, as it runs over a standard HTTP/FTP connection and is not parallelised (files are downloaded one after another).

In most cases you can assume it is already installed on your machine. You can check that cURL is installed by running:

```bash
which curl
```

The output should be something like `/usr/bin/curl`. If you get no output, you will need to install the tool.

If you select `curl` in `amdirt viewer` or `amdirt convert`, you will receive a bash script that contains curl command(s).

It will look like this:

```bash
curl -L ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR533/006/SRR5332466/SRR5332466.fastq.gz -o SRR5332466.fastq.gz
```

Each command consists of the URL to download from and, after `-o`, the name of the output file to save to.

Running `bash ancientMetagenomeDir_curl_download_script.sh` will download each FASTQ file listed in the script one by one.
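Because the files are fetched one after another over a plain connection, an interrupted transfer can leave a truncated file behind. As an optional check, the sketch below tests the gzip integrity of every `*.fastq.gz` file in a directory. It is an illustration rather than something amdirt provides: `verify_fastq_gz` is a hypothetical helper name, and the check only detects corrupt or truncated gzip streams, not a mismatch against the archive's checksums.

```bash
# Test every gzip-compressed FASTQ file in a directory for truncation or corruption.
# `verify_fastq_gz` is a hypothetical helper name, not part of amdirt.
verify_fastq_gz() {
    local dir="${1:-.}"
    local status=0
    local f
    for f in "$dir"/*.fastq.gz; do
        # Skip the unexpanded glob pattern when no files match.
        [ -e "$f" ] || continue
        if gzip -t "$f" 2>/dev/null; then
            echo "OK      $f"
        else
            echo "FAILED  $f"
            status=1
        fi
    done
    # Non-zero exit status if at least one file failed the test.
    return "$status"
}
```

Calling `verify_fastq_gz .` in the download directory after the script has finished lists each file with its status; any file marked `FAILED` should be downloaded again.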

#### Downloading via the FASP protocol using ASPERA

[FASP](https://en.wikipedia.org/wiki/Fast_and_Secure_Protocol) is a specific protocol that allows the download of large data files at a speed that is usually much higher than when downloading from the FTP server. It is particularly suitable when downloading very large data files. While much faster than `curl`, the `aspera` bash script generated by amdirt still runs sequentially.

Before downloading via this method, make sure that you have Aspera Connect installed on your system (check with `which ascp`). If it is not installed, please refer to this [installation guide](https://www.ibm.com/docs/en/aspera-connect/4.1?topic=suc-installation#installation__section_zfj_wpq_ghb) and download the binary from [here](https://www.ibm.com/aspera/connect/). You can also install it via conda (`conda create -n aspera -c HCC aspera-cli`).

`amdirt viewer`/`convert` will return a script that, following the recommendation from [ENA](https://ena-docs.readthedocs.io/en/latest/retrieval/file-download.html#using-aspera), contains a command like this for each sequencing file:

```bash
ascp -QT -l 300m -P 33001 -i path/to/aspera/installation/etc/asperaweb_id_dsa.openssh era-fasp@fasp.sra.ebi.ac.uk:path/to/sequencing/file local/target/directory
```

amdirt will automatically replace `path/to/sequencing/file` with the paths of the libraries that were selected. It will also set `local/target/directory` to the current directory.

However, you will need to set `path/to/aspera/installation` before running the script. To make this more convenient, we opted for the environment variable `ASPERA_PATH`, which has to be set in the shell prior to running the script. Therefore, run:

```bash
export ASPERA_PATH="$HOME/.aspera/cli"
```

> ⚠️ If your institute blocks port 33001, you will need to change the parameter `-P 33001` to another port that is not blocked.
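Since the generated script depends on `ASPERA_PATH`, it can be worth confirming that the variable points at a usable installation before starting a large download. The sketch below is an assumption-laden illustration: `check_aspera_key` is a hypothetical helper name, and the key location (`etc/asperaweb_id_dsa.openssh` below `ASPERA_PATH`) is taken from the command shown above.

```bash
# Check that ASPERA_PATH is set and that the private key used by ascp is present.
# `check_aspera_key` is a hypothetical helper name, not part of amdirt.
check_aspera_key() {
    if [ -z "${ASPERA_PATH:-}" ]; then
        echo "ASPERA_PATH is not set" >&2
        return 1
    fi
    # Key location relative to the installation, as used in the ascp command.
    local key="$ASPERA_PATH/etc/asperaweb_id_dsa.openssh"
    if [ ! -f "$key" ]; then
        echo "Key file not found: $key" >&2
        return 1
    fi
    echo "Using Aspera key: $key"
}
```

Running `check_aspera_key` in the same shell in which you will start the download script prints the key path on success and returns a non-zero exit status otherwise.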

#### Downloading via nf-core/fetchngs

[nf-core/fetchngs](https://nf-co.re/fetchngs) is a Nextflow bioinformatics pipeline to fetch metadata and raw FASTQ files from both public and private databases. At present, the pipeline supports SRA / ENA / DDBJ / Synapse IDs. While it still runs over HTTPS, it supports downloading directly from AWS S3 servers and is highly parallelised, downloading multiple files at once.

You will need to install [Nextflow](https://nextflow.io) and have it configured for your machine or cluster, as well as a software environment system such as conda, docker, or singularity.

The output from `amdirt viewer`/`convert` will contain a list of accessions in a format compatible with the nf-core/fetchngs input file, which you can pass to the pipeline as follows:

```bash
nextflow pull nf-core/fetchngs
nextflow run nf-core/fetchngs --input AncientMetagenomeDir_nf_core_fetchngs_input_table.tsv
```