Python API

Dataset Validation

class AMDirT.validate.application.AMDirValidator(schema: AnyStr | BinaryIO | TextIO, dataset: AnyStr | BinaryIO | TextIO)

Bases: DatasetValidator

Validator Class for AncientMetagenomeDir datasets

__init__(schema: AnyStr | BinaryIO | TextIO, dataset: AnyStr | BinaryIO | TextIO)

Dataset validation class

errors

List of DFError objects

Type:: list

dataset_name

Dataset name

Type:: str

schema_name

Schema name

Type:: str

schema

JSON schema

Type:: dict

dataset

Dataset as pandas dataframe

Type:: pd.DataFrame

dataset_json

Dataset as dictionary

Type:: dict

Parameters:

schema (Schema) – Path to schema in json format
dataset (Dataset) – Path to dataset in tsv format

__repr__(): Return repr(self).

__weakref__: list of weak references to the object (if defined)

check_columns() → bool

Checks if dataset has all required columns

Returns:: True if dataset has all required columns, False otherwise
Return type:: bool

check_duplicate_rows() → bool

Checks for duplicated rows in dataset

Returns:: True if dataset has no duplicated rows, False otherwise
Return type:: bool
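
To make the two checks above concrete, here is a minimal standard-library sketch of what "all required columns present" and "no duplicated rows" can mean for a TSV table. It is not AMDirT's implementation (which works on a pandas DataFrame and a JSON schema); the helper names and the sample values are hypothetical.

```python
import csv
import io


def missing_columns(header, required):
    """Return the required column names that are absent from the header."""
    return [name for name in required if name not in header]


def duplicated_rows(rows):
    """Return each row that occurs more than once, in first-seen order."""
    seen = set()
    dupes = []
    for row in rows:
        key = tuple(row)
        if key in seen and row not in dupes:
            dupes.append(row)
        seen.add(key)
    return dupes


# Tiny in-memory TSV standing in for a dataset file (made-up values).
tsv = "project_name\tsample_name\nA2020\tS1\nA2020\tS1\nB2021\tS2\n"
reader = csv.reader(io.StringIO(tsv), delimiter="\t")
header = next(reader)
rows = list(reader)

# Mirror the documented convention: True means the check passed.
columns_ok = not missing_columns(header, ["project_name", "sample_name", "archive"])
rows_ok = not duplicated_rows(rows)
```

Both flags come out False here: the `archive` column is missing and the first data row is repeated.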

check_multi_values(column_names: Iterable[str] = ['archive_accession']) → bool

Check for duplicate entries in multi-value columns

Parameters:: column_names (Iterable[str], optional) – List of multi-value columns to check for duplicates. Defaults to ["archive_accession"].
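
As an illustration of the multi-value check, the sketch below counts entries across cells that may each hold several values. It assumes the values are comma-separated, which is not stated in this reference; `duplicated_entries` and the accessions are hypothetical.

```python
from collections import Counter


def duplicated_entries(values, sep=","):
    """Return entries that appear more than once across a multi-value column.

    Each cell may hold several entries joined by `sep` (an assumption here).
    """
    counts = Counter(
        entry.strip()
        for cell in values
        for entry in cell.split(sep)
        if entry.strip()
    )
    return sorted(entry for entry, n in counts.items() if n > 1)


# Made-up accessions for illustration only.
column = ["ERS001,ERS002", "ERS003", "ERS002"]
duplicates = duplicated_entries(column)
```

Here `ERS002` is reported because it appears in two different cells.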

check_sample_accession(remote: AnyStr | None = None) → bool

Check that sample accessions are valid

Parameters:: remote (AnyStr | None, optional) – Remote to check against. Defaults to None.

cleanup_errors(error: ValidationError) → DFError

Cleans up JSON schema validation errors

Parameters:: error (json_exceptions.ValidationError) – JSON schema validation error
Returns:: Cleaned DataFrame error
Return type:: DFError

dataset_to_json() → dict

Convert dataset from Pandas DataFrame to JSON

Returns:: Dataset as dictionary
Return type:: dict

read_dataset(dataset: AnyStr | BinaryIO | TextIO, schema: dict) → DataFrame

Read dataset from file or string

Parameters:

dataset (str) – Path to dataset in tsv format
schema (dict) – Parsed schema as dictionary (from read_schema)

Returns:: Dataset as pandas dataframe
Return type:: pd.DataFrame

read_schema(schema: AnyStr | BinaryIO | TextIO) → dict

Read JSON schema from file or string

Parameters:: schema (str) – Path to schema in json format
Returns:: JSON schema
Return type:: dict

to_markdown() → bool

Generate markdown output table for github display

Returns:: True if dataset is valid
Return type:: bool
Raises:: SystemExit – If dataset is invalid

to_rich()

Generate rich output table for console display

Returns:: True if dataset is valid
Return type:: bool
Raises:: SystemExit – If dataset is invalid

validate_schema() → bool

Validate dataset against JSON schema

Returns:: True if dataset is valid, False otherwise
Return type:: bool
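
Since the boolean checks share the same convention (True means the check passed), they can be combined into one summary. The sketch below relies only on the method names documented above. `run_checks` and `StubValidator` are hypothetical, and the stub exists so the example runs without AMDirT installed.

```python
def run_checks(validator):
    """Call the documented boolean checks in turn and collect their results."""
    results = {
        "schema": validator.validate_schema(),
        "columns": validator.check_columns(),
        "duplicate_rows": validator.check_duplicate_rows(),
    }
    # Overall verdict: every individual check must have passed.
    results["valid"] = all(results.values())
    return results


class StubValidator:
    """Stand-in with the same method names, so the sketch runs without AMDirT."""

    def validate_schema(self):
        return True

    def check_columns(self):
        return True

    def check_duplicate_rows(self):
        return False


summary = run_checks(StubValidator())
```

With AMDirT installed, the stub would presumably be replaced by an `AMDirValidator` built from a schema path and a dataset path, as in the constructor signature above.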

Dataset conversion

AMDirT.convert.run_convert(samples, libraries, table_name, tables=None, output='.', bibliography=False, librarymetadata=False, curl=False, aspera=False, eager=False, fetchngs=False, sratoolkit=False, ameta=False, taxprofiler=False, mag=False, verbose=False)

Run the AMDirT conversion application to generate input samplesheet tables for different pipelines

Parameters:

samples (str) – Path to AncientMetagenomeDir filtered samples tsv file
libraries (str) – Optional path to AncientMetagenomeDir pre-filtered libraries tsv file
table_name (str) – Name of the table to convert
tables (str) – Path to JSON file listing tables
output (str) – Path to output table. Defaults to “.”
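
The signature has many boolean output flags, so it can help to build the keyword arguments in one place and reject misspelled flag names before calling. This is a sketch: `convert_kwargs` is a hypothetical helper, the file path is made up, and passing `libraries=None` assumes the libraries table really is optional, as the parameter description suggests.

```python
# Boolean flags taken from the documented run_convert signature.
OUTPUT_FLAGS = {
    "bibliography", "librarymetadata", "curl", "aspera", "eager",
    "fetchngs", "sratoolkit", "ameta", "taxprofiler", "mag",
}


def convert_kwargs(samples, table_name, targets, libraries=None, output="."):
    """Build keyword arguments for run_convert, rejecting unknown flag names."""
    unknown = sorted(set(targets) - OUTPUT_FLAGS)
    if unknown:
        raise ValueError(f"Unknown output flag(s): {', '.join(unknown)}")
    kwargs = {
        "samples": samples,
        "libraries": libraries,
        "table_name": table_name,
        "output": output,
    }
    # Switch on only the requested outputs; the rest keep their False default.
    kwargs.update({flag: True for flag in targets})
    return kwargs


kwargs = convert_kwargs(
    "filtered_samples.tsv",  # hypothetical path
    "ancientmetagenome-hostassociated",
    targets=["eager", "bibliography"],
)
# With AMDirT installed:
#   from AMDirT.convert import run_convert
#   run_convert(**kwargs)
```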

Dataset viewing/filtering

AMDirT.viewer.run_app(tables=None, verbose=False)

Run the AMDirT interactive filtering application

Parameters:: tables (str) – path to JSON file listing AncientMetagenomeDir tables

Autofill

AMDirT.autofill.run_autofill(accession, table_name=None, schema=None, dataset=None, sample_output=None, library_output=None, verbose=False)

Autofill the metadata of a table from ENA

Parameters:

accession (tuple(str)) – ENA project accession. Multiple accessions can be space separated (e.g. PRJNA123 PRJNA456)
table_name (str) – Name of the table to be filled
schema (str) – Path to the schema file
dataset (str) – Path to the dataset file
sample_output (str) – Path to the sample output table file
library_output (str) – Path to the library output table file

Returns:

ENA metadata run level table

Return type:

pd.DataFrame
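
Because `accession` is documented as a tuple of strings, a space-separated string such as the one in the example above has to be split before being passed from Python. A minimal sketch follows; `parse_accessions` is a hypothetical helper, not part of AMDirT.

```python
def parse_accessions(text):
    """Split a space-separated string of accessions into a tuple of strings."""
    return tuple(text.split())


# Accessions copied from the parameter description above.
accession = parse_accessions("PRJNA123 PRJNA456")
# With AMDirT installed:
#   from AMDirT.autofill import run_autofill
#   run_autofill(accession, table_name=...)
```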

Merge

AMDirT.merge.merge_new_df(dataset, table_type, table_name, markdown, outdir, verbose, schema_check=True, line_dup=True, columns=True)

Merge a new dataset with the remote master dataset

Parameters:

dataset (Path) – Path to new dataset
table_type (str) – Type of table to merge (samples or libraries)
table_name (str) – Kind of table to merge (e.g. ancientmetagenome-hostassociated, ancientmetagenome-environmental, etc.)
markdown (bool) – Log in markdown format
outdir (Path) – Path to output directory
verbose (bool) – Enable verbose mode
schema_check (bool, optional) – Enable schema check. Defaults to True.
line_dup (bool, optional) – Enable line duplication check. Defaults to True.
columns (bool, optional) – Enable columns presence/absence check. Defaults to True.

Raises:

ValueError – Table type must be either ‘samples’ or ‘libraries’
ValueError – Table name not found in AncientMetagenomeDir file
DatasetValidationError – New dataset is not valid
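
The first documented error can be caught early with a pre-flight check on `table_type`. The sketch below mirrors the documented error message. `check_table_type` is a hypothetical helper, and AMDirT performs its own validation internally.

```python
VALID_TABLE_TYPES = ("samples", "libraries")


def check_table_type(table_type):
    """Return table_type unchanged, or raise ValueError if it is not allowed."""
    if table_type not in VALID_TABLE_TYPES:
        raise ValueError("Table type must be either 'samples' or 'libraries'")
    return table_type


table_type = check_table_type("samples")
```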

ENA API

class AMDirT.core.ena.ENAPortalAPI

Bases: ENA

Class to interact with the ENA Portal API

__get_json__(url: str) → List[Dict]

Get json content from URL

Parameters:: url (str) – URL to get json content from
Returns:: json content
Return type:: List[Dict]

__init__() → None

ENA Portal API class

base_url

base URL for ENA Portal API

Type:: str

__repr__() → str: Display URL of API documentation

__weakref__: list of weak references to the object (if defined)

doc(dir: str = '.') → None

Get PDF documentation for API

Parameters:: dir (str) – path to output PDF directory

list_fields(result_type: str) → None

Display list of available fields

Parameters:

result_type (str) – A result is a set of data that can be searched against and returned

Returns:

list of available fields

Return type:

List

list_results() → None

Display list of available results

Returns:: list of available results
Return type:: List[Dict]

query(accession: str, result_type: str = 'read_run', fields: List = ['run_accession', 'sample_accession', 'fastq_ftp', 'fastq_md5', 'fastq_bytes']) → dict

Generate list of runs metadata for a study accession

Parameters:

accession (str) – ENA accession
result_type (str) – A result is a set of data that can be searched against and returned
fields (List) – list of fields to return

Returns:

run_accession as keys, and metadata as values

Return type:

dict
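
To illustrate the shape of such a query, the sketch below builds a request URL from the same three parameters and re-keys a list of run records by `run_accession`. The base URL and the `filereport` endpoint are assumptions about the ENA Portal API, not taken from this reference. `build_query_url`, `key_by_run`, and the sample record are hypothetical, and no network request is made.

```python
from urllib.parse import urlencode

# Assumed base URL of the ENA Portal API; the class keeps its own in base_url.
BASE_URL = "https://www.ebi.ac.uk/ena/portal/api"

# Default fields copied from the documented query() signature.
DEFAULT_FIELDS = (
    "run_accession", "sample_accession", "fastq_ftp", "fastq_md5", "fastq_bytes",
)


def build_query_url(accession, result_type="read_run", fields=DEFAULT_FIELDS):
    """Build a filereport URL for one accession (endpoint name is an assumption)."""
    params = {
        "accession": accession,
        "result": result_type,
        "fields": ",".join(fields),
        "format": "json",
    }
    # Keep commas unescaped so the field list stays readable in the URL.
    return f"{BASE_URL}/filereport?{urlencode(params, safe=',')}"


def key_by_run(records):
    """Turn a list of run records into a dict keyed by run_accession."""
    return {rec["run_accession"]: rec for rec in records}


url = build_query_url("PRJNA123")
# Made-up record standing in for a JSON response.
runs = key_by_run([{"run_accession": "ERR001", "sample_accession": "ERS001"}])
```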

status() → bool

Check if API is up

Returns:: True if API is up, False otherwise
Return type:: bool

class AMDirT.core.ena.ENABrowserAPI

Bases: ENA

Class to interact with the ENA Browser API

__get_json__(url: str) → List[Dict]

Get json content from URL

Parameters:: url (str) – URL to get json content from
Returns:: json content
Return type:: List[Dict]

__init__() → None

__repr__() → str: Display URL of API documentation

__weakref__: list of weak references to the object (if defined)

doc(dir: str = '.') → None

Get PDF documentation for API

Parameters:: dir (str) – path to output PDF directory

status() → bool

Check if API is up

Returns:: True if API is up, False otherwise
Return type:: bool