Python API

Dataset Validation

class AMDirT.validate.application.AMDirValidator(schema: AnyStr | BinaryIO | TextIO, dataset: AnyStr | BinaryIO | TextIO)

Bases: DatasetValidator

Validator class for AncientMetagenomeDir datasets

__init__(schema: AnyStr | BinaryIO | TextIO, dataset: AnyStr | BinaryIO | TextIO)

Dataset validation class

errors

List of DFError objects

Type:

list

dataset_name

Dataset name

Type:

str

schema_name

Schema name

Type:

str

schema

JSON schema

Type:

dict

dataset

Dataset as a pandas DataFrame

Type:

pd.DataFrame

dataset_json

Dataset as dictionary

Type:

dict

Parameters:
  • schema (Schema) – Path to the schema in JSON format

  • dataset (Dataset) – Path to the dataset in TSV format

__repr__()

Return repr(self).

__weakref__

list of weak references to the object (if defined)

check_columns() bool

Checks whether the dataset has all required columns

Returns:

True if dataset has all required columns, False otherwise

Return type:

bool
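The column check can be illustrated with a small stdlib-only sketch. It is not AMDirT's implementation (which works on a pandas DataFrame and a JSON schema); it assumes the required column names have already been extracted from the schema and passed in as a list, and the helper names `missing_columns` and `has_required_columns` are hypothetical.

```python
import csv
import io
from typing import Iterable, List


def missing_columns(tsv_text: str, required: Iterable[str]) -> List[str]:
    """Return the required column names absent from the TSV header, in order."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader, [])  # the first line holds the column names
    present = set(header)
    return [column for column in required if column not in present]


def has_required_columns(tsv_text: str, required: Iterable[str]) -> bool:
    # Same True/False contract as check_columns(): True only if nothing is missing
    return not missing_columns(tsv_text, required)


# Illustrative column names only
tsv = "project_name\tsample_name\nProj2020\tS1\n"
print(missing_columns(tsv, ["project_name", "sample_name", "archive_accession"]))
# ['archive_accession']
```

Returning the list of missing names, rather than only a boolean, makes it easy to report which columns a contributor forgot.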

check_duplicate_rows() bool

Checks for duplicate rows in the dataset

Returns:

True if dataset has no duplicated rows, False otherwise

Return type:

bool
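As a rough illustration of what a duplicate-row check involves, here is a stdlib-only sketch that reports which data rows repeat. It is a hypothetical helper, not AMDirT's code; with pandas, `DataFrame.duplicated()` provides comparable information.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, Tuple


def duplicate_rows(tsv_text: str) -> Dict[Tuple[str, ...], List[int]]:
    """Map each repeated data row to the 1-based data-row numbers where it occurs."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    next(reader, None)  # skip the header line
    occurrences: Dict[Tuple[str, ...], List[int]] = defaultdict(list)
    for number, row in enumerate(reader, start=1):
        occurrences[tuple(row)].append(number)
    # Keep only rows that appear more than once
    return {row: numbers for row, numbers in occurrences.items() if len(numbers) > 1}


def has_no_duplicate_rows(tsv_text: str) -> bool:
    # Same True/False contract as check_duplicate_rows()
    return not duplicate_rows(tsv_text)


tsv = "sample_name\tsite\nS1\tA\nS2\tB\nS1\tA\n"
print(duplicate_rows(tsv))  # {('S1', 'A'): [1, 3]}
```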

check_multi_values(column_names: Iterable[str] = ['archive_accession']) bool

Check for duplicate entries in multi-value columns

Parameters:

column_names (Iterable[str], optional) – List of multi-value columns to check for duplicates. Defaults to [“archive_accession”].
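A minimal sketch of a multi-value check, under two assumptions not stated above: that multi-value cells are comma-separated, and that "duplicate entries" means a value repeated within a single cell (AMDirT may also, or instead, compare across rows). The function name `repeated_multi_values` is hypothetical. The default is a tuple rather than a list to avoid a mutable default argument.

```python
from collections import Counter
from typing import Dict, Iterable, List, Mapping


def repeated_multi_values(
    records: Iterable[Mapping[str, str]],
    column_names: Iterable[str] = ("archive_accession",),
    sep: str = ",",  # assumed separator for multi-value cells
) -> Dict[int, Dict[str, List[str]]]:
    """Return {row_index: {column: [repeated values]}} for multi-value cells."""
    columns = list(column_names)  # materialise so it can be reused for every row
    problems: Dict[int, Dict[str, List[str]]] = {}
    for index, record in enumerate(records):
        for column in columns:
            cell = record.get(column) or ""
            # Split on the separator, trimming whitespace and dropping empty pieces
            values = [v.strip() for v in cell.split(sep) if v.strip()]
            repeated = sorted(v for v, n in Counter(values).items() if n > 1)
            if repeated:
                problems.setdefault(index, {})[column] = repeated
    return problems


rows = [
    {"archive_accession": "SRS1,SRS2"},
    {"archive_accession": "SRS3,SRS3,SRS4"},
]
print(repeated_multi_values(rows))  # {1: {'archive_accession': ['SRS3']}}
```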

check_sample_accession(remote: AnyStr | None = None) bool

Check that sample accessions are valid

Parameters:

remote (AnyStr | None, optional) – Remote to check against. Defaults to None.
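What counts as a "valid" sample accession is not spelled out here. The sketch below is a purely syntactic check that assumes BioSample accessions (SAMEA, SAMN, SAMD prefixes) and INSDC sample accessions (ERS, SRS, DRS prefixes) are accepted; AMDirT's actual rules, and any check against a remote, may differ. `invalid_sample_accessions` is a hypothetical helper.

```python
import re
from typing import Iterable, List

# Assumed accepted formats: BioSample accessions (SAMEA, SAMN, SAMD) and
# INSDC sample accessions (ERS, SRS, DRS). AMDirT's real rules may differ.
SAMPLE_ACCESSION = re.compile(r"^(SAM(EA|N|D)\d+|[EDS]RS\d+)$")


def invalid_sample_accessions(accessions: Iterable[str]) -> List[str]:
    """Return the accessions that do not match any assumed sample format."""
    return [acc for acc in accessions if not SAMPLE_ACCESSION.match(acc.strip())]


print(invalid_sample_accessions(["SAMEA123456", "ERS1", "SRR123", "PRJNA1"]))
# ['SRR123', 'PRJNA1']
```

A run accession (SRR...) or a project accession (PRJNA...) is rejected because neither identifies a sample.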

cleanup_errors(error: ValidationError) DFError

Cleans up JSON schema validation errors

Parameters:

error (json_exceptions.ValidationError) – JSON schema validation error

Returns:

Cleaned DataFrame error

Return type:

DFError

dataset_to_json() dict

Convert dataset from pandas DataFrame to JSON

Returns:

Dataset as dictionary

Return type:

dict

read_dataset(dataset: AnyStr | BinaryIO | TextIO, schema: dict) DataFrame

Read dataset from file or string

Parameters:
  • dataset (str) – Path to dataset in TSV format

  • schema (dict) – Parsed schema as dictionary (from read_schema)

Returns:

Dataset as a pandas DataFrame

Return type:

pd.DataFrame
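The "file or string" input handling can be sketched with the standard library alone. This stand-in returns a list of row dictionaries instead of a pandas DataFrame, ignores the schema argument (AMDirT presumably uses it for column types), and handles text streams and paths but not binary streams; `read_tsv` is a hypothetical name.

```python
import csv
import io
from pathlib import Path
from typing import Dict, List, TextIO, Union


def read_tsv(source: Union[str, Path, TextIO]) -> List[Dict[str, str]]:
    """Read a TSV dataset from a file path or an open text stream into row dicts."""
    if hasattr(source, "read"):
        text = source.read()  # already-open file object or io.StringIO
    else:
        text = Path(source).read_text(encoding="utf-8")
    # DictReader uses the header line as keys for every following row
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


rows = read_tsv(io.StringIO("sample_name\tsite\nS1\tA\n"))
print(rows)
```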

read_schema(schema: AnyStr | BinaryIO | TextIO) dict

Read JSON schema from file or string

Parameters:

schema (str) – Path to schema in JSON format

Returns:

JSON schema

Return type:

dict

to_markdown() bool

Generate Markdown output table for GitHub display

Returns:

True if dataset is valid

Return type:

bool

Raises:

SystemExit – If dataset is invalid
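To show roughly what a GitHub-ready error table looks like, here is a sketch that renders validation issues as a Markdown table. The `ValidationIssue` record is a hypothetical stand-in for DFError, whose fields are not documented in this section.

```python
from typing import Iterable, NamedTuple


class ValidationIssue(NamedTuple):
    # Hypothetical stand-in for DFError; the real fields are not documented here
    row: int
    column: str
    value: str
    message: str


def issues_to_markdown(issues: Iterable[ValidationIssue]) -> str:
    """Render validation issues as a GitHub-flavoured Markdown table."""
    lines = ["| Row | Column | Value | Error |", "| --- | --- | --- | --- |"]
    for issue in issues:
        cells = [str(issue.row), issue.column, issue.value, issue.message]
        # Escape pipes so a cell cannot break the table layout
        lines.append("| " + " | ".join(c.replace("|", "\\|") for c in cells) + " |")
    return "\n".join(lines)


print(issues_to_markdown([ValidationIssue(3, "sample_name", "S1", "duplicated row")]))
```

To mirror the documented behaviour, a caller could print this table and `raise SystemExit(1)` whenever the issue list is non-empty.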

to_rich()

Generate rich output table for console display

Returns:

True if dataset is valid

Return type:

bool

Raises:

SystemExit – If dataset is invalid

validate_schema() bool

Validate dataset against JSON schema

Returns:

True if dataset is valid, False otherwise

Return type:

bool

Dataset conversion

AMDirT.convert.run_convert(samples, table_name, tables=None, output='.', bibliography=False, librarymetadata=False, curl=False, aspera=False, eager=False, fetchngs=False, ameta=False, taxprofiler=False, mag=False, verbose=False)

Run the AMDirT conversion application to create input samplesheet tables for different pipelines

Parameters:
  • tables (str) – Path to JSON file listing tables

  • samples (str) – Path to AncientMetagenomeDir filtered samples tsv file

  • table_name (str) – Name of the table to convert

  • output (str) – Path to output table. Defaults to “.”

Dataset viewing/filtering

AMDirT.viewer.run_app(tables=None, verbose=False)

Run the AMDirT interactive filtering application

Parameters:

tables (str) – Path to JSON file listing AncientMetagenomeDir tables

Autofill

AMDirT.autofill.run_autofill(accession, table_name=None, schema=None, dataset=None, sample_output=None, library_output=None, verbose=False)

Autofill the metadata of a table from ENA

Parameters:
  • accession (tuple(str)) – ENA project accession. Multiple accessions can be space-separated (e.g. PRJNA123 PRJNA456)

  • table_name (str) – Name of the table to be filled

  • schema (str) – Path to the schema file

  • dataset (str) – Path to the dataset file

  • sample_output (str) – Path to the sample output table file

  • library_output (str) – Path to the library output table file

Returns:

ENA metadata run level table

Return type:

pd.DataFrame

Merge

AMDirT.merge.merge_new_df(dataset, table_type, table_name, markdown, outdir, verbose, schema_check=True, line_dup=True, columns=True)

Merge a new dataset with the remote master dataset

Parameters:
  • dataset (Path) – Path to new dataset

  • table_type (str) – Type of table to merge (samples or libraries)

  • table_name (str) – Kind of table to merge (e.g. ancientmetagenome-hostassociated, ancientmetagenome-environmental, etc.)

  • markdown (bool) – Log in markdown format

  • outdir (Path) – Path to output directory

  • verbose (bool) – Enable verbose mode

  • schema_check (bool, optional) – Enable schema check. Defaults to True.

  • line_dup (bool, optional) – Enable line duplication check. Defaults to True.

  • columns (bool, optional) – Enable columns presence/absence check. Defaults to True.

Raises:
  • ValueError – Table type must be either ‘samples’ or ‘libraries’

  • ValueError – Table name not found in AncientMetagenomeDir file

  • DatasetValidationError – New dataset is not valid
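The column and line-duplication checks that guard a merge can be sketched on plain lists of row dictionaries. This is a hypothetical illustration, not the body of merge_new_df: it raises ValueError where AMDirT documents DatasetValidationError, and its `columns`/`line_dup` flags only mirror the names of the documented options.

```python
from typing import Dict, List


def merge_rows(
    master: List[Dict[str, str]],
    new: List[Dict[str, str]],
    columns: bool = True,
    line_dup: bool = True,
) -> List[Dict[str, str]]:
    """Append new rows to the master rows after optional consistency checks."""
    if columns and master and new:
        # Compare the column sets of the first master row and the first new row
        expected, got = set(master[0]), set(new[0])
        if expected != got:
            raise ValueError(f"Column mismatch: {sorted(expected ^ got)}")
    merged = master + new
    if line_dup:
        seen = set()
        for row in merged:
            key = tuple(sorted(row.items()))  # column-order-independent identity
            if key in seen:
                raise ValueError(f"Duplicated row: {row}")
            seen.add(key)
    return merged


master = [{"sample_name": "S1", "site": "A"}]
print(merge_rows(master, [{"sample_name": "S2", "site": "B"}]))
```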

ENA API

class AMDirT.core.ena.ENAPortalAPI

Bases: ENA

Class to interact with the ENA Portal API

__get_json__(url: str) List[Dict]

Get JSON content from URL

Parameters:

url (str) – URL to get JSON content from

Returns:

JSON content

Return type:

List[Dict]
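Fetching and decoding JSON from a URL needs only the standard library. The sketch below is a generic stand-in for `__get_json__`, not its actual code; `get_json` and its timeout default are assumptions.

```python
import json
import urllib.request
from typing import Any


def get_json(url: str, timeout: float = 30.0) -> Any:
    """Fetch a URL and decode the response body as JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Assumes a UTF-8 body, the default encoding for JSON
        return json.loads(response.read().decode("utf-8"))
```

Against the live service the argument would be an ENA Portal API URL. Offline, the function can be exercised with a `data:` URL, which `urllib` handles natively.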

__init__() None

ENA Portal API class

base_url

Base URL for the ENA Portal API

Type:

str

__repr__() str

Display URL of API documentation

__weakref__

list of weak references to the object (if defined)

doc(dir: str = '.') None

Get PDF documentation for the API

Parameters:

dir (str) – Path to the output PDF directory

list_fields(result_type: str) None

Display list of available fields

Parameters:
  • result_type (str) – A result is a set of data that can be searched against and returned

Returns:

List of available fields

Return type:

List

list_results() None

Display list of available results

Returns:

List of available results

Return type:

List[Dict]
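The two listing methods correspond to discovery endpoints of the ENA Portal API. The sketch below only builds the request URLs; it assumes the public `results` and `returnFields` endpoints with a `format=json` parameter, and does not claim that AMDirT builds its URLs this way. Both function names are hypothetical.

```python
from urllib.parse import urlencode

# Assumed public base URL of the ENA Portal API
PORTAL_API = "https://www.ebi.ac.uk/ena/portal/api"


def results_url(data_portal: str = "ena") -> str:
    """URL that lists the searchable result types, such as read_run."""
    query = urlencode({"dataPortal": data_portal, "format": "json"})
    return f"{PORTAL_API}/results?{query}"


def return_fields_url(result_type: str) -> str:
    """URL that lists the fields which can be returned for one result type."""
    query = urlencode({"result": result_type, "format": "json"})
    return f"{PORTAL_API}/returnFields?{query}"


print(return_fields_url("read_run"))
# https://www.ebi.ac.uk/ena/portal/api/returnFields?result=read_run&format=json
```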

query(accession: str, result_type: str = 'read_run', fields: List = ['run_accession', 'sample_accession', 'fastq_ftp', 'fastq_md5', 'fastq_bytes']) dict

Generate a list of run metadata for a study accession

Parameters:
  • accession (str) – ENA accession

  • result_type (str) – A result is a set of data that can be searched against and returned

  • fields (List) – list of fields to return

Returns:

Dictionary with run_accession as keys and metadata as values

Return type:

dict
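A sketch of the two halves of such a query: building a request URL and reshaping the JSON records into the documented run-keyed dictionary. It assumes the Portal API `filereport` endpoint; AMDirT's query() may use a different endpoint or parameters. The default field tuple mirrors the documented default list, and `filereport_url` and `index_by_run` are hypothetical names.

```python
from typing import Dict, Iterable, List, Mapping
from urllib.parse import urlencode

# Assumed endpoint; AMDirT may build its query URL differently
FILEREPORT_URL = "https://www.ebi.ac.uk/ena/portal/api/filereport"
DEFAULT_FIELDS = ("run_accession", "sample_accession", "fastq_ftp", "fastq_md5", "fastq_bytes")


def filereport_url(
    accession: str,
    result_type: str = "read_run",
    fields: Iterable[str] = DEFAULT_FIELDS,
) -> str:
    """Build a Portal API filereport URL that returns JSON records."""
    params = {
        "accession": accession,
        "result": result_type,
        "fields": ",".join(fields),  # comma-separated; urlencode encodes the commas
        "format": "json",
    }
    return f"{FILEREPORT_URL}?{urlencode(params)}"


def index_by_run(records: List[Mapping[str, str]]) -> Dict[str, Dict[str, str]]:
    """Key each record by run_accession, as query() documents for its return value."""
    return {record["run_accession"]: dict(record) for record in records}
```

The records returned by the URL would be passed to `index_by_run`, so that each run's metadata can be looked up directly by run accession.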

status() bool

Check if API is up

Returns:

True if API is up, False otherwise

Return type:

bool

class AMDirT.core.ena.ENABrowserAPI

Bases: ENA

Class to interact with the ENA Browser API

__get_json__(url: str) List[Dict]

Get JSON content from URL

Parameters:

url (str) – URL to get JSON content from

Returns:

JSON content

Return type:

List[Dict]

__init__() None

__repr__() str

Display URL of API documentation

__weakref__

list of weak references to the object (if defined)

doc(dir: str = '.') None

Get PDF documentation for the API

Parameters:

dir (str) – Path to the output PDF directory

status() bool

Check if API is up

Returns:

True if API is up, False otherwise

Return type:

bool