Python API
Dataset Validation
- class amdirt.validate.application.AMDirValidator(schema: AnyStr | BinaryIO | TextIO, dataset: AnyStr | BinaryIO | TextIO)[source]
Bases:
DatasetValidator
Validator Class for AncientMetagenomeDir datasets
- __init__(schema: AnyStr | BinaryIO | TextIO, dataset: AnyStr | BinaryIO | TextIO)
Dataset validation class
- errors
List of DFError objects
- Type:
list
- dataset_name
Dataset name
- Type:
str
- schema_name
Schema name
- Type:
str
- schema
JSON schema
- Type:
dict
- dataset
Dataset as pandas dataframe
- Type:
pd.DataFrame
- dataset_json
Dataset as dictionary
- Type:
dict
- Parameters:
schema (Schema) – Path to schema in json format
dataset (Dataset) – Path to dataset in tsv format
- __repr__()
Return repr(self).
- __weakref__
list of weak references to the object
- check_columns() bool
Checks if dataset has all required columns
- Returns:
True if dataset has all required columns, False otherwise
- Return type:
bool
- check_duplicate_rows() bool
Checks for duplicated rows in dataset
- Returns:
True if dataset has no duplicated rows, False otherwise
- Return type:
bool
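The two checks above can be illustrated with a small standard-library sketch. This is not amdirt's implementation (which works on a pandas DataFrame and a JSON schema); the helper names `missing_columns` and `duplicated_rows`, and the column names in the example table, are hypothetical and chosen only for illustration.

```python
import csv
import io


def missing_columns(tsv_text, required):
    """Return the required column names that are absent from the TSV header."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader, [])
    return [name for name in required if name not in header]


def duplicated_rows(tsv_text):
    """Return 1-based data row numbers of rows that repeat an earlier row."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    next(reader, None)  # skip the header line
    seen = set()
    duplicates = []
    for number, row in enumerate(reader, start=1):
        key = tuple(row)
        if key in seen:
            duplicates.append(number)
        seen.add(key)
    return duplicates


# Illustrative table only; the column names are assumptions.
example = (
    "project_name\tsample_name\tarchive_accession\n"
    "Smith2020\tS1\tSRS001\n"
    "Smith2020\tS2\tSRS002\n"
    "Smith2020\tS1\tSRS001\n"
)

print(missing_columns(example, ["project_name", "sample_name", "material"]))
print(duplicated_rows(example))
```

In this sketch an empty list from either helper corresponds to the documented `True` return value of the matching check.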
- check_multi_values(column_names: Iterable[str] = ['archive_accession']) bool [source]
Check for duplicate entries in multi-value columns
- Parameters:
column_names (Iterable[str], optional) – List of multi-value columns to check for duplicates. Defaults to [“archive_accession”].
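One plausible reading of this check is sketched below, assuming that a multi-value cell holds several comma-separated values and that a value counts as duplicated when it appears more than once across the column. Both the separator and that interpretation are assumptions, and `multi_value_duplicates` is a hypothetical helper, not part of amdirt.

```python
from collections import Counter


def multi_value_duplicates(rows, column="archive_accession", sep=","):
    """Return the sorted values that occur more than once in a multi-value column."""
    counts = Counter()
    for row in rows:
        # Split each cell into its individual values and count each one.
        for value in row.get(column, "").split(sep):
            value = value.strip()
            if value:
                counts[value] += 1
    return sorted(value for value, n in counts.items() if n > 1)


# Illustrative rows only; accessions are made up.
rows = [
    {"sample_name": "S1", "archive_accession": "SRS001,SRS002"},
    {"sample_name": "S2", "archive_accession": "SRS002"},
    {"sample_name": "S3", "archive_accession": "SRS003"},
]

print(multi_value_duplicates(rows))
```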
- check_sample_accession(remote: AnyStr | None = None) bool [source]
Check that sample accessions are valid
- Parameters:
remote (AnyStr | None, optional) – Remote to check against. Defaults to None.
- cleanup_errors(error: ValidationError) DFError
Cleans up JSON schema validation errors
- Parameters:
error (json_exceptions.ValidationError) – JSON schema validation error
- Returns:
Cleaned DataFrame error
- Return type:
DFError
- dataset_to_json() dict
Convert dataset from Pandas DataFrame to JSON
- Returns:
Dataset as dictionary
- Return type:
dict
- read_dataset(dataset: AnyStr | BinaryIO | TextIO, schema: dict) DataFrame
Read dataset from file or string
- Parameters:
dataset (str) – Path to dataset in tsv format
schema (dict) – Parsed schema as dictionary (from read_schema)
- Returns:
Dataset as pandas dataframe
- Return type:
pd.DataFrame
- read_schema(schema: AnyStr | BinaryIO | TextIO) dict
Read JSON schema from file or string
- Parameters:
schema (str) – Path to schema in json format
- Returns:
JSON schema
- Return type:
dict
- to_markdown() bool
Generate markdown output table for GitHub display
- Returns:
True if dataset is valid
- Return type:
bool
- Raises:
SystemExit – If dataset is invalid
- to_rich()
Generate rich output table for console display
- Returns:
True if dataset is valid
- Return type:
bool
- Raises:
SystemExit – If dataset is invalid
- validate_schema() bool
Validate dataset against JSON schema
- Returns:
True if dataset is valid, False otherwise
- Return type:
bool
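A minimal sketch of how the methods documented above might be combined. The helper `run_checks` is hypothetical; it only calls methods listed in this reference and accepts any object that provides them, so it can be exercised without real files. The file paths in the commented usage are placeholders.

```python
def run_checks(validator):
    """Run the documented checks on a validator-like object and collect the results."""
    return {
        "schema": validator.validate_schema(),
        "duplicate_rows": validator.check_duplicate_rows(),
        "columns": validator.check_columns(),
    }


# Usage with amdirt (requires the package and real files; paths are placeholders):
# from amdirt.validate.application import AMDirValidator
# validator = AMDirValidator("schema.json", "samples.tsv")
# results = run_checks(validator)
# validator.to_rich()  # prints the report; documented to raise SystemExit if invalid
```

Each value in the returned dictionary is the boolean documented for the corresponding method, so `all(results.values())` is `True` only when every check passed.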
Dataset conversion
- amdirt.convert.run_convert(samples, libraries, table_name, tables=None, output='.', bibliography=False, librarymetadata=False, curl=False, aspera=False, eager=False, fetchngs=False, sratoolkit=False, ameta=False, taxprofiler=False, mag=False, verbose=False)[source]
Run the amdirt conversion application to input samplesheet tables for different pipelines
- Parameters:
samples (str) – Path to AncientMetagenomeDir filtered samples tsv file
libraries (str) – Optional path to AncientMetagenomeDir pre-filtered libraries tsv file
table_name (str) – Name of the table to convert
tables (str) – Path to JSON file listing tables
output (str) – Path to output table. Defaults to “.”
Dataset viewing/filtering
Autofill
- amdirt.autofill.run_autofill(accession, table_name=None, schema=None, dataset=None, sample_output=None, library_output=None, verbose=False, output_ena_table=None)[source]
Autofill the metadata of a table from ENA
- Parameters:
accession (tuple(str)) – ENA project accession. Multiple accessions can be space separated (e.g. PRJNA123 PRJNA456)
table_name (str) – Name of the table to be filled
schema (str) – Path to the schema file
dataset (str) – Path to the dataset file
sample_output (str) – Path to the sample output table file
library_output (str) – Path to the library output table file
- Returns:
ENA metadata run level table
- Return type:
pd.DataFrame
Merge
- amdirt.merge.merge_new_df(dataset, table_type, table_name, markdown, outdir, verbose, schema_check=True, line_dup=True, columns=True)[source]
Merge a new dataset with the remote master dataset
- Parameters:
dataset (Path) – Path to new dataset
table_type (str) – Type of table to merge (samples or libraries)
table_name (str) – Kind of table to merge (e.g. ancientmetagenome-hostassociated, ancientmetagenome-environmental, etc.)
markdown (bool) – Log in markdown format
outdir (Path) – Path to output directory
verbose (bool) – Enable verbose mode
schema_check (bool, optional) – Enable schema check. Defaults to True.
line_dup (bool, optional) – Enable line duplication check. Defaults to True.
columns (bool, optional) – Enable columns presence/absence check. Defaults to True.
- Raises:
ValueError – Table type must be either ‘samples’ or ‘libraries’
ValueError – Table name not found in AncientMetagenomeDir file
DatasetValidationError – New dataset is not valid
ENA API
- class amdirt.core.ena.ENAPortalAPI[source]
Bases:
ENA
Class to interact with the ENA Portal API
- __get_json__(url: str) List[Dict]
Get json content from URL
- Parameters:
url (str) – URL to get json content from
- Returns:
json content
- Return type:
List[Dict]
- __repr__() str
Display URL of API documentation
- __weakref__
list of weak references to the object
- doc(dir: str = '.') None
Get PDF documentation for API
- Parameters:
dir (str) – path to output PDF directory
- list_fields(result_type: str) None [source]
Display list of available fields
- Parameters:
result_type (str) – A result is a set of data that can be searched against and returned
- Returns:
list of available fields
- Return type:
List
- list_results() None [source]
Display list of available results
- Returns:
list of available results
- Return type:
List[Dict]
- query(accession: str, result_type: str = 'read_run', fields: List = ['run_accession', 'sample_accession', 'fastq_ftp', 'fastq_md5', 'fastq_bytes']) dict [source]
Generate list of runs metadata for a study accession
- Parameters:
accession (str) – ENA accession
result_type (str) – A result is a set of data that can be searched against and returned
fields (List) – list of fields to return
- Returns:
run_accession as keys, and metadata as values
- Return type:
dict
- status() bool
Check if API is up
- Returns:
True if API is up, False otherwise
- Return type:
bool
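The shape of a `query` request and of its documented return value (run_accession as keys, metadata as values) can be sketched without any network access. This assumes the public ENA Portal API `filereport` endpoint; whether amdirt uses exactly this endpoint and these parameters is an assumption, and `build_query_url` and `index_by_run` are hypothetical helpers.

```python
from urllib.parse import urlencode

# Assumed endpoint of the public ENA Portal API (not confirmed by this reference).
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"

DEFAULT_FIELDS = (
    "run_accession",
    "sample_accession",
    "fastq_ftp",
    "fastq_md5",
    "fastq_bytes",
)


def build_query_url(accession, result_type="read_run", fields=DEFAULT_FIELDS):
    """Build a file report URL for one accession, requesting JSON output."""
    params = {
        "accession": accession,
        "result": result_type,
        "fields": ",".join(fields),
        "format": "json",
    }
    return f"{ENA_FILEREPORT}?{urlencode(params)}"


def index_by_run(records):
    """Turn a list of per-run records into a dict keyed by run_accession."""
    return {record["run_accession"]: record for record in records}


# Made-up records standing in for a parsed JSON response.
records = [
    {"run_accession": "RUN1", "sample_accession": "SAMPLE1"},
    {"run_accession": "RUN2", "sample_accession": "SAMPLE1"},
]

print(build_query_url("PRJNA123"))
print(sorted(index_by_run(records)))
```

A tuple is used for the default fields here to avoid a mutable default argument; the fields themselves mirror the defaults in the `query` signature.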
- class amdirt.core.ena.ENABrowserAPI[source]
Bases:
ENA
Class to interact with the ENA Browser API
- __get_json__(url: str) List[Dict]
Get json content from URL
- Parameters:
url (str) – URL to get json content from
- Returns:
json content
- Return type:
List[Dict]
- __repr__() str
Display URL of API documentation
- __weakref__
list of weak references to the object
- doc(dir: str = '.') None
Get PDF documentation for API
- Parameters:
dir (str) – path to output PDF directory
- status() bool
Check if API is up
- Returns:
True if API is up, False otherwise
- Return type:
bool