validate
What
The purpose of the validate
command is to check that an AncientMetagenomeDir metadata file confirms the specifications of the project.
When
In most cases, validate
is run for you via automatic Pull Request checks on the AncientMetagenomeDir GitHub repository.
However in some cases, you may wish to run this manually yourself before opening your pull request.
How
Before you run validate
, you normally need a local copy of AncientMetagenomeDir, but primarily the AncientMetagenomeDir .tsv
file, and the corresponding .json
schema file.
To run, execute the following command:
amdirt validate ancientmetagenome-hostassociated_samples.tsv ancientmetagenome-hostassociated_samples_schema.json
With the .tsv
file in the first positional argument, and the .json
schema in the second position.
validate
can be run on both sample and library tables.
By default, no schema validation checks will be performed however it will check the files can be loaded as expected.
The tool offers different types of validations that are activated by options.
schema validation (
-s
): compares the TSV against the schemaduplicate line checking (
-d
): checks there are no duplicate linescolumn presence/absence checking (
-c
): checks that all expected columns are presentdoi duplication checking (
-i
): checks there are not the same DOI across two different publicationsENA accession validation (
-a
): checks the ENA accession code is valid and corresponds to the PRJ associated codemulti-value column checking: allows you to check there are no duplicates within a comma separate column with multiple values (e.g. sample accessions)
You can also change the output console formatting from a python-rich format (default) to a markdown formatting with (-m
). The markdown formatting is primarily for the AncientMetagenomeDir automatic checks on GitHub.
For example, to check all columns are present, there are no duplicate lines, and the table matches the schema you can run
amdirt validate -s -d ancientmetagenome-hostassociated_samples.tsv ancientmetagenome-hostassociated_samples_schema.json
If the table is valid you will get a corresponding message as follows:
2022-12-16 11:15:41.109 INFO amdirt: ancientmetagenome-hostassociated_samples.tsv is valid
If the table is invalid, you will get a table containing a list of the found errors:
amdirt Validation Report of ancientmetagenome-hostassociated_samples.tsv against ancientmetagenome-hostassociated_samples_schema.json
┏━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Error ┃ Source ┃ Column ┃ Row ┃ Message ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Schema Validation Error │ Homo saens │ sample_host │ 3 │ 'Homo saens' is not an accepted value. │
│ │ │ │ │ Please check │
│ │ │ │ │ https://spaam-community.github.io/AncientMetagenomeDir/assets/enums/sample_host.json │
│ Schema Validation Error │ -1243.386 │ latitude │ 5 │ -1243.386 is less than the minimum of -90 │
└─────────────────────────┴────────────┴─────────────┴─────┴───────────────────────────────────────────────────────────────────────────────────────────────┘
Invalid dataset ancientmetagenome-hostassociated_samples.tsv
Where Source
is the problematic cell entry, the Column
and Row
where the error is found, and the reason why it is problematic in the Message
.
⚠️ The row index includes the header! For example row 3 correpsonds to the the second sample row in the table.
You can then use this information to correct your table before opening your GitHub pull request to the AncientMetagenomeDir repository.