Metadata retrieval

chaininglib.search.metadata.get_available_metadata(resource_name, resource_type=None)

Return all available metadata fields for a lexicon or corpus.

Parameters:
  • resource_name – Name of the lexicon or corpus
  • resource_type – (optional) Either 'lexicon' or 'corpus'. Use it to disambiguate when the same resource name could refer to either a lexicon or a corpus.
Returns:

For a corpus, a dictionary with the keys 'document' and 'token', each mapping to a list of metadata fields. For a lexicon, a flat list of metadata fields.

>>> from chaininglib.search.metadata import get_available_metadata
>>> corpus_metadata = get_available_metadata("zeebrieven")
>>> print(corpus_metadata)
{'document': ['aantal_paginas', 'aantal_woorden', ..., 'witnessYear_from', 'witnessYear_to'], 'token': ['word', 'lemma', 'pos', 'punct', 'starttag']}
>>> lexicon_metadata = get_available_metadata("molex")
>>> print(lexicon_metadata)
['lemEntryId', 'lemma', 'lemPos', 'wordformId', 'wordform', 'hyphenation', 'wordformPos', 'Gender', 'Number']
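
Because the return value is a dictionary for a corpus and a list for a lexicon, calling code usually has to branch on the type. The following is a minimal sketch of one way to do that. The helper flatten_metadata_fields is hypothetical and not part of chaininglib, and the sample values are abbreviated by hand from the example output above rather than fetched from a live resource.

```python
from typing import Dict, List, Union

# The two return shapes described above: dict of lists (corpus) or list (lexicon).
Metadata = Union[Dict[str, List[str]], List[str]]


def flatten_metadata_fields(metadata: Metadata) -> List[str]:
    """Hypothetical helper (not part of chaininglib).

    Returns one flat list of field names for either return shape.
    Corpus fields are prefixed with their level ('document' or 'token').
    """
    if isinstance(metadata, dict):
        # Corpus shape: keep the level so 'document' and 'token' fields stay distinct.
        return [
            f"{level}.{field}"
            for level, fields in metadata.items()
            for field in fields
        ]
    # Lexicon shape: already a flat list; return a copy.
    return list(metadata)


# Hand-written sample values, abbreviated from the documented example output.
corpus_metadata = {
    "document": ["witnessYear_from", "witnessYear_to"],
    "token": ["word", "lemma", "pos"],
}
lexicon_metadata = ["lemEntryId", "lemma", "lemPos"]

print(flatten_metadata_fields(corpus_metadata))
# ['document.witnessYear_from', 'document.witnessYear_to', 'token.word', 'token.lemma', 'token.pos']
print(flatten_metadata_fields(lexicon_metadata))
# ['lemEntryId', 'lemma', 'lemPos']
```

In real use, the sample values would be replaced by the result of get_available_metadata(resource_name). Prefixing corpus fields with their level is only one possible design choice; it avoids collisions if a document field and a token field ever share a name.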