Class NameCentricSynonymIndexGenerator
- java.lang.Object
  - de.julielab.jules.ae.genemapping.resources.NameCentricSynonymIndexGenerator
public class NameCentricSynonymIndexGenerator extends java.lang.Object
Synonym- or gene-name-centric indexer, new as of March 11, 2019. The idea is to save storage and obtain more focused gene mention search results by grouping gene IDs under each possible synonym instead of indexing each synonym of each gene separately. Thus, each synonym is stored only once and references the list of genes it may refer to, which immediately shows the ambiguity of the synonym.
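The grouping idea described above can be sketched in plain Java. This is only an illustration, not the class's actual implementation: the tab-separated "synonym, gene ID" line format, the class name SynonymGroupingSketch, the method groupBySynonym and the gene IDs are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SynonymGroupingSketch {

    /**
     * Groups gene IDs by synonym. Each input line is ASSUMED to have the
     * form "synonym<TAB>geneId"; the real dictionary format may differ.
     */
    public static Map<String, List<String>> groupBySynonym(List<String> dictLines) {
        Map<String, List<String>> genesBySynonym = new LinkedHashMap<>();
        for (String line : dictLines) {
            String[] cols = line.split("\t");
            if (cols.length < 2) {
                continue; // skip lines without both columns
            }
            // Each synonym is stored once and collects all genes it may refer to.
            List<String> ids = genesBySynonym.computeIfAbsent(cols[0], k -> new ArrayList<>());
            if (!ids.contains(cols[1])) {
                ids.add(cols[1]);
            }
        }
        return genesBySynonym;
    }

    public static void main(String[] args) {
        // Illustrative, made-up gene IDs.
        List<String> lines = Arrays.asList("p53\t1001", "TP53\t1001", "p53\t1002");
        groupBySynonym(lines).forEach((synonym, ids) ->
                System.out.println(synonym + " -> " + ids));
    }
}
```

A synonym that maps to more than one gene ID, such as "p53" in the sample data, is ambiguous, and that is visible directly from the length of its list.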
Constructor Summary
Constructors
Constructor | Description
NameCentricSynonymIndexGenerator(java.io.File dictFile, java.io.File indexFile) |
Method Summary
Modifier and Type | Method | Description
void | createIndex() | Creates the synonym index.
static void | main(java.lang.String[] args) | To execute the ContextIndexGenerator, start it with the following command-line arguments: arg0: path to resources directory; arg1: path to synonym indices directory
Constructor Detail
NameCentricSynonymIndexGenerator
public NameCentricSynonymIndexGenerator(java.io.File dictFile, java.io.File indexFile) throws java.io.FileNotFoundException, java.io.IOException
Parameters:
dictFile - A file containing gene or protein names / synonyms and their respective NCBI Gene or UniProt ID. No term normalization is expected for this dictionary.
indexFile - The directory where the name / synonym index will be written to.
Throws:
java.io.FileNotFoundException
java.io.IOException
Method Detail
main
public static void main(java.lang.String[] args)
To execute the ContextIndexGenerator, start it with the following command-line arguments:
arg0: path to resources directory
arg1: path to synonym indices directory
Parameters:
args -
-
createIndex
public void createIndex() throws java.io.IOException
Creates the synonym index. Each unique synonym is indexed in a document of its own. For each gene that has the synonym, the document contains fields listing the gene ID, its tax ID (if the tax ID mapping is given) and the "priority" that the synonym has for the gene. The priority describes the reliability of the source that provided the synonym. Higher numbers mean a lower priority. The official gene symbol has priority -1.
Throws:
java.io.IOException
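The per-synonym document layout and the priority ordering described for createIndex() can be modelled with a small in-memory sketch. It does not use the project's actual index backend or field names; the class SynonymDocumentSketch, the GeneEntry type, the add and lookup methods and the sample IDs are hypothetical and serve only to show that a lower priority number means a more reliable entry, with -1 reserved for the official gene symbol.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SynonymDocumentSketch {

    /** One gene entry inside a synonym "document" (hypothetical structure). */
    public static final class GeneEntry {
        public final String geneId;
        public final String taxId;  // null when no tax ID mapping is given
        public final int priority;  // lower number = higher priority; -1 = official symbol

        GeneEntry(String geneId, String taxId, int priority) {
            this.geneId = geneId;
            this.taxId = taxId;
            this.priority = priority;
        }

        @Override
        public String toString() {
            return "gene=" + geneId + ", tax=" + taxId + ", priority=" + priority;
        }
    }

    // One "document" per unique synonym, holding all genes that carry it.
    private final Map<String, List<GeneEntry>> documents = new TreeMap<>();

    public void add(String synonym, String geneId, String taxId, int priority) {
        documents.computeIfAbsent(synonym, k -> new ArrayList<>())
                 .add(new GeneEntry(geneId, taxId, priority));
    }

    /** Returns the gene entries for a synonym, most reliable first. */
    public List<GeneEntry> lookup(String synonym) {
        List<GeneEntry> entries =
                new ArrayList<>(documents.getOrDefault(synonym, new ArrayList<>()));
        entries.sort(Comparator.comparingInt((GeneEntry e) -> e.priority));
        return entries;
    }

    public static void main(String[] args) {
        SynonymDocumentSketch index = new SynonymDocumentSketch();
        // Illustrative, made-up gene IDs; the priority values are arbitrary examples.
        index.add("p53", "1002", "10090", 3);
        index.add("p53", "1001", "9606", -1); // official symbol for this gene
        index.lookup("p53").forEach(System.out::println);
    }
}
```

Sorting ascending by priority puts the entry for which the synonym is the official symbol first, which is one plausible way a consumer of such an index could rank candidate genes.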