Hypernym Graph


About

The hypernym browser is a research project by Ian Dennis Miller, a PhD Candidate at the University of Toronto. A methods paper in the domain of computational psycholinguistics is forthcoming. Source code for the hypernym browser is available at https://projects.sisrlab.com/idm/hypernym

Hypernym is hosted by SISR Lab; SISR stands for Social in silico Research.

The following notes describe an older version of the Hypernym Graph, but most of them still apply.

Methods

  • we began with a simple list of words.
  • some words are near-synonyms, whereas others are near-antonyms.
  • we sought to visualize the semantic relatedness of these words.
  • my initial inspiration was a “word cloud”.
  • a word cloud consists of a list of words along with a numerical value for each word that determines how large that word is drawn.
  • typically, a word's frequency in a text corpus serves as that numerical value.
  • because we were starting with a simple list of words rather than a text corpus, there were no frequencies to scale by, so I took a different direction.
  • I used the Python Natural Language Toolkit (NLTK) as the computational environment for this work.
  • I based word relationships on the Princeton WordNet lexical database.
  • I used Python NetworkX to represent and manipulate the graph data structure.
  • for each word, I constructed a hypernym graph (a.k.a. network) that places the word as a descendant of the increasingly general semantic categories (its hypernyms) that contain it.
  • I then merged these networks into a single network.
  • not all words were related through the network; there were some “isolates.”
  • I extracted the largest connected component, i.e. the largest set of words that are all linked to one another through some path in the network.
  • I used Python NetworkX to export the graph using the GraphML file format.
  • I imported the GraphML file into Gephi, a network visualization environment.
  • I computed the network's modularity classes in Gephi; this is a kind of “neighbourhood detection” that partitions the network into groups of nodes that are more densely connected to each other than to the rest of the network.
  • I used the modularity classes to colour the nodes, so that more closely related words share the same colour.
  • I applied a “Force Atlas 2” layout with the following non-default options:
    • approximation: 1.5
    • scaling: 4.0
    • dissuade hubs: true
    • prevent overlap: true
  • I visualized node labels with Arial Normal 9pt font.
  • I exported the graph as a PDF file.
  • Result: depending on perspective, the root of the network is one of:
    • entity
    • psychological feature
    • abstraction
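The per-word hypernym graphs and the merge step can be sketched with NLTK and NetworkX roughly as follows. This is a minimal sketch, not the original code: it assumes edges run from the more general category to the more specific one, and that every noun sense of a word is included (the notes don't say which senses were used). The helper names `wordnet_paths`, `hypernym_graph`, and `merged_graph` are hypothetical. The WordNet lookup is kept separate so the graph logic can be tried on hand-written paths without the WordNet corpus installed.

```python
import networkx as nx


def wordnet_paths(word):
    """Hypernym paths (most general synset first) for every noun sense of `word`.

    Assumption: only noun senses are used. Requires NLTK plus its WordNet data
    (nltk.download("wordnet")); NLTK is imported here so the rest of this
    sketch runs without it.
    """
    from nltk.corpus import wordnet as wn

    return [
        [synset.name() for synset in path]
        for sense in wn.synsets(word, pos=wn.NOUN)
        for path in sense.hypernym_paths()
    ]


def hypernym_graph(word, paths):
    """Directed graph placing `word` beneath every category on its hypernym paths."""
    graph = nx.DiGraph()
    # Add the word up front so a word with no senses survives as an isolate.
    graph.add_node(word)
    for path in paths:
        # Each path ends at the word's own synset; append the word itself so
        # edges run root -> ... -> synset -> word.
        nx.add_path(graph, list(path) + [word])
    return graph


def merged_graph(paths_by_word):
    """Union of the per-word graphs; shared categories become shared nodes."""
    return nx.compose_all(
        hypernym_graph(word, paths) for word, paths in paths_by_word.items()
    )


# Hand-written toy paths standing in for WordNet output (not real lookups).
toy_paths = {
    "joy": [["entity", "abstraction", "feeling", "joy.n.01"]],
    "grief": [["entity", "abstraction", "feeling", "grief.n.01"]],
}
graph = merged_graph(toy_paths)
print(sorted(graph.successors("feeling")))  # ['grief.n.01', 'joy.n.01']
```

With NLTK and the WordNet data installed, `merged_graph({w: wordnet_paths(w) for w in words})` would build the combined network for a real word list.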
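Dropping the isolates, keeping the largest connected component, and writing GraphML for Gephi might look like this in NetworkX. It is a sketch under assumptions: for a directed hypernym graph it uses weak connectivity (edge direction ignored), which matches the idea of words being connected "somehow"; the helper `largest_component`, the toy graph, and the output filename are hypothetical.

```python
import os
import tempfile

import networkx as nx


def largest_component(graph):
    """Copy of the largest connected component; isolates and smaller islands are dropped.

    Directed graphs use weak connectivity, i.e. edge direction is ignored.
    """
    if graph.is_directed():
        components = nx.weakly_connected_components(graph)
    else:
        components = nx.connected_components(graph)
    return graph.subgraph(max(components, key=len)).copy()


# Toy graph: one hierarchy, an isolate, and a separate two-node island.
graph = nx.DiGraph([("entity", "abstraction"), ("abstraction", "joy"), ("entity", "object")])
graph.add_node("xyzzy")
graph.add_edge("act", "run")

core = largest_component(graph)

# GraphML keeps node ids and edge direction, and Gephi can import it directly.
out_path = os.path.join(tempfile.mkdtemp(), "hypernyms.graphml")
nx.write_graphml(core, out_path)
print(sorted(core.nodes))  # ['abstraction', 'entity', 'joy', 'object']
```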
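One way to make "depending on perspective" concrete, offered as an interpretation rather than something the notes spell out: if the top of the graph is a single-file chain of categories before it branches, every category on that chain sits above all of the words, so any of them could reasonably be called the root. The hypothetical helper `trunk` below walks down from the unique root while each node has exactly one child. The toy graph is hand-made, uses edges from general to specific, and mirrors WordNet's chain entity > abstraction > psychological feature.

```python
import networkx as nx


def trunk(graph):
    """Return the chain of nodes from the root down to the first branching node.

    Assumes an acyclic graph with edges pointing from general to specific and
    exactly one root (a node with no incoming edges); the unpacking below
    raises ValueError if there is not exactly one root.
    """
    (root,) = [node for node, degree in graph.in_degree() if degree == 0]
    chain = [root]
    # Keep descending while the current node has exactly one child.
    while graph.out_degree(chain[-1]) == 1:
        chain.append(next(iter(graph.successors(chain[-1]))))
    return chain


graph = nx.DiGraph([
    ("entity", "abstraction"),
    ("abstraction", "psychological_feature"),
    ("psychological_feature", "feeling"),
    ("psychological_feature", "cognition"),
    ("feeling", "joy"),
    ("cognition", "belief"),
])
print(trunk(graph))  # ['entity', 'abstraction', 'psychological_feature']
```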