Derek Greene

Resources

Datasets

  • BBC Datasets – Two text corpora consisting of news articles, particularly suited to evaluating cluster analysis techniques.
  • Stability Topic Corpora – Text corpora for benchmarking stability analysis in topic modeling.
  • Multi-View Twitter Datasets – Four pre-processed Twitter datasets, used for evaluating multi-view network analysis methods.
  • News Curation Datasets – A collection of pre-processed Twitter datasets for evaluating criteria for Twitter user list curation.
  • Irish Blog Network – Text and network data originating from a study of the state of the Irish blogosphere in 2011.
  • Irish Economic Sentiment Collection – A sentiment analysis text corpus, compiled from articles published in three Irish online news sources in 2009.
  • 3Sources Collection – A multi-view text corpus, constructed from news articles from three online news services.
  • Dynamic Text Datasets – Two datasets for evaluating dynamic clustering algorithms, originating from news articles and social bookmarking data.
  • Synthetic Multi-view Datasets – A set of synthetic text datasets for the evaluation of multi-view learning algorithms.
  • CBR Conference Series Dataset – Network and text data constructed from the publications of the CBR conference series (1993-2008).
  • 20 Newsgroups Subsets – A large number of artificially constructed text datasets, originating from the popular 20 Newsgroups corpus.

Software

  • Curatr – A Python implementation of Curatr, an online platform that provides access to the British Library Digital Collection, developed as part of the VICTEUR Project.
  • Topic Ensembles – A Python reference implementation of methods for stable ensemble topic modeling with Non-negative Matrix Factorization.
  • Dynamic Topic Modeling – A Python implementation of a new approach for Dynamic Topic Modeling via Non-negative Matrix Factorization.
  • Topic Stability – A Python implementation of an algorithm that uses stability analysis to select the number of topics for topic modeling.
  • Unified Graph – A Python implementation of an approach for producing a unified graph from multiple views of a social network.
  • Dynamic Community Finding – A C++ reference implementation of an algorithm for dynamic community tracking, published at ASONAM 2010.

Slides

  • “Constructing Social Networks of Irish and British Fiction”, presented at Symposium on Digital Culture, Big Data and Society (February 2018) [PDF]
  • Tutorial on “Topic modelling with Scikit-learn”, presented at PyData Dublin (September 2017) [PDF] [Code]
  • Tutorial on “Practical Social Network Analysis with Gephi” (June 2014) [PDF]
  • “Stability Analysis for Topic Models” (May 2014) [PDF]