Software

  • Topic Ensembles – A Python reference implementation of methods for stable ensemble topic modeling with Non-negative Matrix Factorization.
  • Dynamic Topic Modeling – A Python implementation of a new approach for Dynamic Topic Modeling via Non-negative Matrix Factorization.
  • Topic Stability – A Python implementation of an algorithm that uses stability analysis to select the number of topics for topic modeling.
  • Unified Graph – A Python implementation of an approach for producing a unified graph from multiple views of a social network.
  • Dynamic Community Finding – A C++ reference implementation of an algorithm for dynamic community tracking, published at ASONAM 2010.

Datasets

  • Stability Topic Corpora – Text corpora for benchmarking stability analysis in topic modeling.
  • Multi-View Twitter Datasets – Four pre-processed Twitter datasets, used for evaluating multi-view network analysis methods.
  • News Curation Datasets – A collection of pre-processed Twitter datasets for evaluating criteria for Twitter user list curation.
  • Irish Blog Network – Text and network data originating from a study of the state of the Irish blogosphere in 2011.
  • Irish Economic Sentiment Collection – A sentiment analysis text corpus, compiled from articles published in three Irish online news sources in 2009.
  • 3Sources Collection – A multi-view text corpus, constructed from news articles from three online news services.
  • Dynamic Text Datasets – Two datasets for evaluating dynamic clustering algorithms, originating from news articles and social bookmarking data.
  • Synthetic Multi-view Datasets – A set of synthetic text datasets for the evaluation of multi-view learning algorithms.
  • CBR Conference Series Dataset – Network and text data constructed from the publications of the CBR conference series (1993-2008).
  • BBC Datasets – Two text corpora consisting of news articles, particularly suited to evaluating cluster analysis techniques.
  • 20 Newsgroups Subsets – A large number of artificially constructed text datasets, originating from the popular 20 Newsgroups corpus.

Slides

  • Tutorial on Topic modelling with Scikit-learn, presented at PyData Dublin (September 2017) [PDF] [Code]
  • Tutorial on Practical Social Network Analysis with Gephi (June 2014) [PDF]
  • Stability Analysis for Topic Models (May 2014) [PDF]