NMF for Topic Modeling


Topic modeling is a key tool for discovering hidden structure in large collections of documents. Probabilistic methods, such as Latent Dirichlet Allocation (LDA), are often applied using tools such as the Java MALLET library. However, a highly effective alternative is to use Non-negative Matrix Factorization (NMF). NMF refers to an unsupervised family of [...]

Stability Analysis for Clustering


A question that frequently arises when applying unsupervised learning methods, such as cluster analysis or topic modeling, is “how many clusters are in my data set?” While domain knowledge can often help to narrow this choice down to a smaller range, choosing one or more specific values of the number of clusters k often presents [...]

Practical Social Network Analysis With Gephi


Recently I presented a tutorial at the VOX-Pol project’s inaugural Summer School in DCU, which covered practical analysis and visualisation of social networks. Since 2010, my application of choice for visualising networks has been the excellent open-source, Java-based Gephi platform, developed by Mathieu Bastian and his colleagues. The three-screen overview/tabular/preview interface fits well [...]

Exploring the Irish Blogosphere


In 2011, at the ICWSM conference in Barcelona, we presented the first quantitative analysis of the Irish blogosphere, working with Karen Wade from the Humanities Institute of Ireland (HII). Since then, there has been considerable change in the use of blogs, particularly with the rapidly increasing popularity of microblogging platforms such as Twitter. In September [...]

Finding Patterns in Movie Lists (Part 2)


In my previous post, I described the collection and initial characterisation of a new dataset, consisting of user-curated lists of movies originating from IMDb. Here I provide a more in-depth analysis of the data by applying techniques from social network analysis and bibliographic analysis to discover latent patterns of movies within the aggregated list [...]

Finding Patterns in Movie Lists (Part 1)


User content curation is becoming an important source of preference data, as well as providing information about the items being curated. One popular approach involves the creation of lists of items. This is supported on a range of sites, from lists of users on Twitter to lists of locations on Foursquare. In previous work, I [...]
