A graph mining demonstration on low-budget hardware

Below is a timed demonstration of mining large graphs with minimal resource consumption. At the core of this demonstration is networkdisk, an experimental custom Python module developed together with Bruno Guillon, which we hope to release soon.

This module provides a backend in SQLite for networkx.
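To make the idea of a disk-backed graph concrete, here is a minimal sketch that uses only Python's standard sqlite3 module. It is not the networkdisk API or its schema: the class name `SQLiteGraph` and the table `edges(source, target)` are hypothetical, chosen only for illustration.

```python
import sqlite3


class SQLiteGraph:
    """Tiny undirected graph whose edges live in an SQLite table.

    Hypothetical illustration only: not the networkdisk API or schema.
    """

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS edges (source TEXT, target TEXT)"
        )
        # The index lets neighbor lookups avoid a full table scan.
        self.conn.execute(
            "CREATE INDEX IF NOT EXISTS edges_source ON edges (source)"
        )

    def add_edge(self, u, v):
        # Store both directions so an undirected lookup is a single query.
        self.conn.executemany(
            "INSERT INTO edges VALUES (?, ?)", [(u, v), (v, u)]
        )

    def neighbors(self, node):
        # Rows are fetched lazily from the cursor as the caller iterates.
        cursor = self.conn.execute(
            "SELECT target FROM edges WHERE source = ?", (node,)
        )
        for (target,) in cursor:
            yield target


graph = SQLiteGraph()
graph.add_edge("A", "B")
graph.add_edge("A", "C")
print(sorted(graph.neighbors("A")))
```

Because neighbors are read through an indexed query, the Python process only holds the rows it is currently iterating over, which is the property that matters on a machine with little memory.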

The demonstration finds shortest paths in the (undirected) graph of links between English Wikipedia pages. The graph contains \(10^8\) edges and occupies 20 GB on disk (including extra information such as page titles and some indexes). Exploring a graph of this size usually requires a reasonably powerful machine. The operations below are performed on a Raspberry Pi 4 with almost no memory consumption and no parallelism. The graph can be downloaded as a SQLite database file here.

Below, we display the first path found during the exploration, as well as a concise DAG structure representing all the shortest paths. The explored size (the number of nodes seen during the exploration) and the total computation time are also displayed.
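The quantities described above (first path, DAG of all shortest paths, explored size, elapsed time) can be sketched with a plain breadth-first search. This is an illustrative sketch, not necessarily the algorithm the demonstration runs: the function name `shortest_path_dag` and the small in-memory `adjacency` dictionary are assumptions made for the example.

```python
import time
from collections import deque


def shortest_path_dag(neighbors, source, target):
    """Breadth-first search returning (first_path, dag_edges, explored_size).

    `neighbors` is any callable mapping a node to an iterable of adjacent nodes.
    """
    dist = {source: 0}
    preds = {source: []}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if target in dist and dist[node] >= dist[target]:
            break  # every shortest path to the target is already recorded
        for nxt in neighbors(node):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                preds[nxt] = [node]
                queue.append(nxt)
            elif dist[nxt] == dist[node] + 1:
                preds[nxt].append(node)  # another equally short way in
    if target not in dist:
        return None, set(), len(dist)
    # Walk back from the target to keep only edges lying on a shortest path.
    dag, stack, seen = set(), [target], {target}
    while stack:
        node = stack.pop()
        for pred in preds[node]:
            dag.add((pred, node))
            if pred not in seen:
                seen.add(pred)
                stack.append(pred)
    # The "first path" follows the first predecessor recorded for each node.
    path = [target]
    while path[-1] != source:
        path.append(preds[path[-1]][0])
    path.reverse()
    return path, dag, len(dist)


# Toy undirected graph standing in for the Wikipedia link graph.
adjacency = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}

start = time.perf_counter()
path, dag, explored = shortest_path_dag(adjacency.__getitem__, "A", "D")
elapsed = time.perf_counter() - start
print(path, sorted(dag), explored, f"{elapsed:.6f}s")
```

The search stops as soon as it dequeues a node that is at least as far from the source as the target, so nodes beyond that distance (here "E") are never explored.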

The drawing is performed by graphviz through the pygraphviz Python module.
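As a rough idea of what is handed to graphviz, the sketch below serializes a set of DAG edges into DOT text using only the standard library. The helper name `dag_to_dot`, the graph name, and the left-to-right layout are assumptions for illustration; the demonstration's actual drawing code is not shown here.

```python
def dag_to_dot(edges, name="shortest_paths"):
    """Serialize directed edges (pairs of node labels) as DOT text."""

    def quote(label):
        # Inside a double-quoted DOT string, a double quote must be escaped.
        return '"' + str(label).replace('"', '\\"') + '"'

    lines = [f"digraph {name} {{", "  rankdir=LR;"]
    for u, v in sorted(edges):
        lines.append(f"  {quote(u)} -> {quote(v)};")
    lines.append("}")
    return "\n".join(lines)


dot_text = dag_to_dot({("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")})
print(dot_text)
```

With pygraphviz installed, text like this could be loaded with `pygraphviz.AGraph(string=dot_text)` and rendered by calling `layout(prog="dot")` followed by `draw("paths.png")`, where the output file name is a placeholder.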

Autocompletion is not handled by the Pis and SQLite but by the server and a PostgreSQL database, chosen for its (simple) support of full-text search features.