Mining patterns and structure from "the greatest composer to have ever lived"
"Pattern Mining" is the automated process of structure discovery in a data corpus. Our research into the mathematics of pattern mining has yielded a broad range of powerful techniques that can be applied in a largely domain-agnostic way. Here, we present a pattern-mined Bach fugue. The fugue is a highly structured, geometric musical form popular in the Baroque era of classical music, making it an interesting and productive, if non-standard, candidate for testing our methods.
The fragments identified by our algorithm can be regarded as privileged atomic structures that stably repeat across the composition. They were discovered via fully automated means, with no human hand-coding or intervention. Clicking on a pattern in the piano roll plays it back. Interestingly, the identified patterns correlate strongly with the salient melodic themes that a listener attends to in ordinary listening. The algorithm also discovers hierarchical or containment relations between patterns.
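As a rough illustration of what "repeating fragments" and "containment relations" can mean computationally, the sketch below mines repeated interval sequences from a toy melody. It is not the algorithm behind the demo above, which is not described here. It is a simple n-gram stand-in, and the names `repeated_interval_patterns` and `contains`, the MIDI pitch encoding, and the toy melody are all assumptions made for the example.

```python
from collections import defaultdict

def repeated_interval_patterns(pitches, min_len=2, min_count=2):
    """Return {interval_tuple: [start_indices]} for patterns that repeat.

    Hypothetical helper: a plain n-gram count, not the authors' method.
    """
    # Intervals between consecutive notes make patterns transposition-invariant.
    intervals = [b - a for a, b in zip(pitches, pitches[1:])]
    found = {}
    for n in range(min_len, len(intervals) + 1):
        occurrences = defaultdict(list)
        for i in range(len(intervals) - n + 1):
            occurrences[tuple(intervals[i:i + n])].append(i)
        repeated = {p: s for p, s in occurrences.items() if len(s) >= min_count}
        if not repeated:
            # If nothing of length n repeats, nothing longer can repeat either.
            break
        found.update(repeated)
    return found

def contains(longer, shorter):
    """True if `shorter` occurs as a contiguous run inside `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))

# Toy melody (assumed MIDI pitches): C D E C, then the same shape on G.
melody = [60, 62, 64, 60, 67, 69, 71, 67]
patterns = repeated_interval_patterns(melody)

# Containment relations: which longer patterns include which shorter ones.
hierarchy = [(big, small) for big in patterns for small in patterns
             if len(big) > len(small) and contains(big, small)]
```

Here the three-interval motif is found at both of its occurrences, and the two shorter patterns are reported as contained in it, which is the kind of hierarchy the paragraph above refers to.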
Due to the generality of their mathematical formulation, our techniques are domain-agnostic. We believe future directions for research include:
- Code analysis: Functions formally relate to the tokens they contain. The present methods can reveal which functions share implementation patterns, identify refactoring opportunities, and expose architectural structure.
- Biological sequences: Proteins relate to the structural motifs they contain. Protein families may be organised by shared structure, revealing evolutionary relationships.
- Knowledge graphs: Entities are objects, relations are attributes. Our methods reveal implicit type hierarchies and structural regularities.
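The object/attribute framing in the list above can be made concrete with a small binary relation. The sketch below is an assumption about how such data might be laid out, not a description of our methods: the function names, tokens, and the helper `shared_attributes` are hypothetical, and the grouping is a simple pairwise intersection in the spirit of formal concept analysis.

```python
from itertools import combinations

# Hypothetical toy data: each object (a function) maps to its attributes (tokens).
context = {
    "load_csv":  {"open", "read", "split", "close"},
    "load_json": {"open", "read", "parse", "close"},
    "save_csv":  {"open", "write", "join", "close"},
}

def shared_attributes(context, min_shared=2):
    """Map each shared attribute set to all objects that contain it."""
    groups = {}
    for a, b in combinations(sorted(context), 2):
        common = frozenset(context[a] & context[b])
        if len(common) >= min_shared:
            # Collect every object whose attributes include the shared set.
            groups[common] = sorted(
                obj for obj, attrs in context.items() if common <= attrs
            )
    return groups

groups = shared_attributes(context)
for attrs, objs in groups.items():
    print(sorted(attrs), "->", objs)
```

In this toy case the set {open, close} is shared by all three functions, while {open, read, close} is shared only by the two loaders. The smaller attribute set with the larger object group sits above the other, giving a small implicit hierarchy of the kind described for code and knowledge graphs.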
Extensions to Neural Systems
Our mathematics extends naturally into neural paradigms. Any system that learns representations produces internal structures that can be mined. Our techniques seek to bridge subsymbolic learning and symbolic, algebraic structure, offering a path toward neural systems whose learned concepts are inspectable, composable, and verifiable.