PIANOROLL-EVENT: A NOVEL SCORE REPRESENTATION FOR SYMBOLIC MUSIC

Lekai Qian^*, Haoyu Gu^*, Dehan Li^*, Boyu Cao, Qi Liu^†

South China University of Technology

^* Indicates Equal Contribution, ^† Indicates Corresponding Author

Abstract

Symbolic music representation is a fundamental challenge in computational musicology. While grid-based representations effectively preserve pitch-time spatial correspondence, their inherent data sparsity leads to low encoding efficiency. Discrete-event representations achieve compact encoding but fail to adequately capture structural invariance and spatial locality. To address these complementary limitations, we propose Pianoroll-Event, a novel encoding scheme that describes pianoroll representations through events, combining structural properties with encoding efficiency while maintaining temporal dependencies and local spatial patterns. Specifically, we design four complementary event types: Frame Events for temporal boundaries, Gap Events for sparse regions, Pattern Events for note patterns, and Musical Structure Events for musical metadata. Pianoroll-Event strikes an effective balance between sequence length and vocabulary size, improving encoding efficiency by 1.36× to 7.16× over representative discrete sequence methods. Experiments across multiple autoregressive architectures show that models using our representation consistently outperform baselines in both quantitative and human evaluations.

Pianoroll-Event Data Representation

The process of converting pianoroll representation into pianoroll-events. Through frame segmentation, partitioning, and compression operations, the pianoroll is transformed into a sequence of pianoroll-events containing diverse event types.
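To make the pipeline above more concrete, here is a minimal Python sketch of how a binary pianoroll might be turned into an event sequence. It is an illustration only, not the authors' implementation: the function name `encode_pianoroll`, the fixed `frame_len` and `block` sizes, the event tuple layout, the treatment of a Gap Event as a run of empty pitch blocks, and the decision to drop trailing gaps are all assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical event layout: (event type, payload).
Event = Tuple[str, object]


def encode_pianoroll(
    roll: List[List[int]],
    frame_len: int = 4,    # assumed number of time steps per frame
    block: int = 4,        # assumed number of pitches per partition
    time_sig: str = "4/4",
    bpm: int = 120,
) -> List[Event]:
    """Encode roll[t][p] (1 = pitch p sounds at time step t) as events."""
    # Musical Structure Events carry metadata such as time signature and BPM.
    events: List[Event] = [
        ("Structure", f"TimeSig_{time_sig}"),
        ("Structure", f"BPM_{bpm}"),
    ]
    n_pitch = len(roll[0]) if roll else 0

    # Frame segmentation: cut the time axis into fixed-length frames.
    for start in range(0, len(roll), frame_len):
        frame = roll[start:start + frame_len]
        events.append(("Frame", start // frame_len))

        gap = 0  # count of consecutive empty pitch blocks
        # Partitioning: split the pitch axis of the frame into blocks.
        for lo in range(0, n_pitch, block):
            hi = min(lo + block, n_pitch)
            bits = tuple(row[p] for row in frame for p in range(lo, hi))
            if not any(bits):
                gap += 1  # compression: empty blocks are only counted
                continue
            if gap:
                events.append(("Gap", gap))
                gap = 0
            events.append(("Pattern", bits))
        # Trailing empty blocks are dropped in this sketch, since the
        # next Frame event already marks the boundary.
    return events


if __name__ == "__main__":
    # 4 time steps x 8 pitches, one note at pitch 5 on the first step.
    demo = [[0] * 8 for _ in range(4)]
    demo[0][5] = 1
    for event in encode_pianoroll(demo):
        print(event)
```

In this toy version a sparse frame costs one Frame token plus at most a few Gap and Pattern tokens, which is the intuition behind describing a grid through events. A practical vocabulary would map each distinct pattern to a token ID rather than storing raw bit tuples.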

Encoding Efficiency Comparison

Method	Length (Avg.)	Vocabulary Size
Ours	749.8	347
REMI	1339.7	330
MIDILike	1398.9	448
REMI-BPE	317.8	20,000
ABCNotation	2575.0	128

Table 1. Encoding comparison across different representation methods, demonstrating the effectiveness of our approach in balancing sequence length and vocabulary size.
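As a quick way to read Table 1, the short Python sketch below computes how much longer or shorter each method's average sequence is relative to ours. Note that this plain length ratio is an assumption chosen for illustration; the page does not define the efficiency metric behind the 1.36× to 7.16× figure in the abstract, so these ratios should not be expected to reproduce it. The names `TABLE_1` and `length_ratio` are hypothetical.

```python
# Values copied from Table 1: method -> (average length, vocabulary size).
TABLE_1 = {
    "Ours": (749.8, 347),
    "REMI": (1339.7, 330),
    "MIDILike": (1398.9, 448),
    "REMI-BPE": (317.8, 20000),
    "ABCNotation": (2575.0, 128),
}


def length_ratio(method: str, reference: str = "Ours") -> float:
    """Average sequence length of `method` divided by that of `reference`."""
    return TABLE_1[method][0] / TABLE_1[reference][0]


if __name__ == "__main__":
    for name, (_, vocab) in TABLE_1.items():
        print(f"{name:12s} length ratio {length_ratio(name):.2f}  vocab {vocab}")
```

A ratio above 1 means the method needs longer sequences than ours on average. REMI-BPE is the one method with a ratio below 1, but it pays for that with a vocabulary of 20,000 tokens, which is the trade-off the table is meant to highlight.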

Generated Songs

The following musical pieces are generated by our best-performing Transformer decoder model with Llama architecture, conditioned only on time signature and BPM. The results demonstrate the considerable potential of our representation method.

MIDI Demo

LSTM

GPT SMALL

GPT LARGE

LLAMA

BibTeX

@article{YourPaperKey2026,
  title={PIANOROLL-EVENT: A NOVEL SCORE REPRESENTATION FOR SYMBOLIC MUSIC},
  author={Lekai Qian and Haoyu Gu and Dehan Li and Boyu Cao and Qi Liu},
  journal={...}, 
  year={2026}, 
  url={...} 
}
