UNIDACT - Universal Data Compression with Circular Context Trees

Start: 01/01/2025
End: 30/06/2026
Funding: Catalan
Status: Ongoing
Research unit: Adaptive Processing Technologies (ADAPT)
Acronym: UNIDACT
Call ID: ERC-2023-POC
Code: 101158232

UNIDACT is a universal data compression algorithm based on circular context trees. It was originally developed for binary sources within the ITUL project, funded by an ERC Consolidator Grant. The initial implementation, tested on a wide range of simulated data, showed an average improvement of at least 82% over commercial algorithms such as Lempel-Ziv (ZIP) and Burrows-Wheeler transform compression (BZIP), as well as over more complex state-of-the-art algorithms such as prediction by partial matching (PPM) and context tree weighting (CTW). The proposed algorithm has linear complexity. The goal of this proof-of-concept project is to extend the current binary implementation to arbitrary alphabets, provide fast encoder and decoder implementations, develop specific applications for compressing satellite observation data and genomic data, develop an indexed version of the algorithm that gives access to part of the data without full decompression, and investigate software licensing options and opportunities.
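
For readers unfamiliar with context-based compression of binary sources, the general idea behind methods such as CTW can be sketched as follows. This is a minimal illustration only and is not the UNIDACT algorithm, whose circular context tree construction is not described on this page. It assumes a fixed context length and a Krichevsky-Trofimov (KT) estimator for each context; the function name `kt_code_length` and the `order` parameter are hypothetical.

```python
import math


def kt_code_length(bits, order=2):
    """Ideal code length, in bits, of a binary sequence under a
    fixed-order context model with KT estimators.

    Illustrative assumption: each context is simply the previous
    `order` bits. This is NOT the circular context tree used by UNIDACT.
    """
    counts = {}  # context (tuple of past bits) -> [count of 0s, count of 1s]
    total = 0.0
    for i, b in enumerate(bits):
        # The context is the (up to) `order` bits preceding position i.
        context = tuple(bits[max(0, i - order):i])
        c = counts.setdefault(context, [0, 0])
        # KT estimate of the probability that the next bit equals b.
        p = (c[b] + 0.5) / (c[0] + c[1] + 1.0)
        # An ideal arithmetic coder spends about -log2(p) bits on this symbol.
        total += -math.log2(p)
        c[b] += 1
    return total
```

Calling `kt_code_length([0, 1] * 100, order=1)` on a strictly alternating sequence yields a code length far below the 200 raw bits, whereas `order=0` cannot exploit the pattern and needs at least 200 bits. The gap shows why the choice of context structure matters for this family of compressors.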

Mehdi Dabirnia
PI/Project Leader
UPC-UNIVERSITAT POLITÈCNICA DE CATALUNYA
Coordinator