SPIRE 2022

29th International Symposium on String Processing and Information Retrieval

November 8-10th, 2022, Concepción-Chile

Welcome to SPIRE 2022

SPIRE 2022 is the 29th edition of the annual Symposium on String Processing and Information Retrieval. SPIRE has its origins in the South American Workshop on String Processing, which was first held in Belo Horizonte, Brazil, in 1993. Since 1998, the focus of the workshop has also included information retrieval due to its increasing relevance to and inter-relationship with string processing.

SPIRE 2022 will be held in the city of Concepción in Chile for the first time.

The conference will be hybrid, but it can change to a online format considering the pandemic situation in November.

A Best Paper Award (1000EUR prize supported by Springer Springer) will be given to the author(s) of the most outstanding work presented at the conference.

For faster updates, follow our twitter account

Traveling to Chile: Please, consider the following official instructions before preparing your trip.

Call for papers

Important dates

  • Abstract deadline [EXTENDED]: June 17th, 2022 (anywhere on Earth) [No longer needed]
  • Strong paper deadline [EXTENDED]: June 24th, 2022 (anywhere on Earth)
  • Notification: August 19th, 2022
  • Camera-ready due: August 29th, 2020
  • Early bird registration: Until October 5th, 2022
  • Author registration: Until September 9th, 2022
  • Main conference: November 8-10th, 2022
  • The Workshop on Compression, Text and Algorithms (WCTA): November 11th.

Scope

SPIRE 2022 covers research in all aspects of string processing, information retrieval, computational biology, and related applications. Typical topics of interest include (but are not limited to):

  • String Processing: ​ string pattern matching, text indexing, data structures for string processing, text compression, compressed data structures, compressed string processing, text mining, 2D pattern matching, automata based string processing.
  • Information Retrieval (IR): retrieval models, indexing, evaluation, algorithms and data structures for IR, efficient implementation of IR systems, interface design, text classification and clustering, text analysis and mining, collaborative and content-based filtering, topic modeling for IR, search tasks (Web search, enterprise search, desktop search, legal search, cross-lingual retrieval, federated search, (micro) blog search, XML retrieval, multimedia retrieval), digital libraries.
  • Computational Biology: high-throughput DNA sequencing (assembly, read alignment, read error correction, metagenomics, transcriptomics, proteomics), evolution and phylogenetics, gene and regulatory element recognition, motif finding, protein structure prediction.

Submission

SPIRE 2022 invites submissions in two catogies:

  • Long papers: Up to 12 pages, excluding references and optional appendices.
  • Short papers: Up to 6 pages, excluding references and optional appendices.

The reviewing process will be single-blind, namely each submission should be non-anonymous.

The submission server will be EasyChair.

As in past editions, the proceedings of SPIRE 2022 will be published by Springer in the Lecture Notes in Computer Science (LNCS) series LNCS [Springer]. The use of either LaTeX or Word LNCS templates is mandatory. Suitable templates are available at the Springer Website and at Overleaf. Do not change the margin size or the font, do not make a separate title page, etc.: use the LNCS style file as given. Simultaneous submission to other conferences with published proceedings is not permitted.

Publishing process

Licence to publish proceedings papers: download

Committees

Program Committee Chairs

  • Diego Arroyuelo, Universidad Técnica Federico Santa María and Millennium Institute for Foundational Research on Data
  • Barbara Poblete, University of Chile and Amazon

Program Committee

  • Amihood Amir, Bar-Ilan University
  • Diego Arroyuelo, Universidad Técnica Federico Santa María and Millennium Institute for Foundational Research on Data
  • Ricardo Baeza-Yates, Northeastern University, Universitat Pompeu Fabra and University of Chile
  • Hideo Bannai, Tokyo Medical and Dental University
  • Altigran da Silva, Universidade Federal do Amazonas
  • Antonio Fariña, University of A Coruña
  • Gabriele Fici, University of Palermo
  • Travis Gagie, Dalhousie University
  • Pawel Gawrychowski, University of Wroclaw
  • Marcos Goncalves, Federal University of Minas Gerais
  • Inge Li Gørtz, Technical University of Denmark
  • Meng He, Dalhousie University
  • Wing-Kai Hon, National Tsing Hua University
  • Shunsuke Inenaga, Kyushu University
  • Dominik Köppl, Tokyo Medical and Dental University
  • Thierry Lecroq, University of Rouen Normandy
  • Zsuzsanna Lipták, University of Verona
  • Felipe A. Louza, Universidade Federal de Uberlândia
  • Giovanni Manzini, University of Pisa
  • Joao Meidanis, University of Campinas and Scylla Bioinformatics
  • Alistair Moffat, The University of Melbourne
  • Viviane P. Moreira, Instituto de Informatica - UFRGS
  • Gonzalo Navarro, University of Chile and Millennium Institute for Foundational Research on Data
  • Nadia Pisanti, University of Pisa
  • Solon Pissis, Centrum Wiskunde & Informatica
  • Barbara Poblete, University of Chile and Amazon
  • Nicola Prezza, Ca' Foscari University of Venice
  • Simon Puglisi, University of Helsinki
  • Rajeev Raman, University of Leicester
  • Kunihiko Sadakane, The University of Tokyo
  • Srinivasa Rao Satti, Norwegian University of Science and Technology
  • Marinella Sciortino, University of Palermo
  • Diego Seco, University of A Coruña
  • Sharma V. Thankachan, University of Central Florida
  • Rossano Venturini, University of Pisa
  • Nivio Ziviani, Federal University of Minas Gerais

Organizing Committee Chairs

  • José Fuentes-Sepúlveda, University of Concepción and Millennium Institute for Foundational Research on Data
  • Cecilia Hernández, University of Concepción and Centre for Biotechnology and Bioengineering

Steering Committee

  • Ricardo Baeza-Yates, Northeastern University, Universitat Pompeu Fabra and University of Chile
  • Christina Boucher, University of Florida
  • Nieves R. Brisaboa, University of A Coruña
  • Thierry Lecroq, University of Rouen Normandy
  • Simon J. Puglisi, University of Helsinki
  • Berthier Ribeiro-Neto, Google Inc. and Federal University of Minas Gerais
  • Sharma Thankachan, University of Central Florida
  • Hélène Touzet (CNRS, Lille, France)
  • Nivio Ziviani, Universidade Federal Minas Gerais

Program

Coming Soon...

Accepted papers

  1. Panagiotis Charalampopoulos, Solon Pissis, Jakub Radoszewski, Wojciech Rytter, Tomasz Walen and Wiktor Zuba. Subsequence Covers of Words
  2. Amihood Amir, Eitan Kondratovsky, Gad M. Landau, Shoshana Marcus and Dina Sokol. Reconstructing Parameterized Strings from Parameterized Suffix and LCP Arrays
  3. Maik Fröbe, Christopher Akiki, Martin Potthast and Matthias Hagen. Zero Shot or Not? The Effects of Train-Test Leakage on Neural Retrieval Models
  4. Lucas P. Ramos, Felipe A. Louza and Guilherme P. Telles. Genome comparison on succinct colored de Bruijn graphs
  5. Rathish Das, Meng He, Eitan Kondratovsky, J. Ian Munro and Kaiyu Wu. Internal Masked Prefix Sums and Its Connection to Fully Internal Measurement Queries
  6. Guillaume Fertin, Géraldine Jean and Anthony Labarre. Sorting Genomes by Prefix Double-Cut-and-Joins
  7. Philip Bille, Inge Li Gørtz and Tord Stordalen. The Complexity of the Co-Occurrence Problem
  8. Travis Gagie, Sana Kashgouli and Ben Langmead. KATKA: A KRAKEN-like tool with k given at query time
  9. Daiki Hashimoto, Diptarama Hendrian, Dominik Köppl, Ryo Yoshinaka and Ayumi Shinohara. Computing the Parameterized Burrows-Wheeler Transform Online
  10. Golnaz Badkobeh, Alessandro De Luca, Gabriele Fici and Simon Puglisi. Maximal Closed Substrings
  11. Antonio Boffa, Paolo Ferragina, Francesco Tosoni and Giorgio Vinciguerra. Compressed String Dictionaries via Data-Aware Subtrie Compaction
  12. Travis Gagie. On representing the degree sequences of sublogarithmic-degree Wheeler graphs
  13. Christina Boucher, Dominik Köppl, Herman Perera and Massimiliano Rossi. Accessing the Suffix Array via φ-1-Forest
  14. Pawel Gawrychowski, Florin Manea and Stefan Siemer. Matching Patterns with Variables Under Edit Distance
  15. Panagiotis Charalampopoulos, Pawel Gawrychowski, Shay Mozes and Oren Weimann. On the Hardness of Computing the Edit Distance of Shallow Trees
  16. Florian Kurpicz. Engineering Compact Data Structures for Rank and Select Queries on Bit Vectors
  17. Gonzalo Navarro, Francisco Olivares and Cristian Urbina. Balancing Run-Length Straight-Line Programs
  18. Jannik Olbrich, Enno Ohlebusch and Thomas Büchler. On the Optimisation of the GSACA Suffix Array Construction Algorithm
  19. Akiyoshi Kawamoto and Tomohiro I. Substring Complexities on Run-length Compressed Strings
  20. Parisa Darbari, Daniel Gibney and Sharma V. Thankachan. Quantum Time Complexity and Algorithms for Pattern Matching on Labeled Graphs
  21. Diego Diaz, Simon Puglisi and Leena Salmela. Computing all-vs-all MEMs in run-length encoded collections of HiFi reads
  22. Laurentius Leonard, Shunsuke Inenaga, Hideo Bannai and Takuya Mieno. Online algorithms for finding distinct substrings with length and multiple prefix and suffix conditions
  23. Garance Gourdel, Anne Driemel, Pierre Peterlongo and Tatiana Starikovskaya. Pattern matching under DTW distance

Keynote speakers

Leena Salmela

Leena Salmela is a university lecturer in the Department of Computer Science at University of Helsinki, and research fellow at the Academy of Finland. Her research interests include genome assembly algorithms, de Bruijn graphs, and other algorithms for sequencing data. Software developed by her, such as the sequencing error correction program LoRDEC, has been used in numerous biological studies.

Leena Salmela

Talk: De Bruijn graphs: solving biological problems in small space

De Bruijn graphs have become a standard data structure in analysing sequencing data due to its ability to represent the information in a sequencing read set in small space. They represent the sequencing reads by the k-mers, i.e., substrings of length k occurring in the reads. Classically, the edges of a de Bruijn graph are defined to be the k-mers and the nodes are the k-1-length prefixes and suffixes of the k-mers. The construction of a de Bruijn graph starts by counting the k-mers occurring in the reads. Many good methods exist for extracting exact k-mers from read data and counting the number of their occurrences. However, sequencing read sets can contain a significant number of sequencing errors, which limits the usefulness of counting exact k-mers to short k-mers. Recently, we have developed methods for extracting longer k-mers from noisy data by using spaced seeds and strobemers.

De Bruijn graphs were originally introduced for solving the genome assembly problem, where the goal is to reconstruct the genome based on sequencing reads. In practice, genome assembly is solved with de Bruijn graphs by reporting unitigs, which are non-branching paths in the de Bruijn graphs. The choice of k is a crucial matter in de-Bruijn-graph-based genome assembly. A too small k will make the graph tangled, resulting in short unitigs, while a too large k will fragment the graph, again resulting in short unitigs. A variable-order de Bruijn graph, which represents de Bruijn graphs of all orders k in a single data structure, has been presented as a solution to the choice of k. However, it is not clear how the definition of unitigs can be extended to variable-order de Bruijn graphs.

In this talk, we present a robust definition of assembled sequences in variable-order de Bruijn graphs and an algorithm for enumerating them. Apart from genome assembly, de Bruijn graphs are used in many other problems such as sequencing error correction, reference free variant calling, indexing read sets, and so on. At the end of this talk, we will review some of these applications and their de-Bruijn-graph-based solutions.


Dominik Kempa

Dominik Kempa is an assistant professor in the Department of Computer Science at Stony Brook University. His research interests include parallel and external-memory algorithms, data compression, string algorithms, combinatorics on words, compressed data structures and bioinformatics.

Dominik Kempa

Talk: LZ-End Parsing: Upper Bounds and Algorithmic Techniques

Lempel-Ziv (LZ77) compression is the most commonly used lossless compression algorithm. The basic idea is to greedily break the input string into blocks (called phrases), every time forming as a phrase the longest prefix of the unprocessed part that has an earlier occurrence. In 2010, Kreft and Navarro introduced a variant of LZ77 called LZ-End, that additionally requires the previous occurrence of each phrase to end at the boundary of an already existing phrase. Due to its excellent practical performance as a compression algorithm and a compressed index, they conjectured that it achieves a compression that can be provably upper-bounded in terms of the LZ77 size. Despite the recent progress in understanding such relation for other compression algorithms (e.g., the run-length encoded Burrows-Wheeler transform), no such result is known for LZ-End. In this talk, we give an overview of the recent progress on the above problem. More precisely, we prove that for any string of length n, the number ze of phrases in the LZ-End parsing satisfies ze = O(z log2n), where z is the number of phrases in the LZ77 parsing. This is the first non-trivial upper bound on the size of LZ-End parsing in terms of LZ77, and it puts LZ-End among the strongest dictionary compressors. Using our techniques, we also derive bounds for other variants of LZ-End and with respect to other compression measures. Our second contribution is a data structure that implements random access queries to the text in O(ze) space and O(polylog n) time. This is the first linear-size structure on LZ-End that efficiently implements such queries. All previous data structures either incur a logarithmic penalty in the space or have slow queries. We also show how to extend these techniques to support longest-common-extension (LCE) queries. This work was carried out in collaboration with Barna Saha and was presented at the 2022 ACM-SIAM Symposium on Discrete Algorithms (SODA 2022).

Tutorial on Graph Databases

This tutorial will provide an introduction to graph databases, which have been gaining more and more attention in recent years, particularly for querying large-scale knowledge graphs. Graph databases pose several novel challenges in terms of modeling, querying, and implementing them on massive graphs. Though various well-known systems like Neo4j, Blazegraph, and Neptune, have already appeared, there are still many relevant research challenges around graph databases. An interesting recent development in this area has been the application of text indexing and compression techniques to optimize graph queries at large scale. The tutorial will be divided into three 1-hour sessions. The first session will introduce and motivate graph databases in the broader context of data management, discussing their advantages (and disadvantages) compared with relational databases, their data models, their key applications, providing also a short demo of graph database engines in use. The second session of the tutorial will focus on querying graph databases, introducing the key primitives used for querying graphs, along with some popular graph query languages, before concluding with some high-level challenges for evaluating graph queries in an efficient manner. Towards addressing these challenges, the third session of the tutorial will introduce indexes and algorithms that enable the efficient evaluation of queries over graph databases, starting with more traditional techniques inspired by relational databases, progressing towards more modern algorithms and indexes; in this part, we will see some surprising applications of text compression and indexing techniques for querying graph databases efficiently (in terms of both time and space).

Speakers

Aidan Hogan

Aidan Hogan is an Associate Professor at the Department of Computer Science, University of Chile, and an Associate Researcher at the Millennium Institute for Foundational Research on Data (IMFD). His research interests relate primarily to the Semantic Web, Databases and Information Extraction; he has published over one hundred peer-reviewed works on these topics. He has been invited as a lecturer to seven summer schools and he has co-organised three summer schools. He has given tutorials at ESWC, ISWC, WWW and HyperText. He is a sole author or lead author of three books; an Open Access version of the latest such book, entitled "Knowledge Graphs", is available online (https://kgbook.org). For further information, you can visit his homepage (http://aidanhogan.com/).

Aidan Hogan


Domagoj Vrgoč

Domagoj Vrgoč is an Associate Professor at the University of Zagreb, Croatia, at the Institute for Mathematical and Computational Engineering of the Pontificia Universidad Católica de Chile, and an Associate Researcher at the Millennium Institute for Foundational Research on Data (IMFD). His work mainly focuses on Graph Databases, Semantic Web, and Information Extraction, with works on these topics published in many prime venues in the area. He has served as the program committee member of conferences such as PODS, ICDT, The Web Conference, ISWC, IJCAI, amongst others. His main focus these days is on building a working graph database engine based on latest research findings. To find out more, you can check out https://github.com/MillenniumDB/MillenniumDB.

Domagoj Vrgoč

Registration

At least one author per accepted paper must register at the appropriate conference rate by September 9, 2022. Registration fees are:

Category Early bird (Until Oct. 5) Regular
Onsite author USD 320 USD 400
Online author USD 240 USD 300
Onsite general USD 240 USD 300
Onsite student USD 160 USD 200
Online no-author USD 0 USD 0

Note: In coordination with our workshop WCTA, we are providing free-registration scholarships for students. If you want to apply for them, fill the following scholarships form.

Online registration fee includes:

  • Attendance to SPIRE 2022 sessions, invited talks and tutorial
  • LNCS Proceedings (electronic version)

Additionally, onsite registration fee includes:

  • Conference material
  • Coffee breaks
  • Lunches
  • Social dinner and tourist event


Registration process

The registration process consists of four steps

  1. Registration form (mandatory for all)
  2. Complete the following form

  3. Payment
  4. We provide two payment options: online platform and bank transfer.

    • Online platform: The prices in the online platform are the equivalent prices in chilean pesos

      • Onsite author: Link
      • Online author: Link
      • Onsite general: Link
      • Onsite student: Link

    • Bank transfer: Please make a bank transfer, for the amount of your corresponding fee, to

      • Bank name: Banco Corpbanca
      • Swift code: CONBCLRM
      • Address: O’higgins 612, Concepción, Chile
      • Account number: 61-039881
      • Bank account holder: Universidad de Concepción - UdeC
      • RUT (ID): 81.494.400-K
      • Address: Edmundo Larenas 270 interior, Concepción, Chile
      • Contact e-mail: iit-capa@udec.cl
      • Reason of the payment: Registration SPIRE2022: Name of the participant -- affiliation (country)
      • Amount: your corresponding fee

      If necessary, consider:

      • Beneficiary Bank: Corpbanca, Santiago, Chile
      • Swift code: CONBCLRM
      • Intermediary Bank: Wells Fargo Bank, New York, USA
      • Swift code: PNBPUS3NNYC
      • ABA code: 026005092
      • Account number: 2000192290166

      Note 1: Please make sure that the bank transfer is free of charge for the beneficiary.

      Note 2: For bank transfers inside Chile, contact us at spire2022@inf.udec.cl.

  5. Send email and attach the payment confirmation receipt
  6. To complete the registration process, please send us an email to spire2022@inf.udec.cl, attaching your payment-confirmation/receipt (e.g. PDF). In the email, include the recipient name (person/institution) for who/which the invoice must be issued and the full address for the invoice (otherwise, the address provided in the registration form will be used)

    Note 1: All registrations will generate a legal invoice.

    Note 2: If you need to issue the invoice to the name of your institution, please contact us at spire2022@inf.udec.cl. More details are needed to generate the invoice.

  7. Confirmation email
  8. Once received your email, and as soon as we receive the payment in our bank account, we will send you an email to confirm that the registration completed successfully. Please note that this could take more than one week, so please be patient.

The venue

SPIRE 2022 will be held in the Faculty of Engineering of the Universidad de Concepción, located in the city of Concepción, Chile. Concepción is the second largest city in Chile and the Universidad de Concepción is the largest educational institution of the city, and one of the most traditional and prestigious universities of Chile. Its beautiful campus is the main touristic attraction of the city, with its “campanil” tower overlooking both the University and the city of Concepción. The University hosts more than 20 thousand students in its three campuses, and it has become a symbol of pride of the city.

Concepción is the second largest city in Chile, located next to the BioBio river and the Pacific ocean. Concepción is a vibrant city with many universities, shopping centers and it is close to many touristic areas.

Concepción is surrounded by beautiful nature and picturesque locations such as the Dichato beach (for swimming and seafood eating), Caleta Tumbes (for seafood restaurants), Chiflón del Diablo (former underground coal mine)

How to arrive at Concepción

From Santiago, you can reach Concepción either by flights (of 45 minutes of duration) or buses (around 6 hours of duration). Airlines that provide service to Concepción are LATAM, Sky and JetSmart. If you prefer to travel by bus, you first need to reach the Bus terminal in downtown Santiago, (also called “Terminal Alameda”) where many companies offer service to Concepción (we recommend either EME Bus or Pullman Tur).

How to arrive at University of Concepción

The University is located in the heart of Concepción. You can use google maps with the address “Edmundo Larenas 219, Concepción” from your hotel, to obtain the best route. From downtown Concepción, it is around 15-20 minutes walking to the Conference venue.

Weather in Concepción

You will enjoy a very comfortable dry summer weather in Concepción. The average temperature is around 20 degrees celsius (78 farenheit) and it is very uncommon for rain to occur. However, given the abundant sunlight, we recommend the use of sunscreen of factor 30 or higher.

Accomodations

Concepción has many hotels with a wide price range. Some of this hotels include:

Hotel Mercure

Hotel Araucano

Hotel Holiday Inn Express

Hotel Diego de Almagro

Hotel Terrano

Hotel Ibis

Hotel Aurelio

More about Concepción

History

SPIRE started as the South American Workshop on String Processing in Belo Horizonte in 1993. It skipped 1994, changed its name in 1998, left South America in 1999 when it went to Cancun, and started alternating between Latin America and Europe in 2000 when it went to A Coruña, a pattern that was briefly interrupted in 2008 and 2016 when it went to Melbourne and Beppu, respectively. The complete list of previous venues is as follows:

  • 28th SPIRE: Lille, France (online), October 2021
  • 27th SPIRE: Florida, United States (online), October 2020
  • 26th SPIRE: Segovia, Spain, October 2019
  • 25th SPIRE: Lima, Perú, October 2018
  • 24th SPIRE: Palermo, Italy, September 2017
  • 23rd SPIRE: Beppu, Japan, October 2016
  • 22nd SPIRE: London, UK, September 2015
  • 21st SPIRE: Ouro Preto, Brazil, October 2014
  • 20th SPIRE: Jerusalem, Israel, October 2013
  • 19th SPIRE: Cartagena, Columbia, October 2012
  • 18th SPIRE: Pisa, Italy, October 2011
  • 17th SPIRE: Los Cabos, Mexico, October 2010
  • 16th SPIRE: Saariselkä, Finland, August 2009
  • 15th SPIRE: Melbourne, Australia, November 2008
  • 14th SPIRE: Santiago, Chile, October 2007
  • 13th SPIRE: Glasgow, Scotland, October 2006
  • 12th SPIRE: Buenos Aires, Argentina, October 2005
  • 11th SPIRE: Padova, Italy, October 2004
  • 10th SPIRE: Manaus, Brazil, October 2003
  • 9th SPIRE: Lisbon, Portugal, September 2002
  • 8th SPIRE: Laguna de San Rafael, Chile, November 2001
  • 7th SPIRE: A Coruña, Spain, September 2000
  • 6th SPIRE: Cancun, Mexico, September 1999
  • 5th SPIRE: Santa Cruz, Bolivia, September 1998
  • 4th WSP: Valparaiso, Chile, 1997
  • 3rd WSP: Recife, Brazil, 1996
  • 2nd WSP: Valparaiso, Chile, 1995
  • 1st WSP: Belo Horizonte, Brazil, 1993