29th International Symposium on String Processing and Information Retrieval
29th International Symposium on String Processing and Information Retrieval
SPIRE 2022 is the 29th edition of the annual Symposium on String Processing and Information Retrieval. SPIRE has its origins in the South American Workshop on String Processing, which was first held in Belo Horizonte, Brazil, in 1993. Since 1998, the focus of the workshop has also included information retrieval due to its increasing relevance to and inter-relationship with string processing.
SPIRE 2022 will be held in the city of Concepción in Chile for the first time.
The conference will be hybrid, but it can change to a online format considering the pandemic situation in November.
A Best Paper Award (1000EUR prize supported by Springer ) will be given to the author(s) of the most outstanding work presented at the conference.
For faster updates, follow our twitter account
Traveling to Chile: Please, consider the following official instructions before preparing your trip.
SPIRE 2022 invites submissions in two catogies:
The reviewing process will be single-blind, namely each submission should be non-anonymous.
The submission server will be EasyChair.
As in past editions, the proceedings of SPIRE 2022 will be published by Springer in the Lecture Notes in Computer Science (LNCS) series . The use of either LaTeX or Word LNCS templates is mandatory. Suitable templates are available at the Springer Website and at Overleaf. Do not change the margin size or the font, do not make a separate title page, etc.: use the LNCS style file as given. Simultaneous submission to other conferences with published proceedings is not permitted.
Licence to publish proceedings papers: download
Leena Salmela is a university lecturer in the Department of Computer Science at University of Helsinki, and research fellow at the Academy of Finland. Her research interests include genome assembly algorithms, de Bruijn graphs, and other algorithms for sequencing data. Software developed by her, such as the sequencing error correction program LoRDEC, has been used in numerous biological studies.
De Bruijn graphs have become a standard data structure in analysing sequencing data due to its ability to represent the information in a sequencing read set in small space. They represent the sequencing reads by the k-mers, i.e., substrings of length k occurring in the reads. Classically, the edges of a de Bruijn graph are defined to be the k-mers and the nodes are the k-1-length prefixes and suffixes of the k-mers. The construction of a de Bruijn graph starts by counting the k-mers occurring in the reads. Many good methods exist for extracting exact k-mers from read data and counting the number of their occurrences. However, sequencing read sets can contain a significant number of sequencing errors, which limits the usefulness of counting exact k-mers to short k-mers. Recently, we have developed methods for extracting longer k-mers from noisy data by using spaced seeds and strobemers.
De Bruijn graphs were originally introduced for solving the genome assembly problem, where the goal is to reconstruct the genome based on sequencing reads. In practice, genome assembly is solved with de Bruijn graphs by reporting unitigs, which are non-branching paths in the de Bruijn graphs. The choice of k is a crucial matter in de-Bruijn-graph-based genome assembly. A too small k will make the graph tangled, resulting in short unitigs, while a too large k will fragment the graph, again resulting in short unitigs. A variable-order de Bruijn graph, which represents de Bruijn graphs of all orders k in a single data structure, has been presented as a solution to the choice of k. However, it is not clear how the definition of unitigs can be extended to variable-order de Bruijn graphs.
In this talk, we present a robust definition of assembled sequences in variable-order de Bruijn graphs and an algorithm for enumerating them. Apart from genome assembly, de Bruijn graphs are used in many other problems such as sequencing error correction, reference free variant calling, indexing read sets, and so on. At the end of this talk, we will review some of these applications and their de-Bruijn-graph-based solutions.
Dominik Kempa is an assistant professor in the Department of Computer Science at Stony Brook University. His research interests include parallel and external-memory algorithms, data compression, string algorithms, combinatorics on words, compressed data structures and bioinformatics.
Lempel-Ziv (LZ77) compression is the most commonly used lossless compression algorithm. The basic idea is to greedily break the input string into blocks (called phrases), every time forming as a phrase the longest prefix of the unprocessed part that has an earlier occurrence. In 2010, Kreft and Navarro introduced a variant of LZ77 called LZ-End, that additionally requires the previous occurrence of each phrase to end at the boundary of an already existing phrase. Due to its excellent practical performance as a compression algorithm and a compressed index, they conjectured that it achieves a compression that can be provably upper-bounded in terms of the LZ77 size. Despite the recent progress in understanding such relation for other compression algorithms (e.g., the run-length encoded Burrows-Wheeler transform), no such result is known for LZ-End. In this talk, we give an overview of the recent progress on the above problem. More precisely, we prove that for any string of length n, the number ze of phrases in the LZ-End parsing satisfies ze = O(z log2n), where z is the number of phrases in the LZ77 parsing. This is the first non-trivial upper bound on the size of LZ-End parsing in terms of LZ77, and it puts LZ-End among the strongest dictionary compressors. Using our techniques, we also derive bounds for other variants of LZ-End and with respect to other compression measures. Our second contribution is a data structure that implements random access queries to the text in O(ze) space and O(polylog n) time. This is the first linear-size structure on LZ-End that efficiently implements such queries. All previous data structures either incur a logarithmic penalty in the space or have slow queries. We also show how to extend these techniques to support longest-common-extension (LCE) queries. This work was carried out in collaboration with Barna Saha and was presented at the 2022 ACM-SIAM Symposium on Discrete Algorithms (SODA 2022).
This tutorial will provide an introduction to graph databases, which have been gaining more and more attention in recent years, particularly for querying large-scale knowledge graphs. Graph databases pose several novel challenges in terms of modeling, querying, and implementing them on massive graphs. Though various well-known systems like Neo4j, Blazegraph, and Neptune, have already appeared, there are still many relevant research challenges around graph databases. An interesting recent development in this area has been the application of text indexing and compression techniques to optimize graph queries at large scale. The tutorial will be divided into three 1-hour sessions. The first session will introduce and motivate graph databases in the broader context of data management, discussing their advantages (and disadvantages) compared with relational databases, their data models, their key applications, providing also a short demo of graph database engines in use. The second session of the tutorial will focus on querying graph databases, introducing the key primitives used for querying graphs, along with some popular graph query languages, before concluding with some high-level challenges for evaluating graph queries in an efficient manner. Towards addressing these challenges, the third session of the tutorial will introduce indexes and algorithms that enable the efficient evaluation of queries over graph databases, starting with more traditional techniques inspired by relational databases, progressing towards more modern algorithms and indexes; in this part, we will see some surprising applications of text compression and indexing techniques for querying graph databases efficiently (in terms of both time and space).
Aidan Hogan is an Associate Professor at the Department of Computer Science, University of Chile, and an Associate Researcher at the Millennium Institute for Foundational Research on Data (IMFD). His research interests relate primarily to the Semantic Web, Databases and Information Extraction; he has published over one hundred peer-reviewed works on these topics. He has been invited as a lecturer to seven summer schools and he has co-organised three summer schools. He has given tutorials at ESWC, ISWC, WWW and HyperText. He is a sole author or lead author of three books; an Open Access version of the latest such book, entitled "Knowledge Graphs", is available online (https://kgbook.org). For further information, you can visit his homepage (http://aidanhogan.com/).
Domagoj Vrgoč is an Associate Professor at the University of Zagreb, Croatia, at the Institute for Mathematical and Computational Engineering of the Pontificia Universidad Católica de Chile, and an Associate Researcher at the Millennium Institute for Foundational Research on Data (IMFD). His work mainly focuses on Graph Databases, Semantic Web, and Information Extraction, with works on these topics published in many prime venues in the area. He has served as the program committee member of conferences such as PODS, ICDT, The Web Conference, ISWC, IJCAI, amongst others. His main focus these days is on building a working graph database engine based on latest research findings. To find out more, you can check out https://github.com/MillenniumDB/MillenniumDB.
At least one author per accepted paper must register at the appropriate conference rate by September 9, 2022. Registration fees are:
|Category||Early bird (Until Oct. 5)||Regular|
|Onsite author||USD 320||USD 400|
|Online author||USD 240||USD 300|
|Onsite general||USD 240||USD 300|
|Onsite student||USD 160||USD 200|
|Online no-author||USD 0||USD 0|
Note: In coordination with our workshop WCTA, we are providing free-registration scholarships for students. If you want to apply for them, fill the following scholarships form.
Online registration fee includes:
Additionally, onsite registration fee includes:
The registration process consists of four steps
Complete the following form
We provide two payment options: online platform and bank transfer.
Online platform: The prices in the online platform are the equivalent prices in chilean pesos
Bank transfer: Please make a bank transfer, for the amount of your corresponding fee, to
If necessary, consider:
Note 1: Please make sure that the bank transfer is free of charge for the beneficiary.
Note 2: For bank transfers inside Chile, contact us at email@example.com.
To complete the registration process, please send us an email to firstname.lastname@example.org, attaching your payment-confirmation/receipt (e.g. PDF). In the email, include the recipient name (person/institution) for who/which the invoice must be issued and the full address for the invoice (otherwise, the address provided in the registration form will be used)
Note 1: All registrations will generate a legal invoice.
Note 2: If you need to issue the invoice to the name of your institution, please contact us at email@example.com. More details are needed to generate the invoice.
Once received your email, and as soon as we receive the payment in our bank account, we will send you an email to confirm that the registration completed successfully. Please note that this could take more than one week, so please be patient.
SPIRE 2022 will be held in the Faculty of Engineering of the Universidad de Concepción, located in the city of Concepción, Chile. Concepción is the second largest city in Chile and the Universidad de Concepción is the largest educational institution of the city, and one of the most traditional and prestigious universities of Chile. Its beautiful campus is the main touristic attraction of the city, with its “campanil” tower overlooking both the University and the city of Concepción. The University hosts more than 20 thousand students in its three campuses, and it has become a symbol of pride of the city.
Concepción is the second largest city in Chile, located next to the BioBio river and the Pacific ocean. Concepción is a vibrant city with many universities, shopping centers and it is close to many touristic areas.
Concepción is surrounded by beautiful nature and picturesque locations such as the Dichato beach (for swimming and seafood eating), Caleta Tumbes (for seafood restaurants), Chiflón del Diablo (former underground coal mine)
From Santiago, you can reach Concepción either by flights (of 45 minutes of duration) or buses (around 6 hours of duration). Airlines that provide service to Concepción are LATAM, Sky and JetSmart. If you prefer to travel by bus, you first need to reach the Bus terminal in downtown Santiago, (also called “Terminal Alameda”) where many companies offer service to Concepción (we recommend either EME Bus or Pullman Tur).
The University is located in the heart of Concepción. You can use google maps with the address “Edmundo Larenas 219, Concepción” from your hotel, to obtain the best route. From downtown Concepción, it is around 15-20 minutes walking to the Conference venue.
You will enjoy a very comfortable dry summer weather in Concepción. The average temperature is around 20 degrees celsius (78 farenheit) and it is very uncommon for rain to occur. However, given the abundant sunlight, we recommend the use of sunscreen of factor 30 or higher.
SPIRE started as the South American Workshop on String Processing in Belo Horizonte in 1993. It skipped 1994, changed its name in 1998, left South America in 1999 when it went to Cancun, and started alternating between Latin America and Europe in 2000 when it went to A Coruña, a pattern that was briefly interrupted in 2008 and 2016 when it went to Melbourne and Beppu, respectively. The complete list of previous venues is as follows: