What's new in PostgreSQL 11: built-in web search

What's new in PostgreSQL 11: built-in web search  
 
Continuing with theme of interesting possibilities of the forthcoming release of PostgreSQL 1? I would like to tell about new built-in function websearch_to_tsquery . The corresponding patch developed by Victor Drobny and Dmitry Ivanov, with corrections from Fedor Sigayev. Let's figure out what is implemented in this patch. full-text search has been in PostgreSQL for a long time, and it works very well. What else could I add?
 
 
Imagine that you are doing an online store based on PostgreSQL and you need a search for products. Here you have a form with a search query. To search the database from this query, you need to somehow build a tsvector. You can do this with the to_tsquery function. But to_tsquery expects that the string will be in a specific format:
 
 
= # select to_tsquery ('foo bar baz');
 
ERROR: syntax error in tsquery: "foo bar baz"
 
 
= # select to_tsquery ('foo & bar & baz');
 
to_tsquery
 
-----------------------
 
'foo' & 'bar' & 'baz'
 

 
In other words, in this case it is necessary to write a function that converts a user request into a query, understandable to_tsquery. Uncomfortable. Partly this problem is solved by the functions plainto_tsquery and phraseto_tsquery:
 
 
= # select plainto_tsquery ('foo bar baz');
 
plainto_tsquery
 
-----------------------
 
'foo' & 'bar' & 'baz'
 
 
= # select phraseto_tsquery ('foo bar baz');
 
phraseto_tsquery
 
---------------------------
 
'foo' <-> 'bar' <-> 'baz'
 

 
But there is a problem with them. The fact is that the user can intuitively use quotes or, say, some Boolean operators, because it works for Google, Yandex and other search engines. Let's see what happens in this case:
 
 
= # select plainto_tsquery ('"foo bar" -baz or qux');
 
plainto_tsquery
 
-------------------------------
 
'foo' & 'bar' & 'baz' & 'qux'
 

 
Everything is broken! Oh. Do you really have to write your own parser?
 
 
So that it does not have to write from scratch for every application, starting with PostgreSQL 1? the corresponding parser will now be directly in the DBMS:
 
 
= # select websearch_to_tsquery ('"foo bar" -baz or qux');
 
websearch_to_tsquery
 
----------------------------------
 
'foo' <-> 'bar' &! 'baz' | 'qux'
 

 
In addition to the fact that websearch_to_tsquery understands quotes, minus sign and Boolean operators, it is interesting in that it ignores any attempt to make a syntax error. That is, you will never get an error, the output will always be some kind of tsquery:
 
 
= # select websearch_to_tsquery ('- "foo bar" (((baz or or qux! @ # $% ^ & * _ + - =');
 
websearch_to_tsquery
 
--------------------------------------
 
! ('foo' <-> 'bar') & 'baz' | 'qux'
 

 
Another feature of the function is the fact that it ignores any parentheses. That is, that's not like that:
 
 
= # select websearch_to_tsquery ('foo and (bar or baz)');
 
websearch_to_tsquery
 
-----------------------
 
'foo' & 'bar' | 'baz'
 

 
This behavior was chosen for reasons that normal people (not the IT specialists: trollface :) practice the brackets still do not use. Simply ignoring them, we essentially simplify the implementation of the feature "to make sure that the function never fails with an error", and query parsing will work faster. Perhaps the flag, which includes support for parentheses, will appear in future versions.
 
 
This is a simple but useful function. At a minimum, it does full-text search in PostgreSQL no worse than it was before, and allows developers to simplify the code of their applications.
+ 0 -

Comments 2

Offline
赵熠东
赵熠东 24 October 2018 14:47
excellent!!!
Offline
Kirill
Kirill 4 August 2019 21:16
Is it really PostgreSQL 1?

Add comment