Solvability beltway problems in polynomial time

3r33333. 3r3-31. Students get acquainted with the beltway problem by taking bioinformatics courses, in particular, one of the most qualitative and closest in spirit for programmers - the Bioinformatics course (Pavel Pevzner) at the course of the University of California San Diego. The problem draws attention to the simplicity of the statement, its practical importance, and the fact that it is still considered unresolved, with the seeming opportunity to solve it by simple coding. The problem is put this way. Is it possible __ in polynomial time __ restore the coordinates of the set of points

$ inline $ X $ inline $

if the set of all pair distances between them is known

$ inline $ delta X $ inline $ 3r33333. ? 3r33333. 3r33333. 3r33333. 3r33333. This seemingly simple task is still on the list of unsolved problems of computational geometry. Moreover, the situation does not even concern points in multidimensional spaces, all the more crooked, the problem is the simplest option - when all points have integer coordinates and are localized on one line. 3r33333. 3r33333. 3r314.

3r33333. 3r33333. Over the past almost half a century since this task was perceived by mathematicians as a challenge (Shamos, 1975), some results were obtained. We consider two cases possible for a one-dimensional problem: 3r3333341. 3r33333. 3r33333. 3r33333.

3r33333. 3r3189. The points are located on the straight (turnpike problem)

3r33333. 3r3189. the points are located on the circle (beltway problem)

3r33333. 3r330. 3r33333. 3r33333. These two cases have received different names for a reason - various efforts are required to solve them. And indeed, the first problem was solved rather quickly (in 15 years) and a backtracking algorithm was built, which restores the solution on average in a quadratic time 3r33311. $ inline $ O (n ^ 2 log n) $ inline $

where

$ inline $ n $ inline $

- the number of points (Skiena, 1990); for the second task, this has not been done so far, and the best proposed algorithm has an exponential computational complexity of 3r33311. $ inline $ O (n ^ n log n) $ inline $

(Lemke, 2003). The picture below shows an estimate of how long your computer will work to get a solution for a set with a different number of points. 3r33333. 3r33333. 3r33333. 3r33333. 3r33333. 3r33333. 3r33333. 3r33333. That is, for a psychologically acceptable time (~ 10 sec), you can restore the set 3r3333311. $ inline $ X $ inline $

up to 10 thousand points in the turnpike case and only ~ 10 points for the beltway case, which is completely useless for practical applications. 3r33333. 3r33333. 3r33333. 3r33333. Small clarification. It is believed that the turnpike problem is solved from the point of view of use in practice, that is, for the vast majority of the data encountered. At the same time, the objections of pure mathematicians to the fact that there are special data sets for which the time to obtain a solution is exponentially 3r-3311 are ignored. $ inline $ O (2 ^ n log n) $ inline $

(Zhang, 1982). In contrast to turnpike, beltway, the problem with its exponential algorithm can in no way be considered practically solved. 3r33333. 3r33333. 3r33333. 3r33333. 3r3152. The importance of solving beltway problems in terms of bioinformatics

3r33333. 3r33333. At the end of the 20th century, a new pathway for the synthesis of biomolecules was discovered, called the non-ribosomal route of synthesis. Its main difference from the classical path of synthesis is that the final result of the synthesis is not encoded in DNA at all. Instead, the DNA contains only the code of the “tools” (many different synthetases) that these objects can assemble. Thus, biomachine engineering is substantially enriched, and a cell instead of one type of collector (ribosomes) that works with only 20 standard parts (standard amino acids, also called proteinogenic), many other types of collectors appear that can work with more than 500 standard and non-standard parts (non-proteinogenic amino acids and their various modifications). And these collectors can not only connect parts into chains, but also get very intricate designs - cyclic, branching, as well as designs with many cycles and branching. All this significantly increases the arsenal of the cell for different cases of its life. The biological activity of such structures is high, the specificity (selectivity of action) is also, the variety of properties is not limited. The class of these cell products in the English-language literature is called NRPs (non-ribosomal products, or non-ribosomal peptides). Biologists are hoping that it is among these products of cell metabolism that one can find new pharmacological preparations that are highly effective and do not have side effects due to their specificity. 3r33333. 3r33333. 3r33333. 3r33333. The only question is, how and where to look for NRPs? They are very effective, therefore the cell has absolutely no need to produce them in large quantities, and their concentrations are negligible. So chemical methods of extraction with their low accuracy ~ 1% and huge expenses on reagents and time are useless. Further. They are not encoded in DNA, it means all the databases that are accumulated in deciphering the genome, and all methods of bioinformatics and genomics are also useless. The only way to find something remains physical methods, and, namely, mass spectroscopy. Moreover, the level of detection of substances in modern spectrometers is so high that it allows you to find insignificant quantities, total> ~ 800 molecules (atto-molar range, or concentration 3r33311. $ Inline $ 10 ^ {- 18} $ inline $ 3r?312.). 3r33333. 3r33333. 3r33333. 3r33333. "

3r33333. 3r33333. 3r33333. And how does a mass spectrometer work? In the working chamber of the device is the destruction of molecules into fragments (more often through their collisions with each other, less often due to external influence). These fragments are ionized, and then accelerated and separated in an external electromagnetic field. The separation takes place either by the time it reaches the detector, or by the rotation angle in a magnetic field, because the greater the mass of the fragment and the lower its charge, the more clumsy it is. Thus, the mass spectrometer “weighs” the masses of the fragments. Moreover, the “weighting” can be made multi-stage by repeatedly filtering fragments with one mass (more precisely, with one value

$ Inline $ m /z $ inline $

) And driving them through fragmentation with further separation. Two-stage mass spectrometers are widely distributed and are called tandem, multistage mass spectrometers are extremely rare, and are simply referred to as 3r-3311. $ inline $ MC ^ n $ inline $ 3r3333312. where

$ inline $ n $ inline $

- number of stages. And here the task appears, knowing only the masses of various fragments of a molecule to recognize its structure? Here we come to turnpike and beltway problems, for linear and cyclic peptides, respectively. 3r33333. 3r33333. 3r33333. 3r33333. Let me explain in more detail how the task of restoring the structure of a biomolecule is reduced to these problems using the example of a cyclic peptide. 3r33333. 3r33333. 3r33333. 3r33333. 3r33939. 3r33333. 3r33333. 3r33333. 3r33333. The cyclic ABCD peptide at the first stage of fragmentation can be broken in 4 places, either by the D-A bond, or by A-B, B-C or C-D, forming 4 possible linear peptides - either ABCD, BCDA, CDAB or DABC. Since a huge number of identical peptides pass through the spectrometer, then at the output we will have fragments of all 4 types. Moreover, we note that all the fragments have the same mass, and cannot be separated in the first stage. At the second stage, the linear fragment of ABCD can be broken in three places, forming smaller fragments with different masses A and BCD, AB and CD, ABC and D, and forming the corresponding mass spectrum. In this spectrum, the mass of the fragment is plotted along the x axis, and the number of small fragments with a given mass along the y axis. Similarly, the spectra are formed for the remaining three linear fragments of BCDA, CDAB and DABC. Since all 4 large fragments passed to the second stage, all their spectra are added. Total, on an output some set of masses

turns out. $ inline $ {m_1 ^ {n_1}, m_2 ^ {n_2}, , m_q ^ {n_q}} $ inline $

where

$ inline $ m_i $ inline $

- some mass, and

$ inline $ n_i $ inline $

- the multiplicity of its repetition. At the same time, we do not know which fragment this mass belongs to or whether this fragment is unique, since different fragments can have the same mass. The further the bonds in the peptide are from each other, the greater the mass of the peptide fragment is between them. That is, the task of restoring the order of elements in a cyclic peptide is reduced to a beltway problem, in which elements of the set

$ inline $ X $ inline $

the bonds in the peptide, and the elements of the set

$ inline $ delta X $ inline $ 3r33333. - masses of fragments between bonds. 3r33333. 3r33333. 3r33333. 3r33333. 3r3152. Anticipation of the possibility of the existence of an algorithm with a polynomial solution time beltway problems 3r3153. 3r33333. 3r33333. From my own experience and from communicating with people who tried or actually did something to solve this problem, I noticed that the vast majority of people try to solve it either in the general case or for integer data in a narrow range like this (? 50). Both options are doomed to failure. For the general case, it was proved that the total number of solutions for beltway problems in the one-dimensional case is 3r-3311. $ inline $ S_1 (n) $ inline $

limited to values (Lemke, 2003) 3r3118. 3r33311. $$ display $$ e ^ {2 ^ {frac {ln n} {ln ln n} + o (1)}} leq S_1 (n) leq frac {1} {2} n ^ {n-2} $$ display $$

which means the existence of an exponential number of solutions in the asymptotics of 3r33311. $ inline $ n rightarrow infty $ inline $

. Obviously, if the number of solutions grows exponentially, then the time to obtain them cannot grow more slowly. That is, for the general case, a polynomial time solution is impossible. As for the integer data in a narrow range, here everything can be checked experimentally, since it is not too difficult to write code that searches for a solution using the full brute force method. For small

$ inline $ n $ inline $

This code will not take very long. The results of testing such a code will show that, under such conditions, for data, the total number of different solutions for any set is 3r33311. $ inline $ delta X $ inline $ 3r33333. already at small

$ inline $ n $ inline $

also growing extremely sharply. 3r33333. 3r33333. 3r33333. 3r33333. Learning about these facts, you can put up and give up on this task. I admit that this is one of the reasons why the beltway problem is still considered unsolved. However, loopholes do exist. Recall that the exponential function

$ inline $ e ^ {alpha x} $ inline $

behaves very interesting. At first, it grows scary slowly, rising from 0 to 1 for an infinitely large interval of 3r3333311. $ inline $ infty $ inline $

to ? then its growth accelerates more and more. However, the smaller the value of

$ inline $ alpha $ inline $

, the greater should be the value of

$ inline $ x $ inline $

so that the result of the function transcends some specified value

$ inline $ y = xi $ inline $

. As such a value, it is convenient to choose the number

$ inline $ xi = 2 $ inline $

, before it is the only solution, after it there are many solutions Question. Did anyone check the dependence of the number of decisions on what data gets into the input? Yes, I did. There is a remarkable phD dissertation of a Croatian woman mathematician Tamara Dakis (Dakic, 2000), in which she defined what conditions the input data should satisfy in order for the problem to be solved in polynomial time. The most important of them are the uniqueness of the solution and the absence of duplication in the set of input data 3r-3311. $ inline $ delta X $ inline $ 3r33333. . The level of her phD dissertation is very high. It is a pity that this student was limited only by the turnpike problem, convinced that she would have turned her interest in the beltway problem a long time ago. 3r33333. 3r33333. 3r33333. 3r33333. 3r3152. Getting a polynomial algorithm for solving beltway problem

3r33333. 3r33333. Detect data for which it is possible to build the desired algorithm by accident. It took additional ideas. The main idea arose from the observation (see above) that the spectrum of a cyclic peptide is the sum of the spectra of all linear peptides that are formed during single ring breaks. Since the amino acid sequence in a peptide can be recovered from any such linear peptide, the total number of lines in the spectrum is significant (in 3r3111. $ Inline $ n $ inline $

Times where

$ Inline $ n $ inline $ 3r33312. Is a number amino acids in peptide) is redundant. The only question is which lines need to be excluded from the spectrum in order not to lose the possibility of restoring the sequence. Since mathematically both tasks (recovery of the cyclic peptide sequence from the mass spectrum and the beltway problem) are isomorphic, it is also possible to thin out the set 3r33311. $ inline $ delta X $ inline $ 3r33333. . 3r33333. 3r33333. 3r33333. 3r33333. Thinning operations

$ inline $ delta X $ inline $ 3r33333. were built using local symmetries of the set 3r3333311. $ inline $ delta X $ inline $ 3r33333. (Fomin, 2016a). 3r33333. 3r33333. 3r33333. 3r33333. 3r3174. 3r33333. 3r3189. Symmetrization. The first operation is to select an arbitrary element of the set

$ inline $ x_ {mu} in Delta X $ inline $

and remove from

$ inline $ delta X $ inline $ 3r33333. all elements, except those that have symmetric pairs with respect to the points 3r3333311. $ inline $ x_ {mu} /2 $ inline $

and 3r33311. $ inline $ (L + x_ {mu}) /2 $ inline $

where

$ inline $ L $ inline $

- the circumference (I remind you that in the beltway case all points are located on the circle). 3r3198. 3r33333. 3r3189. Partial solution convolution. The second operation uses the presence of a guess about the solution, that is, the knowledge of individual points that belong to the solution, the so-called partial solution. Here, from the set

$ inline $ delta X $ inline $ 3r33333. all elements are also deleted, except those that satisfy the condition - if we measure the distances from the point being checked to all points of the partial solution, then 3r3192. all [/u] The values obtained are in r3r3311. $ inline $ delta X $ inline $ 3r33333. . I will clarify that if at least one of the distances obtained is missing in 3r33311. $ inline $ delta X $ inline $ 3r33333. then the dot is ignored. 3r3198. 3r33333.

3r33333. 3r33333. Theorems were proved that both operations thin out the set 3r33311. $ inline $ delta X $ inline $ 3r33333. , but there will still be enough elements in it to restore the

solution. $ inline $ X $ inline $

. Using these operations, the operation was built and implemented on the c ++ algorithm to solve beltway problems (Fomin, 2016b). The algorithm differs little from the classical backtracking algorithm (that is, we try to build a solution by iterating through possible options and make returns if we get an error during construction). The only difference is that for the account of the narrowing of the set

$ inline $ delta X $ inline $ 3r33333. Trying us becomes significantly less options. 3r33333. 3r33333. 3r33333. 3r33333. Here is an example of how the set r3r3311 is sharply narrowed. $ inline $ delta X $ inline $ 3r33333. when thinning. 3r33333. 3r33333. 3r33333. 3r33333. 3r33333. 3r33333. 3r33333. 3r33333. Computational experiments were performed on randomly generated cyclic peptides of length 3r-3311. $ inline $ n $ inline $

from 10 to 1000 amino acids (y-axis on a logarithmic scale). The abscissa axis also on a logarithmic scale shows the number of elements in sets thinned by various operations 3r33333. $ inline $ delta X $ inline $ 3r33333. in units of

$ inline $ n $ inline $

. Such a presentation is absolutely unusual, because I will decipher how it is read by example. Look at the left chart. Let the peptide have a length of

$ inline $ n = 100 $ inline $ 3r3333312. . For him, the number of elements in the set

$ inline $ delta X $ inline $ 3r33333. equals

$ inline $ n ^ 2 = 10000 $ inline $

(This is a point on the upper dotted line

$ inline $ y = n ^ 2 $ inline $

). After removing from the set

$ inline $ delta X $ inline $ 3r33333. repeating elements, the number of elements in

$ inline $ delta X $ inline $ 3r33333. reduced to

$ inline $ n_D approx n ^ {1.9} approx 6300 $ inline $

(circles with crosses). After symmetrization, the number of elements falls to

$ inline $ n_S approx n ^ {???} approx 3100 $ inline $

(black circles), and after the convolution of a partial solution to 3r3333311. $ inline $ n_C approx n ^ {???} approx 500 $ inline $

(crosses). So the total amount of the set is

$ inline $ delta X $ inline $ 3r33333. reduced by only 20 times. For a peptide of the same length, but on the right chart, the reduction is from 3r33311. $ inline $ n ^ 2 = 10000 $ inline $

to

$ inline $ N_C approx n approx 100 $ inline $

that is 100 times. 3r33333. 3r33333. 3r33333. 3r33333. Note that the generation of test examples for the left diagram is done in such a way that the duplication level is 3r-3311. $ inline $ k_ {dup} $ inline $ 3r3333312. in

$ inline $ delta X $ inline $ 3r33333. was in the range from 0.1 to 0.? and for the right - less than 0.1. The level of duplication is defined as

$ inline $ k_ {dup} = 2-log {N_u} /log {n} $ inline $

where

$ inline $ N_u $ inline $ 3r3333312. - the number of unique elements in the set

$ inline $ delta X $ inline $ 3r33333. . This definition gives a natural result: in the absence of duplicates in the set 3r33333. $ inline $ delta X $ inline $ 3r33333. its power is

$ inline $ N_u = n ^ 2 $ inline $

and 3r33311. $ inline $ k_ {dup} = 0 $ inline $

, with the maximum possible duplication

$ inline $ N_u = n $ inline $

and 3r33311. $ inline $ k_ {dup} = 1 $ inline $

. How to make provide a different level of duplication, I will say a little later. From the diagrams, it can be seen that the smaller the level of duplication, the more thinned out 3r-3311. $ inline $ delta X $ inline $ 3r33333. at

$ inline $ k_ {dup} < 0.1$inline$ the number of elements in the thinned

$ inline $ delta X $ inline $ 3r33333. generally reaches its limit

$ inline $ O (n ^ 2) rightarrow O (n) $ inline $

, because in the thinned set is less than 3r33311. $ inline $ O (n) $ inline $

elements cannot be obtained (operations retain enough elements to obtain a solution in which there are

$ inline $ n $ inline $

elements). The fact of narrowing the power of the set

$ inline $ delta X $ inline $ 3r33333. down to the lower limit is very important, it is he who leads to dramatic changes in the computational complexity of obtaining a solution. 3r33333. 3r33333. 3r33333. 3r33333. After the insertion of thinning operations in the backtracking algorithm and in solving the beltway problem, a complete analog of what Tamara Dakis said about the turnpike problem was revealed. I will remind you. She said that for a turnpike problem it is possible to obtain a solution in polynomial time if the solution is unique and there is no duplication in 3r33311. $ inline $ delta X $ inline $ 3r33333. . It turned out that a complete lack of duplication is not necessarily necessary (it is hardly possible for real data), it is enough that its level will be sufficiently small. The figure below shows how long it takes to get a solution to the beltway problem depending on the length of the peptide and the level of duplication in r3r3311. $ inline $ delta X $ inline $ 3r33333. . 3r33333. 3r33333. 3r33333. 3r33333. 3r3304. 3r33333. 3r33333. 3r33333. 3r33333. In the figure, both the abscissa and ordinate axes are given on a logarithmic scale. This allows you to clearly see whether the dependence of the count time on the length of the sequence 3r33333. $ inline $ T = f (n) $ inline $

exponential (straight line) or polynomial (log-form curve). As can be seen in the diagrams, with a low level of duplication (right diagram), the solution is obtained in polynomial time. Well, to be more precise, the solution is obtained in quadratic time. This happens when thinning operations reduce the power of the array to the lower limit of 3r-3311. $ inline $ O (n ^ 2) rightarrow O (n) $ inline $

, there are few points left in it, returns when selecting options become single, and as a matter of fact, the algorithm ceases to sort out the options, but simply constructs a solution from what is left. 3r33333. 3r33333. 3r33333. 3r33333. P.S. Well, I will reveal the last secret regarding the generation of sets in different levels of duplication. This is due to the different accuracy of data presentation. If the data is generated with low accuracy (for example, rounding to integers), then the level of duplication becomes high, more than 0.3. If the data is generated with good accuracy, for example, up to 3 decimal places, then the level of duplication decreases sharply, less than 0.1. And from here follows the last most important remark. 3r33333. 3r33333. 3r33333. 3r33333. * For real data, in conditions of constantly increasing measurement accuracy, the beltway problem becomes solvable in real time. 3r33333. 3r33333. * 3r33333. 3r33333. 3r33333. 3r33333. Literature. 3r33333. 3r33333. 3r33333. 3r33333. 1. Dakic, T. (2000). On the Turnpike Problem. PhD thesis, Simon Fraser University. 3r33333. 3r33333. 2. Fomin. E. (2016a) wise Distance ^ ^ Pair Pair Pair Pair 2 ps I. I. I. 2 ps ps I. I. 2 I. wise I. I. I. 2 I. I. I. I. I. 2 wise I. I. I. I. I. I. 2 I J Comput Biol. 201? 23 (9): 769-75. 3r33333. 3r33333. 3. Fomin. E. (2016b) wise Ste Pair Ste Pair ps 2 ps ps ps Pair wise 2 wise ps ps ??? ps Algorithm. J Comput Biol. 201? 23 (12): 934-942. 3r33333. 3r33333. 4. Lemke, P., Skiena, S., and Smith, W. (2003). Reconstructing Sets From Interpoint Distances. Discrete and Computational Geometry Algorithms and Combinatorics, 25: 597–631. 3r33333. 3r33333. 5. Skiena, S., Smith, W., and Lemke, P. (1990). Reconstructing sets from interpoint distances. In Proceedings of the Sixth ACM Symposium on Computational Geometry, pages 332–339. Berkeley, CA

3r33333. 6. Zhang, Z. 1982. An example of a partial digest mapping algorithm. J. Comp. Biol. ? 235–240.

3r33333. 3r33333. 3r33333. ! function (e) {function t (t, n) {if (! (n in e)) {for (var r, a = e.document, i = a.scripts, o = i.length; o-- ;) if (-1! == i[o].src.indexOf (t)) {r = i[o]; break} if (! r) {r = a.createElement ("script"), r.type = "text /jаvascript", r.async =! ? r.defer =! ? r.src = t, r.charset = "UTF-8"; var d = function () {var e = a.getElementsByTagName ("script")[0]; e.parentNode.insertBefore (r, e)}; "[object Opera]" == e.opera? a.addEventListener? a.addEventListener ("DOMContentLoaded", d,! 1): e.attachEvent ("onload", d ): d ()}}} t ("//mediator.mail.ru/script/2820404/"""_mediator") () (); 3r33347. 3r33333.

3r33333. 3r33333. 3r33333. 3r33333.

It may be interesting

#### weber

Author**26-10-2018, 08:25**

Publication Date
#### Development / Programming

Category- Comments: 0
- Views: 277

This is my first time i visit here. I found such a substantial number of interesting stuff in your blog especially its examination. Really its inconceivable article. Keep it up.Gulf Coast Western Reviews

I can see that you are an expert at your field! I am launching a website soon, and your information will be very useful for me.. Thanks for all your help and wishing you all the success in your business.GSM Solutions

Wow, What a Excellent post. I really found this to much informatics. It is what i was searching for.I would like to suggest you that please keep sharing such type of info.Thanksthc vape juice