Fast median computation for symmetric, orthogonal matrices under the rank distance
File(s)
Author(s)
Meidanis, Joao
Chindelevitch, Leonid
Type
Journal Article
Abstract
Biological genomes can be represented as square, symmetric,
orthogonal, 0-1 matrices. It turns out that the rank distance
applied to two genome matrices has a biological significance:
it is related to the smallest number of basic rearrangement
mutations, such as reversals, translocations, transpositions
(taken with weight 2), etc. that explain the differences
between the two genomes. Therefore, closer genomes will
produce smaller rank distances.
An important tool in this context is the median problem:
given three genomes A, B, and C, find a fourth genome M
that minimizes d(A, M) + d(B, M) + d(C, M). For genome
matrices, the computational complexity of this problem is
currently unknown. However, for orthogonal matrices, there
are fast algorithms that solve it exactly.
One such algorithm uses a “walk towards the median”
paradigm. Starting from any of the input matrices, say, B, the
algorithm produces rank-1 “steps”, which are rank-1 matrices
that, added to B, decrease its rank distance to both A and C
simultaneously. It can be shown that such steps always exist
for orthogonal matrices, and can be found in polynomial time.
The algorithm stops when no more improvement can be made,
which is equivalent to saying that B lies between A and C in
terms of the rank distance (the triangle inequality becomes
an equality). There is an O(nω+1) algorithm implementing
this idea, where ω is the matrix multiplication exponent.
Here we propose a novel scheme that works for symmetric orthogonal matrices, and produces a median, also guaranteed
to be symmetric, in O(nω) time.
There is another O(nω) time algorithm that produces the
so-called MI median, which agrees with the majority in the
subspaces where A = B, B = C, or C = A, and is equal
to the identity in the orthogonal complement of the sum of
these subspaces. However, this algorithm produces a different
median, and has only been proved to be correct for genomic
matrices. The algorithm we present here is more general.
orthogonal, 0-1 matrices. It turns out that the rank distance
applied to two genome matrices has a biological significance:
it is related to the smallest number of basic rearrangement
mutations, such as reversals, translocations, transpositions
(taken with weight 2), etc. that explain the differences
between the two genomes. Therefore, closer genomes will
produce smaller rank distances.
An important tool in this context is the median problem:
given three genomes A, B, and C, find a fourth genome M
that minimizes d(A, M) + d(B, M) + d(C, M). For genome
matrices, the computational complexity of this problem is
currently unknown. However, for orthogonal matrices, there
are fast algorithms that solve it exactly.
One such algorithm uses a “walk towards the median”
paradigm. Starting from any of the input matrices, say, B, the
algorithm produces rank-1 “steps”, which are rank-1 matrices
that, added to B, decrease its rank distance to both A and C
simultaneously. It can be shown that such steps always exist
for orthogonal matrices, and can be found in polynomial time.
The algorithm stops when no more improvement can be made,
which is equivalent to saying that B lies between A and C in
terms of the rank distance (the triangle inequality becomes
an equality). There is an O(nω+1) algorithm implementing
this idea, where ω is the matrix multiplication exponent.
Here we propose a novel scheme that works for symmetric orthogonal matrices, and produces a median, also guaranteed
to be symmetric, in O(nω) time.
There is another O(nω) time algorithm that produces the
so-called MI median, which agrees with the majority in the
subspaces where A = B, B = C, or C = A, and is equal
to the identity in the orthogonal complement of the sum of
these subspaces. However, this algorithm produces a different
median, and has only been proved to be correct for genomic
matrices. The algorithm we present here is more general.
Date Issued
2021-04-01
Date Acceptance
2020-10-22
Citation
Linear Algebra and Its Applications, 2021, 614, pp.394-414
ISSN
0024-3795
Publisher
Elsevier
Start Page
394
End Page
414
Journal / Book Title
Linear Algebra and Its Applications
Volume
614
Copyright Statement
© 2020 Elsevier Ltd. All rights reserved. This manuscript is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International Licence http://creativecommons.org/licenses/by-nc-nd/4.0/
Identifier
http://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcApp=PARTNER_APP&SrcAuth=LinksAMR&KeyUT=WOS:000615183800024&DestLinkType=FullRecord&DestApp=ALL_WOS&UsrCustomerID=1ba7043ffcc86c417c072aa74d649202
Subjects
Science & Technology
Physical Sciences
Mathematics, Applied
Mathematics
Orthogonal matrices
Hermitian, skew-Hermitian, and related matrices
Eigenvalues, singular values, and eigenvectors
Publication Status
Published
Coverage Spatial
Fundacao Getulio Vargas, Rio de Janeiro, BRAZIL
Date Publish Online
2020-11-04