<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd" xmlns="http://www.loc.gov/MARC21/slim">
 <record>
  <leader>00000cab a22000003a 4500</leader>
  <controlfield tag="001">UP-99796217608796420</controlfield>
  <controlfield tag="003">Buklod</controlfield>
  <controlfield tag="005">20231007234805.0</controlfield>
  <controlfield tag="006">m    |o  d |      </controlfield>
  <controlfield tag="007">ta</controlfield>
  <controlfield tag="008">090422s        xx     d | ||r |||||   ||</controlfield>
  <datafield tag="040" ind1=" " ind2=" ">
   <subfield code="a">DENGII</subfield>
  </datafield>
  <datafield tag="041" ind1=" " ind2=" ">
   <subfield code="a">eng</subfield>
  </datafield>
  <datafield tag="100" ind1="0" ind2=" ">
   <subfield code="a">Borodin, Allan</subfield>
  </datafield>
  <datafield tag="245" ind1="0" ind2="0">
   <subfield code="a">Subquadratic Approximation Algorithms for Clustering Problems in High Dimensional Spaces.</subfield>
  </datafield>
  <datafield tag="300" ind1=" " ind2=" ">
   <subfield code="a">pp. 153-167</subfield>
  </datafield>
  <datafield tag="520" ind1=" " ind2=" ">
   <subfield code="a">One of the central problems in information retrieval, data mining, computational biology, statistical analysis, computer vision, geographic analysis, pattern recognition, distributed protocols is the question of classification of data according to some clustering rule. Often the data is noisy and even approximate classification is of extreme importance. The difficulty of such classification stems from the fact that usually the data has many incomparable attributes, and often results in the question of clustering problems in high dimensional spaces. Since they require measuring distance between every pair of data points, standard algorithms for computing the exact clustering solutions use quadratic or ldquonearly quadraticrdquo running time; i.e., O(dn 2?agr(d)) time where n is the number of data points, d is the dimension of the space and agr(d) approaches 0 as d grows. In this paper, we show (for three fairly natural clustering rules) that computing an approximate solution can be done much more efficiently. More specifically, for agglomerative clustering (used, for example, in the Alta Vistatrade search engine), for the clustering defined by sparse partitions, and for a clustering based on minimum spanning trees we derive randomized (1 + isin) approximation algorithms with running times Õ(d 2 n 2?gamma) where gamma &gt; 0 depends only on the approximation parameter isin and is independent of the dimension d.</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
   <subfield code="a">Graph-theoretic clustering.</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
   <subfield code="a">High dimensional spaces.</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
   <subfield code="a">Sparse partitions.</subfield>
  </datafield>
  <datafield tag="773" ind1="0" ind2=" ">
   <subfield code="t">Machine learning.</subfield>
   <subfield code="g">56, 1-3 (2004).</subfield>
  </datafield>
  <datafield tag="905" ind1=" " ind2=" ">
   <subfield code="a">FO</subfield>
  </datafield>
  <datafield tag="942" ind1=" " ind2=" ">
   <subfield code="a">Article</subfield>
  </datafield>
 </record>
</collection>
