<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd" xmlns="http://www.loc.gov/MARC21/slim">
 <record>
  <leader>00000ctm a22000003a 4500</leader>
  <controlfield tag="001">UP-8027390931316197135</controlfield>
  <controlfield tag="003">Buklod</controlfield>
  <controlfield tag="005">20100812110753.0</controlfield>
  <controlfield tag="006">a     r    |||| u|</controlfield>
  <controlfield tag="007">ta</controlfield>
  <controlfield tag="008">100812s2010    xx     d     r    |||| u|</controlfield>
  <datafield tag="035" ind1=" " ind2=" ">
   <subfield code="a">(iLib)UPMIN-00004810112</subfield>
  </datafield>
  <datafield tag="040" ind1=" " ind2=" ">
   <subfield code="a">DLC</subfield>
   <subfield code="c">DLC</subfield>
   <subfield code="d">upmin</subfield>
  </datafield>
  <datafield tag="041" ind1=" " ind2=" ">
   <subfield code="a">eng</subfield>
  </datafield>
  <datafield tag="090" ind1=" " ind2="0">
   <subfield code="a">LG993.5 2010</subfield>
   <subfield code="b">A64 M34</subfield>
  </datafield>
  <datafield tag="100" ind1="1" ind2=" ">
   <subfield code="a">Madarang, Jennelle Rizza M.</subfield>
  </datafield>
  <datafield tag="245" ind1="1" ind2="0">
   <subfield code="a">Modified K-mean clustering algorithm for fixed numeric and categorical data sets with missing values /</subfield>
   <subfield code="c">Jennelle Rizza M. Madarang.</subfield>
  </datafield>
  <datafield tag="264" ind1=" " ind2="1">
   <subfield code="c">2010</subfield>
  </datafield>
  <datafield tag="300" ind1=" " ind2=" ">
   <subfield code="a">65 leaves.</subfield>
  </datafield>
  <datafield tag="502" ind1=" " ind2=" ">
   <subfield code="a">Thesis (BS Applied Mathematics) -- University of the Philippines Mindanao, 2010.</subfield>
  </datafield>
  <datafield tag="520" ind1=" " ind2=" ">
   <subfield code="a">Clustering is a data mining technique that organizes a given set of objects into groups, or clusters, such that objects within the same cluster are more similar to each other than to objects in other clusters. However, most clustering algorithms handle only complete data sets that are either purely numeric or purely categorical, not mixed. Ahmad and Dey (2007) proposed an algorithm for clustering complete mixed data sets. To handle incomplete data sets, that is, data sets with missing values, the algorithm of Ahmad and Dey (2007) was modified. The modification combines two techniques for handling missing values: available case analysis, which uses the information that remains in the data set, and adaptive imputation, which imputes missing values during the clustering stage. The performance of the modified algorithm was tested on two data sets, one small and one large, and compared with existing methods, namely case deletion, mean and mode imputation, and kNN imputation, using the Adjusted Rand Index. The modified algorithm produced clusters of fair quality on the small data set and was competitive with K-mean after mean and mode imputation and with K-mean after kNN imputation. However, the quality of the resulting clusters on the large data set was very poor for all methods. The modified K-mean algorithm appeared to perform worse as the size of the data set increased.</subfield>
  </datafield>
  <datafield tag="650" ind1="1" ind2="7">
   <subfield code="a">Clustering.</subfield>
  </datafield>
  <datafield tag="650" ind1="1" ind2="7">
   <subfield code="a">K-mean algorithm.</subfield>
  </datafield>
  <datafield tag="650" ind1="1" ind2="7">
   <subfield code="a">Missing values.</subfield>
  </datafield>
  <datafield tag="650" ind1="1" ind2="7">
   <subfield code="a">Mixed numeric and categorical data.</subfield>
  </datafield>
  <datafield tag="658" ind1=" " ind2=" ">
   <subfield code="a">Undergraduate Thesis</subfield>
   <subfield code="c">AMAT200.</subfield>
  </datafield>
  <datafield tag="905" ind1=" " ind2=" ">
   <subfield code="a">FI</subfield>
  </datafield>
  <datafield tag="905" ind1=" " ind2=" ">
   <subfield code="a">UP</subfield>
  </datafield>
  <datafield tag="852" ind1="0" ind2=" ">
   <subfield code="a">UPMIN</subfield>
   <subfield code="b">UPMIN-MAIN</subfield>
   <subfield code="h">LG993.5 2010</subfield>
   <subfield code="i">A64 M34</subfield>
  </datafield>
  <datafield tag="942" ind1=" " ind2=" ">
   <subfield code="a">Thesis</subfield>
  </datafield>
 </record>
</collection>
