<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd" xmlns="http://www.loc.gov/MARC21/slim">
 <record>
  <leader>00000cab a22000003a 4500</leader>
  <controlfield tag="001">UP-99796217611077381</controlfield>
  <controlfield tag="003">Buklod</controlfield>
  <controlfield tag="005">20231008001307.0</controlfield>
  <controlfield tag="006">m    |o  d |      </controlfield>
  <controlfield tag="007">ta</controlfield>
  <controlfield tag="008">140124s        xx     d | ||r |||||   ||</controlfield>
  <datafield tag="040" ind1=" " ind2=" ">
   <subfield code="a">DENGII</subfield>
  </datafield>
  <datafield tag="041" ind1="0" ind2=" ">
   <subfield code="a">eng</subfield>
  </datafield>
  <datafield tag="100" ind1="1" ind2=" ">
   <subfield code="a">Wang, Weiyan</subfield>
  </datafield>
  <datafield tag="245" ind1="1" ind2="0">
   <subfield code="a">Parallelization and performance optimization on face detection algorithm with OpenCL</subfield>
   <subfield code="b">A case study.</subfield>
  </datafield>
  <datafield tag="300" ind1=" " ind2=" ">
   <subfield code="a">June 2012</subfield>
   <subfield code="b">p.</subfield>
  </datafield>
  <datafield tag="520" ind1="3" ind2=" ">
   <subfield code="a">Face detect application has a real time need in nature. Although Viola-Jones algorithm can handle it elegantly, today's bigger and bigger high quality images and videos still bring in the new challenge of real time needs. It is a good idea to parallel the Viola-Jones algorithm with OpenCL to achieve high performance across both AMD and NVidia GPU platforms without bringing up new algorithms. This paper presents the bottleneck of this application and discusses how to optimize the face detection step by step from a very naïve implementation. Some brilliant tricks and methods like CPU execution time hidden, stubbles usage of local memory as high speed scratchpad and manual cache, and variable granularity were used to improve the performance. Those technologies result in 4?13 times speedup varying with the image size. Furthermore, those ideas may throw on some light on the way to parallel applications efficiently with OpenCL. Taking face detection as an example, this paper also summarizes some universal advice on how to optimize OpenCL program, trying to help other applications do better on GPU.</subfield>
  </datafield>
  <datafield tag="650" ind1=" " ind2="0">
   <subfield code="a">Graphics processing unit.</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
   <subfield code="a">Viola-Jones.</subfield>
  </datafield>
  <datafield tag="700" ind1="1" ind2=" ">
   <subfield code="a">Zhang, Yunquan.</subfield>
  </datafield>
  <datafield tag="700" ind1="1" ind2=" ">
   <subfield code="a">Yan, Shengen.</subfield>
  </datafield>
  <datafield tag="700" ind1="1" ind2=" ">
   <subfield code="a">Zhang, Ying.</subfield>
  </datafield>
  <datafield tag="700" ind1="1" ind2=" ">
   <subfield code="a">Jia, Haipeng.</subfield>
  </datafield>
  <datafield tag="773" ind1="0" ind2=" ">
   <subfield code="t">Tsinghua Science and Technology</subfield>
   <subfield code="g">17, 3 (June 2012).</subfield>
  </datafield>
  <datafield tag="905" ind1=" " ind2=" ">
   <subfield code="a">FO</subfield>
  </datafield>
  <datafield tag="852" ind1=" " ind2=" ">
   <subfield code="a">UPD</subfield>
   <subfield code="b">DENG-II</subfield>
  </datafield>
  <datafield tag="942" ind1=" " ind2=" ">
   <subfield code="a">Article</subfield>
  </datafield>
 </record>
</collection>
