The unique strengths and storage access characteristics of discard-based search

Abstract

Discard-based search is a new approach to searching the content of complex, unlabeled, nonindexed data such as digital photographs, medical images, and real-time surveillance data. The essence of this approach is query-specific content-based computation, pipelined with human cognition. In this approach, query-specific parallel computation shrinks a search task down to human scale, thus allowing the expertise, judgment, and intuition of an expert to be brought to bear on the specificity and selectivity of the search. In this paper, we report on the lessons learned in the Diamond project from applying discard-based search to a variety of applications in the health sciences. From the viewpoint of a user, discard-based search offers unique strengths. From the viewpoint of server hardware and software, it offers unique opportunities for optimization that contradict long-established tenets of storage design. Together, these distinctive end-to-end attributes herald a new genre of Internet applications.

Author information

Corresponding author

Correspondence to Mahadev Satyanarayanan.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Satyanarayanan, M., Sukthankar, R., Mummert, L. et al. The unique strengths and storage access characteristics of discard-based search. J Internet Serv Appl 1, 31–44 (2010). https://doi.org/10.1007/s13174-010-0001-z

Keywords

  • Data-intensive computing
  • Non-text search technology
  • Medical image processing
  • Interactive search
  • Computer vision
  • Pattern recognition
  • Distributed systems
  • ImageJ
  • MATLAB
  • Parallel processing
  • Human-in-the-loop
  • Diamond
  • OpenDiamond
  • Storage systems
  • I/O workloads
  • RAID