The unique strengths and storage access characteristics of discard-based search

Satyanarayanan, Mahadev; Sukthankar, Rahul; Mummert, Lily; Goode, Adam; Harkes, Jan; Schlosser, Steve

doi:10.1007/s13174-010-0001-z

Original Paper
Open access
Published: 24 February 2010

Mahadev Satyanarayanan¹,
Rahul Sukthankar²,
Lily Mummert²,
Adam Goode¹,
Jan Harkes¹ &
Steve Schlosser³

Journal of Internet Services and Applications volume 1, pages 31–44 (2010)

Abstract

Discard-based search is a new approach to searching the content of complex, unlabeled, nonindexed data such as digital photographs, medical images, and real-time surveillance data. The essence of this approach is query-specific content-based computation, pipelined with human cognition. In this approach, query-specific parallel computation shrinks a search task down to human scale, thus allowing the expertise, judgment, and intuition of an expert to be brought to bear on the specificity and selectivity of the search. In this paper, we report on the lessons learned in the Diamond project from applying discard-based search to a variety of applications in the health sciences. From the viewpoint of a user, discard-based search offers unique strengths. From the viewpoint of server hardware and software, it offers unique opportunities for optimization that contradict long-established tenets of storage design. Together, these distinctive end-to-end attributes herald a new genre of Internet applications.
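
To make the idea in the abstract concrete, the following is a minimal sketch of discard-based filtering, not the Diamond project's actual code or API. It assumes that each data object is represented as a plain dictionary of precomputed features and that a query is a list of predicate functions; the names `discard_search`, `passes_all`, `redness`, and `edges` are hypothetical and chosen only for illustration. Objects that fail any query-specific filter are discarded in parallel, so that only a small set of survivors reaches the human expert.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, List

# Hypothetical types: an "item" is any unindexed object (here a dict of
# features standing in for an image); a "filter" is a query-specific
# predicate that returns True to keep the item.
Item = dict
Filter = Callable[[Item], bool]


def passes_all(item: Item, filters: List[Filter]) -> bool:
    """Apply filters in order; all() stops at the first filter that fails."""
    return all(f(item) for f in filters)


def discard_search(items: Iterable[Item], filters: List[Filter],
                   workers: int = 4) -> Iterator[Item]:
    """Yield only the items that survive every filter, evaluated in parallel."""
    items = list(items)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so verdicts line up with items.
        verdicts = pool.map(lambda it: passes_all(it, filters), items)
        for item, keep in zip(items, verdicts):
            if keep:
                yield item


# Illustrative corpus and query (made-up values, not data from the paper).
corpus = [
    {"id": 1, "redness": 0.9, "edges": 12},
    {"id": 2, "redness": 0.2, "edges": 40},
    {"id": 3, "redness": 0.8, "edges": 3},
]
filters = [lambda it: it["redness"] > 0.5, lambda it: it["edges"] > 5]
survivors = list(discard_search(corpus, filters))
```

In this sketch the user would inspect `survivors`, then tighten or loosen the filters and rerun the search, which is one way to read the abstract's "pipelined with human cognition". Because no index is built ahead of time, every query scans the full corpus.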

Author information

Authors and Affiliations

Carnegie Mellon Univ., Pittsburgh, PA, USA
Mahadev Satyanarayanan, Adam Goode & Jan Harkes
Intel Labs Pittsburgh, Pittsburgh, PA, USA
Rahul Sukthankar & Lily Mummert
Avere Systems, Pittsburgh, PA, USA
Steve Schlosser

Corresponding author

Correspondence to Mahadev Satyanarayanan.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Satyanarayanan, M., Sukthankar, R., Mummert, L. et al. The unique strengths and storage access characteristics of discard-based search. J Internet Serv Appl 1, 31–44 (2010). https://doi.org/10.1007/s13174-010-0001-z

Received: 26 January 2010
Accepted: 02 February 2010
Published: 24 February 2010
Issue Date: May 2010
DOI: https://doi.org/10.1007/s13174-010-0001-z