Computer vision has become increasingly important and effective in recent years owing to its wide-ranging applications in areas as diverse as smart surveillance and monitoring, health and medicine, sports and recreation, robotics, drones, and self-driving cars. Visual recognition tasks, such as image classification, localization, and detection, are the core building blocks of many of these applications, and recent developments in Convolutional Neural Networks (CNNs) have led to outstanding performance on these tasks in state-of-the-art visual recognition systems. As a result, CNNs now form the crux of deep learning algorithms in computer vision.
This self-contained guide will benefit those who seek both to understand the theory behind CNNs and to gain hands-on experience in applying CNNs to computer vision. It provides a comprehensive introduction to CNNs, starting with the essential concepts behind neural networks: training, regularization, and optimization of CNNs. The book also discusses a wide range of loss functions, network layers, and popular CNN architectures, reviews the different techniques for evaluating CNNs, and presents some popular CNN tools and libraries that are commonly used in computer vision. Further, this text describes and discusses case studies related to the application of CNNs in computer vision, including image classification, object detection, semantic segmentation, scene understanding, and image generation.
This book is ideal for undergraduate and graduate students, as well as for new researchers, developers, engineers, and practitioners who are interested in gaining a quick understanding of CNN models; no prior background knowledge in the field is required to follow the material.
Salman Khan received a B.E. in Electrical Engineering from the National University of Sciences and Technology (NUST) in 2012 with high distinction, and a Ph.D. from The University of Western Australia (UWA) in 2016. His Ph.D. thesis received an Honorable Mention on the Dean's List Award. In 2015, he was a visiting researcher with National ICT Australia, Canberra Research Laboratories. He is currently a Research Scientist with Data61, Commonwealth Scientific and Industrial Research Organization (CSIRO), and has been an Adjunct Lecturer with the Australian National University (ANU) since 2016. He was awarded several prestigious scholarships, such as the International Postgraduate Research Scholarship (IPRS) for his Ph.D. and the Fulbright Scholarship for his MS. He has served as a program committee member for several leading computer vision and robotics conferences, such as IEEE CVPR, ICCV, ICRA, WACV, and ACCV. His research interests include computer vision, pattern recognition, and machine learning.
Hossein Rahmani received his B.Sc. in Computer Software Engineering in 2004 from Isfahan University of Technology, Isfahan, Iran, and his M.Sc. in Software Engineering in 2010 from Shahid Beheshti University, Tehran, Iran. He received his Ph.D. from The University of Western Australia in 2016. He has published several papers in top conferences and journals such as CVPR, ICCV, ECCV, and TPAMI. He is currently a Research Fellow in the School of Computer Science and Software Engineering at The University of Western Australia. He has served as a reviewer for several leading computer vision conferences and journals, such as IEEE TPAMI and CVPR. His research interests include computer vision, action recognition, 3D shape analysis, and machine learning.
Syed Afaq Ali Shah received his B.Sc. and M.Sc. degrees in Electrical Engineering from the University of Engineering and Technology (UET) Peshawar in 2003 and 2010, respectively. He obtained his Ph.D. in the area of computer vision and machine learning from the University of Western Australia in 2016. He is currently working as a research associate in the School of Computer Science and Software Engineering at the University of Western Australia, Crawley, Australia. He has been awarded the "Start Something Prize for Research Impact through Enterprise" for a 3D facial analysis project funded by the Australian Research Council. He has served as a program committee member for ACIVS 2017. His research interests include deep learning, computer vision, and pattern recognition.
Mohammed Bennamoun received his M.Sc. in the area of Control Theory from Queen's University, Kingston, Canada, and his Ph.D. in the area of Computer Vision from Queen's/QUT in Brisbane, Australia. He lectured in robotics at Queen's and then joined QUT in 1993 as an associate lecturer. He is currently a Winthrop Professor.
He served as the Head of the School of Computer Science and Software Engineering at The University of Western Australia (UWA) for five years (February 2007 to March 2012). He served as the Director of a university centre at QUT, the Space Centre for Satellite Navigation, from 1998 to 2002. He served as a member of the Australian Research Council (ARC) College of Experts from 2013 to 2015. He was an Erasmus Mundus Scholar and Visiting Professor at the University of Edinburgh in 2006. He was also a visiting professor at CNRS (Centre National de la Recherche Scientifique) and Telecom Lille1, France, in 2009, at the Helsinki University of Technology in 2006, and at the University of Bourgogne and Paris 13 in France in 2002-2003. He is the co-author of the book Object Recognition: Fundamentals and Case Studies (Springer Verlag, 2001) and the co-author of the edited book Ontology Learning and Knowledge Discovery Using the Web, published in 2011.
Mohammed has published over 100 journal papers and over 250 conference papers, and has secured highly competitive national grants from the ARC, government, and other funding bodies. Some of these grants were in collaboration with industry partners (through the ARC Linkage Project scheme) to solve real research problems for industry, including Swimming Australia, the West Australian Institute of Sport, a textile company (Beaulieu Pacific), and AAMGeoScan. He has worked on research problems and collaborated (through joint publications, grants, and supervision of Ph.D. students) with researchers from different disciplines, including animal biology, speech processing, biomechanics, ophthalmology, dentistry, linguistics, robotics, photogrammetry, and radiology. He has collaborated with researchers from within Australia (e.g., CSIRO) as well as internationally (e.g., Germany, France, Finland, and the U.S.). He has won several awards, including the Best Supervisor of the Year Award at QUT in 1998, an award for teaching excellence (research supervision), and the Vice-Chancellor's Award for Research Mentorship in 2016. He also received an award for research supervision at UWA in 2008. He has served as a guest editor for a couple of special issues of international journals, such as the International Journal of Pattern Recognition and Artificial Intelligence (IJPRAI). He was selected to give conference tutorials at the European Conference on Computer Vision (ECCV), the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016), and Interspeech (2014), as well as a course at the International Summer School on Deep Learning (DeepLearn2017). He has organized several special sessions for conferences, including a special session for the IEEE International Conference on Image Processing (ICIP).
He has served on the program committees of many conferences, e.g., 3D Digital Imaging and Modeling (3DIM) and the International Conference on Computer Vision (ICCV). He has also contributed to the organization of many local and international conferences. His areas of interest include control theory, robotics, obstacle avoidance, object recognition, machine/deep learning, signal/image processing, and computer vision (particularly 3D).
Gerard Medioni received the Diplôme d'Ingénieur in Information from the École Nationale Supérieure des Télécommunications in 1977, and the M.S. and Ph.D. degrees in Computer Science from the University of Southern California in 1980 and 1983, respectively. He has been at USC since then and is currently Professor of Computer Science and Electrical Engineering, co-director of the Institute for Robotics and Intelligent Systems (IRIS), and co-director of the USC Games Institute. He served as Chairman of the Computer Science Department from 2001 to 2007. Prior to this, he was President and CEO of I.C. Vision in Los Angeles, California, and held the positions of Associate Professor (1992-1999), Assistant Professor (1987-1992), and Research Assistant Professor (1983-1987) in the Departments of Computer Science and Electrical Engineering at the University of Southern California. From 1979 to 1983, he was a Research Assistant in the Intelligent Systems Group at the University of Southern California. Prior to his academic career, he was a research engineer in the Underwater Signal Processing Division of Thomson-CSF in Cagnes-sur-Mer, France. From 2000 to 2001, while on sabbatical leave, he was Chief Technical Officer at Geometrix, Inc. in San Jose, California.
Professor Medioni has made significant contributions to the field of computer vision. His research covers a broad spectrum of the field, including edge detection, stereo and motion analysis, shape inference and description, and system integration. He has published three books, over 50 journal papers, and over 150 conference articles, and holds eight international patents.
Prof. Medioni is an associate editor of the Image and Vision Computing Journal, the Pattern Recognition and Image Analysis Journal, and the International Journal of Image and Video Processing.
Prof. Medioni served as program co-chair of the 1991 IEEE CVPR Conference in Hawaii and of the 1995 IEEE Symposium on Computer Vision in Miami, general co-chair of the 1997 IEEE CVPR Conference in Puerto Rico, conference co-chair of the 1998 ICPR Conference in Australia, and general co-chair of the 2001 IEEE CVPR Conference in Kauai, the 2007 IEEE CVPR Conference in Minneapolis, and the 2009 IEEE CVPR Conference in Miami. He is a Fellow of the IAPR, a Fellow of the IEEE, and a Fellow of the AAAI.
Sven Dickinson received the B.A.Sc. degree in Systems Design Engineering from the University of Waterloo in 1983, and the M.S. and Ph.D. degrees in Computer Science from the University of Maryland in 1988 and 1991, respectively. He is currently Professor of Computer Science at the University of Toronto, where he serves as Acting Chair. Prior to that, he served as Departmental Vice Chair (2003-2006) and as Associate Professor (2000-2007). From 1995 to 2000, he was an Assistant Professor of Computer Science at Rutgers University, where he also held a joint appointment in the Rutgers Center for Cognitive Science (RuCCS) and membership in the Center for Discrete Mathematics and Theoretical Computer Science (DIMACS). From 1994 to 1995, he was a Research Assistant Professor in the Rutgers Center for Cognitive Science, and from 1991 to 1994, a Research Associate at the Artificial Intelligence Laboratory, University of Toronto. He has held affiliations with the MIT Media Laboratory (Visiting Scientist, 1992-1994), the University of Toronto (Visiting Assistant Professor, 1994-1997), and the Computer Vision Laboratory of the Center for Automation Research at the University of Maryland (Assistant Research Scientist, 1993-1994; Visiting Assistant Professor, 1994-1997). Prior to his academic career, he worked in the computer vision industry, designing image processing systems for Grinnell Systems Inc., San Jose, CA, from 1983 to 1984, and optical character recognition systems for DEST, Inc., Milpitas, CA, from 1984 to 1985.
His research interests revolve around the problem of object recognition in general, and generic object recognition in particular. He has explored a multitude of generic shape representations, and their common representation as hierarchical graphs has led to his interest in inexact graph indexing and matching. His interest in shape representation and matching has also led to his research in object tracking, vision-based navigation, content-based image retrieval, and the use of language to guide perceptual grouping, object recognition, and motion analysis. One of the focal points of his research is the problem of image abstraction, which he believes is critical in bridging the representational gap between exemplar-based and generic object recognition. He has published over 100 papers on these topics in refereed journals, conferences, and edited collections. In 1996, he received the NSF CAREER award for his work in generic object recognition, and in 2002 he received the Government of Ontario Premier's Research Excellence Award (PREA), also for his work in generic object recognition. He was co-chair of the 1997, 1999, 2004, and 2007 IEEE International Workshops on Generic Object Recognition (or Object Categorization), co-chaired the DIMACS Workshop on Graph Theoretic Methods in Computer Vision in 1999, and co-chaired the First International Workshop on Shape Perception in Human and Computer Vision in 2008. From 1998 to 2002, he served as Associate Editor of the IEEE Transactions on Pattern Analysis and Machine Intelligence, for which he also co-edited a special issue on graph algorithms and computer vision that appeared in 2001. He currently serves as an Associate Editor of the International Journal of Computer Vision, Image and Vision Computing, Pattern Recognition Letters, IET Computer Vision, and the Journal of Electronic Imaging.