TW098 Hand Gesture-controlled Music Player

Rank | Award Winners | School | Advisor
Third Place | 洪彥倫, 康凱儒, 蔡竣宇 | Ming Chuan University (銘傳大學) | 黃炳森
 

1. Design Introduction

 The evolution of the user interface has been an interesting journey. Having advanced by leaps and bounds, humankind has entered a period in which convenience is everywhere. Remote controls for the television, the garage door, or the automobile are just a hand's reach away. That does not mean, however, that there is no room for improvement, which is why our project proposes to eliminate the remote control itself.

        By improving the user interface further through hand gestures, and by applying this technology to the music player provided in the DE2-70 sample programs, we aim to introduce a new way of interacting with machines and software. Hand gestures are a simple, convenient, and intuitive form of control that has been demonstrated in science fiction, in new technologies, and most recently in Project Natal for the Xbox 360, which uses gesture control for media and games.

        The DE2-70 is well suited to this project. By using the 5-megapixel camera included in the kit for input, the DE2-70 FPGA board for processing, and the audio output together with the included LCD touch screen for music and visual output, we hope to achieve an innovative, convenient, and easy-to-use system for a new generation.

        Our system works by capturing still images of hand gestures, interpreting their form, and executing the corresponding command. By refreshing continuously and capturing multiple frames per second, we created a real-time system that recognizes hand gestures. The hand gestures control playback, volume, and track selection, while the LCD touch screen displays the corresponding album art.

        For example, if a gesture resembling a hand pointing to the right is read into the system, the DE2-70 interprets the image and identifies the hand gesture. A signal is then sent to the Nios II to play the next track and display the album art of the corresponding music.

        By performing the image processing in hardware, we can considerably increase the speed and reduce the response time of the overall system. The Nios II soft-core embedded processor then takes the interpreted information and handles music playback as well as the LCD screen display.

 

Target Audiences 

        Eventually we hope to move this project onto home and commercial multimedia centers. Since the system at its core is a new control scheme, we hope it will be adapted and expanded to more versatile uses, such as controlling heavy machinery or aiding PowerPoint presentations.

        In addition, while this project may appeal to mainstream users, we also see a major application for people with physical disabilities. By providing quick access to essential or leisure items, such as a music player, we offer a control scheme that can be used anytime, anywhere.

  Development Board

        For our project, we decided to use Terasic’s Development and Education Board (DE2-70). The DE2-70 perfectly suited our needs, providing 18 switches, 4 push buttons, two 32 MB SDRAMs, 2 MB SSRAM, 8 MB Flash memory, an Altera Cyclone II FPGA device, SD Card support, and many other features. During our design, we decided to utilize this board as much as we could to take advantage of the powerful features that it provides.

  Figure 1. Terasic DE2-70: Development and Education Board

The board that we used also came with a 5-megapixel CMOS digital camera and a 4.3-inch LCD touch panel kit, both provided by Terasic.

Figure 2. D5M Digital Camera

 

Figure 3. LTM Touch Screen Panel


 

2. Function Description

The functions that make up the system are:

  • Color Filter
  • Bounding Box around Glove
  • Gesture Interpretation
  • Software Music Playback
  • Touch Screen Controls
  • Touch Screen Display: Album Art

  For the user to enjoy controlling music playback with hand gestures, the response time has to be as short as possible. That is why our system runs its image-processing algorithms in hardware.

  Color Filter

        The original goal of this project was, of course, to achieve hand recognition. During the design process, however, we discovered that skin detection alone was not sufficient, because other body areas with the same skin tone could interfere with music playback operations. For instance, the face or arm, which have roughly the same RGB values as the hand, would disrupt the image recognition when shown in front of the camera. That is why we decided to use a red glove instead, leaving bare-hand recognition as a future plan and our overall goal.

           In this project, we use a red glove as our main “remote control”, together with a color filter that grays out everything that does not correspond to the glove color. The filter requires that the red value be above a threshold set by the hardware switches, and that the sum of the blue and green values be quite low compared to the red value. The threshold is adjustable through the switches because the glove's RGB values change under different lighting conditions.
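The filter rule described above can be sketched in a few lines of Python. This is only a software illustration of the logic, not the hardware implementation. The function names and the `max_gb_ratio` parameter are assumptions, since the text only states that red must exceed a switch-selected threshold and that the green and blue sum must be low relative to red.

```python
def is_glove_pixel(r, g, b, red_threshold, max_gb_ratio=1.0):
    """Return True when a pixel looks like the red glove.

    red_threshold stands in for the value selected with the board switches.
    max_gb_ratio is a hypothetical tuning knob: the pixel only counts as red
    when (g + b) stays below max_gb_ratio * r.
    """
    return r > red_threshold and (g + b) < max_gb_ratio * r


def filter_pixel(r, g, b, red_threshold, max_gb_ratio=1.0):
    """Keep glove pixels unchanged and gray out everything else."""
    if is_glove_pixel(r, g, b, red_threshold, max_gb_ratio):
        return (r, g, b)
    gray = (r + g + b) // 3  # simple average used as the gray level (assumption)
    return (gray, gray, gray)
```

Raising `red_threshold` makes the filter stricter, which corresponds to adjusting the switches for brighter lighting.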

  Figure 4. Color Detection

 

Bounding Box

        When we determine where our glove is, a bounding box is formed around the hand. The figures below show the bounding box formed around a red glove, with all other pixels filtered to either black or gray. The bounding box works by identifying pixels that satisfy the color filter condition and moving each edge of the box inward from the outer edge of the frame until it reaches an identified pixel.
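As a rough software model of the bounding box step, the sketch below takes a mask of filtered pixels and returns the tightest box around the glove pixels. The mask representation (rows of booleans) and the function name are assumptions made for illustration. Finding the first and last row and column that contain a glove pixel gives the same result as moving each edge inward until it meets one.

```python
def bounding_box(mask):
    """Return (left, top, right, bottom), inclusive, or None if no pixel is set.

    mask is a list of rows; each entry is truthy for a glove pixel.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    # Rows and columns that contain at least one glove pixel.
    rows = [y for y in range(height) if any(mask[y])]
    cols = [x for x in range(width) if any(mask[y][x] for y in range(height))]
    if not rows:
        return None
    return (cols[0], rows[0], cols[-1], rows[-1])
```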

 


Figure 5. Horizontal Bounding Box


 

Figure 6. Vertical Bounding Box


Gesture Interpretation

           For image interpretation, we use a technique that counts and compares pixels within the bounding box. The bounding box is cut in half along the y-axis as well as along the x-axis. The red pixels in each half are then counted and compared to see which half has more, which determines the direction in which the hand is pointing.
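The half-versus-half comparison can be sketched as follows. The function names and the mask format are assumptions, and the sketch deliberately stops at reporting which half holds more glove pixels, because the text does not state how a heavier half maps to a pointing direction.

```python
def count_in(mask, left, top, right, bottom):
    """Count glove pixels inside an inclusive rectangle."""
    return sum(
        1
        for y in range(top, bottom + 1)
        for x in range(left, right + 1)
        if mask[y][x]
    )


def heavier(first_count, second_count, first_name, second_name):
    """Name the half with more pixels, or 'even' on a tie."""
    if first_count > second_count:
        return first_name
    if second_count > first_count:
        return second_name
    return "even"


def heavier_halves(mask, box):
    """Compare left/right and top/bottom halves of the bounding box."""
    left, top, right, bottom = box
    mid_x = (left + right) // 2
    mid_y = (top + bottom) // 2
    horizontal = heavier(
        count_in(mask, left, top, mid_x, bottom),
        count_in(mask, mid_x + 1, top, right, bottom),
        "left", "right",
    )
    vertical = heavier(
        count_in(mask, left, top, right, mid_y),
        count_in(mask, left, mid_y + 1, right, bottom),
        "top", "bottom",
    )
    return horizontal, vertical
```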

  Figure 7. Image Interpretation

Other Interpretations

        For other signals, such as the stop sign shown below, we had to use a different method of interpretation. The stop sign is recognized when the upper right portion of the bounding box contains no red pixels while the lower left portion does. When this condition is met, the player stops the music.
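A minimal sketch of the stop sign test is shown below. It assumes that the "upper right portion" and "lower left portion" are the corresponding quadrants of the bounding box, which the text does not spell out. The function name and mask format are also hypothetical.

```python
def is_stop_gesture(mask, box):
    """Return True when the upper right quadrant is empty and the lower left is not.

    box is (left, top, right, bottom), inclusive; mask holds truthy glove pixels.
    """
    left, top, right, bottom = box
    mid_x = (left + right) // 2
    mid_y = (top + bottom) // 2
    upper_right_has_red = any(
        mask[y][x]
        for y in range(top, mid_y + 1)
        for x in range(mid_x + 1, right + 1)
    )
    lower_left_has_red = any(
        mask[y][x]
        for y in range(mid_y + 1, bottom + 1)
        for x in range(left, mid_x + 1)
    )
    return (not upper_right_has_red) and lower_left_has_red
```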

  Figure 8. Stop Sign Interpretation


Software Music Playback

           The software portion of our system is a modified version of the SD-Card Music Player demo provided by Terasic Technologies. The accompanying C code fetches and plays .wav audio files from an SD card. Our modifications to the existing music player include several additional user commands and the synchronization of tracks with album art.

  Touch Screen Controls

Switching the music player to manual control allows the user to enter the various commands through the touch panel interface. In the original music player demo, the next track, volume up, and volume down commands were implemented on the push buttons. In our modified music player, we moved the controls onto the touch screen to make the player more modern. We also added previous track and stop commands and are currently working on play and pause commands.

  Figure 9. Music Player Manual Controls for Touch Screen

 

Multiple Screen Display

The touch panel can be set to one of three modes of operation depending on the user’s preference. The first mode allows the user to control the player with gestures through the camera while the touch panel displays album art. These pictures change whenever a song finishes and a new one starts, or when the user changes the track. The second mode allows the user to use the manual controls displayed on the touch screen. The third mode is the debug mode, in which the user can view what the camera is actively seeing.

Since the music playback is implemented in software and the LTM display in hardware, having pictures change with each track required careful programming. We used an SOPC parallel input/output (PIO) port to constantly indicate the song that is being played. As the track number increases or decreases, the PIO relays this information to the hardware block that controls picture transitions. This in turn keeps the track and the album art synchronized whenever a new song starts.

 

 

3. Performance Parameters

        Our system has to be as responsive as possible for the user to enjoy it. Our aim as a team was to use everything that the DE2-70 provides in terms of memory modules, while still building a reliable and fast system. Figure 10 summarizes the resources that our project uses.

Figure 10. System Summary

Device Resource                        Resources Used
Logic Elements                         8,421 (12%)
Combinational Functions                7,127 (10%)
Dedicated Logic Registers              4,984 (7%)
Total Registers                        5,053
Total Pins                             534 (86%)
Virtual Pins                           0
Memory Bits                            859,960 (75%)
Embedded Multiplier 9-bit Elements     4 (1%)
Phase-Locked Loops                     2 (50%)

  Image Processing

           The biggest factor in our project was how far away a person could reliably signal to the camera. We conducted a series of tests at distances from half a meter up to seven meters. The pictures were taken under full light from T8 fluorescent lamps. The half-meter and two-meter shots were straightforward and very reliable. As we extended the distance, however, we had to adjust the threshold value for the red color level in order to get a uniform bounding box around the hand.

  Figure 11. Suitable Threshold Values vs. Distance

Distance         Suitable Threshold Value
Half Meter       3000
Two Meters       3000
Four Meters      2500
Seven Meters     2500

 

Figure 12. Half Meter Shot 1

Figure 13. Half Meter Shot 2

Figure 14. Two Meter Shot 1

Figure 15. Two Meter Shot 2

Figure 16. Four Meter Shot 1

Figure 17. Four Meter Shot 2

Figure 18. Seven Meter Shot 1

 

4. Design Architecture

For design architecture, we will cover:

  • CMOS Digital Camera
  • Music Player and Touch Panel
  • Overall System Architecture

 

CMOS module

           For our project, we decided to use Terasic’s 5-megapixel digital camera, shown in Figure 2. The camera outputs pixels in a Bayer pattern format, which consists of four color channels: green1, green2, blue, and red. The layout is shown in Figure 19. This Bayer pattern, or RAW data, is then passed through a module (provided by Terasic) that converts it to RGB values, using nearest-neighbor approximation to define the RGB values of each pixel.
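To make the conversion idea concrete, here is a simplified Python sketch that turns each 2x2 Bayer cell into one RGB pixel by taking the red and blue samples directly and averaging the two green samples. It is not the Terasic conversion module. The cell layout used (green1 and red on the first row, blue and green2 on the second) is an assumption that should be checked against the D5M hardware specification, and the function name is hypothetical.

```python
def bayer_to_rgb(raw):
    """Convert a Bayer frame to a half-resolution list of (r, g, b) tuples.

    raw is a 2D list with even dimensions. Assumed 2x2 cell layout:
        G1 R
        B  G2
    """
    rgb = []
    for y in range(0, len(raw) - 1, 2):
        row = []
        for x in range(0, len(raw[y]) - 1, 2):
            g1 = raw[y][x]
            r = raw[y][x + 1]
            b = raw[y + 1][x]
            g2 = raw[y + 1][x + 1]
            # Each output pixel borrows its colors from the nearest samples
            # in the same cell; the two greens are averaged.
            row.append((r, (g1 + g2) // 2, b))
        rgb.append(row)
    return rgb
```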

 

Figure 19. Bayer Pattern Output (from D5M hardware specifications manual)

 

Hardware for Music Player

           The DE2-70 has a Wolfson WM8731 24-bit sigma-delta audio CODEC with a sampling frequency between 8 and 96 kHz. Much of the software, hardware, and IP associated with fetching, decoding, and playing music from the SD card was provided by Altera and Terasic Technologies. The music files are stored in WAV format on a FAT16 SD card. The Nios II processor reads the WAV files from the SD card and uses the audio CODEC to play the tracks. Figure 20 shows the block diagram for the music-playing part of the system. The 50 MHz clock that controls the hand gesture-controlled music player is also fed to a PLL that generates a 100 MHz signal, which clocks the Nios II processor and all the other components in Figure 20 except the audio CODEC. An 18.432 MHz clock controls the audio CODEC.

 

Figure 20. Hardware Components of Music Player


 

Figure 21. SOPC Components

 

Software for Music Player

            The stack of programs that facilitates the playing of WAV files from the SD card is shown in Figure 22. Communication between the SD card and the Nios II processor follows the 1-bit mode protocol for reading raw data. The FAT16 module implements the FAT16 file system for reading data from the SD card. The WAVE Lib module implements decoding functions for WAV files. The I2C module implements the protocol for configuring the audio CODEC. The Audio module performs FIFO checks and audio sending and receiving functions.
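As an illustration of what decoding a WAV file involves, the Python sketch below parses the RIFF/WAVE header of a file held in memory. It does not reproduce the WAVE Lib module from the demo; the function names are hypothetical, and the test file is generated with Python's standard `wave` module rather than read from an SD card.

```python
import io
import struct
import wave


def parse_wav_header(data):
    """Return format details and the location of the sample data."""
    riff, _size, wave_id = struct.unpack_from("<4sI4s", data, 0)
    if riff != b"RIFF" or wave_id != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    info = {}
    offset = 12
    while offset + 8 <= len(data):
        chunk_id, chunk_size = struct.unpack_from("<4sI", data, offset)
        body = offset + 8
        if chunk_id == b"fmt ":
            fmt, channels, rate, _byte_rate, _align, bits = struct.unpack_from(
                "<HHIIHH", data, body
            )
            info.update(format=fmt, channels=channels,
                        sample_rate=rate, bits_per_sample=bits)
        elif chunk_id == b"data":
            info.update(data_offset=body, data_size=chunk_size)
            break
        # Chunks are padded to an even number of bytes.
        offset = body + chunk_size + (chunk_size & 1)
    return info


def make_test_wav(n_frames=4, rate=44100):
    """Build a tiny 16-bit stereo WAV file in memory for the example."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(2)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(b"\x00" * (n_frames * 4))
    return buf.getvalue()
```

A player would then stream the bytes starting at `data_offset` to the audio hardware at the reported sample rate.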

 

Figure 22. Software Components of the music player (from DE2-70 manual)

 

Figure 23. System Architecture

 

Figure 24. Overview of System

 

 

5. Design Methodology

Hand Gesture Control Implementation

          Going into this research, we really wanted to create a foolproof project, so the best implementations were considered first. Gesture recognition went through many phases. We were initially keen to form a bounding box around the hand and then, after filtering out the color of the hand, detect the number of peaks and thereby infer the number of fingers. Given the time constraints, however, we decided to implement a simpler method of pixel comparison by cutting the bounding box in half.

 

Touch screen Control Implementation

Putting buttons on the touch panel was a challenge, because the Nios II processor that controls the music playback requires a falling-edge signal in order to execute a command. In the music player provided, these signals were generated in hardware by the push buttons mounted on the DE2-70 board. The push buttons are active low when pressed, which supplies the falling edge needed by the Nios II processor. The touch panel cannot generate an active-low signal the way the push buttons can. To overcome this problem, we used Verilog to generate a falling edge whenever the input coordinates from the user’s finger match a set of predefined coordinates. This Verilog-generated falling edge is sent to the Nios II processor, where it triggers play, pause, track switching, volume change, or stop accordingly.
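The behavior of the touch-panel buttons can be modeled in Python as shown below. The actual design is written in Verilog; this sketch only mimics the idea of driving a signal low while the touch point lies inside a button region, so that a press produces a falling edge. The region coordinates, sample data, and function names are hypothetical.

```python
def button_level(point, region):
    """Return 0 (active low) while the touch point is inside region, else 1.

    point is (x, y) or None when the panel is not touched.
    region is (x_min, y_min, x_max, y_max), inclusive.
    """
    if point is None:
        return 1
    x, y = point
    x_min, y_min, x_max, y_max = region
    inside = x_min <= x <= x_max and y_min <= y <= y_max
    return 0 if inside else 1


def falling_edges(levels):
    """Return the sample indices where the signal goes from 1 to 0."""
    return [
        i for i in range(1, len(levels))
        if levels[i - 1] == 1 and levels[i] == 0
    ]


# Hypothetical "next track" button area and a short sequence of touch samples.
NEXT_REGION = (600, 100, 760, 220)
samples = [None, (650, 150), (655, 152), None, (100, 100), (700, 200)]
levels = [button_level(p, NEXT_REGION) for p in samples]
```

Each index returned by `falling_edges(levels)` corresponds to one command that the processor would see.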

 

 

6. Design Features

What makes this music system better than traditional remote-controlled stereos? The answer is simple: the gesture-controlled music player takes home entertainment to a whole new level by literally putting the power at the user’s fingertips.

 

Hardware Accelerated Image Interpretation

All of our image processing is computed and interpreted in hardware. Because the system needs to be responsive, software interpretation was ruled out, and hardware is used instead to improve user interaction.

Smart Implementation of Manual Control

Another option is the manual controls for the music player on the LCD Touch Module (LTM), which make the system more user-friendly. When the user wants to increase or decrease the volume or play the previous or next track, the button can be held down until the desired volume or track is reached instead of being pressed repeatedly.

Use of New SOPC Components

Rather than using an LTM SOPC component to connect the LTM to the Avalon fabric, we conserve resources by using a few PIOs and connecting them between the Avalon fabric and the LTM.

SD Card Support

Music played by the system is read from the SD card, expanding the system storage capacity to the size of the SD card.

Hardware Threshold Adjustment

Since RGB values are subject to change due to different lighting conditions, our project supports red threshold adjustment to fit the given lighting conditions.

Album Art

Our project lets users input their album art, providing a great music experience with visual feedback.

Multiple Touch Panel Display Support

        Our project supports the use of multiple modes for gesture control, manual control, or debugging.

 

7. Conclusion

In conclusion, we think we have achieved what we set out to create: a music player that is controlled entirely through hand gestures. Beyond that, we did our best to implement additional features, such as album artwork and more controls, to create a full music experience. One of the biggest drawbacks, however, is that our system still requires a glove, sleeve, or similar item to operate. Hand recognition is a subject in its own right, and one that we regrettably did not have enough time to tackle fully. Although skin detection itself was easily achieved, the interference from the arm and face was an obstacle that required much more thought, which is why we opted to forgo this step for now. Our dream, of course, is the ultimate control scheme: one that requires no extraneous materials such as a glove and so achieves the greatest freedom in human-machine interaction. Our future plans for the system are:

  •    Full hand recognition
  •    Implement even more gestures
  •    Load album art from the SD Card
  •    Support lyrics of given song
  •    Add mp3 support

During the time we spent on this project, we learned a great deal, given that this was the first time we had worked with FPGAs and that even the Verilog language was new to us. Besides reinforcing our knowledge of computer architecture in general, we learned the benefits of using FPGAs compared with traditional software designs.

The Nios II system proved invaluable for integrating software and hardware conveniently, and the DE2-70 was an ideal board that was easy to work with, helped in part by the code provided by Terasic, which gave us a platform to launch from. We thank both Altera and Terasic for providing this wonderful opportunity to realize our visions and put them into practice.