Human-friendly data collection system for object segmentation #1218

Open
wkentaro opened this issue Apr 15, 2017 · 7 comments

Comments

@wkentaro
Member

The components of this system are:

  1. Speech recognition
  2. Finding the moving object at the pixel level
  3. Tracking the object at the pixel level or ROI level

For 1: @furushchev, do you know a good ROS node for speech recognition?

For 3: @iKrishneel, do you know a good ROS node for pixel-wise tracking? I expect it would use point clouds, because ROI-level tracking is the common approach for 2D images (e.g. ConsensusTracking).


tl;dr

@inabajsk and I discussed the importance of a human-friendly data collection interface for object segmentation.
For example:

  • The human says to Pepper, "Now I will teach you this object."
  • Pepper says, "Ok, what is the object's name?"
  • The human says, "wallet".
  • Pepper says, "Ok, please show me."
  • The human moves the object in front of Pepper.
  • Pepper then:
    • Finds the moving object
    • Tracks it
    • Records it
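
The conversation above could be driven by a small state machine that consumes recognized utterances. The following is only a minimal sketch of that idea: the class name `TeachingDialogue`, the states, and the trigger phrase `"teach you"` are assumptions made for illustration, not part of any existing ROS node, and speech recognition and recording are left out.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()       # waiting for the human to start teaching
    ASK_NAME = auto()   # robot asked for the object name
    RECORDING = auto()  # robot is watching the object being moved


class TeachingDialogue:
    """Hypothetical dialogue manager for the teaching flow (illustration only)."""

    def __init__(self):
        self.state = State.IDLE
        self.label = None

    def on_speech(self, text):
        """Consume one recognized utterance and return the robot's reply (or None)."""
        text = text.strip().lower()
        if self.state is State.IDLE:
            # Assumed trigger phrase; a real system would need something more robust.
            if "teach you" in text:
                self.state = State.ASK_NAME
                return "Ok, what is the object's name?"
            return None
        if self.state is State.ASK_NAME:
            # Whatever is said next is taken as the label for the recorded images.
            self.label = text
            self.state = State.RECORDING
            return "Ok, please show me"
        return None

    def finish(self):
        """Stop recording and return the label that was collected."""
        label, self.label = self.label, None
        self.state = State.IDLE
        return label


dialogue = TeachingDialogue()
replies = [
    dialogue.on_speech("Now I will teach you this object"),
    dialogue.on_speech("wallet"),
]
```

While the state is `RECORDING`, the perception side (finding the moving object and saving images) would run and tag each saved image with `dialogue.label`.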
@furushchev
Member

For speech recognition, I think Google speech recognition is the de facto standard.
I have a simple speech recognition node for it. If you tell me where to commit it, I'll send a pull request.
Or you can easily try it with rwt_speech_recognition (though it requires Chrome).

@iKrishneel

@wkentaro
Do you want to track in 3D or 2D?

@wkentaro
Member Author

wkentaro commented Apr 16, 2017

@furushchev

> For speech recognition, I think Google speech recognition is the de facto standard.
> I have a simple speech recognition node for it. If you tell me where to commit it, I'll send a pull request.
> Or you can easily try it with rwt_speech_recognition (though it requires Chrome).

Could you please send a PR to jsk_recognition (probably to jsk_perception, or as a new package)?

Is this the Google speech recognition node? jsk-ros-pkg/jsk_recognition#1249

@iKrishneel

Either is fine with me, but I prefer pixel-wise/point-wise tracking.

@wkentaro
Member Author

wkentaro commented Apr 16, 2017

For 2.
@iory Do you know of a program for finding a moving object? I expect you know a lot about detecting objects held by a human, for imitation learning.

@k-okada
Member

k-okada commented Apr 17, 2017 via email

@iKrishneel

@wkentaro
Pixel-wise tracking is not common. Although there are methods that do it, they are not as competitive as region-based or part-based methods. It also suffers under harsh image conditions.
I wrote a region-based tracking algorithm in which the object region is defined by a bounding box. I will put it on my GitHub soon.

@wkentaro
Member Author

wkentaro commented Apr 20, 2017

@iKrishneel @k-okada @furushchev Thank you all.

For now, I have chosen to use optical flow. The demo movies are below:
HRP2 camera, RGB-D: https://drive.google.com/open?id=0B9P1L--7Wd2vaVlHRC10Z2dwQkk
Plain camera, RGB: https://drive.google.com/open?id=0B9P1L--7Wd2vR3ktR0xUSFRMdkk

The collected images look like the following:

What I found:

  • Optical flow works.
  • Using depth is not effective, because it leads to small object images (because of the depth range).
  • Tracking is not needed, because I want to collect images when the object's pose changes.
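
To make the optical-flow idea concrete, here is a minimal, dependency-free sketch of the step after flow estimation: thresholding the flow magnitude to get a moving-object mask, taking its bounding box, and recording a frame only when enough pixels move. It is not the code used in the demos. The dense flow field is assumed to be computed elsewhere (for example with OpenCV's `cv2.calcOpticalFlowFarneback`), and the function names, the threshold of 1 pixel per frame, and the 1% minimum moving fraction are all assumptions chosen for illustration.

```python
import math


def moving_mask(flow, threshold=1.0):
    """Mark pixels whose flow magnitude exceeds `threshold` (pixels per frame).

    `flow` is a list of rows, each row a list of (dx, dy) tuples.
    """
    return [[math.hypot(dx, dy) > threshold for dx, dy in row] for row in flow]


def bounding_box(mask):
    """Return (x_min, y_min, x_max, y_max) of the moving pixels, or None if none move."""
    points = [(x, y) for y, row in enumerate(mask) for x, moving in enumerate(row) if moving]
    if not points:
        return None
    xs, ys = zip(*points)
    return min(xs), min(ys), max(xs), max(ys)


def should_record(mask, min_fraction=0.01):
    """Record a frame only when at least `min_fraction` of the pixels are moving."""
    total = sum(len(row) for row in mask)
    moving = sum(1 for row in mask for value in row if value)
    return total > 0 and moving / total >= min_fraction


# Toy 4x5 flow field: a 2x2 patch moves 3 px to the right, the rest is static.
flow = [[(0.0, 0.0)] * 5 for _ in range(4)]
for y in (1, 2):
    for x in (2, 3):
        flow[y][x] = (3.0, 0.0)

mask = moving_mask(flow)
box = bounding_box(mask)      # region to crop as the object image
record = should_record(mask)  # whether to save this frame at all
```

Gating on motion rather than tracking reflects the last point above: a frame is saved only while the object is being moved, which is when its pose is changing. With real images the nested lists would normally be NumPy arrays.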
