
Unable to Retrieve Embedding Arrays From TensorBoard Logs #6879

Open

Louagyd opened this issue Jul 15, 2024 · 1 comment

Louagyd commented Jul 15, 2024

I am having trouble retrieving embedding arrays that I logged with add_embedding: when I parse the TensorBoard event files, I cannot locate the actual arrays. Below is a description of the issue and the steps I have taken so far.

Steps to Reproduce

1. Logging embeddings:

I used add_embedding to log embeddings in TensorBoard. Example code for logging embeddings:

from torch.utils.tensorboard import SummaryWriter
import numpy as np

# Create a SummaryWriter
log_dir = 'logs/embedding_example'
writer = SummaryWriter(log_dir)

# Generate some dummy embeddings
embedding_data = np.random.randn(100, 64)  # 100 items with 64-dim embeddings
metadata = [f'Label {i}' for i in range(100)]

# Write the embeddings
writer.add_embedding(mat=embedding_data, metadata=metadata, global_step=1)

writer.close()
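To see what add_embedding actually writes, the log directory can be listed with a short walk over the files (a quick sketch; the exact on-disk layout may vary across versions):

import os

# Print every file add_embedding wrote under the log directory
for root, _dirs, files in os.walk(log_dir):
    for name in files:
        print(os.path.join(root, name))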
2. Attempting to retrieve embeddings:

I tried using EventAccumulator to load and parse the event files, but I was unable to locate the embedding arrays. Example code for extracting embeddings:

from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

def extract_embeddings_from_log(log_dir):
    # Load all tensor events from the log directory
    event_acc = EventAccumulator(log_dir, size_guidance={'tensors': 0})
    event_acc.Reload()

    embeddings = {}

    # Get tags for tensors (embeddings should be listed here)
    tags = event_acc.Tags()
    print(tags)

    # Collect any tensor events by tag; in my case no embedding tensors show up
    for tag in tags['tensors']:
        embeddings[tag] = event_acc.Tensors(tag)

    return embeddings

embeddings = extract_embeddings_from_log('logs/embedding_example')

I would appreciate any guidance or suggestions on how to properly retrieve embedding arrays logged with add_embedding. Specifically, I am looking for:

  • Confirmation of whether embeddings logged with add_embedding should be accessible through EventAccumulator.
  • Corrections to my approach, or alternative methods to extract the embeddings.
  • Any additional information on the correct tags or structures to look for within the TensorBoard logs.

Environment Details
Framework: PyTorch
Logging Library: TensorBoard
TensorBoard Version: 2.16.2
Python Version: 3.10
Operating System: Ubuntu 22.04

Thank you for your assistance.

rileyajones (Contributor) commented

Embeddings are treated differently from other logs because they are really part of the projector plugin. As a result, they are written to a separate file, projector_config.pbtxt, and are only read in by the projector plugin.

I'm not sure exactly what you're trying to read out, but you may find success using something like this.

import os

import tensorflow as tf
from google.protobuf import text_format
from tensorboard.plugins import projector

logdir = "logs/embedding_example"  # the directory passed to SummaryWriter

# Parse the projector config that add_embedding wrote next to the event files
with tf.io.gfile.GFile(
    os.path.join(logdir, "projector_config.pbtxt")
) as f:
    config = projector.ProjectorConfig()
    text_format.Parse(f.read(), config)
    print(config)
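If it is the raw arrays you are after, note that PyTorch's add_embedding writes the data itself as TSV files, referenced by each embedding's tensor_path in that config. Continuing from the snippet above, here is a minimal sketch for loading them back with NumPy (assuming the default TSV layout, with tensor_path relative to logdir):

import os

import numpy as np

# Each EmbeddingInfo entry in the parsed config points at a tensors.tsv
# file whose path is relative to the log directory
for embedding in config.embeddings:
    mat = np.loadtxt(os.path.join(logdir, embedding.tensor_path), delimiter="\t")
    print(embedding.tensor_name, mat.shape)  # e.g. (100, 64) for the example above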
