
Deep Dive Into TensorBoard: Tutorial With Examples

There is a common business saying that you can’t improve what you don’t measure. This is true in machine learning as well. There are various tools for measuring the performance of a deep learning model: Neptune, MLflow, Weights & Biases, and Guild AI, just to mention a few. In this piece, we’ll focus on TensorFlow’s open-source visualization toolkit, TensorBoard.

The tool enables you to track various metrics, such as accuracy and log loss, on the training or validation set. As we shall see in this piece, TensorBoard provides several tools that we can use in machine learning experimentation, and it is also fairly easy to use.


Here are some things we’ll cover in this text:

  • Visualizing images in TensorBoard
  • Checking model weights and biases in TensorBoard
  • Visualizing the model’s architecture
  • Sending a visual of the confusion matrix to TensorBoard
  • Profiling your application to see its performance, and
  • Using TensorBoard with Keras, PyTorch, and XGBoost


Let’s get to it. 

How to use TensorBoard

This section will focus on helping you understand how to use TensorBoard in your machine learning workflow. 

How to install TensorBoard

Before you can start using TensorBoard, you have to install it via pip or conda.
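Either of these will work (the conda package lives on conda-forge):

pip install tensorboard

conda install -c conda-forge tensorboard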

Using TensorBoard with Jupyter notebooks and Google Colab

With TensorBoard installed, you can now load it into your notebook. Note that you can use it in a Jupyter Notebook or in Google Colab.
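In both environments, the notebook extension is loaded with a Jupyter magic:

%load_ext tensorboard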

Once that is done you have to set a log directory . This is where TensorBoard will store all the logs. It will read from these logs in order to display the various visualizations.

In the event that you want to reload the TensorBoard extension, the command below will do the magic (no pun intended).
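%reload_ext tensorboard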

You might want to clear the current logs so that you can write fresh ones to the folder. You can achieve that by running this command on Google Colab (assuming the logs live in a local logs folder):
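!rm -rf ./logs/  # assumes the logs live under ./logs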

and this one on Jupyter Notebooks:
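rm -rf ./logs/  # same assumption; run it from the notebook’s working directory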

If you are running multiple experiments, you might want to store all logs so that you can compare their results. This can be achieved by creating logs that are timestamped. To do that, use the command below:
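import datetime

log_folder = "logs/fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")  # one folder per run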

How to run TensorBoard

Running TensorBoard involves just one line of code. In this section, you’ll see how to do this.

Let’s now walk through an example where you will use TensorBoard to visualize model metrics. For that purpose, you need to build a simple image classification model.
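Here is a minimal sketch of such a model, using the classic MNIST digits; the exact architecture is an assumption, and any small classifier will do:

import tensorflow as tf

mnist = tf.keras.datasets.mnist
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train, X_test = X_train / 255.0, X_test / 255.0  # scale pixels to [0, 1]

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])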

Next, load in the TensorBoard notebook extension and create a variable pointing to the log folder.
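%load_ext tensorboard

log_folder = 'logs'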

How to use TensorBoard callback

The next step is to specify the TensorBoard callback during the model’s fit method. In order to do that, you first have to import the TensorBoard callback.
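from tensorflow.keras.callbacks import TensorBoard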

This callback is responsible for logging events such as activation histograms, metric summary plots, profiling, and training graph visualizations.

With that in place, you can now create the TensorBoard callback and specify the log directory using log_dir. The TensorBoard callback also takes other parameters:

  • histogram_freq is the frequency (in epochs) at which to compute activation and weight histograms for the layers of the model. Setting this to 0 means that histograms will not be computed. For this to work, you have to set the validation data or the validation split.
  • write_graph dictates whether the graph will be visualized in TensorBoard.
  • write_images, when set to True, means model weights are visualized as an image in TensorBoard.
  • update_freq determines how losses and metrics are written to TensorBoard. When set to an integer, say 100, losses and metrics are logged every 100 batches. When set to batch, they are written after every batch; when set to epoch, after every epoch.
  • profile_batch determines which batches will be profiled. By default, the second batch is profiled. You can also set a range, e.g. profile_batch='5,10' to profile batches 5 to 10. Setting profile_batch to 0 disables profiling.
  • embeddings_freq is the frequency (in epochs) at which the embedding layers will be visualized. Setting this to zero means that the embeddings will not be visualized.
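Putting these together, here is a callback sketch with illustrative values:

tb_callback = TensorBoard(log_dir=log_folder,
                          histogram_freq=1,
                          write_graph=True,
                          write_images=True,
                          update_freq='epoch',
                          profile_batch=2,
                          embeddings_freq=1)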

The next item is to fit the model and pass in the callback.
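model.fit(X_train, y_train,
          epochs=5,
          validation_split=0.15,  # a validation split is needed for histogram_freq to work
          callbacks=[tb_callback])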

How to launch TensorBoard

If you installed TensorBoard via pip, you can launch it from the command line:
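tensorboard --logdir=logs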

On a Notebook, you can launch it using:
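%tensorboard --logdir=logs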

TensorBoard is also available in the browser at the following URL:
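http://localhost:6006/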

Running TensorBoard remotely

When working on a remote server, you can use SSH tunneling to forward a port on the remote server to your local machine (port 6006 in this example). This is how that would look (your_user and remote_server are placeholders):
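ssh -L 6006:127.0.0.1:6006 your_user@remote_server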

With that in place, you can run TensorBoard in the normal way:
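tensorboard --logdir=logs --port=6006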

Just remember that the port you specify in the tensorboard command (by default it is 6006) should be the same as the one in the SSH tunnel.

Note: If you are using the default port 6006, you can drop --port=6006. You will be able to see TensorBoard on the local machine, but TensorBoard will actually be running on the remote server.

TensorBoard dashboard

Let us now look at the various tabs in TensorBoard.

TensorBoard scalars

The Scalars tab shows changes in the loss and metrics over the epochs. It can be used to track other scalar values such as learning rate and training speed.


TensorBoard images

This dashboard has images that show the weights. Adjusting the slider displays the weights at various epochs. 


TensorBoard graphs

This tab shows your model’s layers. You can use this to check if the architecture of the model looks as intended. 


TensorBoard distributions

The Distributions tab shows the distribution of tensors. For example, in the dense layer below, you can see the distribution of the weights and biases over each epoch.


TensorBoard histograms

The Histograms show the distribution of tensors over time. For example, looking at dense_1 below, you can see the distribution of the biases over each epoch. 


Using the TensorBoard projector

You can use TensorBoard’s Projector to visualize any vector representation, e.g. word embeddings and images.

Word embeddings are numerical representations of words that capture their semantic relationships. The Projector helps you see those representations. You can find it under the Inactive dropdown.

Plot training examples with TensorBoard

You can use TensorFlow Image Summary API to visualize training images. This is especially useful when working with image data like in this case.

Now, create a new log directory for the images as shown below. 

The next step is to create a file writer and point it to this directory.

At the beginning of this article (in the “How to run TensorBoard” section), you specified that the image shape was 28 by 28. This is important information when reshaping the images before writing them to TensorBoard. You also need to specify the channel to be 1 because the images are grayscale. Afterward, you use the file_writer to write the images to TensorBoard.

In this example, the images at index 10 to 30 will be written to TensorBoard. 
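A sketch of those steps together (the folder name is an assumption; X_train comes from the model sketch above):

import numpy as np
import tensorflow as tf

logdir = 'logs/train_data/'
file_writer = tf.summary.create_file_writer(logdir)

with file_writer.as_default():
    # reshape to (batch, height, width, channels); channel 1 because MNIST is grayscale
    images = np.reshape(X_train[10:30], (-1, 28, 28, 1))
    tf.summary.image('Training data', images, max_outputs=20, step=0)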

Visualize images in TensorBoard

Apart from visualizing image tensors, you can also visualize actual images in TensorBoard. In order to illustrate that, you need to convert the MNIST tensors to images using Matplotlib. After that, you need to use `tf.summary.image` to plot the images in TensorBoard.

Start by clearing the logs; alternatively, you can use timestamped log folders. After that, specify the log directory and create a file writer with `tf.summary.create_file_writer` that will be used to write the images to TensorBoard.

Next, create a grid that will hold the images. In this case, the grid will hold 36 digits.

Now convert the digits into a single image to visualize them in TensorBoard.

The next step is to use the writer and `plot_to_image` to display the images on TensorBoard.
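A sketch of the whole flow (the folder name is an assumption; X_train and y_train come from the earlier model sketch):

import io
import matplotlib.pyplot as plt
import tensorflow as tf

logdir = 'logs/plots/'
file_writer = tf.summary.create_file_writer(logdir)

def plot_to_image(figure):
    # Convert the Matplotlib figure to a PNG image tensor that tf.summary.image accepts.
    buf = io.BytesIO()
    plt.savefig(buf, format='png')
    plt.close(figure)
    buf.seek(0)
    image = tf.image.decode_png(buf.getvalue(), channels=4)
    return tf.expand_dims(image, 0)

# A grid holding the first 36 digits.
figure = plt.figure(figsize=(10, 10))
for i in range(36):
    plt.subplot(6, 6, i + 1)
    plt.title(str(y_train[i]))
    plt.imshow(X_train[i], cmap=plt.cm.binary)
    plt.axis('off')

with file_writer.as_default():
    tf.summary.image('Training digits', plot_to_image(figure), step=0)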


Log confusion matrix to TensorBoard

Using the same example, you can log the confusion matrix for all epochs. First, define a function that will return a Matplotlib figure holding the confusion matrix.

Next, clear the previous logs, define the log directory for the confusion matrix, and create a writer variable for writing into the log folder. 

The step that follows this is to create a function that will make predictions from the model and log the confusion matrix as an image. 

After that, use the `file_writer_cm` to write the confusion matrix to the log directory.

This will be followed by the definition of the TensorBoard callback and the LambdaCallback.

The LambdaCallback will log the confusion matrix on every epoch. Finally, fit the model using these two callbacks.
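A sketch of these steps, reusing plot_to_image from the previous section (the folder and class names are assumptions):

import numpy as np
import sklearn.metrics
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.callbacks import TensorBoard, LambdaCallback

logdir = 'logs/cm/'
file_writer_cm = tf.summary.create_file_writer(logdir)
class_names = [str(i) for i in range(10)]

def plot_confusion_matrix(cm, class_names):
    # Return a Matplotlib figure holding the confusion matrix.
    figure = plt.figure(figsize=(8, 8))
    plt.imshow(cm, interpolation='nearest', cmap=plt.cm.Blues)
    plt.title('Confusion matrix')
    tick_marks = np.arange(len(class_names))
    plt.xticks(tick_marks, class_names, rotation=45)
    plt.yticks(tick_marks, class_names)
    plt.ylabel('True label')
    plt.xlabel('Predicted label')
    return figure

def log_confusion_matrix(epoch, logs):
    # Predict on the test set and log the confusion matrix as an image.
    test_pred = np.argmax(model.predict(X_test), axis=1)
    cm = sklearn.metrics.confusion_matrix(y_test, test_pred)
    figure = plot_confusion_matrix(cm, class_names)
    with file_writer_cm.as_default():
        tf.summary.image('Confusion Matrix', plot_to_image(figure), step=epoch)

tensorboard_callback = TensorBoard(log_dir=logdir)
cm_callback = LambdaCallback(on_epoch_end=log_confusion_matrix)

model.fit(X_train, y_train, epochs=5,
          validation_data=(X_test, y_test),
          callbacks=[tensorboard_callback, cm_callback])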

Since you’ve already fitted the model before, it would be advisable to restart your runtime and ensure that you are fitting the model just once.

Now run TensorBoard and check the confusion matrix on the Images tab. 

Hyperparameter tuning with TensorBoard

Another cool thing you can do with TensorBoard is use it to visualize parameter optimization. Sticking to the same MNIST example, you can attempt to tune the hyperparameters of the model (manually or using automated hyperparameter optimization) and visualize them in TensorBoard.

Here’s the final result that you expect to obtain. The dashboard is available under the HPARAMS tab. 


To achieve this, you have to clear the previous logs and import the hparams plugin; a combined sketch of all the steps appears at the end of this walkthrough.


The next step is to define the parameters to tune. In this case, the units in the dense layer, the dropout rate, and the optimizer function will be tuned.

Next, use tf.summary.create_file_writer to define the folder where the logs will be stored.

With that out of the way, you need to define the model as you did previously. The only difference is that the number of neurons in the first dense layer, the dropout rate, and the optimizer function won’t be hardcoded.

This will be done in a function that will be used later, while running the experiments. 

The next function you need to create will run the function above using the parameters defined earlier. It will then log the accuracy. 

After this, you need to run this function for all combinations of the parameters defined above. Each of the experiments will be stored in its own folder. 
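A combined sketch of the walkthrough, following the pattern of the official hparams tutorial; the parameter ranges are illustrative, and X_train etc. come from the earlier model sketch:

import tensorflow as tf
from tensorboard.plugins.hparams import api as hp

HP_NUM_UNITS = hp.HParam('num_units', hp.Discrete([128, 256]))
HP_DROPOUT = hp.HParam('dropout', hp.RealInterval(0.1, 0.2))
HP_OPTIMIZER = hp.HParam('optimizer', hp.Discrete(['adam', 'sgd']))
METRIC_ACCURACY = 'accuracy'

# Folder where the hparams logs will be stored.
with tf.summary.create_file_writer('logs/hparam_tuning').as_default():
    hp.hparams_config(
        hparams=[HP_NUM_UNITS, HP_DROPOUT, HP_OPTIMIZER],
        metrics=[hp.Metric(METRIC_ACCURACY, display_name='Accuracy')],
    )

def train_test_model(hparams):
    # Same model as before, but with the tunable values injected.
    model = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(hparams[HP_NUM_UNITS], activation='relu'),
        tf.keras.layers.Dropout(hparams[HP_DROPOUT]),
        tf.keras.layers.Dense(10, activation='softmax'),
    ])
    model.compile(optimizer=hparams[HP_OPTIMIZER],
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    model.fit(X_train, y_train, epochs=1)
    _, accuracy = model.evaluate(X_test, y_test)
    return accuracy

def run(run_dir, hparams):
    # Log the hyperparameter values and the resulting accuracy for one trial.
    with tf.summary.create_file_writer(run_dir).as_default():
        hp.hparams(hparams)
        accuracy = train_test_model(hparams)
        tf.summary.scalar(METRIC_ACCURACY, accuracy, step=1)

session_num = 0
for num_units in HP_NUM_UNITS.domain.values:
    for dropout_rate in (HP_DROPOUT.domain.min_value, HP_DROPOUT.domain.max_value):
        for optimizer in HP_OPTIMIZER.domain.values:
            hparams = {HP_NUM_UNITS: num_units,
                       HP_DROPOUT: dropout_rate,
                       HP_OPTIMIZER: optimizer}
            run('logs/hparam_tuning/run-%d' % session_num, hparams)
            session_num += 1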


Finally, run TensorBoard to see the visualization you saw at the beginning of this section. 

On the HPARAMS tab, the Table View shows all the model runs and their corresponding accuracy, dropout rate, and dense layer neurons. The Parallel Coordinates View shows every run as a line moving through an axis for each of the hyperparameters and accuracy metric. 

Clicking one of them will display the trials and hyperparameters as shown below.


The Scatter Plot View visualizes the comparison between the hyperparameters and the metrics. 


TensorFlow Profiler

You can also track the performance of TensorFlow models using the Profiler. Profiling is crucial for understanding the hardware resource consumption of TensorFlow operations. Before you can do that, you have to install the Profiler plugin:
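pip install -U tensorboard_plugin_profile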

Once it’s installed, it will be available under the Inactive dropdown. Here’s a snapshot of one of the many visuals seen on the profiler. 


The only thing you have to do now is define a callback and include the batches that will be profiled. 

After that, you pass the callback as you fit the model. Don’t forget to call TensorBoard so that you can see the visualizations. 
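For example (the batch range is illustrative):

tb_callback = TensorBoard(log_dir=log_folder, profile_batch='5,10')

model.fit(X_train, y_train,
          epochs=5,
          validation_split=0.15,
          callbacks=[tb_callback])

%tensorboard --logdir=logs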


Overview page

The Overview Page on the Profile tab shows a high-level overview of the model’s performance. As you can see from the image below, the Performance Summary shows:

  • the time spent compiling the kernels,
  • the time spent reading the data,
  • the time spent launching the kernels,
  • the time spent producing the output,
  • the on-device compute time, and
  • the host compute time.

The Step Time Graph shows a visual of device step time over all the steps that have been sampled. The different colors on the graph portray the various categories where time is spent:

  • The red portion corresponds to the step time where the devices were idle as they waited for input data. 
  • The green part displays the amount of time that the device was actually working. 


Still on the overview page, you can see the TensorFlow operations that took the longest to run.


The Run environment shows environment information such as the number of hosts used, the device type, and the number of device cores. In this case, you can see that there is 1 host with a GPU containing 1 core on the Colab runtime.


Another thing you can see from this page is recommendations for optimizing the performance of the model. 


Trace viewer

The Trace Viewer can be used to understand performance bottlenecks in the input pipeline. It shows a timeline of the different events that happened on the GPU or CPU during the profiling period.

It shows the various event groups on the vertical axis and the event traces on the horizontal axis. In the image below, I have used the keyboard shortcut W to zoom in on the events. To zoom out, use the keyboard shortcut S. A and D can be used to move left and right, respectively.


You can click on an individual event to analyze it further: use the cursor on the floating toolbar, or use the keyboard shortcut 1.

The image below shows the result of analyzing the SparseSoftmaxCrossEntropyWithLogits event (calculation of the loss on a batch of data) that shows the start and wall duration. 


You can also check the summary of various events by holding down the Ctrl key and selecting them.


Input pipeline analyzer

The Input Pipeline Analyzer can be used to analyze inefficiencies in the input pipeline of your model. 


The functionality shows the Summary of input-pipeline analysis, the Device-side analysis details, and the Host-side analysis details.

The Summary of input-pipeline analysis shows the overall input pipeline. This is the part that tells you whether the application is input bound, and by how much.


The Device-side analysis details show the device step-time and the device time spent waiting for input data. 

The Host-side Analysis displays analysis on the host side such as a breakdown of the input processing time on the host. 

On the Input Pipeline Analyzer, you can also see statistics about individual input operations, the time taken, and their category. Here’s what the various columns represent:

  • Input Op — the TensorFlow operation name for the input operation
  • Count — the number of instances of the operation executed during the profiling period
  • Total Time — the cumulative sum of time spent on each of those instances
  • Total Time % — the total time spent on an operation as a percentage of the total time spent on input processing
  • Total Self Time — the cumulative sum of the self-time spent on each instance
  • Total Self Time % — the total self-time as a percentage of the total time spent on input processing
  • Category — the processing category of the input operation


TensorFlow stats

This dashboard shows the performance of every TensorFlow operation that has been executed on the host. 

  • The first pie chart shows the distribution of the self-execution time of each operation on the host. 
  • The second one shows the distribution of self-execution time on each operation type on the host. 
  • The third displays the distribution of the self-execution time of each operation on the device. 
  • The fourth one displays the distribution of the self-execution time on each operation type on the device. 


The table below the pie charts shows the TensorFlow operations . Each row is an operation. The columns show various aspects of each operation. You can filter the table using any of the columns. 


Below that table, you can see the various TensorFlow operations grouped by type.


GPU kernel stats

This page shows performance statistics and the originating operation for each GPU-accelerated kernel.


Below the Kernel Stats is a table that shows, among other things, the kernels and the time spent on various operations.


Memory profile page

This page shows the utilization of memory during the profiling period. It has the following sections: Memory Profile Summary, Memory Timeline Graph, and Memory Breakdown Table.

  • The Memory Profile Summary shows a summary of the memory profile of the TensorFlow application.
  • The Memory Timeline Graph shows a plot of the memory usage in GiBs and the percentage of fragmentation versus time in milliseconds.
  • The Memory Breakdown Table displays the active memory allocations at the point of peak memory usage in the profiling interval.


How to enable debugging on TensorBoard

You can also dump debug information to your TensorBoard. To do that, you have to enable debugging, which is still experimental:
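A sketch using TensorFlow’s experimental debug-dump API (the folder name is an assumption):

tf.debugging.experimental.enable_dump_debug_info(
    'logs/debug',                      # where the debug events are written
    tensor_debug_mode='FULL_HEALTH',
    circular_buffer_size=-1)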

The Dashboard can be viewed on Debugger V2 under the Inactive dropdown. 


The Debugger V2 GUI has Alerts, Python Execution Timeline, Graph Execution, and Graph Structure sections. The Alerts section shows your program’s anomalies. The Python Execution Timeline section shows the history of the eager execution of operations and graphs.

The Graph Execution displays the history of all the floating-dtype tensors that have been computed inside graphs. The Graph Structure section has the Source Code and Stack Trace that are populated as you interact with the GUI. 

Using TensorBoard with deep learning frameworks

You are not limited to using TensorBoard with TensorFlow alone. You can also use it with other frameworks such as Keras, PyTorch, and XGBoost, just to mention a few.

TensorBoard in PyTorch 

You start by defining a writer pointing to the folder where you would like to have the logs written. 

The next step is to add the items you would like to see on TensorBoard using the summary writer. 
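A minimal sketch (the loop and the loss values are illustrative):

from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir='logs')

for epoch in range(10):
    train_loss = 1.0 / (epoch + 1)  # stand-in for your real training loss
    writer.add_scalar('Loss/train', train_loss, epoch)

writer.close()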

TensorBoard in Keras 

Since TensorFlow uses Keras as the official high-level API, the TensorBoard implementation is similar to its implementation in TensorFlow. We have already seen how to do this:

Create a callback: 

Pass it to `model.fit`: 
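Both steps together (reusing log_folder and the MNIST arrays from earlier):

from tensorflow.keras.callbacks import TensorBoard

# create a callback:
tb_callback = TensorBoard(log_dir=log_folder)

# pass it to model.fit:
model.fit(X_train, y_train,
          epochs=5,
          validation_split=0.15,
          callbacks=[tb_callback])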

TensorBoard in XGBoost

When working with XGBoost, you can also log events to TensorBoard. The tensorboardX package is required for that. For example, to log metrics and losses you can use `SummaryWriter` and log scalars. 
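A minimal sketch: the data here is synthetic and the log folder name is an assumption; with real data you would pass your own matrices. The validation log loss from each boosting round is written as a scalar:

import numpy as np
import xgboost as xgb
from tensorboardX import SummaryWriter

# synthetic data so the sketch runs on its own
X = np.random.rand(500, 10)
y = (X[:, 0] > 0.5).astype(int)
dtrain = xgb.DMatrix(X[:400], label=y[:400])
dval = xgb.DMatrix(X[400:], label=y[400:])

writer = SummaryWriter('logs/xgboost')
evals_result = {}
booster = xgb.train({'objective': 'binary:logistic'},
                    dtrain,
                    num_boost_round=50,
                    evals=[(dval, 'validation')],
                    evals_result=evals_result,
                    verbose_eval=False)

for step, loss in enumerate(evals_result['validation']['logloss']):
    writer.add_scalar('logloss/validation', float(loss), step)
writer.close()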

Tensorboard.dev

TensorBoard.dev is a managed TensorBoard platform that makes it easy to host, track, and share ML experiments. It allows you to publish your TensorBoard experiments, troubleshoot, and collaborate with team members. Once you have a TensorBoard experiment, uploading it to TensorBoard.dev is quite straightforward.
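The upload command looks like this (the name and description are placeholders):

tensorboard dev upload --logdir logs \
  --name "My latest experiment" \
  --description "Simple comparison of several hyperparameters"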

Once you run this command you will get a prompt to authorize TensorBoard.dev with your Google account. Once you do this you will get a verification code that you will enter to authenticate.


This will then generate a unique TensorBoard.dev link for you. Here’s an example of such a link. As you can see, this is very similar to viewing TensorBoard on localhost, only now you are viewing it online.

Once you land here, you can interact with the TensorBoard just like you have in previous parts of this piece. 


It is important to note that this TensorBoard will be visible to everyone on the internet, so ensure that you are not uploading any sensitive data. 

Limitations of using TensorBoard

As you have seen, TensorBoard gives you a lot of great features. That said, using TensorBoard is not all rosy.

There are some limitations to it:

  • it is difficult to use in a team setting where collaboration is required
  • there is no user and workspace management, features that are often required in larger organizations
  • you can’t perform data and model versioning in order to track various experiments
  • it can’t scale to millions of runs; you’ll start getting UI problems with too many runs
  • the interface for logging images is a bit clunky
  • you cannot log and visualize other data formats such as audio/video or custom HTML

Final thoughts

There are a couple of things we haven’t covered in this piece. Two interesting features that are worth mentioning are:

  • The Fairness Indicators Dashboard (currently in Beta). It allows for computation of fairness metrics for binary and multiclass classifiers.
  • The What-If Tool (WIT) enables you to explore and investigate trained machine learning models. This is done using a visual interface that doesn’t require any code. 

Hopefully with everything you have learned here you will monitor and debug your training runs and ultimately build better models!


SummaryWriter

Learn about what the SummaryWriter class does, and take a look at the functionality of two of its methods: add_graph and add_scalars.

Overview of SummaryWriter

  • SummaryWriter methods
  • The add_graph method
  • The add_scalars method

It all starts with the creation of a SummaryWriter:
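For example (the run directory name is an assumption):

from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter('runs/experiment_1')  # the run directory name is an assumption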


Create TensorFlow Summary File Writer For TensorBoard

Use TensorFlow Summary File Writer (tf.summary.FileWriter) to create a TensorFlow Summary Event File for TensorBoard

Video Transcript

In this video, we’re going to use tf.summary.FileWriter to create a TensorFlow Summary FileWriter for TensorBoard.

First, let’s import TensorFlow as tf.

import tensorflow as tf

Then we print out the TensorFlow version that we are using.

print(tf.__version__)

We are using TensorFlow 1.8.0.

The example we’re going to create in this video is to add two named TensorFlow scalars together using the TensorFlow add operation, then we’ll look at the graph that’s generated.

First, we define a tf.constant.

tf_constant_one = tf.constant(10, name="ten")

We give it the value of 10, and we give it the name of the string of “ten”, and we assign it to the Python variable tf_constant_one.

Let’s print the tf_constant_one Python variable to see what we have.

print(tf_constant_one)

We see that it’s a TensorFlow tensor with the name of “ten”, the shape is empty because it’s a scalar, and the data type is int32.

Next, let’s define our second constant scalar.

tf_constant_two = tf.constant(20, name="twenty")

Again, using tf.constant, we give it the value of 20 and the name of the string written out “twenty”.

We assign it to the Python variable tf_constant_two.

Let’s print the tf_constant_two Python variable to see what we have.

print(tf_constant_two)

We see that it’s a TensorFlow tensor, the name is “twenty”, the shape is empty, and the data type is int32.

Remember that TensorFlow builds the graph first and then later evaluates the graph.

Since we are in the graph-building stage, both of these TensorFlow constant scalars haven’t been evaluated in a TensorFlow session yet, so what you have are unevaluated tensors.

Let’s now build a computational graph node that adds the two constant scalars together.

tf_constant_sum = tf.add(tf_constant_one, tf_constant_two)

So we’re going to do tf.add, we pass in our first scalar, tf_constant_one, and we’re going to pass in our second scalar, tf_constant_two, and the result of this will be assigned to the Python variable tf_constant_sum.

Let’s print the tf_constant_sum Python variable to see what we have.

print(tf_constant_sum)

We see that it’s a TensorFlow tensor, we see that it’s an “Add”, we see that the shape is empty signifying that it’s still a scalar, and the data type is int32.

Now that we’ve created our TensorFlow graph, it’s time to run the computational graph.

Let’s launch the graph in a session.

sess = tf.Session()

Next, let’s initialize all the global variables in the graph.

sess.run(tf.global_variables_initializer())

All right, so we got here and all of our variables have been initialized.

To run the constant scalar addition in a TensorFlow session, we could just evaluate it with a session run operation.

However, what we want to do is send our TensorFlow graph to TensorBoard so that we can later see what it contains.

The way we’ll do this is to use a protocol buffer that serializes the structured data so that TensorBoard can later create a visual representation.

To do this, we’ll use TensorFlow’s Summary FileWriter.

tf_tensorboard_writer = tf.summary.FileWriter('./graphs', sess.graph)

So you can see tf.summary.FileWriter.

We are going to write the file to the “graphs” directory, and what we want to write is the sess.graph.

We assign all of this to the Python variable tf_tensorboard_writer.

What gets written out is called an event file that contains the event protocol buffers.

Note that we pass the session graph so that it will add the TensorFlow graph to the event file.

Now that we have the FileWriter and it’s written the file, let’s double-check what our sum actually would have been by running it in a session.

print(sess.run(tf_constant_sum))

We get 30, which is what 10 plus 20 is.

Now that we’ve done our computation, let’s close the FileWriter.

tf_tensorboard_writer.close()

So we say tf_tensorboard_writer.close().

Let’s also close the TensorFlow session to release the TensorFlow resources we used within the session.

sess.close()

Now that we are done with that, let’s close Python as well so we can navigate to the graphs folder to see what the TensorFlow Summary FileWriter created in the event file.

So we’re back on the command line.

We change the directory to the graphs directory

and we do an “ls” to see what is in the directory that’s been generated

and we see our events file.

Let’s take a quick look at this file.

vi events [autocomplete]

So vi events...

So not exactly human readable as the protocol buffers are used for serializing structured data and not necessarily for human reading.

Let’s scroll down the text bits to see what we can find.

We can see Add instructions, we see ten and twenty, and we see a whole bunch of other characters.

Let’s exit.

And we’re back.

Perfect - It worked!

We were able to use TensorFlow’s summary FileWriter to create a TensorFlow Summary FileWriter for TensorBoard.


7.2. TensorBoard


TensorBoard provides the visualisation and tooling needed for machine learning experimentation:

  • Tracking and visualising metrics such as loss and accuracy
  • Visualising the model graph (ops and layers)
  • Viewing histograms of weights, biases, or other tensors as they change over time
  • Projecting embeddings to a lower dimensional space
  • Displaying images, text, and audio data.

In csf_main.py we have used TensorBoard to:

  • log accuracy and loss values
  • show batch images

The SummaryWriter class is your main entry to log data for consumption and visualisation by TensorBoard. So, we import it:
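from torch.utils.tensorboard import SummaryWriter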

At the start, we initialise two instances of SummaryWriter for training and testing, each logging into its corresponding directory (see the sketch below):

We add new accuracy/loss by calling the add_scalar function and add new images by calling the add_image function.
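A sketch of that setup (the directories come from this project; the logged values here are dummies):

import torch
from torch.utils.tensorboard import SummaryWriter

writer_train = SummaryWriter('csf_out/train/')
writer_test = SummaryWriter('csf_out/test/')

for step in range(3):
    writer_train.add_scalar('loss', 1.0 / (step + 1), step)  # dummy loss value
writer_train.add_image('batch', torch.rand(3, 64, 64), 0)    # dummy (C, H, W) image

writer_train.close()
writer_test.close()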


SummaryWriter contains several add_<SOMETHING> functions ( https://pytorch.org/docs/stable/tensorboard.html ), most of them with a similar set of arguments:

  • tag (data identifier)
  • value (e.g., a floating number in the case of a scalar and a tensor in the case of an image)
  • step (allowing you to browse the same tag at different time steps)

At the end of the programme, it’s recommended to close the SummaryWriter by calling the close() function.

Monitoring

We can open the TensorBoard in our browser by calling
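tensorboard --logdir=<LOG_DIR> --port=<PORT_NUMBER>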

In our project, by default, the TensorBoard files are saved in the csf_out/train/ and csf_out/test/ folders. If we specify the <LOG_DIR> as the parent directory ( csf_out/ ), the TensorBoards in all subdirectories will also be visualised:
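tensorboard --logdir=csf_out/ --port=<PORT_NUMBER>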

This is a very useful tool to compare different conditions (e.g., train/test, different experiments) at the same time.

If there are too many nested TensorBoards, it might become too slow.

The value for <PORT_NUMBER> is a four-digit number, e.g., 6006.

If the port number is already occupied by another process, use another number.

You can have several TensorBoards open at different ports.

Finally, we can see the TensorBoard in our browser under this URL
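Assuming the default host, that is:

http://localhost:<PORT_NUMBER>/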

torch.utils.tensorboard

Before going further, more details on TensorBoard can be found at https://www.tensorflow.org/tensorboard/

Once you’ve installed TensorBoard, these utilities let you log PyTorch models and metrics into a directory for visualization within the TensorBoard UI. Scalars, images, histograms, graphs, and embedding visualizations are all supported for PyTorch models and tensors as well as Caffe2 nets and blobs.

The SummaryWriter class is your main entry to log data for consumption and visualization by TensorBoard. For example:
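For example (this mirrors the example from the linked PyTorch documentation):

import torch
import torchvision
from torch.utils.tensorboard import SummaryWriter
from torchvision import datasets, transforms

# Writer will output to ./runs/ directory by default
writer = SummaryWriter()

transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5,), (0.5,))])
trainset = datasets.MNIST('mnist_train', train=True, download=True,
                          transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)
model = torchvision.models.resnet50(False)
# Have ResNet model take in grayscale rather than RGB
model.conv1 = torch.nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
images, labels = next(iter(trainloader))

grid = torchvision.utils.make_grid(images)
writer.add_image('images', grid, 0)
writer.add_graph(model, images)
writer.close()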

This can then be visualized with TensorBoard, which should be installable and runnable with:
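pip install tensorboard

tensorboard --logdir=runs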

Lots of information can be logged for one experiment. To avoid cluttering the UI and have better result clustering, we can group plots by naming them hierarchically. For example, “Loss/train” and “Loss/test” will be grouped together, while “Accuracy/train” and “Accuracy/test” will be grouped separately in the TensorBoard interface.
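For example, with dummy values:

from torch.utils.tensorboard import SummaryWriter
import numpy as np

writer = SummaryWriter()

for n_iter in range(100):
    writer.add_scalar('Loss/train', np.random.random(), n_iter)
    writer.add_scalar('Loss/test', np.random.random(), n_iter)
    writer.add_scalar('Accuracy/train', np.random.random(), n_iter)
    writer.add_scalar('Accuracy/test', np.random.random(), n_iter)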


Writes entries directly to event files in the log_dir to be consumed by TensorBoard.

The SummaryWriter class provides a high-level API to create an event file in a given directory and add summaries and events to it. The class updates the file contents asynchronously. This allows a training program to call methods to add data to the file directly from the training loop, without slowing down training.

Creates a SummaryWriter that will write out events and summaries to the event file.

  • log_dir ( string ) – Save directory location. Default is runs/ CURRENT_DATETIME_HOSTNAME , which changes after each run. Use hierarchical folder structure to compare between runs easily. e.g. pass in ‘runs/exp1’, ‘runs/exp2’, etc. for each new experiment to compare across them.
  • comment ( string ) – Comment log_dir suffix appended to the default log_dir . If log_dir is assigned, this argument has no effect.
  • purge_step ( int ) – When logging crashes at step T+X and restarts at step T, any events whose global_step is larger than or equal to T will be purged and hidden from TensorBoard. Note that crashed and resumed experiments should have the same log_dir.
  • max_queue ( int ) – Size of the queue for pending events and summaries before one of the ‘add’ calls forces a flush to disk. Default is ten items.
  • flush_secs ( int ) – How often, in seconds, to flush the pending events and summaries to disk. Default is every two minutes.
  • filename_suffix ( string ) – Suffix added to all event filenames in the log_dir directory. More details on filename construction in tensorboard.summary.writer.event_file_writer.EventFileWriter.

Add scalar data to summary.

  • tag ( string ) – Data identifier
  • scalar_value ( float or string/blobname ) – Value to save
  • global_step ( int ) – Global step value to record
  • walltime ( float ) – Optional override default walltime (time.time()) with seconds after epoch of event

Adds many scalar data to summary.

  • main_tag ( string ) – The parent name for the tags
  • tag_scalar_dict ( dict ) – Key-value pair storing the tag and corresponding values
  • walltime ( float ) – Optional override default walltime (time.time()) seconds after epoch of event

Add histogram to summary.

  • values ( torch.Tensor , numpy.array , or string/blobname ) – Values to build histogram
  • bins ( string ) – One of {‘tensorflow’,’auto’, ‘fd’, …}. This determines how the bins are made. You can find other options in: https://docs.scipy.org/doc/numpy/reference/generated/numpy.histogram.html

Add image data to summary.

Note that this requires the pillow package.

  • img_tensor ( torch.Tensor , numpy.array , or string/blobname ) – Image data

img_tensor: Default is (3, H, W). You can use torchvision.utils.make_grid() to convert a batch of tensors into 3xHxW format or call add_images and let us do the job. Tensors with shape (1, H, W), (H, W), or (H, W, 3) are also suitable as long as the corresponding dataformats argument is passed, e.g. CHW, HWC, HW.

Add batched image data to summary.

  • dataformats ( string ) – Image data format specification of the form NCHW, NHWC, CHW, HWC, HW, WH, etc.

img_tensor: Default is (N, 3, H, W). If dataformats is specified, other shapes will be accepted, e.g. NCHW or NHWC.

Render matplotlib figure into an image and add it to summary.

Note that this requires the matplotlib package.

  • figure ( matplotlib.pyplot.figure ) – Figure or a list of figures
  • close ( bool ) – Flag to automatically close the figure

Add video data to summary.

Note that this requires the moviepy package.

  • vid_tensor ( torch.Tensor ) – Video data
  • fps ( float or int ) – Frames per second

vid_tensor: (N, T, C, H, W). The values should lie in [0, 255] for type uint8 or [0, 1] for type float.

Add audio data to summary.

  • snd_tensor ( torch.Tensor ) – Sound data
  • sample_rate ( int ) – sample rate in Hz

snd_tensor: (1, L). The values should lie between [-1, 1].

Add text data to summary.

  • text_string ( string ) – String to save

Add graph data to summary.

  • model ( torch.nn.Module ) – Model to draw.
  • input_to_model ( torch.Tensor or list of torch.Tensor ) – A variable or a tuple of variables to be fed.
  • verbose ( bool ) – Whether to print graph structure in console.

Add embedding projector data to summary.

  • mat ( torch.Tensor or numpy.array ) – A matrix which each row is the feature vector of the data point
  • metadata ( list ) – A list of labels; each element will be converted to a string
  • label_img ( torch.Tensor ) – Images correspond to each data point
  • tag ( string ) – Name for the embedding

mat: (N, D), where N is the number of data points and D is the feature dimension

label_img: (N, C, H, W)

Adds precision recall curve. Plotting a precision-recall curve lets you understand your model’s performance under different threshold settings. With this function, you provide the ground truth labeling (T/F) and prediction confidence (usually the output of your model) for each target. The TensorBoard UI will let you choose the threshold interactively.

  • labels ( torch.Tensor , numpy.array , or string/blobname ) – Ground truth data. Binary label for each element.
  • predictions ( torch.Tensor , numpy.array , or string/blobname ) – The probability that an element is classified as true. Values should be in [0, 1]
  • num_thresholds ( int ) – Number of thresholds used to draw the curve.

Create special chart by collecting charts tags in ‘scalars’. Note that this function can only be called once for each SummaryWriter() object. Because it only provides metadata to tensorboard, the function can be called before or after the training loop.

layout ( dict ) – {categoryName: charts }, where charts is also a dictionary {chartName: ListOfProperties }. The first element in ListOfProperties is the chart’s type (one of Multiline or Margin ) and the second element should be a list containing the tags you have used in add_scalar function, which will be collected into the new chart.

Add meshes or 3D point clouds to TensorBoard. The visualization is based on Three.js, so it allows users to interact with the rendered object. Besides the basic definitions such as vertices, faces, users can further provide camera parameter, lighting condition, etc. Please check https://threejs.org/docs/index.html#manual/en/introduction/Creating-a-scene for advanced usage.

  • vertices ( torch.Tensor ) – List of the 3D coordinates of vertices.
  • colors ( torch.Tensor ) – Colors for each vertex
  • faces ( torch.Tensor ) – Indices of vertices within each triangle. (Optional)
  • config_dict – Dictionary with ThreeJS classes names and configuration.

vertices: (B, N, 3). (batch, number_of_vertices, channels)

colors: (B, N, 3). The values should lie in [0, 255] for type uint8 or [0, 1] for type float.

faces: (B, N, 3). The values should lie in [0, number_of_vertices] for type uint8.

Add a set of hyperparameters to be compared in TensorBoard.

  • hparam_dict ( dict ) – Each key-value pair in the dictionary is the name of the hyperparameter and its corresponding value. The type of the value can be one of bool , string , float , int , or None .
  • metric_dict ( dict ) – Each key-value pair in the dictionary is the name of the metric and its corresponding value. Note that the key used here should be unique in the TensorBoard record. Otherwise the value you added by add_scalar will be displayed in the hparam plugin, which in most cases is unwanted.
  • hparam_domain_discrete – (Optional[Dict[str, List[Any]]]) A dictionary that contains names of the hyperparameters and all discrete values they can hold
  • run_name ( str ) – Name of the run, to be included as part of the logdir. If unspecified, will use current timestamp.

Flushes the event file to disk. Call this method to make sure that all pending events have been written to disk.

© 2019 Torch Contributors Licensed under the 3-clause BSD License. https://pytorch.org/docs/1.7.0/tensorboard.html

Using tensorboard with DistributedDataParallel

Hello, I am trying to make my workflow run on multiple GPUs. Since torch.nn.DataParallel did not work out for me ( see this discussion ), I am now trying to go with torch.nn.parallel.DistributedDataParallel (DDP). However I am not sure how to use the tensorboard logger when doing distributed training. Previous questions about this topic remain unanswered: ( here or here ). I have set up a typical training workflow that runs fine without DDP ( use_distributed_training=False ) but fails when using it with the error: TypeError: cannot pickle '_io.BufferedWriter' object . Is there any way to make this code run, using both tensorboard and DDP?

The only option seems to be to only log one process. This code runs fine:
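A minimal sketch of that pattern, with the writer created only on rank 0 (this is illustrative, not the poster's exact script):

import os
import torch.distributed as dist
import torch.multiprocessing as mp
from tensorboardX import SummaryWriter

def run(rank, world_size):
    dist.init_process_group('gloo', rank=rank, world_size=world_size)
    # only rank 0 creates a writer, so no writer is ever pickled across processes
    writer = SummaryWriter('logs') if rank == 0 else None
    for step in range(10):
        if writer is not None:
            writer.add_scalar('loss', 1.0 / (step + 1), step)  # dummy value
    if writer is not None:
        writer.close()
    dist.destroy_process_group()

if __name__ == '__main__':
    world_size = 2
    os.environ['MASTER_ADDR'] = '127.0.0.1'
    os.environ['MASTER_PORT'] = '29500'
    mp.spawn(run, args=(world_size,), nprocs=world_size)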

CC @orionr for Tensorboard question

@Jimmy2027: I was able to make logging work by moving the SummaryWriter creation from the main process to the child process; specifically, remove the writer construction from the main process and create it inside run_epochs instead, so that we don’t have to fork the lock inside SummaryWriter (in _AsyncWriter, https://github.com/tensorflow/tensorboard/blob/master/tensorboard/summary/writer/event_file_writer.py#L163 ). In general, each child process should create its own SummaryWriter instead of forking it from the parent process.

Also, unrelated to your issue: tensorboardX has long been deprecated and is no longer actively maintained, having been replaced by PyTorch’s native TensorBoard support since PyTorch 1.2. To use it, simply replace
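from tensorboardX import SummaryWriter

with

from torch.utils.tensorboard import SummaryWriter

The rest of the SummaryWriter API stays the same.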


Thanks @cryptopic, works fine. I am surprised that there is no need to do any post-processing of the logged data. Does TensorBoard automatically join the data from all processes?

does tensorboard automatically join the data from all processes?

Yes, different processes will write to different log files, and TensorBoard will aggregate all log files during visualization



Resume TensorBoard logging when resuming training #8451

talhaanwarch (Jul 17, 2021): When resuming training in the Trainer, how can I resume TensorBoard logging? I think TensorBoard will start logging from 0 instead of continuing from where it ended.
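One common pattern (assembled from the SummaryWriter docs above, not from this thread's replies): re-create the writer with the same log_dir, continue global_step from your checkpoint, and optionally pass purge_step to drop events written after the checkpoint.

from torch.utils.tensorboard import SummaryWriter

start_step = 1000  # e.g., restored from your training checkpoint (illustrative)
writer = SummaryWriter(log_dir='runs/exp1', purge_step=start_step)

for step in range(start_step, start_step + 100):
    writer.add_scalar('loss', 1.0 / (step + 1), step)  # dummy value
writer.close()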
