BRIDGE: Building plan Repository for Image Description Generation, and Evaluation

Paper: BRIDGE: Building plan Repository for Image Description Generation, and Evaluation
Contact: Shreya Goyal, Chiranjoy Chattopadhyay


Abstract—In this paper, a large scale public dataset containing floor plan images and their annotations is presented. The BRIDGE (Building plan Repository for Image Description Generation, and Evaluation) dataset contains more than 13,000 floor plan images and annotations collected from various websites, as well as from publicly available floor plan images in the research domain. The images in BRIDGE also have annotations for symbols, region graphs, and paragraph descriptions. The BRIDGE dataset will be useful for symbol spotting, caption and description generation, scene graph synthesis, retrieval, and many other tasks involving building plan parsing. We also present an extensive experimental study on the proposed dataset for tasks such as furniture localization, caption generation, and description generation, demonstrating the utility of BRIDGE.

Index Terms—Floor Plan; Dataset; Evaluation; Captioning

Literature Survey

Various floor plan datasets have been proposed in the past for purposes such as symbol spotting, retrieval, and semantic and layout segmentation. Table I lists the publicly available datasets, the number of samples in each, and a brief description. Several techniques in the literature [1] have used one or more of these four datasets.

Among natural image datasets, MS-COCO, for example, contains over one and a half million captions (5 captions per image) and is currently used for caption generation and object segmentation tasks.

Construction of BRIDGE

To construct the BRIDGE dataset, we followed two approaches. First, we collected floor plan images from publicly available datasets (e.g., ROBIN, SESYD). Second, we collected the remaining floor plan images from the internet. In total, the dataset contains over 13,000 floor plan images. Along with the images, BRIDGE also has object annotations, region descriptions, and paragraph descriptions for the floor plans. To date, this is the largest annotated floor plan dataset created for the document analysis and research (DAR) community. For creating the annotations, we asked volunteers to mark bounding boxes around each decor item. We used the LabelImg graphical annotation tool [13] for marking the bounding boxes in the images. We also used the same tool for generating the region descriptions and later converted them into JSON format.
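The conversion step above can be sketched in a few lines. LabelImg exports one Pascal VOC XML file per image; the snippet below parses such a file into a JSON-friendly dict. The tag names follow LabelImg's standard VOC output, but the example annotation (file name, label, coordinates) is hypothetical, not taken from BRIDGE.

```python
# Minimal sketch: convert a LabelImg-style Pascal VOC XML annotation
# into a JSON-friendly dict. The sample XML below is illustrative.
import json
import xml.etree.ElementTree as ET

VOC_XML = """<annotation>
  <filename>floorplan_0001.png</filename>
  <object>
    <name>sofa</name>
    <bndbox><xmin>120</xmin><ymin>80</ymin><xmax>210</xmax><ymax>160</ymax></bndbox>
  </object>
</annotation>"""

def voc_to_dict(xml_text):
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        objects.append({
            "label": obj.findtext("name"),
            # Box as [xmin, ymin, xmax, ymax] in pixel coordinates.
            "bbox": [int(box.findtext(t)) for t in ("xmin", "ymin", "xmax", "ymax")],
        })
    return {"image": root.findtext("filename"), "objects": objects}

print(json.dumps(voc_to_dict(VOC_XML)))
```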

A. Floor Plan images
Along with the images obtained from available public datasets, images were collected from two websites. These websites contain multiple floor plan images per house design, for both single-storied and multi-storied buildings. The images taken from both websites have in common that the floor plans belong to real homes available for customers to use; they were not generated for a specific task such as retrieval or segmentation.

B. Symbol Annotations
Detection of decor items is an important step in parsing a floor plan image and extracting information from it. Object detection schemes have been widely applied to objects in natural images. For architectural drawings, techniques based on handcrafted features have been used multiple times in the literature.

C. Caption Annotations
In the literature, there are image datasets with image captions (MS-COCO) and region-wise captions (Visual Genome). For a floor plan, region-wise caption generation is an important step. In [14], [15], the authors used handcrafted features to identify decor symbols and room information and to generate region-wise captions.


Experiments

All experiments on the proposed dataset were performed on a system with an NVIDIA Quadro P6000 GPU (24 GB GPU memory) and 256 GB RAM.

A. Symbol Spotting
Symbol spotting algorithms are needed for identifying decor and other symbols in floor plan images. Figure 5 shows the distribution of the various symbols over the training dataset. Results of symbol spotting on BRIDGE are described next.
1) YOLO: YOLO is a single convolutional network, which simultaneously predicts multiple bounding boxes and the class probabilities (confidence values) of those boxes.
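Because YOLO predicts many overlapping boxes per object, its raw output is typically filtered with greedy non-maximum suppression (NMS). The sketch below is a minimal standalone version of that standard step; boxes are (xmin, ymin, xmax, ymax), and the threshold value is a common default, not one specified in this paper.

```python
# Minimal greedy non-maximum suppression (NMS) over (box, score) pairs.
def iou(a, b):
    # Intersection-over-union of two (xmin, ymin, xmax, ymax) boxes.
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    # Keep the highest-scoring box, drop boxes overlapping it too much,
    # and repeat on the remainder.
    order = sorted(range(len(boxes)), key=scores.__getitem__, reverse=True)
    keep = []
    while order:
        i, order = order[0], order[1:]
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) < iou_thresh]
    return keep
```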
2) Faster R-CNN: There are two modules in Faster R-CNN: (i) a deep fully convolutional region proposal network, which proposes candidate regions, and (ii) a Fast R-CNN detector, which classifies the proposed regions.
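However the boxes are produced, symbol spotting results are commonly scored by matching predictions to ground-truth annotations: a prediction counts as a true positive if it overlaps an unmatched ground-truth box of the same class with IoU at or above some threshold (0.5 is a common choice; the matching rule and data layout here are assumptions, not the paper's exact protocol).

```python
# Hedged sketch of detection scoring: greedy matching of predictions to
# ground truth at an IoU threshold, yielding TP/FP/FN counts.
def iou(a, b):
    # Intersection-over-union of two (xmin, ymin, xmax, ymax) boxes.
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def match_detections(preds, gts, thresh=0.5):
    """preds/gts: lists of (label, box); returns (tp, fp, fn)."""
    unmatched = list(range(len(gts)))
    tp = 0
    for label, box in preds:
        hit = next((g for g in unmatched
                    if gts[g][0] == label and iou(box, gts[g][1]) >= thresh), None)
        if hit is not None:
            unmatched.remove(hit)
            tp += 1
    fp = len(preds) - tp   # predictions that matched nothing
    fn = len(unmatched)    # ground-truth symbols never found
    return tp, fp, fn
```

Precision and recall then follow as tp/(tp+fp) and tp/(tp+fn).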

B. Caption generation
Captioning an entire image is a task that has been explored widely on natural images. A caption is a single-sentence summary of the information in the entire image.
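Generated captions are usually evaluated against reference annotations with n-gram overlap metrics such as BLEU. The sketch below computes clipped unigram precision, the core of BLEU-1, without the brevity penalty or higher-order n-grams of full BLEU; the example sentences are illustrative.

```python
# Minimal clipped unigram-precision sketch (the core of BLEU-1).
from collections import Counter

def unigram_precision(candidate, reference):
    cand = candidate.lower().split()
    ref = Counter(reference.lower().split())
    # Clip each candidate word's count by its count in the reference.
    matched = sum(min(c, ref[w]) for w, c in Counter(cand).items())
    return matched / len(cand) if cand else 0.0

print(unigram_precision("a large bedroom with a bed", "a bedroom with a bed"))
```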

C. Description synthesis
A single caption is insufficient to describe an image. Hence, we need a system that can generate paragraph-based descriptions with variability. Many state-of-the-art techniques generate paragraphs in the context of natural images. However, these models need annotated images to train the deep neural networks and to test and evaluate them, and no publicly available floor plan dataset supports this task. We generated paragraphs using two techniques and evaluated them against the annotations in BRIDGE.
1) Template based
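A template-based generator slots detected rooms and decor symbols into fixed sentence patterns. The following is a hypothetical sketch of that idea; the room names, symbol names, and templates are illustrative, not the paper's exact ones.

```python
# Hypothetical template-based description generator: detected rooms and
# decor symbols are slotted into fixed sentence templates.
def describe(rooms):
    """rooms: dict mapping room name -> list of detected decor symbols."""
    sentences = [f"This floor plan has {len(rooms)} rooms."]
    for room, items in rooms.items():
        if items:
            sentences.append(f"The {room} contains {', '.join(items)}.")
        else:
            sentences.append(f"The {room} is empty.")
    return " ".join(sentences)

print(describe({"bedroom": ["bed", "wardrobe"], "kitchen": ["sink"]}))
```

Such templates are deterministic, which motivates the second, learned technique for adding variability to the generated paragraphs.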


Conclusion

In this paper, we presented, for the first time, a novel large scale (13,000+ images) floor plan dataset, BRIDGE, which contains images and rich metadata. This dataset can be used for various floor plan analysis tasks with deep learning models.