Structure-from-Motion – Photogrammetry Software
What follows is an overview of the workflow I used to produce 3D models of three cuneiform tablets from the Russell Library Cuneiform Collection. It was written as my contribution to the group report for the project.
A number of proprietary structure-from-motion, or photogrammetry, software packages are currently available. It was decided to use Agisoft PhotoScan Professional for this project as it is the industry standard and the most widely used photogrammetry package in the heritage community, and it therefore has a wealth of relevant documentation.
PhotoScan Processing: Adding Photos
The images (representing the dataset for one tablet) were then imported into PhotoScan, the first procedure in the modelling workflow. PhotoScan automatically creates a “Chunk” and places the entire imported set of images into it. Alternatively, Chunks can be created manually and the images imported into the relevant chunk.
Assess the Image Quality and Clean the Dataset
Once the images were imported they needed to be “cleaned”. This is a process of going through the dataset and weeding out duplicate images, blurry images resulting from camera shake, out-of-focus images, and under- or over-exposed images. Useful in this process is the software’s ability to estimate image quality, which it does by rating the sharpest part of each image it tests. This rating can then be used as an aid to editing out the weaker of duplicate images, or to quickly find problematic ones. Agisoft PhotoScan recommends removing any image with an estimated quality rating below 0.5. However, we found the estimate was not always reliable: occasionally images with a value of 0.6 were, in fact, blurred.
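The weeding step above can be sketched as a simple threshold filter. This is a toy illustration only: the file names and quality scores are invented, standing in for the per-image estimates PhotoScan produces.

```python
# Toy sketch of the weeding step: given per-image quality estimates
# (values here are invented), flag images that fall below a threshold.
# PhotoScan recommends removing images rated below 0.5; a stricter
# cut-off can catch images that are rated 0.6 but still blurred.

def flag_low_quality(quality_by_image, threshold=0.5):
    """Return the names of images whose estimated quality is below threshold."""
    return sorted(name for name, q in quality_by_image.items() if q < threshold)

scores = {"IMG_001.jpg": 0.83, "IMG_002.jpg": 0.41, "IMG_003.jpg": 0.58}
print(flag_low_quality(scores))        # the recommended 0.5 cut-off
print(flag_low_quality(scores, 0.6))   # a stricter cut-off also catches IMG_003
```

In practice the flagged images were reviewed by eye before deletion, since the estimate is only a guide.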
Images to be removed: out-of-focus images, images blurred by camera shake, duplicates, and over- or under-exposed images.
Masking
The next step in preparing the images for processing into geometry is to create masks.
Masking enables the exclusion of certain areas of an image’s surface from processing. The software excludes the masked areas and builds the geometry only from those left unmasked. In our case, the tablets were the only areas left unmasked: we needed to eliminate the backgrounds, the oasis, and the turntable. Masking can be automated in certain cases by photographing the background, if stationary, without the object to be modelled, and then using this background image to build masks for each image. However, we could not use this technique because we were using a turntable with a rectangular piece of oasis, which created movement between images.
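The background-based automation described above amounts to a per-pixel comparison against the background shot. The sketch below illustrates the idea on tiny grey-level grids standing in for photographs; the tolerance value and data are invented, and real masking works on full-resolution images.

```python
# Toy illustration of background-based masking: where a pixel matches the
# stationary background shot (within a tolerance), it is masked out;
# everything else is kept for geometry building. 2x2 grey-level grids
# stand in for full photographs here.

def build_mask(background, photo, tolerance=10):
    """Return a mask grid: True = excluded (background), False = kept (object)."""
    return [
        [abs(b - p) <= tolerance for b, p in zip(brow, prow)]
        for brow, prow in zip(background, photo)
    ]

background = [[200, 200], [200, 200]]
photo      = [[198, 120], [203,  90]]   # tablet pixels differ strongly
mask = build_mask(background, photo)
# mask: [[True, False], [True, False]] -> only the tablet pixels survive
```

This is exactly why the technique failed for us: because the oasis moved with the turntable, it never matched a single stationary background shot.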
Masking is achieved by tracing around the object to be modelled using one of the masking tools. We found the Intelligent Scissors to be the fastest and most effective tool for masking around the tablets.
Build Models – Structure-from-Motion
The structure-from-motion process involves building geometry from images. Once the geometry is built, it can then be textured with photographic detail. The geometry building and rendering adheres to the following workflow:
Align Photos: The cameras are aligned, producing a sparse cloud and estimated camera positions.
Build Dense Cloud: A dense point cloud is then generated from the dataset, using the camera positions to estimate the x, y, z position of each matched pixel.
Build Mesh: Once the dense cloud is generated, a mesh can be built from it, converting the x, y, z coordinate points into a polygon mesh.
Build Texture: The mesh can then have a “texture” applied, that is, the mapping of image data onto the surface of the mesh, creating the photorealistic model.
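The four stages above can also be scripted, since PhotoScan ships a Python API. The sketch below is hedged: the call names follow the PhotoScan 1.x reference as we recall it, exact signatures may differ between versions, and the module is passed in as a parameter so that the point of the sketch is the fixed ordering of the stages rather than the bindings themselves.

```python
# Hedged sketch of the four-stage workflow as it might be scripted with
# the PhotoScan Python API. Call names follow the PhotoScan 1.x reference;
# exact signatures may vary between versions. `ps` is the PhotoScan module,
# passed in explicitly so the stage ordering is the point of the sketch.

def build_model(chunk, ps):
    chunk.matchPhotos(accuracy=ps.HighAccuracy)    # detect and match features
    chunk.alignCameras()                           # camera positions + sparse cloud
    chunk.buildDenseCloud(quality=ps.HighQuality)  # dense point cloud
    chunk.buildModel(source=ps.DenseCloudData)     # dense cloud -> polygon mesh
    chunk.buildTexture()                           # map image data onto the mesh
```

In our project each stage was run from the GUI rather than scripted, with manual checks between stages, but the order of operations is the same.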
Alignment / Sparse Cloud
The first stage in the geometry-generating workflow is to align the cameras. This process estimates the camera positions relative to the object or scene from the captured data; its result is a set of camera positions and a sparse cloud. The sparse cloud is not itself used in generating the dense cloud, but serves as a visual indication of what the dense cloud will produce from the generated camera positions, and therefore of how successful the alignment has been.
Alignment is potentially the most problematic process of the workflow and can require the most intervention. This might include: further image processing, further weeding of images, assignment of images into different groups or chunks, and experimenting with the various pre-alignment parameters.
A number of tablet projects were problematic during alignment. As discussed earlier in the capturing section, each tablet was captured in two orientations because it had to be secured in the oasis for stability, with contact minimised for conservation purposes. Captures were taken with the tablet in one orientation; it was then turned 180 degrees, replaced in the oasis, and captured in the second orientation. For a number of tablets, the software had difficulty aligning the two orientations.
For these, different techniques were tried including:
- the placement of different alignments into individual chunks
- the placement of different vertical camera positions into individual chunks
- the grouping of cameras into folders and assigning those folders as camera stations corresponding to the various stationary camera positions
- varying the processing parameters such as quality
Multiple Chunks v One Chunk
PhotoScan has the ability to split a dataset into what it terms “Chunks”. This can be useful if there are capture variations within a dataset, for example, different capture positions. For some of the alignment problems, the various stationary camera positions were assigned corresponding chunks, and various processing was applied, including aligning the chunks separately and then trying to merge them, and merging the chunks and then attempting to align them.
For the most problematic tablets, none of these techniques seemed to work, and it was conjectured that the issue was perhaps due to datasets having variable focal lengths.
We found that for the majority of tablet projects, placing all the images into one chunk worked very well for alignment.
Building the Dense Cloud
Once the cameras are aligned, the next step in the workflow is to build a dense point cloud. The camera positions generated in the previous step are used along with the dataset to generate a finely detailed point cloud, which also carries colour and tonal information for each pixel converted to a point.
This process is very computationally intensive and can take a very long time. Processing times increase when using uncompressed image formats such as TIFFs or DNGs, and are influenced by the number of images in the dataset. The computing specifications for running PhotoScan are outlined in the program’s manual as follows:
Minimum configuration:
- Windows XP or later (32 or 64 bit), Mac OS X Mountain Lion or later, Debian/Ubuntu with GLIBC 2.13+ (64 bit)
- Intel Core 2 Duo processor or equivalent
- 4 GB of RAM
Recommended configuration:
- Windows 7 SP 1 or later (64 bit), Mac OS X Mountain Lion or later, Debian/Ubuntu with GLIBC 2.13+ (64 bit)
- Intel Core i7 processor
- 16 GB of RAM
Due to the time involved and the computing capacity available to individual students, the quality settings used varied from High to Ultra High, but the other parameters were predominantly the same.
Once the dense cloud is processed, the subsequent stages have relatively short processing times and are less likely to be problematic.
Building a Mesh
To turn the dense cloud into data which can be used to generate a 3D interactive model, it is necessary to convert the potentially millions of individual points into a computationally lighter and more versatile mesh, which is a closed geometry of polygons. The higher the number of points in the dense cloud, the higher the number of polygons which can be generated. The higher the polygon face count is, the finer the resolution of detail in the model generated. A polygon mesh can be filled to create geometry with a continuous surface. This is what creates the form upon which the image pixel data can be applied in the correct location to create a photorealistic “textured” model.
Building the Texture
This process maps the pixel data, for example colour and intensity, onto its correct position on the filled surface of the mesh. Once the software has mapped the pixel data to correspond with the geometry, it generates a photorealistic textured model. The term “texture” is something of a misnomer, since surface texture, in the common usage of that term, is a feature of the surface geometry. Its use in this context dates from early gaming engines’ use of image mapping onto low-resolution polygon models to create the appearance of texture.
Once the texturing is complete, the models can be scaled using captured images that include scale bars or measures. There are a number of techniques for scaling. In our case, small scales were placed beside the tablets in some of the images and used as references to create markers; these markers were then used to create scale bars. A minimum of two scale bars is required to take measurements off the model from within PhotoScan.
There was a certain amount of experimentation with different methods of scaling.
The scaled models can then be measured in PhotoScan using the ruler.
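The scaling itself is simple arithmetic, sketched below: the known real-world length of a scale bar, divided by the distance between its two markers in model space, gives a factor that converts any model measurement into real units. The marker coordinates and lengths below are invented for illustration; PhotoScan performs this computation internally once the scale bars are defined.

```python
# Sketch of the scaling arithmetic: the known length of a scale bar divided
# by the distance between its markers in model units gives a conversion
# factor for all model measurements. Coordinates and lengths are invented.
import math

def scale_factor(marker_a, marker_b, known_length_mm):
    """Known real length divided by the marker pair's model-space distance."""
    model_dist = math.dist(marker_a, marker_b)
    return known_length_mm / model_dist

# Two markers placed on a photographed scale bar, 50 mm apart in reality:
factor = scale_factor((0.0, 0.0, 0.0), (0.0, 0.4, 0.3), 50.0)

# Any distance measured on the model, times `factor`, is now in millimetres:
tablet_width_mm = 0.62 * factor
```

Using at least two scale bars, as PhotoScan requires, lets the software average the factors and flag inconsistencies between them.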
A report can be generated using PhotoScan, which is an essential document for cultural heritage projects of this kind since it records all the settings and processes carried out during the project workflow.
These reports, containing all the settings used during each stage of the process, are in line with the best-practice recommendations outlined in the London Charter from a technical point of view. Documentation of the human choices made during the process can be recorded separately and stored, along with the report, the project, project files, and the RAW and processed images, within the same directory for archiving. This allows for consistency of results in state-of-condition conservation recording.
When all the above processes are complete the models are ready to be exported in the appropriate formats to correspond with their intended use, for example, 3D printing or further post-processing in another program such as MeshLab.
To export models we chose two formats.
Stereolithography (STL) Model (.stl)
This is a geometry-only model format: it contains no colour or texture information and is used for monochrome 3D printing.
Wavefront Model (.obj)
This is a model format incorporating both the geometry and the colour information.
Three file types are created upon export of a Wavefront model.
Geometry (.obj)
The main file, with the extension .obj, contains the geometry (and the texture coordinates that link it to the image files) but no colour or image data itself.
Image (tiff, png, jpeg, etc.)
Exporting a model as a Wavefront file also creates an image file, or a series of image files. The image format can be selected by the user; the options include TIFF, PNG and JPEG.
Material template library (.mtl)
This is a text file, with the extension .mtl, which defines surface shading, specularity parameters and texture mapping. In the case of the cuneiform tablet models generated in PhotoScan, the information in these files tells external rendering software how to map the texture map (the image file) onto the geometry (the Wavefront .obj file).
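The way the three exported files reference each other can be shown with a minimal sketch. A single textured triangle stands in for a tablet model, and the file stem and texture name are illustrative only; real exports from PhotoScan contain many thousands of faces but follow the same linking pattern.

```python
# Minimal sketch of how the three Wavefront export files relate: the .obj
# names the .mtl via `mtllib`, and the .mtl points the renderer at the
# texture image via `map_Kd`. One textured triangle stands in for a tablet;
# all names are illustrative.

def wavefront_files(stem, texture_image):
    obj = "\n".join([
        "mtllib {}.mtl".format(stem),      # link to the material library
        "v 0 0 0", "v 1 0 0", "v 0 1 0",   # geometry: three vertices
        "vt 0 0", "vt 1 0", "vt 0 1",      # texture coordinates (UVs)
        "usemtl tablet",                   # select the material defined below
        "f 1/1 2/2 3/3",                   # one face: vertex/UV index pairs
    ])
    mtl = "\n".join([
        "newmtl tablet",
        "map_Kd {}".format(texture_image), # diffuse texture map (the image file)
    ])
    return obj, mtl

obj_text, mtl_text = wavefront_files("tablet01", "tablet01.jpg")
```

This linkage is why the three files must be kept together: renaming or separating the image or .mtl file breaks the texture mapping when the model is opened elsewhere.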