ArchAIDE Discussion Workshop in York

Archaeological Automatic Interpretation and Documentation of cEramics (ArchAIDE) is a Horizon 2020 funded project which aims to help archaeologists who may not have access to a pottery specialist identify their pottery. While not designed to replace the vital knowledge of the specialist community, ArchAIDE seeks to complement it by speeding up time-consuming tasks, providing support for non-specialists, and helping students learn more about pottery recognition. As part of the project work package centred on communication, public engagement and innovation, the ArchAIDE partnership presented a discussion workshop in York on 7 December. The workshop was hosted by the Archaeology Data Service (ADS) at the University of York, the UK partner for ArchAIDE. The workshop was held mid-way through the three-year ArchAIDE project and represented one of the two major points to invite feedback from the research community on the direction of the project and the development of the ArchAIDE app. The workshop was fully subscribed, with representation from diverse perspectives, including academic archaeologists, contract archaeologists, representatives of national heritage agencies, and freelance pottery specialists.

If you would like to read the full transcript of the day including the discussion and summation, it is available here. Links to the individual presentations are included in the agenda below:

Welcome by Julian Richards (ADS)

Julian welcomed the attendees, and thanked Dan Miles (Historic England) for his willingness to be part of the programme and discuss the work around reference collections, as it is complementary to the work of ArchAIDE. Julian introduced the ArchAIDE partnership, which represents a mix of IT specialists and archaeologists, and outlined the role of the ADS as a data provider, in archiving outputs, and in dissemination. He discussed the use of Artificial Intelligence to identify archaeological finds, and why it has been one of the Holy Grails of archaeological computing. He emphasised that even with the latest image recognition techniques, brought by the ArchAIDE partners from Tel Aviv, the work is still very ambitious and at an experimental stage, that the project is not planning to launch something on the profession tomorrow, and that it has required limiting ambitions to specific pottery types to see what is now possible.

He placed the workshop into the timeline of the project: far enough along to ask for concrete feedback on the design of the app, but with enough time left to make adjustments to workflows, based on the comments from the workshop. He also placed the workshop into a UK context, and discussed the pressures of contract archaeology, particularly in the context of the High Speed Two rail line, and the shortage of pottery specialists, particularly on site. He further emphasised that the purpose of the app was to supplement expertise and provide help to contractors on site who might not have expertise readily available, and welcomed the expertise of those attending the workshop throughout the day.

ArchAIDE workshop participants York

Intro to the ArchAIDE Project
Gabriele Gattiglia, ArchAIDE Project Coordinator,
University of Pisa

Gabriele began with an introduction to the project and the project partners. He explained that the rationale behind the project was to help with pottery recognition generally, and to speed the move from working with paper catalogues to a more automated recognition workflow. He gave a short background as to why pottery identification is essential to the comprehension of the past, pointing out its importance in places like Italy, where there is a great deal of ceramic material and pottery recognition is very time consuming, requiring specialist skills based on human comprehension and experience. He emphasised the difficulty, especially for professional archaeologists, of finding the time when 80-90% of it is spent on excavation.

He reiterated the intention of the project was not to change the existing workflows used by archaeologists, but to aid and innovate around existing workflows, using computational methods, but returning answers to be validated by the archaeologists at key points. He stated that the aim is not for the app to find the exact match but to suggest a number of possibilities which the archaeologist can then assess, speeding the process, but with the archaeologists making the final decisions.

He outlined the three initial pottery types chosen as prototypes: Roman Amphorae (as there are existing digital comparative collections), Terra Sigillata (as it is common across Europe) and Majolica (in order to test recognition using decoration rather than form).

He then outlined highlights of the work carried out by the different partners in the project thus far, including the tool created by partners from the National Research Council of Italy - Pisa (CNR) to allow image and text extraction and to automatically generate 3D models from profiles, and the creation of synthetic breakages from the 3D models to create digital sherds. He introduced the database structure designed by the ADS, and implemented by technical partner Inera, including textual information, SVG profiles, 3D models and geolocation. As the partners hail from several European countries, multilingual descriptions were incorporated, using conceptual matching across languages. He outlined the use of image recognition and deep learning carried out by partners at Tel Aviv University, based on appearance, including decoration and shape. He mentioned the Data Management Plan developed by ADS and that ArchAIDE will be part of the OpenAIRE Open Research Data Pilot in H2020. This means that all data that can be made open will be freely disseminated and will ultimately become discoverable via the ARIADNE portal, which is a data infrastructure for archaeology for Europe.

He explained that users will be able to use the data to populate maps and visualisations, which can then be added to a database, and subsequently either used privately or shared. The plan is to begin next year with test cases by the commercial archaeology partners and to engage with the ArchAIDE Associates to further extend the group of testers.

Strategic Review of Pottery Types Series
Dan Miles, Research Resources Officer, Capacity Building Team,
Historic England

Dan introduced his work in developing reference resources, including typologies, reference collections, classifications, identification guides and thesauri, helping to put the work of ArchAIDE into context within the UK. He discussed work to create a series of specialist reference collections (zooarchaeology, ceramics, clay pipes, etc.). For his work with ceramics, he has focussed on classification by fabric, type and form, and emphasised that the overall aim is to support and help with consistency and standards across the sector.

He explained that the survey of ceramic type series was aimed more towards the curators of the resources than those undertaking analysis, and that the intention was to understand a range of information about them, including their origins, who manages them, their current state and condition, current access and use, what documentation exists, their extent and coverage, and how this information can be improved. The survey found a range of issues to be addressed: gaps in coverage; type series that have been lost or have information that is out of date; collections where the expertise needed to understand the resource has been lost (particularly within museums), or where the information is of little use outside the organisation that holds them; little cohesion or linking across resources; and a general lack of viable financial solutions to improve the information about them and make them more accessible.

He mentioned that he is still trying to track down three or four physical collections from the northwest of England and a collection from Worcester. He emphasised the importance of physical and digital reference collections, including the National Roman Fabric Reference Collection within the Study Group for Roman Pottery website, and the use of standards for identification. In future, he hopes to create a consistent approach for developing type series, create better links between resources and make them more accessible, along with developing sustainable funding models for maintaining and developing them further.

Using the App: The ArchAIDE workflow
Gabriele Gattiglia, ArchAIDE Project Coordinator,
University of Pisa

Gabriele gave a tour of the app, using screenshots as the app is still in development (thus far for the Android platform). He explained that while there is enough to show the direction of the app, there is still room to take on board suggestions and feedback on the workflow and the app itself. He also stated that this was the first group to see the application, and that the project was very grateful for discussion and feedback.

He described the workflow, including the following elements:

  • Dashboard
  • Login - the app can be used without registering, but functionality will be more limited
  • Where site location can be chosen to narrow where sherds are coming from
  • How to choose the source image: users can take a picture with the camera or use an existing image from a gallery
  • When to choose the type of recognition required: shape or appearance. Input was requested on the workflow at this point, as to whether users should be free to choose at any point, or guided.
  • Sites can easily be added on the sites page, which includes a map interface showing location.
  • Users can group images to create an assemblage
  • Additional sherds can be added to an assemblage
  • Metadata can be added to describe a sherd, including drop down lists using the multilingual controlled vocabularies mapped to the Getty Art & Architecture Thesaurus (AAT)
  • Pictures can be added to a record for a sherd
  • Users can use the ‘recognise’ button to identify a piece of pottery; this is currently only available as a demo, as the work to implement the deep learning is still underway
  • Possible matches are shown
  • Users can then choose the type after they are happy with a match, and then store that information in a database (if desired, but not mandatory)
  • In addition to identification, the application can be searched as a reference database by using ‘pottery type’ option to access the catalogue
  • Within the catalogue users can browse or search
  • Records in the catalogue will include photos, drawings etc.
  • Feedback would be appreciated on what information should be in the offline version, perhaps only a subset of images
  • Users will also be able to search by decoration type

Digitising catalogues and photographing collections (Roman amphorae, terra sigillata and majolica)
Michael Remmy, University of Cologne
Jaume Buxeda i Garrigós, Barcelona University
Francesca Anichini, University of Pisa

Michael presented for the three ArchAIDE partners working to create the comparative collection for the app. This included a description of how the database was populated, including the data sources and which parts of the data were included.

Currently the project has documented the following types: 

Terra Sigillata Italica: 317
Terra Sigillata Hispanica: 73
Terra Sigillata from South Gaul: 132
Roman Amphorae: 591
Decorative types of majolica: 86

Terra Sigillata Italica stamps from the Kenrick catalogue: 2,586 entities (i.e. producers) with 9,329 extracted stamp drawings (KENRICK P., OXÉ A., COMFORT H., CORPVS VASORVM ARRETINORVM, A Catalogue of Signatures, Shapes and Chronology of Italian Sigillata, (2nd. Ed.), Bonn, 2002.)

He explained that it wasn’t necessary to have photos of every type, only enough to create the training data for the partners at Tel Aviv University. He said that although there was not always agreement on classification types, it was possible to use types where there is general agreement. For Spanish Majolica, the documentation is all being done from scratch by the partners at Barcelona University, while the Majolica of Montelupo is being processed by the partners in Pisa.

Data is being derived in three ways:

  1. Taking existing structured data and mapping it into another structure, including the use of the comparative collection Roman Amphorae: a digital resource, held by ADS and the Ceramalex database created by the University of Cologne.

  2. Using digitised catalogues, though it is not easy to get text out, as publications all use different structures or languages etc., even within a single book. It is also difficult to automatically extract information from a scan. Different colleagues across Europe have different approaches and needs. Testing is being done by the partners in Spain to decide what will work best in that context. It would be useful to develop a search engine for stamps that might return drawings.

  3. Through a photo campaign to get data to train the neural network. For this, very good quality sherds are needed for Tel Aviv University to use for their training data and verifications.

The archaeological partners have been carrying out photo campaigns, and guidelines were created to make sure the photos are taken correctly. Campaigns have been carried out in Ostia, Perugia, Spoletino and Barcelona, among other places. A huge number of photos have been taken, as a large dataset is needed.

Future work will be the population of the comparative database, the evaluation of results from the neural network, and connection of the database to the mobile app. The project also hopes to make the digitised paper catalogues available, depending on copyright.

Multilingual Vocabularies
Tim Evans, ADS

Tim discussed the process of learning how to work with and learn from the partners, as merging data from many sources is very difficult. It can include linguistic and conceptual challenges, and different ways of referring to the same concepts.

In order to begin, it was important to agree a baseline of what information would be mapped together from the different languages and archaeological traditions used by the ArchAIDE partners. This included:

  • Sherd type, i.e. rim, base, handle, etc.
  • Form, i.e. jar, bowl, plate, etc.
  • Decoration
  • Colour
  • Fabric

To do this, conceptual mapping tools for thesauri, developed by partners at the University of South Wales for the ARIADNE project, were employed. The Getty Art & Architecture Thesaurus (AAT) was carefully examined and determined to be sufficiently rich to cover the baseline information agreed by the partners. The partners at the University of Pisa and University of Barcelona dug very deep into the thesaurus and discovered it did cover almost everything. The partners mapped their concepts to the AAT, using it as a neutral spine and resulting in mapped concepts in English, German, Italian, Spanish and Catalan. A blog post about the mapping resulted in a reader getting in touch about a mapping made in French as part of a master’s dissertation. Tim was able to use this to also map the concepts in French.

Tim explained that the advantage of using a thesaurus rather than a flat controlled vocabulary is that it is hierarchical, so terms can be mapped at similar levels of granularity. Even if only a broad understanding of terms is possible, the mapping can be considered accurate. Original identifiers can also be preserved, as the original data is merely linked to a mapped term.

In future, additional mapping can be added, and the mapped data (which is all in Linked Data format) will be made freely available as an output of the project so users can re-use and expand upon it.
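
To illustrate the idea of a neutral spine, here is a minimal sketch of how terms in several languages might be linked to shared hierarchical concepts. The concept identifiers (`c:bowl` and so on), the three-entry hierarchy and the example terms are all hypothetical; they are not real AAT identifiers and are not taken from the ArchAIDE mappings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Concept:
    concept_id: str         # hypothetical identifier, NOT a real AAT ID
    label_en: str
    broader: Optional[str]  # parent concept_id, or None at the top level


# A tiny illustrative hierarchy standing in for the thesaurus spine.
SPINE = {
    "c:vessel": Concept("c:vessel", "vessel", None),
    "c:bowl": Concept("c:bowl", "bowl", "c:vessel"),
    "c:plate": Concept("c:plate", "plate", "c:vessel"),
}

# (language, local term) -> spine concept. The local term is kept as the key,
# so the original vocabulary is preserved and merely linked to a concept.
MAPPINGS = {
    ("it", "ciotola"): "c:bowl",
    ("es", "cuenco"): "c:bowl",
    ("de", "Schale"): "c:bowl",
    ("it", "piatto"): "c:plate",
    ("de", "Gefäß"): "c:vessel",
}


def to_concept(lang: str, term: str) -> Optional[str]:
    """Return the spine concept for a local term, or None if unmapped."""
    return MAPPINGS.get((lang, term))


def ancestors(concept_id: str) -> list:
    """Return the concept followed by each broader concept up to the top."""
    chain = []
    current = concept_id
    while current is not None:
        chain.append(current)
        current = SPINE[current].broader
    return chain


def related(a: str, b: str) -> bool:
    """True if the concepts are equal or one is a broader term of the other."""
    return a in ancestors(b) or b in ancestors(a)
```

Because the spine is hierarchical, a partner who can only say "vessel" still maps correctly to a record that another partner has described as a bowl, whereas a flat list would treat the two terms as unrelated.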

Digitising technologies
Marco Callieri, CNR Pisa

Marco first explained the work by CNR to help address the difficulty of translating the information held in printed pottery catalogues into digital data and metadata that is readable by computers. He said that while there is generally a standardised system for images, the way the text is associated with the images is highly variable. To help get the large amount of comparative data digitised with relevant metadata and into the ArchAIDE comparative database, CNR created a text extraction tool. The tool allows users to designate how text within a catalogue is structured, to ensure it is extracted correctly. The extracted text still needs validation by an archaeologist, but the tool greatly speeds up the process.

The tool itself uses OCR, where the text is loaded as an image but users can manually make corrections. Marco described the process of uploading a scan of a catalogue page, running OCR, making any necessary edits, assigning sections of text to the relevant fields in the database, and linking them to drawings and photographs, etc. Marco highlighted that the way profiles are drawn tends to be quite consistent within a catalogue, but varies across different catalogues.

Moving to the other main area of work by the partners at CNR, Marco described the difference between raster and vector images. What is scanned from catalogues is raster-based, but 3D work needs to be vector-based. To do this, section profiles were traced into a vector format (SVG), which can now be automatically rendered in 3D. The 3D models can also be 3D printed, but only if they are cut in half or contain hollows, as the result is a water-tight model. This ability to render a 3D model from a raster pottery profile is another output of the ArchAIDE project that can be used for multiple purposes. Work has now moved on to extracting sherd profiles. The partners at CNR believe they will be able to do automated tracing of the inside/outside of a sherd, which can then be modified. This will allow manual intervention when a photo isn’t perfect.
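
The talk did not detail how the CNR tool builds the 3D model from a profile. A common approach for vessels with rotational symmetry is to treat the traced profile as a series of (radius, height) points and rotate it around the vessel's central axis. The following is a minimal sketch under that assumption; the function name and the `segments` parameter are illustrative only.

```python
import math


def revolve_profile(profile, segments=36):
    """Rotate a profile around the vertical axis to build a simple mesh.

    profile:  list of (radius, height) points, ordered along the wall.
    segments: number of steps around the axis (assumed parameter).
    Returns (vertices, faces): vertices are (x, y, z) tuples, faces are
    4-tuples of vertex indices forming quads between neighbouring rings.
    """
    vertices = []
    for radius, height in profile:
        for s in range(segments):
            angle = 2 * math.pi * s / segments
            vertices.append(
                (radius * math.cos(angle), radius * math.sin(angle), height)
            )

    faces = []
    for ring in range(len(profile) - 1):
        for s in range(segments):
            nxt = (s + 1) % segments  # wrap around to close the ring
            a = ring * segments + s
            b = ring * segments + nxt
            c = (ring + 1) * segments + nxt
            d = (ring + 1) * segments + s
            faces.append((a, b, c, d))
    return vertices, faces
```

Each profile point becomes one ring of vertices, so a smoother traced profile or a higher `segments` value gives a finer model at the cost of more geometry.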

Inside the app, users can automatically trace profiles. The app will present users with what it thinks is the outline, which can then be adjusted by the archaeologist to ensure the profile is correct. The outline is then used for matching within the comparative database.

Shape and image-based search
Barak Itkin, Tel Aviv University

Barak described the core task of using image recognition for sherds based on decoration and shape (profile). The aim is to display a curated list of results for the archaeologist to choose from, so that rather than browsing hundreds of types, the archaeologist can browse only the most likely candidates.

Barak gave a short overview of deep learning for use in image recognition. This is done by building a neural network, which is essentially a group of mathematical functions capable of expressing complex logic. He described the process as using labeled (classified) samples to teach the system (i.e. tweak the parameters of the functions), which is then validated on yet more classified samples (sherds). The task of training this classification system is challenging, since learning the parameters of the classification function requires a very large number of labeled images.

When classifying, the first task is to find what general group an image belongs to. As there is no existing corpus for image recognition for archaeological pottery, we don’t have enough labeled images to train a dedicated classification system. To solve this, our work began with using a general network trained using ImageNet (a general dataset of labeled images). We then used the features recognized by that classifier and adapted their usage to classify our decoration images. This was found to be feasible with the number of images we did have.
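
The general pattern described here (reuse the features of a network trained on a large general dataset, then fit a small classifier on the few labeled images available) can be sketched as follows. This is not the Tel Aviv implementation: the feature vectors are assumed to have been produced already by some pretrained network, which is not shown, and a simple nearest-centroid classifier stands in for whatever classifier the partners actually use.

```python
import math
from collections import defaultdict


def train_centroids(samples):
    """Compute the mean feature vector for each class.

    samples: list of (feature_vector, label) pairs, where each feature
    vector is assumed to come from a pretrained network (not shown here).
    Returns a dict mapping label -> centroid (list of floats).
    """
    sums = {}
    counts = defaultdict(int)
    for vector, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(vector)
        for i, value in enumerate(vector):
            sums[label][i] += value
        counts[label] += 1
    return {
        label: [value / counts[label] for value in total]
        for label, total in sums.items()
    }


def rank_candidates(centroids, vector, top_k=3):
    """Return up to top_k labels, nearest centroid first."""
    distances = sorted(
        (math.dist(vector, centroid), label)
        for label, centroid in centroids.items()
    )
    return [label for _, label in distances[:top_k]]
```

Only the class centroids are learned from the pottery images, which is why a comparatively small labeled set can be enough; the expensive feature learning has already been done on the general dataset.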

When classifying by shape and profile, we have multiple drawings per class and we want to identify which class a sherd belongs to, but currently there’s no way to directly correlate drawings with sherds. Using the 3D model, we virtually break the model to create virtual sherds, and flatten the fracture shapes to a black and white image. This allows us to train the system on inputs that are fracture shapes, as extracted by CNR. The final goal in both types of recognition is to create groups, but at this point we are still reliant on archaeologists to match the images to the groups in order to train the system.

At the end of the day, ADS staff quickly read through the notes taken over the course of the day, and tried to distill the main themes in the feedback and report them back to the group. The themes were:

  • ArchAIDE needs to ensure users know what they are using, what the limitations are, and that the app is a proof of concept.
  • The app needs to be transparent about where matches are coming from (who created the typologies) so users can decide how confident they are in the result. If users can add data to the database there needs to be a clear plan for dealing with quality control.
  • Users might classify something just because they get a result, so an accuracy threshold should be in place, with results only shown if they reach the threshold.
  • Thinking about the app and the different technologies that support it allowed discussion of issues that are often ignored, such as differences in terminology (which may add modern perceptions), and could make us think differently about how we carry out our research.
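
The accuracy threshold suggested by participants could take a form like the sketch below. The function name, the default threshold of 0.6 and the limit of five results are assumptions made for illustration; the workshop did not specify values, and the scores in the usage example are invented.

```python
def confident_matches(scored, threshold=0.6, max_results=5):
    """Keep only matches whose score reaches the threshold.

    scored: list of (type_name, score) pairs with score between 0 and 1.
    Returns the surviving pairs, best first; an empty list means the app
    should report "no confident match" rather than show a weak guess.
    """
    kept = [(name, score) for name, score in scored if score >= threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:max_results]


# Usage with invented scores for three amphora types.
results = confident_matches(
    [("Dressel 2-4", 0.64), ("Haltern 70", 0.21), ("Dressel 1", 0.82)]
)
```

Returning nothing when no candidate reaches the threshold addresses the concern raised above: a user is not tempted to accept a type simply because the app offered one.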

The discussion highlighted the usefulness of some of the by-products of the project, including:

  • Digitising the pottery catalogues to create more accessible comparative data
  • Several of the digital tools that are being created can obviously be repurposed
  • The multilingual vocabulary mappings can be both re-used and expanded
  • The 3D models can be used for other types of analysis, such as calculating volume
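
As an example of the volume calculation mentioned in the last point, the capacity of a rotationally symmetric vessel can be estimated directly from its interior profile by summing the conical frustums between successive profile points. This is a minimal sketch under that assumption, not a description of any ArchAIDE tool; the function name is illustrative.

```python
import math


def volume_from_profile(profile):
    """Estimate the volume enclosed by a profile rotated around its axis.

    profile: list of (radius, height) points along the interior wall,
    ordered from bottom to top, all in the same unit (e.g. cm).
    Each pair of neighbouring points defines a conical frustum with volume
    pi * h * (r1^2 + r1*r2 + r2^2) / 3; the result is in cubic units.
    """
    total = 0.0
    for (r1, z1), (r2, z2) in zip(profile, profile[1:]):
        height = z2 - z1
        total += math.pi * height * (r1 * r1 + r1 * r2 + r2 * r2) / 3.0
    return total
```

With a profile measured in centimetres the result is in cubic centimetres, so dividing by 1,000 gives the capacity in litres.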

Several new connections were generously offered to the project:

  • App testing, potentially via CIfA
  • Concordances between reference collections
  • Existing photographic catalogues of stamps for Terra Sigillata

If you would like to read the full transcript of the day including the discussion and summation, it is available here.

 
