ELDA has taken part in developing a de-identification toolkit for the health and legal domains, covering the 24 CEF languages. The work was carried out within the MAPA project, which has developed a tool that detects and "processes" (either deletes or replaces) personal and sensitive information so that data can be further used and processed in full compliance with the GDPR.
The toolkit will shortly be made available through this page as a service to the HLT community and to other potential users who need to handle data securely.
The annotation guidelines produced for the preparation of the annotated data are available here. These guidelines describe the full Named Entity hierarchy defined for the detection of sensitive information. The annotated data have been used for system development and evaluation.