Report on the CMS forward backward MSGC Milestone - IN2P3 - Institut national de physique nucléaire et de physique des particules
Journal article, Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment. Year: 1998

Report on the CMS forward backward MSGC Milestone

Abstract

In this paper, we address the problem of Part-of-Speech (PoS) tagging of unannotated specialized corpora. Existing taggers are trained on non-specialized corpora and most often give inconsistent results on specialized texts. To learn rules adapted to a specialized field, the usual approach is to manually label a large corpus from that field, which is extremely time-consuming. We propose here a semi-automatic approach to PoS tagging of specialized corpora. ETIQ, the new tagger we are building, makes it possible to correct the rule base obtained by Brill's tagger and to adapt it to a specialized corpus. The domain expert views a basic tagging and corrects it by inserting specialized contextual lexical rules. The inserted rules are more expressive than Brill's rules. To help the user in this task, we designed an inductive algorithm biased by the "correct" knowledge acquired beforehand by the user. By using machine learning techniques while allowing the expert to incorporate domain knowledge in an interactive and user-friendly way, we improve the tagging of a specialized corpus. Our approach has been applied to a molecular biology corpus.

Dates and versions

in2p3-00005961 , version 1 (31-08-2000)

Identifiers

  • HAL Id : in2p3-00005961 , version 1

Cite

J.M. Brom, U. Goerlach, A. Lounis, I. Ripp-Baudot, A. Zghiche. Report on the CMS forward backward MSGC Milestone. Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, 1998, 419, pp.375. ⟨in2p3-00005961⟩