Named Entity Recognition (NER) is an NLP task that identifies and categorizes entities in text into predefined classes, such as names of people, locations, organizations, dates, and more. For example, in the sentence "Elon Musk founded SpaceX in 2002," NER would tag "Elon Musk" as a person, "SpaceX" as an organization, and "2002" as a date.
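To make the example concrete, here is a minimal sketch of a rule-based tagger for that one sentence. It is not how production NER works: the `GAZETTEER` lookup table, the `YEAR_PATTERN` regex, the `Entity` type, and the label names `PERSON`, `ORG`, and `DATE` are all assumptions chosen for illustration. The point is only to show the shape of NER output, which is a text span plus a category.

```python
import re
from typing import List, NamedTuple


class Entity(NamedTuple):
    text: str
    label: str
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive


# Hypothetical hand-written lexicon; a real system would learn this from data.
GAZETTEER = {
    "Elon Musk": "PERSON",
    "SpaceX": "ORG",
}

# Illustrative pattern: four-digit years from 1800 to 2099.
YEAR_PATTERN = re.compile(r"\b(?:1[89]|20)\d{2}\b")


def tag_entities(text: str) -> List[Entity]:
    """Return entities found by exact lexicon lookup and a year regex."""
    entities = []
    for name, label in GAZETTEER.items():
        for match in re.finditer(re.escape(name), text):
            entities.append(Entity(match.group(), label, match.start(), match.end()))
    for match in YEAR_PATTERN.finditer(text):
        entities.append(Entity(match.group(), "DATE", match.start(), match.end()))
    # Report entities in the order they appear in the text.
    return sorted(entities, key=lambda e: e.start)


sentence = "Elon Musk founded SpaceX in 2002."
entities = tag_entities(sentence)
for entity in entities:
    print(entity)
```

A lookup table like this cannot handle names it has never seen or words whose type depends on context, which is the gap that statistical and neural models are meant to close.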
NER involves two subtasks: entity identification (detecting the span of text that corresponds to an entity) and classification (assigning that span to a category). Many systems solve both at once by predicting a label for every token in the sentence. Traditional NER systems rely on hand-written rules or on statistical models such as Hidden Markov Models (HMMs) and Conditional Random Fields (CRFs). Modern approaches use neural networks, including BiLSTMs and transformer-based models such as BERT.
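One common way to handle identification and classification together is BIO tagging, where each token is labeled `B-TYPE` (beginning of an entity), `I-TYPE` (inside one), or `O` (outside any entity). The source does not name this scheme, so treat the sketch below as an assumption about how the two steps are often combined. The tags are written by hand here as a stand-in for what a CRF, BiLSTM, or BERT tagger would predict, and `decode_bio` is a hypothetical helper name.

```python
from typing import List, Optional, Tuple


def decode_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge per-token BIO tags into (entity_text, label) pairs."""
    entities: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label: Optional[str] = None

    def flush() -> None:
        # Close the span that is currently open, if there is one.
        nonlocal current_tokens, current_label
        if current_tokens:
            entities.append((" ".join(current_tokens), current_label))
        current_tokens, current_label = [], None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new span.
            flush()
            current_tokens, current_label = [token], tag[2:]
        elif tag.startswith("I-") and current_label == tag[2:]:
            # An I- tag extends the open span only if the label matches.
            current_tokens.append(token)
        else:
            # "O", or an I- tag with nothing to continue: close any open span.
            flush()
    flush()
    return entities


tokens = ["Elon", "Musk", "founded", "SpaceX", "in", "2002", "."]
# Hand-written tags standing in for a model's predictions.
tags = ["B-PER", "I-PER", "O", "B-ORG", "O", "B-DATE", "O"]

print(decode_bio(tokens, tags))
```

The span boundaries come from where the `B-` and `I-` tags sit, and the category comes from the suffix after the dash, so one pass of token labeling yields both. This sketch silently drops an `I-` tag that does not continue a matching span. Other decoders treat such a tag as the start of a new entity, and either choice is defensible.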
Contextual embeddings and attention mechanisms allow modern NER systems to capture dependencies across words and to resolve ambiguities from the surrounding text (e.g., "Apple" as a company vs. a fruit). Pre-trained NER models in libraries like spaCy, Hugging Face Transformers, and Stanford CoreNLP offer ready-to-use entity extraction for multiple languages and domains. NER is widely used in applications such as information extraction, knowledge graph construction, and document summarization.