
How To Restore Punctuation Using Python?

I would like to restore commas and full stops in text without punctuation. For example, let's take this sentence: I am XYZ I want to execute I have a doubt And I would like to det

Solution 1:

If I understand correctly, you want to make the text more readable by adding the appropriate punctuation. This task is usually called punctuation restoration.

A good first step is to run the usual NLP pipeline (tokenization, POS tagging, and parsing) using a library such as NLTK or spaCy.

Once this preprocessing is done, you'll have to apply a rule-based or a machine learning approach to decide where the punctuation should go, based on the features extracted by the NLP pipeline (e.g. sentence boundaries, parse tree, POS tags).
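As a rough illustration of the rule-based route, here is a minimal sketch in plain Python. It does not use NLTK or spaCy: the hand-picked `SENTENCE_STARTERS` set and the `MIN_SENTENCE_LEN` threshold are assumptions standing in for the POS and parse features a real pipeline would provide, and the function name `restore_full_stops` is made up for this example. It only inserts full stops, not commas.

```python
# Hypothetical cue words that often open a new clause or sentence.
# A real system would rely on POS tags or a parser instead of this list.
SENTENCE_STARTERS = {"i", "we", "you", "he", "she", "they", "and", "but"}

# Assumed threshold: do not split before a sentence has at least this many words.
# This keeps "And I would ..." together instead of producing "And. I would ...".
MIN_SENTENCE_LEN = 2


def restore_full_stops(text):
    """Split unpunctuated text before cue words and end each chunk with a full stop."""
    sentences = []
    current = []
    for token in text.split():
        is_starter = token.lower() in SENTENCE_STARTERS
        if is_starter and len(current) >= MIN_SENTENCE_LEN:
            sentences.append(current)
            current = []
        current.append(token)
    if current:
        sentences.append(current)

    restored = []
    for words in sentences:
        # Upper-case only the first letter so that tokens like "XYZ" are untouched.
        first = words[0][:1].upper() + words[0][1:]
        restored.append(" ".join([first] + words[1:]) + ".")
    return " ".join(restored)


print(restore_full_stops("I am XYZ I want to execute I have a doubt"))
# I am XYZ. I want to execute. I have a doubt.
```

A rule this crude over-splits easily (for instance inside "you and I"), which is exactly where features from a tagger or parser become necessary.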

However, this is not a trivial task. Customising the algorithm can require solid NLP and machine learning skills.

Some examples that can be reused:

  • Here is a simple approach using spaCy, mainly based on sentence boundaries.
  • Here is a more complex solution, using the Theano deep learning library.
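For the machine learning route, a model needs labelled training data. One common framing, assumed here and not necessarily the one used by the solutions listed above, is to label each word with the punctuation mark that follows it. The sketch below derives such labels from already punctuated text and turns predicted labels back into text; the label names and both function names are made up for this example.

```python
# Assumed label scheme: one label per word, naming the mark that follows it.
PUNCT_TO_LABEL = {",": "COMMA", ".": "PERIOD"}
LABEL_TO_PUNCT = {"O": "", "COMMA": ",", "PERIOD": "."}


def to_labelled_tokens(punctuated_text):
    """Turn punctuated text into (lowercased word, label) training pairs."""
    pairs = []
    for raw in punctuated_text.split():
        label = PUNCT_TO_LABEL.get(raw[-1], "O")
        word = raw.rstrip(",.").lower()
        if word:  # skip tokens that were punctuation only
            pairs.append((word, label))
    return pairs


def apply_labels(words, labels):
    """Rebuild text from words and (predicted) labels, capitalising sentence starts."""
    out = []
    capitalize_next = True
    for word, label in zip(words, labels):
        if capitalize_next:
            word = word[:1].upper() + word[1:]
        out.append(word + LABEL_TO_PUNCT[label])
        capitalize_next = label == "PERIOD"
    return " ".join(out)


pairs = to_labelled_tokens("Hello, I am XYZ. I have a doubt.")
print(pairs[:4])
# [('hello', 'COMMA'), ('i', 'O'), ('am', 'O'), ('xyz', 'PERIOD')]
```

Any sequence labelling model can then be trained to predict the labels from the words alone, and `apply_labels` converts its predictions back into punctuated text. Note that lowercasing the input loses the original casing of names and acronyms.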
