
If you're doing NLP in Python, there's no reason to use CoreNLP's parser through the NLTK wrapper. Communicating with the Java process over the file system or a socket adds a tonne of unnecessary complications: it slows things down, invites encoding problems, etc.
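
To make the encoding point concrete, here's a minimal sketch of the kind of round trip a wrapper has to do. This is not CoreNLP or the NLTK wrapper: the child process is a hypothetical stand-in (a second Python interpreter that just echoes stdin), and `round_trip` is a name made up for illustration. Every sentence gets serialized to bytes, piped out, and decoded on the way back, and both sides have to agree on the encoding.

```python
import subprocess
import sys

# Hypothetical stand-in for an external parser process: a child Python
# interpreter that reads bytes from stdin and writes them back unchanged.
CHILD = "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())"


def round_trip(text, encoding="utf-8"):
    """Encode text, pipe it through the child process, decode the reply."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        input=text.encode(encoding),
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode(encoding)


sentence = "Zoë visited the café."

# Both sides agree on UTF-8, so the text survives the trip.
echoed = round_trip(sentence)

# If one side assumes a different encoding, the non-ASCII characters are
# silently corrupted instead of raising an error.
garbled = sentence.encode("utf-8").decode("latin-1")

print(echoed == sentence)
print(garbled == sentence)
```

A parser that runs in the same process works on the unicode string directly, so this serialize/pipe/decode step (and the process start-up cost that comes with it) never happens.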

spaCy's native Cython dependency parser is both faster and more accurate than CoreNLP's.

The NP chunks example from the post:

    >>> from spacy.en import English
    >>> nlp = English()
    >>> doc = nlp(u'ITP is a two-year graduate program located in the Tisch School of the Arts. Perhaps the best way to describe us is as a Center for the Recently Possible.')
    >>> for np in doc.noun_chunks:
    ...   print(np.text)
    ... 
    ITP
    a two-year graduate program
    the Tisch School
    the Arts
    the best way
    us
    a Center


spaCy looks like a great product, but it is expensive.

edit: sorry, I just noticed that it is available for free under the AGPL 3 license.


Now 100% free under the MIT license. Things change by the hour in the spaCy world :-).


It looks like a lot of good work went into spaCy. I hope that you are successful in monetizing it under the MIT license.



