I worked on something similar for my final-year university project, written in Python (https://github.com/basicallydan/Onda): clustering related articles in online media.
In retrospect it's pretty ugly, but it worked pretty well. I really wanted to implement named entities and n-grams but never got around to it. I'm glad you guys did :)