text = "Trondheim is a small city with a university and 140000 inhabitants. Its central bus systems has 42 bus lines, serving 590 stations, with 1900 (departures per) day in average. T h a t gives approximately 60000 scheduled bus station passings per day, which is somehow represented in the route data base. The starting point is to automate the function (Garry Weber, 2005) of a route information agent."
import re
print(re.findall(r"([^.]*?\(.+ [0-9]+\)[^.]*\.)", text))
I'm using the code above to extract the sentence that contains a citation. As you can see, the final sentence contains the citation (Garry Weber, 2005).
But I got this result:
[' Its central bus systems has 42 bus lines, serving 590 stations, with 1900 (departures per) day in average. T h a t gives approximately 60000 scheduled bus station passings per day, which is somehow represented in the route data base. The starting point is to automate the function (Garry Weber, 2005) of a route information agent.']
The result should be only the sentence that contains the citation, like this:
The starting point is to automate the function (Garry Weber, 2005) of a route information agent.
I guess the problem is caused by the other text inside parentheses: as you can see, the second sentence contains (departures per). Is there any fix for my code?
My attempt:
\b[^.]+\([^()]+\b(\d{2}|\d{4})\s*\)[^.]*\.
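It matches exactly the sentence with the citation, and it is stricter about the year than yours: the parentheses must end in a two- or four-digit number, and [^()]+ keeps the match from running across an earlier pair of parentheses such as (departures per).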
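Here is a minimal sketch of how the pattern could be used from Python. One detail to watch: re.findall returns only the captured group when a pattern contains one, so the year group is written as a non-capturing group (?:...) here. The names CITATION_SENTENCE and sentences_with_citation are made up for this example.

```python
import re

text = ("Trondheim is a small city with a university and 140000 inhabitants. "
        "Its central bus systems has 42 bus lines, serving 590 stations, with 1900 "
        "(departures per) day in average. "
        "T h a t gives approximately 60000 scheduled bus station passings per day, "
        "which is somehow represented in the route data base. "
        "The starting point is to automate the function (Garry Weber, 2005) "
        "of a route information agent.")

# Same pattern as above, except the year alternatives sit in a non-capturing
# group so that findall returns the whole matched sentence, not just the year.
CITATION_SENTENCE = re.compile(r"\b[^.]+\([^()]+\b(?:\d{2}|\d{4})\s*\)[^.]*\.")

def sentences_with_citation(s):
    """Return every sentence in s that contains a '(..., year)' citation."""
    return CITATION_SENTENCE.findall(s)

print(sentences_with_citation(text))
# ['The starting point is to automate the function (Garry Weber, 2005) of a route information agent.']
```

Note that the pattern treats every period outside the parentheses as a sentence boundary, so a sentence with an abbreviation or a decimal number before the citation would be cut at that period.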