“Preparing the country for the challenges of the digital transition of tomorrow's economy” is the ambition of the law for “A Digital Republic” enacted in 2016. Its application has accelerated open data strategies in administrations. Anyone can now access public data to understand the main characteristics of a territory: the topics addressed in deliberations, the position of an elected official on renewable energies, discussions around competing projects within the community of communes (CDC), etc.
This information is a priori easily accessible to companies that work with local public authorities. A priori, because the volume of administrative documents to go through is gigantic, not counting the articles published in the local press. An EPCI like “Basque Country Community”, for example, publishes several thousand PDFs every year, on top of the historical stock of administrative documents. And the information that interests prospecting and development teams is sometimes found in a 5-line paragraph, in the middle of a 200-page document.
So how can you take advantage of the wealth of these documents to understand the challenges and dynamics of a territory in a minimum of time?
Let's take the example of a prospector in the wind energy sector, whose perimeter includes the village of La Verdière in the Var. La Verdière (1,622 inhabitants) is part of a community of communes with fifteen other villages. Considering these two administrative levels, it can be estimated that no fewer than 7,500 pages of deliberations and other administrative reports have been produced in the last 5 years.
Here is an example of a document that a developer might retrieve during this research. The document is a scan, so the developer has to search it by hand, line after line, for information relevant to their activity:
Development of the PLU of La Verdière
Let's add the articles published in the local press. Var-Matin, La Provence and La Marseillaise are full of interviews with elected officials, opponents or citizens about wind energy and the development of renewable energy projects in the department. This represents several thousand additional pages to read and organize.
To put it simply: it is a task that cannot be done manually.
That would be like reading the entire Harry Potter saga several times a month, for each project. The work is colossal yet important, and it takes hours that a wind prospecting manager does not have. And yet, if they skip it, they will regularly miss key information about their projects.
National surveys on the perception of wind energy are numerous. However, they say nothing about how it is perceived at the local level, which is the level at which project developers work.
Part of the work of territorial project leaders consists precisely in understanding the local situation: the positions taken by elected officials, the history of infrastructure projects, and the budgetary situation of municipalities, across an area that can include several thousand municipalities.
This involves reading thousands of pages of administrative documents and keeping an eye out for new publications. And the picture would not be complete without the articles published in the local press, which further lengthen the reading list.
This systematic work cannot be done by hand, but there is a technology that can automatically read large quantities of text documents: Natural Language Processing (NLP).
We use many applications based on this technology every day: voice assistants, connected speakers, chatbots and translation tools are some of the practical applications of NLP. We can now add to this list the automatic reading of administrative documents and press articles in order to identify information useful to the promoters of territorial projects.
What is NLP?
NLP (Natural Language Processing) is a set of techniques that allow computers to read, decipher and understand human language. These techniques make it possible to perform certain tasks automatically, such as identifying the places mentioned in a text, detecting its theme, or generating a summary of it. To do this, they combine dozens of rules, either written from grammar or learned automatically from data sets containing examples of the task at hand (supervised learning). The most modern techniques can even learn without labeled examples (unsupervised learning), based only on very large amounts of plain text.
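To make the rule-based end of this spectrum concrete, here is a minimal Python sketch of two of the tasks mentioned above: identifying the places mentioned in a text and detecting its theme. It is only an illustration, not the method used in Goodwill. The place list, the theme keywords and the function names (KNOWN_PLACES, THEME_KEYWORDS, find_places, detect_theme) are assumptions made up for this example, and a real system would more likely rely on models learned from data than on hand-written lists.

```python
import re
from collections import Counter

# Hypothetical gazetteer: place names we want to spot (illustrative only).
KNOWN_PLACES = {"La Verdière", "Var"}

# Hypothetical theme keywords, written by hand (illustrative only).
THEME_KEYWORDS = {
    "wind energy": {"wind", "turbine", "turbines"},
    "urban planning": {"plu", "zoning", "urban"},
}


def find_places(text):
    """Return the known places that appear in the text, in alphabetical order."""
    return sorted(
        place
        for place in KNOWN_PLACES
        if re.search(r"\b" + re.escape(place) + r"\b", text)
    )


def detect_theme(text):
    """Return the theme whose keywords occur most often, or None if none occur."""
    words = re.findall(r"\w+", text.lower())
    counts = Counter()
    for theme, keywords in THEME_KEYWORDS.items():
        counts[theme] = sum(1 for word in words if word in keywords)
    theme, score = counts.most_common(1)[0]
    return theme if score > 0 else None


if __name__ == "__main__":
    sample = "The council of La Verdière debated a wind turbine project in the Var."
    print(find_places(sample))
    print(detect_theme(sample))
```

Hand-written rules like these are easy to read but miss anything outside their lists, which is why supervised and unsupervised learning matter for documents as varied as deliberations and press articles.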
Explain has put this technology at the heart of Goodwill, territorial intelligence software for companies that work with local public authorities. Our Data Science teams have developed a combination of NLP algorithms that identify key information in the documents mentioned above in order to facilitate prospecting work: locally influential people and organizations, themes, locations and stated positions are all elements that our technology makes it possible to identify.
In the end, the benefit for users is to make accessible data that cannot be found by hand, not only so that they can identify new opportunities and anticipate risks that may impact their business, but also simply to save them time. Time that they can then spend in the field, meeting the influential people they have identified thanks to Goodwill. Or, why not, reading the Harry Potter saga!