Commit 1d0f283a authored by marioromera

add basic requirements

parent 7a45fdca
#
# The MIT License (MIT)
#
# Copyright (c) 2020 Mario Romera
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
# THE SOFTWARE
#

Although the code works, there are still many techniques left to explore and optimize. Even so, the results are not that rewarding, for the following reasons:

- It requires downloading a huge number of files
- Processing that much data takes a lot of computing power

To make it work:

1. Install with `pip install -r requirements.txt`
2. Run `python get_files_boe.py` (you can change the date range inside that file)
3. Run `python text_analyzer.py` (the keywords to search for are hard-coded there; change them if you like)
4½. Or run `python get_files_boe.py && python text_analyzer.py`

There are a huge number of BOEs per day, so I suggest you try a small range, or cancel `get_files_boe.py` partway through (Ctrl + C) and then run the analyzer.

Improvements/Next steps:

- Download all files
- Create a server with all documents indexed, and an API to query against it
The MIT License (MIT)
Copyright (c) 2020 Mario Romera
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE
Although the code works, there are still many techniques left to explore and optimize. Even so, the results are not that rewarding, for the following reasons:

- It requires downloading a huge number of files
- Processing that much data takes a lot of computing power

To make it work:

-1. Make sure these commands work:

| Command to enter   | Should return something like |
|--------------------|------------------------------|
| `python --version` | `Python 3.7.7`               |
| `pip --version`    | `pip 20.0.2 from ...`        |
| `git --version`    | `git version 2.16.1`         |
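
The checks in step -1 can also be scripted. A minimal sketch, assuming Python 3.7 or later; `tool_version` is a hypothetical helper, not part of this repository. It calls `python -m pip --version` instead of bare `pip` so that the check targets the pip belonging to the same interpreter (a design choice, not something the steps above require).

```python
import shutil
import subprocess
import sys


def tool_version(command):
    """Run `command` (e.g. ["git", "--version"]) and return its first line of output.

    Returns None if the executable is not found or the command fails.
    """
    if shutil.which(command[0]) is None:
        return None
    try:
        result = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # some tools print their version to stderr
            universal_newlines=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    lines = result.stdout.strip().splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    # Hypothetical check list mirroring the table above.
    checks = {
        "python": [sys.executable, "--version"],
        "pip": [sys.executable, "-m", "pip", "--version"],
        "git": ["git", "--version"],
    }
    for name, command in checks.items():
        print(f"{name}: {tool_version(command) or 'NOT FOUND'}")
```

Any line printing `NOT FOUND` points to the tool that still needs to be installed or added to your `PATH`.
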

0. Clone the repository and enter it: `git clone https://gitlab.servus.at/marioromera/digitalcapitalism.git && cd digitalcapitalism`
1. Install the dependencies with `pip install -r requirements.txt`
2. Run `python get_files_boe.py` (you can change the date range inside that file)
3. Run `python text_analyzer.py` (the keywords to search for are hard-coded there; change them if you like)
4½. Or run both in one go: `python get_files_boe.py && python text_analyzer.py`
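
The actual `text_analyzer.py` is not shown here, so as an illustration only: a minimal sketch of counting hard-coded keywords, assuming the downloaded BOE documents are saved as UTF-8 `.txt` files in one folder. `KEYWORDS`, `count_keywords`, `analyze_folder` and the `boe_texts` folder are hypothetical names, not the script's real code.

```python
import re
from collections import Counter
from pathlib import Path

# Hypothetical keyword list; the real text_analyzer.py hard-codes its own.
KEYWORDS = ["algoritmo", "datos", "inteligencia artificial"]


def count_keywords(text, keywords):
    """Count case-insensitive, whole-word occurrences of each keyword in `text`."""
    counts = Counter()
    lowered = text.lower()
    for keyword in keywords:
        # \b stops "dato" from matching inside "datos"; re.escape guards special characters.
        pattern = r"\b" + re.escape(keyword.lower()) + r"\b"
        counts[keyword] = len(re.findall(pattern, lowered))
    return counts


def analyze_folder(folder, keywords=KEYWORDS):
    """Sum keyword counts over every .txt file in `folder` (assumed file layout)."""
    total = Counter()
    for path in sorted(Path(folder).glob("*.txt")):
        total.update(count_keywords(path.read_text(encoding="utf-8"), keywords))
    return total


if __name__ == "__main__":
    for keyword, count in analyze_folder("boe_texts").most_common():
        print(f"{keyword}: {count}")
```

Matching whole words (rather than plain substring search) keeps short keywords from inflating their counts through longer words that contain them.
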
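
There are a huge number of BOEs (Boletín Oficial del Estado, Spain's official gazette) published every day, so I suggest you try a small date range, or cancel `get_files_boe.py` partway through (Ctrl + C) and then run the analyzer on what has been downloaded so far.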
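
A minimal sketch of the "small range, or stop with Ctrl + C" advice, assuming downloads happen one day at a time. `days_between`, `download_range` and the `fetch_day` callback are hypothetical; `get_files_boe.py` may define its date range and downloads differently.

```python
from datetime import date, timedelta


def days_between(start, end):
    """Yield every date from `start` to `end`, both inclusive."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


def download_range(start, end, fetch_day):
    """Call `fetch_day(day)` for each day in the range, stopping cleanly on Ctrl + C.

    `fetch_day` is a hypothetical stand-in for the per-day download step.
    Returns the days that finished before any interruption.
    """
    done = []
    try:
        for day in days_between(start, end):
            fetch_day(day)
            done.append(day)
    except KeyboardInterrupt:
        print(f"Interrupted after {len(done)} day(s); run the analyzer on what was downloaded.")
    return done


if __name__ == "__main__":
    # A one-week range is a reasonable first test.
    finished = download_range(date(2020, 1, 1), date(2020, 1, 7), lambda day: print("would fetch", day))
    print(finished[-1] if finished else "nothing downloaded")
```

Catching `KeyboardInterrupt` around the loop means a Ctrl + C ends the run without a traceback, and every day completed before it is still on disk for the analyzer.
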

Improvements/Next steps:

- Download all files
- Create a server with all documents indexed, and an API to query against it
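
For the indexing step, a common core structure is an inverted index that maps each word to the documents containing it. A minimal in-memory sketch, not a plan from this repository: `InvertedIndex`, `tokenize` and the document ids such as `BOE-A` are hypothetical, and a real server would also need persistence and an HTTP layer.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase `text` and split it into word tokens (Unicode-aware)."""
    return re.findall(r"\w+", text.lower())


class InvertedIndex:
    """Map each token to the set of document ids that contain it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        """Index every token of `text` under `doc_id`."""
        for token in tokenize(text):
            self.postings[token].add(doc_id)

    def query(self, *terms):
        """Return sorted ids of documents containing ALL single-word `terms` (AND query)."""
        if not terms:
            return []
        # .get avoids creating empty entries in the defaultdict for unknown terms.
        sets = [self.postings.get(term.lower(), set()) for term in terms]
        return sorted(set.intersection(*sets))


if __name__ == "__main__":
    index = InvertedIndex()
    index.add("BOE-A", "Ley de protección de datos")
    index.add("BOE-B", "Datos abiertos y algoritmos")
    print(index.query("datos", "algoritmos"))
```

Wrapping `query` in an HTTP endpoint (for example with the standard library's `http.server`, or a web framework) would provide the query API; that part is left out here.
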