Nick Sweeting | a680724367 | Merge branch 'dev' into search_index_extract_html_text | 2 years ago
Ross Williams | 310b4d1242 | Add htmltotext extractor | 2 years ago
Ross Williams | b44f7e68b1 | Add URL-specific method allow/deny lists | 2 years ago
Nick Sweeting | bd6d9c165b | enforce utf8 on literally all file operations because windows sucks | 4 years ago
Cristian | 62ed11a5ca | fix: Improve headers handling | 5 years ago
Angel Rey | ee6caca3ca | Added more asserts | 5 years ago
Angel Rey | 1cce786d6d | Added test headers extractor | 5 years ago
ttimasdf | e3329be291 | tests: add test for mercury-parser | 5 years ago
Cristian | cc0fa747ce | feat: Add options to ease management of node related extractors | 5 years ago
Cristian | 2a68af1b94 | tests: Add readability tests | 5 years ago
Cristian | 5429096c30 | tests: Add mechanism to avoid using extractors that we are not testing | 5 years ago
Nick Sweeting | 5b6eb5e4ad | make filenames consistent with program name | 5 years ago
Cristian | 37df00a08b | tests: Add basic singlefile test | 5 years ago
Cristian | e6c571beb2 | fix: Remove title from extractors for oneshot | 5 years ago
Cristian | 23e6803f02 | fix: Add change to calculate wget folder when there is a port present | 5 years ago