Commit History

Author        | SHA1       | Message | Date
Nick Sweeting | a680724367 | Merge branch 'dev' into search_index_extract_html_text | 2 years ago
Ross Williams | 310b4d1242 | Add htmltotext extractor | 2 years ago
Ross Williams | b44f7e68b1 | Add URL-specific method allow/deny lists | 2 years ago
Nick Sweeting | bd6d9c165b | enforce utf8 on literally all file operations because windows sucks | 4 years ago
Cristian      | 62ed11a5ca | fix: Improve headers handling | 5 years ago
Angel Rey     | ee6caca3ca | Added more asserts | 5 years ago
Angel Rey     | 1cce786d6d | Added test headers extractor | 5 years ago
ttimasdf      | e3329be291 | tests: add test for mercury-parser | 5 years ago
Cristian      | cc0fa747ce | feat: Add options to ease management of node related extractors | 5 years ago
Cristian      | 2a68af1b94 | tests: Add readability tests | 5 years ago
Cristian      | 5429096c30 | tests: Add mechanism to avoid using extractors that we are not testing | 5 years ago
Nick Sweeting | 5b6eb5e4ad | make filenames consistent with program name | 5 years ago
Cristian      | 37df00a08b | tests: Add basic singlefile test | 5 years ago
Cristian      | e6c571beb2 | fix: Remove title from extractors for oneshot | 5 years ago
Cristian      | 23e6803f02 | fix: Add change to calculate wget folder when there is a port present | 5 years ago