This repository has been archived by the owner on Mar 1, 2024. It is now read-only.

[Feature Request]: I'd like to specify the appropriate Reader for each file found while using SharePointReader #933

Open
ferdinandosimonetti opened this issue Feb 8, 2024 · 1 comment
Labels
enhancement New feature or request triage

Comments

@ferdinandosimonetti
Contributor

Feature Description

Hi, I'm currently obtaining my test Documents by scanning a local directory:

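# Assumes the pre-0.10 llama_index package layout; adjust the import for newer releases.
from llama_index import SimpleDirectoryReader, download_loader

# Attach the file name to each Document's metadata.
filename_fn = lambda filename: {"file_name": filename}

DocxReader = download_loader("DocxReader")
PptxReader = download_loader("PptxReader")
PandasExcelReader = download_loader("PandasExcelReader")
PDFReader = download_loader("PDFReader")

# mytime and docpath are defined elsewhere in my script.
mytime("start multiple file types read")
dir_reader = SimpleDirectoryReader(
    docpath,
    file_metadata=filename_fn,
    filename_as_id=True,
    # Map each file extension to the reader that should parse it.
    file_extractor={
        ".docx": DocxReader(),
        ".pptx": PptxReader(),
        ".xlsx": PandasExcelReader(),
        ".pdf": PDFReader(),
    },
)
documents = dir_reader.load_data()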

However, the real documents are stored in a SharePoint site and directory (which I unfortunately can't test right now).
I was wondering if there's a way to use SharePointReader while retaining the ability to customize Document id/metadata, as well as the specific Reader for each file format.
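
To make the request more concrete, here is a rough sketch of the behaviour I have in mind once the files have been downloaded from SharePoint to a local folder. It does not use SharePointReader's real API: load_downloaded_files, the Reader type and read_as_text are hypothetical names made up for illustration, and the "documents" are plain dicts rather than llama_index Document objects.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict, List, Optional

# Hypothetical reader signature: takes a local file path, returns text chunks.
Reader = Callable[[Path], List[str]]


def read_as_text(path: Path) -> List[str]:
    """Fallback reader (assumption): treat the file as UTF-8 text."""
    return [path.read_text(encoding="utf-8", errors="replace")]


def load_downloaded_files(
    paths: List[Path],
    file_extractor: Dict[str, Reader],
    file_metadata: Optional[Callable[[str], dict]] = None,
    filename_as_id: bool = False,
) -> List[dict]:
    """Dispatch each file to the reader registered for its extension."""
    documents = []
    for path in paths:
        # Pick the reader by lowercase extension, else fall back to plain text.
        reader = file_extractor.get(path.suffix.lower(), read_as_text)
        metadata = file_metadata(str(path)) if file_metadata else {}
        for part, text in enumerate(reader(path)):
            # Mirror the idea of filename_as_id: derive the id from the path.
            doc_id = f"{path}_part_{part}" if filename_as_id else None
            documents.append(
                {"id": doc_id, "text": text, "metadata": dict(metadata)}
            )
    return documents


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "notes.md"
        sample.write_text("hello", encoding="utf-8")
        docs = load_downloaded_files(
            [sample],
            file_extractor={".md": read_as_text},
            file_metadata=lambda name: {"file_name": name},
            filename_as_id=True,
        )
        print(docs)
```

The point is only that the three options (file_extractor, file_metadata, filename_as_id) could be applied after the download step, so SharePointReader would not need to know anything about the individual file formats.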

Reason

SharePointReader's README does not mention additional parameters such as file_extractor, file_metadata, or filename_as_id.

Value of Feature

Being able to specify a (more) appropriate Reader for each file format could lead to better content interpretation afterwards, I suppose.

@ferdinandosimonetti ferdinandosimonetti added enhancement New feature or request triage labels Feb 8, 2024
@ferdinandosimonetti
Contributor Author

There is a PR for this: #934
