Hi,
In my PDF file, I have 4 tables (one per region) listing the holidays for the year. Each table has the columns Sr.No, Date, Day and Festival, and each table's title is "<Region Name> Holiday List 2024". However, when I run the code below, no CSV file is created, and no pdfdocs.jsonl file is created either; only the data.jsonl file is created.
```python
import time

from llmware.configs import LLMWareConfig
from llmware.library import Library
from llmware.retrieval import Query

# Assumption: input_data (folder with the PDFs) and output_data (export folder)
# are defined elsewhere in the script.


def parsing_the_pdfs():
    t0 = time.time()

    # Create a library backed by SQLite
    LLMWareConfig().set_active_db("sqlite")
    lib = Library().create_new_library("pdfdocs")

    # Parse and extract all of the contents from the documents in the folder
    parsing_output = lib.add_files(input_folder_path=input_data)
    print("Update: parsing time :", time.time() - t0)
    print("Update: parsing output :", parsing_output)

    # Export all of the library content, with metadata, into a jsonl file
    output1 = lib.export_library_to_jsonl_file(output_data, "data.jsonl")

    # Export all of the tables
    output2 = Query(lib).export_all_tables(query="Holiday", output_fp=output_data)
    return 0


p = parsing_the_pdfs()
```
This is the output when I execute the code:
```
Update: parsing time : 0.0057866573333740234
Update: parsing output : {'docs_added': 0, 'blocks_added': 0, 'images_added': 0, 'pages_added': 0, 'tables_added': 0, 'rejected_files': []}
```
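The parsing output reports `docs_added: 0` after only a few milliseconds, which would be consistent with the parser not finding any files at `input_data` (an assumption, not a confirmed diagnosis). A quick way to check that is a small standard-library helper like the one below. The name `find_candidate_files` is hypothetical, and it only looks at the top level of the folder; whether `lib.add_files` also descends into subfolders is not assumed here.

```python
from pathlib import Path


def find_candidate_files(folder, extensions=(".pdf",)):
    """Hypothetical helper: list files directly inside `folder` whose
    suffix (case-insensitive) is one of `extensions`."""
    path = Path(folder)
    # A wrong or relative path is a common reason for nothing being parsed
    if not path.is_dir():
        raise NotADirectoryError(f"{folder!r} is not an existing directory")
    # Non-recursive: only the folder's own entries are checked
    return sorted(
        p for p in path.iterdir()
        if p.is_file() and p.suffix.lower() in extensions
    )
```

Calling `print(find_candidate_files(input_data))` just before `lib.add_files(...)` would show whether the folder the script points at actually contains the PDF; an empty list or a `NotADirectoryError` would point at the path rather than at the table export.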