Spreadsheet manipulation is widely-existing in most daily works and significantly improves the working efficiency. Large language model (LLM) has been recently attempted for automatic spreadsheet manipulation, but not yet been investigated in complicated and realistic tasks where reasoning challenges exist (e.g., long horizon manipulation with multi-step reasoning and ambiguous requirements). To bridge the gap with the real-world requirements, we introduce SheetRM, a benchmark featuring long-horizon and multi-category tasks with reasoning dependent manipulation caused by real-life challenges. To mitigate the above challenges, we further propose SheetAgent, an novel autonomous agent that utilizes the power of LLMs. SheetAgent consists of three collaborative modules: Planner, Informer and Retriever, achieving both advanced reasoning and accurate manipulation over spreadsheets without human interaction through iterative task reasoning and reflection. Extensive experiments demonstrate that SheetAgent delivers 20-30% pass rate improvements on multiple benchmarks over baselines, achieving enhanced precision in spreadsheet manipulation and demonstrating superior table reasoning abilities.
SheetAgent can handle diverse spreadsheet reasoning and manipulation tasks automatically. Given a large-scale spreadsheet with multiple sheets, SheetAgent showcases its proficiency in visualization (f), achieves accurate manipulation on long horizon and multi-step tasks (a, b) with consistent reasoning capabilities (c, d), even faced with the challenges like unclear requirements (e).
This video demonstrates the use of SheetAgent to automatically complete the given instruction Fill the "Bookname" column. Plot a column chart showing the prices of different books in a new sheet named "Chart" on a target spreadsheet. SheetAgent decomposes the task into the following 4 subtasks:
Subtask 1: Load the "Order Details" and "Numbering Reference" sheets from the workbook to prepare for filling the "Bookname" column in "Order Details".
# Load the sheets order_details_sheet = workbook["Order Details"] numbering_reference_sheet = workbook["Numbering Reference"]
Subtask 2: Create a dictionary using "Numbering Reference" sheet to map the "Bookid" to the corresponding "Bookname".
# Create a dictionary to map Bookid to Bookname from the "Numbering Reference" sheet bookname_mapping = {} for row in numbering_reference_sheet.iter_rows(min_row=2, max_row=18, min_col=1, max_col=2, values_only=True): book_id, book_name = row bookname_mapping[book_id] = book_name
Subtask 3: Fill the "Bookname" column in the "Order Details" sheet using the `bookname_mapping` dictionary created.
# Fill the 'Bookname' column in 'Order Details' using the bookname_mapping for row in range(2, 636): # Start from row 2 to skip the header book_id = order_details_sheet.cell(row, 4).value book_name = bookname_mapping.get(book_id, None) # Use None for default if book ID is not found order_details_sheet.cell(row, 5).value = book_name
Subtask 4: Create and format a column chart showing the prices of different books using data from the "Numbering Reference" sheet in a new sheet named "Chart".
from openpyxl.chart import BarChart, Reference # Create a new sheet named 'Chart' if it doesn't exist if "Chart" not in workbook.sheetnames: workbook.create_sheet("Chart") chart_sheet = workbook["Chart"] # Create a column chart chart = BarChart() chart.type = "col" chart.style = 10 chart.title = "Book Prices" chart.y_axis.title = "Unit Price" chart.x_axis.title = "Bookname" # Data for the column chart data = Reference(numbering_reference_sheet, min_col=3, min_row=1, max_row=18) # Unit Prices categories = Reference(numbering_reference_sheet, min_col=2, min_row=2, max_row=18) # Booknames # Add data and categories to the chart chart.add_data(data, titles_from_data=True) chart.set_categories(categories) # Add the chart to the 'Chart' sheet chart_sheet.add_chart(chart, "A1")