A colleague has a problem with a LibreOffice Calc file: it was extremely slow to open. It took just 10 minutes to open and then the computer is paralyzed. I tried to convert it to Excel, but even opening it with Microsoft Excel was extremely slow.
I tried to use the power of the cloud, putting it on Google Drive and then opening it with Google Sheets. Nothing to do, it freezes the browser!
I then saved the ODS (OpenDocument Sheets) file as FODS. The OpenOffice/Libreoffice files in fact, are zipped XML files.
A FODS file is simply the same ODS file, only not being compressed, you can open it with a text editor to understand what’s going on.
From here you can already see that something is not right. Compressed occupies 0.16 MB, not compressed 7 MB? Compression is not magic, to become so small, it means that the file contains lots of repetition.
In fact, here’s the problem. For some reason, there are hundreds of thousands of tiny, invisible text boxes in the file!
Unfortunately, however, it is not a contiguous block. There are hundreds of blocks scattered among “good” content to keep, select and remove them by hand is an operation that takes days!
Here comes xmlstarlet.
You download the Windows (or Linux) version and study the syntax with
xmlstarlet el yourfile.fods
and a very long list with the file structure will appear. As we saw in the text file, we have to get rid of this structure:
and we do it with:
xmlstarlet ed -d "office:document/office:body/office:spreadsheet/table:table-row/table:table-cell/draw:custom-shape" yourfile.fods > fixed.fods
Done! Now the file opens in very few seconds, again! You can save it again as ODS.