We have to convert the .doc file to a .docx file ("manually", as in just changing the file name by adding an 'x'); then we can use textract's process function to read the .docx Word document:
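A minimal sketch of the two steps described, assuming the third-party textract package is installed and that the file's contents can actually be read as .docx once renamed (this depends on the file). The helper names `with_docx_suffix` and `read_renamed_doc` are hypothetical, chosen only for illustration.

```python
from pathlib import Path
import shutil


def with_docx_suffix(path):
    """Return the same path with '.docx' as its extension (report.doc -> report.docx)."""
    return Path(path).with_suffix(".docx")


def read_renamed_doc(path):
    """Copy `path` to a .docx name and extract its text with textract."""
    # Third-party dependency; imported lazily so the helper above works without it.
    import textract

    target = with_docx_suffix(path)
    # Copy instead of renaming in place, so the original file is kept.
    shutil.copyfile(path, target)
    # textract.process returns bytes; decode them to get a str.
    return textract.process(str(target)).decode("utf-8")
```

Calling `read_renamed_doc("test.doc")` would leave test.doc untouched and read from a new test.docx next to it.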
When using dir *.doc, all files with the suffixes .docx, .docm, .doct, and .doc are listed. When using dir *.xls, all files with the suffixes .xlsx, .xlsm, and .xls are listed. Is there a way to use the dir command in the command prompt to list only files with the .doc extension, or only those with the .xls extension?
How to list only .doc or .xls files with the Windows dir command in the ...
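The question is about dir itself, and this does not answer it. As a possible workaround outside the command prompt, the sketch below compares each file's full extension in Python, so .docx never matches .doc. The function name `list_exact_extension` is hypothetical.

```python
from pathlib import Path


def list_exact_extension(directory, extension):
    """Return the sorted names of files whose extension is exactly `extension`."""
    wanted = extension.lower()
    return sorted(
        entry.name
        for entry in Path(directory).iterdir()
        # Compare the whole suffix, case-insensitively, instead of a wildcard prefix.
        if entry.is_file() and entry.suffix.lower() == wanted
    )
```

For example, `list_exact_extension(".", ".doc")` lists only the .doc files in the current directory.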
How can I convert a Word document to PDF by calling the Word COM interface from Python?
Convert .doc files to pdf using python COM interface to Microsoft Word
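One way this could look, as a sketch only: it assumes Windows, an installed copy of Microsoft Word, and the pywin32 package (which provides `win32com.client`). The value 17 is Word's `wdFormatPDF` constant; the function names `pdf_path_for` and `convert_to_pdf` are hypothetical.

```python
from pathlib import Path

WD_FORMAT_PDF = 17  # value of Word's wdFormatPDF constant


def pdf_path_for(doc_path):
    """Return the output path: same name as the input, with a .pdf extension."""
    return Path(doc_path).with_suffix(".pdf")


def convert_to_pdf(doc_path):
    """Open the document in Word through COM and save it as PDF."""
    # Assumption: pywin32 is installed; imported lazily because it is Windows-only.
    import win32com.client

    # Word expects absolute paths, so resolve before handing them over.
    source = Path(doc_path).resolve()
    target = pdf_path_for(source)

    word = win32com.client.Dispatch("Word.Application")
    try:
        doc = word.Documents.Open(str(source))
        try:
            doc.SaveAs(str(target), FileFormat=WD_FORMAT_PDF)
        finally:
            doc.Close()
    finally:
        # Always quit, otherwise a hidden Word process may stay running.
        word.Quit()
    return target
```

The nested try/finally blocks are a design choice to make sure the document and the Word instance are released even if saving fails.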
Note: If you are looking for the best way to convert a doc/docx file on the client side, then the answer is probably: don't do it. If you really need to convert it, do it server-side, e.g. with LibreOffice in headless mode, Apache POI (Java), pandoc, etc.
How do I render a Word document (.doc, .docx) in the browser using ...
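A server-side conversion with LibreOffice in headless mode might be driven from Python as sketched below. It assumes the LibreOffice binary is on the PATH under the name `soffice` (on some systems it is `libreoffice` instead); the function names are hypothetical.

```python
import subprocess
from pathlib import Path


def build_convert_command(source, out_dir, target_format="pdf", binary="soffice"):
    """Build the LibreOffice command line for a headless conversion."""
    return [
        binary,
        "--headless",
        "--convert-to", target_format,
        "--outdir", str(out_dir),
        str(source),
    ]


def convert(source, out_dir, target_format="pdf"):
    """Run the conversion and return the path where the output is expected."""
    # check=True raises CalledProcessError if LibreOffice exits with an error.
    subprocess.run(build_convert_command(source, out_dir, target_format), check=True)
    return Path(out_dir) / (Path(source).stem + "." + target_format)
```

The converted file (PDF or HTML, for instance) can then be served to the browser instead of parsing the Word document on the client.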
I got a test for a job application; my task is to read some .doc files. Does anyone know a library to do this? I started with raw Python code: f = open('test.doc', 'r'); f.read() but this does not
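A legacy .doc file is a binary container, so opening it in text mode does not yield the document's text. The sketch below, assuming test.doc is in the old binary Word format, reads the file in binary mode and checks for the 8-byte OLE compound file signature. The function name `looks_like_legacy_doc` is hypothetical, and the same signature is shared by other legacy Office files such as .xls and .ppt.

```python
# Signature at the start of every OLE compound file (legacy .doc, .xls, .ppt).
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_legacy_doc(path):
    """Return True if the file starts with the OLE compound file signature."""
    # Open in binary mode: the content is not plain text.
    with open(path, "rb") as handle:
        return handle.read(len(OLE_MAGIC)) == OLE_MAGIC
```

A check like this only identifies the container; extracting the text still needs a dedicated library or converter.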