When working with flat files, encoding needs to be factored in right away to avoid issues down the line. UTF-8 (or UTF-16) is the de facto encoding that you hope to get. If the encoding is different, pay attention to how you load the file into R.

Let's take the example of a file encoded as Windows-1252. I opened it in Notepad++ to check its content. The editor does a pretty good job of figuring out the encoding of the file: the encoding is displayed in the status bar, while the Encoding menu lets you change the selected character set.

I work on Windows, and the Windows-1252 encoding is native to the platform:

    Sys.info()
    # sysname
    # "Windows"
    Sys.getlocale()
    # "LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C LC_TIME=English_United States.1252"

Loading the CSV file from Windows with the utils package appears to be a breeze:

    utils::read.csv2(file = file_path)
    # Date Heure Appelé Durée Coût

However, once I moved the code to a Linux environment, the same call failed with an error. The locale on that machine is UTF-8 based:

    # "LC_CTYPE=fr_FR.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8 LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8 LC_PAPER=fr_FR.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C"

By default, read.csv2 uses the native encoding to load the CSV file, which is why the call worked on Windows and broke on Linux. The file encoding therefore needs to be stated explicitly to ensure portability:

    utils::read.csv2(file = file_path, fileEncoding = 'WINDOWS-1252')

Note that the following code is equivalent:

    utils::read.csv2(file = file(file_path, encoding = 'WINDOWS-1252'))

For reproducible results, you may also want to set the encoding used by default in your R session. If the default encoding varies from platform to platform, your code may not work unless you specify the encoding you want.

The readr package is becoming a favorite among the R community. By default, it assumes UTF-8 encoding (see readr::default_locale()), which leads to issues with this file:

    readr::read_csv2(file = file_path)
    # Using ',' as decimal and '.' as grouping mark.
    # Date Heure `Appel\xe9` `Dur\xe9e` `Co\xfbt`
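To make the explicit-encoding approach concrete, here is a minimal sketch that writes a small Windows-1252 file and reads it back. It assumes the R session runs in a UTF-8 locale, as on the Linux machine above. The column names come from the example file, but the data row, the `expected` helper and the use of `check.names = FALSE` are my own illustrative choices. The readr part is an addition to the calls shown above: it passes the encoding through `readr::locale()`, and it only runs if readr is installed.

```r
# Column names from the example file; the data row is made up for illustration.
expected <- c("Date", "Heure", "Appel\u00e9", "Dur\u00e9e", "Co\u00fbt")
csv_utf8 <- c(
  paste(expected, collapse = ";"),
  "01/12/2022;10:15;Jean;00:02:30;0,15"
)

# Write the lines as Windows-1252 bytes to a temporary file.
file_path <- tempfile(fileext = ".csv")
con <- file(file_path, open = "wb")
writeLines(iconv(csv_utf8, from = "UTF-8", to = "WINDOWS-1252"), con, useBytes = TRUE)
close(con)

# Base R: declare the file encoding so the content is converted to the
# session's native encoding instead of being read as if it were native.
# check.names = FALSE keeps the accented headers untouched.
calls <- utils::read.csv2(
  file = file_path,
  fileEncoding = "WINDOWS-1252",
  check.names = FALSE
)
print(names(calls))

# readr: the encoding is declared through the locale argument.
if (requireNamespace("readr", quietly = TRUE)) {
  calls_readr <- readr::read_csv2(
    file = file_path,
    locale = readr::locale(encoding = "WINDOWS-1252")
  )
  print(names(calls_readr))
}
```

With the encoding declared in the call itself, the accented column names should come back intact whichever platform the file was created on, and the script no longer depends on the machine's native encoding matching the file.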