Importing Data to OpenRefine
Last updated on 2024-05-24
Estimated time: 10 minutes
Overview
Questions
- How can we import our data into OpenRefine?
Objectives
- Create a new OpenRefine project from a CSV file
Importing data
What kinds of data files can I import?
There are several options for getting your data set into OpenRefine. You can import files in a variety of formats including:
- Comma-separated values (CSV) or tab-separated values (TSV)
- Text files
- Fixed-width columns
- JSON
- XML
- OpenDocument spreadsheet (ODS)
- Excel spreadsheet (XLS or XLSX)
- RDF data (JSON-LD, N3, N-Triples, Turtle, RDF/XML)
- Wikitext
See the "Create a project by importing data" page in the OpenRefine manual for more information.
Create your first OpenRefine project (using provided data)
Start OpenRefine, which will open in your browser (at the address http://127.0.0.1:3333). Once OpenRefine is launched in your browser, the left margin has options to:
- Create Project
- Open Project
- Import Project
- Language Settings
- Click Create Project in the left margin, then select This Computer (because you’re uploading data from your computer).
- Click Choose Files and browse to where you stored the file Portal_rodents_19772002_simplified.csv. Select the file and click Open, or just double-click on the filename.
- Click Next>> under the browse button to upload the data into OpenRefine.
- On the next screen, OpenRefine will present you with a preview of your data. You can check here for obvious errors. If, for example, your file was tab-delimited rather than comma-delimited, the preview would look strange (and you could correct it by choosing the correct separator and clicking the Update Preview button on the right). If you selected the wrong file, click <<Start Over at the top left.
- In the middle of the page there is a set of options (Character encoding, etc.). Make sure the tick box next to Trim leading & trailing whitespace from strings is not ticked. (We’re going to need the leading whitespace in one of our examples.)
- If all looks well, click Create Project>> in the top right. You will be presented with a view onto your data. This is OpenRefine!
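If you are unsure whether a file is comma- or tab-delimited before you import it, you can also check outside OpenRefine. The following is a minimal sketch using Python's standard csv module, which is not part of OpenRefine. The function name detect_separator and the sample rows are made up for illustration and are not taken from the lesson data.

```python
import csv


def detect_separator(sample: str) -> str:
    """Guess whether a text sample is comma- or tab-delimited."""
    # Restrict the guess to the two separators discussed in the steps above.
    dialect = csv.Sniffer().sniff(sample, delimiters=",\t")
    return dialect.delimiter


# Made-up sample text, not rows from the lesson file.
comma_sample = "id,name,value\n1,alpha,10\n2,beta,20\n"
tab_sample = "id\tname\tvalue\n1\talpha\t10\n2\tbeta\t20\n"

print(repr(detect_separator(comma_sample)))
print(repr(detect_separator(tab_sample)))
```

For a real file, you could pass the first few kilobytes of its text to detect_separator. The guess is a heuristic, so the preview in OpenRefine remains the place to confirm it.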
The columns are all imported as text, even the columns with numbers. We will see how to format the numeric columns in the next episode.
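Many CSV readers behave the same way, because a CSV file stores only characters and carries no type information. As a rough analogy (not a description of how OpenRefine works internally), the sketch below reads made-up rows with Python's standard csv module. Every value arrives as a string, leading whitespace is kept, and numbers have to be converted explicitly.

```python
import csv
import io

# Made-up sample text, not rows from the lesson file.
raw = "id,weight,note\n1,40,  leading spaces\n2,48,plain\n"

rows = list(csv.DictReader(io.StringIO(raw)))

# Every value is a string, even in the numeric-looking column.
print(type(rows[0]["weight"]))  # <class 'str'>

# Leading whitespace is preserved unless you strip it yourself.
print(repr(rows[0]["note"]))  # '  leading spaces'

# Converting a column to numbers is a separate, explicit step.
weights = [int(row["weight"]) for row in rows]
print(weights)  # [40, 48]
```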
OpenRefine does not modify your original dataset
Once your data is imported into a project, OpenRefine leaves your raw data intact and works on a copy that it creates inside the new project. All the data transformation and cleaning steps you apply are performed on this copy, and you can undo any of them.
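The idea of working on a copy with undo can be sketched in a few lines of Python. This is only a conceptual illustration, not OpenRefine's actual implementation. The names original_rows, project_rows, apply_step and undo are hypothetical, and the rows are made up.

```python
import copy

# Made-up rows standing in for an imported file.
original_rows = [{"name": " alpha "}, {"name": "beta"}]

# The "project" works on a deep copy, so the source rows stay untouched.
project_rows = copy.deepcopy(original_rows)
history = []  # earlier states of the project copy, kept for undo


def apply_step(transform):
    """Save the current state, then transform every name in the project copy."""
    history.append(copy.deepcopy(project_rows))
    for row in project_rows:
        row["name"] = transform(row["name"])


def undo():
    """Restore the project copy to its state before the last step."""
    project_rows[:] = history.pop()


apply_step(str.strip)
print(project_rows)   # [{'name': 'alpha'}, {'name': 'beta'}]
print(original_rows)  # [{'name': ' alpha '}, {'name': 'beta'}]

undo()
print(project_rows)   # [{'name': ' alpha '}, {'name': 'beta'}]
```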
Key Points
- Use the Create Project option to import data
- You can control how data is imported using options on the import screen
- Several file types may be imported into OpenRefine