Updated February 2021
Signing up for an Autonomous Database is easy. You can have an instance up and running in just a few minutes. And now, you can even have one for FREE.
But one of the first things you’re going to want to do is shove some TABLEs into your schema, or just load some data.
We’re working on making this even easier, but let’s quickly recap what you can already do with our tools.
A Saucerful of “Secrets” Ways to Load Data to the Cloud with our DB Tools
- Data Pump to Cloud
- Cart to Cloud
- Database Copy to the Cloud
- Drag and Drop to Cloud
- SQL Developer Web, CSV, Excel Imports
Taking Advantage of AUTO TABLE and ORDS
If you already have a TABLE in your schema, and you want to create a REST API for accessing said table, we make that easy.
It’s a right-click in SQL Developer Web or in SQL Developer desktop.
In the SQL card in Database Actions (formerly known as SQL Developer Web) –
Or you could of course just run this very simple PL/SQL block –
BEGIN
    ORDS.ENABLE_OBJECT(
        p_enabled        => TRUE,
        p_schema         => 'JEFF',          -- your schema here (has to be YOUR schema)
        p_object         => 'HOCKEY_STATS',  -- your table here
        p_object_type    => 'TABLE',         -- or VIEW or PLSQL
        p_object_alias   => 'hockey_stats',  -- how the object will be named in the URI
        p_auto_rest_auth => TRUE);           -- PROTECT!
    COMMIT;
END;
Another quick aside – if you need to catch up on these topics, I’ve talked about creating your application SCHEMA and REST enabling it for SQL Developer Web access.
And, I’ve talked about using the CSV Load feature available with the ORDS AUTO Table mechanism.
My TABLE
I have a HOCKEY_STATS table that I want to load from some CSV data I have on my PC. It’s an 8MB file (35,000 rows, 70 columns).
Now, I could use the Import from CSV feature in SQL Developer (Desktop) to populate the TABLE…
It wasn’t super quick, but it was super easy.
And yes, you can do all of this in your browser as well!
But what if I need a process that can be automated? And my API du jour is HTTPS and REST?
Let’s POST up my CSV to the TABLE API
Let’s find the URI first. Go into your REST workshop page in Database Actions.
From there we’ll see what we have available for RESTful Web Services in our schema.
AUTOREST
I can immediately see the tables and views I want to work with, and I can filter those, but I only have two at the moment. If I click on the button in the corner of the card, I can ask for the cURL information for all of the CRUD REST API endpoints – including BATCH LOAD.
I’m going to take this string and fix it up a bit…
https://ABCDEFGHIJK0l-somethingash.adb.us-ashburn-1.oraclecloudapps.com/ords/tjs/hockey_stats/batchload?batchRows=1000
The ‘/batchload?batchRows=1000’ at the end tells ORDS what we’re doing with the TABLE, and how to do it. This is documented here – and you’ll see there are quite a few options you can tweak.
If you want to protect these endpoints (and you almost always will), you’ll need to assign the table privilege to the ‘SQL Developer’ role – then you can use your database username/password with the cURL command.
That’s done on the Security page for the REST workshop.
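Once that privilege is mapped to the ‘SQL Developer’ role, a quick sanity check with basic auth is easy enough. Here’s a rough sketch – the hostname, schema alias, and credentials are placeholders for your own instance:

curl --user <user>:<password> \
  "https://<your-instance>.adb.us-ashburn-1.oraclecloudapps.com/ords/<schema>/hockey_stats/?limit=2"

A 200 with a JSON payload of rows means the privilege and role mapping are in place; a 401 or 403 means they’re not.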
If using database credentials in your REST calls sounds ‘icky’, then you can also take advantage of our built-in OAUTH2 Client (example).
There’s a full OAUTH2 UI in the REST Workshop as well –
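If you go the OAuth2 route, the client credentials flow boils down to two calls. Roughly – with the client id/secret, hostname, and schema alias below standing in for your own values:

# 1) trade the client id/secret for a short-lived access token
curl --user <client_id>:<client_secret> \
  --data "grant_type=client_credentials" \
  "https://<your-instance>.adb.us-ashburn-1.oraclecloudapps.com/ords/<schema>/oauth/token"

# 2) the JSON response includes an access_token – send it as a Bearer token
#    instead of a database username/password
curl -H "Authorization: Bearer <access_token>" \
  "https://<your-instance>.adb.us-ashburn-1.oraclecloudapps.com/ords/<schema>/hockey_stats/"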
Now, let’s make our call. I’m going to use a REST Client (Insomnia) but I could easily just use cURL.
I could tweak the batchRows parameter and see if I could get faster loads – I’m sure I could. But the whims of public internet latency and the nature of the data I’m sending up in 16 KB chunks make this a fun ‘it depends’ tech scenario.
30 September 2020 Update – 5M Rows, ~27 sec!
I decided to throw a slightly larger scenario at ORDS in Autonomous.
And our data…
Note this scenario was also run from Cary, NC to our Cloud Data Center in Virginia… and of course your results may vary based on current network loads.
I also increased the batchRows setting from 1,000 to 5,000. I tried 7,500 and 10,000 as well, but didn’t see any additional performance improvement – then again, this is NOT a scientific test.
Some more notes on settings (Docs).
batchRows – The number of rows to include in each batch – so we’re inserting/committing 5,000 rows at a time, for a total of 1,000 batches.
We could include additional settings…
errors and errorsMax in particular. You may want to set errorsMax to a reasonable number – in other words, do you want to give up on the load if, say, more than 10% of the rows fail to insert? For debugging, I’d suggest setting it to 1 or maybe 10 so your test ‘fails fast’ and doesn’t consume unnecessary server resources.
And of course, results may vary, but I got this to run as ‘fast’ as 25 seconds and as slow as 29 seconds.
I ran this scenario 10 times each, and also for 10M and 20M rows of this same data. I found that running the loads as 5M-row requests outperformed the larger ones. That is, I could kick off two 5M load requests and get the job done MUCH faster than with a single 10M load.
For example, running 10M rows 10 times, my average load time was 90 seconds, vs. 30 seconds for the 5M loads.
When I kicked off two 5M runs concurrently, both finished in less than 27.4 seconds!
The cURL
I hate cURL. It took me more than a few minutes, and I ended up having to Zoom with @krisrice to finally get this JUST right…
curl --write-out '%{time_total}' \
  -X POST \
  --data-binary "@5M.csv" \
  -H "Content-Type:text/csv" \
  --user <user>:<password> \
  "https://....adb.us-ashburn-1.oraclecloudapps.com/ords/<user>/huge_csv/batchload?batchRows=5000&errorsMax=20"
Don’t forget the Content-Type header! I was leaving it off and weird things were happening. I also had to hop over to Stack Overflow to figure out how to stream in the contents of the file, and then I learned a cool trick for pulling the HTTP response time in the call. I’m sharing this for ME, because I’ll forget it tomorrow.
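And if you want to try the two concurrent 5M-row loads I mentioned above, the same command works fine kicked off in parallel from the shell. A rough sketch, assuming the data has already been split into two 5M-row files (the file names here are made up):

# run both 5M-row loads in the background, then wait for them to finish
curl -X POST --data-binary "@5M_part1.csv" -H "Content-Type:text/csv" \
  --user <user>:<password> \
  "https://....adb.us-ashburn-1.oraclecloudapps.com/ords/<user>/huge_csv/batchload?batchRows=5000&errorsMax=20" &
curl -X POST --data-binary "@5M_part2.csv" -H "Content-Type:text/csv" \
  --user <user>:<password> \
  "https://....adb.us-ashburn-1.oraclecloudapps.com/ords/<user>/huge_csv/batchload?batchRows=5000&errorsMax=20" &
wait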
2 Comments
Brilliant post as always Jeff, is there a way to automatically load a CSV file that is stored on a different server??
I’ve looked at buckets, Data Integrator, ODI, etc. but I’m stuck!!
You could curl/GET the file locally and then POST it as I show, or from the database itself you could use DBMS_CLOUD and read/stream it directly from the OCI Object Store, or any S3 bucket (I think).
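For the first option, a rough sketch – the source URL, file name, and credentials here are all placeholders:

# pull the CSV down from the other server, then POST it to the batch load endpoint
curl -o remote_data.csv "https://the-other-server.example.com/exports/remote_data.csv"
curl -X POST --data-binary "@remote_data.csv" -H "Content-Type:text/csv" \
  --user <user>:<password> \
  "https://....adb.us-ashburn-1.oraclecloudapps.com/ords/<user>/hockey_stats/batchload?batchRows=5000"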