The Databases Course

SQL Basics

Setting Up Your Account

First, edit your .cshrc file, by appending the file ~db/oracle-cshrc to the end of it. Then, copy the Oracle default configuration file from db's home directory to your home directory:

cp ~db/prefs.ora ~

Afterwards, logout and log back in. If you are registered to the course, you should now be able to enter SQL interactive mode. To do this, type:

sqlplus /

Once inside, you can type SQL queries and see their answer at once. You can also type exit to leave SQL interactive, or help to get help on the language and environment.

The first thing you should do is run the example database creation file of exercise 3:

sqlplus / < ~db/animals.sql

This will read and run the file ~db/animals.sql - you'll see the output of the SQL commands in the file on your screen. The file creates the five tables used in the exercise, and also inserts demo values into these tables. Feel free to change these tables (read animals.sql to see how simply this can be done), so that you can test your queries on cases that the demo tables don't cover.

The easiest way to write and test complex SQL queries is not the interactive way; instead, write a text file and give it as input to Oracle. This will make SQL run the commands/queries in the file input.sql, and write the output to the file output.sql:

sqlplus / < input.sql > output.sql

You are expected to submit printed input and output files with every exercise. Document your queries well - lines starting with '--' and /* */ blocks are treated as comments in an SQL file.


Basic SQL Queries

All queries are expressed with the select keyword. The basic form is select-from-where which translates to "do a cartesian product of what the from says, the do a selection according to the what the where says, then do a projection to what the select says". All SQL commands are case insensitive (that is, SELECT and select are the same word), and are terminated by a semicolon.

Print the names and subjects of research projects:
select name, subject from research;
Print all fields of the animals table:
select * from animals;
Print all the species of animals we have, but only print each species once:
select distinct species from animals;
Print countries which offer sites with a humid climate:
select distinct country from sites where climate = 'humid';
Print the cartesian product of sites and events:
select * from sites, events;
Print the natural join (except that pid is printed twice) of projects and events:
select * from research, events where research.pid = events.pid;
Print pairs of projects that have the same subject:
select p1.pid, p2.pid from research r1, research r2 where r1.subject = r2.subject;
Print animals and their age:
select aid, 1999 - birthyear from animals;
Print animals and their age. Only print animals older than ten years:
select aid, 1999 - birthyear age from animals where age > 10;


Aggregate Queries

The following queries calculate the average, minimum and maxium sizes of sites:

select avg(size), min(size), max(size) from sites;

These calculate the number of lines in the animals table, the number of lines with non-null species field, and the number of distinct species in the table:

select count(*), count(species), count(distinct species) from animals;

These are called aggregate queries, because they take many values (all sizes) and aggregate (summarize) it to one value (for example, the average).


Not So Basic Predicates

This section discusses things you can do in a where clause. The most basic things are testing for equality using = <> or for ordering using > < >= <= and between. You can also use not, and, or to create complex conditions. Here is a simple example (note that no parentheses are needed):

select * from animals where size >= 100 and not species = 'Whale';
select * from sites where size between 10 and 20 and climate <> 'desert';

The in and not in predicate returns true if a value is in a given set. You can specify the set directly or using a sub-select like here:

select * from sites where country not in ('Japan', 'China', 'Taiwan');
select * from animals where aid in (select parent from children);

The exists and not exists predicates get a table, and return true if and only if the table isn't empty:

select * from sites
where exists (select * from events where sid = sites.sid);

There's no 'forall' predicate, so such queries have to be translated from 'for all x, P(x)' to 'not exists x such that not P(x)'. For example:

select * from animals a1
where not exists (select * from animals a2 where a1.size > a2.size);

Besides in and exists there are also operators that can compare values to a group of other values. These operators are composed of a comparison operator and either the all or any keywords:

select * from animals where size <=all (select size from animals);
select * from animals where size >any (select size from animals);
select sid, name from sites
where climate <>all (select climate from site where county = 'USA');

Another useful keyword is contains, which tests for exactly what its name suggests:

select * from animals a where
(select sid from events where aid = a.aid and pid = 'p1') contains
(select sid from events where aid = a.aid and pid = 'p2');

Sub-selects can be nested as much as you want them to. Most queries can be expressed in more than one way, and the most efficient and readable way should be chosen.