The Dataset API is a repository for dataset metadata and files for SmartDataCenter, Joyent's commercial product for managing a private cloud based on SmartOS. The implementation of this API isn't open, which means you can't use it with FiFo or with your own SmartOS nodes. Since I use FiFo to manage my private cloud I really wanted a local repository for my datasets, especially the Windows-based ones, since they aren't available from Joyent's repository or from datasets.at. Fortunately for me, Daniel "MerlinDMC" Marlon has developed a free Dataset API server (called dsapid), based on CouchDB and Node.js, which I can use. With some help from him I've set up my server at home, and I thought I'd share my experiences on how to make it work.
Installing a new VM to host dsapid
First, create a JSON file describing the new zone for the dsapid server:
{
  "brand": "joyent",
  "alias": "dsapid",
  "tmpfs": 1024,
  "image_uuid": "cf7e2f40-9276-11e2-af9a-0bad2233fb0b",
  "filesystems": [
    {
      "type": "lofs",
      "source": "/zones/dsapi-server-data",
      "target": "/database"
    }
  ],
  "nics": [
    {
      "nic_tag": "admin",
      "ip": "192.168.1.123",
      "netmask": "255.255.255.0",
      "gateway": "192.168.1.1"
    }
  ]
}
This one is based on SmartMachine base64 1.9.1. I'm using a lofs mount to get direct access to a directory under /zones in the GZ; this directory will hold the CouchDB database. You might also want a bigger tmpfs if you plan on using big datasets, since each dataset is uploaded to /tmp before it's imported into CouchDB. I use a tmpfs of 8 GB, and even that isn't always big enough (as you'll see if you continue reading).
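Assuming the JSON above is saved as dsapid.json in the global zone (the filename is my choice; anything works), the lofs source directory has to exist before the zone is created. A sketch of the provisioning steps:

```shell
# In the global zone: create the directory that will hold the CouchDB database
mkdir -p /zones/dsapi-server-data

# Import the base64 1.9.1 image if it isn't already present, then create the VM
imgadm import cf7e2f40-9276-11e2-af9a-0bad2233fb0b
vmadm create -f dsapid.json

# Log in to the new zone to continue with the software installation
zlogin $(vmadm lookup -1 alias=dsapid)
```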
Installing software in the new dsapid zone
It’s time to install the software needed.
pkgin in couchdb nginx node gcc47 gmake scmgit
mkdir /opt/dsapi-ui
chown -R couchdb:couchdb /database
The software is installed and the proper directories created. The dsapid server will be installed later once the database, web server etc. are configured.
Configuration of the installed software
Add the following to /opt/local/etc/couchdb/local.ini under [couchdb]:
database_dir = /database
view_index_dir = /database
To make sure we get periodic syncing from the dataset source (which can be Joyent repo, datasets.at or some other dataset API server), add the following to crontab:
0 * * * * /opt/dsapi/sbin/dsapi-sync-manifests
0 * * * * /opt/dsapi/sbin/dsapi-sync-files
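One way to add these entries without opening an editor is to rewrite root's crontab from a temporary file (this appends to whatever is already there):

```shell
# Dump the current crontab (ignore the error if it's empty), append the sync jobs,
# then install the result as the new crontab
crontab -l > /tmp/root.cron 2>/dev/null
echo '0 * * * * /opt/dsapi/sbin/dsapi-sync-manifests' >> /tmp/root.cron
echo '0 * * * * /opt/dsapi/sbin/dsapi-sync-files' >> /tmp/root.cron
crontab /tmp/root.cron
rm /tmp/root.cron
```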
Time to configure nginx. Replace the contents of /opt/local/etc/nginx/nginx.conf with the following:
user www www;
worker_processes 1;

events {
    # After increasing this value you probably should increase the limit
    # of file descriptors (for example in start_precmd in the startup script)
    worker_connections 1024;
}

http {
    include       /opt/local/etc/nginx/mime.types;
    default_type  application/octet-stream;

    sendfile      on;
    #tcp_nopush   on;
    tcp_nodelay   on;

    #keepalive_timeout  0;
    keepalive_timeout   65;

    gzip on;
    gzip_http_version 1.1;
    gzip_proxied any;
    gzip_vary on;
    gzip_types text/plain text/html text/css application/json application/x-javascript text/xml application/xml application/xml+rss text/javascript application/javascript text/x-js;
    gzip_disable "MSIE [1-6]\.(?!.*SV1)";
    gzip_buffers 16 8k;

    client_max_body_size 1024m;

    server {
        listen       80;
        server_name  localhost;
        root         /opt/dsapi-ui;

        location = /ping {
            proxy_pass http://localhost:8080;
        }

        location = /stats {
            proxy_pass http://localhost:8080;
        }

        location = /datasets {
            proxy_pass http://localhost:8080;
            proxy_read_timeout 500;
            proxy_connect_timeout 500;
        }

        location ~* ^/datasets/[a-f0-9-]+$ {
            proxy_pass http://localhost:8080;
            proxy_read_timeout 500;
            proxy_connect_timeout 500;
        }

        location ~* ^/datasets/[a-f0-9-]+/.+ {
            proxy_pass http://localhost:5984;
            proxy_read_timeout 500;
            proxy_connect_timeout 500;
        }
    }
}
Increase client_max_body_size to the same as your tmpfs size if you increased that earlier. I’ve also added high timeouts to make sure there’s enough time to upload big datasets.
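Before enabling the service it's worth validating the configuration; a syntax error here is much easier to spot now than after svcadm reports the service in maintenance. The binary path assumes the pkgsrc install location:

```shell
# Test the configuration without starting the server
/opt/local/sbin/nginx -t -c /opt/local/etc/nginx/nginx.conf
```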
Now it’s time to start the services we’ve configured.
svcadm enable epmd:default
svcadm enable couchdb:default
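Once CouchDB is up you can sanity-check it from inside the zone; it answers with a small JSON welcome document containing "couchdb":"Welcome":

```shell
# Should return something like {"couchdb":"Welcome","version":"..."}
curl http://127.0.0.1:5984/
```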
Installing dsapid
It’s time to install the dsapid server.
git clone git://github.com/MerlinDMC/smartos-public-dsapi.git /opt/dsapi
cd /opt/dsapi
npm install
svccfg import /opt/dsapi/smf/dsapid.xml
svcadm enable dsapid:default
svcadm enable nginx:default
Adding remote dataset API servers for syncing
If you want to sync from an external dataset server, do the following (add -f to fetch the dataset image files as well as the manifests):
# fetch only manifests; files are served from the joyent server
/opt/dsapi/bin/add-sync-source joyent https://datasets.joyent.com/datasets

# fetch both manifests and files
/opt/dsapi/bin/add-sync-source joyent https://datasets.joyent.com/datasets -f
Installing the web GUI
There's a web GUI for the dataset server. Although it was made for datasets.at, it works for a local server as well (with some things still referring to datasets.at).
curl -O https://dl.dropbox.com/u/2265989/SmartOS/dsapi-ui.tar.bz2
tar -xjf dsapi-ui.tar.bz2 -C /opt/dsapi-ui
The web GUI should now be available at http://[hostname].
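To sanity-check the whole chain through nginx, the proxied endpoints from the config above can be queried directly (replace [hostname] with your server's address):

```shell
# Both of these are proxied through nginx to dsapid on port 8080
curl http://[hostname]/ping
curl http://[hostname]/datasets
```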
Uploading your own datasets
To start with you need a username and an associated password to be able to upload.
/opt/dsapi/bin/grant-upload [username] [password]
If you haven't created a dataset before, read my blog post about how to do it. Once you have your manifest file and dataset image file, it's time to upload them. This is done with curl, for example:
curl -X PUT -u [username]:[password] \
  -F manifest=@[manifest name].dsmanifest \
  -F [image name]=@[image name] \
  http://[dsapi server]/datasets/[dataset UUID]
The “image name” will be something like winserver.zvol.gz or testzone.zfs.bz2. The dataset UUID is the same UUID that you gave the dataset during creation.
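As a concrete example (every name, password, hostname, and UUID here is made up for illustration), uploading a Windows dataset could look like this:

```shell
# Hypothetical values: user "admin", dataset "winserver",
# server "dsapi.local", and a made-up UUID from the manifest
curl -X PUT -u admin:secret \
  -F manifest=@winserver.dsmanifest \
  -F winserver.zvol.gz=@winserver.zvol.gz \
  http://dsapi.local/datasets/71101322-43a5-11e1-8f01-cf2a3031a7f4
```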
If for some reason a dataset doesn't get fully uploaded, you'll have to delete it manually through Futon, CouchDB's web management GUI. CouchDB is configured to listen only on localhost; if you don't want to change that on the server, you can set up an SSH tunnel to reach Futon. You'll probably have to reconfigure sshd to allow root login with a password, or add your public key for SSH login.
ssh -f -L 127.0.0.1:5984:127.0.0.1:5984 root@[dsapi server] -N
Now you can access Futon on http://localhost:5984/_utils/ and delete the document created for the dataset (named its UUID).
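If you prefer the command line over Futon, the same deletion can be done through CouchDB's HTTP API over the tunnel. This is only a sketch: I'm assuming the database is named dsapi here, so check the actual database name in Futon first. CouchDB requires the document's current revision for a delete:

```shell
DB=dsapi                  # assumption: verify the database name in Futon
UUID=[dataset UUID]

# Fetch the document's current _rev, then delete it
REV=$(curl -s http://127.0.0.1:5984/$DB/$UUID | sed -n 's/.*"_rev":"\([^"]*\)".*/\1/p')
curl -X DELETE "http://127.0.0.1:5984/$DB/$UUID?rev=$REV"
```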
This is everything you need to use the dsapi server. To use it with imgadm:
echo "http://[dsapi server]/datasets" > /var/db/imgadm/sources.list
imgadm update
Installing very big datasets
Very big datasets need a very big tmpfs. If your dataset is 4 GB you will need over 8 GB of tmpfs. If you don't have enough memory you can bypass nginx, which is normally in front, and talk directly to dsapid. That way you only need ~4 GB of tmpfs to import a 4 GB dataset.
Start by disabling nginx.
svcadm disable nginx
The next step is to get dsapid to listen on 0.0.0.0 instead of 127.0.0.1. For a one-off, you can disable dsapid the same way you disabled nginx, set the environment variable DSAPI_HOST to 0.0.0.0, and start dsapid manually. This won't persist; if you want the change to be permanent:
- Disable dsapid
- Remove dsapid using svccfg delete dsapid
- Edit /opt/dsapi/smf/dsapid.xml and change the DSAPI_HOST variable to 0.0.0.0
- Import the xml file again using svccfg import /opt/dsapi/smf/dsapid.xml
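Put together, the persistent change looks something like this (edit the manifest with whatever editor you prefer):

```shell
svcadm disable dsapid
svccfg delete dsapid
vi /opt/dsapi/smf/dsapid.xml     # set DSAPI_HOST to 0.0.0.0
svccfg import /opt/dsapi/smf/dsapid.xml
svcadm enable dsapid
```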
Now you can upload the big dataset, pointing curl at port 8080. When done, you can switch back to nginx. You could probably compile nginx with an upload module, but since the nginx available in pkgin doesn't include one I haven't tried it.
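The upload command is then the same as before, only aimed at dsapid's port directly (placeholders as in the earlier upload example):

```shell
curl -X PUT -u [username]:[password] \
  -F manifest=@[manifest name].dsmanifest \
  -F [image name]=@[image name] \
  http://[dsapi server]:8080/datasets/[dataset UUID]
```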
Using the local dsapid with FiFO
dsapid is now working with imgadm in native SmartOS, but as I mentioned at the beginning of the post, I use FiFo to manage my VMs. To get this working you need some extra tweaks. Start by adding the following to your dsapid server's nginx.conf.
if ($request_method = 'OPTIONS') {
    add_header 'Access-Control-Allow-Origin' '*';
    add_header 'Access-Control-Allow-Credentials' 'true';
    add_header 'Access-Control-Allow-Methods' 'GET, POST, OPTIONS';
    add_header 'Access-Control-Allow-Headers' 'DNT,X-Mx-ReqToken,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type';
    add_header 'Access-Control-Max-Age' 1728000;
    add_header 'Content-Type' 'text/plain charset=UTF-8';
    add_header 'Content-Length' 0;
    return 204;
}

if ($request_method = 'POST') {
    add_header 'Access-Control-Allow-Origin' '*';
    add_header 'Access-Control-Allow-Credentials' 'true';
    add_header 'Access-Control-Allow-Methods' 'GET, POST, OPTIONS';
    add_header 'Access-Control-Allow-Headers' 'DNT,X-Mx-ReqToken,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type';
}

if ($request_method = 'GET') {
    add_header 'Access-Control-Allow-Origin' '*';
    add_header 'Access-Control-Allow-Credentials' 'true';
    add_header 'Access-Control-Allow-Methods' 'GET, POST, OPTIONS';
    add_header 'Access-Control-Allow-Headers' 'DNT,X-Mx-ReqToken,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type';
}
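Placed inside one of the location blocks from the nginx.conf earlier in the post, the result looks like this (shown for the exact /datasets block; the other two are analogous):

```nginx
location = /datasets {
    proxy_pass http://localhost:8080;
    proxy_read_timeout 500;
    proxy_connect_timeout 500;

    if ($request_method = 'OPTIONS') {
        add_header 'Access-Control-Allow-Origin' '*';
        add_header 'Access-Control-Allow-Credentials' 'true';
        add_header 'Access-Control-Allow-Methods' 'GET, POST, OPTIONS';
        add_header 'Access-Control-Allow-Headers' 'DNT,X-Mx-ReqToken,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type';
        add_header 'Access-Control-Max-Age' 1728000;
        add_header 'Content-Type' 'text/plain charset=UTF-8';
        add_header 'Content-Length' 0;
        return 204;
    }

    if ($request_method = 'POST') {
        add_header 'Access-Control-Allow-Origin' '*';
        add_header 'Access-Control-Allow-Credentials' 'true';
        add_header 'Access-Control-Allow-Methods' 'GET, POST, OPTIONS';
        add_header 'Access-Control-Allow-Headers' 'DNT,X-Mx-ReqToken,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type';
    }

    if ($request_method = 'GET') {
        add_header 'Access-Control-Allow-Origin' '*';
        add_header 'Access-Control-Allow-Credentials' 'true';
        add_header 'Access-Control-Allow-Methods' 'GET, POST, OPTIONS';
        add_header 'Access-Control-Allow-Headers' 'DNT,X-Mx-ReqToken,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type';
    }
}
```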
Add this inside the last three location blocks (the ones referring to /datasets) and restart nginx. Now it's time for the last change: telling FiFo to use the local dsapi server. Edit /opt/local/jingles/app/scripts/config.js and change the following line:
datasets: 'datasets.at',
Replace datasets.at with your server's hostname.
Done. Enjoy using your local Dataset API Server!