Hacker News provides an official API that, among other things, exposes its entire collection of posts, comments, polls, and so on, along with data for every user who has submitted a post or comment.
While you can use the Firebase client libraries to fetch this data, let’s write a simple Swift command line app to fetch the entire dataset and store it in a SQLite database.
All of the steps in this article are meant to be cross-platform, so we won’t be using Xcode to create the app. You will need to have Swift downloaded and installed; this article uses Swift version 5.0.1.
To create a new app, first make a directory with the same name as our app, i.e. HNDataSetBuilder, then cd into this directory and create a new executable using the Swift Package Manager.
mkdir HNDataSetBuilder
cd HNDataSetBuilder
swift package init --type executable
This will create the folders and files needed to code your app.
We will be modifying Package.swift later so that we can load SQLite as a module in our Swift code.
Our main app code will go in Sources/HNDataSetBuilder/main.swift
For the purpose of this article, all our Swift code will be within main.swift alone.
We are mainly interested in the Item and User data structures, which we can represent as structs that mirror the fields and types returned to us by the API.
struct Item {
let id: Int
let deleted: Bool?
let type: String
let by: String?
let time: Double?
let text: String?
let dead: Bool?
let parent: Int?
let poll: Int?
let kids: [Int]?
let url: String?
let score: Int?
let title: String?
let parts: [Int]?
let descendants: Int?
}
struct User {
let id: String
let delay: Int?
let created: Double?
let karma: Int
let about: String?
let submitted: [Int]?
}
The live data access part of the API isn’t of much interest to us, except the max item id, which we will be using to determine when our code needs to stop fetching data.
We use URLSession to generate tasks which make the API calls asynchronously. Upon completion of the API call, our callback generates a Notification which a corresponding notification handler uses to launch the next task.
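The hand-off from a completion callback to a notification handler can be tried in isolation. The following is only a minimal sketch of that pattern: the notification name matches the one used later in this article, while the received array and the string payloads are placeholders invented for illustration. Because the observer is registered with queue: nil, its block runs synchronously on the thread that posts the notification.

```swift
import Foundation

// Name used for the hand-off between the fetch callback and the handler.
let getSuccessful = Notification.Name(rawValue: "getSuccessful")

// Placeholder sink so we can see what the handler received.
var received: [String] = []

// With queue: nil the block runs synchronously on the posting thread.
let observer = NotificationCenter.default.addObserver(
    forName: getSuccessful, object: nil, queue: nil
) { notification in
    if let payload = notification.object as? String {
        received.append(payload)
    }
}

// In the real app this post would happen inside the URLSession callback.
NotificationCenter.default.post(name: getSuccessful, object: "item 1")
NotificationCenter.default.post(name: getSuccessful, object: "item 2")

NotificationCenter.default.removeObserver(observer)
print(received)
```

In the app itself the payload is the decoded resource rather than a string, and the handler launches the next request instead of appending to an array.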
To start with, let’s define our structs as conforming to the Codable protocol; this will allow us to easily parse the incoming JSON into the corresponding structs. Instead of directly marking the structs as conforming to Codable, we define our own protocol Resource, which in turn conforms to Codable, allowing us to later add functionality to save the resources to our database.
protocol Resource: Codable {
}
struct Item: Resource {
let id: Int
let deleted: Bool?
let type: String
let by: String?
let time: Double?
let text: String?
let dead: Bool?
let parent: Int?
let poll: Int?
let kids: [Int]?
let url: String?
let score: Int?
let title: String?
let parts: [Int]?
let descendants: Int?
}
struct User: Resource {
let id: String
let delay: Int?
let created: Double?
let karma: Int
let about: String?
let submitted: [Int]?
}
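To see what Codable conformance buys us, here is a small sketch that decodes a hand-written JSON payload. SampleItem is a trimmed-down, hypothetical stand-in for the Item struct above, and the payload is invented for illustration rather than taken from the API. Fields declared as optionals simply become nil when the key is absent.

```swift
import Foundation

// Hypothetical, trimmed-down stand-in for the article's Item struct.
struct SampleItem: Codable {
    let id: Int
    let type: String
    let by: String?
    let time: Double?
    let kids: [Int]?
    let title: String?
}

// Invented payload; note that "title" is missing on purpose.
let sampleJSON = """
{"id": 1, "type": "story", "by": "alice", "time": 1160418111, "kids": [15, 234]}
"""

// try? turns a decoding error into nil instead of terminating the program.
let sampleItem = try? JSONDecoder().decode(SampleItem.self, from: Data(sampleJSON.utf8))

if let item = sampleItem {
    print(item.id, item.type, item.by ?? "-", item.kids ?? [], item.title ?? "(no title)")
} else {
    print("decoding failed")
}
```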
We now add the code which will perform the API requests. We define an enum called ResourceType to handle the slight differences when loading the two different resources.
import Foundation
enum ResourceType {
case item(id:UInt)
case user(id:String)
}
extension ResourceType {
func url() -> URL {
switch self {
case .item(let id): return URL(string: "https://hacker-news.firebaseio.com/v0/item/\(id).json")!
case .user(let id): return URL(string: "https://hacker-news.firebaseio.com/v0/user/\(id).json")!
}
}
func decode(_ data: Data) -> Resource {
let decoder = JSONDecoder()
switch self {
case .item: return try! decoder.decode(Item.self, from: data)
case .user: return try! decoder.decode(User.self, from: data)
}
}
func get() {
let url = self.url()
let task = URLSession.shared.dataTask(with: url) {(data: Data?, _: URLResponse?, _:Error?) in
if let json = data {
print("\(url)")
let resource = self.decode(json)
NotificationCenter.default.post(name: Notification.Name(rawValue: "getSuccessful"), object: resource)
}
}
task.resume()
}
}
We could have included the functionality of the url() and decode(_ data:Data) -> Resource methods within the get function, but extracting them into separate methods keeps the code much more readable.
Our get() function simply requests the URL for the resource asynchronously and fires a notification with the newly created object upon success.
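One thing to keep in mind is that decode uses try!, which terminates the app if the response body does not match the struct. The sketch below shows a more forgiving variant using try?. It assumes the API may answer with a literal null for an id that does not exist, which is worth verifying against the live API; SampleUser and decodeUser are hypothetical names used only for this illustration.

```swift
import Foundation

// Hypothetical, trimmed-down stand-in for the article's User struct.
struct SampleUser: Codable {
    let id: String
    let karma: Int
}

// Returns nil instead of crashing when the body is not a valid user object.
func decodeUser(_ data: Data) -> SampleUser? {
    return try? JSONDecoder().decode(SampleUser.self, from: data)
}

// A body that is not a user object (assumed shape of a "not found" reply).
let missing = decodeUser(Data("null".utf8))

// A well-formed, invented user payload.
let present = decodeUser(Data("{\"id\": \"alice\", \"karma\": 42}".utf8))

print(missing == nil, present?.karma ?? -1)
```

With this shape, get() could skip posting the notification when decoding returns nil and move on to the next resource.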
Now that we have the code to fetch the data ready, let’s focus on saving it to disk. We will be saving the fetched data to a local SQLite database.
There are some good libraries which provide a Swift wrapper over SQLite, however since our needs are quite basic, we will be using the SQLite C interface to directly run our SQL queries on our database.
First we modify our Package.swift. We add a systemLibrary target and modify our main target to add this library as a dependency.
targets: [
.systemLibrary(
name: "CSQLite",
path: "Libraries/CSQLite",
pkgConfig: "sqlite3"),
.target(
name: "HNDataSetBuilder",
dependencies: ["CSQLite"]),
.testTarget(
name: "HNDataSetBuilderTests",
dependencies: ["HNDataSetBuilder"]),
]
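For context, a complete manifest built around these targets might look like the sketch below. The tools version comment matches the Swift 5.0.1 toolchain used in this article. The providers entries are optional install hints that were not part of the original manifest, and the package names given there are assumptions that may differ on your system.

```swift
// swift-tools-version:5.0
import PackageDescription

let package = Package(
    name: "HNDataSetBuilder",
    dependencies: [],
    targets: [
        // Wraps the system-installed SQLite so Swift code can import it.
        .systemLibrary(
            name: "CSQLite",
            path: "Libraries/CSQLite",
            pkgConfig: "sqlite3",
            // Optional hints shown when the library is missing (assumed package names).
            providers: [
                .apt(["libsqlite3-dev"]),
                .brew(["sqlite3"]),
            ]),
        .target(
            name: "HNDataSetBuilder",
            dependencies: ["CSQLite"]),
        .testTarget(
            name: "HNDataSetBuilderTests",
            dependencies: ["HNDataSetBuilder"]),
    ]
)
```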
Now create a folder named Libraries, with a subfolder CSQLite, in our project folder alongside the Sources folder.
In Libraries/CSQLite we create two files:
module.modulemap
sqlite.h
Edit the module.modulemap file to declare a module called sqlite. This will allow us to import the module in our Swift code using import sqlite.
module sqlite {
umbrella header "sqlite.h"
link "sqlite3"
}
The umbrella header "sqlite.h" line tells the compiler to include our sqlite.h file, and the link "sqlite3" line tells the linker which library to link.
If you’re on Ubuntu you can check whether or not you’re using the correct library name in the link command by running
pkg-config --cflags <LIBRARY_NAME>
If the library name is incorrect or the library is missing, you should see an error saying that the package could not be found.
In case the sqlite3 library is not found on your system, you will need to install it using apt-get install libsqlite3-dev on Ubuntu or using brew on macOS.
Now edit the sqlite.h file to include the system-installed sqlite3 header file:
#include <sqlite3.h>
We will be adding a little more to our custom header file a bit later, but for now this allows us to call all the SQLite C functions directly from our Swift code.
Time to edit our main.swift file again.
Add the import sqlite statement at the top of our file so that the sqlite module becomes available.
We now add a small class which will expose a singleton instance member for access to the database.
import sqlite
class SQLite {
var db: OpaquePointer?
static let shared = SQLite()
private init() {
if sqlite3_open("database.sqlite", &db) != SQLITE_OK {
fatalError("Unable to open database")
}
}
}
Now we wrap the functionality related to creating and executing SQL statements in a separate Statement class.
typealias SQL = String
class Statement {
private var stmt: OpaquePointer?
private var sql: SQL
private var bindParameterCount: Int
public var hasData: Bool
init(sql: SQL) {
if sqlite3_prepare_v2(SQLite.shared.db, sql, -1, &stmt, nil) != SQLITE_OK {
fatalError("Failed to prepare statement for : \(sql)")
}
self.sql = sql
bindParameterCount = Int(sqlite3_bind_parameter_count(stmt))
hasData = false
}
func run(_ bindings:[String: Any?]? = nil) {
var result: Int32
hasData = false
if let bindings = bindings {
if bindings.count != bindParameterCount {
fatalError("Bindings count: \(bindings.count), differs from bindable parameters count: \(bindParameterCount)")
}
let _ = sqlite3_clear_bindings(stmt)
for (key, value) in bindings {
let idx = sqlite3_bind_parameter_index(stmt, "@" + key)
if idx == 0 {
fatalError("Cannot get bind index for \(key) in \(sql) with \(bindParameterCount) bindable parameters")
}
if value == nil {
result = sqlite3_bind_null(stmt, idx)
}
else if let value = value as? Int {
result = sqlite3_bind_int64(stmt, idx, Int64(value))
}
else if let value = value as? UInt {
result = sqlite3_bind_int64(stmt, idx, Int64(value))
}
else if let value = value as? [Int] {
result = sqlite3_bind_text(stmt, idx, value.map(){String($0)}.joined(separator:","), -1, SQLITE_TRANSIENT)
}
else if let value = value as? Double {
result = sqlite3_bind_double(stmt, idx, value)
}
else if let value = value as? String {
result = sqlite3_bind_text(stmt, idx, value, -1, SQLITE_TRANSIENT)
}
else if let value = value as? Bool {
result = sqlite3_bind_int(stmt, idx, (value==true) ? 1 : 0)
}
else {
fatalError("Cannot bind \(idx) \(key) \(String(describing:value))")
}
if result != SQLITE_OK {
fatalError("\(result): Cannot bind \(idx) \(key) \(String(describing:value))")
}
}
}
result = sqlite3_step(stmt)
if result == SQLITE_ROW {
hasData = true
}
else if result != SQLITE_DONE {
fatalError("\(result): \(sql) with \(bindings ?? [:])")
}
}
deinit {
sqlite3_finalize(stmt)
}
}
While the code snippet seems a bit long, most of it is just error checking; the core functionality is quite simple.
We initialize a Statement instance with a SQL statement, which is then executed by calling run(_ bindings:[String: Any?]? = nil).
The optional bindings dictionary allows us to bind values in insert statements.
One thing to note is the way we save the Int arrays in SQL. Since we can’t directly store arrays in SQL, we convert each array into a comma-separated string prior to storing it, and this must be taken into consideration whenever we access the data later on.
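As a small sketch of that round trip, the helpers below convert an array of ids to the stored text form and back. The names encodeIDs and decodeIDs are hypothetical and do not appear elsewhere in this article; the encoding mirrors what the Statement class does when it binds an [Int].

```swift
// Joins ids into the comma-separated form stored in the TEXT column.
func encodeIDs(_ ids: [Int]) -> String {
    return ids.map { String($0) }.joined(separator: ",")
}

// Splits the stored text back into ids, skipping anything that is not a number.
func decodeIDs(_ text: String) -> [Int] {
    return text.split(separator: ",").compactMap { Int($0) }
}

let stored = encodeIDs([15, 234, 8])
let restored = decodeIDs(stored)
print(stored, restored)
```

An empty string decodes to an empty array, so a column written from an empty kids list reads back cleanly.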
If you try to compile the code at this point, compilation will fail with the following error
error: use of unresolved identifier 'SQLITE_TRANSIENT'
If you look through the sqlite3.h file, SQLITE_TRANSIENT is defined using the following macro definition
#define SQLITE_TRANSIENT ((sqlite3_destructor_type)-1)
This, however, is not visible from Swift, because the Swift compiler only imports simple constant macros and skips ones that involve a cast like this.
There are two valid approaches to solving this.
If you want to keep the fix purely Swift, you can see in this file, how a value is stored in the SQLITE_TRANSIENT constant using unsafeBitCast
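The Swift-only workaround boils down to reinterpreting the integer -1 as a C function pointer. The sketch below demonstrates the idea without needing the sqlite module: DestructorType is a stand-in typealias that is assumed to mirror how sqlite3_destructor_type is imported into Swift. With the module imported, the same expression would presumably read unsafeBitCast(-1, to: sqlite3_destructor_type.self).

```swift
// Stand-in for sqlite3_destructor_type so the sketch compiles on its own.
typealias DestructorType = @convention(c) (UnsafeMutableRawPointer?) -> Void

// Reinterpret the integer -1 as a C function pointer, mirroring the C macro.
let transientDestructor = unsafeBitCast(-1, to: DestructorType.self)

// Round-trip back to Int to confirm that the bit pattern is preserved.
let roundTrip = unsafeBitCast(transientDestructor, to: Int.self)
print(roundTrip)
```

The value is only ever compared by SQLite as a sentinel, so the resulting function pointer must never actually be called.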
However, I personally do not feel that this is the right approach and would prefer to handle all system library related issues separate from my main app code.
This way if you ever want to reuse the system library code, you do not need to keep track of all the fixes or workarounds you had to use in order to get the library to work.
So we modify our custom sqlite.h to undef the old definition of SQLITE_TRANSIENT and to store it as a variable instead.
#include <sqlite3.h>
#undef SQLITE_TRANSIENT
sqlite3_destructor_type SQLITE_TRANSIENT = ((sqlite3_destructor_type) -1);
Now our code should compile properly, so let’s start saving data.
Let’s now include functions to create a table for the resources and to insert new rows into the table.
Our create() method would be a static method on the struct, whereas the save() method would be called per instance.
protocol Resource: Codable {
static func create() -> Void
func save() -> Void
}
struct Item: Resource {
let id: Int
let deleted: Bool?
let type: String
let by: String?
let time: Double?
let text: String?
let dead: Bool?
let parent: Int?
let poll: Int?
let kids: [Int]?
let url: String?
let score: Int?
let title: String?
let parts: [Int]?
let descendants: Int?
static func create() {
Statement(sql: """
CREATE TABLE IF NOT EXISTS Item (
id INTEGER PRIMARY KEY,
deleted INTEGER,
type TEXT,
by TEXT,
time DOUBLE,
text TEXT,
dead INTEGER,
parent INTEGER,
poll INTEGER,
kids TEXT,
url TEXT,
score INTEGER,
title TEXT,
parts TEXT,
descendants INTEGER)
""").run()
}
func save() {
let insertStatement = Statement(sql: "INSERT into Item(id, deleted, type, by, time, text, dead, parent, poll, kids, url, score, title, parts, descendants) values (@id, @deleted, @type, @by, @time, @text, @dead, @parent, @poll, @kids, @url, @score, @title, @parts, @descendants)")
let bindings:[String: Any?] = ["id": id, "deleted": deleted, "type": type, "by": by, "time": time, "text": text, "dead": dead, "parent": parent, "poll": poll, "kids": kids, "url": url, "score": score, "title": title, "parts": parts, "descendants": descendants]
insertStatement.run(bindings)
}
}
struct User: Resource {
let id: String
let delay: Int?
let created: Double?
let karma: Int
let about: String?
let submitted: [Int]?
static func create() {
Statement(sql: """
CREATE TABLE IF NOT EXISTS User (
id TEXT PRIMARY KEY,
delay INTEGER,
created DOUBLE,
karma INTEGER,
about TEXT,
submitted TEXT)
""").run()
}
func save() {
let insertStatement = Statement(sql: "INSERT into User(id, delay, created, karma, about, submitted) values (@id, @delay, @created, @karma, @about, @submitted)")
let bindings:[String: Any?] = ["id": id, "delay": delay, "created": created, "karma": karma, "about": about, "submitted": submitted]
insertStatement.run(bindings)
}
}
Also, let’s add another extension to our ResourceType enum to check whether or not the resource already exists in our database before firing a get request. This way we can stop and reload the app without having to worry about duplicate requests.
extension ResourceType {
func exists() -> Bool {
let sql: SQL
let bindings: [String: Any?]
switch self {
case .item(let id):
sql = "Select (id) from Item where id = @id"
bindings = ["id": id]
case .user(let id):
sql = "Select (id) from User where id = @id"
bindings = ["id": id]
}
let statement = Statement(sql: sql)
statement.run(bindings)
return statement.hasData
}
}
Now let’s bring it all together by writing a small ResourceManager class that
Since the GET requests are asynchronous, we need to ensure that we handle this in a thread-safe manner. To do this, we use a dispatch queue to perform all the operations where instance variables of the class are modified.
After each Item is fetched, we check if we have the details of the User who submitted that Item, and fetch them if we don’t.
Here we need to handle a potential race condition where we may re-request details of a User whose details have already been requested but not yet received. Since we use plain INSERT statements and id is the primary key, the second insert would fail if we requested a user’s details twice.
To deal with this scenario, we store the names of users we have already requested in a separate set. If this set becomes too big, we might need to clear old usernames from it; however, this is not implemented for now.
class ResourceManager {
let concurrency = 4
let maxItem: UInt
var nextItem: UInt = 1
var userNamesPending: Set<String> = []
var userNamesFetched: Set<String> = []
private let queue = DispatchQueue(label: "Resource Queue")
init(_ maxItem: UInt) {
self.maxItem = maxItem
let _ = NotificationCenter.default.addObserver(forName: Notification.Name(rawValue: "getSuccessful"), object: nil, queue: nil) { notification in
let resource: Resource = notification.object as! Resource
resource.save()
self.next(resource)
}
}
func start() {
for _ in 0..<concurrency {
next()
}
}
func next(_ previous:Resource? = nil) {
queue.sync {
var resource: ResourceType
if let previous = previous as? Item {
if let by = previous.by {
let _ = userNamesPending.insert(by)
}
}
while userNamesPending.count > 0 {
let userName = userNamesPending.removeFirst()
resource = ResourceType.user(id: userName)
if !(userNamesFetched.contains(userName) || resource.exists()) {
resource.get()
userNamesFetched.insert(userName)
return
}
}
repeat {
resource = ResourceType.item(id: nextItem)
nextItem += 1
} while resource.exists()
// nextItem is already one past the id held in resource, so compare that id with maxItem.
if nextItem - 1 <= maxItem {
resource.get()
}
}
}
}
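The de-duplication idea used for users can be looked at on its own. The sketch below is a simplified illustration rather than a drop-in replacement for ResourceManager: RequestTracker and shouldRequest are hypothetical names, and all access to the set goes through a serial dispatch queue, in the same spirit as the queue used above.

```swift
import Foundation

final class RequestTracker {
    // Names that have already been requested, whether or not the reply has arrived.
    private var requested: Set<String> = []
    // Serial queue that guards every access to the set.
    private let queue = DispatchQueue(label: "Request Tracker Queue")

    // Returns true only the first time a given name is seen.
    func shouldRequest(_ userName: String) -> Bool {
        return queue.sync {
            return requested.insert(userName).inserted
        }
    }
}

let tracker = RequestTracker()
let first = tracker.shouldRequest("alice")
let second = tracker.shouldRequest("alice")
print(first, second)
```

Because Set.insert reports whether the element was newly added, the check and the update happen in a single step inside the queue, which is what closes the race window.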
Finally let’s write a main() function and call it from our code. This function first creates the tables in the database, gets the Maximum Item id, and then starts the ResourceManager.
func main() {
Item.create()
User.create()
let task = URLSession.shared.dataTask(with: URL(string: "https://hacker-news.firebaseio.com/v0/maxitem.json")!) {(data: Data?, _: URLResponse?, _:Error?) in
if let data = data {
let maxItem = UInt(String(decoding: data, as: UTF8.self))!
ResourceManager(maxItem).start()
}
}
task.resume()
RunLoop.current.run()
}
main()
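The force unwrap when parsing the max item id will terminate the app if the response is anything other than a bare number. A more defensive variant is sketched below; parseMaxItem is a hypothetical helper, and trimming whitespace is a precaution rather than something the API is known to require.

```swift
import Foundation

// Parses the body of the maxitem response, returning nil if it is not a number.
func parseMaxItem(_ data: Data) -> UInt? {
    let text = String(decoding: data, as: UTF8.self)
        .trimmingCharacters(in: .whitespacesAndNewlines)
    return UInt(text)
}

// Invented sample bodies for illustration.
let parsed = parseMaxItem(Data("20000000\n".utf8))
let invalid = parseMaxItem(Data("null".utf8))
print(parsed ?? 0, invalid == nil)
```

In main() the result could be checked with if let, printing an error and exiting instead of crashing when the value is nil.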
You can build and run the tool immediately using the swift run command; to produce a release build, run
swift build --configuration release
The executable is then generated at .build/release/HNDataSetBuilder
Even though we are performing several requests concurrently without any throttling, fetching the entire dataset one resource at a time is very time consuming and could take several days to complete.
However, this app shows a number of approaches that can be taken by a Swift-based app to fetch data from an API.
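To get a feel for why a full fetch takes days, here is a back-of-the-envelope calculation. All three inputs are hypothetical round numbers chosen for illustration, not measurements: 20 million resources, 4 requests in flight (the concurrency used above), and 0.1 seconds per request.

```swift
import Foundation

// Rough duration estimate: total request time divided across concurrent slots.
func estimatedDays(resources: Double, concurrency: Double, secondsPerRequest: Double) -> Double {
    let totalSeconds = resources * secondsPerRequest / concurrency
    return totalSeconds / 86_400  // seconds in a day
}

// Hypothetical inputs; substitute your own measurements.
let days = estimatedDays(resources: 20_000_000, concurrency: 4, secondsPerRequest: 0.1)
print(String(format: "%.1f days", days))  // prints "5.8 days"
```

Under these assumptions, doubling the concurrency halves the estimate, although in practice the network and the API will set the real limit.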
We could possibly further improve this code to add support for