Ruby's standard CSV library is one of the best in any language: it handles custom delimiters, quoted fields, and multi-line values correctly, and can strip a BOM when you open the file with the 'bom|utf-8' encoding. For most CSV-to-JSON jobs, you don't need an external gem. smarter_csv adds streaming for very large files and opt-in numeric coercion (turning '42' into 42).
Method 1: Ruby CSV stdlib (no gems needed)
Ruby's built-in CSV library handles most real-world CSVs correctly without any gems.
require 'csv'
require 'json'
# Simple: array of hashes
rows = CSV.read('data.csv', headers: true, header_converters: :symbol)
json = JSON.pretty_generate(rows.map(&:to_h))
File.write('output.json', json)
puts "Done: #{rows.size} rows"
require 'csv'
require 'json'
def csv_to_json(csv_path, out_path, options = {})
default_opts = {
headers: true,
header_converters: :symbol, # 'First Name' -> :first_name
converters: :all, # auto-convert numbers and dates
encoding: 'UTF-8',
liberal_parsing: true # more tolerant of malformed CSV
}
opts = default_opts.merge(options)
rows = []
CSV.foreach(csv_path, **opts) do |row|
rows << row.to_h
end
File.write(out_path, JSON.pretty_generate(rows))
rows.size
end
count = csv_to_json('users.csv', 'users.json')
puts "Converted #{count} rows"
Key options:
- header_converters: :symbol — converts 'First Name' to :first_name (snake_cased symbol). Useful for consistent JSON keys.
- converters: :all — automatically converts '42' to 42 and '3.14' to 3.14. It does not handle booleans ('true' stays a string). Depending on the csv gem version, it may also convert date-like strings ('2024-01-15') into DateTime objects via the :date_time converter; JSON serialization still produces a string either way, but a DateTime serializes as a full ISO timestamp rather than the original bare date.
- liberal_parsing: true — more tolerant of real-world CSVs with unbalanced quotes or inconsistent line endings.
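To see what these options actually produce, here's a small self-contained check against an in-memory CSV (CSV.parse accepts the same options as CSV.read and CSV.foreach):

```ruby
require 'csv'

# Parse an in-memory CSV with the same options to inspect the output.
csv_text = "First Name,Age\nAda,36\n"

rows = CSV.parse(csv_text, headers: true,
                           header_converters: :symbol,
                           converters: :all)

row = rows.first.to_h
puts row[:first_name]  # header "First Name" became the key :first_name
puts row[:age].class   # "36" was coerced to Integer by :all
```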
Method 2: smarter_csv (streaming, type coercion, large files)
smarter_csv is built for large CSVs and adds chunked processing, custom key mappings, and explicit type conversion.
gem install smarter_csv
# Or Gemfile: gem 'smarter_csv'
require 'smarter_csv'
require 'json'
# Simple: load all rows
rows = SmarterCSV.process('data.csv', {
key_mapping: {
first_name: :firstName, # remap header to camelCase key
last_name: :lastName,
},
remove_empty_values: true, # drop nil/empty fields
convert_values_to_numeric: true, # '42' -> 42
strings_as_keys: false # keep symbol keys
})
File.write('output.json', JSON.pretty_generate(rows))
puts "Done: #{rows.size} rows"
require 'smarter_csv'
require 'json'
# Chunked processing for large files (memory-efficient)
def large_csv_to_json_lines(csv_path, out_path)
File.open(out_path, 'w') do |f|
SmarterCSV.process(csv_path, chunk_size: 1000) do |chunk|
chunk.each do |row|
f.puts JSON.generate(row) # JSON Lines format: one JSON object per line
end
end
end
end
large_csv_to_json_lines('million_rows.csv', 'output.jsonl')
For CSV files over 100MB, chunked processing with smarter_csv keeps memory constant regardless of file size. The JSON Lines output format (one JSON object per line) is also more streaming-friendly than a single JSON array.
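If you'd rather avoid the gem dependency, the stdlib can produce the same JSON Lines output with flat memory usage, since CSV.foreach yields one row at a time. A minimal sketch (stdlib_csv_to_jsonl is a hypothetical helper name):

```ruby
require 'csv'
require 'json'

# Stream CSV rows straight to JSON Lines using only the stdlib.
# CSV.foreach never loads the whole file, so memory stays constant.
def stdlib_csv_to_jsonl(csv_path, out_path)
  File.open(out_path, 'w') do |f|
    CSV.foreach(csv_path, headers: true, header_converters: :symbol) do |row|
      f.puts JSON.generate(row.to_h)  # one JSON object per line
    end
  end
end
```

You give up smarter_csv's key remapping and value coercion, but for a plain dump-to-JSONL job the stdlib version has no dependencies at all.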
Method 3: ChangeThisFile API (Net::HTTP, no parsing code)
The API converts CSV to JSON server-side. The source format is auto-detected from the filename; pass target=json to choose the output format. Free tier: 1,000 conversions/month, no card needed.
require 'net/http'
require 'uri'
require 'securerandom'
API_KEY = 'ctf_sk_your_key_here'
def csv_to_json_api(csv_path, out_path)
uri = URI('https://changethisfile.com/v1/convert')
boundary = "CTF#{SecureRandom.hex(8)}"
file_data = File.binread(csv_path)
body = [
"--#{boundary}\r\n",
"Content-Disposition: form-data; name=\"file\"; filename=\"#{File.basename(csv_path)}\"\r\n",
"Content-Type: text/csv\r\n\r\n",
file_data, "\r\n",
"--#{boundary}\r\n",
"Content-Disposition: form-data; name=\"target\"\r\n\r\n",
"json\r\n",
"--#{boundary}--\r\n"
].join
req = Net::HTTP::Post.new(uri)
req['Authorization'] = "Bearer #{API_KEY}"
req['Content-Type'] = "multipart/form-data; boundary=#{boundary}"
req.body = body
resp = Net::HTTP.start(uri.host, uri.port, use_ssl: true, read_timeout: 60) { |h| h.request(req) }
raise "API error: #{resp.code}" unless resp.code == '200'
File.write(out_path, resp.body)
end
csv_to_json_api('data.csv', 'output.json')
puts 'Done'
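Hand-rolling the multipart body works, but the stdlib can also build it for you: Net::HTTPRequest#set_form (Ruby 2.5+) accepts an IO with filename/content-type options and generates the boundary itself. A sketch against the same endpoint; build_convert_request is a hypothetical helper split out so the request can be inspected without a network call:

```ruby
require 'net/http'
require 'uri'

# Build the multipart request with set_form instead of a manual boundary.
def build_convert_request(csv_path, api_key)
  uri = URI('https://changethisfile.com/v1/convert')
  req = Net::HTTP::Post.new(uri)
  req['Authorization'] = "Bearer #{api_key}"
  req.set_form(
    [
      ['file', File.open(csv_path, 'rb'),
       { filename: File.basename(csv_path), content_type: 'text/csv' }],
      ['target', 'json']
    ],
    'multipart/form-data'  # set_form generates the boundary for us
  )
  [uri, req]
end

def csv_to_json_api_set_form(csv_path, out_path, api_key)
  uri, req = build_convert_request(csv_path, api_key)
  resp = Net::HTTP.start(uri.host, uri.port, use_ssl: true, read_timeout: 60) do |http|
    http.request(req)
  end
  raise "API error: #{resp.code}" unless resp.is_a?(Net::HTTPSuccess)
  File.write(out_path, resp.body)
end
```

The IO is streamed at send time, so large uploads don't have to be read fully into memory the way File.binread requires.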
When to use each
| Approach | Best for | Tradeoff |
|---|---|---|
| CSV stdlib | Most use cases — no gem dependency, handles real-world CSVs well | Loads whole file into memory; manual type conversion if needed |
| smarter_csv | Large files, chunked processing, custom key mapping | External gem; slightly more configuration |
| ChangeThisFile API | Quick one-offs, no parsing code, shared hosting | Network call; free tier 25MB limit |
Production tips
- Specify encoding explicitly for user uploads. Default to 'UTF-8' but rescue Encoding::InvalidByteSequenceError and retry with 'ISO-8859-1'. Many Excel-exported CSVs are in Windows-1252 (similar to ISO-8859-1).
- Use JSON.generate (not pretty_generate) for large outputs. Pretty-printing adds significant whitespace. For files over 1MB, JSON.generate reduces output size by 20-30%.
- Watch converters: :all with date-like strings. '2024-01-15' may be parsed into a DateTime object by :all. JSON serialization falls back to #to_s, so you still get a string, but a DateTime serializes as '2024-01-15T00:00:00+00:00' rather than the original bare date, and any code that inspects the value before serialization sees a DateTime, not a String. Test with your actual data.
- nil vs empty string. CSV stdlib converts empty fields to nil by default. If downstream systems expect empty strings, convert: row.to_h.transform_values { |v| v.nil? ? '' : v }.
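The encoding tip above can be sketched as a retry helper. Assumptions: Windows-1252 as the fallback (a superset of printable ISO-8859-1 that covers most Excel exports), and a deliberately broad rescue, because the exact exception raised on bad bytes varies by csv gem version:

```ruby
require 'csv'

# Read a CSV as UTF-8 (stripping any BOM), falling back to Windows-1252
# when the bytes aren't valid UTF-8 -- common for Excel exports.
def read_csv_with_fallback(path)
  CSV.read(path, headers: true, encoding: 'bom|UTF-8')
rescue CSV::MalformedCSVError, ArgumentError, Encoding::InvalidByteSequenceError
  # 'Windows-1252:UTF-8' reads as Windows-1252, transcodes to UTF-8
  CSV.read(path, headers: true, encoding: 'Windows-1252:UTF-8')
end
```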
Ruby's CSV stdlib is one of the best in any language — use it for most cases. smarter_csv is worth adding for files over 100MB or when you need key remapping. Free API tier: 1,000 conversions/month.