Building Your Data Inventory
The first major task for the Data Coordinators is to create a data catalog (or inventory) of your department or agency’s data. In this document and other resources, we’ve tried to explain how to do this, but we recognize that this is part of a learning process. We expect to make changes to this guidance and other resources based on your feedback, and will talk with you to work out issues and questions whenever needed.
Follow the 3 major steps below to conduct your data inventory:
- Identify data sources (databases, systems, spreadsheets, paper)
- Brainstorm and identify datasets in each data source
- Complete dataset inventory template (for each dataset)
The Reference section of this site includes templates to support this process.
Step 1: Identify data sources
Your data may be housed in a variety of places, from information systems and databases to shared drives and folders. This can also include data hosted on third-party vendor systems. Step 1 is about identifying the major data sources in your department.
Questions to help identify and discover data sources:
- What does your department use to report to ResultsNOLA or a STAT program?
- What information systems does your department use?
- What databases does your department use?
- What applications capture information or are used in your business processes?
- Are some data resources kept in spreadsheets (on shared or individual drives)?
- What information are we already publishing and where did that information come from?
For each of the data sources:
- Provide a name and brief description of the data source.
- Capture any technical details and points of contact.
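One way to keep Step 1 notes consistent is to record every data source with the same handful of fields. The sketch below is only an illustration and is not part of the official templates: the class name `DataSource`, its field names, and the sample entry are all assumptions chosen to mirror the two bullets above.

```python
from dataclasses import dataclass, field


@dataclass
class DataSource:
    """One entry in a Step 1 data source list (field names are illustrative)."""
    name: str
    description: str
    technical_details: str = ""  # e.g. type of system, where it is hosted
    points_of_contact: list[str] = field(default_factory=list)

    def missing_fields(self) -> list[str]:
        """Return the names of fields that are still blank."""
        missing = []
        if not self.name.strip():
            missing.append("name")
        if not self.description.strip():
            missing.append("description")
        if not self.technical_details.strip():
            missing.append("technical_details")
        if not self.points_of_contact:
            missing.append("points_of_contact")
        return missing


# Hypothetical entry: named and described, but not yet fully documented.
source = DataSource(
    name="Shared drive spreadsheets",
    description="Monthly report workbooks kept on a shared drive",
)
print(source.missing_fields())
```

A check like `missing_fields()` makes it easy to see which sources still need a follow-up conversation before moving on to Step 2.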
Steps 2 and 3 Guidance
Below we describe at a high level the next 2 steps of the inventory. Visit the Reference section of this website for more detail.
Step 2: Brainstorm and identify potential datasets in each data source
Some of your information sources may be fairly straightforward (e.g. a single sheet in a spreadsheet). In these cases, you have already identified the dataset.
In addition, you may already have a list of datasets you are publishing or plan to publish.
But other sources, like relational databases (RCS or LAMA), may be very complex. Identifying subsets of the database that could serve as datasets will probably require some brainstorming. You may want to include others from your department or IT (if there is a technical expert) in this process.
To help brainstorm, use the questions below:
- What data populates your monthly or quarterly reports?
- What departmental data is currently publicly available on data.nola.gov or elsewhere online?
- What data does your department use for internal performance review?
- What information is published as a performance metric in ResultsNOLA or a STAT program?
- What data is reported to federal, state or local agencies?
- Talk with your department director or deputy director - what data has been asked for under public records requests?
- What data do other departments ask for?
- What kinds of open data are similar agencies across the country publishing?
Caution
Don’t exclude any datasets based on privacy or confidentiality concerns! Our goal is to have a holistic picture of our data. Based on this big picture, we can then decide what we should publish. Step 3 provides a means to capture privacy and confidentiality concerns.
Step 3: Complete dataset inventory template
For each dataset you identify in Step 2, complete the inventory template. Include:
- New datasets (identified via brainstorming)
- Existing datasets, including already published datasets (visit data.nola.gov for these)
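The caution in Step 2 and the template in Step 3 fit together: every dataset gets a row, and privacy or confidentiality concerns are recorded on that row instead of being used to drop it. The sketch below illustrates that idea with Python's standard `csv` module. The column names and sample rows are hypothetical; the official template defines the real fields.

```python
import csv
import io

# Hypothetical column names; the official template defines the real ones.
COLUMNS = ["dataset_name", "data_source", "already_published", "privacy_concerns"]


def inventory_to_csv(rows: list[dict]) -> str:
    """Serialize inventory rows to CSV text, keeping every row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # Unknown values become empty cells; the row itself is never dropped.
        writer.writerow({col: row.get(col, "") for col in COLUMNS})
    return buffer.getvalue()


# Made-up rows: one already published, one new with a recorded concern.
rows = [
    {
        "dataset_name": "Example dataset A",
        "data_source": "Example system",
        "already_published": "yes",
    },
    {
        "dataset_name": "Example dataset B",
        "data_source": "Example system",
        "already_published": "no",
        "privacy_concerns": "Contains personal contact details",
    },
]
csv_text = inventory_to_csv(rows)
print(csv_text)
```

Keeping the concern as a column means the decision about what to publish can be made later, from the complete picture.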
Data Inventory
The data inventory template includes drop-down menus and other data tips to make it as easy as possible to complete your inventory.
Template Options:
- Excel: Download the Excel template from the Reference section or request it via email from wlsoenksen@nola.gov
- Form: The online inventory form is best used in conjunction with guidance from the Data Team and/or the Excel template (mentioned above). You can find the form here and in the Reference section.
Example: Code Enforcement
Below we provide a partial example of conducting an inventory for Code Enforcement.
Step 1: Identify data sources
| # | Name | Description | Technical details | Key POCs |
|---|------|-------------|-------------------|----------|
| 1 | LAMA | The vast majority of Code Enforcement’s data is kept in a single relational database, referred to as “LAMA”. The database consists of 4 main types of information: (1) Inspections, (2) Violations, (3) Hearings, (4) Addresses. | LAMA | Maintained by Davenport Group. Business POC is Rebecca Houtman. |
Step 2: Brainstorm and identify potential datasets in each data source
Given the 4 types of information in LAMA, the following datasets could be extracted from the database:
- Inspection records. Code Enforcement staff complete these. This dataset includes information such as:
  - Property condition details
  - Photos
  - Various details related to property ownership and historical action
- Violation records. Code Enforcement staff file a violation after a failed inspection. This dataset includes information such as:
  - Property details
  - Violation details
  - Various details related to property ownership and historical action
- Hearing notices. The City is required to file hearing notices when someone receives a violation and does not voluntarily comply. This dataset includes information such as:
  - Property details
  - Type of violation
The address data is probably not a useful dataset to share, because it is provided by other departments, which do the actual maintenance on it.
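The outcome of this brainstorm can be written down as a small structure, sketched below in Python. This is an illustration, not the team's actual tooling: the class `CandidateDataset` and the `share_candidate` flag are assumptions. Addresses stay in the list so the inventory remains complete, and are only flagged as an unlikely candidate for sharing.

```python
from dataclasses import dataclass


@dataclass
class CandidateDataset:
    """A dataset identified within a data source (names are illustrative)."""
    name: str
    source: str
    example_contents: tuple[str, ...] = ()
    share_candidate: bool = True  # assumption: a simple yes/no flag
    note: str = ""


lama_datasets = [
    CandidateDataset(
        "Inspection records", "LAMA",
        ("Property condition details", "Photos",
         "Property ownership and historical action"),
    ),
    CandidateDataset(
        "Violation records", "LAMA",
        ("Property details", "Violation details",
         "Property ownership and historical action"),
    ),
    CandidateDataset(
        "Hearing notices", "LAMA",
        ("Property details", "Type of violation"),
    ),
    CandidateDataset(
        "Addresses", "LAMA",
        share_candidate=False,
        note="Provided and maintained by other departments",
    ),
]

# Every dataset stays in the inventory; only the shortlist is filtered.
to_share = [d.name for d in lama_datasets if d.share_candidate]
print(to_share)
```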
Step 3: Complete dataset inventory template
Please include anything that you know about the systems or datasets that you’d like the DataDriven team to know. Examples of these include…
- Existing reporting requirements that are inefficient and resource intensive
- A clearly defined use case emerges for publishing the data on DataDriven
- It’s easy to publish and doesn’t require updating
- The data will inform a high-profile issue or concern but people may not be aware of it
- Existing reporting already automates the data extractions making it easy to add to DataDriven
- During the publication of a related dataset, it becomes trivial to publish it as well
- An upcoming migration to a new backend system (this would create additional work by automating the publication twice)
- You have major data quality concerns such that the data is not usable
- The data is not available in a structured manner (e.g. it’s not in a database or well-designed spreadsheet)
See a completed inventory for one dataset from Safety and Permits.