Discussion:
XML Output: libxo - provide single API to output TXT, XML, JSON and HTML
Simon Gerraty
2014-07-25 04:49:21 UTC
Permalink
Hi,

At a vendor summit a few years ago I asked about whether anyone but us
(Juniper) would be interested in the ability to have standard BSD apps
output XML.

I was actually surprised by the amount of interest expressed.
I've occasionally nagged our UI team ever since for a clean and simple
API that we could contribute to address this.

We now have what I think is a viable candidate
and we'd like to take the next steps towards contributing it and
converting at least a few apps.

Not only does it handle TXT and XML output but JSON and HTML as well,
and very rich HTML at that.
With some slick JavaScript, you can do amazing things with the level of
detail you can get out of this sort of thing.

The API is of necessity a bit more complex than just printf(3).
Considering the level of functionality available, though, it is a good
tradeoff.

The main open issue (assuming this functionality is still desired) is
support of wide characters.

We figure the worst case solution is a sed(1) script to generate the wide
version of the API from the normal one, but perhaps simply always using
UTF-8 would be a better solution?

Thanks
--sjg

The following from Phil provides some idea of the functionality
available and the API.
The one shown here uses the default output handle (stdout), but there
are variants that allow multiple output handles.
T, X, J, and H are the modes (text, XML, JSON, HTML);
P means pretty-print (indent, newlines);
I means print help (datatype, description), if provided (there aren't any for "w");
x means print the xpath to the data.
% foreach i ( T XP JP HP HPIx )
echo === $i ===
env LIBXO_OPTIONS=$i ./xtest -n | head -10
end
=== T ===
6:47PM up 18 days, 2:01, 9 user%s, load averages: 0.00, 0.00, 0.00
phil pts/0 76.182.32.73 5:09PM 33 /bin/sh
phil pts/1 76.182.32.73 05Jul14 2 /usr/bin/perl /u/phil/bin/plum (
phil pts/2 76.182.32.73 05Jul14 1 /bin/tcsh
phil pts/3 76.182.32.73 05Jul14 2days ssh dent
phil pts/4 76.182.32.73 Tue02PM 2days ssh svl-junos-d026.juniper.net
phil pts/5 76.182.32.73 Wed01AM 2days telnet man-o-war 2006
phil pts/6 76.182.32.73 Fri10PM 2days ssh 198.85.229.65
phil pts/7 76.182.32.73 Fri10PM 2days ssh zap
=== XP ===
<uptime-information>
<time-of-day> 6:47PM</time-of-day>
<uptime seconds="1562436">18 days</uptime>
<uptime> 2:01</uptime>
<users>9</users>
<load-average-1>0.00</load-average-1>
<load-average-5>0.00</load-average-5>
<load-average-15>0.00</load-average-15>
<user-table>
<user-entry>
=== JP ===
"uptime-information": {
"time-of-day": " 6:47PM",
"uptime": "18 days",
"uptime": 2:01,
"users": 9,
"load-average-1": 0.00,
"load-average-5": 0.00,
"load-average-15": 0.00,
"user-table": {
"user-entry": [
=== HP ===
<div class="line">
<div class="data" data-tag="time-of-day"> 6:47PM</div>
<div class="text"> </div>
<div class="text"> up</div>
<div class="text"> </div>
<div class="data" data-tag="uptime">18 days</div>
<div class="text">,</div>
<div class="text"> </div>
<div class="data" data-tag="uptime"> 2:01</div>
<div class="text">,</div>
=== HPIx ===
<div class="line">
<div class="data" data-tag="time-of-day" data-xpath="/uptime-information/time-of-day"> 6:47PM</div>
<div class="text"> </div>
<div class="text"> up</div>
<div class="text"> </div>
<div class="data" data-tag="uptime" data-xpath="/uptime-information/uptime">18 days</div>
<div class="text">,</div>
<div class="text"> </div>
<div class="data" data-tag="uptime" data-xpath="/uptime-information/uptime"> 2:01</div>
<div class="text">,</div>
Thanks,
Phil
-----
FWIW: here's the diff for "w". I don't have "wchar_t" support
yet, so I just undid it for now.
diff -rbu /usr/src/usr.bin/w/pr_time.c ./pr_time.c
--- /usr/src/usr.bin/w/pr_time.c 2010-12-21 12:09:25.000000000 -0500
+++ ./pr_time.c 2014-07-21 17:12:19.000000000 -0400
@@ -55,10 +55,10 @@
int
pr_attime(time_t *started, time_t *now)
{
- static wchar_t buf[256];
+ static char buf[256];
struct tm tp, tm;
time_t diff;
- wchar_t *fmt;
+ char *fmt;
int len, width, offset = 0;
tp = *localtime(started);
@@ -67,7 +67,7 @@
/* If more than a week, use day-month-year. */
if (diff > 86400 * 7)
- fmt = L"%d%b%y";
+ fmt = "%d%b%y";
/* If not today, use day-hour-am/pm. */
else if (tm.tm_mday != tp.tm_mday ||
@@ -75,23 +75,23 @@
tm.tm_year != tp.tm_year) {
/* The line below does not take DST into consideration */
/* else if (*now / 86400 != *started / 86400) { */
- fmt = use_ampm ? L"%a%I%p" : L"%a%H";
+ fmt = use_ampm ? "%a%I%p" : "%a%H";
}
/* Default is hh:mm{am,pm}. */
else {
- fmt = use_ampm ? L"%l:%M%p" : L"%k:%M";
+ fmt = use_ampm ? "%l:%M%p" : "%k:%M";
}
- (void)wcsftime(buf, sizeof(buf), fmt, &tp);
- len = wcslen(buf);
- width = wcswidth(buf, len);
+ (void)strftime(buf, sizeof(buf), fmt, &tp);
+ len = strlen(buf);
+ width = len;
if (len == width)
- (void)wprintf(L"%-7.7ls", buf);
+ xo_emit("{:login-time/%-7.7s}", buf);
else if (width < 7)
- (void)wprintf(L"%ls%.*s", buf, 7 - width, " ");
+ xo_emit("{:login-time/%s}%.*s", buf, 7 - width, " ");
else {
- (void)wprintf(L"%ls", buf);
+ xo_emit("{:login-time/%s}", buf);
offset = width - 7;
}
return (offset);
@@ -108,7 +108,7 @@
/* If idle more than 36 hours, print as a number of days. */
if (idle >= 36 * 3600) {
int days = idle / 86400;
- (void)printf(" %dday%s ", days, days > 1 ? "s" : " " );
+ xo_emit(" {:idle/%dday%s} ", days, days > 1 ? "s" : " " );
if (days >= 100)
return (2);
if (days >= 10)
@@ -117,15 +117,15 @@
/* If idle more than an hour, print as HH:MM. */
else if (idle >= 3600)
- (void)printf(" %2d:%02d ",
+ xo_emit(" {:idle/%2d:%02d/} ",
(int)(idle / 3600), (int)((idle % 3600) / 60));
else if (idle / 60 == 0)
- (void)printf(" - ");
+ xo_emit(" - ");
/* Else print the minutes idle. */
else
- (void)printf(" %2d ", (int)(idle / 60));
+ xo_emit(" {:idle/%2d} ", (int)(idle / 60));
return (0); /* not idle longer than 9 days */
}
diff -rbu /usr/src/usr.bin/w/w.c ./w.c
--- /usr/src/usr.bin/w/w.c 2010-12-21 12:09:25.000000000 -0500
+++ ./w.c 2014-07-21 18:13:50.000000000 -0400
@@ -86,6 +86,7 @@
#include <unistd.h>
#include <utmp.h>
#include <vis.h>
+#include <libxo/libxo.h>
#include "extern.h"
@@ -260,9 +261,12 @@
}
(void)fclose(ut);
+ xo_open_container("uptime-information");
+
if (header || wcmd == 0) {
pr_header(&now, nusers);
if (wcmd == 0) {
+ xo_close_container("uptime-information");
(void)kvm_close(kd);
exit(0);
}
@@ -274,11 +278,11 @@
#define HEADER_WHAT "WHAT\n"
#define WUSED (UT_NAMESIZE + UT_LINESIZE + W_DISPHOSTSIZE + \
sizeof(HEADER_LOGIN_IDLE) + 3) /* header width incl. spaces */
- (void)printf("%-*.*s %-*.*s %-*.*s %s",
+ xo_emit("{T:/%-*.*s} {T:/%-*.*s} {T:/%-*.*s} ",
UT_NAMESIZE, UT_NAMESIZE, HEADER_USER,
UT_LINESIZE, UT_LINESIZE, HEADER_TTY,
- W_DISPHOSTSIZE, W_DISPHOSTSIZE, HEADER_FROM,
- HEADER_LOGIN_IDLE HEADER_WHAT);
+ W_DISPHOSTSIZE, W_DISPHOSTSIZE, HEADER_FROM);
}
if ((kp = kvm_getprocs(kd, KERN_PROC_ALL, 0, &nentries)) == NULL)
@@ -347,6 +351,9 @@
}
}
+ xo_open_container("user-table");
+ xo_open_list("user-entry");
+
for (ep = ehead; ep != NULL; ep = ep->next) {
char host_buf[UT_HOSTSIZE + 1];
struct sockaddr_storage ss;
@@ -356,6 +363,8 @@
time_t t;
int isaddr;
+ xo_open_instance("user-entry");
+
host_buf[UT_HOSTSIZE] = '\0';
strncpy(host_buf, ep->utmp.ut_host, UT_HOSTSIZE);
p = *host_buf ? host_buf : "-";
@@ -388,6 +397,9 @@
p = buf;
}
if (dflag) {
+ xo_open_container("process-table");
+ xo_open_list("process-entry");
+
for (dkp = ep->dkp; dkp != NULL; dkp = debugproc(dkp)) {
const char *ptr;
@@ -395,23 +407,37 @@
dkp->ki_comm, MAXCOMLEN);
if (ptr == NULL)
ptr = "-";
- (void)printf("\t\t%-9d %s\n",
+ xo_open_instance("process-entry");
+ xo_emit("\t\t{:process-id/%-9d/%d} "
+ "{:command/%s}\n",
dkp->ki_pid, ptr);
+ xo_close_instance("process-entry");
}
+ xo_close_list("process-entry");
+ xo_close_container("process-table");
}
- (void)printf("%-*.*s %-*.*s %-*.*s ",
+ xo_emit("{:user/%-*.*s} {:tty/%-*.*s} {:from/%-*.*s} ",
UT_NAMESIZE, UT_NAMESIZE, ep->utmp.ut_name,
UT_LINESIZE, UT_LINESIZE,
strncmp(ep->utmp.ut_line, "tty", 3) &&
strncmp(ep->utmp.ut_line, "cua", 3) ?
ep->utmp.ut_line : ep->utmp.ut_line + 3,
W_DISPHOSTSIZE, W_DISPHOSTSIZE, *p ? p : "-");
+
t = _time_to_time32(ep->utmp.ut_time);
longattime = pr_attime(&t, &now);
longidle = pr_idle(ep->idle);
- (void)printf("%.*s\n", argwidth - longidle - longattime,
- ep->args);
+ xo_emit("{:command/%.*s}\n",
+ argwidth - longidle - longattime, ep->args);
+
+ xo_close_instance("user-entry");
}
+
+ xo_close_list("user-entry");
+ xo_close_container("user-table");
+ xo_close_container("uptime-information");
+
(void)kvm_close(kd);
exit(0);
}
@@ -430,7 +456,7 @@
*/
if (strftime(buf, sizeof(buf),
use_ampm ? "%l:%M%p" : "%k:%M", localtime(nowp)) != 0)
- (void)printf("%s ", buf);
+ xo_emit("{:time-of-day/%s} ", buf);
/*
* Print how long system has been up.
*/
@@ -444,35 +470,45 @@
uptime %= 3600;
mins = uptime / 60;
secs = uptime % 60;
- (void)printf(" up");
+ xo_emit(" up");
+ xo_attr("seconds", "%lu", (unsigned long) tp.tv_sec);
if (days > 0)
- (void)printf(" %d day%s,", days, days > 1 ? "s" : "");
+ xo_emit(" {:uptime/%d day%s},",
+ days, days > 1 ? "s" : "");
if (hrs > 0 && mins > 0)
- (void)printf(" %2d:%02d,", hrs, mins);
+ xo_emit(" {:uptime/%2d:%02d},", hrs, mins);
else if (hrs > 0)
- (void)printf(" %d hr%s,", hrs, hrs > 1 ? "s" : "");
+ xo_emit(" {:uptime/%d hr%s},",
+ hrs, hrs > 1 ? "s" : "");
else if (mins > 0)
- (void)printf(" %d min%s,", mins, mins > 1 ? "s" : "");
+ xo_emit(" {:uptime/%d min%s},",
+ mins, mins > 1 ? "s" : "");
else
- (void)printf(" %d sec%s,", secs, secs > 1 ? "s" : "");
+ xo_emit(" {:uptime/%d sec%s},",
+ secs, secs > 1 ? "s" : "");
}
/* Print number of users logged in to system */
- (void)printf(" %d user%s", nusers, nusers == 1 ? "" : "s");
+ xo_emit(" {:users/%d} user%s", nusers, nusers == 1 ? "" : "s");
/*
* Print 1, 5, and 15 minute load averages.
*/
if (getloadavg(avenrun, sizeof(avenrun) / sizeof(avenrun[0])) == -1)
- (void)printf(", no load average information available\n");
+ xo_emit(", no load average information available\n");
else {
- (void)printf(", load averages:");
+ static const char *format[] = {
+ " {:load-average-1/%.2f}",
+ " {:load-average-5/%.2f}",
+ " {:load-average-15/%.2f}",
+ };
+ xo_emit(", load averages:");
for (i = 0; i < (int)(sizeof(avenrun) / sizeof(avenrun[0])); i++) {
if (use_comma && i > 0)
- (void)printf(",");
- (void)printf(" %.2f", avenrun[i]);
+ xo_emit(",");
+ xo_emit(format[i], avenrun[i]);
}
- (void)printf("\n");
+ xo_emit("\n");
}
}
@@ -493,10 +529,9 @@
usage(int wcmd)
{
if (wcmd)
- (void)fprintf(stderr,
- "usage: w [-dhin] [-M core] [-N system] [user ...]\n");
+ xo_error("usage: w [-dhin] [-M core] [-N system] [user ...]\n");
else
- (void)fprintf(stderr, "usage: uptime\n");
+ xo_error("usage: uptime\n");
exit(1);
}
Lars Engels
2014-07-25 06:54:08 UTC
Permalink
Post by Simon Gerraty
Hi,
At a vendor summit a few years ago I asked about whether anyone but us
(Juniper) would be interested in the ability to have standard BSD apps
output XML.
I was actually surprised by the amount of interest expressed.
I've occasionally nagged our UI team ever since for a clean and simple
API that we could contribute to address this.
We now have what I think is a viable candidate
and we'd like to take the next steps towards contributing it and
converting at least a few apps.
Not only does it handle TXT and XML output but JSON and HTML as well,
and very rich HTML at that.
With some slick javascript - you can do amazing things with the level of
detail you can get out of this sort of thing.
The API is of necessity a bit more complex than just printf(3).
Considering the level of functionality available though it is a good
tradeoff.
The main open issue (assuming this functionality is still desired) is
support of wide characters.
We figure the worst case solution is a sed(1) script to generate the wide
version of the API from the normal one, but perhaps simply always using
UTF8 would be a better solution?
Thanks
--sjg
FYI: There's also a Summer of Code project to handle the same:

https://wiki.freebsd.org/SummerOfCode2014/MachineReadableFromUserlandUtils

But I think that one mainly handles JSON output.

Lars
Simon J. Gerraty
2014-07-25 16:29:16 UTC
Permalink
Yes I know.
Juniper (as noted on that page) has been doing this for many
years (Phil, in fact), and obviously has a strong vested interest in how
it is done in FreeBSD.

Thanks
--sjg
Jos Backus
2014-07-27 02:29:04 UTC
Permalink
It's a little sad to see that the more human-friendly and expressive YAML
format appears to not be supported. Instead, here too we are stuck with
JSON, the VHS of serialization formats. Better than nothing, I suppose.

Jos
Simon J. Gerraty
2014-07-28 05:42:17 UTC
Permalink
Post by Jos Backus
It's a little sad to see that the more human-friendly and expressive YAML
format appears to not be supported. Instead, here too we are stuck with
Is there a use case for something like vmstat outputting YAML?
It is a simple format; I guess it could be added, especially if it has
no format quirks worse than those of HTML and JSON.
Note: I'm just speculating.
Baptiste Daroussin
2014-07-28 05:53:37 UTC
Permalink
Post by Simon J. Gerraty
Post by Jos Backus
It's a little sad to see that the more human-friendly and expressive YAML
format appears to not be supported. Instead, here too we are stuck with
Is there a use case for something like vmstat outputting YAML?
It is a simple format, I guess it could be added, especially if it has
no format quirks worse than HTML and JSON.
Note: I'm just speculating.
YAML is anything but a simple format; creating a subset of YAML that is good
enough would be simple :), but exporting complete YAML (strongly typed, etc.)
is something else.

As a machine-readable format, YAML is a nightmare to parse; JSON is much
simpler and easier (while JSON is a valid subset of YAML).

regards,
Bapt
Jos Backus
2014-07-28 06:06:56 UTC
Permalink
Post by Baptiste Daroussin
Post by Simon J. Gerraty
Post by Jos Backus
It's a little sad to see that the more human-friendly and expressive YAML
format appears to not be supported. Instead, here too we are stuck with
Is there a use case for something like vmstat outputting YAML?
It is a simple format, I guess it could be added, especially if it has
no format quirks worse than HTML and JSON.
Note: I'm just speculating.
YAML is anything but a simple format; creating a subset of YAML that is good
enough would be simple :), but exporting complete YAML (strongly typed etc.) is
something else.
The full power/expressiveness of YAML may not be needed, we may just want
the right subset. And there may be cases where the extra expressiveness is
useful as more utilities are converted.
Post by Baptiste Daroussin
As a machine-readable format, YAML is a nightmare to parse; JSON is much
simpler and easier (while JSON is a valid subset of YAML).
Well, the work has been done already (libyaml), so barring any bugs and
maintenance it's not an issue, no?

It would be great if libyaml and libucl would converge, but instead it's
likely that the number of solutions trying to solve the same problem will
continue to proliferate, and we are stuck with more and more configuration
file formats :-(

Jos
Post by Baptiste Daroussin
regards,
Bapt
Jordan Hubbard
2014-07-29 04:49:08 UTC
Permalink
Post by Jos Backus
It would be great if libyaml and libucl would converge, but instead it's
likely that the number of solutions trying to solve the same problem will
continue to proliferate, and we are stuck with more and more configuration
file formats :-(
I’m a huge fan of unified data formats; Apple picked XML and the plist DTD a long time ago, a decision which has worked rather nicely in practice, but I’m more in love with the unification that produced than I am in love with XML itself. That said, it seems like this late push for YAML is a similar case for divergence just because…erm… you don’t like JSON? It seems like libucl has basically backed JSON with the addition of a little syntactic sugar, so what’s wrong with that?

Is there some reason JSON is not sufficient? I think that’s a better question to ask, since the conversation otherwise quickly tends to sound a little like “I’ll accept any single unified format as long as it’s the specific one I like!” :) I think the greater good argument would suggest just picking one that’s expressive enough (roll a pair of dice), put on your bikeshed-proof sunglasses, and proceed.

- Jordan
Jos Backus
2014-07-29 05:18:38 UTC
Permalink
Post by Jordan Hubbard
Post by Jos Backus
It would be great if libyaml and libucl would converge, but instead it's
likely that the number of solutions trying to solve the same problem will
continue to proliferate, and we are stuck with more and more
configuration
Post by Jordan Hubbard
Post by Jos Backus
file formats :-(
I’m a huge fan of unified data formats; Apple picked XML and the plist
DTD a long time ago, a decision which has worked rather nicely in practice,
but I’m more in love with the unification that produced than I am in love
with XML itself. That said, it seems like this late push for YAML is a
similar case for divergence just because…erm… you don’t like JSON? It
seems like libucl has basically backed JSON with the addition of a little
syntactic sugar, so what’s wrong with that?

In general, as a tool, JSON is more limited/less expressive than YAML. Now
YAGNI may apply here but I personally am not sure so I'm tempted to opt for
the more flexible tool because of that. I could be wrong and maybe JSON is
all that's ever needed.
Post by Jordan Hubbard
Is there some reason JSON is not sufficient? I think that’s a better
question to ask, since the conversation otherwise quickly tends to sound a
little like “I’ll accept any single unified format as long as it’s the
specific one I like!” :) I think the greater good argument would suggest
just picking one that’s expressive enough (roll a pair of dice), put on
your bikeshed-proof sunglasses, and proceed.

That's a good point, and one I don't really disagree with. The main goal
here is to get us machine parsable output.

But part of me is sad because it's a lost opportunity to promote the more
flexible format. One of the reasons JSON is so popular is the network
effect, I think (it's popular because it's popular). Oh well. :)

Jos
Post by Jordan Hubbard
- Jordan
Adrian Chadd
2014-07-29 05:57:23 UTC
Permalink
Holy ... !

What really matters is whether the library API that you're going to
shoehorn into plenty of utilities is expressive enough to express a
whole bunch of different output types.

So it doesn't matter if you want JSON, or YAML, or XML, or the native
tool output. The trick is whether the library API is good enough.

So if you want to win people over, just make sure it gets adopted
throughout other tools. :)


-a
Jos Backus
2014-07-29 06:03:55 UTC
Permalink
Post by Adrian Chadd
Holy ... !
What really matters is whether the library API that you're going to
shoehorn into plenty of utilities is expressive enough to express a
whole bunch of different output types.
So it doesn't matter if you want JSON, or YAML, or XML, or the native
tool output. The trick is whether the library API is good enough.
So if you want to win people over, just make sure it gets adopted
throughout other tools. :)
Wouldn't the API be a way to build up an in-memory combination of lists and
maps in most cases, which then gets serialized out at emission time? Kind
of like calling object.to_yaml, where object is a Hash, but in C?

Jos
Post by Adrian Chadd
-a
Adrian Chadd
2014-07-29 06:29:00 UTC
Permalink
Post by Jos Backus
Post by Adrian Chadd
Holy ... !
What really matters is whether the library API that you're going to
shoehorn into plenty of utilities is expressive enough to express a
whole bunch of different output types.
So it doesn't matter if you want JSON, or YAML, or XML, or the native
tool output. The trick is whether the library API is good enough.
So if you want to win people over, just make sure it gets adopted
throughout other tools. :)
Wouldn't the API be a way to build up an in-memory combination of lists and
maps in most cases, which then gets serialized out at emission time? Kind of
like calling object.to_yaml, where object is a Hash, but in C?
Not everything can be buffered like that over time. Time-series data
may have stuff buffered up during an output / sample period (e.g. the
once-a-second output from vmstat 1).


-a
Jos Backus
2014-07-29 06:31:51 UTC
Permalink
Post by Adrian Chadd
Post by Jos Backus
Post by Adrian Chadd
Holy ... !
What really matters is whether the library API that you're going to
shoehorn into plenty of utilities is expressive enough to express a
whole bunch of different output types.
So it doesn't matter if you want JSON, or YAML, or XML, or the native
tool output. The trick is whether the library API is good enough.
So if you want to win people over, just make sure it gets adopted
throughout other tools. :)
Wouldn't the API be a way to build up an in-memory combination of lists and
maps in most cases, which then gets serialized out at emission time? Kind of
like calling object.to_yaml, where object is a Hash, but in C?
Not everything can be buffered like that over time. Time series data
may have stuff buffered up during an output / sample period (eg the
one a second output from vmstat 1)
YAML has support for document streams, which might come in handy here :)

Jos
Post by Adrian Chadd
-a
Alfred Perlstein
2014-07-30 01:53:05 UTC
Permalink
Post by Adrian Chadd
Post by Jos Backus
Post by Adrian Chadd
Holy ... !
What really matters is whether the library API that you're going to
shoehorn into plenty of utilities is expressive enough to express a
whole bunch of different output types.
So it doesn't matter if you want JSON, or YAML, or XML, or the native
tool output. The trick is whether the library API is good enough.
So if you want to win people over, just make sure it gets adopted
throughout other tools. :)
Wouldn't the API be a way to build up an in-memory combination of lists and
maps in most cases, which then gets serialized out at emission time? Kind of
like calling object.to_yaml, where object is a Hash, but in C?
Not everything can be buffered like that over time. Time series data
may have stuff buffered up during an output / sample period (eg the
one a second output from vmstat 1)
Yup, exactly!

This is very much needed and will be part of the gsoc project being
worked on.

-Alfred
Post by Adrian Chadd
-a
_______________________________________________
http://lists.freebsd.org/mailman/listinfo/freebsd-arch
Simon J. Gerraty
2014-07-29 23:23:38 UTC
Permalink
Post by Jos Backus
Wouldn't the API be a way to build up an in-memory combination of lists and
maps in most cases, which then gets serialized out at emission time? Kind
You cannot rely on being able to do that.
Routing tables can be "big", especially when encoded in XML ;-)
You certainly cannot wait for all of it to arrive before you start
rendering.
Jos Backus
2014-07-29 23:30:53 UTC
Permalink
Post by Simon J. Gerraty
Post by Jos Backus
Wouldn't the API be a way to build up an in-memory combination of lists and
maps in most cases, which then gets serialized out at emission time? Kind
You cannot rely on being able to do that.
Routing tables can be "big" especially when encoded in xml ;-)
You certainly cannot wait for all of it to arrive before you start
rendering.
Understood. This is why a serialization output format that supports
streaming data is useful.

Jos
Simon J. Gerraty
2014-07-30 03:46:41 UTC
Permalink
Post by Jos Backus
Post by Simon J. Gerraty
You certainly cannot wait for all of it to arrive before you start
rendering.
Understood. This is why a serialization output format that supports
streaming data is useful.
Indeed; XML works fine for that.
Jos Backus
2014-07-30 04:44:17 UTC
Permalink
Post by Simon J. Gerraty
Post by Jos Backus
Post by Simon J. Gerraty
You certainly cannot wait for all of it to arrive before you start
rendering.
Understood. This is why a serialization output format that supports
streaming data is useful.
Indeed; XML works fine for that.
Not to beat a dead horse, but so does YAML, and it's more lightweight/less
verbose so I personally find it more elegant. But sure, XML would work as
well.

Jos
Baptiste Daroussin
2014-07-30 07:15:00 UTC
Permalink
Post by Jos Backus
Post by Simon J. Gerraty
Post by Jos Backus
Post by Simon J. Gerraty
You certainly cannot wait for all of it to arrive before you start
rendering.
Understood. This is why a serialization output format that supports
streaming data is useful.
Indeed; XML works fine for that.
Not to beat a dead horse, but so does YAML, and it's more lightweight/less
verbose so I personally find it more elegant. But sure, XML would work as
well.
YAML is not more lightweight at all; it is really heavy to parse compared to
XML or JSON.

regards,
Bapt
Jos Backus
2014-07-30 16:18:40 UTC
Permalink
Post by Baptiste Daroussin
Post by Jos Backus
Post by Simon J. Gerraty
Post by Jos Backus
Post by Simon J. Gerraty
You certainly cannot wait for all of it to arrive before you start
rendering.
Understood. This is why a serialization output format that supports
streaming data is useful.
Indeed; XML works fine for that.
Not to beat a dead horse, but so does YAML, and it's more
lightweight/less
Post by Baptiste Daroussin
Post by Jos Backus
verbose so I personally find it more elegant. But sure, XML would work as
well.
YAML is not more lightweight at all, it is really heavy to parse compared to
XML or JSON.
By lightweight I meant syntax verbosity, not computational load (although
it seems easy to emit). It's a more flexible format, and that comes with a
certain price. The question is whether that flexibility is needed or
useful. If JSON can't be used because of its limitations, I would
personally prefer the less verbose YAML over XML.

Jos.
Post by Baptiste Daroussin
regards,
Bapt
Baptiste Daroussin
2014-07-30 17:03:56 UTC
Permalink
Post by Jos Backus
Post by Baptiste Daroussin
Post by Jos Backus
Post by Simon J. Gerraty
Post by Jos Backus
Post by Simon J. Gerraty
You certainly cannot wait for all of it to arrive before you start
rendering.
Understood. This is why a serialization output format that supports
streaming data is useful.
Indeed; XML works fine for that.
Not to beat a dead horse, but so does YAML, and it's more
lightweight/less
Post by Baptiste Daroussin
Post by Jos Backus
verbose so I personally find it more elegant. But sure, XML would work
as
Post by Baptiste Daroussin
Post by Jos Backus
well.
YAML is not more lightweight at all, it is really heavy to parse compared
to
Post by Baptiste Daroussin
XML or JSON.
By lightweight I meant syntax verbosity, not computational load (although
it seems easy to emit). It's a more flexible format, and that comes with a
certain price. The question is whether that flexibility is needed or
useful. If JSON can't be used because of its limitations, I would
personally prefer the less verbose YAML over XML.
About JSON, what limitations are you talking about?

In YAML you have two syntaxes, one of which is inconsistent but user-friendly,
and the other of which is as ugly as XML, imho

this_is_string: treu
this__bool: true
so_if_i_want_a_string_true_i_need_quote: "true"

If I want to be consistent I need to use the canonical form of YAML:

---
!!map {
? !!str "so_if_i_want_a_string_true_i_need_quote"
: !!str "true",
? !!str "this__bool"
: !!bool "true",
? !!str "this_is_string"
: !!str "treu",
}

and now this is very very ugly :(

Plus YAML is context-dependent and whitespace-dependent, resulting in people getting lost about:
"Why is this YAML not valid":

hu: ha
hi:
- test
- test2

Or why this one is not valid either?

hu: ha
hi: test


I have been there with pkg(8); after being a huge supporter of YAML, I'm now more moderate :)

In the end, YAML was not machine-friendly at all, and very error-prone for humans :(

regards,
Bapt
Jos Backus
2014-07-30 18:48:59 UTC
Permalink
Post by Baptiste Daroussin
Post by Jos Backus
Post by Baptiste Daroussin
Post by Jos Backus
Post by Simon J. Gerraty
Post by Jos Backus
Post by Simon J. Gerraty
You certainly cannot wait for all of it to arrive before you start
rendering.
Understood. This is why a serialization output format that supports
streaming data is useful.
Indeed; XML works fine for that.
Not to beat a dead horse, but so does YAML, and it's more
lightweight/less
Post by Baptiste Daroussin
Post by Jos Backus
verbose so I personally find it more elegant. But sure, XML would work
as
Post by Baptiste Daroussin
Post by Jos Backus
well.
YAML is not more lightweight at all, it is really heavy to parse compared
to
Post by Baptiste Daroussin
XML or JSON.
By lightweight I meant syntax verbosity, not computational load (although
it seems easy to emit). It's a more flexible format, and that comes with a
certain price. The question is whether that flexibility is needed or
useful. If JSON can't be used because of its limitations, I would
personally prefer the less verbose YAML over XML.
About json what limitation are you talking about?
Several limitations have been mentioned: no support for comments, binary
data, streaming. YAML is a superset of JSON so there are things one can do
with YAML that one cannot do (easily) with JSON. The question is whether
those things matter enough.
Post by Baptiste Daroussin
In yaml you have 2 syntax, on which is inconsistent but user friendly and the
other which as ugly as XML imho
this_is_string: treu
this__bool: true
so_if_i_want_a_string_true_i_need_quote: "true"
---
!!map {
? !!str "so_if_i_want_a_string_true_i_need_quote"
: !!str "true",
? !!str "this__bool"
: !!bool "true",
? !!str "this_is_string"
: !!str "treu",
}
and now this is very very ugly :(
Cute, but rare. How often can one not use the easy format, and how often
does one want the string "true" rather than the Boolean value?
Post by Baptiste Daroussin
hu: ha
- test
- test2
Or why this one is not valid either?
hu: ha
hi: test
I have been there with pkg(8) after being a huge supporter of YAML I'm now more moderate :)
Sure, one has to apply some care with whitespace. This hasn't prevented
Python from becoming popular, so it must not be that big a deal, and I know
the Ruby community uses YAML effectively quite a bit.
Post by Baptiste Daroussin
YAML was not machine friendly at all in the end and very very error prone for humans :(
Granted, JSON is a simpler format.

Jos

P.S. I've said everything I planned to say, so I am going to move on from
this bikeshed now.
Post by Baptiste Daroussin
regards,
Bapt
Simon J. Gerraty
2014-07-29 23:19:19 UTC
Permalink
Post by Adrian Chadd
What really matters is whether the library API that you're going to
shoehorn into plenty of utilities is expressive enough to express a
whole bunch of different output types.
Exactly.
Post by Adrian Chadd
So it doesn't matter if you want JSON, or YAML, or XML, or the native
tool output. The trick is whether the library API is good enough.
Yes. If the API is sufficiently expressive you can add other
renderings if needed.

I think the proposed API meets that criterion - with the open issue of
wide-character support (or perhaps UTF-8) to be resolved.
John-Mark Gurney
2014-07-30 06:59:53 UTC
Permalink
Post by Adrian Chadd
Holy ... !
What really matters is whether the library API that you're going to
shoehorn into plenty of utilities is expressive enough to express a
whole bunch of different output types.
So it doesn't matter if you want JSON, or YAML, or XML, or the native
tool output. The trick is whether the library API is good enough.
Actually, it does... there are some things that you can express in XML
or other formats that you can't express in JSON... Like there is no binary
data type in JSON... You only have Unicode strings... I hope the
library that outputs JSON ensures that all strings are valid UTF-8...

If not, I can imagine that someone will have a security issue due
to this...
--
John-Mark Gurney Voice: +1 415 225 5579

"All that I will do, has been done, All that I have, has not."
Poul-Henning Kamp
2014-07-30 07:22:54 UTC
Permalink
--------

Some of you may recall that I did a keynote at EuroBSDcon 2010
called "Software Tools -- Mission Accomplished or Mission Failure ?"

I tried to do a status review of 40 years of UNIX and Software
Tools, in part inspired by what I saw as a "user" of the platform
while working on Varnish.

Historical analysis is useless if it doesn't point us into the
future, and that I did, concluding that we needed to move beyond
80 char wide ASCII text-files finishing my talk with this "ridiculous"
idea:

Solution: Change kernel & userland to understand
XML instead of flat ASCII.

grep --tag H3 ”crazy idea” index.html

My keynote doesn't seem to exist on the web (I'm pretty sure it was
video-taped ?) but I've dug out my slides:

http://phk.freebsd.dk/pubs/EuroBSDcon2010_SoftwareTools.pdf

In case anybody is interested in the deeper and wider perspective
on why libxo is long overdue.

Thumbs up for the people finally realizing it.

Poul-Henning
--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
***@FreeBSD.ORG | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
Stephen Hurd
2014-07-29 05:56:38 UTC
Permalink
Post by Jordan Hubbard
Is there some reason JSON is not sufficient? I think that’s a better question to ask, since the conversation otherwise quickly tends to sound a little like “I’ll accept any single unified format as long as it’s the specific one I like!” :) I think the greater good argument would suggest just picking one that’s expressive enough (roll a pair of dice), put on your bikeshed-proof sunglasses, and proceed.
The biggest problem I tend to have with JSON is that there is no comment
format. I rarely update a configuration without adding a comment
regarding it, and since JSON doesn't have comments, it's simply a
non-starter for most of my usage. The whole "just a key that you know
isn't real" hack is terrible.
Pietro Cerutti
2014-07-29 07:56:40 UTC
Permalink
Post by Stephen Hurd
Is there some reason JSON is not sufficient? I think that’s a better question to ask, since the conversation otherwise quickly tends to sound a little like “I’ll accept any single unified format as long as it’s the specific one I like!” :) I think the greater good argument would suggest just picking one that’s expressive enough (roll a pair of dice), put on your bikeshed-proof sunglasses, and proceed.
The biggest problem I tend to have with JSON is that there is no comment
format. I rarely update a configuration without adding a comment
regarding it, and since JSON doesn't have comments, it's simply a
non-starter for most of my usage. The whole "just a key that you know
isn't real" hack is terrible.
There's an interesting post about whether this really is a problem:

https://groups.yahoo.com/neo/groups/json/conversations/topics/156

I'm not saying it isn't, just that there are reasons why comments are
not part of JSON.
--
Pietro Cerutti
The FreeBSD Project
***@FreeBSD.org

PGP Public Key:
http://gahr.ch/pgp
Stephen Hurd
2014-07-29 08:45:24 UTC
Permalink
Post by Pietro Cerutti
Post by Stephen Hurd
Post by Jordan Hubbard
Is there some reason JSON is not sufficient? I think that’s a better question to ask, since the conversation otherwise quickly tends to sound a little like “I’ll accept any single unified format as long as it’s the specific one I like!” :) I think the greater good argument would suggest just picking one that’s expressive enough (roll a pair of dice), put on your bikeshed-proof sunglasses, and proceed.
The biggest problem I tend to have with JSON is that there is no comment
format. I rarely update a configuration without adding a comment
regarding it, and since JSON doesn't have comments, it's simply a
non-starter for most of my usage. The whole "just a key that you know
isn't real" hack is terrible.
https://groups.yahoo.com/neo/groups/json/conversations/topics/156
I'm not saying it isn't, just that there are reasons why comments are
not part of JSON.
Yep, program to program formats obviously don't need comments, comments
are for something that a human is modifying on an infrequent basis. But
invariably, a serialization format designed to be human readable ends up
getting used as a configuration format... and my config files need
comments to ensure that future me hasn't forgotten the lessons that
present me has learnt.

As for comments being "the most difficult feature to support", I find
that hard to believe... I don't see how it could possibly be more
difficult than string support with escaping - especially for the "to end
of line" style comment. And "removing comments aligns JSON more
closely with YAML" could be trivially solved by supporting YAML-style
comments.

The first reason is fine, and while I don't agree with the fourth
reason, I can accept it. But really, the ship has sailed, JSON doesn't
have comments, and that's the biggest problem I have with JSON.
Simon J. Gerraty
2014-07-29 23:03:45 UTC
Permalink
I'm a huge fan of unified data formats; Apple picked XML and the plist DT...
We did too, and it has indeed been useful.
Being able to render rich html has also proven very cool given the
improvement in browsers in the last several years.

The point I was making earlier (perhaps not very well) was that the api
Phil has proposed provides enough clue to allow outputting plain text as
well as that rich html. IIRC the main wrinkle json imposes is a need for
extra structure calls - due to the way lists (I think) are handled,
anyway if the api can handle json and html it should be possible to add
rendering for others should that prove necessary one day - hopefully
without having to revisit any of the apps.
Alfred Perlstein
2014-07-30 02:12:37 UTC
Permalink
Post by Simon J. Gerraty
I'm a huge fan of unified data formats; Apple picked XML and the plist DT...
We did too, and it has indeed been useful.
Being able to render rich html has also proven very cool given the
improvement in browsers in the last several years.
The point I was making earlier (perhaps not very well) was that the api
Phil has proposed provides enough clue to allow outputting plain text as
well as that rich html. IIRC the main wrinkle json imposes is a need for
extra structure calls - due to the way lists (I think) are handled,
anyway if the api can handle json and html it should be possible to add
rendering for others should that prove necessary one day - hopefully
without having to revisit any of the apps.
Is JSON not handled?

How many utils are converted over at this point?

Have you seen the GSOC project which aims to do this as well? "machine
readable output from userland utilities" ->
https://www.google-melange.com/gsoc/project/details/google/gsoc2014/zarko_korchev/5676830073815040

-Alfred
Garance A Drosehn
2014-07-30 03:19:39 UTC
Permalink
Post by Alfred Perlstein
How many utils are converted over at this point?
Have you seen the GSOC project which aims to do this as well?
"machine readable output from userland utilities" ->
https://www.google-melange.com/gsoc/project/details/google/gsoc2014/zarko_korchev/5676830073815040
-Alfred
FWIW, I've been experimenting a bit with something like this for 'lpq'
output, although I'm doing it in a script outside of 'lpq'. Which is
to say, I haven't tried modifying the code of lpq itself because that's
a real mess to modify due to the way lpq builds each line that it
prints.

The above URL is just a pointer to the project listing at gsoc. The
wiki page might also be more informative:

https://wiki.freebsd.org/SummerOfCode2014/MachineReadableFromUserlandUtils

although I must admit I don't know how to check what source changes have
been made so far starting from that wiki entry.

One question that comes up is how to organize the data in the output the
command creates. For instance, my experiment is just trying to mimic a
rather inefficient process that was implemented by some student consultants,
and I wouldn't have organized the JSON object the way they did. I wouldn't
have picked the same names for keys, for instance. But there are some more
complicated issues which come up, due to the fact that one 'lpq' command
can be showing data from multiple processes which are running on multiple
hosts, and 'lpq' on the current host can't depend on getting JSON output
from the other hosts or processes that it gets information from.

Given the way lpq's source is organized, I'm also thinking that maybe
it'd be better to do this as a separate command, maybe something like
'lpserialize'.

I personally don't expect unix commands to output HTML, and I find XML
a bit unwieldy to work with. But in my own bikeshed I'm interested in
formats of json, yaml, and edn (from the world of Clojure).

In any case, I am interested to see how the GSOC project works out, or
whatever happens with this libxo project.
--
Garance Alistair Drosehn = ***@rpi.edu
Senior Systems Programmer or ***@FreeBSD.org
Rensselaer Polytechnic Institute; Troy, NY; USA
Garance A Drosehn
2014-07-30 03:40:14 UTC
Permalink
Post by Garance A Drosehn
In any case, I am interested to see how the GSOC project works out, or
whatever happens with this libxo project.
I also wanted to highlight an issue that Simon brought up in the initial
message for this thread:

"The main open issue (assuming this functionality is still desired)
is support of wide characters.

We figure the worst case solution is a sed(1) script to generate
the wide version of the API from the normal one, but perhaps simply
always using UTF8 would be a better solution?"

In my own experiments I've pretended that all the output 'lpq' generates
is simple ascii, although that's not necessarily true. The filename
field is set by whatever generated the output, which might be a PC or Mac
which may or may not be unicode-aware. The field might be gibberish (aka
"invalid unicode"). Obviously "pretending" is the wrong way to handle
this issue.
--
Garance Alistair Drosehn = ***@rpi.edu
Senior Systems Programmer or ***@FreeBSD.org
Rensselaer Polytechnic Institute; Troy, NY; USA
Simon J. Gerraty
2014-07-30 05:54:30 UTC
Permalink
[+phil who somehow got dropped]
Post by Garance A Drosehn
I also wanted to highlight an issue that Simon brought up in the initial
"The main open issue (assuming this functionality is still desired)
is support of wide characters.
We figure the worst case solution is a sed(1) script to generate
the wide version of the API from the normal one, but perhaps simply
always using UTF8 would be a better solution?"
Thanks, good not to lose sight of that in all the color discussions ;-)
Post by Garance A Drosehn
In my own experiments I've pretended that all the output 'lpq' generates
is simple ascii, although that's not necessarily true. The filename
field is set by whatever generated the output, which might be a PC or Mac
which may or may not be unicode-aware. The field might be gibberish (aka
"invalid unicode"). Obviously "pretending" is the wrong way to handle
this issue.
Indeed. UTF-8 has its attractions since i18n cannot simply be ignored in
this case.
Simon J. Gerraty
2014-07-30 05:41:38 UTC
Permalink
Post by Garance A Drosehn
One question that comes up is how to organize the data in the output the
command creates. For instance, my experiment is just trying to mimic a
Good point and frankly a good reason to have the work done by someone
who's been doing exactly that sort of thing (successfully) for well over
a decade.
Post by Garance A Drosehn
I personally don't expect unix commands to output HTML, and I find XML a
bit
unwieldy to work with. But in my own bikeshed I'm interested in formats
Both are horrendous formats ;-) but there are abundant tools to work with
them, and json too I believe.

A modern browser with off-the-shelf javascript/json "thingies" (you can
tell I'm not a web developer ;-) can allow some very slick stuff with
the sort of info we can output from our UI - and this lib lets any app
do the same thing.

--sjg
Simon J. Gerraty
2014-07-30 05:34:46 UTC
Permalink
Post by Alfred Perlstein
Is JSON not handled?
See the start of the thread.
TXT,XML,JSON and HTML are all handled.
I assumed (and Phil confirmed) that YAML *could* be added,
but no one has shown a use-case for it yet.
The point is the API is rich enough to cover it.
Post by Alfred Perlstein
How many utils are converted over at this point?
Only 'w' - as a demo - again see the start of the thread.
We wanted to get buy-in before introducing churn.
Post by Alfred Perlstein
Have you seen the GSOC project which aims to do this as well? "machine
I thought the GSOC projects were not "approved"?
Post by Alfred Perlstein
readable output from userland utilities" ->
https://www.google-melange.com/gsoc/project/details/google/gsoc2014/zarko_korc
hev/5676830073815040
I took another look - where are the details?
Alfred Perlstein
2014-07-30 19:18:13 UTC
Permalink
The goal of a GSOC project is to get the code into FreeBSD.

The code can be seen here:
https://socsvn.freebsd.org/socsvn/soc2014/zkorchev/

Since Juniper has many years of experience AND the GSOC project has
many, many utils converted I'm suggesting that Juniper engage in the
review process and help us get the best of both worlds in.

Simon, can you help us with the review and make suggestions on what
needs to be changed/augmented to get the best of both efforts in?

The details for the code are here:
https://socsvn.freebsd.org/socsvn/soc2014/zkorchev/

You should be able to do an svn checkout and then get diffs to see what
is going on. If you require any assistance please let me know.

-Alfred
Post by Simon J. Gerraty
Post by Alfred Perlstein
Is JSON not handled?
See the start of the thread.
TXT,XML,JSON and HTML are all handled.
I assumed (and Phil confirmed) that YAML *could* be added,
but no one has shown a use-case for it yet.
The point is the API is rich enough to cover it.
Post by Alfred Perlstein
How many utils are converted over at this point?
Only 'w' - as a demo - again see the start of the thread.
We wanted to get buy-in before introducing churn.
Post by Alfred Perlstein
Have you seen the GSOC project which aims to do this as well? "machine
I thought the GSOC projects were not "approved"?
Post by Alfred Perlstein
readable output from userland utilities" ->
https://www.google-melange.com/gsoc/project/details/google/gsoc2014/zarko_korc
hev/5676830073815040
I took another look - where are the details?
_______________________________________________
http://lists.freebsd.org/mailman/listinfo/freebsd-arch
Simon J. Gerraty
2014-07-30 19:45:22 UTC
Permalink
Post by Alfred Perlstein
The goal of a GSOC project is to get the code into FreeBSD.
https://socsvn.freebsd.org/socsvn/soc2014/zkorchev/
Since Juniper has many years of experience AND the GSOC project has
many, many utils converted I'm suggesting that Juniper engage in the
review process and help us get the best of both worlds in.
That would of course depend on whether we like what has been done ;-)
The library and API are more important than how many apps have been
converted.

Will take a look.
Alfred Perlstein
2014-07-30 21:29:43 UTC
Permalink
Post by Simon J. Gerraty
Post by Alfred Perlstein
The goal of a GSOC project is to get the code into FreeBSD.
https://socsvn.freebsd.org/socsvn/soc2014/zkorchev/
Since Juniper has many years of experience AND the GSOC project has
many, many utils converted I'm suggesting that Juniper engage in the
review process and help us get the best of both worlds in.
That would of course depend on whether we like what has been done ;-)
The library and API are more important than how many apps have been
converted.
Will take a look.
Cool, here is a list of what has been done already, it's quite a bit.

~ % svn diff -r268704
https://socsvn.freebsd.org/socsvn/soc2014/zkorchev/ | grep ^Index
Index: freebsd_head/usr.bin/du/du.c
Index: freebsd_head/usr.bin/du/Makefile
Index: freebsd_head/usr.bin/netstat/flowtable.c
Index: freebsd_head/usr.bin/netstat/unix.c
Index: freebsd_head/usr.bin/netstat/main.c
Index: freebsd_head/usr.bin/netstat/inet6.c
Index: freebsd_head/usr.bin/netstat/netstat.h
Index: freebsd_head/usr.bin/netstat/mbuf.c
Index: freebsd_head/usr.bin/netstat/Makefile
Index: freebsd_head/usr.bin/netstat/route.c
Index: freebsd_head/usr.bin/netstat/if.c
Index: freebsd_head/usr.bin/netstat/inet.c
Index: freebsd_head/usr.bin/netstat/mroute6.c
Index: freebsd_head/usr.bin/netstat/netisr.c
Index: freebsd_head/usr.bin/netstat/bpf.c
Index: freebsd_head/usr.bin/netstat/mroute.c
Index: freebsd_head/usr.bin/wc/wc.c
Index: freebsd_head/usr.bin/wc/Makefile
Index: freebsd_head/usr.bin/last/last.c
Index: freebsd_head/usr.bin/last/Makefile
Index: freebsd_head/usr.bin/sockstat/sockstat.c
Index: freebsd_head/usr.bin/sockstat/Makefile
Index: freebsd_head/usr.bin/w/w.c
Index: freebsd_head/usr.bin/w/extern.h
Index: freebsd_head/usr.bin/w/Makefile
Index: freebsd_head/usr.bin/w/pr_time.c
Index: freebsd_head/usr.bin/finger/finger.c
Index: freebsd_head/usr.bin/finger/Makefile
Index: freebsd_head/usr.bin/finger/finger.h
Index: freebsd_head/usr.bin/finger/sprint.c
Index: freebsd_head/usr.bin/fstat/Makefile
Index: freebsd_head/usr.bin/fstat/fstat.c
Index: freebsd_head/usr.bin/procstat/procstat_kstack.c
Index: freebsd_head/usr.bin/procstat/procstat.c
Index: freebsd_head/usr.bin/procstat/procstat_basic.c
Index: freebsd_head/usr.bin/procstat/procstat_vm.c
Index: freebsd_head/usr.bin/procstat/procstat_auxv.c
Index: freebsd_head/usr.bin/procstat/procstat.h
Index: freebsd_head/usr.bin/procstat/procstat_rusage.c
Index: freebsd_head/usr.bin/procstat/procstat_threads.c
Index: freebsd_head/usr.bin/procstat/procstat_args.c
Index: freebsd_head/usr.bin/procstat/procstat_rlimit.c
Index: freebsd_head/usr.bin/procstat/procstat_files.c
Index: freebsd_head/usr.bin/procstat/procstat_sigs.c
Index: freebsd_head/usr.bin/procstat/procstat_bin.c
Index: freebsd_head/usr.bin/procstat/procstat_cred.c
Index: freebsd_head/usr.bin/procstat/Makefile
Index: freebsd_head/usr.bin/vmstat/Makefile
Index: freebsd_head/usr.bin/vmstat/vmstat.c
Index: freebsd_head/sbin/sysctl/sysctl.c
Index: freebsd_head/sbin/sysctl/Makefile
Index: freebsd_head/sbin/ifconfig/ifconfig.h
Index: freebsd_head/sbin/ifconfig/af_inet6.c
Index: freebsd_head/sbin/ifconfig/Makefile
Index: freebsd_head/sbin/ifconfig/af_nd6.c
Index: freebsd_head/sbin/ifconfig/ifmedia.c
Index: freebsd_head/sbin/ifconfig/af_link.c
Index: freebsd_head/sbin/ifconfig/af_inet.c
Index: freebsd_head/sbin/ifconfig/ifmac.c
Index: freebsd_head/sbin/ifconfig/ifconfig.c
Index: freebsd_head/sbin/ifconfig/carp.c
Index: freebsd_head/usr.sbin/iostat/Makefile
Index: freebsd_head/usr.sbin/iostat/iostat.c
Index: freebsd_head/lib/libsol/sol.c
Index: freebsd_head/lib/libsol/sol.h
Index: freebsd_head/lib/libsol/Makefile
Index: freebsd_head/bin/ls/ls.h
Index: freebsd_head/bin/ls/extern.h
Index: freebsd_head/bin/ls/Makefile
Index: freebsd_head/bin/ls/print.c
Index: freebsd_head/bin/ls/util.c
Index: freebsd_head/bin/ls/ls.c
~ %
Simon J. Gerraty
2014-07-30 23:51:23 UTC
Permalink
Post by Simon J. Gerraty
Post by Alfred Perlstein
https://socsvn.freebsd.org/socsvn/soc2014/zkorchev/
Since Juniper has many years of experience AND the GSOC project has
many, many utils converted I'm suggesting that Juniper engage in the
review process and help us get the best of both worlds in.
That would of course depend on whether we like what has been done ;-)
The library and API are more important than how many apps have been
converted.
Ok I took a look...

There's no nice way to say it I'm afraid;
This looks like the same sort of approach that Juniper used
over a decade ago for the BSD apps to get XML out, and which we felt was
just too ugly to upstream; I don't see how we could support this.

Unfortunately it took me a couple of years to get a few hours of Phil's
time, to come up with a neat API to avoid that #ifdef and if/else mess.
On the plus side that means it covers a lot more than just XML.

We now have an API and library that can avoid the need to double the
cost of adding new output to apps, ie you only write to libxo's api
and it does plain TXT for you if needed.

Our only concern was how best to address the wide-char issue. John-Mark
Gurney strongly favours just doing UTF-8, and I believe Marcel and
Phil would tend to agree (since AFAICT it has little overhead for ASCII).

Based on this thread, it would seem there is still demand for this
functionality so I think we should proceed with putting libxo into our
internal tree and generate some diffs for folk to review...


Thanks
--sjg
Marcel Moolenaar
2014-07-31 00:09:33 UTC
Permalink
Post by Simon J. Gerraty
Our only concern was how best to address the wide-char issue John-Mark
Gurney strongly favours just doing UTF-8, I and I believe Marcel and
Phil would tend to agree (since AFAICT it has little overhead for ASCII)
Yes. I would even be ok with only having wide-character
support. I do tend to agree with Phil that the standard
C functions that work on wide characters are more of an
eyesore than the "regular" C functions.
--
Marcel Moolenaar
***@xcllnt.net
Marcel Moolenaar
2014-07-31 00:25:09 UTC
Permalink
Post by Simon J. Gerraty
This looks like the same sort of approach that Juniper used a
over a decade ago for the BSD apps to get XML out, and which we felt was
just too ugly to upstream, I don't see how we could support this.
Unfortunately it took me a couple of years to get a few hours of Phil's
time, to come up with a neat API to avoid that #ifdef and if/else mess.
On the plus side that means it covers a lot more than just XML.
We now have an API and library that can avoid the need to double the
cost of adding new output to apps, ie you only write to libxo's api
and it does plain TXT for you if needed.
Something that any implementation should handle is that
warn(3) and err(3) functions should do the right thing.
More precisely: I would expect that the output of a
utility when emitting some ML handles both stdout and
stderr consistently.

Also: I would hope that if we emit XML (for example),
we can do so according to a well-defined schema.

The GSoC approach seems less likely to have a good
solution for those than libxo has.
--
Marcel Moolenaar
***@xcllnt.net
Phil Shafer
2014-07-31 09:07:26 UTC
Permalink
Post by Simon J. Gerraty
We now have an API and library that can avoid the need to double the
cost of adding new output to apps, ie you only write to libxo's api
and it does plain TXT for you if needed.
I'm appending another example, using "wc".

I've added the "xo" equivalent of warn/warnx/err/errx/etc. Normally,
they'll make the expected noise on stderr, but with the XO_WARN_XML
flag, they generate XML content on the stdout (or whatever xo_handle_t
is in use). I don't have a JSON equivalent because, well, I don't
know what the json equivalent would look like.

There's also a xo_attr function that will add an attribute to
the next element emitted, so one can:

xo_attr("seconds", stop - start);
xo_emit("{:time}", fancy_time(stop - start));

to make:

<time seconds="600">10 minutes</time>

There's also the "printf(1)" equivalent command "xo" for use in
shell scripts:

% xo --wrap response/data 'The {:animal} is {:color} and {:mood}\n' bear brown angry
The bear is brown and angry
% xo -X -p --wrap response/data 'The {:animal} is {:color} and {:mood}\n' bear brown angry
<response>
<data>
<animal>bear</animal>
<color>brown</color>
<mood>angry</mood>
</data>
</response>

The code is in github:
https://github.com/Juniper/libxo
HTML docs are here:
http://juniper.github.io/libxo/libxo-manual.html
Text/source docs are here:
https://raw.githubusercontent.com/Juniper/libxo/master/doc/libxo.txt

I'm using the common master/develop branching scheme, so "master"
should be stable, while "develop" is where I'm working.

Simon said there's a github mirror for freebsd, so I may fork that
and add the patches for the various utilities I'm working thru.

Thanks,
Phil

-----------

diff -rbu /usr/src/usr.bin/wc/wc.c ./wc.c
--- /usr/src/usr.bin/wc/wc.c 2010-12-21 12:09:25.000000000 -0500
+++ ./wc.c 2014-07-31 04:20:15.000000000 -0400
@@ -61,6 +61,7 @@
#include <unistd.h>
#include <wchar.h>
#include <wctype.h>
+#include <libxo/libxo.h>

uintmax_t tlinect, twordct, tcharct, tlongline;
int doline, doword, dochar, domulti, dolongline;
@@ -105,33 +106,45 @@
if (doline + doword + dochar + domulti + dolongline == 0)
doline = doword = dochar = 1;

+ xo_open_container("wc");
+ xo_open_list("file");
errors = 0;
total = 0;
if (!*argv) {
+ xo_open_instance("file");
if (cnt((char *)NULL) != 0)
++errors;
else
- (void)printf("\n");
+ xo_emit("\n");
+ xo_close_instance("file");
}
else do {
+ xo_open_instance("file");
+ xo_emit(" {ek:filename/%s}\n", *argv);
if (cnt(*argv) != 0)
++errors;
else
- (void)printf(" %s\n", *argv);
+ xo_emit(" {d:filename/%s}\n", *argv);
+ xo_close_instance("file");
++total;
} while(*++argv);
+ xo_close_list("file");

if (total > 1) {
+ xo_open_container("total");
if (doline)
- (void)printf(" %7ju", tlinect);
+ xo_emit(" {:lines/%7ju/%ju}", tlinect);
if (doword)
- (void)printf(" %7ju", twordct);
+ xo_emit(" {:words/%7ju/%ju}", twordct);
if (dochar || domulti)
- (void)printf(" %7ju", tcharct);
+ xo_emit(" {:characters/%7ju/%ju}", tcharct);
if (dolongline)
- (void)printf(" %7ju", tlongline);
- (void)printf(" total\n");
+ xo_emit(" {:long-lines/%7ju/%ju}", tlongline);
+ xo_emit(" total\n");
+ xo_close_container("total");
}
+ xo_close_container("wc");
+ xo_flush();
exit(errors == 0 ? 0 : 1);
}

@@ -154,7 +167,7 @@
fd = STDIN_FILENO;
} else {
if ((fd = open(file, O_RDONLY, 0)) < 0) {
- warn("%s: open", file);
+ xo_warn("%s: open", file);
return (1);
}
if (doword || (domulti && MB_CUR_MAX != 1))
@@ -167,7 +180,7 @@
if (doline) {
while ((len = read(fd, buf, MAXBSIZE))) {
if (len == -1) {
- warn("%s: read", file);
+ xo_warn("%s: read", file);
(void)close(fd);
return (1);
}
@@ -182,15 +195,15 @@
tmpll++;
}
tlinect += linect;
- (void)printf(" %7ju", linect);
+ xo_emit(" {:lines/%7ju/%ju}", linect);
if (dochar) {
tcharct += charct;
- (void)printf(" %7ju", charct);
+ xo_emit(" {:characters/%7ju/%ju}", charct);
}
if (dolongline) {
if (llct > tlongline)
tlongline = llct;
- (void)printf(" %7ju", tlongline);
+ xo_emit(" {:long-lines/%7ju/%ju}", tlongline);
}
(void)close(fd);
return (0);
@@ -201,12 +214,13 @@
*/
if (dochar || domulti) {
if (fstat(fd, &sb)) {
- warn("%s: fstat", file);
+ xo_warn("%s: fstat", file);
(void)close(fd);
return (1);
}
if (S_ISREG(sb.st_mode)) {
- (void)printf(" %7lld", (long long)sb.st_size);
+ xo_emit(" {:characters/%7lld/%lld}",
+ (long long)sb.st_size);
tcharct += sb.st_size;
(void)close(fd);
return (0);
@@ -220,7 +234,7 @@
memset(&mbs, 0, sizeof(mbs));
while ((len = read(fd, buf, MAXBSIZE)) != 0) {
if (len == -1) {
- warn("%s: read", file);
+ xo_warn("%s: read", file);
(void)close(fd);
return (1);
}
@@ -233,7 +247,7 @@
(size_t)-1) {
if (!warned) {
errno = EILSEQ;
- warn("%s", file);
+ xo_warn("%s", file);
warned = 1;
}
memset(&mbs, 0, sizeof(mbs));
@@ -264,23 +278,23 @@
}
if (domulti && MB_CUR_MAX > 1)
if (mbrtowc(NULL, NULL, 0, &mbs) == (size_t)-1 && !warned)
- warn("%s", file);
+ xo_warn("%s", file);
if (doline) {
tlinect += linect;
- (void)printf(" %7ju", linect);
+ xo_emit(" {:lines/%7ju/%ju}", linect);
}
if (doword) {
twordct += wordct;
- (void)printf(" %7ju", wordct);
+ xo_emit(" {:words/%7ju/%ju}", wordct);
}
if (dochar || domulti) {
tcharct += charct;
- (void)printf(" %7ju", charct);
+ xo_emit(" {:characters/%7ju/%ju}", charct);
}
if (dolongline) {
if (llct > tlongline)
tlongline = llct;
- (void)printf(" %7ju", llct);
+ xo_emit(" {:long-lines/%7ju/%ju}", llct);
}
(void)close(fd);
return (0);
@@ -289,6 +303,6 @@
static void
usage()
{
- (void)fprintf(stderr, "usage: wc [-Lclmw] [file ...]\n");
+ xo_error("usage: wc [-Lclmw] [file ...]\n");
exit(1);
}
Garance A Drosehn
2014-07-31 00:50:37 UTC
Permalink
Post by Alfred Perlstein
The goal of a GSOC project is to get the code into FreeBSD.
https://socsvn.freebsd.org/socsvn/soc2014/zkorchev/
[...skip...]
Post by Alfred Perlstein
https://socsvn.freebsd.org/socsvn/soc2014/zkorchev/
You should be able to do an svn checkout and then get diffs
to see what is going on. If you require any assistance please
let me know.
Those two URLs look extremely similar to me. Was the second one
supposed to point to some other page?

I haven't taken the time to check out the tree and skim through all
the changes, but I looked at a few specific files.

In .../lib/libsol/sol.c, it looks like the only format implemented
so far is JSON. Is that true?

Some dumb questions I should probably be able to figure out for myself:
Where is SOL_JSON defined? sol.c includes sol.h, but sol.h does not
seem to define that value. Also, sol.h includes yajl/yajl_gen.h, but I
don't see where that file comes from.

I looked at .../usr.bin/du/du.c just to see a simple example. I notice
the '#if defined(SOL_ON)'. I assume that's just meant for the initial
debugging, so that one could turn off all the SOL support if it was
suspected of causing some problem. Is that expected to stay in the code
once the code goes into production? If so, I'd rather see it as
'if (SOL_ON) { ... }', so that it's proper C code which the compiler
would optimize away if SOL_ON was defined as FALSE. As a general rule I
prefer to have the compiler *always* compiling&checking that code, even
if some of it ends up producing no object code because SOL_ON is false.
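A minimal sketch of that idiom; SOL_ON here is a hypothetical macro standing in for whatever the GSoC code actually defines, and the output is invented:

```c
#include <stdio.h>

/* Hypothetical build knob; define to 1 to enable serialized output. */
#ifndef SOL_ON
#define SOL_ON 0
#endif

/* With "if (SOL_ON)" rather than "#if defined(SOL_ON)", the compiler
 * always parses and type-checks the serialization branch, then drops
 * it as dead code when SOL_ON is 0 - so bit-rot in the disabled path
 * is caught at every build. */
int
emit_count(int count)
{
	if (SOL_ON) {
		printf("{\"count\": %d}\n", count);	/* serialized path */
		return 1;
	}
	printf("%d\n", count);				/* plain-text path */
	return 0;
}
```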
--
Garance Alistair Drosehn = ***@rpi.edu
Senior Systems Programmer or ***@FreeBSD.org
Rensselaer Polytechnic Institute; Troy, NY; USA
John Baldwin
2014-08-11 15:54:17 UTC
Permalink
Post by Alfred Perlstein
The goal of a GSOC project is to get the code into FreeBSD.
Just to respond to this point: not necessarily. I would say the primary goal
is to build relationships with the students and recruit new developers. If we
get useful code out of the process as well, that's great, but it isn't the
primary goal. In addition, doing a GSOC project isn't a guarantee of getting
code into the tree (even though it may often work out that way), and we should
avoid marketing it as such.
--
John Baldwin
John-Mark Gurney
2014-07-30 19:38:19 UTC
Permalink
Post by Simon Gerraty
The main open issue (assuming this functionality is still desired) is
support of wide characters.
We figure the worst case solution is a sed(1) script to generate the wide
version of the API from the normal one, but perhaps simply always using
UTF8 would be a better solution?
My vote would be to use and *enforce* UTF-8 by the API. That means if
someone passes a string in, it must be properly formed UTF-8... This
doesn't prevent a wide char API from also being present if someone
really wants to add it..

I would also like to see support for binary data, but as I mentioned,
that would prevent usage from outputing JSON... Yes, you can hex encode
(why doesn't libc include a hex code/decode function) it, but then it
becomes harder to autodetect typing..
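For what it's worth, the hex encoding mentioned above is trivial to sketch; this is illustrative code, not a libc or libxo API:

```c
#include <stddef.h>

/* Encode len bytes of binary data as lowercase hex.  out must have
 * room for 2*len + 1 bytes (including the NUL terminator). */
static void
hex_encode(const unsigned char *buf, size_t len, char *out)
{
	static const char hex[] = "0123456789abcdef";
	size_t i;

	for (i = 0; i < len; i++) {
		out[2 * i]     = hex[buf[i] >> 4];
		out[2 * i + 1] = hex[buf[i] & 0x0f];
	}
	out[2 * len] = '\0';
}
```

The encoding itself is easy; as noted above, the cost is that the result is just another string, so a consumer can no longer tell binary fields from text by type.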
--
John-Mark Gurney Voice: +1 415 225 5579

"All that I will do, has been done, All that I have, has not."
Phil Shafer
2014-07-30 23:24:11 UTC
Permalink
Post by John-Mark Gurney
My vote would be to use and *enforce* UTF-8 by the API. That means if
someone passes a string in, it must be properly formed UTF-8...
I can certainly see making this an option, detecting the high-bit
and inspecting the following 1-5 bytes to ensure the corresponding
high two bits are set appropriately. But what action would you
expect the library to take when invalid strings are passed in?
libxo supports a warning flag, that will trigger warnings on stderr
for things like invalid or malformed format strings, but I'm not
sure I'd be happy if the library skipped invalid strings.
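The check described above might look roughly like this; a sketch only, which accepts the RFC 3629 forms of up to 4 bytes, does not reject overlong encodings or surrogates, and deliberately says nothing about what the library should then do with a bad string:

```c
#include <stddef.h>

/* Return 1 if s[0..len) is structurally well-formed UTF-8, else 0:
 * each lead byte's high bits must announce the right number of
 * continuation bytes, and each continuation byte must be 10xxxxxx. */
static int
utf8_valid(const unsigned char *s, size_t len)
{
	size_t i = 0;

	while (i < len) {
		unsigned char c = s[i++];
		int extra;

		if (c < 0x80)
			continue;			/* plain ASCII */
		else if ((c & 0xe0) == 0xc0)
			extra = 1;			/* 2-byte sequence */
		else if ((c & 0xf0) == 0xe0)
			extra = 2;			/* 3-byte sequence */
		else if ((c & 0xf8) == 0xf0)
			extra = 3;			/* 4-byte sequence */
		else
			return 0;			/* invalid lead byte */

		while (extra-- > 0)
			if (i >= len || (s[i++] & 0xc0) != 0x80)
				return 0;	/* truncated or bad continuation */
	}
	return 1;
}
```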

BTW, this issue is driven by "w"s use of wide characters (for
days of the week).

Thanks,
Phil
Poul-Henning Kamp
2014-07-31 08:44:03 UTC
Permalink
--------
Post by John-Mark Gurney
My vote would be to use and *enforce* UTF-8 by the API. That means if
someone passes a string in, it must be properly formed UTF-8...
Given that libxo is just now starting to stare at trunk and that
the next .0 release is some way off still, the consequences of
enforcing UTF-8 will not be seen by users at large until mid 2015
or so.

The real question is how it will be perceived if we are *not* actively
going UTF-8 by that time?
--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
***@FreeBSD.ORG | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
Phil Shafer
2014-07-31 15:31:37 UTC
Permalink
Post by Poul-Henning Kamp
The real question is how it will be perceived if we are *not* actively
going UTF-8 by that time?
And moving toward UTF-8 won't be simple. I just tossed a couple
of files from http://www.cl.cam.ac.uk/~mgk25/ucs/examples/ into a
few apps that are supposed to support UTF-8 (emacs, vim, firefox)
and the results were underwhelming. Skimming the description of
xterm support for UTF-8 gives some appreciation for how complex
this crap is (http://www.cl.cam.ac.uk/~mgk25/unicode.html#xterm).

That said, I'm a bit tired of explaining to my kids why I'm typing
in black and white terminal windows in the age of HTML5 ;^)

Thanks,
Phil
John-Mark Gurney
2014-07-31 20:50:54 UTC
Permalink
Post by Phil Shafer
Post by Poul-Henning Kamp
The real question is how it will be perceived if we are *not* actively
going UTF-8 by that time?
And moving toward UTF-8 won't be simple. I just tossed a couple
If we don't start, we won't ever move forward...
Post by Phil Shafer
of file from http://www.cl.cam.ac.uk/~mgk25/ucs/examples/ into a
few apps that are supposed to support UTF-8 (emacs, vim, firefox)
Why not nvi? :) I just tried out nvi w/ UTF-8-demo.txt, and was
surprised that it worked well.. there are issues w/ combining
characters, both stargate and Thai, but that's an issue w/ nvi
escaping those characters instead of displaying them.. my terminal
was Terminal.app from MacOSX...

less and more appear to work, and handle the combining characters
properly...

vt works, but we need much better font support.. The default font
is missing lots of math, linguistic, APL, georgian, Thai, Amharic,
runes (though MacOS's font I'm using misses these too), Braille
(this should be easy for anyone to add), and I believe Japanese...
Post by Phil Shafer
and the results were underwhelming. Skimming the description of
xterm support for UTF-8 gives some appreciation for how complex
this crap is (http://www.cl.cam.ac.uk/~mgk25/unicode.html#xterm).
I must say, from my brief test, I'm surprised it worked as well as
it did... :)
--
John-Mark Gurney Voice: +1 415 225 5579

"All that I will do, has been done, All that I have, has not."
Poul-Henning Kamp
2014-07-31 21:01:21 UTC
Permalink
--------
Post by John-Mark Gurney
Why not nvi? :) I just tried out nvi w/ UTF-8-demo.txt, and was
surprised that it worked well..
Having seen peter brave the wemm-field and go utf-8 I recently
converted too and most stuff seems to work without trouble.

What doesn't work with nvi is that it *recently* started mangling
files with invalid byte sequences no matter what charset you
set it for except 'C'.

I write "recently" because I'm quite sure it started in the last
year or so; before that it did the right thing.

You do get a warning on reading in the file if there are conversion
issues, but I really think nvi should be able to edit any file, no
matter what charset you have set for it, and invalid byte sequences
should just show up as hex-expansions like they do in 'C' mode.
--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
***@FreeBSD.ORG | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
Jordan Hubbard
2014-07-31 21:00:41 UTC
Permalink
Post by John-Mark Gurney
Post by Phil Shafer
And moving toward UTF-8 won't be simple. I just tossed a couple
If we don't start, we won't ever move forward...
Amen. If you look at $LANG on an OS X box, it’s been set to en_US.UTF-8 for awhile, and that wasn’t without pain. We had to deal with various performance regressions (8x hit to grep(1) over ISO-Latin1!) and various other interoperability problems, but it was ultimately like ripping off a band-aid - better done quickly than slowly, and once done, the constant trickle of I18N bugs more or less shut off entirely.

To put it another way, if FreeBSD doesn’t do it, its downstream vendors will have to. Everything from file names to filesystem names needs to be UTF-8 just so it displays properly and Japanese users can put their files on /mnt/ホーム/私の会社/. Downstream here in FreeNAS-land, we get I18N requests all the time, and it’s sure a PITA when you have the various technologies all *mostly* able to speak UTF-8 but there are routines here and there that just don’t. Trying to make everything work from a userland agent like Samba or Netatalk all the way down the stack is a PITA, but we still have to do it because it’s not an all-English-all-the-time world we live in.

I’d also not even worry about wide characters. They are a historical artifact and not the direction everyone is going in. UTF-8 FTW.

- Jordan
John-Mark Gurney
2014-07-31 17:55:47 UTC
Permalink
Post by Phil Shafer
Post by John-Mark Gurney
My vote would be to use and *enforce* UTF-8 by the API. That means if
someone passes a string in, it must be properly formed UTF-8...
I can certainly see making this an option, detecting the high-bit
and inspecting the following 1-5 bytes to ensure the corresponding
high two bits are set appropriately. But what action would you
expect the library to take when invalid strings are passed in?
Return an error? printf can return an error, yet most people don't
check it.. so no real difference in API/bugs...

The reason I even suggest this is that JSON requires the output to be
in Unicode... Not some special locale encoding.. See section 3 of:
https://www.ietf.org/rfc/rfc4627.txt

Besides we should finally move to UTF-8 for file system and other
parts of the system... I do like the idea of random binary filenames,
but we really should stop sticking our head in the sand.. We will only
make ourselves look silly when 2020 rolls around if we don't...
Post by Phil Shafer
libxo supports a warning flag, that will trigger warnings on stderr
for things like invalid or malformed format strings, but I'm not
sure I'd be happy if the library skipped invalid strings.
printf may skip parts of your strings if you don't check its return
value... Plus, if the API states you must pass in UTF-8 strings,
and someone doesn't properly encode/convert to UTF-8, it's their
bug, not the library's bug... We have too many encoding issues
already in our source tree, and we need to get better about making
sure we don't have them, and this will help...
Post by Phil Shafer
BTW, this issue is driven by "w"s use of wide characters (for
days of the week).
Plus, enforcing UTF-8 will make the w versions easier, and allow
the library to output other width of UTF if wanted/requested..
--
John-Mark Gurney Voice: +1 415 225 5579

"All that I will do, has been done, All that I have, has not."
Phil Shafer
2014-07-31 18:39:47 UTC
Permalink
Post by John-Mark Gurney
Return an error? printf can return an error, yet most people don't
check it.. so no real difference in API/bugs...
My concern is emitting half a string, where the half we don't emit
is something important. I don't want to make the opposite of an
injection attack, where arranging some daemon to call xo_emit with
a broken UTF-8 string allows an evil-doer to fix their evil content
into the other half of the string.

I'm escaping XML, JSON, and HTML content already, so the simplest
scheme is to:

a) UTF-8 check the format string;
if it fails, nothing is emitted
b) for each format descriptor, check the content generated;
if it fails, nothing is emitted from the xo_emit call
anything already generated is discarded

Simple and easy. Seem reasonable? The other option would be to
discard only that specific format descriptor or only that field
description.

xo_emit("{:good/%d}{:bad/%d%s}{:ugly}", 0, 55, "\xff\x01\xff", "cat");

Does the "<ugly>cat</ugly>" get emitted? Is "<bad>55</bad>" emitted?

If "ugly" was <run-this-command-as-user>phil</...>, and the bogus
string blocked the generation of that vital bit of info, life could
be bad.

Unfortunately, even this isn't a simple fix for "w", which wants
call wcsftime() to get wide values for month and day-of-the-week
names. Does wcsrtombs() convert this to UTF-8? Is there a locale
for UTF-8?

Thanks,
Phil
John-Mark Gurney
2014-07-31 21:09:37 UTC
Permalink
Post by Phil Shafer
Post by John-Mark Gurney
Return an error? printf can return an error, yet most people don't
check it.. so no real difference in API/bugs...
My concern is emitting half a string, where the half we don't emit
is something important. I don't want to make the opposite of an
injection attack, where arranging some daemon to call xo_emit with
a broken UTF-8 string allows an evil-doer to fix their evil content
into the other half of the string.
I'm escaping XML, JSON, and HTML content already, so the simplest
a) UTF-8 check the format string;
if it fails, nothing is emitted
b) for each format descriptor, check the content generated;
if it fails, nothing is emitted from the xo_emit call
anything already generated is discarded
Simple and easy. Seem reasonable? The other option would be to
discard only that specific format descriptor or only that field
description.
xo_emit("{:good/%d}{:bad/%d%s}{:ugly}", 0, 55, "\xff\x01\xff", "cat");
Does the "<ugly>cat</ugly>" get emitted? Is "<bad>55</bad>" emitted?
If "ugly" was <run-this-command-as-user>phil</...>, and the bogus
string blocked the generation of that vital bit of info, life could
be bad.
I agree...
Post by Phil Shafer
Unfortunately, even this isn't a simple fix for "w", which wants
call wcsftime() to get wide values for month and day-of-the-week
names. Does wcsrtombs() convert this to UTF-8? Is there a locale
for UTF-8?
Well, from my understanding there can't be a "locale" that is UTF-8
as a locale contains more than just character encoding... It also
includes month/day names, sorting, etc... I think you can get a
C locale (the default) w/ UTF-8 by setting the correct environment
variables, but I don't know them well enough to say... Should we add
a locale that does this? There is UTF-8 in /usr/share/locale, but if
you set LANG to it, things don't work..
--
John-Mark Gurney Voice: +1 415 225 5579

"All that I will do, has been done, All that I have, has not."
Jordan Hubbard
2014-07-31 21:33:03 UTC
Permalink
Post by John-Mark Gurney
Well, from my understanding there can't be a "locale" that is UTF-8
as a locale contains more than just character encoding... It also
includes month/day names, sorting, etc... I think you can get a
C locale (the default) w/ UTF-8 by setting the correct environment
variables, but I don't know them well enough to say... Should we add
a locale that does this? There is UTF-8 in /usr/share/locale, but if
you set LANG to it, things don't work..
en_US.UTF-8
Phil Shafer
2014-07-31 21:40:29 UTC
Permalink
Post by John-Mark Gurney
Well, from my understanding there can't be a "locale" that is UTF-8
as a locale contains more than just character encoding... It also
includes month/day names, sorting, etc... I think you can get a
C locale (the default) w/ UTF-8 by setting the correct environment
variables, but I don't know them well enough to say... Should we add
a locale that does this? There is UTF-8 in /usr/share/locale, but if
you set LANG to it, things don't work..
I'll change the library to follow the settings of the user's env var
and assuming they've set it correctly, all will work well. Since
libxo uses vsnprintf under the covers, all this should work fine.

(void)wcsftime(buf, sizeof(buf), fmt, &tp);
...
xo_emit("{:login-time/%ls}", buf);

Or something like that......

Thanks,
Phil
Poul-Henning Kamp
2014-07-31 20:07:37 UTC
Permalink
--------
Post by John-Mark Gurney
Post by Phil Shafer
Post by John-Mark Gurney
My vote would be to use and *enforce* UTF-8 by the API. That means if
someone passes a string in, it must be properly formed UTF-8...
I can certainly see making this an option, detecting the high-bit
and inspecting the following 1-5 bytes to ensure the corresponding
high two bits are set appropriately. But what action would you
expect the library to take when invalid strings are passed in?
Return an error? printf can return an error, yet most people don't
check it.. so no real difference in API/bugs...
This is why we ended up with SIGPIPE in the first place.

Can I point discreetly at sbuf(3)'s accumulative error handling
and suggest that libxo does something similar ? That way applications
only need to check for errors once, rather than after every single
call to every single function in the libxo library.
--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
***@FreeBSD.ORG | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
John-Mark Gurney
2014-07-31 20:58:33 UTC
Permalink
Post by Poul-Henning Kamp
--------
Post by John-Mark Gurney
Post by Phil Shafer
Post by John-Mark Gurney
My vote would be to use and *enforce* UTF-8 by the API. That means if
someone passes a string in, it must be properly formed UTF-8...
I can certainly see making this an option, detecting the high-bit
and inspecting the following 1-5 bytes to ensure the corresponding
high two bits are set appropriately. But what action would you
expect the library to take when invalid strings are passed in?
Return an error? printf can return an error, yet most people don't
check it.. so no real difference in API/bugs...
This is why we ended up with SIGPIPE in the first place.
Can I point discreetly at sbuf(3)'s accumulative error handling
and suggest that libxo does something similar ? That way applications
$ man 3 sbuf
No manual entry for sbuf

And looks like it isn't in libc:
$ cc -o q q.c
/tmp/q-41cfbe.o: In function `main':
q.c:(.text+0x23): undefined reference to `sbuf_new'
q.c:(.text+0x38): undefined reference to `sbuf_cat'
q.c:(.text+0x4c): undefined reference to `sbuf_cat'
q.c:(.text+0x58): undefined reference to `sbuf_finish'
q.c:(.text+0x64): undefined reference to `sbuf_data'
q.c:(.text+0x82): undefined reference to `sbuf_delete'
cc: error: linker command failed with exit code 1 (use -v to see invocation)

Hmm... looks like libsbuf exists, and causes the above program to
compile and work...

Please write a man page for sbuf(3)... Thanks.
Post by Poul-Henning Kamp
only need to check for errors once, rather than after every single
call to every single function in the libxo library.
--
John-Mark Gurney Voice: +1 415 225 5579

"All that I will do, has been done, All that I have, has not."
Poul-Henning Kamp
2014-07-31 21:18:19 UTC
Permalink
--------
Post by John-Mark Gurney
$ man 3 sbuf
No manual entry for sbuf
Sorry: sbuf(9)
--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
***@FreeBSD.ORG | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
John-Mark Gurney
2014-07-31 21:42:45 UTC
Permalink
Post by Poul-Henning Kamp
--------
Post by John-Mark Gurney
$ man 3 sbuf
No manual entry for sbuf
Sorry: sbuf(9)
I'm fine w/ an sbuf(3) man page that says, the API is the same as
documented in sbuf(9) and you need to link against libsbuf, but we don't
have that...

I had to grep around in /usr/lib to find what library to use...

P.S. If someone writes said man page, I will word smith/fixup/complete
and commit it.
--
John-Mark Gurney Voice: +1 415 225 5579

"All that I will do, has been done, All that I have, has not."
Phil Shafer
2014-07-31 21:30:15 UTC
Permalink
Post by Poul-Henning Kamp
Can I point discreetly at sbuf(3)'s accumulative error handling
and suggest that libxo does something similar ? That way applications
only need to check for errors once, rather than after every single
call to every single function in the libxo library.
sbuf looks like a simple case, returning either ENOMEM or the
error code from the flush function. libxo can keep a "there's
been an error" flag that the user can retrieve, but all the
details of what's gone wrong would be lost. Or it can buffer
the contents of warning messages and deliver it to the caller.

Currently you need to turn on the per-handle XOF_WARN flag to get
warnings displayed; perhaps that's enough.

Thanks,
Phil
Poul-Henning Kamp
2014-07-31 21:36:43 UTC
Permalink
--------
Post by Phil Shafer
Post by Poul-Henning Kamp
Can I point discreetly at sbuf(3)'s accumulative error handling
and suggest that libxo does something similar ? That way applications
only need to check for errors once, rather than after every single
call to every single function in the libxo library.
sbuf looks like a simple case, returning either ENOMEM or the
error code from the flush function. libxo can keep a "there's
been an error" flag that the user can retrieve, but all the
details of what's gone wrong would be lost. Or it can buffer
the contents of warning messages and deliver it to the caller.
We can afford to dedicate a buffer of a reasonable size for that
purpose if we need to.

The point here is one of API design, and experience has shown
that either error-handling is convenient or it doesn't happen.

I don't see the libxo case being any different from sbuf
in this respect, in fact I see it being almost even more
important because the readers are non-humans.

libxo should latch on error like libsbuf, and valid output
should only be emitted if no errors were encountered during
production.
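The latch-on-error pattern being proposed can be sketched as follows;
the xbuf_* names are hypothetical and stand in for neither the sbuf(9)
nor the libxo API:

```c
#include <stddef.h>
#include <string.h>

/*
 * Accumulative ("latched") error handling: once any call fails,
 * every later call becomes a no-op, and the caller checks a
 * single flag at finish time instead of after every call.
 */
struct xbuf {
    char buf[256];
    size_t len;
    int error;          /* sticky once set */
};

static void
xbuf_init(struct xbuf *xb)
{
    memset(xb, 0, sizeof(*xb));
}

static void
xbuf_cat(struct xbuf *xb, const char *s)
{
    size_t n = strlen(s);

    if (xb->error)
        return;                         /* already failed: do nothing */
    if (xb->len + n + 1 > sizeof(xb->buf)) {
        xb->error = 1;                  /* latch the error */
        return;
    }
    memcpy(xb->buf + xb->len, s, n + 1);
    xb->len += n;
}

static int
xbuf_finish(const struct xbuf *xb)
{
    return xb->error ? -1 : 0;          /* one check for the whole run */
}
```

Every intermediate call is a no-op once the flag is set, so the
application checks for failure exactly once, at finish time.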
--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
***@FreeBSD.ORG | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
Simon J. Gerraty
2014-08-01 05:04:34 UTC
Permalink
Post by Poul-Henning Kamp
Post by Phil Shafer
sbuf looks like a simple case, returning either ENOMEM or the
error code from the flush function. libxo can keep a "there's
been an error" flag that the user can retrieve, but all the
details of what's gone wrong would be lost. Or it can buffer
the contents of warning messages and deliver it to the caller.
The point here is one of API design, and experience has shown
that either error-handling is convenient or it doesn't happen.
I don't see the libxo case being any different from sbuf
in this respect, in fact I see it being almost even more
important because the readers are non-humans.
The libxo case can be complicated by the structured output.
If you have emitted

<something>
<here>
<might>
<fail>

and then you encounter an exception, even if you output a nice

<exception type="ENOMEM">sorry about that</exception>

your peer may not cope unless you also close all the open elements:

</fail></might></here></something>

etc.
Of course I wouldn't be surprised if the lib already handles all that ;-)
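One way to guarantee those close tags on an error path is to remember
each open element on a stack; a rough sketch (the tag_* names are
hypothetical, not the real libxo API, and bounds checks are elided for
clarity):

```c
#include <string.h>

#define MAXDEPTH 32

/*
 * Track open elements so that, on an exception, we can close them
 * all and leave the document well-formed.  Output accumulates in a
 * fixed buffer for simplicity.
 */
struct xstate {
    const char *open[MAXDEPTH];   /* names of currently open elements */
    int depth;
    char out[512];                /* accumulated output */
};

static void
tag_open(struct xstate *xs, const char *name)
{
    if (xs->depth < MAXDEPTH)
        xs->open[xs->depth++] = name;
    strcat(xs->out, "<");
    strcat(xs->out, name);
    strcat(xs->out, ">");
}

static void
tag_close(struct xstate *xs)
{
    if (xs->depth > 0) {
        strcat(xs->out, "</");
        strcat(xs->out, xs->open[--xs->depth]);
        strcat(xs->out, ">");
    }
}

/* On an exception: report it, then close everything still open. */
static void
tag_abort(struct xstate *xs, const char *reason)
{
    strcat(xs->out, "<exception type=\"");
    strcat(xs->out, reason);
    strcat(xs->out, "\">sorry about that</exception>");
    while (xs->depth > 0)
        tag_close(xs);
}
```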
Phil Shafer
2014-08-01 15:50:44 UTC
Permalink
Post by Simon J. Gerraty
Of course I wouldn't be surprised if the lib already handles all that ;-)
No, I can't manufacture memory on the fly ;^)

Hmmm... I could have an emergency mode where I stop trying to buffer
and churn out a series of small write() calls to send close tags.
Or perhaps just punting and refusing to write more once ENOMEM is
seen is the right thing. Seeing broken output is better than limping
along with output that looks right but isn't.

Thanks,
Phil
Marcel Moolenaar
2014-08-01 16:25:34 UTC
Permalink
Post by Phil Shafer
Post by Simon J. Gerraty
Of course I wouldn't be surprised if the lib already handles all that ;-)
No, I can't manufacture memory on the fly ;^)
Hmmm... I could have an emergency mode where I stop trying to buffer
and churn out a series of small write() calls to send close tags.
Or perhaps just punting and refusing to write more once ENOMEM is
seen is the right thing. Seeing broken output is better than limping
along with output that looks right but isn't.
But broken output can have nasty side-effects due to
parsers tripping over. What we try to prevent here
(limping along) can very easily be introduced by us
by emitting something that trips over parsers. We
merely pushed the limp-along problem downstream from
us.

To get a reliable chain from producer to consumer, it
may be best to have correct structure at all times
and tackle the limp-along problem with constructs
within that structure to signal that the data within
that structure is complete and/or sound.

For example: if stdout and stderr are separate tags
in the XML output, then a third tag, say status, can
be added to indicate that the stdout and stderr tags
are accurate/complete.

Or something along those lines.

To me machine parseable output is only that if it is
parseable at all times. Machines suck at interpreting
their input when it suddenly stops being parseable.
The best you can expect is garbage in, garbage out.
While it sounds easy to just reject the entire input in that case
(i.e. when some tool emits broken output), the fact that we may opt
to abort the output because we have already emitted bits of it and
can't undo them should be taken as an indication that pretty much
any program behaves very similarly: it acts upon its input right
away rather than waiting until it has seen the end and likes what
it has seen. Rejecting the entire input when the structure is
broken is just not something to bank on.

$0.02
--
Marcel Moolenaar
***@xcllnt.net
John-Mark Gurney
2014-08-01 17:59:06 UTC
Permalink
Post by Marcel Moolenaar
Post by Phil Shafer
Post by Simon J. Gerraty
Of course I wouldn't be surprised if the lib already handles all that ;-)
No, I can't manufacture memory on the fly ;^)
Hmmm... I could have an emergency mode where I stop trying to buffer
and churn out a series of small write() calls to send close tags.
Or perhaps just punting and refusing to write more once ENOMEM is
seen is the right thing. Seeing broken output is better than limping
along with output that looks right but isn't.
But broken output can have nasty side-effects due to
parsers tripping over. What we try to prevent here
(limping along) can very easily be introduced by us
by emitting something that trips over parsers. We
merely pushed the limp-along problem downstream from
us.
That just pushes the error handling further downstream... I'd trust
a parse error more than someone handling all the odd edge cases of
missing tags, or even failure to parse the error tag...

IFF we have a proper DTD/spec that includes an error tag can we even
think of making an error document parsable when an error occured...

Either choice we make pushes the error handling down stream... One case,
it's the parser, the other case it's the consumer of the parser, but in
both cases the downstream HAS to properly handle the error however it
is signaled...
--
John-Mark Gurney Voice: +1 415 225 5579

"All that I will do, has been done, All that I have, has not."
Phil Shafer
2014-08-04 14:57:33 UTC
Permalink
Post by John-Mark Gurney
That just pushes the error handling further downstream... I'd trust
a parse error more than someone handling all the odd edge cases of
missing tags, or even failure to parse the error tag...
Yes, I'd rather see output that is obviously broken and prevents a
parser (JSON or XML) from returning success than parsable output
with a message that we're depending on the client to parse and care
about. From phk's example, how many will write:

grep --xpath some/tag/i/care/about `du --xml`

and not even notice the error tag, which would require something like:

grep --xpath 'some[not(//error)]/tag/i/care/about' `du --xml`

where the cost of "//error" (searching the entire output tree for
an error tag) might be huge. And doing this in JSON would be worse,
since there's no XPath for JSON (yet).
Post by John-Mark Gurney
Either choice we make pushes the error handling down stream... One case,
it's the parser, the other case it's the consumer of the parser, but in
both cases the downstream HAS to properly handle the error however it
is signaled...
Sure, but I'd rather trust the parser library to say "this is junk"
than trust the client app to handle exception information in the
output data.

Thanks,
Phil
Poul-Henning Kamp
2014-08-01 20:28:07 UTC
Permalink
--------
Post by Phil Shafer
Post by Simon J. Gerraty
Of course I wouldn't be surprised if the lib already handles all that ;-)
No, I can't manufacture memory on the fly ;^)
Hmmm... I could have an emergency mode where I stop trying to buffer
and churn out a series of small write() calls to send close tags.
Or perhaps just punting and refusing to write more once ENOMEM is
seen is the right thing.
First of, this is not just ENOMEM, this is also invalid UTF-8 strings,
NULL pointers and much more bogosity.
Post by Phil Shafer
Seeing broken output is better than limping
along with output that looks right but isn't.
The output should preferably be explicitly broken, so that nobody
downstream mistakenly takes it and runs with it.
--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
***@FreeBSD.ORG | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
Phil Shafer
2014-08-04 14:48:58 UTC
Permalink
Post by Poul-Henning Kamp
First of, this is not just ENOMEM, this is also invalid UTF-8 strings,
NULL pointers and much more bogosity.
Yup, there are 26 failure cases at present, ranging from missing
close braces in format strings to unbalanced open/close calls.
Post by Poul-Henning Kamp
Post by Phil Shafer
Seeing broken output is better than limping
along with output that looks right but isn't.
The output should preferably be explicitly broken, so that nobody
downstream mistakenly takes it and runs with it.
I think we're in agreement, but there is the question of what
constitutes sufficient problems to trigger abort. I'm coding the
UTF-8 support now and that's a perfect example. If the output
character set (the user's LANG setting) doesn't support a character
of output (u+10d6), does that constitute a complete failure? I'll
assumably give flags to tailor the behavior, but by default, I'd
be upset if character conversion issues like this turned into
complete failure. But a format string with an invalid UTF-8 sequence
would be more severe.

FWIW, the UTF-8 strategy for libxo is this:
- all format strings are UTF-8
- argument strings (%s) are UTF-8
- "%ls" handles wide characters
- "%hs" will handle locale-based strings
- XML, JSON, and HTML will be UTF-8 output
- text will be locale-based

The painful part is that I've been using vsnprintf as the plumbing
for formatting strings, but it doesn't handle field widths for UTF-8
data correctly, so I'll need to start doing that by hand myself.
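The mismatch is easy to see: vsnprintf's "%10s" counts bytes, while the
user sees columns. Counting only non-continuation bytes gives the
code-point count, which is closer to the column width, though still
wrong for combining characters and double-width glyphs, as noted
elsewhere in the thread. A sketch:

```c
#include <stddef.h>

/*
 * Count UTF-8 code points by skipping continuation bytes
 * (10xxxxxx).  For "caf\xc3\xa9" ("cafe" with an accented e),
 * strlen() reports 5 bytes but only 4 code points are present,
 * so a byte-based field width would pad one column short.
 */
static size_t
utf8_codepoints(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xc0) != 0x80)
            n++;
    }
    return n;
}
```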

Thanks,
Phil
John-Mark Gurney
2014-08-04 21:04:00 UTC
Permalink
Post by Phil Shafer
Post by Poul-Henning Kamp
First of, this is not just ENOMEM, this is also invalid UTF-8 strings,
NULL pointers and much more bogosity.
Yup, there are 26 failure cases at present, ranging from missing
close braces in format strings to unbalanced open/close calls.
Post by Poul-Henning Kamp
Post by Phil Shafer
Seeing broken output is better than limping
along with output that looks right but isn't.
The output should preferably be explicitly broken, so that nobody
downstream mistakenly takes it and runs with it.
I think we're in agreement, but there is the question of what
constitutes sufficient problems to trigger abort. I'm coding the
UTF-8 support now and that's a perfect example. If the output
character set (the user's LANG setting) doesn't support a character
of output (u+10d6), does that constitute a complete failure? I'll
It depends... For output to terminal/text, then you should use iconv's
ICONV_SET_TRANSLITERATE option (see iconvctl(3), which isn't linked
from iconv(3), but now is)...
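For reference, glibc spells the same transliteration request as a
"//TRANSLIT" suffix on the target encoding rather than an iconvctl(3)
call; a sketch (which fallback characters get substituted for
unrepresentable input is implementation- and locale-dependent):

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/*
 * Convert a UTF-8 string to ASCII, transliterating characters that
 * ASCII cannot represent instead of failing outright.  Returns 0
 * on success, -1 on error.
 */
static int
to_ascii(const char *in, char *out, size_t outsz)
{
    iconv_t cd = iconv_open("ASCII//TRANSLIT", "UTF-8");
    char *inp = (char *)in, *outp = out;
    size_t inleft = strlen(in), outleft = outsz - 1;

    if (cd == (iconv_t)-1)
        return -1;
    if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1) {
        iconv_close(cd);
        return -1;
    }
    *outp = '\0';
    iconv_close(cd);
    return 0;
}
```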
Post by Phil Shafer
assumably give flags to tailor the behavior, but by default, I'd
be upset if character conversion issues like this turned into
complete failure. But a format string with an invalid UTF-8 sequence
would be more severe.
- all format strings are UTF-8
- argument strings (%s) are UTF-8
- "%ls" handles wide characters
- "%hs" will handle locale-based strings
- XML, JSON, and HTML will be UTF-8 output
- text will be locale-based
This looks exactly what I had in mind...

Though for XML and HTML, you might want to add the proper processing
directive that says the encoding is UTF-8... How about make this an
option to turn off? That way if someone wants to nest the output in
another document, they provide the option to turn it off, while by
default you end up w/ a properly formed HTML or XML document?
Post by Phil Shafer
The painful part is that I've been using vsnprintf as the plumbing
for formatting strings, but it doesn't handle field widths for UTF-8
data correctly, so I'll need to start doing that by handle myself.
iconv or another i18n library should help w/ that... Since some
languages, like Thai, have combining characters, so even though there
might be a 6 character UTF-8 sequence, it'll only take up one column
width...
--
John-Mark Gurney Voice: +1 415 225 5579

"All that I will do, has been done, All that I have, has not."
Garance A Drosehn
2014-07-31 01:37:54 UTC
Permalink
Post by Simon Gerraty
The following from Phil provides some idea of the functionality
available and the API.
The one shown here uses the default output handle (stdout), but there
are variants that allow multiple output handles.
T X J and H are the modes (text, xml, json, html); [...etc...]
% foreach i ( T XP JP HP HPIx )
echo === $i ===
env LIBXO_OPTIONS=$i ./xtest -n | head -10
end
=== T ===
6:47PM up 18 days, 2:01, 9 user%s, load averages: 0.00, 0.00, 0.00
phil pts/0 76.182.32.73 5:09PM 33 /bin/sh
phil pts/1 76.182.32.73 05Jul14 2 /usr/bin/perl /u/phil/bin/plum (
phil pts/2 76.182.32.73 05Jul14 1 /bin/tcsh
phil pts/3 76.182.32.73 05Jul14 2days ssh dent
phil pts/4 76.182.32.73 Tue02PM 2days ssh svl-junos-d026.juniper.net
phil pts/5 76.182.32.73 Wed01AM 2days telnet man-o-war 2006
phil pts/6 76.182.32.73 Fri10PM 2days ssh 198.85.229.65
phil pts/7 76.182.32.73 Fri10PM 2days ssh zap
=== XP ===
<uptime-information>
<time-of-day> 6:47PM</time-of-day>
<uptime seconds="1562436">18 days</uptime>
<uptime> 2:01</uptime>
<users>9</users>
<load-average-1>0.00</load-average-1>
<load-average-5>0.00</load-average-5>
<load-average-15>0.00</load-average-15>
<user-table>
<user-entry>
=== JP ===
Do you have links to the library itself?

Over the years I've dabbled with doing something like this for the
lpr/lpc/lpq programs, so I've done a fair amount of thinking about
it. I didn't do as much *work* as either libxo or the GSOC project,
but I have done some thinking! Mind you, even that thinking is
based only on the lpr-programs, and not a larger set of utilities.

I'd suggest that the above isn't quite what one would want, either.
For the text version it's fine to have a time-of-day value as
'6:47PM', but if you're going for machine-readable output then
you'd want that in some format which was much more specific and
*standard* (as opposed to arbitrary pretty-printed strings).
Something like the ISO 8601 format used in obscure parts of lpd:

#define LPD_TIMESTAMP_PATTERN "%Y-%m-%dT%T%z %a"

(actually the %a part is not part of ISO 8601, but it is useful
for some programs which might want to process the time). Or you
might want to print it as a unix-epoch integer. Something that
makes it easy for a program to process it.
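As a sketch, the pattern quoted above maps onto strftime(3) like this
(%T is shorthand for %H:%M:%S; the trailing %a is dropped here):

```c
#include <stddef.h>
#include <time.h>

/*
 * Render a time_t in the ISO 8601-style form discussed above.
 * The %z conversion takes the zone from the struct tm, so the tm
 * must come from gmtime_r(3) (UTC) or localtime_r(3) (local zone).
 */
static void
iso8601(time_t t, char *buf, size_t bufsz)
{
    struct tm tm;

    gmtime_r(&t, &tm);
    strftime(buf, bufsz, "%Y-%m-%dT%T%z", &tm);
}
```

The unix-epoch alternative mentioned above is even simpler: print
(long long)t with "%lld".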

Or you could follow the example of EDN, and use rfc-3339-format
(see '#inst' at https://github.com/edn-format/edn). The nice
thing about standards is that there are so many to choose from.

It's hard to tell based on your sample output, but there's also
the question of truncating strings. In the text output of your
example, it obviously makes sense to truncate the 'WHAT' value
to 'ssh svl-junos-d026.juniper.net', but when printing the same
output in a machine-readable format you wouldn't want to truncate
it. Let the application which *reads* the data decide how many
characters *it* wants to use.
--
Garance Alistair Drosehn = ***@rpi.edu
Senior Systems Programmer or ***@FreeBSD.org
Rensselaer Polytechnic Institute; Troy, NY; USA
Garance A Drosehn
2014-07-31 01:45:15 UTC
Permalink
[...] if you're going for machine-readable output then
you'd want that in some format which was much more specific and
*standard* (as opposed to arbitrary pretty-printed strings).
#define LPD_TIMESTAMP_PATTERN "%Y-%m-%dT%T%z %a"
Or you could follow the example of EDN, and use rfc-3339-format
(see '#inst' at https://github.com/edn-format/edn). The nice
thing about standards is that there are so many to choose from.
I should note that these two formats are very similar, and in fact
may be exactly the same. I kept meaning to see if there was any
difference between them. I just noticed that the rfc has the 'Z'
suffix as an option for a timezone, and I don't think that the ISO
one does.
--
Garance Alistair Drosehn = ***@rpi.edu
Senior Systems Programmer or ***@FreeBSD.org
Rensselaer Polytechnic Institute; Troy, NY; USA
Tim Kientzle
2014-07-31 02:09:40 UTC
Permalink
Post by Garance A Drosehn
[...] if you're going for machine-readable output then
you'd want that in some format which was much more specific and
*standard* (as opposed to arbitrary pretty-printed strings).
#define LPD_TIMESTAMP_PATTERN "%Y-%m-%dT%T%z %a"
Or you could follow the example of EDN, and use rfc-3339-format
(see '#inst' at https://github.com/edn-format/edn). The nice
thing about standards is that there are so many to choose from.
I should note that these two formats are very similar, and in fact
may be exactly the same.
Essentially, ISO8601 is the same as RFC3339 except that ISO8601
also has a bunch of additional notations for partial date/time,
durations, and repeat intervals.

Trivia:
* RFC3339 claims to be a “profile of ISO8601”
* RFC3339 requires a timezone specifier
* Both allow fractional seconds (period followed by one or more digits)
* RFC3339 allows a timezone of ‘-00:00’; ISO8601 requires a ‘+’ for a zero offset
Post by Garance A Drosehn
I kept meaning to see if there was any
difference between them. I just noticed that the rfc has the 'Z'
suffix as an option for a timezone, and I don't think that the ISO
one does.
Both allow ‘Z’.

http://stackoverflow.com/questions/522251/whats-the-difference-between-iso-8601-and-rfc-3339-date-formats

Tim
Jordan Hubbard
2014-07-31 21:02:12 UTC
Permalink
Post by Tim Kientzle
I kept meaning to see if there was any difference between them. I just noticed that the rfc has the ‘Z' suffix as an option for a timezone, and I don't think that the ISO one does.
Both allow ‘Z’.
That will keep us pilots happy. ;-)

- Jordan
Simon J. Gerraty
2014-07-31 03:22:40 UTC
Permalink
Post by Garance A Drosehn
Do you have links to the library itself?
Not yet.
Post by Garance A Drosehn
I'd suggest that the above isn't quite what one would want, either.
For the text version it's fine to have a time-of-day value as
'6:47PM', but if you're going for machine-readable output then
Actually, for machine-readable output the most useful thing is the UTC
seconds. E.g. (in case you don't have access to a Junos router):

***@vjb5> show system uptime | display xml
<rpc-reply xmlns:junos="http://xml.juniper.net/junos/13.3I0/junos">
<system-uptime-information xmlns="http://xml.juniper.net/junos/14.2I0/junos">
<current-time>
<date-time junos:seconds="1406776765">2014-07-30 20:19:25 PDT</date-time>
</current-time>

etc.

The patch to w(1) was mostly to demo the API I think.
Post by Garance A Drosehn
It's hard to tell based on your sample output, but there's also
the question of truncating strings. In the text output of your
example, it obviously makes sense to truncate the 'WHAT' value
to 'ssh svl-junos-d026.juniper.net', but when printing the same
output in a machine-readable format you wouldn't want to truncate
it. Let the application which *reads* the data decide how many
characters *it* wants to use.
Yes.
Garance A Drosehn
2014-07-31 03:53:12 UTC
Permalink
Post by Simon J. Gerraty
Post by Garance A Drosehn
I'd suggest that the above isn't quite what one would want, either.
For the text version it's fine to have a time-of-day value as
'6:47PM', but if you're going for machine-readable output then
Actually for machine readable output the most useful thing is the utc
While that may be the most useful thing for the applications on Junos,
it is not necessarily the most useful thing for other apps. And in the
case of EDN, if one was going to write a timestamp they might as well
write it in the format expected by other EDN-based apps. IMO.
--
Garance Alistair Drosehn = ***@rpi.edu
Senior Systems Programmer or ***@FreeBSD.org
Rensselaer Polytechnic Institute; Troy, NY; USA
Simon J. Gerraty
2014-07-31 03:40:31 UTC
Permalink
Post by Garance A Drosehn
It's hard to tell based on your sample output, but there's also
the question of truncating strings. In the text output of your
Perhaps more than a "yes" answer is warranted.
w(1) sizes those strings based on its expectation of the width of the
tty. That can obviously be fixed/improved - but it involves exposing
knowledge of the output format (or at least that it isn't TXT) to the
application.
I'm sure there will be many cases where that sort of thing may be
needed/useful, but that's almost orthogonal to the API question.

As I mentioned, this patch was demoing the API, not changing how w(1)
works. When apps are converted for real, people can go as crazy as they
like (ok, not true; it really is important to have consistency in terms
of the structure, e.g. that the same XML tags are used for the same data
regardless of application - we have a team of people to keep an eye on
that stuff).

--sjg
Phil Shafer
2014-07-31 09:18:27 UTC
Permalink
Post by Simon J. Gerraty
w(1) sizes those strings based on its expectation of the width of the
tty. That can obviously be fixed/improved - but involves exposing
knowledge of the out format (or at least that it isn't TXT) to the
application.
libxo allows the field description to carry two distinct format
descriptors, one for text/html and one for xml/json. The latter
defaults to the former:

xo_emit(" {:words/%7ju/%ju}", twordct);

For "w", this does the right thing; in text mode, the command
string is truncated:

5:08AM up 27 days, 12:22, 11 user%s, load averages: 0.00, 0.00, 0.00
USER TTY FROM LOGIN@ IDLE WHAT
phil pts/1 76.182.32.73 05Jul14 - /usr/bin/perl /u/phil/bin/plum (
...

In XML mode, the command is not truncated.

<uptime-information>
<time-of-day> 5:09AM</time-of-day>
<uptime seconds="2377336">27 days</uptime>
<uptime>12:22</uptime>
<users>11</users>
<load-average-1>0.00</load-average-1>
<load-average-5>0.00</load-average-5>
<load-average-15>0.00</load-average-15>
<user-table>
<user-entry>
<user>phil</user>
<tty>pts/1</tty>
<from>76.182.32.73</from>
<login-time>05Jul14</login-time>
<idle> 1</idle>
<command>/usr/bin/perl /u/phil/bin/plum (perl5.12.4)</command>
</user-entry>
...

(Yes, I'm likely the only plum user left in the wild.)

Thanks,
Phil
Garance A Drosehn
2014-07-31 16:41:03 UTC
Permalink
Post by Phil Shafer
Post by Simon J. Gerraty
w(1) sizes those strings based on its expectation of the width of the
tty. That can obviously be fixed/improved - but involves exposing
knowledge of the out format (or at least that it isn't TXT) to the
application.
libxo allows the field description to carry two distinct format
descriptors, one for text/html and one for xml/json. The latter
xo_emit(" {:words/%7ju/%ju}", twordct);
For "w", this is does the right thing; in text mode, the command
Ah, that's helpful.

What if there's something you want to print out for xml/json but
*not* for plain-text? (it's easy to imagine some commands might
print out more values when they are not constrained by an 80-char
width).

Also, given that machine-readable outputs might change over time,
is there the idea of including a version-number with the output
for each command?

I hope it doesn't seem like I'm just complaining about the work
everyone else is doing (both this and the GSOC project). Please
note that I've attempted to do this a few times myself, and I haven't
been happy with any of *my* attempts either!
--
Garance Alistair Drosehn = ***@rpi.edu
Senior Systems Programmer or ***@FreeBSD.org
Rensselaer Polytechnic Institute; Troy, NY; USA
Phil Shafer
2014-07-31 18:16:27 UTC
Permalink
Post by Garance A Drosehn
What if there's something you want to print out for xml/json but
*not* for plain-text? (it's easy to imagine some commands might
print out more values when they are not constrained by an 80-char
width).
The "d" flag instructs libxo to emit a field only for "display" formats
(TEXT, HTML), and the "e" flag only for "encode" formats (XML, JSON).

xo_emit("{d:count}{e:count}", "fourth", "4");
Post by Garance A Drosehn
Also, given that machine-readable outputs might change over time,
is there the idea of including a version-number with the output
for each command?
I imagine at some point in the future we could see YANG
models for this output, with revision strings and more.

In practice, this turns out to be a minor issue, if strong
discipline is maintained. Future versions can emit new
tags, but not remove old tags. So a script that uses an
XPath expression like:

user-entry[user == 'phil' && idle/@seconds > 6000]

should work forever. XML becomes an API, where your API makes
a contract saying "I won't change" and deprecating XML elements
becomes as rare as deprecating functions in libc.
Post by Garance A Drosehn
I hope it doesn't seem like I'm just complaining about the work
everyone else is doing (both this and the GSOC project). Please
note that I've attempted to do this a few times myself, and I haven't
been happy with any of *my* attempts either!
I've been kicking this idea around in my head for years. The first
XML API in JUNOS was 2001. But the way we do this in JUNOS is not
a good fit for BSD; we emit XML from our daemons and the CLI uses
an output definition language to know how to turn XML into text.
BSD needs point-of-creation content generation flexibility, and
some scheme for the shell/parent to tell the child what sort of data
should be generated. I'm using the LIBXO_OPTIONS variable to do
this, so your RESTful API daemon can set this env var, run the
command, and funnel the output back to the client. There needs
to be a mechanism for marking the executable to say which types
of data formats are supported, or if libxo is supported at all.

Thanks,
Phil
Phil Shafer
2014-08-13 19:36:10 UTC
Permalink
Post by Phil Shafer
- all format strings are UTF-8
- argument strings (%s) are UTF-8
- "%ls" handles wide characters
- "%hs" will handle locale-based strings
- XML, JSON, and HTML will be UTF-8 output
- text will be locale-based
Sorry for the delay, but this code is now done. Formatting widths
are done using wcwidth() so things like "%15.15s" work correctly
regardless of locale settings. As a background task, I'm converting
some basic commands to use libxo. It's slow work, but needs done....

I've a related topic: when an app goes to run a child command, how
can it determine whether that binary supports libxo-based encoding
requests? This should be known before the binary is run, since
there's no means of auto-detecting the supported output after the
fact.

For example, say I want to make a JSON-based API for my server. I
can setenv("LIBXO_OPTIONS", "json") to get JSON output, but I won't
know if the binary supports this or if the output needs to be wrapped
and escaped.

I know ELF "Note" elements can be used to carry vendor-specific
data, but have no experience with them. Would it be reasonable to
use them as a means of communicating this information to other bits
of software? Is FreeBSD using Notes for other information currently?

Thanks,
Phil

P.s.: Attached is a screenshot of a quick demo using netstat output
rendered in HTML with the jquery qtip popup that shows the XPath,
along with some firebug output to show the contents. (The "data-qtip"
attribute is added dynamically by the qtip library.)
Eric van Gyzen
2014-08-13 21:13:22 UTC
Permalink
Post by Phil Shafer
Post by Phil Shafer
- all format strings are UTF-8
- argument strings (%s) are UTF-8
- "%ls" handles wide characters
- "%hs" will handle locale-based strings
- XML, JSON, and HTML will be UTF-8 output
- text will be locale-based
Sorry for the delay, but this code is now done. Formatting widths
are done using wcwidth() so things like "%15.15s" work correctly
regardless of locale settings. As a background task, I'm converting
some basic commands to use libxo. It's slow work, but needs done....
I've a related topic: when an app goes to run a child command, how
can it determine whether that binary supports libxo-based encoding
requests? This should be known before the binary is run, since
there's no means of auto-detecting the supported output after the
fact.
For example, say I want to make a JSON-based API for my server. I
can setenv("LIBXO_OPTIONS", "json") to get JSON output, but I won't
know if the binary supports this or if the output needs to be wrapped
and escaped.
Perhaps libxo can have an API to answer this question, so your app can
simply ask libxo if "netstat" supports libxo output. How should the API
be implemented? Perhaps libxo can consult a file that lists all
executables that support libxo. This file can be maintained either
manually by committers or perhaps automatically by the build. (Maybe it
could even be built into libxo.so itself for efficiency, with a
fallback to a file for flexibility.) An alternative is to ask ELF if
the executable is linked with libxo.so, but this obviously doesn't work
when it's statically linked.

I haven't given these much thought...just tossing out ideas.

Eric
Post by Phil Shafer
I know ELF "Note" elements can be used to carry vendor-specific
data, but have no experience with them. Would it be reasonable to
use them as a means of communicating this information to other bits
of software? Is FreeBSD using Notes for other information currently?
Marcel Moolenaar
2014-08-13 23:02:15 UTC
Permalink
Post by Phil Shafer
I've a related topic: when an app goes to run a child command, how
can it determine whether that binary supports libxo-based encoding
requests? This should be known before the binary is run, since
there's no means of auto-detecting the supported output after the
fact.
For example, say I want to make a JSON-based API for my server. I
can setenv("LIBXO_OPTIONS", "json") to get JSON output, but I won't
know if the binary supports this or if the output needs to be wrapped
and escaped.
Aside:
Using environment variables can be handy, but isn't always.
What do you think about calling a libxo init function from
main() and giving it argc and argv so that libxo options
are parsed and removed just like what xlib does?
Post by Phil Shafer
I know ELF "Note" elements can be used to carry vendor-specific
data, but have no experience with them. Would it be reasonable to
use them as a means of communicating this information to other bits
of software? Is FreeBSD using Notes for other information currently?
Notes are used to tag the binary as a FreeBSD one (note is
consumed by the kernel) or in core files for meta data
(consumed by the debugger).

A note section is definitely possible and reasonable.
Especially if it's a note section for listing features. A
special utility that consumes the note section to list
features and returns whether a feature is supported is then
very reasonable because it's generic. libxo would be the
first feature we can check for. The question is: do we have
more features we want to check for this way? If not, then
such a scheme could be perceived as "heavy handed".

Alternatives include looking for a particular symbol or
possibly even running the utility with a libxo option that
has predictable output. The last suggestion has some issues
with handling the behaviour when libxo isn't supported;
therefore, a passive way to check seems better than having
to run the utility.

BTW: this is pretty powerful stuff! I feel FreeBSD is
maturing :-)
--
Marcel Moolenaar
***@xcllnt.net
Phil Shafer
2014-08-14 04:52:11 UTC
Permalink
Post by Marcel Moolenaar
Using environment variables can be handy, but isn't always.
Yes, and there's a flag to turn this behaviour off (XOF_NO_ENV).
An app that doesn't want this can turn it off at the top of
main() using:

xo_set_flags(NULL, XOF_NO_ENV);

Also, the variable only affects the default handle, not any created
by hand.
Post by Marcel Moolenaar
What do you think about calling a libxo init function from
main() and giving it argc and argv so that libxo options
are parsed and removed just like what xlib does?
I don't want to presume that libxo's set of options are suitably
distinct to allow safe cannibalization of argv.

But I do have a function (xo_set_options) that allows
the app to pass options in opaquely, like:

switch (getopt(..."X:"...)) {
...
case 'X':
if (xo_set_options(NULL, optarg) < 0)
xo_errx(1, "invalid xo option: %s", optarg);
break;
...

But there's a chicken/egg problem, since these options need to be
set before any text or errors are generated to ensure they are
rendered in the right style. Something like:

netstat --bad-option -X json,pretty

wouldn't be making pretty json.

Even then there could be issues like ld.so loading errors where
having it set before loading begins makes sense (assuming ld.so
gets munged to use libxo).
Post by Marcel Moolenaar
A note section is definitely possible and reasonable.
Especially if it's a note section for listing features. A
special utility that consumes the note section to list
features and returns whether a feature is supported is then
very reasonable because it's generic. libxo would be the
first feature we can check for.
Cool. I'll give it a try.
Post by Marcel Moolenaar
The question is: do we have
more features we want to check for this way? If not, then
such a scheme could be perceived as "heavy handed".
"heavy handed" in what sense?

I'm hoping the note can be added by normal linker magic (but see
question below). If not a "noteelf" command would need to be created
(or an option to brandelf?) to mark the binary. Are you seeing
something more? "elfdump" has a "-n" to dump notes, and a "-N
<name>" could be added, making "elfdump -n -N libxo my-app" the
means of getting the contents. A "-q" option could be added to
prevent output but set the exit code based on if the section appears
in the given binary.
Post by Marcel Moolenaar
Alternatives include looking for a particular symbol or
possibly even running the utility with a libxo option that
has predictable output.
How does one put a symbol in a binary when linking against a shared
library? Would there need to be two libs, one with the code and
one with just a symbol? I'd have the same issue with the ElfNote
scheme, right? I'd need to add a section to the binary, but libxo
could be linked dynamically. Is there an easy answer for this? Or
is the app stuck with "LDADD+=-lxo -lxo-note"?
Post by Marcel Moolenaar
The last suggestion has some issues
with handling the behaviour when libxo isn't supported;
therefore, a passive way to check seems better than having
to run the utility.
I'd prefer a scheme where I can know before I run it what's needed.
If the notes scheme is used generically, I could conceivably look
further down the $PATH for a binary that supports my needs.

Hmmm.... that would be a way to address the need to add an arch-based
component to my $PATH in a scenario when $HOME is nfs mounted on
machines with different architectures.
Post by Marcel Moolenaar
BTW: this is pretty powerful stuff! I feel FreeBSD is
maturing :-)
Thanks. I need all the encouragement I can get; netstat alone has
~500 printf calls, many of which have me grepping kernel sources
to determine sane field names.

Thanks,
Phil
Marcel Moolenaar
2014-08-14 16:55:04 UTC
Permalink
Post by Phil Shafer
Post by Marcel Moolenaar
The question is: do we have
more features we want to check for this way? If not, then
such a scheme could be perceived as "heavy handed".
"heavy handed" in what sense?
A generic implementation calls for a utility that
can list features, check for a feature, add and
remove features. You probably want to have support
libraries as well to do the same thing from within
C and C++, etc.

If all we're talking about is libxo then there are
surely less work-intensive ways to achieve what we
want.

In that sense heavy-handed.
Post by Phil Shafer
I'm hoping the note can be added by normal linker magic (but see
question below). If not a "noteelf" command would need to be created
(or an option to brandelf?) to mark the binary.
Ideally, features are set in the same way we do for
the kernel. Only in the kernel it ends up being
a sysctl, which doesn't apply here.

Alternatives in the kernel are linker sets and
I've used those in mkimg as well. A linker set
is like a note, except that the metadata is in
a loadable section and thus part of a loaded
segment. This has the advantage of being able
to go over your own set of features from within
C and C++, etc and it doesn't require a utility
to add and remove features to a set.

Checking for a feature from some other program
is probably as hard as a note, assuming there
was a single note to list all features. In both
cases you first have to find the container with
some ELF grokking utility and then you need to
parse the container's data to find a feature.
A note has the benefit over a section in that
I can make changes without having (effectively)
to relink.
Post by Phil Shafer
Are you seeing
something more? "elfdump" has a "-n" to dump notes, and a "-N
<name>" could be added, making "elfdump -n -N libxo my-app" the
means of getting the contents. A "-q" option could be added to
prevent output but set the exit code based on if the section appears
in the given binary.
I think I prefer a more integrated solution in
the end. elfdump is that "ELF grokking" utility,
but may not necessarily be appropriate for the
interpretation of the feature set.

But that's for later. For now, something that
works as a prototype is good enough.
Post by Phil Shafer
Post by Marcel Moolenaar
Alternatives include looking for a particular symbol or
possibly even running the utility with a libxo option that
has predictable output.
How does one put a symbol in a binary when linking against a shared
library?
One doesn't necessarily. A recursive search that
follows the DT_NEEDED entries could possibly work well
enough. Though, I fear for the likes of Mozilla with
their large number of shared libraries. Not that
such is our immediate scope/target, but if there
is an approach that scales well, then that would
be better.
Post by Phil Shafer
Would there need to be two libs, one with the code and
one with just a symbol? I'd have the same issue with the ElfNote
scheme, right? I'd need to add a section to the binary, but libxo
could be linked dynamically. Is there an easy answer for this? Or
is the app stuck with "LDADD+=-lxo -lxo-note"?
I don't have a good answer other than that there
isn't a suitable option to LD to leave a turd in
the executable. The absolute easiest is to simply
not support static linking for libxo and then
just check DT_NEEDED. You can use ldd for that.
That route wouldn't require anything other than
not (ever) building an archive library for libxo.

This is not unreasonable either. The world has
moved away from archive linking for the most part
and while there's still value to static linking,
I see it needed less and less.
--
Marcel Moolenaar
***@xcllnt.net
Konstantin Belousov
2014-08-14 05:26:48 UTC
Permalink
Post by Phil Shafer
Post by Phil Shafer
- all format strings are UTF-8
- argument strings (%s) are UTF-8
- "%ls" handles wide characters
- "%hs" will handle locale-based strings
- XML, JSON, and HTML will be UTF-8 output
- text will be locale-based
Sorry for the delay, but this code is now done. Formatting widths
are done using wcwidth() so things like "%15.15s" work correctly
regardless of locale settings. As a background task, I'm converting
some basic commands to use libxo. It's slow work, but needs done....
I've a related topic: when an app goes to run a child command, how
can it determine whether that binary supports libxo-based encoding
requests? This should be known before the binary is run, since
there's no means of auto-detecting the supported output after the
fact.
For example, say I want to make a JSON-based API for my server. I
can setenv("LIBXO_OPTIONS", "json") to get JSON output, but I won't
know if the binary supports this or if the output needs to be wrapped
and escaped.
I know ELF "Note" elements can be used to carry vendor-specific
data, but have no experience with them. Would it be reasonable to
use them as a means of communicating this information to other bits
of software?
No.
Post by Phil Shafer
Is FreeBSD using Notes for other information currently?
Yes, the notes are used to communicate the information required by
the dynamic linker to correctly activate the image. The mechanism has
nothing to do with application-specific features, and overloading it for
that purpose is a severe and pointless layering violation. Things should
not be done just because they could be done.

Using the static tagging for the dynamic application properties is wrong
anyway. E.g., would you consider the mere fact that the binary is linked
against your library, as the indication that your feature is supported ?
If not, how does it differ from the presence of some additional note ?
Phil Shafer
2014-08-14 06:06:33 UTC
Permalink
Post by Konstantin Belousov
Yes, the notes are used to communicate the information required by
the dynamic linker to correctly activate the image. The mechanism has
nothing to do with application-specific features, and overloading it for
that purpose is a severe and pointless layering violation.
The ELF spec says:

Note Section

Sometimes a vendor or system builder needs to mark an object
file with special information that other programs will check
for conformance, compatibility, etc. Sections of type SHT_NOTE
and program header elements of type PT_NOTE can be used for
this purpose. The note information in sections and program
header elements holds any number of entries, each of which is
an array of 4-byte words in the format of the target processor.
Labels appear below to help explain note information organization,
but they are not part of the specification.

Marking the binary with a libxo-specific note tells the caller that
the binary is capable of rendering its output in a non-traditional
style and gives the caller a means of triggering those styles of
output. In the libxo-enabled world, I see this as vital information
the caller needs to initialize the environment in which the command
will be run. Isn't this exactly the sort of information ELF targets
for note sections?

Thanks,
Phil
Konstantin Belousov
2014-08-14 08:52:57 UTC
Permalink
Post by Phil Shafer
Post by Konstantin Belousov
Yes, the notes are used to communicate the information required by
the dynamic linker to correctly activate the image. The mechanism has
nothing to do with application-specific features, and overloading it for
that purpose is severe and pointless layering violation.
Note Section
Sometimes a vendor or system builder needs to mark an object
file with special information that other programs will check
for conformance, compatibility, etc. Sections of type SHT_NOTE
and program header elements of type PT_NOTE can be used for
this purpose. The note information in sections and program
header elements holds any number of entries, each of which is
an array of 4-byte words in the format of the target processor.
Labels appear below to help explain note information organization,
but they are not part of the specification.
The ELF standard's scope is the build toolchain and C runtime, where the
cited paragraph makes perfect sense.
Post by Phil Shafer
Marking the binary with a libxo-specific note tells the caller that
the binary is capable of rendering its output in a non-traditional
style and gives the caller a means of triggering those styles of
output. In the libxo-enabled world, I see this as vital information
the caller needs to initialize the environment in which the command
will be run. Isn't this exactly the sort of information ELF targets
for note sections?
How does the binary format have any relevance for an application-level feature?
What would you do with binaries whose permissions are 'r-s--x--x',
which is not unexpected for tools which gather system information
and have to access things like /dev/mem?

You removed and did not answer a crucial question, which is a litmus
test for your proposal. Namely, how is the presence of the proposed note in
the binary different from a DT_NEEDED tag for your library?

Definitely, I do not see the addition of the fashion-of-the-day
text-mangling output as shattering enough to justify imposing the
architecture violation.
Ian Lepore
2014-08-14 14:08:46 UTC
Permalink
Post by Konstantin Belousov
Post by Phil Shafer
Post by Konstantin Belousov
Yes, the notes are used to communicate the information required by
the dynamic linker to correctly activate the image. The mechanism has
nothing to do with application-specific features, and overloading it for
that purpose is a severe and pointless layering violation.
Note Section
Sometimes a vendor or system builder needs to mark an object
file with special information that other programs will check
for conformance, compatibility, etc. Sections of type SHT_NOTE
and program header elements of type PT_NOTE can be used for
this purpose. The note information in sections and program
header elements holds any number of entries, each of which is
an array of 4-byte words in the format of the target processor.
Labels appear below to help explain note information organization,
but they are not part of the specification.
ELF standard scope is about build toolchain and C runtime, where the
cited paragraph makes perfect sense.
I disagree with this interpretation. The cited paragraph can be found
in, for example, the Oracle documentation in a chapter named "object
file format". There is nothing about the context that limits the
validity to toolchains and runtime support, it's just describing the
file layout.

It appears to me that the NOTE mechanism is purposely designed for
attaching arbitrary metadata of any size and type to an elf file. A bit
of searching doesn't turn up any words that either recommend or forbid
certain types of info stored in NOTEs.
Post by Konstantin Belousov
Post by Phil Shafer
Marking the binary with a libxo-specific note tells the caller that
the binary is capable of rendering its output in a non-traditional
style and gives the caller a means of triggering those styles of
output. In the libxo-enabled world, I see this as vital information
the caller needs to initialize the environment in which the command
will be run. Isn't this exactly the sort of information ELF targets
for note sections?
How does the binary format have any relevance for an application-level feature?
What would you do with binaries whose permissions are 'r-s--x--x',
which is not unexpected for tools which gather system information
and have to access things like /dev/mem?
You removed and did not answer a crucial question, which is a litmus
test for your proposal. Namely, how is the presence of the proposed note in
the binary different from a DT_NEEDED tag for your library?
DT_NEEDED only helps with dynamically linked executables, this whole
NOTEs discussion came up in the context of how to detect a statically
linked binary with libxo output support.
Post by Konstantin Belousov
Definitely, I do not see the addition of the fashion-of-the-day
text-mangling output as shattering enough to justify imposing the
architecture violation.
I don't think you've cited anything other than your own opinion that
using a note is any sort of architecture violation. I don't know that
you're wrong, I just can't find anything with a bit of quick searching
that says you're right.

-- Ian
Phil Shafer
2014-08-14 15:16:14 UTC
Permalink
Post by Konstantin Belousov
How does the binary format have any relevance to an application-level feature?
What would you do with binaries whose permissions are 'r-s--x--x',
which is not unexpected for tools that gather system information
and have to access things like /dev/mem?
This would clearly not make sense. Some metadata should be
in the file and some in the filesystem. Implementing the
SF_SNAPSHOT flag as a note section would be silly. But
that doesn't imply that using a note section to facilitate
proper construction of the environment for running a binary
isn't reasonable.
Post by Konstantin Belousov
You removed and did not answer a crucial question, which is a litmus
test for your proposal: how is the presence of the proposed note in
the binary different from a DT_NEEDED tag for your library?
Post by Konstantin Belousov
Using static tagging for dynamic application properties is wrong
anyway. E.g., would you consider the mere fact that the binary is linked
against your library as an indication that your feature is supported?
If not, how does it differ from the presence of some additional note?
No, I'm not looking for something more explicit than a reference
to a function in a library. I'm looking for an explicit marker
that a binary supports working in a particular environment. That
marker could be applied by having the developer link against a
specific marking library, or by having a tool mark the binary
appropriately. But it should be something explicit.

Re: DT_NEEDED: this section holds symbols for dynamic linking. Its
content and meaning are explicitly given in the spec. The note
section is intended for other generic information. It seems a
reasonable place to put the answer to the question "can this binary
make additional styles of output, and how do I trigger that behavior?".
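As a sketch of what "having a tool mark the binary" could look like: objcopy can attach an extra section to an existing executable after the fact. The section name ".note.libxo" and the payload below are purely hypothetical, and this toy payload is not in the real ELF note entry layout (which requires namesz/descsz/type words followed by name and descriptor):

```shell
# Illustration only: stamp an existing ELF binary with a marker section.
# Section name ".note.libxo" and the payload are assumptions, and the
# payload here is NOT valid ELF note format -- a real tool would emit
# a proper Elf_Nhdr entry.
cp /bin/sh mybinary                    # any existing ELF binary will do
printf 'libxo\0' > note.bin            # toy payload
objcopy --add-section .note.libxo=note.bin mybinary mybinary.marked

# A caller can now detect the marker without executing the binary:
if readelf -S mybinary.marked | grep -q '\.note\.libxo'; then
    echo "binary is marked"
fi
```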
Post by Konstantin Belousov
Definitely, I do not see the addition of the fashion-of-the-day
text-mangling output as shattering enough to justify imposing the
architecture violation.
It's partially opinion and perspective, but I don't see an architecture
violation; I see the use of a generic mechanism to carry relevant
information. And I see this addition as a modernization that allows
better integration with fashionable tools like browsers and
client/server architectures.

Thanks,
Phil
Warner Losh
2014-08-14 15:23:58 UTC
Permalink
Sorry for top posting; this really isn’t responsive to the minutiae in the rest of the thread.

I’m curious. Why isn’t this conversation about “foo --supports-xml”? Why tag these commands with weird, non-standard things that need more exotic tools to dig the information out? Why not have a standardized command line option that prints nothing and returns 0 for success, or whines and returns 1 for failure? That’s way more standardized than adding obscure notes that may or may not be allowed by the standard, but that we traditionally haven’t used, and that requires tools that aren’t standardized and whose interfaces vary from one tool to the next. This is true of asking about DT_NEEDED (which forces a specific library for the implementation) as well as anything placed in the NOTES section. It also assumes that you know the thing you are querying is an ELF executable, that you can find it, that there’s not a shell script wrapper for that tool that redirects to binaries that do support this, etc.

Basically, what does this ‘metadata’ really buy you that can’t be bought some other, more standard, more direct way that doesn’t enshrine so many hard-coded implementation decisions into the mix?

Warner
Post by Phil Shafer
Post by Konstantin Belousov
How does the binary format have any relevance to an application-level feature?
What would you do with binaries whose permissions are 'r-s--x--x',
which is not unexpected for tools that gather system information
and have to access things like /dev/mem?
This would clearly not make sense. Some metadata should be
in the file and some in the filesystem. Implementing the
SF_SNAPSHOT flag as a note section would be silly. But
that doesn't imply that using a note section to facilitate
proper construction of the environment for running a binary
isn't reasonable.
Post by Konstantin Belousov
You removed and did not answer a crucial question, which is a litmus
test for your proposal: how is the presence of the proposed note in
the binary different from a DT_NEEDED tag for your library?
Post by Konstantin Belousov
Using static tagging for dynamic application properties is wrong
anyway. E.g., would you consider the mere fact that the binary is linked
against your library as an indication that your feature is supported?
If not, how does it differ from the presence of some additional note?
No, I'm not looking for something more explicit than a reference
to a function in a library. I'm looking for an explicit marker
that a binary supports working in a particular environment. That
marker could be applied by having the developer link against a
specific marking library, or by having a tool mark the binary
appropriately. But it should be something explicit.
Re: DT_NEEDED: this section holds symbols for dynamic linking. Its
content and meaning are explicitly given in the spec. The note
section is intended for other generic information. It seems a
reasonable place to put the answer to the question "can this binary
make additional styles of output, and how do I trigger that behavior?".
Post by Konstantin Belousov
Definitely, I do not see the addition of the fashion-of-the-day
text-mangling output as shattering enough to justify imposing the
architecture violation.
It's partially opinion and perspective, but I don't see an architecture
violation; I see the use of a generic mechanism to carry relevant
information. And I see this addition as a modernization that allows
better integration with fashionable tools like browsers and
client/server architectures.
Thanks,
Phil
_______________________________________________
http://lists.freebsd.org/mailman/listinfo/freebsd-arch
Alfred Perlstein
2014-08-15 00:40:58 UTC
Permalink
Sorry for top posting; this really isn’t responsive to the minutiae in the rest of the thread.
I’m curious. Why isn’t this conversation about “foo --supports-xml”? Why tag these commands with weird, non-standard things that need more exotic tools to dig the information out? Why not have a standardized command line option that prints nothing and returns 0 for success, or whines and returns 1 for failure? That’s way more standardized than adding obscure notes that may or may not be allowed by the standard, but that we traditionally haven’t used, and that requires tools that aren’t standardized and whose interfaces vary from one tool to the next. This is true of asking about DT_NEEDED (which forces a specific library for the implementation) as well as anything placed in the NOTES section. It also assumes that you know the thing you are querying is an ELF executable, that you can find it, that there’s not a shell script wrapper for that tool that redirects to binaries that do support this, etc.
Basically, what does this ‘metadata’ really buy you that can’t be bought some other, more standard, more direct way that doesn’t enshrine so many hard-coded implementation decisions into the mix?
In addition, I am wondering what branding the binaries really offers "as-is".

Example, let's say you have a means to query and find out that "netstat"
supports libxo.

Well, netstat has many output variants:
netstat
netstat -r
netstat -a
netstat -nr
netstat -na
netstat -p tcp

So given that it appears we want to build something so that "file
browsers" can automatically determine that a program can be run in
"libxo" mode and some form of output should be rendered, what exactly is
the preferred format?

What happens when a particular program's default behavior is to filter
stdin, yet it supports libxo? How is that handled?

What happens when a particular program's default behavior is to run
indefinitely, yet it supports libxo? How is that handled?

It makes sense to limit the scope of the project to just doing the
formatted output, at least until we see what we get when a whole
bunch of tools are running with it.

Speaking of getting a whole bunch of tools running with it: the GSoC
project had nearly a dozen programs converted. How is the
libxo project coming along? Do we have more programs converted? Without
the programs converted, we don't have very much to show, even with a
great library.

What other apps use libxo in the tree now?

-Alfred

John Baldwin
2014-08-14 12:47:00 UTC
Permalink
Post by Konstantin Belousov
Post by Phil Shafer
Post by Konstantin Belousov
Yes, the notes are used to communicate the information required by
the dynamic linker to correctly activate the image. The mechanism has
nothing to do with application-specific features, and overloading it for
that purpose is a severe and pointless layering violation.
Note Section
Sometimes a vendor or system builder needs to mark an object
file with special information that other programs will check
for conformance, compatibility, etc. Sections of type SHT_NOTE
and program header elements of type PT_NOTE can be used for
this purpose. The note information in sections and program
header elements holds any number of entries, each of which is
an array of 4-byte words in the format of the target processor.
Labels appear below to help explain note information organization,
but they are not part of the specification.
The ELF standard's scope is the build toolchain and C runtime, where the
cited paragraph makes perfect sense.
Agreed.
Post by Konstantin Belousov
Post by Phil Shafer
Marking the binary with a libxo-specific note tells the caller that
the binary is capable of rendering its output in a non-traditional
style and gives the caller a means of triggering those styles of
output. In the libxo-enabled world, I see this as vital information
the caller needs to initialize the environment in which the command
will be run. Isn't this exactly the sort of information ELF targets
for note sections?
How does the binary format have any relevance to an application-level feature?
What would you do with binaries whose permissions are 'r-s--x--x',
which is not unexpected for tools that gather system information
and have to access things like /dev/mem?
You removed and did not answer a crucial question, which is a litmus
test for your proposal: how is the presence of the proposed note in
the binary different from a DT_NEEDED tag for your library?
Yes, checking DT_NEEDED for libxo.so is the first thing I thought of as well.
It is equivalent to 'ldd foo | grep libxo'.
--
John Baldwin
Warner Losh
2014-08-14 16:13:26 UTC
Permalink
Post by John Baldwin
Post by Konstantin Belousov
Post by Phil Shafer
Marking the binary with a libxo-specific note tells the caller that
the binary is capable of rendering its output in a non-traditional
style and gives the caller a means of triggering those styles of
output. In the libxo-enabled world, I see this as vital information
the caller needs to initialize the environment in which the command
will be run. Isn't this exactly the sort of information ELF targets
for note sections?
How does the binary format have any relevance to an application-level feature?
What would you do with binaries whose permissions are 'r-s--x--x',
which is not unexpected for tools that gather system information
and have to access things like /dev/mem?
You removed and did not answer a crucial question, which is a litmus
test for your proposal: how is the presence of the proposed note in
the binary different from a DT_NEEDED tag for your library?
Yes, checking DT_NEEDED for libxo.so is the first thing I thought of as well.
It is equivalent to 'ldd foo | grep libxo'.
Doesn’t work for static binaries, nor for cases where libxo is linked in by a
library indirectly, nor for when the command is a shell script that may
invoke a command that supports this output, nor for a Python script that
implements this output, etc.

My question for people advocating this method: Why not require all commands
that generate this kind of output to support a standard command line option
that causes the command to print nothing and return 0 if it supports reporting,
or anything else if it doesn’t (return 0 with output, or return non-zero with
or without output)? This would handle the more complicated implementation issues
with using DT_NEEDED and/or the ELF note, be more in line with how things are
traditionally done, and offer greater flexibility of implementation.

Warner
Phil Shafer
2014-08-14 16:40:04 UTC
Permalink
Post by Warner Losh
My question for people advocating this method: Why not require all
commands that generate this kind of output to support a standard
command line option that causes the command to print nothing and
return 0 if it supports reporting, or anything else if it doesn't
(return 0 with output, or return non-zero with or without output).
It's a chicken and egg problem. I can't call the command with the
option until I know that command can handle the option without
generating an error, a core file, or rebooting the box. Until I
know what the command will do, I can't invoke it safely.

There's also the issue of finding an option that no existing command is
using, given that I can't change options for existing commands.

Thanks,
Phil
Warner Losh
2014-08-14 16:50:02 UTC
Permalink
Post by Phil Shafer
Post by Warner Losh
My question for people advocating this method: Why not require all
commands that generate this kind of output to support a standard
command line option that causes the command to print nothing and
return 0 if it supports reporting, or anything else if it doesn't
(return 0 with output, or return non-zero with or without output).
It's a chicken and egg problem. I can't call the command with the
option until I know that command can handle the option without
generating an error, a core file, or rebooting the box. Until I
know what the command will do, I can't invoke it safely.
If a userland command reboots the box in response to bad command
line options, that’s not your problem to fix: that’s a security issue
that needs to be fixed regardless of the method you choose. If the command
creates a core file, that’s a bug in that command. The command
could very easily create a core file when you call it with a valid
set of options too. Generating an error is 100% fine: in fact, I count
on that happening.
Post by Phil Shafer
There's also the issue of finding an option that no existing command is
using, given that I can't change options for existing commands.
In my opinion, there’s no chicken and egg problem.

I specifically proposed a long option that isn’t present in any command
today, and that is likely to generate errors or at least output from
programs that don't support it. The reason I proposed the long option
was so that two lines of code could be added to programs

in some header:
#define LONG_OPTION_DEFINE "--supports-xml-output"

#include <stdlib.h>	/* for exit() */
#include <string.h>	/* for strcmp() */
#include <some-header.h>
...
int main(int argc, char **argv)
{
	/* local variables here */

	if (argc == 2 && strcmp(argv[1], LONG_OPTION_DEFINE) == 0)
		exit(0);
	/* ... normal option handling continues ... */
}

This wouldn’t interfere with any other command line parsing these
programs do. --supports-xml-output isn’t likely a valid set of options
for any program that exists today, except for those that support XML.

The protocol is simple: redirect stdout and stderr to pipes and invoke
the command; if the exit status is 0 and there’s no output on the pipes,
then the command supports your protocol. If there’s output, or if the
exit status isn’t 0, then the program doesn’t.
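That probing protocol can be sketched as a small shell helper. The flag name --supports-xml-output is the proposal from this thread, not an option any current command implements, and netstat is used purely as an example target:

```shell
# Probe whether a command implements the proposed protocol: run it with
# the (hypothetical) flag, capturing stdout and stderr. Success means it
# exited 0 AND printed nothing at all.
supports_xml_output() {
    out=$("$1" --supports-xml-output 2>&1)
    # $? here is the exit status of the probed command
    [ $? -eq 0 ] && [ -z "$out" ]
}

if supports_xml_output netstat; then
    echo "netstat can emit structured output"
else
    echo "netstat cannot emit structured output"
fi
```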

Warner
Simon J. Gerraty
2014-08-14 17:16:47 UTC
Permalink
My question for people advocating this method: Why not require all
commands that generate this kind of output to support a standard
command line option
That's basically what we did in Junos, except the -X option currently
just means "output XML".
That worked fine for the limited number of BSD apps that we frobbed,
but it's pretty hard to assert that it could be expanded to all apps.

Thus I think Phil was looking for a more generic solution.

I think your suggestion could work fine.
Since (I think) Phil mentioned libxo being able to check options,
it should be OK to run an app with an option like:

--libxo-is-supported

that's guaranteed not to conflict with any existing option.
Any app that hasn't been converted will choke and die; any that has will
exit happy.

You could add other --libxo-* options to control output, if environment
variables are not considered desirable.
John Baldwin
2014-08-14 18:22:08 UTC
Permalink
Post by Simon J. Gerraty
My question for people advocating this method: Why not require all
commands that generate this kind of output to support a standard
command line option
That's basically what we did in Junos, except the -X option currently
just means "output XML".
That worked fine for the limited number of BSD apps that we frobbed,
but it's pretty hard to assert that it could be expanded to all apps.
Thus I think Phil was looking for a more generic solution.
I think your suggestion could work fine.
Since (I think) Phil mentioned libxo being able to check options,
--libxo-is-supported
that's guaranteed not to conflict with any existing option.
Any app that hasn't been converted will choke and die; any that has will
exit happy.
You could add other --libxo-* options to control output, if environment
variables are not considered desirable.
I vote for this.
--
John Baldwin
Marcel Moolenaar
2014-08-14 17:07:42 UTC
Permalink
Post by Konstantin Belousov
Post by Phil Shafer
Note Section
Sometimes a vendor or system builder needs to mark an object
file with special information that other programs will check
for conformance, compatibility, etc. Sections of type SHT_NOTE
and program header elements of type PT_NOTE can be used for
this purpose. The note information in sections and program
header elements holds any number of entries, each of which is
an array of 4-byte words in the format of the target processor.
Labels appear below to help explain note information organization,
but they are not part of the specification.
ELF standard scope is about build toolchain and C runtime, where the
cited paragraph makes perfect sense.
That's a self-imposed precondition that is not
present in the spec. You're just as liberal in
your interpretation as Phil is; you're just on
the other side of the argument.

I am definitely interested in what you think a
system builder is, given your objection.
--
Marcel Moolenaar
***@xcllnt.net
Marcel Moolenaar
2014-08-14 17:04:36 UTC
Permalink
Post by Konstantin Belousov
Post by Phil Shafer
I know ELF "Note" elements can be used to carry vendor-specific
data, but have no experience with them. Would it be reasonable to
use them as a means of communicating this information to other bits
of software?
No.
Too extreme.
Post by Konstantin Belousov
Post by Phil Shafer
Is FreeBSD using Notes for other information currently?
Yes, the notes are used to communicate the information required by
the dynamic linker to correctly activate the image. The mechanism has
nothing to do with application-specific features, and overloading it for
that purpose is a severe and pointless layering violation. Things should
not be done just because they could be done.
Too extreme. Life is a lot more subtle. Standards
are as well. There are many examples in the real
world where standards are interpreted a little
more liberally than others may want. When such
interpretations result in (gratuitous)
incompatibilities, we all see that as bad.
But when it adds real value, you tend to find it
in the next update of the standard.
Post by Konstantin Belousov
Using static tagging for dynamic application properties is wrong
anyway. E.g., would you consider the mere fact that the binary is linked
against your library as an indication that your feature is supported?
If not, how does it differ from the presence of some additional note?
If we can eliminate static linking for libxo, then
that is definitely easy. Easiest, probably. The
question becomes: is it acceptable to not support
static linking for libxo? Or alternatively, is it
acceptable to not be able to check for the feature
on a static executable?

For the first I'm inclined to say yes, but not for
the second.
--
Marcel Moolenaar
***@xcllnt.net